bug in noise for character vectors in minNchar #211

Closed
ChristophLeonhardt opened this issue May 19, 2022 · 1 comment

@ChristophLeonhardt (Contributor)

noise() collects terms in the data that might be considered noisy in some respect. If I am not mistaken, one of these criteria, minNchar, is handled the wrong way around. Tokens that fail the minNchar threshold (i.e. tokens shorter than minNchar) are removed, while tokens that are at least minNchar characters long are added to the return value of noise().

If I am right, this line

.Object[-terms_to_remove]

should read

.Object[terms_to_remove] 

instead.
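To make the difference concrete, here is a minimal R sketch of the two indexing variants. It does not reproduce polmineR's actual noise() code: the token vector is made up for illustration, and it assumes that terms_to_remove holds the integer positions (as returned by which()) of tokens shorter than minNchar.

```r
# Hypothetical token vector standing in for the terms noise() inspects
tokens <- c("a", "of", "the", "parliament", "x", "debate")
minNchar <- 3L

# Integer positions of tokens shorter than minNchar, i.e. the "noisy" ones
terms_to_remove <- which(nchar(tokens) < minNchar)

# Buggy variant: negative indexing drops the short tokens and keeps the rest
buggy <- tokens[-terms_to_remove]   # c("the", "parliament", "debate")

# Fixed variant: positive indexing returns the short tokens themselves
fixed <- tokens[terms_to_remove]    # c("a", "of", "x")

print(buggy)
print(fixed)
```

The positive index also behaves sensibly when no token is shorter than minNchar: which() then returns integer(0), and tokens[integer(0)] yields an empty vector, i.e. no noise terms for this criterion.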

ablaette pushed a commit that referenced this issue Aug 24, 2022
@ablaette (Collaborator)

Perfectly true. Fixed.
