Remove WordSplitter #3345
Comments
Can I do the following changes?
On a side note, can we replace the default spacy tokenizer with something that only does tokenization, but faster (like https://github.com/microsoft/BlingFire)? We would still need spacy for POS tagging in the POS indexer.
Yes, you can do those. This will break sniff tests, though, as it will change config file requirements. A few thoughts:
Well, spacy should probably stay the default, so we don't break a whole lot of existing saved models.
Another candidate idea for simplifying API stuff for a 1.0 release: remove the whole notion of `WordSplitter`, and just call everything `Tokenizer`s. Basically no one uses the extra stuff that the `WordTokenizer` has, and it just adds a level of indirection that's unnecessary. If you want the extra filtering and stemming, implement it in a standalone `Tokenizer`.
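To make the last point concrete, here is a rough sketch of what folding filtering and stemming into a single standalone tokenizer could look like. Everything in it is hypothetical and for illustration only: the `Tokenizer` base class, `WhitespaceTokenizer`, and `FilteringStemmingTokenizer` are stand-ins defined here, not the library's actual classes, and the suffix-stripping "stemmer" is a toy rather than a real stemming algorithm.

```python
from abc import ABC, abstractmethod
from typing import List, Set, Tuple


class Tokenizer(ABC):
    """Hypothetical minimal interface: text in, list of token strings out."""

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        ...


class WhitespaceTokenizer(Tokenizer):
    """Splits on whitespace only; stands in for any plain tokenizer."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


class FilteringStemmingTokenizer(Tokenizer):
    """Wraps another tokenizer, then drops stopwords and strips suffixes.

    The filtering and stemming live inside this one class, so callers
    only ever see the Tokenizer interface.
    """

    def __init__(
        self,
        base: Tokenizer,
        stopwords: Set[str],
        suffixes: Tuple[str, ...] = ("ing", "ed", "s"),
    ) -> None:
        self.base = base
        self.stopwords = {word.lower() for word in stopwords}
        self.suffixes = suffixes

    def _stem(self, token: str) -> str:
        # Toy stemmer: strip the first matching suffix, keeping at
        # least three characters of the original token.
        for suffix in self.suffixes:
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def tokenize(self, text: str) -> List[str]:
        tokens = self.base.tokenize(text)
        kept = [t for t in tokens if t.lower() not in self.stopwords]
        return [self._stem(t) for t in kept]
```

Usage would then be a single call, e.g. `FilteringStemmingTokenizer(WhitespaceTokenizer(), {"the", "a"}).tokenize(text)`, with no separate splitter, filter, or stemmer objects for the caller to wire together.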