Efficient workflow to decode sentences of corpus #176
A new polmineR version on the dev branch (v0.8.5.9011) now includes an implementation of the first option that occurred to me. Generally speaking, it is much faster than the original approach, and performance is satisfactory. The bottleneck is a …
This is a fine solution. However, it does not yet work with subcorpora: it splits the entire corpus even if you insert a …
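Good point. I have now implemented a …

```r
library(polmineR)
library(magrittr) # provides the %>% pipe

x <- corpus("GERMAPARL_PARLACLARIN_III") %>%
  subset(year = "2017") %>%          # subcorpus: year 2017 only
  regions(s_attribute = "s") %>%     # sentence regions within the subcorpus
  get_token_stream(split = TRUE)     # one token vector per sentence
```

Performance is OK, I think.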
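The subcorpus problem amounts to keeping only those sentence regions that fall inside the subcorpus. A minimal base-R sketch of that filtering step, assuming both sentences and subcorpus are given as two-column integer matrices of 0-based, inclusive corpus positions (start, end) and that the subcorpus regions are sorted and non-overlapping. `regions_within()` is a hypothetical helper for illustration, not a polmineR function and not how polmineR implements this.

```r
# Hypothetical helper (not part of polmineR): keep only the sentence regions
# that lie entirely within one of the subcorpus regions.
# Both arguments are two-column integer matrices (start, end) of 0-based,
# inclusive corpus positions; 'subcorpus' must be sorted and non-overlapping.
regions_within <- function(sentences, subcorpus) {
  # Index of the last subcorpus region starting at or before each sentence start
  idx <- findInterval(sentences[, 1], subcorpus[, 1])
  # idx == 0: the sentence starts before the first subcorpus region
  inside <- idx > 0L
  # The sentence must also end no later than that subcorpus region ends
  inside[inside] <- sentences[inside, 2] <= subcorpus[idx[inside], 2]
  sentences[inside, , drop = FALSE]
}

# Toy example: four sentences, one subcorpus region covering positions 4..9
sentences <- matrix(c(0L, 3L, 4L, 6L, 7L, 9L, 10L, 12L), ncol = 2, byrow = TRUE)
subcorpus <- matrix(c(4L, 9L), ncol = 2)
regions_within(sentences, subcorpus)
# → rows (4, 6) and (7, 9)
```

`findInterval()` locates the candidate subcorpus region for all sentences in one vectorized call, so no loop over sentences is needed.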
Getting a list of sentences would be possible with something like `corpus() %>% split() %>% get_token_stream()`. This may not be very efficient. A nice and efficient workflow might work as follows:

Or, alternatively:
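One generic way to avoid decoding every sentence separately is to decode the token stream once and then split it by sentence boundaries. A minimal base-R sketch, assuming the tokens for corpus positions 0, 1, 2, … are already available as a character vector and the sentence regions as a two-column integer matrix of 0-based, inclusive corpus positions (start, end). `split_by_regions()` is a hypothetical helper, not polmineR's implementation; it needs R >= 4.0.0 for the `from` argument of `sequence()`.

```r
# Hypothetical helper (not part of polmineR): split a token stream that was
# decoded once into a list with one character vector per sentence.
# 'tokens' holds the tokens at corpus positions 0, 1, 2, ... (R indexes from 1);
# 'regions' is a two-column integer matrix (start, end) of 0-based, inclusive
# corpus positions.
split_by_regions <- function(tokens, regions) {
  n_tokens <- regions[, 2] - regions[, 1] + 1L
  # All corpus positions covered by the regions, built in one vectorized call
  cpos <- sequence(n_tokens, from = regions[, 1])
  # Region index for every position, used as the grouping factor
  region_id <- rep(seq_len(nrow(regions)), times = n_tokens)
  unname(split(tokens[cpos + 1L], region_id))
}

tokens <- c("This", "is", "one", ".", "And", "another", ".")
regions <- matrix(c(0L, 3L, 4L, 6L), ncol = 2, byrow = TRUE)
split_by_regions(tokens, regions)
```

The design choice is to build a single index vector for all sentences rather than creating one object per sentence and decoding each in turn, which is where a split-then-decode pipeline is likely to spend its time.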