Thanks a lot for reporting this issue and providing the example. I could easily reproduce the error, in both cases, irrespective of how the argument p_attribute is defined. So this is consistently buggy behavior...
A new polmineR-version (v0.7.11.9045) addresses the issue. You can download it from the development branch of the GitHub-repo:
To explain: The cause of the error we saw was that the s-attribute "party" does not have a value for all speeches. You will see this when calling ...
s_attributes("GERMAPARL")
So there are empty character vectors (""). Certainly, we can debate whether the value should rather be "parteilos" in the GermaParl-corpus. Be that as it may, the bug we see should not occur.
It was caused by a reindexing step that needs to be performed to make the method work if we throw out values by providing additional values for s-attributes. The procedure I had implemented was not robust when an empty string ("") needed to be reindexed, which resulted in the error we saw when creating the simple_triplet_matrix.
I now rely on the temporary creation of a factor, achieving the same result, yet more robust and (somewhat) faster than before.
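To illustrate the idea (this is only a minimal base-R sketch, not the actual polmineR code): converting labels to a factor and taking its integer codes yields dense indices from 1 to the number of distinct values, and an empty string is treated as an ordinary level. The vectors `doc` and `token_id` are made-up toy data; `new_token_id` merely mirrors the column name that appears in the error message.

```r
# Toy data (hypothetical): document labels with empty strings, as for "party",
# and sparse token ids as they might remain after dropping values.
doc <- c("SPD", "", "CDU", "", "SPD")
token_id <- c(10L, 42L, 10L, 7L, 42L)

# Reindex via a temporary factor: the integer codes are dense (1..n),
# and "" becomes a regular level rather than a missing value.
doc_factor <- factor(doc)
doc_id <- as.integer(doc_factor)

token_factor <- factor(token_id)
new_token_id <- as.integer(token_factor)

# Dense indices give well-defined matrix dimensions.
n_docs <- nlevels(doc_factor)
n_tokens <- nlevels(token_factor)

# The original labels can be recovered from the levels.
doc_labels <- levels(doc_factor)[doc_id]
```

Indices built this way are never NA and never exceed the number of levels, which is what a row/column index for a sparse matrix requires.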
I had been thinking about this before, yet the issue you report was a good incentive to finally do this. Thanks! Let me know whether everything works now.
I am getting an error when I try to convert the GERMAPARL data into a document-term matrix.
The command performs as expected when p_attribute = "word".
temp<-polmineR::as.DocumentTermMatrix("GERMAPARL", p_attribute="word", s_attribute="parliamentary_group" , verbose=T)
This, however, does not work when I use either p_attribute = "lemma" or p_attribute = "pos".
For example, trying to run
`tdm <- polmineR::as.TermDocumentMatrix("GERMAPARL", p_attribute = "lemma", s_attribute = "party")`
I receive the following error message
Error in simple_triplet_matrix(i = countDT[["doc_id"]], j = countDT[["new_token_id"]], : 'nrow, ncol' invalid