
errors with partition_bundle from context-objects #58

Closed
mxi-hug opened this issue Dec 12, 2018 · 2 comments

Comments


mxi-hug commented Dec 12, 2018

I'm not entirely sure what the internal problem is here, but the partition_bundle method for context objects returns a malformed partition_bundle object. As a result, every function applied to the new pb object fails (merge(), features(), as.DocumentTermMatrix()).

First hint at the problem: str(pb_a) returns the warning

Warning in str.default(obj, ...)
'str.default': 'le' is NA, so taken as 0

Minimal Example

cont_a <- context("GERMAPARL", query = 'Teilzeitzwangsgesetz', cqp = FALSE)

... getting corpus positions
... number of hits: 1
... checking that all p-attributes are available
... getting token id for p-attribute: word
... generating contexts
... counting tokens

pb_a <- partition_bundle(cont_a)

|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s
Warning in str.default(obj, ...)
'str.default': 'le' is NA, so taken as 0

pb_a <- enrich(pb_a, p_attribute = "word", progress = TRUE)

|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s
Warning in str.default(obj, ...)
'str.default': 'le' is NA, so taken as 0

dtm <- polmineR::as.DocumentTermMatrix(pb_a, col = "count")

... using the p_attribute-slot of the first object in the bundle as p_attribute: word
... generating (temporary) key column
... generating cumulated data.table
... getting unique keys
... generating integer keys
Error in simple_triplet_matrix(i = unname(i), j = DT[["j"]], v = DT[[col]], :
'i, j' invalid

Author

mxi-hug commented Dec 12, 2018

(And yes, I'm aware that a context, let alone a partition_bundle, on a single hit is rather useless. I just wanted to rule out overlapping windows etc. as a source of the error.)

Collaborator

PolMine commented Dec 12, 2018

Dear Max,

There are two issues here. The first warning arises from missing sizes in the partition objects that are generated; that's fixed. The second is that generating a TermDocumentMatrix requires the partitions in a partition_bundle to have names. The solution I chose is to assign increasing integers as names to the objects in a partition_bundle when names are not present.
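For illustration, a minimal base-R sketch of the naming fallback described above, assuming a bundle can be modelled as a plain list of named term-count vectors. `ensure_names()` and the toy `bundle` are hypothetical and are not polmineR's actual code. The sketch shows why names matter: the document (row) index of a triplet matrix is derived by matching against the element names, so an unnamed bundle leaves nothing valid to match, which is one plausible route to slam's `'i, j' invalid` error.

```r
# Hypothetical helper (not polmineR code): make sure every element of a list
# has a name, falling back to increasing integers where names are absent.
ensure_names <- function(x) {
  if (is.null(names(x))) {
    # no names at all: use "1", "2", "3", ...
    names(x) <- as.character(seq_along(x))
  } else {
    # some names NA or empty: fill only those positions with their index
    missing <- is.na(names(x)) | names(x) == ""
    names(x)[missing] <- as.character(which(missing))
  }
  x
}

# Toy stand-in for a bundle: each element holds term counts for one partition
bundle <- list(c(a = 2L, b = 1L), c(b = 3L, c = 1L))
bundle <- ensure_names(bundle)

# Triplet indices (document i, term j, count v), as a sparse
# document-term matrix constructor would need them
docs  <- rep(names(bundle), times = lengths(bundle))
terms <- unlist(lapply(bundle, names), use.names = FALSE)
i <- match(docs, names(bundle))            # row index from partition names
j <- match(terms, sort(unique(terms)))     # column index from vocabulary
v <- unlist(bundle, use.names = FALSE)     # counts
```

Here `names(bundle)` becomes `c("1", "2")`, so `i` resolves to valid row positions. Filling only the missing positions keeps any names a user has already set.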

It should work after the next push to the dev branch.

Kind regards
Andreas

PolMine closed this as completed Dec 12, 2018