categorical variables are not detected #40

Closed
earnaud opened this issue Oct 30, 2019 · 9 comments
Comments

@earnaud
Contributor

earnaud commented Oct 30, 2019

As reported in this pull request from the MetaShARK repository (issue #9), template_categorical_variables returns the following output:

Templating categorical variables ...
Warning in utils::read.table(file = paste0(path, "/", templates[i]), header = TRUE,  :
  cols = 1 != length(data) = 7
Warning in utils::read.table(file = paste0(path, "/", templates[i]), header = TRUE,  :
  cols = 1 != length(data) = 7
No categorical variables found.
No categorical variables found.
Done.

(Note that this is not an internal error in the function.)
There are actually two problems:

  • the dimensions of the table that is read (i.e. the attributes_*.txt files) seem to be wrong
  • the categorical variables are not found
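
One way to check the first point is to count the fields on each line of a template before reading it. The sketch below is only an illustration: it assumes the template is tab-delimited, and the file content and column names are made up for the example rather than taken from the files in this report.

```r
# Hypothetical two-line attributes template written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "attributeName\tattributeDefinition\tclass\tunit\tdateTimeFormatString\tmissingValueCode\tmissingValueCodeExplanation",
  "site\tSite name\tcategorical\tnone\tnone\tNA\tmissing"
), tmp)

# Count the tab-separated fields on each line (header included).
fields <- utils::count.fields(tmp, sep = "\t", quote = "", comment.char = "")

# TRUE when every line has the same number of fields as the header.
consistent <- all(fields == fields[1])
```

If `consistent` is FALSE, or `fields[1]` is 1 while the other lines have 7, the header line is probably not being split on the separator that the reader expects.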
@earnaud
Contributor Author

earnaud commented Oct 30, 2019

Here are views of the content of the "attributes_*.txt" files built from provided ones:
nitrogen.txt: [image]
decomp.csv: [image]

@earnaud
Contributor Author

earnaud commented Oct 30, 2019

Additional info:

  • During table processing, NA values are replaced by "" (empty character strings).
  • "categorical" is recognized in the "class" column in a browser() interactive session run just before template_categorical_variables.
  • Values for "unit" and "dateTimeFormatString" have been filled in automatically for development purposes (so they are all identical), regardless of their meaning.

@clnsmth
Contributor

clnsmth commented Oct 31, 2019

Thanks for reporting this issue @earnaud. The attributes template looks good, but I suspect the problem is related to the data tables it's referencing. What files do your directories at path and data.path contain?

@clnsmth
Contributor

clnsmth commented Nov 18, 2019

Is this still an issue @earnaud?

@earnaud
Contributor Author

earnaud commented Nov 19, 2019

The issue no longer occurs in MetaShARK.

@earnaud earnaud closed this as completed Nov 19, 2019
@earnaud earnaud reopened this Nov 18, 2020
@earnaud
Contributor Author

earnaud commented Nov 18, 2020

I do not want to open a new issue, as this one is related:

Some variables that should be detected as "character" are instead detected as "categorical". I would suggest this metric to guess the actual nature of a variable:

  • character if diversity (number of different terms) > modal occurrence (maximum occurrence of one term)
  • categorical otherwise

which could be translated into R as:

# mean() could be used instead of max(); either way, this is only an approximation
type <- if (length(unique(var)) > max(table(var))) {
  "character" # high diversity
} else {
  "categorical" # many repetitions
}
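
To show how the rule above behaves on concrete inputs, here is a minimal sketch wrapped in a function. The name guess_text_class is hypothetical and is not part of the package. Dropping NA and "" values and returning "character" for empty input are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): guess whether a text column
# is free-form "character" or repeated-level "categorical".
guess_text_class <- function(var) {
  var <- as.character(var)
  var <- var[!is.na(var) & var != ""]         # assumption: ignore missing values
  if (length(var) == 0) return("character")   # assumption: default for empty input
  diversity <- length(unique(var))            # number of different terms
  modal_occurrence <- max(table(var))         # occurrences of the most frequent term
  if (diversity > modal_occurrence) "character" else "categorical"
}

guess_text_class(c("a", "b", "a", "a", "b"))             # "categorical" (2 terms, mode 3)
guess_text_class(c("site1", "site2", "site3", "site1"))  # "character" (3 terms, mode 2)
```

With this rule, ties go to "categorical", so a column with two terms each appearing twice is treated as categorical.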

@clnsmth
Contributor

clnsmth commented Nov 18, 2020

This would be a nice improvement @earnaud. Would you mind refactoring template_table_attributes() (beginning at line 223) and sending it as a pull request?

@earnaud
Contributor Author

earnaud commented Nov 19, 2020

I will do it along with the #77 review.

@clnsmth
Contributor

clnsmth commented Mar 12, 2021

Done

@clnsmth clnsmth closed this as completed Mar 12, 2021