
Discussion on plotting categorical data #67

Open
pokrovskyy opened this issue Mar 27, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@pokrovskyy
Collaborator

pokrovskyy commented Mar 27, 2020

Suggested by clsu22 regarding the explore_feature_map() function:

This function is useful for numeric variables but seems to do nothing with categorical variables. I think you should clarify this in the function description. My suggestion is that you could also include categorical variables, using the ANOVA test statistic or p-value to show the association between numeric and categorical variables. You could also use a chi-square test to measure the association between two categorical variables.
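A minimal sketch of the two suggested tests, assuming the data lives in a pandas DataFrame and SciPy is available. The helper names `anova_numeric_vs_categorical` and `chi2_categorical_vs_categorical` are hypothetical and not part of this project; they simply wrap SciPy's `f_oneway` and `chi2_contingency`.

```python
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway


def anova_numeric_vs_categorical(df, numeric_col, cat_col):
    """One-way ANOVA: does the mean of numeric_col differ across levels of cat_col?"""
    # Build one sample of numeric values per category level
    samples = [group[numeric_col].to_numpy() for _, group in df.groupby(cat_col)]
    result = f_oneway(*samples)
    return result.statistic, result.pvalue


def chi2_categorical_vs_categorical(df, cat_a, cat_b):
    """Chi-square test of independence between two categorical columns."""
    # Contingency table of observed counts for every pair of levels
    table = pd.crosstab(df[cat_a], df[cat_b])
    chi2, p, dof, _expected = chi2_contingency(table)
    return chi2, p, dof


# Toy data: "price" is numeric, "color" and "size" are categorical
df = pd.DataFrame({
    "price": [10.0, 11.0, 12.0, 30.0, 31.0, 32.0],
    "color": ["red", "red", "red", "blue", "blue", "blue"],
    "size": ["S", "S", "S", "L", "L", "L"],
})

f_stat, f_p = anova_numeric_vs_categorical(df, "price", "color")
chi2, chi2_p, dof = chi2_categorical_vs_categorical(df, "color", "size")
print(f"ANOVA: F={f_stat:.1f}, p={f_p:.2g}")
print(f"Chi-square: chi2={chi2:.3f}, p={chi2_p:.3f}, dof={dof}")
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, and both tests carry assumptions (ANOVA, for instance, assumes roughly normal groups with similar variances), so a toy frame this small gives only weak evidence for the chi-square case.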

Visualizing categorical data is complex because it depends heavily on the type of categorical data. How do you tell that a column is categorical and not just free-form text? Is it ordinal (do its levels have a natural order)?

For now, I believe the best solution is to keep this function as-is and let end users partition their data at their discretion. They could then run the pairwise feature correlation / plot on each partition individually.
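The partition-then-analyze workflow could look roughly like this, assuming a pandas DataFrame. `correlations_by_partition` is a hypothetical helper, and pandas' `DataFrame.corr` stands in for whatever pairwise analysis or plot `explore_feature_map()` actually produces.

```python
import pandas as pd


def correlations_by_partition(df, cat_col):
    """Return {level: correlation matrix of the numeric columns} for each level of cat_col."""
    results = {}
    for level, part in df.groupby(cat_col):
        # Keep only numeric columns so text/categorical columns are ignored
        results[level] = part.select_dtypes("number").corr()
    return results


df = pd.DataFrame({
    "segment": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [2.0, 4.0, 6.0, 6.0, 4.0, 2.0],
})

by_segment = correlations_by_partition(df, "segment")
for level, corr in by_segment.items():
    print(level, round(corr.loc["x", "y"], 3))
```

On the pooled data the x/y correlation is 0, while segment "a" shows a perfect positive and segment "b" a perfect negative relationship, which is the kind of structure per-partition analysis can reveal.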

@pokrovskyy pokrovskyy added the enhancement New feature or request label Mar 27, 2020
@pokrovskyy
Collaborator Author

One idea could be to designate a list of categorical features (via function arguments) and then return an array of plots, one for each level or combination of levels.
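A rough sketch of that idea, assuming pandas: the caller names the categorical features, and a plotting callable is applied to each observed combination of levels. `plot_per_level` and its `plot_fn` parameter are hypothetical; in practice `plot_fn` might be the existing `explore_feature_map()` (whose real signature isn't shown in this thread) or any function that takes a DataFrame and returns a figure.

```python
import pandas as pd


def plot_per_level(df, cat_cols, plot_fn):
    """Call plot_fn on each subset of df defined by a combination of levels of cat_cols.

    Returns a dict mapping a tuple of levels to whatever plot_fn returns.
    """
    cat_cols = list(cat_cols)
    plots = {}
    # Grouping by several columns yields tuple keys, one entry per categorical column;
    # only combinations that actually occur in the data are visited
    for levels, subset in df.groupby(cat_cols):
        # Drop the grouping columns; they are constant within each subset
        plots[levels] = plot_fn(subset.drop(columns=cat_cols))
    return plots


df = pd.DataFrame({
    "color": ["red", "red", "blue", "blue"],
    "size": ["S", "L", "S", "S"],
    "x": [1.0, 2.0, 3.0, 4.0],
})

# len stands in for a real plotting function: it just reports each subset's row count
plots = plot_per_level(df, ["color", "size"], len)
print(plots)
```

Because `groupby` only visits combinations present in the data, an empty combination such as `("blue", "L")` produces no plot at all.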

One problem with that is that if there are many levels / combinations, it can take a considerable amount of time to run.
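To make the runtime concern concrete: in the worst case, plotting every combination of levels means one plot per element of the Cartesian product, so the count is the product of the level counts. A small illustration with made-up features (the feature names and levels below are hypothetical):

```python
from itertools import product
from math import prod

# Hypothetical categorical features and their levels
levels = {
    "color": ["red", "green", "blue"],          # 3 levels
    "size": ["S", "M", "L", "XL"],              # 4 levels
    "region": ["N", "S", "E", "W", "Central"],  # 5 levels
}

# Worst-case number of plots: 3 * 4 * 5
n_plots = prod(len(v) for v in levels.values())
print(n_plots)  # 60

# Same count by explicitly enumerating every combination
combos = list(product(*levels.values()))
assert len(combos) == n_plots
```

Adding one more feature with 10 levels would push this to 600 plots; the real count may be lower if some combinations never occur in the data.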

Another consideration is that users can easily do this themselves at their discretion (e.g., by looping through the subset of levels they care about). This gives them even more flexibility than brute-force plotting through all the levels.

Open to your ideas.
