-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document jitter and raise exception when used on continuous axis #2236
Comments
Is it crucial that it is used on a categorical axis? Seems like it's also valid in some continuous axis cases. If values are all integer, or are quantized to some other value (e.g. NYC taxi fare values quantized to 0.25), seems like jitter can be used to reduce overplotting, same as for categorical axes? |
The fact that your example has quantized data means it is somewhat like a categorical axis. Of course, quantized data on a continuous axis is common (typically from regular sampling) but here we are considering cases where you then decide that you want to scatter overlapping samples to make them out visually. The question then is whether this sort of case is common enough to drop such validation. I do think that accidentally adding jitter to a continuous axis will be very misleading in the average case when working with continuous data. You are presenting the data as underlying data + noise and this error might not be obvious until you check the plot options. Essentially, I don't think we should allow jitter to 'corrupt' the data in the continuous case unless we really believe that data quantization + continuous axes + jitter makes sense for enough real scenarios. In general, I don't want to always have to check for the jitter plot option before deciding whether I can trust the data shown in the plot or not. |
Sounds reasonable. I guess the best way to meet those requirements would be for it to warn when used in the continuous case, with instructions for turning off the warning? |
That does sound like a reasonable approach which wouldn't disable your ability to use jitter in the continuous case entirely. I suppose this is a good use for a config option? I don't think people would often want to turn the warning off but a config flag would allow them to do that... |
Sure, a warning seems fine. It should also be easy enough to skip the warning if the data is of integer dtype, so you'd only see it when using floats. A config option to disable it seems fine although I'm also wondering whether instead of a specific option we should have a global verbosity level, which would also let us add some debugging messages throughout the code. |
That might be an idea: a more general config option for overall verbosity. Then the default level (i.e for users) would be to warn about potential problems such as this one, then there would be a level to suppress warnings (that the user would have to deliberately enable). Then there would be developer level warnings/info messages etc. |
That all sounds ok to me. |
As the title says, the
jitter
plot option forScatter
is currently undocumented and does not enforce that it is used on a categorical axis.The text was updated successfully, but these errors were encountered: