Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document jitter and raise exception when used on continuous axis #2236

Open
philippjfr opened this issue Jan 3, 2018 · 7 comments
Open

Document jitter and raise exception when used on continuous axis #2236

philippjfr opened this issue Jan 3, 2018 · 7 comments

Comments

@philippjfr
Copy link
Member

As the title says, the jitter plot option for Scatter is currently undocumented and does not enforce that it is used on a categorical axis.

@jbednar
Copy link
Member

jbednar commented Jan 4, 2018

Is it crucial that it is used on a categorical axis? Seems like it's also valid in some continuous axis cases. If values are all integer, or are quantized to some other value (e.g. NYC taxi fare values quantized to 0.25), seems like jitter can be used to reduce overplotting, same as for categorical axes?

@jlstevens
Copy link
Contributor

jlstevens commented Jan 4, 2018

Seems like it's also valid in some continuous axis cases.

The fact that your example has quantized data means it is somewhat like a categorical axis. Of course, quantized data on a continuous axis is common (typically from regular sampling) but here we are considering cases where you then decide that you want to scatter overlapping samples to make them out visually.

The question then is whether this sort of case is common enough to drop such validation. I do think that accidentally adding jitter to a continuous axis will be very misleading in the average case when working with continuous data. You are presenting the data as underlying data + noise and this error might not be obvious until you check the plot options.

Essentially, I don't think we should allow jitter to 'corrupt' the data in the continuous case unless we really believe that data quantization + continuous axes + jitter makes sense for enough real scenarios. In general, I don't want to always have to check for the jitter plot option before deciding whether I can trust the data shown in the plot or not.

@jbednar
Copy link
Member

jbednar commented Jan 4, 2018

Sounds reasonable. I guess the best way to meet those requirements would be for it to warn when used in the continuous case, with instructions for turning off the warning?

@jlstevens
Copy link
Contributor

jlstevens commented Jan 4, 2018

That does sound like a reasonable approach which wouldn't disable your ability to use jitter in the continuous case entirely.

I suppose this is a good use for a config option? I don't think people would often want to turn the warning off but a config flag would allow them to do that...

@philippjfr
Copy link
Member Author

Sure, a warning seems fine. It should also be easy enough to skip the warning if the data is of integer dtype, so you'd only see it when using floats.

A config option to disable it seems fine although I'm also wondering whether instead of a specific option we should have a global verbosity level, which would also let us add some debugging messages throughout the code.

@jlstevens
Copy link
Contributor

I'm also wondering whether instead of a specific option we should have a global verbosity level, which would also let us add some debugging messages throughout the code.

That might be an idea: a more general config option for overall verbosity. Then the default level (i.e for users) would be to warn about potential problems such as this one, then there would be a level to suppress warnings (that the user would have to deliberately enable). Then there would be developer level warnings/info messages etc.

@jbednar
Copy link
Member

jbednar commented Jan 4, 2018

That all sounds ok to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants