
ggMarginal does not apply scale transforms from scatter plot to marginal plots #81

Closed
crew102 opened this issue Nov 8, 2017 · 6 comments

Comments

@crew102
Contributor

crew102 commented Nov 8, 2017

ggMarginal isn't applying scale transforms from the scatter plot (e.g., limits, scale_x_reverse(), scale_x_log10(), etc.) to the scales of the marginal plots.

Example: when points are excluded from the scatter plot because of limits on the plot's range, the marginal plots do not reflect only the in-range data. This happens only for the margin opposite the axis on which the limits are set (e.g., if you set limits on the x-axis, the y marginal plot will be incorrect).

library(devtools)
library(withr)

with_temp_libpaths({
  
  # Get ggExtra version pre-refactor
  install_github("daattali/ggExtra", ref = "863b870")
  
  # Get ggplot2 2.2.0 so code runs
  install_version("ggplot2", "2.2.0")
  
  library(ggplot2)
  library(ggExtra)
  
  p <- ggplot(data = mtcars) +
    geom_point(aes(wt, mpg)) 
  
  marg_p_x <- ggMarginal(p = p + xlim(c(0, 2)))
  
  marg_p_y <- ggMarginal(p = p + ylim(c(25, 35)))

})

marg_p_x gives us a y density plot that appears to reflect the entirety of the data, as opposed to just the four points in range.

marg_p_x
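To make the mismatch concrete without installing old package versions, here is a base R sketch comparing the mpg values the y marginal appears to use (all 32 cars) with the mpg values of the four points that survive `xlim(c(0, 2))`. It uses `density()` only as a stand-in for the marginal's density estimate; that is an assumption, and ggExtra/ggplot2 may compute the density with different defaults.

```r
# mpg values the y marginal of marg_p_x appears to be built from: all 32 cars
full_mpg <- mtcars$mpg

# mpg values of the cars actually drawn when xlim(c(0, 2)) is applied
in_range_mpg <- mtcars$mpg[mtcars$wt >= 0 & mtcars$wt <= 2]

length(in_range_mpg)  # 4
range(full_mpg)       # 10.4 33.9
range(in_range_mpg)   # 27.3 33.9

# Stand-in density estimates (base R density(), not ggplot2's stat_density)
d_full <- density(full_mpg)
d_in_range <- density(in_range_mpg)

# The in-range density covers a much narrower stretch of the mpg axis
range(d_full$x)
range(d_in_range$x)
```

A correct y marginal would be concentrated around 27 to 34 mpg rather than spread across the full 10 to 34 mpg range.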

marg_p_y has a similar problem, except it occurs for the x marginal:

marg_p_y
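Both examples come down to the marginals being built from every row rather than from the rows that survive the scatter plot's limits. The sketch below shows the expected filtering, assuming the fix is to drop out-of-range rows on either axis before building the marginals. `rows_in_limits()` is a hypothetical helper for illustration, not part of ggExtra's API; it keeps values on the limits themselves, mirroring how ggplot2's default censoring only removes values strictly outside them.

```r
# Hypothetical helper (not ggExtra's API): keep only rows inside the
# scatter plot's x and y limits. NULL means "no limit on this axis".
rows_in_limits <- function(data, x, y, xlim = NULL, ylim = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(xlim)) {
    keep <- keep & data[[x]] >= xlim[1] & data[[x]] <= xlim[2]
  }
  if (!is.null(ylim)) {
    keep <- keep & data[[y]] >= ylim[1] & data[[y]] <= ylim[2]
  }
  # which() also drops rows where the comparison is NA (missing values)
  data[which(keep), , drop = FALSE]
}

# Rows the y marginal of marg_p_x should summarise
in_x <- rows_in_limits(mtcars, "wt", "mpg", xlim = c(0, 2))
# Rows the x marginal of marg_p_y should summarise
in_y <- rows_in_limits(mtcars, "wt", "mpg", ylim = c(25, 35))

nrow(in_x)       # 4
nrow(in_y)       # 6
range(in_y$wt)   # 1.513 2.2
```

For marg_p_y, the x marginal should therefore cover only weights between about 1.5 and 2.2 rather than the full range of `wt`.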

@daattali
Owner

daattali commented Nov 8, 2017

Interesting. I wonder how long this has been the case (always?)

I'm really glad the testing framework now works well on Travis, so that once problems are fixed they won't pop up again.

@crew102
Contributor Author

crew102 commented Nov 8, 2017

Tough to say how long this has been happening. I assume it's hard to tell that anything is wrong in most cases (i.e., when limits do not drastically change the set of rendered points). It will definitely be nice to rely on tests moving forward, especially given potential changes in ggplot2 internals and my tendency to refactor in a regression or two!

@crew102 crew102 mentioned this issue Dec 27, 2017
@crew102 crew102 changed the title ggMarginal has a hard time getting things right when points are excluded from the scatter plot via range limits ggMarginal does not apply scale transforms from scatter plot to marginal plots Feb 23, 2018
@crew102
Contributor Author

crew102 commented Feb 27, 2018

I'm close to having a solution for this, but I've got one issue that I can't seem to figure out. Hoping you can help. The issue is related to the one discussed at tidyverse/ggplot2#1651 and https://stackoverflow.com/questions/37876096/geom-histogram-wrong-bins, but I'm still not seeing a solution.

In short, the problem is that the right-most bin of the marginal histograms gets excluded when I add a range limit. For example, let's say I have the following plot:

library(ggplot2)

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(boundary = min_wt)
p

[Image: histogram of `wt` produced by `p`, with no limits applied]

Now when I add xlim(), I lose the bin on the far right even though that bin should still be within the plot limits:

p + xlim(c(min_wt, max_wt))

[Image: the same histogram with `xlim()` applied; the right-most bin is missing]

Here's where I think the problem is coming from:

# The value of the right edge of the right-most bin, according to ggplot:
xmax_max <- max(ggplot_build(p)$data[[1]]$xmax)
xmax_max # prints as 5.4240000

# The max value of the data:
max_wt # 5.424

# The issue: the right edge is a tiny amount larger than max_wt,
# so xlim() treats that last bin as out of range and drops it
xmax_max > max_wt # TRUE
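The comparison above hinges on floating-point rounding: a bin edge computed from a start point and a width can land a few units in the last place away from the data maximum even when, mathematically, it should equal it. The sketch below uses a simplified model of equal-width bins anchored at `min_wt`; the choice of 29 bins and the `min + i * width` formula are illustrative assumptions, not ggplot2's exact binning algorithm. Whether this particular arithmetic overshoots depends on rounding, so no exact-comparison result is claimed here.

```r
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

# Illustrative assumption: equal-width bins anchored at min_wt whose last
# edge is meant to land exactly on max_wt (not ggplot2's exact algorithm)
n_bins <- 29
width <- (max_wt - min_wt) / n_bins
edges <- min_wt + (0:n_bins) * width
right_edge <- edges[length(edges)]

# Exact comparisons are sensitive to any rounding error in right_edge ...
right_edge - max_wt
right_edge > max_wt

# ... while a tolerance-based comparison treats the two as equal
isTRUE(all.equal(right_edge, max_wt))  # TRUE
```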

I could go with the workaround suggested in that SO answer (e.g., multiplying max_wt by something like 1 + 100000000000 * .Machine$double.eps), but I'd prefer not to. Any ideas?
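For comparison, here is a sketch of a tolerance-based variant of that workaround: widen both limits by a small amount relative to their span, so that an edge overshooting the data maximum by rounding error is no longer treated as out of range. `censor_to()` and `pad_limits()` are hypothetical helpers; `censor_to()` only mimics the censoring idea (values outside the limits become `NA`), and the overshooting edge is an illustrative value, not one taken from `ggplot_build()`.

```r
# Hypothetical helper: mimic censoring, i.e. values outside limits become NA
censor_to <- function(x, limits) {
  x[x < limits[1] | x > limits[2]] <- NA
  x
}

# Hypothetical helper: widen limits by a tolerance relative to their span
pad_limits <- function(limits, tol = sqrt(.Machine$double.eps)) {
  span <- diff(limits)
  c(limits[1] - tol * span, limits[2] + tol * span)
}

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
lims <- c(min_wt, max_wt)

# Illustrative right bin edge that overshoots max_wt by a tiny amount
right_edge <- max_wt * (1 + 1e-12)

censor_to(right_edge, lims)              # NA: the last bin would be dropped
censor_to(right_edge, pad_limits(lims))  # not NA: the last bin survives
```

Scaling the padding by the span rather than by `max_wt` keeps it meaningful regardless of where the data sit on the axis, but like any tolerance it would also admit genuinely out-of-range values that fall within that tiny margin.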

@daattali
Owner

daattali commented Mar 7, 2018

I've looked into this a lot this week and haven't gotten any further than you have. Is this still a known issue in ggplot2? The issue on their repo was closed with a commit in Sept 2016, so do you think they aren't aware of this?

@crew102
Contributor Author

crew102 commented Mar 10, 2018

Hey Dean, I'm not sure whether this is an issue with ggplot or with my understanding of geom_histogram(). I've posted a question on SO to hopefully get to the bottom of it: https://stackoverflow.com/questions/49204576/values-getting-dropped-from-ggplot2-histogram-when-specifying-limits

This was referenced Mar 12, 2018
@daattali
Copy link
Owner

fixed by @crew102
