Adding marginal plots for grouped data #61
Comments
You are right, this would be useful to others as well. I unfortunately will probably not have time to look into this feature myself, but I would be happy to accept a pull request if someone wants to take the lead on this feature. |
This would be do-able...It would be a little awkward given that we currently use I actually think the API suggested by @kassambara is good. I.e., the call would look something like:
I think we'll want to require that the user specifies a color or fill mapping for the scatterplot if they also specify one for the marginal plots. We could rely on the I'll take a stab at it sometime next week. @daattali, we should think about submitting a new version to CRAN after this as well, no? |
Yep, I already emailed the authors of packages using ggExtra, told them about an upcoming CRAN release, and asked them to check the package for any regression bugs. We're good to go for CRAN. If you're thinking of having a go at this within the next few weeks, then the CRAN release can wait for that. API: is the idea that the user can also specify a different mapping than the one in the plot? And would using the aes() function be required? In ggplot2, aes() is needed because without it the value is taken literally rather than as a mapping. Would that be needed here as well? |
Yeah, let's wait until I take a stab at implementing this feature.
Technically, yes, but the mapping should use the same variable. For example, this would be OK:
But we would not be supporting this:
We wouldn't have to use
But I came around on the use of |
I think if we're not actually using Would there be a technical limitation or any extra code to make something like
work? From an implementation point of view, does it matter that the grouping in the plot and in the margin is not the same? |
There would be two things that would make it awkward/more difficult if we tried to allow that:
|
Good point re: legend. Would the only allowed values be either "colour" and "fill", or would it allow any kind of mapping? And what exactly would the enforcement on the variables be - would it only allow variables that already have some mapping in the original plot? |
I think the only relevant values for this would be colour or fill...Can you think of any others? The enforcement would basically just check that the variable specified in |
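The kind of enforcement discussed here could be sketched roughly as follows. This is only an illustration under stated assumptions: `has_plot_mapping()` is a hypothetical helper, not a ggExtra function, and it only inspects the plot-level mapping, not mappings set inside individual layers.

```r
library(ggplot2)

# Hypothetical helper (not part of ggExtra): TRUE if the plot-level
# mapping of `p` contains the aesthetic named `aes_name`.
# Note that aes() stores "color" under the name "colour".
has_plot_mapping <- function(p, aes_name) {
  aes_name %in% names(p$mapping)
}

# A scatter plot with a colour mapping, and one without.
p_grouped <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
p_plain <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

has_plot_mapping(p_grouped, "colour")
has_plot_mapping(p_plain, "colour")
```

A check like this would let the marginal-plot code refuse a grouping request when the scatter plot has no matching mapping to borrow from.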
If it's just colour and fill, then it feels wrong to me to have a parameter that claims to take a list of mappings when there are only 2 allowed elements. What do you think of one of these two options instead? Which would be best for end users?
Let me know your thoughts. |
My first instinct was to do a combo of choices 1 and 2, so something like:
With the reason being that, I think people will want to use different values for the alpha of the points vs the fill of the distributions. I don't have any strong feelings for whether we just have one |
Nvm, I forgot what I was planning to do for alpha, which was to just suggest that users specify it in the But we should separate colour and fill...So either a single |
I don't follow the whole alpha thing. Why is alpha needed for the marginal plots? I think alpha should always be 1 for the marginal density/histogram. In the marginal plot, would it make sense to have mappings for both colour and fill into different variables? I don't even know what that would look like |
Alpha is needed (at least for fill) for the marginal plots because alpha = 1 will result in you not being able to see the distributions when they overlap. For example, in the example that kassambara posted, you get to see what the distributions look like across their entire support, even when there is another distribution that is overlapping. So we would want to set a default value for alpha somewhere around .5, I think. I think we should just allow one variable to be mapped to fill or colour (or potentially both)...Using two different variables in the marginal map would bring up the two issues I mentioned above (e.g., adding an extra legend). |
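The overlap problem described above is easy to reproduce with grouped densities in plain ggplot2. This is only a sketch: the iris dataset and the 0.5 value are illustrative choices, not something settled in this thread.

```r
library(ggplot2)

# Opaque fills: where groups overlap, one distribution hides another.
p_opaque <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 1)

# Semi-transparent fills: each group stays visible across its full support.
p_translucent <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5)
```

Printing the two plots side by side shows why a default below 1 is wanted for filled marginal densities.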
Right, alpha < 1 is definitely needed. But let's just fix it at a value; it doesn't need to be customizable. You're right. My second question was: would both colour AND fill be able to get a mapping? What would it look like when they both are used? |
I think we should allow users to specify the alpha level, given that it will be difficult to choose a default that looks good for all different scenarios (i.e., many vs few groups, lighter vs darker cols, etc.). Regarding your second question, that's what I thought you meant...We could potentially map a single variable to both fill and colour (but again, there would be no support for two different variables mapped to fill and colour). When you specify a fill param but no colour, the distribution(s) is outlined in black:
When you specify colour as well, the outline shares the same colour as the fill, and you only get one legend (at least for the version of ggplot2 that I'm currently on):
I just checked out the case for histograms, and fill looks pretty bad. It's too difficult to tell which bins refer to which groups:
The case for boxplot looks reasonable, though:
I think we should support all three but just suggest that the user choose |
I think that fixing the default You might have also noted that, when type = "boxplot", the color/fill variable should be used as the x axis variable in the marginal box plot. Thank you :-)! |
I'm wondering if it wouldn't be better if the final format of ggMarginal looked like this:
# Basic usage
ggMarginal(p)
# Grouped data
# (Only) color by groups
ggMarginal(p, colourGroup = TRUE)
# or
# (Only) fill by groups
ggMarginal(p, fillGroup = TRUE, alpha = 0.5)
# or
# color and fill by groups
ggMarginal(p, colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5)
Instead of this (more typing):
# Basic usage
ggMarginal(p)
# Grouped data
ggMarginal(p, margMapping = list(colourGroup = TRUE))
# or
ggMarginal(p, margMapping = list(fillGroup = TRUE, alpha = 0.5))
# or
ggMarginal(p, margMapping = list(colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5)) |
@kassambara thank you for your input. @crew102 and I are discussing this, and it seems like the likely API will indeed be without a list. A few more items we agreed on:
We did not settle on whether colourGroup/fillGroup will be boolean flags or the name of a variable, though we are leaning towards the former. We need to ensure that whatever we choose is not too restrictive and will support these scenarios:
|
@crew102 I think we left this unresolved - do you have time/would like to come back to this? |
Yeah, I've been meaning to get to it. I'll probably push something in the next 1-2 weeks. |
Closed? |
Indeed! @kassambara this exists now |
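For anyone landing here later, a minimal sketch of the released feature, assuming a ggExtra version in which ggMarginal() has the groupColour and groupFill arguments. The mtcars data and the choice of variables are illustrative only. The scatter plot itself needs a colour mapping to a factor or character variable for the grouping to work.

```r
library(ggplot2)
library(ggExtra)

# Scatter plot with a colour mapping; the marginal plots reuse this grouping.
p <- ggplot(mtcars, aes(x = wt, y = drat, colour = factor(vs))) +
  geom_point()

# Marginal densities outlined by group.
m_colour <- ggMarginal(p, groupColour = TRUE)

# Marginal densities outlined and filled by group.
m_both <- ggMarginal(p, groupColour = TRUE, groupFill = TRUE)
```

Printing `m_colour` or `m_both` draws the scatter plot with the grouped marginal plots attached.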
Way late to this but perhaps worthwhile - I cannot figure out how to combine the functionality of state <- fviz_pca_ind(move_pca, state1 <- ggMarginal(state, type = "density", col = "black", groupFill = TRUE)``` |
Sorry. For context: even when I try to specify the data, it says the nrows are misaligned, even though the ggplot object has all the data stored, including a fill variable. So even specifying my own data, x and y coords, and fill variable throws an error, and I'm not sure why:
Not sure where |
Hi Dean,
Thank you for your work in ggExtra package, which makes it really easy to add marginal plots to ggplots.
It would be highly appreciated if one could add marginal plots for grouped data, as illustrated here and here.
Suggestion: Improve ggMarginal() so that it reacts to the mapping arguments.
Best regards,