Dotplot bar chart? #1283

Open
concimuscb opened this issue Jan 15, 2025 · 6 comments

@concimuscb

Is there any way to create a chart similar to this?

Image

Bar charts are one of the most used chart types, and having alternatives to them is great. Currently Lets-Plot has geom_lollipop, but dot plot bars are another great alternative, especially when the measured value is an integer. Counting the number of records is probably the best use case for this.

I tried creating them using geom_dotplot and geom_ydotplot but was not able to do it.

If it is not currently possible I believe it would be a great addition.

@alshan
Collaborator

alshan commented Jan 15, 2025

I see, geom_dotplot() paints each observation as a point, and it doesn't use any kind of summary or stat.

As a workaround you can have a synthetic dataset containing "observations" according to the counts by category and then use geom_dotplot().
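A minimal sketch of the synthetic-dataset idea, assuming the counts per category are already known. The helper name `expand_counts` and the column name `category` are hypothetical, chosen only for illustration:

```python
def expand_counts(categories, counts):
    """Repeat each category label `count` times: one row per synthetic observation."""
    if len(categories) != len(counts):
        raise ValueError("categories and counts must have the same length")
    observations = []
    for category, count in zip(categories, counts):
        observations.extend([category] * count)
    # Column-oriented dict, the same shape used for plot data elsewhere in this thread.
    return {"category": observations}


# Hypothetical example counts.
data = expand_counts(["a", "b", "c"], [6, 5, 8])
```

The resulting `data` has one entry per counted record, so it could then be passed to `ggplot(data)` together with `geom_dotplot()`, mapping `category` to `x`.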

If we had "patterns" like in ggplot2 (see "Patterns and Gradients"), then a "polka dot" pattern would also do, I guess.

@alshan alshan added the * label Jan 15, 2025
@concimuscb
Author

To be honest even after your pointers I was not able to create the chart that I wanted.

I got close, but the dots were very small, and increasing dotsize just made them overlap instead of stacking upwards.

@alshan
Collaborator

alshan commented Jan 16, 2025

You can fix the overlapping by adjusting the stackratio parameter. Examples: dot_plots.ipynb.

@concimuscb
Author

concimuscb commented Jan 17, 2025

My best two attempts, still not quite what I want.

Prepare the data

import polars as pl

mpg = (
    pl.read_csv("../../../data/raw/external/mpg.csv")
    .filter(pl.col("class").is_in(["pickup", "suv", "subcompact", "midsize"]))
    .select("class")
    # Running index of each record within its class (1, 2, 3, ...)
    .with_columns(pl.col("class").cum_count().over("class").alias("count"))
)

Attempt with geom_dotplot

from lets_plot import *

LetsPlot.setup_html()  # only needed when running in a notebook

dotplot = (
    ggplot(mpg, aes(x=as_discrete("class", order_by="..count..")))
    + scale_y_continuous()
    + geom_dotplot(binwidth=1 / 60, fill="orange", color="orange")
)

Image

Issues:

  • Axis is displaying density instead of count
  • Dots are way too small

To fix the dot size issue I used dotsize but this eventually leads to the dots going out of chart bounds.

dotplot = (
    ggplot(mpg, aes(x=as_discrete("class", order_by="..count..")))
    + geom_dotplot(
        binwidth=1 / 60,
        fill="orange",
        color="orange",
        dotsize=4,
    )
)

Image

Any attempt at resizing the y axis with scale_y_continuous fails: the plot stays exactly the same, only the labels change:

dotplot = (
    ggplot(mpg, aes(x=as_discrete("class", order_by="..count..")))
    + scale_y_continuous(limits=[0, 1000])
    + geom_dotplot(
        binwidth=1 / 60,
        fill="orange",
        color="orange",
        dotsize=4,
    )
)

Image

Attempt with geom_ydotplot

ydotplot = (
    ggplot(mpg, aes(x="class", y="count"))
    + scale_y_continuous()
    + geom_ydotplot(binwidth=1, method="histodot")
)

Image

Issues:

  • Dots are way too small

To fix the dot size issue I used dotsize but this eventually leads to overlapping.

ydotplot = (
    ggplot(mpg, aes(x="class", y="count"))
    + scale_y_continuous()
    + geom_ydotplot(binwidth=1, method="histodot", dotsize=5)
)

Image

Use stackratio to fix overlapping:

ydotplot = (
    ggplot(mpg, aes(x="class", y="count"))
    + scale_y_continuous()
    + geom_ydotplot(binwidth=1, method="histodot", dotsize=5, stackratio=20)
)

Image

Chart seems unresponsive to it.

All of the above happens whether method is histodot or dotdensity.

I guess the addition of patterns and gradients would also help with #1199, but it is probably quite low on the priority list, and understandably so.

@alshan
Collaborator

alshan commented Jan 17, 2025

I see, geom_dotplot and geom_ydotplot aren't really cut out for what you want to achieve.

However, this can be done using just geom_point:

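from lets_plot import *

LetsPlot.setup_html()  # only needed when running in a notebook

categories = ['a', 'b', 'c']
counts = [6, 5, 8]

# One x value per dot: repeat each category label `count` times.
xs_arrays = [[cat] * count for cat, count in zip(categories, counts)]
# One y value per dot: stack positions 1..count for each category.
ys_arrays = [list(range(1, count + 1)) for count in counts]

# Flatten the per-category lists into two plain columns.
data = dict(
    xs=sum(xs_arrays, []),
    ys=sum(ys_arrays, []),
)

ggplot(data) + geom_point(
    aes('xs', 'ys', color='xs'),
    size_unit='y',  # <-- relative size
    size=0.5,
)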

Image

The size_unit='y' setting is our secret weapon here: it lets us define the size of points relative to the 'data resolution' (i.e. 1 here) along the y-axis.
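When the starting point is raw records rather than precomputed counts (as with the mpg "class" column earlier in this thread), the xs/ys columns could be derived along these lines. This is only a sketch: the helper name `stacked_dot_coords` and the sample records are hypothetical, and sorting categories by descending count is an assumption, not something the approach requires:

```python
from collections import Counter


def stacked_dot_coords(values):
    """Turn raw category records into one (x, y) pair per record.

    Each category gets y positions 1..count, so its dots stack like a bar.
    Categories are emitted in order of descending count.
    """
    counts = Counter(values)
    xs, ys = [], []
    for category, count in counts.most_common():
        xs.extend([category] * count)
        ys.extend(range(1, count + 1))
    return {"xs": xs, "ys": ys}


# Hypothetical sample of raw records.
records = ["suv", "pickup", "suv", "midsize", "suv", "pickup"]
data = stacked_dot_coords(records)
```

The returned dict has the same `xs`/`ys` layout as the data used with geom_point above, so it should be usable with the same `aes('xs', 'ys', color='xs')` mapping and `size_unit='y'`.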

@alshan
Collaborator

alshan commented Jan 17, 2025

Or better 0-based maybe:

# Reuses `categories` and `counts` from the previous snippet.
# Each category gets one extra dot at y = 0 below positions 1..count.
xs_arrays = [[cat] + [cat] * count for cat, count in zip(categories, counts)]
ys_arrays = [[0] + list(range(1, count + 1)) for count in counts]

data = dict(
    xs=sum(xs_arrays, []),
    ys=sum(ys_arrays, []),
)

(
    ggplot(data)
    + geom_point(
        aes('xs', 'ys', color='xs'),
        size_unit='y',  # <-- relative size
        size=0.5,
    )
    + coord_cartesian(ylim=[0, None])
)

Image
