DOC: Additional items for the cheat sheet #40680

Open
Dr-Irv opened this issue Mar 29, 2021 · 39 comments · May be fixed by #60658

Comments

@Dr-Irv
Contributor

Dr-Irv commented Mar 29, 2021

Location of the documentation

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Potential Cheat Sheet Improvements

Per discussion at #39806 (review) , add a third page to the cheat sheet:

  • More visualization examples that use pandas plotting (no dependence on third party libraries)
  • List of frequently used options shown here: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#frequently-used-options
  • I/O: A whole section showing a variety of popular IO usage (CSV, Excel, SQL, HTML), and also output file formats (feather, parquet, HDF)
  • An Apply Functions section
  • The new Extension Types (String, Integer, Float) and how pd.NA works
  • Anything else that could fill up the space (if needed)
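A few of the listed topics (options, I/O, and apply functions) could be illustrated with something like the sketch below. The frame contents and column names are made up for illustration, and the in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.0, 19.5, 31.0]})

# Options: change display settings temporarily inside a context manager.
original_precision = pd.get_option("display.precision")
with pd.option_context("display.precision", 1, "display.max_rows", 10):
    precision_inside = pd.get_option("display.precision")
precision_after = pd.get_option("display.precision")  # restored on exit

# I/O: round-trip through CSV using an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_tripped = pd.read_csv(buffer)

# Apply: element-wise on a Series, row-wise on a DataFrame with axis=1.
df["temp_f"] = df["temp_c"].apply(lambda c: c * 9 / 5 + 32)
df["label"] = df.apply(lambda row: f"{row['city']}: {row['temp_f']:.1f}F", axis=1)
```

The same `to_*` / `read_*` pairing applies to the other formats in the list (Excel, SQL, parquet, and so on), though most of those need an optional dependency installed.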
@Dr-Irv Dr-Irv added the Docs label Mar 29, 2021
@Dr-Irv Dr-Irv mentioned this issue Mar 29, 2021
4 tasks
@lithomas1 lithomas1 added this to the Contributions Welcome milestone Apr 1, 2021
@kunal21sinha

I am a first-time contributor and would like to take up this issue.
Could you assign it to me?

@Dr-Irv Dr-Irv assigned kunal21sinha and unassigned kunal21sinha Apr 8, 2021
@Dr-Irv
Contributor Author

Dr-Irv commented Apr 8, 2021

@kunal21sinha I've assigned it to you but I know that @OliEfr is possibly working on this based on the discussion from the PR he created.

@OliEfr
Contributor

OliEfr commented Apr 8, 2021

Hello,
yes, I was working on the cheatsheet before. However, I did not start working on this precise issue. I think you can go ahead and create something according to your thoughts @kunal21sinha. I'll probably also make some contributions later.

@kunal21sinha

Okay sure, will work on it.

@AlexGCas

Maybe you want to add other uses of loc and iloc: iloc for reassigning a row, and loc for reassigning a row and appending a new one.
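As a rough sketch of what such a cheat sheet entry might show (the frame, labels, and values here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]}, index=["r0", "r1"])

# iloc: overwrite an existing row by integer position.
df.iloc[0] = ["z", 10]

# loc: overwrite an existing row by label.
df.loc["r1"] = ["y", 20]

# loc with a label that does not exist yet enlarges the frame,
# which effectively appends a new row.
df.loc["r2"] = ["x", 30]
```

Note that iloc cannot enlarge the frame: assigning to a position past the end raises an IndexError, so appending is a loc-only pattern.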

@amaru-g

amaru-g commented Jun 14, 2021

Hi, this is the very first time I am contributing. According to the wiki, I should figure out whether this is taken or not. @AlexGCas, what is the status of this?
Regards,

@rishitbhojak

@Dr-Irv Can you let me know how I can get started contributing to this cheat sheet? For example, how do I make changes to it? I learned data analysis recently and pandas was a great help. I can add more plotting functions for a pandas DataFrame.

@OliEfr
Contributor

OliEfr commented Jun 23, 2021

@rishitbhojak
Well, I can help you with that. Go to this folder: https://github.com/OliEfr/pandas/tree/master/doc/cheatsheet
Here you will find the PowerPoint files, which are used to create the PDF files. It's also useful to read the README.txt.
Besides that, of course, it is useful to know how Git and GitHub work.

@rishitbhojak

@Dr-Irv @OliEfr Can I start with pie chart plots and subplots? It seems like a good idea since we want to add more visualization examples. What's your opinion? Can I add the third page and move forward with it?

@Dr-Irv
Contributor Author

Dr-Irv commented Jun 23, 2021

@rishitbhojak Add a third page with respect to plotting. Move the examples at the bottom of page 2 to page 3, but find something to fill the empty space on page 2 that will be created by the move of those 2 examples. Probably a list of the frequently used options would fit nicely there.

Follow the instructions for contributing to pandas here: https://pandas.pydata.org/docs/development/contributing.html

@rishitbhojak

Alright sir. Can you let me know whether I have done it correctly? I am attaching a sample screenshot.
[Screenshot: sample]

@Dr-Irv
Contributor Author

Dr-Irv commented Jun 23, 2021

@rishitbhojak that's fine. Try to keep the graphics smaller. For the subplots example, ideally there would be titles on each subplot, so you should show how to do that.
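One possible way to get a title on each subplot using only pandas plotting is to pass a list to `title` together with `subplots=True`. This is a sketch with invented data; the Agg backend is chosen only so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import pandas as pd

# Hypothetical data: one pie per column, one wedge per index label.
df = pd.DataFrame(
    {"2020": [30, 50, 20], "2021": [25, 45, 30]},
    index=["bus", "car", "bike"],
)

# With subplots=True, a list passed to `title` is applied per subplot.
axes = df.plot.pie(
    subplots=True,
    title=["Trips in 2020", "Trips in 2021"],
    figsize=(6, 3),
    legend=False,
)
titles = [ax.get_title() for ax in axes]
```

A small `figsize` also helps keep the graphics compact on the page.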

@rishitbhojak

Got it! Thank you sir

@rishitbhojak

[Image]
I moved the visualization portion to the third page and will put some regularly used functions in its place. I also added titles to both subplots of the pie chart.

@Dr-Irv
Contributor Author

Dr-Irv commented Jun 28, 2021

@rishitbhojak Any review will occur in a pull request, so when you are all done with your changes, create the PR, and I'll provide more feedback there.

@rishitbhojak

rishitbhojak commented Jul 6, 2021

In which branch should I make the pull request? I am done with the pie chart plotting portion, and in place of the scatter plot and the histogram I made a section on dropping a DataFrame column. I will make the PR this week.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 6, 2021

In which branch should I make the pull request? I am done with the pie chart plotting portion, and in place of the scatter plot and the histogram I made a section on dropping a DataFrame column. I will make the PR this week.

Create your own branch off of master. See https://pandas.pydata.org/docs/development/contributing.html#working-with-the-code

@NishitaPatnaik21

Is this closed? Or can I contribute to this?

@Dr-Irv
Contributor Author

Dr-Irv commented Aug 19, 2021

Is this closed? Or can I contribute to this?

There is a PR #43036 that was just submitted that I need to review. Once that is done, I will update the list above with the open items.

@KeeratKG

@Dr-Irv Hello. I'd like to do my bit here. May I take up this issue?

@Dr-Irv
Contributor Author

Dr-Irv commented Jan 12, 2022

@Dr-Irv Hello. I'd like to do my bit here. May I take up this issue?

Thanks for offering your help. I just realized from your note that I had forgotten to review #43036 . So let me get that done and merged, and then you could do additional things from there.

@KeeratKG

@Dr-Irv Hello. I'd like to do my bit here. May I take up this issue?

Thanks for offering your help. I just realized from your note that I had forgotten to review #43036 . So let me get that done and merged, and then you could do additional things from there.

Absolutely! I kind of realised that #43036 might still be in the queue, so I went through its contents and made sure I wasn't repeating anything :)

@KeeratKG

Hi. I have added a fourth page to the cheat sheet, building on #43036. Please have a look @Dr-Irv and let me know what you think. I will update the contribution accordingly. See #45347.
I wasn't sure what 'The new Extension Types (String, Integer, Float) and how pd.NA works' referred to. I tried to cover everything else.
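For readers with the same question, that item most likely refers to the nullable dtypes ("Int64", "Float64", "string") that use pd.NA as their missing-value marker. A minimal sketch, with invented values, of how they behave:

```python
import pandas as pd

# Nullable extension dtypes: None is stored as pd.NA, and integers stay integers.
ints = pd.Series([1, None, 3], dtype="Int64")
floats = pd.Series([1.5, None], dtype="Float64")
strings = pd.Series(["a", None], dtype="string")

missing = ints.iloc[1]      # the pd.NA singleton
total = ints.sum()          # missing values are skipped by default
compared = ints > 1         # nullable "boolean" dtype; NA propagates

# pd.NA follows three-valued (Kleene) logic in boolean operations.
kleene_or = pd.NA | True    # True: the result is known whatever NA is
kleene_and = pd.NA & True   # NA: the result depends on the unknown value
```

The contrast with NumPy-backed columns is that a missing value in an integer column no longer forces a cast to float.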

@MichaelTiemannOSC
Contributor

Some additional suggestions to consider for this round or the next:

  • wide_to_long: more general and more powerful than melt and important to know it exists
  • groupby box should at least mention the concept of split-apply-combine
  • groupby.first() operator: important for dealing with missing data
  • Group by: split-apply-combine (which brings in the whole concept of operations on data). The elaboration of these concepts will easily fill half to a full page with high-quality content.
  • The confusing topic of using conditionals in pandas (what to do with a.any, a.all, etc., and when they can be avoided entirely)
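Two of these suggestions, groupby.first() for missing data and conditionals with any/all, might be sketched as follows. The sensor data is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "sensor": ["a", "a", "b", "b"],
        "reading": [np.nan, 2.0, 5.0, np.nan],
        "unit": [None, "C", "F", None],
    }
)

# groupby.first() returns the first non-null value per column in each group.
first_valid = df.groupby("sensor").first()

# A comparison yields an element-wise boolean Series, not a single bool.
mask = df["reading"] > 1

# Using the Series directly in a truth test is ambiguous and raises ValueError.
try:
    bool(mask)
    ambiguous = False
except ValueError:
    ambiguous = True

# Reduce explicitly when a single answer is really wanted...
any_high = mask.any()
all_high = mask.all()

# ...or skip the conditional entirely and filter with the mask.
high_rows = df[mask]
```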

@geo7

geo7 commented Mar 29, 2022

@MichaelTiemannOSC could you explain how wide_to_long is "more general" and "more powerful" than melt? I thought the opposite was true, and the documentation (https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html) seems to support that: "Less flexible but more user-friendly than melt". Maybe there's something I'm unaware of though.

@MichaelTiemannOSC
Contributor

Good question. melt is "more general" in that it can handle multilevel indexes. wide_to_long is more user-friendly in that in the single-index case, it can both reshape and rename columns, whereas melt only concerns itself with reshaping data.

@geo7

geo7 commented Mar 29, 2022

Hrm, to be honest I've always just used melt. wide_to_long does have a couple of things packed in that I might otherwise use other methods for, but I find melt clearer, perhaps because that's what I typically use. Sorry to derail the thread. I'll leave an example from the wide_to_long documentation with a melt version in case that's of use to anyone in future.

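data

import pandas as pd

# Example frame from the wide_to_long documentation:
# two height measurements (ht_one, ht_two) per child.
df = pd.DataFrame(
    {
        "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
        "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
    }
)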

wide_to_long

pd.wide_to_long(
    df,
    stubnames="ht",
    i=["famid", "birth"],
    j="age",
    sep="_",
    suffix=r"\w+",
)

melt

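(
    # Stack ht_one / ht_two into a single "ht" column.
    df.melt(id_vars=["famid", "birth"], value_name="ht")
    # Recover the suffix ("one" / "two") from the old column name.
    .assign(age=lambda d: d["variable"].str.split("_").str[-1])
    .drop("variable", axis=1)
    .sort_values(["famid", "birth", "age"])
    .set_index(["famid", "birth", "age"])
)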

@MichaelTiemannOSC
Contributor

For the record, I use melt 90% of the time. But there's lots of financial data of the form xyz_2016, xyz_2017, ... and wide_to_long is great for that.
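A sketch of that pattern, using invented tickers and column stubs (`rev`, `cost`) in place of real financial data:

```python
import pandas as pd

# Hypothetical wide table: one column per metric per year.
wide = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB"],
        "rev_2016": [10.0, 20.0],
        "rev_2017": [11.0, 22.0],
        "cost_2016": [7.0, 15.0],
        "cost_2017": [8.0, 16.0],
    }
)

# Several stubs are reshaped at once; the numeric suffix becomes the "year" level.
long = pd.wide_to_long(
    wide,
    stubnames=["rev", "cost"],
    i="ticker",
    j="year",
    sep="_",
)
```

The result is indexed by (ticker, year) with one column per stub, which would take a melt plus a string split and a pivot to reproduce.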

@suryakapurothu

Is it open now?
@geo7 @Dr-Irv
A bit of guidance would help me add value to the cheat sheet in progress.

@Dr-Irv
Contributor Author

Dr-Irv commented Oct 6, 2022

Is it open now? @geo7 @Dr-Irv A bit of guidance would help me add value to the cheat sheet in progress.

We always welcome improvements to the cheatsheet, so feel free to create a pull request with your suggested changes.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@ashishbalti4

@Dr-Irv I have worked on and updated the cheat sheet in PDF format. Can I contribute here? It will be my first open-source contribution.

@Dr-Irv
Contributor Author

Dr-Irv commented Oct 16, 2022

@Dr-Irv I have worked on and updated the cheat sheet in PDF format. Can I contribute here? It will be my first open-source contribution.

@ashishbalti4 You should create a pull request that includes the updated PowerPoint and PDF.

@Aryabhatt1234

Hello @Dr-Irv, I am a first time contributor. Can I take up this issue? Please allow me.

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 11, 2024

Hello @Dr-Irv, I am a first time contributor. Can I take up this issue? Please allow me.

Yes, just provide your edits in a PR and I will review. The PR should update both the PowerPoint and the PDF.

@rootsmusic

I suggest specifying the minimum version of pandas that's needed.

@Dr-Irv
Contributor Author

Dr-Irv commented Jun 3, 2024

I suggest specifying the minimum version of pandas that's needed.

I think, but I'm not sure, that all of the examples in the cheat sheet work with version 1.x on up. If you see otherwise, let me know.

@samuel-davidson

Hi,
I am new to open source software and would love to contribute. I have added a third page to the cheatsheet and submitted a PR. Thank you!

@MichaelTiemannOSC
Contributor

Could you link to the PR? I'd be interested. The list of suggestions I made much earlier in this thread (#40680 (comment)) is not being ticked off (though one topic, melt vs. wide_to_long, was discussed in detail). split-transform-combine remains a major concept in data science and is nowhere referenced or exemplified in the Cheat Sheet.
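As one possible illustration of the concept for the cheat sheet, here is a sketch with invented team data. It shows both the aggregating form (one row per group) and the transform form (result aligned back to the original rows).

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["x", "x", "y", "y", "y"], "points": [10, 20, 3, 6, 9]}
)

# Split by team, apply aggregations, combine into one row per group.
summary = df.groupby("team").agg(
    total=("points", "sum"),
    mean=("points", "mean"),
)

# Split by team, apply a transform, combine back to the original shape.
group_mean = df.groupby("team")["points"].transform("mean")
df = df.assign(centered=df["points"] - group_mean)
```

The difference in output shape is the part that tends to trip people up: agg shrinks to one row per group, while transform keeps the original index so the result can be assigned straight back as a column.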

@samuel-davidson

samuel-davidson commented Aug 10, 2024

Here is the PR. I just started on the third page and added the frequently used options from the initial list.
