I saw your note to the pandas-dev list and have some feedback:
I don't agree with your statement about querying data. The .query method is easier to read and understand. Using the pandas expressions can be difficult to parse when you have complex expressions and long DataFrame names, for example:
my_big_dataframe.query('order_date >= "20201001" and order_date <= "20201031" and customer == "Apple"')
In addition, "query" statements can be dynamically formatted.
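To make the comparison concrete, here is a minimal sketch of the same filter written both ways, plus one way to pass values in dynamically. The data and the `amount` column are hypothetical; only the frame name, `order_date`, and `customer` come from the example above.

```python
import pandas as pd

# Hypothetical data for illustration only.
my_big_dataframe = pd.DataFrame(
    {
        "order_date": ["20200930", "20201005", "20201020", "20201101"],
        "customer": ["Apple", "Apple", "Google", "Apple"],
        "amount": [10, 20, 30, 40],
    }
)

# .query: the frame name appears once.
via_query = my_big_dataframe.query(
    'order_date >= "20201001" and order_date <= "20201031" and customer == "Apple"'
)

# Equivalent boolean mask: the frame name is repeated for every condition.
via_mask = my_big_dataframe[
    (my_big_dataframe["order_date"] >= "20201001")
    & (my_big_dataframe["order_date"] <= "20201031")
    & (my_big_dataframe["customer"] == "Apple")
]

# Dynamic values: local variables can be referenced with the @ prefix.
start, end, who = "20201001", "20201031", "Apple"
via_params = my_big_dataframe.query(
    "order_date >= @start and order_date <= @end and customer == @who"
)
```

All three expressions select the same rows, so the choice between them is mostly about readability.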
If you believe otherwise, could you add text explaining why you don't prefer .query?
You might want to start using the new nullable types (String, Int64, etc.) and pd.NA in your examples.
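As a rough illustration of what the nullable types change, the sketch below builds the same values with the default inference and with the `"Int64"` and `"string"` dtype aliases. The variable names are made up for the example.

```python
import pandas as pd

# Default inference: a missing value forces the integers to float64.
default_ints = pd.Series([1, None, 3])

# Nullable integer dtype: values stay integers and the gap is pd.NA.
nullable_ints = pd.Series([1, None, 3], dtype="Int64")

# Nullable string dtype instead of the generic object dtype.
nullable_strs = pd.Series(["a", None, "c"], dtype="string")

print(default_ints.dtype)    # float64
print(nullable_ints.dtype)   # Int64
print(nullable_strs.isna().tolist())
```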
In the "column selection" section, one advantage of using something like df.column is that if you are in a notebook, you can get autocompletion, which can help with long column names. But your point that all names might not work is also correct.
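The caveat about names that do not work with attribute access can be sketched as follows. The column names here are hypothetical, chosen because one collides with a DataFrame method and the other is not a valid Python identifier.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2], "order date": ["20201001", "20201002"]})

# "count" collides with DataFrame.count, so attribute access returns the method.
attr = df.count

# Bracket access always returns the column, whatever its name.
count_col = df["count"]
date_col = df["order date"]  # df.order date would be a syntax error
```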
Hi @Dr-Irv, thank you for reviewing the document - much appreciated 😃. Here are my replies to your comments:
Querying data: I agree that .query can be better when dataframe names are long. However, you can easily use a shorthand variable name in these circumstances e.g.
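One possible illustration of the shorthand approach (not necessarily the example originally given, and using hypothetical data) binds the long name to a short alias before filtering:

```python
import pandas as pd

# Hypothetical data for illustration only.
my_big_dataframe = pd.DataFrame(
    {
        "order_date": ["20200930", "20201005", "20201020"],
        "customer": ["Apple", "Apple", "Google"],
    }
)

# The alias refers to the same object; no data is copied.
df = my_big_dataframe

result = df[
    (df["order_date"] >= "20201001")
    & (df["order_date"] <= "20201031")
    & (df["customer"] == "Apple")
]
```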
Column selection: I agree this feature is useful in notebooks because it provides autocompletion. However, this style guide is intended for production code, which is more likely to be written in plain Python files, where this advantage is lost.
Thanks for the link. I've added a section about "Avoid chained indexing" prompted by these blog posts.
To track the changes inspired by your comments, I have added them to a separate PR: #2.