PERF: json_normalize, for basic use case #40035
…use it for simple use cases
Just remembered I forgot to update the changelog, whoops.
is it possible to simply dispatch to this if the basic case is selected?
why is ordering not preserved?
do we have sufficient asv benchmarks for this? e.g. please add the cases you are measuring.
If possible, that would be best, but I am not familiar enough with the pandas codebase to know where to look. I have tried looking around the
Oh, it is; I need to update that part of the comment.
I do. I will add those at the next opportunity and think of a few more cases. What would you recommend to fix the type-hint issue I have?
Added an asv benchmark, as there were none for
The asv benchmark is failing; I've not written any before, and my laptop is too slow to run them.
The asv benchmark should be working now.
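A parametrised asv benchmark for the basic case might look like the sketch below. The class name, parameter values and record shape are assumptions for illustration, not the benchmark added in this PR. asv calls setup() with each parameter value before timing every time_* method.

```python
import pandas as pd


class JSONNormalizeBasic:
    # Hypothetical benchmark: asv runs each time_* method once per value
    # in `params`, passing that value to setup() and to the method.
    params = [1_000, 10_000]
    param_names = ["n_rows"]

    def setup(self, n_rows):
        # Illustrative nested record; only `data` is passed to json_normalize,
        # so this exercises the basic case.
        record = {"id": 1, "user": {"name": "x", "address": {"city": "y"}}, "score": 0.5}
        self.data = [record] * n_rows

    def time_json_normalize(self, n_rows):
        pd.json_normalize(self.data)
```

Such a benchmark would then typically be compared across two commits with asv continuous, for example asv continuous -f 1.1 upstream/master HEAD -b json (the remote name upstream is an assumption), so the reported change depends on which two commits asv is comparing.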
Can you run the related JSON benchmarks and post their output here?
Running
gave this log:
So the net effect is this?
This doesn't seem to match your results.
Can you please explain a bit more about how you are comparing them? The test is the same, but asv sets up its own environment, which would surely change the times.
If you added an asv benchmark, then we can see this doesn't match your timings from the top (the ratio, not the absolute time).
Assuming asv is comparing my forked master to the main master, then my addition to asv might be wrong.
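The reviewer's point is that the ratio between old and new timings, not the absolute seconds, should agree between asv and the manual measurement. Using the two timings reported in the PR description, the expected ratio can be worked out directly; asv's ratio column is new time divided by old time, so values below 1 mean faster.

```python
# Wall-clock times reported in the PR description (seconds, 100,000 rows).
before = 3.0518009662628174  # pandas 1.2.2
after = 0.632451057434082    # this branch

speedup = before / after  # how many times faster the branch is
ratio = after / before    # asv-style ratio: new / old
print(f"speed-up: {speedup:.2f}x, asv-style ratio: {ratio:.2f}")
```

An asv run of the same case on another machine would be expected to show a ratio of roughly this size even if both absolute times differ.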
Found the issue: my checking of the parameters was incorrect. I reran the benchmark.
Looks pretty good. Can you add a whatsnew note in the 1.3 Perf section?
…ter check explicit, moved nested functions to module level
I have added the whatsnew note and made the changes you advised.
…proved type hints
Thanks @smpurkis, very nice!
Proposed speed-up for very simple use cases of the pd.json_normalize function, e.g. pd.json_normalize(data).
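For reference, the basic use case means calling json_normalize with only the data argument, so nested dicts are flattened into dot-separated column names and no other options are set. The records below are made-up illustration data.

```python
import pandas as pd

# Made-up nested records; only `data` is passed, which is the basic case.
data = [
    {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}},
    {"id": 2, "name": {"first": "Alan", "last": "Turing"}},
]
df = pd.json_normalize(data)
print(df.columns.tolist())  # nested keys become "name.first", "name.last"
```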
The speed-up can be seen in this example:
Output with pandas 1.2.2:
pandas time taken for a 100,000 rows: 3.0518009662628174 seconds
From this branch:
pandas time taken for a 100,000 rows: 0.632451057434082 seconds
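A timing script along these lines could reproduce this kind of comparison when run once under pandas 1.2.2 and once on the branch. It is a hypothetical sketch: the record shape is an assumption, since the original example data is not shown here, and absolute times will differ between machines.

```python
import time

import pandas as pd

# Hypothetical record shape, repeated to 100,000 rows.
record = {"id": 1, "user": {"name": "x", "address": {"city": "y"}}, "score": 0.5}
data = [record] * 100_000

start = time.perf_counter()
df = pd.json_normalize(data)  # basic case: only `data` is passed
elapsed = time.perf_counter() - start
print(f"pandas time taken for {len(df):,} rows: {elapsed} seconds")
```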
To show that the tests pass for the relevant file, I ran
pytest pandas/tests/io/json/test_normalize.py -v
test_normalize.py_pytest.log
To show that pre-commit passes on the file, I ran
pre-commit run --files pandas/io/json/_normalize.py
_normalize.py_pre-commit.log
One code check failed when running
./ci/code_checks.sh
code_checks.log
As it is a type-hint issue, I decided to open the pull request anyway. Can you advise on how to fix it? I'm fairly new to type hints.
Kind regards,
Sam