Analysis on efficiency and usage of extension arrays in dask Issue mozilla#36
Hi @Aimaanhasan - this is a great start. Congrats on getting an analysis PR up. Now the back and forth starts :D. Some next steps:
It seems like you might be struggling to convert your columns. To say the fletcher docs are limited would be an understatement. I had to dig around in the fletcher codebase to figure this out, but now that I have, here's some pseudocode that might be useful:
see note above
Hi, @birdsarah! Thank you so much for your feedback. I am facing some issues and want to ask some questions regarding the changes.
Part 1
I can't debug your error without a full traceback.

Part 2
Yes, always be committing and pushing.

Part 3
I'd like to see you work on that yourself. Just think about how to present the information you have gathered carefully.
I've just been reading this, which gives some context for fletcher, so I thought I'd share: https://www.dataschool.io/future-of-pandas/

The trick with dask vs pandas is to remember that dask ends up being lots of little bits of pandas, but we have to let dask manage that itself. Don't get completely stuck; keep trying things and reaching out.
Added link to the fletcher docs and gave an example for usability. Relocated the analyses for readability Issue mozilla#36
Hello @birdsarah, I've tried in many ways to convert the columns of

Approach 1

Used the code below to implement:

`import pyarrow as pa`
`fletcher_string_dtype = fr.FletcherDtype(pa.string())`

`C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\core.py in astype(self, dtype)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, **kwargs)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, copy, errors, values, **kwargs)`
`C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)`
`TypeError: data type not understood`

Approach 2

Used the code below to implement:

This gives me the following error:

`TypeError Traceback (most recent call last)`
`C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\core.py in __setitem__(self, key, value)`
`C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\core.py in assign(self, **kwargs)`
`TypeError: Column assignment doesn't support type FletcherDtype`

It would be very helpful if you could guide me here. I have searched the docs for a solution but could not find one. However, Fletcher arrays work perfectly fine with
I'm sorry you're having struggles and it's great that you tried a bunch of options. Unfortunately this issue is about figuring out how to work with fletcher. I feel that if I start guiding further from where you are, I'll just be working on the issue myself, which is not the point. I'm going to close this PR for now.