>>> df.to_csv('data.csv.gz')  # Awfully slow
>>> df.to_pickle('data.pkl.bz2')  # Awfully slow
>>> df.to_csv('data.csv.gz', fast=True)  # Uses fast compressionlevel=1
# or, better:
>>> pd.options.io.compressionlevel = 1
>>> df.to_pickle('data.pkl.bz2')  # Uses fast compressionlevel=1
...
Problem description
Compression of large objects in pandas is slow.
Commonly cited benchmarks comparing compression levels on typical payloads show [1][2] that compressed size varies far less across levels than compression time, which often differs severalfold. compressionlevel=1 can be orders of magnitude faster, whereas the output of compressionlevel=9 is only about 10% smaller.
One level optimizes for speed, the other for size, and far fewer people ever need something in between.
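The trade-off described above can be checked locally with a small sketch like the following. It uses only the standard-library gzip and bz2 modules rather than pandas itself, and the synthetic CSV payload and the helper names make_payload and benchmark are assumptions made for illustration; actual timings and sizes depend on the data and the machine.

```python
import bz2
import gzip
import random
import time


def make_payload(rows=20000, seed=0):
    """Build a synthetic CSV-like byte payload (illustrative data only)."""
    rng = random.Random(seed)
    lines = ["id,value,label"]
    for i in range(rows):
        lines.append(f"{i},{rng.random():.6f},{rng.choice(['a', 'b', 'c'])}")
    return "\n".join(lines).encode("utf-8")


def benchmark(payload):
    """Return {(codec, level): (seconds, compressed_size)} for levels 1 and 9."""
    results = {}
    for name, module in (("gzip", gzip), ("bz2", bz2)):
        for level in (1, 9):
            start = time.perf_counter()
            blob = module.compress(payload, level)
            elapsed = time.perf_counter() - start
            # Sanity check: the compressed data must round-trip.
            assert module.decompress(blob) == payload
            results[(name, level)] = (elapsed, len(blob))
    return results


if __name__ == "__main__":
    data = make_payload()
    print(f"raw size: {len(data)} bytes")
    for (codec, level), (seconds, size) in benchmark(data).items():
        print(f"{codec} level={level}: {seconds:.4f}s, {size} bytes")
```

Running it on a payload that resembles your own DataFrame output gives a rough idea of how much time the lowest level saves and how much size it costs.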
Expected Output
Output of pd.show_versions()
1.1.0.dev0+786.gec7734169