Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
### Rationale for this change Mimalloc and jemalloc can allocate a [relatively large amount of memory for the ScalarMemoTable](#40301). For this reason, the ScalarMemoTable should only be allocated when it is used (when `options.deduplicate_objects=True`). I tested this change, and for small tables it does improve memory allocation. `options.deduplicate_objects=False` After this change: 📦 Total memory allocated: 174.422MB 📊 Histogram of allocation size: min: 1.000B -------------------------------------------- < 6.000B : 3064 ▇▇ < 36.000B : 7533 ▇▇▇▇ < 222.000B : 9974 ▇▇▇▇▇ < 1.319KB : 53264 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ < 7.999KB : 5188 ▇▇▇ < 48.503KB : 742 ▇ < 294.066KB: 102 ▇ < 1.741MB : 22 ▇ < 10.556MB : 1 ▇ <=64.000MB : 1 ▇ -------------------------------------------- max: 64.000MB Before this change: 📦 Total memory allocated: 1.295GB 📊 Histogram of allocation size: min: 1.000B -------------------------------------------- < 6.000B : 3064 ▇▇ < 36.000B : 7543 ▇▇▇▇ < 222.000B : 10009 ▇▇▇▇▇ < 1.319KB : 53269 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ < 7.999KB : 5192 ▇▇▇ < 48.503KB : 761 ▇ < 294.066KB: 102 ▇ < 1.741MB : 22 ▇ < 10.556MB : 1 ▇ <=64.000MB : 19 ▇ -------------------------------------------- max: 64.000MB ### What changes are included in this PR? The allocation of `memo_table` and `unique_values` have been moved underneath an `if (options.deduplicate_objects)` block. Since they are used within a lambda, they have been changed to shared pointers, so that their values exist for the lifetime needed. ### Are these changes tested? `deduplicate_objects` has extensive existing tests: https://github.com/apache/arrow/blob/b235f83ed10bcad174b267113479a24ca045def5/python/pyarrow/tests/test_pandas.py#L3211 and https://github.com/apache/arrow/blob/b235f83ed10bcad174b267113479a24ca045def5/python/benchmarks/convert_pandas.py#L71 ### Are there any user-facing changes? Nope. * GitHub Issue: #40316 Lead-authored-by: anjakefala <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
- Loading branch information