Doc-modin-project#7382: Add documentation on how to use Modin Native query compiler

Signed-off-by: arunjose696 <[email protected]>
arunjose696 committed Sep 6, 2024
1 parent cf5d638 commit 1037ca0
Showing 1 changed file with 31 additions and 0 deletions.
31 changes: 31 additions & 0 deletions docs/usage_guide/optimization_notes/index.rst
@@ -314,6 +314,37 @@ Copy-pastable example, showing how mixing pandas and Modin DataFrames in a singl
# Possible output: TypeError
Execute dataframe operations using pandas NativeQueryCompiler
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

By default, Modin distributes every dataframe across partitions and performs operations
using the ``PandasQueryCompiler``. However, in some scenarios, such as handling small or empty dataframes,
distribution introduces unnecessary overhead. In such cases, it is more efficient to default
to pandas at the query compiler level. This can be achieved by setting the ``cfg.NativeDataframeMode``
:doc:`configuration variable </flow/modin/config>` to ``"Pandas"``. When enabled, all operations in Modin default to pandas, and the dataframes are not distributed,
avoiding the additional overhead. This configuration can be toggled on or off depending on whether
dataframe distribution is required.

Dataframes created while ``NativeDataframeMode`` is set to ``"Pandas"`` continue to use the ``NativeQueryCompiler``
even after the setting is reverted. Modin supports interoperability between distributed Modin dataframes and
those using the ``NativeQueryCompiler``.

.. code-block:: python

    import modin.pandas as pd
    import modin.config as cfg

    # This dataframe will be distributed and use `PandasQueryCompiler` by default
    df_distributed = pd.DataFrame(...)

    # Set mode to "Pandas" to avoid distribution and use `NativeQueryCompiler`
    cfg.NativeDataframeMode.put("Pandas")
    df_native_qc = pd.DataFrame(...)

    # Revert to default settings for distributed dataframes
    cfg.NativeDataframeMode.put("Default")
    df_distributed = pd.DataFrame(...)

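
Because the setting is toggled by hand, it is easy to forget to revert it. One possible way to scope the toggle is a small context manager that restores the previous value on exit. This is only a sketch: ``temporary_mode`` and ``StandInMode`` are hypothetical names that are not part of Modin, and the stand-in class exists only so the snippet runs without Modin installed. It assumes the real ``cfg.NativeDataframeMode`` exposes ``get()`` alongside the ``put()`` call shown above.

```python
from contextlib import contextmanager


class StandInMode:
    """Hypothetical stand-in for a config parameter with get()/put()."""

    _value = "Default"

    @classmethod
    def get(cls):
        return cls._value

    @classmethod
    def put(cls, value):
        cls._value = value


@contextmanager
def temporary_mode(param, value):
    """Set ``param`` to ``value`` inside the block, then restore the old value."""
    previous = param.get()
    param.put(value)
    try:
        yield
    finally:
        # Restore even if the body raises, so the toggle never leaks out.
        param.put(previous)


with temporary_mode(StandInMode, "Pandas"):
    # Dataframes created here would use the native query compiler.
    mode_inside = StandInMode.get()

# Outside the block the previous setting is back in place.
mode_after = StandInMode.get()
```

Under that assumption, passing ``cfg.NativeDataframeMode`` in place of ``StandInMode`` would scope the setting to the ``with`` block. Dataframes created inside the block would still keep their query compiler afterwards, as described above.
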
Operation-specific optimizations
""""""""""""""""""""""""""""""""
