- Add support for Python 3.12.
- ENH: added plotting kwargs to correlation_report function. #58
- FIX: bin edge values are now rounded with 1e-14 precision #60
- FIX: numpy random multinomial requires integer number of samples (for nixOS) #73
- FIX: pandas deprecation warning #74
- Drop support for Python 3.7, which has reached end of life.
- Add support for Python 3.11
- Fix missing setup.py and pyproject.toml in source distribution
- Support wheels for ARM macOS (Apple silicon)
- Two fixes to make calculation of global phik robust: global phik capped in range [0, 1], and check for successful correlation matrix inversion.
- Migration to scikit-build 0.13.1.
- Support wheels for Python 3.10.
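The two robustness fixes for the global phik can be sketched in a few lines; `cap_global_phik` and `try_invert` are hypothetical helper names for illustration, not the library's own API:

```python
import numpy as np

def cap_global_phik(value: float) -> float:
    # Cap the global phik to the valid range [0, 1].
    return float(np.clip(value, 0.0, 1.0))

def try_invert(corr: np.ndarray):
    # Return the inverse of the correlation matrix, or None when the
    # matrix is singular and inversion fails.
    try:
        return np.linalg.inv(corr)
    except np.linalg.LinAlgError:
        return None
```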
Phi_K contains an optional C++ extension to compute the significance matrix using the hypergeometric method (also called the `Patefield` method).
Note that the wheels distributed on PyPI contain a pre-built extension for Linux, macOS and Windows.
A manual (pip) setup will attempt to build and install the extension; if that fails, the package is installed without it. In that case, using the hypergeometric method will raise a NotImplementedError.
Compiler requirements through Pybind11:
- Clang/LLVM 3.3 or newer (for Apple Xcode's clang, this is 5.0.0 or newer)
- GCC 4.8 or newer
- Microsoft Visual Studio 2015 Update 3 or newer
- Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
- Cygwin/GCC (previously tested on 2.5.1)
- NVCC (CUDA 11.0 tested in CI)
- NVIDIA PGI (20.9 tested in CI)
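For reference, the Patefield sampling algorithm itself (drawing random contingency tables with fixed margins) is also available in pure Python via SciPy (>= 1.10) as `scipy.stats.random_table`. This is a sketch of that route, not part of Phi_K's API:

```python
from scipy.stats import random_table

# Fixed row and column margins; both must sum to the same total.
row_margins = [5, 10]
col_margins = [8, 7]

# Draw three random 2x2 contingency tables with Patefield's algorithm.
tables = random_table(row_margins, col_margins, seed=42).rvs(
    size=3, method="patefield"
)
# Every sampled table preserves the requested margins.
```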
You can now manually set the number of parallel jobs in the evaluation of Phi_K or its statistical significance (when using MC simulations). For example, to use 4 parallel jobs do:
```python
df.phik_matrix(njobs=4)
df.significance_matrix(njobs=4)
```
The default value is -1, in which case all available cores are used. When using `njobs=1`, no parallel processing is applied.

Phi_K can now be calculated with an independent expectation histogram:
```python
from phik.phik import phik_from_hist2d

cols = ["mileage", "car_size"]
interval_cols = ["mileage"]

observed = df1[cols].hist2d(interval_cols=interval_cols)
expected = df2[cols].hist2d(interval_cols=interval_cols)

phik_value = phik_from_hist2d(observed=observed, expected=expected)
```
The expected histogram is taken to be (relatively) large in number of counts compared with the observed histogram.
Or one can compare two (pre-binned) datasets directly against each other. Again, the expected dataset is assumed to be relatively large:
```python
from phik.phik import phik_observed_vs_expected_from_rebinned_df

phik_matrix = phik_observed_vs_expected_from_rebinned_df(df1_binned, df2_binned)
```
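A pre-binned dataset can be produced with plain pandas, for example as below (a rough sketch; the column names and bin count are illustrative, and phik also ships its own binning utilities):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "mileage": rng.normal(50_000, 15_000, size=1_000),
    "car_size": rng.choice(["S", "M", "L"], size=1_000),
})

# Discretize the interval column; categorical columns stay as-is.
df_binned = df.copy()
df_binned["mileage"] = pd.cut(df_binned["mileage"], bins=10)
```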
Added links in the README to the basic and advanced Phi_K tutorials on Google Colab.
Migrated the Spark example Phi_K notebook from popmon to using histogrammar directly for histogram creation.
- Please see the documentation for full details: https://phik.readthedocs.io