Staging/dev/profile serialization (capitalone#940)
* initial changes to categoricalColumn decoder (capitalone#818)
* Implemented decoding for numerical stats mixin and integer profiles (capitalone#844)
* hot fixes for encode and decode of numeric stats mixin and intcol profiler (capitalone#852)
* Float column profiler encode decode (capitalone#854)
  * hot fixes for encode and decode of numeric stats mixin and intcol profiler
  * cleaned up type checking and updated numericstatsmixin readin helper to give type conversions to more attributes
  * Added docstring to the _load_stats_helper function
  * Update dataprofiler/profilers/numerical_column_stats.py
    Co-authored-by: Taylor Turner <[email protected]>
  * Update dataprofiler/profilers/numerical_column_stats.py
  * fix for nan values issue in pytesting
  * Implementation of float profiler encode and decode process

  Co-authored-by: Taylor Turner <[email protected]>
* Json decode date time column (capitalone#861)
  * more verbose error log with types for easy debug
  * add load_from_dict to handle timestamps
  * add json decode tests
  * include DateTimeColumn class
* Added decoding for encoding of ordered column profiles (capitalone#864)
* Added ordered col test to ensure correct response to update when different ordering of values is introduced (capitalone#868)
* added decode text_column_profiler functionality and tests (capitalone#870)
* Created encoder for the datalabelercolumn (capitalone#869)
* feat: add test and compiler serialization (capitalone#884)
* [WIP] Adds tests validating serialization with Primitive type for compiler (capitalone#885)
  * feat: add test and compiler serialization
  * fix: move primitive tests to own class
  * feat: add primitive col compiler save tests
  * fix: float serializers asserts
* Adds deserialization for compilers and validates tests for Primitive; fixes numerical deserialization (capitalone#886)
  * feat: add test and compiler serialization
  * fix: move primitive tests to own class
  * feat: add primitive col compiler save tests
  * fix: float serializers asserts
  * feat: add tests and allow primitive compiler to deserialize
  * fix: bug in numeric stats deserial
  * fix: missing `)` after conflict resolution
* Add Serialization and Deserialization Tests for Stats Compiler, plus refactors for order Typing (capitalone#887)
  * fix: organize categorical and add get function
  * refactor: reorganize tests and add stats test
  * feat: order typing
  * feat: add serial and deserial for stats compiler
  * fix: bug when sample_size == 0
* ready datalabeler for deserialization and improvement on serialization for datalabeler (capitalone#879)
* Deserialization of datalabeler (capitalone#891)
  * Added initial profiler decoding for datalabeler column (WIP)
  * Initial implementation for deserialization of datalabelercolumn
* Fix LSP violations (capitalone#840)
  * Make profiler superclasses generic

    Makes the superclasses BaseColumnProfiler, NumericStatsMixin, and BaseCompiler generic, to avoid casting in subclass diff() methods and violating LSP in principle.
  * Add needed cast import

  Co-authored-by: Junho Lee <[email protected]>
* Encode Options (capitalone#875)
  * encode testing
  * encode dataLabeler testing
  * encode structuredOptions testing
  * cleaned up datalabeler test
  * added text options
* [WIP] ColumnDataLabelerCompiler: serialize / deserialize (capitalone#888)
  * formatting
  * update formatting
  * setting up full test suite for DataLabelerCompiler
  * update isort
  * updates to test -- still failing
  * update
* Quick Test update (capitalone#893)
  * update
  * string in list
  * formatting
* Decode options (capitalone#894)
  * refactored options encode testing
  * updated test name
  * updated class names
  * fixing test
  * initial base option decode
  * initial tests
* refactor: allow options to go through all (capitalone#902)
  * refactor: allow options to go through all
  * fix: bug
* StructuredColProfiler Encode / Decode (capitalone#901)
  * refactor: allow options to go through all
  * fix: bug
  * update
  * update
  * update
  * updates
  * update
  * Fixes for Taylor's StructuredCol Issue
  * update
  * update
  * remove try/except

  Co-authored-by: Jeremy Goodsitt <[email protected]>
  Co-authored-by: ksneab7 <[email protected]>
* fix: bug and add tests for structuredcolprofiler (capitalone#904)
  * fix: bug and add tests
  * fix: limit scipy requirements till problem understood and fixed
* Structured profiler encode decode (capitalone#903)
  * refactor: allow options to go through all
  * fix: bug in loading options
  * update
  * update
  * Fixes for Taylor's StructuredCol Issue
  * Created load and save code from structuredprofiler
  * intermediate commit for fixing structured profile

  Co-authored-by: Jeremy Goodsitt <[email protected]>
  Co-authored-by: taylorfturner <[email protected]>
* [WIP] Added NoImplementationError for UnstructuredProfiler (capitalone#907)
  * refactor: allow options to go through all
  * fix: bug in loading options
  * update
  * update
  * Fixes for Taylor's StructuredCol Issue
  * Created load and save code from structuredprofiler
  * intermediate commit for fixing structured profile
  * test fix
  * mypy fixes for typing issues
  * fix for none case of the datalabeler in options
  * Added mock of datalabeler to structured profile test
  * Added tests for encoding of the Structured profiler
  * Update dataprofiler/profilers/json_decoder.py
    Co-authored-by: Michael Davis <[email protected]>
  * Update dataprofiler/profilers/profile_builder.py
    Co-authored-by: Michael Davis <[email protected]>
  * Update dataprofiler/profilers/profiler_options.py
    Co-authored-by: Michael Davis <[email protected]>
  * PR fixes
  * Fixed typo in test
  * Update dataprofiler/profilers/json_decoder.py
    Co-authored-by: Taylor Turner <[email protected]>
  * Update dataprofiler/profilers/profile_builder.py
    Co-authored-by: Michael Davis <[email protected]>
  * Update dataprofiler/tests/profilers/utils.py
    Co-authored-by: Taylor Turner <[email protected]>
  * Update dataprofiler/profilers/profile_builder.py
    Co-authored-by: Michael Davis <[email protected]>
  * Fixes for unneeded callout for _profile check
  * small change

  Co-authored-by: Jeremy Goodsitt <[email protected]>
  Co-authored-by: taylorfturner <[email protected]>
  Co-authored-by: ksneab7 <[email protected]>
  Co-authored-by: ksneab7 <[email protected]>
* Added testing for values for test_json_decode_after_update (capitalone#915)
* Reuse passed labeler (capitalone#924)
  * refactor: loading labeler for reuse and abstract loading
  * refactor: use for DataLabelerColumn as well
  * fix: don't error if doesn't exist
  * refactor: allow for config dict to be passed entire way
  * fix: compiler tests
  * fix: structCol tests
  * fix: test
* BaseProfiler save() for json (capitalone#923)
  * added save for top level and tests
  * small refactor
  * small fix
* refactor: use seed for sample for consistency (capitalone#927)
  * refactor: use seed for sample for consistency
  * fix: formatting and variables
* WIP top level load (capitalone#925)
* quick hot fix for input validation on save() save_metho (capitalone#931)
* BaseProfiler: `load_method` hotfix (capitalone#932)
  * added load_method
  * updated tests
* fix: null_rep mat should calculate even if datetime (capitalone#933)
* Notebook Example save/load Profile (capitalone#930)
  * update example data profiler demo save/load
  * update notebook cells
  * Update examples/data_profiler_demo.ipynb
  * Update examples/data_profiler_demo.ipynb
* fix: order bug (capitalone#939)
* fix: typo on rebase
* fix: typing and bugs from rebase
* fix: options tests due to merge and loading new options

---------

Co-authored-by: Michael Davis <[email protected]>
Co-authored-by: ksneab7 <[email protected]>
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: Tyler <[email protected]>
Co-authored-by: Junho Lee <[email protected]>
Co-authored-by: ksneab7 <[email protected]>
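
Several of the changes listed here pair a JSON encoder with a `load_from_dict`-style decode step, including one that handles timestamps. The following is a minimal, stdlib-only sketch of that round-trip pattern, not the code from this commit. The names `ToyDateTimeProfile`, `ToyProfileEncoder`, `decode_profile`, and the `{"class": ..., "data": ...}` payload layout are hypothetical and are assumed only for illustration. They are not DataProfiler's actual classes or serialized format.

```python
import json
from datetime import datetime


class ToyDateTimeProfile:
    """Hypothetical stand-in for a column profile (not a DataProfiler class)."""

    def __init__(self, name):
        self.name = name
        self.sample_size = 0
        self.min = None  # earliest datetime seen so far

    def update(self, values):
        """Fold a batch of datetime values into the running statistics."""
        for value in values:
            self.sample_size += 1
            if self.min is None or value < self.min:
                self.min = value

    @classmethod
    def load_from_dict(cls, data):
        """Rebuild a profile from its decoded JSON dict, restoring types."""
        profile = cls(data["name"])
        profile.sample_size = data["sample_size"]
        if data["min"] is not None:
            # JSON has no datetime type, so the ISO string is converted back.
            profile.min = datetime.fromisoformat(data["min"])
        return profile


class ToyProfileEncoder(json.JSONEncoder):
    """Encode profiles as {"class": ..., "data": ...} and datetimes as ISO strings."""

    def default(self, obj):
        if isinstance(obj, ToyDateTimeProfile):
            return {"class": type(obj).__name__, "data": vars(obj)}
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


# Assumed registry mapping a serialized class name to its loader.
_DECODERS = {"ToyDateTimeProfile": ToyDateTimeProfile}


def decode_profile(serialized):
    """Dispatch on the stored class name and delegate to load_from_dict."""
    payload = json.loads(serialized)
    cls = _DECODERS.get(payload["class"])
    if cls is None:
        raise ValueError(f"Unknown profile class: {payload['class']}")
    return cls.load_from_dict(payload["data"])


profile = ToyDateTimeProfile("created_at")
profile.update([datetime(2023, 5, 1, 12, 0), datetime(2023, 1, 15, 8, 30)])

serialized = json.dumps(profile, cls=ToyProfileEncoder)
restored = decode_profile(serialized)
```

Storing the class name next to the data lets a single decode entry point rebuild different profile types, and keeping the type conversions inside `load_from_dict` keeps each class responsible for restoring its own non-JSON fields.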