Releases: capitalone/DataProfiler
v0.7.6
v0.7.5
v0.7.4
v0.7.3
v0.7.2
Profiler
- Add median to numeric stats #389
- Add chi-square tests to profiler #392
- Chi square/homogeneity, median, mode, MAD differences #398, #400
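For readers unfamiliar with MAD (median absolute deviation), the sketch below shows the textbook definition using only the Python standard library. It is an illustration of the statistic itself, not DataProfiler's implementation; the function name `median_abs_deviation` is hypothetical.

```python
from statistics import median


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (textbook MAD).

    Hypothetical helper for illustration; not part of DataProfiler.
    """
    center = median(values)
    return median(abs(v - center) for v in values)


sample = [1, 2, 3, 4, 100]
print(median(sample))                # the outlier 100 barely moves the median
print(median_abs_deviation(sample))  # MAD is similarly robust to the outlier
```

Both statistics are robust to outliers, which is why they complement the mean and standard deviation in a numeric profile.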
Readers
Graphs
- add missing values matrix #403
- update histogram to use column indexes #404
- Add warning to user when requirements are not installed #407
Bug fixes
- Fix bug in mode when disabled #388
- Update exception text for ssl_verify error #395
- ssl verify misnaming fix and consecutive spaces in csv fix #405
- fix cnn confidences not slicing data correctly #419
Other Changes
v0.7.1
Profiler
Readers
- Readers now accept a URL to a file for reading #375
- Allow text to determine encoding automatically #378
Graphs
- Add function that accepts a profiler and creates histogram bar charts #367
Bug fixes
- Fixes bug in _get_quantiles when median case occurs #383
- Catch Divide by 0 bug for unique row ratio #384
- Make clean data function static again due to multiprocessing and model issue #385
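The divide-by-zero case for the unique row ratio arises when a dataset has no rows. A minimal sketch of that kind of guard follows; the function name `unique_row_ratio` and the choice to return 0 for empty input are assumptions made for illustration, not the library's code.

```python
def unique_row_ratio(rows):
    """Fraction of rows that are distinct.

    Hypothetical helper: returns 0 for empty input instead of dividing
    by zero (the return value for that case is an assumption).
    """
    total = len(rows)
    if total == 0:
        return 0
    return len(set(rows)) / total


# Rows are represented as tuples so they can be hashed.
rows = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
print(unique_row_ratio(rows))  # 3 distinct rows out of 4
print(unique_row_ratio([]))    # guarded: no ZeroDivisionError
```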
Other Changes
v0.7.0
Profiler
- Can now take the difference between two profiles #277, #279, #282, #295, #297, #300, #301, #302, #318, #319, #324, #336, #339, #349, #355, #358, #359, #366
- Correlations now support updating with new data, merging, and NaN values #342
- Add text memory size to unstructured profiler #340
- Add timeit functionality to top level profilers #344, #346
- Users can now specify what is considered a null value #347
Readers
- Can now ingest StringIO and BytesIO #348, #350, #351, #352, #353, #354, #364
- Allow internal data function calls directly from our data class #360
Runtime Changes
- Abstract NumericalStatsMixin profile for columns #337
- Added profiler min true samples error checking #365
Bug fixes
- Allow users to pass non-string values for structured labeling #343
- Profiler samples no longer change visual representation when passed as a list #363
Other Changes
v0.6.1
Profiler
- Add option to set 'k' for the top k highest counts in categorical profiles #325
- Improved CSV data streaming to accept StringIO/BytesIO #327
Runtime
- Text in Unstructured Profiler now keeps a count of words #321
Bug fixes
- Fixed unalikeability bug that caused errors on datasets with only one sample #341
Other Changes
- Standardized throughput for structured testing #298
v0.6.0
Profiler
- Structured Profiler can now take in duplicate columns #315
- This is an API change to how data in the report is accessed: data_stats is now a list
- Categorical Profile now includes top 5 counts #299
- Add new categorical statistics: gini impurity and unalikeability #308, #320
- Unstructured Data Labeler profile now includes entity percentages #305
- Add Pearson's correlation to the Structured Profiler #281, #307, #317
- Unstructured Profiler Text vocab now outputs the top k highest vocab counts #304, #314
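As background on the two categorical statistics named above, the sketch below computes Gini impurity and unalikeability from raw values using their common textbook formulations. It is not DataProfiler's code: the function names are hypothetical, and returning 0.0 for fewer than two samples is an assumption chosen to avoid a zero denominator.

```python
from collections import Counter


def gini_impurity(values):
    """1 - sum(p_i^2): chance two draws with replacement differ."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return None  # undefined for empty input (assumed convention)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def unalikeability(values):
    """Share of ordered pairs of distinct entries whose values differ."""
    counts = Counter(values)
    total = sum(counts.values())
    if total <= 1:
        return 0.0  # no pairs to compare (assumed convention)
    unalike_pairs = sum(c * (total - c) for c in counts.values())
    return unalike_pairs / (total * (total - 1))


column = ["a", "a", "b", "b"]
print(gini_impurity(column))   # 0.5 for two equally frequent categories
print(unalikeability(column))  # 8 unalike ordered pairs out of 12
```

The two measures differ only in whether pairs are drawn with or without replacement, so they converge as the number of samples grows.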
Runtime Changes
- Categorical Profiler keeps an internal count of categories #296
- Text in Unstructured Profiler now keeps a count of vocab #304
- Data Reader's `is_match` function can now take in StringIO/BytesIO #292, #306, #326
Bug fixes
- Ensure samples stored by UnstructuredProfiler are saved #313