feat: add null ratio to column stats #1052

Merged


suprabhatgurrala
Contributor

Null ratio is currently computed as a global stat, but not at the column level. This change computes null ratio as a column stat by dividing null_count by sample_size.

I had to delete and re-create my fork in order to branch from dev, which is why I closed #1051.
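
For readers unfamiliar with the stat, the column-level computation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `column_null_ratio` is hypothetical, and returning `None` when `sample_size` is zero is an assumption made here to avoid a division by zero.

```python
from typing import Optional


def column_null_ratio(null_count: int, sample_size: int) -> Optional[float]:
    """Return null_count / sample_size for one column.

    Hypothetical helper for illustration only; returning None for an
    empty sample is an assumption, not behavior confirmed by the PR.
    """
    if sample_size <= 0:
        return None
    return null_count / sample_size


# Toy column with two missing values out of five sampled rows.
column = ["a", None, "b", None, "c"]
null_count = sum(value is None for value in column)
ratio = column_null_ratio(null_count, len(column))
print(ratio)
```

With this toy column the ratio is 2 / 5, i.e. 40% of the sampled values are null.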

@taylorfturner taylorfturner merged commit 5f62f03 into capitalone:dev Oct 25, 2023
4 checks passed
taylorfturner pushed a commit to taylorfturner/DataProfiler that referenced this pull request Nov 13, 2023
micdavis pushed a commit that referenced this pull request Nov 13, 2023
* Add null ratio to column stats (#1052)

* Delay transforming priority_order into ndarray (#1045)

In the changed code, we had a mypy error because numpy ndarrays are not
compatible with random.Random.shuffle() (expected argument type is
MutableSequence[Any])

We fix this by first instantiating priority_order as a list, then
shuffling it, then creating an ndarray from it afterwards.
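
The ordering described in this commit message can be sketched as below. This is only an illustration of the pattern, not the changed DataProfiler code: the contents of `priority_order`, the seed, and the name `priority_order_arr` are placeholders assumed here.

```python
import random

import numpy as np

# Placeholder data and seed; the real values in DataProfiler are not shown here.
rng = random.Random(0)

# 1. Build priority_order as a plain list, which is a MutableSequence and so
#    matches the argument type random.Random.shuffle() is annotated with.
priority_order = list(range(5))

# 2. Shuffle the list in place.
rng.shuffle(priority_order)

# 3. Only afterwards convert it to an ndarray for downstream use.
priority_order_arr = np.array(priority_order)
print(priority_order_arr)
```

Shuffling before the conversion keeps the type checker satisfied without needing a cast or a type-ignore comment.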

* Rename references to degree of freedom from df to deg_of_free (#1056)

* change references to degrees of freedom in chi2 from df to deg_of_free

* reformatted using black pre-commit hook

* add_s3_connection_remote_loading_s3uri_feature (#1054)

* add_s3_connection_remote_loading_s3uri_feature

* pre-commit fix

* created S3Helper class and refactored data_utils and unit test

* enhanced test_data.py with test_read_s3_uri

* enhanced unit tests and refactored is_s3_uri

* refactored some unit-tests structure

* rename TestCreateS3Client to TestS3Helper

* fix directions for contrib branch (#1059)

* Feature: Plugins (#1060)

* Reservoir sampling (#826)

* add code for reservoir sampling and insert sample_nrows options

* pre commit fix

* add tests for reservoir sampling

* fixed mypy issues

* fix import to relative path

---------

Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: Richard Bann <[email protected]>

* plugins loading + preset plugin fetching implementation (#911)

* test

* Plugin implementation

* comments added to functions

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* removing stuff I forgot to delete

* snake_case function names

* relative path

* relative path

* made new file for plugin testing

* forgot to delete function from old file

* now ive fixed if statement

* ok this should be it

* Plugin testing (#947)

* test

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* snake_case function names

* relative path

* relative path

* forgot to delete function from old file

* nothing yet, just want this in two different repos

* new test for plugins feature and small update to plugin init

* pass

* didnt want dir to be overwritten

* forgot a dir

* fix isort pre-commit

* reservoir sample

* fix imports

* fix testing

* fix req to match dev

---------

Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>

* version bump (#1064)

* empty test

---------

Co-authored-by: Suprabhat Gurrala <[email protected]>
Co-authored-by: Junho Lee <[email protected]>
Co-authored-by: Main Uddin Khan <[email protected]>
Co-authored-by: Mohammad Motamedi <[email protected]>
Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>
4 participants