-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename references to degree of freedom from df to deg_of_free #1056
Rename references to degree of freedom from df to deg_of_free #1056
Conversation
@mainkhan if you don't mind, let's open this PR into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open PR into dev
Hey @taylorfturner I changed the base branch. Sorry about that, I know it should be obvious but I think I got confused because in CONTRIBUTING.md the first step to create a pull request is to create a feature branch off of |
Totally understandable -- we will update that for clarity. Poor direction on our part. Thanks for the contribution! |
Head branch was pushed to by a user without write access
cfa91f1
to
58ea2cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is changing behavior of the output for end users
- we will need to be very clear in our communication about what changed and when (in PR notes)
- also will need to ensure documentation is updated properly to ensure that is up to date and matching code's current state with this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked in to the docs and could not find any information documenting explicitly what the output will be so I'm not sure we need to change any existing documentation. Perhaps just note in release notes documenting that a field in the dict has changed
dataprofiler.profilers.profiler_utils.perform_chi_squared_test_for_homogeneity(categories1: dict, sample_size1: int, categories2: dict, sample_size2: int) → dict[str, int | float | None]
Perform a Chi Squared test for homogeneity between two groups.
Parameters
categories1 (dict) – Categories and respective counts of the first group
sample_size1 (int) – Number of samples in first group
categories2 (dict) – Categories and respective counts of the second group
sample_size2 (int) – Number of samples in second group
Returns
Results of the chi squared test
Return type
dict
the documentation only specifies that we are returning a dict but does not define the dict's keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more like this area of the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a companion documentation PR #1057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
|
||
conservative_t = scipy.stats.t(conservative_df) | ||
conservative_t = scipy.stats.t(conservative_deg_of_free) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scipy refers to this as df
; however, I agree for clarity in this repo a rename would make the code. Its more readable / cause less confusion / less ambiguity and as a result --> easier to maintain
…lone#1056) * change references to degrees of freedom in chi2 from df to deg_of_free * reformated using black pre-commit hook
* Add null ratio to column stats (#1052) * Delay transforming priority_order into ndarray (#1045) In the changed code, we had a mypy error because numpy ndarrays are not compatible with random.Random.shuffle() (expected argument type is MutableSequence[Any]) We fix this by first instantiating priority_order as a list, then shuffling it, then creating an ndarray from it afterwards. * Rename references to degree of freedom from df to deg_of_free (#1056) * change references to degrees of freedom in chi2 from df to deg_of_free * reformated using black pre-commit hook * add_s3_connection_remote_loading_s3uri_feature (#1054) * add_s3_connection_remote_loading_s3uri_feature * pre-commit fix * created S3Helper class and refactored data_utils and unit test * enhanced test_data.py with test_read_s3_uri * enhanced unit tests and refactored is_s3_uri * refactored some unit-tests structure * rename TestCreateS3Client to TestS3Helper * fix directions for contrib branch (#1059) * Feature: Plugins (#1060) * Reservoir sampling (#826) * add code for reservoir sampling and insert sample_nrows options * pre commit fix * add tests for reservoir sampling * fixed mypy issues * fix import to relative path --------- Co-authored-by: Taylor Turner <[email protected]> Co-authored-by: Richard Bann <[email protected]> * plugins loading + preset plugin fetching implementation (#911) * test * Plugin implementation * comments added to functions * plugin test implementation for plugin presets * forgot an import * added None catch * preset plugin test * removing stuff I forgot to delete * snake_case function names * relative path * relative path * made new file for plugin testing * forgot to delete function from old file * now ive fixed if statement * ok this should be it * Plugin testing (#947) * test * plugin test implementation for plugin presets * forgot an import * added None catch * preset plugin test * snake_case function names * relative path * relative path * forgot to delete function from old file * nothing yet, just want this in two different repos * new test for plugins feature and small update to plugin init * pass * didnt want dir to be overwritten * forgot a dir * fix isort pre-commit * reservoir sample * fix imports * fix testing * fix req to match dev --------- Co-authored-by: Rushabh Vinchhi <[email protected]> Co-authored-by: Richard Bann <[email protected]> Co-authored-by: Liz Smith <[email protected]> * version bump (#1064) * empty test --------- Co-authored-by: Suprabhat Gurrala <[email protected]> Co-authored-by: Junho Lee <[email protected]> Co-authored-by: Main Uddin Khan <[email protected]> Co-authored-by: Mohammad Motamedi <[email protected]> Co-authored-by: Rushabh Vinchhi <[email protected]> Co-authored-by: Richard Bann <[email protected]> Co-authored-by: Liz Smith <[email protected]>
Fixes Issue #1030
This PR renames references to degrees of freedom in a chi-squared calculation to
deg_of_free
to not get confused with when we usedf
to refer to a data frame.I have run the unit test suite and verified that the 1155 tests are running without fail.
I have also run pre-commit on all the files without issue.