Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename references to degree of freedom from df to deg_of_free #1056

Merged
merged 3 commits into from
Oct 30, 2023

Conversation

mainkhan
Copy link
Contributor

@mainkhan mainkhan commented Oct 27, 2023

Fixes Issue #1030

This PR renames references to degrees of freedom in a chi-squared calculation to deg_of_free to not get confused with when we use df to refer to a data frame.

I have run the unit test suite and verified that the 1155 tests are running without fail.
I have also run pre-commit on all the files without issue.

@mainkhan mainkhan requested a review from a team as a code owner October 27, 2023 03:45
@CLAassistant
Copy link

CLAassistant commented Oct 27, 2023

CLA assistant check
All committers have signed the CLA.

@mainkhan mainkhan changed the title degree of freedom rename Rename references to degree of freedom from df to deg_of_free Oct 27, 2023
@mainkhan mainkhan changed the base branch from dev to main October 27, 2023 04:31
@taylorfturner
Copy link
Contributor

@mainkhan if you don't mind, let's open this PR into dev and not main. Thanks for the contribution!

Copy link
Contributor

@taylorfturner taylorfturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open PR into dev

@mainkhan mainkhan changed the base branch from main to dev October 27, 2023 13:14
@mainkhan
Copy link
Contributor Author

Hey @taylorfturner I changed the base branch. Sorry about that, I know it should be obvious but I think I got confused because in CONTRIBUTING.md the first step to create a pull request is to create a feature branch off of main

@taylorfturner
Copy link
Contributor

Hey @taylorfturner I changed the base branch. Sorry about that, I know it should be obvious but I think I got confused because in CONTRIBUTING.md the first step to create a pull request is to create a feature branch off of main

Totally understandable -- we will update that for clarity. Poor direction on our part. Thanks for the contribution!

@taylorfturner taylorfturner added good_first_issue Label for a more simple / introductory issue and removed do_not_merge labels Oct 27, 2023
@taylorfturner taylorfturner enabled auto-merge (squash) October 27, 2023 13:46
auto-merge was automatically disabled October 27, 2023 14:30

Head branch was pushed to by a user without write access

@mainkhan mainkhan force-pushed the issue/degree-of-freedom-rename branch from cfa91f1 to 58ea2cc Compare October 27, 2023 14:30
@taylorfturner taylorfturner enabled auto-merge (squash) October 27, 2023 14:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is changing behavior of the output for end users

  • we will need to be very clear in our communication about what changed and when (in PR notes)
  • also will need to ensure documentation is updated properly to ensure that is up to date and matching code's current state with this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mainkhan my one thought is that we should

  • research needed documentation changes here
  • if any needed changes open up a PR into dev-gh-pages in parallel to the code change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked in to the docs and could not find any information documenting explicitly what the output will be so I'm not sure we need to change any existing documentation. Perhaps just note in release notes documenting that a field in the dict has changed

I see here: https://capitalone.github.io/DataProfiler/docs/0.10.5/html/dataprofiler.dp_logging.html?highlight=chi#dataprofiler.dp_logging.get_child_logger

dataprofiler.profilers.profiler_utils.perform_chi_squared_test_for_homogeneity(categories1: dict, sample_size1: int, categories2: dict, sample_size2: int) → dict[str, int | float | None]

    Perform a Chi Squared test for homogeneity between two groups.

    Parameters

            categories1 (dict) – Categories and respective counts of the first group

            sample_size1 (int) – Number of samples in first group

            categories2 (dict) – Categories and respective counts of the second group

            sample_size2 (int) – Number of samples in second group

    Returns

        Results of the chi squared test
    Return type

        dict

the documentation only specifies that we are returning a dict but does not define the dict's keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more like this area of the docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a companion documentation PR #1057

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome


conservative_t = scipy.stats.t(conservative_df)
conservative_t = scipy.stats.t(conservative_deg_of_free)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scipy refers to this as df; however, I agree for clarity in this repo a rename would make the code. Its more readable / cause less confusion / less ambiguity and as a result --> easier to maintain

@taylorfturner taylorfturner merged commit e5773a6 into capitalone:dev Oct 30, 2023
4 checks passed
@taylorfturner taylorfturner added the Refactor Code that is being modified to improve the library label Oct 30, 2023
taylorfturner pushed a commit to taylorfturner/DataProfiler that referenced this pull request Nov 13, 2023
…lone#1056)

* change references to degrees of freedom in chi2 from df to deg_of_free

* reformated using black pre-commit hook
micdavis pushed a commit that referenced this pull request Nov 13, 2023
* Add null ratio to column stats (#1052)

* Delay transforming priority_order into ndarray (#1045)

In the changed code, we had a mypy error because numpy ndarrays are not
compatible with random.Random.shuffle() (expected argument type is
MutableSequence[Any])

We fix this by first instantiating priority_order as a list, then
shuffling it, then creating an ndarray from it afterwards.

* Rename references to degree of freedom from df to deg_of_free (#1056)

* change references to degrees of freedom in chi2 from df to deg_of_free

* reformated using black pre-commit hook

* add_s3_connection_remote_loading_s3uri_feature (#1054)

* add_s3_connection_remote_loading_s3uri_feature

* pre-commit fix

* created S3Helper class and refactored data_utils and unit test

* enhanced test_data.py with test_read_s3_uri

* enhanced unit tests and refactored is_s3_uri

* refactored some unit-tests structure

* rename TestCreateS3Client to TestS3Helper

* fix directions for contrib branch (#1059)

* Feature: Plugins (#1060)

* Reservoir sampling (#826)

* add code for reservoir sampling and insert sample_nrows options

* pre commit fix

* add tests for reservoir sampling

* fixed mypy issues

* fix import to relative path

---------

Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: Richard Bann <[email protected]>

* plugins loading + preset plugin fetching implementation (#911)

* test

* Plugin implementation

* comments added to functions

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* removing stuff I forgot to delete

* snake_case function names

* relative path

* relative path

* made new file for plugin testing

* forgot to delete function from old file

* now ive fixed if statement

* ok this should be it

* Plugin testing (#947)

* test

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* snake_case function names

* relative path

* relative path

* forgot to delete function from old file

* nothing yet, just want this in two different repos

* new test for plugins feature and small update to plugin init

* pass

* didnt want dir to be overwritten

* forgot a dir

* fix isort pre-commit

* reservoir sample

* fix imports

* fix testing

* fix req to match dev

---------

Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>

* version bump (#1064)

* empty test

---------

Co-authored-by: Suprabhat Gurrala <[email protected]>
Co-authored-by: Junho Lee <[email protected]>
Co-authored-by: Main Uddin Khan <[email protected]>
Co-authored-by: Mohammad Motamedi <[email protected]>
Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good_first_issue Label for a more simple / introductory issue Refactor Code that is being modified to improve the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants