Staging/main/0.10.6 #1065

Merged: 8 commits into capitalone:main on Nov 13, 2023

Conversation

taylorfturner
Contributor

@taylorfturner taylorfturner commented Nov 13, 2023

  • All the features for 0.10.6
  • parallel documentation updates in dev-gh-pages

@taylorfturner taylorfturner added the Version Upgrade Release / version change PR label Nov 13, 2023
@taylorfturner taylorfturner self-assigned this Nov 13, 2023
@taylorfturner taylorfturner requested a review from a team as a code owner November 13, 2023 19:21
suprabhatgurrala and others added 8 commits November 13, 2023 14:26
The changed code raised a mypy error because numpy ndarrays are not
compatible with random.Random.shuffle(), which expects an argument of
type MutableSequence[Any].

We fix this by first instantiating priority_order as a list, then
shuffling it, then creating an ndarray from it afterwards.
…lone#1056)

* change references to degrees of freedom in chi2 from df to deg_of_free

* reformatted using black pre-commit hook
* add_s3_connection_remote_loading_s3uri_feature

* pre-commit fix

* created S3Helper class and refactored data_utils and unit test

* enhanced test_data.py with test_read_s3_uri

* enhanced unit tests and refactored is_s3_uri

* refactored some unit-tests structure

* rename TestCreateS3Client to TestS3Helper
* Reservoir sampling (capitalone#826)

* add code for reservoir sampling and insert sample_nrows options

* pre commit fix

* add tests for reservoir sampling

* fixed mypy issues

* fix import to relative path

---------

Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: Richard Bann <[email protected]>

* plugins loading + preset plugin fetching implementation (capitalone#911)

* test

* Plugin implementation

* comments added to functions

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* removing stuff I forgot to delete

* snake_case function names

* relative path

* relative path

* made new file for plugin testing

* forgot to delete function from old file

* now ive fixed if statement

* ok this should be it

* Plugin testing (capitalone#947)

* test

* plugin test implementation for plugin presets

* forgot an import

* added None catch

* preset plugin test

* snake_case function names

* relative path

* relative path

* forgot to delete function from old file

* nothing yet, just want this in two different repos

* new test for plugins feature and small update to plugin init

* pass

* didnt want dir to be overwritten

* forgot a dir

* fix isort pre-commit

* reservoir sample

* fix imports

* fix testing

* fix req to match dev

---------

Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>
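
The reservoir sampling commit listed above adds a `sample_nrows` option. As a rough illustration of the general technique (not the DataProfiler implementation, which is not shown in this conversation), here is a minimal sketch of single-pass reservoir sampling (Algorithm R). The function name `reservoir_sample` and its `rng` parameter are hypothetical; only the `sample_nrows` name comes from the commit message.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    rows: Iterable[T],
    sample_nrows: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Draw up to sample_nrows items uniformly at random in one pass.

    Hypothetical helper for illustration only; it is not part of DataProfiler.
    """
    if sample_nrows < 0:
        raise ValueError("sample_nrows must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, row in enumerate(rows):
        if index < sample_nrows:
            # Fill the reservoir with the first sample_nrows rows.
            reservoir.append(row)
        else:
            # Keep this row with probability sample_nrows / (index + 1)
            # by picking a slot in [0, index] and replacing only if the
            # slot falls inside the reservoir.
            slot = rng.randint(0, index)
            if slot < sample_nrows:
                reservoir[slot] = row
    return reservoir


# Usage: sample 5 rows from a stream of 1,000 without holding it in memory.
sample = reservoir_sample(range(1_000), sample_nrows=5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sample reproducible, and a stream shorter than `sample_nrows` is returned unchanged.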
@@ -53,7 +53,7 @@ For more nuanced testing runs, check out more detailed documentation [here](http
## Creating [Pull Requests](https://github.com/capitalone/DataProfiler/pulls)
Pull requests are the best way to propose changes to the codebase. We actively welcome your pull requests:

-1. Fork the repo and create your branch from `main`.
+1. Fork the repo and create your branch from `dev`.
Contributor Author

Fixed this after a note from a contributor: all PRs should target dev. PRs to main are only for releases.

@@ -64,6 +64,7 @@ repos:
typing-extensions>=3.10.0.2,
HLL>=2.0.3,
datasketches>=4.1.0,
boto3>=1.28.61,
Contributor Author

required for S3 auth

@@ -110,7 +111,7 @@ repos:
additional_dependencies: ['h5py', 'wheel', 'future', 'numpy', 'pandas',
'python-dateutil', 'pytz', 'pyarrow', 'chardet', 'fastavro',
'python-snappy', 'charset-normalizer', 'psutil', 'scipy', 'requests',
-'networkx','typing-extensions', 'HLL', 'datasketches']
+'networkx','typing-extensions', 'HLL', 'datasketches', 'boto3']
Contributor Author

same as above

@@ -65,7 +65,14 @@ def __new__(
options = dict()

if is_valid_url(input_file_path):
input_file_path = url_to_bytes(input_file_path, options)
if S3Helper.is_s3_uri(input_file_path, logger=logger):
storage_options = options.pop("storage_options", {})
Contributor Author

Documentation on dev-gh-pages now reflects this: https://github.com/capitalone/DataProfiler/pull/1063/files

@@ -843,3 +847,125 @@ def url_to_bytes(url_as_string: Url, options: Dict) -> BytesIO:

stream.seek(0)
return stream


class S3Helper:
Contributor Author

helper class to group S3 authentication methods
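
The body of `S3Helper` is not shown in this conversation, so the following is only a hedged sketch of how an S3 URI check and the `storage_options` handling from the earlier hunk might look. The names `looks_like_s3_uri` and `split_storage_options` are hypothetical and are not DataProfiler APIs. The check assumes an S3 URI has the form `s3://<bucket>/<key>`.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def looks_like_s3_uri(path: Any) -> bool:
    """Return True if path is a string of the form s3://<bucket>/...

    Hypothetical stand-in for an S3 URI check; the real
    S3Helper.is_s3_uri may use different rules.
    """
    if not isinstance(path, str):
        return False
    parsed = urlparse(path)
    # The scheme must be "s3" and a bucket name must be present.
    return parsed.scheme == "s3" and bool(parsed.netloc)


def split_storage_options(
    options: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate storage_options from the remaining reader options.

    Works on a copy, unlike the options.pop(...) call in the diff,
    so the caller's dictionary is left untouched.
    """
    remaining = dict(options)
    storage_options = remaining.pop("storage_options", {})
    return storage_options, remaining


# Usage: route S3 paths separately from local files and HTTP URLs.
path = "s3://example-bucket/data.csv"  # assumed example path
if looks_like_s3_uri(path):
    storage, rest = split_storage_options(
        {"storage_options": {"region_name": "us-east-1"}, "delimiter": ","}
    )
```

Keeping the storage settings in their own dictionary means they can be handed to the S3 client separately from the options meant for the file reader.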

Comment on lines -2050 to +2052
-priority_order = np.array(list(range(num_labels)))
-random_state.shuffle(priority_order) # type: ignore
-self.priority_prediction(results, priority_order)
+priority_order = list(range(num_labels))
+random_state.shuffle(priority_order)
+self.priority_prediction(results, np.array(priority_order))
Contributor Author

mypy fixes
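
As a small standalone illustration of why the list-first ordering satisfies mypy: a plain `list` is a `MutableSequence`, which is the type `random.Random.shuffle()` is annotated to accept. The helper name `shuffled_priority_order` is hypothetical. To keep the sketch dependency-free, the final `np.array(...)` conversion from the diff is left to the caller.

```python
import random
from collections.abc import MutableSequence
from typing import List


def shuffled_priority_order(num_labels: int, random_state: random.Random) -> List[int]:
    """Build the label order as a list, then shuffle it in place.

    Hypothetical helper mirroring the diff above; the caller would wrap
    the result in np.array(...) before passing it on.
    """
    priority_order = list(range(num_labels))
    # A list is a MutableSequence, so no "type: ignore" is needed here.
    random_state.shuffle(priority_order)
    return priority_order


# Usage: a seeded Random gives a reproducible permutation of the labels.
order = shuffled_priority_order(5, random.Random(42))
```

The result is still a permutation of `range(num_labels)`, so converting it to an ndarray afterwards should behave like the original code.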

@@ -2,7 +2,7 @@

MAJOR = 0
MINOR = 10
-MICRO = 5
+MICRO = 6
Contributor Author

version bump required

@@ -17,3 +17,4 @@ typing-extensions>=3.10.0.2
HLL>=2.0.3
datasketches>=4.1.0
packaging>=23.0
boto3>=1.28.61
Contributor Author

s3 data read dependency

@micdavis micdavis enabled auto-merge (squash) November 13, 2023 20:01
@micdavis micdavis merged commit 302a458 into capitalone:main Nov 13, 2023
4 checks passed
Labels
Version Upgrade Release / version change PR

7 participants