Staging/main/0.10.6 #1065
In the changed code, we had a mypy error because numpy ndarrays are not compatible with random.Random.shuffle() (the expected argument type is MutableSequence[Any]). We fix this by first instantiating priority_order as a list, then shuffling it, and then creating an ndarray from it.
…lone#1056)
* change references to degrees of freedom in chi2 from df to deg_of_free
* reformated using black pre-commit hook
* add_s3_connection_remote_loading_s3uri_feature
* pre-commit fix
* created S3Helper class and refactored data_utils and unit test
* enhanced test_data.py with test_read_s3_uri
* enhanced unit tests and refactored is_s3_uri
* refactored some unit-tests structure
* rename TestCreateS3Client to TestS3Helper
* Reservoir sampling (capitalone#826)
* add code for reservoir sampling and insert sample_nrows options
* pre commit fix
* add tests for reservoir sampling
* fixed mypy issues
* fix import to relative path
---------
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
* plugins loading + preset plugin fetching implementation (capitalone#911)
* test
* Plugin implementation
* comments added to functions
* plugin test implementation for plugin presets
* forgot an import
* added None catch
* preset plugin test
* removing stuff I forgot to delete
* snake_case function names
* relative path
* relative path
* made new file for plugin testing
* forgot to delete function from old file
* now ive fixed if statement
* ok this should be it
* Plugin testing (capitalone#947)
* test
* plugin test implementation for plugin presets
* forgot an import
* added None catch
* preset plugin test
* snake_case function names
* relative path
* relative path
* forgot to delete function from old file
* nothing yet, just want this in two different repos
* new test for plugins feature and small update to plugin init
* pass
* didnt want dir to be overwritten
* forgot a dir
* fix isort pre-commit
* reservoir sample
* fix imports
* fix testing
* fix req to match dev
---------
Co-authored-by: Rushabh Vinchhi <[email protected]>
Co-authored-by: Richard Bann <[email protected]>
Co-authored-by: Liz Smith <[email protected]>
bd448f8 to e3daf19
@@ -53,7 +53,7 @@ For more nuanced testing runs, check out more detailed documentation [here](http
 ## Creating [Pull Requests](https://github.com/capitalone/DataProfiler/pulls)
 Pull requests are the best way to propose changes to the codebase. We actively welcome your pull requests:

-1. Fork the repo and create your branch from `main`.
+1. Fork the repo and create your branch from `dev`.
Fixed this after a note from a contributor: all PRs should go into `dev`. PR to `main` only when releasing.
@@ -64,6 +64,7 @@ repos:
 typing-extensions>=3.10.0.2,
 HLL>=2.0.3,
 datasketches>=4.1.0,
+boto3>=1.28.61,
required for S3 auth
@@ -110,7 +111,7 @@ repos:
 additional_dependencies: ['h5py', 'wheel', 'future', 'numpy', 'pandas',
 'python-dateutil', 'pytz', 'pyarrow', 'chardet', 'fastavro',
 'python-snappy', 'charset-normalizer', 'psutil', 'scipy', 'requests',
-'networkx','typing-extensions', 'HLL', 'datasketches']
+'networkx','typing-extensions', 'HLL', 'datasketches', 'boto3']
same as above
@@ -65,7 +65,14 @@ def __new__(
options = dict()

if is_valid_url(input_file_path):
    input_file_path = url_to_bytes(input_file_path, options)
if S3Helper.is_s3_uri(input_file_path, logger=logger):
    storage_options = options.pop("storage_options", {})
Documentation on dev-gh-pages now reflects this: https://github.com/capitalone/DataProfiler/pull/1063/files
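The `options.pop("storage_options", {})` call in the hunk above takes the S3 settings out of `options` and falls back to an empty dict when none were given. A minimal sketch of that behaviour follows; the helper name `extract_storage_options` and the example keys (`delimiter`, `region_name`) are hypothetical and not taken from DataProfiler.

```python
from typing import Any, Dict, Tuple


def extract_storage_options(
    options: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split S3 settings out of the reader options."""
    remaining = dict(options)  # shallow copy so the caller's dict is untouched
    # pop() returns the value and removes the key; {} is the default if absent
    storage_options = remaining.pop("storage_options", {})
    return storage_options, remaining


# Illustrative options only; the keys are assumptions, not a documented schema.
reader_options = {
    "delimiter": ",",
    "storage_options": {"region_name": "us-east-1"},
}
storage, rest = extract_storage_options(reader_options)
print(storage)  # the S3-specific settings
print(rest)     # everything else, with "storage_options" removed
```

The PR code pops from `options` in place; the sketch copies first only so the example input stays intact for inspection.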
@@ -843,3 +847,125 @@ def url_to_bytes(url_as_string: Url, options: Dict) -> BytesIO:

     stream.seek(0)
     return stream
+
+
+class S3Helper:
helper class to group S3 authentication methods
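To illustrate the idea of grouping S3 utilities as static methods on one class, here is a rough sketch using only the standard library. `S3UriSketch` and `split_s3_uri` are hypothetical names, and the logic is an assumption about what such a check could look like, not the body of DataProfiler's `S3Helper`. The real class also deals with authentication through boto3, which is left out here.

```python
from typing import Tuple
from urllib.parse import urlparse


class S3UriSketch:
    """Illustrative stand-in; not DataProfiler's S3Helper implementation."""

    @staticmethod
    def is_s3_uri(path: object) -> bool:
        """Return True for strings shaped like s3://bucket/key."""
        if not isinstance(path, str):
            return False
        parsed = urlparse(path)
        # An S3 URI needs the "s3" scheme and a non-empty bucket name
        return parsed.scheme == "s3" and bool(parsed.netloc)

    @staticmethod
    def split_s3_uri(uri: str) -> Tuple[str, str]:
        """Split s3://bucket/key into (bucket, key)."""
        if not S3UriSketch.is_s3_uri(uri):
            raise ValueError(f"not an S3 URI: {uri!r}")
        parsed = urlparse(uri)
        # netloc holds the bucket; the path minus its leading "/" is the key
        return parsed.netloc, parsed.path.lstrip("/")


bucket, key = S3UriSketch.split_s3_uri("s3://my-bucket/data/file.csv")
print(bucket, key)
```

Static methods keep the helpers callable without an instance, matching the `S3Helper.is_s3_uri(...)` call style seen in the diff.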
-priority_order = np.array(list(range(num_labels)))
-random_state.shuffle(priority_order)  # type: ignore
-self.priority_prediction(results, priority_order)
+priority_order = list(range(num_labels))
+random_state.shuffle(priority_order)
+self.priority_prediction(results, np.array(priority_order))
mypy fixes
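The pattern in this change can be tried in isolation. The sketch below assumes `random_state` is a `random.Random` instance, as the PR description states; the function name `shuffled_priority_order` is hypothetical and exists only for this example.

```python
import random

import numpy as np


def shuffled_priority_order(
    num_labels: int, random_state: random.Random
) -> np.ndarray:
    """Shuffle label indices as a list, then convert to an ndarray."""
    # A plain list satisfies the MutableSequence[Any] parameter type,
    # so no "# type: ignore" is needed on the shuffle call.
    priority_order = list(range(num_labels))
    random_state.shuffle(priority_order)  # shuffles in place
    return np.array(priority_order)


order = shuffled_priority_order(5, random.Random(0))
print(order)  # some permutation of 0..4, fixed by the seed
```

Converting to an ndarray only after shuffling keeps the value passed downstream the same type as before, while the shuffle itself operates on a type mypy accepts.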
@@ -2,7 +2,7 @@

 MAJOR = 0
 MINOR = 10
-MICRO = 5
+MICRO = 6
version bump required
@@ -17,3 +17,4 @@ typing-extensions>=3.10.0.2
 HLL>=2.0.3
 datasketches>=4.1.0
 packaging>=23.0
+boto3>=1.28.61
s3 data read dependency
dev-gh-pages