Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement/open data store #893

Merged
merged 8 commits into from
Dec 11, 2023

Conversation

kbuma
Copy link
Contributor

@kbuma kbuma commented Nov 22, 2023

Summary

Major changes:

  • add S3IndexStore: a store that loads the index of the collection from an S3 file.
  • add OpenDataStore: a store whose data is stored on S3 compatible storage using the format used by Materials Project on OpenData. It utilizes an index that is loaded from S3 compatible storage into memory (S3IndexStore).
  • refactor of S3Store to make it extensible, ie allow OpenDataStore to build on top of it

Checklist

  • Google format doc strings added.
  • Code linted with ruff. (For guidance in fixing rule violates, see rule list)
  • Type annotations included. Check with mypy.
  • Tests added for new features/fixes.
  • I have run the tests locally and they passed.

Copy link

codecov bot commented Nov 22, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (f792dd8) 88.16% compared to head (85e5a5d) 88.32%.
Report is 1 commits behind head on main.

Files Patch % Lines
src/maggma/stores/open_data.py 91.59% 10 Missing ⚠️
src/maggma/stores/aws.py 89.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #893      +/-   ##
==========================================
+ Coverage   88.16%   88.32%   +0.16%     
==========================================
  Files          45       46       +1     
  Lines        3633     3760     +127     
==========================================
+ Hits         3203     3321     +118     
- Misses        430      439       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.



def test_eq(memstore, s3store):
assert s3store == s3store

Check warning

Code scanning / CodeQL

Comparison of identical values

Comparison of identical values; use cmath.isnan() if testing for not-a-number.
conn.create_bucket(Bucket="bucket1")
index = MemoryStore("index", key=1)
with pytest.raises(AssertionError, match=r"Since we are.*"):
store = OpenDataStore(index=index, bucket="bucket1", s3_workers=4, key=1)

Check warning

Code scanning / CodeQL

Variable defined multiple times

This assignment to 'store' is unnecessary as it is [redefined](1) before this value is used.
with mock_s3():
tic = datetime(2018, 4, 12, 16)
tic2 = datetime.utcnow()
conn = boto3.client("s3")

Check warning

Code scanning / CodeQL

Variable defined multiple times

This assignment to 'conn' is unnecessary as it is [redefined](1) before this value is used.


def test_index_eq(memstore, s3indexstore):
assert s3indexstore == s3indexstore

Check warning

Code scanning / CodeQL

Comparison of identical values

Comparison of identical values; use cmath.isnan() if testing for not-a-number.
@munrojm
Copy link
Member

munrojm commented Dec 11, 2023

@kbuma just went through this in detail. Everything is looking fantastic. I like the structure, and don't really have any specific comments other than to get your input on making the type of compression more configurable for the user. I'm thinking there may be some value in having the compression/decompression callables and header passed in the init, but with the same gzip defaults. Don't have a strong opinion though, so happy to merge whenever you feel good.

@kbuma
Copy link
Contributor Author

kbuma commented Dec 11, 2023

@munrojm I'd like to hold off on making the compression more configurable unless there's an existing use case for it. If there is one let's discuss and add the feature. Otherwise, I'm good to merge.

@munrojm munrojm merged commit 777bb9f into materialsproject:main Dec 11, 2023
@kbuma kbuma deleted the enhancement/open-data-store branch September 9, 2024 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants