Better metadata loading for toyfiles #8

Merged
merged 7 commits into from
Apr 17, 2024


hammannr
Collaborator

@hammannr hammannr commented Feb 9, 2024

This pull request changes how metadata is loaded so that it's more useful. Instead of just returning a dict for each file, we now add to the returned results dict an entry "metadata" (plus one entry for each of the array_metadatas), which is a structured array with the same dimensions as the results.

This way you can easily relate information from the metadata to each toy experiment.
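A minimal sketch of the idea, assuming each file yields a 1D array with one entry per toy and a flat dict of scalar metadata with the same keys in every file. `broadcast_metadata` is a hypothetical helper written for illustration, not the function added in this PR. Each file-level value is repeated for every toy from that file, so row i of the metadata lines up with row i of the concatenated results.

```python
import numpy as np


def broadcast_metadata(file_results, file_metadata):
    """Hypothetical helper: expand per-file scalar metadata into a structured
    array with one row per toy, aligned with the concatenated results.

    Assumes every metadata dict has the same keys with scalar values."""
    keys = list(file_metadata[0])
    # Common dtype per field across all files (e.g. longest string, widest number).
    dtype = [(k, np.asarray([meta[k] for meta in file_metadata]).dtype) for k in keys]
    blocks = []
    for results, meta in zip(file_results, file_metadata):
        block = np.empty(len(results), dtype=dtype)
        for k in keys:
            block[k] = meta[k]  # same file-level value for every toy in this file
        blocks.append(block)
    return np.concatenate(blocks)


# Usage: two hypothetical toy files with 3 and 2 toys respectively.
results = [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5])]
meta = [{"mu": 1.0, "seed": 10}, {"mu": 2.0, "seed": 11}]
metadata = broadcast_metadata(results, meta)
all_results = np.concatenate(results)
print(all_results[metadata["mu"] == 2.0])  # [0.4 0.5]
```

Because the metadata array has the same length as the results, a boolean mask built from any metadata field can be applied directly to the results.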

In addition, I added a short docstring, and the loader now raises a FileNotFoundError if the patterns don't match any file.
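The new error path can be sketched with the standard library's `glob`, assuming the loader receives one or more glob patterns. `find_toyfiles` is a hypothetical name used for illustration, not the function changed in this PR.

```python
import glob


def find_toyfiles(patterns):
    """Hypothetical helper: expand one or more glob patterns into a sorted,
    de-duplicated list of files, failing loudly if nothing matches."""
    if isinstance(patterns, str):
        patterns = [patterns]
    files = sorted({f for p in patterns for f in glob.glob(p)})
    if not files:
        # Fail early instead of silently returning empty results.
        raise FileNotFoundError(f"No files match the patterns {patterns!r}")
    return files
```

Raising here surfaces a typo in a pattern immediately, rather than as a confusing empty result further down the analysis.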

@hammannr hammannr requested review from dachengx and kdund February 9, 2024 14:38
@kdund
Collaborator

kdund commented Feb 9, 2024

Hi @hammannr, how does this change the size of the files?

@hammannr
Collaborator Author

hammannr commented Feb 9, 2024

@kdund The files on disk don't change; this just makes the already stored metadata more accessible.
Memory usage does, of course, increase when the metadata is loaded, and by how much depends on the number of array names, the number of fields in the metadata, etc.
To give an idea, here are the deep sizes of the returned dictionary (measured with pympler.asizeof) for a random example with 3 array names and 90_000 toys:

  • w/o metadata: 25 MB
  • w/ metadata: 33 MB
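As a rough back-of-the-envelope check on how the extra memory scales: the raw buffer of each structured metadata array is `n_toys * dtype.itemsize` bytes, so it grows linearly with the number of toys, the number of metadata entries, and the size of their fields. This sketch only counts NumPy buffers, not the Python object overhead that `pympler.asizeof` also includes. The field layout and entry count below are hypothetical, so it is not meant to reproduce the figures above.

```python
import numpy as np


def metadata_nbytes(n_toys, fields, n_entries):
    """Estimate the raw buffer size of n_entries structured metadata arrays,
    each with one row per toy and the given (name, format) fields."""
    itemsize = np.dtype(fields).itemsize  # bytes per row (packed sum of field sizes)
    return n_toys * itemsize * n_entries


# Hypothetical layout: two 8-byte fields, "metadata" plus 3 array-level entries.
fields = [("mu", "f8"), ("seed", "i8")]
print(metadata_nbytes(90_000, fields, n_entries=4))  # 5760000

# Cross-check against an actual array's buffer size.
assert metadata_nbytes(10, fields, 1) == np.zeros(10, dtype=fields).nbytes
```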

Collaborator

@dachengx dachengx left a comment

Thanks @hammannr. I fixed a small bug that occurred when numpy_array_names is not None. Let's go with it.

@dachengx dachengx merged commit 19780a3 into master Apr 17, 2024
@dachengx dachengx deleted the better_metadata branch April 17, 2024 18:51
Labels: None yet
Projects: None yet

3 participants