-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: 🎸 add concept of Resource #784
Conversation
The documentation is not available anymore as the PR was closed or merged. |
(I had an error from inside ffspec, seems like there is a name conflict - weird!)
This way we can alsp ensure that the storage directory is not None
they were released just after the app has been returned. But we want the resources to be released when the app is shut down instead. Note that as we need the resources (mainly the log, and the assets directory for the prometheus endpoint), we need to allocate them at the start of the function, instead of using the "on_startup" parameter of the Starlette constructor.
they are only available in __init__ and __post_init__. We don't want that in the Resource inherited objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome ! This will help a lot for the tests :)
just a comment about resources that don't have a release:
to make it clear that the resource is the access to the storage, not the assets directory in itself (which must be persisted and will not be deleted when the ressource is released). Thanks @lhoestq for the idea.
It fixes the issue raised by @lhoestq at #784 (comment). Note that it's a bit radical: all the workers now require write access to the assets storage directory, even if they will not use it. I think it's OK since I will be working very soon on #737, which will merge all the workers in only one worker.
Codecov ReportBase: 92.16% // Head: 88.95% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #784 +/- ##
==========================================
- Coverage 92.16% 88.95% -3.21%
==========================================
Files 33 70 +37
Lines 2271 2861 +590
==========================================
+ Hits 2093 2545 +452
- Misses 178 316 +138
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
MONGODB_MIGRATION_MONGO_DATABASE = "datasets_server_maintenance" | ||
MONGO_DATABASE_MONGO_URL = "mongodb://localhost:27017" | ||
MONGODB_MIGRATION_MONGO_URL = "mongodb://localhost:27017" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the defined class at line 14, It might be DATABASE_MIGRATION_MONGO_URL value
And also attribute name at line 9 could be DATABASE_MIGRATION_MONGO_DATABASE
Just to keep the consistency between NAMESPACE_ -> Class Config as it is done in other configurations classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, @severo.
There are however some things I don't understand.
- In principle, I understand a resource as something you allocate and then you release, all that within a context manager to be sure the resource is properly released even if some exception is raised.
- But some of the resources you define do not follow this design pattern
- For example, the
LogResource
: you just set a logging level but the release does nothing, the original log level is never reset. What is the point of calling this within a context manager? You could just call theinit_logging
function and would have exactly the same behavior.
- For example, the
Do you agree or is there something I am missing?
Yes, you're both right. I tried to manage the log, storage and access to the databases from the same concept of Resource, as context managers. But it does not work so well. I'll change this, thanks. |
As we don't allocate/release anythin, it does not fit inside the concept of Resource
Storage does not need to be a resource. Also: remove the unnecessary MongoConnection class
Should be good for review now @lhoestq @albertvillanova @AndreaFrancis 🤞 |
Thanks for the reviews. Merging |
* feat: 🎸 add concept of Resource * ci: 🎡 fix unit tests for the job * refactor: 💡 remove dead code * test: 💍 give more time to avoid timeout to mongodb * refactor: 💡 rename resource into resources (I had an error from inside ffspec, seems like there is a name conflict - weird!) * refactor: 💡 use InitVar to avoid mutating fields This way we can alsp ensure that the storage directory is not None * fix: 🐛 release the resources on shutdown they were released just after the app has been returned. But we want the resources to be released when the app is shut down instead. Note that as we need the resources (mainly the log, and the assets directory for the prometheus endpoint), we need to allocate them at the start of the function, instead of using the "on_startup" parameter of the Starlette constructor. * fix: 🐛 avoid using InitVar since they are hard to use they are only available in __init__ and __post_init__. We don't want that in the Resource inherited objects * test: 💍 fix parameter name * fix: 🐛 no need to call .allocate, it's done in __post_init__ * feat: 🎸 use primitive parameters, add release, add tests * style: 💄 fix style * refactor: 💡 rename AssetsDirectory to AssetsStorageAccess to make it clear that the resource is the access to the storage, not the assets directory in itself (which must be persisted and will not be deleted when the ressource is released). Thanks @lhoestq for the idea. * fix: 🐛 allocate assets storage for all the workers It fixes the issue raised by @lhoestq at huggingface/dataset-viewer#784 (comment). Note that it's a bit radical: all the workers now require write access to the assets storage directory, even if they will not use it. I think it's OK since I will be working very soon on huggingface/dataset-viewer#737, which will merge all the workers in only one worker. * refactor: 💡 change env var names for coherence fixes https://github.com/huggingface/datasets-server/pull/784/files/2301eb9e528afd4e46414f05f65e032831de6ea5#r1101583351 * fix: 🐛 missing env var * ci: 🎡 fix env var * refactor: 💡 replace LogResource with init_logging As we don't allocate/release anythin, it does not fit inside the concept of Resource * feat: 🎸 add init_assets_directory method * refactor: 💡 remove StorageResource + remove mongo.py Storage does not need to be a resource. Also: remove the unnecessary MongoConnection class * fix: 🐛 fix details to make the CI pass * test: 💍 fix tests * refactor: 💡 remove dead code * test: 💍 remove dead code * style: 💄 fix style
This long PR creates a new concept, Resource. The resource is aimed at being allocated and then released after use: connection to a database, modification of the datasets library config, or creation of storage directories...
Before this PR, it was done in the Config step. Now the Config step should never raise and should be immutable, and the Resources are created based on that config.
It makes it "easier" to understand what is failing in the tests and in the app.
GOOD LUCK to the reviewers, and sorry for the length.