feat: 🎸 make the queue agnostic to the types of jobs #608
Merged
Before, we had two collections: one for splits jobs and one for first-rows jobs. Now there is only one collection, named "jobs", with a field "type". Note that the job arguments are still restricted to dataset (required) and, optionally, config and split.
BREAKING CHANGE: 🧨 the two collections are removed and a new one is created. The function names have changed too.
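To make the idea of a single type-agnostic queue concrete, here is a minimal in-memory sketch. It is not the project's code: the real queue is stored in MongoDB, and the names `Job`, `JobQueue`, `add` and `next_of_type`, as well as the example type values, are hypothetical. Only the "type" field and the dataset/config/split arguments come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    """One entry of the single "jobs" collection (field names are assumptions)."""

    type: str                     # discriminates the kind of job
    dataset: str                  # required argument
    config: Optional[str] = None  # optional argument
    split: Optional[str] = None   # optional argument


class JobQueue:
    """Hypothetical in-memory stand-in for the "jobs" collection."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []

    def add(self, job: Job) -> None:
        # The queue never inspects the job beyond storing it.
        self._jobs.append(job)

    def next_of_type(self, job_type: str) -> Optional[Job]:
        # Return and remove the oldest waiting job of the requested type.
        for index, job in enumerate(self._jobs):
            if job.type == job_type:
                return self._jobs.pop(index)
        return None
```

With this shape, supporting a new kind of job only requires a new value for `type`, not a new collection.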
49198cd to d6664cf
refresh... functions are now the "compute" abstract method
also: move types-requests dependency to dev dependencies.
Note: we removed apache-beam for now because of an issue with the installation. It must be added again later.
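As an illustration of the "compute" abstract method mentioned above, the following sketch shows one way such a base class could look. The class names `Worker` and `SplitsWorker`, the `process` helper, and the returned dictionary are assumptions made for this example, not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class Worker(ABC):
    """Hypothetical base class: each job type implements its own compute()."""

    @abstractmethod
    def compute(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> Dict[str, Any]:
        """Compute the response for one job."""

    def process(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> Dict[str, Any]:
        # Shared logic (error handling, caching, ...) would live here in a real
        # worker; this sketch only delegates to the subclass.
        return self.compute(dataset, config, split)


class SplitsWorker(Worker):
    """Illustrative subclass; the payload below is a placeholder, not real data."""

    def compute(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> Dict[str, Any]:
        return {"dataset": dataset, "splits": []}
```

A subclass that forgets to implement `compute` cannot be instantiated, which is the main benefit of declaring the method abstract.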
e9b41c4 to 0d84eb1
it only contains the splits/ worker
beware: the docker images don't exist, we will have to update
if the tests fail, it means that a side effect occurs somewhere
we explicitly pass it as an argument, so no need to store it on the disk
now the current package version is 1.0.31, no need to build it from source.
Note that we have to manually migrate the jobs from the splits queue and the first_rows queue to the new jobs queue. To do so, in the prod mongo database:
Then relaunch them by sending a webhook:

    DATASETS=(datasetA datasetB datasetC ...)
    for dataset in "${DATASETS[@]}"; do
      curl -X POST https://datasets-server.huggingface.co/webhook \
        -H 'Content-Type: application/json' \
        -d '{"event": "update", "repo": {"type": "dataset", "name": "'"$dataset"'"}}'
    done

Then: delete both collections.
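For reference, the webhook relaunch step can also be sketched in Python with the standard library only. The URL and the payload shape are taken from the curl command above; the function name `build_webhook_request` is hypothetical, and the sketch only builds the requests without sending them.

```python
import json
from urllib.request import Request

WEBHOOK_URL = "https://datasets-server.huggingface.co/webhook"


def build_webhook_request(dataset: str) -> Request:
    """Build (but do not send) the POST request that signals a dataset update."""
    payload = {"event": "update", "repo": {"type": "dataset", "name": dataset}}
    return Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder dataset names, as in the shell loop.
requests_to_send = [build_webhook_request(name) for name in ["datasetA", "datasetB"]]
```

Each request could then be sent with `urllib.request.urlopen(request)`. Building the body with `json.dumps` avoids the manual quote splicing needed in the shell version.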
Done with
and
And I deleted the old collections.