Adding lock mechanism to prevent on_disk_cache downloading twice #409
Changes from 25 commits
Just want to verify: `len(result)` should be 1, right?
Nope. I'm creating one additional file inside `_slow_fn`, so there would be the 'downloaded' file and the 'pid' file.
When the cached op is 1-to-n, like decompression from an archive, if any decompressed file is missing or has an incorrect hash, we can directly `return False`; there is no need to check the other files, IMHO. There is also a chance that multiple processes lock different decompressed files of the same archive. Then both processes will run decompression, and we have a race condition again.

So I think we should lock over `data` rather than `filepaths` (`data` represents the compressed archive in this case). A process that observes a promise file over `data` can directly return True. WDYT?
Unfortunately, `data` could be a URL or something else, so it is hard to lock on it. But this situation is covered.

Imagine `data` generates two file names: file1 and file2. The initial pass (on an empty file system) will create two locks, file1.promise and file2.promise, and go down the 'False' route. The second (and every subsequent) pass will see that the files are missing, but will fail to create the promise files and go down the 'file exists' route, which leads it to wait for file1.promise and file2.promise to disappear.
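The flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `on_disk_cache` code: the function names and the `.promise` suffix are assumptions. The key point is that creating the promise file with `O_CREAT | O_EXCL` is atomic, so exactly one process wins the 'False' route and all others wait.

```python
import os
import time


def acquire_or_wait(filepath, timeout=60.0, poll=0.01):
    """Return True if this process should produce `filepath`,
    False once another process has finished producing it."""
    promise = filepath + ".promise"
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one process succeeds.
        fd = os.open(promise, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True  # we own the promise; the caller runs the slow function
    except FileExistsError:
        # Another process owns the promise; wait for it to disappear.
        deadline = time.monotonic() + timeout
        while os.path.exists(promise):
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up waiting on {promise}")
            time.sleep(poll)
        return False


def release(filepath):
    # The producer calls this after the cached file is fully written.
    os.remove(filepath + ".promise")
```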
If it is a URL, is it possible to create and lock `root/URL.promise` in the file system? I think we should have a similar lock for `HttpReader` to prevent multiple processes from downloading the same file? Nevermind.
Oh, I see. The `file_exists` flag is used by processes to recognize that this file, or its parent archive, is going to be processed by another process.
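A minimal sketch of how such a flag might feed into the cache check. The name `check_cached` and the `.promise` suffix are assumptions for illustration, not the actual torchdata API: a file counts as covered when it either exists or is promised by another process, and only a file that is both missing and unpromised forces this process to run the slow function.

```python
import os


def check_cached(filepaths):
    """Return True when every cached file either exists or is being
    produced by another process (its promise file exists); False means
    the caller must run the slow function itself."""
    for path in filepaths:
        file_exists = os.path.exists(path)
        promise_exists = os.path.exists(path + ".promise")
        if not file_exists and not promise_exists:
            # Missing and unpromised: nobody is producing it yet.
            return False
    return True
```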