-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List full path of wheel files in cache #8474
Conversation
147e3d0
to
9eb49f2
Compare
I would really appreciate if you could take a look at this, and provide further reviews, since I have built this on top of your older PR. @pradyunsg , I would really appreciate your thought as well on this. |
I like this as a concept, but the interface makes me wonder whether we’d be heading toward inventing a query language via flags. We should probably design this in a more composible way, like maybe generating a JSON output for
|
So if I understand correctly, we are letting the output from This removes the burden of maintaining the logic of But would such tools be available for windows users who want to use this feature? Also I am not sure of the learning curve of I would also wait for other maintainers to chime in before updating the PR to follow that approach. Edit: Seems like @pfmoore already agrees to it. |
Yeah, having recipes sounds like a great idea, since it is indeed non-trivial to learn either tools. Windows is not a problem for
|
Thanks, should I go ahead and make the changes according to your suggestions @uranusjr or should I wait for more opinions on the same? |
As a Windows user, I'd confirm that they are, although I concede that they are not as much part of the "usual culture" for Windows. I consider that a failing of Windows users, not of Windows or of the proposal, though 🙂 Also, I'd like to see more examples of why people need this functionality, with a particular emphasis on the question of "are we likely to see people wanting other means of selecting things to purge?" I note that the motivating use case given in #8355 didn't actually need date-based removal - it was driven simply by "my cache is taking up too much space". The idea of pruning by date was not justified - presumably "reduce the space used by the cache by xx%" or "remove big files" would just as well have solved the original problem. |
Hi! I agree that for less common use-cases that this approach would be best. However I think "clear old things out of a cache" should be treated as a common use-case, and as such handled by pip directly. I would posit that that the next most essential thing for caches after correctness/performance is handling cache expiration, and so it's not really an optional feature. |
On Heroku, the use case is preventing unbounded growth of the pip cache, since (if we used it,) it would be stored in the app build cache and so be fetched/uploaded to S3 before/after every build. One alternative to expiring cache based on access time that would also work for us, is the ability to purge anything from the cache that isn't currently installed (ie reported by |
Seems like from the discussions above, the scope of this PR can be narrowed to provide The side-effect of that would automatically include clearing old items out of the cache, the old items being provided by the This ensures that the different possibilities of removing wheels from cache is not baked into pip, but is user-provided and driven, which we can suggest by the means of |
I agree, cache expiration makes sense, but what I'm trying to understand is why date-based is the only policy we should support from pip. Also, if we are going down this route, then why isn't the argument that as pip is maintaining a cache, it's up to pip to automatically handle expiration, with no user intervention at all?
So wouldn't size-based expiration (remove big files) be more useful? If someone installs a huge package like tensorflow, then realises that wasn't what they meant, why wait 2 weeks to reclaim that space? My concern here is that we either don't lock users into one possible policy, or conversely commit pip to supporting all sorts of options (as @uranusjr said, build a query language in options). Deleting outdated or large files from a filesystem location is a pretty common task, I'm not clear why we need pip to reinvent that wheel (sorry for the pun!). Why can't this simply be built on top of |
... and see? That's when we start getting scope creep 🙂 I would have thought that it's possible to build something like this relatively easily using |
I’m starting to think what we ought to do is to simply have a flag to output the full path of each cache entry (e.g. |
I agree, using the full path will give us the caller to do anything with those paths, including deleting them from cache via It should be a trivial change to provide that and write tests for that in this PR, so I am more than happy to do it if folks agree :) |
Yeah fair point :-) I agree it's likely not worth trying to satisfy every use case, and perhaps those that are related to file-system stats (like last accessed) are best handled outside of pip. However for those that are very much pip domain specific ("which of these packages aren't currently installed?") it seems a bit clunky to have to script multiple pip commands/manipulate output. Even if more advanced cache handling support was a wontfix, I wonder if there are a few small more-generic changes that could simplify implementations - for example if |
Good point - I think pip handling this automatically would be ideal.
Thinking about this more, for the CI/CD use-case (where incremental build performance is the priority) I believe the preferred solution would actually be purging anything unused immediately, since unless there was a regression/revert a package-version will likely never be used again after its initial removal or upgrade. |
Maybe. That's where we need actual real-world use cases, rather than "maybe it would help if" suggestions (not criticising what you're saying here, just pointing out that we need to justify the implementation and maintenance cost with real, demonstrated benefits - the pip developer team is way too small to keep adding features that don't turn out to get much real-world use). |
Hi @uranusjr , In terms of your suggestion at #8474 (comment) , are we looking at something like this?
and
|
Outputting all the paths so the user can delete them is not a great solution. Not all users have rm, since some are on Windows. When I was looking for functionality like this in the help I was looking for words like remove or delete, I shouldn't have to care or know where the directory is. Now, the user has to write a different script on Windows and everything else. |
From the comments and +1s on this PR and the issue, seems like @uranusjr , @pfmoore and @pradyunsg are against introducing a special flag And @xavfernandez and @stuaxo , (the original ticket creator) is against that change and wants pip to handle this, since LRU cache is a common and easy strategy used in caches, and pip can have that baked in I personally am more tilted towards outputting the list of paths, which can be used by a consumer in form of a shell, or a python script for now, and perhaps based on future requirements from other users, think of baking it into pip at some point. Given these facts, if every one agrees, I can leave this PR as-is, or update it to output the list of paths. Please let me know. (Or if there is a third solution to this, that would be welcome too) |
I remain concerned about the limited flexibility of an option specifically to delete by age, but others seem comfortable that this won't result in a series of further options being requested. I do accept that "clearing out old cache entries" is a reasonable thing to want to do. I'm also concerned by the broader question of whether pip should have custom commands that manage things like this, as opposed to taking the approach of providing mechanisms so that users can build their own solutions using that mechanism. I find the assertion that we shouldn't be expecting users to be willing to write small scripts or command line invocations to handle tasks to be disconcerting, given that pip is a tool for people developing code. I also don't give much weight to the idea that "Unix solutions don't work on Windows" - Windows users have access to scripting tools just as much as Unix users (if nothing else they have Python!) and I find it a little condescending to assume that they aren't able to use them. However, I'm going to withdraw from this discussion now. It's reached a point where it's more about high-level design principles than about technical choices, and I'm not sure that a ticket like this is where such decisions should be made. |
It would be good if it was documented that scripts may delete files mentioned here (are there any caveats to this?). |
As long as another pip process than the one providing the output isn't running, it should be OK. |
9eb49f2
to
8ebe7b1
Compare
Thank you all for your comments and suggestions. It seems that most of the folks here are okay with going with the listing of full path of wheels using the |
Awesome, thanks for putting this together. |
8ebe7b1
to
fda2195
Compare
Hi @chrahunt , @pradyunsg , @uranusjr May I request if you folks can look at this PR, and provide review comments, so that this can go in the 20.2 release train? |
Coming back to this, I am second-guessing my The implementation looks good to me. |
fdec3b4
to
545dcc5
Compare
8ebc968
to
8adecc2
Compare
Thanks for the approval. @pradyunsg , may I get this merged, if there are no more reviews needed for this? |
8adecc2
to
e751b07
Compare
Now that the releases are out of the way, may I get this merged @pradyunsg and @chrahunt ? |
e751b07
to
a81447d
Compare
a81447d
to
9450f88
Compare
Thanks @deveshks! ^>^ |
Closes #8355
Add an option
--not-accessed-since
to subcommandslist
andpurge
ofpip cache
to list and remove wheels older than x days from current time.
cc @xavfernandez since the main idea was inspired from their PR #3146 , so I would appreciate thoughts on whether it's in the right direction
EDIT:
Based on further discussion about baking in a specific cache eviction mechanism not the correct way going forward, and letting the user decide based on the output to consume it and use in their own cache eviction script, I have gone ahead and updated the PR to include the
--abspath
option.