Refactored that into a RepositoryContainer so as not to pile hack upon hack into PrioritySelected:
The "if project_name in project_list" check works because of this implementation of __contains__, __hash__ and __eq__:
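A minimal sketch of how such a membership test can be made to work. This is not the code from the reply: ProjectEntry, ProjectIndex and normalize are hypothetical names, and the assumption is that membership is decided on the PEP 503 normalized project name.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ProjectEntry:
    """Hypothetical project list element that compares by normalized name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        # Equal entries must hash equally, so hash the normalized form.
        return hash(normalize(self.name))

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ProjectEntry):
            return normalize(self.name) == normalize(other.name)
        return NotImplemented


class ProjectIndex:
    """Hypothetical container that accepts plain strings in `in` checks."""

    def __init__(self, entries) -> None:
        self._entries = frozenset(entries)

    def __contains__(self, project_name: str) -> bool:
        # Wrap the string so the set lookup goes through __hash__/__eq__.
        return ProjectEntry(project_name) in self._entries


project_list = ProjectIndex([ProjectEntry("foo-bar"), ProjectEntry("requests")])
```

With this, "Foo.Bar" in project_list is a constant-time set lookup rather than a scan, which matters for an index the size of pypi's.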
-
Our current internal repository has a problem that playing with simple-repository helped me identify. It's been very slow for a while now, and I discovered that when you use its equivalent of a PrioritySelectedProjectsRepository, with our internal artifacts first and public pypi second, the 404s from the first repo take nearly a second each(!) before it moves on to check the repo where the artifact really lives. When I set up simple-repository with the same backends I saw the same slow 404s, and realised that the problem wasn't with SR but with the back end, and that I could hack SR to fix it.
I was able to make this all substantially faster by only looking in the repos where I know the project list/artifact could possibly exist, which I work out by checking each repo's index page. In get_resources/get_project_page of PrioritySelectedProjectsRepository I added:
and then defined that as
While the bad repo I'm working around is unusually slow, this optimisation may be useful more generally?
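A minimal sketch of the idea of consulting each repo's index before asking it for a project page. It does not use the real simple-repository classes: FakeRepo, IndexFilteredPriorityRepo, NotFound and normalize are hypothetical stand-ins, and the method signatures are simplified assumptions.

```python
import asyncio
import re


class NotFound(Exception):
    """Stand-in for a repository's 'no such project' error."""


def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


class FakeRepo:
    """Hypothetical backend; counts page requests so skips are visible."""

    def __init__(self, name, projects):
        self.name = name
        self._projects = frozenset(normalize(p) for p in projects)
        self.page_requests = 0

    async def get_project_list(self):
        return self._projects

    async def get_project_page(self, project_name):
        self.page_requests += 1  # on the slow repo, a miss here costs ~1s
        if normalize(project_name) not in self._projects:
            raise NotFound(project_name)
        return f"{self.name}:{normalize(project_name)}"


class IndexFilteredPriorityRepo:
    """Tries sources in priority order, skipping any whose index lacks the project."""

    def __init__(self, sources):
        self.sources = list(sources)

    async def _candidates(self, project_name):
        wanted = normalize(project_name)
        return [s for s in self.sources if wanted in await s.get_project_list()]

    async def get_project_page(self, project_name):
        for source in await self._candidates(project_name):
            try:
                return await source.get_project_page(project_name)
            except NotFound:
                continue  # index was out of date; fall through to the next repo
        raise NotFound(project_name)


internal = FakeRepo("internal", ["acme-tools"])
public = FakeRepo("pypi", ["requests"])
repo = IndexFilteredPriorityRepo([internal, public])
page = asyncio.run(repo.get_project_page("Requests"))
```

Here the internal repo is never asked for requests, so its slow 404 is avoided entirely. The saving only holds if the project lists themselves are cheap to obtain, which is where the caching questions below come in.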
In my first attempt at this, I began by hacking the ttl cache to accept override ttls on set/update, and made the CachedHttpRepository cache for a short time even if there was no etag (the bad repo also doesn't send etags).
But this ran into a couple of other issues which made it slower. The first problem is that I was now fetching the index page of pypi a lot, sending a revalidation request every time (because it does send etags) and parsing the JSON of the cached response every time. The 304 from pypi is quick, under 0.1s, but the parse takes 0.7s.
Generally we want to cache as close as possible to the final response to avoid reprocessing; the caches in simple-repository cache the raw data instead, so it always needs reprocessing. I think that's a mistake, but it was too big a refactor for me today.
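To illustrate the difference, here is a minimal sketch of caching the parsed result next to its etag, so that a 304 reuses the already-parsed object instead of parsing the cached body again. ParsedResponseCache and its methods are hypothetical names, not part of simple-repository.

```python
import json


class ParsedResponseCache:
    """Hypothetical cache holding (etag, parsed object) per URL."""

    def __init__(self, parse=json.loads):
        self._parse = parse
        self._entries = {}  # url -> (etag, parsed)
        self.parse_calls = 0

    def etag_for(self, url):
        """Etag to send as If-None-Match, or None if nothing is cached."""
        entry = self._entries.get(url)
        return entry[0] if entry else None

    def resolve(self, url, status, etag=None, body=None):
        """Return the parsed object for a response with the given status."""
        if status == 304:
            # Not modified: hand back the cached parse, no reprocessing.
            # (A 304 is only expected after we sent a cached etag.)
            return self._entries[url][1]
        self.parse_calls += 1
        parsed = self._parse(body)
        self._entries[url] = (etag, parsed)
        return parsed


cache = ParsedResponseCache()
first = cache.resolve("index", 200, etag="v1", body='{"projects": []}')
second = cache.resolve("index", 304)
```

In this arrangement the expensive parse is paid once per change of the upstream index rather than once per request.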
The other issue is when to revalidate. pip is unusual (and IMO, wrong) in ignoring cache-control expiry (see pypa/pip#5670) - they made a choice to reduce confusion on upload by increasing round trips for downloaders forever. uv, for example, doesn't do this. simple-repository revalidates every time, when I think it should have both a revalidate-after and an expires-after ttl: within the first the cached entry is served without a round trip, between the two it is revalidated, and both are refreshed whenever a revalidation succeeds.
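A minimal sketch of what such a two-stage ttl could look like, assuming a simple in-memory cache. TwoStageTTLCache, Entry and the state names are hypothetical, and the clock is injectable only to make the behaviour easy to demonstrate.

```python
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    value: Any
    revalidate_after: float  # before this: serve without any request
    expires_after: float     # after this: treat as missing and refetch


class TwoStageTTLCache:
    def __init__(self, revalidate_ttl, expire_ttl, clock=time.monotonic):
        self._revalidate_ttl = revalidate_ttl
        self._expire_ttl = expire_ttl
        self._clock = clock
        self._entries = {}

    def store(self, key, value):
        now = self._clock()
        self._entries[key] = Entry(
            value, now + self._revalidate_ttl, now + self._expire_ttl
        )

    def state(self, key):
        """'fresh' (serve as-is), 'stale' (revalidate first) or 'miss' (refetch)."""
        entry = self._entries.get(key)
        now = self._clock()
        if entry is None or now >= entry.expires_after:
            return "miss"
        if now >= entry.revalidate_after:
            return "stale"
        return "fresh"

    def get(self, key):
        return self._entries[key].value

    def mark_revalidated(self, key):
        """Call after a 304: both deadlines restart from now."""
        entry = self._entries[key]
        now = self._clock()
        entry.revalidate_after = now + self._revalidate_ttl
        entry.expires_after = now + self._expire_ttl


now = [0.0]
cache = TwoStageTTLCache(revalidate_ttl=60, expire_ttl=3600, clock=lambda: now[0])
cache.store("index", "parsed index")
states = [cache.state("index")]
now[0] = 120.0
states.append(cache.state("index"))
cache.mark_revalidated("index")
states.append(cache.state("index"))
```

A caller would skip the network entirely while the state is fresh, send a conditional request when it is stale, and do a full fetch on a miss.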
To put some meat on this, the pip install --force-reinstall of 18 packages I used to test this took 28s with a cold cache and 21s once CachedHttpRepository had warmed up. uv took 18s cold and 2.2s warm. With the change above to skip the slow repo for projects that aren't in it, pip took 13.5s cold and 7.7s warm; uv took 5.5s cold and 1.4s warm (about 0.5s of that spent getting 304s for metadata on 3 of the packages).