The newest packages feed does not always return the latest packages #16220

Closed
behnazh-w opened this issue Jul 4, 2024 · 1 comment
Labels
bug 🐛 requires triaging maintainers need to do initial inspection of issue

Comments

@behnazh-w

The newest packages feed might have timezone issues, which can result in returning old packages instead of the newest ones. For example, at 2024-07-03-13-31+10 I got the following result from the feed: 2024-07-03-13-31+10.zip
containing this example timestamp for publication:

<pubDate>Wed, 03 Jul 2024 01:49:28 GMT</pubDate>

Later I got this feed at 2024-07-04-06-51+10: 2024-07-04-06-51+10.zip, which contained more recent packages as expected. Here is an example timestamp from this feed:

<pubDate>Wed, 03 Jul 2024 20:47:58 GMT</pubDate>

But then I got the following result at 2024-07-04-11-32+10: 2024-07-04-11-32+10.zip that contains packages that are older than the ones obtained previously at 2024-07-04-06-51+10. Here is an example publication timestamp:

<pubDate>Wed, 03 Jul 2024 01:30:29 GMT</pubDate>

So it looks like the most recently fetched feed contains packages that are not actually the most recent ones.
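The comparison above can be reproduced by parsing the `pubDate` values of each snapshot and looking at the newest one. This is only a sketch: `make_feed` is a hypothetical helper that builds a stripped-down RSS document from the example timestamps quoted in this report, and a real snapshot would be read from the downloaded feed instead.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def newest_pub_date(feed_xml):
    """Return the latest <pubDate> in an RSS document, or None if there is none."""
    root = ET.fromstring(feed_xml)
    dates = [
        parsedate_to_datetime(el.text)  # RFC 822 style dates, e.g. "... GMT"
        for el in root.iter("pubDate")
        if el.text
    ]
    return max(dates) if dates else None


def make_feed(pub_dates):
    """Hypothetical helper: build a minimal RSS document for illustration."""
    items = "".join(f"<item><pubDate>{d}</pubDate></item>" for d in pub_dates)
    return f"<rss><channel>{items}</channel></rss>"


# Example timestamps taken from the second and third snapshots in this report.
second_snapshot = make_feed(["Wed, 03 Jul 2024 20:47:58 GMT"])
third_snapshot = make_feed(["Wed, 03 Jul 2024 01:30:29 GMT"])

# The later fetch carries an older timestamp than the earlier fetch.
went_backwards = newest_pub_date(third_snapshot) < newest_pub_date(second_snapshot)
```

Because the timestamps are parsed as timezone-aware values, the comparison does not depend on the local timezone of the machine that fetched the feed.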

@behnazh-w behnazh-w added bug 🐛 requires triaging maintainers need to do initial inspection of issue labels Jul 4, 2024
@di
Member

di commented Jul 5, 2024

Hi, this is not a timezone issue; the behavior is expected. It can happen when several recently created projects are deleted, because the feed simply returns the 40 most recently created projects that exist at the time of the request:

def rss_packages(request):
    request.response.content_type = "text/xml"
    newest_projects = (
        request.db.query(Project)
        .options(joinedload(Project.releases, innerjoin=True))
        .order_by(Project.created.desc().nulls_last())
        .limit(40)
        .all()
    )
    project_authors = [
        _format_author(project.releases[0]) for project in newest_projects
    ]
    return {"newest_projects": tuple(zip(newest_projects, project_authors))}

Similarly, this feed may miss projects if more than 40 new projects are created between two consecutive requests.
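Both effects can be illustrated with a small in-memory simulation of a "newest 40" window. This is a sketch only, not the Warehouse code: `FakeProject`, `newest`, and the integer `created` values are made up for the example and stand in for real rows and timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeProject:
    """Hypothetical stand-in for a project row; a larger `created` means newer."""
    name: str
    created: int


def newest(projects, limit=40):
    """Return the `limit` most recently created projects, newest first."""
    return sorted(projects, key=lambda p: p.created, reverse=True)[:limit]


projects = [FakeProject(f"pkg-{i}", i) for i in range(100)]
first_poll = newest(projects)  # created values 99 down to 60

# Effect 1: deleting the 10 newest projects lets older ones re-enter the window.
after_deletion = [p for p in projects if p.created < 90]
second_poll = newest(after_deletion)  # created values 89 down to 50
reappeared = {p.created for p in second_poll} - {p.created for p in first_poll}

# Effect 2: if 50 projects are created between polls, 10 of them are never seen.
burst = projects + [FakeProject(f"pkg-{i}", i) for i in range(100, 150)]
third_poll = newest(burst)  # created values 149 down to 110
seen = {p.created for p in first_poll} | {p.created for p in third_poll}
missed = {p.created for p in burst if p.created >= 100} - seen
```

In this toy run, `reappeared` holds the projects that were too old for the first poll but show up in the second, and `missed` holds the projects that fell out of the window before any poll could observe them.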

Depending on what you're doing, you may want to use the XML-RPC API or the Simple API instead: https://warehouse.pypa.io/api-reference/feeds.html#project-and-release-activity-details

@di di closed this as completed Jul 5, 2024