- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use releases information from PyPI, not GitHub #98
Comments
I'm going to create a separate git scraping repo that records the PyPI JSON for every one of the plugins. |
I'll start with these 101 packages: https://datasette.io/content?sql=select+distinct+package++from+stats+order+by+package |
https://github.com/simonw/pypi-datasette-packages/tree/main/packages now contains the JSON for each one of those. |
I built pypi-to-sqlite to help with this. |
My first attempt at integrating if [ ! -d /tmp/pypi-datasette-packages ]
then
git clone https://github.com/simonw/pypi-datasette-packages /tmp/pypi-datasette-packages
else
(cd /tmp/pypi-datasette-packages && git pull)
fi
args=$(sqlite-utils content.db "
with packages as (
select name from plugins union select name from tools
where name not in ('datasette-app')
)
select
group_concat('-f /tmp/pypi-datasette-packages/packages/' || name || '.json', ' ')
from packages
" --raw)
eval "pypi-to-sqlite content.db ${args} --prefix pypi_" But this failed with the following error:
Because I don't want builds failing because I forgot to scrape a new package. |
This works: args=$(ls /tmp/pypi-datasette-packages/packages/*.json | awk '{print "-f "$0 " \\"}')
eval "pypi-to-sqlite content.db $args
--prefix pypi_" If you run pypi-to-sqlite content.db -f /tmp/pypi-datasette-packages/packages/airtable-export.json \
-f /tmp/pypi-datasette-packages/packages/csv-diff.json \
-f /tmp/pypi-datasette-packages/packages/csvs-to-sqlite.json \
-f /tmp/pypi-datasette-packages/packages/datasette-atom.json \
--prefix pypi_ |
That ran in production, so now we have these three new tables: |
I need a de-duped list of releases based on the PyPI data - which is hard because the select
package,
version,
min(upload_time) as release_time
from
pypi_releases
group by
package,
version
order by
release_time desc |
Figured out a new query I can use on the homepage that includes releases that do not have release notes on GitHub: with release_note_releases as (
select
repos.name as name,
releases.html_url as release_url,
releases.tag_name as version,
substr(releases.published_at, 0, 11) as date,
releases.body as body_markdown,
releases.published_at as released_at,
pypi_packages.summary as pypi_summary
from
releases
join repos on repos.id = releases.repo
left join pypi_packages on pypi_packages.name = repos.name
),
releases_with_no_notes as (
select
pypi_releases.package as name,
null as release_url,
pypi_releases.version as version,
substr(min(pypi_releases.upload_time), 0, 11) as date,
null as body_markdown,
min(pypi_releases.upload_time) as released_at,
pypi_packages.summary as pypi_summary
from
pypi_releases
join pypi_packages on pypi_releases.package = pypi_packages.name
where
package not in (
select
name
from
release_note_releases
)
group by
pypi_releases.package,
pypi_releases.version
)
select
*
from
release_note_releases
where
name in (
'datasette-dashboards',
'pypi-to-sqlite',
'datasette-graphql'
)
union
select
*
from
releases_with_no_notes
where
name in (
'datasette-dashboards',
'pypi-to-sqlite',
'datasette-graphql'
)
order by
released_at desc
limit
100 The |
I had to hack the data to get It links to https://datasette.io/plugins/datasette-dashboard where the others link to the GitHub release. |
Next challenge: get https://datasette.io/plugins/datasette-dashboard page working. It currently uses this SQL: datasette.io/templates/pages/plugins/{name}.html Lines 3 to 13 in 332f89f
|
Even bigger challenge: https://datasette.io/plugins - it currently offers "sort by stars" but if I don't have an associated GitHub repository that doesn't make sense. |
I think I need to redefine these two SQL views: |
Those views are at the heart of how the site things about plugins and tools, and currently have a dependency on Relevant SQL snippet from the view definition: from
datasette_repos
join repos on datasette_repos.id = repos.node_id
join releases on repos.id = releases.repo
join users on users.id = repos.owner
where
datasette_repos.nameWithOwner in (
select
repo
from
tool_repos
) |
Maybe I should have a |
Possible design for that format: - package: datasette-graphql
github: simonw/datasette-graphql
plugin: true
tags:
- graphql
- api
hooks:
- database_actions
- extra_template_vars
- menu_links
- register_routes
- startup
- table_actions |
... but that's going to happen in a separate issue. For the moment the problem I want to solve is getting |
The reason Scroll to the bottom of the page to see this: Those two null columns are for |
I'm going to try joining against |
This new query seems to work: select
repos.name as name,
repos.full_name as full_name,
users.login as owner,
repos.description as description,
repos.stargazers_count,
pypi_versions.name,
max(pypi_releases.upload_time) as latest_release_at,
repos.created_at as created_at,
datasette_repos.openGraphImageUrl,
datasette_repos.usesCustomOpenGraphImage,
(
select
sum(downloads)
from
stats
where
stats.package = repos.name
and stats.date > date('now', '-7 days')
) as downloads_this_week,
(
select
count(*)
from
plugin_repos
where
repo = repos.full_name
) as is_plugin,
(
select
count(*)
from
tool_repos
where
repo = repos.full_name
) as is_tool
from
datasette_repos
join repos on datasette_repos.id = repos.node_id
left join pypi_releases on pypi_releases.package = repos.name
left join pypi_versions on pypi_releases.version = pypi_versions.id
join users on users.id = repos.owner
where
datasette_repos.nameWithOwner in (
select
repo
from
plugin_repos
)
group by
repos.id
order by
latest_release_at desc; |
I broke |
I want to start listing more plugins by other people, for example https://github.com/rclement/datasette-dashboards
The site currently uses GitHub releases for release information - but not every project uses GitHub releases.
The PyPI JSON API provides releases in a way that is guaranteed to work for all packages released to PyPI: https://pypi.org/pypi/datasette-dashboards/json
Switching to this as the primary source of information (while still using GitHub releases data, if available, for changelog notes) would help the plugin directory work against a wider range of plugins.
It could even then work for plugins that don't have their source code hosted on GitHub!
The text was updated successfully, but these errors were encountered: