Sitemap improvements #4061
Comments
It would be good to pair on this to spread background on the issue.
With the current sitemap design, I don't think this is possible. Right now, we host the sitemap files in an S3 bucket. Since the bucket is not configured as a static site domain, I don't think S3 can serve as the single hosting solution. We could probably build a sitemap template into the CKAN catalog.
Time-consuming in terms of (1) implementation and (2) actual user load time.
After experimenting with explicitly setting the
With GSA/catalog.data.gov#702, the sitemaps should be getting created with the correct There are still a number of changes to the sitemaps we can and should make, but hopefully this will enable Google to start picking up URLs.
Well, with GSA/catalog.data.gov#703:
Sitemap URLs are being generated correctly, and sitemap files are being generated correctly, but the connection between the two is not working. It should be a simple nginx fix (and probably is), but I haven't gotten it to work yet. It's possible that an S3 config needs to be changed as well.
After letting Google do its thing for the weekend, it has made some limited forward progress.
I submitted I'm not sure what to conclude or which actions to take. It does seem like Google will slowly crawl and index these URLs, but (from what I can tell) only on the limited set of 5 or 81 sitemap files. I will continue to look for a way to account for the sitemap files that are missing (in the UI at least) and overcome Google's alleged
As a bump: Search Console is now showing 17k URLs and (perhaps importantly?) 86 sitemap files found in the index (as opposed to the 81 it has been finding, but still missing the total of 374). Since it does look like Google is slowly making its way through the files, perhaps we should close this for now and make a 'Check back in a month'-type ticket? We do have a lot of URLs after all.
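To compare what Search Console reports against what the sitemap index actually lists, the index can be parsed directly. This is a minimal sketch using only the Python standard library and the standard sitemaps.org namespace; the function names and the sample XML (with example.com URLs) are made up for illustration and are not the catalog's real files or tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_child_sitemaps(index_xml):
    """Return the <loc> of every <sitemap> entry in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def count_urls(sitemap_xml):
    """Return the number of <url> entries in a single sitemap file."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


# Hypothetical sample data; the real index would be fetched from the site.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap/sitemap-0.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap/sitemap-1.xml</loc></sitemap>
</sitemapindex>"""

SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/dataset/a</loc></url>
  <url><loc>https://example.com/dataset/b</loc></url>
</urlset>"""

if __name__ == "__main__":
    children = list_child_sitemaps(SAMPLE_INDEX)
    print(len(children), "sitemap files listed in the index")
    print(count_urls(SAMPLE_URLSET), "URLs in the sample sitemap file")
```

Running the same functions against the live index and each child file would give the expected totals to hold up against the counts Search Console shows.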
We will make a follow-up ticket and close this one as complete.
User Story
In order to [improve catalog SEO], [the datagov team] wants [to fix/improve the current sitemap].
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN [visiting catalog sitemap]
WHEN [the page load] happens
THEN [an HTML page should be loaded]
[AND the XML file should not be downloaded automatically, but offered through a download link]
GIVEN [checking Google Search Console]
WHEN [checking sitemap Discovered URLs] happens
THEN [the number is growing]
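The first criterion above (an HTML page with a download link rather than an automatic XML download) could be met by rendering the list of sitemap files as HTML. The sketch below is only an illustration under assumptions: `render_sitemap_page` is a hypothetical helper, not an existing CKAN or catalog function, and the URL is a placeholder.

```python
from html import escape


def render_sitemap_page(sitemap_urls):
    """Build a small HTML page with one download link per sitemap file."""
    items = "\n".join(
        # escape() guards against characters such as & and " in the URL.
        '    <li><a href="{0}" download>{0}</a></li>'.format(escape(url, quote=True))
        for url in sitemap_urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head><title>Sitemap</title></head>\n<body>\n"
        "  <ul>\n" + items + "\n  </ul>\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Placeholder URL; the real files live in the S3 bucket.
    print(render_sitemap_page(["https://example.com/sitemap/sitemap-0.xml"]))
```

Note that browsers only honor the `download` attribute for same-origin links, so files served directly from a different S3 host would likely open or download according to their response headers instead.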
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Based on the meeting with Freddie Blicher and the datagov team on 11/14, we need to take a couple of actions to improve the effectiveness of our sitemap for catalog.
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
[Notes or a checklist reflecting our understanding of the selected approach]