SEO - Page is blocked from indexing - robots.txt #5273
@strayangelfilms I'm wondering if this is a bug, or if Google's search algorithm has changed in a way that affects how robots.txt is handled; indexing the robots.txt file itself is of no use. Please shed some light on this issue. @patrickhulce, @ebidel, or @googlebot: can any of you help us with this?

Same issue.
Thanks for following up! It seems like part of the issue is that the robots.txt manually enumerates the bot user agents it allows indexing from, and then includes something along the lines of GitHub's fallback […]. Perhaps we should request from typical bot user agents like […]. The other problem seems to be extension-specific. I don't see the wordpress.com issue when run from the CLI; we'll have to investigate what's going on there.
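For illustration, a robots.txt in the shape described above (a hypothetical file, not GitHub's actual robots.txt): specific crawlers are enumerated and allowed, and a final `*` group disallows everyone else, so the page is indexable only to the listed bots:

```
# Enumerated crawlers: empty Disallow means "allow everything"
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

# Fallback: every other user agent is blocked from the whole site
User-agent: *
Disallow: /
```

An audit that doesn't distinguish between user agents has to decide whether that fallback group counts as "blocked from indexing".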
Unclear from the screenshot if the audit is actually failing or encountering a fatal error. I think the red icon indicates that it's an error. In the LH test I just ran of this page, I'm getting errors. In the JSON results, I'm seeing this:

```json
"is-crawlable": {
  "id": "is-crawlable",
  "title": "Page is blocked from indexing",
  "description": "Search engines are unable to include your pages in search results if they don't have permission to crawl them. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/indexing).",
  "score": null,
  "scoreDisplayMode": "error",
  "rawValue": null,
  "errorMessage": "Audit error: Unable to identify the main resource"
}
```

So the audit is erroring rather than failing outright. @kdzwinel is OOO this week. @patrickhulce, you're probably more familiar with this than me; is this something you can investigate further?
@rviscomi So you're saying this is a Lighthouse error, right?

I agree. It seems to be an issue reading/parsing the robots.txt file, which is clearly present and readable by all the major crawlers on every site/server/environment I've tested. I have never seen this error except with the current latest build available for download on the Chrome Web Store.
OK, thanks for clarifying. I see what's happening now. The audit was recently updated in v3 to look at robots.txt in addition to meta tags. It only looks at […]. This question came up during development, and we decided to avoid distinguishing between crawlers (e.g. looking only at Googlebot). The audit answers the question "Is this page indexable?", but I think we're missing an important follow-up question: "indexable to whom?" A page can be indexable to some crawlers and not others. I think the safest path forward is to handle these indeterminate cases as a warning rather than a failure. The audit help text should convey that the page is indexable to some bots and not others, and enumerate which bots it's not indexable to. Do we have the ability to pass/fail/warn dynamically? If so, that would be a good compromise: it avoids penalizing the SEO score while still drawing attention to potentially serious misconfigurations.
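To make the "indexable to whom?" point concrete, here is a toy sketch (my own illustration, not Lighthouse's or robots-parser's actual logic) that answers indexability per user agent. It only understands `User-agent` / `Disallow` groups; real parsers also handle `Allow`, wildcards, and longest-match precedence:

```javascript
'use strict';
// Toy robots.txt check: which disallow rules apply to a given user agent?
// NOT the Lighthouse/robots-parser implementation -- a simplified sketch.

function disallowedPrefixes(robotsTxt, userAgent) {
  const ua = userAgent.toLowerCase();
  const matched = [];   // rules from groups that name this agent
  const fallback = [];  // rules from the "*" group
  let current = null;   // group the rules below the cursor belong to
  let sawRule = false;  // a Disallow line ends a run of User-agent lines
  let hadMatch = false; // did any group name this agent?

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split('#')[0].trim();
    const m = /^(user-agent|disallow)\s*:\s*(.*)$/i.exec(line);
    if (!m) continue;
    const field = m[1].toLowerCase();
    const value = m[2].trim();
    if (field === 'user-agent') {
      if (sawRule) { current = null; sawRule = false; }
      if (value === '*') current = fallback;
      else if (ua.includes(value.toLowerCase())) { current = matched; hadMatch = true; }
    } else {
      sawRule = true;
      if (current && value) current.push(value); // empty Disallow allows everything
    }
  }
  // The most specific matching group wins; otherwise fall back to "*".
  return hadMatch ? matched : fallback;
}

function isIndexableBy(robotsTxt, userAgent, path) {
  return !disallowedPrefixes(robotsTxt, userAgent).some(p => path.startsWith(p));
}

// The same page can be indexable to Googlebot but blocked for everyone else:
const sampleTxt = 'User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /';
console.log(isIndexableBy(sampleTxt, 'Googlebot', '/page'));    // true
console.log(isIndexableBy(sampleTxt, 'SomeOtherBot', '/page')); // false
```

The two calls at the end show exactly the indeterminate case under discussion: a single yes/no answer has to pick one of those two results.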
Yep! We can handle this, though we just decided in #5270 not to show passed audits with warnings as failures, which we might want to reverse if this is our solution to this issue :)
OK, so the secondary piece of this is an extension failure and a regression of #4794, which was fixed but then broken again by #4875. While browserify does indeed shim URL such that no more errors appear, its logic is useless and causes robots-parser to always return a failure when a robots.txt is present. The fact that the audit only fails when a robots.txt is present is why this wasn't caught earlier. Thank you for reporting and persisting through the many layers of bugs/non-bugs here @strayangelfilms! 👍 Fix is up at #5293.
@patrickhulce Anything for you, Patrick, after you helped me catch that meta robots nofollow problem on our site. I work with an optimization company and we use Lighthouse all day to evaluate results; it is the best and most robust tool out there! I have another possible issue to report as well, which I will file separately.
I wrote about this bug before, and someone correctly pointed out an issue on one of the sites in question, which made me question whether it was a bug at all. I have since corrected that issue, and after more testing, every single site I analyze with Lighthouse 3.0.0-alpha.2 says "Page is blocked from indexing" and applies an SEO penalty. Older versions of Lighthouse seem fine. This is the version that automatically downloads and installs when a user adds the Lighthouse extension from the Chrome Web Store for easier one-click testing. Other reviews suggest this version may have been pushed to the Chrome Web Store a little too early.

Version info on the Chrome Web Store says 2.10.0.3002 (May 16, 2018); however, what actually downloads and installs as the browser extension is 3.0.0-alpha.2.