-
Notifications
You must be signed in to change notification settings - Fork 145
Don't analyse the content of scripts #56
Comments
In case it helps, I've found two more examples of what I previously reported as Bug 1. Example 1 <!DOCTYPE html><html lang="en-US" xmlns="http://www.w3.org/1999/xhtml"><head>
<script type="text/javascript" async="" src="file:///home/milesj/Documents/seotodo/data/3pxt6/49n6b.js"></script><script src="https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_GB.nA4eUF06vL8.O/m=plusone/rt=j/sv=1/d=1/ed=1/am=AQ/rs=AGLTcCNcXA0KaQorIa7m2ep19XOiE7Q6rQ/t=zcms/cb=gapi.loaded_0" async=""></script>
</head><html> ReferenceError: Can't find variable: gapi Example 2 <!DOCTYPE html><html><body><script src="https://cdn.syndication.twimg.com/widgets/timelines/478801075776942080?callback=__twttr.callbacks.tl_478801075776942080_1&dnt=false&domain=blog.buzzvil.com&lang=en&suppress_response_codes=true&t=1591762" async=""></script></body></html> ReferenceError: Can't find variable: __twttr In both these examples, it appears to be throwing the error while parsing a url query string. Is it supposed to parse query strings? Have I set something up wrong? It's possible I know, but I can't see what it could be. It works on some sites, but not on others. One final thing, in case it's related in some way. When I remove I remove the |
@jonmiles, HTML Inspector doesn't parse the contents of script tags, so there must be something else going on. Do you have these examples hosted anywhere so I could take a look? |
I'm getting a similar response when running the CLI version of html-inspector 0.8.1 with phantomjs 1.9.8 using stock config against http://theinnerlogic.com/.
It works on some sites, but not on others. |
I don't think PhantomJS supports |
I think what we're asking is why is the html-inspector throwing JavaScript errors? If it doesn't inspect scripts why is the cli output referring to a JavaScript variable, or part of a URL in the script tag's SRC? What was wrong with the two examples I provided? Both snippets reproduce the errors on their own. Is it not valid HTML? I've actually rewrote your code to work without phantomjs, or a browser, so this is no longer a problem for me. |
It's not. PhantomJS is throwing those errors. HTML Inspector needs a JavaScript runtime with a DOM in order to work, but that means if there are other JavaScript errors on the page, those errors will affect HTML Inspector's ability to do it's work.
Again, these errors have nothing to do with HTML Inspector or the validity of the HTML. They're being thrown because there are errors with the JavaScript (either downloading it, parsing it, executing it, etc.). |
In phantom-runner.js in your page.onError handler you're re-throwing JavaScript errors from the page and then exiting. You don't need to do this, the JavaScript errors do not affect the HTML Inspectors ability to do it's job. The only errors you should be concerned with are from the phantom.onError callback. So you see the errors are coming from the HTML inspector, and essentially you are inspecting the content of scripts. |
One of the core principles of HTML Inspector is that it operates on live DOM. This allows it to inspect DOM that's dynamically generated as is the case in many single page applications today. If there's a JavaScript error on the page that prevents the DOM from generating, HTML Inspector may (incorrectly) report that it found no errors, when in fact it would have had the JavaScript run properly.
I can see an argument for not exiting, but I can also see an argument for wanting to fail the tests if any JavaScript errors are found. Honestly, the main problem here (as I see it) is not that that I'm exiting on JS errors, it's that PhantomJS doesn't support a lot of the JavaScript features used by devs today, so it's saying the page contains errors when it doesn't if run in a modern browser. I'm very much open to replacing PhantomJS. In fact, I started trying to use jsdom a few months back, but at the time it didn't support all the features needed. FWIW, In my personal tests, I don't use the CLI at all; I use selenium webdriver. |
I've noticed two potential bugs, both related (I believe) to the inspector analysing the content of scripts tags. I would have thought scripts should be ignored?
Both examples are taken from a real website, a large retailer.
Bug 1.
Throws
ReferenceError: Can't find variable: s_c_il
. Apparently when analysing the src attribute of a script tag.Here is the a test snippet taken straight from the site.
Bug 2.
The duplicate-ids rule fires when string
id="val"
also found in script content.This is the warning I receive.
[duplicate-ids] The id 'cp' appears more than once in the document.
When I search the content, I find one valid html assignment
and one instance assigning a JavaScript object attribute.
Btw, great work, this is a great little tool... I just need to work out how to work around these issues.
The text was updated successfully, but these errors were encountered: