Regression: Versions >= v5.3.2 are unable to parse specific link #280
Comments
Sorry for the bad experience.
Thanks for attempting to find a fix! I tested on both Node 20.10.0 and Node 18.18.1. Note that this does not happen on versions < v5.3.2 on the same machine.
Indeed, this lib is getting slower and slower...
Thanks for your work! I truly appreciate the added test. Unfortunately, the test doesn't quite represent the problem accurately. The file saved to the repo for testing, "HTML Standard.html", is different from the file served at that URL. Perhaps a utility was used to save it that did some processing? I made a fork of your library that additionally uses the HTML fetched from that URL in that same test. You can see in my actions run that the fetched HTML is correctly parsed by 5.3.1 but hangs indefinitely with newer versions.
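Since the hang happens inside a synchronous call, a timer in the same process would never get a chance to fire. One way to keep a check like this from stalling a CI run is to do the parse in a child process with a hard timeout. Below is a minimal sketch of that idea, not the test from the fork: the helper name `timesOut`, the `reproScript` string, and the 120-second limit are all assumptions made for illustration. `spawnSync` and its `timeout` option come from Node's built-in `child_process` module, and `parse` is the named export of node-html-parser.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: runs `source` in a separate Node process and reports
// whether that process was killed because it exceeded `timeoutMs`.
function timesOut(source, timeoutMs) {
  const result = spawnSync(process.execPath, ['-e', source], {
    timeout: timeoutMs,
    encoding: 'utf8',
  });
  // When the timeout expires, spawnSync kills the child (SIGTERM by default)
  // and reports an error whose code is 'ETIMEDOUT'.
  const timedOut =
    (result.error != null && result.error.code === 'ETIMEDOUT') ||
    result.signal === 'SIGTERM';
  return { timedOut, stdout: result.stdout };
}

// Assumed reproduction script: needs node-html-parser installed, network
// access, and Node 18+ for the global fetch.
const reproScript = `
  const { parse } = require('node-html-parser');
  fetch('https://html.spec.whatwg.org/')
    .then((res) => res.text())
    .then((html) => { parse(html); console.log('parsed'); });
`;

// Example call (commented out so the sketch has no side effects when loaded):
// const { timedOut } = timesOut(reproScript, 120000);
// console.log(timedOut ? 'parse did not finish in time' : 'parse finished');
```

The limit should sit well above the normal parse time on the machine in question, otherwise a slow but successful parse would be reported as a hang.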
I work for a project that validates its links using this library. One link that is frequently validated is the HTML spec at https://html.spec.whatwg.org/. This page has one of the bigger HTML files on the web but node-html-parser was able to parse it well in approximately 23 seconds on my local machine until release 5.3.2.
Consider this example:
With node-html-parser 5.3.1, this outputs the following:
With node-html-parser 5.3.2, this hangs indefinitely; only outputting the following even after running for hours: