-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to kuchiki #671
Migrate to kuchiki #671
Conversation
I'm very impressed by how much less code this turns into! I would like to add tests before we make changes this large, and I'd also like to know if it regresses the performance. I have a benchmark set up locally I will make a PR for shortly; it includes a rustdoc HTML file we could use to make sure we have the same output before and after. Overall the code looks good. |
Also, the test failures need to be fixed. There's instructions on running tests locally here: https://forge.rust-lang.org/docs-rs/no-docker-compose.html |
See rust-lang#671, servo/html5ever#414 for rationale. This makes the `extract_head_and_body` function public since I couldn't find a cfg flag that would do so only during tests.
f382819
to
c485f9f
Compare
@jyn514 Tests are passing now (was just compiler errors), but again, I have not actually verified equivalency or anything locally. |
I would like to merge #672 and add some tests before making this refactor. If you can review 672 that would be great, and I should be able to add tests sometime today. |
See #671, servo/html5ever#414 for rationale. This makes the `extract_head_and_body` function public since I couldn't find a cfg flag that would do so only during tests.
I tested this locally and it looks fine. I'd still like to get #676 in first, though. |
c485f9f
to
d8c3f3d
Compare
Okay, I've rebased this now that we have tests and such in tree, but don't have time to run benchmarks and so forth. We should be careful when deploying this as it's essentially a bump of the html5ever dependency. (Ideally, I would like to be around when deploying so that if we do run into trouble we can collect some performance statistics to determine where the slowdown is coming from). |
Benchmarks on master:
Benchmarks after this PR:
Benchmarks with html5ever 0.24:
So it looks like a fairly significant slowdown, but not as bad as 0.24. I would like to fix this upstream in html5ever before deploying, but if you think the |
Also, as expected, this changes the display output of the parser. |
Those benchmark results look like maybe 10% (kuchiki) and 30% (.24) differences in performance? To me that seems not nearly enough to be the cause of our prolonged 100% CPU usage but I could be wrong. Let's hold off on deploying until we hear back from the html5ever folks, if they don't plan on doing more until we get more concrete than "this case got 10%-ish slower" than we can deploy and gather some stats in production. Realistically what we're doing is super horrible anyway, as we're parsing the whole (potentially enormous) HTML file into a bunch of separate allocations and then churning through the whole thing to re-generate all that HTML. |
Agreed, and it would be better not to (#679), but in the meantime we can't have the server going down. |
Triage: Still waiting on html5ever to respond to servo/html5ever#414. There are merge conflicts but those don't have to be fixed until we know the story with html5ever. |
I heard back from html5ever, they said they're investigating but since the performance difference was so small it's not high priority. @Mark-Simulacrum do you mind rebasing this so we can test it out in prod? |
I can, yeah, or at least take a stab at it. |
5466f81
to
6fee946
Compare
@jyn514 Tests are passing :) |
I have not tested this locally (I don't currently have a local docs.rs instance), would be good for someone to do that before we deploy.
r? @jyn514