Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to kuchiki #671

Merged
merged 1 commit into from
Jun 7, 2020
Merged

Migrate to kuchiki #671

merged 1 commit into from
Jun 7, 2020

Conversation

Mark-Simulacrum
Copy link
Member

I have not tested this locally (I don't currently have a local docs.rs instance), would be good for someone to do that before we deploy.

r? @jyn514

@jyn514
Copy link
Member

jyn514 commented Mar 31, 2020

I'm very impressed by how much less code this turns into! I would like to add tests before we make changes this large, and I'd also like to know if it regresses the performance. I have a benchmark set up locally I will make a PR for shortly; it includes a rustdoc HTML file we could use to make sure we have the same output before and after.

Overall the code looks good.

@jyn514
Copy link
Member

jyn514 commented Mar 31, 2020

Also, the test failures need to be fixed. There's instructions on running tests locally here: https://forge.rust-lang.org/docs-rs/no-docker-compose.html

jyn514 added a commit to jyn514/docs.rs that referenced this pull request Apr 1, 2020
See rust-lang#671,
servo/html5ever#414 for rationale.

This makes the `extract_head_and_body` function public since I couldn't
find a cfg flag that would do so only during tests.
@jyn514 jyn514 mentioned this pull request Apr 1, 2020
@Mark-Simulacrum
Copy link
Member Author

@jyn514 Tests are passing now (was just compiler errors), but again, I have not actually verified equivalency or anything locally.

@jyn514
Copy link
Member

jyn514 commented Apr 1, 2020

I would like to merge #672 and add some tests before making this refactor. If you can review 672 that would be great, and I should be able to add tests sometime today.

jyn514 added a commit that referenced this pull request Apr 1, 2020
See #671,
servo/html5ever#414 for rationale.

This makes the `extract_head_and_body` function public since I couldn't
find a cfg flag that would do so only during tests.
@jyn514 jyn514 mentioned this pull request Apr 1, 2020
@jyn514
Copy link
Member

jyn514 commented Apr 1, 2020

I tested this locally and it looks fine. I'd still like to get #676 in first, though.

image
image

@Mark-Simulacrum
Copy link
Member Author

Okay, I've rebased this now that we have tests and such in tree, but don't have time to run benchmarks and so forth. We should be careful when deploying this as it's essentially a bump of the html5ever dependency.

(Ideally, I would like to be around when deploying so that if we do run into trouble we can collect some performance statistics to determine where the slowdown is coming from).

@jyn514
Copy link
Member

jyn514 commented Apr 1, 2020

Benchmarks on master:

Benchmarking parse regex html: Collecting 100 samples in estimated 50.858 s (5050 iterations)
parse regex html        time:   [9.9645 ms 9.9814 ms 10.001 ms]
                        change: [-26.849% -26.645% -26.409%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

Benchmarks after this PR:

parse regex html        time:   [10.700 ms 10.726 ms 10.755 ms]
                        change: [+6.9011% +7.1851% +7.4497%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  11 (11.00%) high mild
  3 (3.00%) high severe

Benchmarks with html5ever 0.24:

Benchmarking parse regex html: Collecting 100 samples in estimated 68.348 s (5050 iterations)
parse regex html        time:   [13.493 ms 13.515 ms 13.540 ms]
                        change: [+25.746% +26.007% +26.254%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

So it looks like a fairly significant slowdown, but not as bad as 0.24. I would like to fix this upstream in html5ever before deploying, but if you think the perf stats would be valuable, my guesstimate is this would only slow down the server, not bring it to a crawl like #668.

@jyn514
Copy link
Member

jyn514 commented Apr 1, 2020

Also, as expected, this changes the display output of the parser.

@Mark-Simulacrum
Copy link
Member Author

Those benchmark results look like maybe 10% (kuchiki) and 30% (.24) differences in performance? To me that seems not nearly enough to be the cause of our prolonged 100% CPU usage but I could be wrong.

Let's hold off on deploying until we hear back from the html5ever folks, if they don't plan on doing more until we get more concrete than "this case got 10%-ish slower" than we can deploy and gather some stats in production.

Realistically what we're doing is super horrible anyway, as we're parsing the whole (potentially enormous) HTML file into a bunch of separate allocations and then churning through the whole thing to re-generate all that HTML.

@jyn514
Copy link
Member

jyn514 commented Apr 1, 2020

Agreed, and it would be better not to (#679), but in the meantime we can't have the server going down.

@jyn514
Copy link
Member

jyn514 commented May 27, 2020

Triage: Still waiting on html5ever to respond to servo/html5ever#414. There are merge conflicts but those don't have to be fixed until we know the story with html5ever.

@jyn514
Copy link
Member

jyn514 commented Jun 6, 2020

I heard back from html5ever, they said they're investigating but since the performance difference was so small it's not high priority. @Mark-Simulacrum do you mind rebasing this so we can test it out in prod?

@Mark-Simulacrum
Copy link
Member Author

I can, yeah, or at least take a stab at it.

@Mark-Simulacrum
Copy link
Member Author

@jyn514 Tests are passing :)

@jyn514 jyn514 merged commit 2edd72e into rust-lang:master Jun 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants