-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(font-size): speed up gatherer #6391
Conversation
@@ -208,7 +208,6 @@ class FontSize extends Audit { | |||
analyzedFailingNodesData, | |||
analyzedFailingTextLength, | |||
failingTextLength, | |||
visitedTextLength, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed visitedTextLength
because we can now easily visit every node (it's one protocol call).
|
||
// Locate the document under analysis. | ||
// TODO: this needs to use frameId | ||
const doc = snapshot.documents.find(doc => lookup(doc.documentURL) === passContext.url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is it not true that "The DOMNode at index 0 corresponds to the root document"? Should poke devtools folks to update that then :)
AFAIK, the only main frame id we have/use is in the trace (the mainFrameIds refactor you just did), and I don't think we're collecting it anywhere else that'd be readily available to a gatherer.
You could query it with Page.getResourceTree
resourceTreeResponse.frameTree.frame.id
or use the network records to get at the frame id if we need to
const nodes = [ | ||
{nodeId: 1, nodeName: 'HTML'}, | ||
{nodeId: 2, nodeName: 'HEAD', parentId: 1}, | ||
{nodeId: 1, backendNodeId: 100, nodeName: 'HTML'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickhulce I hope I maintained the integrity of this test :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks really really good!
I'm thinkin we'll want to do a few sanity comparison checks with sites we used in the original PR in addition to the smoke test, maybe we can look at the discussion there to find a couple URLs?
|
||
// Locate the document under analysis. | ||
// TODO: this needs to use frameId | ||
const doc = snapshot.documents.find(doc => lookup(doc.documentURL) === passContext.url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is it not true that "The DOMNode at index 0 corresponds to the root document"? Should poke devtools folks to update that then :)
AFAIK, the only main frame id we have/use is in the trace (the mainFrameIds refactor you just did), and I don't think we're collecting it anywhere else that'd be readily available to a gatherer.
You could query it with Page.getResourceTree
resourceTreeResponse.frameTree.frame.id
or use the network records to get at the frame id if we need to
// but nodes will only contain the Body node and its descendants. | ||
|
||
/** @type {NodeFontData[]} */ | ||
const failingNodes = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we could move all this into some functions for the different steps you're adding so afterPass
will be pretty short like before? I think it might help human-parsers figure out what's going on :)
it looks like there's roughly the following stages
- get the flattened document + snapshot and make sure its all valid
- create our mappings to be able to tie the two together
- go through all the nodes and identify the failing ones
is that roughly accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds good. I mostly wanted to wait for PR feedback before chopping up afterPass, just incase the whole thing was moot or something :P
nodeValue: failingText, | ||
nodeType: FontSizeGather.TEXT_NODE_TYPE, | ||
parentId: 10, | ||
}, | ||
]; | ||
|
||
const stringsMap = {}; | ||
const strings = []; | ||
const dedupe = (value) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about something like getOrCreateStringRef
/getOrCreateStringIndex
to indicate the main reason we're doing this is so that we're dealing with that .strings
property
parentIndex: nodes.map(node => nodes.findIndex(pNode => node.parentId === pNode.nodeId)), | ||
}, | ||
layout: { | ||
nodeIndex: nodes.map((_, i) => i), // this isn't very accurate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind elaborating on the comment a bit? :)
is it inaccurate because the nodeIndexes are usually very large? are they usually not sequential? do any of these things affect the fidelity of the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take a look at an actual snapshot, you'll see that there isn't a 1:1 mapping between Nodes and style entries. That's because some elements (like text nodes) don't have styles. Here, I'm being lazy and saying that all nodes have computed styles, and that the mapping is just the identity function.
It is indeed missing out on a small amount of functionality, namely the nodeId => styleIndex mapping in the implementation. it wasn't immediately clear how to generate that part of the fake snapshot so I just punted it. I could look into it a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, yeah probably don't need to mess with it that much. If there's an easy way to at least trigger one of the branches for a single node or something that might be good, but not worth hurtin' yourself over I'd say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's trust the protocol documentation
https://chromedevtools.github.io/devtools-protocol/tot/DOMSnapshot#method-captureSnapshot
The DOMNode at index 0 corresponds to the root document.
If it's wrong then testing will catch it :)
disclaimer = ` (based on ${percentageOfVisitedText.toFixed()}% sample)`; | ||
} | ||
|
||
const disclaimer = ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can just remove disclaimer now ;)
const fontSize = parseInt(lookup(fontSizeStringId), 10); | ||
backendIdsToPartialFontData.set(doc.nodes.backendNodeId[i], { | ||
fontSize, | ||
// TODO: trimming this for a second time. maybe don't? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that because it's internally being trimmed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the check before (non empty text node) trims it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could just make that function call check for a text node, and have the "non empty" part come after. I'll do that. Was mostly trying to keep the previous function as it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but .trim
doesn't mutate and we're not reassigning nodeValue
so it should still be needed right? I don't think you have to refactor anything :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNodeTextLength
was being called twice. Just rectified that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooooooh I see what you were saying now 👍
I'll go through the other issue and test some URLs. |
one key question: is it actually faster? :) |
Not sure yet. Need to rebase to get the perf tracing. I'll definitely update w/ my findings |
Would be useful to have a PR hook (or just print something in travis ci) that shows us diffs on these marks. Maybe collect all the timings from the smoke tests, aggregate the durations of each mark, and pretty print the time deltas against master. (opened #6409) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, a bunch of style comments, but this is looking great, and a major improvement in both implementation and performance
function getNodeTextLength(node) { | ||
return !node.nodeValue ? 0 : node.nodeValue.trim().length; | ||
function getNodeTextLength(nodeValue) { | ||
return !nodeValue ? 0 : nodeValue.trim().length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is always a string, seems like return nodeValue.trim().length;
is sufficient now?
@@ -31,6 +29,7 @@ const TEXT_NODE_TYPE = 3; | |||
/** @typedef {import('../../driver.js')} Driver */ | |||
/** @typedef {LH.Artifacts.FontSize['analyzedFailingNodesData'][0]} NodeFontData */ | |||
/** @typedef {LH.Artifacts.FontSize.DomNodeMaybeWithParent} DomNodeMaybeWithParent*/ | |||
/** @typedef {Map<number, {fontSize: number, textLength: number}>} BackendIdsToPartialFontData */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a type name, Partial
ends up having a specific meaning (for tsc, at least). Here does it actually mean it's just the subset of font data we're interested in? If so, maybe BackendIdsToFontData
seems better
calculateBackendIdsToPartialFontData(snapshot) { | ||
// Makes the strings access code easier to read. | ||
/** @param {number} index */ | ||
const lookup = (index) => snapshot.strings[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really seem like it's saving much (and lookup
is pretty generic, so it's difficult to tell what it's doing in place). Maybe just drop, or use const strings = snapshot.strings;
?
if (!doc || !doc.nodes.nodeType || !doc.nodes.nodeName || !doc.nodes.backendNodeId | ||
|| !doc.nodes.nodeValue || !doc.nodes.parentIndex) { | ||
throw new Error('Unexpected response from DOMSnapshot.captureSnapshot.'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like these are meant to satisfy the type checker? Maybe that's worth commenting, e.g. // satisfy the type checker that all expected values exist
} | ||
|
||
// Not all nodes have computed styles (ex: TextNodes), so doc.layout.* is smaller than doc.nodes.* | ||
// doc.layout.nodeIndex maps the index into doc.nodes.* to an index into doc.layout.styles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be the opposite ("maps an index in doc.layout.styles
to an index in doc.nodes.*
") since nodeIndexToStyleIndex
is set up to be the inverse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, nice catch
if (!textLength) continue; // ignore empty TextNodes | ||
const parentStyles = doc.layout.styles[styleIndex]; | ||
const [fontSizeStringId] = parentStyles; | ||
const fontSize = parseInt(lookup(fontSizeStringId), 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe another comment here that we expect strings like '16px'
, hence why the parseInt
is appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, do we actually want parseFloat
? I guess 11.5 will still fail the < 12 check, but we might as well store the most accurate info
// fastest way to get the computed styles of every Node. Bonus, it allows for whitelisting properties. | ||
// Once a bad TextNode is identified, its parent Node is needed. DOMSnapshot.captureSnapshot doesn't | ||
// give the entire Node object, so DOM.getFlattenedDocument is used. The only connection between a snapshot | ||
// Node and an actual Protocol Node is backendId, so that is used to join the two data structures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just
// Use DOMSnapshot.captureSnapshot to get the computed font-size style of every node.
(the rest can be filled in below as it happens)
const snapshot = await passContext.driver.sendCommand('DOMSnapshot.captureSnapshot', { | ||
computedStyles: ['font-size'], | ||
}); | ||
// backendIdsToPartialFontData will include all Nodes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really true? (it just contains non-empty text nodes)
// backendIdsToPartialFontData will include all Nodes, | ||
const backendIdsToPartialFontData = this.calculateBackendIdsToPartialFontData(snapshot); | ||
// but nodes will only contain the Body node and its descendants. | ||
const nodes = await getAllNodesFromBody(passContext.driver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we now want to differentiate nodes
from the nodes in the snapshot, somehow? I'm not sure of a great way. crdpNodes
or something?
} | ||
|
||
/** | ||
* @param {BackendIdsToPartialFontData} backendIdsToPartialFontData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A jsdoc description of the method would be good here (e.g. the comment from below about matching font data from the snapshot to node references via backendId
)
@patrickhulce can you find the discussion # with some test sites? I can't find it now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two last style nits and a question, but otherwise LGTM!
/** @type {Map<number, number>} */ | ||
const nodeIndexToStyleIndex = new Map(); | ||
for (let i = 0; i < doc.layout.nodeIndex.length; i++) { | ||
nodeIndexToStyleIndex.set(doc.layout.nodeIndex[i], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to make sure, are these 1:1?
yah? nah? :)
const nodeValue = strings[doc.nodes.nodeValue[i]]; | ||
if (!isTextNode({ | ||
nodeType, | ||
parentNodeName: strings[doc.nodes.nodeName[doc.nodes.parentIndex[i]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still would be good to pull this into a variable (or variables), especially to make the conditional obvious at a glance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear, do you mean
const isTextNode = !isTextNode({
nodeType,
parentNodeName: strings[doc.nodes.nodeName[doc.nodes.parentIndex[i]]],
})
if (!isTextNode) {
continue;
}
or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed he meant breaking up strings[doc.nodes.nodeName[doc.nodes.parentIndex[i]]]
but even if he didn't, I think that would be nice if it isn't too much trouble :)
const textLength = getNodeTextLength(nodeValue); | ||
if (!textLength) continue; // ignore empty TextNodes | ||
const parentStyles = doc.layout.styles[styleIndex]; | ||
const [fontSizeStringId] = parentStyles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment here would still be good too. The type is just Array<string>
, so it and the protocol docs don't give much help on why this is the way it is
Oops, I think Github elided some of your comments @brendankenny ( Should have addressed all of them now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// backendIdsToFontData will include all non-empty TextNodes, | ||
const backendIdsToFontData = this.calculateBackendIdsToFontData(snapshot); | ||
// but nodes will only contain the Body node and its descendants. | ||
const crdpNodes = await getAllNodesFromBody(passContext.driver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getAllNodesFromBody relies on DOM.getFlattenedDocument
but I think all the information we need from that is exposed in the snapshot.
Using #6410 it looks like getFlattenedDocument is slower, so it seems nice to drop it.
driver.getNodesInDocument
is only used by font-size, so we can remove it from driver and rely on the capturedSnapshot entirely. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FontSize artifact contains entire CRDP Nodes. The snapshot data doesn't have all the data to reconstruct a CRDP Node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need everything that's in CRDP Node.
The audit only uses node.parentNode and node.attributes. It looks like all that data is already in the snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need everything that's in CRDP Node.
we need CSS.getMatchedStylesForNode
to get the current styles for a node, which needs the nodeId to query for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulirish , given the above (we need the snapshot + the flattened document protocol calls, to correlate failing nodes with their matched styles), should we still reduce the scope of artifact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looks like the snapshot doesn't have nodeId
. only backendNodeId
. Bummer.
K. We'll keep the current protocol calls. :)
const textLength = getNodeTextLength(nodeValue); | ||
if (!textLength) continue; // ignore empty TextNodes | ||
const parentStyles = doc.layout.styles[styleIndex]; | ||
// each styles is an array of string indices, one for each property given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each styles? doc.layout.styles is an array of....
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc.layout.styles is an array of an array of string indices
...
Tried this URL: https://www.clumsycrafter.com/the-water-blob-tutorial-using-an-iron-to-seal-the-edges/ The old code reports the "Report this ad" as an illegible node: The new code does not. For this site, it's obvious that this ad takes awhile to display on the page. I think the difference we are seeing is due to the old code allowing the page more time to do stuff (we do |
// backendIdsToFontData will include all non-empty TextNodes, | ||
const backendIdsToFontData = this.calculateBackendIdsToFontData(snapshot); | ||
// but nodes will only contain the Body node and its descendants. | ||
const crdpNodes = await getAllNodesFromBody(passContext.driver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looks like the snapshot doesn't have nodeId
. only backendNodeId
. Bummer.
K. We'll keep the current protocol calls. :)
passContext.driver.sendCommand('DOM.enable'), | ||
passContext.driver.sendCommand('CSS.enable'), | ||
]); | ||
|
||
// Use DOMSnapshot.captureSnapshot to get the computed font-size style of every node. | ||
const snapshot = await passContext.driver.sendCommand('DOMSnapshot.captureSnapshot', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very small speedup if you run these two in parallel:
lighthouse-core/gather/gatherers/seo/font-size.js | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/lighthouse-core/gather/gatherers/seo/font-size.js b/lighthouse-core/gather/gatherers/seo/font-size.js
index e6d0dbe0..3ceac9bd 100644
--- a/lighthouse-core/gather/gatherers/seo/font-size.js
+++ b/lighthouse-core/gather/gatherers/seo/font-size.js
@@ -350,13 +350,15 @@ class FontSize extends Gatherer {
]);
// Use DOMSnapshot.captureSnapshot to get the computed font-size style of every node.
- const snapshot = await passContext.driver.sendCommand('DOMSnapshot.captureSnapshot', {
+ const snapshotPromise = passContext.driver.sendCommand('DOMSnapshot.captureSnapshot', {
computedStyles: ['font-size'],
});
+ // but nodes will only contain the Body node and its descendants.
+ const allNodesPromise = getAllNodesFromBody(passContext.driver);
+ const [snapshot, crdpNodes] = await Promise.all([snapshotPromise, allNodesPromise]);
+
// backendIdsToFontData will include all non-empty TextNodes,
const backendIdsToFontData = this.calculateBackendIdsToFontData(snapshot);
- // but nodes will only contain the Body node and its descendants.
- const crdpNodes = await getAllNodesFromBody(passContext.driver);
const {
totalTextLength,
failingTextLength,
One of the missing nodes for the site Paul tried (https://www.haaretz.com/) has 8px font. I looked up '8px' in I looked up the offending node (from the DOM.getFlattenedDocument method) by class name:
and walked up the parentNode hierarchy until I got to a BODY element. Grabbed its So, I think that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing my review while we figure out the issues so no one accidentally merges :)
Should we
|
I'll give it another shot in the near future. |
Going to close for now since it's not on our immediate roadmap (I believe), but we can keep the branch open :) |
Fixes #6386.
See comments in discussion and code. LMK if anything is unclear :)