This is intended to give a reasonable number that scales with the amount of
content in a website, for scheduling purposes.
This effectively counts the amount of text connected to a document that isn't
likely to be inline style or script.
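
As a rough illustration of the idea (not the code in this patch), here is a minimal, self-contained sketch using a toy node tree. `Node`, `Document`, `content_heuristic`, `IsLikelyStyleOrScript` and `AppendChild` are all hypothetical names made up for the example, and it assumes everything appended is connected and that checking the direct parent is enough to spot inline style or script:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a DOM node (hypothetical, not the engine's node class).
struct Node {
  enum class Kind { Element, Text };
  Kind kind = Kind::Element;
  std::string value;  // Tag name for elements, character data for text.
  Node* parent = nullptr;
  std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical per-document accumulator for the heuristic.
struct Document {
  std::size_t content_heuristic = 0;
};

// Text directly inside <style> or <script> is unlikely to be visible content.
bool IsLikelyStyleOrScript(const Node& parent) {
  return parent.kind == Node::Kind::Element &&
         (parent.value == "style" || parent.value == "script");
}

// Creates a child under `parent` and, for text nodes outside style/script,
// adds the text length to the document's heuristic. This sketch assumes
// `parent` is already connected to `doc`.
Node* AppendChild(Document& doc, Node& parent, Node::Kind kind,
                  std::string value) {
  assert(parent.kind == Node::Kind::Element && "only elements have children");
  auto child = std::make_unique<Node>();
  child->kind = kind;
  child->value = std::move(value);
  child->parent = &parent;
  if (kind == Node::Kind::Text && !IsLikelyStyleOrScript(parent)) {
    doc.content_heuristic += child->value.size();
  }
  parent.children.push_back(std::move(child));
  return parent.children.back().get();
}
```

With this sketch, appending the text "hello" to a body element bumps the counter by 5, while text appended under a style element leaves it unchanged.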
Maybe have some more heuristics for hidden elements, like presence of the
Maybe skip whitespace-only text? This does a pretty good job anyways because
whitespace nodes are usually pretty small (like a couple newlines and
spaces), so they don't add too much to the number.
Add some weight to some elements? Maybe images should have a fixed weight,
for example. Though you don't want 0x0 images and such to count... Maybe we
should add to this heuristic out of band when processing image loads or some
Handle shadow DOM and such better? Right now Shadow DOM and XBL are always
assumed visible as long as they're connected. You can in theory do
something like stash a <div> inside a <style> element, attach a ShadowRoot
and such, and append a bunch of stuff inside. But I don't think it's
something we should be particularly worried about.
Probably add some check to CharacterData::AppendText as well? Otherwise this
undercounts when a big amount of text arrives via the network, for example,
but also I'm not sure we're optimizing for log files and such, so it might be
ok.
In any case, this gives us a heuristic that we can iterate on later. This does a
pretty good job at representing the amount of content in the examples over here:
For example for:
You get output like the following if you print the heuristic after each bind
operation (with duplicate values removed):
3 // Some whitespace in <head>
4 // Some whitespace in the <body>.
65547 // Actual content injected by the first script.
65548 // Some more whitespace.
131085 // Actual content injected by the second script.
131087 // Some more whitespace.
I'm not a fan of what clang-format has done to my code btw :)