Closed Bug 1387280 Opened 7 years ago Closed 7 years ago

stylo: Font metrics handling is killing us

Categories

(Core :: CSS Parsing and Computation, enhancement, P1)

enhancement

Tracking

()

RESOLVED INVALID

People

(Reporter: bholley, Unassigned)

References

(Blocks 1 open bug)

Details

We've been getting questions about why our performance on Talos benchmarks isn't as good as we would expect given how much faster stylo is at styling. For Tp5, the answer is basically that they didn't use all that much CSS back in 2011, so we're unlikely to get too much more than parity there. However, we now have Tp6 (formerly known as quantum_pageload), which runs a few modern pages (amazon, facebook, youtube, and google). Modern sites like these are more style-heavy, so we would expect to see significant wins from stylo. And we're seeing some wins there, they're not to the degree we'd expect.

We've been doing a good job of tracking time spent in the style system, and have good numbers to show for it. And we've been tracking total time spent in servo code, because it's easy to filter for (Servo_ in the filter box). But we haven't dug much into the end-to-end pageload timeline, in part because the profiler doesn't offer an obvious range to consider (and thus compare).

After some digging, it appears that tp6 measures the time between "when the harness starts running" and performance.timing.timeToNonBlankPaint (a non-standard, pref-controlled timer). Outside of Talos, a reasonable proxy for "when the harness starts running" seems to be performance.timing.navigationStart [1].

As of this morning, the profiler didn't provide a useful way to find these markers and scope the timeline accordingly. However, gregtatum and mstange kindly jumped in, and developed and deployed a workflow for this in perf.html [2] in just about two hours. In-house tooling is awesome.

Here are the profiles for the amazon search page for 'laptop':
Gecko: https://perfht.ml/2v1PEqj
Stylo: https://perfht.ml/2wbH1rA

The profiles are a bit noisy, and the numbers average about parity, but I deliberately picked a more-optimal gecko profile and a less-optimal stylo profile, so that we could more-easily get to the heart of the matter.

If you measure time spent styling, Stylo is running circles around Gecko: 13.4 ms vs 137 ms. So about 120ms (or 10x) better. So stylo is doing its core job well.

However, the end-to-end times don't reflect that, which means that stylo is slowing things down around the edges. There are a few known issues:
* We're blowing 40ms with our dumb styleset handling, which is bug 1386045.
* We're spending 20ms resolving anonymous box styles, which bug 1368290 should fix.
* CSS parsing is about 5 ms slower (bug 1331843).

But that still doesn't account for the mismatch in what we're seeing. I was stumped on this for a while, until I typed "font" into the filter box. It turns out that Stylo is spending 120ms (!!) doing font metric stuff during layout, whereas Gecko only spends 33ms.

So we're clearly doing something dumb here. We should fix this, and the bugs mentioned above, at which point our TP6 numbers should look more like what we expect.

[1] Ideally Talos would use that instead, but it's probably not worth another disruption in the graphs during the 57 cycle.
[2] https://github.com/devtools-html/perf.html/pull/499
Manish is doing some initial investigation here, but said it may make more sense to pass on to heycam. NI both of them to pick this up.
Flags: needinfo?(manishearth)
Flags: needinfo?(cam)
So we spend 34.8ms on nsFontCache related things, vs. Gecko spends only 3.2ms. Looks like the font metrics cache isn't quite effective.
It seems there are some other things that we spend about twice as Gecko. I suspect we somehow flush things twice?
> It turns out that Stylo is spending 120ms (!!) doing font metric stuff during layout, whereas Gecko only spends 33ms.

Profiling on my laptop with current mozilla-central, I get 5 ms when I filter on "nsFontCache" with Stylo and 4 ms with Gecko.  Bobby will/do you get similar results with tomorrow's Nightly?
Flags: needinfo?(cam)
(In reply to Cameron McCormack (:heycam) from comment #4)
> > It turns out that Stylo is spending 120ms (!!) doing font metric stuff during layout, whereas Gecko only spends 33ms.
> 
> Profiling on my laptop with current mozilla-central, I get 5 ms when I
> filter on "nsFontCache" with Stylo and 4 ms with Gecko.  Bobby will/do you
> get similar results with tomorrow's Nightly?

Just wondering, are you and Bobby profiling on the same platform? The profiles in comment 0 appear to be from macOS; what are you running?
So, I dug into these profiles a bit more today, reprofiled a bit, and cross-referenced on windows. There are basically three confounding factors that led to the extreme difference in the profiles in comment 0:

(1) The biggest issue has to do with whether or not we get a hit on the OS font cache, which depends on whether there is any other tab open using the same font. If there isn't, we do a whole bunch of work on mac to setup the font, with lots of system IPC calls (under CreateFontGroup, FindOrMakeFont, and MakeTextRun). So the fact that it appears in one profile and not the other is dependent on which tabs I had open when taking the profiles. Windows appears to be much more efficient, so the overhead is negligible there anyway.
(2) "font" matches "FlushUserFontSet", which causes us to do a stylist flush.
(3) nsFontFaceLoader::OnStreamComplete (the handler for downloadable fonts) runs in one profile and not the other. Whether or not this comes in before the first non-blank paint seems to be rather timing-dependent, which sucks. Most of the time it comes in after, but sometimes before, which probably contributes some amount of noise to tp6. But there's not much to do about it here.

The upshot is that most of the overhead for stylo is coming from three things we know about:
* Stylist flushes: bug 1386045 (Emilio is looking into this).
* CSS parsing (bug 1331843 - Simon and jdm are working on that stuff).
* Anonymous box re-resolution (bug 1368290, just pushed to central a few minutes ago).

So nothing specific left to do in this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(manishearth)
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.