Closed Bug 1455115 Opened 6 years ago Closed 6 years ago

Enable Parallel CSS Parsing

Categories

(Core :: CSS Parsing and Computation, enhancement)

enhancement
Not set
normal

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: bholley, Assigned: bholley)

I'm landing it preffed off in bug 1346988, will flip the pref a day later.
Backed out for turning mochitest test_bug1166138.html (bug 1240225) into almost permafailure on Android.

Push with failures: https://is.gd/BDhA5B

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=174613959&repo=autoland&lineNumber=1623

Backout link: https://hg.mozilla.org/integration/autoland/rev/209bf033ef6dd029c3253384780b5c2081d1397f
Flags: needinfo?(bobbyholley)
Depends on: 1240225
TL;DR: Parallel CSS parsing improves tp6_facebook, tp6_amazon, and tp6_youtube by 15%, 10%, and 5% respectively. There is pre-existing non-determinism in tp6_youtube that may cause Talos to report a large regression on that test, but we have strong evidence that it is actually a win. Requesting signoff from Product to ride the 61 train.

===Analysis and Measurements===

Tp6 measures time to first non-blank paint. This can be highly non-deterministic depending on the testcase, because Gecko makes no guarantees about how much content is loaded when the first paint occurs. Depending on their characteristics, some testcases will behave consistently in practice, which is the case for tp6_{facebook, amazon, google}. But results for tp6_youtube are currently all over the place for two reasons:
(1) Painting races with the DOM parser, and thus the first paint intermittently comes when we have either ~600 elements in the DOM or ~6000 elements, which in turn has a large impact on how much work must be done before painting.
(2) The recording includes a large overlay ad for Ford electric vehicles, which seems to intermittently flush layout at weird times.

Parallel stylesheet parsing alters the timing and order of operations in early pageload, and thus perturbs the existing youtube instability so that it stabilizes closer to the upper threshold, which Talos reports as a regression.

The consensus between dbaron, bz, and myself is that we should really be measuring the first paint that arrives after the </body> tag. I wrote a 1-line patch to do this, as well as another patch to block the specific youtube ad. These patches together constitute the "measurement correction" discussed below.

So we have three sets of measurements: Baseline [1], Baseline + Measurement Correction [2], and Baseline + Measurement Correction + Parallel Parsing [3].

Comparing the correction against baseline [4] demonstrates the following:
* Youtube volatility goes away and the results become stable.
* Facebook and Google are mostly unchanged.
* Amazon time increases by ~20% (since we're measuring more of page load), but volatility does not increase.

This establishes the measurement corrections as valid, and thus we can use [2] as a new baseline to evaluate the impact of parallel CSS parsing.

This comparison [5] shows the following performance improvements with parallel CSS parsing on Windows:
* 13-15% speedup on facebook
* 4-5% speedup on youtube
* 10% speedup on amazon
* 0-3% speedup on google

This matches my local profiling, and the motivation for this work in the first place (which is that CSS parsing is a major main-thread pageload bottleneck on tp6).

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0e45c13b34e8
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7d87007b78ae1c685d348d277b8cbf0669b3746
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba8f65b5aad4ce17af8611074b6f4f2db95b7b68
[4] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=0e45c13b34e8&newProject=try&newRevision=a7d87007b78ae1c685d348d277b8cbf0669b3746&framework=1&filter=tp6
[5] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a7d87007b78ae1c685d348d277b8cbf0669b3746&newProject=try&newRevision=ba8f65b5aad4ce17af8611074b6f4f2db95b7b68&framework=1&filter=tp6%20e10s
Flags: needinfo?(jgriffiths)
Flags: needinfo?(bobbyholley)
https://hg.mozilla.org/mozilla-central/rev/30e93e3ba260
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Perf win on Android!

== Change summary for alert #12823 (as of Thu, 19 Apr 2018 22:10:30 GMT) ==

Improvements:

  4%  remote-nytimes android-4-2-armv7-api16 opt      3,241.90 -> 3,098.82

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12823
Bobby: great summary, looks great to me! Ship it.
Flags: needinfo?(jgriffiths) → needinfo?(bobbyholley)
Landed in comment 5, so far so good. :-)
Flags: needinfo?(bobbyholley)
Just chatted with sphil and abovens, two action items for me:
* Implement the "corrected" metrics in the platform for tp6 to use.
* Do rough measurements of parallel CSS parsing performance on large sheets to determine optimal chunking to suggest to web developers.
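
For illustration only, here is a rough sketch of what "chunking" a large sheet could look like, assuming it means splitting one big stylesheet into several smaller ones so that they can be parsed concurrently (the bug does not spell this out). The helpers `split_top_level_rules` and `chunk_rules` are hypothetical names, not part of Gecko or any tool mentioned here, and the splitter is deliberately naive: it only counts braces, so it would mis-split CSS that has braces inside comments or strings.

```python
def split_top_level_rules(css: str) -> list[str]:
    """Naively split CSS text into top-level rules by tracking brace depth.

    Hypothetical helper: ignores comments and strings, so braces inside
    them would confuse it.
    """
    rules, depth, start = [], 0, 0
    for i, ch in enumerate(css):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # A top-level rule (or at-rule block) just closed.
                rules.append(css[start:i + 1])
                start = i + 1
    tail = css[start:]
    if tail.strip():
        rules.append(tail)  # keep any trailing text that has no closing brace
    return rules


def chunk_rules(rules: list[str], n_chunks: int) -> list[str]:
    """Group rules into at most n_chunks contiguous chunks.

    Chunks are contiguous and in source order, because rule order
    matters for the cascade.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size = -(-len(rules) // n_chunks)  # ceiling division
    if size == 0:
        return []
    return ["".join(rules[i:i + size]) for i in range(0, len(rules), size)]


css = "a{color:red}b{color:blue}@media print{c{d:e}}f{g:h}"
chunks = chunk_rules(split_top_level_rules(css), 2)
```

Concatenating the chunks in order reproduces the original text, so loading them as consecutive sheets should keep rule order intact. What chunk size is actually optimal is exactly what the measurements above are meant to find out.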
== Change summary for alert #12830 (as of Thu, 19 Apr 2018 22:10:30 GMT) ==

Regressions:

 42%  tp6_youtube windows7-32 opt e10s stylo     228.46 -> 324.75

Improvements:

 13%  tp6_facebook windows10-64 opt e10s stylo     191.04 -> 166.58
 12%  tp6_facebook windows7-32 opt e10s stylo      195.83 -> 171.83
 11%  tp6_facebook windows7-32 pgo e10s stylo      181.27 -> 160.58
  9%  tp6_facebook linux64 pgo e10s stylo          161.60 -> 147.42
  9%  tp6_facebook_heavy linux64 pgo e10s stylo    160.02 -> 146.04
  8%  tp6_facebook_heavy linux64 opt e10s stylo    168.85 -> 155.62
  8%  tp6_facebook linux64 opt e10s stylo          170.21 -> 157.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12830
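
As a sanity check on the alert summaries in this thread, the percentages can be reproduced from the before/after values. This is a minimal sketch, assuming the alert reports the change relative to the original value and that lower is better (these are timings). The function name `percent_change` is made up for this example and is not Perfherder's API.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (test and platform, before, after), copied from the alert summaries in this bug.
alert_rows = [
    ("tp6_youtube windows7-32 opt", 228.46, 324.75),
    ("tp6_facebook windows10-64 opt", 191.04, 166.58),
    ("remote-nytimes android-4-2-armv7-api16 opt", 3241.90, 3098.82),
]

for name, before, after in alert_rows:
    delta = percent_change(before, after)
    # Lower is better for these timings, so a positive delta is a regression.
    kind = "regression" if delta > 0 else "improvement"
    print(f"{abs(delta):.0f}% {kind}: {name}")
```

Rounded to whole percent, these come out to 42%, 13%, and 4%, matching the figures in the alerts.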
That all matches what I expect, except I would also expect a (smaller) win to appear on tp6_amazon. Do you see anything like that on the graphs?
Flags: needinfo?(igoldan)
(In reply to Bobby Holley (On Leave Until June 11th) from comment #11)
> That all matches what I expect, except I would also expect a (smaller) win
> to appear on tp6_amazon. Do you see anything like that on the graphs?

There are improvements for tp6_amazon. Perfherder didn't catch them because of the noise of the tests:

5%  tp6_amazon windows7-32 pgo e10s stylo        283.38 -> 269.29
2%  tp6_amazon windows10-64 opt e10s stylo       306.79 -> 301.79
Flags: needinfo?(igoldan)