Performance change between Firefox 70 and Firefox 71 Wikipedia
Categories
(Core :: Graphics: Text, defect, P3)
Tracking
()
Performance Impact | high |
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox71 | --- | wontfix |
firefox72 | --- | wontfix |
firefox73 | --- | fix-optional |
People
(Reporter: peter, Unassigned)
References
(Regression)
Details
(Keywords: perf:pageload, regression)
Attachments
(4 files)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Safari/605.1.15
Steps to reproduce:
Hi, I work in the Wikimedia Foundations performance team and I've seen some changes in performance between 70 and 71. Let me explain: We focus on first visual change since we know that is one of our most stable metrics. We run to two different tests: one using Browsertime with WebPageReplay to measure frontend performance and one set of tests using WebPageTest.
I got alerts from our Browsertime/WebPageReplay tests last days for first visual change (we test 12 wikis and three pages each) but I wasn't 100% sure because I also changed Firefox preferences during that time.
But when WebPageTest also switched to 71 I could see changes in the performance there too. However we got one page that got a lot faster in our testing with 71:
https://en.wikipedia.org/wiki/Barack_Obama
But we also got pages that got slower:
https://en.wikipedia.org/wiki/Facebook (I've attached a screenshot what the change look like for us).
Our bug for this is: https://phabricator.wikimedia.org/T240557
We also have RUM metrics but Firefox is usually a small part so we cannot see changes + miss first paint there.
Is there any change you all can backtrack to these changes? I want to know if there's something we can do different to avoid the pages getting slower.
Comment 1•6 years ago
|
||
Hi Peter,
Thanks for being clear about the bug. I used this add-on https://addons.mozilla.org/en-US/firefox/addon/load-time/#&gid=1&pid=1 to measure the loading time of the page https://en.wikipedia.org/wiki/Facebook. I'm a bit confused since some of the parameters have improved and some have worsened. Which one do you take as a reference?
Regards, Flor.
Reporter | ||
Comment 2•6 years ago
|
||
Hi Florencia, we focus on First Visual Change (when something first is painted on the screen) that we get from a video we record, but we also get your timeToNonBlankPaint that also have the same behavior in our tests.
However on the English Wikipedia we could see one page getting faster (and that I also think was tied to the Firefox upgrade). A couple of more pages where we could see a regression:
https://zh.wikipedia.org/wiki/Facebook
https://zh.wikipedia.org/wiki/北京市
But there are also a lot pages that we test where I don't see any change at all. So it's hard :) I'm curios if you know any change that could affect the first paint in the changes in 71? It's hard for me from the outside checking changes.
![]() |
||
Updated•6 years ago
|
Comment 3•6 years ago
|
||
I've repro'd this locally on Linux using browsertime (aka sitespeed). I see a ~20% degredation in FP/FVC and SpeedIndex/loadtime/etc.
$ ./mach browsertime --browser firefox --visualMetrics=true --firefox.binaryPath=obj-opt/dist/bin/firefox https://en.wikipedia.org/wiki/Facebook --iterations 10 --video=true
I also tried it with profiles on:
./mach browsertime --browser firefox --visualMetrics=true --firefox.binaryPath=obj-opt/dist/bin/firefox https://en.wikipedia.org/wiki/Facebook --iterations 5 --video=true --firefox.geckoProfiler --firefox.geckoProfilerParams.features=js,stackwalk,leaf,responsiveness,screenshots --firefox.geckoProfilerParams.threads="GeckoMain,Compositor,Socket Thread,SSL,Cert"
Note that to build FF70 locally if you have an up-to-date rust, you need to do "rustup override set 1.36.0" in the ff70 repo. To build 71, you need to apply a patch from bug 1602358 (which need minor conflict resolution to apply to 71)
Results from ff70:
[2019-12-16 16:22:44] INFO: https://en.wikipedia.org/wiki/Facebook 0, backEndTime: 91ms (±6.45ms), firstPaint: 1.75s (±61.45ms), firstVisualChange: 1.79s (±54.73ms), DOMContentLoaded: 1.81s (±55.54ms), Load: 1.84s (±56.58ms), speedIndex: 1984 (±152.65), perceptualSpeedIndex: 2027 (±161.28), contentfulSpeedIndex: 1837 (±73.54), visualComplete85: 2.11s (±225.25ms), lastVisualChange: 2.50s (±99.02ms), rumSpeedIndex: 668 (±20.33) (5 runs)
ff71:
[2019-12-16 12:42:34] INFO: https://en.wikipedia.org/wiki/Facebook 0, backEndTime: 92ms (±6.16ms), firstPaint: 1.13s (±9.65ms), firstVisualChange: 1.18s (±8.39ms), DOMContentLoaded: 1.18s (±9.34ms), Load: 1.22s (±10.15ms), speedIndex: 1189 (±8.42), perceptualSpeedIndex: 1212 (±8.51), contentfulSpeedIndex: 1185 (±8.34), visualComplete85: 1.18s (±8.39ms), lastVisualChange: 1.71s (±16.78ms), rumSpeedIndex: 208 (±9.73) (10 runs)
(those are with screenshots and profiling on; I see similar though faster results with profiling and screenshots off)
The primary difference I saw looking at the profiles of the median run from 70 and 71 is that Reflow takes about 25% longer (1024ms vs 745ms)
I'll attach the profiles, but note most symbols are missing since browsertime doesn't correctly set the directories for geckoprofiler to get symbols from. You should see similar results with local profiling runs. I have the raw numbers from 10 runs without profiling, and 5 with, for each of 70 and 71, as well as videos of each one (which can be combined with a script to do side-by-sides).
Profiles of Median FirstVisualChange run for each:
ff70: https://perfht.ml/2EupHDG ff71: https://perfht.ml/2Puygoi
I have 4 more of each profile if you want.
This page and the Barack Obama page, which did not regress apparently, are very similar. Also, we monitor a (recorded copy) of the Barack Obama page in perfherder/raptor-tp6, and saw no regressions.
Vicky indicated layout perf issues should start with Matt and Maire; NI-ing
Feel free to reach out to me with any questions or for assistance.
Comment 4•6 years ago
|
||
If this is reproducible, it should be feasible to run mozregression on this, right?
Comment 5•6 years ago
|
||
I asked Jesup, and, quoting:
jesup: you don't need profiler, --video or --videometrics to repro - you can do it off FP/FVC or loadtime I believe
jesup: not doing metrics makes it much faster
emilio: jesup so ideally I can get a central checkout and do ./mach browsertime with the right path to the mozregression-downloaded binary path?
emilio: So with --firefox.binaryPath=/path/to/wherever/mozregression/unpacked/stuff
jesup: Yes, I think that'll work
So is someone volunteering? Or did I just volunteer? ;)
(I can try to do this tomorrow or sometime this week if nobody does it before me)
Comment 6•6 years ago
|
||
Note to self: I just tried the above, and it almost works. I had to use MOZ_DISABLE_CONTENT_SANDBOX=1
, because of bug 1597792 (which I forgot about). d'oh.
Comment 7•6 years ago
|
||
Mozregression pointed me to bug 1583707: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=28fcea56b3a3ee186aa736c80e1f95b62c7bcf74&tochange=566dce282e2c06afa4318ba35d471f5c2b8c85e2
If I understand correctly that change made our layout on Linux be similar to other OSes. But it'd be good to confirm.
Comment 8•6 years ago
|
||
(for reference)
Comment 9•6 years ago
|
||
So the way I tested above (also for reference) is running mozregression normally, but then grabbing the binary name (like /tmp/yadayada/firefox/firefox) and using: ./mach browsertime --browser=firefox --iterations 5 https://en.wikipedia.org/wiki/Facebook --firefox.binaryPath=/tmp/tmph1u87yxm/firefox/firefox
.
The page load / time to first paint times would be different enough for it to be noticeable. Se above.
So, can somebody reproduce this with mozregression --repo autoland --launch 28fcea56b3a3ee186aa736c80e1f95b62c7bcf74
(good) and mozregression --repo autoland --launch ebeb78d6ad375a58cab872a7d49ab712c6653b48
(bad)?
Comment 10•6 years ago
|
||
Also, would be good to confirm that the above does not reproduce on Mac / Windows.
Comment 11•6 years ago
|
||
Tentatively assuming I got the regression range right.
Comment 12•6 years ago
|
||
So I guess the next thing to figure out is whether this is a Linux-only issue (as in, cairo being slow), or this affects other platforms more generally (as in, subpixel glyphs are slow) and we just noticed with the above patch. But I'm not familiar with that code at all so... :)
Updated•6 years ago
|
Comment 13•6 years ago
|
||
Emilio - thanks. There might be more than one problem here - or it might be that the testing they're doing in their automation is on linux. Peter?
We should try to re-verify the regression on Windows or Mac
Lee, should we expect this type of perf regression on this page? Why do we see it here, and not on the Barack Obama page (which got faster)? Is it just luck-of-positioning?
Reporter | ||
Comment 14•6 years ago
|
||
I had a look at our RUM and I couldn't see anything there (the main switch started the 6->7 of December) but since we don't have any paint timings there its hard to say. But we could see this on timeToNonBlankPaint on synthetic, so if in this case it would have helped us to have that turned on by default = we would know on other platforms too, however I understand if there are other considerations. Or are you able to see any difference in Navigations Timing metrics too, I didn't fully get that?
We test only on Linux but we do it on two different systems. Both with Browsertime/WebPageReplay and WebPageTest (without replaying traffic) and i could see the regression on those pages on both systems. We did the switch to 71 on different times so it's easy to see. We have no automatic setup for Windows/Mac, sorry.
Comment 15•6 years ago
|
||
Ok, so that would support this being a linux gfx issue (or if you prefer, a linux gfx bugfix that as a side-effect of slowing Reflow). Lee: any chances for improvements in the perf for this feature, or is this a hit we take on all platforms when sub-pixel rendering, and Linux just gets to take the hit now along with the others?
Updated•6 years ago
|
Updated•6 years ago
|
Comment 16•6 years ago
•
|
||
So, with regards to subpixel positioning, the worst case scenario is we may be potentially adding up to 4x as many copies of a given glyph to the glyph cache, if there are instances of the same glyph but at 4 different subpixel locations. On average, though, I'd expect more in the range of 2-3x more FreeType rasterization overhead of initializing a glyph and putting into the glyph cache.
Now, another scenario is the glyph cache itself. Once things are all entered into the glyph cache for a given font, we are no longer actually rasterizing the glyphs in FreeType, and instead are just blitting them straight from the stored copy in the glyph cache. But this glyph cache is itself fixed in size, so if there are too many glyphs to actually fit within the cache limits, we may end up thrashing.
But if it's more first load that's the problem, I wouldn't necessarily be implicating thrashing of the glyph cache to be the issue, but more just that subpixel positioning requires us to punt to FreeType more often to initialize those different locations, and that overhead is an entirely expected trade-off for the improved quality in text layout.
Other issue, those profiles don't seem to have any symbols, so it's hard to really analyze them too heavily. Can we possibly get some better annotated ones?
Comment 17•6 years ago
|
||
Lee: Manual profiles, with full symbols, local opt builds on linux:
FF70: https://perfht.ml/2rZ9oMp
FF71: https://perfht.ml/2S6box6
Comment 18•6 years ago
|
||
Peter: Have there been any significant changes to the /Facebook page (the fonts/css/etc) in the last day or so, especially in the last several hours? I was working to get better profiles, and the regression seems to have disappeared in-between two sets of runs (one showed it, subsequent ones don't). I'm re-running test sequences to try to verify. I'll also check the page history: changes are minimal; someone added a sentence and a reference with a link.
Reporter | ||
Comment 19•6 years ago
|
||
Argh yes. Looking at our WebPageTest tests something changed for that pages the 17/12 -12 13 CET or around that time. The first visual change changed and become much more stable. Checked what changed, its "only" content: https://en.wikipedia.org/w/index.php?title=Facebook&type=revision&diff=931178682&oldid=931146866
There's no visual change between those changes:: https://s3.amazonaws.com/synthetic-tests-result-wikimedia/en.wikipedia.org/2019-12-17-13-01-38/pages/en.wikipedia.org/wiki/Facebook/data/screenshots/wpt-1-firstView.png
and https://s3.amazonaws.com/synthetic-tests-result-wikimedia/en.wikipedia.org/2019-12-17-12-16-06/pages/en.wikipedia.org/wiki/Facebook/data/screenshots/wpt-1-firstView.png
I've seen this before that First Visual Change has been really sensitive to small content changes in Firefox.
https://zh.wikipedia.org/wiki/北京市 seems to still have the issue.
Updated•5 years ago
|
Comment 20•5 years ago
|
||
Removing regression-window wanted based on Comment 11.
Updated•5 years ago
|
Comment 21•5 years ago
|
||
Marking this as fix-optional for 73 to remove it from weekly regression triage. We can still take a patch for 73 or 74.
Updated•5 years ago
|
Comment 22•4 years ago
|
||
Hey Peter, we're looking at this again. I see you closed the issue on your side. Do you see similar issues nowadays or has it vanished sometime between 71 and 72?
Thanks!
Reporter | ||
Comment 23•4 years ago
|
||
Hi Julien, I closed our issue since it was so long time ago and many releases. I usually see regressions/wins on some URLs per release, but its hard report/or fins the exact issue. I think you can close this issue, thanks!
Comment 24•4 years ago
|
||
Thanks for the feedback!
Updated•3 years ago
|
Updated•3 years ago
|
Description
•