Measure start up performance with Fluent
Categories
(Firefox :: New Tab Page, task, P1)
Tracking
()
People
(Reporter: pdahiya, Assigned: pdahiya)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
1.27 MB,
image/png
|
Details |
Monitor performance matrix as fluent changes land using
- Video Recordings
- Telemetry
- Talos
Assignee | ||
Comment 1•5 years ago
|
||
Initial talos run on WIP Fluent branch shows performance gain
Comment 2•5 years ago
|
||
After retriggering some more, the 25% high confidence improvement about_preferences became low 14%, but there still remains quite a few that remain high with 5base/5new runs:
Also just noting that the current WIP isn't fully fluent (fluent-dom; no fluent-react) for new tab yet with just a some more react-intl references left. The final removal should also mean we can remove the vendored dependency and not load the script at all - filed bug 1560689. But this might lead to some performance improvement as well.
Comment 3•5 years ago
•
|
||
So… I tried stubbing out react-intl (relative to mozilla-central without any fluent changes) to see the performance behavior, and it looks like it reproduces the win64 ~10% improvement to startup_about_home_paint, tart, sessionrestore, ts_paint and various others
Here's the code changes but nothing too interesting -- just stubbing enough for the page to not throw exceptions. And looks like the attached screenshot (some strings come from content / layout but no strings in search box or context menu):
https://hg.mozilla.org/try/rev/cbc43ed43aa99c0e70dc359bda17039ba5f623c7
k88hudson, any ideas? I guess react-intl might be causing additional re-renders or perhaps these talos tests happen to measure some load event that happens to get delayed by react-intl (but not by fluent-dom)… ??
Comment 4•5 years ago
•
|
||
Trying to dig into more of what's actually causing the slowdown / improvement for win64 startup_about_home_paint…
stub ReactIntl.FormattedMessage
-7.94% win64
-9.09% win64-qr
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2901d32eddaa6accd04b30725c7307338e94f48f&newProject=try&newRevision=3b10306002818e4b4e5bb815953c466c8860297e&showOnlyImportant=1&showOnlyConfident=1
stub ReactIntl.injectIntl
-8.76% win64
-8.51% win64-qr
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2901d32eddaa6accd04b30725c7307338e94f48f&newProject=try&newRevision=a2595c91882010865d1348116f49491bc3af526b&showOnlyImportant=1&showOnlyConfident=1
stub ReactIntl.FormattedMessage and ReactIntl.injectIntl
-10.25% win64
-9.26% win64-qr
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2901d32eddaa6accd04b30725c7307338e94f48f&newProject=try&newRevision=4f36d7fe2106b58f6d0d2a761b833b134ebd1bf1&showOnlyImportant=1&showOnlyConfident=1
stub ReactIntl.IntlProvider, ReactIntl.FormattedMessage and ReactIntl.injectIntl
-10.60% win64
-9.20% win64-qr
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=2901d32eddaa6accd04b30725c7307338e94f48f&newProject=try&newRevision=d09664542dcb38375c82b35546d680e864b1d0cb&showOnlyImportant=1&showOnlyConfident=1
And with the changes from that last try, here's a startup profile from my machine:
https://perfht.ml/2Rv8eR4
filtering javascript privileged content stack for "activity-stream.bundle":
303ms total = 278ms bound + 24ms onLoaded + 1ms intersection
and filtering for "react-intl":
6ms total = 4ms bound + 2ms onLoaded
And from the reference m-c commit (Bug 1560527 - Enable make backend verbose mode automatically…):
https://perfht.ml/2RuL655
activity-stream.bundle:
334ms total = 298ms bound + 33ms onLoaded + 3ms intersection
react-intl:
34ms total = 27ms bound + 7ms onLoaded
Which seems to match up with roughly 10% of activity-stream.bundle running time associated to react-intl… ?
The function with most time (18ms of 27ms) associated to react-intl is not surprisingly, FormattedMessage.render:
https://github.com/formatjs/react-intl/blob/v2.9.0/src/components/message.js#L77
Where it looks to parse/tokenize then cache various strings and values/placeables in js.
So overall I guess one benefit of fluent-dom is that it's handled lower than js.
Comment 5•5 years ago
|
||
Yeah, I speculated about that myself last week. https://searchfox.org/mozilla-central/source/dom/l10n/L10nMutations.cpp should handle all data-l10n-id elements based on ContentInserted
, in one loop in C++.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
Assignee | ||
Comment 7•5 years ago
|
||
Performance measurement with try server build using June 26th fluent branch shows performance improvement by ~5.8% in cached new tab. Slight performance gains ~.29% is seen with no cache recordings. Details and benchmark numbers included in doc
Assignee | ||
Comment 8•5 years ago
|
||
NI Nan to help monitor for any major changes in telemetry numbers post https://bugzilla.mozilla.org/show_bug.cgi?id=1561811 lands in m-c. Thanks!
Comment 9•5 years ago
|
||
Just to confirm, we're basically seeing perf improvements from dropping react-intl and using "standard" fluent DOM support? Do we use it in any other places besides AS?
Comment 10•5 years ago
|
||
Yeah, in this specific case it seems to come from react-intl, and it's only used in newtab. But from what I looked at, it doesn't seem to be doing anything particularly bad/slow. E.g., if we implemented our own JS object with strings -> string lookup + replacement, it probably wouldn't differ by too much, so the main thing is actually moving the string handling out of JS to native.
Comment 11•5 years ago
|
||
Hmmmm..................................... I retriggered win64/qr "other" talos with startup_about_home_paint for the mozilla-central commits before and containing the fluent changes, and there doesn't seem to be the improvements we saw on try…
windows10-64 graph 730.20 ± 0.66 < 763.60 ± 16.73 4.57%
0.83 (low) 10 / 10
windows10-64-qr graph 684.90 ± 3.08 < 685.20 ± 0.66 0.04%
0.04 (low) 10 / 10
So then I added the same "other" runs to check the specific commit and parent for startup_about_home_paint:
windows10-64 graph 736.60 ± 1.69 > 725.70 ± 0.80 -1.48%
2.52 (low) 10 / 10
windows10-64-qr graph 711.90 ± 10.62 > 687.40 ± 0.85 -3.44%
1.02 (low) 10 / 10
And there's a low confidence improvement but still not quite like what we saw on try…
Comment 12•5 years ago
|
||
Interesting, have we checked the perceived performance through the manual tests? Wonder if we can find something noticeable there.
Assignee | ||
Comment 13•5 years ago
|
||
Could it because Fluent June 26 build on try was an artifact build triggered with try: -b o -p win32,win64,linux64,macosx64 -u crashtest,firefox-ui-functional,geckoview,geckoview-junit,jsreftest,marionette,marionette-e10s,mochitests,reftest,robocop,web-platform-tests,xpcshell -t all --rebuild-talos 5 --artifact
Comparing with m-c commit it shows gain and will be good to know whats different from autoland or mozilla-central treeherder runs
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=7ffabb358c4255897db3ceb09cad21a4731cb0ae&newProject=try&newRevision=c00a30b6fce0773e5bb36ad36f7a19deade2ea95
Manual video recordings were done with try build and showed improvement with new tab cached flow as recorded in #comment 7
Updated•5 years ago
|
Comment 14•5 years ago
|
||
(In reply to Punam Dahiya [:pdahiya] from comment #13)
Could it because Fluent June 26 build on try was an artifact build
Looks like you compared try with m-c. If I'd realized this was an artifact build, I could have told you this wasn't going to work...
Looking closer, it looks like you're comparing m-c opt (non-pgo) builds with try artifact builds. The try build log says:
18:43:45 INFO - Installing from https://queue.taskcluster.net/v1/task/V_11VICvRMaRgq5eF_RBwQ/artifacts/public/build/target.zip
which is from https://queue.taskcluster.net/v1/task/V_11VICvRMaRgq5eF_RBwQ which says "index.gecko.v2.mozilla-central.shippable.latest.firefox.win64-opt" which is code for "pgo".
So you're ending up comparing pgo to non-pgo. Which explains all the massive gains in tests that should be completely unaffected by this code change.
Comment 15•5 years ago
|
||
Ahh.. doh. Thanks. I guess in the future, don't use the try:
syntax and just do something like ./mach try fuzzy -q"'ship 'talos" --rebuild 5
Comment 16•5 years ago
|
||
(In reply to Ed Lee :Mardak from comment #15)
Ahh.. doh. Thanks. I guess in the future, don't use the
try:
syntax and just do something like./mach try fuzzy -q"'ship 'talos" --rebuild 5
With any ./mach try
usage (including fuzzy), if you want non-artifact and your local machine is configured to use artifact, you'll want to pass --no-artifact
.
This problem also goes away if you do a baseline push (ie push the m-c changeset off which you're basing your changes to try with no other changes and the same mach try
command as the test changeset, then compare those - it guarantees apples to apples comparisons).
Assignee | ||
Comment 17•5 years ago
•
|
||
Thanks for feedback. Started to test try comparison runs using ./mach try fuzzy -q "'ship 'talos" --rebuild 5 --no-artifact
a) Fluent changes in m-c (June 28) -
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1211fe3ca7cb83e13814d2db14ffc3791b19d0f3
b) m-c without Fluent changes
https://treeherder.mozilla.org/#/jobs?repo=try&revision=920d367372755ae2fcbc0db36c28c63279d3e48f
Assignee | ||
Comment 18•5 years ago
|
||
Here's final take on performance improvements seen on new tab with Fluent conversion
a) Talos runs shows low confidence improvement with Fluent branch, perf comparison above in #comment 11 and #comment 17
b) Video recordings using artifact build on try shows 200ms (~5.88%) improved performance in cached flow with no cache flow showing slight gain.
Details logged here:
https://docs.google.com/document/d/1npgtp-ycX2Wc-pyiu4b2mtisfW-sXS07VrJ4eNNFSes/edit#bookmark=id.d0owe3que6sp
Description
•