Closed Bug 1299070 Opened 3 years ago Closed 3 years ago

2.07 - 4.51% sessionrestore / sessionrestore_no_auto_restore / ts_paint (linux64, osx-10-10, windows7-32, windows8-64, windowsxp) regression on push 294c959e23689bdc9c3276c552bd91db8ae462a4 (Sun Aug 28 2016)

Categories

(DevTools :: General, defect, P1)

51 Branch
defect

Tracking

(firefox48 unaffected, firefox49 unaffected, firefox50 unaffected, firefox51 fixed)

RESOLVED FIXED
Firefox 51
Iteration:
51.3 - Sep 19
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
firefox50 --- unaffected
firefox51 --- fixed

People

(Reporter: ashiue, Assigned: jdescottes)

References

Details

(Keywords: perf, regression, talos-regression, Whiteboard: [devtools-html])

Attachments

(2 files)

Talos has detected a Firefox performance regression from push 294c959e23689bdc9c3276c552bd91db8ae462a4. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  sessionrestore windows7-32 pgo                          686.29 -> 717.25
  4%  sessionrestore windows7-32 opt e10s                     787.33 -> 819.83
  4%  sessionrestore windows7-32 opt                          899.96 -> 936.58
  4%  sessionrestore_no_auto_restore windows8-64 opt          858.25 -> 889.92
  4%  sessionrestore windows8-64 opt e10s                     710.08 -> 735.92
  4%  sessionrestore windows8-64 opt                          820 -> 849.58
  4%  sessionrestore_no_auto_restore windows8-64 opt e10s     746 -> 772.67
  4%  sessionrestore_no_auto_restore windows7-32 opt          957.21 -> 990.92
  3%  ts_paint linux64 opt                                    1532.46 -> 1581.42
  3%  sessionrestore osx-10-10 opt                            1040.08 -> 1072.25
  3%  sessionrestore_no_auto_restore windowsxp opt            912.62 -> 940
  3%  sessionrestore_no_auto_restore windows7-32 opt e10s     833.67 -> 858.42
  3%  ts_paint linux64 opt e10s                               1455.08 -> 1494.83
  3%  sessionrestore linux64 opt e10s                         1047.42 -> 1075.67
  3%  sessionrestore_no_auto_restore osx-10-10 opt            1083.83 -> 1112.5
  2%  sessionrestore linux64 opt                              1185.42 -> 1210


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=2875

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
After doing some retriggers, this issue might be caused by one of following patches:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=087a40f5eaa909ceea0d41f0357c50ee3df47a4d&tochange=294c959e23689bdc9c3276c552bd91db8ae462a4

Hi Julian, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
Blocks: 1291351, 1295608
Flags: needinfo?(jdescottes)
Flags: qe-verify-
Whiteboard: [devtools-html] [triage]
Testing locally, I could reproduce a performance regression on the sessionrestore tests (~2%).
I think the root cause of the regression is that since Bug 1295608, devtools are parsing localization properties files using javascript. And on browser startup, the following files are being parsed for devtools:

# loaded by l10n.js
devtools-shared/locale/shared.properties

# loaded by definitions.js
devtools/locale/toolbox
devtools/locale/inspector
devtools/locale/webconsole
devtools/locale/debugger
devtools/locale/styleeditor
devtools/locale/shadereditor
devtools/locale/canvasdebugger
devtools/locale/performance
devtools/locale/memory
devtools/locale/netmonitor
devtools/locale/storage
devtools/locale/webaudioeditor
devtools/locale/scratchpad.properties
devtools/locale/dom.properties
devtools/locale/toolbox.properties

# loaded by browser-menus.js
devtools/locale/menus.properties 

As we can see most of them come from definitions.js, which was updated in https://hg.mozilla.org/integration/fx-team/rev/294c959e2368.
After backing out this patch, the results for sessionrestore seem to be back to the baseline (currently running tests on try to verify that).

Moving on, I can see three options here:
- backout 294c959e2368
- improve the performance of the javascript parser
- introduce some lazy loading to avoid localizing all strings at startup time

I don't think we should be investing too much time in performance improvements of the parsing library: after 1294186 lands, each property file will only ever be parsed once. I would prefer to go for the 3rd option and either lazyload definitions.js in devtools.js, or lazyload the localized strings inside of definitions.js.

I triggered the following try tests:
- backout of 294c9599e2368 : https://treeherder.mozilla.org/#/jobs?repo=try&revision=3835bb78c18c
- lazyloading : https://treeherder.mozilla.org/#/jobs?repo=try&revision=5efdb82d00c5
As discussed with bgrins, here is another approach using getters for each of the definitions.js strings :
- try https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b1e6e9c1374
- perf compare https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=1b1e6e9c1374
Flags: needinfo?(jdescottes)
Results from previous tests
===========================

- adding lazyloading for strings in definitions.js only partially solves the issue. browser-menus.js will still access the label & accessKey for each registered tools on startup in order to build the developer menu. So even with lazy loading, we are still loading properties files for 6 tools (+1 for ellipsis, + 1 for the other menu strings).
- backout of the definitions.js change on a recent fx-team still seems to be ~2% slower on sessionrestore tests. ts_paint tests seem to be fine. Applying the backout on top of the regression itself, performances seem extremely similar to the "good" reference. The only properties file parsed in this case is the one for the ellipsis.

[I have lots of over test results but detailing all of them here would be too long, ping me if interested.]

There is no way to completely avoid loading 7 devtools properties files (6 default tools registered by default, plus the menu). Introducing lazyloading will at most get down the number of parsed properties files from 13 to 7. From the results I get on perfherder so far it doesn't translate into a 50% perf improvement though (https://treeherder.mozilla.org/#/jobs?repo=try&revision=b33f0e319e70). And I still don't really understand why.

Local profiling
===============

Locally I measured the time spent to localize all strings needed by devtools on browser startup
- ~2ms using Services.string (similar to the situation before Bug 1295608)
- ~25ms using require + js properties parser

The 23ms increase seems to make sense. For instance, the sessionrestore test takes a little under 700ms on my machine, which means string localization is responsible for a bit more than 3% of that time. This is consistent with the initial regression figures. My assumption is that the regression solely comes from this.

If I analyze where the time is spent between requiring & parsing. On average requiring takes 6/25ms and parsing takes the remaining 19/25ms. (implementing a simpler parser, I got this down to 7ms for overall parsing, but some edge cases are missing and not sure how fast it can be while still being complete enough for our needs).

The time is evenly spent amongst the 13 properties files, so ~2ms for each. So I would expect the regression to be completely proportional to the number of properties files downloaded. As I said, that's not what I have seen so far on perfherder. I am trying to get more profiling data from perfherder, I will know later today if what I'm seeing locally correlates with what happens on perfherder.

[I also noticed a strange behavior. After doing a fresh build of Firefox, the two first times I open the browser, I will get the ~23ms figure mentioned above, but the 3rd time, the first require takes 70ms. I have not observed this kind of variance with the Services.string implementation. This also doesn't seem to occur when I run the Talos tests locally.]

Next steps
==========

Further improvements to investigate/discuss:
- regroup all definitions strings in a single properties file
- use a faster parser
- keep using Services.string when available
> - regroup all definitions strings in a single properties file

Here is a try push implementing this : https://treeherder.mozilla.org/#/jobs?repo=try&revision=d95b6dab4a7f
And the perfherder comparison with the good baseline: https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=087a40f5eaa909ceea0d41f0357c50ee3df47a4d&newProject=try&newRevision=d95b6dab4a7f&framework=1&showOnlyImportant=0

There is still one orange on sessionrestore opt e10s for Linux64. And I see a lot of ~+1% on ts_paint and sessionrestore tests even if they don't popup as "important" changes.

Alison, could you have a look at the perfherder link above? In my opinion this is still not OK, but I'm not used to work with Perfherder and maybe you have a different take on this.
Flags: needinfo?(ashiue)
thanks Julian for coming up with a fix and getting great data on try.  I took some time this morning looking at this, and overall I think these changes are fixing the large majority of any regressions :)

Looking at the graph links available on the compare view, you can see many of the values from your try push is posting lower than the values on fx-team, very similar to what you see prior to the regression.  Another way to look at this is looking at the raw values seen in the compare view vs the raw values from comment 0, things are much closer to the original.
Flags: needinfo?(ashiue)
Thanks for the feedback Joel! 

I've tried some other variations (https://treeherder.mozilla.org/#/jobs?repo=try&revision=79908fa3180c & https://treeherder.mozilla.org/#/jobs?repo=try&revision=361b5d153b45) but I'm not seeing a huge difference compared to https://treeherder.mozilla.org/#/jobs?repo=try&revision=d95b6dab4a7f.

One issue with this patch is that it involves moving a lot of localized strings from one file to another. I guess we will again need a kind of migration script for localization teams (see Bug 1295157).
Assignee: nobody → jdescottes
Status: NEW → ASSIGNED
Iteration: --- → 51.3 - Sep 12
Priority: -- → P1
Whiteboard: [devtools-html] [triage] → [devtools-html]
Comment on attachment 8787247 [details]
Bug 1299070 - Move l10n.ELLIPSIS behind getter;

https://reviewboard.mozilla.org/r/76090/#review74032
Attachment #8787247 - Flags: review?(bgrinstead) → review+
Axel, to fix this performance regression, I want to move ~60 strings from existing properties files to a new properties file.
What do you advise to do wrt localization teams? Would an email explaining what changed and why be ok here?
Flags: needinfo?(l10n)
Is it possible to see what the string move looks like?

I don't think there's much else that we can do at the moment, besides sending an email. I can take care of that in the report I send at the beginning of the cycle.

It's also still not completely clear what's the timeline for bug 1295157 is, and if it's going to happen for all devtools at the same time.
Flags: needinfo?(l10n)
Comment on attachment 8787248 [details]
Bug 1299070 - Merge all startup strings in startup.properties;

https://reviewboard.mozilla.org/r/76092/#review74074

Good find
Attachment #8787248 - Flags: review?(bgrinstead) → review+
(In reply to Francesco Lodolo [:flod] from comment #12)
> Is it possible to see what the string move looks like?
> 

You can check https://bugzilla.mozilla.org/attachment.cgi?id=8787248

> I don't think there's much else that we can do at the moment, besides
> sending an email. I can take care of that in the report I send at the
> beginning of the cycle.
> 

Ok, thanks a lot!

> It's also still not completely clear what's the timeline for bug 1295157 is,

[Will also reply directly in Bug 1295157]
I don't think we will have a script ready for the next merge day (Sep 12), so I'll wait after that to start landing DTDs -> properties patches. However I think we will have to start landing in Firefox 52 anyway. 

> and if it's going to happen for all devtools at the same time.

For now we focus on the inspector, webconsole and overall toolbox. So the migration will not happen for all devtools at the same time. Other modules will be migrated later (TBD).
Comment on attachment 8787248 [details]
Bug 1299070 - Merge all startup strings in startup.properties;

Francesco, I'll let you review the file and let me know if you have any concern before landing this.

Thanks!
Attachment #8787248 - Flags: feedback?(francesco.lodolo)
Comment on attachment 8787248 [details]
Bug 1299070 - Merge all startup strings in startup.properties;

The patch looks good, since it moves existing strings from one file to the other.

This also made me notice one bad approach to strings, and it's mostly an FYI for the future.

ToolboxDebugger.label=Debugger
debuggerMenu.accesskey=D

The two strings have IDs completely unrelated, you need a wild guess to know which string an accesskey is related to, unless you look at the code.

Proper and intuitive naming schema:

debuggerMenu.label=Debugger
debuggerMenu.commandkey=S
debuggerMenu.accesskey=D
Attachment #8787248 - Flags: feedback?(francesco.lodolo) → feedback+
Component: Untriaged → Developer Tools
https://hg.mozilla.org/mozilla-central/rev/7aaba7598abb
https://hg.mozilla.org/mozilla-central/rev/e9ca1d928e35
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Found one weirdness only after copying strings in my own localization: storage.tooltip3 is now in both storage.properties and startup.properties, I assume that's not wanted?
Flags: needinfo?(jdescottes)
(In reply to Francesco Lodolo [:flod] (slow to reply, mostly away Sep 3-10) from comment #21)
> Found one weirdness only after copying strings in my own localization:
> storage.tooltip3 is now in both storage.properties and startup.properties, I
> assume that's not wanted?

Thanks for catching that! Incidentally, I have an opened bug for cleaning up unused strings (Bug 1299580), I attached a patch to remove it there.
Flags: needinfo?(jdescottes)
Thanks for this fix! The performance improvement already shown on talos.

Improvements:

  4%  sessionrestore_no_auto_restore windowsxp opt e10s     804.46 -> 773.42
  4%  sessionrestore linux64 pgo e10s                       959.79 -> 924.58
  3%  sessionrestore_no_auto_restore osx-10-10 opt e10s     1025.67 -> 992.67
  3%  sessionrestore_no_auto_restore windows8-64 opt        902.42 -> 875
  3%  sessionrestore windows7-32 opt                        942.29 -> 913.92
  3%  sessionrestore windows8-64 opt                        863.42 -> 837.75
  3%  sessionrestore osx-10-10 opt e10s                     980.25 -> 951.58
  3%  sessionrestore linux64 opt e10s                       1087.75 -> 1059.33
  3%  sessionrestore_no_auto_restore linux64 opt e10s       1138.33 -> 1109.17
  3%  ts_paint linux64 pgo e10s                             1329.42 -> 1295.83
  2%  ts_paint linux64 pgo                                  1412.83 -> 1384.42
  2%  sessionrestore_no_auto_restore windowsxp opt          950.33 -> 935.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=2987
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.