Closed Bug 1415196 (opened 3 years ago, closed 3 years ago)

4.21 - 21.46% tp6_amazon_heavy / tp6_facebook / tp6_facebook_heavy (linux64, windows7-32) regression on push 929b9943b4fc8af6af3d7553249995f51e02ac1d (Thu Nov 2 2017)

Categories: Core :: JavaScript Engine, defect, P1
Tracking: RESOLVED WONTFIX; firefox58 --- affected
People: Reporter: igoldan; Assignee: Unassigned
Keywords: perf, regression, talos-regression

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=929b9943b4fc8af6af3d7553249995f51e02ac1d

As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 21%  tp6_facebook summary linux64 pgo 1_thread e10s     293.02 -> 355.92
 17%  tp6_facebook_heavy summary windows7-32 opt e10s    297.96 -> 349.08
 17%  tp6_facebook_heavy summary linux64 pgo e10s        293.02 -> 342.79
 13%  tp6_facebook summary linux64 pgo e10s              294.15 -> 332.00
 13%  tp6_facebook summary windows7-32 opt e10s          306.74 -> 345.27
 12%  tp6_facebook summary windows7-32 pgo 1_thread e10s 252.85 -> 282.12
  9%  tp6_facebook summary windows7-32 opt 1_thread e10s 312.27 -> 340.58
  5%  tp6_facebook_heavy summary linux64 opt e10s        327.42 -> 343.83
  4%  tp6_amazon_heavy summary windows7-32 pgo e10s      553.50 -> 576.79

Improvements:

 16%  tp6_google summary windows10-64 opt 1_thread e10s     482.58 -> 406.04
 15%  tp6_google_heavy summary windows10-64 opt e10s        474.75 -> 401.83
 15%  tp6_google summary linux64 opt 1_thread e10s          527.00 -> 447.12
 15%  tp6_google summary windows10-64 opt e10s              475.12 -> 403.46
 15%  tp6_google summary windows10-64 pgo e10s              401.79 -> 341.88
 15%  tp6_google_heavy summary linux64 opt e10s             546.08 -> 464.92
 15%  tp6_google summary linux64 opt e10s                   548.42 -> 467.58
 15%  tp6_google_heavy summary windows7-32 opt e10s         480.58 -> 409.92
 14%  tp6_google summary windows10-64 pgo 1_thread e10s     404.21 -> 345.83
 14%  tp6_youtube_heavy summary windows7-32 pgo e10s        243.75 -> 208.83
 14%  tp6_google_heavy summary windows10-64 pgo e10s        399.25 -> 342.17
 14%  tp6_google summary windows7-32 pgo e10s               395.25 -> 339.33
 14%  tp6_google summary windows7-32 opt e10s               477.90 -> 410.67
 14%  tp6_google_heavy summary windows7-32 pgo e10s         393.88 -> 338.96
 14%  tp6_google summary windows7-32 opt 1_thread e10s      488.38 -> 420.42
 13%  tp6_google summary linux64 pgo 1_thread e10s          466.83 -> 404.25
 13%  tp6_google summary linux64 pgo e10s                   483.50 -> 419.12
 13%  tp6_google_heavy summary linux64 pgo e10s             481.67 -> 418.21
 13%  tp6_google_heavy summary osx-10-10 opt e10s           492.96 -> 429.79
 11%  tp6_google summary windows7-32 pgo 1_thread e10s      388.93 -> 345.79
  2%  tp5o summary windows10-64 opt e10s                    258.39 -> 252.47


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10314

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
:nbp Looks like there are some big regressions on the tp6_facebook tests. Can you please take a look at these?
Flags: needinfo?(nicolas.b.pierron)
Blocks: 1402299
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> :nbp Looks like there are some big regressions on the tp6_facebook tests.
> Can you please take a look at these?

I already tested this configuration in the previous report, the last time this got enabled, and concluded that this was a false positive (Bug 1392969).

Looking at some of the graphs reporting the regressions, it seems that the benchmark became bi-modal, while in some cases it improves the best case compared to the previous baseline.

A bi-modal behaviour is expected with the JSBC (the JavaScript bytecode cache), as it attempts to encode the bytecode and then decode it on upcoming page loads.
If a page is left before we have had time to save the bytecode in the Necko cache, then we might observe a slowdown, as we keep trying to encode the page over and over.  This should be fixed in a separate bug and is unlikely to be seen by users on facebook.com and amazon.com pages.
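
A minimal toy model of that failure mode (purely illustrative: the timings, the single encode threshold, and the cache shape are all invented here, not Firefox's actual heuristics):

  # Toy model: if the page is reloaded before the bytecode encoder has run,
  # nothing is ever cached and every load stays on the slow path.
  COLD_LOAD_MS = 450      # parse the JS source from scratch
  WARM_LOAD_MS = 300      # decode cached bytecode instead of parsing
  ENCODE_DELAY_MS = 500   # time on the page needed before bytecode is saved

  def page_load(time_on_page_ms, cache):
      if cache["has_bytecode"]:
          return WARM_LOAD_MS
      # No bytecode yet: pay the full parse cost; persist the bytecode only
      # if we stay on the page long enough for the encoder to run.
      if time_on_page_ms >= ENCODE_DELAY_MS:
          cache["has_bytecode"] = True
      return COLD_LOAD_MS

  hasty = {"has_bytecode": False}
  print([page_load(100, hasty) for _ in range(5)])    # [450, 450, 450, 450, 450]
  patient = {"has_bytecode": False}
  print([page_load(3000, patient) for _ in range(5)]) # [450, 300, 300, 300, 300]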

I would expect a normal user of facebook and amazon to spend time on each page and consume the content provided by the page. (jmaher) This behaviour got fixed on tp5 by Bug 1372942; do we have something similar on tp6?

I will double-check once more with the Gecko Profiler, to see if I can reproduce a similar slowdown while emulating user behaviour that implies staying on the page longer.
Flags: needinfo?(jmaher)
tp6 should have the wait for idle callback as we do for tp5; this applies to all pageloader tests.  I know we have tested locally, but I find it odd that this change corresponds 100% to a clear signal that facebook.com is regressed, improved (on backout), and regressed (on relanding).  This is showing regressions on both opt and pgo; possibly there is something we are overlooking here?
Flags: needinfo?(jmaher)
Ok, here is the procedure I followed to test on my own facebook profile:
 0. Install the Gecko Profiler addon: https://perf-html.io/
 1. Open facebook.com and login.
 2. Repeat 6 times (to fill the cache with the bytecode; a scripted version of this warm-up appears after this list):
    - Refresh the page.
    - Wait until it is loaded (no new element appearing on the page)
    - Wait 3 seconds.
 3. Capture the JSBC-enabled profile:
    - Start the profiler.
    - Refresh the facebook page.
    - Wait 3 seconds.
    - Capture the profile: Ctrl+Shift+2
    - Stop the profiler.
 4. Go to about:config and set "dom.script_loader.bytecode_cache.enabled" to false.
 5. Capture the JSBC-disabled profile:
    - Start the profiler.
    - Refresh the facebook page.
    - Wait 3 seconds.
    - Capture the profile: Ctrl+Shift+2
    - Stop the profiler.
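
For anyone who wants to script the warm-up refreshes of step 2, here is a rough Selenium sketch (the pref name comes from step 4 above; the fixed 3-second sleep is a crude stand-in for "wait until loaded, then 3 seconds"; geckodriver is assumed to be installed):

  import time
  from selenium import webdriver
  from selenium.webdriver.firefox.options import Options

  opts = Options()
  # Pref from step 4; flip to False for the JSBC-disabled run.
  opts.set_preference("dom.script_loader.bytecode_cache.enabled", True)

  driver = webdriver.Firefox(options=opts)
  driver.get("https://www.facebook.com/")
  input("Log in manually, then press Enter to start the warm-up refreshes...")

  for _ in range(6):   # fill the cache with the bytecode (step 2)
      driver.refresh()
      time.sleep(3)    # approximates "wait until loaded + 3 seconds"

  driver.quit()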

For both profiles:
 0. Isolate and select the content process that has the most activity.
 1. Go to the "Marker Chart" tab.
 2. Double-click on the first "Navigation::Start" event.
 3. Select the time slice (the dark-grey vertical bar at the top).
 4. Drag the end of the time window until the last "Reflow" event appears cut in two at the end of the Marker Chart window.
 5. Click on the loupe in the time slice.
 6. Read the time that appears at the top: "Full Range > x.x sec".

While testing on my laptop, on my facebook profile, I get:
 - JSBC-enabled: 3.8 s
 - JSBC-disabled: 4.5 s

Which corresponds to a 15.5% improvement, i.e. (4.5 - 3.8) / 4.5 ≈ 0.155. (Linux x64)
This corresponds to the initial improvements reported on the opt build (Bug 1392969 comment 0), and to the last result I got using the same method (Bug 1392969 comment 4).

If you think the protocol I used above is not representative of a usual page load, please tell me what I should change; otherwise, I think this is still a harness issue, and we should fix the harness.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(igoldan)
A few things:
1) The other pages have increases, so if there is a harness issue, we are possibly missing other regressions/gains; likewise, your patch could negatively affect other pages without our knowledge.
2) Your facebook profile could be quite different from what we load in automation (https://github.com/rwood-moz/talos-pagesets/): probably more realistic, but we need someone to give us a realistic profile :)
3) You are measuring a hand-curated metric, arguably better than what we measure; as it stands, we have been asking for a HERO element or something more realistic for page measurements, and to date we haven't received that.
4) We measure time to first non-blank paint to calculate the score for each load of the page.  This is measured by this code:
https://searchfox.org/mozilla-central/source/testing/talos/talos/pageloader/chrome/pageloader.js#276


Here are the raw values for a tp6 run (stylo, threads=1) of the facebook.com pageload:

before:
[#0] /  Cycles:25  Average:347.80  Median:302.00  stddev:95.26 (31.5%)  stddev-sans-first:70.08
Values: 665.0  304.0  326.0  426.0  321.0  276.0  435.0  297.0  282.0  400.0  273.0  302.0  282.0  463.0  288.0  321.0  384.0  289.0  301.0  460.0  288.0  280.0  481.0  271.0  280.0

after:
[#0] /  Cycles:25  Average:374.92  Median:387.00  stddev:82.83 (21.4%)  stddev-sans-first:57.17
Values: 668.0  258.0  271.0  391.0  336.0  275.0  307.0  284.0  348.0  373.0  376.0  324.0  301.0  403.0  418.0  421.0  329.0  387.0  413.0  449.0  391.0  398.0  397.0  421.0  434.0

Since this is sustained, we should look at how we summarize our findings.  The averages differ far less than the medians, but we have been using the median in all other cases.  I suspect that using the custom metric via profiling to find the page-load time would either fall into the same pattern over time, or prove to be more stable on both before/after.  As you can see, any given data point can show an improvement or a regression.
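
Those summary lines can be reproduced from the raw values with Python's statistics module (a sketch; the sample standard deviation is assumed because it reproduces the 95.26 above, the percentage appears to be stddev relative to the median, and "stddev-sans-first" is assumed to drop the first, warm-up, replicate):

  import statistics

  before = [665.0, 304.0, 326.0, 426.0, 321.0, 276.0, 435.0, 297.0, 282.0,
            400.0, 273.0, 302.0, 282.0, 463.0, 288.0, 321.0, 384.0, 289.0,
            301.0, 460.0, 288.0, 280.0, 481.0, 271.0, 280.0]
  after = [668.0, 258.0, 271.0, 391.0, 336.0, 275.0, 307.0, 284.0, 348.0,
           373.0, 376.0, 324.0, 301.0, 403.0, 418.0, 421.0, 329.0, 387.0,
           413.0, 449.0, 391.0, 398.0, 397.0, 421.0, 434.0]

  def summarize(label, values):
      med = statistics.median(values)
      sd = statistics.stdev(values)  # sample stddev reproduces 95.26 / 82.83
      print(f"{label}: Average:{statistics.mean(values):.2f} "
            f"Median:{med:.2f} stddev:{sd:.2f} ({100 * sd / med:.1f}%) "
            f"stddev-sans-first:{statistics.stdev(values[1:]):.2f}")

  summarize("before", before)  # Average:347.80 Median:302.00 stddev:95.26 (31.5%)
  summarize("after", after)    # Average:374.92 Median:387.00 stddev:82.83 (21.4%)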
Flags: needinfo?(igoldan)
Priority: -- → P2
This looks like a serious perf regression in 58, which we're shipping to beta. It would be good to have an owner here.
Flags: needinfo?(nihsanullah)
Priority: P2 → P1
(In reply to Jim Mathies [:jimm] from comment #6)
> Looks like a serious perf regression in 58 which we're shipping to beta. It
> would be good to have an owner here.

First, we need to know if this is a harness issue or a browser issue, right?

From what I tested and observed on new profiles instrumented with the Gecko Profiler, this is a ~15% improvement (comment 4) on the full page load, i.e. from navigation start to the last reflow.

The first non-blank paint metric (comment 5) reports a regression, but it is not clear to me that we even benefit from the JSBC there, as this can happen if we reload the page too fast.  This would also explain the bi-modal behaviour of the benchmark (comment 2).

A simple way to test that would be to re-run the talos benchmarks, but always waiting 10 seconds before refreshing the page, and also measuring the last reflow, to double-check that what I observe manually corresponds to what talos reports.
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> First, we need to know if this is a harness issue or a browser issue, right?
> 
> From what I tested and observed on new profiles instrumented with the Gecko
> Profiler, this is a ~15% improvement (comment 4) on the full page load, i.e.
> from navigation start to the last reflow.
> 
> The first non-blank paint metric (comment 5) reports a regression, but it
> is not clear to me that we even benefit from the JSBC there, as this can
> happen if we reload the page too fast.  This would also explain the
> bi-modal behaviour of the benchmark (comment 2).
> 
> A simple way to test that would be to re-run the talos benchmarks, but
> always waiting 10 seconds before refreshing the page, and also measuring
> the last reflow, to double-check that what I observe manually corresponds
> to what talos reports.

Have you managed to investigate this?
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)
> (In reply to Nicolas B. Pierron [:nbp] from comment #7)
> > A simple way to test that would be to re-run the talos benchmarks, but
> > always waiting 10 seconds before refreshing the page, and also measuring
> > the last reflow, to double-check that what I observe manually corresponds
> > to what talos reports.
> 
> Have you managed to investigate this?

Who are you asking?
I have not spent a single second on this bug, since it does not seem to be a JSBC issue from my point of view.
We have proven that bug 1405738 caused a perf regression; if we feel that in local testing this is not a regression, then we should wontfix this bug.  Keep in mind there are variables beyond just machine differences, such as which page you are using, Firefox preferences, graphics cards/drivers, etc.
I tried backing out the patch and we see a regression. This is on today's code, not the codebase from 3 weeks ago. I suspect that any work here will not uncover anything, and I recommend closing this as wontfix.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #11)
> I tried backing out the patch and we see a regression. This is on today's
> code, not the codebase from 3 weeks ago. I suspect that any work here will
> not uncover anything, and I recommend closing this as wontfix.

Thanks, let's do that.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Clearing the ni on Naveed as he no longer works at Mozilla.
Flags: needinfo?(nihsanullah)