3% tp5o / tsvgr_opacity (linux64) regression on push c838d2546cadd65bf8d5579db20a268c8b6e4b87 (Fri Oct 7 2016)

RESOLVED WONTFIX

Status

Core
Security: Process Sandboxing
RESOLVED WONTFIX
11 months ago
2 months ago

People

(Reporter: Alison Shiue, Unassigned)

Tracking

(Blocks: 2 bugs, {perf, regression, talos-regression})

52 Branch

Firefox Tracking Flags

(firefox49 unaffected, firefox50 unaffected, firefox51 unaffected, firefox52 disabled, firefox53 wontfix, firefox54 ?, firefox55 ?)

Details

(Whiteboard: sb+)

Attachments

(1 attachment)

(Reporter)

Description

11 months ago
Talos has detected a Firefox performance regression from push c838d2546cadd65bf8d5579db20a268c8b6e4b87. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

100%  glterrain summary linux64 pgo e10s         10.9 -> 21.75
 98%  glterrain summary linux64 opt e10s         11.04 -> 21.89
  3%  tp5o summary linux64 opt e10s              354.71 -> 364.99
  2%  tsvgr_opacity summary linux64 opt e10s     431.72 -> 442.06
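
The percentages in the list above can be re-derived from the before/after values. The following is a minimal sketch, not Talos or Perfherder code: the function name `percent_change` and the short test labels are hypothetical, and it assumes the percentage is a plain relative change between the two reported values.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, before, after) copied from the regression list above.
# The labels are shortened for illustration only.
regressions = [
    ("glterrain pgo", 10.9, 21.75),
    ("glterrain opt", 11.04, 21.89),
    ("tp5o opt", 354.71, 364.99),
    ("tsvgr_opacity opt", 431.72, 442.06),
]

for name, before, after in regressions:
    print(f"{name}: {percent_change(before, after):.2f}%")
```

Rounded to whole percentages, these values match the figures reported in the alert.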


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=3656

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
(Reporter)

Comment 1

11 months ago
As a note, Talos has detected a Firefox performance improvement from push 510cf5f0eccabf5c96385a9a11d6f460f8afb227.

Improvements:

 48%  glterrain summary linux64 pgo e10s     21.74 -> 11.34
 48%  glterrain summary linux64 opt e10s     21.82 -> 11.4

For up-to-date information, please refer to: https://treeherder.mozilla.org/perf.html#/alerts?id=3657

Hi Gian-Carlo, I am not sure whether these two pushes are related. As the author of these patches, can you take a look and determine the root cause? Thanks!
Blocks: 1289718, 1302124
Flags: needinfo?(gpascutto)

Updated

11 months ago
status-firefox49: --- → unaffected
status-firefox50: --- → unaffected
status-firefox51: --- → unaffected
Potentially related to bug 1308851.
Flags: needinfo?(gpascutto)
Alison, is the talos regression OK now?
Flags: needinfo?(ashiue)
(Reporter)

Comment 4

11 months ago
I don't think so. Bug 1308851 appears to have fixed only the glterrain regressions.
Both tp5o and tsvgr_opacity still show the perf issue.

Please refer to the perf graphs:
[tp5o]
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bautoland,80984697abf1f1ff2b058e2d9f0b351fd9d12ad9,1,1%5D

[tsvgr_opacity]
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bautoland,3ff68dab85d334ade039992e6d8cd0ebc05cbcf4,1,1%5D
Flags: needinfo?(ashiue)
Hmm, thinking about it, this looks like the regression from bug 1284912 is showing up in more tests: https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c3

Probably because the seccomp-bpf filter is a bit bigger, and there's some additional overhead on filesystem calls now.
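
One rough way to check for per-call overhead on filesystem calls is to time a tight loop of a single call and compare the result across builds or sandbox settings. The sketch below is a generic micro-benchmark, not part of Talos or the sandbox code. The helper name `mean_stat_ns` and the choice of `os.stat` as the probed call are assumptions made for illustration, and a plain Python process is not running under the Firefox seccomp-bpf filter, so this only shows the measurement shape.

```python
import os
import time


def mean_stat_ns(path: str, iterations: int = 100_000) -> float:
    """Average wall-clock cost of one os.stat() call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.stat(path)  # one filesystem syscall per iteration
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    # "." is used only because it always exists; any readable path works.
    print(f"{mean_stat_ns('.'):.0f} ns per stat()")
```

A difference of a few percent on a page-load test would need many such calls per page, so a measurement like this mainly helps judge whether the hypothesis is plausible.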
I'm editing the title to make it clear the outstanding regression is 3%, not 100%.
Summary: 2.39 - 99.58% glterrain / tp5o / tsvgr_opacity (linux64) regression on push c838d2546cadd65bf8d5579db20a268c8b6e4b87 (Fri Oct 7 2016) → 3% tp5o / tsvgr_opacity (linux64) regression on push c838d2546cadd65bf8d5579db20a268c8b6e4b87 (Fri Oct 7 2016)
Component: Untriaged → Security: Process Sandboxing
Product: Firefox → Core

Updated

11 months ago
Whiteboard: sblc2
:gcp, it has been a couple of weeks without an update in this bug, is there any update or plans?
Flags: needinfo?(gpascutto)
(In reply to Joel Maher ( :jmaher) from comment #7)
> :gcp, it has been a couple of weeks without an update in this bug, is there
> any update or plans?

I'm investigating in https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c27.
Flags: needinfo?(gpascutto)
Created attachment 8808748 [details]
seccomperf.txt

I tried to dig in here, working from: https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c27

Which showed: 

tp5o indiatimes.com opt e10s   363.82 ± 0.95%  <  393.27 ± 3.08%   8.10%   7.76 (high)   11 / 11

i.e. an extremely significant 8% performance regression when file brokering is enabled.

I can't reproduce this at all:
- Logging in various ways shows no file IO at all in content, so there is nothing for the broker to do.

- I ran the tp5o test with the manifest edited to cover only indiatimes.com. The performance is identical with seccomp, seccomp+brokering, or everything disabled (logs attached). Given the result above, a performance difference should be easily visible.
which platform are you running on?  By looking at the attachment, I assume linux?

Sometimes we find regressions that only show up on our hardware vs locally.  Here is the hardware we currently run on:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation
(In reply to Joel Maher ( :jmaher) from comment #10)
> which platform are you running on?  By looking at the attachment, I assume
> linux?
> 
> Sometimes we find regressions that only show up on our hardware vs locally. 
> Here is the hardware we currently run on:
> https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

I doubt this is the case. I have an SSD and a bit newer CPU, but those would do little to explain the difference. The GPU can be a factor as we saw in the gl tests, but I actually also have an nvidia card in here (using ubuntu/proprietary drivers).

The most telling thing for me is that I can find no evidence of the file broker activating at all. I added code specifically to dump internal state to /tmp, and it's empty. The content process doesn't seem to be doing any file IO at all. In light of this, identical performance with file brokering enabled or disabled seems like the expected result.

But why is try Talos different?
I really don't know. Right now only the tsvgr_opacity and tp5o regressions remain, both on linux64 e10s. I have seen tsvgr_opacity have issues in the past with odd changes. Have you looked at the network? I know we use localhost, but that still requires networking code.
status-firefox53: --- → affected
status-firefox52: affected → disabled

Updated

7 months ago
Whiteboard: sblc2 → sb+
Alison, Joel, do you think this is still an issue? I'm marking this wontfix for 53 as there hasn't really been any action on the bug and we're about to ship 53.
status-firefox53: affected → wontfix
status-firefox54: --- → ?
status-firefox55: --- → ?
Oddly, the glterrain issue seems to have been fixed a few days later and tsvgr_opacity looks fixed, but the tp5o issue remains. So many changes to the tests happen over time that if we don't fix a regression within a few weeks, it realistically doesn't get fixed.

Updated

2 months ago
Blocks: 1257239
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → WONTFIX