Closed Bug 1284912 Opened 8 years ago Closed 8 years ago

2.88% ts_paint (linux64) regression on push 23140396a80eb27ff586c41fdc1cad62c875c9b1 (Tue Jul 5 2016)

Categories

(Core :: Security: Process Sandboxing, defect)

50 Branch
defect
Not set
normal

Tracking


RESOLVED WONTFIX
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- disabled
firefox53 --- affected

People

(Reporter: rwood, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression, Whiteboard: sblc2)

Talos has detected a Firefox performance regression from push 23140396a80eb27ff586c41fdc1cad62c875c9b1. Since you are the author of one of the patches included in that push, we need your help to address this regression.

Summary of tests that regressed:

  ts_paint linux64 opt e10s - 2.88% worse

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=1708

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Enabling seccomp will cause every system call from the content process to get filtered and run through a decision module, so I guess it's not too surprising there's an impact on performance.

The question is if this is the expected amount. Some googling suggests seccomp-bpf overhead per system call to be on the order of 25% ish for simple filters (most of ours just do Allow()), so having a 3% total slowdown does sound like it may be more than expected?
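A quick way to sanity-check those per-syscall numbers on a given machine is a standalone micro-benchmark like the following minimal sketch (not Mozilla code): it installs a trivial allow-everything filter and times a raw getpid() loop before and after, since even a one-instruction filter is evaluated for every syscall the process makes.

// Minimal sketch (not Mozilla code): time a raw syscall loop before and after
// installing a trivial allow-everything seccomp-bpf filter, to estimate the
// per-syscall filtering overhead on a given machine/kernel.
// Build: g++ -O2 -o seccomp_overhead seccomp_overhead.cpp
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>

static double TimeSyscalls(long iterations) {
  auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i) {
    syscall(SYS_getpid);  // call through syscall() so libc can't cache the result
  }
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

int main() {
  const long kIters = 1000000;
  double before = TimeSyscalls(kIters);

  // The simplest possible filter: allow everything. Even this one-instruction
  // program runs for every syscall the process makes after installation.
  sock_filter filter[] = {
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  sock_fprog prog = {(unsigned short)(sizeof(filter) / sizeof(filter[0])),
                     filter};
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
      prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
    std::perror("prctl");
    return 1;
  }

  double after = TimeSyscalls(kIters);
  std::printf("no filter: %.3fs  with filter: %.3fs  overhead: %.1f%%\n",
              before, after, 100.0 * (after - before) / before);
  return 0;
}

The per-syscall overhead measured this way is an upper bound on what a real workload sees, since a page load spends most of its time doing work other than entering the kernel.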
(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)
> The question is if this is the expected amount. Some googling suggests
> seccomp-bpf overhead per system call to be on the order of 25% ish for
> simple filters (most of ours just do Allow()), so having a 3% total slowdown
> does sound like it may be more than expected?

It's not total slowdown. It's total slowdown on one test out of quite a few, and 3% is barely over our noise level in general (though I trust the report that this specific regression shows enough statistical significance to flag it as such, or else typically it would not have been reported - but that's a guess).
(In reply to Avi Halachmi (:avih) from comment #3)
> It's not total slowdown. It's total slowdown on one test out of quite a few,
> and 3% is barely over our noise level in general (though I trust the report
> that this specific regression shows enough statistical significance to flag
> it as such, or else typically it would not have been reported - but that's a
> guess).

The change in distribution is very obvious if you look at the graph:

https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,619ab4c1115bc476b1b460a67288d28bdc272579,1,1%5D&selected=%5Bmozilla-inbound,619ab4c1115bc476b1b460a67288d28bdc272579,34103,31282178,1%5D
(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)
> The question is if this is the expected amount. Some googling suggests
> seccomp-bpf overhead per system call to be on the order of 25% ish for
> simple filters (most of ours just do Allow())

Note that there's one filter program applied to all syscalls, and the one the Chromium code generates is not exactly simple: it checks that the call is for the correct architecture, then uses a binary decision tree to dispatch on the syscall number (see http://searchfox.org/mozilla-central/search?q=AssembleJumpTable ), and *then* does the compiled version of the bpf_dsl::Allow() or other action.

To see a disassembly of the generated BPF program, run with MOZ_SANDBOX_VERBOSE set in the environment.  It looks like the fastest paths to an “allow” result in the current desktop content policy are about 12 BPF instructions long.
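To make that structure concrete, here is a hand-written illustrative filter (not the actual Chromium-generated program, just the same shape): an architecture check, a dispatch on the syscall number, and then the action.

// Illustrative only -- hand-written, not the generated policy. Shape matches
// the description above: arch check, dispatch on syscall number, then action.
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/syscall.h>

static sock_filter gExampleFilter[] = {
    // Verify the syscall was made for the architecture the policy targets.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, /*jt*/ 1, /*jf*/ 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    // Load the syscall number and dispatch. The generated code uses a binary
    // decision tree here instead of a linear chain, so the depth grows with
    // log2(number of distinct cases) rather than linearly.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read,  /*jt*/ 2, /*jf*/ 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, /*jt*/ 1, /*jf*/ 0),
    // Anything not matched above: trap to the SIGSYS handler.
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
    // The compiled equivalent of bpf_dsl::Allow().
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

With the arch check, a few levels of the dispatch tree, and the return, a fast allow path of roughly a dozen instructions is consistent with the disassembly described above, even for a policy with many syscall cases.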
Flags: needinfo?(julian.r.hector)
Whiteboard: sblc2
I would like to add that I also found out this is the root cause of a 7.66% tabpaint e10s regression on linux64:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,ceb25e8fd81f15ed82526f7e28bcfd1f5f6d49ea,1,1%5D&series=%5Bfx-team,ceb25e8fd81f15ed82526f7e28bcfd1f5f6d49ea,1,1%5D

this is really just a LOT noisier than before, and it seems fairly clear that the change occurred with the same push.
(In reply to Robert Wood [:rwood] from comment #0)
> *** Please let us know your plans within 3 business days, or the offending
> patch(es) will be backed out! ***

The linux sandbox is currently restricted to nightly builds and won't roll out for a while. This bug blocks our next milestone. We don't need to back bug 742434 out unless something shows up that's seriously debilitating for our nightly audience.
is there a bug to track when this will go from nightly -> aurora,etc.?
Flags: needinfo?(jmathies)
(In reply to Joel Maher ( :jmaher ) from comment #8)
> is there a bug to track when this will go from nightly -> aurora,etc.?

Not that I'm aware of. We have multiple milestones to go before we consider rolling out.
Flags: needinfo?(jmathies)
ok, I will ping in this bug once per release cycle to know when I should expect this to roll downstream.
I just did some testing but couldn't really tell a difference; maybe my system is not configured the same way as the Talos machines.

However, Bug 1259508 may be responsible for this 2.88% regression. It causes a fork/execve of a binary, which needs to happen before seccomp is enabled. This action is related to audio and is usually performed on demand, but due to the patch landed in Bug 1259508, it now happens every time a new content process is started.

This could also be the cause of the tabpaint test regression.
thanks Julian for taking a look at this.

Any performance numbers will probably be quite different locally vs. in our automation environment.  I would say roughly 25% of the regressions in automation can be easily reproduced locally.

A few tips for investigating the regressions:
* use try server, collect at least 6 data points
* the tip of trunk changes often (for example, the ts_paint test was modified yesterday!!); remember when working on this to collect additional data from the base revision you are testing against on try/locally.  This is usually done with the retrigger job action on treeherder (click on the job letter, hit the 'r' key)
* use the compare feature in perfherder: https://treeherder.mozilla.org/perf.html#/compare
* we have spsProfiling options on try server; this is a good way to use the built-in sampling profiler to collect data before/after a patch to get an idea of where time is being spent.

and of course, just ask for help in the bug or on irc :)
So this is a wontfix for 50 (now Aurora), right?
Flags: needinfo?(jmathies)
(In reply to Andrew Overholt [:overholt] from comment #13)
> So this is a wontfix for 50 (now Aurora), right?

seccomp is restricted to Nightly so this isn't riding the trains.
Flags: needinfo?(jmathies)
tracking on Firefox 51 for now- will adjust as needed if we don't ride the trains in v.51
Blocks: 1291351
No longer blocks: 1280481
:jimm, will we be riding the trains with this?  or will we need to track this on Firefox 52?
Flags: needinfo?(jmathies)
(In reply to Joel Maher ( :jmaher) from comment #16)
> :jimm, will we be riding the trains with this?  or will we need to track
> this on Firefox 52?

Linux sandbox isn't riding in 51.
Flags: needinfo?(jmathies)
Blocks: 1302124
No longer blocks: 1291351
Hi :jmaher,
Per comment #17, does that mean we won't fix in 51?
Flags: needinfo?(jmaher)
Because 51 is now Aurora it is no longer affected. The Linux sandbox is enabled based on Nightly flags.
Flags: needinfo?(jmaher)
If this is Nightly-only, will it ever go down the trains?  I would prefer to track this on the latest version and, once it rides the trains, leave it alone.
You can assume for now that it will ride in 52.
great, we are all set then
:gcp, can you confirm this is riding the 52 train?
Flags: needinfo?(gpascutto)
(In reply to Joel Maher ( :jmaher) from comment #23)
> :gcp, can you confirm this is riding the 52 train?

It's currently only enabled on Nightly still: https://dxr.mozilla.org/mozilla-aurora/rev/88cbb2726969ee5a4204ae9df3025c39c864de15/old-configure.in#4046

I think shipping the patches that caused this doesn't gain much without also adding the stuff that caused bug 1310119 and has at least one serious functional regression (no printing, bug 1309205).

So it depends on how quickly the latter can be fixed and how we feel about the perf vs. security impact.
Flags: needinfo?(gpascutto)
I took another look at this and tested my theory on try. It turns out that the early initialization of Cubeb doesn't cause the regression; at least, I don't see an improvement if that early init is removed.

Here is the comparison:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c22b272ce18ebf6191bbae12292e5edc9b0caf9e&newProject=try&newRevision=04de1ff6a949c6d42cb393607c1a1bd61dc2c17a&framework=1&showOnlyImportant=0

I don't think the seccomp filter itself causes the regression, rather the setup code found here:
http://searchfox.org/mozilla-central/rev/4012e61cc38758ffa1f7ce26039173f67b048853/security/sandbox/linux/Sandbox.cpp#461

But at the same time, I don't know the best way to test how much of a performance impact the policy compilation causes.
I am assuming that the compilation may be the cause of the regression because, as far as I know, ts_paint measures start-up time and that code is triggered during startup.
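One rough way to put a number on that would be to time the compile-and-install path in the content process and log it. The sketch below is hypothetical instrumentation: InstallSyscallFilter() is a stand-in for whatever the real policy compilation + filter installation code in Sandbox.cpp is, not an actual function name from the tree.

// Hypothetical instrumentation sketch -- InstallSyscallFilter() is a stand-in
// for the real policy compilation + installation path in Sandbox.cpp, not an
// actual function name from the tree.
#include <chrono>
#include <cstdio>

void InstallSyscallFilter();  // stand-in for the real setup code

void TimedInstallSyscallFilter() {
  auto start = std::chrono::steady_clock::now();
  InstallSyscallFilter();
  std::chrono::duration<double, std::milli> elapsed =
      std::chrono::steady_clock::now() - start;
  // If this is in the tens of milliseconds per content process launch, it
  // could plausibly show up in a startup-time test like ts_paint.
  std::fprintf(stderr, "seccomp policy compile+install: %.2f ms\n",
               elapsed.count());
}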
I did a bunch of pushes of the following style: try: -b o -p linux64,linux -u none -t all --rebuild-talos 10

This is inbound from last week (2938e1861662) versus the same revision with the filesystem broker patches backed out (bug 1289718) as e41eb4be5706. 
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e41eb4be5706&newProject=try&newRevision=2938e1861662&framework=1&showOnlyImportant=0&showOnlyConfident=1

This is the above branch with the broker patched out (e41eb4be5706) versus the same branch with all of the content sandbox stuff backed out (40e55de76d0f):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=40e55de76d0f&newProject=try&newRevision=e41eb4be5706&framework=1&showOnlyImportant=0&showOnlyConfident=1

The pairs are displayed in order of original landing (so reverse order of backing out).

The last pair mimics what this bug should have seen (no content sandbox vs basic-seccomp-but-no-filesystem-broker), but as you can see the results are a bit different: except for glterrain, the regression is only for a few tests on the order of 1%-2%. Notably, ts_paint looks like this now:
linux64: 1504.45 ± 0.75% → 1501.82 ± 0.76%, -0.18%, confidence 0.54 (low), 11/11 runs

So that one is gone.

tabpaint is still there, though:
linux64: 69.90 ± 3.59% → 72.30 ± 2.07%, +3.43%, confidence 2.72 (low), 11/11 runs

But the exact regression amount on that one is tricky because of the high noise. glterrain is probably real: we've already seen the sandbox is prone to break WebGL so it's reasonable to suspect a lot of driver syscalls are now having some overhead.

I think that regression (1-2% max) is something we're willing to ship.

The regressions in the 2nd pair are more serious. They're as high as 3% on top of the first one. Some of the subtests are very clear:

tp5o indiatimes.com opt e10s: 363.82 ± 0.95% → 393.27 ± 3.08%, +8.10%, confidence 7.76 (high), 11/11 runs

I think that running one of these locally with logging will reveal what is going on.
thanks for the great details on this!
(In reply to Gian-Carlo Pascutto [:gcp] from comment #27)
> I think that running one of these locally with logging will reveal what is
> going on.

Curiously, I see zero activity whatsoever in the file broker while these tests run. I'm still digging in.
This isn't riding the 52 train, correct?
Flags: needinfo?(gpascutto)
Content Sandboxing is disabled on anything not Nightly.

We might make a call to let seccomp filtering (<2% regression) ride the trains so we have some advance warning of any compatibility impact.

File access brokering (regression unclear, none locally, up to 9% on talos) needs a fix for printing before it can be considered. But I'm working on this and if I get a patch quickly we may consider uplifting it.
Flags: needinfo?(gpascutto)
I spent time on a talos linux64 loaner investigating this. Specifically, I've been running

./mach talos-test --suite tp5o

with a tp5o.manifest containing only indiatimes.com. That's based off of:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=e41eb4be5706&newProject=try&newRevision=2938e1861662&originalSignature=80984697abf1f1ff2b058e2d9f0b351fd9d12ad9&newSignature=80984697abf1f1ff2b058e2d9f0b351fd9d12ad9&framework=1

showing that to be a consistent 8% regression with file brokering.

I see no significant performance difference between sandboxing on/off, file brokering on/off, or compiling the sandboxing code out entirely. My results are consistently of the form:

12:17:12     INFO -  PERFHERDER_DATA: {"framework": {"name": "talos"}, "suites": [{"lowerIsBetter": true, "subtests": [{"name": "http:", "lowerIsBetter": true, "alertThreshold": 2.0, "replicates": [638.0, 438.0, 414.0, 407.0, 402.0, 390.0, 381.0, 417.0, 393.0, 375.0, 420.0, 519.0, 398.0, 426.0, 387.0, 385.0, 416.0, 393.0, 400.0, 424.0, 396.0, 380.0, 472.0, 384.0, 402.0], "value": 397.0, "unit": "ms"}], "name": "tp5o", "alertThreshold": 2.0}, {"subtests": [{"name": "Main_RSS", "value": 219456748.30769232}, {"name": "Private Bytes", "value": 572639846.4}, {"name": "XRes", "value": 1419659.6}, {"name": "responsiveness", "value": 0.9654507564067291}], "name": "tp5o"}]}

i.e. about 400ms average. Referring to the previous treeherder comparison, that's the "slow" side of results.

I also went back to 3e9a2031152f, which is the commit before file brokering: same result. Going back to c676d55b6b00 (just before seccomp at all) failed to build and I didn't manage to work around the breakage easily.

The only useful thing that came out of this investigation is the realization that Talos is running on Ubuntu 12.04, using Linux 3.2 kernels. These are old. They're old enough that they're missing a bunch of seccomp improvements and optimizations, most notably:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8f577cadf7181243d336be9aba40c1bcc02c4c54

the JIT for seccomp-bpf. So this means that any performance regression on Talos from seccomp would be rather exaggerated. That is, if I could see the regression. But I don't.
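For reference, here is a quick check of whether a given machine could JIT seccomp filters at all. This is a sketch built on the assumption that the JIT is gated by the usual net.core.bpf_jit_enable sysctl in addition to requiring a kernel with the commit linked above (3.16 or later); Talos's 3.2 kernels fall back to interpreting every filter.

// Sketch, based on the assumptions above: report the kernel version and the
// bpf_jit_enable sysctl. A 3.2 kernel interprets seccomp filters regardless
// of the sysctl value.
#include <sys/utsname.h>
#include <cstdio>

int main() {
  utsname u{};
  uname(&u);
  std::printf("kernel: %s\n", u.release);  // Talos machines report 3.2.x here

  std::FILE* f = std::fopen("/proc/sys/net/core/bpf_jit_enable", "r");
  if (!f) {
    std::printf("bpf_jit_enable sysctl not present (no BPF JIT in this kernel config)\n");
    return 0;
  }
  int enabled = 0;
  if (std::fscanf(f, "%d", &enabled) == 1) {
    std::printf("bpf_jit_enable = %d (0 means filters are interpreted)\n", enabled);
  }
  std::fclose(f);
  return 0;
}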
Pushes based on current m-i, 6 talos iterations:

Regular vs Seccomp-bpf:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4392a3797247&newProject=try&newRevision=e2124c2277e7&framework=1&showOnlyConfident=1

Regular vs Seccomp-bpf+file broker:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4392a3797247&newProject=try&newRevision=d36030c58be5&framework=1&showOnlyConfident=1

Note the variance between the previous tries, i.e. the 9% tp5o regression on indiatimes.com is suddenly only a 5% one, although nothing in the sandbox changed! And again, from the logging I see that the file broker isn't actually activated at all by this test... I don't understand why Talos has so much variance here.

Summarizing the investigation so far:

The maximum regression is <3%, and it occurs only on a limited number of tests, with maybe a bit more on one or two that use the GPU (which is expected). As explained in the previous comment, even this is likely a severe overestimation for the majority of our users (Ubuntu 14.04 and Debian 8 have >3.16 kernels with the seccomp JIT, unlike Talos). Given the limited regression, the likelihood of a substantially smaller regression for real users, and the security benefits, I believe we should ship with this.

I will unblock the seccomp sandbox for Firefox 52 and enable it on Aurora.
:gcp, thank you for your detailed work on this.  As to why you had trouble reproducing the issue on a loaner, I have no answers- usually you can.  In this case, knowing that we cannot reproduce it easily on a loaner, that the OS on our test machines has a very old kernel (i.e. very limited exposure), and that this might not be that severe given the noise or future changes to Firefox, your assessment of accepting this is right on.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
(In reply to Joel Maher ( :jmaher) from comment #34)
> :gcp, thank you for your detailed work on this.  As to why you had trouble
> reproducing the issue on a loaner, I have no answers- usually you can. 

Note that I was looking for an 8-9% regression with very high confidence from previous pushes, while my new Talos push landed on <5% with medium confidence (on the same tp5o indiatimes test with the same sandbox settings). It's possible that with enough runs of that test, and careful averaging, I'd have managed to confirm a smaller regression with even less confidence. That wouldn't have helped me pinpoint where it came from, though; that would've been much easier if the regression were reproducibly the 9% I was looking for.

But the conclusion is: it seems very unlikely the browser actually became 9% slower on that site; what is more likely is that it became 1-2% slower and the rest is confusion due to (surprisingly high?) noise between the test runs.