Closed
Bug 1254684
Opened 6 years ago
Closed 6 years ago
speed-battle.com reports a 10% performance regression in Render and Storage performance in 46.0b1 versus 45.0, Peacekeeper reports nearly the same (9.8% perf regression)
Categories
(Core :: General, defect)
Tracking
RESOLVED
INCOMPLETE
People
(Reporter: thee.chicago.wolf, Unassigned)
References
Details
(Keywords: perf)
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0 SeaMonkey/2.42
Build ID: 20160226091329

Steps to reproduce:
1. Visit http://www.speed-battle.com using Firefox 45.0 final. Repeat the test a few times to attain a high score.
2. Install and run Firefox 46.0b1.
3. Repeat the same steps as for Firefox 45.0.

Actual results:
The average score has dropped 9.34% in Render performance and 10% in Storage performance on my i5-3470 @ 8GB DDR 1866 with a 512MB NVIDIA GeForce 210. Calculate performance has remained about the same.

Expected results:
There should not be a drop in performance. Performance should stay the same or slightly increase with a newer Firefox version.
Reporter
Comment 1•6 years ago
Some screen grabs of the difference in performance numbers.
Reporter
Updated•6 years ago
Summary: speed-battle.com reports a 10% performance regression in Render and Storage performance regression in 46.0b1 versus 45.0 → speed-battle.com reports a 10% performance regression in Render and Storage performance in 46.0b1 versus 45.0
Comment 2•6 years ago
I'm not sure this online test can be trusted: I obtain highly variable results for the same version of Firefox, so results for 45 and 46 can't be compared, IMHO.
Reporter
Comment 3•6 years ago
(In reply to Loic from comment #2)
> I'm not sure this online test can be trusted, I obtain many variable results
> for a same version of Firefox, so results for 45 and 46 can't compared
> between them, IMHO.

It's a good thermometer nonetheless. However, no one disputes Peacekeeper (http://peacekeeper.futuremark.com/) scores. Here are mine on the same hardware:

45.0: 4957
46.0b1, build 3: 4505, or 9.12% lower
46.0b1, build 4: 4437, or 10.49% lower

Those lower scores average out to a 9.8% perf regression, consistent with what Speed Battle is reporting. So yes, there is a perf regression in 46.
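As a sanity check, the percentages quoted in this comment follow directly from the raw scores. A minimal sketch (the `pctDrop` helper is made up for illustration, not part of any benchmark harness):

```javascript
// Sketch: reproduce the regression percentages quoted in this comment.
// pctDrop is a hypothetical helper, not from any real tool.
function pctDrop(baseline, score) {
  return ((baseline - score) / baseline) * 100;
}

const build3 = pctDrop(4957, 4505); // ~9.12% lower
const build4 = pctDrop(4957, 4437); // ~10.49% lower
const average = (build3 + build4) / 2; // ~9.8% average regression

console.log(build3.toFixed(2), build4.toFixed(2), average.toFixed(1));
```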
Reporter
Updated•6 years ago
Summary: speed-battle.com reports a 10% performance regression in Render and Storage performance in 46.0b1 versus 45.0 → speed-battle.com reports a 10% performance regression in Render and Storage performance in 46.0b1 versus 45.0, Peacekeeper reports nearly the same (9.8% perf regression)
Comment 4•6 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160303134406

I ran these tests on Firefox 45 release and Firefox 46 beta. On http://www.speed-battle.com I received different results for the same Firefox version. However, the average results show Firefox 45 performing better than Firefox 46.0b1. The http://peacekeeper.futuremark.com/ scores also show Firefox 45 performing better than Firefox 46.0b1:

45.0: 4559
46.0b1, build 3: 4254
46.0b1, build 4: 4191
46.0b1, build 8: 4141
Status: UNCONFIRMED → NEW
Component: Untriaged → Performance Monitoring
Ever confirmed: true
Keywords: perf
Product: Firefox → Toolkit
Reporter
Comment 5•6 years ago
To try and find a regression range, I went back to some older builds to test. Here's what I found:

Last good build: 2016010730235, scored 5059 on Peacekeeper
First bad build: 2016010803034, scored 4716 on Peacekeeper

Of note, build 20160101030330 scored 5249 on Peacekeeper. Hopefully with that info someone else can confirm and do some granular regression testing to figure out the bad change.
Reporter
Comment 6•6 years ago
So I played with the mozregression tool (first time) but didn't get too far. My Jedi skills lack in this area. However, I did manage to get the pushlog from 1/7 to 1/8: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2016-01-07&enddate=2016-01-08

Just glancing through, nothing jumps out at me.
Reporter
Comment 7•6 years ago
Testing with 46.0b4 shows perf went down even further on Peacekeeper, with a score of 4141. I thought it was an error, but on a second run it scored 4199. Averaging those scores gives 4170. Compared to my original score of 4957 on 45.0, it would appear perf has now gone down 15.877%. This is a rather alarming decrease and indicates a negative trend.
Comment 8•6 years ago
Arthur, in your beta build could you check about:support and see if you were part of the e10s experiment? We're shipping e10s to nightly/aurora and a subset of beta.
Component: Performance Monitoring → Untriaged
Flags: needinfo?(thee.chicago.wolf)
Product: Toolkit → Core
Comment 9•6 years ago
chutten, could you try to reproduce this with e10s and non-e10s, and if you can reproduce, try to generate some comparative profiles?
tracking-firefox46: --- → ?
tracking-firefox47: --- → ?
tracking-firefox48: --- → ?
Flags: needinfo?(chutten)
Comment 10•6 years ago
Latest Nightly (48.0a1)
e10s - 92.04 949.8 99.9 1141.74
!e10s - 90.02 963.76 99.9 1153.68

Latest DevEd (47.0a2)
e10s - 91.02 949.8 106.39 1147.21
!e10s - 94.16 936.23 109.23 1139.62

In other words, e10s doesn't seem to have much/any effect on the stated numbers.

Also, I noted that individual runs can range anywhere from the 700s to nearly 1200. I tried my best to pick a "representative" number, but humans aren't good at that. Regardless, I'm not sure speed-battle is a rigorous measure.
Flags: needinfo?(chutten)
Reporter
Comment 11•6 years ago
(In reply to Chris H-C :chutten from comment #10)
> Latest Nightly (48.0a1)
> e10s - 92.04 949.8 99.9 1141.74
> !e10s - 90.02 963.76 99.9 1153.68
>
> Latest DevEd (47.0a2)
> e10s - 91.02 949.8 106.39 1147.21
> !e10s - 94.16 936.23 109.23 1139.62
>
> In other words, e10s doesn't seem to have much/any effect on the stated
> numbers.
>
> Also, I noted that individual runs can range anywhere from 700s to nearly
> 1200. I tried my best to pick a "representative" number, but humans aren't
> good at that. Regardless, I'm not sure speed-battle is a rigorous measure.

Peacekeeper is more representative than Speed Battle IMO.
Flags: needinfo?(thee.chicago.wolf)
Reporter
Comment 12•6 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #8)
> Arthur in your beta build could you check about:support and see if you were
> part of the e10s experiment? We're shipping e10s to nightly/aurora and a
> subset of beta.

Meaning:

Multi-process staged rollout 1.0 true e10srollout@mozilla.org

or:

Multi-process Firefox A/B Test 46.1 e10s-beta46-noapz@experiments.mozilla.org Measure the effect of multi-process Firefox false 1458647505079 control-addon-user
Comment 13•6 years ago
I thought the bug was about speed-battle, not peacekeeper. Maybe it's about the regression. Sure, I'll give a quick e10s/!e10s comparison of Peacekeeper.

Of course, none of these or my earlier numbers are statistically significant in any way (no repeated trials, the hardware running the test wasn't dedicated to the test (i.e., I was doing other things while running the tests), AFAICT the test hasn't been vetted...), so they are illustrative.

Nightly 48.0a1 2016-03-22
e10s 3936
!e10s 3916

I think this regression is not related to e10s.

It sounds as though there's some evidence here for a potential 45/46 regression, though I wonder how much of it can be attributed to build differences between Beta and Release. Anyone have a copy of 45beta kicking around for an apples-to-apples test?
Comment 14•6 years ago
There aren't real build differences between beta and release (releases are basically just promoted beta builds). But there are differences in behavior because telemetry (and experiments) are enabled by default when you're on the beta channel.

In any case, I'm looking for a recommendation of next steps: see if you can reproduce an obvious problem (e10s-related or a standard regression) and what next steps should be. You can of course find old beta builds at https://archive.mozilla.org/pub/firefox/releases/
Flags: needinfo?(chutten)
Reporter
Comment 15•6 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #14)
> There aren't real build differences between beta and release (releases are
> basically just promoted beta builds). But there are differences in behavior
> because telemetry (and experiments) are enabled by default when you're on
> the beta channel.
>
> In any case, I'm looking for a recommendation of next steps: see if you can
> reproduce an obvious problem (e10s-related or a standard regression) and
> what next steps should be. You can of course find old beta builds at
> https://archive.mozilla.org/pub/firefox/releases/

So... the below is not useful?

Last good build: 2016010730235, scored 5059 on Peacekeeper
First bad build: 2016010803034, scored 4716 on Peacekeeper
Comment 16•6 years ago
(In reply to Arthur K. from comment #15)
> So...the below is not useful?
>
> Last good build: 2016010730235, scored 5059 on Peacekeeper
> First bad build: 2016010803034, scored 4716 on Peacekeeper

Regression ranges are always useful. Thank you very much!

Unfortunately, I'm not having such luck with Peacekeeper. In trying to nail down relevant changesets, I tried getting a regression range on the aurora channel. Three runs of a single build had scores that differed by 1100 points (mozilla-aurora 2016-01-22 (41159a2) gave me results of 3768, 4590, and 4857, for instance). I can't nail down a 10% regression if the numbers vary by at least 20% without generating a statistical population of results.

At 5 minutes for each result, generating enough for statistics is going to take more time than I'd like.

Could you help me by telling me how you came by those builds and those scores? Were they just one-off, or were they stable across multiple runs of the benchmark? Did you build these yourself (and off of which channel)?
Flags: needinfo?(chutten)
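The noise argument in comment 16 can be made concrete. A minimal sketch using the three aurora scores quoted there (variable names are made up; this is just the arithmetic, not a benchmarking tool):

```javascript
// Sketch: run-to-run spread of the three mozilla-aurora 2016-01-22
// Peacekeeper scores quoted above, versus the ~10% regression under
// investigation.
const runs = [3768, 4590, 4857];

const mean = runs.reduce((a, b) => a + b, 0) / runs.length; // 4405
const spread = (Math.max(...runs) - Math.min(...runs)) / mean; // ~0.247

// A ~25% run-to-run spread swamps a ~10% effect unless you collect
// many repeated trials and compare distributions, not single scores.
console.log(mean, (spread * 100).toFixed(1) + "%");
```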
Updated•6 years ago
Flags: needinfo?(thee.chicago.wolf)
Reporter
Comment 17•6 years ago
(In reply to Chris H-C :chutten from comment #16)
> (In reply to Arthur K. from comment #15)
> > So...the below is not useful?
> >
> > Last good build: 2016010730235, scored 5059 on Peacekeeper
> > First bad build: 2016010803034, scored 4716 on Peacekeeper
>
> Regression ranges are always useful. Thank you very much!
>
> Unfortunately, I'm not having such luck with Peacekeeper. In trying to nail
> down relevant changesets, I tried getting a regression range on the aurora
> channel. Three runs of a single build had scores that differed by 1100
> points (mozilla-aurora 2016-01-22 (41159a2) gave me results of 3768 4590 and
> 4857, for instance). I can't nail down a 10% regression if the numbers vary
> at least 20% without generating a statistical population of results.
>
> At 5 minutes for each result, generating enough for statistics is going to
> take more time than I'd like.

I find that with Peacekeeper my scores are within a few percent of each other. I tend to do a clean reboot and run it a few times with no other apps open.

> Could you help me by telling me how you came by those builds and those
> scores? Were they just one-off, or were they stable across multiple runs of
> the benchmark? Did you build these yourself (and off of which channel?)?

The specific builds I used to find last good/first bad came from http://ftp.mozilla.org/pub/firefox/nightly/2016/01/2016-01-07-03-02-35-mozilla-central/ and http://ftp.mozilla.org/pub/firefox/nightly/2016/01/2016-01-08-03-03-34-mozilla-central/ respectively.
Flags: needinfo?(thee.chicago.wolf)
Comment 18•6 years ago
Awesome, thanks!

For me, ./mach mozregression --good 2016-01-07 --bad 2016-01-08 eventually points me to bug 1205288, which doesn't immediately look like something that ought to cause a regression (maybe checking the method early on in a blob request counts towards the bench whereas, before, it wouldn't have?).

I'm going to see if reverting that change actually has an effect.
Comment 19•6 years ago
No obvious effect. But, then, I wasn't really expecting one. For the first time, mozregression has failed me.

The changelog from the builds you cite is here: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e0bcd16e1d4b99ba3e542149d0d41e0f60c54b5c&tochange=1424cdfc075d1b7e277be914488ac73e20d1c982

It isn't short. Nothing in particular jumps out at me as an obvious candidate.

Passing those revs into mozregression, I get (3956, 3612, 3818) for 20160107032110 and (3955, 3916, 3891) for 20160108062355.

I don't think I have the appropriate hardware to test this regression. Have you used mozregression[0] before? It helps bisect down to the change that introduced a regression. If you have a build environment handy, you can try running

./mach mozregression --good e0bcd16e1 --bad 1424cdfc075

(those are the SHAs for the good and bad builds you supplied earlier)

[0]: http://mozilla.github.io/mozregression/
Flags: needinfo?(thee.chicago.wolf)
Reporter
Comment 20•6 years ago
> I don't think I have the appropriate hardware to test this regression. Have
> you used mozregression[0] before? It helps bisect down to the change that
> introduced a regression. If you have a build environment handy, you can try
> running
>
> ./mach mozregression --good e0bcd16e1 --bad 1424cdfc075
>
> (those are the SHAs for the good and bad builds you supplied earlier)
>
> [0]: http://mozilla.github.io/mozregression/
I tried installing the mozregression GUI but didn't come up with much. Not sure if it requires my hardware to repro, as it seems Liviu was seeing the same perf regression I am seeing. Did you also test with 45.0 and 46.0b1?
Flags: needinfo?(thee.chicago.wolf)
Comment 21•6 years ago
Based on the comments in the bug, we have not been able to confirm this perf slowdown due to the variance Chris has seen across multiple runs. It is still too early to track this for 47. Liz, just FYI, there is a claim that 46 is 10% worse in perf.
Flags: needinfo?(lhenry)
Comment 22•6 years ago
Will, I am flagging this one for your attention too. Can somebody from the Talos team take a look at the perf regression being mentioned here? There is a notable difference between 45.0 and 46.0b1. Any help here is appreciated.
Flags: needinfo?(wlachance)
Comment 23•6 years ago
All the unresolved Talos issues in Firefox 46 are tracked in bug 1233209. I don't see anything offhand that would be related to this. That said, we don't have that many benchmarks like this in Talos; we tend to leave that to the JS team with arewefastyet: https://arewefastyet.com/ ... but I don't see any large changes there either.
Flags: needinfo?(wlachance)
Comment 24•6 years ago
Running mozregression also failed. I was in a 1-2 day regression range when it diverted to taskcluster and things got weird. E.g.:

> $ mozregression --good=2016-01-06 --bad=2016-01-10
> 0:04.37 INFO: Testing good and bad builds to ensure that they are really good and bad...
> 0:04.38 INFO: Downloading build from: https://archive.mozilla.org/pub/firefox/nightly/2016/01/2016-01-06-03-02-25-mozilla-central/firefox-46.0a1.en-US.linux-x86_64.tar.bz2
> ===== Downloaded 100% =====
> 0:13.32 INFO: Running mozilla-central build for 2016-01-06
> 0:25.89 INFO: Launching /tmp/tmp0oDJF2/firefox/firefox
> 0:25.90 INFO: application_buildid: 20160106030225
> 0:25.90 INFO: application_changeset: 9d6ffc7a08b6b47056eefe1e652710a3849adbf7
> 0:25.90 INFO: application_name: Firefox
> 0:25.90 INFO: application_repository: https://hg.mozilla.org/mozilla-central
> 0:25.90 INFO: application_version: 46.0a1
> Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry' or 'exit' and press Enter):

Score: 4000

> $ mozregression --repo=mozilla-central --launch 9d6ffc7a08b6b47056eefe1e652710a3849adbf7
> 0:03.37 INFO: Downloading build from: https://queue.taskcluster.net/v1/task/dJ5fBPeGRJiGVOIUpFc8LQ/runs/0/artifacts/public%2Fbuild%2Ffirefox-46.0a1.en-US.linux-x86_64.tar.bz2
> ===== Downloaded 100% =====
> 0:20.05 INFO: Running mozilla-central build built on 2016-01-06 01:43:50.916000, revision 9d6ffc7a
> 0:31.66 INFO: Launching /tmp/tmpm2QOOp/firefox/firefox
> 0:31.66 INFO: application_buildid: 20160105163504
> 0:31.66 INFO: application_changeset: 9d6ffc7a08b6b47056eefe1e652710a3849adbf7
> 0:31.66 INFO: application_name: Firefox
> 0:31.66 INFO: application_repository: https://hg.mozilla.org/mozilla-central
> 0:31.66 INFO: application_version: 46.0a1

Score: 3400

Do the builds differ, and how?
Comment 25•6 years ago
Hannes, can you take on the investigation of this bug from here? I'm a little worried about it for the 46 release, and want to make sure we figure out how serious the issue is, and whether there's anything we can do about it for 46. For now I'll track the bug and consider it affected for 46.

If there is any help you need from QE, maybe Florin's team can contribute or do some testing.
status-firefox46: --- → affected
Flags: needinfo?(lhenry)
Flags: needinfo?(hv1989)
Flags: needinfo?(florin.mezei)
Comment 26•6 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25)
> Hannes, can you take on the investigation of this bug from here? I'm a
> little worried about it for the 46 release, and want to make sure we figure
> out how serious the issue is, and if there's anything we can do about it for
> 46. For now I'll track the bug and consider it affected for 46.
>
> If there is any help you need from QE, maybe Florin's team can contribute or
> do some testing.

The issue is tracked in bug 1260435. Bug 1207922 is the cause of it. There we improved the performance and maintainability of the regexp test and exec functions.

What we are seeing here is an extra optimization that doesn't happen anymore due to the refactor. This optimization was aimed at speeding up creation of a regexp in a loop. In particular, we had to do this for benchmark scores (Octane regexp). In tight loops with an easy/fast regexp, the creation of the regexp takes the most time, so the optimization tried to hoist it outside the loop. The refactor accidentally disabled this on most regexp creations, and even with this optimization disabled the refactor was a net win on Octane regexp.

I don't think we need to track this for beta/aurora/etc. On AWFY, which tests the 'official' benchmarks, we haven't seen a regression; Peacekeeper isn't one of them. (Peacekeeper itself is outdated, according to the benchmark itself.) The regressions on the two subtests (this fix also improved two tests) are micro-benchmarks.

We will fix the issue, since it will improve octane-regexp also. But I don't think backports and tracking are needed.
Flags: needinfo?(hv1989)
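The pattern the hoisting optimization targets can be illustrated with a small sketch (function names and inputs are made up for illustration; this is the general shape of the code, not the actual Peacekeeper or Octane source): the two forms below compute the same result, but the second avoids constructing a RegExp on every iteration, which is what the engine's hoisting optimization tries to achieve automatically for the first form.

```javascript
// Sketch of the loop shape the regexp-hoisting optimization targets.
// Names and inputs are illustrative, not from any real benchmark.
function countMatchesInLoop(inputs) {
  let hits = 0;
  for (const s of inputs) {
    // A fresh RegExp is conceptually created on every iteration...
    if (new RegExp("ab+c").test(s)) hits++;
  }
  return hits;
}

function countMatchesHoisted(inputs) {
  // ...whereas here it is constructed once, outside the loop.
  const re = /ab+c/;
  let hits = 0;
  for (const s of inputs) {
    if (re.test(s)) hits++;
  }
  return hits;
}
```

Both return identical counts; when the optimization works, the first form performs like the second, and per comment 26 the bug 1207922 refactor accidentally stopped applying it in most cases.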
Comment 27•6 years ago
Sounds like there's nothing more for QA to do here, as this is known and tracked separately.

Arthur, I guess you can retry getting some measurements when bug 1260435 is fixed.
Flags: needinfo?(florin.mezei)
Reporter
Comment 28•6 years ago
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #27)
> Sounds like there's nothing more for QA to do here as this is known and
> tracked separately.
>
> Arthur, I guess you can retry getting some measurements when bug 1260435 is
> fixed.

Will keep an eye on it after bug 1260435 is fixed. Thanks all for investigating. I know Peacekeeper's not being maintained anymore by the author, but it's still useful. Well, at least for me. =/
Comment 29•6 years ago
In the interest of clarity, we're not going to track this generic issue by itself, so I'm going to close this bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
tracking-firefox48: ? → ---
Comment 30•5 years ago
Moving from Core::Untriaged to Core::General https://bugzilla.mozilla.org/show_bug.cgi?id=1407598
Component: Untriaged → General