Bug 993084 (Closed)
Opened 11 years ago • Closed 11 years ago
14% regression in ts paint on linux64 seen on fx-team
Categories
(Firefox Health Report Graveyard :: Client: Desktop, defect)
Tracking
(firefox30 unaffected, firefox31- verified)
VERIFIED
FIXED
Firefox 31
Release | Tracking | Status
---|---|---
firefox30 | --- | unaffected
firefox31 | - | verified
People
(Reporter: jmaher, Assigned: gfritzsche)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression] p=5 s=it-31c-30a-29b.3 [qa-])
Attachments (5 files)
1. patch, 2.04 KB, gps: review+ (Details | Diff | Splinter Review)
2. patch, 2.51 KB, benjamin: review+ (Details | Diff | Splinter Review)
3. patch, 8.36 KB, benjamin: review+ (Details | Diff | Splinter Review)
4. patch, 8.59 KB, benjamin: review+ (Details | Diff | Splinter Review)
5. patch, 1.97 KB, benjamin: review+ (Details | Diff | Splinter Review)
here is a view of the graph:
http://graphs.mozilla.org/graph.html#tests=[[83,132,35]]&sel=none&displayrange=7&datatype=running
here is a view on tbpl, where I did some retriggers:
https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=10a99a32304b&tochange=d22244fa46b4&jobname=Ubuntu%20HW%2012.04%20x64%20fx-team%20talos%20other
Unfortunately there were a few pushes that didn't run tests due to a bad push prior. So we are missing details on 5 changesets.
If anybody has ideas of why linux 64 would have a large hit on ts (launching the browser), I would appreciate learning more about it.
Comment 1•11 years ago
Bug 989761 is not the culprit - it only messed with Windows Classic CSS.
Comment 2•11 years ago
Looking at the graph, it looks like the regression was introduced somewhere on fx-team between 57a393d69f32 and ddc22f087bec.
http://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=57a393d69f32&tochange=ddc22f087bec
Comment 3•11 years ago
I'd also suspect that bug 989609 is not the culprit, since that patch was backed out.
Comment 4•11 years ago
(In reply to Joel Maher (:jmaher) from comment #0)
> here is a view of the graph:
> http://graphs.mozilla.org/graph.html#tests=[[83,132,35]]&sel=none&displayrange=7&datatype=running
>
> here is a view on tbpl, where I did some retriggers:
> https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=10a99a32304b&tochange=d22244fa46b4&jobname=Ubuntu%20HW%2012.04%20x64%20fx-team%20talos%20other
>
> Unfortunately there were a few pushes that didn't run tests due to a bad push prior. So we are missing details on 5 changesets.
>
> If anybody has ideas of why linux 64 would have a large hit on ts (launching the browser), I would appreciate learning more about it.
Nothing else really jumps out at me. :/
Bisecting might be our only choice here, unless somebody else has another suggestion.
Reporter
Updated•11 years ago
Reporter
Comment 5•11 years ago
That leaves a few suspects; I will wait until tomorrow to see if the original patch authors have ideas about what might be the problem.
Then a series of try pushes with backouts sounds like a good idea :)
Reporter
Updated•11 years ago
Summary: 14% regression in ts paint on linux64 seen on fx-team → 14% regression in ts paint on linux 32/64 seen on fx-team
Comment 6•11 years ago
Does anyone know if there are configs for the talos run outside of mozilla-central which might affect whether telemetry is enabled during this test?
The two most-likely pushes in this range are:
Benjamin Smedberg — Bug 986582 - Get rid of the toolkit.telemetry.enabledPreRelease pref and make the toolkit.telemetry.enabled pref do the right thing for beta users who are testing a final release build, r=rnewman
Benjamin Smedberg — Bug 992208 - Add Telemetry Experiments to the package so that they are actually used. Also a basic test that the service exists and can be created. r=gfritzsche
Of those I'd be pretty surprised if bug 992208 were at fault here, and mis-configured telemetry seems more likely.
Although if telemetry being enabled affects startup perf by 15%, that sucks also.
Did this show up on any other perf suites/OSes?
Reporter
Comment 7•11 years ago
I need to see if other tests are affected. There are some other failures, but nothing as large as this. We have different preferences on talos than we do for desktop tests, although many are the same.
Here are the preferences we use:
http://hg.mozilla.org/build/talos/file/tip/talos/PerfConfigurator.py#l237
for ts paint, we don't use other preferences, so this should be it.
Flags: needinfo?(jmaher)
Updated•11 years ago
Whiteboard: [talos_regression] → [talos_regression][Australis:P-]
Comment 8•11 years ago
The changes for bug 989626 are only seen when in-content preferences are loaded, not to mention that in-content preferences are currently disabled by default.
No longer blocks: 989626
Updated•11 years ago
Comment 9•11 years ago
The code modified in bug 984015 is only used by PlacesTransactions.jsm, which is currently unused (disabled behind a pref), so it's not executed in this case.
No longer blocks: 984015
Comment 10•11 years ago
jmaher said on IRC he was going to investigate/trigger some reruns. Hand it off to me if necessary later.
Assignee: nobody → jmaher
Reporter
Comment 11•11 years ago
I don't believe we have any other tests/platforms which are failing; time to do some try pushes.
Reporter
Comment 12•11 years ago
backout 986582: https://tbpl.mozilla.org/?tree=Try&rev=4d0eaa96acb6
backout 992208: https://tbpl.mozilla.org/?tree=Try&rev=7d7aa86bee45
will do more tonight if these don't help
Reporter
Comment 13•11 years ago
bug 992208 seems to be the culprit, when it was backed out (https://tbpl.mozilla.org/?tree=Try&rev=7d7aa86bee45), our numbers for ts_paint are <900, and in line with what we were at before the regression.
I assume it has to do with:
https://hg.mozilla.org/try/diff/7d7aa86bee45/browser/installer/package-manifest.in
:bsmedberg, any thoughts on this? I am sort of surprised; maybe I am reading the numbers wrong?
Flags: needinfo?(benjamin)
Comment 14•11 years ago
That's awesomely weird. Basically what's happening is we're running this code on profile-after-change now when we weren't before:
http://hg.mozilla.org/mozilla-central/annotate/25aeb2bc79f2/browser/experiments/ExperimentsService.js#l31
to
http://hg.mozilla.org/mozilla-central/annotate/25aeb2bc79f2/browser/experiments/Experiments.jsm#l355
Of the set of code running here, the only thing I can see which might be a startup issue is the call to _startWatchingAddons, which calls into AddonManager.addAddonListener/addInstallListener.
The only thing that's a little weird is that we don't seem to be checking gExperimentsEnabled anywhere in the main sequence of _loadFromCache/_main/_evaluateExperiments.
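A hedged sketch of the listener registration being pointed at here, using the standard AddonManager API; the real Experiments._startWatchingAddons body may differ, and the listener members shown are illustrative:

Components.utils.import("resource://gre/modules/AddonManager.jsm");

let gAddonListener = {
  // Re-evaluate experiment state when an add-on's status changes.
  onEnabled: function (addon) { /* ... */ },
  onDisabled: function (addon) { /* ... */ },
  onUninstalled: function (addon) { /* ... */ },
};

function startWatchingAddons() {
  // Both of these calls touch the AddonManager early in startup, which is
  // the suspected cost here.
  AddonManager.addAddonListener(gAddonListener);
  AddonManager.addInstallListener({
    onInstallEnded: function (install, addon) { /* ... */ },
  });
}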
Assignee: jmaher → georg.fritzsche
Component: Talos → Client: Desktop
Flags: needinfo?(benjamin) → needinfo?(gps)
Product: Testing → Firefox Health Report
Reporter
Comment 15•11 years ago
Is there any way the profile we use for talos could be affecting this? I am open to trying anything if needed.
Comment 16•11 years ago
All of the imports (https://hg.mozilla.org/mozilla-central/annotate/25aeb2bc79f2/browser/experiments/Experiments.jsm#l14) will also contribute to latency. Things like Metrics.jsm (which should be lazy loaded since it is only used for FHR) pull in a lot of sub-modules and services.
Unless we really need Experiments to be initialized ASAP, we should consider delay loading it until after session restore and possibly a few seconds after that.
FWIW, I don't think Experiments.jsm should be adding significantly new memory, CPU, or I/O to Firefox initialization. It's a thousand or so lines of JS + a few kb of read I/O to pull in state. However, timing is everything. It is likely initializing things a few seconds earlier than before and that's causing the regression. I'd be very interested in comparisons of usage 60s after startup to confirm or disprove my hypothesis: that will tell us if there is a deeper performance problem to investigate.
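A minimal sketch of the lazy-import pattern described here, with Metrics.jsm as the example module called out above; the surrounding scaffolding is assumed:

const Cu = Components.utils;
Cu.import("resource://gre/modules/XPCOMUtils.jsm");

// Instead of Cu.import("resource://gre/modules/Metrics.jsm") at the top of
// Experiments.jsm, defer the load until the first property access:
XPCOMUtils.defineLazyModuleGetter(this, "Metrics",
                                  "resource://gre/modules/Metrics.jsm");

function getMetrics() {
  // First property access triggers the module load, keeping Metrics.jsm
  // (and the sub-modules it pulls in) off the startup path.
  return Metrics;
}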
Flags: needinfo?(gps)
Comment 17•11 years ago
If there is an active experiment we certainly don't want to wait: the experiment should be activated as soon as we know that it's still applicable.
Comment 18•11 years ago
Attachment #8403559 - Flags: review?(gps)
Comment 19•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #17)
> If there is an active experiment we certainly don't want to wait: the
> experiment should be activated as soon as we know that it's still applicable.
That may require tighter Add-on Manager integration. This is a result of having experiment add-ons disabled by default. In our current world, we require the Experiments service to be initialized to activate experiments at start-up time. That has performance implications. Switching back to having the Experiments service disable an experiment if needed sometime after initialization would be better for perf. We could have the Add-on Manager refuse to activate an experiment add-on unless the Experiments XPCOM component is present to plug that hole.
Can we just delay experiment init today and fix the activation window as a followup?
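For reference, a sketch of that short-term "delay init" option: wait for session restore plus a grace period before loading Experiments.jsm. The observer topic is real; the delay value and the updateManifest() call are assumptions:

const Cu = Components.utils;
Cu.import("resource://gre/modules/Services.jsm");
Cu.import("resource://gre/modules/Timer.jsm");

const STARTUP_DELAY_MS = 30 * 1000; // assumed grace period

Services.obs.addObserver(function onRestored() {
  Services.obs.removeObserver(onRestored, "sessionstore-windows-restored");
  setTimeout(function () {
    // Only now pay the cost of the module and its imports.
    Cu.import("resource:///modules/experiments/Experiments.jsm");
    Experiments.instance().updateManifest();
  }, STARTUP_DELAY_MS);
}, "sessionstore-windows-restored", false);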
Comment 20•11 years ago
Comment on attachment 8403559 [details] [diff] [review]
Attempt lazy loading, rev. 1
Review of attachment 8403559 [details] [diff] [review]:
-----------------------------------------------------------------
It will be interesting to see how much this moves the needle.
Attachment #8403559 - Flags: review?(gps) → review+
Comment 21•11 years ago
Keywords: leave-open
Comment 22•11 years ago
Reporter
Comment 23•11 years ago
I am not seeing any cleanup of ts paint from this:
http://graphs.mozilla.org/graph.html#tests=[[83,131,33],[83,64,33],[83,132,33]]&sel=1394460105054,1397052105054&displayrange=30&datatype=running
Assignee
Comment 24•11 years ago
I'm currently checking through the code path that we hit early to properly honor the pref.
I assume async tasks triggered from "profile-after-change" don't contribute to this?
Reporter
Comment 25•11 years ago
I have no idea about the async tasks and profile-after-change events. Talos does a warmup run to initialize the new profile. This is a simple launch of the browser to get some meta data and let the profile initialization take place outside of the performance numbers.
Comment 26•11 years ago
I dealt with Talos foo with FHR. See bug 828153 (which appears to be missing some of the context I have in my head - there must be another bug).
Anything that Experiments.jsm does (including async tasks) after profile-after-change will affect first paint. We've now introduced new code that runs very early after application startup and anything it does will delay first paint, session restore, and make early browser performance worse. Async or not, it is code that is executing and thus contending for system resources.
It is in the best interest of any new feature to delay *all* code execution until as far down the line as possible. If this is not possible, my recollection is you need sign-off from someone on the perf team (Vladan?) and that sign-off will only likely occur once there is satisfaction the performance impact is as minimal as possible. This means minimizing JSMs, compartments, I/O, etc.
Assignee
Comment 27•11 years ago
Alright, that makes sense. I'll do try pushes tomorrow with some changes.
I'm not sure what effectively makes the most impact here, but it seems like we should definitely get lower impact for when experiments are preffed off (which should be what we worry about for release).
Reporter
Comment 28•11 years ago
thanks gps and gfritzsche.
Comment 29•11 years ago
So, adding the XPCOM component manifest to the package manifest regressed performance. Since the XPCOM component (Experiments) is currently preffed off, this is a sign that the Experiments XPCOM component is implemented sub-optimally.
Recommendations:
1) We should not be loading Experiments.jsm if the preference is not enabled.
2) We should not be calling Experiments.instance() if the preference is not enabled.
3) We should optionally move the preference observers from Experiments.jsm into ExperimentsService.js.
Tree rules dictate that 1bd9fa893bf8 should be backed out for regressing performance until a fix is in place. I recommend the patch be written quickly.
Updated•11 years ago
Updated•11 years ago
No longer blocks: 991034
Whiteboard: [talos_regression][Australis:P-] → [talos_regression]
Assignee
Comment 30•11 years ago
Assignee
Comment 31•11 years ago
Slightly different approach, comparing upcoming patch [1] against:
* current rev:
* try: https://tbpl.mozilla.org/?tree=Try&rev=fb5badf05c52
* compare: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=fb5badf05c52&newRev=3c0aa7609d5e&submit=true
* bug 992208 backed out:
* try: https://tbpl.mozilla.org/?tree=Try&rev=1095176cfaca
* compare: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=1095176cfaca&newRev=3c0aa7609d5e&submit=true
[1] https://tbpl.mozilla.org/?tree=Try&rev=3c0aa7609d5e
Assignee
Comment 32•11 years ago
Assignee
Comment 33•11 years ago
I expect us to hit the same issue again when we land bug 988873 (i.e. enable the feature), checking:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=1095176cfaca&newRev=7b24409c9a49&submit=true
https://tbpl.mozilla.org/?tree=Try&rev=7b24409c9a49
Assignee
Updated•11 years ago
Flags: firefox-backlog?
Assignee
Comment 34•11 years ago
Next things to check for with the feature turned on:
* timers around loading Experiments.jsm
* timers around loading the AddonManager
Assignee
Comment 35•11 years ago
try push with timing around imports and first accessing the Experiments instance:
https://tbpl.mozilla.org/?tree=Try&rev=64bd9134ae15
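Roughly what the timing instrumentation for a push like this can look like; a sketch, not the actual patch:

let t0 = Date.now();
Components.utils.import("resource:///modules/experiments/Experiments.jsm");
dump("Experiments.jsm import took " + (Date.now() - t0) + " ms\n");

t0 = Date.now();
let instance = Experiments.instance();
dump("First Experiments.instance() took " + (Date.now() - t0) + " ms\n");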
Assignee
Comment 36•11 years ago
Comment on attachment 8404740 [details] [diff] [review]
Skip Experiments.jsm load if pref is off
This avoids any real impact if the feature is off, but means that we now require a restart to turn it on.
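A minimal sketch of the skip-the-load approach, assuming the check lives in ExperimentsService's profile-after-change handler; the pref name and component shape are assumptions based on this thread, not the landed diff:

const Cu = Components.utils;
Cu.import("resource://gre/modules/Services.jsm");

const PREF_EXPERIMENTS_ENABLED = "experiments.enabled"; // assumed name

function ExperimentsService() {}

ExperimentsService.prototype.observe = function (subject, topic, data) {
  if (topic != "profile-after-change") {
    return;
  }

  let enabled = false;
  try {
    enabled = Services.prefs.getBoolPref(PREF_EXPERIMENTS_ENABLED);
  } catch (e) {
    // Pref missing: treat the feature as disabled.
  }

  if (!enabled) {
    // Bail before Experiments.jsm (and its imports) are ever loaded. The
    // trade-off noted above: flipping the pref on now requires a restart.
    return;
  }

  Cu.import("resource:///modules/experiments/Experiments.jsm");
  Experiments.instance();
};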
Attachment #8404740 - Flags: review?(benjamin)
Assignee
Comment 37•11 years ago
... and the Talos comparisons look good:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=fb5badf05c52&newRev=3c0aa7609d5e&submit=true
Comment 38•11 years ago
Comment on attachment 8404740 [details] [diff] [review]
Skip Experiments.jsm load if pref is off
We *could* add pref observers here to do that, but I agree in the short term that doesn't matter.
Attachment #8404740 - Flags: review?(benjamin) → review+
Assignee
Comment 39•11 years ago
Timing imports et al didn't reveal anything major:
* first access to Experiments.instance() is 8-9ms
* first import of Experiments.jsm is 5-6ms
Metrics.jsm is the only (sub-)import that stands out with 3-4ms.
Assignee
Comment 40•11 years ago
Assignee
Comment 41•11 years ago
Per IRC we are hitting the XPI database via AddonManager.getAddonsByTypes(). Caching of active addons avoids hitting it for addon activation for most startup scenarios.
We can avoid hitting the XPI DB by storing in a pref whether we have an active experiment.
If not, we can delay Experiments initialization until 30s after session restore.
If we do, we initialize right away, triggering cache load, AM sync, ...
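Hedged sketch of that pref-cache idea: remember whether an experiment was active so the next startup can decide without opening the XPI database. The PREF_ACTIVE_EXPERIMENT constant matches the patch reviewed below; its string value and the helper names are assumptions:

const PREF_ACTIVE_EXPERIMENT = "experiments.activeExperiment"; // assumed value

// Written whenever an experiment is activated or deactivated:
function setActiveExperimentFlag(active) {
  Services.prefs.setBoolPref(PREF_ACTIVE_EXPERIMENT, active);
}

// Read at startup: if true, initialize right away so the experiment add-on
// can be re-enabled; if false, defer init until ~30s after session restore.
function hasActiveExperiment() {
  try {
    return Services.prefs.getBoolPref(PREF_ACTIVE_EXPERIMENT);
  } catch (e) {
    return false; // pref not set: no active experiment
  }
}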
Updated•11 years ago
Flags: firefox-backlog? → firefox-backlog+
Whiteboard: [talos_regression] → [talos_regression][p=5]
Updated•11 years ago
Status: NEW → ASSIGNED
Whiteboard: [talos_regression][p=5] → [talos_regression] p=5 s=it-31c-30a-29b.2 [qa?]
Comment 42•11 years ago
QA will assume this bug is fixed based on the pre-existing tests. If this needs any special manual test coverage then please needinfo me.
Whiteboard: [talos_regression] p=5 s=it-31c-30a-29b.2 [qa?] → [talos_regression] p=5 s=it-31c-30a-29b.2 [qa-]
Comment 43•11 years ago
Updated•11 years ago
Whiteboard: [talos_regression] p=5 s=it-31c-30a-29b.2 [qa-] → [talos_regression] p=5 s=it-31c-30a-29b.3 [qa-]
Reporter
Comment 44•11 years ago
this looks to be fixed according to graph server! Is there a reason not to close this bug?
Comment 45•11 years ago
The patch that was checked in was a band-aid; the regression will return when experiments are enabled by default again. This bug remains open to fix the performance issue for when experiments are enabled by default.
Assignee
Comment 46•11 years ago
Assignee
Comment 47•11 years ago
And... a push with talos runs actually enabled:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=fb5badf05c52&newRev=3019d0354e6d&submit=true
https://tbpl.mozilla.org/?tree=Try&rev=3019d0354e6d
Assignee
Comment 48•11 years ago
Talos other looks fine; an xpcshell test fixup was needed too:
https://tbpl.mozilla.org/?tree=Try&rev=8d7d400b6a59
Assignee
Comment 49•11 years ago
Attachment #8406773 - Flags: review?(benjamin)
Assignee
Comment 50•11 years ago
Attachment #8406774 - Flags: review?(benjamin)
Assignee
Comment 51•11 years ago
Attachment #8406775 - Flags: review?(benjamin)
Comment 52•11 years ago
Comment on attachment 8406773 [details] [diff] [review]
Delay initializing Experiments if there is no experiment running
>+XPCOMUtils.defineLazyGetter(
>+ this, "gActiveExperiment", () => {
>+ try {
>+ let prefs = Services.prefs;
>+ return prefs.getBoolPref(PREF_ACTIVE_EXPERIMENT);
>+ } catch (e) {
>+ return false;
>+ }
>+ });
The try/catch here is unfortunate. Is that really better than using Preferences.jsm and .get(PREF_ACTIVE_EXPERIMENT, false)?
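For comparison, the Preferences.jsm variant suggested here; a sketch, assuming PREF_ACTIVE_EXPERIMENT is defined as in the patch under review:

Cu.import("resource://gre/modules/Preferences.jsm");

XPCOMUtils.defineLazyGetter(this, "gActiveExperiment", function () {
  // Preferences.get() returns the supplied default (false) when the pref is
  // unset, avoiding the try/catch around getBoolPref().
  return Preferences.get(PREF_ACTIVE_EXPERIMENT, false);
});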
Attachment #8406773 - Flags: review?(benjamin) → review+
Updated•11 years ago
Attachment #8406774 - Flags: review?(benjamin) → review+
Updated•11 years ago
Attachment #8406775 - Flags: review?(benjamin) → review+
Assignee
Comment 53•11 years ago
Assignee
Updated•11 years ago
Keywords: leave-open
Comment 54•11 years ago
https://hg.mozilla.org/mozilla-central/rev/beb482c201c5
https://hg.mozilla.org/mozilla-central/rev/7d9a7a531275
https://hg.mozilla.org/mozilla-central/rev/a40bdbcd2304
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 31
Updated•11 years ago
Status: RESOLVED → VERIFIED
Updated•11 years ago
Assignee
Updated•10 years ago
Updated•6 years ago
Product: Firefox Health Report → Firefox Health Report Graveyard