Closed Bug 1249209 (e10s-crashes) Opened 4 years ago Closed 3 years ago

[meta] tracking: e10s top crashes

Categories

(Firefox :: General, defect)

RESOLVED INCOMPLETE

People

(Reporter: poiru, Assigned: benjamin)

References

(Depends on 4 open bugs)

Details

(Keywords: meta)

Tracking bugs for top crashes in e10s beta experiment 2, part 2 (i.e. e10s-beta45-withoutaddons@experiments.mozilla.org).
Depends on: 1140115, 1248241
Bug 1247380 wasn't e10s-specific, but, probably because restoring a crashed tab is so easy, it gained twice as many submissions under e10s as under non-e10s. (A patch is landing in b7; hopefully it works, as the bug has no steps to reproduce.)
Depends on: 1247380
No longer depends on: 1003004
FWIW, that query only looks at content crashes, and not at potential browser crashes that e10s may help make worse. I compared the percentages of crash signatures in searches for all non-e10s and all e10s crashes; take a look here: https://crash-analysis.mozilla.com/rkaiser/datil/searchcompare/?common=product%3DFirefox%26version%3D45.0b6%26process_type%3Dbrowser%26process_type%3Dcontent&p1=dom_ipc_enabled%3D__null__&p2=dom_ipc_enabled%3D!__null__
Anything with red in the right-most column means those crashes have a higher percentage in e10s-enabled crashes; green means a lower percentage (numbers are the difference in percentage points). The top line about OOM|small is actually not a reason to celebrate (yet); it's fallout of e10s bug 1236108.
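For readers who want to see the arithmetic behind that right-most column, here is a minimal Python sketch of a percentage-point comparison. It is not the code of the searchcompare page: the function name `share_diff` and all counts are made up for illustration, and it assumes each signature's percentage is taken relative to the sum of the listed counts.

```python
def share_diff(counts_a, counts_b):
    """Return each signature's share in b minus its share in a, in percentage points."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    diffs = {}
    for sig in set(counts_a) | set(counts_b):
        pct_a = 100 * counts_a.get(sig, 0) / total_a
        pct_b = 100 * counts_b.get(sig, 0) / total_b
        diffs[sig] = pct_b - pct_a
    return diffs

# Made-up counts, for illustration only.
non_e10s = {"OOM | small": 300, "sig A": 500, "sig B": 200}
e10s = {"OOM | small": 50, "sig A": 550, "sig C": 400}

diffs = share_diff(non_e10s, e10s)
# Positive values correspond to "red" (higher share with e10s), negative to "green".
for sig, d in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(f"{sig:12s} {d:+.1f}")
```

In this toy data "OOM | small" drops by 25 points, which says nothing about whether the absolute number of such crashes fell; it only says their share of the total did.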
The loss of the big OOM signature isn't too bad, as those crashes are just counted under other signatures.

shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | nsThread::ProcessNextEvent | NS_ProcessNextEvent | mozilla::SharedThreadPool::SpinUntilEmpty
a/v failings. Being uncounted, bug 1241106. 
Have to rely on telemetry of ignore shutdownhangs too.
Alias: e10s-crashes
Note that kairo's comparison does not take differing total crash rates into account. Including a rate adjustment gives a slight reordering of some signatures. (A full +/- column currently sums to 0; with the rate adjusted, it would sum to the overall change, e.g. a 10% rise in crashes, or whatever rate is used.)
It is a bit more than slight.
Open this (7 day)
https://crash-analysis.mozilla.com/rkaiser/datil/searchcompare/?common=process_type%3Dbrowser%26process_type%3Dcontent%26ActiveExperiment%3D%253De10s-beta45-withoutaddons%2540experiments.mozilla.org%26date%3D%3E%253D2016-02-14%26date%3D%3C2016-02-21&p1=ActiveExperimentBranch%3D%253Dcontrol-no-addons&p2=ActiveExperimentBranch%3D%253Dexperiment-no-addons
Add a breakpoint at scomp.js:99 gSigData[result2[i].term].pct2 = 100 * result2[i].count / total2;
Reload
Change total2 from 20913 to 15749
  [20913/(20913/17324*1.1); the 1.1 accessibility adjustment is just a rough estimate.]
Clear break and continue.
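The replacement value in the steps above can be checked with a few lines of Python. This is only a sketch of the bracketed arithmetic: it assumes 17324 is the control-branch total and treats 1.1 as the rough adjustment factor from the comment; the variable names and the example count are hypothetical.

```python
total1 = 17324               # assumed: control-branch total from the bracketed formula
total2 = 20913               # experiment-branch total shown at the breakpoint
accessibility_factor = 1.1   # rough estimate from the comment, not a measured value

# Ratio of the experiment crash rate to the control crash rate, as estimated above.
rate_ratio = total2 / total1 * accessibility_factor

# Dividing by the ratio shrinks the denominator, so every pct2 grows by that ratio.
adjusted_total2 = total2 / rate_ratio
print(round(adjusted_total2))  # 15749

count = 1000  # hypothetical count for one signature
pct2_page = 100 * count / total2
pct2_adjusted = 100 * count / round(adjusted_total2)
print(f"{pct2_page:.2f} -> {pct2_adjusted:.2f}")
```

With the smaller denominator the pct2 column no longer sums to 100, which is why the +/- column would then sum to the overall change in crash rate rather than to 0.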
(In reply to Jonathan Howard from comment #5)
> Note that kairo's comparison does not take into account differing total
> crash rates.

Yes, intentionally. We wouldn't even have the necessary data to respect rates in the place this is looking at.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #7)
> (In reply to Jonathan Howard from comment #5)
> > Note that kairo's comparison does not take into account differing total
> > crash rates.
> 
> Yes, intentionally. We wouldn't even have the necessary data to respect
> rates in the place this is looking at.

It is not possible automatically, but entering the right value manually can lead to better results. (Even a potentially wrong value might highlight an overlooked change. Off bug topic: looking at 44b9 vs 45b9, I wonder what happens when I rate-adjust to make "OOM | small" the same.)
The OOM|small drop with e10s (and probably some increases in other signatures the other way) is bug 1236108: the mechanism that detects "OOM | ..." and puts that into the signature has not yet been implemented for content processes. That probably makes a number of OOM-related signatures differ needlessly.
Depends on: 1233723
Depends on: 1251150
Depends on: 1247982
http://www.unbiased.name/ff/e10scrashes/

Looks like the IPC corruption fix has solved a couple of top signatures since 45. b1 is en-US only; it's anyone's guess whether other locales will have an impact.
mozilla::ipc::TransferHandleToProcess probably bug 1140115
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Duplicate of this bug: 1234647
No longer depends on: 1034706
No longer depends on: 1233723
No longer depends on: 1247982
Depends on: 1258312
Depends on: 1258317
Depends on: 1145613
Depends on: 1258331
Depends on: 1258604
No longer depends on: 1248241
No longer depends on: 1251150
Depends on: 1258553
No longer depends on: 1258553
The analysis pass (reviewed and final) on the experiment data shows things are worse than they were:

https://github.com/vitillo/e10s_analyses/blob/master/beta46-noapz/e10s-stability-analysis.ipynb

App stability regression is now 62% (17.1 to 27.8)
Plugin stability regression is 50% (8.1 to 12.1)
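As a quick sanity check on those two percentages, a relative regression can be computed from the before/after rates quoted above. This is a minimal sketch with a hypothetical helper name; it uses the rounded rates from this comment, so the results differ slightly from the figures produced by the notebook.

```python
def regression_pct(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100 * (after - before) / before

app = regression_pct(17.1, 27.8)     # app (main+content) crash rate
plugin = regression_pct(8.1, 12.1)   # plugin crash rate
print(f"app: {app:.1f}%, plugin: {plugin:.1f}%")
```

Both values land within about a point of the quoted 62% and 50%; the gap comes from rounding in the input rates.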

Here are date-controlled links to the relevant topcrash queries for the experiment:

APP CRASH LINKS:

Experiment (e10s) group topcrashes, content process only: https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=experiment-no-addons&process_type=content&date=%3E2016-03-09&date=%3E%3C2016-03-22&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

Control group topcrash, no plugins: https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=control-no-addons&date=%3E2016-03-09&date=%3C2016-03-22&process_type=!plugin&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

I am going to focus only on content crashes because the main process is much better. 

PLUGIN CRASH LINKS:

Experiment/e10s top plugin crashes: https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=experiment-no-addons&process_type=plugin&date=%3E2016-03-09&date=%3C2016-03-22&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

Control/non-e10s top plugin crashes: https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=control-no-addons&process_type=plugin&date=%3E2016-03-09&date=%3C2016-03-22&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

I'm writing a script to highlight the differences for prioritization now: some of the bugs currently blocking this one seem to be "generic topcrashers" and are not the most likely place to look.
:bsmedberg the 'Experiment (e10s) group topcrashes, content process only' App Crash link is not working for me.
Flags: needinfo?(benjamin)
beta 45:
                                 non-e10s       e10s
usage hours                          5382       4742
chrome crashes                      72659      33591
content crashes                     11731      75757
plugin crashes                      41897      55809
main crash rate                     13.50       7.08
main+content crash rate             15.68      23.06
plugin crash rate                    7.78      11.77

beta 46:
                                 non-e10s       e10s
usage hours                          3312       2547
chrome crashes                      50409      20203
content crashes                      6320      50546
plugin crashes                      26843      30741
main crash rate                     15.22       7.93
main+content crash rate             17.13      27.77
plugin crash rate                    8.10      12.07

Interesting that beta 46 regressed generally, and that e10s seems to be adversely affected by that.
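The rate rows in the tables above appear to be plain ratios of crash counts to usage hours. The sketch below recomputes the beta 46 rows under that assumption; the function name is hypothetical, and the unit of "usage hours" (possibly thousands of hours) is not stated in the table.

```python
def crash_rates(usage_hours, chrome, content, plugin):
    """Crashes per unit of usage hours, for the three rate rows in the table."""
    return {
        "main": chrome / usage_hours,
        "main+content": (chrome + content) / usage_hours,
        "plugin": plugin / usage_hours,
    }

# Counts copied from the beta 46 table.
beta46 = {
    "non-e10s": crash_rates(3312, 50409, 6320, 26843),
    "e10s": crash_rates(2547, 20203, 50546, 30741),
}

for branch, rates in beta46.items():
    print(branch, {name: round(value, 2) for name, value in rates.items()})
```

The recomputed values match the table to within 0.01; the small remaining difference in the e10s main+content rate is presumably because the usage hours shown are themselves rounded.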
Depends on: 1259183
Depends on: 1259192
Depends on: e10s-oom
No longer depends on: 1259183
No longer depends on: 1235633
Depends on: 1259214
Depends on: 1260721
Depends on: 1259480
No longer depends on: 1259187
No longer depends on: 1259827
Depends on: 1262852
Depends on: 1252152
Depends on: e10s-plugincrashes
No longer depends on: 1252152
No longer depends on: 1262852
Depends on: 1158189
Depends on: 1169247
No longer depends on: 1259480
Depends on: 1034706
Depends on: 1265680
Depends on: 1266171
Depends on: 1263951
Duplicate of this bug: 899758
No longer depends on: shutdownkill
Jim,

you mentioned this bug as the follow up bug of bug 899758...
(https://bugzilla.mozilla.org/show_bug.cgi?id=899758#c2)

...and the alias of this bug is "e10s-crashes"...

...so I set bug 1219672 ("ShutDownKills-Win") as a block of this bug, like it was before with bug 899758 ("crash-e10s")...

...but you removed here the dependency again...

So is this bug now the follow-up bug, or not?
Flags: needinfo?(jmathies)
(In reply to Tobias B. Besemer [:BesTo] (QA) from comment #20)
> Jim,
> 
> you mentioned this bug as the follow up bug of bug 899758...
> (https://bugzilla.mozilla.org/show_bug.cgi?id=899758#c2)
> 
> ...and the alias of this bug is "e10s-crashes"...
> 
> ...so I set bug 1219672 ("ShutDownKills-Win") as a block of this bug, like
> it was before with bug 899758 ("crash-e10s")...
> 
> ...but you removed here the dependency again...
> 
> So is this bug now the follow-up bug, or not?

We're tracking a specific set of actionable crash bugs related to beta experiments we have running here. The ShutDownKills-Win is too general to be hooked up here.
Flags: needinfo?(jmathies)
Depends on: 1268881
Depends on: 1268900
Depends on: 1191503
Depends on: 1269471
I want to suggest as dependencies bug 627706 and bug 1268559.
Depends on: 1268662
Depends on: 1268721
Can someone please have a look at bug 1265812?
Depends on: 1269961
Depends on: 1257996
Depends on: 1244883
Depends on: 1272151
Depends on: 1182927
Depends on: 1269036
Depends on: 1279514
Depends on: 1282776
Keywords: meta
Summary: [meta] tracking: e10s beta 46 experiment top crashes → [meta] tracking: e10s top crashes
No longer tracking this separately from normal stability work.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE