Crashing user in wrong e10s experiment branch

RESOLVED WORKSFORME

Status

()

Firefox
General
RESOLVED WORKSFORME
a year ago
a year ago

People

(Reporter: Jonathan Howard, Assigned: Felipe)

Tracking

45 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10sm9+)

Details

(Reporter)

Description

a year ago
bp-0090c192-26dd-490b-a874-b7c5f2160213

There may be more, this one was obvious when knowing bug it relates to (non-issue as fixed with update.)
Same user seems to have crashes 19 times over half hour.

My guess is that the add-ons are all system-wide.
Flags: needinfo?(felipc)
(Assignee)

Comment 1

a year ago
Dave, Gabor, what do you think?
The user on this crash report just got into the new experiment, and they have various add-ons but weren't blocked. Jonathan has the hunch that they are all system-wide.

The add-on "{82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}" is Skype Click-To-Call.
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(dtownsend)
(Assignee)

Updated

a year ago
Flags: needinfo?(felipc)
Besides the problem listed here, I thought we were supposed to switch to safe mode after a certain number of startup crashes.
(Assignee)

Comment 3

a year ago
fwiw it doesn't look like bingsearch.full@microsoft.com is a system add-on.
Based on bug 1210099 and the initial comment of this bug I would imagine that the db of add-ons can get into an invalid state. I remember a bug while working on bug 1234675 that if the browser crashing after I disable an add-on, after restart the add-on starts up but the manager shows that the add-on is disabled. I think we should investigate how can the add-on data we store be more resilient to crashes.

We should investigate these cases at least and make sure everything works consistently or if I can reproduce the wrong add-on should block e10s flag with one of these:
1. installing add-on and crashing
2. uninstalling add-on and crashing
3. disabling add-on and crashing
4. enabling add-on and crashing
5. installing add-on and crashing during install
6. uninstalling add-on and crashing during uninstall

I will look into these cases tomorrow. There can be something else the reason, Dave might have better ideas...
Flags: needinfo?(gkrizsanits)
How does the experiment control the e10sBlocksEnabling flag exactly?
Flags: needinfo?(felipc)
(Assignee)

Comment 6

a year ago
It's set to true on the first startup() of the experiment (for users in the test group), and is left there untouched until the experiment uninstalls (where it's reset).
Flags: needinfo?(felipc)
(In reply to :Felipe Gomes (needinfo me!) from comment #6)
> It's set to true on the first startup() of the experiment (for users in the
> test group), and is left there untouched until the experiment uninstalls
> (where it's reset).

OK and when do we decide which version of the experiment add-on the user gets? Does the user have both installed and we check the flag at startup and enable the version it needs or how does it work?

Here is what I found, if one disable/enable an add-on that requires restart (which we force for all add-ons) we flip the e10sBlockedByAddons flag immediately while the add-on does not change state until restart. At this moment until restart the flag is not in sync with the status of enabled add-ons. (So if you enable an add-on but do not restart the browser yet the flag is true already while the add-on is not active yet). After next start up everything gets in sync. This can explain what we have seen right? Flagging you again for conformation.

It worth nothing that crash at the 'right' moment can further mess with the add-on status. If we crash after an add-on installation that would need a restart we start up next time with the add-on disabled, but I've seen the flag in the wrong state once in this scenario too :( I need to work a bit more to understand the STR there.
Flags: needinfo?(felipc)
(Assignee)

Comment 8

a year ago
There's only one version of the experiment running, and it gets installed by the code in Experiments.jsm. It's somewhat similar to the hotfix system.

What you mention about the flag being set after install (before a restart) can explain an edge case (if the user had all add-ons disabled and then enabled one and crashed by something unrelated before restarting), but is not what we're seeing here, because the crash stack is related to an add-on that is really running (Skype Click-to-call)

Also, the experiment add-on tags the user (in what we call "branches"), and the branch seen in this crash report ("experiment-no-addons") is only set after a restart, after verifying that e10s was activated successfully (by checking Services.appinfo.browserTabsRemoteAutostart == true).
Flags: needinfo?(felipc)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #7)
> (In reply to :Felipe Gomes (needinfo me!) from comment #6)
> > It's set to true on the first startup() of the experiment (for users in the
> > test group), and is left there untouched until the experiment uninstalls
> > (where it's reset).
> 
> OK and when do we decide which version of the experiment add-on the user
> gets? Does the user have both installed and we check the flag at startup and
> enable the version it needs or how does it work?
> 
> Here is what I found, if one disable/enable an add-on that requires restart
> (which we force for all add-ons) we flip the e10sBlockedByAddons flag
> immediately while the add-on does not change state until restart.

Right, I think we can fix that actually. The other thing we can do is not update the pref until after the DB is saved to disk and always flush the prefs file to disk when we change the pref. That should only incur a penalty on startups where changes are happening to the DB which are already accepted to be slower than normal.

> It worth nothing that crash at the 'right' moment can further mess with the
> add-on status. If we crash after an add-on installation that would need a
> restart we start up next time with the add-on disabled, but I've seen the
> flag in the wrong state once in this scenario too :( I need to work a bit
> more to understand the STR there.

If you find ways to trigger that we can certainly try to fix it. To a point though we can't entirely get rid of inconsistencies without causing performance penalties, there might be things we could try that make things better in many cases.
Flags: needinfo?(dtownsend)
tracking-e10s: --- → m9+
Assignee: nobody → felipc
I filed bug 1248753 about the safe mode issue.
(Assignee)

Updated

a year ago
Depends on: 1248796
(Assignee)

Comment 11

a year ago
So it really boils down to two possibilities:

 (1) the pref is unset, meaning that updateAddonsBlockingE10s didn't run
 (2) the pref is set to false, meaning that updateAddonsBlockingE10s ran but didn't get a correct list of add-ons

I filed bug 1248796 to help figure out if it's (1) or (2).
Blocks: 1251376
(Assignee)

Comment 12

a year ago
Chutten did some analysis on telemetry data from the last week of the beta45 experiment. From a sample of 208k pings from beta45 build8, there were only 15 clients with a mismatched branch. All but one had only one add-on installed, so these specific reports could be for users who had just installed a first add-on but not restarted yet (to get the branch updated).

Also, they all had the probe E10S_ADDONS_BLOCKER_RAN (from bug 1248796) set to true. The telemetry dashboard also shows 100% to true (for e10s users) on beta45.

Looks like it's not a widespread issue and could also be caused by some crash that left things in an inconsistent state. We should continue to keep an eye on crash stats but it seems for the moment there's no reason to overreact.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.