Closed Bug 1037595 Opened 10 years ago Closed 10 years ago

[Search experiment] Experiment branch is changing across days

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

x86_64
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: dzeber, Unassigned)

Attachments

(5 files)

Some of the profiles selected to participate in the search experiment are getting branches reassigned one or more times. This is occurring for 36% of experiment participants (based on the FHR snapshot from 2014-07-07). 

Looking at the FHR data.days containing the experiments.info for the search experiment, the lastActiveBranch value changes across days for these profiles. 

Attached is a CSV showing the frequency of the different values in order ("a,b" means that value a occurred for the first few days and b for the rest of the time, for example).
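
For reference, the "a,b" encoding can be sketched roughly like this. It is only an illustration and not the script that produced the attachment: the `profiles` mapping, the function names and the flat date-to-branch layout are assumptions made for the example (in a real FHR payload the lastActiveBranch value sits nested under data.days), and the sample values are made up.

```python
from collections import Counter
from itertools import groupby


def branch_sequence(days):
    """Collapse per-day branch values into a string such as "2,0".

    `days` maps an ISO date string to that day's lastActiveBranch value.
    Days without an experiment entry are simply absent from the mapping.
    """
    ordered = [days[d] for d in sorted(days)]
    # groupby merges runs of identical consecutive values into one key.
    return ",".join(str(branch) for branch, _ in groupby(ordered))


def summarize(profiles):
    """Return (frequency of each sequence, share of profiles with >1 branch)."""
    counts = Counter(branch_sequence(days) for days in profiles.values())
    multi = sum(n for seq, n in counts.items() if "," in seq)
    return counts, multi / len(profiles)


# Hypothetical sample data, not taken from the FHR snapshot.
profiles = {
    "p1": {"2014-07-01": 2, "2014-07-02": 2, "2014-07-03": 2},
    "p2": {"2014-07-01": 2, "2014-07-02": 0, "2014-07-03": 0},
    "p3": {"2014-07-01": 3, "2014-07-02": 5, "2014-07-03": 3},
    "p4": {"2014-07-01": 2, "2014-07-03": 0},
}
counts, share = summarize(profiles)
```

With this encoding a profile that goes back to an earlier branch (p3 above) shows up as its own sequence rather than being folded into a two-value one.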
Rows 2-15 are what I'd expect if we had a race bug where we were simply failing to write out the experiment branch correctly in some first-run situations, but the search setting is saved properly.

I can imagine this happening in this scenario: we are starting the experiment at about the same time we're saving the experiment cache to disk here: http://hg.mozilla.org/mozilla-central/annotate/75f66f8cb99f/browser/experiments/Experiments.jsm#l954

We may end up interleaving the call which sets the dirty flag in the middle of that: http://hg.mozilla.org/mozilla-central/annotate/75f66f8cb99f/browser/experiments/Experiments.jsm#l645 which would leave us not knowing that the cache was dirty, and never re-saving it. That race is actually fairly likely, given that we will have just activated the experiment.
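
To make the suspected interleaving concrete, here is a toy model of it. It is written in Python with asyncio rather than being the Experiments.jsm code, and every name in it (`ExperimentCache`, `save_racy`, `save_safe`, `run`) is hypothetical. `save_safe` shows one conventional way to avoid losing the flag; it is not necessarily the fix that should land.

```python
import asyncio


class ExperimentCache:
    """Toy cache guarded by a dirty flag (a model, not Experiments.jsm)."""

    def __init__(self):
        self.branch = None   # in-memory state
        self.on_disk = None  # what the last save wrote
        self.dirty = False

    def set_branch(self, branch):
        self.branch = branch
        self.dirty = True

    async def save_racy(self):
        snapshot = self.branch    # state captured before the write
        await asyncio.sleep(0)    # stands in for the async disk write
        self.on_disk = snapshot
        self.dirty = False        # also wipes a flag set during the write

    async def save_safe(self):
        snapshot = self.branch
        self.dirty = False        # clear first; a later set_branch() re-dirties
        await asyncio.sleep(0)
        self.on_disk = snapshot


async def run(save_name):
    cache = ExperimentCache()
    save = asyncio.ensure_future(getattr(cache, save_name)())
    await asyncio.sleep(0)        # let the save start and reach its await
    cache.set_branch(2)           # experiment activates mid-save
    await save
    return cache.branch, cache.on_disk, cache.dirty


racy = asyncio.run(run("save_racy"))
safe = asyncio.run(run("save_safe"))
```

In the racy variant the branch is set in memory but the dirty flag ends up cleared, so nothing would trigger another save. In the other variant the flag is still set after the write, so a later save would pick up the branch.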

In rows 16-31, we appear to have a condition where the first experiment start is completely forgotten: both the search setting and the branch setting are gone. I cannot explain this the same way. Maybe we're also not saving prefs at the end of the current session, which is known to happen sometimes, but I wouldn't expect that to happen in 8.7% of sessions. Or maybe those users are immediately switching back to Google and going through normal branch assignment again, but in that case I'd expect different rates for 2/4 than for 3/5. Felipe, do you have any other thoughts about why that might happen?
Flags: needinfo?(felipc)
Can I get a sample payload from a user whose branch has changed more than once, or especially one whose branch has changed from X to something other than 0?
Flags: needinfo?(dzeber)
Flags: needinfo?(dzeber)
So, one interesting thing about this data is that, in all cases where the branch switches to something other than 0, and in the cases where the branch keeps switching, the search engine never moves away from Google.

I think it's possible that some add-ons, viruses and/or anti-virus software are blocking the engine switch, or causing it to throw an error. That, combined with the race condition mentioned above (which would leave the branch info unsaved on disk), might mean that users stuck on Google keep getting reassigned a new branch every time, which would explain the data.

I don't think that we're actually succeeding in changing the provider and the users are switching back. If that were the case, we would be switching daily and they would be fighting back, and we would have received a bug report about it by now, I imagine.

All the samples seem to have a non-trivial list of add-ons installed, but I couldn't find an add-on common to all of them.

I've been staring at the set currentEngine() code in nsSearchService.js, but I don't see how it could throw, except maybe if it falls back to sync service initialization through _ensureInitialized because an add-on forces it. There's actually a telemetry probe for that, SEARCH_SERVICE_INIT_SYNC, but it only landed in 32. Looking at the telemetry dashboard, it appears that 4% of sessions fall back to sync initialization, so that's statistically interesting.

Also worth noting that all samples here are from Windows. It would be good to know if there are any non-Windows problematic samples.
Flags: needinfo?(felipc)
Filed bug 1038174 on the race condition.
Depends on: 1038174
There does not appear to be any significant relationship with OS. Multiple branches are occurring on multiple versions of all 3 OSs, and in mostly the same proportions as the overall OS distribution. That said, having the branch entry missing on a given day is slightly more likely on non-Windows.
This is a look at what search providers are showing up as default (searches.engines) on the days in the experiment, for profiles with multiple branches. 

For each profile, I took the sequence of (branch, default search provider) for each day on which the experiment was active, and removed repeats (days with the same values as the previous day). I then collected all such sequences among profiles with the same branch sequence, and found the most common one, together with its percent occurrence within that branch sequence. These most common sequences (for branch sequences with at least 50 profiles) are listed in the attached CSV. For search providers listed here, "other" is anything that is not "google", "bing", or "yahoo".
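
The procedure above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script behind the attachment: the input layout (one list of per-day (branch, provider) pairs per profile), the function names and the sample data are hypothetical, and mapping providers to "other" before removing repeats is a choice made for the example.

```python
from collections import Counter, defaultdict
from itertools import groupby

KNOWN = {"google", "bing", "yahoo"}


def dedupe(seq):
    """Drop consecutive repeats, keeping the first value of each run."""
    return tuple(key for key, _ in groupby(seq))


def most_common_by_branch_sequence(profiles, min_profiles=50):
    """Map each branch sequence to (most common pair sequence, percent)."""
    groups = defaultdict(list)
    for daily in profiles:
        # Assumption: unknown providers are folded into "other" first.
        pairs = dedupe((b, p if p in KNOWN else "other") for b, p in daily)
        branches = dedupe(b for b, _ in pairs)
        groups[branches].append(pairs)

    result = {}
    for branches, seqs in groups.items():
        if len(seqs) < min_profiles:
            continue  # too few profiles to report
        top, n = Counter(seqs).most_common(1)[0]
        result[branches] = (top, 100.0 * n / len(seqs))
    return result


# Hypothetical sample: three profiles go 2 -> 0, one stays on 1.
profiles = [
    [(2, "google"), (2, "google"), (0, "bing")],
    [(2, "google"), (0, "bing"), (0, "bing")],
    [(2, "google"), (0, "google")],
    [(1, "google"), (1, "google")],
]
result = most_common_by_branch_sequence(profiles, min_profiles=2)
```

With the lowered threshold in this sample, only the 2-then-0 branch sequence is reported, and its most common pair sequence is google on branch 2 followed by bing on branch 0.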

Observations:
- For multiple branch sequences that don't involve branch 0, most profiles (80-90%) keep google as the default the whole time. 

- For profiles that start on 1-5 and then switch to 0, the most common behaviour is to start on google and then switch to the correct provider for the initial branch (60-70% of profiles). For example, for users that switched from 2 to 0, 70% had google as the default while on 2, and then bing as their default while on 0.

- For branch sequences that switched from 0 to something else, there is much more variety in the search provider sequences (the most common sequence only occurs for 30-40% of profiles).
Engineering bugs verified that we've fixed this issue for future experiments. I don't think there's anything left to do with this bug itself.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Product: Firefox Health Report → Firefox Health Report Graveyard