Closed Bug 1399936 Opened 8 years ago Closed 8 years ago

Shield unenrolls users in active preference experiments on upgrade

Categories

(Firefox :: Normandy Client, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Tracking Status
firefox55 --- unaffected
firefox56 blocking fixed
firefox57 --- unaffected

People

(Reporter: jlockhart, Assigned: osmose)

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(2 files, 1 obsolete file)

The Activity Stream team is conducting a study using the Shield Pref Flip mechanism to enable Activity Stream for 25% of the en-US 56 Beta population (with an additional tracked 25% control variant where Activity Stream is disabled). The bug for that study is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1389722#c12 Both Ping Centre and Unified Telemetry show that this population stabilized at ~200k users in each arm of the study (control and variant) but this week that number is approaching 0. Ping Centre (which counts only the variant cohort): https://sql.telemetry.mozilla.org/queries/36210#96940 Unified Telemetry: https://sql.telemetry.mozilla.org/queries/27149#93513 The unified telemetry data shows us a more interesting perspective, because it shows that the control variant has a stable population and only the Activity Stream variant is decreasing. This drop in users coincides with the deployment of 56b11 from 56b10, but it's not obvious in any way that that is directly related. It's unclear whether this is 1) A problem witih telemetry collection, but having two independent telemetry systems show the same trend seems to discount that 2) A problem with shield enrollment, but the stability of the control population may indicate that that's not the case? 3) A problem with activity stream itself, it could be crashing for some reason?
Can user's opt-out of the experiment in some way? If so, do our cohorts capture those users?
I tested locally on Beta and was enrolled in the experiment a few times in the feature branch. The very first time I enrolled, I restarted the browser and was suddenly unenrolled for reasons I can't explain. I haven't been able to replicate it since, but it suggests that some behavior on startup/shutdown may be affecting this.
I was able to reproduce a crashed state in Beta 56.0b11 (I'm on Mac OS 10.12.6) - it appears that no system add-ons are being loaded at all. - about:newtab shows a blank html page; - about:home is missing onboarding (which is a system add-on) - about:addons shows a blank screen with an endless loading spinner Console output: https://gist.github.com/k88hudson/240ee1cc0844dfb20813f1b59c154d64 I do see: NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized AddonManager.jsm:1947
(In reply to Kate Hudson :k88hudson from comment #3) > I was able to reproduce a crashed state in Beta 56.0b11 (I'm on Mac OS > 10.12.6) - it appears that no system add-ons are being loaded at all. > > - about:newtab shows a blank html page; > - about:home is missing onboarding (which is a system add-on) > - about:addons shows a blank screen with an endless loading spinner > > Console output: > https://gist.github.com/k88hudson/240ee1cc0844dfb20813f1b59c154d64 > > I do see: > > NS_ERROR_NOT_INITIALIZED: AddonManager is not initialized > AddonManager.jsm:1947 I'm unable to replicate this on the same version/platform (Beta 56.0b11 / Mac OS 10.12.6). Are there any replication steps?
fwiw from 9-9 on the crash rate for clients in the as-enabled branch, after controlling for activeTicks, is significantly higher than the control branch (p value goes from ~.2 to very tiny) DATE ALL CLIENTS CONTROL AS-ENABLED BOTH CONTROL CRASHES/ACTIVETICKS AS-ENABLED CRASHES/ACTIVETICKS P-VALUE 20170908 772670 350648 419668 2354 0.002561633 0.002704299 0.20317512 20170909 625211 290016 333106 2089 0.002599239 0.003225231 4.35E-07 20170910 431217 240271 189377 1569 0.002329428 0.003192641 3.87E-08 20170911 672505 386847 283368 2290 0.002442193 0.003558063 8.29E-21 20170912 594424 392307 200098 2019 0.002581064 0.004150464 5.05E-05
I can reproduce the crash by downloading a fresh version of beta 11 and opening it for the first time. After restarting it the second time, the issue seems to be fixed. This doesn't happen every time unfortunately, but both :emtwo and I have been able to hit it
(In reply to Kate Hudson :k88hudson from comment #6) > I can reproduce the crash by downloading a fresh version of beta 11 and > opening it for the first time. After restarting it the second time, the > issue seems to be fixed. Are you using a new profile every time? If not, what was the last version the profile was used with?
I would speculate that this may happen on first run after an update to beta 11, too. It sounds like it's intermittent and we're not sure if there is any pattern. But it must be widespread (from looking at the data in comment 0). This should likely block 56 and 57 since it breaks the add-on manager including system addons on either first download or first run after an update. Andrei, can your team help us narrow down the STR and regression range? It is fairly urgent. Thanks!
kmag and I were discussing in IRC just now, and it's possible that uplifting bug 1391187 would fix this. From reading the bug it looks like the intention there was to ask for uplift but it wasn't done yet. It'd be nice to have STR to be able to confirm it in any case.
Flags: needinfo?(lhenry)
Maybe we can test with that patch and confirm.
Flags: needinfo?(lhenry)
I was testing with an existing profile, so that could have been the cause of the issue. Let me see if I can reproduce with a clean profile
ahillier was able to reproduce this on beta 11 as well, with an existing profile and the following console output: TypeError: gShutdownBarrier is null[Learn More] (line 3797 addon manager)
Ok, so the full STR is: - download beta 11 - open up another version of firefox, such as firefox nightly, then close it - open beta 11 with the same profile - check about:addons; it should be endlessly loading
When following the STR above, on one run (but not consistently) I got this in the browser console: 1505506261561 extensions.shield-recipe-client.bootstrap ERROR Failed to initialize addon studies: Log.jsm:752 https://pastebin.com/raw/qrLSPz3m
Have also been able to get this on beta 11 on a fresh profile (US locale): 1505509069364 extensions.shield-recipe-client.recipe-runner ERROR Could not execute recipe Preference Experiment: Activity Stream in Beta (Bug 1369150): Log.jsm:752 https://pastebin.com/raw/6cWR5cRW
(In reply to Kate Hudson :k88hudson from comment #13) > Ok, so the full STR is: > > - download beta 11 > - open up another version of firefox, such as firefox nightly, then close it > - open beta 11 with the same profile > - check about:addons; it should be endlessly loading Are you using the same profile in all the steps? Or, a different profile for step 2 where you open and close Nightly? Using a profile opened in Nightly and opening it in an older version is a known issue (and not something we support) It also seems unlikely to be a widespread behavior enough to cause the drop-off.
Flags: needinfo?(khudson)
I haven't managed to reproduce the crashed state described in comment 3. However I've managed to get into a "crashed" Activity Stream state by restarting (with the console command) while on the New Tab page, in which the page content was no longer displayed, but all the other system add-ons and about:addons load and work as expected. This behavior was entirely random (maybe a race condition?) and opening a New Tab usually "fixed" the issue. I've tried creating a profile using Beta 11, using the same profile with the latest Nightly and then going back to the initial beta installation, but aside from Nightly breakage (flexible space, buttons being in wrong places, new buttons appearing) the Activity Stream page worked as expected. I've also run into "1505736173605 extensions.shield-recipe-client.recipe-runner ERROR Could not execute recipe Preference Experiment: Activity Stream in Beta (Bug 1369150):" but I've managed to get this message on profiles with and without Activity Stream so I'm not sure it's related. Seeing as Activity Stream works in Beta, maybe the pings are not correctly sent. Jared or Kate, is there a way to see if the profiles that I have with Activity Stream correctly send their pings?
Flags: needinfo?(andrei.vaida) → needinfo?(jkerim)
You can check if the Ping Centre pings are correctly sent by setting the following prefs to true: browser.ping-centre.log browser.ping-centre.telemetry browser.newtabpage.activity-stream.telemetry Then you can check in the browser console if you see logs when you interact with Activity Stream. That said, the drop in users is visible in both the UT and Ping Centre telemetry systems as we can see in both of these graphs: Ping Centre: https://sql.telemetry.mozilla.org/queries/25980#93514 UT: https://sql.telemetry.mozilla.org/queries/27149#93513 So it's less likely that it's a problem on the telemetry side.
As an update here, we're seeing similar drops in users here: 1) Search experiment (https://sql.telemetry.mozilla.org/queries/38635#103758) 2) Onboarding experiment (https://sql.telemetry.mozilla.org/queries/38610#103685) This makes it seem more likely to be an issue in shield. A relevant ticket is now filed in the Normandy repo: https://github.com/mozilla/normandy/issues/1063
Based on the fact that 1) We see similar drops in user populations for multiple experiments and 2) We see errors in recipe initialization from the errors logs provided by ahillier We believe the error is caused by a bug in the recipe client system addon, and have filed a ticket against it here: https://github.com/mozilla/normandy/issues/1063
Flags: needinfo?(jkerim)
So, maybe this is an issue from beta 10 (released Friday Sept 8th), from https://bugzilla.mozilla.org/show_bug.cgi?id=1397415
Osmose and I found a probable explanation for this change in the Shield code. In Shield v73 (which landed in beta 10, in bug 1397415), we made changes to how we set preferences on the default-branch during startup. As background, default-branch preferences do not persist between Firefox restarts, and so Shield has to set them at every startup. Prior to v73, we stored the data for what prefs to change in IndexedDB. Starting with v73, we store this data in user-branch preferences. This approach incurs much less startup IO, and is overall faster. Shield does not have an intermediary version of this code that reads from both data sources. Users that upgraded from Shield pre-v73 to Shield v73 would not have their default-branch preferences set at startup, since the data only existed in the old system, which is no longer being read. Then later in the Shield execution process, Shield would notice that the preference was not set to the experimental value, assume that the user had manually changed it, and stop the experiment for that user. The users that have experienced this bug ended their studies, we can't get them back. Fixing this bug won't re-enroll them in the experiment, and we can't tell the difference between users that were affected by this bug, and users that actually ended the study for any other reason (such as manual intervention, or on longer being eligible).
OK. does that mean we don't need to fix this for 56 release? Or do we still want to do that, and potentially restart some experiments?
Flags: needinfo?(mkelly)
mythmon: ^
Flags: needinfo?(mcooper)
We would like to fix this for 56. There are a few preference experiments that will be active during the 56 release, and we'd like to avoid dropping users from them. I have a patch + tests that should fix the issue for unaffected experiments. As comment 22 says, experiments on Nightly and Beta that would be affected have already been affected, and future studies run on those channels should not be affected by the bug, so we only need to land the fix in 56 and do not need it in 57 or later versions.
Assignee: nobody → mkelly
Component: Activity Streams: Newtab → Add-on
Flags: needinfo?(mkelly)
Flags: needinfo?(mcooper)
Flags: needinfo?(khudson)
Product: Firefox → Shield
Summary: Drop off in Firefox 56 Beta en-US population in Shield Study → Shield unenrolls users in active preference experiments on upgrade
Severity: normal → blocker
Priority: -- → P1
Comment on attachment 8909605 [details] Bug 1399936: Migrate active preference experiment values to new storage. https://reviewboard.mozilla.org/r/181076/#review186380 ::: browser/extensions/shield-recipe-client/bootstrap.js:129 (Diff revision 1) > - this.initExperimentPrefs(); > + await this.initExperimentPrefs(); > > // If the app is starting up, wait until the UI is available before finishing > // init. > if (reason === REASON_APP_STARTUP) { > Services.obs.addObserver(this, UI_AVAILABLE_NOTIFICATION); > } else { There's a race here now, because `initExperimentPrefs` is async, and so the UI_AVAILABLE_NOTIFICATION might already have fired by the time we add the observer. Additionally, if the add-on shuts down while startup hasn't finished running yet, strange things will happen. We can't just move the observer code earlier because then the inverse might happen and shieldrecipeclient.startup() will run before the preference experiment init code runs. Here's a suggested solution: ```js async startup(data, reason) { // Initialization that needs to happen before the first paint on startup. this.initShieldPrefs(DEFAULT_PREFS); let initPromise = this.initExperimentPrefs(); // On app startup, wait for both the prefs to have initialized and // final UI startup to have happened. Otherwise, we only wait for the prefs. if (reason === REASON_APP_STARTUP) { initPromise = Promise.all([initPromise, this._promiseUIAvailable()]); } try { await initPromise; } catch (ex) { // Initialization failed, for instance because we shut down while initing. return; } ShieldRecipeClient.startup(); } _promiseUIAvailable() { return new Promise((resolve, reject) => { this._startupHandled = (success) => { Services.obs.removeObserver(this, UI_AVAILABLE_NOTIFICATION); delete this._startupHandled; if (success) { resolve(); } else { reject(new Error("Startup was aborted")); } }; Services.obs.addObserver(this, UI_AVAILABLE_NOTIFICATION); }); } observe(subject, topic, data) { // replace the existing version with this: if (topic === UI_AVAILABLE_NOTIFICATION) { this._startupHappened(true); } } ``` and make shutdown check if `this._startupHandled` exists, and call it if so. If that property does not exist, it can also safely assume we no longer need to remove the observer (because either it was never added or it was already removed). To be clear, you don't need to use this exact code as far as I'm concerned, but feel free to. Maybe there's a simpler solution to avoid the race, I'm probably missing something. ::: browser/extensions/shield-recipe-client/test/browser/browser_bootstrap.js:193 (Diff revision 1) > > decorate_task( > withBootstrap, > withStub(ShieldRecipeClient, "startup"), > async function testStartupDelayed(Bootstrap, startupStub) { > - Bootstrap.startup(undefined, 3); // 1 == ADDON_ENABLED > + await Bootstrap.startup(undefined, 3); // 1 == ADDON_ENABLED Err, should the comment be "3 == ADDON_ENABLED" ?
Attachment #8909605 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8909605 [details] Bug 1399936: Migrate active preference experiment values to new storage. https://reviewboard.mozilla.org/r/181076/#review186682
Attachment #8909605 - Flags: review+
Comment on attachment 8909935 [details] Bug 1399936: Avoid race condition during Shield add-on startup. https://reviewboard.mozilla.org/r/181418/#review186680
Attachment #8909935 - Flags: review?(gijskruitbosch+bugs) → review+
You may want to squash these before autolanding, I dunno that it makes sense to have them as separate commits in hg history - up to you.
Attachment #8909935 - Attachment is obsolete: true
Comment on attachment 8909605 [details] Bug 1399936: Migrate active preference experiment values to new storage. Approval Request Comment [Feature/Bug causing the regression]: Bug 1399936 - Shield unenrolls users in active preference experiments on upgrade [User impact if declined]: Preference studies that are running in release when 56 is released (of which there will be at least 2) will unenroll all of their users on non-control branches. [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: Not on Nightly, since this only needs to land on 56. But I have verified the fix manually on local builds of the release branch with the patch. [Needs manual test from QE? If yes, steps to reproduce]: STR: 1. Create and open a new profile on Firefox 55. 2. Open about:config 3. Set the extensions.shield-recipe-client.user_id pref to the string "shield-unenroll". You may need to create the pref if it does not exist. 3. Set the extensions.shield-recipe-client.dev_mode pref to true. 4. Restart Firefox 55 with the same profile. 5. Open about:config and verify that the test.shield.unenroll pref is set to True and the status column says "Default". 6. Open Firefox 56 with the same profile. Without the fix: The test.shield.unenroll pref will not be present in about:config, indicating that the user has been unenrolled. With the fix: The test.shield.unenroll pref should still be set to true, indicating that the user was not unenrolled from the experiment. [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Mildly. [Why is the change risky/not risky?]: Mostly because it is so late in the cycle. It changes the Shield startup behavior a little bit, but is overall a small patch; most of the changes were tests. [String changes made/needed]: None
Attachment #8909605 - Flags: approval-mozilla-release?
Comment on attachment 8909605 [details] Bug 1399936: Migrate active preference experiment values to new storage. Let's uplift for the RC2 56 build. This should let us re-start shield experiments on 56 release once it is in place.
Attachment #8909605 - Flags: approval-mozilla-release? → approval-mozilla-release+
Andrei, can someone test this? It is important for both 56 release and for 57 beta. Osmose mentioned that Adrian is the most familiar with testing this feature. Probably good to work closely with Osmose on testing if you have any questions. Ty!
Flags: needinfo?(andrei.vaida)
Flags: needinfo?(adrian.florinescu)
Landed in https://hg.mozilla.org/releases/mozilla-release/rev/5bd2002c00756a9e4e8cff00cc44728ab2f74dbb, should go out with rc2 if it doesn't get backed out.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35) > Andrei, can someone test this? It is important for both 56 release and for > 57 beta. > Osmose mentioned that Adrian is the most familiar with testing this feature. > Probably good to work closely with Osmose on testing if you have any > questions. Ty! Hey Liz, I'm going to take this and follow it through. I've already talked with Osmose last night and ironed out the details for verifying this issue once RC2 lands.
Flags: needinfo?(andrei.vaida)
Flags: needinfo?(adrian.florinescu)
QA Contact: adrian.florinescu
I tested on the rc2 build at https://archive.mozilla.org/pub/firefox/candidates/56.0-candidates/build3/ on OSX and was able to verify the bug as fixed with the STR from comment 33.
I've also run the test on RC build 4 (http://ftp.mozilla.org/pub/firefox/candidates/56.0-candidates/build4/). In accordance with comment 33 steps, it seems Ok to me. However, checking "shield-preference-experiments.json" for the 55 /56 I notice the "pref-flip-screenshots-release-1369150" expires once I "update" to 56. Since I don't have access to the production admin, I'm not sure if this is the expected from the "pref-flip-screenshot" recipe, but since it's an RC build - hence tagged as release, I don't believe it should happen - unless there is a version filter set for the recipe. Mike, could you confirm that this behavior is expected?: 55 Release (55.0.3 20170824123605) {"pref-flip-screenshots-release-1369150":{"name":"pref-flip-screenshots-release-1369150","branch":"control","expired":false,"lastSeen":"2017-09-25T09:29:01.488Z","preferenceName":"extensions.screenshots.system-disabled","preferenceValue":true,"preferenceType":"boolean","previousPreferenceValue":true,"preferenceBranchType":"default"}} 56 RC 4 (56.0 20170922200134) {"pref-flip-screenshots-release-1369150":{"name":"pref-flip-screenshots-release-1369150","branch":"control","expired":true,"lastSeen":"2017-09-25T09:29:01.488Z","preferenceName":"extensions.screenshots.system-disabled","preferenceValue":true,"preferenceType":"boolean","previousPreferenceValue":true,"preferenceBranchType":"default"}
Flags: needinfo?(mkelly)
(In reply to Adrian Florinescu [:AdrianSV] from comment #39) > However, checking "shield-preference-experiments.json" for the 55 /56 I > notice the "pref-flip-screenshots-release-1369150" expires once I "update" > to 56. Since I don't have access to the production admin, I'm not sure if > this is the expected from the "pref-flip-screenshot" recipe, but since it's > an RC build - hence tagged as release, I don't believe it should happen - > unless there is a version filter set for the recipe. > > Mike, could you confirm that this behavior is expected?: Looking at the recipe list at https://normandy.services.mozilla.com/api/v2/recipe/ for the string "pref-flip-screenshots-release-1369150" leads to a recipe with the filter expression[1]: ( normandy.channel == 'release' && [normandy.userId, normandy.recipe.id]|stableSample(1.0) && normandy.version >= '55.0' && normandy.version < '56.0' ) The version filters are targeted at 55, so moving to 56 should unenroll you, as it did. So we're good! [1] A better UI for searching the public API would probably help make this more usable for QA purposes.
Flags: needinfo?(mkelly)
Verified in Comment 39
Status: RESOLVED → VERIFIED
Product: Shield → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: