Closed Bug 1348097 Opened 8 years ago Closed 8 years ago

Intermittent browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource://app/modules/experiments/Experiments.jsm:840 - Error: experiments are disabled

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: standard8)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Priority: -- → P3
Although this is reporting a test failure against browser/components/search/test/browser_yahoo_behavior.js, I think this is actually an issue in the Experiments code: 20:34:12 INFO - TEST-UNEXPECTED-FAIL | browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource://app/modules/experiments/Experiments.jsm:840 - Error: experiments are disabled 20:34:12 INFO - Stack trace: 20:34:12 INFO - updateManifest@resource://app/modules/experiments/Experiments.jsm:840:29 20:34:12 INFO - notify@resource://app/components/ExperimentsService.js:66:7 20:34:12 INFO - TM_notify/<@resource://gre/components/nsUpdateTimerManager.js:218:11 20:34:12 INFO - TM_notify@resource://gre/components/nsUpdateTimerManager.js:263:7 20:34:12 INFO - GECKO(2802) | Completed ShutdownLeaks collections in process 2802 AFAICT we're not touching anything relating to enabling/disabling experiments in browser_yahoo_behavior.js. This implies another test has triggered the timer in ExperimentsService.js, but it hasn't been disabled properly. There's several things that I think are wrong here: 1) `ExperimentsService:notify` calls `instance.updateManifest()`. However, the function is returning a promise. So this should possibly have a catch on it, or some other handling if required. 2) `gExperimentsEnabled` in ExperimentsService.js is defined as (some constants replaced for clarity): ``` const IS_UNIFIED_TELEMETRY = Preferences.get("toolkit.telemetry.unified", false); return gPrefs.get("experiments.enabled", false) && IS_UNIFIED_TELEMETRY && gPrefs.get(PREF_TELEMETRY_ENABLED, false); ``` In Experiments.jsm it is: ``` gExperimentsEnabled = gPrefs.get("experiments.enabled", false) && TelemetryUtils.isTelemetryEnabled; ``` So `gExperimentsEnabled` is based on two different values. 3) Experiments.jsm watches the telemetry preference status, but ExperimentsService.jsm doesn't. This will mean that the Experiments.jsm version will get an update, whereas the ExperimentsService.jsm might not (it depends on exactly how the lazy getter works). My theory then, is that the timer within ExperimentsService.js has been triggered, and when it activates it calls `instance.updateManifest()` in Experiments.jsm, but that one has been updated to think that the experiments service is not enabled, and hence the promise blows up.
Component: Search → Telemetry
Product: Firefox → Toolkit
Priority: P3 → --
in working on migrating the tests to run on Ubuntu 16.04 (instead of 12.04) we hit this root cause failure often: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4fb0bf102f76c3458b477cabc63a8820960e4f4&filter-searchStr=linux%20bc5%20e10s%20debug is there a recommended fix for this or some general direction of what we should be doing to fix this?
Blocks: 1319863
Flags: needinfo?(standard8)
(In reply to Joel Maher ( :jmaher) from comment #3) > in working on migrating the tests to run on Ubuntu 16.04 (instead of 12.04) > we hit this root cause failure often: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=c4fb0bf102f76c3458b477cabc63a8820960e4f4&filter- > searchStr=linux%20bc5%20e10s%20debug That's interesting, especially as it fails with the same errors but in different tests. > is there a recommended fix for this or some general direction of what we > should be doing to fix this? I've taken a bit more of a look. The more I get into this, the more I think that there is something to do with timing going on in Experiments/ExperimentsService.jsm. I'm not quite sure what yet though. browser_oneOffContextMenu.js does stuff which triggers some Experiments activity, it then turns it off again later. On my opt setup, I'm not able to repeat this, but that's probably unsurprising, as this seems to be timing related. There's a few timers about, and my suspicion is that one of them is somehow triggered at just the wrong time to upset the tests. My guess would be that the Ubuntu 16.04 builders are slightly slower and therefore triggering this more often. With all the debug I've turned on and added, I've now pushed a try build. Hopefully that'll show up an explanation as to what is happening: https://treeherder.mozilla.org/#/jobs?repo=try&revision=db233cd760c9e2916901e35564acb358169d0034 I'll take another look in the morning.
I see a failure in bc3: https://public-artifacts.taskcluster.net/SMemd-6KSXqza_Cl_7ZBvw/0/public/logs/live_backing.log and I see: [task 2017-04-18T22:57:06.022006Z] 22:57:06 INFO - GECKO(1531) | --DOMWINDOW == 13 (0xc9111400) [pid = 1531] [serial = 86] [outer = (nil)] [url = about:blank] [task 2017-04-18T22:57:06.326319Z] 22:57:06 INFO - GECKO(1531) | 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest() [task 2017-04-18T22:57:12.295604Z] 22:57:12 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1664 [task 2017-04-18T22:57:12.701353Z] 22:57:12 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1618 [task 2017-04-18T22:57:13.385353Z] 22:57:13 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1642 [task 2017-04-18T22:57:14.031587Z] 22:57:14 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1584 [task 2017-04-18T22:57:17.421945Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 12 (0xe842cc00) [pid = 1531] [serial = 76] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/opensearch.html] [task 2017-04-18T22:57:17.424246Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 11 (0xc8d33400) [pid = 1531] [serial = 79] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/opensearch.html] [task 2017-04-18T22:57:17.426257Z] 22:57:17 INFO - GECKO(1531) | --DOMWINDOW == 10 (0xc928fc00) [pid = 1531] [serial = 82] [outer = (nil)] [url = chrome://mochitests/content/browser/browser/components/search/test/tooManyEnginesOffered.html] [task 2017-04-18T22:57:22.309864Z] 22:57:22 INFO - Console message: [JavaScript Error: "notify CALLLED123" {file: "jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js" line: 58}] [task 2017-04-18T22:57:22.312343Z] 22:57:22 INFO - notify@jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js:58:5 [task 2017-04-18T22:57:22.313753Z] 22:57:22 INFO - TM_notify/<@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:218:11 [task 2017-04-18T22:57:22.315424Z] 22:57:22 INFO - TM_notify@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:263:7 [task 2017-04-18T22:57:22.315643Z] 22:57:22 INFO - [task 2017-04-18T22:57:22.317015Z] 22:57:22 INFO - Console message: 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest() [task 2017-04-18T22:57:22.374244Z] 22:57:22 INFO - TEST-INFO | started process screentopng [task 2017-04-18T22:57:22.456099Z] 22:57:22 INFO - Failed to start /home/worker/workspace/build/tests/bin/screentopng for screenshot: No such file or directory [task 2017-04-18T22:57:22.461320Z] 22:57:22 INFO - TEST-UNEXPECTED-FAIL | browser/components/search/test/browser_yahoo_behavior.js | A promise chain failed to handle a rejection: - at resource:///modules/experiments/Experiments.jsm:843 - Error: experiments are disabled [task 2017-04-18T22:57:22.461635Z] 22:57:22 INFO - Stack trace: [task 2017-04-18T22:57:22.464118Z] 22:57:22 INFO - updateManifest@resource:///modules/experiments/Experiments.jsm:843:29 [task 2017-04-18T22:57:22.465671Z] 22:57:22 INFO - notify@jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js:67:7 [task 2017-04-18T22:57:22.467080Z] 22:57:22 INFO - TM_notify/<@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:218:11 [task 2017-04-18T22:57:22.468522Z] 22:57:22 INFO - TM_notify@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/nsUpdateTimerManager.js:263:7 [task 2017-04-18T22:57:31.839255Z] 22:57:31 INFO - GECKO(1531) | Completed ShutdownLeaks collections in process 1531 [task 2017-04-18T22:57:31.839664Z] 22:57:31 INFO - TEST-START | Shutdown what looks interesting is: [task 2017-04-18T22:57:06.326319Z] 22:57:06 INFO - GECKO(1531) | 1492556226319 Browser.Experiments.Experiments TRACE Experiments #0::updateManifest() ... [task 2017-04-18T22:57:22.309864Z] 22:57:22 INFO - Console message: [JavaScript Error: "notify CALLLED123" {file: "jar:file:///home/worker/workspace/build/application/firefox/browser/omni.ja!/components/ExperimentsService.js" line: 58}] I assume that is related to this call: https://dxr.mozilla.org/mozilla-central/source/browser/experiments/ExperimentsService.js?q=path%3AExperimentsService.js&redirect_type=single#58 I am not really sure if that helps at all, but thought I would take a look.
So, from the try builds, and today's investigations so far: - ExperimentsService.notify() is being triggered because it is an update-timer defined in Experiments.manifest: https://dxr.mozilla.org/mozilla-central/rev/722fdbff1efc308a22060e75b603311d23541bb5/browser/experiments/Experiments.manifest#3 The update-timers are triggered with one timer every 30 seconds, there are a few of them, so this could indeed be firing at any time (probably worse on slow builds, e.g. the debug ones). Now comes the interesting bit: - When the browser/components/search tests start, both ExperimentsService and Experiments see the various preferences as being true, i.e. they are enabled. In particular, "toolkit.telemetry.enabled" is true. - browser_oneOffContextMenu.js changes "toolkit.telemetry.enabled" to false => Experiments picks this up, and changes its gExperimentsEnabled state to false. ExperimentsService stays as true, as it doesn't look for pref changes (see comment 2). - browser_oneOffContextMenu.js then changes "toolkit.telemetry.enabled" to true => Experiments picks this up, but doesn't change its gExperimentsEnabled state - _telemetryStatusChanged calls _toggleExperimentsEnabled with the current value of gExperimentsEnabled (false), and then passes that enabled value to: gExperimentsEnabled = enabled && TelemetryUtils.isTelemetryEnabled; which is always going to return false... - At some time later the update-timer kicks in, and triggers ExperimentsService. => ExperimentsService has its original recorded value of gExperimentsEnabled = true, so fires Experiments.updateManifest() which rejects a promise due to Experiments thinking that it is disabled, and bang: test fail. I can reproduce this (5/5) on an opt build on my local mac by editing nsUpdateTimerManager.js and in TimerManager.observe forcing `this._timerMinimumDelay` and `firstInterval` to be `5000`. These intervals are normally 30s, so I'd guess these tests are taking longer on the newer ubuntu machines.
Assignee: nobody → standard8
Flags: needinfo?(standard8)
Blocks: 1232648
Attachment #8859535 - Flags: review?(alessio.placitelli) → review?(gfritzsche)
Comment on attachment 8859535 [details] Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. I'm currently investigating the issues raised by try server.
Attachment #8859535 - Flags: review?(gfritzsche)
Comment on attachment 8859535 [details] Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. https://reviewboard.mozilla.org/r/131538/#review134444 Thanks for figuring this out! ::: browser/experiments/Experiments.jsm:124 (Diff revision 2) > function configureLogging() { > if (!gLogger) { > gLogger = Log.repository.getLogger("Browser.Experiments"); > gLogger.addAppender(new Log.ConsoleAppender(new Log.BasicFormatter())); > } > - gLogger.level = gPrefs.get(PREF_LOGGING_LEVEL, Log.Level.Warn); > + gLogger.level = gPrefs.get(PREF_LOGGING_LEVEL, Log.Level.Trace); Left-over? Please undo this. ::: browser/experiments/ExperimentsService.js:53 (Diff revision 2) > > + get _experimentsEnabled() { > + // We can enable experiments if either unified Telemetry or FHR is on, and the user > + // has opted into Telemetry. > + return gPrefs.get(PREF_EXPERIMENTS_ENABLED, false) && > + IS_UNIFIED_TELEMETRY && gPrefs.get(PREF_TELEMETRY_ENABLED, false); Please use `TelemetryUtils.isTelemetryEnabled()`. This handles an actual type issue we have seen before.
Attachment #8859535 - Flags: review?(gfritzsche)
Comment on attachment 8859535 [details] Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. https://reviewboard.mozilla.org/r/131538/#review134444 > Please use `TelemetryUtils.isTelemetryEnabled()`. > This handles an actual type issue we have seen before. Thanks, I've changed it. I also dropped the IS_UNIFIED_TELEMETRY check so that it was consistent with Experiments.jsm
Comment on attachment 8859535 [details] Bug 1348097 - Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. https://reviewboard.mozilla.org/r/131538/#review134950 Thanks for getting this fixed!
Attachment #8859535 - Flags: review?(gfritzsche) → review+
Pushed by mbanner@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e1c9c6e89f50 Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. r=gfritzsche
I knew I should have done the experiments.enabled setting for the whole of that test directory :-( I'll get that fixed.
Flags: needinfo?(standard8)
Pushed by mbanner@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b2c250009e4b Fix intermittent failures caused by Experiments.jsm by ensuring the preference monitoring is in sync. r=gfritzsche
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: