Closed Bug 1579840 Opened 1 year ago Closed 6 months ago

Intermittent telemetry/marionette/tests/client/test_subsession_management.py TestSubsessionManagement.test_subsession_management | AssertionError: u'c641e2d3-2742-4143-b667-1463c7b7fbc1' is not None

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox77 --- wontfix
firefox78 --- fixed
firefox79 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: chutten|PTO)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=265687764&repo=try
Full log: https://queue.taskcluster.net/v1/task/HwCC6GOkRACiXMi9pUi5-Q/runs/0/artifacts/public/logs/live_backing.log


[task 2019-09-09T12:17:35.475Z] 12:17:35 INFO - 1568031455469 Toolkit.Telemetry TRACE TelemetryStorage::loadPendingPing - id: 0fab57e4-4fbf-421a-ba58-443043374e8d
[task 2019-09-09T12:17:35.475Z] 12:17:35 INFO - 1568031455469 Toolkit.Telemetry TRACE TelemetrySession::delayedInit
[task 2019-09-09T12:17:35.479Z] 12:17:35 INFO - 1568031455469 Toolkit.Telemetry TRACE TelemetryStorage::_popAndPerformQueuedOperation - Performing queued operation.
[task 2019-09-09T12:17:35.506Z] 12:17:35 INFO - 1568031455499 Toolkit.Telemetry TRACE TelemetryStorage::_popAndPerformQueuedOperation - Performing queued operation.
[task 2019-09-09T12:17:35.522Z] 12:17:35 INFO - 1568031455514 Toolkit.Telemetry TRACE TelemetrySend::_doPing - server: http://127.0.0.1:8000/pings, persisted: true, id: 0fab57e4-4fbf-421a-ba58-443043374e8d
[task 2019-09-09T12:17:35.550Z] 12:17:35 INFO - pingserver pings_handler received 'main' ping with reason 'shutdown'
[task 2019-09-09T12:17:35.568Z] 12:17:35 INFO - 1568031455558 Toolkit.Telemetry INFO TelemetrySend::_doPing - successfully loaded, status: 200
[task 2019-09-09T12:17:35.568Z] 12:17:35 INFO - 1568031455559 Toolkit.Telemetry TRACE TelemetrySend::_onPingRequestFinished - success: true, persisted: true
[task 2019-09-09T12:17:35.568Z] 12:17:35 INFO - 1568031455559 Toolkit.Telemetry TRACE TelemetryStorage::removePendingPing - deleting ping with id: 0fab57e4-4fbf-421a-ba58-443043374e8d, path: /builds/worker/workspace/build/tmpC59VwD.mozrunner/saved-telemetry-pings/0fab57e4-4fbf-421a-ba58-443043374e8d
[task 2019-09-09T12:17:35.584Z] 12:17:35 INFO - 1568031455573 Toolkit.Telemetry TRACE TelemetrySend::Scheduler::_doSendTask - bailing out after sending, shutdown: false, pendingPingCount: 0
[task 2019-09-09T12:17:35.584Z] 12:17:35 INFO - 1568031455575 Toolkit.Telemetry TRACE TelemetryStorage::loadPingfile - unreadable ping /builds/worker/workspace/build/tmpC59VwD.mozrunner/datareporting/aborted-session-ping: {"operation":"open","path":"/builds/worker/workspace/build/tmpC59VwD.mozrunner/datareporting/aborted-session-ping","unixErrno":2}
[task 2019-09-09T12:17:35.584Z] 12:17:35 INFO - 1568031455576 Toolkit.Telemetry TRACE TelemetryStorage::loadAbortedSessionPing - no such file
[task 2019-09-09T12:17:35.584Z] 12:17:35 INFO - 1568031455576 Toolkit.Telemetry TRACE TelemetryController::checkAbortedSessionPing - found aborted-session ping: false
[task 2019-09-09T12:17:35.584Z] 12:17:35 INFO - 1568031455576 Toolkit.Telemetry TRACE TelemetrySession::_saveAbortedSessionPing
[task 2019-09-09T12:17:35.586Z] 12:17:35 INFO - 1568031455576 Toolkit.Telemetry TRACE TelemetrySession::getSessionPayload - reason: aborted-session, clearSubsession: false
[task 2019-09-09T12:17:35.587Z] 12:17:35 INFO - 1568031455577 Toolkit.Telemetry TRACE TelemetrySession::assemblePayloadWithMeasurements - reason: aborted-session, submitting subsession data: true
[task 2019-09-09T12:17:35.587Z] 12:17:35 INFO - 1568031455580 Toolkit.Telemetry TRACE TelemetryController::saveAbortedSessionPing
[task 2019-09-09T12:17:35.588Z] 12:17:35 INFO - 1568031455580 Toolkit.Telemetry TRACE TelemetryController::assemblePing - Type main, aOptions {"addClientId":true,"addEnvironment":true}
[task 2019-09-09T12:17:35.592Z] 12:17:35 INFO - 1568031455582 Toolkit.Telemetry TRACE TelemetryStorage::saveAbortedSessionPing - ping path: /builds/worker/workspace/build/tmpC59VwD.mozrunner/datareporting/aborted-session-ping
[task 2019-09-09T12:17:35.592Z] 12:17:35 INFO - 1568031455584 Toolkit.Telemetry TRACE TelemetryStorage::_popAndPerformQueuedOperation - Performing queued operation.
[task 2019-09-09T12:17:35.593Z] 12:17:35 INFO - 1568031455584 Toolkit.Telemetry TRACE TelemetryStorage::savePingToFile - path: /builds/worker/workspace/build/tmpC59VwD.mozrunner/datareporting/aborted-session-ping
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455587 Toolkit.Telemetry TRACE TelemetryEnvironment::registerChangeListener for TelemetrySession::onEnvironmentChange
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455588 Toolkit.Telemetry TRACE TelemetryScheduler::init
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455588 Toolkit.Telemetry TRACE TelemetryScheduler::_rescheduleTimeout - isUserIdle: false
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455588 Toolkit.Telemetry TRACE TelemetryScheduler::_rescheduleTimeout - scheduling next tick for Mon Sep 09 2019 12:22:35 GMT+0000 (Coordinated Universal Time)
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455589 Toolkit.Telemetry TRACE TelemetryStorage::cleanArchiveTask
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455589 Toolkit.Telemetry TRACE TelemetryStorage::removeFHRDatabase
[task 2019-09-09T12:17:35.594Z] 12:17:35 INFO - 1568031455590 Toolkit.Telemetry TRACE TelemetryEventPing::::Starting up.
[task 2019-09-09T12:17:35.611Z] 12:17:35 INFO - 1568031455601 Toolkit.Telemetry TRACE TelemetryPrioPing::Starting up.
[task 2019-09-09T12:17:35.611Z] 12:17:35 INFO - 1568031455607 Toolkit.Telemetry TRACE TelemetryStorage::_purgeOldPings
[task 2019-09-09T12:17:35.612Z] 12:17:35 INFO - 1568031455610 Toolkit.Telemetry TRACE TelemetryStorage::_scanArchive
[task 2019-09-09T12:17:35.619Z] 12:17:35 INFO - 1568031455616 Toolkit.Telemetry TRACE TelemetryStorage::_scanArchive - checking in subdir: /builds/worker/workspace/build/tmpC59VwD.mozrunner/datareporting/archived/2019-09
[task 2019-09-09T12:17:35.635Z] 12:17:35 INFO - 1568031455632 Toolkit.Telemetry TRACE TelemetryStorage::_enforceArchiveQuota
[task 2019-09-09T12:17:35.756Z] 12:17:35 INFO - 1568031455753 Marionette TRACE Received observer notification marionette-startup-requested
[task 2019-09-09T12:17:35.757Z] 12:17:35 INFO - 1568031455753 Marionette TRACE Waiting until startup recorder finished recording startup scripts...
[task 2019-09-09T12:17:35.765Z] 12:17:35 INFO - 1568031455759 Marionette TRACE All scripts recorded.
[task 2019-09-09T12:17:35.780Z] 12:17:35 INFO - 1568031455774 Marionette INFO Listening on port 2828
[task 2019-09-09T12:17:35.781Z] 12:17:35 INFO - 1568031455775 Marionette DEBUG Remote service is active
[task 2019-09-09T12:17:35.808Z] 12:17:35 INFO - 1568031455802 Toolkit.Telemetry TRACE TelemetryEnvironment::_updateAddons
[task 2019-09-09T12:17:35.808Z] 12:17:35 INFO - 1568031455805 Toolkit.Telemetry TRACE TelemetryEnvironment::_updateAddons: addons differ
[task 2019-09-09T12:17:35.808Z] 12:17:35 INFO - 1568031455805 Toolkit.Telemetry TRACE TelemetryEnvironment::_startWatchingPrefs - [object Map]
[task 2019-09-09T12:17:35.816Z] 12:17:35 INFO - 1568031455809 Toolkit.Telemetry TRACE TelemetryEnvironment::observe - aTopic: nsPref:changed, aData: gfx.blacklist.layers.opengl
[task 2019-09-09T12:17:35.816Z] 12:17:35 INFO - 1568031455809 Toolkit.Telemetry TRACE TelemetryEnvironment::observe - aTopic: nsPref:changed, aData: gfx.blacklist.layers.opengl.failureid
[task 2019-09-09T12:17:35.873Z] 12:17:35 INFO - 1568031455865 Marionette DEBUG Accepted connection 0 from 127.0.0.1:50758
[task 2019-09-09T12:17:35.873Z] 12:17:35 INFO - 1568031455866 Marionette DEBUG Closed connection 0
[task 2019-09-09T12:17:35.874Z] 12:17:35 INFO - 1568031455867 Marionette DEBUG Accepted connection 1 from 127.0.0.1:50760
[task 2019-09-09T12:17:35.875Z] 12:17:35 INFO - 1568031455868 Marionette DEBUG 1 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[task 2019-09-09T12:17:35.876Z] 12:17:35 INFO - 1568031455871 Marionette TRACE [11] Frame script loaded
[task 2019-09-09T12:17:35.877Z] 12:17:35 INFO - 1568031455871 Marionette TRACE [11] Frame script registered
[task 2019-09-09T12:17:35.877Z] 12:17:35 INFO - 1568031455871 Toolkit.Telemetry TRACE TelemetryEnvironment::observe - aTopic: nsPref:changed, aData: marionette.contentListener
[task 2019-09-09T12:17:35.880Z] 12:17:35 INFO - 1568031455873 Marionette DEBUG 1 <- [1,1,null,{"sessionId":"fb893407-7812-4030-8a85-47e59a3ca3f7","capabilities":{"browserName":"firefox","browserVersion":"71.0" ... ld/tmpC59VwD.mozrunner","moz:shutdownTimeout":60000,"moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true}}]
[task 2019-09-09T12:17:35.881Z] 12:17:35 INFO - 1568031455879 Marionette DEBUG 1 -> [0,2,"Marionette:SetContext",{"value":"content"}]
[task 2019-09-09T12:17:35.881Z] 12:17:35 INFO - 1568031455880 Marionette DEBUG 1 <- [1,2,null,{"value":null}]
[task 2019-09-09T12:17:35.882Z] 12:17:35 INFO - 1568031455881 Marionette DEBUG 1 -> [0,3,"Marionette:GetContext",{}]
[task 2019-09-09T12:17:35.883Z] 12:17:35 INFO - 1568031455881 Marionette DEBUG 1 <- [1,3,null,{"value":"content"}]
[task 2019-09-09T12:17:35.884Z] 12:17:35 INFO - 1568031455882 Marionette DEBUG 1 -> [0,4,"Marionette:SetContext",{"value":"chrome"}]
[task 2019-09-09T12:17:35.885Z] 12:17:35 INFO - 1568031455882 Marionette DEBUG 1 <- [1,4,null,{"value":null}]
[task 2019-09-09T12:17:35.888Z] 12:17:35 INFO - 1568031455885 Marionette DEBUG 1 -> [0,5,"WebDriver:TakeScreenshot",{"full":true,"hash":false,"id":null,"scroll":true}]
[task 2019-09-09T12:17:35.964Z] 12:17:35 INFO - TEST-UNEXPECTED-FAIL | telemetry/marionette/tests/client/test_subsession_management.py TestSubsessionManagement.test_subsession_management | AssertionError: u'c641e2d3-2742-4143-b667-1463c7b7fbc1' is not None
[task 2019-09-09T12:17:35.965Z] 12:17:35 INFO - Traceback (most recent call last):
[task 2019-09-09T12:17:35.965Z] 12:17:35 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 159, in run
[task 2019-09-09T12:17:35.965Z] 12:17:35 INFO - testMethod()
[task 2019-09-09T12:17:35.965Z] 12:17:35 INFO - File "/builds/worker/workspace/build/tests/telemetry/marionette/tests/client/test_subsession_management.py", line 58, in test_subsession_management
[task 2019-09-09T12:17:35.966Z] 12:17:35 INFO - self.assertIsNone(ping1_info["previousSubsessionId"])
[task 2019-09-09T12:17:35.966Z] 12:17:35 INFO - TEST-INFO took 7110ms
[task 2019-09-09T12:17:35.966Z] 12:17:35 INFO - 1568031455934 Marionette DEBUG 1 <- [1,5,null,{"value":"iVBORw0KGgoAAAANSUhEUgAABQAAAAQQCAYAAAC9RfbYAAAgAElEQVR4nOzd368d5Z0nal9mTm/+gDEorVEftRpzvL29+b1xwBjsnZxYT ... ACAAAAQGACQAAAAAAITAAIAAAAAIEJAAEAAAAgMAEgAAAAAAQmAAQAAACAwASAAAAAABCYABAAAAAAAhMAAgAAAEBg/w9WTASdqSkw1AAAAABJRU5ErkJggg=="}]
[task 2019-09-09T12:17:35.966Z] 12:17:35 INFO - 1568031455936 Marionette DEBUG 1 -> [0,6,"Marionette:SetContext",{"value":"content"}]
[task 2019-09-09T12:17:35.967Z] 12:17:35 INFO - 1568031455936 Marionette DEBUG 1 <- [1,6,null,{"value":null}]
[task 2019-09-09T12:17:35.967Z] 12:17:35 INFO - 1568031455936 Marionette DEBUG 1 -> [0,7,"Marionette:GetContext",{}]
[task 2019-09-09T12:17:35.967Z] 12:17:35 INFO - 1568031455936 Marionette DEBUG 1 <- [1,7,null,{"value":"content"}]
[task 2019-09-09T12:17:35.968Z] 12:17:35 INFO - 1568031455937 Marionette DEBUG 1 -> [0,8,"Marionette:SetContext",{"value":"content"}]
[task 2019-09-09T12:17:35.968Z] 12:17:35 INFO - 1568031455937 Marionette DEBUG 1 <- [1,8,null,{"value":null}]
[task 2019-09-09T12:17:35.968Z] 12:17:35 INFO - 1568031455938 Marionette DEBUG 1 -> [0,9,"WebDriver:GetPageSource",{}]
[task 2019-09-09T12:17:35.968Z] 12:17:35 INFO - 1568031455939 Marionette DEBUG 1 <- [1,9,null,{"value":"<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">\n<head>\n <meta http-equiv="Co ... "about:webrtc">about:webrtc</a></li><li><a href="about:welcome">about:welcome</a></li></ul>\n </div>\n</body>\n</html>"}]
[task 2019-09-09T12:17:35.969Z] 12:17:35 INFO - 1568031455939 Marionette DEBUG 1 -> [0,10,"Marionette:SetContext",{"value":"content"}]
[task 2019-09-09T12:17:35.969Z] 12:17:35 INFO - 1568031455939 Marionette DEBUG 1 <- [1,10,null,{"value":null}]
[task 2019-09-09T12:17:35.969Z] 12:17:35 INFO - 1568031455945 Marionette DEBUG 1 -> [0,11,"WebDriver:GetChromeWindowHandles",{}]
[task 2019-09-09T12:17:35.969Z] 12:17:35 INFO - 1568031455946 Marionette DEBUG 1 <- [1,11,null,["1"]]
[task 2019-09-09T12:17:35.970Z] 12:17:35 INFO - 1568031455946 Marionette DEBUG 1 -> [0,12,"WebDriver:GetWindowHandles",{}]
[task 2019-09-09T12:17:35.970Z] 12:17:35 INFO - 1568031455946 Marionette DEBUG 1 <- [1,12,null,["11"]]
[task 2019-09-09T12:17:36.154Z] 12:17:36 INFO - TEST-START | telemetry/marionette/tests/unit/test_ping_server_received_ping.py TestPingServer.test_ping_server_received_ping

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Assignee: nobody → rpierzina
Flags: needinfo?(rpierzina)
Depends on: 1636130

This is another instance of the previous subsession id not being None as we expected. This means an extra "main" ping is being sent at some point earlier in the test.

Looking at the logs provided this happens at least once because the Firefox under test is enrolled in an Experiment. This changes the Telemetry Environment which cuts a "main" ping with reason "environment-change".

Getting around this is fairly trivial, we can (and probably eventually should) turn off studies and experiments in this suite. However, we should probably examine what mechanism decided targeting a Firefox in automation was a good idea. If it's something specific to our setup then no worries and we can go ahead with disabling things. If it's not specific then we might be in a situation where our experiments are targeting Firefoxen being used by robots.

Severity: normal → S4
Status: REOPENED → ASSIGNED
Priority: P5 → P1
Pushed by rpierzina@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/81bfdd0b42ab
Disable Normandy in tt(c) to avoid extra subsessions; r=chutten
Status: ASSIGNED → RESOLVED
Closed: 1 year ago6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

The error in Comment 27 happened for a different reason than Normandy being enabled: It appears the Telemetry Environment changed because a new search region was set.

Relevant log lines

I'm not sure what's causing the search region to change. I also spotted this cache error:

chutten, can you please have a look at the log file?

Flags: needinfo?(rpierzina) → needinfo?(chutten)

Is this another instance of bug 1545207 rearing its head? Did the search region mock in https://phabricator.services.mozilla.com/D28059 stop working?

Flags: needinfo?(chutten) → needinfo?(rpierzina)

What do you suggest for determining if the mock works? I see the locale in the Telemetry Environment is reported as "en-US".
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305599375&repo=mozilla-central&lineNumber=3530

Flags: needinfo?(rpierzina) → needinfo?(chutten)

I went looking through the code and found that the pref we're using (browser.search.geoip.url) was removed at the end of February as part of bug 1589618. It appears as though the new pref might be geo.provider-country.network.url.

+mkaply because he's the one who originally told me how to mock this.

ni?daleharvey as the change-maker: Is there anything else we need to do other than switch browser.search.geoip.url? I went looking for test_location.js to continue copying its approach, but couldn't find it. It appears to have been removed in bug 1627541 and I can't find where that code is covered now.

Flags: needinfo?(chutten) → needinfo?(dharvey)

Based on my reading things, I do think you just need to change the pref name, but dale's the expert now on this.

So that pref has changed twice recently, was in a little bit of flux but should be stable now, its browser.region.network.url, sorry I missed over your test when I was making the changes.

If I remember the subsession telemetry tests were quite fragile to timing changes, the region patches have introduced some different timing characteristics so might need to be careful about waiting for the region check to complete, if you fix the mock and are still running into issues feel free to ping me about them

Flags: needinfo?(dharvey)

The patch landed in nightly and beta is affected.
:raphael, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(rpierzina)

What do you think, chutten?

Flags: needinfo?(rpierzina) → needinfo?(chutten)

I'll update the pref and push and see if that helps.

Assignee: rpierzina → chutten
Status: RESOLVED → REOPENED
Flags: needinfo?(chutten)
Resolution: FIXED → ---

Annnd we got an intermittent failure. The browser.search.region pref is still changing during startup: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305945327&repo=try&lineNumber=2632

It's set to DE: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305945327&repo=try&lineNumber=3472

I guess the timing has changed and so our previous mock no longer works. Which I guess means we need to talk about what we need to change that both A) Leaves Firefox as close to "normal" as possible, to maintain the relevance of the test B) Makes it testable.

I see a couple of options. One is to adapt the test to permit receiving as many main pings as it wants to, combining the relevant data across all of them in order to ensure we captured all of the calls. Another is to adapt the code further to somehow avoid this early shift. We have test-only APIs to suppress watching prefs, maybe we start using them. Or maybe we dig deeper into region detection and see if there's something specific we can do to make it reliable in our tests.

I'm kinda leaning towards adapting the tests to support aggregating across multiple pings since that mirrors how search data is analysed. What do you think, Raphael?

Flags: needinfo?(rpierzina)
Status: REOPENED → ASSIGNED
Target Milestone: mozilla79 → ---

On startup, Region.jsm will defer a callback (idleDispatchToMainThread) a function that will check the region, that check will then read that pref you are defining, set the region then send an observer nofication with "region-updated", the search service listens for that observer notification and reloads engines

Looking at the test, there doesnt seem to be anything you specifically need from the searchService, you can avoid the startup engines reloading etc by setting browser.region.network.url to ""?

I'm afraid combining relevant data across all pings is problematic for test_subsession_management.py as this test verifies the management of Telemetry subsessions. I agree this approach makes sense for other tests though.

Flags: needinfo?(rpierzina)
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b9031f502d1e
Update browser search pref in telemetry tests r=raphael,daleharvey
Status: ASSIGNED → RESOLVED
Closed: 6 months ago6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
See Also: → 1649760
You need to log in before you can comment on or make changes to this bug.