Closed Bug 1776394 Opened 2 years ago Closed 2 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | toolkit/components/telemetry/tests/unit/test_HealthPing.js | Test timed out

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: chutten)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(2 files)

Filed by: nfay [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=382409398&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/XvOrBhAzR1-y-JRGL3NRuA/runs/0/artifacts/public/logs/live_backing.log


[task 2022-06-24T09:43:10.275Z] 09:43:10     INFO -  TEST-START | toolkit/components/telemetry/tests/unit/test_HealthPing.js
[task 2022-06-24T09:48:10.278Z] 09:48:10  WARNING -  TEST-UNEXPECTED-TIMEOUT | toolkit/components/telemetry/tests/unit/test_HealthPing.js | Test timed out
[task 2022-06-24T09:48:10.279Z] 09:48:10     INFO -  TEST-INFO took 300003ms
[task 2022-06-24T09:48:10.279Z] 09:48:10     INFO -  >>>>>>>
[task 2022-06-24T09:48:10.279Z] 09:48:10     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2022-06-24T09:48:10.279Z] 09:48:10     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2022-06-24T09:48:10.280Z] 09:48:10     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2022-06-24T09:48:10.280Z] 09:48:10     INFO -  running event loop
[task 2022-06-24T09:48:10.280Z] 09:48:10     INFO -  toolkit/components/telemetry/tests/unit/test_HealthPing.js | Starting setup
[task 2022-06-24T09:48:10.280Z] 09:48:10     INFO -  (xpcshell/head.js) | test setup pending (2)
[task 2022-06-24T09:48:10.281Z] 09:48:10     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2022-06-24T09:48:10.281Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790409	Toolkit.Telemetry	TRACE	TelemetryEnvironment::constructor"
[task 2022-06-24T09:48:10.281Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790440	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_getGFXData - Only one display adapter detected."
[task 2022-06-24T09:48:10.282Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790444	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_updateSearchEngine - ignoring early call"
[task 2022-06-24T09:48:10.283Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "readUpdateConfig: Unable to read app update configuration file. Exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: resource://gre/modules/UpdateUtils.jsm :: getConfigFilePath :: line 154"  data: no]" {file: "resource://gre/modules/UpdateUtils.jsm" line: 865}]
[task 2022-06-24T09:48:10.283Z] 09:48:10     INFO -  readUpdateConfig@resource://gre/modules/UpdateUtils.jsm:865:10
[task 2022-06-24T09:48:10.283Z] 09:48:10     INFO -  _do_main@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:240:6
[task 2022-06-24T09:48:10.283Z] 09:48:10     INFO -  _execute_test@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:597:5
[task 2022-06-24T09:48:10.284Z] 09:48:10     INFO -  @-e:1:1
[task 2022-06-24T09:48:10.284Z] 09:48:10     INFO -  "
[task 2022-06-24T09:48:10.284Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "1656063790447	Toolkit.Telemetry	ERROR	TelemetryEnvironment::EnvironmentCache - error while initializing: TypeError: can't access property "client", gBeforeShutdownBarrier is null(resource://gre/modules/AddonManager.jsm:4300:5) JS Stack trace: get beforeShutdown@AddonManager.jsm:4300:5
[task 2022-06-24T09:48:10.285Z] 09:48:10     INFO -  init@TelemetryEnvironment.jsm:605:5
[task 2022-06-24T09:48:10.285Z] 09:48:10     INFO -  EnvironmentCache@TelemetryEnvironment.jsm:976:29
[task 2022-06-24T09:48:10.285Z] 09:48:10     INFO -  getGlobal@TelemetryEnvironment.jsm:102:26
[task 2022-06-24T09:48:10.285Z] 09:48:10     INFO -  onInitialized@TelemetryEnvironment.jsm:113:12
[task 2022-06-24T09:48:10.286Z] 09:48:10     INFO -  setEmptyPrefWatchlist@head.js:483:31
[task 2022-06-24T09:48:10.286Z] 09:48:10     INFO -  setup@test_HealthPing.js:69:9
[task 2022-06-24T09:48:10.286Z] 09:48:10     INFO -  _run_next_test/<@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:1766:22
[task 2022-06-24T09:48:10.286Z] 09:48:10     INFO -  _run_next_test@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:1766:38
[task 2022-06-24T09:48:10.287Z] 09:48:10     INFO -  run@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:812:9
[task 2022-06-24T09:48:10.287Z] 09:48:10     INFO -  _do_main@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:240:6
[task 2022-06-24T09:48:10.287Z] 09:48:10     INFO -  _execute_test@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:597:5
[task 2022-06-24T09:48:10.287Z] 09:48:10     INFO -  @-e:1:1" {file: "resource://gre/modules/Log.jsm" line: 725}]
[task 2022-06-24T09:48:10.288Z] 09:48:10     INFO -  append@resource://gre/modules/Log.jsm:725:12
[task 2022-06-24T09:48:10.288Z] 09:48:10     INFO -  log@resource://gre/modules/Log.jsm:381:16
[task 2022-06-24T09:48:10.288Z] 09:48:10     INFO -  getLoggerWithMessagePrefix/proxy.log@resource://gre/modules/Log.jsm:510:18
[task 2022-06-24T09:48:10.288Z] 09:48:10     INFO -  error@resource://gre/modules/Log.jsm:389:10
[task 2022-06-24T09:48:10.289Z] 09:48:10     INFO -  EnvironmentCache/this._initTask<@resource://gre/modules/TelemetryEnvironment.jsm:1006:17
[task 2022-06-24T09:48:10.289Z] 09:48:10     INFO -  _do_main@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:240:6
[task 2022-06-24T09:48:10.289Z] 09:48:10     INFO -  _execute_test@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:597:5
[task 2022-06-24T09:48:10.289Z] 09:48:10     INFO -  @-e:1:1
[task 2022-06-24T09:48:10.289Z] 09:48:10     INFO -  "
[task 2022-06-24T09:48:10.290Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_startWatchingPrefs - [object Map]"
[task 2022-06-24T09:48:10.290Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_stopWatchingPrefs"
[task 2022-06-24T09:48:10.290Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_updateSearchEngine - ignoring early call"
[task 2022-06-24T09:48:10.290Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_startWatchingPrefs - [object Map]"
[task 2022-06-24T09:48:10.291Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryEnvironment::observe - aTopic: nsPref:changed, aData: toolkit.telemetry.healthping.enabled"
[task 2022-06-24T09:48:10.291Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790448	Toolkit.Telemetry	TRACE	TelemetryController::setupTelemetry"
[task 2022-06-24T09:48:10.291Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790449	Toolkit.Telemetry	TRACE	TelemetryController::registerJsProbes - registering builtin JS probes"
[task 2022-06-24T09:48:10.292Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790449	Toolkit.Telemetry	TRACE	TelemetryController::registerScalarProbes - registering scalar builtin JS probes"
[task 2022-06-24T09:48:10.292Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790449	Toolkit.Telemetry	TRACE	TelemetryController::loadProbeDefinitions - no builtin JS probe file ScalarArtifactDefinitions.json"
[task 2022-06-24T09:48:10.292Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790450	Toolkit.Telemetry	TRACE	TelemetryReportingPolicy::setup"
[task 2022-06-24T09:48:10.293Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790450	Toolkit.Telemetry	CONFIG	TelemetryController::enableTelemetryRecording - canRecordBase:true, canRecordExtended: true"
[task 2022-06-24T09:48:10.293Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790450	Toolkit.Telemetry	TRACE	TelemetrySession::earlyInit"
[task 2022-06-24T09:48:10.293Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790450	Toolkit.Telemetry	INFO	TelemetryReportingPolicy::get dataSubmissionPolicyNotifiedDate - No date stored yet."
[task 2022-06-24T09:48:10.294Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790451	Toolkit.Telemetry	TRACE	UpdatePing::init - enabled: true"
[task 2022-06-24T09:48:10.294Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790451	Toolkit.Telemetry	TRACE	TelemetryController::registerEventProbes - registering builtin JS Event probes"
[task 2022-06-24T09:48:10.294Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790451	Toolkit.Telemetry	TRACE	TelemetryController::loadProbeDefinitions - no builtin JS probe file EventArtifactDefinitions.json"
[task 2022-06-24T09:48:10.295Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "readUpdateConfig: Unable to read app update configuration file. Exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: resource://gre/modules/UpdateUtils.jsm :: getConfigFilePath :: line 154"  data: no]" {file: "resource://gre/modules/UpdateUtils.jsm" line: 865}]
[task 2022-06-24T09:48:10.295Z] 09:48:10     INFO -  readUpdateConfig@resource://gre/modules/UpdateUtils.jsm:865:10
[task 2022-06-24T09:48:10.295Z] 09:48:10     INFO -  _do_main@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:240:6
[task 2022-06-24T09:48:10.296Z] 09:48:10     INFO -  _execute_test@Z:\\task_165606228933289\\build\\tests\\xpcshell\\head.js:597:5
[task 2022-06-24T09:48:10.296Z] 09:48:10     INFO -  @-e:1:1
[task 2022-06-24T09:48:10.296Z] 09:48:10     INFO -  "
[task 2022-06-24T09:48:10.296Z] 09:48:10     INFO -  "CONSOLE_MESSAGE: (info) 1656063790469	Toolkit.Telemetry	TRACE	TelemetryEnvironment::_onEnvironmentChange for system-info"

Hello, this bug seems to be permafailing now. Could you take a look at it please? Thank you!

Flags: needinfo?(chutten)
See Also: → 1724922

Weird. It was green when I checked: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=a58b56af0ed03e09ed092aff82d12855d5d6f61b

Is it a clue that the configuration that's failing xpcshell-spi-nw-1 wasn't the specific one I ran? I've gone looking to see if I could add jobs of that config, but they aren't in the UI.

Flags: needinfo?(chutten)

The logs do suggest it's failing in the same place and way (it's testing a send timeout, and for whatever reason the send succeeds despite the timeout value being set to a very low value (one millisecond) meaning that the task waiting for the timeout never sees it.)

:barret would you like to have a go at a fix here or could we disable the test on windows until a fix is in place maybe after Chris is back from PTO?

Flags: needinfo?(brennie)
Assignee: nobody → csabou
Status: NEW → ASSIGNED
Status: ASSIGNED → NEW
Keywords: leave-open
Whiteboard: [stockwell disabled]
Assignee: csabou → nobody
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c6cf4533dfad
Disable test_HealthPing.js on win10_2004 for frequent failures. r=intermittent-reviewers,MasterWayZ DONTBUILD
Flags: needinfo?(brennie)
Summary: Perma TEST-UNEXPECTED-TIMEOUT | toolkit/components/telemetry/tests/unit/test_HealthPing.js | Test timed out → Intermittent TEST-UNEXPECTED-TIMEOUT | toolkit/components/telemetry/tests/unit/test_HealthPing.js | Test timed out

ni?me to (try to) figure out what the heck is going on here.

Flags: needinfo?(chutten)

Interesting that this isn't win10_2004-only, and is still happening across a variety of mac platforms (just at an acceptably-infrequent intermittent rate).

Giving this another go over here. We'll see what comes up with my try run, and how it's different from the ones that used to cause the errors.

One possibility is to drop this particular case from the "heatlh" ping tests. Testing network timeouts like this is prone to failure and we could look into more unit-test-y ways to simulate that without playing with unreliable timeouts. That'll allow us to turn the rest of the test file back on across a wider base of platforms. I'd rather figure it out from base principles and get it fixed, but this is sitting in the back of my mind as I do.

Flags: needinfo?(chutten)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P5 → P1

This test causes failures intermittently when left alone, and permanently if
we try to fix it.

It is decidedly an integration/systems test instead of a unit test, and our
framework isn't set up to do reliable and reproducible network errors.

The unit behaviour is covered by the other tests (it just ends up calling
recordSendFailure with a different string), so we're losing more coverage
by skipping the test by platform than we will by removing the test entirely.

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d7a1dcec29c3
Remove "health" ping submission timeout test r=TravisLong

With volumes of failures at 0 on the 14th and 15th, I think we can call this FIXED

(there may be one last intermittent summary comment coming on Monday, though)

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: