Closed Bug 1766707 Opened 2 years ago Closed 1 year ago

Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code -6 (right after "DelayedRunnable.TimerFiresBeforeRunnableRuns")

Categories

(Core :: XPCOM, defect, P5)

Unspecified
macOS
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered])

Filed by: abutkovits [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=376013287&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/M-z-zb87R1GljBAUY1DH-Q/runs/0/artifacts/public/logs/live_backing.log


[task 2022-04-27T20:30:20.211Z] 20:30:20     INFO -  TEST-START | DelayedRunnable.TimerFiresBeforeRunnableRuns
[task 2022-04-27T20:30:20.211Z] 20:30:20     INFO -  TEST-PASS | DelayedRunnable.TimerFiresBeforeRunnableRuns | test completed (time: 1ms)
[task 2022-04-27T20:30:20.273Z] 20:30:20     INFO -  gtest INFO | gtest | process wait complete, returncode=-6
[task 2022-04-27T20:30:20.273Z] 20:30:20     INFO -  mozcrash checking /opt/worker/tasks/task_165109122865668/build/tests/gtest for minidumps...
[task 2022-04-27T20:30:20.274Z] 20:30:20  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code -6
[task 2022-04-27T20:30:20.274Z] 20:30:20     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2022-04-27T20:30:20.370Z] 20:30:20    ERROR - Return code: 1
[task 2022-04-27T20:30:20.370Z] 20:30:20    ERROR - No tests run or test summary not found
[task 2022-04-27T20:30:20.370Z] 20:30:20     INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2022-04-27T20:30:20.370Z] 20:30:20  WARNING - # TBPL FAILURE #
[task 2022-04-27T20:30:20.370Z] 20:30:20  WARNING - setting return code to 2
[task 2022-04-27T20:30:20.370Z] 20:30:20  WARNING - The gtest suite: gtest ran with return status: FAILURE
[task 2022-04-27T20:30:20.371Z] 20:30:20     INFO - Running post-action listener: _package_coverage_data
[task 2022-04-27T20:30:20.371Z] 20:30:20     INFO - Running post-action listener: _resource_record_post_action
[task 2022-04-27T20:30:20.371Z] 20:30:20     INFO - Running post-action listener: process_java_coverage_data
[task 2022-04-27T20:30:20.371Z] 20:30:20     INFO - [mozharness: 2022-04-27 20:30:20.371523Z] Finished run-tests step (success)
[task 2022-04-27T20:30:20.371Z] 20:30:20     INFO - Running post-run listener: _resource_record_post_run
[task 2022-04-27T20:30:20.440Z] 20:30:20     INFO - instance_metadata.json not found; unable to determine instance type
[task 2022-04-27T20:30:20.440Z] 20:30:20     INFO - Validating Perfherder data against /opt/worker/tasks/task_165109122865668/mozharness/external_tools/performance-artifact-schema.json
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"name": "gtest.gtest.overall", "extraOptions": ["buildbot-unknown"], "subtests": [{"name": "cpu_percent", "value": 6.808991228070174}, {"name": "io_write_bytes", "value": 787038208}, {"name": "io.read_bytes", "value": 177094656}, {"name": "io_write_time", "value": 1112}, {"name": "io_read_time", "value": 1108}]}, {"name": "gtest.gtest.start-pulseaudio", "subtests": [{"name": "time", "value": 0.0004260540008544922}]}, {"name": "gtest.gtest.install", "subtests": [{"name": "time", "value": 23.773169994354248}, {"name": "cpu_percent", "value": 14.028985507246377}]}, {"name": "gtest.gtest.stage-files", "subtests": [{"name": "time", "value": 0.29721593856811523}]}, {"name": "gtest.gtest.run-tests", "subtests": [{"name": "time", "value": 54.60997915267944}, {"name": "cpu_percent", "value": 3.373557692307692}]}]}
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - Total resource usage - Wall time: 78s; CPU: 3%; Read bytes: 177094656; Write bytes: 787038208; Read time: 1108; Write time: 1112
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: CPU usage<br/>3.4%
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: I/O read bytes / time<br/>177,094,656 / 1,108
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: I/O write bytes / time<br/>787,038,208 / 1,112
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: CPU idle<br/>869.0 (93.2%)
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: CPU system<br/>17.0 (1.8%)
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: CPU user<br/>46.2 (5.0%)
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - TinderboxPrint: Swap in / out<br/>628,047,872 / 0
[task 2022-04-27T20:30:20.444Z] 20:30:20     INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-04-27T20:30:20.445Z] 20:30:20     INFO - install - Wall time: 24s; CPU: 14%; Read bytes: 374298112; Write bytes: 312791040; Read time: 21746; Write time: 547
[task 2022-04-27T20:30:20.445Z] 20:30:20     INFO - stage-files - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-04-27T20:30:20.446Z] 20:30:20     INFO - run-tests - Wall time: 55s; CPU: 3%; Read bytes: 110313472; Write bytes: 254787584; Read time: 650; Write time: 238
[task 2022-04-27T20:30:20.466Z] 20:30:20  WARNING - returning nonzero exit status 2
[taskcluster 2022-04-27T20:30:20.498Z]    Exit Code: 2
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Based on comment 0 and the 2 most recent failures (from comment 32), it looks like this is always happening right after this line:

DelayedRunnable.TimerFiresBeforeRunnableRuns | test completed

Let's formalize that in the bug title.

That association with DelayedRunnable.TimerFiresBeforeRunnableRuns suggests there may be a connection to this test or the underlying code.

hg blame shows that the TimerFiresBeforeRunnableRuns test was added (in bug 1704887). But we also had some other changes to the DelayedRunnable class in March 24, 2022 (in bug 1738103) about 1 month before this (low-frequency) intermittent failure was filed.

Anyway: CC'ing some relevant folks who've touched DelayedRunnable, and tentatively reclassifying to XPCOM.

Component: GTest → XPCOM
Product: Testing → Core
Summary: Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code -6 → Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code -6 (right after "DelayedRunnable.TimerFiresBeforeRunnableRuns")

Also: it looks like this is exclusively an issue on macOS, affecting these two tasks (the opt & debug gtest tasks):

 OS X 10.15 WebRender opt test-macosx1015-64-qr/opt-gtest-1proc GTest-1proc
 OS X 10.15 WebRender debug test-macosx1015-64-qr/debug-gtest-1proc GTest-1proc

In case it's useful, here's the most recent "bad" log, vs. a recent reference "good" log for the same task type:

Bad: https://treeherder.mozilla.org/logviewer?job_id=413021994&repo=autoland&lineNumber=3149
Good: https://treeherder.mozilla.org/logviewer?job_id=414291515&repo=autoland

In the "Good" log, the DelayedRunnable.TimerFiresBeforeRunnableRuns line is followed by us starting some other test, rather than a crash:

INFO -  TEST-START | DelayedRunnable.TimerFiresBeforeRunnableRuns
INFO -  TEST-PASS | DelayedRunnable.TimerFiresBeforeRunnableRuns | test completed (time: 2ms)
INFO -  TEST-START | Encoding.GoodSurrogatePair
INFO -  TEST-PASS | Encoding.GoodSurrogatePair | test completed (time: 0ms)

(So this isn't just like a shutdown crash being arbitrarily blamed on the final test, or that sort of thing.)

OS: Unspecified → macOS

Not sure what happens here since there's no crash report, but I see that the test is prone to misbehave in the face of spurious wakeups, so let's fix that.

Depends on: 1830825
Status: REOPENED → RESOLVED
Closed: 2 years ago1 year ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.