Closed Bug 1754168 Opened 2 years ago Closed 2 years ago

Perma (for a range of pushes) gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output during GeckoMediaPlugins.RemoveAndDeleteForcedInUse

Categories

(Testing :: GTest, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=367049304&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WbDOsQgJQeOzh5J9t4G8cA/runs/0/artifacts/public/logs/live_backing.log


[task 2022-02-08T10:39:40.666Z] 10:39:40     INFO -  TEST-START | GeckoMediaPlugins.RemoveAndDeleteForcedInUse
[task 2022-02-08T10:44:41.898Z] 10:44:41     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2022-02-08T10:44:41.907Z] 10:44:41  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2022-02-08T10:44:41.908Z] 10:44:41     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2022-02-08T10:44:41.967Z] 10:44:41    ERROR - Return code: 1
[task 2022-02-08T10:44:41.969Z] 10:44:41    ERROR - No tests run or test summary not found
[task 2022-02-08T10:44:41.969Z] 10:44:41     INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2022-02-08T10:44:41.969Z] 10:44:41  WARNING - # TBPL FAILURE #
[task 2022-02-08T10:44:41.969Z] 10:44:41  WARNING - setting return code to 2
[task 2022-02-08T10:44:41.969Z] 10:44:41  WARNING - The gtest suite: gtest ran with return status: FAILURE
[task 2022-02-08T10:44:41.970Z] 10:44:41     INFO - Running post-action listener: _package_coverage_data
[task 2022-02-08T10:44:41.970Z] 10:44:41     INFO - Running post-action listener: _resource_record_post_action
[task 2022-02-08T10:44:41.970Z] 10:44:41     INFO - Running post-action listener: process_java_coverage_data
[task 2022-02-08T10:44:41.970Z] 10:44:41     INFO - [mozharness: 2022-02-08 10:44:41.970398Z] Finished run-tests step (success)
[task 2022-02-08T10:44:41.970Z] 10:44:41     INFO - Running post-run listener: _resource_record_post_run
[task 2022-02-08T10:44:43.218Z] 10:44:43     INFO - instance_metadata.json not found; unable to determine instance type
[task 2022-02-08T10:44:43.235Z] 10:44:43     INFO - Validating Perfherder data against Z:\task_164431340618675\mozharness\external_tools\performance-artifact-schema.json
[task 2022-02-08T10:44:43.237Z] 10:44:43     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"name": "gtest.gtest.overall", "extraOptions": ["buildbot-unknown"], "subtests": [{"name": "cpu_percent", "value": 7.102858809801644}, {"name": "io_write_bytes", "value": 2836772864}, {"name": "io.read_bytes", "value": 436302336}, {"name": "io_write_time", "value": 101}, {"name": "io_read_time", "value": 64}]}, {"name": "gtest.gtest.start-pulseaudio", "subtests": [{"name": "time", "value": 0.0063648223876953125}]}, {"name": "gtest.gtest.install", "subtests": [{"name": "time", "value": 9.855278491973877}, {"name": "cpu_percent", "value": 12.808333333333332}]}, {"name": "gtest.gtest.stage-files", "subtests": [{"name": "time", "value": 0.6989486217498779}]}, {"name": "gtest.gtest.run-tests", "subtests": [{"name": "time", "value": 848.8707847595215}, {"name": "cpu_percent", "value": 7.036245572609219}]}]}
[task 2022-02-08T10:44:43.237Z] 10:44:43     INFO - Total resource usage - Wall time: 859s; CPU: 7%; Read bytes: 436302336; Write bytes: 2836772864; Read time: 64; Write time: 101
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: CPU usage<br/>7.0%
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: I/O read bytes / time<br/>436,302,336 / 64
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: I/O write bytes / time<br/>2,836,772,864 / 101
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: CPU idle<br/>6,384.8 (92.8%)
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: CPU system<br/>213.7 (3.1%)
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - TinderboxPrint: CPU user<br/>271.4 (3.9%)
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - install - Wall time: 10s; CPU: 13%; Read bytes: 217088; Write bytes: 354496512; Read time: 0; Write time: 19
[task 2022-02-08T10:44:43.238Z] 10:44:43     INFO - stage-files - Wall time: 1s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-02-08T10:44:43.244Z] 10:44:43     INFO - run-tests - Wall time: 849s; CPU: 7%; Read bytes: 436085248; Write bytes: 2426701824; Read time: 64; Write time: 81
[task 2022-02-08T10:44:43.429Z] 10:44:43  WARNING - returning nonzero exit status 2
[taskcluster 2022-02-08T10:44:43.522Z]    Exit Code: 2
[taskcluster 2022-02-08T10:44:43.522Z]    User Time: 15.625ms
[taskcluster 2022-02-08T10:44:43.522Z]  Kernel Time: 31.25ms
[taskcluster 2022-02-08T10:44:43.522Z]    Wall Time: 18m5.8418931s
[taskcluster 2022-02-08T10:44:43.522Z]       Result: FAILED
[taskcluster 2022-02-08T10:44:43.522Z] === Task Finished ===
[taskcluster 2022-02-08T10:44:43.522Z] Task Duration: 18m5.8448957s
[taskcluster 2022-02-08T10:44:43.668Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2023-02-08T10:26:11.026Z
[taskcluster 2022-02-08T10:44:44.238Z] Uploading artifact public/test_info/resource-usage.json from file build\blobber_upload_dir\resource-usage.json with content encoding "gzip", mime type "application/json" and expiry 2023-02-08T10:26:11.026Z
[taskcluster 2022-02-08T10:44:44.810Z] Uploading artifact public/test_info/system-info.log from file build\blobber_upload_dir\system-info.log with content encoding "gzip", mime type "text/plain" and expiry 2023-02-08T10:26:11.026Z
[taskcluster 2022-02-08T10:44:45.045Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2023-02-08T10:26:11.026Z
[taskcluster:error] exit status 2

GTest-1proc started to fail on Windows ASan when bug 1732737 landed which was a failure (WinWindowOcclusionTrackerInteractiveTest.OffscreenOcclusion) fixed by its backout. After the backout, a different test failed (GeckoMediaPlugins.RemoveAndDeleteForcedInUse) which had been passing when the former test was failing. It startted to pass when bug 1753061 landed. Push range

Gerald, do you have insights which would explain the behavior?

Flags: needinfo?(gsquelart)
Summary: Perma (sometimes) gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output → Perma (for a range of pushes) gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output during GeckoMediaPlugins.RemoveAndDeleteForcedInUse
See Also: → 1732737

Bryce, do you have insights which would explain this GMP test failing permanently for some pushes and then starting to pass again?

Flags: needinfo?(gsquelart) → needinfo?(bvandyk)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #2)

Bryce, do you have insights which would explain this GMP test failing permanently for some pushes and then starting to pass again?

I'm afraid not. Nothing about the bugs above seems like it should obviously impact the GMP test that is failing. The test failing above is testing that we can kill a GMP process and that various cleanup happens as expected.

I don't know how bug 1732737 would break this. If the patch stopped processes spawning or us doing file IO, it would make sense, but it seems pretty far from that.

Bug 1753061 is also unclear for me, but part of that is how big it is. It makes sense to me that there may be somewhere in there that is causing issues with some GMP logic, but I'm afraid I don't know what.

If this continues to be an issue, it would be useful if we could discover where the test is stalling (I don't know if we can easily get a stack before killing it). It should be a fast test, so a long stall suggests a blocking call just sitting there, and identifying that would be helpful to try and understand what's going on.

Flags: needinfo?(bvandyk)

This causes almost permanent Windows asan failures on mozilla-beta (example).

This permafails on latest simulation as well

Central-as-beta-simulation

How to run these simulations

Failure log

I'm currently hoping bug 1757116 can be resolved to help me try and repro this locally more easily.

Depends on: 1757116
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.