Open Bug 1115253 Opened 10 years ago Updated 1 day ago

Perma gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300|1200|1800 seconds

Categories

(Core :: Audio/Video: GMP, defect, P3)

x86
macOS
defect

Tracking

()

People

(Reporter: philor, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [test disabled on android][stockwell disabled])

Attachments

(4 files)

No description provided.
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Most (all?) of these times out are in TestNamedPipeService: 11:34:01 INFO - TEST-START | TestNamedPipeService.Test 11:39:01 ERROR - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200 seconds 11:39:02 ERROR - Return code: 1 11:39:02 ERROR - No tests run or test summary not found
This failure became much more frequent after the changes in bug 1313752. :bsmedberg - just letting you know!
Flags: needinfo?(benjamin)
Depends on: 1322377
gbrown, this bug appears to cover numerous issues that are all lumped together. Would it be possible for you to classify these gtest timeout by the last test that started before the timeout, or whether the timeout happens during shutdown? For instance: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-esr45&job_id=131509#L29804 This should be TEST-UNEXPECTED-FAIL | gtest | test-shutdown | timed out after 1200 seconds because the timeout happens after we've finished running all the tests and printed the "TEST-PASS | GTest unit test: passed" line. https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=40225283#L7726 This should be TEST-UNEXPECTED-FAIL | gtest | TestNamedPipeService.Test | timed out... Because the last test was TEST-START | TestNamedPipeService.Test I filed bug 1322377 specifically for the TestNamedPipeService work.
Flags: needinfo?(benjamin)
Depends on: 1322616
Almost all recent failures are from mozilla-esr45, in GeckoMediaPlugins.GMPTestCodec.
Depends on: 1332352
(In reply to Benjamin Smedberg [:bsmedberg] from comment #128) > gbrown, this bug appears to cover numerous issues that are all lumped > together. Would it be possible for you to classify these gtest timeout by > the last test that started before the timeout, or whether the timeout > happens during shutdown? That's a good idea and I know we do that for most other test types. It looks a little tricky for gtest; it's not obvious to me how to implement for gtest, but I've filed bug 1332352.
Summary: Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200 seconds → Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200|1800 seconds
This Bug occurred 32 times in the last 7 days on Windows 7 and Linux. It is failing only on debug build type. Failing tests: debug-gtest Link to a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=145900700&lineNumber=1504 Part of a log: 11:40:50 INFO - TEST-START | Moz2D.JobScheduler_Shutdown 15861 11:45:50 ERROR - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200 seconds 15862 11:45:50 ERROR - Return code: 1 15863 11:45:50 ERROR - No tests run or test summary not found 15864 11:45:50 INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em> 15865 11:45:50 ERROR - # TBPL FAILURE # 15866 11:45:50 WARNING - setting return code to 2 15867 11:45:50 ERROR - The gtest suite: gtest ran with return status: FAILURE 15868 11:45:50 INFO - Running post-action listener: _package_coverage_data 15869 11:45:50 INFO - Running command: ['zip', '-r', 'Z:\\task_1511091199\\build\\blobber_upload_dir\\code-coverage-gcda.zip', '.'] in c:\users\genericworker\appdata\local\temp\tmpwxmpby 15870 11:45:50 INFO - Copy/paste: zip -r Z:\task_1511091199\build\blobber_upload_dir\code-coverage-gcda.zip . 15871 11:45:50 INFO - zip error: Nothing to do! (try: zip -r Z:\task_1511091199\build\blobber_upload_dir\code-coverage-gcda.zip . -i .) 15872 11:45:50 ERROR - Return code: 12 15873 11:45:50 INFO - Running command: ['zip', '-r', 'Z:\\task_1511091199\\build\\blobber_upload_dir\\code-coverage-jsvm.zip', '.'] in c:\users\genericworker\appdata\local\temp\tmpjuwroc 15874 11:45:50 INFO - Copy/paste: zip -r Z:\task_1511091199\build\blobber_upload_dir\code-coverage-jsvm.zip . 15875 11:45:50 INFO - adding: 1511091346-6904-0.info (152 bytes security) (stored 0%) 15876 11:45:50 INFO - Return code: 0 15877 11:45:50 INFO - proxxy config: {} 15878 11:45:50 INFO - trying https://queue.taskcluster.net/v1/task/bhAm7VGnRoqY4_R60i6DEQ/artifacts/public/build/target.code-coverage-gcno.zip 15879 11:45:50 INFO - Downloading https://queue.taskcluster.net/v1/task/bhAm7VGnRoqY4_R60i6DEQ/artifacts/public/build/target.code-coverage-gcno.zip to c:\users\genericworker\appdata\local\temp\tmp40ncpa\target.code-coverage-gcno.zip 15880 11:45:50 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/bhAm7VGnRoqY4_R60i6DEQ/artifacts/public/build/target.code-coverage-gcno.zip', 'file_name': 'c:\\users\\genericworker\\appdata\\local\\temp\\tmp40ncpa\\target.code-coverage-gcno.zip'}, attempt #1 15881 11:45:52 INFO - Downloaded 116843543 bytes. 15882 11:45:52 INFO - Getting output from command: ['c:\\users\\genericworker\\appdata\\local\\temp\\tmp40ncpa\\grcov', '-t', 'lcov', '-p', 'z:/build/build/src/', '--ignore-dir', 'gcc', 'c:\\users\\genericworker\\appdata\\local\\temp\\tmp40ncpa\\target.code-coverage-gcno.zip', 'Z:\\task_1511091199\\build\\blobber_upload_dir\\code-coverage-gcda.zip', '--llvm'] 15883 11:45:52 INFO - Copy/paste: c:\users\genericworker\appdata\local\temp\tmp40ncpa\grcov -t lcov -p z:/build/build/src/ --ignore-dir gcc c:\users\genericworker\appdata\local\temp\tmp40ncpa\target.code-coverage-gcno.zip Z:\task_1511091199\build\blobber_upload_dir\code-coverage-gcda.zip --llvm 15884 11:45:52 ERROR - Errors received: 15885 11:45:52 INFO - Reading from file tmpfile_stderr 15886 11:45:52 ERROR - thread '<unnamed>' panicked at 'Failed to open ZIP file 'Z:\task_1511091199\build\blobber_upload_dir\code-coverage-gcda.zip'.: Error { repr: Os { code: 2, message: "The system cannot find the file specified." } }', src\libcore\result.rs:906:4 15887 11:45:52 ERROR - note: Run with `RUST_BACKTRACE=1` for a backtrace. 15888 11:45:52 ERROR - Return code: 0 15889 11:45:52 INFO - Running command: ['zip', 'Z:\\task_1511091199\\build\\blobber_upload_dir\\code-coverage-grcov.zip', 'grcov_lcov_output.info'] in c:\users\genericworker\appdata\local\temp\tmp40ncpa 15890 11:45:52 INFO - Copy/paste: zip Z:\task_1511091199\build\blobber_upload_dir\code-coverage-grcov.zip grcov_lcov_output.info 15891 11:45:52 INFO - adding: grcov_lcov_output.info (152 bytes security) (stored 0%) 15892 11:45:52 INFO - Return code: 0 Also this bug is blocked by these 2 Bugs: - 1322616 "gtest timeout message does not differentiate between timeout and no-output timeout" - 1332352 "Last test seen not part of gtest timeout message" : Selena Deckelmann could you please take a look?
Flags: needinfo?(sdeckelmann)
Whiteboard: [stockwell needswork]
Many recent failures are windows 10 ccov gtest, which was just enabled in bug 1417436. It looks like there are other complaints in that bug about failures and it may be hidden soon. ni :marco to make sure he is aware of this.
Blocks: 1417436
Flags: needinfo?(mcastelluccio)
The ccov GTest failures will be fixed by bug 1418255. This bug contains issues that are unrelated to each other. We should probably make GTest fail more verbosely and file separate bugs for each problem.
Flags: needinfo?(mcastelluccio)
No longer blocks: 1417436
No longer depends on: 1322616
See Also: → 1322616
No longer depends on: 1332352
See Also: → 1332352
Depends on: 1419196
Component: General → Graphics
Flags: needinfo?(sdeckelmann)
With changes landing in bug 1322616 today, some (possibly most or even all) failures previously reported here will be reported as "gtest | timed out after 300 seconds without output".
Blocks: 1426037
See Also: → 1473531
Blocks: 1411358
Over the last 7 days there have been 36 failures on this bug, these happen on linux64, windows10-64-ccov. Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=186736776&repo=mozilla-central&lineNumber=856 23:19:25 ERROR - shutil error: Destination path 'Z:\task_1530832153\build\application\firefox\gmp-clearkey' already exists
Flags: needinfo?(dbolter)
Maire is gmp-clearkey a media thing?
Flags: needinfo?(dbolter) → needinfo?(mreavy)
Note that most recent failures are on ccov - bug 1473531.
See Also: → 1474254
There are 32 failures in the last 7 days on windows10-64-ccov debug.
There have been 37 failures in the last 7 days: All the failures occur on the debug build type. Failures per platform: -windows10-64-ccov: 35 -linux32: 1 -windows7-32: 1 Recent log file: https://treeherder.mozilla.org/logviewer.html#?job_id=190059853&repo=mozilla-central&lineNumber=853
(In reply to Andrew Halberstadt [:ahal] from comment #214) > This is perma-failing on windows10-64-ccov: ...bug 1474254.
Depends on: 1474254
See Also: 1474254
Component: Graphics → Audio/Video: GMP
Flags: needinfo?(mreavy)
No longer blocks: 1411358
Summary: Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200|1800 seconds → Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300|1200|1800 seconds

This bug has failed 34 times in the last 7 days. Occurs on linux64-ccov on debug build type.

Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=236418885&repo=mozilla-central

There are 43 total failures in the last 7 days on linux64-ccov debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=238572618&repo=mozilla-central&lineNumber=1250

[task 2019-04-06T12:33:52.031Z] 12:33:52 INFO - TEST-START | WebRtcIceConnectTest.TestConnectShutdownOneSide
[task 2019-04-06T12:33:52.034Z] 12:33:52 INFO - Candidate for stream P1:stream0 initialized: candidate:0 1 UDP 2122187007 172.17.0.5 56476 typ host
[task 2019-04-06T12:33:52.034Z] 12:33:52 INFO - Candidate for stream P1:stream0 initialized: candidate:2 1 UDP 2122252543 fd15:4ba5:5a2b:100a:0:242:ac11:5 32971 typ host
[task 2019-04-06T12:33:52.035Z] 12:33:52 INFO - P1 Gathering complete

[task 2019-04-06T12:33:52.057Z] 12:33:52 INFO - P2 ICE connected
[task 2019-04-06T12:33:53.707Z] 12:33:53 ERROR - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200 seconds
[task 2019-04-06T12:33:59.186Z] 12:33:59 ERROR - Return code: 1
[task 2019-04-06T12:33:59.186Z] 12:33:59 ERROR - No tests run or test summary not found
[task 2019-04-06T12:33:59.187Z] 12:33:59 INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2019-04-06T12:33:59.187Z] 12:33:59 ERROR - # TBPL FAILURE #
[task 2019-04-06T12:33:59.187Z] 12:33:59 WARNING - setting return code to 2
[task 2019-04-06T12:33:59.187Z] 12:33:59 ERROR - The gtest suite: gtest ran with return status: FAILURE
[task 2019-04-06T12:33:59.188Z] 12:33:59 INFO - Running post-action listener: _package_coverage_data

Nils can you assign someone to take a look?

Flags: needinfo?(drno)
Whiteboard: [stockwell unknown] → [stockwell needswork:owner]

Over the last 7 days there are 32 failure on this bug. These happen on android-em-7-0-x86_64, linux32, linux64-ccov, linux64-ccov

Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240257877&repo=autoland&lineNumber=3914

The android-em-7-0-x86_64 failures began when we started running gtest on that platform. Android test logs usually show ImageBitmapColorUtils as the last running test, but logcats show that MultiWriterQueue is the real culprit.

04-15 11:50:35.890 2402 2417 I gtest : TEST-START | ImageBitmapColorUtils.LabToBGRA32
04-15 11:50:35.890 2402 2417 I gtest : TEST-PASS | ImageBitmapColorUtils.LabToBGRA32 | test completed (time: 0ms)
04-15 11:50:35.890 2402 2417 I gtest : TEST-START | MultiWriterQueue.SingleThreaded
04-15 11:50:35.960 2402 2417 I gtest : TEST-PASS | MultiWriterQueue.SingleThreaded | test completed (time: 70ms)
04-15 11:50:35.960 2402 2417 I gtest : TEST-START | MultiWriterQueue.MultiWriterSingleReader
04-15 11:50:42.310 2402 2417 D Telemetry: MainThreadArmPersistenceTimer

Recent linux64-ccov logs look quite different from the Android case.

I missed this intermittent when enabling Android gtests in bug 1318091. Disable
the failing test on Android only, to green up Android gtest.

Keywords: leave-open
Whiteboard: [stockwell needswork:owner] → [stockwell needswork:owner] [test disabled on android]
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7d37a22773a0 Disable gtest TestMultiWriterQueue on Android; r=jmaher

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=busted&selectedJob=240393974&revision=7d37a22773a008ecb76f1a01f09f915c2f18f613

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240394002&repo=autoland&lineNumber=17780

Backout link: https://hg.mozilla.org/integration/autoland/rev/b02881ec59d7df457d97fac800244fe918017425

[task 2019-04-15T17:52:33.317Z] 17:52:33 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/media/doctor/gtest'
[task 2019-04-15T17:52:33.319Z] 17:52:33 INFO - /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang++ --target=arm-linux-androideabi -o Unified_cpp_media_doctor_gtest0.o -c -flto=thin -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DNDEBUG=1 -DTRIMMED=1 -DOS_POSIX=1 -DOS_LINUX=1 -DSTATIC_EXPORTABLE_JS_API -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/dom/media/doctor/gtest -I/builds/worker/workspace/build/src/obj-firefox/dom/media/doctor/gtest -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/dom/media/doctor -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Qunused-arguments -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include/arm-linux-androideabi -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include -gcc-toolchain /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64 -D__ANDROID_API_=16 -fstack-protector-strong -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -mno-unaligned-access -fno-sized-deallocation -fno-short-enums -fno-exceptions -stdlib=libstdc++ -I/builds/worker/workspace/build/src/android-ndk/sources/cxx-stl/llvm-libc++/include -I/builds/worker/workspace/build/src/android-ndk/sources/android/support/include -I/builds/worker/workspace/build/src/android-ndk/sources/cxx-stl/llvm-libc++abi/include -march=armv7-a -mthumb -mfpu=neon -mfloat-abi=softfp -fstack-protector-strong -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -Oz -fno-omit-frame-pointer -funwind-tables -Werror -Wno-error=shadow -MD -MP -MF .deps/Unified_cpp_media_doctor_gtest0.o.pp /builds/worker/workspace/build/src/obj-firefox/dom/media/doctor/gtest/Unified_cpp_media_doctor_gtest0.cpp
[task 2019-04-15T17:52:33.320Z] 17:52:33 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/dom/media/doctor/gtest/Unified_cpp_media_doctor_gtest0.cpp:2:
[task 2019-04-15T17:52:33.320Z] 17:52:33 INFO - In file included from /builds/worker/workspace/build/src/dom/media/doctor/gtest/TestRollingNumber.cpp:7:
[task 2019-04-15T17:52:33.320Z] 17:52:33 ERROR - /builds/worker/workspace/build/src/dom/media/doctor/RollingNumber.h:73:30: error: use of undeclared identifier 'MaxDiff'
[task 2019-04-15T17:52:33.320Z] 17:52:33 INFO - MOZ_ASSERT(aIncrement <= MaxDiff);
[task 2019-04-15T17:52:33.320Z] 17:52:33 INFO - ^
[task 2019-04-15T17:52:33.321Z] 17:52:33 ERROR - /builds/worker/workspace/build/src/dom/media/doctor/RollingNumber.h:84:30: error: use of undeclared identifier 'MaxDiff'
[task 2019-04-15T17:52:33.321Z] 17:52:33 INFO - MOZ_ASSERT(aDecrement <= MaxDiff);

Flags: needinfo?(gbrown)

I missed this intermittent when enabling Android gtests in bug 1318091. Disable the failing test on Android only, to green up Android gtest.

Attachment #9058353 - Attachment description: Bug 1115253 - Disable dom/media/docter gtests on Android; r=jmaher → Bug 1115253 - Disable dom/media/doctor gtests on Android; r=jmaher
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2d9779f97ccb Disable dom/media/doctor gtests on Android; r=jmaher
Flags: needinfo?(gbrown)
Whiteboard: [test disabled on android][stockwell disable-recommended] → [stockwell needswork:owner] [test disabled on android]

This bug failed 32 times in the last 7 days. Occurs on linux64-ccov, windows7-32, windows10-64, windows10-aarch64 on debug build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=242744985&repo=mozilla-central&lineNumber=1251
,

There are 37 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-27&endday=2019-05-04&tree=trunk&bug=1115253

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=244608102&repo=mozilla-central&lineNumber=41997

[task 2019-05-03T23:35:01.165Z] 23:35:01 ERROR - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 1200 seconds
[task 2019-05-03T23:35:04.986Z] 23:35:04 ERROR - Return code: 1
[task 2019-05-03T23:35:04.986Z] 23:35:04 ERROR - No tests run or test summary not found
[task 2019-05-03T23:35:04.987Z] 23:35:04 INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2019-05-03T23:35:04.987Z] 23:35:04 ERROR - # TBPL FAILURE #
[task 2019-05-03T23:35:04.987Z] 23:35:04 WARNING - setting return code to 2
[task 2019-05-03T23:35:04.987Z] 23:35:04 ERROR - The gtest suite: gtest ran with return status: FAILURE
[task 2019-05-03T23:35:04.988Z] 23:35:04 INFO - Running post-action listener: package_coverage_data
[task 2019-05-03T23:35:05.031Z] 23:35:05 INFO - Getting output from command: ['/builds/worker/fetches/grcov', '-t', 'lcov', '-p', '/builds/worker/workspace/build/src/', '--ignore-dir', 'gcc*', '--ignore-dir', 'vs2017
', '/builds/worker/fetches/target.code-coverage-gcno.zip', '/tmp/tmpJECiGb']
[task 2019-05-03T23:35:05.031Z] 23:35:05 INFO - Copy/paste: /builds/worker/fetches/grcov -t lcov -p /builds/worker/workspace/build/src/ --ignore-dir gcc
--ignore-dir vs2017_* /builds/worker/fetches/target.code-coverage-gcno.zip /tmp/tmpJECiGb
[task 2019-05-03T23:35:43.135Z] 23:35:43 INFO - Running post-action listener: _resource_record_post_action
[task 2019-05-03T23:35:43.135Z] 23:35:43 INFO - Running post-action listener: process_java_coverage_data
[task 2019-05-03T23:35:43.135Z] 23:35:43 INFO - [mozharness: 2019-05-03 23:35:43.135448Z] Finished run-tests step (success)
[task 2019-05-03T23:35:43.135Z] 23:35:43 INFO - Running post-run listener: _resource_record_post_run
[task 2019-05-03T23:35:43.250Z] 23:35:43 INFO - Validating Perfherder data against /builds/worker/workspace/mozharness/external_tools/performance-artifact-schema.json
[task 2019-05-03T23:35:43.255Z] 23:35:43 INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"subtests": [{"name": "cpu_percent", "value": 22.330197063369415}, {"name": "io_write_bytes", "value": 10843070464}, {"name": "io.read_bytes", "value": 476684288}, {"name": "io_write_time", "value": 1875588}, {"name": "io_read_time", "value": 5984}], "extraOptions": ["taskcluster-m3.xlarge"], "name": "gtest.gtest.overall"}, {"subtests": [{"name": "time", "value": 52.412179946899414}, {"name": "cpu_percent", "value": 25.209134615384617}], "name": "gtest.gtest.install"}, {"subtests": [{"name": "time", "value": 1.6050779819488525}], "name": "gtest.gtest.stage-files"}, {"subtests": [{"name": "time", "value": 1243.031630039215}, {"name": "cpu_percent", "value": 22.204213709677433}], "name": "gtest.gtest.run-tests"}]}
[task 2019-05-03T23:35:43.255Z] 23:35:43 INFO - Total resource usage - Wall time: 1296s; CPU: 22.0%; Read bytes: 476684288; Write bytes: 10843070464; Read time: 5984; Write time: 1875588
[task 2019-05-03T23:35:43.255Z] 23:35:43 INFO - TinderboxPrint: CPU usage<br/>22.2%
[task 2019-05-03T23:35:43.255Z] 23:35:43 INFO - TinderboxPrint: I/O read bytes / time<br/>476,684,288 / 5,984
[task 2019-05-03T23:35:43.255Z] 23:35:43 INFO - TinderboxPrint: I/O write bytes / time<br/>10,843,070,464 / 1,875,588
[task 2019-05-03T23:35:43.256Z] 23:35:43 INFO - TinderboxPrint: CPU idle<br/>3,884.3 (77.2%)
[task 2019-05-03T23:35:43.256Z] 23:35:43 INFO - TinderboxPrint: CPU system<br/>367.0 (7.3%)
[task 2019-05-03T23:35:43.256Z] 23:35:43 INFO - TinderboxPrint: CPU user<br/>735.5 (14.6%)
[task 2019-05-03T23:35:43.256Z] 23:35:43 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2019-05-03T23:35:43.258Z] 23:35:43 INFO - install - Wall time: 52s; CPU: 25.0%; Read bytes: 8192; Write bytes: 2044661760; Read time: 28; Write time: 425576
[task 2019-05-03T23:35:43.259Z] 23:35:43 INFO - stage-files - Wall time: 2s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2019-05-03T23:35:43.273Z] 23:35:43 INFO - run-tests - Wall time: 1243s; CPU: 22.0%; Read bytes: 475602944; Write bytes: 8664330240; Read time: 5948; Write time: 1436824
[task 2019-05-03T23:35:43.695Z] 23:35:43 WARNING - returning nonzero exit status 2
[task 2019-05-03T23:35:43.695Z] WARNING: No valid records found in []
[task 2019-05-03T23:35:43.730Z] cleanup
[task 2019-05-03T23:35:43.731Z] + cleanup
[task 2019-05-03T23:35:43.731Z] + local rv=2
[task 2019-05-03T23:35:43.731Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2019-05-03T23:35:43.731Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2019-05-03T23:35:43.733Z] + true
[task 2019-05-03T23:35:43.733Z] + cleanup_xvfb
[task 2019-05-03T23:35:43.734Z] pidof Xvfb
[task 2019-05-03T23:35:43.734Z] ++ pidof Xvfb
[task 2019-05-03T23:35:43.751Z] + local xvfb_pid=28
[task 2019-05-03T23:35:43.751Z] + local vnc=false
[task 2019-05-03T23:35:43.751Z] + local interactive=false
[task 2019-05-03T23:35:43.753Z] + '[' -n 28 ']'
[task 2019-05-03T23:35:43.753Z] + [[ false == false ]]
[task 2019-05-03T23:35:43.753Z] + [[ false == false ]]
[task 2019-05-03T23:35:43.753Z] + kill 28
[task 2019-05-03T23:35:43.753Z] + screen -XS xvfb quit
[task 2019-05-03T23:35:43.753Z] XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
[task 2019-05-03T23:35:43.753Z] after 10639 requests (10639 known processed) with 0 events remaining.
[task 2019-05-03T23:35:43.753Z] compizconfig - Info: Backend : ini
[task 2019-05-03T23:35:43.753Z] compizconfig - Info: Integration : true
[task 2019-05-03T23:35:43.753Z] compizconfig - Info: Profile : default
[task 2019-05-03T23:35:43.967Z] No screen session found.
[task 2019-05-03T23:35:43.967Z] + true
[task 2019-05-03T23:35:43.967Z] + exit 2
[taskcluster 2019-05-03 23:35:44.419Z] === Task Finished ===
[taskcluster 2019-05-03 23:35:48.072Z] Unsuccessful task run with exit code: 2 completed in 1655.222 seconds

I believe the linux64-ccov perma-fail is because the job is running for too long: should be able to fix by increasing the timeout.

Flags: needinfo?(gbrown)

linux64-ccov gtest has been perma-fail (mozilla-central only) since approximately March 20, 2019.

Depends on: 1549380

I will work on the ccov perma-fail in bug 1549380.

Flags: needinfo?(gbrown)

Recent occurences here are bug 1567921. Conversation about a fix can be found in bug 1486042

In the last 7 days there were 45 failures associated with this bug: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-08-14&endday=2019-08-21&tree=trunk&bug=1115253
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262243732&repo=mozilla-inbound&lineNumber=1665
Occurs on macosx1014-64 debug build type.
:dbrown any updates on this one?

The macosx debug time-outs are in XPCOMDeadlockDetectorTest.XPCOMSanityDeathTest, which has not been updated recently.

Likely started in this range:

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=gtest&fromchange=e7ff4966b99e57346c8bff748b0dc06c074cfb00&tochange=5ae16b0b7a440df1b08490d5c4447dee36b2e179

I'm retriggering to narrow that down.

These failures were caused by bug 1035892, which was backed out for causing this failure, mostly reported in bug 1574515.

This bug no longer needs attention; clearing the longstanding ni.

Flags: needinfo?(drno)
Keywords: leave-open
See Also: → 1574515

(In reply to Intermittent Failures Robot from comment #298)
Logcat shows:

04-01 16:43:51.370  3982  3997 I gtest   : TEST-START | ExtractVPXCodecDetails.TestInputData
04-01 16:43:51.370  3982  3997 I Gecko   : [3982, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x80070057: file /builds/worker/checkouts/gecko/dom/media/VideoUtils.cpp, line 296
04-01 16:43:51.370  3982  3997 I gtest   : TEST-PASS | ExtractVPXCodecDetails.TestInputData | test completed (time: 0ms)
04-01 16:43:51.370  3982  3997 I gtest   : TEST-START | ExtractVPXCodecDetails.TestParsingOutput
04-01 16:43:51.370  3982  3997 I gtest   : TEST-PASS | ExtractVPXCodecDetails.TestParsingOutput | test completed (time: 0ms)
04-01 16:43:51.370  3982  3997 I gtest   : TEST-START | AudioBuffers.Test
04-01 16:43:51.370  3982  3997 I gtest   : TEST-PASS | AudioBuffers.Test | test completed (time: 0ms)
04-01 16:43:51.370  3982  3997 I gtest   : TEST-START | TestAudioCallbackDriver.StartStop
04-01 16:43:51.570  3982  3997 I gtest   : TEST-UNEXPECTED-FAIL | TestAudioCallbackDriver.StartStop | Value of: driver->ThreadRunning()
04-01 16:43:51.570  3982  3997 I gtest   :   Actual: false
04-01 16:43:51.570  3982  3997 I gtest   : Expected: true
04-01 16:43:51.570  3982  3997 I gtest   : Verify thread is running @ /builds/worker/checkouts/gecko/dom/media/gtest/TestAudioCallbackDriver.cpp:73
04-01 16:43:51.571  3982  3997 I gtest   : TEST-UNEXPECTED-FAIL | TestAudioCallbackDriver.StartStop | Value of: driver->IsStarted()
04-01 16:43:51.571  3982  3997 I gtest   :   Actual: false
04-01 16:43:51.571  3982  3997 I gtest   : Expected: true
04-01 16:43:51.571  3982  3997 I gtest   : Verify thread is started @ /builds/worker/checkouts/gecko/dom/media/gtest/TestAudioCallbackDriver.cpp:74

Recent failures are on Windows-10/debug and seem to fail like:
https://treeherder.mozilla.org/logviewer?job_id=320296393&repo=autoland&lineNumber=28175

[task 2020-10-30T20:08:39.010Z] 20:08:39     INFO -  TEST-START | ImageDecoders.AVIFStackCheck
[task 2020-10-30T20:13:39.069Z] 20:13:39     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2020-10-30T20:13:39.069Z] 20:13:39  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2020-10-30T20:13:39.069Z] 20:13:39     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2020-10-30T20:13:39.134Z] 20:13:39    ERROR - Return code: 1

(In reply to Geoff Brown [:gbrown] from comment #303)

Recent failures are on Windows-10/debug and seem to fail like:
https://treeherder.mozilla.org/logviewer?job_id=320296393&repo=autoland&lineNumber=28175

[task 2020-10-30T20:08:39.010Z] 20:08:39     INFO -  TEST-START | ImageDecoders.AVIFStackCheck
[task 2020-10-30T20:13:39.069Z] 20:13:39     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2020-10-30T20:13:39.069Z] 20:13:39  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2020-10-30T20:13:39.069Z] 20:13:39     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2020-10-30T20:13:39.134Z] 20:13:39    ERROR - Return code: 1

:jbauman - any idea what's happening here?

Flags: needinfo?(jbauman)

No idea, really. The ImageDecoders.AVIFStackCheck test was put in to guard against a regression where the dav1d decoder library would crash on certain input due to requiring a larger stack than we had allocated to ImgLib decoder threads by default. It seems to work the vast majority of the time, so I've never been able to reproduce this behavior locally or consistently to figure out what's going wrong.

I'd say rare failures can generally be ignored as it should catch a regression (i.e., changing the stack size back to something smaller) with 100% consistency, but it's also frustrating to get these false positives, so I'm not sure what the best way forward is.

Flags: needinfo?(jbauman)
Depends on: 1675073

In the last 7 days there have been 32 occurrences on windows10-64 debug and opt.

Recent failure: https://treeherder.mozilla.org/logviewer?job_id=322583426&repo=autoland&lineNumber=27950

There have been 31 total failures in the last 7 days:

Affected platforms:

  • windows7-32 debug
  • windows10-64-ccov-qr
  • windows10-64 opt & debug
  • linux1804-64-qr

Geoff are you working on this?

Flags: needinfo?(whole.grains)
Whiteboard: [test disabled on android][stockwell unknown] → [test disabled on android][stockwell needswork:owner]

I only did the initial investigation on 2020-11-02/03.

Almost all of the current failures here are caused by bug 1675073. There has been investigation and discussion in bug 1675073, but no changes landed. I have suggested that the test be disabled if a fix is delayed, but that failed to prompt any action.

Flags: needinfo?(whole.grains)

Thank you Geoff, I'll ask again if disabling is an option in bug 1675073.

Depends on: 1730524

This bug continues to collect infrequent test timeouts from a variety of causes, but lately there have been more failures. Most recent failures are on Windows and seem to occur during WinWindowOcclusionTrackerInteractiveTest.OcclusionAfterVisibilityToggle:

https://treeherder.mozilla.org/logviewer?job_id=357240934&repo=autoland&lineNumber=33217

[task 2021-11-06T19:47:29.067Z] 19:47:29     INFO -  TEST-START | WinWindowOcclusionTrackerInteractiveTest.OcclusionAfterVisibilityToggle
[task 2021-11-06T19:52:29.100Z] 19:52:29     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2021-11-06T19:52:29.107Z] 19:52:29  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output

:sotaro -- Could this be related to bug 1737526?

Flags: needinfo?(sotaro.ikeda.g)

I am going to look into timeout of WinWindowOcclusionTrackerInteractiveTest.OcclusionAfterVisibilityToggle.

Update:
There have been 29 failures within the last 7 days:
• 1 failures on Linux 18.04 WebRender debug
• 1 failures on Linux 18.04 WebRender Shippable opt
• 2 failures on Linux 18.04 x64 WebRender asan opt
• 1 failures on Linux 18.04 x64 CCov WebRender opt
• 7 failures on Linux 18.04 x64 WebRender opt
• 13 failures on Linux 18.04 x64 WebRender debug
• 1 failures on Linux 18.04 x64 WebRender Shippable opt
• 2 failures on Windows 10 x86 2004 WebRender opt
• 1 failures on Windows 10 x64 2004 WebRender opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=360864453&repo=autoland&lineNumber=377153

[task 2021-12-11T00:39:41.369Z] 00:39:41     INFO -  TEST-START | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2
[task 2021-12-11T00:39:41.369Z] 00:39:41     INFO -  Checking cipher: 1
[task 2021-12-11T00:39:41.369Z] 00:39:41     INFO -  TEST-PASS | SrtpParamInit/TransportSrtpParameterTest.TestSrtpCiphersMismatchCombinations/2 | test completed (time: 218ms)
[task 2021-12-11T00:39:41.369Z] 00:39:41     INFO -  TEST-PASS | GTest unit test: passed
[task 2021-12-11T00:39:41.370Z] 00:39:41     INFO -  Passed: 6919
[task 2021-12-11T00:39:41.370Z] 00:39:41     INFO -  Failed: 0
[task 2021-12-11T00:44:41.294Z] 00:44:41     INFO -  gtest INFO | gtest | process wait complete, returncode=-15
[task 2021-12-11T00:44:41.295Z] 00:44:41  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2021-12-11T00:44:41.295Z] 00:44:41     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2021-12-11T00:44:41.335Z] 00:44:41    ERROR - Return code: 1
[task 2021-12-11T00:44:41.335Z] 00:44:41     INFO - TinderboxPrint: gtest-gtest<br/>6919/0
[task 2021-12-11T00:44:41.335Z] 00:44:41  WARNING - # TBPL FAILURE #
[task 2021-12-11T00:44:41.336Z] 00:44:41  WARNING - setting return code to 2
[task 2021-12-11T00:44:41.336Z] 00:44:41  WARNING - The gtest suite: gtest ran with return status: FAILURE
[task 2021-12-11T00:44:41.336Z] 00:44:41     INFO - Running post-action listener: _package_coverage_data
[task 2021-12-11T00:44:41.336Z] 00:44:41     INFO - Running post-action listener: _resource_record_post_action
[task 2021-12-11T00:44:41.336Z] 00:44:41     INFO - Running post-action listener: process_java_coverage_data
[task 2021-12-11T00:44:41.336Z] 00:44:41     INFO - [mozharness: 2021-12-11 00:44:41.336616Z] Finished run-tests step (success)
[task 2021-12-11T00:44:41.336Z] 00:44:41     INFO - Running post-run listener: _resource_record_post_run
[task 2021-12-11T00:44:41.427Z] 00:44:41     INFO - Validating Perfherder data against /builds/worker/workspace/mozharness/external_tools/performance-artifact-schema.json
[task 2021-12-11T00:44:41.430Z] 00:44:41     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"name": "gtest.gtest.overall", "extraOptions": ["taskcluster-m5.xlarge"], "subtests": [{"name": "cpu_percent", "value": 13.62300805729633}, {"name": "io_write_bytes", "value": 1887866880}, {"name": "io.read_bytes", "value": 36864}, {"name": "io_write_time", "value": 143684}, {"name": "io_read_time", "value": 16}]}, {"name": "gtest.gtest.start-pulseaudio", "subtests": [{"name": "time", "value": 0.017714738845825195}]}, {"name": "gtest.gtest.install", "subtests": [{"name": "time", "value": 11.263304948806763}, {"name": "cpu_percent", "value": 25.2}]}, {"name": "gtest.gtest.stage-files", "subtests": [{"name": "time", "value": 0.18117165565490723}]}, {"name": "gtest.gtest.run-tests", "subtests": [{"name": "time", "value": 1106.8388485908508}, {"name": "cpu_percent", "value": 13.491628959276017}]}]}
[task 2021-12-11T00:44:41.430Z] 00:44:41     INFO - Total resource usage - Wall time: 1118s; CPU: 13%; Read bytes: 36864; Write bytes: 1887866880; Read time: 16; Write time: 143684
[task 2021-12-11T00:44:41.430Z] 00:44:41     INFO - TinderboxPrint: CPU usage<br/>13.5%
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: I/O read bytes / time<br/>36,864 / 16
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: I/O write bytes / time<br/>1,887,866,880 / 143,684
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: CPU idle<br/>3,840.8 (86.4%)
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: CPU system<br/>235.7 (5.3%)
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: CPU user<br/>359.1 (8.1%)
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2021-12-11T00:44:41.431Z] 00:44:41     INFO - install - Wall time: 11s; CPU: 25%; Read bytes: 0; Write bytes: 16384; Read time: 0; Write time: 16
[task 2021-12-11T00:44:41.432Z] 00:44:41     INFO - stage-files - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2021-12-11T00:44:41.439Z] 00:44:41     INFO - run-tests - Wall time: 1107s; CPU: 13%; Read bytes: 36864; Write bytes: 1887850496; Read time: 16; Write time: 143668
[task 2021-12-11T00:44:41.602Z] 00:44:41  WARNING - returning nonzero exit status 2
[task 2021-12-11T00:44:41.627Z] cleanup
[task 2021-12-11T00:44:41.628Z] + cleanup
[task 2021-12-11T00:44:41.628Z] + local rv=2
[task 2021-12-11T00:44:41.628Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2021-12-11T00:44:41.628Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2021-12-11T00:44:41.629Z] + '[' ']'
[task 2021-12-11T00:44:41.629Z] + true
[task 2021-12-11T00:44:41.629Z] + cleanup_xvfb
[task 2021-12-11T00:44:41.629Z] ++ pidof Xvfb
[task 2021-12-11T00:44:41.640Z] + local xvfb_pid=50
[task 2021-12-11T00:44:41.640Z] + local vnc=false
[task 2021-12-11T00:44:41.640Z] + local interactive=false
[task 2021-12-11T00:44:41.640Z] + '[' -n 50 ']'
[task 2021-12-11T00:44:41.640Z] + [[ false == false ]]
[task 2021-12-11T00:44:41.640Z] + [[ false == false ]]
[task 2021-12-11T00:44:41.641Z] + kill 50
[task 2021-12-11T00:44:41.641Z] + screen -XS xvfb quit
[task 2021-12-11T00:44:41.647Z] + exit 2
[taskcluster 2021-12-11 00:44:41.944Z] === Task Finished ===
[taskcluster 2021-12-11 00:44:43.494Z] Unsuccessful task run with exit code: 2 completed in 1220.979 seconds
Whiteboard: [test disabled on android][stockwell unknown] → [test disabled on android][stockwell needswork:owner]

Hi Bryce! As the owner of this component, could you help us assign it to someone?
Thank you.

Flags: needinfo?(bvandyk)

Is this failing on any GMP related tests? I'm having trouble spotting what the thing busting this at the moment is.

It sounds like the failure message we get is generic and the cause has shifted over time. Is there value into making this into a meta and storing it in a testing component, and then opening blockers for individual tests in specific components as they arise?

Flags: needinfo?(bvandyk)

There have been 43 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • android-em-7-0-x86_64-qr
  • linux1804-32-qr
  • linux1804-32-shippable-qr
  • linux1804-64-asan-qr
  • linux1804-64-qr
  • macosx1015-64-qr
  • windows10-64-2004-asan-qr
  • windows10-64-2004-qr

There have been 35 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • linux1804-32-qr
  • linux1804-64-asan-qr
  • linux1804-64-qr
  • macosx1015-64-qr
  • windows10-64-2004-asan-qr
  • windows10-64-2004-qr

(In reply to Sotaro Ikeda [:sotaro] from comment #356)

I am going to look into timeout of WinWindowOcclusionTrackerInteractiveTest.OcclusionAfterVisibilityToggle.

That seems to be running reliably now.

Flags: needinfo?(sotaro.ikeda.g)
Depends on: 1775317
Whiteboard: [test disabled on android][stockwell unknown] → [test disabled on android][stockwell needswork:owner]

There are 68 total failures in the last 7 days

  • windows10-64-2004-qr opt
  • linux1804-64-qr opt and debug
  • linux1804-32-qr debug
  • linux1804-32-shippable-qr opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=382792934&repo=autoland&lineNumber=32929

[task 2022-06-28T20:15:18.607Z] 20:15:18     INFO -  TEST-START | TestAudioTrackGraph.SetInputChannelCountBeforeAudioCallbackDriver
[task 2022-06-28T20:20:18.643Z] 20:20:18     INFO -  gtest INFO | gtest | process wait complete, returncode=-15
[task 2022-06-28T20:20:18.643Z] 20:20:18  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2022-06-28T20:20:18.643Z] 20:20:18     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2022-06-28T20:20:18.684Z] 20:20:18    ERROR - Return code: 1
[task 2022-06-28T20:20:18.685Z] 20:20:18    ERROR - No tests run or test summary not found
[task 2022-06-28T20:20:18.685Z] 20:20:18     INFO - TinderboxPrint: gtest-gtest<br/><em class="testfail">T-FAIL</em>
[task 2022-06-28T20:20:18.685Z] 20:20:18  WARNING - # TBPL FAILURE #
[task 2022-06-28T20:20:18.686Z] 20:20:18  WARNING - setting return code to 2
[task 2022-06-28T20:20:18.686Z] 20:20:18  WARNING - The gtest suite: gtest ran with return status: FAILURE
[task 2022-06-28T20:20:18.687Z] 20:20:18     INFO - Running post-action listener: _package_coverage_data
[task 2022-06-28T20:20:18.687Z] 20:20:18     INFO - Running post-action listener: _resource_record_post_action
[task 2022-06-28T20:20:18.688Z] 20:20:18     INFO - Running post-action listener: process_java_coverage_data
[task 2022-06-28T20:20:18.688Z] 20:20:18     INFO - [mozharness: 2022-06-28 20:20:18.688178Z] Finished run-tests step (success)
[task 2022-06-28T20:20:18.688Z] 20:20:18     INFO - Running post-run listener: _resource_record_post_run
[task 2022-06-28T20:20:18.758Z] 20:20:18     INFO - Validating Perfherder data against /builds/worker/workspace/mozharness/external_tools/performance-artifact-schema.json
[task 2022-06-28T20:20:18.762Z] 20:20:18     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"name": "gtest.gtest.overall", "extraOptions": ["taskcluster-projects/887720501152/machineTypes/n2-standard-2"], "subtests": [{"name": "cpu_percent", "value": 14.864298245614018}, {"name": "io_write_bytes", "value": 1554890752}, {"name": "io.read_bytes", "value": 12804096}, {"name": "io_write_time", "value": 141776}, {"name": "io_read_time", "value": 4120}]}, {"name": "gtest.gtest.start-pulseaudio", "subtests": [{"name": "time", "value": 0.02722620964050293}]}, {"name": "gtest.gtest.install", "subtests": [{"name": "time", "value": 11.408473253250122}, {"name": "cpu_percent", "value": 50.339999999999996}]}, {"name": "gtest.gtest.stage-files", "subtests": [{"name": "time", "value": 0.22133088111877441}]}, {"name": "gtest.gtest.run-tests", "subtests": [{"name": "time", "value": 559.0523316860199}, {"name": "cpu_percent", "value": 14.090322580645145}]}]}
[task 2022-06-28T20:20:18.762Z] 20:20:18     INFO - Total resource usage - Wall time: 571s; CPU: 14%; Read bytes: 12804096; Write bytes: 1554890752; Read time: 4120; Write time: 141776
[task 2022-06-28T20:20:18.764Z] 20:20:18     INFO - TinderboxPrint: CPU usage<br/>14.1%
[task 2022-06-28T20:20:18.764Z] 20:20:18     INFO - TinderboxPrint: I/O read bytes / time<br/>12,804,096 / 4,120
[task 2022-06-28T20:20:18.764Z] 20:20:18     INFO - TinderboxPrint: I/O write bytes / time<br/>1,554,890,752 / 141,776
[task 2022-06-28T20:20:18.764Z] 20:20:18     INFO - TinderboxPrint: CPU idle<br/>965.3 (85.1%)
[task 2022-06-28T20:20:18.764Z] 20:20:18     INFO - TinderboxPrint: CPU system<br/>44.5 (3.9%)
[task 2022-06-28T20:20:18.765Z] 20:20:18     INFO - TinderboxPrint: CPU user<br/>122.3 (10.8%)
[task 2022-06-28T20:20:18.765Z] 20:20:18     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2022-06-28T20:20:18.766Z] 20:20:18     INFO - start-pulseaudio - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-06-28T20:20:18.767Z] 20:20:18     INFO - install - Wall time: 11s; CPU: 50%; Read bytes: 0; Write bytes: 155648; Read time: 0; Write time: 0
[task 2022-06-28T20:20:18.767Z] 20:20:18     INFO - stage-files - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-06-28T20:20:18.768Z] 20:20:18     INFO - run-tests - Wall time: 559s; CPU: 14%; Read bytes: 12804096; Write bytes: 1554735104; Read time: 4120; Write time: 141776
[task 2022-06-28T20:20:18.828Z] 20:20:18  WARNING - returning nonzero exit status 2
[task 2022-06-28T20:20:18.852Z] cleanup
[task 2022-06-28T20:20:18.853Z] + cleanup
[task 2022-06-28T20:20:18.853Z] + local rv=2
[task 2022-06-28T20:20:18.853Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2022-06-28T20:20:18.853Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2022-06-28T20:20:18.859Z] + '[' ']'
[task 2022-06-28T20:20:18.859Z] + true
[task 2022-06-28T20:20:18.859Z] + cleanup_xvfb
[task 2022-06-28T20:20:18.860Z] ++ pidof Xvfb
[task 2022-06-28T20:20:18.864Z] + local xvfb_pid=49
[task 2022-06-28T20:20:18.864Z] + local vnc=false
[task 2022-06-28T20:20:18.864Z] + local interactive=false
[task 2022-06-28T20:20:18.864Z] + '[' -n 49 ']'
[task 2022-06-28T20:20:18.865Z] + [[ false == false ]]
[task 2022-06-28T20:20:18.865Z] + [[ false == false ]]
[task 2022-06-28T20:20:18.865Z] + kill 49
[task 2022-06-28T20:20:18.866Z] + screen -XS xvfb quit
[task 2022-06-28T20:20:18.927Z] No screen session found.
[task 2022-06-28T20:20:18.928Z] + true
[task 2022-06-28T20:20:18.928Z] + exit 2
[taskcluster 2022-06-28 20:20:19.370Z] === Task Finished ===
[taskcluster 2022-06-28 20:20:20.821Z] Unsuccessful task run with exit code: 2 completed in 684.729 seconds

Geoff, are you working on this?

Flags: needinfo?(gbrown)
Depends on: 1777369
Depends on: 1777371

I have filed some bugs for specific causes. Otherwise, I'm not working on this.

Flags: needinfo?(gbrown)

Two of the most frequent timeouts here are on:

Jim, could you please redirect this to someone who can take a look over these failures? Thank you.

Flags: needinfo?(jmathies)

It looks like backfills lead me to your push and also it looks like this is perma now, Andrew can you please take a look?
Backfills

Flags: needinfo?(aosmond)
Summary: Intermittent gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300|1200|1800 seconds → Perma gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300|1200|1800 seconds

Specifically it appears it is bug 1777287 and the exposure of the OffscreenCanvasRenderingContext2D text APIs. If I stop exposing it on WebIDL, but leave the other changes, it passes:
https://treeherder.mozilla.org/jobs?repo=try&revision=2ed9b2160e94f32bfe123e2d5466ac207e27dfd8
https://hg.mozilla.org/try/rev/cc05a51902d9fc658c0f06ba012970fdedc9df25

Narrowing further, it appears that if I change it so there is no functional change besides the exposure of the WebIDL, the problem remains. Breaking up the WebIDL issue into pieces resolves the problem individually, so it is only in aggregate that it is a problem. I don't think this is me. I'm going out on a limb and blame the compiler. Given it is passing again on autoland/mozilla-central again, there were no canvas related changes to have fixed it, and the gtests don't even use canvas/FontFaceSet, I will say it is unrelated to bug 1777287.

Flags: needinfo?(aosmond)
Whiteboard: [test disabled on android][stockwell disable-recommended] → [test disabled on android][stockwell needswork:owner]

(In reply to Andrew Osmond [:aosmond] (he/him) -- PTO Jul 16-25 from comment #409)

Given it is passing again on autoland/mozilla-central again

It's not passing on autoland.

Christian, do you know who could look into these permanent failures?

Flags: needinfo?(choller)

(In reply to Andrew Osmond [:aosmond] (he/him) -- PTO Jul 16-25 from comment #409)

Narrowing further, it appears that if I change it so there is no functional change besides the exposure of the WebIDL, the problem remains. Breaking up the WebIDL issue into pieces resolves the problem individually, so it is only in aggregate that it is a problem. I don't think this is me. I'm going out on a limb and blame the compiler. Given it is passing again on autoland/mozilla-central again, there were no canvas related changes to have fixed it, and the gtests don't even use canvas/FontFaceSet, I will say it is unrelated to bug 1777287.

The addition of a single line to a webidl file is enough to cause this problem from aosmond's try pushes. So maybe it's a problem with the webidl compiler or something? Someone who might be able to investigate along those lines might be where to look.

Edgar, could you investigate these Windows asan GTest-1proc timeouts which are related to WebIDL changes (see previous comments)?

Flags: needinfo?(echen)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #414)

Edgar, could you investigate these Windows asan GTest-1proc timeouts which are related to WebIDL changes (see previous comments)?

I tried to debug this, but even I backouted the webidl changes from bug 1777287, the GTest is still failing, https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&searchStr=windows%2C10%2Cx64%2C2004%2Casan%2Cwebrender%2Copt%2Ctest-windows10-64-2004-asan-qr%2Fopt-gtest-1proc%2Cgtest-1proc&revision=08d81077df9acb615e839c226af381f5d5782d82, I am not sure why I got different result as Andrew's try push, but it seems unrelated to webidl. Maybe we could have someone who might be familiar with the test to shed some light?

Flags: needinfo?(echen)

Jim, could you find someone to look into this timeout when GeckoMediaPlugins.RemoveAndDeleteForcedInUse runs?

(In reply to Edgar Chen [:edgar] from comment #416)

I tried to debug this, but even I backouted the webidl changes from bug 1777287, the GTest is still failing, https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&searchStr=windows%2C10%2Cx64%2C2004%2Casan%2Cwebrender%2Copt%2Ctest-windows10-64-2004-asan-qr%2Fopt-gtest-1proc%2Cgtest-1proc&revision=08d81077df9acb615e839c226af381f5d5782d82, I am not sure why I got different result as Andrew's try push

Andrew's try push is based on the revision where his patch landed on jul 8. If you back out Andrews patch completely on a tip from jul 13 (after jul 13 a patch landed which causes conflicts in the backout) then the gtest is still failing. So something else that landed after causes them to fail as well. Bisecting this (first changeset to fail on autoload with andrews original patch backed out) comes up with the first push that touches a webidl file:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=6b061e6ac1deec4a951de1254f43375a49537e8c

So it feels like adding anything to a webidl after jul 8 hits some kind of limit that causes the gtest suite to fail for some reason.

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #412)

Christian, do you know who could look into these permanent failures?

This just looks like OOM and it is not ASan specific. Probably whoever owns the test should look into this or we should disable it.

Flags: needinfo?(choller)
Assignee: nobody → apavel
Status: NEW → ASSIGNED
Assignee: apavel → nobody
Status: ASSIGNED → NEW
Keywords: leave-open
Whiteboard: [test disabled on android][stockwell disable-recommended] → [test disabled on android][stockwell disabled]
Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cc4afeef08cc disable GeckoMediaPlugins.RemoveAndDeleteForcedInUse on win64 r=intermittent-reviewers,jmaher DONTBUILD

(In reply to Christian Holler (:decoder) from comment #421)

This just looks like OOM and it is not ASan specific.

The non-ASAN failures are in a completely different test, so looks like misclassified as the same issue.

(In reply to Timothy Nikkel (:tnikkel) from comment #424)

(In reply to Christian Holler (:decoder) from comment #421)

This just looks like OOM and it is not ASan specific.

The non-ASAN failures are in a completely different test, so looks like misclassified as the same issue.

I checked the failures and they are under TestAudioTrackGraph.SetInputChannelCountBeforeAudioCallbackDriver
A larger number of the failures also have bug 1731012 before the timeout, however there are also failures that have only the timeout under this test.

Looks like the disable patch is not working.
Joel, any idea what we could tweak here to get it to stick?

Flags: needinfo?(jmaher)

disabling the next test yields green: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=6182ac552f7db258a9383245e91077790836249b

I tried disabling other tests and those didn't yield green, so I think we just need ^

Flags: needinfo?(jmaher)
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3005d6e6b3a8 disable RemoveAndDeleteDeferredInUse to make gtest on win64:asan green again. r=intermittent-reviewers,MasterWayZ
Flags: needinfo?(jmathies)
Severity: normal → S3

There have been 31 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • linux1804-64-ccov-qr
  • windows10-64-2004-asan-qr
  • windows10-64-2004-qr
[task 2022-10-20T17:08:56.011Z] 17:08:56     INFO -  TEST-START | UtilityProcess.LaunchProcess
[task 2022-10-20T17:13:57.174Z] 17:13:57     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2022-10-20T17:13:57.187Z] 17:13:57  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
[task 2022-10-20T17:13:57.188Z] 17:13:57     INFO -  gtest INFO | rungtests.py exits with code 1
[task 2022-10-20T17:13:57.292Z] 17:13:57    ERROR - Return code: 1
[task 2022-10-20T17:13:57.297Z] 17:13:57    ERROR - No tests run or test summary not found

Many of these timeouts are hangs during the execution of TestAudioCallbackDriver.SlowStart (log). This got frequent around Oct 14. Paul, could you have a look or redirect this request?

Flags: needinfo?(padenot)

Andreas, do you have any idea what could have happened here? Otherwise, I'm going to add printfs to try to understand what's up.

Flags: needinfo?(padenot) → needinfo?(apehrson)

Looks like the MockCubebStream could get thawed while we wait for this. Then we hang waiting for the (still frozen) MockCubebStream to start.

If the gtest runner could kill us in a way that stacks of all threads are captured on a timeout like this, solving bugs like this one would be a lot easier.

Flags: needinfo?(apehrson)
Depends on: 1797104
Depends on: 1797445

There should be an improvement here following the recent landings of these commits:
https://hg.mozilla.org/mozilla-central/rev/7e19af2c6374
https://hg.mozilla.org/mozilla-central/rev/af2063a04186

Depends on: 1922205
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: