Closed Bug 1400716 Opened 7 years ago Closed 6 years ago

Intermittent Test harness output was not a valid structured log message: DON'T USE FOR WPT CLASSIFICATION IF MANY FAILURES

Categories

(Testing :: web-platform-tests, defect, P5)

Version 3
defect

Tracking

(firefox59 fixed)

RESOLVED INCOMPLETE
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Whiteboard: [stockwell fixed:other])

Attachments

(2 files)

This failure became much more frequent around Nov 21. There's may be some correlation with https://bugzilla.mozilla.org/show_bug.cgi?id=1400716, so ni jgraham in case he knows what this is about.
Flags: needinfo?(james)
we see this after the browser terminates:
00:10:01     INFO - Got 0 unexpected results
00:10:01     INFO - SUITE-END | took 808s
00:10:01     INFO - Closing logging queue
00:10:01     INFO - queue closed
00:10:01 CRITICAL - Test harness output was not a valid structured log message: 
00:10:01 CRITICAL - Traceback (most recent call last):
00:10:01 CRITICAL - Test harness output was not a valid structured log message: 
00:10:01 CRITICAL -   File "c:\mozilla-build\python\Lib\multiprocessing\queues.py", line 264, in _feed
00:10:01 CRITICAL - Test harness output was not a valid structured log message: 
00:10:01 CRITICAL -     send(obj)
00:10:01 CRITICAL - Test harness output was not a valid structured log message: 
00:10:01 CRITICAL - IOError: [Errno 232] The pipe is being closed
00:10:01     INFO - Return code: 0
00:10:01 CRITICAL - # TBPL FAILURE #
00:10:01  WARNING - setting return code to 2
00:10:01     INFO - Running post-action listener: _package_coverage_data
00:10:01     INFO - Running post-action listener: _resource_record_post_action

this happens primarily on windows 10 (all configs), but looks to have good representation on linux as well.
Yeah, so something is trying to write to a pipe that's closed sometime in the shutdown sequence. It's almost certainly totally harmless, but also hard to diagnose since we don't get a useful traceback or anything.

My first guess-based attempt at a fix didn't help: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a68e61e596bc743c8f289106f86ca8b5d2861f9&selectedJob=147889632
:jgraham, can you please look into this issue?
AutomatedTester was going to take a look at this.

I think that what's going on is a queue somewhere ends up with unprocessed messages during shutdown e.g. https://stackoverflow.com/questions/19070638/python-multiprocessing-ioerror-errno-232-the-pipe-is-being-closed has a similar error. I'm not exactly sure how to debug it; you might want to see if it's possible to get some extra logging in the queue so you can tell which one is failing.
Flags: needinfo?(james) → needinfo?(dburns)
There are 72 failures in the last 7 days. The test is failing mostly on windows 10-64 and Linux. A spike was recorded for 12th of December.

Recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=151571802

00:06:35     INFO - Closing logging queue
00:06:35     INFO - queue closed
00:06:35 CRITICAL - Test harness output was not a valid structured log message: 
00:06:35 CRITICAL - Traceback (most recent call last):
00:06:35 CRITICAL - Test harness output was not a valid structured log message: 
00:06:35 CRITICAL -   File "c:\mozilla-build\python\Lib\multiprocessing\queues.py", line 264, in _feed
00:06:35 CRITICAL - Test harness output was not a valid structured log message: 
00:06:35 CRITICAL -     send(obj)
Comment on attachment 8938023 [details]
Bug 1400716 - Explicitly drain all queues on exit,

https://reviewboard.mozilla.org/r/208758/#review214856
Attachment #8938023 - Flags: review?(mjzffr) → review+
Pushed by mjzffr@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/0ea252bf357c
Explicitly drain all queues on exit, r=maja_zf
https://hg.mozilla.org/integration/autoland/rev/38a8726fbde2
Fixup call to traceback, r=maja_zf
https://hg.mozilla.org/mozilla-central/rev/0ea252bf357c
https://hg.mozilla.org/mozilla-central/rev/38a8726fbde2
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Recent log: https://treeherder.mozilla.org/logviewer.html#?job_id=153377300&repo=mozilla-central&lineNumber=21120

[task 2017-12-26T13:03:03.248Z] 13:03:03     INFO - Browser exited with return code 0
[task 2017-12-26T13:03:03.249Z] 13:03:03  WARNING - u'runner_teardown': ()
[task 2017-12-26T13:03:03.251Z] 13:03:03     INFO - Got 0 unexpected results
[task 2017-12-26T13:03:03.251Z] 13:03:03     INFO - SUITE-END | took 603s
[task 2017-12-26T13:03:03.312Z] 13:03:03     INFO - Closing logging queue
[task 2017-12-26T13:03:03.313Z] 13:03:03     INFO - queue closed
[task 2017-12-26T13:03:03.733Z] 13:03:03 CRITICAL - Test harness output was not a valid structured log message: 
[task 2017-12-26T13:03:03.734Z] 13:03:03 CRITICAL - Exception
[task 2017-12-26T13:03:03.755Z] 13:03:03     INFO - Return code: 0
[task 2017-12-26T13:03:03.755Z] 13:03:03 CRITICAL - # TBPL FAILURE #
[task 2017-12-26T13:03:03.755Z] 13:03:03  WARNING - setting return code to 2
[task 2017-12-26T13:03:03.756Z] 13:03:03     INFO - Running post-action listener: _package_coverage_data
[task 2017-12-26T13:03:03.756Z] 13:03:03     INFO - Running post-action listener: _resource_record_post_action
[task 2017-12-26T13:03:03.756Z] 13:03:03     INFO - [mozharness: 2017-12-26 13:03:03.756473Z] Finished run-tests step (success)
[task 2017-12-26T13:03:03.756Z] 13:03:03     INFO - Running post-run listener: _resource_record_post_run
[task 2017-12-26T13:03:03.821Z] 13:03:03     INFO - Total resource usage - Wall time: 626s; CPU: 37.0%; Read bytes: 17072128; Write bytes: 2357915648; Read time: 408; Write time: 179648
[task 2017-12-26T13:03:03.821Z] 13:03:03     INFO - TinderboxPrint: CPU usage<br/>36.9%
[task 2017-12-26T13:03:03.822Z] 13:03:03     INFO - TinderboxPrint: I/O read bytes / time<br/>17,072,128 / 408
[task 2017-12-26T13:03:03.822Z] 13:03:03     INFO - TinderboxPrint: I/O write bytes / time<br/>2,357,915,648 / 179,648
[task 2017-12-26T13:03:03.823Z] 13:03:03     INFO - TinderboxPrint: CPU idle<br/>1,541.6 (63.0%)
[task 2017-12-26T13:03:03.823Z] 13:03:03     INFO - TinderboxPrint: CPU system<br/>82.2 (3.4%)
[task 2017-12-26T13:03:03.823Z] 13:03:03     INFO - TinderboxPrint: CPU user<br/>809.3 (33.1%)
[task 2017-12-26T13:03:03.823Z] 13:03:03     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2017-12-26T13:03:03.824Z] 13:03:03     INFO - pull - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2017-12-26T13:03:03.825Z] 13:03:03     INFO - install - Wall time: 18s; CPU: 28.0%; Read bytes: 7135232; Write bytes: 723902464; Read time: 128; Write time: 104980
[task 2017-12-26T13:03:03.832Z] 13:03:03     INFO - run-tests - Wall time: 608s; CPU: 37.0%; Read bytes: 9412608; Write bytes: 1628295168; Read time: 256; Write time: 72524
[task 2017-12-26T13:03:04.037Z] 13:03:04     INFO - Running post-run listener: _upload_blobber_files
[task 2017-12-26T13:03:04.037Z] 13:03:04  WARNING - Blob upload gear skipped. Missing cmdline options.
[task 2017-12-26T13:03:04.038Z] 13:03:04     INFO - Running post-run listener: copy_logs_to_upload_dir
[task 2017-12-26T13:03:04.038Z] 13:03:04     INFO - Copying logs to upload dir...
[task 2017-12-26T13:03:04.038Z] 13:03:04     INFO - mkdir: /builds/worker/workspace/build/upload/logs
[task 2017-12-26T13:03:04.047Z] 13:03:04     INFO - Copying logs to upload dir...
[task 2017-12-26T13:03:04.056Z] 13:03:04  WARNING - returning nonzero exit status 2
[task 2017-12-26T13:03:04.079Z] cleanup
[task 2017-12-26T13:03:04.079Z] + cleanup
[task 2017-12-26T13:03:04.079Z] + local rv=2
[task 2017-12-26T13:03:04.079Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2017-12-26T13:03:04.079Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2017-12-26T13:03:04.083Z] + true
[task 2017-12-26T13:03:04.083Z] + cleanup_xvfb
[task 2017-12-26T13:03:04.083Z] pidof Xvfb
[task 2017-12-26T13:03:04.083Z] ++ pidof Xvfb
[task 2017-12-26T13:03:04.086Z] + local xvfb_pid=25
[task 2017-12-26T13:03:04.086Z] + local vnc=false
[task 2017-12-26T13:03:04.087Z] + local interactive=false
[task 2017-12-26T13:03:04.087Z] + '[' -n 25 ']'
[task 2017-12-26T13:03:04.087Z] + [[ false == false ]]
[task 2017-12-26T13:03:04.087Z] + [[ false == false ]]
[task 2017-12-26T13:03:04.087Z] + kill 25
[task 2017-12-26T13:03:04.087Z] + screen -XS xvfb quit
[task 2017-12-26T13:03:04.087Z] XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
[task 2017-12-26T13:03:04.087Z]       after 17073 requests (17073 known processed) with 0 events remaining.
[task 2017-12-26T13:03:04.087Z] compizconfig - Info: Backend     : ini
[task 2017-12-26T13:03:04.087Z] compizconfig - Info: Integration : true
[task 2017-12-26T13:03:04.087Z] compizconfig - Info: Profile     : default
[task 2017-12-26T13:03:04.128Z] No screen session found.
[task 2017-12-26T13:03:04.128Z] + true
[task 2017-12-26T13:03:04.128Z] + exit 2
[taskcluster 2017-12-26 13:03:04.328Z] === Task Finished ===
[taskcluster 2017-12-26 13:03:14.031Z] Unsuccessful task run with exit code: 2 completed in 979.6 seconds
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [stockwell disable-recommended] → [stockwell fixed:other]
Flags: needinfo?(dburns)
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=194293262&repo=mozilla-central&lineNumber=21637
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Setting dom.serviceWorkers.parent_intercept=true makes this permafail (or close to it) in Linux64/debug chunk 14:
https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=usercancel%2Cretry%2Cpending%2Crunning%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&classifiedState=unclassified&group_state=expanded&revision=7b5211adf4b48fd0dbee818b7d2c8d38dac4e880

Consequently, this will block getting WPT running with the serviceworker e10s refactor enabled.
On closer look, it looks like something different is happening in the issue I'm seeing. The tasks starred here at least start running their tests whereas the issue I'm seeing is in the setup somewhere.
No longer blocks: 1470266
Summary: Intermittent Test harness output was not a valid structured log message: → Intermittent Test harness output was not a valid structured log message: DON'T USE FOR WPT CLASSIFICATION IF MANY FAILURES
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: