Closed Bug 1629789 Opened 5 years ago Closed 5 years ago

Perma Linux x64 QR debug W-fis(wpt5) fix-stacks error: failed to read breakpad symbols dir `/builds/worker/workspace/build/symbols/libpthread.so.0` for `/lib/x86_64-linux-gnu/libpthread.so.0`

Categories

(Toolkit :: Crash Reporting, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla77
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- unaffected
firefox75 --- unaffected
firefox76 --- unaffected
firefox77 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: n.nethercote)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=296760839&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Vmhke1sDT1677ACYixLypg/runs/0/artifacts/public/logs/live_backing.log


This shows up in at least Linux x64 18.04 Quantumrender with wpt5 fission enabled since bug 1622701 landed.

[task 2020-04-08T08:00:50.172Z] 08:00:50 INFO - TEST-TIMEOUT | /printing/print-microtask-after-navigate.html | took 30791ms
[task 2020-04-08T08:00:50.236Z] 08:00:50 INFO - PID 12487 | 1586332850227 Marionette INFO Stopped listening on port 59336
[task 2020-04-08T08:00:50.821Z] 08:00:50 INFO - PID 12487 | [GPU 12584, Compositor] WARNING: Possibly dropping task posted to updater thread: file /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZUpdater.cpp, line 371
[task 2020-04-08T08:00:50.889Z] 08:00:50 INFO - PID 12487 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-08T08:00:50.966Z] 08:00:50 INFO - PID 12487 | [2020-04-08T08:00:50Z WARN xulstore::persist] tried to remove key that isn't in the store
[task 2020-04-08T08:00:50.966Z] 08:00:50 INFO - PID 12487 | [2020-04-08T08:00:50Z WARN xulstore::persist] tried to remove key that isn't in the store
[task 2020-04-08T08:00:51.043Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/netwerk/base/nsNetUtil.cpp, line 2307
[task 2020-04-08T08:00:51.044Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 398
[task 2020-04-08T08:00:51.045Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 420
[task 2020-04-08T08:00:51.045Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/netwerk/base/nsNetUtil.cpp, line 2307
[task 2020-04-08T08:00:51.045Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 398
[task 2020-04-08T08:00:51.046Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 420
[task 2020-04-08T08:00:51.046Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/netwerk/base/nsNetUtil.cpp, line 2307
[task 2020-04-08T08:00:51.047Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 398
[task 2020-04-08T08:00:51.047Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 420
[task 2020-04-08T08:00:51.048Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/netwerk/base/nsNetUtil.cpp, line 2307
[task 2020-04-08T08:00:51.048Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 398
[task 2020-04-08T08:00:51.049Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/caps/ContentPrincipal.cpp, line 420
[task 2020-04-08T08:00:51.050Z] 08:00:51 INFO - PID 12487 | [Child 12655, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp, line 3352
[task 2020-04-08T08:00:51.071Z] 08:00:51 INFO - PID 12487 | nsStringStats
[task 2020-04-08T08:00:51.072Z] 08:00:51 INFO - PID 12487 | => mAllocCount: 24334
[task 2020-04-08T08:00:51.072Z] 08:00:51 INFO - PID 12487 | => mReallocCount: 0
[task 2020-04-08T08:00:51.073Z] 08:00:51 INFO - PID 12487 | => mFreeCount: 24334
[task 2020-04-08T08:00:51.074Z] 08:00:51 INFO - PID 12487 | => mShareCount: 11651
[task 2020-04-08T08:00:51.074Z] 08:00:51 INFO - PID 12487 | => mAdoptCount: 447
[task 2020-04-08T08:00:51.075Z] 08:00:51 INFO - PID 12487 | => mAdoptFreeCount: 479
[task 2020-04-08T08:00:51.075Z] 08:00:51 INFO - PID 12487 | => Process ID: 12655, Thread ID: 139705266898816
[task 2020-04-08T08:01:05.852Z] 08:01:05 INFO - PID 12487 | console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.jsm", 433))
[task 2020-04-08T08:01:54.634Z] 08:01:54 INFO - PID 12487 | XPCOM_MEM_BLOAT_LOG: /tmp/tmpsvOl6e/runtests_leaks_986.log
[task 2020-04-08T08:01:54.634Z] 08:01:54 INFO - PID 12487 | Writing to log: /tmp/tmpsvOl6e/runtests_leaks_986.log
[task 2020-04-08T08:01:54.635Z] 08:01:54 INFO - PID 12487 | Hit MOZ_CRASH(Shutdown hanging before starting.) at /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:203
[task 2020-04-08T08:01:54.651Z] 08:01:54 INFO - STDOUT: Initializing stack-fixing for the first stack frame, this may take a while...
[task 2020-04-08T08:01:54.679Z] 08:01:54 INFO - PID 12487 | #01: _pt_root [hg:hg.mozilla.org/integration/autoland:nsprpub/pr/src/pthreads/ptthread.c:fc7f46174390746803126bbd7c70c9451f8051c0:204]
[task 2020-04-08T08:01:54.679Z] 08:01:54 CRITICAL - Test harness output was not a valid structured log message:
[task 2020-04-08T08:01:54.679Z] 08:01:54 CRITICAL - fix-stacks error: failed to read breakpad symbols dir /builds/worker/workspace/build/symbols/libpthread.so.0 for /lib/x86_64-linux-gnu/libpthread.so.0
[task 2020-04-08T08:01:54.680Z] 08:01:54 CRITICAL - fix-stacks note: this is expected and harmless for system libraries on debug automation runs
[task 2020-04-08T08:01:54.680Z] 08:01:54 INFO - PID 12487 | #02: ??? [/lib/x86_64-linux-gnu/libpthread.so.0 + 0x76db]
[task 2020-04-08T08:01:54.680Z] 08:01:54 CRITICAL - Test harness output was not a valid structured log message:
[task 2020-04-08T08:01:54.680Z] 08:01:54 CRITICAL - fix-stacks error: failed to read breakpad symbols dir /builds/worker/workspace/build/symbols/libc.so.6 for /lib/x86_64-linux-gnu/libc.so.6
[task 2020-04-08T08:01:54.680Z] 08:01:54 CRITICAL - fix-stacks note: this is expected and harmless for system libraries on debug automation runs
[task 2020-04-08T08:01:54.681Z] 08:01:54 INFO - PID 12487 | #03: clone [/lib/x86_64-linux-gnu/libc.so.6 + 0x12188f]
[task 2020-04-08T08:01:54.681Z] 08:01:54 INFO - PID 12487 | #04: ??? (???:???)
[task 2020-04-08T08:01:54.681Z] 08:01:54 INFO - PID 12487 | ExceptionHandler::GenerateDump cloned child 13610
[task 2020-04-08T08:01:54.681Z] 08:01:54 INFO - PID 12487 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2020-04-08T08:01:54.682Z] 08:01:54 INFO - PID 12487 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2020-04-08T08:01:54.780Z] 08:01:54 INFO - PID 12487 | Exiting due to channel error.
[task 2020-04-08T08:01:54.782Z] 08:01:54 INFO - PID 12487 | Exiting due to channel error.
[task 2020-04-08T08:01:54.783Z] 08:01:54 INFO - PID 12487 | [GPU 12584, Main Thread] WARNING: Shutting down GPU process early due to a crash!: file /builds/worker/checkouts/gecko/gfx/ipc/GPUParent.cpp, line 519
[task 2020-04-08T08:01:54.783Z] 08:01:54 INFO - PID 12487 | Exiting due to channel error.
[task 2020-04-08T08:01:54.848Z] 08:01:54 INFO - Browser exited with return code 11

Has Regression Range: --- → yes
Keywords: regression

First of all, this is a tier 2 job, and the job mentioned in comment 0 actually still shows up as green ("Result: success") on Treeherder. So I'm not sure why this bug was filed.

Here is the relevant part of the output:

[task 2020-04-08T08:01:54.679Z] 08:01:54 CRITICAL - Test harness output was not a valid structured log message: 
[task 2020-04-08T08:01:54.679Z] 08:01:54 CRITICAL - fix-stacks error: failed to read breakpad symbols dir `/builds/worker/workspace/build/symbols/libpthread.so.0` for `/lib/x86_64-linux-gnu/libpthread.so.0`
[task 2020-04-08T08:01:54.680Z] 08:01:54 CRITICAL - fix-stacks note:  this is expected and harmless for system libraries on debug automation runs

As the third line indicates, this "error" isn't a problem, but it is printed to stderr and is apparently in a format that the web platform test harness doesn't like. Note also that this only happens when we've already crashed, so there would have been a subsequent failure anyway.

James, can you give me any pointers on "valid structure log messages" w.r.t. web platform tests? I'm not sure the best way to deal with this. I'd prefer to keep emitting the error to stderr if possible, because it can be useful.

Flags: needinfo?(n.nethercote) → needinfo?(james)

The wpt harness wraps stdio of child processes and re-logs that output as process_output mozlog messages [1]. This ends up in the raw log as JSON data which in the CI configuration is then written to stdout. That output ends up being parsed in mozharness in order to do things like count the failures and set the job status. If some other subprocess we don't know about starts writing to std[out|err] in a different format we are unable to process that output and end up with this error. In general having multiple processes writing to the same ouput stream seems quite likely to end up with corrupt output, so if we can find a way to avoid that it would be good. If this is all going through mozrunner [2], maybe we can have a way to pass in an output handler function to fixSymbols so we can process it like we do for Firefox output (c.f. [3])

[1] https://firefox-source-docs.mozilla.org/mozbase/mozlog.html#data-format
[2] https://searchfox.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/utils.py#227
[3] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#299

Flags: needinfo?(james)

(In reply to Nicholas Nethercote [:njn] from comment #1)

So I'm not sure why this bug was filed.

To get rid of it to prevent confusion if the task actually fails and has these additional failure lines.

Thank you both for the explanations!

I think the best path forward here is to augment fixSymbols with an optional argument: quiet=False. When set to true, it will suppress errors being printed to stderr. The web platform test harness can then set quiet=True.

I'm in the middle of another change relating to fix-stacks in bug 1628527, so this will have to wait on that.

Depends on: 1628527

Because the web platform test harness doesn't recognize them and annotates them
with CRITICAL, which is disconcerting, even though those errors are typically
not harmful.

Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED

jgraham: review ping!

Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9f796abfd8ca Suppress `fix-stacks` errors on web platform tests. r=erahm,jgraham
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23294 for changes under testing/web-platform/tests

Backed out for failures caused by unexpected keyword argument hide_error

backout: https://hg.mozilla.org/integration/autoland/rev/7b78aab460c6c5dc5c942d081ce1b729826ae07b

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=Sufu51hLR-K56215-JL6wg-0&revision=9f796abfd8cabad7f13f952f79713c93ab29e46b&searchStr=android%2C7.0%2Cx86-64%2Cdebug%2Creftests%2Ctest-android-em-7.0-x86_64%2Fdebug-geckoview-reftest-e10s-2%2Cr%28r2%29

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=299764020&repo=autoland&lineNumber=1522

[task 2020-04-28T11:23:44.858Z] 11:23:44 INFO - REFTEST TEST-START | layout/reftests/css-grid/grid-row-gap-003.html == layout/reftests/css-grid/grid-row-gap-003-ref.html
[task 2020-04-28T11:23:44.858Z] 11:23:44 INFO - Automation Error: Exception caught while running tests
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - Traceback (most recent call last):
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remotereftest.py", line 413, in run_test_harness
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - retVal = reftest.runTests(options.tests, options)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 616, in runTests
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - return self.runSerialTests(manifests, options, cmdargs)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 956, in runSerialTests
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - return run()
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 947, in run
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - **kwargs)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remotereftest.py", line 372, in runApp
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - e10s=options.e10s)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remoteautomation.py", line 82, in runApp
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - status = self.waitForFinish(timeout, maxTime)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remoteautomation.py", line 132, in waitForFinish
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - status = self.wait(timeout=maxTime, noOutputTimeout=timeout)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remoteautomation.py", line 362, in wait
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - hasOutput = self.read_stdout()
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/remoteautomation.py", line 308, in read_stdout
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - parsed_messages = self.messageLogger.write(line)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/output.py", line 146, in call
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - self.verbatim(line)
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - File "/builds/worker/workspace/build/tests/reftest/output.py", line 162, in verbatim
[task 2020-04-28T11:23:44.860Z] 11:23:44 INFO - line = self.stack_fixer_function(line)
[task 2020-04-28T11:23:44.861Z] 11:23:44 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/utils.py", line 257, in stack_fixer_function
[task 2020-04-28T11:23:44.861Z] 11:23:44 INFO - line, slowWarning=True, breakpadSymsDir=symbolsPath, hide_errors=hideErrors)
[task 2020-04-28T11:23:44.861Z] 11:23:44 INFO - TypeError: fixSymbols() got an unexpected keyword argument 'hide_errors'

Flags: needinfo?(n.nethercote)
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR was closed without merging

Argh, this is another problem caused by Android host utils. I'll have to land the host utils part first, then get the host utils updated, then land the non-host utils part.

Flags: needinfo?(n.nethercote)

Bug 1633901 has landed with the fix_stacks.py change. aerickson, sorry to do this to you again but could you update the Android host utils from revision https://hg.mozilla.org/mozilla-central/rev/6bb8423186c1 or later? Once that's done, I can land the attached patch.

Depends on: 1633901
No longer depends on: 1628527
Flags: needinfo?(aerickson)

No worries. :) Yeah, will do.

Flags: needinfo?(aerickson)
Depends on: 1634226
Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/034f61c238aa Suppress `fix-stacks` errors on web platform tests. r=erahm,jgraham
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
Upstream PR merged by moz-wptsync-bot
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: