Open Bug 1358898 (RunWatchdogShutdownhang) Opened 8 years ago Updated 4 days ago

[meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.

Categories

(Toolkit :: Async Tooling, defect)

defect

Tracking

()

Tracking Status
firefox55 + fixed
firefox56 + fixed

People

(Reporter: kats, Unassigned)

References

(Depends on 20 open bugs, Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [Comment 269][stockwell disable-recommended])

Crash Data

Attachments

(1 file, 2 obsolete files)

See list of bugs this is blocking.
Kats, since this is a frequent failure, could you take a look at this bug, thanks!
Flags: needinfo?(bugmail)
So this signature appears when Firefox kills itself because it's taking too long to shut down. I looked at the 5 most recent instances of this in the orangefactor link in comment 3: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=106188385&lineNumber=3102 https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106181195&lineNumber=3063 https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106175000&lineNumber=3066 https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106156528&lineNumber=3104 https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=106155867&lineNumber=3106 In three of the 5, the main thread is busy doing cycle collection when the watchdog decides it's been long enough and shuts down. In the other two the main thread seems to be doing something in the idle timer callback. I think the cycle collection owners should take the first crack at this, since from my sample >50% of these forced shutdowns are due to that. :mccr8, do you know who would be the right person to take a look?
Flags: needinfo?(bugmail) → needinfo?(continuation)
So, this isn't the cycle collector per se. It looks like we're shutting down threads, which causes us to run all of the events, which causes us to run AsyncFreeSnowWhite::Run, which destroys cycle collected objects. Maybe this runnable is getting starved out while we're running tests, so a ton of things accumulate? Maybe we can collect more aggressively when there's a lot to destroy. Do you have any ideas, Olli? It looks like this started happening more often, so presumably either your or Ehsan's work to do more work when idle is causing problems.
Flags: needinfo?(continuation) → needinfo?(bugs)
yeah, I think AsyncFreeSnowWhite could use idle+timeout, not only idle dispatch.
Attached patch idle dispatch with timeout (obsolete) — Splinter Review
Try to use idle time, but if not available, run at least once a second.
Flags: needinfo?(bugs)
Attachment #8877706 - Flags: review?(continuation)
Comment on attachment 8877706 [details] [diff] [review] idle dispatch with timeout Review of attachment 8877706 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8877706 - Flags: review?(continuation) → review+
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0ee807301e66 ensure AsyncFreeSnowWhite is run often enough, r=mccr8
[Tracking Requested - why for this release]: intermittent failures, possible higher memory usage when the browser is very busy. This should get backported to 55.
Backed out for crashing in test_add_task_run_next_test.js during Linux build: https://hg.mozilla.org/integration/mozilla-inbound/rev/51cae901338f24eec985f2e5aa58387c50ecb28e Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0ee807301e66400fcee3d428dd82584f8041c393&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=107115313&repo=mozilla-inbound [task 2017-06-14T19:13:03.982302Z] 19:13:03 INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_msvc [task 2017-06-14T19:13:03.982530Z] 19:13:03 INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_overridden_unsupported_clang [task 2017-06-14T19:13:03.982756Z] 19:13:03 INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_overridden_unsupported_gcc [task 2017-06-14T19:13:03.982976Z] 19:13:03 INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_unsupported_msvc [task 2017-06-14T19:13:15.052508Z] 19:13:15 INFO - /home/worker/workspace/build/src/testing/xpcshell/selftest.py [task 2017-06-14T19:13:15.052724Z] 19:13:15 INFO - PROCESS-CRASH | test_add_task_run_next_test.js | application crashed [@ libxul.so + 0xba601c] [task 2017-06-14T19:13:15.052866Z] 19:13:15 INFO - Crash dump filename: /tmp/xpc-other-NFEPmg/7e1416f8-1c93-a010-30ee-f2765cc9d012.dmp [task 2017-06-14T19:13:15.053041Z] 19:13:15 INFO - Operating system: Linux [task 2017-06-14T19:13:15.053232Z] 19:13:15 INFO - 0.0.0 Linux 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64 [task 2017-06-14T19:13:15.053344Z] 19:13:15 INFO - CPU: amd64 [task 2017-06-14T19:13:15.053604Z] 19:13:15 INFO - family 6 model 63 stepping 2 [task 2017-06-14T19:13:15.053642Z] 19:13:15 INFO - 16 CPUs [task 2017-06-14T19:13:15.053786Z] 19:13:15 INFO - Crash reason: SIGSEGV [task 2017-06-14T19:13:15.053954Z] 19:13:15 INFO - Crash address: 0x0 [task 2017-06-14T19:13:15.054107Z] 19:13:15 INFO - Thread 0 (crashed) [task 2017-06-14T19:13:15.054249Z] 19:13:15 INFO - 0 libxul.so + 0xba601c [task 2017-06-14T19:13:15.054396Z] 19:13:15 INFO - rbx = 0x00007ffeb171fb10 r12 = 0x00007ffeb171fb28 [task 2017-06-14T19:13:15.054555Z] 19:13:15 INFO - r13 = 0x00007ffeb171fb18 r14 = 0x0000000000000000 [task 2017-06-14T19:13:15.054713Z] 19:13:15 INFO - r15 = 0x00007fa824c1f0b6 rip = 0x00007fa824c1f01c [task 2017-06-14T19:13:15.054853Z] 19:13:15 INFO - rsp = 0x00007ffeb171fab0 rbp = 0x00007ffeb171faf0 [task 2017-06-14T19:13:15.054989Z] 19:13:15 INFO - Found by: given as instruction pointer in context [task 2017-06-14T19:13:15.055125Z] 19:13:15 INFO - 1 libxul.so + 0xba88e1 [task 2017-06-14T19:13:15.055268Z] 19:13:15 INFO - rip = 0x00007fa824c218e2 rsp = 0x00007ffeb171fac0 [task 2017-06-14T19:13:15.055429Z] 19:13:15 INFO - rbp = 0x00007ffeb171faf0 [task 2017-06-14T19:13:15.055619Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.055833Z] 19:13:15 INFO - 2 libxul.so + 0xba8da3 [task 2017-06-14T19:13:15.056034Z] 19:13:15 INFO - rip = 0x00007fa824c21da4 rsp = 0x00007ffeb171fb00 [task 2017-06-14T19:13:15.065752Z] 19:13:15 INFO - rbp = 0x00007ffeb171fb60 [task 2017-06-14T19:13:15.065787Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.065809Z] 19:13:15 INFO - 3 libxul.so + 0x127db0d [task 2017-06-14T19:13:15.065835Z] 19:13:15 INFO - rip = 0x00007fa8252f6b0e rsp = 0x00007ffeb171fb70 [task 2017-06-14T19:13:15.065857Z] 19:13:15 INFO - rbp = 0x00007ffeb171fba0 [task 2017-06-14T19:13:15.065878Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.065895Z] 19:13:15 INFO - 4 libxul.so + 0x127db7f [task 2017-06-14T19:13:15.065918Z] 19:13:15 INFO - rip = 0x00007fa8252f6b80 rsp = 0x00007ffeb171fbb0 [task 2017-06-14T19:13:15.065937Z] 19:13:15 INFO - rbp = 0x00007ffeb171fbc0 [task 2017-06-14T19:13:15.065956Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.065975Z] 19:13:15 INFO - 5 libxul.so + 0x127db2d [task 2017-06-14T19:13:15.065998Z] 19:13:15 INFO - rip = 0x00007fa8252f6b2e rsp = 0x00007ffeb171fbb8 [task 2017-06-14T19:13:15.066018Z] 19:13:15 INFO - rbp = 0x00007ffeb171fbc0 [task 2017-06-14T19:13:15.066035Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066053Z] 19:13:15 INFO - 6 libxul.so + 0xb46d57 [task 2017-06-14T19:13:15.066076Z] 19:13:15 INFO - rip = 0x00007fa824bbfd58 rsp = 0x00007ffeb171fbd0 [task 2017-06-14T19:13:15.066096Z] 19:13:15 INFO - rbp = 0x00007ffeb171fc70 [task 2017-06-14T19:13:15.066115Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066133Z] 19:13:15 INFO - 7 libxul.so + 0xb48411 [task 2017-06-14T19:13:15.066157Z] 19:13:15 INFO - rip = 0x00007fa824bc1412 rsp = 0x00007ffeb171fc80 [task 2017-06-14T19:13:15.066176Z] 19:13:15 INFO - rbp = 0x00007ffeb171fcf0 [task 2017-06-14T19:13:15.066195Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066214Z] 19:13:15 INFO - 8 libxul.so + 0x4446328 [task 2017-06-14T19:13:15.066238Z] 19:13:15 INFO - rip = 0x00007fa8284bf329 rsp = 0x00007ffeb171fcb0 [task 2017-06-14T19:13:15.066258Z] 19:13:15 INFO - rbp = 0x00007ffeb171fcf0 [task 2017-06-14T19:13:15.066278Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066297Z] 19:13:15 INFO - 9 libxul.so + 0x3b62300 [task 2017-06-14T19:13:15.066322Z] 19:13:15 INFO - rip = 0x00007fa827bdb301 rsp = 0x00007ffeb171fcc0 [task 2017-06-14T19:13:15.066342Z] 19:13:15 INFO - rbp = 0x00007ffeb171fcf0 [task 2017-06-14T19:13:15.066361Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066379Z] 19:13:15 INFO - 10 libxul.so + 0xb5cc77 [task 2017-06-14T19:13:15.066405Z] 19:13:15 INFO - rip = 0x00007fa824bd5c78 rsp = 0x00007ffeb171fcf0 [task 2017-06-14T19:13:15.066437Z] 19:13:15 INFO - rbp = 0x00007ffeb171fcf0 [task 2017-06-14T19:13:15.066472Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066507Z] 19:13:15 INFO - 11 libxul.so + 0xb484f3 [task 2017-06-14T19:13:15.066551Z] 19:13:15 INFO - rip = 0x00007fa824bc14f4 rsp = 0x00007ffeb171fd00 [task 2017-06-14T19:13:15.066590Z] 19:13:15 INFO - rbp = 0x00007ffeb171fd40 [task 2017-06-14T19:13:15.066629Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066666Z] 19:13:15 INFO - 12 libxul.so + 0xb48623 [task 2017-06-14T19:13:15.066712Z] 19:13:15 INFO - rip = 0x00007fa824bc1624 rsp = 0x00007ffeb171fd50 [task 2017-06-14T19:13:15.066750Z] 19:13:15 INFO - rbp = 0x00007ffeb171fd90 [task 2017-06-14T19:13:15.066788Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066825Z] 19:13:15 INFO - 13 libxul.so + 0xbc2b2d [task 2017-06-14T19:13:15.066870Z] 19:13:15 INFO - rip = 0x00007fa824c3bb2e rsp = 0x00007ffeb171fda0 [task 2017-06-14T19:13:15.066908Z] 19:13:15 INFO - rbp = 0x00007ffeb171fe20 [task 2017-06-14T19:13:15.066947Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.066983Z] 19:13:15 INFO - 14 xpcshell + 0x1a18f [task 2017-06-14T19:13:15.067029Z] 19:13:15 INFO - rip = 0x000000000041a190 rsp = 0x00007ffeb171fdc0 [task 2017-06-14T19:13:15.067067Z] 19:13:15 INFO - rbp = 0x00007ffeb171fe20 [task 2017-06-14T19:13:15.067103Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067141Z] 19:13:15 INFO - 15 libxul.so + 0x12ae686 [task 2017-06-14T19:13:15.067187Z] 19:13:15 INFO - rip = 0x00007fa825327687 rsp = 0x00007ffeb171fe30 [task 2017-06-14T19:13:15.067226Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.067264Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067303Z] 19:13:15 INFO - 16 libgtk-3.so.0.400.4 + 0x3e45c0 [task 2017-06-14T19:13:15.067350Z] 19:13:15 INFO - rip = 0x00007fa8239c75c1 rsp = 0x00007ffeb171fe98 [task 2017-06-14T19:13:15.067388Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.067425Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067457Z] 19:13:15 INFO - 17 libxul.so + 0x5e56eff [task 2017-06-14T19:13:15.067502Z] 19:13:15 INFO - rip = 0x00007fa829ecff00 rsp = 0x00007ffeb171ff90 [task 2017-06-14T19:13:15.067539Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.067576Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067613Z] 19:13:15 INFO - 18 libxul.so + 0x1298589 [task 2017-06-14T19:13:15.067659Z] 19:13:15 INFO - rip = 0x00007fa82531158a rsp = 0x00007ffeb171ffc8 [task 2017-06-14T19:13:15.067699Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.067737Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067775Z] 19:13:15 INFO - 19 libgtk-3.so.0.400.4 + 0x3da00 [task 2017-06-14T19:13:15.067820Z] 19:13:15 INFO - rip = 0x00007fa823620a01 rsp = 0x00007ffeb1720018 [task 2017-06-14T19:13:15.067857Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.067895Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.067935Z] 19:13:15 INFO - 20 libglib-2.0.so.0.3400.3 + 0xd7ff [task 2017-06-14T19:13:15.067982Z] 19:13:15 INFO - rip = 0x00007fa821d62800 rsp = 0x00007ffeb1720020 [task 2017-06-14T19:13:15.068019Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.068057Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.068092Z] 19:13:15 INFO - 21 libxul.so + 0x46500 [task 2017-06-14T19:13:15.068137Z] 19:13:15 INFO - rip = 0x00007fa8240bf501 rsp = 0x00007ffeb1720050 [task 2017-06-14T19:13:15.068176Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.068213Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.068250Z] 19:13:15 INFO - 22 libxul.so + 0xb2b218 [task 2017-06-14T19:13:15.068295Z] 19:13:15 INFO - rip = 0x00007fa824ba4219 rsp = 0x00007ffeb1720058 [task 2017-06-14T19:13:15.068333Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.068371Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.068414Z] 19:13:15 INFO - 23 ld-2.12.so + 0x9f09 [task 2017-06-14T19:13:15.068536Z] 19:13:15 INFO - rip = 0x00007fa82ac9bf0a rsp = 0x00007ffeb17200b0 [task 2017-06-14T19:13:15.068720Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.068907Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.069100Z] 19:13:15 INFO - 24 xpcshell + 0x19eff [task 2017-06-14T19:13:15.069300Z] 19:13:15 INFO - rip = 0x0000000000419f00 rsp = 0x00007ffeb1720150 [task 2017-06-14T19:13:15.069481Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.069663Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.069884Z] 19:13:15 INFO - 25 libglib-2.0.so.0.3400.3 + 0x53800 [task 2017-06-14T19:13:15.070088Z] 19:13:15 INFO - rip = 0x00007fa821da8801 rsp = 0x00007ffeb1720160 [task 2017-06-14T19:13:15.070264Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.070448Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.070631Z] 19:13:15 INFO - 26 libxul.so + 0x25c7634 [task 2017-06-14T19:13:15.070824Z] 19:13:15 INFO - rip = 0x00007fa826640635 rsp = 0x00007ffeb1720168 [task 2017-06-14T19:13:15.071005Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.071187Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.071373Z] 19:13:15 INFO - 27 libglib-2.0.so.0.3400.3 + 0x84dff [task 2017-06-14T19:13:15.071564Z] 19:13:15 INFO - rip = 0x00007fa821dd9e00 rsp = 0x00007ffeb17201a0 [task 2017-06-14T19:13:15.071740Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.071922Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.072101Z] 19:13:15 INFO - 28 libxul.so + 0x5e20b2f [task 2017-06-14T19:13:15.072297Z] 19:13:15 INFO - rip = 0x00007fa829e99b30 rsp = 0x00007ffeb17201b8 [task 2017-06-14T19:13:15.072474Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.072655Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.072837Z] 19:13:15 INFO - 29 libxul.so + 0x45ac069 [task 2017-06-14T19:13:15.073033Z] 19:13:15 INFO - rip = 0x00007fa82862506a rsp = 0x00007ffeb17201c8 [task 2017-06-14T19:13:15.073217Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.073397Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.073574Z] 19:13:15 INFO - 30 xpcshell + 0x10100 [task 2017-06-14T19:13:15.073769Z] 19:13:15 INFO - rip = 0x0000000000410101 rsp = 0x00007ffeb17201f0 [task 2017-06-14T19:13:15.073966Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.074147Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.074325Z] 19:13:15 INFO - 31 0x3fffff [task 2017-06-14T19:13:15.074522Z] 19:13:15 INFO - rip = 0x0000000000400000 rsp = 0x00007ffeb1720218 [task 2017-06-14T19:13:15.074706Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.074882Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.075059Z] 19:13:15 INFO - 32 xpcshell + 0x5797 [task 2017-06-14T19:13:15.075251Z] 19:13:15 INFO - rip = 0x0000000000405798 rsp = 0x00007ffeb1720278 [task 2017-06-14T19:13:15.075428Z] 19:13:15 INFO - rbp = 0x00007ffeb1720290 [task 2017-06-14T19:13:15.075612Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.075788Z] 19:13:15 INFO - 33 xpcshell + 0x55d4 [task 2017-06-14T19:13:15.075983Z] 19:13:15 INFO - rip = 0x00000000004055d5 rsp = 0x00007ffeb17202a0 [task 2017-06-14T19:13:15.076158Z] 19:13:15 INFO - rbp = 0x00007ffeb17202d0 [task 2017-06-14T19:13:15.076337Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.076518Z] 19:13:15 INFO - 34 libc-2.12.so + 0x1ed5c [task 2017-06-14T19:13:15.076712Z] 19:13:15 INFO - rip = 0x00007fa821233d5d rsp = 0x00007ffeb17202e0 [task 2017-06-14T19:13:15.076889Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.077068Z] 19:13:15 INFO - 35 xpcshell + 0x5570 [task 2017-06-14T19:13:15.077265Z] 19:13:15 INFO - rip = 0x0000000000405571 rsp = 0x00007ffeb1720300 [task 2017-06-14T19:13:15.077440Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.077619Z] 19:13:15 INFO - 36 xpcshell + 0x5797 [task 2017-06-14T19:13:15.077833Z] 19:13:15 INFO - rip = 0x0000000000405798 rsp = 0x00007ffeb1720318 [task 2017-06-14T19:13:15.078010Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.078188Z] 19:13:15 INFO - 37 xpcshell + 0x1c87f [task 2017-06-14T19:13:15.078385Z] 19:13:15 INFO - rip = 0x000000000041c880 rsp = 0x00007ffeb1720360 [task 2017-06-14T19:13:15.078560Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.078740Z] 19:13:15 INFO - 38 xpcshell + 0x5797 [task 2017-06-14T19:13:15.078939Z] 19:13:15 INFO - rip = 0x0000000000405798 rsp = 0x00007ffeb1720388 [task 2017-06-14T19:13:15.079114Z] 19:13:15 INFO - Found by: stack scanning [task 2017-06-14T19:13:15.079296Z] 19:13:15 INFO - 39 xpcshell + 0x57c0 [task 2017-06-14T19:13:15.079490Z] 19:13:15 INFO - rip = 0x00000000004057c1 rsp = 0x00007ffeb17203a0 [task 2017-06-14T19:13:15.079669Z] 19:13:15 INFO - Found by: stack scanning
Flags: needinfo?(bugs)
Surprising. I guess xpcshell tests behave very differently to actual browser.
Attached patch async_snow_white_2.diff (obsolete) — Splinter Review
idle dispatch fails anyhow later if one can't access current thread.
Attachment #8877706 - Attachment is obsolete: true
Flags: needinfo?(bugs)
Attachment #8877750 - Flags: review?(continuation)
Comment on attachment 8877750 [details] [diff] [review] async_snow_white_2.diff Review of attachment 8877750 [details] [diff] [review]: ----------------------------------------------------------------- Nathan should review the thread changes.
Attachment #8877750 - Flags: review?(continuation) → review?(nfroyd)
Blocks: 1373116
tracking for 55 and 56 per comment 11.
Comment on attachment 8877750 [details] [diff] [review] async_snow_white_2.diff Review of attachment 8877750 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the below change. ::: xpcom/threads/nsThreadUtils.cpp @@ +371,5 @@ > nsCOMPtr<nsIRunnable> event(Move(aEvent)); > NS_ENSURE_TRUE(event, NS_ERROR_INVALID_ARG); > + > + //XXX Using current thread for now as the nsIEventTarget. > + nsIEventTarget* target = NS_GetCurrentThread(); Please use mozilla::GetCurrentThreadEventTarget, which I believe should "fail" in the same ways at shutdown.
Attachment #8877750 - Flags: review?(nfroyd) → review+
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/ff18daec4a56 ensure AsyncFreeSnowWhite is run often enough, r=mccr8,nfroyd
Whiteboard: [stockwell fixed:product]
Looks like this is fixed in 56 but the bug wasn't marked. Is this something we might want to uplift to 55 beta?
Flags: needinfo?(bugs)
Assignee: nobody → bugs
Comment on attachment 8878580 [details] [diff] [review] async_snow_white_2.diff Approval Request Comment [Feature/Bug causing the regression]: Maybe bug 1367164 [User impact if declined]: not sure, this is mostly testing issue. [Is this code covered by automated tests?]: well this is testing issues [Has the fix been verified in Nightly?]: don't know [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: NA [Is the change risky?]: should be safe [Why is the change risky/not risky?]: Just running some runnables more reliable [String changes made/needed]: NA
Flags: needinfo?(bugs)
Attachment #8878580 - Flags: approval-mozilla-beta?
Comment on attachment 8878580 [details] [diff] [review] async_snow_white_2.diff Fix for annoying intermittent test failures, let's uplift this for beta 5.
Attachment #8878580 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1312623
Attachment #8877750 - Attachment is obsolete: true
(In reply to Ryan VanderMeulen [:RyanVM] from comment #25) > https://hg.mozilla.org/releases/mozilla-beta/rev/d5d30ef9787f This was broken on Beta because GetCurrentThreadEventTarget doesn't exist on 55. So it was backed out: https://hg.mozilla.org/releases/mozilla-beta/rev/a071ea8831a3daf150012b16484a24c0bbb27aa7 I landed the original version of the patch using NS_GetCurrentThread and it appears to be working fine. https://hg.mozilla.org/releases/mozilla-beta/rev/67e3dc7adcbe69fef347bfe70ee0afceab507210
this has started up again at a pretty high failure rate. :smaug, I see you landed a fix for this a few weeks ago, can you take a look at this again?
Flags: needinfo?(bugs)
Whiteboard: [stockwell fixed:product] → [stockwell needswork]
Unfortunately I have no idea what is causing this. This bug should probably be closed and a new one opened.
Flags: needinfo?(bugs)
Yeah this one was meant to be more of a metabug for a general signature. There are probably multiple different causes for this.
Whiteboard: [stockwell needswork] → [stockwell unknown]
I have seen an increased number of those crashes due to slow shutdowns for various Marionette restart tests. This happens especially on Linux since Sep 13/14th.
This bug's going to suddenly have more starred against it, because we decided that the one I had been using was all about ASan, and "fixed" that, and called the bug fixed. Doesn't mean these are happening more, or less, just that this is now my one true bug for all of them.
I suspect that with the increased duration in AsyncShutdown/nsTerminator, we'll also see new forms of this bug as harness timeout. Time will tell.
Depends on: 1425323
Keywords: meta
Summary: Many intermittent crashes in RunWatchdog → [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
Alias: RunWatchdogShutdownhang
Depends on: 1336478
Depends on: 1445210
Depends on: 1130782
Summary: [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down. → Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
Depends on: 1411908
Depends on: 1434618
Depends on: 1452416
Please add "Crash" key word.
Depends on: 1424393
:aryx -- Some of the recent dups do not look like they are related to shutdown or RunWatchdog; can you explain what criteria you are using for duping here?
Flags: needinfo?(aryx.bugmail)
Sorry, I over-generalized there. The bugs have been reopened.
Flags: needinfo?(aryx.bugmail)
Over the last 7 days there are 32 failures on this bug. These happen on windows10-64-ccov, linux64 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-07-24&endday=2018-07-31&tree=trunk&bug=1358898 [task 2018-07-31T04:06:40.325Z] 04:06:40 ERROR - ==7398==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fa506be594d bp 0x7fa4c6ffedd0 sp 0x7fa4c6ffed20 T87) [task 2018-07-31T04:06:40.327Z] 04:06:40 INFO - ==7398==The signal is caused by a WRITE memory access. [task 2018-07-31T04:06:40.330Z] 04:06:40 INFO - ==7398==Hint: address points to the zero page. [task 2018-07-31T04:06:41.257Z] 04:06:41 INFO - #0 0x7fa506be594c in mozilla::(anonymous namespace)::RunWatchdog(void*) /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp [task 2018-07-31T04:06:41.258Z] 04:06:41 INFO - #1 0x7fa518232f08 in _pt_root /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:201:5 [task 2018-07-31T04:06:41.258Z] 04:06:41 INFO - #2 0x7fa51b82c6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) [task 2018-07-31T04:06:41.319Z] 04:06:41 INFO - #3 0x7fa51a8b541c in clone /build/glibc-Cl5G7W/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109 [task 2018-07-31T04:06:41.321Z] 04:06:41 INFO - AddressSanitizer can not provide additional info. [task 2018-07-31T04:06:41.321Z] 04:06:41 INFO - SUMMARY: AddressSanitizer: SEGV /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp in mozilla::(anonymous namespace)::RunWatchdog(void*) [task 2018-07-31T04:06:41.322Z] 04:06:41 INFO - Thread T87 (Shutdow~minator) created by T0 here: [task 2018-07-31T04:06:41.340Z] 04:06:41 INFO - #0 0x4aae8d in __interceptor_pthread_create /builds/worker/workspace/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cc:204:3 [task 2018-07-31T04:06:41.342Z] 04:06:41 INFO - #1 0x7fa51822fc45 in _PR_CreateThread /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:433:14 [task 2018-07-31T04:06:41.342Z] 04:06:41 INFO - #2 0x7fa51822f82e in PR_CreateThread /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:518:12 [task 2018-07-31T04:06:41.343Z] 04:06:41 INFO - #3 0x7fa506be56c2 in CreateSystemThread /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:103:22 [task 2018-07-31T04:06:41.345Z] 04:06:41 INFO - #4 0x7fa506be56c2 in mozilla::nsTerminator::StartWatchdog() /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:453 [task 2018-07-31T04:06:41.347Z] 04:06:41 INFO - #5 0x7fa506be5bd6 in Start /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:393:3 [task 2018-07-31T04:06:41.348Z] 04:06:41 INFO - #6 0x7fa506be5bd6 in mozilla::nsTerminator::Observe(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:508 [task 2018-07-31T04:06:41.349Z] 04:06:41 INFO - #7 0x7fa4fb161323 in nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/xpcom/ds/nsObserverList.cpp:112:19 [task 2018-07-31T04:06:41.350Z] 04:06:41 INFO - #8 0x7fa4fb164c52 in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/xpcom/ds/nsObserverService.cpp:295:19 [task 2018-07-31T04:06:41.351Z] 04:06:41 INFO - #9 0x7fa5069fd86e in nsAppStartup::Quit(unsigned int) /builds/worker/workspace/build/src/toolkit/components/startup/nsAppStartup.cpp:468:19 [task 2018-07-31T04:06:41.352Z] 04:06:41 INFO - #10 0x7fa4fb2a7b11 in NS_InvokeByIndex /builds/worker/workspace/build/src/xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_x86_64_unix.S:106 [task 2018-07-31T04:06:41.361Z] 04:06:41 INFO - #11 0x7fa4fcbc717b in Invoke /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1705:12 [task 2018-07-31T04:06:41.363Z] 04:06:41 INFO - #12 0x7fa4fcbc717b in Call /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1231 [task 2018-07-31T04:06:41.363Z] 04:06:41 INFO - #13 0x7fa4fcbc717b in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1198 [task 2018-07-31T04:06:41.364Z] 04:06:41 INFO - #14 0x7fa4fcbccd04 in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:893:12 [task 2018-07-31T04:06:41.384Z] 04:06:41 INFO - #15 0x7fa506f09a8e in CallJSNative /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:444:15 [task 2018-07-31T04:06:41.384Z] 04:06:41 INFO - #16 0x7fa506f09a8e in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:532 [task 2018-07-31T04:06:41.384Z] 04:06:41 INFO - #17 0x7fa506ef46a2 in CallFromStack /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:589:12 [task 2018-07-31T04:06:41.385Z] 04:06:41 INFO - #18 0x7fa506ef46a2 in Interpret(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:3239 [task 2018-07-31T04:06:41.385Z] 04:06:41 INFO - #19 0x7fa506eda63a in js::RunScript(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:424:12 [task 2018-07-31T04:06:41.387Z] 04:06:41 INFO - #20 0x7fa506f0a364 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:556:15 [task 2018-07-31T04:06:41.387Z] 04:06:41 INFO - #21 0x7fa506f0b8f2 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:602:10 [task 2018-07-31T04:06:41.387Z] 04:06:41 INFO - #22 0x7fa507b23a75 in js::ForwardingProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const /builds/worker/workspace/build/src/js/src/proxy/Wrapper.cpp:176:12 [task 2018-07-31T04:06:41.397Z] 04:06:41 INFO - #23 0x7fa507ae8bb8 in js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const /builds/worker/workspace/build/src/js/src/proxy/CrossCompartmentWrapper.cpp:359:23 [task 2018-07-31T04:06:41.397Z] 04:06:41 INFO - #24 0x7fa507afe643 in js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) /builds/worker/workspace/build/src/js/src/proxy/Proxy.cpp:510:21 [task 2018-07-31T04:06:41.398Z] 04:06:41 INFO - #25 0x7fa506f0a984 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:508:20 [task 2018-07-31T04:06:41.398Z] 04:06:41 INFO - #26 0x7fa506ef46a2 in CallFromStack /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:589:12 [task 2018-07-31T04:06:41.399Z] 04:06:41 INFO - #27 0x7fa506ef46a2 in Interpret(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:3239 [task 2018-07-31T04:06:41.400Z] 04:06:41 INFO - #28 0x7fa506eda63a in js::RunScript(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:424:12 [task 2018-07-31T04:06:41.400Z] 04:06:41 INFO - #29 0x7fa506f0a364 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:556:15 [task 2018-07-31T04:06:41.400Z] 04:06:41 INFO - #30 0x7fa506f0b8f2 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:602:10 [task 2018-07-31T04:06:41.419Z] 04:06:41 INFO - #31 0x7fa507a3bbad in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/jsapi.cpp:2841:12 [task 2018-07-31T04:06:41.420Z] 04:06:41 INFO - #32 0x7fa4fcbafd50 in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedJSClass.cpp:1131:23 [task 2018-07-31T04:06:41.420Z] 04:06:41 INFO - #33 0x7fa4fb2a91e8 in PrepareAndDispatch /builds/worker/workspace/build/src/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:127:37 [task 2018-07-31T04:06:41.420Z] 04:06:41 INFO - #34 0x7fa4fb2a80ba in SharedStub (/builds/worker/workspace/build/application/firefox/libxul.so+0x213d0ba) [task 2018-07-31T04:06:41.420Z] 04:06:41 INFO - #35 0x7fa4fb2799bf in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1235:14 [task 2018-07-31T04:06:41.420Z] 04:06:41 INFO - #36 0x7fa4fb280b88 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:519:10 [task 2018-07-31T04:06:41.427Z] 04:06:41 INFO - #37 0x7fa4fc1abafa in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/workspace/build/src/ipc/glue/MessagePump.cpp:97:21 [task 2018-07-31T04:06:41.434Z] 04:06:41 INFO - #38 0x7fa4fc10018c in RunInternal /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:325:10 [task 2018-07-31T04:06:41.435Z] 04:06:41 INFO - #39 0x7fa4fc10018c in RunHandler /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:318 [task 2018-07-31T04:06:41.437Z] 04:06:41 INFO - #40 0x7fa4fc10018c in MessageLoop::Run() /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:298 [task 2018-07-31T04:06:41.437Z] 04:06:41 INFO - #41 0x7fa502febdca in nsBaseAppShell::Run() /builds/worker/workspace/build/src/widget/nsBaseAppShell.cpp:158:27 [task 2018-07-31T04:06:41.437Z] 04:06:41 INFO - #42 0x7fa5069fcb3b in nsAppStartup::Run() /builds/worker/workspace/build/src/toolkit/components/startup/nsAppStartup.cpp:290:30 [task 2018-07-31T04:06:41.437Z] 04:06:41 INFO - #43 0x7fa506c2fc4b in XREMain::XRE_mainRun() /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4764:22 [task 2018-07-31T04:06:41.438Z] 04:06:41 INFO - #44 0x7fa506c32d73 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4909:8 [task 2018-07-31T04:06:41.439Z] 04:06:41 INFO - #45 0x7fa506c34228 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:5001:21 [task 2018-07-31T04:06:41.440Z] 04:06:41 INFO - #46 0x4f20ac in do_main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:233:22 [task 2018-07-31T04:06:41.445Z] 04:06:41 INFO - #47 0x4f20ac in main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:311 [task 2018-07-31T04:06:41.446Z] 04:06:41 INFO - #48 0x7fa51a7ce82f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291 [task 2018-07-31T04:06:41.447Z] 04:06:41 INFO - ==7398==ABORTING [task 2018-07-31T04:12:51.453Z] 04:12:51 ERROR - REFTEST ERROR | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/lists-3/list-style-type-string-001b.html | application timed out after 370 seconds with no output [task 2018-07-31T04:12:51.455Z] 04:12:51 ERROR - REFTEST ERROR | Force-terminating active process(es). [task 2018-07-31T04:12:51.457Z] 04:12:51 INFO - REFTEST TEST-INFO | started process screentopng [task 2018-07-31T04:12:52.136Z] 04:12:52 INFO - REFTEST TEST-INFO | screentopng: exit 0 [task 2018-07-31T04:12:52.137Z] 04:12:52 INFO - REFTEST INFO | Can't trigger Breakpad, just killing process [task 2018-07-31T04:12:52.238Z] 04:12:52 ERROR - TEST-UNEXPECTED-FAIL | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/lists-3/list-style-type-string-001b.html | application terminated with exit code 1
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
I don't see anything useful in the logs.
There are 67 failures asociated to this bug in the last 7 days. This is mostly occurring on Linux 64, with some occurrences on Linux 32, Windows 7 and OS. :Yoric can you take a look at this?
Flags: needinfo?(dteller)
Please note that for all those cases I can see bug 1451613 being related: https://treeherder.mozilla.org/logviewer.html#?job_id=202468209&repo=try&lineNumber=39094-39096 > [task 2018-09-30T14:25:42.200Z] 14:25:42 INFO - [Parent 11365, Main Thread] WARNING: NS_ENSURE_TRUE(mDB) failed: file /builds/worker/workspace/build/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp, line 1422 > [task 2018-09-30T14:26:26.940Z] 14:26:26 INFO - JavaScript error: resource://gre/modules/osfile/osfile_async_front.jsm, line 407: Error: OS.File has been shut down. Rejecting post to stat > [task 2018-09-30T14:26:45.854Z] 14:26:45 INFO - Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:219
(In reply to Andrei Ciure[:andrei_ciure_] from comment #153) > There are 67 failures asociated to this bug in the last 7 days. This is > mostly occurring on Linux 64, with some occurrences on > Linux 32, Windows 7 and OS. > > :Yoric can you take a look at this? Just to be clear: because I wrote the timer doesn't mean that I'm the best person to fix the stuff that it measures :) You want me to do some triage?
Flags: needinfo?(dteller) → needinfo?(aciure)
Please note that most of the lately reported failures (see last comment) are related to VRListenerThreadHolder. This particular problem should mostly/completely be fixed now. See bug 1494490.
Depends on: 1494490
Flags: needinfo?(aciure)
There are 11 failures in the last 7 days, removing the disabled recommended tag.
Whiteboard: [stockwell disable-recommended]
Summary: Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down. → [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
There are 23 failures in the last 7 days, 18 of which on linux64 asan, 4 on windows10-64-qr debug and 1 linux32 debug. https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-11-30&endday=2018-12-07&tree=all&bug=1358898

There are 30 failures in the last 7 days, mostly of which on osx-10-10 debug/opt and linux64 asan.

:smaug can you take a look?

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=233087449&repo=autoland&lineNumber=7391

12:28:28 INFO - TEST-INFO | Main app process: exit 1
12:28:28 INFO - Buffered messages finished
12:28:28 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
12:28:28 INFO - runtests.py | Application ran for: 0:01:16.689748
12:28:28 INFO - zombiecheck | Reading PID log: /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpxhcSy0pidlog
12:28:28 INFO - ==> process 944 launched child process 945
12:28:28 INFO - ==> process 944 launched child process 946
12:28:28 INFO - ==> process 944 launched child process 947
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 945
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 946
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 947
12:28:28 INFO - mozcrash Copy/paste: /Users/cltbld/tasks/task_1552306751/build/macosx64-minidump_stackwalk /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpsLxijP.mozrunner/minidumps/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp /Users/cltbld/tasks/task_1552306751/build/symbols
12:28:45 INFO - mozcrash Saved minidump as /Users/cltbld/tasks/task_1552306751/build/blobber_upload_dir/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp
12:28:45 INFO - mozcrash Saved app info as /Users/cltbld/tasks/task_1552306751/build/blobber_upload_dir/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.extra
12:28:45 INFO - PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
12:28:45 INFO - Crash dump filename: /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpsLxijP.mozrunner/minidumps/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp
12:28:45 INFO - Operating system: Mac OS X
12:28:45 INFO - 10.10.5 14F27
12:28:45 INFO - CPU: amd64
12:28:45 INFO - family 6 model 69 stepping 1
12:28:45 INFO - 4 CPUs
12:28:45 INFO -
12:28:45 INFO - GPU: UNKNOWN
12:28:45 INFO -
12:28:45 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
12:28:45 INFO - Crash address: 0x0
12:28:45 INFO - Process uptime: 75 seconds
12:28:45 INFO -
12:28:45 INFO - Thread 37 (crashed)
12:28:45 INFO - 0 XUL!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:a423e8a5526660b0b84c79d9c7b13a0d1987eaff : 200 + 0x0]
12:28:45 INFO - rax = 0x0000000111b8235c rdx = 0x00007fff79dd11f8
12:28:45 INFO - rcx = 0x000000010a05c6b0 rbx = 0x0000000000000000
12:28:45 INFO - rsi = 0x0000480000004800 rdi = 0x0000470000004803
12:28:45 INFO - rbp = 0x0000000114303eb0 rsp = 0x0000000114303e80
12:28:45 INFO - r8 = 0x0000000114303e30 r9 = 0x0000000114304000
12:28:45 INFO - r10 = 0x00007fff8e6873ef r11 = 0x00007fff8e6873c0
12:28:45 INFO - r12 = 0x000000000000003f r13 = 0x00000000000008ff
12:28:45 INFO - r14 = 0x00000001134fb9e4 r15 = 0x000000010a05c6d8
12:28:45 INFO - rip = 0x000000010fa054d3
12:28:45 INFO - Found by: given as instruction pointer in context
12:28:45 INFO - 1 libnss3.dylib!_pt_root [ptthread.c:a423e8a5526660b0b84c79d9c7b13a0d1987eaff : 201 + 0x3]
12:28:45 INFO - rbp = 0x0000000114303ef0 rsp = 0x0000000114303ec0
12:28:45 INFO - rip = 0x000000010a46e980
12:28:45 INFO - Found by: previous frame's frame pointer

Flags: needinfo?(bugs)

This is a meta bug, not really something to look at.

Flags: needinfo?(bugs)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork]

Failures from the 18th till present day were caused by bug 1545440 and fixed by emilio here: https://hg.mozilla.org/integration/autoland/rev/133b9a6eafebffdf83e548972a7ff5634433d5d0

In that interval there were 585 total failures (browser-chrome failures), all of which are apparently bug 1545979
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-18&endday=2019-04-21&tree=trunk&bug=1358898

Removing ni as the high frequency here is caused solely by missclassifications and the real issue only has 10 total failures:

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-14&endday=2019-04-18&tree=trunk&bug=1358898

Flags: needinfo?(dteller)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

This bug had a massive spike in the last 7 days. Occurring 184 times on Linux and OSX 64bit platforms on debug and opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=261906593&repo=autoland&lineNumber=35462

(In reply to Dorel Luca [:dluca] from comment #220)

This bug had a massive spike in the last 7 days. Occurring 184 times on Linux and OSX 64bit platforms on debug and opt build types.

And the last failure happened on August 16th, which means no failure over the last weekend. I wonder if this is related to the lower amount of pushes, or something else fixed it. Worth checking again tomorrow.

Whiteboard: [stockwell disable-recommended]

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #221)

And the last failure happened on August 16th, which means no failure over the last weekend. I wonder if this is related to the lower amount of pushes, or something else fixed it. Worth checking again tomorrow.

Correct, no more failures yet since Aug 16th. Olli, any idea what could have fixed that bug?

Flags: needinfo?(bugs)

I'm not aware of any specific recent changes affecting this bug.
un-regression range might reveal something interesting.

Flags: needinfo?(bugs)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

This bug failed 39 times in the last 7 days. Occurs on linux64, windows10-64, windows10-64-shippable and on windows7-32

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=268547147&repo=autoland&lineNumber=2425

There are 44 total failures in the last 7 days on linux64 and windows7-32 debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269063326&repo=autoland&lineNumber=4421

[task 2019-09-30T14:37:46.037Z] 14:37:46 INFO - TEST-START | accessible/tests/browser/e10s/browser_treeupdate_visibility.js
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOCSHELL 085DB800 == 6 [pid = 4168] [id = {8e4d59ba-58c7-4c73-8903-9e7f5bf632cd}]
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 43 (0B22B820) [pid = 4168] [serial = 72] [outer = 00000000]
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 44 (085E2C00) [pid = 4168] [serial = 73] [outer = 0B22B820]
[task 2019-09-30T14:37:46.119Z] 14:37:46 INFO - GECKO(1980) | [Child 4168, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
[task 2019-09-30T14:37:46.119Z] 14:37:46 INFO - GECKO(1980) | : file z:/build/build/src/xpcom/threads/BlockingResourceBase.cpp, line 293
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | --- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | calling context
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | [stack trace unavailable]
[task 2019-09-30T14:37:46.160Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 45 (0B20E000) [pid = 4168] [serial = 74] [outer = 0B22B820]
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | [Child 4168, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | : file z:/build/build/src/xpcom/threads/BlockingResourceBase.cpp, line 293
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | --- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)

[task 2019-09-30T14:38:57.440Z] 14:38:57 INFO - mozcrash Copy/paste: Z:\task_1569850468\build\win32-minidump_stackwalk.exe c:\users\task_1569850468\appdata\local\temp\tmp0hxmto.mozrunner\minidumps\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp Z:\task_1569850468\build\symbols
[task 2019-09-30T14:39:18.980Z] 14:39:18 INFO - mozcrash Saved minidump as Z:\task_1569850468\build\blobber_upload_dir\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp
[task 2019-09-30T14:39:18.983Z] 14:39:18 INFO - mozcrash Saved app info as Z:\task_1569850468\build\blobber_upload_dir\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.extra
[task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - PROCESS-CRASH | Last test finished | application crashed [@ static void mozilla::anonymous namespace'::RunWatchdog(void *)] [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash dump filename: c:\users\task_1569850468\appdata\local\temp\tmp0hxmto.mozrunner\minidumps\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Operating system: Windows NT [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - 6.1.7601 Service Pack 1 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - CPU: x86 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - GenuineIntel family 6 model 63 stepping 2 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - 8 CPUs [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - GPU: UNKNOWN [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash reason: EXCEPTION_BREAKPOINT [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash address: 0x5b912725 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Process uptime: 100 seconds [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Thread 33 (crashed) [task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 0 xul.dll!static void mozilla::anonymous namespace'::RunWatchdog(void *) [nsTerminator.cpp:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 213 + 0x0]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x5b912725 esp = 0x18abfcac ebp = 0x18abfcc0 ebx = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - esi = 0x0000003f edi = 0x7685c446 eax = 0x6b39389c ecx = 0x000000d5
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - edx = 0x00000049 efl = 0x00000202
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: given as instruction pointer in context
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 1 nss3.dll!_PR_NativeRunThread [pruthr.c:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 397 + 0x6]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b69aa98 esp = 0x18abfcc8 ebp = 0x18abfce8 esi = 0x008b0630
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - edi = 0x008b0580
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 2 nss3.dll!static unsigned int pr_root(void *) [w95thred.c:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 137 + 0x7]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b68917d esp = 0x18abfcf0 ebp = 0x18abfcf4 ebx = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - esi = 0x6b689170 edi = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 3 ucrtbase.dll!__RtlConvertUlongToLargeInteger + 0x8489
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b18e16f esp = 0x18abfcfc ebp = 0x18abfd30
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 4 kernel32.dll! ?? ?? ::FNODOBFM::`string' + 0x46
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x7685ef3c esp = 0x18abfd38 ebp = 0x18abfd3c
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: previous frame's frame pointer
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 5 ntdll.dll!__RtlUserThreadStart + 0x27
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - eip = 0x77a93618 esp = 0x18abfd44 ebp = 0x18abfd7c
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - Found by: previous frame's frame pointer
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - 6 ntdll.dll!_RtlUserThreadStart + 0x1b
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - eip = 0x77a935eb esp = 0x18abfd84 ebp = 0x18abfd94
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - Found by: call frame info

Based on https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-08-31&endday=2019-09-30&tree=all&bug=1358898 seems like spikes started on the 21st ~ 24th of September.

David can you assign someone to take a look?

Flags: needinfo?(dteller)

I can try to find someone, but I haven't worked on anything related to shutdown in ~4 years.

Flags: needinfo?(dteller)

Hmm, it is mostly mochitest-browser-chrome-fis-e10s-16 (debug) which is failing. We certainly may have issues with shutdown in Fission.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork]

Over the last 7 days there are 59 failures present on this bug. These happen on: linux64, linux64-qr, macosx1014-64, windows10-64, windows10-64-qr, windows10-64-shippable

Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271983976&repo=autoland&lineNumber=8858

(In reply to Olli Pettay [:smaug] from comment #238)

Hmm, it is mostly mochitest-browser-chrome-fis-e10s-16 (debug) which is failing. We certainly may have issues with shutdown in Fission.

Olli are you working on this or can you assign someone to take a look or is this not fixable?

Whiteboard: [stockwell needswork] → [stockwell needswork:owner]

Hi Geoff, any suggestions here?

Flags: needinfo?(gbrown)

Most of the recent failures are on linux64/debug and in mochitest-browser-chrome-fission; since it is fission-specific, let's see if mccr8 is interested.

The shutdown hang occurs most frequently after running the tests in browser/components/extensions/test/browser (WebExtensions); let's see if jimm is interested.

Flags: needinfo?(jmathies)
Flags: needinfo?(gbrown)
Flags: needinfo?(continuation)

I don't know what could be going on. It does seem strange that around half of the failures are in a single directory. Looking at the log, it frees some windows and then just sits there for about a minute before getting killed. With Fission, a testing-only pref to reuse content processes is not working. I would have expected that this only make the tests themselves take longer, but maybe this directory creates an unusually large number of processes, and there's some kind of per-process cleanup that happens only in shutdown?

Flags: needinfo?(continuation)

Kris, in a discussion of bug 1580212, you were saying that maybe we should just leave dom.ipc.keepProcessesAlive broken with Fission. However, this directory, browser/components/extensions/test/browser/, seems to be experiencing frequent shutdown hangs, and I do wonder if the fact that it seems to open around 255 content tabs has anything to do with it? The failures are almost all in debug builds, so it is possible that there's something particularly slow we're doing at shutdown in debug builds. (Leak checking, maybe? Something else?)

Flags: needinfo?(kmaglione+bmo)

(In reply to Geoff Brown [:gbrown] from comment #248)

Most of the recent failures are on linux64/debug and in mochitest-browser-chrome-fission; since it is fission-specific, let's see if mccr8 is interested.

The shutdown hang occurs most frequently after running the tests in browser/components/extensions/test/browser (WebExtensions); let's see if jimm is interested.

If this is specific to fission, Neha can help prioritize the work.

Flags: needinfo?(jmathies) → needinfo?(nkochar)

mccr8 has looked into this and Kris is NI'ed here too for feedback.

Flags: needinfo?(nkochar)
Whiteboard: [stockwell disable-recommended]

There are 94 total failures in the last 7 days mostly on linux64 debug and 399 in the last 20 days.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=274975908&repo=autoland&lineNumber=36336

[task 2019-11-07T01:24:02.907Z] 01:24:02 INFO - checking window state
[task 2019-11-07T01:24:02.935Z] 01:24:02 INFO - TEST-START | browser/components/extensions/test/browser/browser_ext_windows_update.js
[task 2019-11-07T01:24:02.952Z] 01:24:02 INFO - GECKO(1178) | ++DOCSHELL 0x7efe4793c000 == 40 [pid = 1178] [id = {573c6e22-67b2-4820-ab21-c0659b56358c}]
[task 2019-11-07T01:24:02.952Z] 01:24:02 INFO - GECKO(1178) | ++DOMWINDOW == 161 (0x7efe52019020) [pid = 1178] [serial = 1261] [outer = (nil)]

[task 2019-11-07T01:25:40.713Z] 01:25:40 INFO - GECKO(1178) | Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:213
[task 2019-11-07T01:25:40.749Z] 01:25:40 INFO - GECKO(1178) | #01: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-11-07T01:25:40.750Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #02: libpthread.so.0 + 0x76ba
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #03: libc.so.6 + 0x10741d
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #04: ??? (???:???)
[task 2019-11-07T01:25:40.752Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::GenerateDump cloned child 8412
[task 2019-11-07T01:25:40.753Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-11-07T01:25:40.755Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-11-07T01:31:50.765Z] 01:31:50 INFO - Buffered messages finished
[task 2019-11-07T01:31:50.765Z] 01:31:50 ERROR - TEST-UNEXPECTED-TIMEOUT | Last test finished | application timed out after 370 seconds with no output
[task 2019-11-07T01:31:50.765Z] 01:31:50 ERROR - Force-terminating active process(es).
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - Determining child pids from psutil...
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - []
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - ==> process 1178 launched child process 1196
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - ==> process 1178 launched child process 1236
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1258
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1314
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1342
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1362
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1386
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1412

[task 2019-11-07T01:31:52.276Z] 01:31:52 INFO - mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpVyDQ8d.mozrunner/minidumps/7ffdcd9d-6d3c-b607-8799-954437912954.dmp /builds/worker/workspace/build/symbols
[task 2019-11-07T01:31:58.645Z] 01:31:58 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/7ffdcd9d-6d3c-b607-8799-954437912954.dmp
[task 2019-11-07T01:31:58.645Z] 01:31:58 INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/7ffdcd9d-6d3c-b607-8799-954437912954.extra
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - Crash dump filename: /tmp/tmpVyDQ8d.mozrunner/minidumps/7ffdcd9d-6d3c-b607-8799-954437912954.dmp
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - Operating system: Linux
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - CPU: amd64
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - family 6 model 85 stepping 4
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - 1 CPU
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - GPU: UNKNOWN
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Crash address: 0x0
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Process uptime: not available
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - Thread 7 (crashed)
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:64599da92fcc6f9988d9e50a15c364beef9cb009 : 213 + 0x22]
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - rax = 0x00007efeab590414 rdx = 0x0000000000000000
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - rcx = 0x000056527442ee28 rbx = 0x000000000000003f
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - rsi = 0x00007efeb73e5770 rdi = 0x00007efeb73e4540
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - rbp = 0x00007efe63e8aec0 rsp = 0x00007efe63e8ae90
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r8 = 0x00007efeb73e5770 r9 = 0x00007efe63e8b700
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r10 = 0x0000000000000001 r11 = 0x0000000000000000
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r12 = 0x00007efe63e8b650 r13 = 0x0000000000000000
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r14 = 0x00007efe63e8aea0 r15 = 0x00007efe479db140
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - rip = 0x00007efea801f072
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - Found by: given as instruction pointer in context
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - 1 libnspr4.so!_pt_root [ptthread.c:64599da92fcc6f9988d9e50a15c364beef9cb009 : 201 + 0x7]
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - rbx = 0x00007efe59a25ee0 rbp = 0x00007efe63e8af10
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - rsp = 0x00007efe63e8aed0 r12 = 0x00007efe63e8b650
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - r13 = 0x0000000000000000 r14 = 0x00000000000020d8
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - r15 = 0x00007efe63e8b700 rip = 0x00007efeb845b0ec
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - Found by: call frame info
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - 2 libpthread-2.23.so + 0x76ba
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - rsp = 0x00007efe63e8af20 r12 = 0x0000000000000000
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - r13 = 0x00007ffed4470e5f r14 = 0x00007efe63e8b9c0
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - r15 = 0x0000000000000000 rip = 0x00007efeb809d6ba
[task 2019-11-07T01:31:58.690Z] 01:31:58 INFO - Found by: call frame info
[task 2019-11-07T01:31:58.690Z] 01:31:58 INFO - 3 libc-2.23.so + 0x10741d
[task 2019-11-07T01:31:58.691Z] 01:31:58 INFO - rsp = 0x00007efe63e8afc0 rip = 0x00007efeb712641d
[task 2019-11-07T01:31:58.691Z] 01:31:58 INFO - Found by: stack scanning

Whiteboard: [stockwell needswork]

(In reply to Worcester12345 from comment #93)

Please add "Crash" key word.

Also add "PERF" key word?

(In reply to Andreea Pavel [:apavel] from comment #260)

There are 94 total failures in the last 7 days mostly on linux64 debug and 399 in the last 20 days.

My patch in bug 1580212 will hopefully fix all of the mochitest-browser-chrome-fis-e10s-16 failures, which are about half of the last approximately 100 failures.

Another 27 of these failures are in mochitest-browser-chrome-fis-e10s-7. I checked a handful of these failures, and they were all in browser/components/uitour/test/. I'll file a bug about flipping the pref in there, too.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(bugs)

Thank you.

Whiteboard: [stockwell needswork] → [stockwell needswork][comment 262]
Whiteboard: [comment 262][stockwell disable-recommended] → [comment 262][comment 265][comment 266]
Whiteboard: [comment 262][comment 265][comment 266] → [comment 262]
Depends on: 1596307

These shutdown hangs spiked up again on the 16th. It looks like a lot of these failures are happening in toolkit/components/reader/test/ with Fission, and bug 1586139 landed on the 16th and reenabled these tests. I think this is the same worker shutdown hang we saw with the reader tests in another directory (bug 1580212), so we need to flip the pref to reuse content processes here, too. I'll file a bug blocking this one.

oh in bug 1525281, :bc is looking at which test is causing the leak, possibly there is something larger going on here.

Whiteboard: [comment 262][stockwell disable-recommended] → [Comment 269]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]

There are 34 failures associated to this bug in the last 7 days. These are occurring on:

  • linux1804-64-ccov
  • linux1804-64-shippable-qr
  • linux64-tsan
  • macosx1014-64-shippable

recent log: https://treeherder.mozilla.org/logviewer.html#?job_id=292548300&repo=autoland

See Also: → 1603882

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #269)

oh in bug 1525281, :bc is looking at which test is causing the leak, possibly there is something larger going on here.

Hi Joel, considering this is being worked on/looked at in another bug, due to the frequency of the failures, do we still add the stockwell tag and ni here?

Bug has the "meta" keyword also.

Flags: needinfo?(jmaher)

that was 4 months ago, I didn't see a resolution to bug 1525281, maybe that contributes to a lot of this and is starred incorrectly?

Flags: needinfo?(jmaher)

I looked at the last logs, some of them were indeed misclassified, however the confusion was between this bug and bug 1614294. Recent failures that were classified correctly with this bug have:

10:36:50 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
10:36:57 INFO - PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]

There are 55 total failures in the last 7 days.

debug: windows7-32, windows10-64
opt: windows10-64-shippable, linux64-tsan, linux1804-64-shippable-qr, linux1804-64-qr

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=295237224&repo=autoland&lineNumber=53078

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]

Over the last 7 days there are 47 failures on this bug. These happen on: linux1804-64-qr, linux1804-64-shippable, macosx1014-64-shippable, windows10-64, windows10-64-qr, windows7-32.

Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296694189&repo=mozilla-central&lineNumber=52073

Flags: needinfo?(bugs)

I looked at 3 or 4 of the most recent failures in mochitest-devtools-chrome-e10s-3, including the one in comment 298, and they all had this crash in them:

MOZ_CRASH(WorkerRef not thread-safe)

It is unfortunate that these are showing up as WatchDog crashes and not this crash. That might be worth investigating in the test harness.

That was happening around the same time the hanging process was last mentioned. So it seems like this is some kind of shutdown issue with workers. Maybe Jens can find somebody to look at this. Thanks.

Flags: needinfo?(bugs) → needinfo?(jstutte)

I see here:
[task 2020-04-07T22:54:00.967Z] 22:54:00 INFO - GECKO(5024) | Hit MOZ_CRASH(WorkerRef not thread-safe) at /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:40

logged as INFO, followed by a stack trace. Perry, can you have a look?

Flags: needinfo?(jstutte) → needinfo?(perry)
Flags: needinfo?(perry)
Flags: needinfo?(perry)

That could be due to bug 1628052, which I landed on the 13th. Some of the changes in there were for leaks, but some of them were for hangs. You can back that out if you want to see if it helps.

(In reply to Andrew McCreight [:mccr8] from comment #305)

That could be due to bug 1628052, which I landed on the 13th. Some of the changes in there were for leaks, but some of them were for hangs. You can back that out if you want to see if it helps.

I'll test that in try. Thank you

The backout indeed seems to reduce the failure rate on fission to 0 failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7dea7018ea3656d568eb4605edbe3821bb7a4a8&selectedJob=298010940

Most recent failures are linux1804-64 debug mochitest-browser-chrome-fis-e10s-<no> : https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-04-09&endday=2020-04-16&tree=trunk&bug=1358898

Based on the above, should we proceed with backing this out of autoland?

Flags: needinfo?(continuation)

(In reply to Andreea Pavel [:apavel] from comment #307)

Based on the above, should we proceed with backing this out of autoland?

Yes please.

Sorry for the trouble. I meant to post in this bug when I landed in case the issues showed up again, but I forgot.

Flags: needinfo?(perry)
Flags: needinfo?(continuation)
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]

(In reply to Andrew McCreight [:mccr8] from comment #308)

(In reply to Andreea Pavel [:apavel] from comment #307)

Based on the above, should we proceed with backing this out of autoland?

Yes please.

Sorry for the trouble. I meant to post in this bug when I landed in case the issues showed up again, but I forgot.

No problem, thank you for clarifying this.

Failure rate is still high, we'll keep monitoring this in the next few days to see if anything changes.

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

Looking through these crashes, a number of them seem to have crashes in other processes. So we're probably crashing in one process, then another process doesn't know what to do and ends up hanging, but it gets tagged in TreeHerder as a hang because a bug happens to be on file for it already. I'm not sure what we can do about that.

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

Looking at the ~64 failures from the last 7 days, 18 of them are in mochitest-devtools-chrome-e10s-2, and look like bug 1611094. Basically, one process is hitting some threadsafety assertion, and I guess the other process hangs while waiting for the other process to finish.

URL: 1611094

There are another 9 failures in mochitest-devtools-chrome-e10s-3, which look like the same thing.

I'm relanding bug 1628052, except without changes in the directories where I added the pref to fix hangs. Hopefully it won't cause this to regress now.

Assignee: bugs → nobody
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

I looked at a couple of the recent crashes and there's not a lot to go on. It looks like the main thread is spinning the event loop, but there's not any information beyond that. Maybe we can record some information about what we're doing when we spin the event loop, then have the watchdog log it before doing the crash.

This bug is still missing the "CRASH" key word.

Keywords: crash
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

There seems to have been a small spike this week, however no more failures since the 15th.

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

The failure rate has decreased, 11 total failures in the last 7 days.

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

In the past week there were 32 failures on:
-linux1804-64-asan opt
-linux1804-64-ccov opt
-windows7-32 debug.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306211804&repo=mozilla-central&lineNumber=113989

[task 2020-06-13T12:02:43.173Z] 12:02:43 INFO - TEST-START | Shutdown
[task 2020-06-13T12:02:43.177Z] 12:02:43 INFO - Browser Chrome Test Summary
[task 2020-06-13T12:02:43.177Z] 12:02:43 INFO - Passed: 3658
[task 2020-06-13T12:02:43.177Z] 12:02:43 INFO - Failed: 0
[task 2020-06-13T12:02:43.179Z] 12:02:43 INFO - Todo: 0
[task 2020-06-13T12:02:43.181Z] 12:02:43 INFO - Mode: e10s
[task 2020-06-13T12:02:43.181Z] 12:02:43 INFO - *** End BrowserChrome Test Results ***
[task 2020-06-13T12:02:43.747Z] 12:02:43 INFO - GECKO(2856) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-06-13T12:02:43.771Z] 12:02:43 INFO - GECKO(2856) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-06-13T12:02:45.196Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.198Z] 12:02:45 INFO - GECKO(2856) | Suppressions used:
[task 2020-06-13T12:02:45.198Z] 12:02:45 INFO - GECKO(2856) | count bytes template
[task 2020-06-13T12:02:45.198Z] 12:02:45 INFO - GECKO(2856) | 18 568 nsComponentManagerImpl
[task 2020-06-13T12:02:45.199Z] 12:02:45 INFO - GECKO(2856) | 633 18287 libfontconfig.so
[task 2020-06-13T12:02:45.199Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.437Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.438Z] 12:02:45 INFO - GECKO(2856) | Suppressions used:
[task 2020-06-13T12:02:45.438Z] 12:02:45 INFO - GECKO(2856) | count bytes template
[task 2020-06-13T12:02:45.438Z] 12:02:45 INFO - GECKO(2856) | 48 1448 nsComponentManagerImpl
[task 2020-06-13T12:02:45.438Z] 12:02:45 INFO - GECKO(2856) | 633 18287 libfontconfig.so
[task 2020-06-13T12:02:45.438Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.698Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.698Z] 12:02:45 INFO - GECKO(2856) | Suppressions used:
[task 2020-06-13T12:02:45.698Z] 12:02:45 INFO - GECKO(2856) | count bytes template
[task 2020-06-13T12:02:45.698Z] 12:02:45 INFO - GECKO(2856) | 126 3736 nsComponentManagerImpl
[task 2020-06-13T12:02:45.698Z] 12:02:45 INFO - GECKO(2856) | 633 18287 libfontconfig.so
[task 2020-06-13T12:02:45.699Z] 12:02:45 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:45.837Z] 12:02:45 INFO - GECKO(2856) | 1592049765826 Marionette TRACE Received observer notification xpcom-will-shutdown
[task 2020-06-13T12:02:45.837Z] 12:02:45 INFO - GECKO(2856) | 1592049765829 Marionette INFO Stopped listening on port 2828
[task 2020-06-13T12:02:45.837Z] 12:02:45 INFO - GECKO(2856) | 1592049765829 Marionette DEBUG Marionette stopped listening
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | Suppressions used:
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | count bytes template
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | 415 12216 nsComponentManagerImpl
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | 633 18287 libfontconfig.so
[task 2020-06-13T12:02:46.015Z] 12:02:46 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:46.056Z] 12:02:46 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:46.057Z] 12:02:46 INFO - GECKO(2856) | Suppressions used:
[task 2020-06-13T12:02:46.059Z] 12:02:46 INFO - GECKO(2856) | count bytes template
[task 2020-06-13T12:02:46.059Z] 12:02:46 INFO - GECKO(2856) | 141 4176 nsComponentManagerImpl
[task 2020-06-13T12:02:46.059Z] 12:02:46 INFO - GECKO(2856) | 633 18287 libfontconfig.so
[task 2020-06-13T12:02:46.059Z] 12:02:46 INFO - GECKO(2856) | -----------------------------------------------------
[task 2020-06-13T12:02:47.085Z] 12:02:47 INFO - GECKO(2856) | [2020-06-13T12:02:47Z ERROR xulstore::persist] removeDocument error: unavailable
[task 2020-06-13T12:05:53.510Z] 12:05:53 INFO - GECKO(2856) | AddressSanitizer:DEADLYSIGNAL
[task 2020-06-13T12:05:53.510Z] 12:05:53 INFO - GECKO(2856) | =================================================================
[task 2020-06-13T12:05:53.511Z] 12:05:53 ERROR - GECKO(2856) | ==2856==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f0c51edd7df bp 0x7f0c0bcf8e10 sp 0x7f0c0bcf8d60 T371)
[task 2020-06-13T12:05:53.511Z] 12:05:53 INFO - GECKO(2856) | ==2856==The signal is caused by a WRITE memory access.
[task 2020-06-13T12:05:53.511Z] 12:05:53 INFO - GECKO(2856) | ==2856==Hint: address points to the zero page.
[task 2020-06-13T12:05:53.673Z] 12:05:53 INFO - GECKO(2856) | #0 0x7f0c51edd7de in mozilla::(anonymous namespace)::RunWatchdog(void*) /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:216:5
[task 2020-06-13T12:05:53.673Z] 12:05:53 INFO - GECKO(2856) | #1 0x7f0c64ad8d3e in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
[task 2020-06-13T12:05:53.673Z] 12:05:53 INFO - GECKO(2856) | #2 0x7f0c689046da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
[task 2020-06-13T12:05:53.729Z] 12:05:53 INFO - GECKO(2856) | #3 0x7f0c678e288e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-06-13T12:05:53.730Z] 12:05:53 INFO - GECKO(2856) | AddressSanitizer can not provide additional info.
[task 2020-06-13T12:05:53.730Z] 12:05:53 INFO - GECKO(2856) | SUMMARY: AddressSanitizer: SEGV /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:216:5 in mozilla::(anonymous namespace)::RunWatchdog(void*

Flags: needinfo?(dteller)
Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell needswork:owner]

Well, the watchdog seems to be doing its work.
Is there any reason for which I'm ni?ed?

Flags: needinfo?(dteller) → needinfo?(atila.butkovits)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #337)

Well, the watchdog seems to be doing its work.
Is there any reason for which I'm ni?ed?

Sorry about this!

Flags: needinfo?(atila.butkovits)
See Also: → 1590762

I randomly looked at this case and the main thread seems to wait on some lock. Looking at the callstack, I find dlclose calls, which are parent of dlopen which seems to be known to be not thread safe.

This all happens during a psm::UnloadUserModules call, thus I ni Dana tentatively here (this is just one case out of many different, though).

Flags: needinfo?(dkeeler)

From my reading of that link, the issue is not that those calls are not thread-safe but that they can deadlock. I don't see any other threads trying to do anything with dlopen/dlclose and malloc, so I don't think that's the issue. However, I do see gcov trying to do something:

Thread 0
 0  libc.so.6!__lll_lock_wait_private + 0x1c
 1  libc.so.6!_IO_setbuffer [iosetbuffer.c : 33 + 0x4d]
 2  libnssckbi.so!__gcov_open [gcov-io.c : 195 + 0x5]
 3  libnssckbi.so!gcov_do_dump [libgcov-driver.c : 850 + 0xdb]
 4  libnssckbi.so!__gcov_exit [libgcov-driver.c : 887 + 0x12]
 5  ld-linux-x86-64.so.2!_dl_close_worker [dl-close.c : 288 + 0x4]
 6  ld-linux-x86-64.so.2!_dl_close [dl-close.c : 842 + 0x22]
 7  libc.so.6!_dl_catch_exception [dl-error-skeleton.c : 196 + 0xc]
 8  libc.so.6!_dl_catch_error [dl-error-skeleton.c : 215 + 0x5]
 9  libdl.so.2!_dlerror_run [dlerror.c : 162 + 0x17]
10  libdl.so.2!__dlclose [dlclose.c : 46 + 0x5]
11  libnspr4.so!PR_UnloadLibrary [prlink.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 895 + 0xa]
12  libnss3.so!SECMOD_UnloadModule [pk11load.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 674 + 0x8]
13  libnss3.so!SECMOD_SlotDestroyModule [pk11util.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 951 + 0x8]
14  libnss3.so!PK11_DestroySlot [pk11slot.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 468 + 0xa]
15  libnss3.so!PK11_FreeSlot [pk11slot.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 480 + 0x5]
16  libnss3.so!SECMOD_DestroyModule [pk11util.c:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 922 + 0x5]
17  libxul.so!mozilla::psm::UnloadUserModules() [NSSCertDBTrustDomain.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 1616 + 0x8]
18  libxul.so!nsNSSComponent::ShutdownNSS() [nsNSSComponent.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 2119 + 0x5]
19  libxul.so!nsNSSComponent::Observe(nsISupports*, char const*, char16_t const*) [nsNSSComponent.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 2370 + 0x8]
20  libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 65 + 0x1f]
21  libxul.so!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 288 + 0x19]
22  libxul.so!nsXREDirProvider::DoShutdown() [nsXREDirProvider.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 1039 + 0x16]
23  libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 1278 + 0xc]
24  libxul.so!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 4875 + 0x24]
25  libxul.so!XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 4912 + 0x10]
26  libxul.so!mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) [Bootstrap.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 45 + 0x5]
27  firefox-bin!do_main [nsBrowserApp.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 217 + 0x1d]
28  firefox-bin!main [nsBrowserApp.cpp:d48aa0f0aa0bbe69f8eae663be2f5671c41af27c : 331 + 0xf]

Are we sure that gcov is doing the right thing here?

Flags: needinfo?(dkeeler)

Are we sure that gcov is doing the right thing here?

Thanks Dana, do you have a pointer to whom we could ask this?

Flags: needinfo?(dkeeler)

No, I don't know who to ask about that. You could ask in #developers or #firefox on slack or the equivalent in matrix. Incidentally, I had a quick look at a few other instances and either they were misfiled or they're due to something completely different.

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #347)

No, I don't know who to ask about that. You could ask in #developers or #firefox on slack or the equivalent in matrix.

Thanks, will do!

Incidentally, I had a quick look at a few other instances and either they were misfiled or they're due to something completely different.

Yes, this bug collects many different strange cases. I just randomly picked that one and looked at it.

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #343)

From my reading of that link, the issue is not that those calls are not thread-safe but that they can deadlock. I don't see any other threads trying to do anything with dlopen/dlclose and malloc, so I don't think that's the issue. However, I do see gcov trying to do something:

Thread 0
 0  libc.so.6!__lll_lock_wait_private + 0x1c
 1  libc.so.6!_IO_setbuffer [iosetbuffer.c : 33 + 0x4d]
 2  libnssckbi.so!__gcov_open [gcov-io.c : 195 + 0x5]
 3  libnssckbi.so!gcov_do_dump [libgcov-driver.c : 850 + 0xdb]
 4  libnssckbi.so!__gcov_exit [libgcov-driver.c : 887 + 0x12]

Are we sure that gcov is doing the right thing here?

Short investigation on #developers:

marco:

Firefox is dumping coverage counters at the end of the execution, which is pretty slow
so it might also just be super slow and being killed
that said, we are switching to Clang soon, so if it is indeed a deadlock, it might go away after the switch

calixte:

I don't really see how it's possible to have a deadlock here
so I'd say like marco said: it's likely because dump is slow

So this particular case seems not very actionable for now.

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell needswork:owner]

In the last 7 days there have been 31 occurrences, most on linux1804-64-ccov opt, some on windows 10 debug.

Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=312497291&repo=mozilla-central&lineNumber=6007

Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell disable-recommended]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

There are 28 total failures in the last 7 days on

  • linux1804-64-ccov opt
  • linux1804-64-ccov debug
  • windows10-64-qr debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=314383528&repo=mozilla-central&lineNumber=5924

[task 2020-08-29T22:41:12.138Z] 22:41:12 INFO - REFTEST TEST-START | layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA.html?border_abspos == layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA-ref.html?border_abspos
[task 2020-08-29T22:41:12.138Z] 22:41:12 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA.html?border_abspos | 1 / 2 (50%)
[task 2020-08-29T22:41:12.397Z] 22:41:12 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA-ref.html?border_abspos | 1 / 2 (50%)
[task 2020-08-29T22:41:12.484Z] 22:41:12 INFO - REFTEST TEST-PASS | layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA.html?border_abspos == layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA-ref.html?border_abspos | image comparison, max difference: 0, number of differing pixels: 0
[task 2020-08-29T22:41:12.485Z] 22:41:12 INFO - REFTEST TEST-END | layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA.html?border_abspos == layout/reftests/position-dynamic-changes/mixed/mixed-dimentionA-ref.html?border_abspos
[task 2020-08-29T22:41:12.506Z] 22:41:12 INFO - REFTEST INFO | Slowest test took 408ms (file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/position-dynamic-changes/mixed/mixed-dimentionN.html?border_abspos)
[task 2020-08-29T22:41:12.507Z] 22:41:12 INFO - REFTEST INFO | Total canvas count = 2
[task 2020-08-29T22:41:12.769Z] 22:41:12 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-08-29T22:41:12.770Z] 22:41:12 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-08-29T22:41:12.771Z] 22:41:12 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-08-29T22:41:12.885Z] 22:41:12 INFO - profiling:/tmp/tmpyrYTjS/toolkit/xre/nsAppRunner.gcda:Merge mismatch for function 107
[task 2020-08-29T22:42:16.629Z] 22:42:16 INFO - ExceptionHandler::GenerateDump cloned child 9441ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2020-08-29T22:42:16.630Z] 22:42:16 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
Fatal error: glibc detected an invalid stdio handle
[task 2020-08-29T22:42:16.756Z] 22:42:16 INFO - Exiting due to channel error.
[task 2020-08-29T22:42:16.933Z] 22:42:16 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
[task 2020-08-29T22:42:16.933Z] 22:42:16 INFO - REFTEST INFO | Downloading symbols from: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Rw7DcM9wTUqpdbzrwTBChA/artifacts/public/build/target.crashreporter-symbols.zip
[task 2020-08-29T22:42:19.968Z] 22:42:19 INFO - REFTEST INFO | Copy/paste: /builds/worker/fetches/minidump_stackwalk/minidump_stackwalk /tmp/tmpyNpy4z.mozrunner/minidumps/35b1c654-1508-201f-dd7b-18197b6d2618.dmp /tmp/tmpWPTqn1
[task 2020-08-29T22:42:24.679Z] 22:42:24 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/35b1c654-1508-201f-dd7b-18197b6d2618.dmp
[task 2020-08-29T22:42:24.679Z] 22:42:24 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/35b1c654-1508-201f-dd7b-18197b6d2618.extra
[task 2020-08-29T22:42:24.846Z] 22:42:24 INFO - REFTEST PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2020-08-29T22:42:24.846Z] 22:42:24 INFO - Mozilla crash reason: Shutdown hanging at step quit-application. Something is blocking the main-thread.
[task 2020-08-29T22:42:24.846Z] 22:42:24 INFO - Crash dump filename: /tmp/tmpyNpy4z.mozrunner/minidumps/35b1c654-1508-201f-dd7b-18197b6d2618.dmp
[task 2020-08-29T22:42:24.846Z] 22:42:24 INFO - Operating system: Linux
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - CPU: amd64
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - family 6 model 85 stepping 4
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - 4 CPUs
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO -
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - GPU: UNKNOWN
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO -
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Crash address: 0x0
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Process uptime: not available
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO -
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Thread 68 (crashed)
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 205 + 0x13]
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rax = 0x00007f950a61fb20 rdx = 0x0000563198d90440
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rcx = 0x00007f950a61fb20 rbx = 0x00007f950bbd6980
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rsi = 0x00007f950b987050 rdi = 0x00007f950a61fb68
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rbp = 0x00007f950d3ace60 rsp = 0x00007f950d3ace30
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r8 = 0x0000000000000009 r9 = 0x00007f95523008c0
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r10 = 0x00001f6b8041f6e7 r11 = 0x00007f95525e3720
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r12 = 0x000000000000003f r13 = 0x00000000000024e0
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r14 = 0x00007f950d3ad700 r15 = 0x00007f950d3ad638
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rip = 0x00007f953a83c581
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Found by: given as instruction pointer in context
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - 1 libnspr4.so!_pt_root [ptthread.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 201 + 0x7]
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rbx = 0x00007f95139e5160 rbp = 0x00007f950d3aceb0
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - rsp = 0x00007f950d3ace70 r12 = 0x0000000000000001
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r13 = 0x00000000000024e0 r14 = 0x00007f950d3ad700
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - r15 = 0x00007f950d3ad638 rip = 0x00007f95536d7b2e
[task 2020-08-29T22:42:24.847Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - 2 libpthread.so.0!start_thread [pthread_create.c : 463 + 0xd]
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - rsp = 0x00007f950d3acec0 r12 = 0x00007f950d3acf80
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - r13 = 0x0000000000000000 r14 = 0x00007f95139e5160
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - r15 = 0x00007ffd267e2060 rip = 0x00007f955336f6db
[task 2020-08-29T22:42:24.848Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - 3 libc.so.6!clone + 0x3f
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rbx = 0x00007f950d3ad700 rbp = 0x0000000000000000
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rsp = 0x00007f950d3acf80 r12 = 0x00007f950d3acf80
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - r13 = 0x0000000000000000 r14 = 0x00007f95139e5160
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - r15 = 0x00007ffd267e2060 rip = 0x00007f9552555a3f
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO -
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - Thread 0
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - 0 libc.so.6!__lll_lock_wait_private + 0x1c
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rax = 0xfffffffffffffe00 rdx = 0x0000000000000002
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rcx = 0x00007f955256485c rbx = 0x00007f9509da9400
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rsi = 0x0000000000000080 rdi = 0x00007f9509da94e0
[task 2020-08-29T22:42:24.849Z] 22:42:24 INFO - rbp = 0x0000000000000001 rsp = 0x00007ffd267def08
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - r8 = 0x00007f9553788780 r9 = 0x0000000000000000
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000282
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - r12 = 0x0000000000000000 r13 = 0x00007f950cba4d20
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - r14 = 0x00007f950cba4d88 r15 = 0x00007f950cba9e28
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - rip = 0x00007f955256485c
[task 2020-08-29T22:42:24.850Z] 22:42:24 INFO - Found by: given as instruction pointer in context
[task 2020-08-29T22:42:24.851Z] 22:42:24 INFO - 1 libc.so.6!__GI__IO_un_link [genops.c : 61 + 0x46]
[task 2020-08-29T22:42:24.851Z] 22:42:24 INFO - rdx = 0x00007f9509da94e0 rbx = 0x00007f9509da9400
[task 2020-08-29T22:42:24.851Z] 22:42:24 INFO - rbp = 0x0000000000000001 rsp = 0x00007ffd267def20
[task 2020-08-29T22:42:24.852Z] 22:42:24 INFO - r10 = 0x000000000000006a r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.853Z] 22:42:24 INFO - r13 = 0x00007f950cba4d20 r14 = 0x00007f950cba4d88
[task 2020-08-29T22:42:24.854Z] 22:42:24 INFO - r15 = 0x00007f950cba9e28 rip = 0x00007f95524c173f
[task 2020-08-29T22:42:24.855Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.855Z] 22:42:24 INFO - 2 libc.so.6!_IO_new_fclose [iofclose.c : 49 + 0x5]
[task 2020-08-29T22:42:24.856Z] 22:42:24 INFO - rbx = 0x00007f9509da9400 rbp = 0x0000000001b30000
[task 2020-08-29T22:42:24.857Z] 22:42:24 INFO - rsp = 0x00007ffd267deff0 r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.858Z] 22:42:24 INFO - r13 = 0x00007f950cba4d20 r14 = 0x00007f950cba4d88
[task 2020-08-29T22:42:24.859Z] 22:42:24 INFO - r15 = 0x00007f950cba9e28 rip = 0x00007f95524b237d
[task 2020-08-29T22:42:24.859Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.860Z] 22:42:24 INFO - 3 libnssckbi.so!__gcov_close [gcov-io.c : 212 + 0x5]
[task 2020-08-29T22:42:24.861Z] 22:42:24 INFO - rbx = 0x0000563198d92700 rbp = 0x0000000001b30000
[task 2020-08-29T22:42:24.862Z] 22:42:24 INFO - rsp = 0x00007ffd267df010 r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.862Z] 22:42:24 INFO - r13 = 0x00007f950cba4d20 r14 = 0x00007f950cba4d88
[task 2020-08-29T22:42:24.863Z] 22:42:24 INFO - r15 = 0x00007f950cba9e28 rip = 0x00007f950cb3d66c
[task 2020-08-29T22:42:24.864Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.865Z] 22:42:24 INFO - 4 libnssckbi.so!gcov_do_dump [libgcov-driver.c : 850 + 0x14f]
[task 2020-08-29T22:42:24.866Z] 22:42:24 INFO - rbx = 0x00007f950cba4d20 rbp = 0x0000000001b30000
[task 2020-08-29T22:42:24.866Z] 22:42:24 INFO - rsp = 0x00007ffd267df020 r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.867Z] 22:42:24 INFO - r13 = 0x00007f950cba4d20 r14 = 0x00007f950cba4d88
[task 2020-08-29T22:42:24.867Z] 22:42:24 INFO - r15 = 0x00007f950cba9e28 rip = 0x00007f950cb3e1e1
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - 5 libnssckbi.so!__gcov_exit [libgcov-driver.c : 887 + 0x12]
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rbx = 0x00007f950cb812b8 rbp = 0x00007ffd267e3ec0
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rsp = 0x00007ffd267e3830 r12 = 0x00007f950cb81390
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r13 = 0x000000000000008e r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r15 = 0x000000000000008e rip = 0x00007f950cb3f102
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - 6 ld-linux-x86-64.so.2!_dl_close_worker [dl-close.c : 288 + 0x4]
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rbx = 0x00007f950cb812b8 rbp = 0x00007ffd267e3ec0
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rsp = 0x00007ffd267e3840 r12 = 0x00007f950cb81390
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r13 = 0x000000000000008e r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r15 = 0x000000000000008e rip = 0x00007f955359cccc
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - 7 ld-linux-x86-64.so.2!_dl_close [dl-close.c : 842 + 0x22]
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rbx = 0x00007f950ca5a800 rbp = 0x00007f9552220318
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - rsp = 0x00007ffd267e3ed0 r12 = 0x00007f9552220310
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.868Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955359dafa
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - 8 libc.so.6!_dl_catch_exception [dl-error-skeleton.c : 196 + 0xc]
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x00007f9552220318
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rsp = 0x00007ffd267e3ee0 r12 = 0x00007f9552220310
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955259b51f
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - 9 libc.so.6!_dl_catch_error [dl-error-skeleton.c : 215 + 0x5]
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rbx = 0x00007f9552220308 rbp = 0x00007f9552220318
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rsp = 0x00007ffd267e4000 r12 = 0x00007f9552220310
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955259b5af
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - 10 libdl.so.2!_dlerror_run [dlerror.c : 162 + 0x17]
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rbx = 0x00007f9552220300 rbp = 0x00007f9553165070
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - rsp = 0x00007ffd267e4040 r12 = 0x00007f950ca5a800
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f9553165745
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.869Z] 22:42:24 INFO - 11 libdl.so.2!__dlclose [dlclose.c : 46 + 0x5]
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffd267e4090
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rsp = 0x00007ffd267e4060 r12 = 0x00007f950c9b3fd0
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f95531650b3
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - 12 libnspr4.so!PR_UnloadLibrary [prlink.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 676 + 0xa]
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffd267e4090
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rsp = 0x00007ffd267e4070 r12 = 0x00007f950c9b3fd0
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f95536b621b
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - 13 libnss3.so!SECMOD_UnloadModule [pk11load.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 674 + 0x8]
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rbx = 0x00007f950c9b3fd0 rbp = 0x00007ffd267e40b0
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rsp = 0x00007ffd267e40a0 r12 = 0x00007f952ef7b020
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f95519537a9
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - 14 libnss3.so!SECMOD_SlotDestroyModule [pk11util.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 951 + 0x8]
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rbx = 0x00007f952ef7b020 rbp = 0x00007ffd267e40d0
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - rsp = 0x00007ffd267e40c0 r12 = 0x00007f952ef7b020
[task 2020-08-29T22:42:24.870Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955197aaa9
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - 15 libnss3.so!PK11_DestroySlot [pk11slot.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 468 + 0xa]
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rbx = 0x00007f950ea74000 rbp = 0x00007ffd267e40f0
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rsp = 0x00007ffd267e40e0 r12 = 0x00007f952ef7b020
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f95519742ed
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - 16 libnss3.so!PK11_FreeSlot [pk11slot.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 480 + 0x5]
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffd267e4100
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rsp = 0x00007ffd267e4100 r12 = 0x00007f952ef7b020
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955197434a
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - 17 libnss3.so!SECMOD_DestroyModule [pk11util.c:29ba77e279e4346ecf071b81c8fb171115b266d1 : 922 + 0x5]
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffd267e4130
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - rsp = 0x00007ffd267e4110 r12 = 0x00007f952ef7b020
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r13 = 0x0000000000000008 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f955197abc5
[task 2020-08-29T22:42:24.871Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - 18 libxul.so!mozilla::psm::UnloadUserModules() [NSSCertDBTrustDomain.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 1638 + 0x8]
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rbx = 0x00007f952ef7b020 rbp = 0x00007ffd267e4150
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rsp = 0x00007ffd267e4140 r12 = 0x00007f952ef77220
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f95352c09b5
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - 19 libxul.so!nsNSSComponent::ShutdownNSS() [nsNSSComponent.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 2118 + 0x5]
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rbx = 0x00007f950ea9d9b0 rbp = 0x00007ffd267e41a0
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rsp = 0x00007ffd267e4160 r12 = 0x00007f950ea9da38
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f953a5c19dd
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - 20 libxul.so!nsNSSComponent::Observe(nsISupports*, char const*, char16_t const*) [nsNSSComponent.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 2387 + 0x8]
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rbx = 0x00007f953d6ebd69 rbp = 0x00007ffd267e4240
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - rsp = 0x00007ffd267e41b0 r12 = 0x00007f950ea9d9b0
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r13 = 0x00007f953eb11c20 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f953a5c3969
[task 2020-08-29T22:42:24.872Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - 21 libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 70 + 0x1f]
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rbx = 0x0000000000000008 rbp = 0x00007ffd267e4290
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rsp = 0x00007ffd267e4250 r12 = 0x00007f953eb11c20
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r13 = 0x00007f953d6ebd69 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r15 = 0x00007ffd267e4250 rip = 0x00007f953535b817
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - 22 libxul.so!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 287 + 0x19]
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rbx = 0x00007f953d6ebd69 rbp = 0x00007ffd267e4370
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rsp = 0x00007ffd267e42a0 r12 = 0x00007ffd267e42f0
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r13 = 0x00007f9530252b50 r14 = 0x00007ffd267e42e0
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r15 = 0x00007ffd267e42d8 rip = 0x00007f9535360360
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - 23 libxul.so!nsXREDirProvider::DoShutdown() [nsXREDirProvider.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 1053 + 0x16]
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rbx = 0x00007f9530252b50 rbp = 0x00007ffd267e43b0
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - rsp = 0x00007ffd267e4380 r12 = 0x00007ffd267e44f0
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - r15 = 0x00007f955228e340 rip = 0x00007f953a891809
[task 2020-08-29T22:42:24.873Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - 24 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 1284 + 0xc]
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rbx = 0x00007f95302ac368 rbp = 0x00007ffd267e4400
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rsp = 0x00007ffd267e43c0 r12 = 0x00007ffd267e44b0
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r15 = 0x00007f955228e340 rip = 0x00007f953a87a1b4
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - 25 libxul.so!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 4958 + 0x24]
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rbx = 0x00007f95302ac368 rbp = 0x00007ffd267e44a0
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rsp = 0x00007ffd267e4410 r12 = 0x00007ffd267e44b0
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r15 = 0x00007f955228e340 rip = 0x00007f953a884981
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - 26 libxul.so!XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 4995 + 0x10]
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rbx = 0x0000000000000004 rbp = 0x00007ffd267e45d0
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - rsp = 0x00007ffd267e44b0 r12 = 0x00007ffd267e44b0
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r13 = 0x00007ffd267e5788 r14 = 0x00007ffd267e4600
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x00007f953a884cda
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.874Z] 22:42:24 INFO - 27 libxul.so!mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) [Bootstrap.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 45 + 0x5]
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rbx = 0x00007ffd267e5788 rbp = 0x00007ffd267e45e0
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rsp = 0x00007ffd267e45e0 r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r13 = 0x0000000000000004 r14 = 0x00007ffd267e57b0
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x00007f953a8883ba
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - 28 firefox-bin!do_main(int, char**, char**) [nsBrowserApp.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 217 + 0x1d]
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rbx = 0x00007ffd267e5788 rbp = 0x00007ffd267e5650
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rsp = 0x00007ffd267e45f0 r12 = 0x0000000000000000
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r13 = 0x0000000000000004 r14 = 0x00007ffd267e57b0
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x0000563198c7d8d7
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - 29 firefox-bin!main [nsBrowserApp.cpp:29ba77e279e4346ecf071b81c8fb171115b266d1 : 331 + 0xf]
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rbx = 0x00000d60d880cdea rbp = 0x00007ffd267e56a0
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - rsp = 0x00007ffd267e5660 r12 = 0x00007ffd267e57b0
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r13 = 0x00007ffd267e5780 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x0000563198c7c2b7
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.875Z] 22:42:24 INFO - 30 libc.so.6!__libc_start_main [libc-start.c : 310 + 0x1a]
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x0000563198d27d40
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - rsp = 0x00007ffd267e56b0 r12 = 0x0000563198c7cc11
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - r13 = 0x00007ffd267e5780 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x00007f9552455b97
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - Found by: call frame info
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - 31 firefox-bin!_start + 0x29
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - rsp = 0x00007ffd267e5770 r12 = 0x0000563198c7cc11
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - r13 = 0x00007ffd267e5780 r14 = 0x0000000000000000
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - r15 = 0x0000000000000000 rip = 0x0000563198c7cc3a
[task 2020-08-29T22:42:24.876Z] 22:42:24 INFO - Found by: call frame info

In the last 7 days there have been 42 occurrences on linux1804-64-ccov, macosx1014-64, windows10-64 build types debug and opt.

Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315544998&repo=mozilla-central&lineNumber=43581

Hi David, can we somehow triage these bugs through MOZ_CRASH_REASON now ? I expect to see then different patterns:

In particular I see in several stack traces of the Windows Shutdown too long, probably frozen, causing a crash. variant the main thread hanging at KERNELBASE.dll!RtlpComputeLangListCheckSum which might contain something actionable (an internal deadlock?).

Flags: needinfo?(dteller)

If it helps finding underlying reasons, by all means!

The difference between these three crash reasons is how much data/which data we have during the crash. The terminator doesn't know if the data is actually relevant to the crash, but if the human being finds out that it is, power to the human :)

Flags: needinfo?(dteller)
Depends on: 1665308
No longer depends on: 1665308
Flags: needinfo?(jstutte)
Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell needswork:owner]

FWIW, I see mostly:

Under macosx1014-64 / debug / test-verify-e10s only MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.).

Under linux1804-64-ccov / opt / reftest-e10s-5 I see both Shutdown hanging at step quit-application. Something is blocking the main-thread. and MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.).

Then there are few sparse other platforms / builds.

It would be helpful to have the same aggregation methods as under crash-stats also here.

Flags: needinfo?(jstutte)

All the recent hangs show the following MOZ_CRASH:

https://treeherder.mozilla.org/logviewer?job_id=328280753&repo=mozilla-central&lineNumber=22155-22171

[task 2021-01-30T06:56:50.674Z] 06:56:50     INFO - Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:233
[task 2021-01-30T06:56:51.435Z] 06:56:51     INFO - #01: _PR_NativeRunThread(void*) [nsprpub/pr/src/threads/combined/pruthr.c:399]
[task 2021-01-30T06:56:51.435Z] 06:56:51     INFO - #02: pr_root(void*) [nsprpub/pr/src/md/windows/w95thred.c:139]
[task 2021-01-30T06:56:51.441Z] 06:56:51     INFO - #03: o____lc_collate_cp_func [Z:\task_1611987784\build\application\firefox\ucrtbase.DLL + 0x3e16f]
[task 2021-01-30T06:56:51.442Z] 06:56:51     INFO - #04: BaseThreadInitThunk [C:\windows\system32\kernel32.dll + 0x4ef3c]
[task 2021-01-30T06:56:51.457Z] 06:56:51     INFO - #05: RtlInitializeExceptionChain [C:\windows\SYSTEM32\ntdll.dll + 0x63618]
[task 2021-01-30T06:56:51.457Z] 06:56:51     INFO - #06: RtlInitializeExceptionChain [C:\windows\SYSTEM32\ntdll.dll + 0x635eb]

Which maps to:
https://searchfox.org/mozilla-central/rev/bd92b9b4a3c2ff022e830c1358968a84e6e69c95/toolkit/components/terminator/nsTerminator.cpp#233

Gabriele any idea what could be wrong here? Could this be related to the crash reporter code that gets called a line before?

Flags: needinfo?(gsvelto)

This message is indicating, that mozilla::XPCOMShutdownNotified(); already happened before the watchdog triggered. But at this point shutdown has not yet completed entirely, and (as we can see) the watchdog is still armed. Indeed, after this point in ShutdownXPCOM(nsIServiceManager* aServMgr) we still have quite some road to go:

observerService->NotifyObservers(nullptr, NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID, nullptr);
...
nsTimerImpl::Shutdown();
...
nsThreadManager::get().Shutdown();
...
[and some other steps]

Some of those steps will be shortened by the fast shutdown (once bug 1683957 landed), but there is (and will remain) still potential for hangs after this point in time (so the message of the watchdog might be slightly misleading, too).

BTW, Doug and I chatted about this lately, and we noticed also that the shutdown phase related telemetry is not in sync with some later defined shutdown phases of the watchdog. I'll file a bug to adjust this.

See Also: → 1689953

The CrashManager is trying to create a folder and failing. I don't know if the error-handling is robust in that function so it might be playing a part here. I'll have a look later today.

(In reply to Gabriele Svelto [:gsvelto] from comment #393)

The CrashManager is trying to create a folder and failing. I don't know if the error-handling is robust in that function so it might be playing a part here. I'll have a look later today.

Gabriele, have you had the time to take a look at that problem?

Not yet sorry, leaving the NI? so I don't forget.

The crash manager isn't handling this errors gracefully, they just bubble up. I'll file a bug to fix it.

Flags: needinfo?(gsvelto)
Depends on: 1694624

I filed bug 1694624 to address the issue with the crash manager. Note that since we bubble up the errors in that case it might have no impact on this bug, or it might, unfortunately it's impossible to tell beforehand.

Depends on: 1695504
Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell unknown] → [Comment 269][stockwell needswork:owner]
Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]

There has been a recent increase in this type of failure, with crash stacks that contain "NativeMenuMac::OpenMenu". This type is tracked in bug 1707598.

After looking through a few of the Linux assertion failures here, I find it somewhat suspicious that so many of them also contain this crash in a content process:

INFO - out of memory: 0x0000000028700000 bytes requested
INFO - Hit MOZ_CRASH() at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc_abort.cpp:33
INFO - #01: mozalloc_abort [memory/mozalloc/mozalloc_abort.cpp:33]
INFO - #02: mozalloc_handle_oom(unsigned long) [memory/mozalloc/mozalloc_oom.cpp:51]
INFO - #03: moz_xrealloc [memory/mozalloc/mozalloc.cpp:72]
INFO - #04: nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_RelocateUsingMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned long, unsigned long) [xpcom/ds/nsTArray-inl.h:210]
INFO - #05: mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) [netwerk/ipc/ChannelEventQueue.h:218]
INFO - #06: mozilla::net::HttpChannelChild::DoNotifyListener() [netwerk/protocol/http/HttpChannelChild.cpp:1245]
INFO - #07: mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::net::HttpChannelChild>, void (mozilla::net::HttpChannelChild::*)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1204]

This appears to be crashing in the runnable dispatched from https://searchfox.org/mozilla-central/rev/9f76a47f4aa935b49754c5608a1c8e72ee358c46/netwerk/protocol/http/HttpChannelChild.cpp#242-244 due to an OOM. The RunOrEnqueue call appears to be OOM-ing here: https://searchfox.org/mozilla-central/rev/9f76a47f4aa935b49754c5608a1c8e72ee358c46/netwerk/ipc/ChannelEventQueue.h#218

The size of the allocation here is especially suspect, as mEventQueue is a nsTArray<UniquePtr<...>>, meaning that the allocation is trying to reserve enough space for ~84,803,583 events. Based on my understanding of our growth algorithm, this implies that we're trying to add our ~75,366,400th event, which is an absurd number of events to have queued.

ni? :jstutte to find someone from the networking team who might know what's going on here :-)

EDIT: I noticed immediately after submitting that the ones I noticed seem to be crashing in a specific crashtest (https://searchfox.org/mozilla-central/rev/9f76a47f4aa935b49754c5608a1c8e72ee358c46/xpcom/string/crashtests/1113005.html). Some examples:

Flags: needinfo?(jstutte)

That crashtest OOM is being tracked in bug 1683062.

See Also: → 1683062

I will try to reproduce this with logging. Hopefully that will work.

Flags: needinfo?(jstutte) → needinfo?(dd.mozilla)

I tried yesterday to reproduce it on try with logging:
https://treeherder.mozilla.org/jobs?repo=try&revision=359cf9bd9014e40dd8919c509e5cdf7a856b51fa

It haven't failed yet. I will try a couple of more runs.

Flags: needinfo?(dd.mozilla)

Looking at some of the reports: example 1, example 2, example 3 I see us hanging at gfxPlatform::ShutdownLayersIPC() on a quite weird stack:

[task 2021-05-06T18:46:43.050Z] 18:46:43     INFO - Thread 0
[task 2021-05-06T18:46:43.050Z] 18:46:43     INFO -  0  libpthread.so.0 + 0xd9f3
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     rax = 0xfffffffffffffe00   rdx = 0x0000000000000000
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     rcx = 0x00007f7838f8e9f3   rbx = 0x00007f7820e59500
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     rsi = 0x0000000000000080   rdi = 0x00007f7820e59528
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     rbp = 0x00007f7820e59524   rsp = 0x00007ffefa579a60
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -      r8 = 0x0000000000000000    r9 = 0x0000000000000000
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     r10 = 0x0000000000000000   r11 = 0x0000000000000246
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     r12 = 0x00007f7820e59528   r13 = 0x0000000000000000
[task 2021-05-06T18:46:43.051Z] 18:46:43     INFO -     r14 = 0x00007f7820e59400   r15 = 0x000000000000008c
[task 2021-05-06T18:46:43.052Z] 18:46:43     INFO -     rip = 0x00007f7838f8e9f3
[task 2021-05-06T18:46:43.052Z] 18:46:43     INFO -     Found by: given as instruction pointer in context
[task 2021-05-06T18:46:43.052Z] 18:46:43     INFO -  1  libgdk-3.so.0 + 0x67fd6
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -     rsp = 0x00007ffefa579aa0   rip = 0x00007f78366a1fd6
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -  2  libpthread.so.0 + 0xd690
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -     rsp = 0x00007ffefa579aa8   rip = 0x00007f7838f8e690
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -  3  firefox-bin!mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) [ConditionVariable_posix.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 108 + 0xb]
[task 2021-05-06T18:46:43.053Z] 18:46:43     INFO -     rsp = 0x00007ffefa579b30   rip = 0x000055827c95b295
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -  4  firefox-bin!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_posix.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 115 + 0x8]
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     rsp = 0x00007ffefa579b50   rip = 0x000055827c95b310
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -  5  ld-linux-x86-64.so.2 + 0x138cc
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     rsp = 0x00007ffefa579b80   rip = 0x00007f78391b38cc
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -  6  libxul.so!mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [BlockingResourceBase.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 525 + 0x10]
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     rsp = 0x00007ffefa579bc0   rip = 0x00007f7827412e6a
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.054Z] 18:46:43     INFO -  7  libxul.so!mozilla::OffTheBooksCondVar::Wait() [BlockingResourceBase.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 505 + 0x5]
[task 2021-05-06T18:46:43.055Z] 18:46:43     INFO -     rsp = 0x00007ffefa579c10   rip = 0x00007f7827412dbf
[task 2021-05-06T18:46:43.055Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.055Z] 18:46:43     INFO -  8  libxul.so!mozilla::TaskController::GetRunnableForMTTask(bool) [TaskController.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 530 + 0x8]
[task 2021-05-06T18:46:43.055Z] 18:46:43     INFO -     rsp = 0x00007ffefa579c20   rip = 0x00007f7827420ccd
[task 2021-05-06T18:46:43.056Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.056Z] 18:46:43     INFO -  9  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 1093 + 0x11]
[task 2021-05-06T18:46:43.059Z] 18:46:43     INFO -     rsp = 0x00007ffefa579c70   rip = 0x00007f782742d33f
[task 2021-05-06T18:46:43.059Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 10  ld-linux-x86-64.so.2 + 0x138cc
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d30   rip = 0x00007f78391b38cc
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 11  ld-linux-x86-64.so.2 + 0x19a58
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d40   rip = 0x00007f78391b9a58
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 12  libxul.so!NS_IsMainThread [nsThreadManager.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 221 + 0x10]
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d50   rip = 0x00007f78274207ea
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 13  libxul.so!nsIXPConnect::XPConnect() [nsXPConnect.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 1031 + 0x5]
[task 2021-05-06T18:46:43.061Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d60   rip = 0x00007f7827e4ec09
[task 2021-05-06T18:46:43.061Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.061Z] 18:46:43     INFO - 14  libxul.so!XPCJSContext::RecordScriptActivity(bool) [XPCJSContext.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 544 + 0x5]
[task 2021-05-06T18:46:43.061Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d70   rip = 0x00007f7827e247f6
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO - 15  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 548 + 0xc]
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     rsp = 0x00007ffefa579db0   rip = 0x00007f7827432026
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO - 16  libxul.so!mozilla::layers::CompositorThreadHolder::Shutdown() [CompositorThread.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 132 + 0x42]
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     rsp = 0x00007ffefa579de0   rip = 0x00007f782823d68c
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO - 17  libxul.so!gfxPlatform::ShutdownLayersIPC() [gfxPlatform.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 1356 + 0x5]
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     rsp = 0x00007ffefa579e40   rip = 0x00007f782826fed6
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO - 18  libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [XPCOMInit.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 620 + 0x5]
[task 2021-05-06T18:46:43.062Z] 18:46:43     INFO -     rsp = 0x00007ffefa579e70   rip = 0x00007f7827467f9c

Weird because the stack continues for many frames in linux internal functions and I do not see the start frame libxul.so!XRE_main(int, char**, mozilla::BootstrapConfig const&) which I would expect to see.

We end up in this SpinEventLoopUntil, it seems.

:jya, do you have any thought on this (given that you touched that code at least somehow) ?

Flags: needinfo?(jya-moz)

The stackwalker is using stack scanning to find the instruction pointers in the stack; that's prone to errors. We don't have symbols for those native libraries which seems odd, which version of Linux are we running in automation?

I am no longer a Mozilla employee unfortunately.

There are many places where races between processes could cause for a process to wait indefinitely on an action that will never come.

Flags: needinfo?(jya-moz)

Sorry to bother you then.

(In reply to Gabriele Svelto [:gsvelto] from comment #434)

The stackwalker is using stack scanning to find the instruction pointers in the stack; that's prone to errors. We don't have symbols for those native libraries which seems odd, which version of Linux are we running in automation?

It would be good to know if that stack is reliable at least up to the SpinEventLoopUntil. It sounds risky to me to have nested event loops such late during shutdown?

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]

(In reply to Jens Stutte [:jstutte] from comment #437)

It would be good to know if that stack is reliable at least up to the SpinEventLoopUntil. It sounds risky to me to have nested event loops such late during shutdown?

With stack scanning it's hard to tell. What we should do is identify exactly which libraries are in use and extract the debuginfo from them; this way I can provide a stack that can be relied upon. Generally speaking however if you find multiple stacks generated with stack scanning and they all have the same frames it's likely that they're right; it's just that you can't be 100% sure.

(In reply to Gabriele Svelto [:gsvelto] from comment #439)

With stack scanning it's hard to tell. What we should do is identify exactly which libraries are in use and extract the debuginfo from them; this way I can provide a stack that can be relied upon. Generally speaking however if you find multiple stacks generated with stack scanning and they all have the same frames it's likely that they're right; it's just that you can't be 100% sure.

The stack seems to reproduce several times in the same way, but if I look at

[task 2021-05-06T18:46:43.056Z] 18:46:43     INFO -  9  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 1093 + 0x11]
[task 2021-05-06T18:46:43.059Z] 18:46:43     INFO -     rsp = 0x00007ffefa579c70   rip = 0x00007f782742d33f
[task 2021-05-06T18:46:43.059Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 10  ld-linux-x86-64.so.2 + 0x138cc
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d30   rip = 0x00007f78391b38cc
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 11  ld-linux-x86-64.so.2 + 0x19a58
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d40   rip = 0x00007f78391b9a58
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO - 12  libxul.so!NS_IsMainThread [nsThreadManager.cpp:fe98ac52cf7cf863404f1781b6c8b7bc2fada5d2 : 221 + 0x10]
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     rsp = 0x00007ffefa579d50   rip = 0x00007f78274207ea
[task 2021-05-06T18:46:43.060Z] 18:46:43     INFO -     Found by: stack scanning

I see no meaningful path from NS_IsMainThread() to nsThread::ProcessNextEvent through two linux function calls?

There are 87 total failures in the last 7 days on

  • linux1804-64 debug and opt
  • linux1804-64-ccov opt
  • linux1804-64-qr debug
  • windows10-64 debug
  • windows10-64-qr debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=340522107&repo=mozilla-central&lineNumber=5812

[task 2021-05-22T10:32:11.595Z] 10:32:11 INFO - GECKO(2263) | [Child 2633, Main Thread] WARNING: '!ipcActor->SendAccumulateChildHistograms(histogramsToSend)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:279
[task 2021-05-22T10:32:11.596Z] 10:32:11 INFO - GECKO(2263) | [Child 2633, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:527
[task 2021-05-22T10:32:11.597Z] 10:32:11 INFO - GECKO(2263) | [Child 2633, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:298
[task 2021-05-22T10:32:14.228Z] 10:32:14 INFO - GECKO(2263) | out of memory: 0x0000000028700000 bytes requested
[task 2021-05-22T10:32:14.228Z] 10:32:14 INFO - GECKO(2263) | Hit MOZ_CRASH() at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc_abort.cpp:33
[task 2021-05-22T10:32:14.326Z] 10:32:14 INFO - GECKO(2263) | #01: mozalloc_abort [memory/mozalloc/mozalloc_abort.cpp:33]
[task 2021-05-22T10:32:14.328Z] 10:32:14 INFO - GECKO(2263) | #02: mozalloc_handle_oom(unsigned long) [memory/mozalloc/mozalloc_oom.cpp:51]
[task 2021-05-22T10:32:14.329Z] 10:32:14 INFO - GECKO(2263) | #03: moz_xstrdup [memory/mozalloc/mozalloc.cpp:80]
[task 2021-05-22T10:32:14.331Z] 10:32:14 INFO - GECKO(2263) | #04: nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_RelocateUsingMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned long, unsigned long) [xpcom/ds/nsTArray-inl.h:223]
[task 2021-05-22T10:32:14.332Z] 10:32:14 INFO - GECKO(2263) | #05: mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) [netwerk/ipc/ChannelEventQueue.h:218]
[task 2021-05-22T10:32:14.333Z] 10:32:14 INFO - GECKO(2263) | #06: mozilla::net::HttpChannelChild::DoNotifyListener() [netwerk/protocol/http/HttpChannelChild.cpp:1226]
[task 2021-05-22T10:32:14.334Z] 10:32:14 INFO - GECKO(2263) | #07: mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::net::HttpChannelChild>, void (mozilla::net::HttpChannelChild::)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1206]
[task 2021-05-22T10:32:14.335Z] 10:32:14 INFO - GECKO(2263) | #08: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:483]
[task 2021-05-22T10:32:14.336Z] 10:32:14 INFO - GECKO(2263) | #09: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:766]
[task 2021-05-22T10:32:14.337Z] 10:32:14 INFO - GECKO(2263) | #10: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:621]
[task 2021-05-22T10:32:14.338Z] 10:32:14 INFO - GECKO(2263) | #11: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:405]
[task 2021-05-22T10:32:14.339Z] 10:32:14 INFO - GECKO(2263) | #12: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:535]
[task 2021-05-22T10:32:14.340Z] 10:32:14 INFO - GECKO(2263) | #13: nsThread::ProcessNextEvent(bool, bool
) [xpcom/threads/nsThread.cpp:1163]
[task 2021-05-22T10:32:14.341Z] 10:32:14 INFO - GECKO(2263) | #14: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:548]
[task 2021-05-22T10:32:14.342Z] 10:32:14 INFO - GECKO(2263) | #15: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:85]
[task 2021-05-22T10:32:14.343Z] 10:32:14 INFO - GECKO(2263) | #16: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:335]
[task 2021-05-22T10:32:14.344Z] 10:32:14 INFO - GECKO(2263) | #17: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:311]
[task 2021-05-22T10:32:14.345Z] 10:32:14 INFO - GECKO(2263) | #18: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
[task 2021-05-22T10:32:14.346Z] 10:32:14 INFO - GECKO(2263) | #19: XRE_RunAppShell() [toolkit/xre/nsEmbedFunctions.cpp:911]
[task 2021-05-22T10:32:14.348Z] 10:32:14 INFO - GECKO(2263) | #20: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:235]
[task 2021-05-22T10:32:14.349Z] 10:32:14 INFO - GECKO(2263) | #21: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:335]
[task 2021-05-22T10:32:14.349Z] 10:32:14 INFO - GECKO(2263) | #22: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:311]
[task 2021-05-22T10:32:14.350Z] 10:32:14 INFO - GECKO(2263) | #23: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:743]
[task 2021-05-22T10:32:14.351Z] 10:32:14 INFO - GECKO(2263) | #24: content_process_main(mozilla::Bootstrap*, int, char**) [ipc/contentproc/plugin-container.cpp:58]
[task 2021-05-22T10:32:14.352Z] 10:32:14 INFO - GECKO(2263) | #25: main [browser/app/nsBrowserApp.cpp:313]
[task 2021-05-22T10:32:14.353Z] 10:32:14 INFO - GECKO(2263) | #26: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x21b97]
[task 2021-05-22T10:32:14.354Z] 10:32:14 INFO - GECKO(2263) | #27: ??? [/builds/worker/workspace/build/application/firefox/firefox-bin + 0x41559]
[task 2021-05-22T10:32:14.354Z] 10:32:14 INFO - GECKO(2263) | #28: ??? (???:???)
[task 2021-05-22T10:32:14.575Z] 10:32:14 INFO - TEST-INFO | Main app process: exit 11
[task 2021-05-22T10:32:14.576Z] 10:32:14 INFO - Buffered messages finished
[task 2021-05-22T10:32:14.577Z] 10:32:14 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
[task 2021-05-22T10:32:14.580Z] 10:32:14 INFO - runtests.py | Application ran for: 0:23:24.722463
[task 2021-05-22T10:32:14.583Z] 10:32:14 INFO - zombiecheck | Reading PID log: /tmp/tmpnbj3jmi_pidlog
[task 2021-05-22T10:32:14.587Z] 10:32:14 INFO - ==> process 2263 launched child process 2283
[task 2021-05-22T10:32:14.588Z] 10:32:14 INFO - ==> process 2263 launched child process 2338
[task 2021-05-22T10:32:14.588Z] 10:32:14 INFO - ==> process 2263 launched child process 2360
[task 2021-05-22T10:32:14.590Z] 10:32:14 INFO - ==> process 2263 launched child process 2391
[task 2021-05-22T10:32:14.591Z] 10:32:14 INFO - ==> process 2263 launched child process 2425
[task 2021-05-22T10:32:14.591Z] 10:32:14 INFO - ==> process 2263 launched child process 2475
[task 2021-05-22T10:32:14.592Z] 10:32:14 INFO - ==> process 2263 launched child process 2509
[task 2021-05-22T10:32:14.592Z] 10:32:14 INFO - ==> process 2263 launched child process 2515
[task 2021-05-22T10:32:14.592Z] 10:32:14 INFO - ==> process 2263 launched child process 2520
[task 2021-05-22T10:32:14.593Z] 10:32:14 INFO - ==> process 2263 launched child process 2588
[task 2021-05-22T10:32:14.593Z] 10:32:14 INFO - ==> process 2263 launched child process 2608
[task 2021-05-22T10:32:14.594Z] 10:32:14 INFO - ==> process 2263 launched child process 2633
[task 2021-05-22T10:32:14.594Z] 10:32:14 INFO - ==> process 2263 launched child process 2653
[task 2021-05-22T10:32:14.594Z] 10:32:14 INFO - ==> process 2263 launched child process 2676
[task 2021-05-22T10:32:14.595Z] 10:32:14 INFO - ==> process 2263 launched child process 2680
[task 2021-05-22T10:32:14.595Z] 10:32:14 INFO - ==> process 2263 launched child process 2717
[task 2021-05-22T10:32:14.595Z] 10:32:14 INFO - ==> process 2263 launched child process 2737
[task 2021-05-22T10:32:14.596Z] 10:32:14 INFO - ==> process 2263 launched child process 2757
[task 2021-05-22T10:32:14.597Z] 10:32:14 INFO - ==> process 2263 launched child process 2777
[task 2021-05-22T10:32:14.600Z] 10:32:14 INFO - ==> process 2263 launched child process 2797
[task 2021-05-22T10:32:14.602Z] 10:32:14 INFO - ==> process 2263 launched child process 2817
[task 2021-05-22T10:32:14.602Z] 10:32:14 INFO - ==> process 2263 launched child process 2837
[task 2021-05-22T10:32:14.602Z] 10:32:14 INFO - ==> process 2263 launched child process 2862
[task 2021-05-22T10:32:14.603Z] 10:32:14 INFO - ==> process 2263 launched child process 2887
[task 2021-05-22T10:32:14.603Z] 10:32:14 INFO - ==> process 2263 launched child process 2888
[task 2021-05-22T10:32:14.603Z] 10:32:14 INFO - ==> process 2263 launched child process 2890
[task 2021-05-22T10:32:14.603Z] 10:32:14 INFO - ==> process 2263 launched child process 2892
[task 2021-05-22T10:32:14.604Z] 10:32:14 INFO - ==> process 2263 launched child process 2895
[task 2021-05-22T10:32:14.605Z] 10:32:14 INFO - ==> process 2263 launched child process 2980
[task 2021-05-22T10:32:14.605Z] 10:32:14 INFO - ==> process 2263 launched child process 3034
[task 2021-05-22T10:32:14.605Z] 10:32:14 INFO - ==> process 2263 launched child process 3065
[task 2021-05-22T10:32:14.606Z] 10:32:14 INFO - ==> process 2263 launched child process 3098
[task 2021-05-22T10:32:14.606Z] 10:32:14 INFO - ==> process 2263 launched child process 3125
[task 2021-05-22T10:32:14.606Z] 10:32:14 INFO - ==> process 2263 launched child process 3152
[task 2021-05-22T10:32:14.611Z] 10:32:14 INFO - ==> process 2263 launched child process 3177
[task 2021-05-22T10:32:14.612Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2817
[task 2021-05-22T10:32:14.612Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2837
[task 2021-05-22T10:32:14.612Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3098
[task 2021-05-22T10:32:14.613Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2588
[task 2021-05-22T10:32:14.613Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2717
[task 2021-05-22T10:32:14.613Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2338
[task 2021-05-22T10:32:14.614Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2980
[task 2021-05-22T10:32:14.614Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2475
[task 2021-05-22T10:32:14.614Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2862
[task 2021-05-22T10:32:14.615Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2608
[task 2021-05-22T10:32:14.615Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2737
[task 2021-05-22T10:32:14.616Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3125
[task 2021-05-22T10:32:14.616Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2360
[task 2021-05-22T10:32:14.616Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3065
[task 2021-05-22T10:32:14.616Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2757
[task 2021-05-22T10:32:14.617Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2887
[task 2021-05-22T10:32:14.617Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2888
[task 2021-05-22T10:32:14.617Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2633
[task 2021-05-22T10:32:14.617Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2890
[task 2021-05-22T10:32:14.618Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2892
[task 2021-05-22T10:32:14.618Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2509
[task 2021-05-22T10:32:14.618Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2895
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3152
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2515
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2391
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2520
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2777
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3034
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2653
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 3177
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2283
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2797
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2676
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2680
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - zombiecheck | Checking for orphan process with PID: 2425
[task 2021-05-22T10:32:14.619Z] 10:32:14 INFO - mozcrash Copy/paste: /builds/worker/fetches/minidump_stackwalk/minidump_stackwalk /tmp/tmphuulsm3n.mozrunner/minidumps/65203269-c857-4237-f92a-d02242120827.dmp /builds/worker/workspace/build/symbols
[task 2021-05-22T10:32:20.295Z] 10:32:20 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/65203269-c857-4237-f92a-d02242120827.dmp
[task 2021-05-22T10:32:20.297Z] 10:32:20 INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/65203269-c857-4237-f92a-d02242120827.extra
[task 2021-05-22T10:32:20.426Z] 10:32:20 INFO - PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2021-05-22T10:32:20.426Z] 10:32:20 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-05-22T10:32:20.426Z] 10:32:20 INFO - Crash dump filename: /tmp/tmphuulsm3n.mozrunner/minidumps/65203269-c857-4237-f92a-d02242120827.dmp
[task 2021-05-22T10:32:20.427Z] 10:32:20 INFO - Operating system: Linux
[task 2021-05-22T10:32:20.428Z] 10:32:20 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2021-05-22T10:32:20.428Z] 10:32:20 INFO - CPU: amd64
[task 2021-05-22T10:32:20.428Z] 10:32:20 INFO - family 6 model 85 stepping 7
[task 2021-05-22T10:32:20.429Z] 10:32:20 INFO - 2 CPUs
[task 2021-05-22T10:32:20.429Z] 10:32:20 INFO -
[task 2021-05-22T10:32:20.429Z] 10:32:20 INFO - GPU: UNKNOWN
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO -
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - Crash address: 0x0
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - Process uptime: not available
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO -
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - Thread 49 (crashed)
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:f2d7667fc838510b83ccda7b757f91b9d958b275 : 246 + 0x22]
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - rax = 0x00007f97ac852972 rdx = 0x0000000000000000
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - rcx = 0x000056388ba528f8 rbx = 0x0000000000000276
[task 2021-05-22T10:32:20.430Z] 10:32:20 INFO - rsi = 0x00007f97c03768b0 rdi = 0x00007f97c0375680
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - rbp = 0x00007f978fba5e60 rsp = 0x00007f978fba5e30
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - r8 = 0x00007f97c03768b0 r9 = 0x00007f978fba6700
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000000
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - r12 = 0x00007f978fba6630 r13 = 0x0000000000000007
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - r14 = 0x00007f978fba5e40 r15 = 0x00007f97726f90d0
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - rip = 0x00007f97b35912d0
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - Found by: given as instruction pointer in context
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - 1 libnspr4.so!_pt_root [ptthread.c:f2d7667fc838510b83ccda7b757f91b9d958b275 : 201 + 0x7]
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - rbx = 0x00007f9773499940 rbp = 0x00007f978fba5eb0
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - rsp = 0x00007f978fba5e70 r12 = 0x00007f978fba6630
[task 2021-05-22T10:32:20.431Z] 10:32:20 INFO - r13 = 0x0000000000000007 r14 = 0x0000000000000ca6
[task 2021-05-22T10:32:20.432Z] 10:32:20 INFO - r15 = 0x00007f978fba6700 rip = 0x00007f97c1255396
[task 2021-05-22T10:32:20.432Z] 10:32:20 INFO - Found by: call frame info
[task 2021-05-22T10:32:20.432Z] 10:32:20 INFO - 2 libpthread.so.0 + 0x76db
[task 2021-05-22T10:32:20.432Z] 10:32:20 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2021-05-22T10:32:20.433Z] 10:32:20 INFO - rsp = 0x00007f978fba5ec0 r12 = 0x00007f978fba5f80
[task 2021-05-22T10:32:20.433Z] 10:32:20 INFO - r13 = 0x0000000000000000 r14 = 0x00007f9773499940
[task 2021-05-22T10:32:20.433Z] 10:32:20 INFO - r15 = 0x00007ffc70d8ec90 rip = 0x00007f97c0ec46db
[task 2021-05-22T10:32:20.434Z] 10:32:20 INFO - Found by: call frame info
[task 2021-05-22T10:32:20.434Z] 10:32:20 INFO - 3 libc.so.6 + 0x121a3f
[task 2021-05-22T10:32:20.434Z] 10:32:20 INFO - rsp = 0x00007f978fba5f80 rip = 0x00007f97c00aaa3f
[task 2021-05-22T10:32:20.435Z] 10:32:20 INFO - Found by: stack scanning

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]

There are 79 total failures in the last 7 days on

  • windows10-64-qr debug
  • windows10-32-qr debug
  • windows10-32 debug
  • linux1804-64-qr debug
  • linux1804-64 debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=343948618&repo=autoland&lineNumber=54041

[task 2021-06-26T17:21:54.946Z] 17:21:54 INFO - XPCOM_MEM_BLOAT_LOG: /tmp/tmpn62hm3nq.mozrunner/runreftest_leaks.log
[task 2021-06-26T17:21:54.949Z] 17:21:54 INFO - Writing to log: /tmp/tmpn62hm3nq.mozrunner/runreftest_leaks.log
[task 2021-06-26T17:21:55.955Z] 17:21:55 INFO - Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:246
[task 2021-06-26T17:21:55.964Z] 17:21:55 INFO - #01: mozilla::(anonymous namespace)::RunWatchdog(void*) [toolkit/components/terminator/nsTerminator.cpp:246]
[task 2021-06-26T17:21:55.997Z] 17:21:55 INFO - #02: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2021-06-26T17:21:55.999Z] 17:21:55 INFO - #03: ??? [/lib/x86_64-linux-gnu/libpthread.so.0 + 0x76db]
[task 2021-06-26T17:21:56.001Z] 17:21:56 INFO - #04: clone [/lib/x86_64-linux-gnu/libc.so.6 + 0x121a3f]
[task 2021-06-26T17:21:56.002Z] 17:21:56 INFO - #05: ??? (???:???)
[task 2021-06-26T17:21:56.003Z] 17:21:56 INFO - ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2021-06-26T17:21:56.004Z] 17:21:56 INFO - 2722
[task 2021-06-26T17:21:56.005Z] 17:21:56 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2021-06-26T17:21:56.164Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.165Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.166Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.166Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.168Z] 17:21:56 INFO - [Child 1643, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:527
[task 2021-06-26T17:21:56.169Z] 17:21:56 INFO - [GFX1-]: Receive IPC close with reason=AbnormalShutdown
[task 2021-06-26T17:27:33.240Z] 17:27:33 INFO - out of memory: 0x0000000028700000 bytes requested
[task 2021-06-26T17:27:33.240Z] 17:27:33 INFO - Hit MOZ_CRASH() at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc_abort.cpp:33
[task 2021-06-26T17:27:33.244Z] 17:27:33 INFO - #01: mozalloc_abort [memory/mozalloc/mozalloc_abort.cpp:33]
[task 2021-06-26T17:27:33.244Z] 17:27:33 INFO - #02: mozalloc_handle_oom(unsigned long) [memory/mozalloc/mozalloc_oom.cpp:51]
[task 2021-06-26T17:27:33.245Z] 17:27:33 INFO - #03: moz_xstrdup [memory/mozalloc/mozalloc.cpp:80]
[task 2021-06-26T17:27:33.253Z] 17:27:33 INFO - #04: nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_RelocateUsingMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned long, unsigned long) [xpcom/ds/nsTArray-inl.h:223]
[task 2021-06-26T17:27:33.256Z] 17:27:33 INFO - #05: mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) [netwerk/ipc/ChannelEventQueue.h:218]
[task 2021-06-26T17:27:33.257Z] 17:27:33 INFO - #06: mozilla::net::HttpChannelChild::DoNotifyListener() [netwerk/protocol/http/HttpChannelChild.cpp:1215]
[task 2021-06-26T17:27:33.260Z] 17:27:33 INFO - #07: mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::net::HttpChannelChild>, void (mozilla::net::HttpChannelChild::)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1204]
[task 2021-06-26T17:27:33.261Z] 17:27:33 INFO - #08: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:503]
[task 2021-06-26T17:27:33.261Z] 17:27:33 INFO - #09: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:805]
[task 2021-06-26T17:27:33.262Z] 17:27:33 INFO - #10: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:641]
[task 2021-06-26T17:27:33.262Z] 17:27:33 INFO - #11: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:425]
[task 2021-06-26T17:27:33.263Z] 17:27:33 INFO - #12: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:533]
[task 2021-06-26T17:27:33.264Z] 17:27:33 INFO - #13: nsThread::ProcessNextEvent(bool, bool
) [xpcom/threads/nsThread.cpp:1156]
[task 2021-06-26T17:27:33.264Z] 17:27:33 INFO - #14: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:431]
[task 2021-06-26T17:27:33.267Z] 17:27:33 INFO - #15: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:622]
[task 2021-06-26T17:27:33.267Z] 17:27:33 INFO - #16: XRE_TermEmbedding() [toolkit/xre/nsEmbedFunctions.cpp:216]
[task 2021-06-26T17:27:33.268Z] 17:27:33 INFO - #17: mozilla::ipc::ScopedXREEmbed::Stop() [ipc/glue/ScopedXREEmbed.cpp:91]
[task 2021-06-26T17:27:33.268Z] 17:27:33 INFO - #18: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:747]
[task 2021-06-26T17:27:33.269Z] 17:27:33 INFO - #19: content_process_main(mozilla::Bootstrap*, int, char**) [ipc/contentproc/plugin-container.cpp:58]
[task 2021-06-26T17:27:33.269Z] 17:27:33 INFO - #20: main [browser/app/nsBrowserApp.cpp:327]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #21: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x21b97]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #22: ??? [/builds/worker/workspace/build/application/firefox/firefox-bin + 0x41599]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #23: ??? (???:???)
[task 2021-06-26T17:27:33.611Z] 17:27:33 ERROR - TEST-UNEXPECTED-FAIL | xpcom/string/crashtests/1113005.html (finished) | application terminated with exit code 11
[task 2021-06-26T17:27:33.617Z] 17:27:33 INFO - REFTEST INFO | Copy/paste: /builds/worker/fetches/minidump_stackwalk/minidump_stackwalk /tmp/tmpn62hm3nq.mozrunner/minidumps/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp /builds/worker/workspace/build/symbols
[task 2021-06-26T17:27:40.501Z] 17:27:40 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp
[task 2021-06-26T17:27:40.502Z] 17:27:40 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/4d806502-be64-cfd7-0738-4a0e5085ab4f.extra
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - REFTEST PROCESS-CRASH | xpcom/string/crashtests/1113005.html (finished) | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash dump filename: /tmp/tmpn62hm3nq.mozrunner/minidumps/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Operating system: Linux
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - CPU: amd64
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - family 6 model 85 stepping 7
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 2 CPUs
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - GPU: UNKNOWN
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash address: 0x0
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Process uptime: not available
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Thread 39 (crashed)
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:813c6084a272058497b36049c42cec3e93d90420 : 246 + 0x22]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rax = 0x00007fc7e3606f8c rdx = 0x0000000000000000
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rcx = 0x0000561e68f25938 rbx = 0x0000000000000276
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rsi = 0x00007fc7f715c8b0 rdi = 0x00007fc7f715b680
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rbp = 0x00007fc7a1725e60 rsp = 0x00007fc7a1725e30
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r8 = 0x00007fc7f715c8b0 r9 = 0x00007fc7a1726700
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000000
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r12 = 0x00007fc7a1726630 r13 = 0x0000000000000007
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r14 = 0x00007fc7a1725e40 r15 = 0x00007fc7ba66ca08
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rip = 0x00007fc7ea44d450
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Found by: given as instruction pointer in context
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 1 libnspr4.so!_pt_root [ptthread.c:813c6084a272058497b36049c42cec3e93d90420 : 201 + 0x7]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rbx = 0x00007fc7cf276940 rbp = 0x00007fc7a1725eb0
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rsp = 0x00007fc7a1725e70 r12 = 0x00007fc7a1726630
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r13 = 0x0000000000000007 r14 = 0x0000000000000a84
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r15 = 0x00007fc7a1726700 rip = 0x00007fc7f803b3f3
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: call frame info
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - 2 libpthread.so.0 + 0x76db
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rsp = 0x00007fc7a1725ec0 r12 = 0x00007fc7a1725f80
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r13 = 0x0000000000000000 r14 = 0x00007fc7cf276940
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r15 = 0x00007ffeb9461320 rip = 0x00007fc7f7caa6db
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: call frame info
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - 3 libc.so.6 + 0x121a3f
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rsp = 0x00007fc7a1725f80 rip = 0x00007fc7f6e90a3f
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: stack scanning

There are 79 total failures in the last 7 days on

  • windows10-64-qr debug
  • windows10-32-qr debug
  • windows10-32 debug
  • linux1804-64-qr debug
  • linux1804-64 debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=343948618&repo=autoland&lineNumber=54041

[task 2021-06-26T17:21:54.946Z] 17:21:54 INFO - XPCOM_MEM_BLOAT_LOG: /tmp/tmpn62hm3nq.mozrunner/runreftest_leaks.log
[task 2021-06-26T17:21:54.949Z] 17:21:54 INFO - Writing to log: /tmp/tmpn62hm3nq.mozrunner/runreftest_leaks.log
[task 2021-06-26T17:21:55.955Z] 17:21:55 INFO - Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:246
[task 2021-06-26T17:21:55.964Z] 17:21:55 INFO - #01: mozilla::(anonymous namespace)::RunWatchdog(void*) [toolkit/components/terminator/nsTerminator.cpp:246]
[task 2021-06-26T17:21:55.997Z] 17:21:55 INFO - #02: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2021-06-26T17:21:55.999Z] 17:21:55 INFO - #03: ??? [/lib/x86_64-linux-gnu/libpthread.so.0 + 0x76db]
[task 2021-06-26T17:21:56.001Z] 17:21:56 INFO - #04: clone [/lib/x86_64-linux-gnu/libc.so.6 + 0x121a3f]
[task 2021-06-26T17:21:56.002Z] 17:21:56 INFO - #05: ??? (???:???)
[task 2021-06-26T17:21:56.003Z] 17:21:56 INFO - ExceptionHandler::GenerateDump cloned child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2021-06-26T17:21:56.004Z] 17:21:56 INFO - 2722
[task 2021-06-26T17:21:56.005Z] 17:21:56 INFO - ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2021-06-26T17:21:56.164Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.165Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.166Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.166Z] 17:21:56 INFO - [Child 1643, IPC I/O Child] WARNING: [CEBF1CE29482D46F.E50D96809E1D1A3B]: Ignoring message 'EVENT_MESSAGE' to peer 1.1 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-06-26T17:21:56.168Z] 17:21:56 INFO - [Child 1643, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:527
[task 2021-06-26T17:21:56.169Z] 17:21:56 INFO - [GFX1-]: Receive IPC close with reason=AbnormalShutdown
[task 2021-06-26T17:27:33.240Z] 17:27:33 INFO - out of memory: 0x0000000028700000 bytes requested
[task 2021-06-26T17:27:33.240Z] 17:27:33 INFO - Hit MOZ_CRASH() at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc_abort.cpp:33
[task 2021-06-26T17:27:33.244Z] 17:27:33 INFO - #01: mozalloc_abort [memory/mozalloc/mozalloc_abort.cpp:33]
[task 2021-06-26T17:27:33.244Z] 17:27:33 INFO - #02: mozalloc_handle_oom(unsigned long) [memory/mozalloc/mozalloc_oom.cpp:51]
[task 2021-06-26T17:27:33.245Z] 17:27:33 INFO - #03: moz_xstrdup [memory/mozalloc/mozalloc.cpp:80]
[task 2021-06-26T17:27:33.253Z] 17:27:33 INFO - #04: nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_RelocateUsingMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned long, unsigned long) [xpcom/ds/nsTArray-inl.h:223]
[task 2021-06-26T17:27:33.256Z] 17:27:33 INFO - #05: mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) [netwerk/ipc/ChannelEventQueue.h:218]
[task 2021-06-26T17:27:33.257Z] 17:27:33 INFO - #06: mozilla::net::HttpChannelChild::DoNotifyListener() [netwerk/protocol/http/HttpChannelChild.cpp:1215]
[task 2021-06-26T17:27:33.260Z] 17:27:33 INFO - #07: mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::net::HttpChannelChild>, void (mozilla::net::HttpChannelChild::)(), true, (mozilla::RunnableKind)0, >::Run() [xpcom/threads/nsThreadUtils.h:1204]
[task 2021-06-26T17:27:33.261Z] 17:27:33 INFO - #08: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:503]
[task 2021-06-26T17:27:33.261Z] 17:27:33 INFO - #09: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:805]
[task 2021-06-26T17:27:33.262Z] 17:27:33 INFO - #10: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:641]
[task 2021-06-26T17:27:33.262Z] 17:27:33 INFO - #11: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:425]
[task 2021-06-26T17:27:33.263Z] 17:27:33 INFO - #12: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:533]
[task 2021-06-26T17:27:33.264Z] 17:27:33 INFO - #13: nsThread::ProcessNextEvent(bool, bool
) [xpcom/threads/nsThread.cpp:1156]
[task 2021-06-26T17:27:33.264Z] 17:27:33 INFO - #14: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:431]
[task 2021-06-26T17:27:33.267Z] 17:27:33 INFO - #15: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:622]
[task 2021-06-26T17:27:33.267Z] 17:27:33 INFO - #16: XRE_TermEmbedding() [toolkit/xre/nsEmbedFunctions.cpp:216]
[task 2021-06-26T17:27:33.268Z] 17:27:33 INFO - #17: mozilla::ipc::ScopedXREEmbed::Stop() [ipc/glue/ScopedXREEmbed.cpp:91]
[task 2021-06-26T17:27:33.268Z] 17:27:33 INFO - #18: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:747]
[task 2021-06-26T17:27:33.269Z] 17:27:33 INFO - #19: content_process_main(mozilla::Bootstrap*, int, char**) [ipc/contentproc/plugin-container.cpp:58]
[task 2021-06-26T17:27:33.269Z] 17:27:33 INFO - #20: main [browser/app/nsBrowserApp.cpp:327]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #21: __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6 + 0x21b97]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #22: ??? [/builds/worker/workspace/build/application/firefox/firefox-bin + 0x41599]
[task 2021-06-26T17:27:33.270Z] 17:27:33 INFO - #23: ??? (???:???)
[task 2021-06-26T17:27:33.611Z] 17:27:33 ERROR - TEST-UNEXPECTED-FAIL | xpcom/string/crashtests/1113005.html (finished) | application terminated with exit code 11
[task 2021-06-26T17:27:33.617Z] 17:27:33 INFO - REFTEST INFO | Copy/paste: /builds/worker/fetches/minidump_stackwalk/minidump_stackwalk /tmp/tmpn62hm3nq.mozrunner/minidumps/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp /builds/worker/workspace/build/symbols
[task 2021-06-26T17:27:40.501Z] 17:27:40 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp
[task 2021-06-26T17:27:40.502Z] 17:27:40 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/4d806502-be64-cfd7-0738-4a0e5085ab4f.extra
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - REFTEST PROCESS-CRASH | xpcom/string/crashtests/1113005.html (finished) | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash dump filename: /tmp/tmpn62hm3nq.mozrunner/minidumps/4d806502-be64-cfd7-0738-4a0e5085ab4f.dmp
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Operating system: Linux
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - CPU: amd64
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - family 6 model 85 stepping 7
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 2 CPUs
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - GPU: UNKNOWN
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Crash address: 0x0
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Process uptime: not available
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO -
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Thread 39 (crashed)
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:813c6084a272058497b36049c42cec3e93d90420 : 246 + 0x22]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rax = 0x00007fc7e3606f8c rdx = 0x0000000000000000
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rcx = 0x0000561e68f25938 rbx = 0x0000000000000276
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rsi = 0x00007fc7f715c8b0 rdi = 0x00007fc7f715b680
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rbp = 0x00007fc7a1725e60 rsp = 0x00007fc7a1725e30
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r8 = 0x00007fc7f715c8b0 r9 = 0x00007fc7a1726700
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000000
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r12 = 0x00007fc7a1726630 r13 = 0x0000000000000007
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - r14 = 0x00007fc7a1725e40 r15 = 0x00007fc7ba66ca08
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rip = 0x00007fc7ea44d450
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - Found by: given as instruction pointer in context
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - 1 libnspr4.so!_pt_root [ptthread.c:813c6084a272058497b36049c42cec3e93d90420 : 201 + 0x7]
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rbx = 0x00007fc7cf276940 rbp = 0x00007fc7a1725eb0
[task 2021-06-26T17:27:40.627Z] 17:27:40 INFO - rsp = 0x00007fc7a1725e70 r12 = 0x00007fc7a1726630
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r13 = 0x0000000000000007 r14 = 0x0000000000000a84
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r15 = 0x00007fc7a1726700 rip = 0x00007fc7f803b3f3
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: call frame info
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - 2 libpthread.so.0 + 0x76db
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rsp = 0x00007fc7a1725ec0 r12 = 0x00007fc7a1725f80
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r13 = 0x0000000000000000 r14 = 0x00007fc7cf276940
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - r15 = 0x00007ffeb9461320 rip = 0x00007fc7f7caa6db
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: call frame info
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - 3 libc.so.6 + 0x121a3f
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - rsp = 0x00007fc7a1725f80 rip = 0x00007fc7f6e90a3f
[task 2021-06-26T17:27:40.628Z] 17:27:40 INFO - Found by: stack scanning

Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]
Flags: needinfo?(kershaw)

(In reply to Jens Stutte [:jstutte] from comment #471)

See bug 1683062 comment 33.

I'll try to fix bug 1683062.

Flags: needinfo?(kershaw)
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]

There have been 128 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • linux1804-64-ccov-qr
  • linux1804-64-qr
  • linux1804-64-tsan-qr
  • macosx1015-64-qr
Depends on: 1740087
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
Whiteboard: [Comment 269][stockwell disable-recommended] → [Comment 269][stockwell needswork:owner]
See Also: → 1769821
Severity: normal → S3
No longer depends on: 1424451
Depends on: 1769821

(In reply to Intermittent Failures Robot from comment #575)

15 failures in 755 pushes (0.02 failures/push) were associated with this bug yesterday.
For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1358898&startday=2022-11-08&endday=2022-11-08&tree=all

All cases in this bucket are the same as bug 1769821.

Duplicate of this bug: 1777684

Copying crash signatures from duplicate bugs.

Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] → [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed]
See Also: 1769821
Duplicate of this bug: 1817725
Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed] → [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed]

(In reply to Intermittent Failures Robot from comment #605)

18 failures in 3411 pushes (0.005 failures/push) were associated with this bug in the last 7 days.
For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1358898&startday=2023-05-01&endday=2023-05-07&tree=all

The only ESR instance seems to be solved by bug 1817674, probably. All other instances are random ASAN shutdown hangs that are not really actionable given that we do not have the other executing threads, see bug 1424393 comment 72.

The real hangs are now split over several bugs, see this crash-stats query.

Not sure how to handle this, this bug is basically moot now but links to a bunch of other bugs that might be still relevant. Probably we should go through the signatures in crash-stats and check for existing new bugs where to move them if appropriate.

Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed] → [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed]

I added the existing open bugs as blockers here. There are some more on crash-stats that may deserve their own bug.

Duplicate of this bug: 1861906

Something like 10 of the recent failures are in dom/indexedDB/test/test_traffic_jam.html on Linux opt.

https://treeherder.mozilla.org/logviewer?job_id=457742524&repo=autoland&lineNumber=15056
is in netwerk/cookie/test/browser/browser_indexedDB.js but seems not to be the shutdown hang we want to track here. There I read right before the crash:

[task 2024-05-10T10:37:56.871Z] 10:37:56     INFO - GECKO(19453) | [Child 21829: Main Thread]: D/WorkerShutdownDump Found ActiveWorker (dedicated): blob:https://example.com/46757b52-44b2-43e6-8cf6-078a37665c02
[task 2024-05-10T10:37:56.871Z] 10:37:56     INFO - GECKO(19453) | [Child 21829: Main Thread]: D/WorkerShutdownDump Principal: https://example.com
[task 2024-05-10T10:37:56.872Z] 10:37:56     INFO - GECKO(19453) | [Child 21829: Main Thread]: D/WorkerShutdownDump LoadingPrincipal: https://example.com
[task 2024-05-10T10:37:56.873Z] 10:37:56     INFO - GECKO(19453) | [Child 21829: Main Thread]: D/WorkerShutdownDump CrashInfo: IsChromeWorker(false)|IDBOpenDBRequest

so while we are shutting down we see this worker ref being held.

https://treeherder.mozilla.org/logviewer?job_id=457692462&repo=autoland&lineNumber=3985
is in /IndexedDB/structured-clone.any.worker.html?61-80 and again is that different thing.

There seem to be more of these and we might want a separate bug for this. However, there is also the possibility that this is a symptom of having moved IDB connections to work on a thread pool and we see some underlying shutdown issue of thread pools in general.

Once a thread pool is asked to shutdown, it will refuse to accept any further events even if those come from a thread running inside the same pool while we are draining the remaining events (the loop continues until there are events). That means, that any processing that relies on a sequence of

  1. "A on other thread dispatches B to the pool"
  2. "B runs on pool and dispatches C to the same pool"
  3. "C runs on pool and dispatches D to original thread to resolve some promise/close some callback roundtrip/whatever"

would break, if the pool is asked to shutdown after 1. but before 2. I think it is worth filing a bug to rule this out.

Depends on: 1897376

We do not really expect bug 1897376 to make a difference here, at least for the IndexedDB case.

No longer depends on: 1897376
See Also: → 1897376

:jstutte, it seems like there is a spike right now in these watchdog failures. A good chunk of them seem to be waiting for the PBackground thread to shut down (which is currently idle) e.g. https://treeherder.mozilla.org/logviewer?job_id=472964517&repo=mozilla-beta&lineNumber=18734. The hang appears to be https://searchfox.org/mozilla-central/rev/f09e3f9603a08b5b51bf504846091579bc2ff531/ipc/glue/BackgroundImpl.cpp#953-954. (There was also a failure in RemoteWorkerService::Observe, which may be different: https://treeherder.mozilla.org/logviewer?job_id=472975560&repo=mozilla-beta&lineNumber=6172)

Any chance you have an idea what might be going wrong here?

Flags: needinfo?(jstutte)
Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::(anonymous namespace)::RunWatchdog(void*) + 0x3ed] → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ]
Flags: needinfo?(jstutte)

The SpinEventLoopUntil waits for sLiveActorCount to go down. It even has a timer to force kill remaining actors after 10s.

:janv, IIRC we are/were switching more code to use actors recently, could that be of touching PBackground things?

Flags: needinfo?(jvarga)

(In reply to Jens Stutte [:jstutte] from comment #730)

The SpinEventLoopUntil waits for sLiveActorCount to go down. It even has a timer to force kill remaining actors after 10s.

:janv, IIRC we are/were switching more code to use actors recently, could that be of touching PBackground things?

I'm not aware if any new sub protocols of PBackground.
There was some cleanup/refactoring work:
https://phabricator.services.mozilla.com/D194042
https://phabricator.services.mozilla.com/D194150
https://phabricator.services.mozilla.com/D194214
https://phabricator.services.mozilla.com/D194097
but I'm not sure if that could affect hthe top level protocol shutdown

Maybe we could add some tags for PBackground actors, so we would know if it's a worker or main thread, etc ?
We could iterate over remaining actors and print info about them when the timer fires.

Flags: needinfo?(jvarga)

We could iterate over remaining actors and print info about them when the timer fires.

IIUC we only count content processes (and their derivates) with sLiveActorCount, so while we might find a pattern for specific process types, we will not be able to see here what keeps those processes alive.

I wonder though if ContentParent::RecvInitBackground could happen after we already crossed sBackgroundThread = nullptr; while shutting down and waiting for other processes. At a first glance I do not see anything in place that would prevent us from trying to re-create a new background thread which might arrive at a point that breaks our book-keeping?

Most of the recent failures seem to have the following line from the crash reporter included in the logs:

JavaScript error: resource://gre/modules/CrashManager.jsm, line 828: AbortError: IOUtils: Shutting down and refusing additional I/O tasks

I've filed bug 1734543 a while ago. Could this be related as well (means we could mark as blocking this bug)?

(In reply to Jens Stutte [:jstutte] from comment #734)

We could iterate over remaining actors and print info about them when the timer fires.

IIUC we only count content processes (and their derivates) with sLiveActorCount, so while we might find a pattern for specific process types, we will not be able to see here what keeps those processes alive.

We could try to go further and dump information about PBackground sub-protocols ...
(once there would be basic infrastructure for dumping debug information)

A lot of the non-asan failures from last week seem to be all about bug 1734543. So maybe these can be re-classified after inspection?

Depends on: 1734543
Flags: needinfo?(sheriffs)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #751)

A lot of the non-asan failures from last week seem to be all about bug 1734543. So maybe these can be re-classified after inspection?

Re-classified the non-asan failures after inspection

Flags: needinfo?(sheriffs)
Duplicate of this bug: 1930437

Copying crash signatures from duplicate bugs.

Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator…
Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator… → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ]

Copying crash signatures from duplicate bugs.

Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator…

(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #766)

Copying crash signatures from duplicate bugs.

I think we want only the watchdog signatures here (which should capture all cases, anyways). I hope the bot will accept my change...

Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator… → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog]

Copying crash signatures from duplicate bugs.

Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculato…

I removed the unwanted signature from bug 1930437 so that the bot should no longer update this bug.

Crash Signature: [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculato… → [@ shutdownhang | mozilla::(anonymous namespace)::RunWatchdog ] [@ mozilla::(anonymous namespace)::RunWatchdog]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: