Open Bug 1358898 (RunWatchdogShutdownhang) Opened 3 years ago Updated 14 hours ago

[meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.

Categories

(Toolkit :: Async Tooling, defect)

defect
Not set

Tracking

()

Tracking Status
firefox55 + fixed
firefox56 + fixed

People

(Reporter: kats, Assigned: smaug)

References

(Depends on 14 open bugs, Blocks 1 open bug)

Details

(Keywords: intermittent-failure, leave-open, meta, Whiteboard: [Comment 269])

Attachments

(1 file, 2 obsolete files)

See list of bugs this is blocking.
Kats, since this is a frequent failure, could you take a look at this bug, thanks!
Flags: needinfo?(bugmail)
So this signature appears when Firefox kills itself because it's taking too long to shut down. I looked at the 5 most recent instances of this in the orangefactor link in comment 3:

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=106188385&lineNumber=3102
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106181195&lineNumber=3063
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106175000&lineNumber=3066
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=106156528&lineNumber=3104
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=106155867&lineNumber=3106

In three of the 5, the main thread is busy doing cycle collection when the watchdog decides it's been long enough and shuts down. In the other two the main thread seems to be doing something in the idle timer callback.

I think the cycle collection owners should take the first crack at this, since from my sample >50% of these forced shutdowns are due to that. :mccr8, do you know who would be the right person to take a look?
Flags: needinfo?(bugmail) → needinfo?(continuation)
So, this isn't the cycle collector per se. It looks like we're shutting down threads, which causes us to run all of the events, which causes us to run AsyncFreeSnowWhite::Run, which destroys cycle collected objects. Maybe this runnable is getting starved out while we're running tests, so a ton of things accumulate? Maybe we can collect more aggressively when there's a lot to destroy. Do you have any ideas, Olli? It looks like this started happening more often, so presumably either your or Ehsan's work to do more work when idle is causing problems.
Flags: needinfo?(continuation) → needinfo?(bugs)
yeah, I think AsyncFreeSnowWhite could use idle+timeout, not only idle dispatch.
See Also: → 1371728
Attached patch idle dispatch with timeout (obsolete) — Splinter Review
Try to use idle time, but if not available, run at least once a second.
Flags: needinfo?(bugs)
Attachment #8877706 - Flags: review?(continuation)
Comment on attachment 8877706 [details] [diff] [review]
idle dispatch with timeout

Review of attachment 8877706 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8877706 - Flags: review?(continuation) → review+
Keywords: leave-open
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ee807301e66
ensure AsyncFreeSnowWhite is run often enough, r=mccr8
[Tracking Requested - why for this release]: intermittent failures, possible higher memory usage when the browser is very busy.

This should get backported to 55.
Backed out for crashing in test_add_task_run_next_test.js during Linux build:

https://hg.mozilla.org/integration/mozilla-inbound/rev/51cae901338f24eec985f2e5aa58387c50ecb28e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0ee807301e66400fcee3d428dd82584f8041c393&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=107115313&repo=mozilla-inbound

[task 2017-06-14T19:13:03.982302Z] 19:13:03     INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_msvc
[task 2017-06-14T19:13:03.982530Z] 19:13:03     INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_overridden_unsupported_clang
[task 2017-06-14T19:13:03.982756Z] 19:13:03     INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_overridden_unsupported_gcc
[task 2017-06-14T19:13:03.982976Z] 19:13:03     INFO - TEST-PASS | /home/worker/workspace/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_unsupported_msvc
[task 2017-06-14T19:13:15.052508Z] 19:13:15     INFO - /home/worker/workspace/build/src/testing/xpcshell/selftest.py
[task 2017-06-14T19:13:15.052724Z] 19:13:15     INFO - PROCESS-CRASH | test_add_task_run_next_test.js | application crashed [@ libxul.so + 0xba601c]
[task 2017-06-14T19:13:15.052866Z] 19:13:15     INFO - Crash dump filename: /tmp/xpc-other-NFEPmg/7e1416f8-1c93-a010-30ee-f2765cc9d012.dmp
[task 2017-06-14T19:13:15.053041Z] 19:13:15     INFO - Operating system: Linux
[task 2017-06-14T19:13:15.053232Z] 19:13:15     INFO -                   0.0.0 Linux 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64
[task 2017-06-14T19:13:15.053344Z] 19:13:15     INFO - CPU: amd64
[task 2017-06-14T19:13:15.053604Z] 19:13:15     INFO -      family 6 model 63 stepping 2
[task 2017-06-14T19:13:15.053642Z] 19:13:15     INFO -      16 CPUs
[task 2017-06-14T19:13:15.053786Z] 19:13:15     INFO - Crash reason:  SIGSEGV
[task 2017-06-14T19:13:15.053954Z] 19:13:15     INFO - Crash address: 0x0
[task 2017-06-14T19:13:15.054107Z] 19:13:15     INFO - Thread 0 (crashed)
[task 2017-06-14T19:13:15.054249Z] 19:13:15     INFO -  0  libxul.so + 0xba601c
[task 2017-06-14T19:13:15.054396Z] 19:13:15     INFO -     rbx = 0x00007ffeb171fb10   r12 = 0x00007ffeb171fb28
[task 2017-06-14T19:13:15.054555Z] 19:13:15     INFO -     r13 = 0x00007ffeb171fb18   r14 = 0x0000000000000000
[task 2017-06-14T19:13:15.054713Z] 19:13:15     INFO -     r15 = 0x00007fa824c1f0b6   rip = 0x00007fa824c1f01c
[task 2017-06-14T19:13:15.054853Z] 19:13:15     INFO -     rsp = 0x00007ffeb171fab0   rbp = 0x00007ffeb171faf0
[task 2017-06-14T19:13:15.054989Z] 19:13:15     INFO -     Found by: given as instruction pointer in context
[task 2017-06-14T19:13:15.055125Z] 19:13:15     INFO -  1  libxul.so + 0xba88e1
[task 2017-06-14T19:13:15.055268Z] 19:13:15     INFO -     rip = 0x00007fa824c218e2   rsp = 0x00007ffeb171fac0
[task 2017-06-14T19:13:15.055429Z] 19:13:15     INFO -     rbp = 0x00007ffeb171faf0
[task 2017-06-14T19:13:15.055619Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.055833Z] 19:13:15     INFO -  2  libxul.so + 0xba8da3
[task 2017-06-14T19:13:15.056034Z] 19:13:15     INFO -     rip = 0x00007fa824c21da4   rsp = 0x00007ffeb171fb00
[task 2017-06-14T19:13:15.065752Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fb60
[task 2017-06-14T19:13:15.065787Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.065809Z] 19:13:15     INFO -  3  libxul.so + 0x127db0d
[task 2017-06-14T19:13:15.065835Z] 19:13:15     INFO -     rip = 0x00007fa8252f6b0e   rsp = 0x00007ffeb171fb70
[task 2017-06-14T19:13:15.065857Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fba0
[task 2017-06-14T19:13:15.065878Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.065895Z] 19:13:15     INFO -  4  libxul.so + 0x127db7f
[task 2017-06-14T19:13:15.065918Z] 19:13:15     INFO -     rip = 0x00007fa8252f6b80   rsp = 0x00007ffeb171fbb0
[task 2017-06-14T19:13:15.065937Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fbc0
[task 2017-06-14T19:13:15.065956Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.065975Z] 19:13:15     INFO -  5  libxul.so + 0x127db2d
[task 2017-06-14T19:13:15.065998Z] 19:13:15     INFO -     rip = 0x00007fa8252f6b2e   rsp = 0x00007ffeb171fbb8
[task 2017-06-14T19:13:15.066018Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fbc0
[task 2017-06-14T19:13:15.066035Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066053Z] 19:13:15     INFO -  6  libxul.so + 0xb46d57
[task 2017-06-14T19:13:15.066076Z] 19:13:15     INFO -     rip = 0x00007fa824bbfd58   rsp = 0x00007ffeb171fbd0
[task 2017-06-14T19:13:15.066096Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fc70
[task 2017-06-14T19:13:15.066115Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066133Z] 19:13:15     INFO -  7  libxul.so + 0xb48411
[task 2017-06-14T19:13:15.066157Z] 19:13:15     INFO -     rip = 0x00007fa824bc1412   rsp = 0x00007ffeb171fc80
[task 2017-06-14T19:13:15.066176Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fcf0
[task 2017-06-14T19:13:15.066195Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066214Z] 19:13:15     INFO -  8  libxul.so + 0x4446328
[task 2017-06-14T19:13:15.066238Z] 19:13:15     INFO -     rip = 0x00007fa8284bf329   rsp = 0x00007ffeb171fcb0
[task 2017-06-14T19:13:15.066258Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fcf0
[task 2017-06-14T19:13:15.066278Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066297Z] 19:13:15     INFO -  9  libxul.so + 0x3b62300
[task 2017-06-14T19:13:15.066322Z] 19:13:15     INFO -     rip = 0x00007fa827bdb301   rsp = 0x00007ffeb171fcc0
[task 2017-06-14T19:13:15.066342Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fcf0
[task 2017-06-14T19:13:15.066361Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066379Z] 19:13:15     INFO - 10  libxul.so + 0xb5cc77
[task 2017-06-14T19:13:15.066405Z] 19:13:15     INFO -     rip = 0x00007fa824bd5c78   rsp = 0x00007ffeb171fcf0
[task 2017-06-14T19:13:15.066437Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fcf0
[task 2017-06-14T19:13:15.066472Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066507Z] 19:13:15     INFO - 11  libxul.so + 0xb484f3
[task 2017-06-14T19:13:15.066551Z] 19:13:15     INFO -     rip = 0x00007fa824bc14f4   rsp = 0x00007ffeb171fd00
[task 2017-06-14T19:13:15.066590Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fd40
[task 2017-06-14T19:13:15.066629Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066666Z] 19:13:15     INFO - 12  libxul.so + 0xb48623
[task 2017-06-14T19:13:15.066712Z] 19:13:15     INFO -     rip = 0x00007fa824bc1624   rsp = 0x00007ffeb171fd50
[task 2017-06-14T19:13:15.066750Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fd90
[task 2017-06-14T19:13:15.066788Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066825Z] 19:13:15     INFO - 13  libxul.so + 0xbc2b2d
[task 2017-06-14T19:13:15.066870Z] 19:13:15     INFO -     rip = 0x00007fa824c3bb2e   rsp = 0x00007ffeb171fda0
[task 2017-06-14T19:13:15.066908Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fe20
[task 2017-06-14T19:13:15.066947Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.066983Z] 19:13:15     INFO - 14  xpcshell + 0x1a18f
[task 2017-06-14T19:13:15.067029Z] 19:13:15     INFO -     rip = 0x000000000041a190   rsp = 0x00007ffeb171fdc0
[task 2017-06-14T19:13:15.067067Z] 19:13:15     INFO -     rbp = 0x00007ffeb171fe20
[task 2017-06-14T19:13:15.067103Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067141Z] 19:13:15     INFO - 15  libxul.so + 0x12ae686
[task 2017-06-14T19:13:15.067187Z] 19:13:15     INFO -     rip = 0x00007fa825327687   rsp = 0x00007ffeb171fe30
[task 2017-06-14T19:13:15.067226Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.067264Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067303Z] 19:13:15     INFO - 16  libgtk-3.so.0.400.4 + 0x3e45c0
[task 2017-06-14T19:13:15.067350Z] 19:13:15     INFO -     rip = 0x00007fa8239c75c1   rsp = 0x00007ffeb171fe98
[task 2017-06-14T19:13:15.067388Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.067425Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067457Z] 19:13:15     INFO - 17  libxul.so + 0x5e56eff
[task 2017-06-14T19:13:15.067502Z] 19:13:15     INFO -     rip = 0x00007fa829ecff00   rsp = 0x00007ffeb171ff90
[task 2017-06-14T19:13:15.067539Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.067576Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067613Z] 19:13:15     INFO - 18  libxul.so + 0x1298589
[task 2017-06-14T19:13:15.067659Z] 19:13:15     INFO -     rip = 0x00007fa82531158a   rsp = 0x00007ffeb171ffc8
[task 2017-06-14T19:13:15.067699Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.067737Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067775Z] 19:13:15     INFO - 19  libgtk-3.so.0.400.4 + 0x3da00
[task 2017-06-14T19:13:15.067820Z] 19:13:15     INFO -     rip = 0x00007fa823620a01   rsp = 0x00007ffeb1720018
[task 2017-06-14T19:13:15.067857Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.067895Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.067935Z] 19:13:15     INFO - 20  libglib-2.0.so.0.3400.3 + 0xd7ff
[task 2017-06-14T19:13:15.067982Z] 19:13:15     INFO -     rip = 0x00007fa821d62800   rsp = 0x00007ffeb1720020
[task 2017-06-14T19:13:15.068019Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.068057Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.068092Z] 19:13:15     INFO - 21  libxul.so + 0x46500
[task 2017-06-14T19:13:15.068137Z] 19:13:15     INFO -     rip = 0x00007fa8240bf501   rsp = 0x00007ffeb1720050
[task 2017-06-14T19:13:15.068176Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.068213Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.068250Z] 19:13:15     INFO - 22  libxul.so + 0xb2b218
[task 2017-06-14T19:13:15.068295Z] 19:13:15     INFO -     rip = 0x00007fa824ba4219   rsp = 0x00007ffeb1720058
[task 2017-06-14T19:13:15.068333Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.068371Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.068414Z] 19:13:15     INFO - 23  ld-2.12.so + 0x9f09
[task 2017-06-14T19:13:15.068536Z] 19:13:15     INFO -     rip = 0x00007fa82ac9bf0a   rsp = 0x00007ffeb17200b0
[task 2017-06-14T19:13:15.068720Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.068907Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.069100Z] 19:13:15     INFO - 24  xpcshell + 0x19eff
[task 2017-06-14T19:13:15.069300Z] 19:13:15     INFO -     rip = 0x0000000000419f00   rsp = 0x00007ffeb1720150
[task 2017-06-14T19:13:15.069481Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.069663Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.069884Z] 19:13:15     INFO - 25  libglib-2.0.so.0.3400.3 + 0x53800
[task 2017-06-14T19:13:15.070088Z] 19:13:15     INFO -     rip = 0x00007fa821da8801   rsp = 0x00007ffeb1720160
[task 2017-06-14T19:13:15.070264Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.070448Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.070631Z] 19:13:15     INFO - 26  libxul.so + 0x25c7634
[task 2017-06-14T19:13:15.070824Z] 19:13:15     INFO -     rip = 0x00007fa826640635   rsp = 0x00007ffeb1720168
[task 2017-06-14T19:13:15.071005Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.071187Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.071373Z] 19:13:15     INFO - 27  libglib-2.0.so.0.3400.3 + 0x84dff
[task 2017-06-14T19:13:15.071564Z] 19:13:15     INFO -     rip = 0x00007fa821dd9e00   rsp = 0x00007ffeb17201a0
[task 2017-06-14T19:13:15.071740Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.071922Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.072101Z] 19:13:15     INFO - 28  libxul.so + 0x5e20b2f
[task 2017-06-14T19:13:15.072297Z] 19:13:15     INFO -     rip = 0x00007fa829e99b30   rsp = 0x00007ffeb17201b8
[task 2017-06-14T19:13:15.072474Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.072655Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.072837Z] 19:13:15     INFO - 29  libxul.so + 0x45ac069
[task 2017-06-14T19:13:15.073033Z] 19:13:15     INFO -     rip = 0x00007fa82862506a   rsp = 0x00007ffeb17201c8
[task 2017-06-14T19:13:15.073217Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.073397Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.073574Z] 19:13:15     INFO - 30  xpcshell + 0x10100
[task 2017-06-14T19:13:15.073769Z] 19:13:15     INFO -     rip = 0x0000000000410101   rsp = 0x00007ffeb17201f0
[task 2017-06-14T19:13:15.073966Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.074147Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.074325Z] 19:13:15     INFO - 31  0x3fffff
[task 2017-06-14T19:13:15.074522Z] 19:13:15     INFO -     rip = 0x0000000000400000   rsp = 0x00007ffeb1720218
[task 2017-06-14T19:13:15.074706Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.074882Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.075059Z] 19:13:15     INFO - 32  xpcshell + 0x5797
[task 2017-06-14T19:13:15.075251Z] 19:13:15     INFO -     rip = 0x0000000000405798   rsp = 0x00007ffeb1720278
[task 2017-06-14T19:13:15.075428Z] 19:13:15     INFO -     rbp = 0x00007ffeb1720290
[task 2017-06-14T19:13:15.075612Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.075788Z] 19:13:15     INFO - 33  xpcshell + 0x55d4
[task 2017-06-14T19:13:15.075983Z] 19:13:15     INFO -     rip = 0x00000000004055d5   rsp = 0x00007ffeb17202a0
[task 2017-06-14T19:13:15.076158Z] 19:13:15     INFO -     rbp = 0x00007ffeb17202d0
[task 2017-06-14T19:13:15.076337Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.076518Z] 19:13:15     INFO - 34  libc-2.12.so + 0x1ed5c
[task 2017-06-14T19:13:15.076712Z] 19:13:15     INFO -     rip = 0x00007fa821233d5d   rsp = 0x00007ffeb17202e0
[task 2017-06-14T19:13:15.076889Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.077068Z] 19:13:15     INFO - 35  xpcshell + 0x5570
[task 2017-06-14T19:13:15.077265Z] 19:13:15     INFO -     rip = 0x0000000000405571   rsp = 0x00007ffeb1720300
[task 2017-06-14T19:13:15.077440Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.077619Z] 19:13:15     INFO - 36  xpcshell + 0x5797
[task 2017-06-14T19:13:15.077833Z] 19:13:15     INFO -     rip = 0x0000000000405798   rsp = 0x00007ffeb1720318
[task 2017-06-14T19:13:15.078010Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.078188Z] 19:13:15     INFO - 37  xpcshell + 0x1c87f
[task 2017-06-14T19:13:15.078385Z] 19:13:15     INFO -     rip = 0x000000000041c880   rsp = 0x00007ffeb1720360
[task 2017-06-14T19:13:15.078560Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.078740Z] 19:13:15     INFO - 38  xpcshell + 0x5797
[task 2017-06-14T19:13:15.078939Z] 19:13:15     INFO -     rip = 0x0000000000405798   rsp = 0x00007ffeb1720388
[task 2017-06-14T19:13:15.079114Z] 19:13:15     INFO -     Found by: stack scanning
[task 2017-06-14T19:13:15.079296Z] 19:13:15     INFO - 39  xpcshell + 0x57c0
[task 2017-06-14T19:13:15.079490Z] 19:13:15     INFO -     rip = 0x00000000004057c1   rsp = 0x00007ffeb17203a0
[task 2017-06-14T19:13:15.079669Z] 19:13:15     INFO -     Found by: stack scanning
Flags: needinfo?(bugs)
Surprising. I guess xpcshell tests behave very differently to actual browser.
Attached patch async_snow_white_2.diff (obsolete) — Splinter Review
idle dispatch fails anyhow later if one can't access current thread.
Attachment #8877706 - Attachment is obsolete: true
Flags: needinfo?(bugs)
Attachment #8877750 - Flags: review?(continuation)
Comment on attachment 8877750 [details] [diff] [review]
async_snow_white_2.diff

Review of attachment 8877750 [details] [diff] [review]:
-----------------------------------------------------------------

Nathan should review the thread changes.
Attachment #8877750 - Flags: review?(continuation) → review?(nfroyd)
Blocks: 1373116
tracking for 55 and 56 per comment 11.
Comment on attachment 8877750 [details] [diff] [review]
async_snow_white_2.diff

Review of attachment 8877750 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below change.

::: xpcom/threads/nsThreadUtils.cpp
@@ +371,5 @@
>    nsCOMPtr<nsIRunnable> event(Move(aEvent));
>    NS_ENSURE_TRUE(event, NS_ERROR_INVALID_ARG);
> +
> +  //XXX Using current thread for now as the nsIEventTarget.
> +  nsIEventTarget* target = NS_GetCurrentThread();

Please use mozilla::GetCurrentThreadEventTarget, which I believe should "fail" in the same ways at shutdown.
Attachment #8877750 - Flags: review?(nfroyd) → review+
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff18daec4a56
ensure AsyncFreeSnowWhite is run often enough, r=mccr8,nfroyd
Whiteboard: [stockwell fixed:product]
Looks like this is fixed in 56 but the bug wasn't marked. 
Is this something we might want to uplift to 55 beta?
Flags: needinfo?(bugs)
Assignee: nobody → bugs
Comment on attachment 8878580 [details] [diff] [review]
async_snow_white_2.diff

Approval Request Comment
[Feature/Bug causing the regression]: Maybe bug 1367164 
[User impact if declined]: not sure, this is mostly testing issue.
[Is this code covered by automated tests?]: well this is testing issues
[Has the fix been verified in Nightly?]: don't know
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: NA
[Is the change risky?]: should be safe
[Why is the change risky/not risky?]: Just running some runnables more reliable
[String changes made/needed]: NA
Flags: needinfo?(bugs)
Attachment #8878580 - Flags: approval-mozilla-beta?
Comment on attachment 8878580 [details] [diff] [review]
async_snow_white_2.diff

Fix for annoying intermittent test failures, let's uplift this for beta 5.
Attachment #8878580 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1312623
Attachment #8877750 - Attachment is obsolete: true
(In reply to Ryan VanderMeulen [:RyanVM] from comment #25)
> https://hg.mozilla.org/releases/mozilla-beta/rev/d5d30ef9787f

This was broken on Beta because GetCurrentThreadEventTarget doesn't exist on 55. So it was backed out:
https://hg.mozilla.org/releases/mozilla-beta/rev/a071ea8831a3daf150012b16484a24c0bbb27aa7

I landed the original version of the patch using NS_GetCurrentThread and it appears to be working fine.
https://hg.mozilla.org/releases/mozilla-beta/rev/67e3dc7adcbe69fef347bfe70ee0afceab507210
this has started up again at a pretty high failure rate.  :smaug, I see you landed a fix for this a few weeks ago, can you take a look at this again?
Flags: needinfo?(bugs)
Whiteboard: [stockwell fixed:product] → [stockwell needswork]
Unfortunately I have no idea what is causing this. This bug should probably be closed and a new one opened.
Flags: needinfo?(bugs)
Yeah this one was meant to be more of a metabug for a general signature. There are probably multiple different causes for this.
Whiteboard: [stockwell needswork] → [stockwell unknown]
I have seen an increased number of those crashes due to slow shutdowns for various Marionette restart tests. This happens especially on Linux since Sep 13/14th.
This bug's going to suddenly have more starred against it, because we decided that the one I had been using was all about ASan, and "fixed" that, and called the bug fixed. Doesn't mean these are happening more, or less, just that this is now my one true bug for all of them.
I suspect that with the increased duration in AsyncShutdown/nsTerminator, we'll also see new forms of this bug as harness timeout. Time will tell.
Duplicate of this bug: 1419860
Duplicate of this bug: 1421922
Depends on: 1425323
Keywords: meta
Summary: Many intermittent crashes in RunWatchdog → [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
Duplicate of this bug: 1445210
Alias: RunWatchdogShutdownhang
Depends on: 1336478
Depends on: 1445210
Depends on: 1130782
Summary: [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down. → Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
Duplicate of this bug: 1445210
Depends on: 1411908
Depends on: 1434618
Duplicate of this bug: 1425396
Duplicate of this bug: 1427389
Duplicate of this bug: 1441423
Duplicate of this bug: 1442174
Duplicate of this bug: 1444261
Duplicate of this bug: 1446107
Duplicate of this bug: 1446458
Duplicate of this bug: 1446628
Duplicate of this bug: 1446633
Duplicate of this bug: 1146703
Depends on: 1452416
Duplicate of this bug: 1426544
Duplicate of this bug: 1244415
Please add "Crash" key word.
Depends on: 1424393
Duplicate of this bug: 1456867
Duplicate of this bug: 1456832
Duplicate of this bug: 1456810
Duplicate of this bug: 1456826
Duplicate of this bug: 1450397
Duplicate of this bug: 1457348
Duplicate of this bug: 1463927
Duplicate of this bug: 1464526
:aryx -- Some of the recent dups do not look like they are related to shutdown or RunWatchdog; can you explain what criteria you are using for duping here?
Flags: needinfo?(aryx.bugmail)
Sorry, I over-generalized there. The bugs have been reopened.
Flags: needinfo?(aryx.bugmail)
Duplicate of this bug: 1450276
Over the last 7 days there are 32 failures on this bug. These happen on windows10-64-ccov, linux64

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-07-24&endday=2018-07-31&tree=trunk&bug=1358898

[task 2018-07-31T04:06:40.325Z] 04:06:40    ERROR - ==7398==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fa506be594d bp 0x7fa4c6ffedd0 sp 0x7fa4c6ffed20 T87)
[task 2018-07-31T04:06:40.327Z] 04:06:40     INFO - ==7398==The signal is caused by a WRITE memory access.
[task 2018-07-31T04:06:40.330Z] 04:06:40     INFO - ==7398==Hint: address points to the zero page.
[task 2018-07-31T04:06:41.257Z] 04:06:41     INFO -     #0 0x7fa506be594c in mozilla::(anonymous namespace)::RunWatchdog(void*) /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp
[task 2018-07-31T04:06:41.258Z] 04:06:41     INFO -     #1 0x7fa518232f08 in _pt_root /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:201:5
[task 2018-07-31T04:06:41.258Z] 04:06:41     INFO -     #2 0x7fa51b82c6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
[task 2018-07-31T04:06:41.319Z] 04:06:41     INFO -     #3 0x7fa51a8b541c in clone /build/glibc-Cl5G7W/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109
[task 2018-07-31T04:06:41.321Z] 04:06:41     INFO - AddressSanitizer can not provide additional info.
[task 2018-07-31T04:06:41.321Z] 04:06:41     INFO - SUMMARY: AddressSanitizer: SEGV /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp in mozilla::(anonymous namespace)::RunWatchdog(void*)
[task 2018-07-31T04:06:41.322Z] 04:06:41     INFO - Thread T87 (Shutdow~minator) created by T0 here:
[task 2018-07-31T04:06:41.340Z] 04:06:41     INFO -     #0 0x4aae8d in __interceptor_pthread_create /builds/worker/workspace/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cc:204:3
[task 2018-07-31T04:06:41.342Z] 04:06:41     INFO -     #1 0x7fa51822fc45 in _PR_CreateThread /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:433:14
[task 2018-07-31T04:06:41.342Z] 04:06:41     INFO -     #2 0x7fa51822f82e in PR_CreateThread /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:518:12
[task 2018-07-31T04:06:41.343Z] 04:06:41     INFO -     #3 0x7fa506be56c2 in CreateSystemThread /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:103:22
[task 2018-07-31T04:06:41.345Z] 04:06:41     INFO -     #4 0x7fa506be56c2 in mozilla::nsTerminator::StartWatchdog() /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:453
[task 2018-07-31T04:06:41.347Z] 04:06:41     INFO -     #5 0x7fa506be5bd6 in Start /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:393:3
[task 2018-07-31T04:06:41.348Z] 04:06:41     INFO -     #6 0x7fa506be5bd6 in mozilla::nsTerminator::Observe(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:508
[task 2018-07-31T04:06:41.349Z] 04:06:41     INFO -     #7 0x7fa4fb161323 in nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/xpcom/ds/nsObserverList.cpp:112:19
[task 2018-07-31T04:06:41.350Z] 04:06:41     INFO -     #8 0x7fa4fb164c52 in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/workspace/build/src/xpcom/ds/nsObserverService.cpp:295:19
[task 2018-07-31T04:06:41.351Z] 04:06:41     INFO -     #9 0x7fa5069fd86e in nsAppStartup::Quit(unsigned int) /builds/worker/workspace/build/src/toolkit/components/startup/nsAppStartup.cpp:468:19
[task 2018-07-31T04:06:41.352Z] 04:06:41     INFO -     #10 0x7fa4fb2a7b11 in NS_InvokeByIndex /builds/worker/workspace/build/src/xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_x86_64_unix.S:106
[task 2018-07-31T04:06:41.361Z] 04:06:41     INFO -     #11 0x7fa4fcbc717b in Invoke /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1705:12
[task 2018-07-31T04:06:41.363Z] 04:06:41     INFO -     #12 0x7fa4fcbc717b in Call /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1231
[task 2018-07-31T04:06:41.363Z] 04:06:41     INFO -     #13 0x7fa4fcbc717b in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1198
[task 2018-07-31T04:06:41.364Z] 04:06:41     INFO -     #14 0x7fa4fcbccd04 in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:893:12
[task 2018-07-31T04:06:41.384Z] 04:06:41     INFO -     #15 0x7fa506f09a8e in CallJSNative /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:444:15
[task 2018-07-31T04:06:41.384Z] 04:06:41     INFO -     #16 0x7fa506f09a8e in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:532
[task 2018-07-31T04:06:41.384Z] 04:06:41     INFO -     #17 0x7fa506ef46a2 in CallFromStack /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:589:12
[task 2018-07-31T04:06:41.385Z] 04:06:41     INFO -     #18 0x7fa506ef46a2 in Interpret(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:3239
[task 2018-07-31T04:06:41.385Z] 04:06:41     INFO -     #19 0x7fa506eda63a in js::RunScript(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:424:12
[task 2018-07-31T04:06:41.387Z] 04:06:41     INFO -     #20 0x7fa506f0a364 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:556:15
[task 2018-07-31T04:06:41.387Z] 04:06:41     INFO -     #21 0x7fa506f0b8f2 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:602:10
[task 2018-07-31T04:06:41.387Z] 04:06:41     INFO -     #22 0x7fa507b23a75 in js::ForwardingProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const /builds/worker/workspace/build/src/js/src/proxy/Wrapper.cpp:176:12
[task 2018-07-31T04:06:41.397Z] 04:06:41     INFO -     #23 0x7fa507ae8bb8 in js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const /builds/worker/workspace/build/src/js/src/proxy/CrossCompartmentWrapper.cpp:359:23
[task 2018-07-31T04:06:41.397Z] 04:06:41     INFO -     #24 0x7fa507afe643 in js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) /builds/worker/workspace/build/src/js/src/proxy/Proxy.cpp:510:21
[task 2018-07-31T04:06:41.398Z] 04:06:41     INFO -     #25 0x7fa506f0a984 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:508:20
[task 2018-07-31T04:06:41.398Z] 04:06:41     INFO -     #26 0x7fa506ef46a2 in CallFromStack /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:589:12
[task 2018-07-31T04:06:41.399Z] 04:06:41     INFO -     #27 0x7fa506ef46a2 in Interpret(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:3239
[task 2018-07-31T04:06:41.400Z] 04:06:41     INFO -     #28 0x7fa506eda63a in js::RunScript(JSContext*, js::RunState&) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:424:12
[task 2018-07-31T04:06:41.400Z] 04:06:41     INFO -     #29 0x7fa506f0a364 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:556:15
[task 2018-07-31T04:06:41.400Z] 04:06:41     INFO -     #30 0x7fa506f0b8f2 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/vm/Interpreter.cpp:602:10
[task 2018-07-31T04:06:41.419Z] 04:06:41     INFO -     #31 0x7fa507a3bbad in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /builds/worker/workspace/build/src/js/src/jsapi.cpp:2841:12
[task 2018-07-31T04:06:41.420Z] 04:06:41     INFO -     #32 0x7fa4fcbafd50 in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedJSClass.cpp:1131:23
[task 2018-07-31T04:06:41.420Z] 04:06:41     INFO -     #33 0x7fa4fb2a91e8 in PrepareAndDispatch /builds/worker/workspace/build/src/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:127:37
[task 2018-07-31T04:06:41.420Z] 04:06:41     INFO -     #34 0x7fa4fb2a80ba in SharedStub (/builds/worker/workspace/build/application/firefox/libxul.so+0x213d0ba)
[task 2018-07-31T04:06:41.420Z] 04:06:41     INFO -     #35 0x7fa4fb2799bf in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1235:14
[task 2018-07-31T04:06:41.420Z] 04:06:41     INFO -     #36 0x7fa4fb280b88 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:519:10
[task 2018-07-31T04:06:41.427Z] 04:06:41     INFO -     #37 0x7fa4fc1abafa in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/workspace/build/src/ipc/glue/MessagePump.cpp:97:21
[task 2018-07-31T04:06:41.434Z] 04:06:41     INFO -     #38 0x7fa4fc10018c in RunInternal /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:325:10
[task 2018-07-31T04:06:41.435Z] 04:06:41     INFO -     #39 0x7fa4fc10018c in RunHandler /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:318
[task 2018-07-31T04:06:41.437Z] 04:06:41     INFO -     #40 0x7fa4fc10018c in MessageLoop::Run() /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:298
[task 2018-07-31T04:06:41.437Z] 04:06:41     INFO -     #41 0x7fa502febdca in nsBaseAppShell::Run() /builds/worker/workspace/build/src/widget/nsBaseAppShell.cpp:158:27
[task 2018-07-31T04:06:41.437Z] 04:06:41     INFO -     #42 0x7fa5069fcb3b in nsAppStartup::Run() /builds/worker/workspace/build/src/toolkit/components/startup/nsAppStartup.cpp:290:30
[task 2018-07-31T04:06:41.437Z] 04:06:41     INFO -     #43 0x7fa506c2fc4b in XREMain::XRE_mainRun() /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4764:22
[task 2018-07-31T04:06:41.438Z] 04:06:41     INFO -     #44 0x7fa506c32d73 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4909:8
[task 2018-07-31T04:06:41.439Z] 04:06:41     INFO -     #45 0x7fa506c34228 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:5001:21
[task 2018-07-31T04:06:41.440Z] 04:06:41     INFO -     #46 0x4f20ac in do_main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:233:22
[task 2018-07-31T04:06:41.445Z] 04:06:41     INFO -     #47 0x4f20ac in main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:311
[task 2018-07-31T04:06:41.446Z] 04:06:41     INFO -     #48 0x7fa51a7ce82f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
[task 2018-07-31T04:06:41.447Z] 04:06:41     INFO - ==7398==ABORTING
[task 2018-07-31T04:12:51.453Z] 04:12:51    ERROR - REFTEST ERROR | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/lists-3/list-style-type-string-001b.html | application timed out after 370 seconds with no output
[task 2018-07-31T04:12:51.455Z] 04:12:51    ERROR - REFTEST ERROR | Force-terminating active process(es).
[task 2018-07-31T04:12:51.457Z] 04:12:51     INFO - REFTEST TEST-INFO | started process screentopng
[task 2018-07-31T04:12:52.136Z] 04:12:52     INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2018-07-31T04:12:52.137Z] 04:12:52     INFO - REFTEST INFO | Can't trigger Breakpad, just killing process
[task 2018-07-31T04:12:52.238Z] 04:12:52    ERROR - TEST-UNEXPECTED-FAIL | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/lists-3/list-style-type-string-001b.html | application terminated with exit code 1
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
I don't see anything useful in the logs.
Duplicate of this bug: 1493344
There are 67 failures asociated to this bug in the last 7 days. This is mostly occurring on Linux 64, with some occurrences on 
Linux 32, Windows 7 and OS.

:Yoric can you take a look at this?
Flags: needinfo?(dteller)
Please note that for all those cases I can see bug 1451613 being related:

https://treeherder.mozilla.org/logviewer.html#?job_id=202468209&repo=try&lineNumber=39094-39096

> [task 2018-09-30T14:25:42.200Z] 14:25:42     INFO - [Parent 11365, Main Thread] WARNING: NS_ENSURE_TRUE(mDB) failed: file /builds/worker/workspace/build/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp, line 1422
> [task 2018-09-30T14:26:26.940Z] 14:26:26     INFO - JavaScript error: resource://gre/modules/osfile/osfile_async_front.jsm, line 407: Error: OS.File has been shut down. Rejecting post to stat
> [task 2018-09-30T14:26:45.854Z] 14:26:45     INFO - Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:219
(In reply to Andrei Ciure[:andrei_ciure_] from comment #153)
> There are 67 failures asociated to this bug in the last 7 days. This is
> mostly occurring on Linux 64, with some occurrences on 
> Linux 32, Windows 7 and OS.
> 
> :Yoric can you take a look at this?

Just to be clear: because I wrote the timer doesn't mean that I'm the best person to fix the stuff that it measures :)

You want me to do some triage?
Flags: needinfo?(dteller) → needinfo?(aciure)
Please note that most of the lately reported failures (see last comment) are related to VRListenerThreadHolder. This particular problem should mostly/completely be fixed now. See bug 1494490.
Depends on: 1494490
Flags: needinfo?(aciure)
There are 11 failures in the last 7 days, removing the disabled recommended tag.
Whiteboard: [stockwell disable-recommended]
Summary: Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down. → [meta] Many intermittent crashes in RunWatchdog. Firefox kills itself because it's taking too long to shut down.
There are 23 failures in the last 7 days, 18 of which on linux64 asan, 4 on windows10-64-qr debug and 1 linux32
debug. https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-11-30&endday=2018-12-07&tree=all&bug=1358898

There are 30 failures in the last 7 days, mostly of which on osx-10-10 debug/opt and linux64 asan.

:smaug can you take a look?

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=233087449&repo=autoland&lineNumber=7391

12:28:28 INFO - TEST-INFO | Main app process: exit 1
12:28:28 INFO - Buffered messages finished
12:28:28 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
12:28:28 INFO - runtests.py | Application ran for: 0:01:16.689748
12:28:28 INFO - zombiecheck | Reading PID log: /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpxhcSy0pidlog
12:28:28 INFO - ==> process 944 launched child process 945
12:28:28 INFO - ==> process 944 launched child process 946
12:28:28 INFO - ==> process 944 launched child process 947
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 945
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 946
12:28:28 INFO - zombiecheck | Checking for orphan process with PID: 947
12:28:28 INFO - mozcrash Copy/paste: /Users/cltbld/tasks/task_1552306751/build/macosx64-minidump_stackwalk /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpsLxijP.mozrunner/minidumps/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp /Users/cltbld/tasks/task_1552306751/build/symbols
12:28:45 INFO - mozcrash Saved minidump as /Users/cltbld/tasks/task_1552306751/build/blobber_upload_dir/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp
12:28:45 INFO - mozcrash Saved app info as /Users/cltbld/tasks/task_1552306751/build/blobber_upload_dir/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.extra
12:28:45 INFO - PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
12:28:45 INFO - Crash dump filename: /var/folders/_6/fvrv10k16g34qs2w99kytz1800000w/T/tmpsLxijP.mozrunner/minidumps/3AA9DA58-AA22-4AFF-AD28-F1187373CAAD.dmp
12:28:45 INFO - Operating system: Mac OS X
12:28:45 INFO - 10.10.5 14F27
12:28:45 INFO - CPU: amd64
12:28:45 INFO - family 6 model 69 stepping 1
12:28:45 INFO - 4 CPUs
12:28:45 INFO -
12:28:45 INFO - GPU: UNKNOWN
12:28:45 INFO -
12:28:45 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
12:28:45 INFO - Crash address: 0x0
12:28:45 INFO - Process uptime: 75 seconds
12:28:45 INFO -
12:28:45 INFO - Thread 37 (crashed)
12:28:45 INFO - 0 XUL!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:a423e8a5526660b0b84c79d9c7b13a0d1987eaff : 200 + 0x0]
12:28:45 INFO - rax = 0x0000000111b8235c rdx = 0x00007fff79dd11f8
12:28:45 INFO - rcx = 0x000000010a05c6b0 rbx = 0x0000000000000000
12:28:45 INFO - rsi = 0x0000480000004800 rdi = 0x0000470000004803
12:28:45 INFO - rbp = 0x0000000114303eb0 rsp = 0x0000000114303e80
12:28:45 INFO - r8 = 0x0000000114303e30 r9 = 0x0000000114304000
12:28:45 INFO - r10 = 0x00007fff8e6873ef r11 = 0x00007fff8e6873c0
12:28:45 INFO - r12 = 0x000000000000003f r13 = 0x00000000000008ff
12:28:45 INFO - r14 = 0x00000001134fb9e4 r15 = 0x000000010a05c6d8
12:28:45 INFO - rip = 0x000000010fa054d3
12:28:45 INFO - Found by: given as instruction pointer in context
12:28:45 INFO - 1 libnss3.dylib!_pt_root [ptthread.c:a423e8a5526660b0b84c79d9c7b13a0d1987eaff : 201 + 0x3]
12:28:45 INFO - rbp = 0x0000000114303ef0 rsp = 0x0000000114303ec0
12:28:45 INFO - rip = 0x000000010a46e980
12:28:45 INFO - Found by: previous frame's frame pointer

Flags: needinfo?(bugs)

This is a meta bug, not really something to look at.

Flags: needinfo?(bugs)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork]

Failures from the 18th till present day were caused by bug 1545440 and fixed by emilio here: https://hg.mozilla.org/integration/autoland/rev/133b9a6eafebffdf83e548972a7ff5634433d5d0

In that interval there were 585 total failures (browser-chrome failures), all of which are apparently bug 1545979
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-18&endday=2019-04-21&tree=trunk&bug=1358898

Removing ni as the high frequency here is caused solely by missclassifications and the real issue only has 10 total failures:

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-14&endday=2019-04-18&tree=trunk&bug=1358898

Flags: needinfo?(dteller)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]
Duplicate of this bug: 1492572

This bug had a massive spike in the last 7 days. Occurring 184 times on Linux and OSX 64bit platforms on debug and opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=261906593&repo=autoland&lineNumber=35462

(In reply to Dorel Luca [:dluca] from comment #220)

This bug had a massive spike in the last 7 days. Occurring 184 times on Linux and OSX 64bit platforms on debug and opt build types.

And the last failure happened on August 16th, which means no failure over the last weekend. I wonder if this is related to the lower amount of pushes, or something else fixed it. Worth checking again tomorrow.

Whiteboard: [stockwell disable-recommended]

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #221)

And the last failure happened on August 16th, which means no failure over the last weekend. I wonder if this is related to the lower amount of pushes, or something else fixed it. Worth checking again tomorrow.

Correct, no more failures yet since Aug 16th. Olli, any idea what could have fixed that bug?

Flags: needinfo?(bugs)

I'm not aware of any specific recent changes affecting this bug.
un-regression range might reveal something interesting.

Flags: needinfo?(bugs)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

This bug failed 39 times in the last 7 days. Occurs on linux64, windows10-64, windows10-64-shippable and on windows7-32

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=268547147&repo=autoland&lineNumber=2425

There are 44 total failures in the last 7 days on linux64 and windows7-32 debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269063326&repo=autoland&lineNumber=4421

[task 2019-09-30T14:37:46.037Z] 14:37:46 INFO - TEST-START | accessible/tests/browser/e10s/browser_treeupdate_visibility.js
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOCSHELL 085DB800 == 6 [pid = 4168] [id = {8e4d59ba-58c7-4c73-8903-9e7f5bf632cd}]
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 43 (0B22B820) [pid = 4168] [serial = 72] [outer = 00000000]
[task 2019-09-30T14:37:46.052Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 44 (085E2C00) [pid = 4168] [serial = 73] [outer = 0B22B820]
[task 2019-09-30T14:37:46.119Z] 14:37:46 INFO - GECKO(1980) | [Child 4168, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
[task 2019-09-30T14:37:46.119Z] 14:37:46 INFO - GECKO(1980) | : file z:/build/build/src/xpcom/threads/BlockingResourceBase.cpp, line 293
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | --- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | calling context
[task 2019-09-30T14:37:46.120Z] 14:37:46 INFO - GECKO(1980) | [stack trace unavailable]
[task 2019-09-30T14:37:46.160Z] 14:37:46 INFO - GECKO(1980) | ++DOMWINDOW == 45 (0B20E000) [pid = 4168] [serial = 74] [outer = 0B22B820]
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | [Child 4168, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | : file z:/build/build/src/xpcom/threads/BlockingResourceBase.cpp, line 293
[task 2019-09-30T14:37:46.175Z] 14:37:46 INFO - GECKO(1980) | --- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)

[task 2019-09-30T14:38:57.440Z] 14:38:57 INFO - mozcrash Copy/paste: Z:\task_1569850468\build\win32-minidump_stackwalk.exe c:\users\task_1569850468\appdata\local\temp\tmp0hxmto.mozrunner\minidumps\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp Z:\task_1569850468\build\symbols
[task 2019-09-30T14:39:18.980Z] 14:39:18 INFO - mozcrash Saved minidump as Z:\task_1569850468\build\blobber_upload_dir\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp
[task 2019-09-30T14:39:18.983Z] 14:39:18 INFO - mozcrash Saved app info as Z:\task_1569850468\build\blobber_upload_dir\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.extra
[task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - PROCESS-CRASH | Last test finished | application crashed [@ static void mozilla::anonymous namespace'::RunWatchdog(void *)] [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash dump filename: c:\users\task_1569850468\appdata\local\temp\tmp0hxmto.mozrunner\minidumps\5b176dd4-4ef8-4d66-b76b-2949e8020fd2.dmp [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Operating system: Windows NT [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - 6.1.7601 Service Pack 1 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - CPU: x86 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - GenuineIntel family 6 model 63 stepping 2 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - 8 CPUs [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - GPU: UNKNOWN [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash reason: EXCEPTION_BREAKPOINT [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Crash address: 0x5b912725 [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Process uptime: 100 seconds [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - [task 2019-09-30T14:39:19.032Z] 14:39:19 INFO - Thread 33 (crashed) [task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 0 xul.dll!static void mozilla::anonymous namespace'::RunWatchdog(void *) [nsTerminator.cpp:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 213 + 0x0]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x5b912725 esp = 0x18abfcac ebp = 0x18abfcc0 ebx = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - esi = 0x0000003f edi = 0x7685c446 eax = 0x6b39389c ecx = 0x000000d5
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - edx = 0x00000049 efl = 0x00000202
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: given as instruction pointer in context
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 1 nss3.dll!_PR_NativeRunThread [pruthr.c:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 397 + 0x6]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b69aa98 esp = 0x18abfcc8 ebp = 0x18abfce8 esi = 0x008b0630
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - edi = 0x008b0580
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 2 nss3.dll!static unsigned int pr_root(void *) [w95thred.c:63b8d34e13af04cdab0c70cbb07b1f0be6a316fe : 137 + 0x7]
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b68917d esp = 0x18abfcf0 ebp = 0x18abfcf4 ebx = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - esi = 0x6b689170 edi = 0x0042fd90
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 3 ucrtbase.dll!__RtlConvertUlongToLargeInteger + 0x8489
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x6b18e16f esp = 0x18abfcfc ebp = 0x18abfd30
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: call frame info
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 4 kernel32.dll! ?? ?? ::FNODOBFM::`string' + 0x46
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - eip = 0x7685ef3c esp = 0x18abfd38 ebp = 0x18abfd3c
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - Found by: previous frame's frame pointer
[task 2019-09-30T14:39:19.033Z] 14:39:19 INFO - 5 ntdll.dll!__RtlUserThreadStart + 0x27
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - eip = 0x77a93618 esp = 0x18abfd44 ebp = 0x18abfd7c
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - Found by: previous frame's frame pointer
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - 6 ntdll.dll!_RtlUserThreadStart + 0x1b
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - eip = 0x77a935eb esp = 0x18abfd84 ebp = 0x18abfd94
[task 2019-09-30T14:39:19.034Z] 14:39:19 INFO - Found by: call frame info

Based on https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-08-31&endday=2019-09-30&tree=all&bug=1358898 seems like spikes started on the 21st ~ 24th of September.

David can you assign someone to take a look?

Flags: needinfo?(dteller)

I can try to find someone, but I haven't worked on anything related to shutdown in ~4 years.

Flags: needinfo?(dteller)

Hmm, it is mostly mochitest-browser-chrome-fis-e10s-16 (debug) which is failing. We certainly may have issues with shutdown in Fission.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork]

Over the last 7 days there are 59 failures present on this bug. These happen on: linux64, linux64-qr, macosx1014-64, windows10-64, windows10-64-qr, windows10-64-shippable

Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271983976&repo=autoland&lineNumber=8858

(In reply to Olli Pettay [:smaug] from comment #238)

Hmm, it is mostly mochitest-browser-chrome-fis-e10s-16 (debug) which is failing. We certainly may have issues with shutdown in Fission.

Olli are you working on this or can you assign someone to take a look or is this not fixable?

Whiteboard: [stockwell needswork] → [stockwell needswork:owner]

Hi Geoff, any suggestions here?

Flags: needinfo?(gbrown)

Most of the recent failures are on linux64/debug and in mochitest-browser-chrome-fission; since it is fission-specific, let's see if mccr8 is interested.

The shutdown hang occurs most frequently after running the tests in browser/components/extensions/test/browser (WebExtensions); let's see if jimm is interested.

Flags: needinfo?(jmathies)
Flags: needinfo?(gbrown)
Flags: needinfo?(continuation)

I don't know what could be going on. It does seem strange that around half of the failures are in a single directory. Looking at the log, it frees some windows and then just sits there for about a minute before getting killed. With Fission, a testing-only pref to reuse content processes is not working. I would have expected that this only make the tests themselves take longer, but maybe this directory creates an unusually large number of processes, and there's some kind of per-process cleanup that happens only in shutdown?

Flags: needinfo?(continuation)

Kris, in a discussion of bug 1580212, you were saying that maybe we should just leave dom.ipc.keepProcessesAlive broken with Fission. However, this directory, browser/components/extensions/test/browser/, seems to be experiencing frequent shutdown hangs, and I do wonder if the fact that it seems to open around 255 content tabs has anything to do with it? The failures are almost all in debug builds, so it is possible that there's something particularly slow we're doing at shutdown in debug builds. (Leak checking, maybe? Something else?)

Flags: needinfo?(kmaglione+bmo)

(In reply to Geoff Brown [:gbrown] from comment #248)

Most of the recent failures are on linux64/debug and in mochitest-browser-chrome-fission; since it is fission-specific, let's see if mccr8 is interested.

The shutdown hang occurs most frequently after running the tests in browser/components/extensions/test/browser (WebExtensions); let's see if jimm is interested.

If this is specific to fission, Neha can help prioritize the work.

Flags: needinfo?(jmathies) → needinfo?(nkochar)

mccr8 has looked into this and Kris is NI'ed here too for feedback.

Flags: needinfo?(nkochar)
Depends on: 1580212
Whiteboard: [stockwell disable-recommended]
Depends on: 1594572

There are 94 total failures in the last 7 days mostly on linux64 debug and 399 in the last 20 days.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=274975908&repo=autoland&lineNumber=36336

[task 2019-11-07T01:24:02.907Z] 01:24:02 INFO - checking window state
[task 2019-11-07T01:24:02.935Z] 01:24:02 INFO - TEST-START | browser/components/extensions/test/browser/browser_ext_windows_update.js
[task 2019-11-07T01:24:02.952Z] 01:24:02 INFO - GECKO(1178) | ++DOCSHELL 0x7efe4793c000 == 40 [pid = 1178] [id = {573c6e22-67b2-4820-ab21-c0659b56358c}]
[task 2019-11-07T01:24:02.952Z] 01:24:02 INFO - GECKO(1178) | ++DOMWINDOW == 161 (0x7efe52019020) [pid = 1178] [serial = 1261] [outer = (nil)]

[task 2019-11-07T01:25:40.713Z] 01:25:40 INFO - GECKO(1178) | Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:213
[task 2019-11-07T01:25:40.749Z] 01:25:40 INFO - GECKO(1178) | #01: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-11-07T01:25:40.750Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #02: libpthread.so.0 + 0x76ba
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #03: libc.so.6 + 0x10741d
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO -
[task 2019-11-07T01:25:40.751Z] 01:25:40 INFO - GECKO(1178) | #04: ??? (???:???)
[task 2019-11-07T01:25:40.752Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::GenerateDump cloned child 8412
[task 2019-11-07T01:25:40.753Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-11-07T01:25:40.755Z] 01:25:40 INFO - GECKO(1178) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-11-07T01:31:50.765Z] 01:31:50 INFO - Buffered messages finished
[task 2019-11-07T01:31:50.765Z] 01:31:50 ERROR - TEST-UNEXPECTED-TIMEOUT | Last test finished | application timed out after 370 seconds with no output
[task 2019-11-07T01:31:50.765Z] 01:31:50 ERROR - Force-terminating active process(es).
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - Determining child pids from psutil...
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - []
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - ==> process 1178 launched child process 1196
[task 2019-11-07T01:31:50.766Z] 01:31:50 INFO - ==> process 1178 launched child process 1236
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1258
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1314
[task 2019-11-07T01:31:50.767Z] 01:31:50 INFO - ==> process 1178 launched child process 1342
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1362
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1386
[task 2019-11-07T01:31:50.768Z] 01:31:50 INFO - ==> process 1178 launched child process 1412

[task 2019-11-07T01:31:52.276Z] 01:31:52 INFO - mozcrash Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpVyDQ8d.mozrunner/minidumps/7ffdcd9d-6d3c-b607-8799-954437912954.dmp /builds/worker/workspace/build/symbols
[task 2019-11-07T01:31:58.645Z] 01:31:58 INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/7ffdcd9d-6d3c-b607-8799-954437912954.dmp
[task 2019-11-07T01:31:58.645Z] 01:31:58 INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/7ffdcd9d-6d3c-b607-8799-954437912954.extra
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - Crash dump filename: /tmp/tmpVyDQ8d.mozrunner/minidumps/7ffdcd9d-6d3c-b607-8799-954437912954.dmp
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - Operating system: Linux
[task 2019-11-07T01:31:58.682Z] 01:31:58 INFO - 0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - CPU: amd64
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - family 6 model 85 stepping 4
[task 2019-11-07T01:31:58.683Z] 01:31:58 INFO - 1 CPU
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - GPU: UNKNOWN
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Crash address: 0x0
[task 2019-11-07T01:31:58.684Z] 01:31:58 INFO - Process uptime: not available
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO -
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - Thread 7 (crashed)
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - 0 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:64599da92fcc6f9988d9e50a15c364beef9cb009 : 213 + 0x22]
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - rax = 0x00007efeab590414 rdx = 0x0000000000000000
[task 2019-11-07T01:31:58.685Z] 01:31:58 INFO - rcx = 0x000056527442ee28 rbx = 0x000000000000003f
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - rsi = 0x00007efeb73e5770 rdi = 0x00007efeb73e4540
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - rbp = 0x00007efe63e8aec0 rsp = 0x00007efe63e8ae90
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r8 = 0x00007efeb73e5770 r9 = 0x00007efe63e8b700
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r10 = 0x0000000000000001 r11 = 0x0000000000000000
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r12 = 0x00007efe63e8b650 r13 = 0x0000000000000000
[task 2019-11-07T01:31:58.686Z] 01:31:58 INFO - r14 = 0x00007efe63e8aea0 r15 = 0x00007efe479db140
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - rip = 0x00007efea801f072
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - Found by: given as instruction pointer in context
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - 1 libnspr4.so!_pt_root [ptthread.c:64599da92fcc6f9988d9e50a15c364beef9cb009 : 201 + 0x7]
[task 2019-11-07T01:31:58.687Z] 01:31:58 INFO - rbx = 0x00007efe59a25ee0 rbp = 0x00007efe63e8af10
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - rsp = 0x00007efe63e8aed0 r12 = 0x00007efe63e8b650
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - r13 = 0x0000000000000000 r14 = 0x00000000000020d8
[task 2019-11-07T01:31:58.688Z] 01:31:58 INFO - r15 = 0x00007efe63e8b700 rip = 0x00007efeb845b0ec
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - Found by: call frame info
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - 2 libpthread-2.23.so + 0x76ba
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - rsp = 0x00007efe63e8af20 r12 = 0x0000000000000000
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - r13 = 0x00007ffed4470e5f r14 = 0x00007efe63e8b9c0
[task 2019-11-07T01:31:58.689Z] 01:31:58 INFO - r15 = 0x0000000000000000 rip = 0x00007efeb809d6ba
[task 2019-11-07T01:31:58.690Z] 01:31:58 INFO - Found by: call frame info
[task 2019-11-07T01:31:58.690Z] 01:31:58 INFO - 3 libc-2.23.so + 0x10741d
[task 2019-11-07T01:31:58.691Z] 01:31:58 INFO - rsp = 0x00007efe63e8afc0 rip = 0x00007efeb712641d
[task 2019-11-07T01:31:58.691Z] 01:31:58 INFO - Found by: stack scanning

Whiteboard: [stockwell needswork]

(In reply to Worcester12345 from comment #93)

Please add "Crash" key word.

Also add "PERF" key word?

(In reply to Andreea Pavel [:apavel] from comment #260)

There are 94 total failures in the last 7 days mostly on linux64 debug and 399 in the last 20 days.

My patch in bug 1580212 will hopefully fix all of the mochitest-browser-chrome-fis-e10s-16 failures, which are about half of the last approximately 100 failures.

Another 27 of these failures are in mochitest-browser-chrome-fis-e10s-7. I checked a handful of these failures, and they were all in browser/components/uitour/test/. I'll file a bug about flipping the pref in there, too.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(bugs)

Thank you.

Whiteboard: [stockwell needswork] → [stockwell needswork][comment 262]
Depends on: 1594878
Depends on: 1595190
Depends on: 1595191
Whiteboard: [comment 262][stockwell disable-recommended] → [comment 262][comment 265][comment 266]
Whiteboard: [comment 262][comment 265][comment 266] → [comment 262]
Depends on: 1596307

These shutdown hangs spiked up again on the 16th. It looks like a lot of these failures are happening in toolkit/components/reader/test/ with Fission, and bug 1586139 landed on the 16th and reenabled these tests. I think this is the same worker shutdown hang we saw with the reader tests in another directory (bug 1580212), so we need to flip the pref to reuse content processes here, too. I'll file a bug blocking this one.

oh in bug 1525281, :bc is looking at which test is causing the leak, possibly there is something larger going on here.

Depends on: 1598075
Whiteboard: [comment 262][stockwell disable-recommended] → [Comment 269]
You need to log in before you can comment on or make changes to this bug.