Closed Bug 1139597 Opened 9 years ago Closed 6 years ago

Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [None]

Categories

(Core :: XPCOM, defect, P3)

x86_64
macOS
defect

Tracking

()

RESOLVED INVALID

People

(Reporter: RyanVM, Unassigned)

References

Details

(Whiteboard: [stockwell unknown])

08:42:52 INFO - TEST-INFO | Main app process: exit 0
08:42:52 INFO - runtests.py | Application ran for: 0:02:25.307033
08:42:52 INFO - zombiecheck | Reading PID log: /var/folders/4q/slhhrqs92_l1cnypjyf7mbxh00000w/T/tmpT1DQe0pidlog
08:42:52 INFO - mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1425479998/firefox-39.0a1.en-US.mac.crashreporter-symbols.zip
08:43:33 INFO - mozcrash Saved minidump as /builds/slave/talos-slave/test/build/blobber_upload_dir/76A70A28-D2AA-42EB-975B-03BE94580FF0.dmp
08:43:33 INFO - mozcrash Saved app info as /builds/slave/talos-slave/test/build/blobber_upload_dir/76A70A28-D2AA-42EB-975B-03BE94580FF0.extra
08:43:33 WARNING - PROCESS-CRASH | Main app process exited normally | application crashed [None]
08:43:33 INFO - Crash dump filename: /var/folders/4q/slhhrqs92_l1cnypjyf7mbxh00000w/T/tmpo47v6k.mozrunner/minidumps/76A70A28-D2AA-42EB-975B-03BE94580FF0.dmp
08:43:33 INFO - Operating system: Mac OS X
08:43:33 INFO - 10.10.2 14C109
08:43:33 INFO - CPU: amd64
08:43:33 INFO - family 6 model 42 stepping 7
08:43:33 INFO - 8 CPUs
08:43:33 INFO - Crash reason: EXC_SOFTWARE / SIGABRT
08:43:33 INFO - Crash address: 0x7fff912fb4de
08:43:33 INFO - Thread 0
08:43:33 INFO - 0 XUL!nsCycleCollector::FreeSnowWhite(bool) [nsISupportsImpl.h:a190978e94f1 : 314 + 0x4]
08:43:33 INFO - rbx = 0x000000010a9e9000 r12 = 0x0000000153489cf8
08:43:33 INFO - r13 = 0x0000000153489008 r14 = 0x0000000000000d08
08:43:33 INFO - r15 = 0x000000012fd9fae0 rip = 0x0000000100490cd1
08:43:33 INFO - rsp = 0x00007fff5fbfc0b0 rbp = 0x00007fff5fbfc140
08:43:33 INFO - Found by: given as instruction pointer in context
08:43:33 INFO - 1 XUL!nsCycleCollector_doDeferredDeletion() [nsCycleCollector.cpp:a190978e94f1 : 4160 + 0x4]
08:43:33 INFO - rbx = 0x000000a6d5230929 r12 = 0x000000010a916128
08:43:33 INFO - r13 = 0x00007fff5fbfc267 r14 = 0x000000010a916100
08:43:33 INFO - r15 = 0x00000001104f9580 rip = 0x0000000100493aba
08:43:33 INFO - rsp = 0x00007fff5fbfc150 rbp = 0x00007fff5fbfc150
08:43:33 INFO - Found by: call frame info
08:43:33 INFO - 2 XUL!AsyncFreeSnowWhite::Run() [XPCJSRuntime.cpp:a190978e94f1 : 228 + 0x4]
08:43:33 INFO - rbx = 0x000000a6d5230929 r12 = 0x000000010a916128
08:43:33 INFO - r13 = 0x00007fff5fbfc267 r14 = 0x000000010a916100
08:43:33 INFO - r15 = 0x00000001104f9580 rip = 0x0000000100bc382f
08:43:33 INFO - rsp = 0x00007fff5fbfc160 rbp = 0x00007fff5fbfc180
08:43:33 INFO - Found by: call frame info
08:43:33 INFO - 3 XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a190978e94f1 : 855 + 0x5]
08:43:33 INFO - rbx = 0x000000010a916168 r12 = 0x000000010a916128
08:43:33 INFO - r13 = 0x00007fff5fbfc267 r14 = 0x000000010a916100
08:43:33 INFO - r15 = 0x000000010a916100 rip = 0x00000001004e867d
08:43:33 INFO - rsp = 0x00007fff5fbfc190 rbp = 0x00007fff5fbfc250
08:43:33 INFO - Found by: call frame info
08:43:33 INFO - 4 XUL!NS_ProcessPendingEvents(nsIThread*, unsigned int) [nsThreadUtils.cpp:a190978e94f1 : 207 + 0xd]
08:43:33 INFO - rbx = 0x0000000000000000 r12 = 0x00000000000aeef5
08:43:33 INFO - r13 = 0x00007fff5fbfc267 r14 = 0x0000000000000014
08:43:33 INFO - r15 = 0x000000010a916100 rip = 0x0000000100509686
08:43:33 INFO - rsp = 0x00007fff5fbfc260 rbp = 0x00007fff5fbfc290
08:43:33 INFO - Found by: call frame info
08:43:33 INFO - 5 XUL!nsBaseAppShell::NativeEventCallback() [nsBaseAppShell.cpp:a190978e94f1 : 98 + 0xe]
08:43:33 INFO - rbx = 0x000000010a92fde0 r12 = 0x0000000000000000
08:43:33 INFO - r13 = 0x0000000000001d03 r14 = 0x000000010a916100
08:43:33 INFO - r15 = 0x000000010a92fd00 rip = 0x0000000101ffdf97
08:43:33 INFO - rsp = 0x00007fff5fbfc2a0 rbp = 0x00007fff5fbfc2c0
08:43:33 INFO - Found by: call frame info
08:43:33 INFO - 6 XUL!nsAppShell::ProcessGeckoEvents(void*) [nsAppShell.mm:a190978e94f1 : 377 + 0x7]
08:43:33 INFO - rbx = 0x000000010a931000 r12 = 0x0000000108f0b440
08:43:33 INFO - r13 = 0x0000000000001d03 r14 = 0x0000000108f0b458
08:43:33 INFO - r15 = 0x000000010a92fde0 rip = 0x0000000102044cb2
08:43:33 INFO - rsp = 0x00007fff5fbfc2d0 rbp = 0x00007fff5fbfc310
08:43:33 INFO - Found by: call frame info
[Mass Closure] Closing bug as the WORKSFORME as the intermittent failure has not been seen for 45+ days If this has been closed and you feel that it should Not have been closed, please reopen and add [leave open] to the whiteboard.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Reopening as per recent comments.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Ted, I'm not sure if there's anything useful here that you can do something with, but it looks like minidump_stackwalk is failing to get a stack somehow.
Flags: needinfo?(ted)
(In reply to Andrew McCreight [:mccr8] from comment #23)
> Ted, I'm not sure if there's anything useful here that you can do something
> with, but it looks like minidump_stackwalk is failing to get a stack somehow.

I poked at a few of the minidumps from the logs, and they're basically all malformed. The Windows ones are completely empty, and the OS X ones have a valid header but are missing the information in the minidump directory that points to the actual useful contents. This is why minidump_stackwalk fails to print anything.

This usually means memory was trashed, but in this case, since the main app process is shutting down cleanly, it might be a race between the child process crashing and the parent process exiting. It's possible we only wind up writing part of the dump before the main process exits, or something like that.
Flags: needinfo?(ted)
Thanks for taking a look, Ted. That sounds plausible.

I looked at all of the crashes in the last week or so and they all had [None] so I'm going to change the signature. A number of them were hitting assertion failures along the lines of:

Assertion failure: lock != NULL, at c:/builds/moz2_slave/fx-team-w32-d-0000000000000000/build/src/nsprpub/pr/src/threads/combined/prulock.c:211
Summary: Intermittent OSX 10.10 PROCESS-CRASH | Main app process exited normally | application crashed [None][@ nsCycleCollector::FreeSnowWhite(bool)] → Intermittent OSX 10.10 PROCESS-CRASH | Main app process exited normally | application crashed [None]
This occurred only on OS X 10.6 and Windows 7 in the last month.
Summary: Intermittent OSX 10.10 PROCESS-CRASH | Main app process exited normally | application crashed [None] → Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [None]
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
There are 44 failures in the past 7 days.
Platforms: Linux opt/debug, Linux x64 opt/debug, linux64-qr opt/debug, linux64-stylo-disabled opt/debug, OS X 10.10 opt/debug and linux32-stylo-disabled debug.
Recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=155746958
[task 2018-01-11T22:20:03.840Z] 22:20:03     INFO - Buffered messages logged at 22:13:37
17221
[task 2018-01-11T22:20:03.842Z] 22:20:03     INFO - SpawnTask.js | Entering test test_drawWindow
17222
[task 2018-01-11T22:20:03.843Z] 22:20:03     INFO - Extension loaded
17223
[task 2018-01-11T22:20:03.844Z] 22:20:03     INFO - Extension loaded
17224
[task 2018-01-11T22:20:03.845Z] 22:20:03     INFO - Buffered messages finished
17225
[task 2018-01-11T22:20:03.846Z] 22:20:03    ERROR - TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/mochitest/test-oop-extensions/test_ext_contentscript_canvas.html | application timed out after 370 seconds with no output
17226
[task 2018-01-11T22:20:03.860Z] 22:20:03    ERROR - Force-terminating active process(es).
17227
[task 2018-01-11T22:20:03.861Z] 22:20:03     INFO - Determining child pids from psutil...
17228
[task 2018-01-11T22:20:03.862Z] 22:20:03     INFO - [3341, 3370]
17229
[task 2018-01-11T22:20:03.863Z] 22:20:03     INFO - ==> process 3236 launched child process 3259
17230
[task 2018-01-11T22:20:03.864Z] 22:20:03     INFO - ==> process 3236 launched child process 3287
17231
[task 2018-01-11T22:20:03.866Z] 22:20:03     INFO - ==> process 3236 launched child process 3341
17232
[task 2018-01-11T22:20:03.867Z] 22:20:03     INFO - ==> process 3236 launched child process 3370
17233
[task 2018-01-11T22:20:03.868Z] 22:20:03     INFO - ==> process 3236 launched child process 3410
17234
[task 2018-01-11T22:20:03.869Z] 22:20:03     INFO - Found child pids: set([3370, 3259, 3410, 3341, 3287])
17235
[task 2018-01-11T22:20:03.870Z] 22:20:03     INFO - Failed to get child procs
17236
[task 2018-01-11T22:20:03.871Z] 22:20:03     INFO - Killing process: 3370
17237
[task 2018-01-11T22:20:03.873Z] 22:20:03     INFO - TEST-INFO | started process screentopng
17238
[task 2018-01-11T22:20:04.213Z] 22:20:04     INFO - TEST-INFO | screentopng: exit 0
17239
[task 2018-01-11T22:20:04.214Z] 22:20:04     INFO - Killing process: 3259
17240
[task 2018-01-11T22:20:04.215Z] 22:20:04     INFO - Not taking screenshot here: see the one that was previously logged
17241
[task 2018-01-11T22:20:04.215Z] 22:20:04     INFO - Can't trigger Breakpad, process no longer exists
17242
[task 2018-01-11T22:20:04.216Z] 22:20:04     INFO - Killing process: 3410
17243
[task 2018-01-11T22:20:04.217Z] 22:20:04     INFO - Not taking screenshot here: see the one that was previously logged
17244
[task 2018-01-11T22:20:04.217Z] 22:20:04     INFO - Can't trigger Breakpad, process no longer exists
17245
[task 2018-01-11T22:20:04.218Z] 22:20:04     INFO - Killing process: 3341
17246
[task 2018-01-11T22:20:04.219Z] 22:20:04     INFO - Not taking screenshot here: see the one that was previously logged
17247
[task 2018-01-11T22:20:04.219Z] 22:20:04     INFO - Killing process: 3287
17248
[task 2018-01-11T22:20:04.220Z] 22:20:04     INFO - Not taking screenshot here: see the one that was previously logged
17249
[task 2018-01-11T22:20:04.221Z] 22:20:04     INFO - Can't trigger Breakpad, process no longer exists
17250
[task 2018-01-11T22:20:04.221Z] 22:20:04     INFO - Killing process: 3236
17251
[task 2018-01-11T22:20:04.221Z] 22:20:04     INFO - Not taking screenshot here: see the one that was previously logged
17252
[task 2018-01-11T22:20:04.516Z] 22:20:04     INFO - psutil found pid 3236 dead
17253
[task 2018-01-11T22:20:04.686Z] 22:20:04  WARNING - runtests.py | Failed to get app exit code - running/crashed?
17254
[task 2018-01-11T22:20:04.689Z] 22:20:04     INFO - TEST-INFO | Main app process: exit 0
17255
[task 2018-01-11T22:20:04.690Z] 22:20:04     INFO - runtests.py | Application ran for: 0:07:48.650529
17256
[task 2018-01-11T22:20:04.691Z] 22:20:04     INFO - zombiecheck | Reading PID log: /tmp/tmpmZubAOpidlog
17257
[task 2018-01-11T22:20:04.692Z] 22:20:04     INFO - ==> process 3236 launched child process 3259
17258
[task 2018-01-11T22:20:04.694Z] 22:20:04     INFO - ==> process 3236 launched child process 3287
17259
[task 2018-01-11T22:20:04.695Z] 22:20:04     INFO - ==> process 3236 launched child process 3341
17260
[task 2018-01-11T22:20:04.696Z] 22:20:04     INFO - ==> process 3236 launched child process 3370
17261
[task 2018-01-11T22:20:04.697Z] 22:20:04     INFO - ==> process 3236 launched child process 3410

:froydnj Could you please take a look?
Flags: needinfo?(nfroyd)
Whiteboard: [stockwell needswork]
From the log in comment 88:
The uploaded screenshot shows a tab crashed page: https://public-artifacts.taskcluster.net/Ymt8IHhhRgChqIOpKZqqXw/0/public/test_info//mozilla-test-fail-screenshot_5YFdB5.png

That seems bad. I thought we were setting `MOZ_CRASHREPORTER_SHUTDOWN=1` to force the test harness to quit when this happens?

From the log snippet above, looks like maybe the browser process was hung waiting on a content process, and then our cleanup routine did `kill -ABRT` on that content process, which unstuck the browser? You can see that the browser pid here was 3236, and after we loop through all of the pids it launched we try to kill it and hit "psutil found pid 3236 dead".

I'm not quite sure what's going on with the `application crashed [None]` bit--we get a couple of other sensible looking stacks in the log: the one for the browser process shows it just hanging out spinning the event loop, and there's one for a content process (which looks like the actual underlying problem here) which starts out:
[task 2018-01-11T22:20:25.106Z] 22:20:25     INFO - Thread 26 (crashed)
[task 2018-01-11T22:20:25.107Z] 22:20:25     INFO -  0  firefox!MOZ_CrashOOL [Assertions.cpp:adb5fab118c5 : 33 + 0x0]
[task 2018-01-11T22:20:25.108Z] 22:20:25     INFO -     eip = 0x0804d213   esp = 0xdcefea20   ebp = 0xdcefea28   ebx = 0xf398394a
[task 2018-01-11T22:20:25.109Z] 22:20:25     INFO -     esi = 0x08086000   edi = 0xf54a1000   eax = 0x08086980   ecx = 0xf7560864
[task 2018-01-11T22:20:25.110Z] 22:20:25     INFO -     edx = 0x00000000   efl = 0x00210282
[task 2018-01-11T22:20:25.111Z] 22:20:25     INFO -     Found by: given as instruction pointer in context
[task 2018-01-11T22:20:25.111Z] 22:20:25     INFO -  1  libxul.so!nsAutoOwningThread::AssertCurrentThreadOwnsMe [nsISupportsImpl.cpp:adb5fab118c5 : 46 + 0x12]
[task 2018-01-11T22:20:25.112Z] 22:20:25     INFO -     eip = 0xef62e01d   esp = 0xdcefea30   ebp = 0xdcefea48   ebx = 0xf54a1000
[task 2018-01-11T22:20:25.113Z] 22:20:25     INFO -     esi = 0xdf611200   edi = 0xf54a1000
[task 2018-01-11T22:20:25.114Z] 22:20:25     INFO -     Found by: call frame info
[task 2018-01-11T22:20:25.114Z] 22:20:25     INFO -  2  libxul.so!mozilla::net::nsStandardURL::Release [nsISupportsImpl.h:adb5fab118c5 : 69 + 0x6]
[task 2018-01-11T22:20:25.115Z] 22:20:25     INFO -     eip = 0xef7400f9   esp = 0xdcefea50   ebp = 0xdcefea78   ebx = 0xdf690640
[task 2018-01-11T22:20:25.116Z] 22:20:25     INFO -     esi = 0xdf611200   edi = 0xf54a1000
[task 2018-01-11T22:20:25.116Z] 22:20:25     INFO -     Found by: call frame info

The bogus minidump might just be an artifact of something racing during shutdown?
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=155746958&lineNumber=20426

Redirecting to Valentin for the nsStandardURL issue from comment 88.  I don't see the same URL failures in other logs like the brasstacks links in comment 87, but we might as well start somewhere...
Flags: needinfo?(nfroyd) → needinfo?(valentin.gosu)
This looks like jar channel gets released off-MT and so does the nsStandardURL.
This is probably due to bug 1373708 (which got backed out).
Gary, note that we should use nsMainThreadPtrHandle<T> in nsJARChannel for all URI types if the channel will be released OMT.
Flags: needinfo?(valentin.gosu)
Everything starred as this over the past two weeks either has another bug it should be starred as, or should have another bug filed for it, or was bustage and shouldn't have been starred at all. This bug is nothing but an attractive nuisance.
Status: REOPENED → RESOLVED
Closed: 9 years ago6 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.