Closed Bug 1139860 Opened 5 years ago Closed 4 years ago

Intermittent test_autoIncrement.js,test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js,test_writer_starvation.js | application crashed [@ libsystem_kernel.dylib + 0x16166]

Categories

(Core :: IPC, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
e10s + ---
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- affected
firefox42 --- fixed
firefox43 --- fixed
firefox44 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: billm)

Details

(Keywords: crash, intermittent-failure, Whiteboard: KillHard)

Attachments

(1 file, 1 obsolete file)

Rev5 MacOSX Yosemite 10.10 fx-team opt test xpcshell

https://treeherder.mozilla.org/logviewer.html#?job_id=2161747&repo=fx-team

23:11:34 WARNING - PROCESS-CRASH | extensions/cookie/test/unit_ipc/test_parent.js | application crashed [@ libsystem_kernel.dylib + 0x16166]
23:11:34 INFO - Crash dump filename: /var/folders/h_/qm_0_8l16mx72zdxd4g30gd800000w/T/tmpiqDWjI/FEAE0F68-9830-41E7-99E0-C290A61C9AD4-browser.dmp
23:11:34 INFO - Operating system: Mac OS X
23:11:34 INFO - 10.10.2 14C109
23:11:34 INFO - CPU: amd64
23:11:34 INFO - family 6 model 42 stepping 7
23:11:34 INFO - 8 CPUs
23:11:34 INFO - Crash reason: EXC_BREAKPOINT / 0x00000002
23:11:34 INFO - Crash address: 0x7fff93895166
23:11:34 INFO - Thread 0 (crashed)
23:11:34 INFO - 0 libsystem_kernel.dylib + 0x16166
23:11:34 INFO - rbx = 0x0000000000000203 r12 = 0x00007fff5fbfe7b8
23:11:34 INFO - r13 = 0x0000000000000203 r14 = 0x0000000000000000
23:11:34 INFO - r15 = 0x00007fff5fbfe7a0 rip = 0x00007fff93895166
23:11:34 INFO - rsp = 0x00007fff5fbfe3e8 rbp = 0x00007fff5fbfe470
23:11:34 INFO - Found by: given as instruction pointer in context
23:11:34 INFO - 1 libsystem_pthread.dylib + 0x1789
23:11:34 INFO - rip = 0x00007fff9688578a rsp = 0x00007fff5fbfe3f0
23:11:34 INFO - rbp = 0x00007fff5fbfe470
23:11:34 INFO - Found by: stack scanning
23:11:34 INFO - 2 XUL!google_breakpad::ExceptionHandler::WriteMinidump(bool) [exception_handler.cc:03d1fd491515 : 294 + 0x4]
23:11:34 INFO - rip = 0x0000000102844aba rsp = 0x00007fff5fbfe480
23:11:34 INFO - rbp = 0x00007fff5fbfe710
23:11:34 INFO - Found by: stack scanning
23:11:34 INFO - 3 XUL!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool, bool (*)(char const*, char const*, void*, bool), void*) [exception_handler.cc:03d1fd491515 : 309 + 0xb]
23:11:34 INFO - rbx = 0x0000000111740c80 r12 = 0x00007fff7e1e1070
23:11:34 INFO - r13 = 0x00007fff5fbfe860 r14 = 0x0000000000000001
23:11:34 INFO - r15 = 0x00007fff5fbfe730 rip = 0x0000000102844ceb
23:11:34 INFO - rsp = 0x00007fff5fbfe720 rbp = 0x00007fff5fbfe820
23:11:34 INFO - Found by: call frame info
23:11:34 INFO - 4 XUL!CrashReporter::CreatePairedMinidumps(unsigned int, unsigned int, nsIFile**) [nsExceptionHandler.cpp:03d1fd491515 : 3245 + 0x14]
23:11:34 INFO - rbx = 0x0000000111740c80 r12 = 0x0000000000008500
23:11:34 INFO - r13 = 0x00007fff5fbfe860 r14 = 0x00007fff5fbfe920
Component: General → IPC
Duplicate of this bug: 1139951
Duplicate of this bug: 1139985
Summary: Intermittent test_parent.js | application crashed [@ libsystem_kernel.dylib + 0x16166] → Intermittent test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js | application crashed [@ libsystem_kernel.dylib + 0x16166]
Summary: Intermittent test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js | application crashed [@ libsystem_kernel.dylib + 0x16166] → Intermittent test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js,test_writer_starvation.js | application crashed [@ libsystem_kernel.dylib + 0x16166]
See Also: → 1120785
Summary: Intermittent test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js,test_writer_starvation.js | application crashed [@ libsystem_kernel.dylib + 0x16166] → Intermittent test_autoIncrement.js,test_dns_service_wrap.js,test_ipc_parser_0001.js,test_parent.js,test_update_prefs.js,test_writer_starvation.js | application crashed [@ libsystem_kernel.dylib + 0x16166]
Another frequent OSX IPC crash. Note again the various See Alsos.
tracking-e10s: --- → ?
Flags: needinfo?(jmathies)
See Also: → 1140915, 1121629
Flags: needinfo?(jmathies)
See Also: → 1139309
This is a KillHard abort, but unfortunately I have no idea which one it is. Is there any way to access crash report metadata from a crash like this?
Blocks: killhard-win
Flags: needinfo?(ryanvm)
Ted, is that something you'd know about?
Flags: needinfo?(ryanvm) → needinfo?(ted)
Whiteboard: KillHard
If we can't get at those reports, we should file a bug on dumping crashreporter annotations to the mochitest logs when we crash on a test run.
Flags: needinfo?(jmathies)
You can: we upload the .extra file alongside the .dmp file to blobber. If you look in the log from comment 100, you'll see:
10:55:32     INFO -  (blobuploader) - INFO - TinderboxPrint: <a href='http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-beta/sha512/41354c0381fbab7cb41b361a8c939de08ff48d03b5ecf0abaeb9a2ffad1e8f2b8b61fe7051fde208813f744efde07a39cdc88bfcac95a4316b4744181e70c3ae'>B2771352-9D55-4054-B896-E0425102A294.extra</a>: uploaded

So:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-beta/sha512/41354c0381fbab7cb41b361a8c939de08ff48d03b5ecf0abaeb9a2ffad1e8f2b8b61fe7051fde208813f744efde07a39cdc88bfcac95a4316b4744181e70c3ae

...unfortunately this is not terribly useful as it only contains:
StartupTime=1428601582
CrashTime=1428601604

I think the KillHard code isn't quite doing things right. For one thing, it calls AnnotateCrashReport *after* CreatePairedMinidumps. CreatePairedMinidumps is what writes the .extra file out, so annotations added afterwards never make it into the report:
https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/toolkit/crashreporter/nsExceptionHandler.cpp#l3178
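
To illustrate the ordering problem, here is a minimal toy model in Python. It is not the Gecko crash reporter API: the class and function names are hypothetical, and the only behaviour it assumes is the one described above, namely that the annotations are serialized at the moment the minidump is created.

```python
class CrashReporterSketch:
    """Toy model (hypothetical): annotations live in memory and are
    snapshotted into the .extra data when the dump is written."""

    def __init__(self):
        self.annotations = {}
        self.written_extra = None

    def annotate(self, key, value):
        self.annotations[key] = value

    def create_paired_minidumps(self):
        # Writing the dump serializes only the annotations known right now.
        self.written_extra = dict(self.annotations)


def kill_hard_annotate_late(reporter, reason):
    # Order described in the comment above: dump first, annotate second.
    reporter.create_paired_minidumps()
    reporter.annotate("ipc_channel_error", reason)  # too late to be written


def kill_hard_annotate_first(reporter, reason):
    # Reordered: annotate first so the snapshot includes the reason.
    reporter.annotate("ipc_channel_error", reason)
    reporter.create_paired_minidumps()


late = CrashReporterSketch()
kill_hard_annotate_late(late, "ShutDownKill")

early = CrashReporterSketch()
kill_hard_annotate_first(early, "ShutDownKill")
```

In this sketch `late.written_extra` ends up empty while `early.written_extra` carries the reason, which matches an .extra file that holds nothing beyond the default timestamps.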
Flags: needinfo?(ted)
Flags: needinfo?(jmathies)
(In reply to Treeherder Robot from comment #202)
> log:
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10772403
> repository: mozilla-inbound
> start_time: 2015-06-15T00:48:23
> who: tomcat[at]mozilla[dot]com
> machine: t-yosemite-r5-0082
> buildname: Rev5 MacOSX Yosemite 10.10 mozilla-inbound opt test xpcshell
> revision: a493653ebbed
> 
> PROCESS-CRASH | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_autoIncrement.js | application crashed [@ libsystem_kernel.dylib + 0x16166]
> PROCESS-CRASH | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_autoIncrement.js | application crashed [@ libsystem_kernel.dylib + 0x115da]
> Return code: 1

extra data:

StartupTime=1434354812
CrashTime=1434354824
StartupTime=1434354814
ProcessType=content
additional_minidumps=browser
kill_hard=
ipc_channel_error=ShutDownKill

This points to the safety timeout that kills the content process:

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#3377
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2091

The failure here only happens in xpcshell tests. Apparently this safety timeout is active even though it is set to 0 in at least one test-related profile. I'm not sure whether this is expected, though.

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2098

billm, it looks like you set this timeout code up. Do you have any idea whether it should be running during xpcshell tests?
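
For anyone else digging through these uploads, the extra data quoted above can be read with a few lines of Python. This is only a sketch: it assumes the file is plain `key=value` lines, and it keeps repeated keys in a list because `StartupTime` appears twice in the paste (possibly two .extra files concatenated, though that is a guess). `parse_extra` is a hypothetical helper, not part of any Mozilla tooling.

```python
def parse_extra(text):
    """Parse key=value lines, collecting repeated keys into lists."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        fields.setdefault(key, []).append(value)
    return fields


# The extra data exactly as pasted in this comment.
EXTRA = """\
StartupTime=1434354812
CrashTime=1434354824
StartupTime=1434354814
ProcessType=content
additional_minidumps=browser
kill_hard=
ipc_channel_error=ShutDownKill
"""

fields = parse_extra(EXTRA)

# Seconds between the first recorded startup and the crash.
uptime = int(fields["CrashTime"][0]) - int(fields["StartupTime"][0])
```

With the values above, `uptime` comes out at 12 seconds, and `fields["ipc_channel_error"]` holds the `ShutDownKill` reason that identifies which KillHard path fired.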
Flags: needinfo?(wmccloskey)
It looks like we don't use prefs_general.js when running xpcshell tests. Joel, is there an alternate place for putting prefs that will get picked up when we run xpcshell tests?
Flags: needinfo?(wmccloskey) → needinfo?(jmaher)
xpcshell tests don't have a profile by default. They can use do_get_profile to get one:
https://hg.mozilla.org/mozilla-central/annotate/ce863f9d8864/testing/xpcshell/head.js#l1114

...but I don't think that sets any default prefs.
I think we should just set the default in that getter code in ContentParent to zero. We set the actual values in prefs; the defaults should be the values that turn the feature off.
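
A rough sketch of the idea in Python, assuming the getter falls back to a hard-coded default when the pref is missing (as it would be in an xpcshell run with no profile prefs). The pref name and both helpers are hypothetical stand-ins, not the real ContentParent code.

```python
# Hypothetical pref name, used for illustration only.
TIMEOUT_PREF = "example.shutdown_timeout_secs"


def get_int_pref(prefs, name, default):
    """Return the pref value, or the default when the pref is not set."""
    return prefs.get(name, default)


def shutdown_kill_delay(prefs, default=0):
    """Return the seconds to wait before force-killing the child,
    or None when the safety timer is disabled (value <= 0)."""
    secs = get_int_pref(prefs, TIMEOUT_PREF, default)
    return secs if secs > 0 else None
```

With a non-zero fallback, an empty pref set (`{}`) still arms the timer. With a fallback of zero the timer stays off unless a profile explicitly sets a positive value.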
Thanks, Ted. I am fine with a general solution here or with something specific to the test/directory.
Flags: needinfo?(jmaher)
I'll try to implement Jim's solution. It makes the most sense.
Flags: needinfo?(wmccloskey)
Attached patch patch (obsolete) — Splinter Review
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Flags: needinfo?(wmccloskey)
Attachment #8627358 - Flags: review?(jmathies)
Attachment #8627358 - Flags: review?(jmathies) → review+
https://hg.mozilla.org/mozilla-central/rev/5bc9df8e1808
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
\m/

Please request Aurora/Beta approval on this when you get a chance :)
Comment on attachment 8627358 [details] [diff] [review]
patch

This patch should fix some random orange on aurora/beta. It doesn't change things in release at all.

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: none
[Describe test coverage new/current, TreeHerder]: on m-c
[Risks and why]: Basically no risk. This only affects testing.
[String/UUID change made/needed]: none
Flags: needinfo?(wmccloskey)
Attachment #8627358 - Flags: approval-mozilla-beta?
Attachment #8627358 - Flags: approval-mozilla-aurora?
Comment on attachment 8627358 [details] [diff] [review]
patch

Thanks for paying attention to intermittent failures. Aurora+ Beta+
Attachment #8627358 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8627358 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This patch appears to have helped, but unfortunately this still isn't fixed :(
Status: RESOLVED → REOPENED
Flags: needinfo?(wmccloskey)
Resolution: FIXED → ---
See Also: → 1178194
Target Milestone: mozilla42 → ---