[10.15] Crash in [@ CrashReporter::TerminateHandler]
Categories: Toolkit :: Crash Reporting, defect, P1
Tracking:
Branch | Tracking | Status
---|---|---
firefox-esr60 | --- | unaffected
firefox-esr68 | --- | fixed
firefox68 | --- | wontfix
firefox69 | --- | fixed
firefox70 | --- | fixed
People: Reporter: marcia, Assigned: haik
References: Blocks 1 open bug
Details: Keywords: crash, regression, topcrash
Attachments (3 files):
- 11.49 KB, text/plain
- 47 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-beta+, approval-mozilla-release-, approval-mozilla-esr68+)
- 159.65 KB, image/png
This bug is for crash report bp-81c9333d-4860-4800-a8cb-6f58a0190716. All of the crashes come from macOS 10.15 users running 10.15.0 (19A501i).
Seen while looking at Nightly crash stats: https://bit.ly/2GgJOpY. Crashes started with build 20190715214335.
Not sure if this is something we regressed or whether this was around the time the third beta came out.
Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=30b8d57cb72a2f955532a0e670c599881a110f17&tochange=c0bcda96a954fe7a3700466bda256aea58189ac9
Top 9 frames of crashing thread:
0 XUL CrashReporter::TerminateHandler toolkit/crashreporter/nsExceptionHandler.cpp:1380
1 @0x7fff69db6da6
2 @0x7fff69db6b54
3 @0x7fff69da834e
4 @0x7fff68ad42af
5 @0x7fff68adb6de
6 @0x7fff68adcb35
7 @0x7fff6cdd3da9
8 @0x7fff6cdd06ae
Comment 1 (Reporter) • 5 years ago
https://bit.ly/2JQ5Gtt has similar frames in the stack; it may be the same issue, since both are 10.15 crashes.
Comment 2 (Reporter) • 5 years ago
"No proper signature could be created because no good data for the crashing thread" shows up in the reports. All of them show the rdd process type. 503 crashes/51 installs so far. Andrew, any ideas what might have triggered this?
Comment 3 • 5 years ago
Looking at the regression range in comment 0, bug 1560368 involves RDD. Bug 1546299 concerns the Mac, but I don't see how signing could make these stacks so bad.
Comment 4 (Reporter) • 5 years ago
Hello Michael and Aki, adding you both per comment 3 in the hope of finding out what might have caused this macOS 10.15 spike.
Comment 5 • 5 years ago
Bug 1546299 is only for the geckodriver binary, which appears to be for internal testing, e.g. for marionette tests. This happens in a task separate from the actual build artifact signing. I would be surprised if that's the cause of the crashreporter spike.
Comment 6 • 5 years ago
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #0)
> Not sure if this is something we regressed or whether this was around the time the third beta came out.
Do we know what channel it's from? The third beta would be the first Firefox beta aiui (the first two are devedition-only), making it the first Firefox Beta that can run on Catalina at all. If that's it, this may be the baseline of Catalina crashes on the beta channel.
Comment 7 • 5 years ago
Could this be related to bug 1556846? That is supposed to fix an RDD crash, but it wasn't uplifted to beta until the 17th.
Comment 8 (Reporter) • 5 years ago
Sorry, I confused everyone by using beta terminology. I meant the macOS 10.15 developer betas. A new one was just pushed the other day (19A512f). The crash reports show both that version and the previous version (10.15.0 19A501i) crashing.
Comment 9 • 5 years ago
The change in bug 1560368 added Opus decoding on RDD, but this is not the first decoder to run on RDD. However, it would be a simple change to pref off Opus RDD decoding[1] and see if it moves the crash report needle.
[1] https://searchfox.org/mozilla-central/source/modules/libpref/init/StaticPrefList.h#5918
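Turning the pref off acts as a runtime switch consulted when an Opus decoder is set up: if `media.rdd-opus.enabled` is false, Opus decoding is kept out of the RDD process. The C++ sketch below shows only that gating idea. `DecoderLocation`, `PrefMap`, and `ChooseOpusDecoderLocation` are hypothetical names, not Firefox's StaticPrefs or decoder-module APIs, and the assumption that disabled RDD Opus decoding falls back to the content process is ours; only the pref name appears in this bug (comment 13).

```cpp
#include <map>
#include <string>

// Hypothetical decoder placement; Firefox's real selection logic is more involved.
enum class DecoderLocation { RddProcess, ContentProcess };

// Hypothetical boolean pref store standing in for Firefox's static prefs.
using PrefMap = std::map<std::string, bool>;

// Decide where Opus decoding runs. A missing pref falls back to defaultEnabled,
// which models the pref being on by default on Nightly.
DecoderLocation ChooseOpusDecoderLocation(const PrefMap& prefs,
                                          bool defaultEnabled = true) {
  auto it = prefs.find("media.rdd-opus.enabled");
  bool rddOpus = (it == prefs.end()) ? defaultEnabled : it->second;
  return rddOpus ? DecoderLocation::RddProcess : DecoderLocation::ContentProcess;
}
```

Flipping the pref to false in this model moves Opus decoding out of RDD without touching any other decoder, which is why it is a cheap experiment for checking whether the crash volume follows the Opus change.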
Comment 10 (Reporter) • 5 years ago
I guess we can try what Michael suggests in comment 9. We have 745 crashes/105 installations so far, all on 10.15 (the latest seed, 10.15.0 19A512f).
The bug in comment 7 landed in Nightly 70 on 7-10, which doesn't match the regression range, since these crashes started with the 7-15 build. Adding Haik in case he has any insight here.
Comment 11 (Assignee) • 5 years ago
Here's the crashing stack. I'll attach a listing of all the thread stacks.
Process 82598 stopped
* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010fa95b85 XUL`CrashReporter::TerminateHandler() (.llvm.5017237185227756273) + 21
XUL`CrashReporter::TerminateHandler() (.llvm.5017237185227756273):
-> 0x10fa95b85 <+21>: movl $0x564, 0x0 ; imm = 0x564
0x10fa95b90 <+32>: callq 0x1105ea1c8 ; symbol stub for: abort
0x10fa95b95 <+37>: nopw %cs:(%rax,%rax)
0x10fa95b9f <+47>: nop
Target 0: (plugin-container) stopped.
(lldb) bt
* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x000000010fa95b85 XUL`CrashReporter::TerminateHandler() (.llvm.5017237185227756273) + 21
frame #1: 0x00007fff63b29da7 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #2: 0x00007fff63b29b55 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
frame #3: 0x00007fff63b1b34f libc++abi.dylib`__cxa_throw + 113
frame #4: 0x00007fff628492b0 caulk`check_posix_error(char const*, int) + 168
frame #5: 0x00007fff628506df caulk`caulk::thread::attributes::apply_to_this_thread() + 35
frame #6: 0x00007fff62851b36 caulk`void* caulk::thread_proxy<std::__1::tuple<caulk::thread::attributes, void (caulk::concurrent::details::worker_thread::*)(), std::__1::tuple<caulk::concurrent::details::worker_thread*> > >(void*) + 15
frame #7: 0x00007fff66b40cce libsystem_pthread.dylib`_pthread_start + 125
frame #8: 0x00007fff66b3d72b libsystem_pthread.dylib`thread_start + 15
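In the disassembly above, the faulting instruction `movl $0x564, 0x0` stores 0x564 (1380 in decimal, the nsExceptionHandler.cpp line in the crash signature) to address 0, and the next instruction calls abort. So the EXC_BAD_ACCESS at address 0x0 appears to be a deliberate crash inside the terminate handler rather than the root cause; the real trigger is whatever exception reached std::__terminate. The C++ sketch below shows a terminate handler that crashes in a similar way. `DeliberateCrash`, `InstallTerminateHandler`, and this `TerminateHandler` are hypothetical stand-ins named after the frame, not Firefox's actual crash reporter code.

```cpp
#include <cstdlib>
#include <exception>

// Hypothetical helper: write the source line number to address 0 so the crash
// faults at 0x0 with the line encoded in the instruction; abort() is a fallback.
[[noreturn]] static void DeliberateCrash(int line) {
  *reinterpret_cast<volatile int*>(0) = line;  // faults on typical platforms
  std::abort();
}

// Hypothetical terminate handler. std::terminate() calls it when a thrown
// exception finds no matching catch, e.g. one escaping a thread's entry function.
[[noreturn]] static void TerminateHandler() {
  DeliberateCrash(__LINE__);
}

// Install the handler process-wide and return the previously installed one.
std::terminate_handler InstallTerminateHandler() {
  return std::set_terminate(TerminateHandler);
}
```

Because such a handler replaces the default one, the usual libc++abi "terminating with uncaught exception" message may never be printed, which fits the crash reports carrying only the TerminateHandler frame.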
Comment 12 (Assignee) • 5 years ago
Full stack listing. I noticed the following thread, whose stack is in cubeb_init called from mozilla::OpusDataDecoder::Init().
thread #14
frame #0: 0x00007fff2f41fcaf CoreFoundation`__CFSearchStringROM + 44
frame #1: 0x00007fff2f41f76b CoreFoundation`__CFStringCreateImmutableFunnel3 + 1988
frame #2: 0x00007fff2f425824 CoreFoundation`CFStringCreateWithBytes + 27
frame #3: 0x00007fff2f4706df CoreFoundation`_createUniqueStringWithUTF8Bytes + 165
frame #4: 0x00007fff2f432dcc CoreFoundation`parseStringTag + 1544
frame #5: 0x00007fff2f430b30 CoreFoundation`parseXMLElement + 822
frame #6: 0x00007fff2f4312ff CoreFoundation`parseXMLElement + 2821
frame #7: 0x00007fff2f430c2d CoreFoundation`parseXMLElement + 1075
frame #8: 0x00007fff2f430269 CoreFoundation`_CFPropertyListCreateFromUTF8Data + 1884
frame #9: 0x00007fff2f515c96 CoreFoundation`_CFPropertyListCreateWithData + 600
frame #10: 0x00007fff2f42f58a CoreFoundation`CFPropertyListCreateWithData + 51
frame #11: 0x00007fff2f441b4e CoreFoundation`_CFBundleCopyInfoDictionaryInDirectoryWithVersion + 814
frame #12: 0x00007fff2f5c36dd CoreFoundation`_CFBundleRefreshInfoDictionaryAlreadyLocked + 111
frame #13: 0x00007fff2f44180c CoreFoundation`CFBundleGetInfoDictionary + 33
frame #14: 0x00007fff2f515403 CoreFoundation`_CFBundleCreate + 715
frame #15: 0x00007fff2f47b6f5 CoreFoundation`_CFBundleEnsureBundleExistsForImagePath + 55
frame #16: 0x00007fff2f47b59d CoreFoundation`CFBundleGetBundleWithIdentifier + 221
frame #17: 0x00007fff2e9bf927 CoreAudio`HALSystem::InitializeDevices() + 329
frame #18: 0x00007fff2e9be891 CoreAudio`HALSystem::CheckOutInstance() + 161
frame #19: 0x00007fff2e9c3c84 CoreAudio`AudioObjectSetPropertyData + 184
frame #20: 0x000000010ef05bdd XUL`audiounit_init + 125
frame #21: 0x000000010ef05181 XUL`cubeb_init + 177
frame #22: 0x000000010d9f53e7 XUL`mozilla::CubebUtils::GetCubebContextUnlocked() + 855
frame #23: 0x000000010dc91299 XUL`mozilla::OpusDataDecoder::Init() + 2089
frame #24: 0x000000010dbf4ae8 XUL`mozilla::RemoteDecoderParent::RecvInit() + 56
frame #25: 0x000000010bb68722 XUL`mozilla::PRemoteDecoderParent::OnMessageReceived(IPC::Message const&) + 178
frame #26: 0x000000010bb652ee XUL`mozilla::PRemoteDecoderManagerParent::OnMessageReceived(IPC::Message const&) + 1006
frame #27: 0x000000010b817ba3 XUL`mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) + 467
frame #28: 0x000000010b819278 XUL`mozilla::ipc::MessageChannel::MessageTask::Run() + 440
frame #29: 0x000000010b0d8c23 XUL`nsThread::ProcessNextEvent(bool, bool*) + 3411
frame #30: 0x000000010b0db629 XUL`NS_ProcessNextEvent(nsIThread*, bool) + 73
frame #31: 0x000000010b81d02a XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) + 282
frame #32: 0x000000010b0d5fd2 XUL`nsThread::ThreadFunc(void*) + 498
frame #33: 0x00000001066dae85 libnss3.dylib`_pt_root + 357
frame #34: 0x00007fff66b40cce libsystem_pthread.dylib`_pthread_start + 125
frame #35: 0x00007fff66b3d72b libsystem_pthread.dylib`thread_start + 15
Comment 13 (Assignee) • 5 years ago
On Nightly, setting media.rdd-opus.enabled to false and restarting avoided the crash for me on macOS 10.15 Beta 4.
Comment 14 (Assignee) • 5 years ago
This might be another sandboxing issue.
Comment 15 (Reporter) • 5 years ago
Adding the topcrash keyword since this is #2 overall in Nightly 70.
Comment 16 (Assignee) • 5 years ago
This is a sandboxing issue on macOS 10.15. I'm testing a fix and should have it out for review today or tomorrow. Details below.
The crash happens because new code in macOS 10.15 crashes when pthread_setname_np fails on certain internal macOS library threads. The function fails in the RDD process because the sandbox does not allow the setcontrol variant of the proc_info syscall. We allow it in the content sandbox; it was not allowed in RDD only because it was not known to be needed. The fix for bug 1560368 exposed this problem.
With a debug build, this is the call to pthread_setname_np that fails and causes the crash.
frame #0: 0x00007fff70e8598c libsystem_pthread.dylib`pthread_setname_np
frame #1: 0x00007fff6cb901f9 caulk`caulk::mach::this_thread::set_name(char const*) + 9
frame #2: 0x00007fff6cb976df caulk`caulk::thread::attributes::apply_to_this_thread() + 35
frame #3: 0x00007fff6cb98b36 caulk`void* caulk::thread_proxy<std::__1::tuple<caulk::thread::attributes, void (caulk::concurrent::details::worker_thread::*)(), std::__1::tuple<caulk::concurrent::details::worker_thread*> > >(void*) + 15
frame #4: 0x00007fff70e87cce libsystem_pthread.dylib`_pthread_start + 125
frame #5: 0x00007fff70e8472b libsystem_pthread.dylib`thread_start + 15
But this is the actual crash stack after the exception handling.
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: pthread_setname_np failed: Operation not permitted
frame #0: 0x00007fff70dca6ce libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff70e87691 libsystem_pthread.dylib`pthread_kill + 258
frame #2: 0x00007fff70d52a5c libsystem_c.dylib`abort + 120
frame #3: 0x00007fff6de63bc8 libc++abi.dylib`abort_message + 231
frame #4: 0x00007fff6de63d64 libc++abi.dylib`demangling_terminate_handler() + 238
frame #5: 0x00007fff6f94ad52 libobjc.A.dylib`_objc_terminate() + 104
frame #6: 0x00007fff6de70da7 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #7: 0x00007fff6de70b55 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
frame #8: 0x00007fff6de6234f libc++abi.dylib`__cxa_throw + 113
frame #9: 0x00007fff6cb902b0 caulk`check_posix_error(char const*, int) + 168
frame #10: 0x00007fff6cb976df caulk`caulk::thread::attributes::apply_to_this_thread() + 35
frame #11: 0x00007fff6cb98b36 caulk`void* caulk::thread_proxy<std::__1::tuple<caulk::thread::attributes, void (caulk::concurrent::details::worker_thread::*)(), std::__1::tuple<caulk::concurrent::details::worker_thread*> > >(void*) + 15
frame #12: 0x00007fff70e87cce libsystem_pthread.dylib`_pthread_start + 125
frame #13: 0x00007fff70e8472b libsystem_pthread.dylib`thread_start + 15
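The abort message above shows the failure mode: a caulk helper turned the error returned by pthread_setname_np into a thrown std::system_error, and because nothing on that worker thread caught it, the C++ runtime called std::terminate(). The C++ sketch below reproduces that pattern. `CheckPosixError` and `ApplyThreadName` are hypothetical stand-ins inferred from the frame names (caulk's source is not available), and the thread-naming call is passed in as a callback so the sketch runs on any platform.

```cpp
#include <cerrno>
#include <functional>
#include <string>
#include <system_error>

// Hypothetical stand-in for caulk's check_posix_error(char const*, int):
// convert a nonzero POSIX error code into a thrown std::system_error.
void CheckPosixError(const char* what, int err) {
  if (err != 0) {
    throw std::system_error(err, std::generic_category(),
                            std::string(what) + " failed");
  }
}

// Hypothetical stand-in for caulk::thread::attributes::apply_to_this_thread().
// On macOS the real call is pthread_setname_np(name), which reports failure
// through its return value; here it is injected so the sketch is testable.
void ApplyThreadName(const std::function<int(const char*)>& setName,
                     const char* name) {
  CheckPosixError("pthread_setname_np", setName(name));
}
```

If `ApplyThreadName` runs as a thread's entry point with no try/catch and the call fails with EPERM, the exception escapes, `__cxa_throw` finds no handler, and `std::terminate()` runs, matching the `failed_throw` and `std::__terminate` frames. libc++ formats `what()` as "<what_arg>: <message>", which yields the "pthread_setname_np failed: Operation not permitted" text in the log.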
The fix for bug 1560368 appears to expose the crash because it causes macOS to spawn these library threads in the RDD process, and they use the caulk code. One thread is named "AMCP Logging Spool".
thread #11, name = 'AMCP Logging Spool'
frame #0: 0x00007fff70dc43d2 libsystem_kernel.dylib`semaphore_wait_trap + 10
frame #1: 0x00007fff6cb99eb6 caulk`caulk::mach::semaphore::wait() + 16
frame #2: 0x00007fff6cb95452 caulk`caulk::semaphore::timed_wait(double) + 106
frame #3: 0x00007fff6cb98a04 caulk`caulk::concurrent::details::worker_thread::run() + 30
frame #4: 0x00007fff6cb98b54 caulk`void* caulk::thread_proxy<std::__1::tuple<caulk::thread::attributes, void (caulk::concurrent::details::worker_thread::*)(), std::__1::tuple<caulk::concurrent::details::worker_thread*> > >(void*) + 45
frame #5: 0x00007fff70e87cce libsystem_pthread.dylib`_pthread_start + 125
frame #6: 0x00007fff70e8472b libsystem_pthread.dylib`thread_start + 15
thread #12
frame #0: 0x00007fff70dc43d2 libsystem_kernel.dylib`semaphore_wait_trap + 10
frame #1: 0x00007fff6cb99eb6 caulk`caulk::mach::semaphore::wait() + 16
frame #2: 0x00007fff6cb95452 caulk`caulk::semaphore::timed_wait(double) + 106
frame #3: 0x00007fff6cb98a04 caulk`caulk::concurrent::details::worker_thread::run() + 30
frame #4: 0x00007fff6cb98b54 caulk`void* caulk::thread_proxy<std::__1::tuple<caulk::thread::attributes, void (caulk::concurrent::details::worker_thread::*)(), std::__1::tuple<caulk::concurrent::details::worker_thread*> > >(void*) + 45
frame #5: 0x00007fff70e87cce libsystem_pthread.dylib`_pthread_start + 125
frame #6: 0x00007fff70e8472b libsystem_pthread.dylib`thread_start + 15
For the fix, we must allow process-info-setcontrol (target self) in the utility sandbox (used by the RDD process). The utility sandbox has a (deny process-info*) rule, which blocks all proc_info syscall variants unless they are explicitly allowed. We should also add this to the GMP sandbox to avoid the same problem with GMP in the future. The web content and Flash plugin sandboxes already allow process-info-setcontrol.
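The (deny process-info*) rule behaves like a deny-by-default filter: every proc_info variant is refused unless an explicit allow rule names it. The C++ sketch below models that matching in simplified form. It is not how Apple's sandbox evaluates SBPL profiles, and `ProcessInfoPolicy` with its member functions is hypothetical; only the operation name `process-info-setcontrol` and the `(target self)` qualifier come from this bug.

```cpp
#include <set>
#include <string>

// Simplified, hypothetical model of a "(deny process-info*)" rule plus explicit
// allow exceptions that apply only when the target is the calling process.
class ProcessInfoPolicy {
 public:
  // Models a rule such as (allow process-info-setcontrol (target self)).
  void AllowForSelf(const std::string& operation) {
    allowedForSelf_.insert(operation);
  }

  // Every process-info operation is denied unless explicitly allowed for self.
  bool IsAllowed(const std::string& operation, bool targetIsSelf) const {
    return targetIsSelf && allowedForSelf_.count(operation) > 0;
  }

 private:
  std::set<std::string> allowedForSelf_;
};
```

In this model, a policy with no allow rules (standing in for the RDD sandbox before the fix) rejects `process-info-setcontrol`; after `AllowForSelf("process-info-setcontrol")` the call is permitted for the process itself but still refused for other targets, and all other process-info variants stay blocked.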
Comment 17 (Assignee) • 5 years ago
To avoid crashing in macOS 10.15, allow access to the proc_info PROC_INFO_CALL_SETCONTROL syscall variant in the GMP and RDD sandboxes.
Comment 18 (Assignee) • 5 years ago
The implementation of the proc_info syscall for 10.14.1 (the latest macOS release whose source is available at this time) can be found here: https://opensource.apple.com/source/xnu/xnu-4903.221.2/bsd/kern/proc_info.c.auto.html See proc_setcontrol.
Comment 19 • 5 years ago
Pushed by haftandilian@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/295ef3d15d11 [10.15] Crash in [@ CrashReporter::TerminateHandler] r=spohl
Comment 20 • 5 years ago
bugherder
Comment 21 (Assignee) • 5 years ago
Needinfo to myself to file an uplift request after I do some testing on Beta.
Comment 22 (Assignee) • 5 years ago
Comment on attachment 9080126 [details]
Bug 1566540 - [10.15] Crash in [@ CrashReporter::TerminateHandler] r?spohl
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without the patch, we might be exposed to Widevine (e.g. Netflix) or AV1 crashes on macOS 10.15, which is currently in beta and expected to be released around September.
This is not needed in ESR 60 because the GMP sandboxing code was less restrictive at that time and the RDD process (for AV1 decoding) was not enabled.
- User impact if declined: On macOS 10.15, users might experience crashes during Widevine decoding (such as Netflix playback). The crash is triggered by macOS library code which is not well understood. If we don't include the patch, another Firefox fix or a change to macOS might trigger the crashing code.
- Fix Landed on Version: 70
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is limited to macOS sandboxing code and only adds an allow rule which is unlikely to cause regressions.
- String or UUID changes made by this patch:
Beta/Release Uplift Approval Request
- User impact if declined: On macOS 10.15, users might experience crashes during Widevine decoding (such as Netflix playback) or AV1 decoding. Currently the crash only happens with AV1 decoding, and only with the fix for bug 1560368, which is only in 70. However, the crash is triggered by macOS library code that is not well understood. If we don't include the patch, another Firefox fix or a change to macOS might trigger the crashing code.
- Is this code covered by automated tests?: Yes. However, the tests are not run on macOS 10.15, where this problem occurs.
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: The crashes are not reproducible on releases earlier than 70.
- List of other uplifts needed: Bug 1558924
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is limited to macOS sandboxing code and only adds an allow rule which is unlikely to cause regressions.
- String changes made/needed:
Comment 23 • 5 years ago
I've tested on macOS 10.15 Beta 3 and Beta 4, but I wasn't able to reproduce the crash. I tried to reproduce it on netflix.com.
I have the following results:
- on macOS 10.15 Beta 3 (19A501i) using Nightly 70.0a1 (2019-07-15) -> Netflix doesn't work, an error is displayed, but Firefox doesn't crash.
- on macOS 10.15 Beta 4 (19A512f) using Nightly 70.0a1 (2019-07-15), Nightly 70.0a1 (2019-07-17), Nightly 70.0a1 (2019-07-18) -> Netflix doesn't work, an error is displayed, but Firefox doesn't crash.
This is the Netflix error displayed:
Whoops, something went wrong...
Playback Error
There appears to be a problem with Firefox that is preventing Netflix from starting playback.
Please ensure that you are on the latest version of Firefox.
Error Code: F7702-1290
Note: Widevine version 4.10.1440.19
Any thoughts here? Should I try something else to be able to reproduce the crash?
Comment 24 • 5 years ago
Comment on attachment 9080126 [details]
Bug 1566540 - [10.15] Crash in [@ CrashReporter::TerminateHandler] r?spohl
Fix for a crash during video playback on macOS 10.15. Approved for 69.0b9.
Comment 25 • 5 years ago
bugherder uplift
Comment 26 (Assignee) • 5 years ago
(In reply to Camelia Badau [:cbadau], Release Desktop QA from comment #23)
> Any thoughts here? Should I try something else to be able to reproduce the crash?
The Widevine crashes you hit on Nightly are fixed in newer versions. Specifically, the Nightly builds you tested (2019-07-15 and 2019-07-18) predate the fix for bug 1566523, which is needed for Widevine playback. But the crash being fixed here isn't reproducible with Widevine right now. More details below.
On Nightly, the crash should be reproducible only for AV1 content (such as https://demo.bitmovin.com/public/firefox/av1/), but it's not reproducible for Widevine. For AV1 content, it's the RDD process that crashes and the user only sees a playback error and not a full browser crash.
On Beta, the crashes aren't reproducible right now. The motivation for uplifting the patches is that a macOS change or an unrelated Firefox change could trigger the crash on 10.15 without the patch. For example, bug 1560368 triggers this crash because it indirectly causes macOS to create some threads in the RDD process which abort without the fix. See comment 16 for more information.
Sorry for not being more clear in the uplift request.
Comment 27 • 5 years ago
Comment on attachment 9080126 [details]
Bug 1566540 - [10.15] Crash in [@ CrashReporter::TerminateHandler] r?spohl
Approved for 68.1esr as well. Same as the other macOS 10.15 bugs, not approving for an Fx68 dot release, however. Let's aim to have these fixes ride with Fx69/68.1esr in September.
Comment 28 • 5 years ago
bugherder uplift
Comment 30 • 5 years ago
I've tested with a demo from https://demo.bitmovin.com/demos/av1 (as you mentioned in comment 26) on macOS 10.15 Beta 3 (19A501i) using an old version of Nightly (2019-07-19) and the latest Nightly 70.0a1 (2019-08-05), and I received a playback error on both Nightly builds. Is that OK? You can see the error in the "error.png" attachment.
Also, can someone who initially reproduced the problem check that it is now fixed and there is no crash anymore?
Comment 31 (Assignee) • 5 years ago
(In reply to Camelia Badau [:cbadau], Release Desktop QA from comment #30)
> Created attachment 9083343 [details]
> error.png
> I've tested with a demo from https://demo.bitmovin.com/demos/av1 (as you mentioned in comment 26) on macOS 10.15 Beta 3 (19A501i) using an old version of Nightly (2019-07-19) and the latest Nightly 70.0a1 (2019-08-05) - I received a playback error on both Nightly builds. It is ok?
No, that error indicates we have a problem which is probably that the RDD process is crashing. We need to determine why we're getting that error on the latest Nightly. Could you file a new bug for this issue?
Due to bug 1570451, we can't test/debug on the latest Catalina build.
Comment 32 • 5 years ago
I've retested today on macOS 10.15 Beta 4 (19A512f) using the latest Nightly 70.0a1 (2019-08-08), and the playback error mentioned in comment 30 no longer appears: the demo plays correctly. It seems that the error only occurs on macOS 10.15 Beta 3 and is fixed on macOS 10.15 Beta 4. In that case, I don't think it is necessary to file a new bug.