Closed
Bug 1185726
Opened 8 years ago
Closed 5 years ago
Mysterious crashes in nsIFrame in Nightly 42
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
RESOLVED
WORKSFORME
mozilla42
People
(Reporter: away, Unassigned)
References
Details
(Keywords: crash, Whiteboard: ShutDownKill)
Crash Data
Attachments
(1 file)
3.12 KB,
patch
|
lsocks
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-48a4d38d-841c-4c1c-8db8-7bb282150715. ============================================================= The signatures are all over nsIFrame (I've only listed a few). In total it's one of the top crashes on nightly right now. The reason code is EXCEPTION_BREAKPOINT which normally means an "int 3" assembly instruction from a MOZ_CRASH, but I don't see those instructions in the minidumps. This seems to have started in 20150714030206. (I checekd, and it's not the AMD bug) https://crash-stats.mozilla.com/search/?signature=%24nsIFrame&release_channel=nightly&product=Firefox&platform=Windows+NT&_facets=signature&_facets=build_id&_facets=version&_facets=release_channel&_facets=reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
![]() |
||
Comment 2•8 years ago
|
||
[Tracking Requested - why for this release]: This is a huge portion of the currently very high crash rate on Nightly. I think most of the bold items in https://crash-analysis.mozilla.com/rkaiser/2015-07-19/2015-07-19.firefox.42.explosiveness.html belong to this issue.
tracking-firefox42:
--- → ?
I think these are actually crashes in the parent process, with this stack: xul!mozilla::dom::CrashReporterParent::GenerateCompleteMinidump<mozilla::dom::ContentParent> xul!mozilla::dom::ContentParent::KillHard xul!mozilla::dom::ContentParent::ProcessingError (*) xul!mozilla::ipc::MessageChannel::MaybeHandleError xul!mozilla::ipc::MessageChannel::DispatchMessageW xul!mozilla::ipc::MessageChannel::OnMaybeDequeueOne xul!RunnableMethod<mozilla::ipc::MessageChannel,void (__cdecl mozilla::ipc::MessageChannel::*)(void) __ptr64,Tuple0>::Run xul!MessageLoop::DoWork Near (*) I see these strings on the stack 00feed40 6c746e54 "DispatchAsyncMessage" 00feed44 00feede8 "(msgtype=0x4A0003,name=PDocAccessible::Msg_ShowEvent) P" I suspect that these EXCEPTION_BREAKPOINTS in the child process are artificial and coming from here: https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#767 And I bet that the reason we see so much layout is just due to the hotness of that code. The regression range is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=38d03bf4616e&tochange=283b5f38ce57 which makes me suspect bug 1172525. Trevor can you please investigate?
Flags: needinfo?(tbsaunde+mozbugs)
More complete version of the error string: aReason = 0x00feede8 "(msgtype=0x4A0003,name=PDocAccessible::Msg_ShowEvent) Processing error: message was deserialized, but the handler returned false (indicating failure)"
![]() |
||
Comment 5•8 years ago
|
||
I filled bug 1183913 for a huge amount of crashes I have with e10s ON, starting with the build from July 14th. I didn't do much investigations.
Comment 6•8 years ago
|
||
Could you file a bug on getting crash reporter to report this better?
Flags: needinfo?(dmajor)
![]() |
||
Comment 8•8 years ago
|
||
Out of the ~180 signatures we saw on Nightly yesterday, I guess that >150 are this crash, and probably 80-90% of the total crash volume. We really need to get this resolved in some form.
Comment 9•8 years ago
|
||
Wondering if Billm might shed light on comment 4.
Flags: needinfo?(wmccloskey)
The problem is that DocAccessibleParent::RecvShowEvent is returning false. That causes us to kill the child process. Since the Show message is async, the child process might have been doing almost anything at this point, so that's why the signatures are so random. It seems like bug 1172525 is a likely candidate for regressing this given the range. I'm afraid I don't really understand the patch. In general, the only reason to return false from these functions is if the child process is likely to be misbehaving or exploited. It shouldn't just be for expected errors.
Flags: needinfo?(wmccloskey)
Also, you can use Super Search to look for these based on the "ipc channel error" field. There are an enormous number of them! Out of all our "return false" crashes, 97% are from PDocAccessible. 78% are from RecvShowEvent, 11% from TextChangeEvent, and 7% from StateChangeEvent.
Comment 12•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #10) > The problem is that DocAccessibleParent::RecvShowEvent is returning false. > That causes us to kill the child process. Since the Show message is async, > the child process might have been doing almost anything at this point, so > that's why the signatures are so random. > > It seems like bug 1172525 is a likely candidate for regressing this given > the range. I'm afraid I don't really understand the patch. could be your patch from bug 1100602 too (i haven't looked at the range) but I'd give that a fair chance of causing some of these. > In general, the only reason to return false from these functions is if the > child process is likely to be misbehaving or exploited. It shouldn't just be > for expected errors. I'm pretty sure all of these are cases of the child process being buggy. That said for now I'll change these to assert and return true without doing anything.
Assignee: nobody → tbsaunde+mozbugs
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Trevor Saunders (:tbsaunde) from comment #12) > could be your patch from bug 1100602 too (i haven't looked at the range) but > I'd give that a fair chance of causing some of these. I thought that too, but the crashes predate my patch. They start on 7/14 and the earliest nightly that would have had my patch is 7/17. > I'm pretty sure all of these are cases of the child process being buggy. > That said for now I'll change these to assert and return true without doing > anything. Okay, as long as the parent isn't going to just crash based on that.
Comment 14•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #13) > (In reply to Trevor Saunders (:tbsaunde) from comment #12) > > could be your patch from bug 1100602 too (i haven't looked at the range) but > > I'd give that a fair chance of causing some of these. > > I thought that too, but the crashes predate my patch. They start on 7/14 and > the earliest nightly that would have had my patch is 7/17. hmm I'm working on getting the dom fuzzer to work with e10s hopefully that will get me a test case and I can see what's happening here. > > I'm pretty sure all of these are cases of the child process being buggy. > > That said for now I'll change these to assert and return true without doing > > anything. > > Okay, as long as the parent isn't going to just crash based on that. I believe the parent should be fine. It should behave just like if those events never happened.
Comment 15•8 years ago
|
||
Attachment #8636604 -
Flags: review?(lorien)
Updated•8 years ago
|
Attachment #8636604 -
Flags: review?(lorien) → review+
Updated•8 years ago
|
Component: Layout → Disability Access APIs
Adding a tracking flag for FF42 as it is a topcrash.
![]() |
Reporter | |
Comment 19•8 years ago
|
||
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #6) > Could you file a bug on getting crash reporter to report this better? Yeah, this was a suboptimal debugging experience.
Crash Signature: , mozilla::Sides, int* const)]
[@ nsIFrame::MarkAbsoluteFramesForDisplayList(nsDisplayListBuilder*, nsRect const&)]
[@ nsIFrame::IsVisibleConsideringAncestors(unsigned int)]
[@ nsIFrame::HasOpacity()]
[@ nsIFrame::GetOverflowAreasProperty()] → , mozilla::Sides, int* const)]
[@ nsIFrame::MarkAbsoluteFramesForDisplayList(nsDisplayListBuilder*, nsRect const&)]
[@ nsIFrame::IsVisibleConsideringAncestors(unsigned int)]
[@ nsIFrame::HasOpacity()]
[@ nsIFrame::GetOverflowAreasProperty()]
[@ NtSet…
https://hg.mozilla.org/mozilla-central/rev/99e95508dc89 https://hg.mozilla.org/mozilla-central/rev/5c1f1b00ad67
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
![]() |
Reporter | |
Comment 23•8 years ago
|
||
The fix is looking good. I'm not seeing any reports of these crashes on the latest nightly so far.
![]() |
Reporter | |
Comment 24•8 years ago
|
||
Nightly crash volume is way down, but still I noticed this on the 0723 build: bp-d97663da-d8e1-4a7a-84d4-c98c82150723 reason = char [512] "(msgtype=0x4A0008,name=PDocAccessible::Msg_BindChildDoc) Processing error: message was deserialized, but the handler returned false (indicating failure)" Is it another manifestation of the same issue?
Flags: needinfo?(tbsaunde+mozbugs)
Comment 25•8 years ago
|
||
in some sense yeah, since we don't correctly keep the tree in the parent up to date any message that rellies on that can fail, and BindChildDoc is another such message. I'll prepare a patch to return true from that one :(
Flags: needinfo?(tbsaunde+mozbugs)
![]() |
||
Comment 26•8 years ago
|
||
BindDoc isn't very common, that ShoweEvent was the bad one. Here's a report showing how these failures get associated with child calls. http://www.mathies.com/mozilla/client-abort-report-nightly.txt
![]() |
||
Comment 27•8 years ago
|
||
(In reply to Trevor Saunders (:tbsaunde) from comment #25) > in some sense yeah, since we don't correctly keep the tree in the parent up > to date any message that rellies on that can fail, and BindChildDoc is > another such message. > > I'll prepare a patch to return true from that one :( I'm crashing a lot recently, and almost all of my crash reports have "(msgtype=0x4A0008,name=PDocAccessible::Msg_BindChildDoc) Processing error: message was deserialized, but the handler returned false (indicating failure)".
![]() |
||
Updated•8 years ago
|
Flags: needinfo?(tbsaunde+mozbugs)
![]() |
||
Comment 28•8 years ago
|
||
(In reply to Guilherme Lima from comment #27) > (In reply to Trevor Saunders (:tbsaunde) from comment #25) > > in some sense yeah, since we don't correctly keep the tree in the parent up > > to date any message that rellies on that can fail, and BindChildDoc is > > another such message. > > > > I'll prepare a patch to return true from that one :( > > I'm crashing a lot recently, and almost all of my crash reports have > "(msgtype=0x4A0008,name=PDocAccessible::Msg_BindChildDoc) Processing error: > message was deserialized, but the handler returned false (indicating > failure)". BindChildDoc is bug 1189277.
![]() |
||
Comment 29•8 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #28) > (In reply to Guilherme Lima from comment #27) > > (In reply to Trevor Saunders (:tbsaunde) from comment #25) > > > in some sense yeah, since we don't correctly keep the tree in the parent up > > > to date any message that rellies on that can fail, and BindChildDoc is > > > another such message. > > > > > > I'll prepare a patch to return true from that one :( > > > > I'm crashing a lot recently, and almost all of my crash reports have > > "(msgtype=0x4A0008,name=PDocAccessible::Msg_BindChildDoc) Processing error: > > message was deserialized, but the handler returned false (indicating > > failure)". > > BindChildDoc is bug 1189277. Ah, thanks!
Flags: needinfo?(tbsaunde+mozbugs)
Comment 30•8 years ago
|
||
OK, seems the crash is anyhow back. I use Win7, 64-bit and FF42.0a2, 64-bit with e10s. Seems I got this signature on shutdown of FF from the plugin-container.exe. As this crashes are ATM normally not send out, I guess it happens a lot more often then just the ~150 times that are in Socorro for now ... Here a overview about the crashes in Socorro: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsIFrame%3A%3AStyleDisplay%28%29 Here is my one: https://crash-stats.mozilla.com/report/index/34df7073-ea7f-4ee9-a8e2-a48ef2150906 And here the Crashing Thread from it: 0 xul.dll nsIFrame::StyleDisplay() obj-firefox/dist/include/nsStyleStructList.h 1 xul.dll nsFrame::DestroyFrom(nsIFrame*) layout/generic/nsFrame.cpp 2 xul.dll nsBoxFrame::DestroyFrom(nsIFrame*) layout/xul/nsBoxFrame.cpp 3 xul.dll nsContainerFrame::DestroyFrom(nsIFrame*) layout/generic/nsContainerFrame.cpp 4 xul.dll nsBoxFrame::DestroyFrom(nsIFrame*) layout/xul/nsBoxFrame.cpp 5 xul.dll nsContainerFrame::DestroyFrom(nsIFrame*) layout/generic/nsContainerFrame.cpp 6 xul.dll nsContainerFrame::DestroyFrom(nsIFrame*) layout/generic/nsContainerFrame.cpp 7 xul.dll nsFrameManager::Destroy() layout/base/nsFrameManager.cpp 8 xul.dll PresShell::Destroy() layout/base/nsPresShell.cpp 9 xul.dll nsDocumentViewer::DestroyPresShell() layout/base/nsDocumentViewer.cpp 10 xul.dll nsDocumentViewer::Destroy() layout/base/nsDocumentViewer.cpp 11 xul.dll nsDocShell::Destroy() docshell/base/nsDocShell.cpp 12 xul.dll nsWebBrowser::SetDocShell(nsIDocShell*) embedding/browser/nsWebBrowser.cpp 13 xul.dll nsWebBrowser::InternalDestroy() embedding/browser/nsWebBrowser.cpp 14 xul.dll nsWebBrowser::Destroy() embedding/browser/nsWebBrowser.cpp 15 xul.dll mozilla::dom::TabChild::DestroyWindow() dom/ipc/TabChild.cpp 16 xul.dll mozilla::dom::TabChild::RecvDestroy() dom/ipc/TabChild.cpp 17 xul.dll mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PBrowserChild.cpp 18 xul.dll mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PContentChild.cpp 19 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const&) ipc/glue/MessageChannel.cpp 20 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp 21 xul.dll RunnableMethod<mozilla::ipc::MessageChannel, void ( mozilla::ipc::MessageChannel::*)(void), Tuple0>::Run() ipc/chromium/src/base/task.h 22 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc 23 xul.dll mozilla::ipc::DoWorkRunnable::Run() ipc/glue/MessagePump.cpp 24 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 25 xul.dll XPTC__InvokebyIndex xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm 26 @0xfff8ffffffffffff 27 xul.dll nsXPConnect::XPConnect() js/xpconnect/src/xpcprivate.h 28 xul.dll XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) js/xpconnect/src/XPCWrappedNative.cpp 29 xul.dll XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) js/xpconnect/src/XPCWrappedNativeJSOps.cpp 30 @0x9da2121c4b
Status: RESOLVED → REOPENED
Hardware: x86 → All
Resolution: FIXED → ---
Updated•8 years ago
|
Crash Signature: , nsRect const&)]
[@ nsIFrame::IsVisibleConsideringAncestors(unsigned int)]
[@ nsIFrame::HasOpacity()]
[@ nsIFrame::GetOverflowAreasProperty()]
[@ NtSetIoCompletion ]
[@ ZwSetIoCompletion ] → , nsRect const&)]
[@ nsIFrame::IsVisibleConsideringAncestors(unsigned int)]
[@ nsIFrame::HasOpacity()]
[@ nsIFrame::GetOverflowAreasProperty()]
[@ NtSetIoCompletion ]
[@ ZwSetIoCompletion ]
[@ nsIFrame::StyleDisplay]
[@ nsIFrame::BuildDisplayListFo…
Updated•8 years ago
|
Blocks: shutdownkill
status-firefox43:
--- → ?
status-firefox44:
--- → ?
status-firefox45:
--- → ?
OS: Windows NT → Windows
Whiteboard: ShutDownKill
Comment 31•7 years ago
|
||
42 is gone.
From the crash signature nsIFrame::GetOverflowAreasProperty, the current affected versions are: - Nightly: 47 - Aurora: 44, 46 - Beta: 44.0b1, 44.0b99, 45.0b1 - Release: 44.0 From the crash signature nsIFrame::IsVisibleConsideringAncestors, the current affected versions are: - Beta: 44.0b9, 44.0b99, 45.0b1, 45.0b2 - Release: 44.0 From the crash signature nsIFrame::HasOpacity, the current affected versions are: - Beta: 44.0b1, 44.0b7, 44.0b99, 45.0b1 - Release: 44.0 From the crash signature nsIFrame::StyleDisplay, the current affected versions are: - Aurora: 44, 46 - Beta: 44.0b1, 44.0b6, 44.0b8, 44.0b9, 44.0b99, 45.0b1, 45.0b2 - Release: 44.0 From the crash signature nsIFrame::MarkAbsoluteFramesForDisplayList, the current affected versions are: - Beta: 45.0b1, 45.0b2 From the crash signature nsIFrame::BuildDisplayListForChild, the current affected versions are: - Nightly: 46, 47 - Aurora: 44, 46 - Beta: 44.0b1, 44.0b2, 44.0b8, 44.0b99, 45.0b1, 45.0b2 - Release: 44.0 From the crash signature nsIFrame::ComputeBorderRadii, the current affected versions are: - Nightly: 45 - Beta: 44.0b99 - Release: 44.0 From the crash signature NtSetIoCompletion, the current affected versions are: - Nightly: 44, 45 - Aurora: 44 - Beta: 44.0b9 From the crash signature ZwSetIoCompletion and nsIFrame::InlinePrefISizeData::ForceBreak the current affected versions are: - Nightly: 45 From the crash signature nsIFrame::BuildDisplayListForChild, nsIFrame::GetOffsetTo, nsIFrame::GetOffsetToCrossDoc, nsIFrame::GetPositionOfChildIgnoringScrolling and nsIFrame::SetStyleContext the current affected versions are: - Nightly: 47
Updated•7 years ago
|
No longer blocks: shutdownkill
Updated•6 years ago
|
Assignee: tbsaunde+mozbugs → nobody
Comment 33•5 years ago
|
||
I do see some crashes like nsIFrame::IsVisibleConsideringAncestors, but I don't longer see a11y on stack, do you? If not, then it's worth to open a new bug for those crashes, and close this one.
Flags: needinfo?(brindusa.tot)
Comment 34•5 years ago
|
||
Hi Alexander, From the list of the crashes from the bug, under the mozilla crash reports, the followings are still reproducing in the last 7 days: - nsIFrame::StyleDisplay - nsIFrame::BuildDisplayListForChild - nsIFrame::ComputeBorderRadii - nsIFrame::IsVisibleConsideringAncestors - nsIFrame::HasOpacity - nsIFrame::GetOverflowAreasProperty - ZwSetIoCompletion - nsIFrame::MarkAbsoluteFramesForDisplayList For me, it's not clear why we should open a new bug for these crashes, considering that the above-mentioned crashes are still reproducible. Couldn't we continue tracking these failures in this bug?
Flags: needinfo?(brindusa.tot) → needinfo?(surkov.alexander)
Comment 35•5 years ago
|
||
(In reply to Brindusa Tot[:brindusat] from comment #34) > Hi Alexander, > > From the list of the crashes from the bug, under the mozilla crash reports, > the followings are still reproducing in the last 7 days: > - nsIFrame::StyleDisplay > - nsIFrame::BuildDisplayListForChild > - nsIFrame::ComputeBorderRadii > - nsIFrame::IsVisibleConsideringAncestors > - nsIFrame::HasOpacity > - nsIFrame::GetOverflowAreasProperty > - ZwSetIoCompletion > - nsIFrame::MarkAbsoluteFramesForDisplayList > > For me, it's not clear why we should open a new bug for these crashes, > considering that the above-mentioned crashes are still reproducible. > Couldn't we continue tracking these failures in this bug? I suggested another bug because these crashes look different from those reported initially. Granted, they have same signatures, but their stacks are different, because I don't see a11y on the stacks any longer. If you like to keep this bug open, then it's fine for sure, just the bug's component should be changed.
Flags: needinfo?(surkov.alexander) → needinfo?(brindusa.tot)
Comment 36•5 years ago
|
||
I would say, just update the component with the correct one and keep this bug opened.
Flags: needinfo?(brindusa.tot)
Updated•5 years ago
|
Component: Disability Access APIs → Layout
Comment 37•5 years ago
|
||
I don't see much value in keeping this bug open and tracking the crashes here. The bug was originally opened for accessibility issue (see comment 3 onward). As far as the crashes no longer have a11y bits in their reports, the new crashes seem to be unrelated, and thus the discussion in this bug may not provide useful information anymore. Furthermore, this bug contains lots of effectively unrelated signatures, which may actually belong to different bugs, and they happen to be connected to this bug just because there was a bug which can end up causing crashes with these signatures. So counting new crashes into this bug is not helpful, and can be confusing. Given comment 30 and the fact that we don't see a11y bits in those crashes recently, I'd say this is a WORKSFORME since we don't actually know what fixed the a11y issue to have it stop triggering crashes here. Please open new bugs for any signature which are still outstanding here.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 5 years ago
Component: Layout → Disability Access APIs
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•