Fission crash in [@ mozilla::a11y::DocAccessibleParent::AddChildDoc]
Categories
(Core :: Disability Access APIs, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox84 | --- | disabled |
firefox85 | --- | disabled |
firefox86 | + | verified |
People
(Reporter: sg, Assigned: Jamie)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/df8d0a4b-4991-47e2-be48-ad95a0201006
Reason: EXCEPTION_BREAKPOINT
MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(aBasePtr)
Top 10 frames of crashing thread:
0 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:652
1 xul.dll mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor dom/ipc/BrowserParent.cpp:1244
2 xul.dll mozilla::dom::PBrowserParent::OnMessageReceived ipc/ipdl/PBrowserParent.cpp:2790
3 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:6643
4 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2074
5 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1953
6 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:514
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327
Assignee
Comment 1•4 years ago
Unfortunately, this stack is missing critical frames. The stack from WinDBG has a few extra, but is still missing critical frames:
00 (Inline Function) --------`-------- xul!mozilla::UniquePtr<IStream,mozilla::mscom::detail::PreservedStreamDeleter>::reset+0x2ad [/builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h @ 302]
01 (Inline Function) --------`-------- xul!mozilla::UniquePtr<IStream,mozilla::mscom::detail::PreservedStreamDeleter>::operator=+0x2ad [/builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h @ 256]
02 0000000b`1c5fca50 00007ffe`b012bc54 xul!mozilla::a11y::DocAccessibleParent::AddChildDoc(class mozilla::a11y::DocAccessibleParent * aChildDoc = 0x000001f3`e354a820, unsigned int64 aParentID = <Value unavailable error>, bool aCreating = <Value unavailable error>)+0x643 [/builds/worker/checkouts/gecko/accessible/ipc/DocAccessibleParent.cpp @ 652]
03 0000000b`1c5fcb30 00007ffe`ae716147 xul!mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor(class mozilla::a11y::PDocAccessibleParent * aDoc = 0x000001f3`e354a820, class mozilla::a11y::PDocAccessibleParent * aParentDoc = <Value unavailable error>, unsigned int64 * aParentID = 0x0000000b`1c5fcce0, unsigned int * aMsaaID = 0x0000000b`1c5fccb0, class mozilla::mscom::COMPtrHolder<IAccessible,IID_IAccessible> * aDocCOMProxy = 0x0000000b`1c5fce20)+0x184 [/builds/worker/checkouts/gecko/dom/ipc/BrowserParent.cpp @ 1247]
From what I can find, MOZ_RELEASE_ASSERT(aBasePtr) only happens in NotNull, but the top of the stack is calling the deleter for a PreservedStreamPtr. I can't work out where PreservedStreamPtr would end up using NotNull, nor can I figure out why calling DocAccessibleParent::SendEmulatedWindowHandle (as apparently happens in frame 2) would delete a PreservedStreamPtr. Worse, the code in frame 2 can only run if window emulation is started, but the crash report shows that the accessibility in-process client is 0x400 (UNKNOWN), which means window emulation shouldn't be started.
So... I'm at a bit of a loss here.
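For readers unfamiliar with the pattern, here is a minimal sketch of how a NotNull-style wrapper can produce a release assertion on a parameter named aBasePtr. It is not the actual Gecko code: SKETCH_RELEASE_ASSERT, SketchNotNull, SketchWrapNotNull and SketchNotNullDemo are hypothetical names invented for illustration, and the real implementation may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a release assertion: it aborts in every build
// type, not only in debug builds.
#define SKETCH_RELEASE_ASSERT(cond)                              \
  do {                                                           \
    if (!(cond)) {                                               \
      std::fprintf(stderr, "Assertion failure: %s\n", #cond);    \
      std::abort();                                              \
    }                                                            \
  } while (0)

// Hypothetical wrapper that refuses to hold a null pointer.
template <typename T>
class SketchNotNull {
 public:
  explicit SketchNotNull(T aBasePtr) : mBasePtr(aBasePtr) {
    // Wrapping a null pointer trips the release assertion here.
    SKETCH_RELEASE_ASSERT(aBasePtr);
  }
  T get() const { return mBasePtr; }

 private:
  T mBasePtr;
};

// Hypothetical helper that deduces the pointer type for the caller.
template <typename T>
SketchNotNull<T> SketchWrapNotNull(T aBasePtr) {
  return SketchNotNull<T>(aBasePtr);
}

// Usage: wrapping a valid pointer succeeds and the pointer is preserved.
inline int SketchNotNullDemo() {
  int value = 42;
  SketchNotNull<int*> wrapped = SketchWrapNotNull(&value);
  assert(wrapped.get() == &value);
  return *wrapped.get();
}
```

With a wrapper like this, any call path that wraps a null pointer would abort with a message naming aBasePtr, even if the wrapping call is inlined and does not appear as its own stack frame.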
Reporter
Comment 2•4 years ago
Maybe this can be ignored; there have been only a few reports so far. At first sight, I thought it might be actionable since it involves an assertion failure. I have now given it a second look in the debugger, and I cannot make sense of it either. Both aChildDoc and this in the top non-inline frame also look good; I found no indication of a UAF or similar.
Comment 3•4 years ago
I just saw this several times on a try push:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319172495&repo=try&lineNumber=1786
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319172495&repo=try&lineNumber=7623
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319172496&repo=try&lineNumber=1787
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319172496&repo=try&lineNumber=7606
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319173091&repo=try&lineNumber=1788
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319173091&repo=try&lineNumber=7590
Comment 4•4 years ago
This has spiked in the past few days.
Comment 5•4 years ago
I've been getting this somewhat regularly for about the last day and a half, is there anything I can do to help here?
Assignee
Comment 6•4 years ago
Thanks for the offer of help, Justin. I've spent quite a bit of time trying to fathom this, but still haven't had any luck. As I noted in comment 1, the info we're getting from crash dumps just doesn't make any sense. I'm honestly not sure what to suggest yet.
Is there any pattern to these crashes for you? Is there some website or task that is more likely to cause it than others?
Comment 7•4 years ago
Unfortunately I haven't noticed much of a pattern myself, but I'll let you know what I can. (Obvious disclaimers about anecdotal evidence apply.)
- It doesn't seem to matter what website I'm on; I've had it across a variety of websites. (I check the box to report what website I was on when it's not something too obscure/potentially identifiable.)
- I usually seem to go for somewhere between half an hour and an hour or two after starting Firefox before it'll crash; however, after it crashes once it seems far more likely to trigger another crash when restarting/reopening tabs. (Not 100%, but saying at least 50% probably isn't unreasonable. My speculation is that it has something to do with whether restarting results in being logged out of a site or otherwise loading a significantly different page.) After closing and restarting Firefox normally, going to the same tabs doesn't immediately trigger a crash. I don't know if there would be any good way to capture more useful debug information starting directly after a crash so that I could catch the few-second window where it's more likely to crash again.
- Crashes seem to happen disproportionately often (but not exclusively) just before a page finishes loading.
- Even more anecdotal, I just had a crash very shortly after waking my computer from sleep; however, that could just be coincidence. With the point above, however, I'm fairly certain that the crash requires action of some kind to trigger it; it doesn't seem to happen due to anything happening in the background.
Not sure if any of that is helpful, but if there's anything else I can do please let me know.
Comment 8•4 years ago
I looked into this a bit more and now I can reproduce it reliably. I don't know whether this is specific to my system (I don't have another to test it on at the moment) or which websites this will work with (the below works reliably, but obviously some other sites will work as well), and I put a few seconds (5 or so is enough) between each step.
1. Open Firefox
2. Navigate to phoronix.com
3. Close the lid of the laptop (sleep)
4. Open and log in
5. Click the "featured article" near the top
Comment 9•4 years ago
Slight clarification (sorry for so many posts in a row, I can't edit them): putting the computer to sleep (any method appears to work) is not necessary for the crash, but it is necessary to make the reproduction steps above reliable (at least for me). Also, for step 5 clicking on any article works.
Assignee
Comment 10•4 years ago
Thanks Justin. I really appreciate the time and effort you've put into pinning this down. Unfortunately, I've tried many times and cannot get it to happen on my system, so I'm back to square 0. Hopefully I can think of something else...
Comment 11•4 years ago
Is there any way I could grab some sort of more verbose log or anything that could be of use to you? I feel like there has to be some way.... It looks like rr is only a Linux tool, and it looks like this is a Windows-only issue.
Assignee
Comment 12•4 years ago
The COM proxy for a DocAccessibleParent at the top level in its content process really should never be null.
However, some systems seem to have a broken COM configuration which causes problems like this.
This does mean a11y is broken, but users who get a11y enabled because of something other than an AT (e.g. touch screen) probably aren't even aware.
Regardless, we shouldn't crash.
Instead, we assert (in debug builds) and null check.
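A minimal sketch of the "assert in debug builds, null check otherwise" pattern described above. It is illustrative only and is not the actual patch: SketchProxy, SketchDocParent, SketchSendHandle, SKETCH_DEBUG and SKETCH_ASSERT are hypothetical names, and the real code paths in DocAccessibleParent may look quite different.

```cpp
#include <cassert>

// Hypothetical debug-only assertion: active only when SKETCH_DEBUG is
// defined, compiled out otherwise.
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) assert(cond)
#else
#define SKETCH_ASSERT(cond) ((void)0)
#endif

// Hypothetical stand-in for a COM proxy object.
struct SketchProxy {
  int id;
};

// Hypothetical stand-in for the parent-process document object.
struct SketchDocParent {
  SketchProxy* mProxy = nullptr;  // may be null on a broken COM setup
  bool mHandleSent = false;
};

// Returns true if the proxy was usable, false if the step was skipped.
inline bool SketchSendHandle(SketchDocParent& aDoc) {
  // Debug builds flag the unexpected state so developers notice it.
  SKETCH_ASSERT(aDoc.mProxy);
  // Release builds skip the work instead of crashing.
  if (!aDoc.mProxy) {
    return false;
  }
  aDoc.mHandleSent = true;
  return true;
}
```

The design choice is that a missing proxy is treated as a bug worth surfacing during development, while end users on a misconfigured system lose accessibility features for that document but keep a running browser.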
Assignee
Comment 13•4 years ago
The patch above is somewhat speculative, since I can't reproduce this and the stack is somewhat nonsensical. Still, I think this is worth a try.
Comment 14•4 years ago
All the crash reports have Fission enabled, which is only available in Nightly. I don't expect we'll need to uplift a fix to Beta.
Comment 15•4 years ago
Comment 16•4 years ago
bugherder
Comment 17•4 years ago
I can no longer reproduce the crash on the latest nightly.
Comment 19•3 years ago