Closed Bug 1669748 Opened 4 years ago Closed 4 years ago

Fission crash in [@ mozilla::a11y::DocAccessibleParent::AddChildDoc]

Categories

(Core :: Disability Access APIs, defect)

All
Windows 10
defect

Tracking

()

VERIFIED FIXED
86 Branch
Fission Milestone M6c
Tracking Status
firefox-esr78 --- unaffected
firefox84 --- disabled
firefox85 --- disabled
firefox86 + verified

People

(Reporter: sg, Assigned: Jamie)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/df8d0a4b-4991-47e2-be48-ad95a0201006

Reason: EXCEPTION_BREAKPOINT
MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(aBasePtr)

Top 10 frames of crashing thread:

0 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:652
1 xul.dll mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor dom/ipc/BrowserParent.cpp:1244
2 xul.dll mozilla::dom::PBrowserParent::OnMessageReceived ipc/ipdl/PBrowserParent.cpp:2790
3 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:6643
4 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2074
5 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1953
6 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:514
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327

Unfortunately, this stack is missing critical frames. The stack from WinDBG has a few extra, but is still missing critical frames:

00 (Inline Function) --------`-------- xul!mozilla::UniquePtr<IStream,mozilla::mscom::detail::PreservedStreamDeleter>::reset+0x2ad [/builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h @ 302] 
01 (Inline Function) --------`-------- xul!mozilla::UniquePtr<IStream,mozilla::mscom::detail::PreservedStreamDeleter>::operator=+0x2ad [/builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h @ 256] 
02 0000000b`1c5fca50 00007ffe`b012bc54 xul!mozilla::a11y::DocAccessibleParent::AddChildDoc(class mozilla::a11y::DocAccessibleParent * aChildDoc = 0x000001f3`e354a820, unsigned int64 aParentID = <Value unavailable error>, bool aCreating = <Value unavailable error>)+0x643 [/builds/worker/checkouts/gecko/accessible/ipc/DocAccessibleParent.cpp @ 652] 
03 0000000b`1c5fcb30 00007ffe`ae716147 xul!mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor(class mozilla::a11y::PDocAccessibleParent * aDoc = 0x000001f3`e354a820, class mozilla::a11y::PDocAccessibleParent * aParentDoc = <Value unavailable error>, unsigned int64 * aParentID = 0x0000000b`1c5fcce0, unsigned int * aMsaaID = 0x0000000b`1c5fccb0, class mozilla::mscom::COMPtrHolder<IAccessible,IID_IAccessible> * aDocCOMProxy = 0x0000000b`1c5fce20)+0x184 [/builds/worker/checkouts/gecko/dom/ipc/BrowserParent.cpp @ 1247] 

From what I can find, MOZ_RELEASE_ASSERT(aBasePtr) only happens in NotNull... but the top of the stack is calling the deleter for a PreservedStreamPtr. I can't work out where PreservedStreamPtr would end up using NotNull, nor can I figure out why calling DocAccessibleParent::SendEmulatedWindowHandle (as apparently happens in frame 2) would delete a PreservedStreamPtr. Worse, the code in frame 2 can only run if window emulation is started, but the crash report shows that the accessibility inproc client is 0x400 (UNKNOWN), which means window emulation shouldn't be started.

So... I'm at a bit of a loss here.

Severity: -- → S3

Maybe this can be ignored; there have been very few reports so far. At first sight, I thought it might be actionable since it involves an assertion failure. I gave it a second look in the debugger, and I cannot make sense of it either. Both aChildDoc and this in the top non-inline frame also look good; I found no indication of a UAF or similar.

This has spiked in the past few days.

I've been getting this somewhat regularly for about the last day and a half. Is there anything I can do to help here?

Thanks for the offer of help, Justin. I've spent quite a bit of time trying to fathom this, but still haven't had any luck. As I noted in comment 1, the info we're getting from crash dumps just doesn't make any sense. I'm honestly not sure what to suggest yet.

Is there any pattern to these crashes for you? Is there some website or task that is more likely to cause it than others?

Flags: needinfo?(1justinpeter)

Unfortunately I haven't noticed much of a pattern myself, but I'll let you know what I can. (Obvious disclaimers about anecdotal evidence apply.)

  • It doesn't seem to matter what website I'm on; I've had it across a variety of websites. (I check the box to report what website I was on when it's not something too obscure/potentially identifiable.)
  • I usually seem to go for somewhere between half an hour and an hour or two after starting Firefox before it'll crash; however, after it crashes once it seems far more likely to trigger another crash when restarting/reopening tabs. (Not 100%, but saying at least 50% probably isn't unreasonable. My speculation is that it depends on whether restarting results in being logged out of a site or otherwise loading a significantly different page.) After closing and restarting Firefox normally, going to the same tabs doesn't immediately trigger a crash. I don't know if there would be any good way to capture more useful debug information starting directly after a crash so that I could catch the few-second window where it's more likely to crash again.
  • Crashes seem to happen disproportionately often (but not exclusively) just before a page finishes loading.
  • Even more anecdotal, I just had a crash very shortly after waking my computer from sleep; however, that could just be coincidence. With the point above, however, I'm fairly certain that the crash requires action of some kind to trigger it; it doesn't seem to happen due to anything happening in the background.

Not sure if any of that is helpful, but if there's anything else I can do please let me know.

Flags: needinfo?(1justinpeter) → needinfo?(jteh)

I looked into this a bit more and can now reproduce it reliably. I don't know whether this is specific to my system (I don't have another to test it on at the moment) or which websites it works with (the steps below work reliably, but presumably some other sites will work as well). I leave a few seconds (5 or so is enough) between each step.

  1. Open Firefox
  2. Navigate to phoronix.com
  3. Close the lid of the laptop (sleep)
  4. Open and log in
  5. Click the "featured article" near the top

Slight clarification (sorry for so many posts in a row, I can't edit them): putting the computer to sleep (any method appears to work) is not necessary for the crash, but it is necessary to make the reproduction steps above reliable (at least for me). Also, for step 5 clicking on any article works.

Thanks Justin. I really appreciate the time and effort you've put into pinning this down. Unfortunately, I've tried many times and cannot get it to happen on my system, so I'm back to square one. Hopefully I can think of something else...

Flags: needinfo?(jteh)

Is there any way I could grab some sort of more verbose log or anything that could be of use to you? I feel like there has to be some way.... It looks like rr is only a Linux tool, and it looks like this is a Windows-only issue.

The COM proxy for a DocAccessibleParent at the top level in its content process really should never be null.
However, some systems seem to have a broken COM configuration which causes problems like this.
This does mean a11y is broken, but users who get a11y enabled because of something other than an AT (e.g. touch screen) probably aren't even aware.
Regardless, we shouldn't crash.
Instead, we assert (in debug builds) and null check.
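
As a rough illustration of the "assert in debug builds and null check" pattern described above, here is a minimal, self-contained sketch. It is not the actual patch and does not use Gecko's real types: FakeProxy, ChildDoc, SetParentProxy, SKETCH_ASSERT, and the SKETCH_DEBUG macro are all hypothetical names invented for this example. The assumption is simply that a proxy pointer arrives from another process and may unexpectedly be null on a misconfigured system.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM proxy interface (NOT Gecko's IAccessible).
struct FakeProxy {
  int id;
};

// Debug-only assertion: it fires only when SKETCH_DEBUG is defined, loosely
// mimicking an assertion that is compiled out of release builds.
// SKETCH_DEBUG is an assumption made for this sketch.
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) assert(cond)
#else
#define SKETCH_ASSERT(cond) ((void)0)
#endif

// Hypothetical child document that stores the proxy it receives.
struct ChildDoc {
  FakeProxy* parentProxy = nullptr;

  // Returns true if the proxy was stored, false if it was null.
  bool SetParentProxy(FakeProxy* proxy) {
    // Flag the unexpected state loudly for developers in debug builds...
    SKETCH_ASSERT(proxy != nullptr);
    // ...but in release builds, bail out gracefully instead of crashing.
    if (!proxy) {
      return false;
    }
    parentProxy = proxy;
    return true;
  }
};
```

With SKETCH_DEBUG undefined, calling SetParentProxy(nullptr) simply returns false and leaves parentProxy untouched, so accessibility stays broken for that document but the process keeps running. With SKETCH_DEBUG defined, the same call trips the assertion so the bad state is noticed during development.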

Assignee: nobody → jteh
Status: NEW → ASSIGNED

The patch above is somewhat speculative, since I can't reproduce this and the stack is somewhat nonsensical. Still, I think this is worth a try.

All the crash reports have Fission enabled, which is only available in Nightly. I don't expect we'll need to uplift a fix to Beta.

Crash Signature: [@ mozilla::a11y::DocAccessibleParent::AddChildDoc] → [@ mozilla::a11y::DocAccessibleParent::AddChildDoc] [@ mozilla::WrapNotNull<T> | mozilla::a11y::DocAccessibleParent::AddChildDoc]
Fission Milestone: --- → M6c
Hardware: Unspecified → All
Summary: Crash in [@ mozilla::a11y::DocAccessibleParent::AddChildDoc] → Fission crash in [@ mozilla::a11y::DocAccessibleParent::AddChildDoc]
Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/739437f45d3b Null check the document IAccessibles sent for OOP iframes. r=yzen
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch

I can no longer reproduce the crash on the latest nightly.

Thanks Justin.

Status: RESOLVED → VERIFIED
See Also: → 1713680
See Also: → 1737193