Closed Bug 1631276 Opened 4 years ago Closed 4 years ago

Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::a11y::PDocAccessibleParent::SendChildAtPoint]

Categories

(Core :: Disability Access APIs, defect, P1)

76 Branch

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox75 --- unaffected
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: philipp, Assigned: Jamie)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-44369265-17da-4a1c-896a-7a8aa0200419.

Top 10 frames of crashing thread:

0 XUL mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:160
1 XUL mozilla::ipc::IProtocol::HandleFatalError const ipc/glue/ProtocolUtils.cpp:399
2 XUL mozilla::a11y::PDocAccessibleParent::SendChildAtPoint ipc/ipdl/PDocAccessibleParent.cpp
3 XUL mozilla::a11y::ProxyAccessible::ChildAtPoint accessible/ipc/other/ProxyAccessible.cpp:796
4 XUL -[mozAccessible accessibilityHitTest:] accessible/mac/mozAccessible.mm:526
5 XUL -[mozAccessible accessibilityHitTest:] accessible/mac/mozAccessible.mm:519
6 AppKit -[NSWindow accessibilityHitTest:] 
7 AppKit -[NSApplication accessibilityHitTest:] 
8 AppKit CopyElementAtPosition 
9 HIServices _AXXMIGCopyElementAtPosition 

This crash signature from macOS users started showing up in Firefox 76, perhaps related to bug 1598299. It's only affecting Nightly and DevEdition users, though.

Blocks: a11y-fission
Priority: -- → P1

:jamie, if it's a P1 can you get someone to work on it?

Flags: needinfo?(jteh)

I've been trying to figure out what's going on here, but I'm honestly at a bit of a loss. I'll keep trying. Crash volume is pretty low (15 crashes so far), but I'll leave it as P1 for now.

Assignee: nobody → jteh
Flags: needinfo?(jteh)

This is still low volume. I'm wondering whether this will just go away with the Mac AccessibleOrProxy refactor (bug 1632252). I'm inclined to wait for that and see what happens.

Blocks: 1632252
Severity: -- → S3

This is still super low volume, but I see we have some crashes on Linux such as this one:
bp-c1626022-2d36-46c9-88b4-6b6450200610
as well as crashes on Mac after bug 1632252 such as this one:
bp-77d03964-76f2-4a56-b8a7-ef5e00200610
So this is still a problem and not Mac specific.

The crash is caused by the IPC error "Error deserializing 'PDocAccessible'". I assume that's the aResultDoc return for PDocAccessible::ChildAtPoint. So, somehow, the result document actor is bad.

The result doc is fetched by calling Document()->IPCDoc() on the hit test target accessible. I don't understand how this could be invalid. If it is dying, IPCDoc() should have returned null.

No longer blocks: 1632252
OS: macOS → All
Hardware: x86_64 → All

Nika, do you know if the "Error deserializing 'PDocAccessible'" fatal IPC error could be caused by the following scenario?

  1. The DocAccessibleChild gets created in the content process.
  2. We send the constructor to the parent process, which I assume is an async message. So, as far as the content process is concerned, the actor is all set up. (Or is there some construction handshake between content and parent here?)
  3. The parent process doesn't process the queue yet. Instead, it answers an a11y query, which makes a sync IPC call.
  4. The sync IPC call sends back the actor created in (1) as a return value.
  5. Because (3) is sync IPC, the parent process tries to deserialise (4), even though it hasn't yet processed the async constructor sent in (2).

If this is plausible, is there some way I can detect this?
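
To make the suspected ordering concrete, here is a toy model of steps 1 to 5. It is only a sketch, not Gecko code: `ParentProcess`, `ContentProcess`, `ActorId` and `SyncChildAtPoint` are hypothetical stand-ins, and the real IPC layer is reduced to a queue and a set.

```cpp
#include <cassert>
#include <deque>
#include <set>

// Hypothetical stand-ins for illustration; none of these are Gecko types.
using ActorId = int;

struct ParentProcess {
  std::set<ActorId> knownActors;     // actors whose constructor was processed
  std::deque<ActorId> pendingCtors;  // async constructor messages still queued

  // What the parent's event loop would eventually do with queued messages.
  void ProcessAsyncQueue() {
    while (!pendingCtors.empty()) {
      knownActors.insert(pendingCtors.front());
      pendingCtors.pop_front();
    }
  }

  // Deserialising an actor only works once its constructor was processed.
  bool DeserializeActor(ActorId id) const { return knownActors.count(id) > 0; }
};

struct ContentProcess {
  explicit ContentProcess(ParentProcess& p) : parent(p) {}

  // Steps 1 and 2: create the actor locally and queue the async constructor.
  ActorId CreateDocActor() {
    ActorId id = nextId++;
    parent.pendingCtors.push_back(id);
    return id;
  }

  ParentProcess& parent;
  ActorId nextId = 1;
};

// Steps 3 to 5: the sync reply carries the actor and is deserialised
// immediately, regardless of what is still sitting in the async queue.
bool SyncChildAtPoint(const ParentProcess& parent, ActorId resultDoc) {
  assert(resultDoc > 0);  // IDs handed out by ContentProcess start at 1
  return parent.DeserializeActor(resultDoc);
}
```

In this model, a sync reply that arrives before `ProcessAsyncQueue()` runs fails to deserialise, which would correspond to the fatal error seen in the crash reports; the same reply succeeds once the queue has been drained.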

Flags: needinfo?(nika)

I guess I could create my own constructor response message, send it from parent once construction is done, set a flag on the DocAccessibleChild when it's received and only allow this sync message to send a DocAccessibleChild as a return value if that flag is set. That feels really yucky, though, and creates more IPC traffic which won't even be useful 99% of the time. I'm really hoping there's another way...
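
Roughly, the flag idea would look like the sketch below. This is an assumption-laden illustration rather than the actual patch: `DocChild`, its members and `SafeResultDoc` are hypothetical names, and falling back to the nearest acknowledged ancestor document is just one possible policy.

```cpp
#include <cassert>

// Hypothetical model of a content-process document actor.
struct DocChild {
  DocChild* parentDoc = nullptr;     // embedding document, if any
  bool constructedInParent = false;  // set once the parent's ack arrives

  // Would be called when the (hypothetical) ack message is received.
  void SetConstructedInParentProcess() { constructedInParent = true; }
  bool IsConstructedInParentProcess() const { return constructedInParent; }
};

// Pick a document that is safe to send back in the sync reply.
// Assumption: if the hit document has not been acknowledged yet, walk up
// to the nearest ancestor that has; return nullptr if there is none.
DocChild* SafeResultDoc(DocChild* hit) {
  assert(hit != nullptr);  // the caller passes a real hit test target
  while (hit && !hit->IsConstructedInParentProcess()) {
    hit = hit->parentDoc;
  }
  return hit;
}
```

The cost is one extra message per document, but the sync handler then only needs a cheap flag check before returning an actor.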

(In reply to James Teh [:Jamie] from comment #5)
> Nika, do you know if the "Error deserializing 'PDocAccessible'" fatal IPC error could be caused by the following scenario?

Yes, that can definitely happen. It's a big issue with sync IPC: ordering guarantees are completely broken as soon as there are nested sync IPC messages. That's one of the reasons why we're trying to get rid of nested sync IPC, which is (mostly) only used in a11y and NPAPI nowadays.

I can think of 3 ways to do this:

  1. The ack message you mentioned in comment 6
  2. Sending an ID rather than the actor in the message, and looking it up manually in the parent process through some other mechanism, or
  3. Using epochs in the ChildAtPoint message and the PDocAccessible constructor to keep track of the most recent PDocAccessible the parent process has seen so far.

The easiest option is probably the ack message.
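
For comparison, option 2 could be sketched as follows. This is a minimal illustration with hypothetical names (`DocRegistry`, `DocParent`), assuming the parent keeps its own table of documents keyed by an integer ID; it does not reflect any existing Gecko API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical parent-side record for a document.
struct DocParent {
  uint64_t id;
};

// Hypothetical lookup table maintained by the parent process.
class DocRegistry {
 public:
  // Called once the parent has processed the document's constructor.
  void Register(uint64_t id) {
    assert(docs_.count(id) == 0);  // each ID is registered only once
    docs_.emplace(id, DocParent{id});
  }

  // An unknown ID yields nullptr, so the caller can treat "not constructed
  // yet" as a recoverable miss instead of a fatal deserialisation error.
  const DocParent* Lookup(uint64_t id) const {
    auto it = docs_.find(id);
    return it == docs_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<uint64_t, DocParent> docs_;
};
```

The sync reply would then carry only the ID, and the parent decides what to do when the lookup misses.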

Flags: needinfo?(nika)
Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/26a4a3ed96f5
part 1: Move Set/GetIsConstructedInParentProcess into DocAccessibleChildBase. r=eeejay
https://hg.mozilla.org/integration/autoland/rev/33231d9cbc84
part 2: On non-Windows, have the parent process notify the content process when the DocAccessibleParent is constructed. r=eeejay
https://hg.mozilla.org/integration/autoland/rev/f75850ba8c68
part 3: Don't return a descendant document in DocAccessibleChild::RecvChildAtPoint unless we're certain that document has been constructed in the parent process. r=eeejay
Has Regression Range: --- → yes
