Closed Bug 1679753 Opened 3 years ago Closed 3 years ago

Crash in [@ IPCError-browser | AddChildDoc binding to nonexistant proxy!]

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
90 Branch
Fission Milestone MVP
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- disabled
firefox84 --- disabled
firefox85 --- disabled
firefox88 --- disabled
firefox89 --- disabled
firefox90 --- fixed

People

(Reporter: aryx, Assigned: Jamie)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Almost all crashes running on Windows 10, all on Nightly with Fission enabled.

Previous instance of this crash signature: bug 1574286

Crash report: https://crash-stats.mozilla.org/report/index/80169e22-82b7-4e55-96c4-509bd0201122

Reason: EXCEPTION_BREAKPOINT

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1143
5 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
6 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
8 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
9 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:602
  1. 100% of the (45) crashes from 81 - 85 Nightly have Fission enabled. Let's track this for Fission MVP to monitor the crash volume.

  2. Is the stack trace in comment 0 really from the crashing thread or process?

  3. btw, "nonexistant" should be spelled "nonexistent".

Blocks: 1574286
Fission Milestone: --- → MVP
Summary: [Fission] Crash in [@ IPCError-browser | AddChildDoc binding to nonexistant proxy!] → Crash in [@ IPCError-browser | AddChildDoc binding to nonexistant proxy!]

(In reply to Chris Peterson [:cpeterson] from comment #1)

  1. Is the stack trace in comment 0 really from the crashing thread or process?

Sort of. The error occurs in the parent process, which returns an IPC error, but IPC errors deliberately crash the content process. The problem is that the stack is useless, since we don't know what code path the child process took to send the request to the parent, nor do we know what code path the parent process took to return the error. So, it's all guesswork from here. :(

Blocks: a11y-fission

I've given this some thought and I really don't know what's causing this. To learn more, we could start with DIAGNOSTIC_ASSERT(false) where we throw the IPC error. This would give us the stack that led to the error, since AddChildDoc is called from a few places.

Severity: -- → S2

Dropping severity due to low crash volume.

CPeterson and I discussed this via email. A diagnostic assert will only fire on Nightly and I'm not sure if we have enough of these crashes there now to give us useful data. On the other hand, a beta assert would cause a full browser crash for affected users instead of just a tab crash. We'd get data, but perhaps at the cost of user pain. For now, we've decided to just wait and see what happens when the Fission beta experiment starts. We should be able to uplift a beta assert fairly quickly if we need the data.

Severity: S2 → S3

Jamie, the Fission beta experiment has been ongoing since Fx89. What would like to collect from the beta experiment to make progress here?

Flags: needinfo?(jteh)

Interestingly, if I'm reading the data correctly, we aren't seeing these crashes at all on beta. I guess we don't crash for IPC errors on beta?

There are 70+ crashes this week on Nightly, so I think adding the diagnostic assert on Nightly might be useful.

Returning an IPC failure causes a crash in the content process on Nightly.
However, the crash doesn't have a useful stack.
A diagnostic assert will cause a full browser crash instead of a tab crash, which isn't ideal.
On the other hand, it will hopefully allow us to get much more useful info about the crash.

Assignee: nobody → jteh
Status: NEW → ASSIGNED
Flags: needinfo?(jteh)
Keywords: leave-open
Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/91426e1b5ca1
Add diagnostic assert to DocAccessibleParent::AddChildDoc when binding to a nonexistent proxy. r=eeejay

Here's a crash report:
https://crash-stats.mozilla.org/report/index/04c29944-b806-4a23-ba6c-391650210519
because of the added assertion.

Crash Signature: [@ IPCError-browser | AddChildDoc binding to nonexistant proxy!] → [@ IPCError-browser | AddChildDoc binding to nonexistant proxy!] [@ mozilla::a11y::DocAccessibleParent::AddChildDoc ]

Ah. That's more useful.

This is happening when adding an in-process iframe. Looking at the crash dump, I see that the document to which it's being added is itself an OOP iframe, but that OOP iframe document has been shut down. I think it got shut down because the OOP iframe's embedder got destroyed.

To put this another way, we have:
top level -> OOP iframe -> in-proc iframe
Top level gets destroyed, destroying OOP iframe's DocAccessibleParent in its wake, but the OOP iframe is really still alive for now.

Nika, is there any chance that an OOP iframe can transfer ownership to an embedder in a different process from its original embedder? In the above example, could top level switch processes without recreating OOP iframe?

If so, I'll need to make sure embedders don't shut down OOP iframes.
If not, I can just ignore any requests coming in for destroyed documents.

Flags: needinfo?(nika)

(In reply to James Teh [:Jamie] from comment #11)

Nika, is there any chance that an OOP iframe can transfer ownership to an embedder in a different process from its original embedder? In the above example, could top level switch processes without recreating OOP iframe?

No, there's no way to re-use an oop iframe in a single document. We can only swap subframes between documents in the very specific situation of dragging a tab into a new window in the browser UI, and it's not possible to do in content.

Flags: needinfo?(nika)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #12)

We can only swap subframes between documents in the very specific situation of dragging a tab into a new window in the browser UI

Ah... but if the top level DocAccessibleParent gets destroyed (even if that's triggered by a parent process action), that will destroy the OOP iframe DocAccessibleParent too.

Is the "Move Tab -> Move to New Window" command in the tab context menu enough to trigger this? I can't seem to pull off dragging a tab to a new window with the mouse, even using my screen reader's mouse commands.

When I do this, it seems that the actual DOM document (in the content process) doesn't get recreated and so we also keep the PDocAccessible actor. Could dragging a tab to another window ever cause the actual DOM document (or PBrowser actor) to be destroyed/recreated?

I do notice some unrelated minor a11y breakage when dragging a tab to a new window. Is there some notification in the parent process I can listen to for when a tab gets moved to a new window?

Flags: needinfo?(nika)

(In reply to James Teh [:Jamie] from comment #13)

Ah... but if the top level DocAccessibleParent gets destroyed (even if that's triggered by a parent process action), that will destroy the OOP iframe DocAccessibleParent too.

It should not be destroyed in this situation :-)

Is the "Move Tab -> Move to New Window" command in the tab context menu enough to trigger this? I can't seem to pull off dragging a tab to a new window with the mouse, even using my screen reader's mouse commands.

Yes, that should be an effective way to trigger that code path.

When I do this, it seems that the actual DOM document (in the content process) doesn't get recreated and so we also keep the PDocAccessible actor. Could dragging a tab to another window ever cause the actual DOM document (or PBrowser actor) to be destroyed/recreated?

It should not :-)

I do notice some unrelated minor a11y breakage when dragging a tab to a new window. Is there some notification in the parent process I can listen to for when a tab gets moved to a new window?

I believe this is currently generally performed using swapDocShells on the browser custom element, which should fire some events here: https://searchfox.org/mozilla-central/rev/2f1a015b004b79f1145c81cdf86b15481a5630e2/toolkit/content/widgets/browser-custom-element.js#1542.

Flags: needinfo?(nika)
See Also: → 1712182

(In reply to James Teh [:Jamie] from comment #13)

I do notice some unrelated minor a11y breakage when dragging a tab to a new window.

Filed bug 1712233 for this.

Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0620992c6ba8
When constructing a new DocAccessibleParent for an in-process iframe, handle the case where the parent document has already been destroyed. r=eeejay
Keywords: leave-open
Target Milestone: --- → 90 Branch
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

(In reply to lomombwlo from comment #19)

The crash has resurfaced
https://crash-stats.mozilla.org/report/index/7624dc19-bf06-411f-8e29-3e7c70210528

This one is bug 1713680.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: