Closed Bug 1202887 Opened 4 years ago Closed 4 years ago

crash in mozilla::PWebBrowserPersistDocumentParent::DestroySubtree(mozilla::ipc::IProtocolManager<T>::ActorDestroyReason)

Categories

(Core :: IPC, defect, critical)

42 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox41 --- unaffected
firefox42 + fixed
firefox43 + fixed

People

(Reporter: mats, Assigned: jld)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-9f5b1f5b-becb-475a-9fc1-bc7ec2150907.
=============================================================

Currently #17 in Nightly Top crash data.  First seen 2015-08-08.

Possibly a regression from bug 1101100? (landed 2015-08-06 on m-c)

Stack:
mozilla::PWebBrowserPersistDocumentParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason)
mozilla::dom::PBrowserParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason)
mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const&)
mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)
mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&)
mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const&)
mozilla::ipc::MessageChannel::OnMaybeDequeueOne()
RunnableMethod<mozilla::ipc::MessageChannel, void ( mozilla::ipc::MessageChannel::*)(void), Tuple0>::Run()
MessageLoop::DoWork()
mozilla::ipc::DoWorkRunnable::Run()
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3APWebBrowserPersistDocumentParent%3A%3ADestroySubtree%28mozilla%3A%3Aipc%3A%3AIProtocolManager%3CT%3E%3A%3AActorDestroyReason%29

[Tracking Requested - why for this release]: crash regression
Flags: needinfo?(jld)
The parent is handling a __delete__ for a PBrowser (TabParent) and destroying its actor subtree, but encounters a WebBrowserPersistDocumentParent that has somehow been corrupted or freed without being removed from the managee array.  Which shouldn't be possible — WebBrowserPersistDocumentParent has the normal IPC lifetime, and the only thing that should be able to free one is the IPDL actor deletion code, which should maintain the invariant that seems to be violated here.
Flags: needinfo?(jld)
Flags: needinfo?(wmccloskey)
I think I know what's going on here: there's more than one WebBrowserPersistDocumentParent managed by the PBrowserParent, and one that's earlier in the array invokes callbacks (or maybe destructors via dropping refcounts) that cause a WebBrowserPersistDocumentParent later in the managee array to have Send__delete__ called on it; that destroys the second WebBrowserPersistDocumentParent and removes it from the actor's managee array, but then returns to the PBrowserParent::DestroySubtree activation, which is operating on a copy of the managee array and tries to delete the subactor a second time, which is a use-after-free.

(Closer inspection of the crash dump, with help from a disassembly of the corresponding nightly build, confirms that we're trying to call through a vtable pointer that's 0x5a5a5a5a5a5a5a5a.)

So, chalk up another casualty to aliasable mutable state.

Under the assumption that needing a TabParent to be destroyed at exactly the right time during a user-initiated Save As isn't a plausible attack vector, I'm not hiding this bug, but if someone with more sec bug experience feels otherwise I'll defer to their judgment.

And there's another problem: this means that a TabParent can be destroyed while an e10s Save As is still happening, which is going to result in an incomplete saved file, which is not what the user expects.
Assuming I'm right, the UAF can be fixed by deferring the callbacks that occur in the actor destruction path.  I'm not as clear on what needs to happen to keep the remote document alive until it's done being saved.
Assignee: nobody → jld
Flags: needinfo?(wmccloskey)
See Also: → 1203602
This patch should fix the crash reported here, but as detailed in bug 1203602 there are some related failure modes (one of which can crash the content process) which will be more difficult to deal with.

I'm a little uneasy about NS_NewRunnableMethodWithArgs's magical storage inference, but it seems to be working.

Try run looks good so far: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6926a3fb6207
Attachment #8660028 - Flags: review?(wmccloskey)
Comment on attachment 8660028 [details] [diff] [review]
Patch: defer WBP callbacks during ActorDestroy.

Review of attachment 8660028 [details] [diff] [review]:
-----------------------------------------------------------------

So, to make sure I understand:

1. We're tearing down a TabParent
2. Somehow some ActorDestroy code triggers WebBrowserPersistRemoteDocument::~WebBrowserPersistRemoteDocument via a decref or something

And using runnables here just puts off the decref? That makes sense to me.
Attachment #8660028 - Flags: review?(wmccloskey) → review+
(In reply to Jed Davis [:jld] {UTC-7} from comment #5)
> I'm a little uneasy about NS_NewRunnableMethodWithArgs's magical storage
> inference, but it seems to be working.

…which is meaningless, because that doesn't cover non-Deletion ActorDestroy.
(In reply to Bill McCloskey (:billm) from comment #6)
> So, to make sure I understand:
> 
> 1. We're tearing down a TabParent
> 2. Somehow some ActorDestroy code triggers
> WebBrowserPersistRemoteDocument::~WebBrowserPersistRemoteDocument via a
> decref or something

Yes.  Probably like this:

WebBrowserPersistResourcesParent::ActorDestroy ->
nsWebBrowserPersist::OnWalk::EndVisit ->
nsWebBrowserPersist::EndDownload ->
nsWebBrowserPersist::Cleanup ->
nsWebBrowserPersist::DocData::~DocData ->
nsCOMPtr<nsIWebBrowserPersistDocument>::Release ->
WebBrowserPersistRemoteDocument::~WebBrowserPersistRemoteDocument

> And using runnables here just puts off the decref? That makes sense to me.

Yes, that's the idea.
(In reply to Jed Davis [:jld] from comment #5)
> I'm a little uneasy about NS_NewRunnableMethodWithArgs's magical storage inference

Fortunately, bug 1179787 makes it so using nsCOMPtr as the storage type does the right thing (and looks like it would have worked before that, too, just with an extra AddRef/Release or two that doesn't matter here), and that's on all the branches where I'll need this.
Minor change to callback storage types.  Carrying over r+.
Attachment #8660028 - Attachment is obsolete: true
Attachment #8660855 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/7c9a7c6cd636
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Note to self to request uplift once this has been in the field for a few days.
Flags: needinfo?(jld)
Comment on attachment 8660855 [details] [diff] [review]
Patch: defer WBP callbacks during ActorDestroy. [v2]

Approval Request Comment
[Feature/regressing bug #]: bug 1101100
[User impact if declined]: Crashes when saving documents in e10s mode.
[Describe test coverage new/current, TreeHerder]: No STR found, so no regression test, but this crash stopped being reported on 43 (nightly) after this patch (2015-09-15). 
[Risks and why]: Minor; this affects error cases only.
[String/UUID change made/needed]: None.
Flags: needinfo?(jld)
Attachment #8660855 - Flags: approval-mozilla-beta?
Comment on attachment 8660855 [details] [diff] [review]
Patch: defer WBP callbacks during ActorDestroy. [v2]

Fix a crash, taking it!
Attachment #8660855 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.