Closed Bug 532208 Opened 15 years ago Closed 14 years ago

Plugins: Make PBrowserStream/PPluginStream destructors uni-directional

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: cjones, Assigned: benjamin)

References

Details

Attachments

(7 files, 2 obsolete files)

According to bsmedberg:

"PBrowserStream/PPluginStream may not need a bi-directional destructor, I
misunderstood how NPP_Write behaves."

I suspect that they're only sent in one direction currently, and if so this would be an IPDL-only change.
(In reply to comment #0)
> I suspect that they're only sent in one direction currently, and if so this
> would be an IPDL-only change.

Nope :(.  https://developer.mozilla.org/en/NPN_DestroyStream
From reading the docs a little closer, I think that NPP_DestroyStream() really means "NPP_NotifyDestroyStream()", and that the browser always owns the underlying "real" NPStream.  So I think all dtors can go plugin-->browser.

jgriffin/bsmedberg, does that sound correct?
From a third read of the docs it seemed to me that for BrowserStream, NPN_Destroy() ~= "done, close, delete" and NPP_Destroy() ~= "cancel" (and similarly for PluginStream).  So that's what I've implemented, though there are remaining kinks to work out.
That's definitely incorrect. If anything, NPN_Destroy means cancel and NPP_Destroy means "done close delete" for browser streams. It could be reversed for plugin streams, I'm not sure.
Oops, I actually did mean to say NPP_Destroy = "done close delete" and NPN_Destroy = "cancel".  Basically the NP*_Destroy matched with the NP*_New is the "done close delete", the other is "cancel".
This is how I think we can make PBrowserStream stateful. In this protocol, nothing races with __delete__ (even discard messages, I think).
Assignee: nobody → benjamin
Attachment #416173 - Flags: review?(jones.chris.g)
Depends on: 533002
Valgrind errors that should be diagnosed and fixed by this bug.
Comment on attachment 421754 [details]
valgrind log showing stream errors

>  (processing deferred in-call)
>==25042== Invalid read of size 4
>==25042==    at 0x6A7EED6: mozilla::plugins::PBrowserStreamChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PBrowserStreamChild.cpp:294)
>==25042==    by 0x6A6DCE4: mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PPluginModuleChild.cpp:375)
>==25042==    by 0x6A18B2B: mozilla::ipc::RPCChannel::DispatchIncall(IPC::Message const&) (RPCChannel.cpp:347)
>==25042==    by 0x6A18A44: mozilla::ipc::RPCChannel::Incall(IPC::Message const&, unsigned long) (RPCChannel.cpp:332)
>==25042==    by 0x6A18374: mozilla::ipc::RPCChannel::MaybeProcessDeferredIncall() (RPCChannel.cpp:216)
>==25042==    by 0x6A18662: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:257)
>==25042==    by 0x6A1B351: void DispatchToMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, void (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
>==25042==    by 0x6A1B1A7: RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() (task.h:307)
>==25042==    by 0x6ACE0A1: MessageLoop::RunTask(Task*) (message_loop.cc:326)
>==25042==    by 0x6ACE111: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:334)
>==25042==    by 0x6ACE50F: MessageLoop::DoWork() (message_loop.cc:434)
>==25042==    by 0x6A169F0: mozilla::ipc::DoWorkRunnable::Run() (MessagePump.cpp:75)
>==25042==  Address 0x13c7f7d8 is 24 bytes inside a block of size 144 free'd
>==25042==    at 0x4C24A7A: operator delete(void*) (vg_replace_malloc.c:346)
>==25042==    by 0x6A0F974: mozilla::plugins::BrowserStreamChild::~BrowserStreamChild() (BrowserStreamChild.h:63)
>==25042==    by 0x69FAD34: mozilla::plugins::PluginInstanceChild::DeallocPBrowserStream(mozilla::plugins::PBrowserStreamChild*) (PluginInstanceChild.cpp:914)
>==25042==    by 0x6A7EEA6: mozilla::plugins::PBrowserStreamChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PBrowserStreamChild.cpp:292)
>==25042==    by 0x6A6DCE4: mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PPluginModuleChild.cpp:375)
>==25042==    by 0x6A18B2B: mozilla::ipc::RPCChannel::DispatchIncall(IPC::Message const&) (RPCChannel.cpp:347)
>==25042==    by 0x6A18A44: mozilla::ipc::RPCChannel::Incall(IPC::Message const&, unsigned long) (RPCChannel.cpp:332)
>==25042==    by 0x6A18374: mozilla::ipc::RPCChannel::MaybeProcessDeferredIncall() (RPCChannel.cpp:216)
>==25042==    by 0x6A18662: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:257)
>==25042==    by 0x6A1B351: void DispatchToMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, void (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
>==25042==    by 0x6A1B1A7: RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() (task.h:307)
>==25042==    by 0x6ACE0A1: MessageLoop::RunTask(Task*) (message_loop.cc:326)
>==25042== 
>

I don't have the NPAPI log from before this error, but it appears to me that parent and child calls raced, the child won, and the child's out-call eventually led to the deletion of this BrowserStream.  Then the parent's deferred call was processed, and was directed at this dead stream.  This is a problem with RPC race resolution that I nebulously hinted at in bug 533002 comment 4.  I've got an in-progress newsgroup post describing some possible fixes, but I haven't had time to sit down for a good think on it yet.

>==25042== Invalid read of size 4
>==25042==    at 0x6A80DE3: mozilla::plugins::PStreamNotifyChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PStreamNotifyChild.cpp:117)
>==25042==    by 0x6A6DCE4: mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PPluginModuleChild.cpp:375)
>==25042==    by 0x6A18B2B: mozilla::ipc::RPCChannel::DispatchIncall(IPC::Message const&) (RPCChannel.cpp:347)
>==25042==    by 0x6A18A44: mozilla::ipc::RPCChannel::Incall(IPC::Message const&, unsigned long) (RPCChannel.cpp:332)
>==25042==    by 0x6A186FA: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:267)
>==25042==    by 0x6A1B351: void DispatchToMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, void (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
>==25042==    by 0x6A1B1A7: RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() (task.h:307)
>==25042==    by 0x6ACE0A1: MessageLoop::RunTask(Task*) (message_loop.cc:326)
>==25042==    by 0x6ACE111: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:334)
>==25042==    by 0x6ACE50F: MessageLoop::DoWork() (message_loop.cc:434)
>==25042==    by 0x6A169F0: mozilla::ipc::DoWorkRunnable::Run() (MessagePump.cpp:75)
>==25042==    by 0x6BD1718: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:527)
>==25042==  Address 0x130cf9a8 is 24 bytes inside a block of size 64 free'd
>==25042==    at 0x4C24A7A: operator delete(void*) (vg_replace_malloc.c:346)
>==25042==    by 0x69FB5CB: mozilla::plugins::StreamNotifyChild::~StreamNotifyChild() (StreamNotifyChild.h:48)
>==25042==    by 0x69FAEF2: mozilla::plugins::PluginInstanceChild::DeallocPStreamNotify(mozilla::plugins::PStreamNotifyChild*) (PluginInstanceChild.cpp:970)
>==25042==    by 0x6A80DB9: mozilla::plugins::PStreamNotifyChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PStreamNotifyChild.cpp:115)
>==25042==    by 0x6A6DCE4: mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) (PPluginModuleChild.cpp:375)
>==25042==    by 0x6A18B2B: mozilla::ipc::RPCChannel::DispatchIncall(IPC::Message const&) (RPCChannel.cpp:347)
>==25042==    by 0x6A18A44: mozilla::ipc::RPCChannel::Incall(IPC::Message const&, unsigned long) (RPCChannel.cpp:332)
>==25042==    by 0x6A186FA: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:267)
>==25042==    by 0x6A1B351: void DispatchToMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, void (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
>==25042==    by 0x6A1B1A7: RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() (task.h:307)
>==25042==    by 0x6ACE0A1: MessageLoop::RunTask(Task*) (message_loop.cc:326)
>==25042==    by 0x6ACE111: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:334)
>==25042== 
>

This appears to be a bug in IPDL-generated code; apparently the invalid read is of an |mId| of the stream that was just deleted.  I'm surprised I haven't seen this in IPDL testing; I think it's because none of the tests use synchronous destructors.  I'll cook up a test case to confirm.  Spun off into bug 539856.
Depends on: 547703
Depends on: 548217
IPDL doesn't have the state machine completely hooked up, but this is a stateful protocol where the states are currently implemented by hand. Browser stream data is now delivered completely asynchronously. Apart from some horrible bugs in the plugin host and an IPDL message delivery ordering bug (both marked as dependencies), this should be good to go. It passes mochitests with the plugnhost patch.
Attachment #416173 - Attachment is obsolete: true
Attachment #428662 - Flags: review?(bent.mozilla)
Attachment #416173 - Flags: review?(jones.chris.g)
Depends on: 548214
No longer depends on: 533002, 547703
Comment on attachment 428662 [details] [diff] [review]
Make browser stream data delivery asynchronous, rev. 1

>+BrowserStreamChild::RecvWrite(const int32_t& offset,
> ...
>+  NS_ASSERTION(ALIVE == mState, "Received data after NPP_DestroyStream");

Let's abort on child-side state inconsistencies?

>+  mInstance->mPluginIface->destroystream(&mInstance->mData, &mStream, reason);

Do we not care about plugin errors here? What does pluginhost do here?

>+      if (!r) {
> ...
>+      if (r == 0) {

Nit: pick a style ;)

>+static const int32_t kSendDataChunk = 0x1000;

Nit: Can this hang out at the top of the file? I feel like that's where most people look for constants like this.

> BrowserStreamParent::Write(int32_t offset,
>                            int32_t len,
>                            void* buffer)

Is it worth asserting that len > 0?
Attachment #428662 - Flags: review?(bent.mozilla) → review+
Attachment #428886 - Flags: review?(jones.chris.g)
Comment on attachment 428886 [details] [diff] [review]
DeliverData must not re-enter, rev. 1

Some of these invariants want comments wanted, but shipping alpha >> comments.
Attachment #428886 - Flags: review?(jones.chris.g) → review+
The invariants as I understand them are

 - when NPP_Write() returns, a DeliverData task is in the event queue
 - when DeliverData returns, !mPending.empty() => there is a DeliverData task in the event queue

I don't see those violated.
Comment on attachment 428982 [details] [diff] [review]
DeliverData must not reenter, rev. 2

This patch suspends not just DeliverData, but also NPP_DestroyStream until data delivery is complete (or the plugin cancels it). jgriffin's stream tests were a lifesaver!

For the record, this got landed and backed out last night because of async delivery and re-entrancy issues which reprented at least three different bugs in the first patch:

DeliverData could re-enter (bad)
event when DeliverData couldn't re-enter, merely appending to mPendingData could invalidate the reference we held to mPendingData[0]
RecvNPP_DestroyStream could nest inside of DeliverData while there was pending data, which is both unexpected NPAPI behavior and also caused us to delete the actor while it was still on the stack
Attachment #428982 - Attachment description: De → DeliverData must not reenter, rev. 2
Attachment #428982 - Attachment is patch: true
Attachment #428982 - Attachment mime type: application/octet-stream → text/plain
Attachment #428982 - Flags: review?(bent.mozilla)
Comment on attachment 428982 [details] [diff] [review]
DeliverData must not reenter, rev. 2

>+BrowserStreamChild::MaybeDeliverNPP_DestroyStream()
> ...
>+    NPReason reason = mDestroyPending;

You don't seem to use this.

>+   * stack frame. It instead posts a runnable using this tracker to cancel
>+   * in case we are destroyed.
>+   */
>+  ScopedRunnableMethodFactory<BrowserStreamChild> mDeliverDataTracker;

I don't see where you would cancel this... Maybe you can avoid the tracker?
Attachment #428982 - Flags: review?(bent.mozilla) → review+
This patch makes the destructor unidirectional without giving the protocol async semantics. Async semantics rev. 2 still has problems with NPN_URLNotify being delivered before NPP_DestroyStream.
Attachment #429040 - Flags: review?(jones.chris.g)
If you get a chance, please land this along with the test I will attach and re-land bug 548217.
Comment on attachment 429040 [details] [diff] [review]
RPC but unidirectional, rev. 1

This patch is still susceptible to a use-after-__delete__ hazard, I think.  Namely,

  Parent                Child
 ---------             -------
  CallWrite             CallDestroy
  AnswerDestroy         [defer Write]
  <- NPN_Destroy
  -> NPP_Destroy (?)
  Call__dealloc__
                        delete stream
  delete stream
  ...                   ...
                        [process deferred]
                        AnswerWrite
                        BOOM

This might be better in that the crash would likely happen in the plugin process, and we would error out of the C++ stack frame on which the deleted BrowserStreamParent actor made the out-call.

I think we can work around this by
 - adding an NPP_Destroy() message
 - having that enqueue the stream for __delete__ before CallNPP_Destroy()
 - having AnswerNPP_Destroy call iface->NPP_Destroy
 - guarding re-entry of Write et al. with an mClosed flag
 - override OnExitedCxxStack() to flush the streamDeleteQueue, calling __delete__

The patch is pretty straightforward, but I don't think I'll be able to test it locally due to remote-X latency running firefox.  If that doesn't work, I'll push it pending-r? to the e10s "tryserver" and if it doesn't flame, maybe we can transplant over tomorrow or Saturday.
Attachment #429040 - Flags: review?(jones.chris.g) → review-
This is the basic idea, but it's failing the new mochitest-ipcplugins test, apparently because streams aren't being removed from the to-delete array as I expect them to be.  In the hotel, my debugging cycle is around half an hour, so I'll go ahead and put this up for review and fix the bug tomorrow.
Attachment #429081 - Flags: review?(benjamin)
Chris, the latest patch wasn't meant to be purely safe, just good enough for beta... the actual likelihood of race conditions happening is pretty small in this case... I'll look over this patch, but in the interest of starting alpha builds today I'd like to take the not-perfect patch if there continue to be problems with yours.
Comment on attachment 429081 [details] [diff] [review]
Delete BrowserStreams off of ExitedCxxStack() callbacks

The DeleteOnStackExit/DontDelete machinery seems really strange to me. I would expect that problem to normally be solved by posting a cancelable event and canceling the event in the destructor.
Attachment #429081 - Flags: review?(benjamin) → review-
I don't feel comfortable with "RPC but unidirectional"; I think it's still susceptible to several bugs already on file.

Can we give a "delete event" version of the latest patch a chance?  I leave for home around noonish, I'm willing to "call it" then.
Passes mochitest-ipcplugins (along with the new test).  Additionally "fixes" bug 543917 and presumably its likely dups.
Attachment #429081 - Attachment is obsolete: true
Attachment #429148 - Flags: review?(benjamin)
Attachment #429148 - Flags: review?(benjamin) → review+
Blocks: 544058
Depends on: 549088
I'll mark this FIXED and do the async delivery in a followup.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 545186
Blocks: 551049
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: