Closed Bug 559200 Opened 14 years ago Closed 14 years ago

e10s HTTP: Buffer incoming IPDL necko messages

Categories

(Core :: IPC, defect)

Other Branch
x86
Windows CE
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0a1+ ---

People

(Reporter: jduell.mcbugs, Assigned: jdm)

References

Details

Attachments

(3 files, 11 obsolete files)

Dougt is seeing the following error when loading

    http://mobile.support.mozilla.com/en-US/kb/

Possibly something wrong with our IPDL serialization logic causing RecvOnStartRequest to return before calling the listener?

Jae-Seong, you wrote the serialization code, so I'm giving to you.  This is blocking browsing with fennec, though, so if you can't get to it very soon, please let me know and we'll find someone else.  

---

WARNING: Positioned frame that does not handle positioned kids; looking further
up the parent chain: file
/home/dougt/builds/e10s/electrolysis/layout/base/nsCSSFrameConstructor.cpp, line
5511

WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x804B000A: file
/home/dougt/builds/e10s/electrolysis/netwerk/base/src/nsIOService.cpp, line 583

++DOMWINDOW == 3 (0xb1e4dcd0) [serial = 3] [outer = 0xb1e4ce20]

###!!! ASSERTION: Error: OnStartRequest() must be called before
OnDataAvailable(): '(eOnStart == mParserContext->mStreamListenerState ||
eOnDataAvail == mParserContext->mStreamListenerState)', file
/home/dougt/builds/e10s/electrolysis/parser/htmlparser/src/nsParser.cpp, line
2905

###!!! [Child][AsyncChannel] Error: Value error: message was deserialized, but
contained an illegal value

###!!! ASSERTION: Error: OnStartRequest() must be called before
OnDataAvailable(): '(eOnStart == mParserContext->mStreamListenerState ||
eOnDataAvail == mParserContext->mStreamListenerState)', file
/home/dougt/builds/e10s/electrolysis/parser/htmlparser/src/nsParser.cpp, line
2905

###!!! [Child][AsyncChannel] Error: Value error: message was deserialized, but
contained an illegal value

###!!! ASSERTION: Error: OnStartRequest() must be called before
OnDataAvailable(): '(eOnStart == mParserContext->mStreamListenerState ||
eOnDataAvail == mParserContext->mStreamListenerState)', file
/home/dougt/builds/e10s/electrolysis/parser/htmlparser/src/nsParser.cpp, line
2905

###!!! [Child][AsyncChannel] Error: Value error: message was deserialized, but
contained an illegal value

###!!! ASSERTION: nsExpatDriver didn't get an nsIExpatSink: 'Error', file
/home/dougt/builds/e10s/electrolysis/parser/htmlparser/src/nsExpatDriver.cpp,
line 1206

###!!! ASSERTION: DoContent returned no listener?: 'abort ||
m_targetStreamListener', file
/home/dougt/builds/e10s/electrolysis/uriloader/base/nsURILoader.cpp, line 755
I would like to say that it is not necessarily the serialization code that causes the "Error: Value error: " message.

https://wiki.mozilla.org/Content_Processes/FAQ suggests MOZ_IPC_MESSAGE_LOG=1.
PHttpChannelParent sent Msg_OnStartRequest(), but the child did not receive it, HttpChannelChild::RecvOnStartRequest not executed, thus the assertion error - OnStartRequest() must be called before OnDataAvailable().

This happens on fennec, but not on firefox.
(In reply to comment #2)
> but the child did not receive it,
> HttpChannelChild::RecvOnStartRequest not executed,

I was wrong.  HttpChannelChild::OnStartRequest is called, but I don't know what is happening.

----------------------------------------------------------------------------
I put printfs in RecvOnStartRequest, RecvOnDataAvailable & RecvOnStopRequest.

bool 
HttpChannelChild::RecvOnStartRequest(const nsHttpResponseHead& responseHead)
{
  LOG(("HttpChannelChild::RecvOnStartRequest [this=%x]\n", this));
+ fprintf(stderr, "\nHttpChannelChild::OnStartRequest b\n");
  mState = HCC_ONSTART;

  mResponseHead = new nsHttpResponseHead(responseHead);
+ fprintf(stderr, "\nStatus: %d\n", mResponseHead->Status());
+ fprintf(stderr, "\nVersion: %d\n", mResponseHead->Version());
  nsresult rv = mListener->OnStartRequest(this, mListenerContext);
+ fprintf(stderr, "\nHttpChannelChild::OnStartRequest 1\n");
  if (!NS_SUCCEEDED(rv)) {
    // TODO: Cancel request:
    //  - Send Cancel msg to parent 
    //  - drop any in flight OnDataAvail msgs we receive
    //  - make sure we do call OnStopRequest eventually
    //  - return true here, not false
    return false;  
  }
+ fprintf(stderr, "\nHttpChannelChild::OnStartRequest e\n");
  return true;
}

http://mxr.mozilla.org/projects-central/source/electrolysis/netwerk/protocol/http/src/HttpChannelChild.cpp#94

----------------------------------------------------------------------------
HttpChannelChild::OnStartRequest b

Status: 200

Version: 11
++DOMWINDOW == 3 (045B7860) [serial = 3] [outer = 03EB8628]

HttpChannelChild::OnDataAvailable b

###!!! ASSERTION: Error: OnStartRequest() must be called before OnDataAvailable(): '(eOnStart == mParserContext->mStreamListenerState || eOnDataAvail == mParserContext->mStreamListenerState)', file c:/mozilla-build/e10s-debug/parser/htmlparser/src/nsParser.cpp, line 2905
----------------------------------------------------------------------------
Jason, I don't think it is the serialization code.

###!!! [Child][AsyncChannel] Error: Value error: message was deserialized, but
contained an illegal value

That message was generated because RecvOnDataAvailable returned false (NS_PREDONDITION failed).  I can confirm that using debugger.
Assignee: lusian → nobody
Is there a way to force OnDataAvailable wait until OnStartRequest completes?

This may be wrong, but here is my idea:

 1 NS_InvokeByIndex_P
   ...
17 mozilla::net::HttpChannelChild::RecvOnStartRequest
18 mozilla::net::PHttpChannelChild::OnMessageReceived
19 mozilla::dom::PContentProcessChild::OnMessageReceived
20 mozilla::ipc::AsyncChannel::OnDispatchMessage

HttpChannelChild received RecvOnStartRequest (17) and made the following calls (16 to 1).  NS_InvokeByIndex_P calls functions (such as nsStandardURL::GetSpec), and periodically calls TabChild::SendSyncMessage, which subsequently calls

nsFrameMessageManager::SendSyncMessage -> PIFrameEmbeddingChild::CallsendSyncMessageToParent -> RPCChannel::Call -> AsyncChannel::OnDispatchMessage

and AsyncChannel::OnDispatchMessage calls HttpChannelChild::RecvOnDataAvailable although HttpChannelChild::RecvOnStartRequest has not completed yet.  The following code sees "OnStartRequest() must be called before
OnDataAvailable()" is false.

So, we need to do something to make sure OnStartRequest finishes before OnDataAvailable starts.
Comment on attachment 439077 [details]
stack trace until the first ASSERTION

cjones looked at this stack trace, and apparently the problem is the following:

RecvOnStartRequest is triggering a DOM event that's making an RPC call
to the parent (the confusingly named rpc method "sendSyncMessageToParent").

Since rpc msgs need to be re-entrant, this causes pending IPDL msgs in the queue (in this case, RecvOnDataAvailable) to get delivered, when it would otherwise not be dispatched until RecvOnStartRequest has finished.

I don't think this issue can be fixed in necko, or should be.  The rpc msg probably shouldn't be getting sent, and/or shouldn't be an rpc msg (I thought we were planning to use rpc only when needed--for plugins--precisely to avoid this sort of mess).
Renaming and (per hg blame), assigning to smaug.

This is blocking fennec using necko/e10s, so would be nice to fix ASAP.
Assignee: nobody → Olli.Pettay
Component: Networking: HTTP → DOM: Events
QA Contact: networking.http → events
Summary: HttpChannelChild calling OnDataAvail before OnStartRequest due to serialization error? → DOM "SendSyncMessageToParent" rpc call breaks necko/e10s
To unblock, I would suggest finding a possibly quick 'n hacky way of preventing the OnStart event from being forwarded from child->parent, I don't see how that would ever be useful.
Fwiw, this frame in the backtrace

xul.dll!nsDocLoader::FireOnLocationChange

suggests that we're forwarding onlocationchange to chrome.  If that's right, comment 9 is incorrect.  I do however think that we shouldn't forward cross-process a notification that chrome can (can?|might?) listen to.
Component: DOM: Events → IPC
QA Contact: events → ipc
I don't know what is confusing with the name "sendSyncMessageToParent"
It is sending a synchronous messageManager message from child to parent.

And yes, it needs to be 'rpc' because otherwise CPOW handling doesn't
work. It used to be 'sync' , but after adding support for CPOW sending, it was
changed to rpc. But in any case it must be something
which can return some value to child process synchronously.

Could we somehow postpone sendOnDataAvailable? Could RecvOnStartRequest
send some message to chrome when it is "done", and only after that
chrome sends OnDataAvailable?
I don't really know anything in messageManager message handling which
could help here.
Assignee: Olli.Pettay → nobody
And note, if sendSyncMessageToParent can be called, then also createWindow
which also must be rpc. It just doesn't happen in this case, but writing
a testcase for that shouldn't be hard.
A possible case for OnStartRequest to send createWindow to chrome is
using XMLHttpRequest and having readystatechange event listener to
calls window.open() in JS.

So to me it sounds like necko/e10s really must not send on OnDataAvailable
before it knows that OnStartRequest is fully handled.
Or the client side part of necko/e10s should cache/postpone/batch the handling of
OnDataAvailable messages.
Attached patch Quick fix (obsolete) — Splinter Review
This is untested, but I suspect it should fix the problem in question.  We just buffer up all data received until the child sends a notification that OnStartRequest has finished, then the parent can send all buffered notifications.  It gets a bit more complicated because of thread-safety (I suspect there's the potential for more OnDataAvailable notifications to be delivered while sending the buffered IPDL messages), but I think it's a workable short-term solution.
Attachment #439883 - Flags: review?(jduell.mcbugs)
Are those geolocation changes relevant?
I'm sorry, there were some mq problems.  I'll put up a correct one in a few minutes.
Attached patch Quick fix (obsolete) — Splinter Review
Now with 100% fewer mq screwups.
Attachment #439883 - Attachment is obsolete: true
Attachment #439884 - Flags: review?(jduell.mcbugs)
Attachment #439883 - Flags: review?(jduell.mcbugs)
Bleah, it occurs to me that there's still a race present in my code:

>  else {
>    MutexAutoLock lock(mBufferLock);
>    shouldBuffer = !!mBufferedData.Length();
>  }
>  if (shouldBuffer) {
>    DataBuffer buffer = {
>      data, aOffset, bytesRead
>    };
>    MutexAutoLock lock(mBufferLock);
>    mBufferedData.AppendElement(buffer);
>    return NS_OK;
>  }

If the parent receives StartRequestCompleted in between shouldBuffer being set and the AppendElement call, we could end up buffering data that will never be sent, which will then cause all subsequent data to be buffered as well.  I'm not really sure of the best way to fix this without duplicating the appending to share the first lock.
I guess the parent could just grab the lock unconditionally.
Attached patch Quick fix v2 (obsolete) — Splinter Review
Race fixed, explanatory comments.  Sorry for the bugspam.
Attachment #439884 - Attachment is obsolete: true
Attachment #439886 - Flags: review?(jduell.mcbugs)
Attachment #439884 - Flags: review?(jduell.mcbugs)
Comment on attachment 439886 [details] [diff] [review]
Quick fix v2

After chatting with bsmedberg, the preferred way to handle this is to cache the data on the child side, not the parent.  That way we can fire off OnDataAvail as soon as OnStart completes, rather than waiting for an IPC round trip.

See BrowserStreamChild::RecvWrite() for an example.

Thanks for picking this bug up jdm!
Attachment #439886 - Flags: review?(jduell.mcbugs) → review-
(In reply to comment #10)
> Fwiw, this frame in the backtrace
> 
> xul.dll!nsDocLoader::FireOnLocationChange
> 
> suggests that we're forwarding onlocationchange to chrome.  If that's right,
> comment 9 is incorrect.  I do however think that we shouldn't forward
> cross-process a notification that chrome can (can?|might?) listen to.

Is forwarding onlocationchange to chrome indeed what was causing this bug?  If so, why are we doing that?  Can't chrome determine for itself when the location has changed?

(In reply to comment #11)
> I don't know what is confusing with the name "sendSyncMessageToParent"
> It is sending a synchronous messageManager message from child to parent.
> 

'send': for a 'sync' message Foo() in a protocol spec, the IPDL compiler generate a C++ method called SendFoo(); for an 'rpc' message Bar(), it generates a method CallBar().  Also, prevailing style uses upper-case MessageName().

'Sync': IPDL uses the 'sync' keyword to mean "synchronous, non-reentrant"; 'rpc' is "synchronous, reentrant."

'ToParent': is implied

This is all documented here https://developer.mozilla.org/en/IPDL.
(In reply to comment #23)
> Is forwarding onlocationchange to chrome indeed what was causing this bug?  If
> so, why are we doing that?  Can't chrome determine for itself when the location
> has changed?
> 

(FWIW I don't disagree with the proposed fix, I just think it would be propitious to smoke out unnecessary event forwarding.)
Attached patch Patch (obsolete) — Splinter Review
Still untested.  If there's a testcase for this, it would be nice to prove to myself that this works.
Assignee: nobody → josh
Attachment #439886 - Attachment is obsolete: true
Attachment #440068 - Flags: review?(jduell.mcbugs)
Attached patch Patch (obsolete) — Splinter Review
One day mq and I will get along without incident.
Attachment #440068 - Attachment is obsolete: true
Attachment #440069 - Flags: review?(jduell.mcbugs)
Attachment #440068 - Flags: review?(jduell.mcbugs)
(In reply to comment #23) 
> 'ToParent': is implied
Because IPDL doesn't allow to use the same message name for
child->parent and parent->child, I couldn't figure out anything better naming for this.

(And yes, this is off topic :) )
(In reply to comment #27)
> (In reply to comment #23) 
> > 'ToParent': is implied
> Because IPDL doesn't allow to use the same message name for
> child->parent and parent->child, I couldn't figure out anything better naming
> for this.
> 

Nope, https://developer.mozilla.org/en/IPDL/Tutorial#Direction .
Attached file stack trace after applying jdm's patch (obsolete) —
###!!! ASSERTION: nsExpatDriver didn't get an nsIExpatSink: 'Error', file c:/mozilla-build/e10s-debug/parser/htmlparser/src/nsExpatDriver.cpp, line 1206

Now, HttpChannelChild::RecvOnStopRequest runs before HttpChannelChild::RecvOnStartRequest finishes.
I suppose I could just make it a queue of nsIRunnables and add buffering code to every message that could be received.  Dunno if there's a better way to address this.
Attached patch Patch v2 (obsolete) — Splinter Review
Now buffering all callbacks.
Attachment #440069 - Attachment is obsolete: true
Attachment #440174 - Flags: review?(jduell.mcbugs)
Attachment #440069 - Flags: review?(jduell.mcbugs)
Comment on attachment 440174 [details] [diff] [review]
Patch v2

Bleah, patch is busted.  New version that doesn't segfault coming soon.
Attachment #440174 - Attachment is obsolete: true
Attachment #440174 - Flags: review?(jduell.mcbugs)
Attached patch Patch v2 (obsolete) — Splinter Review
It's new and improved!
Attachment #440183 - Flags: review?(jduell.mcbugs)
Comment on attachment 440092 [details]
stack trace after applying jdm's patch

No more errors!
Attachment #440092 - Attachment is obsolete: true
Blocks: 558801
Attached patch Patch v2.5 (obsolete) — Splinter Review
The patches, they never end!  This just factors the buffering logic out into one place, and makes it generally nicer to add more buffered callbacks (such as the forthcoming redirect, I assume).
Attachment #440183 - Attachment is obsolete: true
Attachment #440474 - Flags: review?(jduell.mcbugs)
Attachment #440183 - Flags: review?(jduell.mcbugs)
Smaug, can we now agree that the sendSyncMessageToParent() message should have IPDL/'sync' semantics, not 'rpc'?  This would allow jduell to dodge this bullet for a while longer.
This kind of a band-aid, dodge-the-bullet kind of patch, but I don't repro the
original bug with it applied.  If anything tried to re-enter sendSyncMessage (a
CPOW, say), the content process would blow up.
(In reply to comment #36)
> Smaug, can we now agree that the sendSyncMessageToParent() message should have
> IPDL/'sync' semantics, not 'rpc'? 
As I said no. If we want to keep support for CPOW sending, then sendSyncMessageToParent needs to be rpc. But if we want to remove support for
CPOWs there, then we can make it back to sync which it was before Bug 554941.
And actually even removing CPOW support from sync message listeners doesn't
help at all, since in chrome message listener could still access
the CPOW for the content process DOMWindow.
Depends on: 564382
Depends on: 567058
Attachment #440474 - Flags: review?(jduell.mcbugs)
The blocking bugs are tracking the steps being taken to fix this problem; jduell suggested that we should add some state flags to the http channel child that let us check for re-entrancy so we can avoid problems like this in the future.
There are two issues here --- first, once this bug's blockers are fixed, we can summarily remove 'rpc' from the PContent tree, and statically prevent re-entrancy at the IPDL level.

Second is re-entrancy caused by nested event loops, which our IPC code doesn't pay much attention to.  It's not clear to me what problems might arise there.
Blocks: 581279
It looks like various event loops currently exist in the code, for example in window.alert()/notify/confirm, as noted in bug 573635. And there might be additional sequences of messages aside from HttpChannel::onStartRequest/onDataAvailable/etc. that are sensitive to this problem (for example, FTP channels). So perhaps it makes sense to think about a more general solution?

One possible direction: Separate messages into multiple logical streams (e.g., 'channels' in the sense of the networking library ENet if anyone's heard of that, http://enet.bespin.org/Features.html ). So we might have networking messages in one stream, GUI messages in another, etc., and the order would be guaranteed to be preserved within a stream. So when responding to a network event, if you run an event loop, it will let you process additional messages but *not* messages from the same stream, which it will buffer. So we would basically write the buffering code once in a general way as opposed to specifically for HTTP channels.
OK, it looks like sync XHR is going to require that we actually take this buffering approach.  If an XHR gets sent out during OnStartRequest, for instance, we'll spin the loop and get OnDataAvailable from IPDL while we're still in OnStart.  Buffering is the most obvious way to fix this. 

> One possible direction: Separate messages into multiple logical streams (e.g.,
'channels' in the sense of the networking library ENet if anyone's heard of
that, http://enet.bespin.org/Features.html ). So we might have networking
messages in one stream, GUI messages in another, etc.

Won't work--this is re-entrancy within necko itself.  Time to bite the bullet and buffer.  (hey, that's kinda catchy).

jdm is putting this high on his stack--we need a working solution for fennec.  Should just be a matter of fixing up his previous patch.
tracking-fennec: --- → ?
Summary: DOM "SendSyncMessageToParent" rpc call breaks necko/e10s → e10s HTTP: Buffer incoming IPDL necko messages
Attached patch Patch (obsolete) — Splinter Review
Rebased patch that fixes the crash in bug 581279.
Attachment #440474 - Attachment is obsolete: true
Attachment #441125 - Attachment is obsolete: true
Attachment #461703 - Flags: review?(jduell.mcbugs)
Attached patch PatchSplinter Review
Updated to not leak callbacks.
Attachment #461703 - Attachment is obsolete: true
Attachment #461993 - Flags: review?(jduell.mcbugs)
Attachment #461703 - Flags: review?(jduell.mcbugs)
tracking-fennec: ? → 2.0a1+
Comment on attachment 461993 [details] [diff] [review]
Patch

double check lock.  neat.
Attachment #461993 - Flags: review?(jduell.mcbugs) → review+
also -- this approach is probably fine for now.  I much rather see a solution that doesn't require every protocol to do this buffering.  Could we get a follow up bug to do this work, please?
Filed bug 584441 for further investigations into a more widespread fix.
OS: Linux → Windows CE
http://hg.mozilla.org/mozilla-central/rev/40aea46474a9
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I happened to catch the use of the lock here while unrotting a patch.  It looks totally unnecessary to me (this is all main-thread only), and additionally,

+  MutexAutoLock lock(mBufferLock);
+  bool ret = true;
+  nsCOMPtr<nsIHttpChannel> kungFuDeathGrip(this);
+  for (PRUint32 i = 0; i < mBufferedCallbacks.Length(); i++) {
+    ret = mBufferedCallbacks[i]->Run();

is dispatching callbacks while holding the lock, which is going to deadlock if we need to buffer from a nested event loop.  Plz to be fixing?
Attached patch FollowupSplinter Review
You make excellent points.
Attachment #462968 - Flags: review?(jones.chris.g)
Attachment #462968 - Flags: review?(jones.chris.g) → review+
Comment on attachment 462968 [details] [diff] [review]
Followup

There are some other things I think we should fix in the original
patch.  Let me list those first.

First, the invariant here is somewhat nonobvious, took me a bit to
figure out: "Don't dispatch necko events until after the OnStart
notification finishes, and then only after queued events have been
dispatched in the order in which they were received."  We need a
comment describing this.  It's also tricky that we might enqueue new
Callbacks while flushing the buffer; that is, mutate
mBufferedCallbacks while in the dispatch loop.  This is OK because of
the way the loop iteration is written and because we don't *remove*
items until done dispatching.

We also need assertions to check this invariant.  This code implements
a three-phase process: first, before the OnStart notification
finishes, we queue events (say, PHASE_ONSTART).  Second, after OnStart
finishes, we flush the queue, but still enqueue any events received
from a nested context so as to preserve in-order delivery (say,
PHASE_FLUSHING_BUFFER).  And third, no queuing (say, PHASE_UNBUFFERED).  I
would recommend a state variable for these instead of a bool
mShouldBuffer.  Then, in the functions where we deliver the
notifications, we can |NS_ABORT_IF_FALSE(mPhase != PHASE_ONSTART)|.
To decide when to buffer, we can check |mPhase != PHASE_UNBUFFER|.

I like the |Callback| and |BufferOrDispatch| implementation, it's nice
and clean.  However, it incurs a cost --- here we always alloc the
Callback, and always copy the delivered params.  Considering that the
param to OnData is a buffer of possibly very large size, I think we
should only pay this price when we need to, since it should be
relatively uncommon.  So I think this style

>+  DataAvailableEvent* event = new DataAvailableEvent(this, data, offset, count);
>+  return BufferOrDispatch(event);

ought instead to be

  if (ShouldBuffer()) {
    return AppendToBuffer(new DataAvailableEvent(this, data, offset, count));
  }
  return OnDataAvailable(data, offset, count);

This is more verbose, but saves what could potentially be a fair
amount of unnecessary memory traffic.

>diff --git a/netwerk/protocol/http/HttpChannelChild.cpp b/netwerk/protocol/http/HttpChannelChild.cpp
>--- a/netwerk/protocol/http/HttpChannelChild.cpp
>+++ b/netwerk/protocol/http/HttpChannelChild.cpp
>@@ -44,18 +44,16 @@
> #include "mozilla/net/NeckoChild.h"
> #include "mozilla/net/HttpChannelChild.h"
> 
> #include "nsStringStream.h"
> #include "nsHttpHandler.h"
> #include "nsMimeTypes.h"
> #include "nsNetUtil.h"
> 
>-using mozilla::MutexAutoLock;
>-
> class Callback
> {
>  public:
>   virtual bool Run() = 0;
> };
> 
> namespace mozilla {
> namespace net {
>@@ -63,17 +61,16 @@ namespace net {
> // C++ file contents
> HttpChannelChild::HttpChannelChild()
>   : mIsFromCache(PR_FALSE)
>   , mCacheEntryAvailable(PR_FALSE)
>   , mCacheExpirationTime(nsICache::NO_EXPIRATION_TIME)
>   , mState(HCC_NEW)
>   , mIPCOpen(false)
>   , mShouldBuffer(true)
>-  , mBufferLock("mozilla.net.HttpChannelChild.mBufferLock")
> {
>   LOG(("Creating HttpChannelChild @%x\n", this));
> }
> 
> HttpChannelChild::~HttpChannelChild()
> {
>   LOG(("Destroying HttpChannelChild @%x\n", this));
> }
>@@ -151,17 +148,16 @@ HttpChannelChild::RecvOnStartRequest(con
>     //  - make sure we do call OnStopRequest eventually
>     //  - return true here, not false
>     return false;  
>   }
> 
>   if (mResponseHead)
>     SetCookie(mResponseHead->PeekHeader(nsHttp::Set_Cookie));
> 
>-  MutexAutoLock lock(mBufferLock);
>   bool ret = true;
>   nsCOMPtr<nsIHttpChannel> kungFuDeathGrip(this);
>   for (PRUint32 i = 0; i < mBufferedCallbacks.Length(); i++) {
>     ret = mBufferedCallbacks[i]->Run();
>     if (!ret)
>       break;
>   }
>   mBufferedCallbacks.Clear();
>@@ -384,24 +380,18 @@ HttpChannelChild::OnStatus(const nsresul
> 
>   return true;
> }
> 
> bool
> HttpChannelChild::BufferOrDispatch(Callback* callback)
> {
>   if (mShouldBuffer) {
>-    MutexAutoLock lock(mBufferLock);
>-    // If we can't grab the lock immediately, that means we're currently
>-    // emptying the buffer. Therefore, the following condition should now
>-    // be false, and we can resume immediate message processing.
>-    if (mShouldBuffer) {
>       mBufferedCallbacks.AppendElement(callback);
>       return true;
>-    }
>   }
> 
>   bool result = callback->Run();
>   delete callback;
>   return result;
> }
> 

FWIW, this patch looks fine to me for fixing the unnecessary locking and potential deadlock ;).
Also, it's possible to do sync XHR in xpcshell, right?  Would be nice to have a test for this.
Please don't check in something this significant and subtle to necko w/o getting a review or signoff from a necko peer (me, bz, biesi, or darin if you're really lucky).

I'm confused here, and 95% sure that this code is at least a little bit broken.

1) Some OnStatus (and I believe OnProgress) messages arrive and should be dispatched *before* OnStartRequest.   This patch looks like it breaks that.  To know for sure (and to add test coverage), please modify test_progress.js so that it also handles OnStart/Data/Stop.  Have all of them (inc onStatus/Progress) print when they're called.   Run in unit, and then make a wrapper in unit_ipc and run.  See if the order of calls differ.  If it does, change the test so that it keeps a state variable to enforce that e10s calls all the notifications/callbacks in the same order as non-e10s (I'm guessing we'll want to guarantee that OnStatus and/or OnProgress have been called before OnStartReq.  If we want to get fancy, OnStatus and OnProgress should also be called again afterward, before each OnDataAvail.)

2) Why are we only buffering until OnStartRequest completes?  I'm guessing it's quite possible for a sync XHR request to happen within OnDataAvailable (if HTML parser hits JS that does one before the page has finished loading?  Not my department--anyone know?).  Or an OnProgress/Status, or even OnStop.  There's nothing in the API that forbids that. 

I was figuring we'd have a mechanism that buffered incoming msgs if *any* other message is still being dispatched (or if the channel's suspended).  I.e. just guarantee that all necko IPDL messages are dispatched serially.  

nit: rename "Callback" something more descriptive and less likely to cause namespace and/or mxr clash?  How about "ChannelChildEvent"?  

> Considering that the param to OnData is a buffer of possibly very large size,
> I think we should only pay this price when we need to, since it should be
> relatively uncommon.

+1
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
str = "guarantee that all necko IPDL messages are dispatched serially";

str += "on a per-channel basis";  

We obviously aren't trying to queue msgs for other channels (like the sync XHR one).
BTW in case it's not clear, I don't think we need to back this out--dougt doesn't think anything in fennec would break from the onStatus/progress msgs coming later than OnStart.

(Does this mean we should file a new bug?)
cjones - great suggestions.  I created a follow up bug to address both parts.  See bug 584604.

jduell - it is odd that status messages happen before start happens, but right not I do not think that anything depends on this behavior in fennec or dependent code.

I still would like to treat this solution as a work around.  I really hate to see http do so much work to deal with ipdl necko messages.  Other protocols are going to have to do the same thing which means making a new protocol in mozilla a real pain in the ass.
(In reply to comment #60)
> I still would like to treat this solution as a work around.  I really hate to
> see http do so much work to deal with ipdl necko messages.  Other protocols are
> going to have to do the same thing which means making a new protocol in mozilla
> a real pain in the ass.

The work done here is to hack around necko notifying observers that call out into other things and so on that end up spinning a nested XPCOM loop, and then the observer asserting when it's re-entered in the nested loop.  The same problem exists for single-process necko, does anyone know how it's avoided?  I don't.

The reason I suspect there's not a general IPDL-level solution is that some consumers need messages delivered in the nested context and some don't, and some might sometimes and not others.  Regardless, *not* delivering in a nested loop breaks in-order delivery, which is a fundamental guarantee.  I'm all for C++ helpers to make this problem easier, where it arises and a general helper makes sense, but let's not overgeneralize based on this instance.  We had a similar problem with plugin streams, and the solution we needed there was different.
(In reply to comment #56)
> Also, it's possible to do sync XHR in xpcshell, right?  Would be nice to have a
> test for this.

Seems so, once bug 564351 is fixed.
Calling this fixed for now.  I'll reproduce all important comments here in bug 584604.
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: