Closed Bug 536321 Opened 15 years ago Closed 14 years ago

e10s HTTP: suspend/resume

Categories

(Core :: Networking: HTTP, defect)

Other Branch
x86
Windows CE
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0a1+ ---

People

(Reporter: jduell.mcbugs, Assigned: jdm)

References

Details

Attachments

(2 files, 10 obsolete files)

Not sure what's involved in HTTP suspend/resume for e10s, and it's not as high a priority as many other things, but we'll need it to work eventually, and presumably before fennec ships.
I'm not sure that's true. Do we use suspend/resume for anything other than downloads? If so, I think the chrome process can control that activity and there's little/nothing to remote.
biesi noted to me that if we do need suspend/resume, we can get it "for free" if in RecvOnDataAvailable() we use a pipe and an inputStreamPump instead of an nsStringInputStream.   But this will force us to make a copy of the data, which would be nice to avoid.
I believe roc's post clears up how suspending/resuming is used for the media cache: http://weblogs.mozillazine.org/roc/archives/2009/04/media_cache.html
Attached patch Bare minimum (obsolete) — Splinter Review
Here's a completely untested patch.  I've left the Suspend/Resume messages as async, but if we wanted to return the actual status of the operation then they could become synchronous.
Assignee: nobody → josh
This patch introduces the channel as a weak reference.  jduell says that we'll probably want a strong reference, which exists in the patch in bug 536316.
Depends on: 536316
Attached patch Bare minimum v2 (obsolete) — Splinter Review
Still untested, but more likely to be a final candidate for review.  I suspect we can leave the methods as async, but it's a pretty easy change to make if I'm incorrect.
Attachment #437734 - Attachment is obsolete: true
> untested, but more likely to be a final candidate for review.

It's even *more* likely to be a final candidate for review once it's tested :)

If you're lucky, test_netwerk/test/unit/resumable_channel.js will work in e10s if we write a test_resumable_channel_wrap.js in netwerk/test/unit_ipc.

Running tests builds character!  Just like wandering in the outback without a compass does--but less risky.

If you're unfamiliar with xpcshell, may I suggest (just updated for e10s!):

  https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests
Attached patch Patch (obsolete) — Splinter Review
All tests in test_resumable_channel.js pass, except for the various authentication ones.  Those will require a little bit of reworking to keep them in the parent, but authentication itself is currently busted on e10s, so I haven't bothered to put in the work at this time.

One thing to note is that the parent channel is now created in the constructor, rather than RecvAsyncOpen.  This is necessary so that calls like ResumeAt can actually be processed before AsyncOpen, and also stymies combining the constructor and RecvAsyncOpen into one message.
Attachment #442406 - Attachment is obsolete: true
Attachment #442906 - Flags: review?(jduell.mcbugs)
Blocks: 558623
Depends on: 537782
Mmm, just noticed that I forgot to hg add the wrapper test.  I'll get around to that at some point.
Comment on attachment 442906 [details] [diff] [review]
Patch

Cancelling review request until I make the changes discussed on IRC.
Attachment #442906 - Flags: review?(jduell.mcbugs)
Attached patch Patch (obsolete) — Splinter Review
Changes made, tests pass.
Attachment #442906 - Attachment is obsolete: true
Attachment #444058 - Flags: review?(jduell.mcbugs)
Comment on attachment 444058 [details] [diff] [review]
Patch

I've got some minor corrections, but also I don't see this patch dealing with a central issue:  between the time when the client calls Suspend on the child, and when chrome actually receives the msg and suspends the nsHttpChannel, there may be an arbitrary number of OnDataAvailable msgs that may have been put in flight.  We need to check in RecvOnDataAvail to see if we're suspended, and if so, store any arriving msgs and deliver them only when Resume has been called.

Given that and the bug in HttpChannelChild::Resume, I don't see how the tests could possibly have succeeded in a meaningful way.  If they did, we need better tests.

Note that your patch appears to be off a tree that has the patch for bug 559200 applies, so it doesn't apply cleanly to a clean e10s tree.


>diff --git a/netwerk/protocol/http/src/HttpChannelChild.cpp b/netwerk/protocol/http/src/HttpChannelChild.cpp
>--- a/netwerk/protocol/http/src/HttpChannelChild.cpp
>+++ b/netwerk/protocol/http/src/HttpChannelChild.cpp
>@@ -65,16 +65,18 @@ class Callback : public nsRunnable
> namespace mozilla {
> namespace net {
>   
> // C++ file contents
> HttpChannelChild::HttpChannelChild()
>   : mState(HCC_NEW)
>   , mShouldBuffer(true)
>   , mBufferLock("mozilla.net.HttpChannelChild.mBufferLock")
>+  , mSendResumeAt(false)
>+  , mStartPos(0)

mStartPos and mEntity should be moved to HttpBaseChannel, since they're shared with nsHttpChannel.  Note that nsHttpChannel inits mStartPos with LL_MAXUINT--I'm not sure why, but it's probably advisable to use that value unless you know a reason otherwise.


> NS_IMETHODIMP
> HttpChannelChild::Suspend()
> {
>-  DROP_DEAD();
>+  SendSuspend();
>+  return NS_OK;
> }
> 
> NS_IMETHODIMP
> HttpChannelChild::Resume()
> {
>-  DROP_DEAD();
>+  SendSuspend();
>+  return NS_OK;
> }

Um, you're sending SendSuspend() for *both* Suspend and Resume!!!  And you're telling me all the tests passed?   Have a closer look at how that could possibly be true :)


> NS_IMETHODIMP
> HttpChannelChild::ResumeAt(PRUint64 startPos, const nsACString& entityID)
> {
>-  DROP_DEAD();
>+  if (!mWasOpened) {
>+    mStartPos = startPos;
>+    mEntityID = entityID;
>+    mSendResumeAt = true;
>+    return NS_OK;
>+  }
>+  SendResumeAt(startPos, nsCString(entityID));
>+  return NS_OK;
> }

Don't call SendResumeAt here, and get rid of the msg from IPDL.  The nsIResumableChannel.idl states clearly that "Calling resumeAt after open or asyncOpen has been called has undefined behaviour", i.e. we don't need to support it.  So throw in a ENSURE_CALLED_BEFORE_ASYNC_OPEN here.


> NS_IMETHODIMP
> HttpChannelChild::GetEntityID(nsACString& aEntityID)
> {
>-  DROP_DEAD();
>+  nsresult rv;
>+  nsCString entityID;
>+  SendGetEntityID(&rv, &entityID);
>+  if (NS_SUCCEEDED(rv)) {
>+    aEntityID = entityID;
>+  }
>+  return rv;
> }

I don't believe you need IPDL traffic for this either.  It looks to me like you can simply move nsHttpChannel's implemention into HttpBaseChannel and share it.


>diff --git a/netwerk/protocol/http/src/HttpChannelParent.cpp b/netwerk/protocol/http/src/HttpChannelParent.cpp

>+bool
>+HttpChannelParent::RecvResumeAt(const PRUint64& startPos,
>+                                const nsCString& entityID)
>+{
>+  nsCOMPtr<nsIResumableChannel> resumable = do_QueryInterface(mChannel);
>+  resumable->ResumeAt(startPos, entityID);
>+  return true;
>+}

So this method should go away, but FYI, the new preferred method of accessing the channel is to cast it to an nsHttpChannel, like so:

    nsHttpChannel *httpChan = static_cast<nsHttpChannel *>(mChannel.get());

It's faster than XPCOM QI, and you only have to do it once, since you can call any function from any interface on an nsHttpChannel ptr.   This breaks the old contract necko had that allowed extensions/add-ons to do things like replace the C++ nsHttpChannel with custom JS code, but we don't care about supporting that nay more.
Attachment #444058 - Flags: review?(jduell.mcbugs) → review-
Attached patch Patch (obsolete) — Splinter Review
The various concerns should have been addressed here.  I was able to lift the buffering mechanism from the necko/RPC bug wholesale, and I've written a single test so far.  I'd particularly like feedback as to whether there's a better way to test suspending and resuming; currently this actually ends up triggering a warning in the parent because the request is already complete by the time the Suspend call propagates up, but it shows that the child buffering code is working correctly.
Attachment #444058 - Attachment is obsolete: true
Attachment #448463 - Flags: feedback?(jduell.mcbugs)
Attachment #448463 - Flags: feedback?(jduell.mcbugs) → review?(jduell.mcbugs)
Attachment #448463 - Flags: review?(honzab.moz)
Attachment #448463 - Flags: review?(jduell.mcbugs) → review?(dwitte)
This patch will need to be updated to work in conjunction with the new message buffering changes, so review can wait.
Attached patch Patch (obsolete) — Splinter Review
Updated for correctness.
Attachment #448463 - Attachment is obsolete: true
Attachment #461992 - Flags: review?(dwitte)
Attachment #448463 - Flags: review?(honzab.moz)
Attachment #448463 - Flags: review?(dwitte)
The new patch needs to be applied after bug 559200.
Attached patch Patch (obsolete) — Splinter Review
Fixed an incorrect setTimeout reference.  The auth tests in test_resumable_channel_wrap.js are still failing for some reason I am unable to discern.
Attachment #461992 - Attachment is obsolete: true
Attachment #461995 - Flags: review?(dwitte)
Attachment #461992 - Flags: review?(dwitte)
tracking-fennec: --- → 2.0a1+
Attached patch Patch (obsolete) — Splinter Review
Rebased to most recent necko message buffering changes in bug 584604.
Attachment #461995 - Attachment is obsolete: true
Attachment #463432 - Flags: review?(dwitte)
Attachment #461995 - Flags: review?(dwitte)
Attached patch Patch (obsolete) — Splinter Review
A few mistakes slipped by my finely-attuned senses.
Attachment #463432 - Attachment is obsolete: true
Attachment #463434 - Flags: review?(dwitte)
Attachment #463432 - Flags: review?(dwitte)
Thinking about it, there could still be a problem with the most recent patch if the listener's callback contained |request.suspend(); request.resume()|, as the buffer would be unconditionally flushed before the current handler was finished executing.  I suppose I need to introduce another early return into FlushBuffer to check if mBufferPhase == PHASE_FLUSHING_BUFFER.  I'll hopefully fix that up tomorrow.
OS: Linux → Windows CE
Attached patch Patch (obsolete) — Splinter Review
I feel pretty good about this one.  It helps that the tests are actually working now.  The FlushBuffer logic in particular has gone through the wringer, and I think it's finally correct.
Attachment #463434 - Attachment is obsolete: true
Attachment #463651 - Flags: review?(dwitte)
Attachment #463434 - Flags: review?(dwitte)
Depends on: 584604
Comment on attachment 463651 [details] [diff] [review]
Patch

>diff --git a/netwerk/protocol/http/HttpChannelChild.cpp b/netwerk/protocol/http/HttpChannelChild.cpp

>@@ -131,48 +135,61 @@ HttpChannelChild::BeginIPDLBuffering()
> 
> bool
> HttpChannelChild::FlushBuffer()
> {
>   NS_ABORT_IF_FALSE(mBufferPhase != PHASE_UNBUFFERED,
>                     "Buffer flushing should only ever occur while buffering.");
>   
>   // We do not want to attempt to flush the buffer while it is
>-  // in the process of being flushed.
>-  if (mBufferPhase != PHASE_BUFFERED)
>+  // in the process of being flushed, or if the channel is suspended.
>+  if (mBufferPhase != PHASE_BUFFERED || mSuspended)
>     return true;
>   
>   // It is possible for new callbacks to be enqueued as we are
>   // flushing the buffer, so the buffer must not be cleared until
>   // all callbacks have run.
>   mBufferPhase = PHASE_FLUSHING_BUFFER;
>   
>   bool ret = true;
>   nsCOMPtr<nsIHttpChannel> kungFuDeathGrip(this);

FWIW you can 'nsRefPtr<HttpChannelChild>' here, and avoid a QI.

>-  for (PRUint32 i = 0; i < mBufferedCallbacks.Length(); i++) {
>+  PRUint32 i;
>+  for (i = 0; i < mBufferedCallbacks.Length(); i++) {
>     ret = mBufferedCallbacks[i]->Run();
>-    if (!ret)
>+    if (!ret) {
>+      // We want to clear all callbacks in the case of failure,
>+      // even those not run.
>+      i = mBufferedCallbacks.Length();
>+      break;

Why?

>+    }
>+    // If the callback ended up suspending us, abort all further flushing.
>+    if (mSuspended)
>       break;
>   }
>-  mBufferedCallbacks.Clear();
>+
>+  // We will always want to remove at least one finished callback.
>+  if (i < mBufferedCallbacks.Length())
>+    i++;
>+  
>+  mBufferedCallbacks.RemoveElementsAt(0, i);
>   mBufferPhase = PHASE_UNBUFFERED;

So if the callback suspended us, we go into PHASE_UNBUFFERED. Is that the right thing to do? At casual glance I'd think we'd want PHASE_BUFFERED && mSuspended, since we still have pending things to flush. And it looks like Resume() calls FlushBuffer(), which will explodily abort...

Do you have a test for that?

>diff --git a/netwerk/protocol/http/HttpChannelChild.h b/netwerk/protocol/http/HttpChannelChild.h

>@@ -137,22 +137,27 @@ protected:
> private:
>   RequestHeaderTuples mRequestHeaders;
> 
>   PRPackedBool mIsFromCache;
>   PRPackedBool mCacheEntryAvailable;
>   PRUint32     mCacheExpirationTime;
>   nsCString    mCachedCharset;
> 
>+  bool mSendResumeAt;
>+  bool mSuspended;

Comment what these mean?

>diff --git a/netwerk/protocol/http/HttpChannelParent.cpp b/netwerk/protocol/http/HttpChannelParent.cpp

>@@ -100,17 +100,20 @@ HttpChannelParent::RecvAsyncOpen(const I
>                                  const PRUint32&            loadFlags,
>                                  const RequestHeaderTuples& requestHeaders,
>                                  const nsHttpAtom&          requestMethod,
>                                  const nsCString&           uploadStreamData,
>                                  const PRInt32&             uploadStreamInfo,
>                                  const PRUint16&            priority,
>                                  const PRUint8&             redirectionLimit,
>                                  const PRBool&              allowPipelining,
>-                                 const PRBool&              forceAllowThirdPartyCookie)
>+                                 const PRBool&              forceAllowThirdPartyCookie,
>+                                 const bool&                doResumeAt,

Can you make the bool/PRBools consistent? I prefer bool. :)

>diff --git a/netwerk/test/unit/head_channels.js b/netwerk/test/unit/head_channels.js

>@@ -45,58 +49,79 @@ function ChannelListener(closure, ctx, f

>   onDataAvailable: function(request, context, stream, offset, count) {
>     try {
>+      let current = new Date().getTime();

What's the difference between Date.now() and Date().getTime()?

>+
>       if (!this._got_onstartrequest)
>         do_throw("onDataAvailable without onStartRequest event!");
>       if (this._got_onstoprequest)
>         do_throw("onDataAvailable after onStopRequest event!");
>       if (!request.isPending())
>         do_throw("request reports itself as not pending from onDataAvailable!");
>       if (this._flags & CL_EXPECT_FAILURE)
>         do_throw("Got data despite expecting a failure");
> 
>+      if (current - this._lastEvent >= SUSPEND_DELAY &&
>+          !(this._flags & CL_EXPECT_3S_DELAY))
>+       do_throw("Data received after significant unexpected delay");
>+      else if (current - this._lastEvent < SUSPEND_DELAY &&
>+               this._flags & CL_EXPECT_3S_DELAY)
>+        do_throw("Data received sooner than expected");

Is this reliable? Does do_timeout() guarantee that the callback is executed after >= specified delay? (I'd assume so, but just checking.)

It'll be pretty clear if we get random orange from this, so in any case, nevermind.

>diff --git a/netwerk/test/unit/test_httpsuspend.js b/netwerk/test/unit/test_httpsuspend.js

>@@ -0,0 +1,64 @@

>+  onStartRequest: function(request, ctx) {
>+    this._lastEvent = Date.now();
>+    request.QueryInterface(Ci.nsIRequest);

Does this do anything? I thought you have to assign the result into something...

>+function run_test() {
>+  httpserv = new nsHttpServer();
>+  httpserv.registerPathHandler("/woo", data);
>+  httpserv.start(4444);
>+
>+  var chan = makeChan("http://localhost:4444/woo");
>+  chan.QueryInterface(Ci.nsIRequest);

Same here?

>diff --git a/netwerk/test/unit/test_resumable_channel.js b/netwerk/test/unit/test_resumable_channel.js

>   function success(request, data, ctx) {
>     dump("*** success()\n");
>     do_check_true(request.nsIHttpChannel.requestSucceeded);
>     do_check_eq(data, rangeBody);
> 
>     // Authentication (no password; working resume)
>     // (should not give us any data)
>+    /*var chan = make_channel("http://localhost:4444/range");
>+    chan.nsIResumableChannel.resumeAt(1, entityID);
>+    chan.nsIHttpChannel.setRequestHeader("X-Need-Auth", "true", false);
>+    chan.asyncOpen(new ChannelListener(test_auth_nopw, null, CL_EXPECT_FAILURE), null);*/
>+
>+    // XXX skip all authentication tests for now, as they're busted on e10s (bug 537782)

Move this XXX above the code you've commented out?

r=dwitte!
Attachment #463651 - Flags: review?(dwitte) → review+
I'm attaching a fixed-up patch shortly.

(In reply to comment #23)
> Comment on attachment 463651 [details] [diff] [review]
> Patch
> 
> >diff --git a/netwerk/protocol/http/HttpChannelChild.cpp b/netwerk/protocol/http/HttpChannelChild.cpp
> 
> >-  for (PRUint32 i = 0; i < mBufferedCallbacks.Length(); i++) {
> >+  PRUint32 i;
> >+  for (i = 0; i < mBufferedCallbacks.Length(); i++) {
> >     ret = mBufferedCallbacks[i]->Run();
> >-    if (!ret)
> >+    if (!ret) {
> >+      // We want to clear all callbacks in the case of failure,
> >+      // even those not run.
> >+      i = mBufferedCallbacks.Length();
> >+      break;
> 
> Why?

Good point.  I've now merged this condition with suspending, so we remove any callbacks run in both cases.
 
> >+    }
> >+    // If the callback ended up suspending us, abort all further flushing.
> >+    if (mSuspended)
> >       break;
> >   }
> >-  mBufferedCallbacks.Clear();
> >+
> >+  // We will always want to remove at least one finished callback.
> >+  if (i < mBufferedCallbacks.Length())
> >+    i++;
> >+  
> >+  mBufferedCallbacks.RemoveElementsAt(0, i);
> >   mBufferPhase = PHASE_UNBUFFERED;
> 
> So if the callback suspended us, we go into PHASE_UNBUFFERED. Is that the right
> thing to do? At casual glance I'd think we'd want PHASE_BUFFERED && mSuspended,
> since we still have pending things to flush. And it looks like Resume() calls
> FlushBuffer(), which will explodily abort...
> 
> Do you have a test for that?

Good catch.  I thought I was testing this, but realized I wasn't.  I've added a new test for necko reentrancy in general, with a specific test for suspending, running a sync xhr, then resuming, and it did indeed explode here.  I've fixed this up and am marking this bug as depending on bug 564351.

> >diff --git a/netwerk/protocol/http/HttpChannelParent.cpp b/netwerk/protocol/http/HttpChannelParent.cpp
> 
> >@@ -100,17 +100,20 @@ HttpChannelParent::RecvAsyncOpen(const I
> >                                  const PRUint32&            loadFlags,
> >                                  const RequestHeaderTuples& requestHeaders,
> >                                  const nsHttpAtom&          requestMethod,
> >                                  const nsCString&           uploadStreamData,
> >                                  const PRInt32&             uploadStreamInfo,
> >                                  const PRUint16&            priority,
> >                                  const PRUint8&             redirectionLimit,
> >                                  const PRBool&              allowPipelining,
> >-                                 const PRBool&              forceAllowThirdPartyCookie)
> >+                                 const PRBool&              forceAllowThirdPartyCookie,
> >+                                 const bool&                doResumeAt,
> 
> Can you make the bool/PRBools consistent? I prefer bool. :)

Could I create a follow-up for this?
 
> >diff --git a/netwerk/test/unit/head_channels.js b/netwerk/test/unit/head_channels.js
> 
> >@@ -45,58 +49,79 @@ function ChannelListener(closure, ctx, f
> 
> >   onDataAvailable: function(request, context, stream, offset, count) {
> >     try {
> >+      let current = new Date().getTime();
> 
> What's the difference between Date.now() and Date().getTime()?

Nothing at all.  I've switched everything to Date.now().

> >+
> >       if (!this._got_onstartrequest)
> >         do_throw("onDataAvailable without onStartRequest event!");
> >       if (this._got_onstoprequest)
> >         do_throw("onDataAvailable after onStopRequest event!");
> >       if (!request.isPending())
> >         do_throw("request reports itself as not pending from onDataAvailable!");
> >       if (this._flags & CL_EXPECT_FAILURE)
> >         do_throw("Got data despite expecting a failure");
> > 
> >+      if (current - this._lastEvent >= SUSPEND_DELAY &&
> >+          !(this._flags & CL_EXPECT_3S_DELAY))
> >+       do_throw("Data received after significant unexpected delay");
> >+      else if (current - this._lastEvent < SUSPEND_DELAY &&
> >+               this._flags & CL_EXPECT_3S_DELAY)
> >+        do_throw("Data received sooner than expected");
> 
> Is this reliable? Does do_timeout() guarantee that the callback is executed
> after >= specified delay? (I'd assume so, but just checking.)
> 
> It'll be pretty clear if we get random orange from this, so in any case,
> nevermind.

This is a concern that Waldo had, as do_timeout does not, in fact, guarantee this at all.  httpd.js has a mechanism which fixes this (callLater, I believe), but there's no easy way to share that without duplicating the code.  I'd be all for fixing do_timeout, but in another bug?

> >diff --git a/netwerk/test/unit/test_httpsuspend.js b/netwerk/test/unit/test_httpsuspend.js
> 
> >@@ -0,0 +1,64 @@
> 
> >+  onStartRequest: function(request, ctx) {
> >+    this._lastEvent = Date.now();
> >+    request.QueryInterface(Ci.nsIRequest);
> 
> Does this do anything? I thought you have to assign the result into
> something...

It does work, actually.  And the test breaks if I comment out the line, so it's not coincidence.
Depends on: 564351
Attached patch PatchSplinter Review
Updated with new tests to ensure working in all conceivable situations.  I'm hoping interdiff will work here to make the review easier.
Attachment #463651 - Attachment is obsolete: true
Attachment #464662 - Flags: review?(dwitte)
Comment on attachment 464662 [details] [diff] [review]
Patch

r=dwitte!
Attachment #464662 - Flags: review?(dwitte) → review+
Attached patch Rebased patchSplinter Review
Rebased, all tests pass.
re-re-re-re-based and landed on e10s:

http://hg.mozilla.org/projects/electrolysis/rev/6d4660297eab

Will get merged to m-c soon
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 586238
Blocks: 586855
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: