The default bug view has changed. See this FAQ.

e10s necko: refactor channelEventQueue to allow async resume/flush

RESOLVED FIXED in mozilla7

Status

()

Core
Networking
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jduell, Assigned: jduell)

Tracking

Trunk
mozilla7
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

Created attachment 536259 [details] [diff] [review]
Simple fix.

In bug 637339 we've moved mCallonResume and FlushEventQueue() to an asyc callback (instead of right within Resume()) so that we don't flush the queue (and call client callbacks) before the client's Resume call completes.  Alas, this breaks the existing, rather hack-y ChannelEventQueue logic.

In particular, if the suspend happens within OnStartReq, etc., we wind up hitting FlushEventQueue (when OnStartReq's AutoEventEnqueuer goes out of scope), and that barfs on the mQueuePhase==PHASE_FINISHED_QUEUEING we set in Resume().  If we remove the error check and allow PHASE_FINISHED_QUEUEING the queue will get flushed before we get a chance to run mCallOnResume, which would be out of order.  I suppose we might be able to fix that by adding another state to mQueuePhase, or possibly even (now that I think about it) by setting mQueuePhase=PHASE_QUEUEING in HttpChannelChild::Resume(), plus removing the NS_ABORT_IF_FALSE(mQueuePhase != PHASE_UNQUEUED) in FlushEventQueue().

But in general I'm finding the ChannelEventQueue logic to be too fragile and unintuitive:  for example, when suspended we wind up queueing messages while state=PHASE_UNQUEUED.  For this bug I offer two approaches--one quick fix, and one more extensive refactoring.  I think the second is better.

Note: both of these patches are built on top of the patches in bug 637339, but you don't really need to understand them at all, other than than HttpChannelChild::Resume now launches an async "CompleteResume()" method, which is where we want to actually resume the EventQueue.

Short term fix: I've built in formal Suspend/Resume methods into ChannelEventQueue.  This allows us to delay resuming the ChannelEventQueue until we get to CompleteResume().  It also cleans up the caller code a bit (I could also remove all the template logic from ChannelEventQueue, but I haven't bothered to here).
Attachment #536259 - Flags: review?(josh)
(Assignee)

Comment 1

6 years ago
Created attachment 536260 [details] [diff] [review]
More extensive refactoring (applies on top of simple fix)

2nd, more ambituous patch.  Follow on the first patch, but change ChannelEventQueue from using a single state variable and instead check for the various conditions (suspended, in a 'critical section', flushing) that we've been fudging with an enum.  I think this makes the logic a lot easier to follow.

Also convert ChannelEventQueue to be a non-template class, and make it a member of channels, rather than a base class.
Attachment #536260 - Flags: review?(josh)
(Assignee)

Updated

6 years ago
Blocks: 637339

Comment 2

6 years ago
Comment on attachment 536260 [details] [diff] [review]
More extensive refactoring (applies on top of simple fix)

I really like the way the logic is presented here; it's significantly easier to follow. My comments are all superficial things.

>+#include <nsIChannel.h>

I think this can be forward-declared.

>+  bool mForced;
>   bool mSuspended;
>+  bool mFlushing;

These could be bitfields. I don't feel strongly one way or the other, though.

>+  NS_ASSERTION(!( (answer == false) && !mEventQueue.IsEmpty()),
>+               "Should always enqueue if ChannelEventQueue not empty");

This hurts every time I think about it. Can we apply DeMorgan's laws and get |answer == true || mEventQueue.IsEmpty()| instead?

>+  NS_ASSERTION(!mSuspended,

NS_ABORT_IF_FALSE

>+  NS_ASSERTION(mSuspended,

NS_ABORT_IF_FALSE

>+  NS_ASSERTION(!mForced, "MaybeFlushQueue called inside critical section");

NS_ABORT_IF_FALSE

>+    mEventQ.Resume();    // TODO: make this async: see HttpChannelChild::Resume

Get a bug filed on that?
Attachment #536260 - Flags: review?(josh) → review+

Updated

6 years ago
Attachment #536259 - Flags: review?(josh)

Comment 3

6 years ago
However, please add the test changes from the first patch into the second one.

Comment 4

6 years ago
Actually ignore comment 3, I can't read.

Updated

6 years ago
Attachment #536259 - Flags: review+
(Assignee)

Comment 5

6 years ago
> >+#include <nsIChannel.h>
> 
> I think this can be forward-declared.

Done. We'll slow global warming down a little from that, except that I suspect all the code that includes this file will eventually include nsIChannel.h too.

 
> These could be bitfields. I don't feel strongly one way or the other, though.

Not going to bother with bitfields. If we had 12+ bools, then sure.


> >+  NS_ASSERTION(!( (answer == false) && !mEventQueue.IsEmpty()),
> >+               "Should always enqueue if ChannelEventQueue not empty");
> 
> This hurts every time I think about it. Can we apply DeMorgan's laws and get
> |answer == true || mEventQueue.IsEmpty()| instead?

I like it.  Sold!

> >+  NS_ASSERTION(!mSuspended,
> 
> NS_ABORT_IF_FALSE

What do you like about NS_ABORT so much?  You want to crash in the field, not just get orange on tbpl?

> >+    mEventQ.Resume();    // TODO: make this async: see HttpChannelChild::Resume
> 
> Get a bug filed on that?

Yes--bug 657076.

Pushed to try...

Comment 6

6 years ago
Well, I conceptually like NS_ABORT for things we're asserting that are meant to be actual invariants. The cases I highlighted fill that role in my mind; if you disagree then I'll defer to you.
(Assignee)

Comment 7

6 years ago
Meh--causing test_reentrancy_wrap.js to fail.  Will look into it...

Comment 8

6 years ago
With regards to asserting, want to compromise on using MOZ_ASSERT from mfbt/Util.h to get hard assertions in debug and nothing in release?
(Assignee)

Comment 9

6 years ago
MOZ_ASSERT is so much shorter to spell than NS_ABORT_IF_FALSE--that's great!  Is there any other semantic difference?  (both seem to drop dead if debug, else nothing: MOZ_ASSERT doesn't let you set a msg). 

Alas, MOZ_ASSERT currently ties into JS, and while we no longer "really" care about keeping necko independent from other parts of the browser, this seems like a silly place to start breaking that interface.  So I'll stick with NS_ABORT_IF_FALSE.
MOZ_ASSERT is a nop in release builds. And while it technically ties into the JS engine right now, that's going to change in the nearish future, once mfbt/ turns into a real library of its own right.
(Assignee)

Comment 11

6 years ago
Created attachment 538715 [details] [diff] [review]
fixes issue with test_rentrancy_wrap.js

test_reentrancy_wrap.js shows that it is possible to call MaybeFlushQueue from within a critical section (for instance, if you're in OnStartRequest for a while and something outside suspends/resumes you).   Fixed by checking for !mForced.

Pushed to try.
(Assignee)

Comment 12

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/4a33bc1f772d
Whiteboard: [inbound]
Pushed:
http://hg.mozilla.org/mozilla-central/rev/4a33bc1f772d
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla7
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.