Closed Bug 660774 Opened 13 years ago Closed 13 years ago

e10s necko: refactor channelEventQueue to allow async resume/flush

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla7

People

(Reporter: jduell.mcbugs, Assigned: jduell.mcbugs)

References

Details

Attachments

(3 files)

Attached patch Simple fix.Splinter Review
In bug 637339 we've moved mCallonResume and FlushEventQueue() to an asyc callback (instead of right within Resume()) so that we don't flush the queue (and call client callbacks) before the client's Resume call completes. Alas, this breaks the existing, rather hack-y ChannelEventQueue logic. In particular, if the suspend happens within OnStartReq, etc., we wind up hitting FlushEventQueue (when OnStartReq's AutoEventEnqueuer goes out of scope), and that barfs on the mQueuePhase==PHASE_FINISHED_QUEUEING we set in Resume(). If we remove the error check and allow PHASE_FINISHED_QUEUEING the queue will get flushed before we get a chance to run mCallOnResume, which would be out of order. I suppose we might be able to fix that by adding another state to mQueuePhase, or possibly even (now that I think about it) by setting mQueuePhase=PHASE_QUEUEING in HttpChannelChild::Resume(), plus removing the NS_ABORT_IF_FALSE(mQueuePhase != PHASE_UNQUEUED) in FlushEventQueue(). But in general I'm finding the ChannelEventQueue logic to be too fragile and unintuitive: for example, when suspended we wind up queueing messages while state=PHASE_UNQUEUED. For this bug I offer two approaches--one quick fix, and one more extensive refactoring. I think the second is better. Note: both of these patches are built on top of the patches in bug 637339, but you don't really need to understand them at all, other than than HttpChannelChild::Resume now launches an async "CompleteResume()" method, which is where we want to actually resume the EventQueue. Short term fix: I've built in formal Suspend/Resume methods into ChannelEventQueue. This allows us to delay resuming the ChannelEventQueue until we get to CompleteResume(). It also cleans up the caller code a bit (I could also remove all the template logic from ChannelEventQueue, but I haven't bothered to here).
Attachment #536259 - Flags: review?(josh)
2nd, more ambituous patch. Follow on the first patch, but change ChannelEventQueue from using a single state variable and instead check for the various conditions (suspended, in a 'critical section', flushing) that we've been fudging with an enum. I think this makes the logic a lot easier to follow. Also convert ChannelEventQueue to be a non-template class, and make it a member of channels, rather than a base class.
Attachment #536260 - Flags: review?(josh)
Blocks: 637339
Comment on attachment 536260 [details] [diff] [review] More extensive refactoring (applies on top of simple fix) I really like the way the logic is presented here; it's significantly easier to follow. My comments are all superficial things. >+#include <nsIChannel.h> I think this can be forward-declared. >+ bool mForced; > bool mSuspended; >+ bool mFlushing; These could be bitfields. I don't feel strongly one way or the other, though. >+ NS_ASSERTION(!( (answer == false) && !mEventQueue.IsEmpty()), >+ "Should always enqueue if ChannelEventQueue not empty"); This hurts every time I think about it. Can we apply DeMorgan's laws and get |answer == true || mEventQueue.IsEmpty()| instead? >+ NS_ASSERTION(!mSuspended, NS_ABORT_IF_FALSE >+ NS_ASSERTION(mSuspended, NS_ABORT_IF_FALSE >+ NS_ASSERTION(!mForced, "MaybeFlushQueue called inside critical section"); NS_ABORT_IF_FALSE >+ mEventQ.Resume(); // TODO: make this async: see HttpChannelChild::Resume Get a bug filed on that?
Attachment #536260 - Flags: review?(josh) → review+
Attachment #536259 - Flags: review?(josh)
However, please add the test changes from the first patch into the second one.
Actually ignore comment 3, I can't read.
Attachment #536259 - Flags: review+
> >+#include <nsIChannel.h> > > I think this can be forward-declared. Done. We'll slow global warming down a little from that, except that I suspect all the code that includes this file will eventually include nsIChannel.h too. > These could be bitfields. I don't feel strongly one way or the other, though. Not going to bother with bitfields. If we had 12+ bools, then sure. > >+ NS_ASSERTION(!( (answer == false) && !mEventQueue.IsEmpty()), > >+ "Should always enqueue if ChannelEventQueue not empty"); > > This hurts every time I think about it. Can we apply DeMorgan's laws and get > |answer == true || mEventQueue.IsEmpty()| instead? I like it. Sold! > >+ NS_ASSERTION(!mSuspended, > > NS_ABORT_IF_FALSE What do you like about NS_ABORT so much? You want to crash in the field, not just get orange on tbpl? > >+ mEventQ.Resume(); // TODO: make this async: see HttpChannelChild::Resume > > Get a bug filed on that? Yes--bug 657076. Pushed to try...
Well, I conceptually like NS_ABORT for things we're asserting that are meant to be actual invariants. The cases I highlighted fill that role in my mind; if you disagree then I'll defer to you.
Meh--causing test_reentrancy_wrap.js to fail. Will look into it...
With regards to asserting, want to compromise on using MOZ_ASSERT from mfbt/Util.h to get hard assertions in debug and nothing in release?
MOZ_ASSERT is so much shorter to spell than NS_ABORT_IF_FALSE--that's great! Is there any other semantic difference? (both seem to drop dead if debug, else nothing: MOZ_ASSERT doesn't let you set a msg). Alas, MOZ_ASSERT currently ties into JS, and while we no longer "really" care about keeping necko independent from other parts of the browser, this seems like a silly place to start breaking that interface. So I'll stick with NS_ABORT_IF_FALSE.
MOZ_ASSERT is a nop in release builds. And while it technically ties into the JS engine right now, that's going to change in the nearish future, once mfbt/ turns into a real library of its own right.
test_reentrancy_wrap.js shows that it is possible to call MaybeFlushQueue from within a critical section (for instance, if you're in OnStartRequest for a while and something outside suspends/resumes you). Fixed by checking for !mForced. Pushed to try.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla7
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: