Consolidate the chromium and xpcom event queues

RESOLVED FIXED in Firefox 49

Status

()

Core
XPCOM
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: khuey, Assigned: khuey)

Tracking

(Blocks: 1 bug)

unspecified
mozilla49
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox49 fixed)

Details

Attachments

(3 attachments)

Comment hidden (empty)
Created attachment 8747384 [details] [diff] [review]
Part 1: Implement an nsIEventTarget equivalent for PostDelayedTask

This may not be good enough, but it's simplest, so I'd like to start with this.

The semantics of the chromium PostDelayedTask are:

1. We get a PostTask(W), PostDelayedTask(X, t), PostTask(Y), and PostTask(Z) in that order.  Since there is a lock involved the order is well defined regardless of which threads these come from.
2. That results in W, X, Y, and Z in the event queue in that order.
3. W is executed.
4. Since we executed a regular event, we check the delayed work queue and it is empty.
5. X is set aside into the delayed work queue.
6. Since we didn't execute a regular event, we *don't* check the delayed work queue.
7. Y is executed.
8. Since we executed a regular event, we check the delayed work queue. If the delay specified by t has expired, we execute X, otherwise we do nothing.
9. Z is executed.
10. Since we executed a regular event, we check the delayed work queue. If the delay specified by t has expired, we execute X, otherwise, since there are no pending events in the regular queue, we set an nsITimer to wake up this thread to execute X at the right time.  (NB: We do different things with other Chromium message pumps, but on all threads that are nsThreads, we use nsITimer)

This is quite complicated and hopefully unnecessary.  Instead what I've done is:

1. DelayedDispatch(X, t) wraps X in another runnable and sets up an nsITimer to fire after delay t.  It is then put in the regular event queue.
2. If we Run() the wrapper before delay t has elapsed, it does nothing, and instead waits for the timer.
3. If we Run() the wrapper after delay t has elapsed and we have not already run X, we cancel the timer and run X.
4. If the timer fires and we have not already run X we run X.

NB: It is possible for us to race such that the timer fires first (consider an untimely context switch after calling InitWithCallback) in which case we will run X under #4 and then #3 will be a no-op.

This avoids having to have some sort of set-aside queue inside nsThread, so hopefully we can get away with it.
Attachment #8747384 - Flags: review?(nfroyd)
Created attachment 8747385 [details] [diff] [review]
Part 2: Consolidate event queues for non-main threads.

This adds a bit of infrastructure so Gecko's MessagePumps can have an nsIEventTarget pointer, and if said pointer exists, the chromium dispatch mechanism will dispatch directly into the nsIEventTarget.
Attachment #8747385 - Flags: review?(nfroyd)
Created attachment 8747386 [details] [diff] [review]
Part 3: And the main thread

I intend to land this a few days later to give problems with non-main threads a few days to shake out.
Attachment #8747386 - Flags: review?(nfroyd)
I've run this on try and haven't seen anything too crazy yet.  There may yet be fallout though.
Comment on attachment 8747384 [details] [diff] [review]
Part 1: Implement an nsIEventTarget equivalent for PostDelayedTask

Review of attachment 8747384 [details] [diff] [review]:
-----------------------------------------------------------------

Does the compositor run on an XPCOM thread or an IPC-style thread?  I see that some of the compositor calls pass a delay of 0ms, and I'm a bit surprised those don't blow up.  Ah, I guess this is bug 1259450.  Looking at the PostDelayedTask calls in tree, we have something like:

- dom/bluetooth/ calls, which wouldn't get tested on try;
- dom/system/gonk/ calls, likewise;
- dom/plugins/ipc/ calls, which miiiiight get tested;
- gfx/{layers,thebes}/ calls, which sound like they all use non-XPCOM threading;
- widget/android/ calls, which don't interact with the chromium event loop at all.

So it doesn't seem terribly surprising that this didn't break very much.

This all has subtly different semantics than the chromium code, right?  I hate having to remind myself how this works everytime I have to review code around here, but MessagePump::Run does something like (simplified):

1. Run an XPCOM event if one exists.
2. Run something from IPC.
3. Run an XPCOM event; wait if one doesn't.
4. Goto 1.

and the current PostDelayedTask will insert things into the IPC queue.  So without this code, we could have something like:

- PostDelayedTask to main thread event loop;
- Lots of XPCOM events inside the main thread event queue;
- posted task runs relatively quickly because we don't have that many delayed tasks, and because IPC tasks "jump over" the XPCOM event queue.

With this code, however, we now have:

- PostDelayedTask to main thread event loop;
- Lots of XPCOM events inside the main thread event queue;
- Timer fires due to event backlog
- Timer event posted to thread event queue (*not* run immediately)
- Lots more events run
- DelayedRunnable runs

So we had a much longer delay with the new scheme than the old scheme.  This is all possible on non-main threads as well IIUC, but I'm assuming we have shorter event queues there.

Does that seem plausible?  I'm not 100% convinced of the logic above because, well, it's our event queue, but it seems like we might be introducing latency in unappealing places.  Also seems like it's going to flush out some assumptions around event ordering.

I guess I'll r+ this, but I'd like to have a bit of discussion about the above stuff before landing, and especially try and figure out if this is going to affect plugins badly.

::: xpcom/threads/nsThread.cpp
@@ +239,5 @@
> +public:
> +  DelayedRunnable(already_AddRefed<nsIRunnable> aRunnable,
> +                  uint32_t aDelay)
> +    : mWrappedRunnable(aRunnable),
> +      mDelayedFrom(TimeStamp::NowLoRes()),

Some of the PostDelayedTask calls have delays shorter than the Windows default timer resolution; are you sure you want to use a low resolution timer here?  I'm not sure all of the current calls get routed through this code, but eventually we'll want to use this code for everything, right?
Attachment #8747384 - Flags: review?(nfroyd) → review+
Comment on attachment 8747385 [details] [diff] [review]
Part 2: Consolidate event queues for non-main threads.

Review of attachment 8747385 [details] [diff] [review]:
-----------------------------------------------------------------

I guess this is kind of reasonable.

::: ipc/chromium/src/base/message_loop.cc
@@ +281,5 @@
>  void MessageLoop::PostTask_Helper(already_AddRefed<Runnable> task, int delay_ms) {
> +  if (nsIEventTarget* target = pump_->GetXPCOMThread()) {
> +    nsresult rv;
> +    if (delay_ms) {
> +      rv = target->DelayedDispatch(Move(task), delay_ms);

We should DCHECK or MOZ_ASSERT that delay_ms > 0 here just to make sure we don't shoot ourselves in the foot.

::: ipc/glue/MessageChannel.h
@@ +484,5 @@
>      };
>  
>      // Wrap an existing task which can be cancelled at any time
>      // without the wrapper's knowledge.
> +    class DequeueTask : public CancelableRunnable

What is this here for?  Did we forget to move something to CancelableRunnable earlier?
Attachment #8747385 - Flags: review?(nfroyd) → review+
Comment on attachment 8747386 [details] [diff] [review]
Part 3: And the main thread

Review of attachment 8747386 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but the below needs some clarification.  Hate all the indirection in message_loop.cc and company.

::: ipc/glue/MessagePump.cpp
@@ +208,5 @@
> +  }
> +
> +  // Main thread
> +  nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> +  return mainThread;

Uh.  So if we don't have a thread, we're going to redirect everything to the main thread?  Or is this a "we created the main thread message loop too early to actually have a main thread to assign in mThread, so we know that if we don't have a thread, it's going to be the main thread"?  Or is this "MessagePump is implicitly only for the main thread, so this is OK"?  But then how was the previous patch for non-main thread (MessagePumpForNonMainThreads) only?

This feels like a comment or assert is warranted.
Attachment #8747386 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #5)
> Comment on attachment 8747384 [details] [diff] [review]
> Part 1: Implement an nsIEventTarget equivalent for PostDelayedTask
> 
> Review of attachment 8747384 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Does the compositor run on an XPCOM thread or an IPC-style thread?  I see
> that some of the compositor calls pass a delay of 0ms, and I'm a bit
> surprised those don't blow up.  Ah, I guess this is bug 1259450.  Looking at
> the PostDelayedTask calls in tree, we have something like:
> 
> - dom/bluetooth/ calls, which wouldn't get tested on try;
> - dom/system/gonk/ calls, likewise;
> - dom/plugins/ipc/ calls, which miiiiight get tested;
> - gfx/{layers,thebes}/ calls, which sound like they all use non-XPCOM
> threading;
> - widget/android/ calls, which don't interact with the chromium event loop
> at all.
> 
> So it doesn't seem terribly surprising that this didn't break very much.

This is actually a good question ... I should check and see who actually uses PostDelayedTask on an XPCOM thread now that b2g is dead.

> This all has subtly different semantics than the chromium code, right?  I
> hate having to remind myself how this works everytime I have to review code
> around here, but MessagePump::Run does something like (simplified):
> 
> 1. Run an XPCOM event if one exists.
> 2. Run something from IPC.
> 3. Run an XPCOM event; wait if one doesn't.
> 4. Goto 1.

s/IPC/chromium event queue/, but yeah.  Notably we always run one XPCOM event then one chromium event then one XPCOM event ad infinitum.

> and the current PostDelayedTask will insert things into the IPC queue.  So
> without this code, we could have something like:
> 
> - PostDelayedTask to main thread event loop;
> - Lots of XPCOM events inside the main thread event queue;
> - posted task runs relatively quickly because we don't have that many
> delayed tasks, and because IPC tasks "jump over" the XPCOM event queue.
>
> With this code, however, we now have:
> 
> - PostDelayedTask to main thread event loop;
> - Lots of XPCOM events inside the main thread event queue;
> - Timer fires due to event backlog
> - Timer event posted to thread event queue (*not* run immediately)
> - Lots more events run
> - DelayedRunnable runs
> 
> So we had a much longer delay with the new scheme than the old scheme.  This
> is all possible on non-main threads as well IIUC, but I'm assuming we have
> shorter event queues there.
> 
> Does that seem plausible?  I'm not 100% convinced of the logic above
> because, well, it's our event queue, but it seems like we might be
> introducing latency in unappealing places.  Also seems like it's going to
> flush out some assumptions around event ordering.

Yes ... did you read comment 1?  I listed the behaviors and changes I'm aware of there.

> I guess I'll r+ this, but I'd like to have a bit of discussion about the
> above stuff before landing, and especially try and figure out if this is
> going to affect plugins badly.
> 
> ::: xpcom/threads/nsThread.cpp
> @@ +239,5 @@
> > +public:
> > +  DelayedRunnable(already_AddRefed<nsIRunnable> aRunnable,
> > +                  uint32_t aDelay)
> > +    : mWrappedRunnable(aRunnable),
> > +      mDelayedFrom(TimeStamp::NowLoRes()),
> 
> Some of the PostDelayedTask calls have delays shorter than the Windows
> default timer resolution; are you sure you want to use a low resolution
> timer here?  I'm not sure all of the current calls get routed through this
> code, but eventually we'll want to use this code for everything, right?

The Chromium event loop today uses the low res timer (see http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/ipc/chromium/src/base/message_loop.cc#l284 and http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/ipc/chromium/src/base/time.h#l391) so I just preserved that.
(In reply to Nathan Froyd [:froydnj] from comment #6)
> Comment on attachment 8747385 [details] [diff] [review]
> ::: ipc/chromium/src/base/message_loop.cc
> @@ +281,5 @@
> >  void MessageLoop::PostTask_Helper(already_AddRefed<Runnable> task, int delay_ms) {
> > +  if (nsIEventTarget* target = pump_->GetXPCOMThread()) {
> > +    nsresult rv;
> > +    if (delay_ms) {
> > +      rv = target->DelayedDispatch(Move(task), delay_ms);
> 
> We should DCHECK or MOZ_ASSERT that delay_ms > 0 here just to make sure we
> don't shoot ourselves in the foot.

I don't really follow ... why should I assert something that is tautologically true?

> ::: ipc/glue/MessageChannel.h
> @@ +484,5 @@
> >      };
> >  
> >      // Wrap an existing task which can be cancelled at any time
> >      // without the wrapper's knowledge.
> > +    class DequeueTask : public CancelableRunnable
> 
> What is this here for?  Did we forget to move something to
> CancelableRunnable earlier?

Now that these get dispatched into the XPCOM event queue, on worker threads, they must be cancelable.
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Comment on attachment 8747386 [details] [diff] [review]
> Part 3: And the main thread
> 
> Review of attachment 8747386 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, but the below needs some clarification.  Hate all the indirection in
> message_loop.cc and company.
> 
> ::: ipc/glue/MessagePump.cpp
> @@ +208,5 @@
> > +  }
> > +
> > +  // Main thread
> > +  nsCOMPtr<nsIThread> mainThread = do_GetMainThread();
> > +  return mainThread;
> 
> Uh.  So if we don't have a thread, we're going to redirect everything to the
> main thread?  Or is this a "we created the main thread message loop too
> early to actually have a main thread to assign in mThread, so we know that
> if we don't have a thread, it's going to be the main thread"?  Or is this
> "MessagePump is implicitly only for the main thread, so this is OK"?  But
> then how was the previous patch for non-main thread
> (MessagePumpForNonMainThreads) only?
> 
> This feels like a comment or assert is warranted.

It's the "we created the main thread message loop too early to actually have a main thread to assign in mThread, so we know that if we don't have a thread, it's going to be the main thread" case.  Note http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/ipc/glue/MessagePump.cpp#l174.  I'll add a comment.

And yes, I hate the indirection everywhere too.
ni? to make sure you see my responses.
Flags: needinfo?(nfroyd)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9)
> (In reply to Nathan Froyd [:froydnj] from comment #6)
> > Comment on attachment 8747385 [details] [diff] [review]
> > ::: ipc/chromium/src/base/message_loop.cc
> > @@ +281,5 @@
> > >  void MessageLoop::PostTask_Helper(already_AddRefed<Runnable> task, int delay_ms) {
> > > +  if (nsIEventTarget* target = pump_->GetXPCOMThread()) {
> > > +    nsresult rv;
> > > +    if (delay_ms) {
> > > +      rv = target->DelayedDispatch(Move(task), delay_ms);
> > 
> > We should DCHECK or MOZ_ASSERT that delay_ms > 0 here just to make sure we
> > don't shoot ourselves in the foot.
> 
> I don't really follow ... why should I assert something that is
> tautologically true?

delay_ms is |int|, not |unsigned int|, and negative integers are truthy.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #8)
> (In reply to Nathan Froyd [:froydnj] from comment #5)
> > and the current PostDelayedTask will insert things into the IPC queue.  So
> > without this code, we could have something like:
> > 
> > - PostDelayedTask to main thread event loop;
> > - Lots of XPCOM events inside the main thread event queue;
> > - posted task runs relatively quickly because we don't have that many
> > delayed tasks, and because IPC tasks "jump over" the XPCOM event queue.
> >
> > With this code, however, we now have:
> > 
> > - PostDelayedTask to main thread event loop;
> > - Lots of XPCOM events inside the main thread event queue;
> > - Timer fires due to event backlog
> > - Timer event posted to thread event queue (*not* run immediately)
> > - Lots more events run
> > - DelayedRunnable runs
> > 
> > So we had a much longer delay with the new scheme than the old scheme.  This
> > is all possible on non-main threads as well IIUC, but I'm assuming we have
> > shorter event queues there.
> > 
> > Does that seem plausible?  I'm not 100% convinced of the logic above
> > because, well, it's our event queue, but it seems like we might be
> > introducing latency in unappealing places.  Also seems like it's going to
> > flush out some assumptions around event ordering.
> 
> Yes ... did you read comment 1?  I listed the behaviors and changes I'm
> aware of there.

Maybe I'm dense, but I don't see any behavorial changes there...I see a list of "here's what the old implementation did" and "here's what we're going to try and do".  Am I mistaken?

> > I guess I'll r+ this, but I'd like to have a bit of discussion about the
> > above stuff before landing, and especially try and figure out if this is
> > going to affect plugins badly.
> > 
> > ::: xpcom/threads/nsThread.cpp
> > @@ +239,5 @@
> > > +public:
> > > +  DelayedRunnable(already_AddRefed<nsIRunnable> aRunnable,
> > > +                  uint32_t aDelay)
> > > +    : mWrappedRunnable(aRunnable),
> > > +      mDelayedFrom(TimeStamp::NowLoRes()),
> > 
> > Some of the PostDelayedTask calls have delays shorter than the Windows
> > default timer resolution; are you sure you want to use a low resolution
> > timer here?  I'm not sure all of the current calls get routed through this
> > code, but eventually we'll want to use this code for everything, right?
> 
> The Chromium event loop today uses the low res timer (see
> http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/ipc/chromium/src/
> base/message_loop.cc#l284 and
> http://hg.mozilla.org/mozilla-central/annotate/0a25833062a8/ipc/chromium/src/
> base/time.h#l391) so I just preserved that.

OK, that's fine.
Flags: needinfo?(nfroyd)
PostDelayedTask gets used primarily in APZ and plugin code, which does get tested on try, although not a ton.  I'm planning to land it and see what breaks, essentially.  And if we have to be more faithful to Chromium's PostDelayedTask implementation we can cross that bridge then.

Comment 14

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ddfd27c35530
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ebbd63adcd9
https://hg.mozilla.org/integration/mozilla-inbound/rev/3533a2d51e97

Comment 15

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ddfd27c35530
https://hg.mozilla.org/mozilla-central/rev/7ebbd63adcd9
https://hg.mozilla.org/mozilla-central/rev/3533a2d51e97
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49

Updated

2 years ago
See Also: → bug 1180860
Depends on: 1275299
See Also: → bug 1303940
You need to log in before you can comment on or make changes to this bug.