Closed Bug 1306591 Opened 8 years ago Closed 8 years ago

Add urgent priority queue to nsThread and add an IPC annotation to use it

Categories

(Core :: XPCOM, defect, P3)

50 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: smaug, Assigned: smaug)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(4 files, 7 obsolete files)

Thinker, could you move your patch from bug 1269690 perhaps here?
Depends on: 1306593
Priority: -- → P3
Given the comments in the mailing list, perhaps we could initially add a new annotation and do possible renaming of urgent and high later.

The new one could be maybe...  priority_queue ?

Though, we don't have that many urgent or high usage, so I guess renaming first might be simple enough, and then just have prio(urgent) to have this special mean to use priority queue on the receiving side. prio(high) could be removed.
Summary: Add urgent priority queue to nsThread and make IPC's urgent annotation to use it → Add urgent priority queue to nsThread and add an IPC annotation to use it
Thinker, I added a new prio() annotation in bug 1306708 (and renamed the old prio() to nested()). You should be able to access this priority on the IPC message object. I think this will make things a lot simpler for you.
Depends on: 1306708
I guess we can accept this new prio(urgent) to work always, both parent->child and child->parent.
It just means which queue to use on the receiving side, right?
Yes, it will always work in both directions. It has whatever meaning we decide to give it. I only added one priority, prio(high). We can add more if we need them.
Attached patch wip (obsolete) — Splinter Review
This is the patch from bug 1269690 updated to use the current ipdl annotations.
VSync is has higher priority.

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=4437d26dc98f2b144f78bbbb2479637a956cc63c
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=4437d26dc98f2b144f78bbbb2479637a956cc63c
Assignee: nobody → bugs
Need to fix that double mRefCount member issue, but surprisingly good results. Couple of tests to fix, or need to fix the underlying implementation.
Attached patch wip2 (obsolete) — Splinter Review
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=152f018659d6188f74110ad64be8c85f6623e0cb
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=152f018659d6188f74110ad64be8c85f6623e0cb
Attachment #8799039 - Attachment is obsolete: true
What do we expect to use this for?
vsync and user input will use high priority queue, possibly also some tab switching stuff.
Depends on: 1169287
Attached patch v5 (obsolete) — Splinter Review
devtools run super heavy JS in rAF so had to increase timeout value (for debug builds).
 
test_pathAnimInterpolation.xhtml has clearly a bug, since it doesn't remove all those animation tests, so the next tick will still go through all the animations, and with the patch
it is possible to get a new tick before the window has been unloaded.

I'm not too happy with nsIRunnablePriority usage. I'd rather have NS_HighPriorityDispatchToCurrentThread or some such, but the ipc stuff is messy,
and nsIRunnablePriority works too, for now. If the ipc gets cleaned up, we could simplify this.

mNumberOfHighPriorityRunnablesToProcess is a bit weird stuff, but it is there to ensure we process some non-high priority events too.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=87e47522f2ab1359c33af9307d166a215195e49d

https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=87e47522f2ab1359c33af9307d166a215195e49d
Attachment #8799360 - Attachment is obsolete: true
Attachment #8803164 - Attachment is obsolete: true
Comment on attachment 8804894 [details] [diff] [review]
v5

See comment 13
Attachment #8804894 - Flags: review?(wmccloskey)
Comment on attachment 8804894 [details] [diff] [review]
v5

Review of attachment 8804894 [details] [diff] [review]:
-----------------------------------------------------------------

I noticed some issues with the thread stuff. I also think it makes a lot more sense to do this on top of bug 1312960. The MessageChannel stuff isn't correct without that bug, and it will be a lot simpler too. You should just be able to add a GetPriority method to MessageTask that looks at the Message and then you'll be done.

::: xpcom/threads/nsIRunnable.idl
@@ +18,5 @@
>      void run();
>  };
> +
> +[uuid(e75aa42a-80a9-11e6-afb5-e89d87348e2c)]
> +interface nsIRunnablePriority : nsISupports

What about using nsISupportsPriority instead?

::: xpcom/threads/nsThread.h
@@ +162,5 @@
> +      // We don't want to wait if mHighPriorityQueue might have some events.
> +      bool reallyMayWait =
> +        aMayWait && !mHighPriorityQueue->HasPendingEvent(aProofOfLock);
> +      bool retVal =
> +        mNormalPriorityQueue->GetEvent(reallyMayWait, aEvent, aProofOfLock);

I don't understand how this works. Let's say that both queues are empty and we end up waiting on the condition variable (i.e., reallyMayWait is true). Now someone puts an event into the high priority queue. The normal queue wakes up (since it uses the same condition variable), but it immediately goes back to sleep again because its queue is empty:
http://searchfox.org/mozilla-central/rev/4012e61cc38758ffa1f7ce26039173f67b048853/xpcom/threads/nsEventQueue.cpp#55

I think we're going to have to do the condition variable wait ourselves and then check HasPendingEvent on each queue.

I looked at the patch in bug 1198381 and I think it has a related problem: if we go to sleep waiting for a normal event, we won't wake up if an idle event is dispatched.

I think it would be good for this code to fit better with the code from bug 1198381 and to fix these problems.

@@ +165,5 @@
> +      bool retVal =
> +        mNormalPriorityQueue->GetEvent(reallyMayWait, aEvent, aProofOfLock);
> +
> +      // Let's see if we should next time process some high priority events.
> +      mNumberOfHighPriorityRunnablesToProcess =

The way that mNumberOfHighPriorityRunnablesToProcess is updated seems wrong to me. PutEvent never increments it. What happens if we enter GetEvent and there are high priority events but mNumberOfHighPriorityRunnablesToProcess == 0? We'll first call mNormalPriorityQueue->GetEvent. Assume it returns something (say there were also normal priority events available). Then we'll return a normal priority event even though the high-priority queue is non-empty.
Attachment #8804894 - Flags: review?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #15)
> I noticed some issues with the thread stuff. I also think it makes a lot
> more sense to do this on top of bug 1312960.
yeah, I was wonder what will happen with that stuff. didn't know if there was a bug already.

> The MessageChannel stuff isn't
> correct without that bug, and it will be a lot simpler too.
how so?

> What about using nsISupportsPriority instead?
Makes no sense since we support only normal and high priority.
I'd like to get rid of nsIRunnablePriority assuming IPC level gets sanitized so that dispatching explicitly 
high priority runnables doesn't make the code too ugly.


> I don't understand how this works. Let's say that both queues are empty and
> we end up waiting on the condition variable (i.e., reallyMayWait is true).
> Now someone puts an event into the high priority queue. The normal queue
> wakes up (since it uses the same condition variable), but it immediately
> goes back to sleep again because its queue is empty:
> http://searchfox.org/mozilla-central/rev/
> 4012e61cc38758ffa1f7ce26039173f67b048853/xpcom/threads/nsEventQueue.cpp#55
Oh, indeed, that while loop is buggy.


> I think we're going to have to do the condition variable wait ourselves and
> then check HasPendingEvent on each queue.
Why? I think it is better to just modify GetEvent, since it is just buggy right now.
IsEmpty() doesn't make sense there.


> I looked at the patch in bug 1198381 and I think it has a related problem:
> if we go to sleep waiting for a normal event, we won't wake up if an idle
> event is dispatched.
Ah, that is possible.
 
> I think it would be good for this code to fit better with the code from bug
> 1198381 and to fix these problems.
I started to write a patch following the patterns from that bug, but our IPC code is too messy that it wasn't really possible without making code even more messy


> The way that mNumberOfHighPriorityRunnablesToProcess is updated seems wrong
> to me. PutEvent never increments it.
That is the whole point that PutEvent doesn't update it. We update it only after processing one normal event.


> What happens if we enter GetEvent and
> there are high priority events but mNumberOfHighPriorityRunnablesToProcess
> == 0?
We should process normal priority event.


> We'll first call mNormalPriorityQueue->GetEvent. Assume it returns
> something (say there were also normal priority events available). Then we'll
> return a normal priority event even though the high-priority queue is
> non-empty.
Yes, that is the whole point. We must process also non-high priority events even though there are high priority events pending.
The way to decide how often is of course a question. Initially I had a boolean to ensure after each high priority event one normal event is processed, but that felt even too conservative.
(In reply to Olli Pettay [:smaug] from comment #16)
> > The MessageChannel stuff isn't
> > correct without that bug, and it will be a lot simpler too.
> how so?

Right now the IPC code doesn't have a direct link between runnables and messages. Most of the time the runnable we dispatch for a message ends up handling that message. But sometimes it handles a different message.

> > What about using nsISupportsPriority instead?
> Makes no sense since we support only normal and high priority.
> I'd like to get rid of nsIRunnablePriority assuming IPC level gets sanitized
> so that dispatching explicitly 
> high priority runnables doesn't make the code too ugly.

OK. Were you thinking of changing the Dispatch call or something?

> > I think we're going to have to do the condition variable wait ourselves and
> > then check HasPendingEvent on each queue.
> Why? I think it is better to just modify GetEvent, since it is just buggy
> right now.
> IsEmpty() doesn't make sense there.

I think we need to call IsEmpty(). Wait() can wake up even if the condition variable hasn't been notified.

> > I think it would be good for this code to fit better with the code from bug
> > 1198381 and to fix these problems.
> I started to write a patch following the patterns from that bug, but our IPC
> code is too messy that it wasn't really possible without making code even
> more messy

I meant the requestIdleCallback bug. I think we should use the same mechanism for high priority tasks as we use for idle tasks.

> > The way that mNumberOfHighPriorityRunnablesToProcess is updated seems wrong
> > to me. PutEvent never increments it.
> That is the whole point that PutEvent doesn't update it. We update it only
> after processing one normal event.

Oh, sorry, I missed that part of comment 13. Can you add some comments about how that is supposed to work?
(In reply to Bill McCloskey (:billm) from comment #17)

> > > What about using nsISupportsPriority instead?
> > Makes no sense since we support only normal and high priority.
> > I'd like to get rid of nsIRunnablePriority assuming IPC level gets sanitized
> > so that dispatching explicitly 
> > high priority runnables doesn't make the code too ugly.
> 
> OK. Were you thinking of changing the Dispatch call or something?
Yeah, explicit NS_HighPriorityDispatchToCurrentThread or some such


> 
> > > I think we're going to have to do the condition variable wait ourselves and
> > > then check HasPendingEvent on each queue.
> > Why? I think it is better to just modify GetEvent, since it is just buggy
> > right now.
> > IsEmpty() doesn't make sense there.
> 
> I think we need to call IsEmpty(). Wait() can wake up even if the condition
> variable hasn't been notified.

Sorry, my comment was vague. I mean we don't need to have IsEmpty() in 'while'. 'if' is enough


> > > I think it would be good for this code to fit better with the code from bug
> > > 1198381 and to fix these problems.
> > I started to write a patch following the patterns from that bug, but our IPC
> > code is too messy that it wasn't really possible without making code even
> > more messy
> 
> I meant the requestIdleCallback bug. I think we should use the same
> mechanism for high priority tasks as we use for idle tasks.
I actually think we'll need to move rIC closer to the approach taken here.
Though, rIC is quite different. high priority and normal priority events are interleaved in a bit different way.



> Oh, sorry, I missed that part of comment 13. Can you add some comments about
> how that is supposed to work?
Definitely.
Hmm, perhaps because of nsThreadPool stuff condvar handling might be simpler to do outside nsEventQueue after all. Or have two modes in nsEventQueue, normal mode and threadpool mode which lets one to use
those hacky Wait() and NotifyAll() methods defined in the .h
https://treeherder.mozilla.org/#/jobs?repo=try&revision=10f96a11179467be593959abf8e7c7305b7bed9b

But sounds like ipc part should be done after bug 1312960. And if that let's us to get rid of nsIRunnablePriority, even better.
Attached patch v9Splinter Review
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0512c4a522bfd7b56482ea86dbeb0aa44022c76
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=a0512c4a522bfd7b56482ea86dbeb0aa44022c76
Attachment #8804894 - Attachment is obsolete: true
Attachment #8805491 - Attachment is obsolete: true
Attachment #8807889 - Attachment is obsolete: true
Attachment #8807894 - Attachment is obsolete: true
Explicitly not trying to fix bug 1313988 here.
Attachment #8807985 - Flags: review?(wmccloskey)
Blocks: 1315570
Since this + bkelly's changes to setTimeout handling could easily break more tests, better to push to try again.
No real code changes from v11, except renaming the values in nsEventQueue's enum, since the new ThrottledEventQueue uses them too.

remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb01d26e3b0c1a663c60dc71b98b1406623a3b9e
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=fb01d26e3b0c1a663c60dc71b98b1406623a3b9e
Comment on attachment 8807985 [details] [diff] [review]
ipc_priority_11.diff

Review of attachment 8807985 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessageChannel.h
@@ +467,5 @@
>          NS_IMETHOD Run() override;
>          nsresult Cancel() override;
> +        NS_IMETHOD GetPriority(uint32_t* aPriority) override
> +        {
> +          *aPriority = mMessage.priority();

I'd feel a lot better if you did something like this:
  *aPriority = mMessage.priority() == Message::HIGH_PRIORITY
             ? PRIORITY_HIGH
             : PRIORITY_NORMAL;
It's possible we'll add a new priority to one side or the other and then these might not match up.

Also, can you move this function to the .cpp file? That way someone won't miss it while reading through the .cpp file.

::: xpcom/threads/nsEventQueue.cpp
@@ +25,5 @@
>    , mOffsetHead(0)
>    , mOffsetTail(0)
> +  , mType(aType)
> +{
> +  mOwnEventsAvailable =

I feel like this would be cleaner if nsThread/nsThreadPool always owned the CondVar and just passed it in here. Then you would only need one constructor and the ownership issues would be simpler.

::: xpcom/threads/nsThread.h
@@ +151,3 @@
>      }
>  
>      bool GetEvent(bool aMayWait, nsIRunnable** aEvent,

Can you move GetEvent/PutEvent to the .cpp file? They're so big that inlining won't help, and it'll be easier to maintain the code in the future.

@@ +176,5 @@
> +        if (*aEvent) {
> +          // We got an event, return early.
> +          return retVal;
> +        }
> +

Please remove the blank line.

@@ +177,5 @@
> +          // We got an event, return early.
> +          return retVal;
> +        }
> +
> +      } while(aMayWait || mProcessHighPriorityRunnable);

Space after "while".

@@ +184,5 @@
>      }
>  
>      void PutEvent(nsIRunnable* aEvent, mozilla::MutexAutoLock& aProofOfLock)
>      {
> +      nsCOMPtr<nsIRunnablePriority> runnablePrio = do_QueryInterface(aEvent);

It seems like this code duplication is to avoid an extra refcount. However, it looks like nsEventQueue::PutEvent(nsIRunnable*) takes an extra ref anyway. Why not do it sooner here and just call into the already_AddRefed version?

@@ +233,5 @@
> +
> +    // Try to process one high priority runnable after each normal
> +    // priority runnable. This gives the processing model HTML spec has for
> +    // 'Update the rendering' in the case only vsync messages are in the high
> +    // priority queue and prevents starving the normal priority queue.

Given how this works, it seems a bit weird to me to call these "high priority" events since we actually process the normal priority events in preference to high priority events. But I can't think of a better name, so I guess this is okay for now.
Attachment #8807985 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #27)
> @@ +233,5 @@
> > +
> > +    // Try to process one high priority runnable after each normal
> > +    // priority runnable. This gives the processing model HTML spec has for
> > +    // 'Update the rendering' in the case only vsync messages are in the high
> > +    // priority queue and prevents starving the normal priority queue.
> 
> Given how this works, it seems a bit weird to me to call these "high
> priority" events since we actually process the normal priority events in
> preference to high priority events. But I can't think of a better name, so I
> guess this is okay for now.

Instead of baking this into nsThread could we instead make nsRefreshDriver call NS_ProcessNextEvent() to force some normal runnables if it detects that requestAnimationFrame took a long time?  That would keep the weird logic close to the thing causing the weird logic.
(In reply to Ben Kelly [:bkelly] from comment #28)
> Instead of baking this into nsThread could we instead make nsRefreshDriver
> call NS_ProcessNextEvent() to force some normal runnables if it detects that
> requestAnimationFrame took a long time?  That would keep the weird logic
> close to the thing causing the weird logic.
I really prefer to follow the spec model as much as possible here.
(In reply to Bill McCloskey (:billm) from comment #27)
> > +  mOwnEventsAvailable =
> 
> I feel like this would be cleaner if nsThread/nsThreadPool always owned the
> CondVar and just passed it in here. Then you would only need one constructor
> and the ownership issues would be simpler.
possibly yes.
ThrottledEventQueue added a new use for this, but I guess I could just add the condvar as its member variable.


> 
> Can you move GetEvent/PutEvent to the .cpp file? They're so big that
> inlining won't help, and it'll be easier to maintain the code in the future.
> 
GetEvent at least. PutEvents are small

> > +
> > +    // Try to process one high priority runnable after each normal
> > +    // priority runnable. This gives the processing model HTML spec has for
> > +    // 'Update the rendering' in the case only vsync messages are in the high
> > +    // priority queue and prevents starving the normal priority queue.
> 
> Given how this works, it seems a bit weird to me to call these "high
> priority" events since we actually process the normal priority events in
> preference to high priority events. But I can't think of a better name, so I
> guess this is okay for now.
right. The high priority comes from the fact that this type of messages should be used very rarely.
(In reply to Olli Pettay [:smaug] from comment #30)
> (In reply to Bill McCloskey (:billm) from comment #27)
> > > +  mOwnEventsAvailable =
> > 
> > I feel like this would be cleaner if nsThread/nsThreadPool always owned the
> > CondVar and just passed it in here. Then you would only need one constructor
> > and the ownership issues would be simpler.
> possibly yes.
> ThrottledEventQueue added a new use for this, but I guess I could just add
> the condvar as its member variable.

By this you mean not having a nested runnable?  Ok.

We could still hint to nsThread from nsRefreshDriver that high priority should be suspended for N runnables or time period, though.  It just feels like a layering violation to have this special logic in the deep guts of the system for a particular condition occurring in a higher layer.
what is the violation? We have two queues, of which one is commonly used and once is rarely used, yet we give them both effectively the same priority, meaning that runnables in the rarely used get processed faster.
What if we call it a secondary queue instead of a high-priority queue?
(In reply to Olli Pettay [:smaug] from comment #32)
> what is the violation? We have two queues, of which one is commonly used and
> once is rarely used, yet we give them both effectively the same priority,
> meaning that runnables in the rarely used get processed faster.

Because of requestAnimationFrame() in queue 1 depending on logic running in js in queue 2 you cannot really implement this as a high level priority queue.  If you made the refresh driver in queue 1 hint that queue 2 needs to run, though, then nsThread would not need to guess when it should force the high priority mechanism off.  It would be more explicit and more precise.

Anyway, it was just a suggestion.
Attached patch v13Splinter Review
I like "secondary queue". 

remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=5487c6761e55f0f9ab8d267126afee6090f93615
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=5487c6761e55f0f9ab8d267126afee6090f93615
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fab432069073
add secondary event queue to let high priority messages to be processed sooner, r=billm
https://hg.mozilla.org/mozilla-central/rev/fab432069073
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1316686
Depends on: 1318900
Depends on: 1322970
Depends on: 1331706
Depends on: 1329079
See Also: → 1338288
No longer depends on: 1340142
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: