1318226 - Tail dispatcher breaks the order of events

Assignee

Description

•

8 years ago

Without tail dispatching:
TaskQueue* tq1;
TaskQueue* tq1;
tq1->Dispatch(R1);
tq2->Dispatch(R2); // R2 will call tq1->Dispatch(R3)

R1 is guaranteed to happen before R3.

With tail dispatching:
We can't guarantee R1 happens before R3.

http://searchfox.org/mozilla-central/rev/efcb1644e49c36445e75d89b434fdf4c84832c84/dom/media/MediaDecoder.cpp#304
This causes bug 1316145 where mCompositorUpdatedEvent dispatches a task to the TaskQueue of MediaDecoderReader after it is shut down.

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

8 years ago

Blocks: 1316145

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 1

•

8 years ago

Hi Bobby,
Do you have any suggestion how to fix the event disorder introduced by tail dispatching?

Flags: needinfo?(bobbyholley)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 2

•

8 years ago

Add a gtest to verify the bug:
https://hg.mozilla.org/try/rev/0e0de8b04f5a

and it fails miserably on Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb0da3a6fc22aacb9eb114b5078362bfadfe246b

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/94284/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/94284/

Bobby Holley (:bholley)

Comment 4

•

8 years ago

Hm, nice find! That's a pretty tough one.

Fundamentally, the point of tail dispatching is to make asynchronous messages from one thread to another occur atomically. It's not possible to reconcile that with ordering assumptions where callers _depend_ on the order of interleaved events dispatched from a given thread.

We could at least detect this case though: i.e. make TaskDispatcher::AddTask MOZ_CRASH if the PerThreadTaskGroup it returns is not end-of-list. Callers could then either drain the tail dispatcher before continuing (ensuring the correct ordering), or pass some kind of opt-out token (indicating that they don't care about the ordering).

I don't have a sense of how common this pattern is, and therefore what the false-positive rate would be. Maybe worth a try though?

Flags: needinfo?(bobbyholley)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 5

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> Fundamentally, the point of tail dispatching is to make asynchronous
> messages from one thread to another occur atomically.
Why is the atomicity required? I think the purpose of tail dispatcher is to provide run-to-completion as we have in JS.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 6

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> I don't have a sense of how common this pattern is, and therefore what the
> false-positive rate would be. Maybe worth a try though?

We have:
a. MediaDecoder (main thread) -> MDSM::Shutdown (MDSM TaskQueue) -> MediaDecoderReader::Shutdown (Reader TaskQueue)

and also:
b. MediaDecoder (main thread) sends events to MediaDecoderReader (Reader TaskQueue) using MediaEventSource.

We need to guarantee events are sent to the Reader's TaskQueue before MediaDecoderReader::Shutdown shuts down its TaskQueue to avoid assertions caused by dispatch failures.

Bobby Holley (:bholley)

Comment 7

•

8 years ago

(In reply to JW Wang [:jwwang] [:jw_wang] from comment #5)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> > Fundamentally, the point of tail dispatching is to make asynchronous
> > messages from one thread to another occur atomically.
> Why is the atomicity required? I think the purpose of tail dispatcher is to
> provide run-to-completion as we have in JS.

I don't think so. We introduced tail dispatching to support state mirroring. For state mirroring, we wanted to send a single atomic state update to other threads once a given task is complete. But to preserve ordering, that means that we also need to defer any dispatched tasks until after the state updates are applied. That's the reason we require tail dispatch for any thread that holds canonical mirrored state.

(In reply to JW Wang [:jwwang] [:jw_wang] from comment #6)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> > I don't have a sense of how common this pattern is, and therefore what the
> > false-positive rate would be. Maybe worth a try though?
> 
> We have:
> a. MediaDecoder (main thread) -> MDSM::Shutdown (MDSM TaskQueue) ->
> MediaDecoderReader::Shutdown (Reader TaskQueue)
> 
> and also:
> b. MediaDecoder (main thread) sends events to MediaDecoderReader (Reader
> TaskQueue) using MediaEventSource.
> 
> We need to guarantee events are sent to the Reader's TaskQueue before
> MediaDecoderReader::Shutdown shuts down its TaskQueue to avoid assertions
> caused by dispatch failures.

That's the failure mode from bug 1316145, right? That would be a non-false positive under my proposed solution from comment 4. I'm specifically wondering how many _other_ situations there are where we dispatch a task to A, then to B, and then to A again.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 8

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> I don't think so. We introduced tail dispatching to support state mirroring.
> For state mirroring, we wanted to send a single atomic state update to other
> threads once a given task is complete. But to preserve ordering, that means
> that we also need to defer any dispatched tasks until after the state
> updates are applied. That's the reason we require tail dispatch for any
> thread that holds canonical mirrored state.

There are 2 categories of tasks in discussion:
1. state change tasks (used by state mirroring): for which we must favor atomicity over ordering.
2. regular tasks: for which we might favor ordering over atomicity.

There are some proposed solutions to fix the ordering of regular tasks:
1. comment 4 (this breaks atomicity).
2. implicitly append a new PerThreadTaskGroup if the returned PerThreadTaskGroup is not the last element. (this breaks atomicity).
3. provide a function to drain the tail dispatcher for functions (ex: MediaDecoder::Shutdown()) that are sensitive to TaskQueue shutdown. (atomicity is broken when the caller agrees).

Bobby Holley (:bholley)

Comment 9

•

8 years ago

Sorry I've been slow to respond here - I was meaning to get back to this before the holidays but didn't. I'll try to pick this discussion up again once I'm back. :-)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 10

•

8 years ago

No worries! Please take your time and enjoy the holidays. :)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 11

•

8 years ago

CC Bevis who is making changes to AbstractThread and TaskQueue.

Bobby Holley (:bholley)

Comment 12

•

8 years ago

(In reply to JW Wang [:jwwang] [:jw_wang] from comment #8)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> > I don't think so. We introduced tail dispatching to support state mirroring.
> > For state mirroring, we wanted to send a single atomic state update to other
> > threads once a given task is complete. But to preserve ordering, that means
> > that we also need to defer any dispatched tasks until after the state
> > updates are applied. That's the reason we require tail dispatch for any
> > thread that holds canonical mirrored state.
> 
> There are 2 categories of tasks in discussion:
> 1. state change tasks (used by state mirroring): for which we must favor
> atomicity over ordering.
> 2. regular tasks: for which we might favor ordering over atomicity.

The key point here is that the atomicity of regular tasks exists _because_ of ordering - specifically to provide a consistent ordering between state changes and tasks that might observe them.

> 
> There are some proposed solutions to fix the ordering of regular tasks:
> 1. comment 4 (this breaks atomicity).
> 2. implicitly append a new PerThreadTaskGroup if the returned
> PerThreadTaskGroup is not the last element. (this breaks atomicity).
> 3. provide a function to drain the tail dispatcher for functions (ex:
> MediaDecoder::Shutdown()) that are sensitive to TaskQueue shutdown.
> (atomicity is broken when the caller agrees).

Can you explain the difference between 1 and 3? They sound the same to me (and I still think that we should pursue the solution in comment 4).

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 13

•

8 years ago

(In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> Can you explain the difference between 1 and 3? They sound the same to me
> (and I still think that we should pursue the solution in comment 4).
No difference. 1 includes 3 after my second thought.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 14

•

8 years ago

(In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> The key point here is that the atomicity of regular tasks exists _because_
> of ordering - specifically to provide a consistent ordering between state
> changes and tasks that might observe them.

I don't quite get it. Can you explain how ordering implies atomicity for regular tasks?

Below is my understanding about the atomicity of regular tasks:

Say we want to dispatch 3 regular tasks to thread1. By atomicity I mean to have a task dispatched to thread1 and the Run() function will call task1->Run(), task2->Run(), and task3->Run() in sequence. It is impossible for other tasks to run in between these 3 tasks.

By breaking atomicity of regular tasks, I mean we might have taskA dispatched to call task1->Run() and taskB dispatched to call task2->Run() and task3->Run() in sequence. It is possible for other tasks to run in between task1 and task2.

Bobby Holley (:bholley)

Comment 15

•

8 years ago

(In reply to JW Wang [:jwwang] [:jw_wang] from comment #14)
> (In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> > The key point here is that the atomicity of regular tasks exists _because_
> > of ordering - specifically to provide a consistent ordering between state
> > changes and tasks that might observe them.
> 
> I don't quite get it. Can you explain how ordering implies atomicity for
> regular tasks?

I'm probably just being a bit sloppy with my terminology. I was specifically talking about ordering of the tasks relative to state changes (the regular tasks need to come after all the state changes, because that's the only well-defined ordering). This requires that the regular tasks be tail-dispatched, but I realize now that they don't necessarily need to be atomic (see below). 
 
> Below is my understanding about the atomicity of regular tasks:
> 
> Say we want to dispatch 3 regular tasks to thread1. By atomicity I mean to
> have a task dispatched to thread1 and the Run() function will call
> task1->Run(), task2->Run(), and task3->Run() in sequence. It is impossible
> for other tasks to run in between these 3 tasks.
> 
> By breaking atomicity of regular tasks, I mean we might have taskA
> dispatched to call task1->Run() and taskB dispatched to call task2->Run()
> and task3->Run() in sequence. It is possible for other tasks to run in
> between task1 and task2.

These definitions make sense to me.

Thinking about this some more, I think you're right that the regular tasks don't need to be atomic, the just need to be delayed. I _think_ we can implicitly append additional PerThreadTaskGroups for regular tasks, _assuming_ that we make sure to put all state updates (including ones that occur after additional PerThreadTaskGroups are appended) continue to live in the first PerThreadTaskGroup.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/118400/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/118400/

Comment hidden (mozreview-request)

Comment on attachment 8812640 [details]
Bug 1318226. P2 - add gtest TestTaskQueue to test the order of regular tasks.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/94284/diff/1-2/

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

7 years ago

Attachment #8845206 - Flags: review?(bobbyholley)

Attachment #8812640 - Flags: review?(bobbyholley)

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

7 years ago

Depends on: 1345711

Bobby Holley (:bholley)

Comment 18

•

7 years ago

mozreview-review

Comment on attachment 8812640 [details]
Bug 1318226. P2 - add gtest TestTaskQueue to test the order of regular tasks.

https://reviewboard.mozilla.org/r/94284/#review126040

Sorry for the delay on this one.

Attachment #8812640 - Flags: review?(bobbyholley) → review+

Bobby Holley (:bholley)

Comment 19

•

7 years ago

mozreview-review

Comment on attachment 8845206 [details]
Bug 1318226. P1 - preserve the order of regular tasks.

https://reviewboard.mozilla.org/r/118400/#review126042

::: xpcom/threads/TaskDispatcher.h:127
(Diff revision 1)
> -    PerThreadTaskGroup& group = EnsureTaskGroup(aThread);
> +    // To preserve the event order, we need to append a new group if the last
> +    // group is not targeted for |aThread|.

Probably worth explaining the issue here in a bit more detail, and/or referencing the discussion in the bug and/or the testcase that demonstrates this issue.

Attachment #8845206 - Flags: review?(bobbyholley) → review+

Comment hidden (mozreview-request)

Comment on attachment 8845206 [details]
Bug 1318226. P1 - preserve the order of regular tasks.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/118400/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8812640 [details]
Bug 1318226. P2 - add gtest TestTaskQueue to test the order of regular tasks.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/94284/diff/2-3/

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 22

•

7 years ago

Thanks!

Assignee: nobody → jwwang

Pulsebot

Comment 23

•

7 years ago

Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd225b588582
P1 - preserve the order of regular tasks. r=bholley
https://hg.mozilla.org/integration/autoland/rev/1bfd5d58b3b1
P2 - add gtest TestTaskQueue to test the order of regular tasks. r=bholley

Carsten Book [:Tomcat]

Comment 24

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cd225b588582
https://hg.mozilla.org/mozilla-central/rev/1bfd5d58b3b1

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Bug 1318226. P2 - add gtest TestTaskQueue to test the order of regular tasks. 8 years ago JW Wang [:jwwang] [:jw_wang] 58 bytes, text/x-review-board-request	bholley : review+	Details
Bug 1318226. P1 - preserve the order of regular tasks. 7 years ago JW Wang [:jwwang] [:jw_wang] 59 bytes, text/x-review-board-request	bholley : review+	Details