Closed Bug 1318226 Opened 8 years ago Closed 7 years ago

Tail dispatcher breaks the order of events

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jwwang, Assigned: jwwang)

References

Details

Attachments

(2 files)

Without tail dispatching:
TaskQueue* tq1;
TaskQueue* tq1;
tq1->Dispatch(R1);
tq2->Dispatch(R2); // R2 will call tq1->Dispatch(R3)

R1 is guaranteed to happen before R3.

With tail dispatching:
We can't guarantee R1 happens before R3.

http://searchfox.org/mozilla-central/rev/efcb1644e49c36445e75d89b434fdf4c84832c84/dom/media/MediaDecoder.cpp#304
This causes bug 1316145 where mCompositorUpdatedEvent dispatches a task to the TaskQueue of MediaDecoderReader after it is shut down.
Blocks: 1316145
Hi Bobby,
Do you have any suggestion how to fix the event disorder introduced by tail dispatching?
Flags: needinfo?(bobbyholley)
Hm, nice find! That's a pretty tough one.

Fundamentally, the point of tail dispatching is to make asynchronous messages from one thread to another occur atomically. It's not possible to reconcile that with ordering assumptions where callers _depend_ on the order of interleaved events dispatched from a given thread.

We could at least detect this case though: i.e. make TaskDispatcher::AddTask MOZ_CRASH if the PerThreadTaskGroup it returns is not end-of-list. Callers could then either drain the tail dispatcher before continuing (ensuring the correct ordering), or pass some kind of opt-out token (indicating that they don't care about the ordering).

I don't have a sense of how common this pattern is, and therefore what the false-positive rate would be. Maybe worth a try though?
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> Fundamentally, the point of tail dispatching is to make asynchronous
> messages from one thread to another occur atomically.
Why is the atomicity required? I think the purpose of tail dispatcher is to provide run-to-completion as we have in JS.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> I don't have a sense of how common this pattern is, and therefore what the
> false-positive rate would be. Maybe worth a try though?

We have:
a. MediaDecoder (main thread) -> MDSM::Shutdown (MDSM TaskQueue) -> MediaDecoderReader::Shutdown (Reader TaskQueue)

and also:
b. MediaDecoder (main thread) sends events to MediaDecoderReader (Reader TaskQueue) using MediaEventSource.

We need to guarantee events are sent to the Reader's TaskQueue before MediaDecoderReader::Shutdown shuts down its TaskQueue to avoid assertions caused by dispatch failures.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #5)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> > Fundamentally, the point of tail dispatching is to make asynchronous
> > messages from one thread to another occur atomically.
> Why is the atomicity required? I think the purpose of tail dispatcher is to
> provide run-to-completion as we have in JS.

I don't think so. We introduced tail dispatching to support state mirroring. For state mirroring, we wanted to send a single atomic state update to other threads once a given task is complete. But to preserve ordering, that means that we also need to defer any dispatched tasks until after the state updates are applied. That's the reason we require tail dispatch for any thread that holds canonical mirrored state.

(In reply to JW Wang [:jwwang] [:jw_wang] from comment #6)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #4)
> > I don't have a sense of how common this pattern is, and therefore what the
> > false-positive rate would be. Maybe worth a try though?
> 
> We have:
> a. MediaDecoder (main thread) -> MDSM::Shutdown (MDSM TaskQueue) ->
> MediaDecoderReader::Shutdown (Reader TaskQueue)
> 
> and also:
> b. MediaDecoder (main thread) sends events to MediaDecoderReader (Reader
> TaskQueue) using MediaEventSource.
> 
> We need to guarantee events are sent to the Reader's TaskQueue before
> MediaDecoderReader::Shutdown shuts down its TaskQueue to avoid assertions
> caused by dispatch failures.

That's the failure mode from bug 1316145, right? That would be a non-false positive under my proposed solution from comment 4. I'm specifically wondering how many _other_ situations there are where we dispatch a task to A, then to B, and then to A again.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> I don't think so. We introduced tail dispatching to support state mirroring.
> For state mirroring, we wanted to send a single atomic state update to other
> threads once a given task is complete. But to preserve ordering, that means
> that we also need to defer any dispatched tasks until after the state
> updates are applied. That's the reason we require tail dispatch for any
> thread that holds canonical mirrored state.

There are 2 categories of tasks in discussion:
1. state change tasks (used by state mirroring): for which we must favor atomicity over ordering.
2. regular tasks: for which we might favor ordering over atomicity.

There are some proposed solutions to fix the ordering of regular tasks:
1. comment 4 (this breaks atomicity).
2. implicitly append a new PerThreadTaskGroup if the returned PerThreadTaskGroup is not the last element. (this breaks atomicity).
3. provide a function to drain the tail dispatcher for functions (ex: MediaDecoder::Shutdown()) that are sensitive to TaskQueue shutdown. (atomicity is broken when the caller agrees).
Sorry I've been slow to respond here - I was meaning to get back to this before the holidays but didn't. I'll try to pick this discussion up again once I'm back. :-)
No worries! Please take your time and enjoy the holidays. :)
CC Bevis who is making changes to AbstractThread and TaskQueue.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #8)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #7)
> > I don't think so. We introduced tail dispatching to support state mirroring.
> > For state mirroring, we wanted to send a single atomic state update to other
> > threads once a given task is complete. But to preserve ordering, that means
> > that we also need to defer any dispatched tasks until after the state
> > updates are applied. That's the reason we require tail dispatch for any
> > thread that holds canonical mirrored state.
> 
> There are 2 categories of tasks in discussion:
> 1. state change tasks (used by state mirroring): for which we must favor
> atomicity over ordering.
> 2. regular tasks: for which we might favor ordering over atomicity.

The key point here is that the atomicity of regular tasks exists _because_ of ordering - specifically to provide a consistent ordering between state changes and tasks that might observe them.

> 
> There are some proposed solutions to fix the ordering of regular tasks:
> 1. comment 4 (this breaks atomicity).
> 2. implicitly append a new PerThreadTaskGroup if the returned
> PerThreadTaskGroup is not the last element. (this breaks atomicity).
> 3. provide a function to drain the tail dispatcher for functions (ex:
> MediaDecoder::Shutdown()) that are sensitive to TaskQueue shutdown.
> (atomicity is broken when the caller agrees).

Can you explain the difference between 1 and 3? They sound the same to me (and I still think that we should pursue the solution in comment 4).
(In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> Can you explain the difference between 1 and 3? They sound the same to me
> (and I still think that we should pursue the solution in comment 4).
No difference. 1 includes 3 after my second thought.
(In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> The key point here is that the atomicity of regular tasks exists _because_
> of ordering - specifically to provide a consistent ordering between state
> changes and tasks that might observe them.

I don't quite get it. Can you explain how ordering implies atomicity for regular tasks?

Below is my understanding about the atomicity of regular tasks:

Say we want to dispatch 3 regular tasks to thread1. By atomicity I mean to have a task dispatched to thread1 and the Run() function will call task1->Run(), task2->Run(), and task3->Run() in sequence. It is impossible for other tasks to run in between these 3 tasks.

By breaking atomicity of regular tasks, I mean we might have taskA dispatched to call task1->Run() and taskB dispatched to call task2->Run() and task3->Run() in sequence. It is possible for other tasks to run in between task1 and task2.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #14)
> (In reply to Bobby Holley (PTO) (busy with Stylo) from comment #12)
> > The key point here is that the atomicity of regular tasks exists _because_
> > of ordering - specifically to provide a consistent ordering between state
> > changes and tasks that might observe them.
> 
> I don't quite get it. Can you explain how ordering implies atomicity for
> regular tasks?

I'm probably just being a bit sloppy with my terminology. I was specifically talking about ordering of the tasks relative to state changes (the regular tasks need to come after all the state changes, because that's the only well-defined ordering). This requires that the regular tasks be tail-dispatched, but I realize now that they don't necessarily need to be atomic (see below). 
 
> Below is my understanding about the atomicity of regular tasks:
> 
> Say we want to dispatch 3 regular tasks to thread1. By atomicity I mean to
> have a task dispatched to thread1 and the Run() function will call
> task1->Run(), task2->Run(), and task3->Run() in sequence. It is impossible
> for other tasks to run in between these 3 tasks.
> 
> By breaking atomicity of regular tasks, I mean we might have taskA
> dispatched to call task1->Run() and taskB dispatched to call task2->Run()
> and task3->Run() in sequence. It is possible for other tasks to run in
> between task1 and task2.

These definitions make sense to me.

Thinking about this some more, I think you're right that the regular tasks don't need to be atomic, the just need to be delayed. I _think_ we can implicitly append additional PerThreadTaskGroups for regular tasks, _assuming_ that we make sure to put all state updates (including ones that occur after additional PerThreadTaskGroups are appended) continue to live in the first PerThreadTaskGroup.
Attachment #8845206 - Flags: review?(bobbyholley)
Attachment #8812640 - Flags: review?(bobbyholley)
Depends on: 1345711
Comment on attachment 8812640 [details]
Bug 1318226. P2 - add gtest TestTaskQueue to test the order of regular tasks.

https://reviewboard.mozilla.org/r/94284/#review126040

Sorry for the delay on this one.
Attachment #8812640 - Flags: review?(bobbyholley) → review+
Comment on attachment 8845206 [details]
Bug 1318226. P1 - preserve the order of regular tasks.

https://reviewboard.mozilla.org/r/118400/#review126042

::: xpcom/threads/TaskDispatcher.h:127
(Diff revision 1)
> -    PerThreadTaskGroup& group = EnsureTaskGroup(aThread);
> +    // To preserve the event order, we need to append a new group if the last
> +    // group is not targeted for |aThread|.

Probably worth explaining the issue here in a bit more detail, and/or referencing the discussion in the bug and/or the testcase that demonstrates this issue.
Attachment #8845206 - Flags: review?(bobbyholley) → review+
Thanks!
Assignee: nobody → jwwang
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd225b588582
P1 - preserve the order of regular tasks. r=bholley
https://hg.mozilla.org/integration/autoland/rev/1bfd5d58b3b1
P2 - add gtest TestTaskQueue to test the order of regular tasks. r=bholley
https://hg.mozilla.org/mozilla-central/rev/cd225b588582
https://hg.mozilla.org/mozilla-central/rev/1bfd5d58b3b1
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: