Access to TaskBuffersManager::mTaskQueue is not-threadsafe

RESOLVED FIXED in Firefox 58

Status

()

P2
normal
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: jya, Assigned: jya)

Tracking

(Blocks: 6 bugs, {regression})

55 Branch
mozilla58
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox56 wontfix, firefox57- wontfix, firefox58 fixed)

Details

Attachments

(2 attachments)

(Assignee)

Description

a year ago
TrackBuffersManager::QueueTask and TrackBuffersManager::ProcessTask reads mTaskQueue on both the main thread and the MediaSourceDemuxer's taskqueue.

However, the mTaskQueue can be cleared on the demuxer's taskqueue.

As such, reading the value of mTaskQueue must be done in a thread-safe fashion.

A race in reading the value of mTaskQueue could explain some of the crashes we see in bug 1247189 or bug 1402681.

I see to solution for it:
1- Only ever access mTaskQueue while holding a mutex
2- Seeing that mTaskQueue is only ever cleared while performing a Detach, we could have an atomic mDetached that is set prior clearing mTaskQueue and always check on the value of mDetached before reading mTaskQueue and abort the current operation.
(Assignee)

Comment 1

a year ago
bug 1264694 was bad...
Blocks: 1264694
Keywords: regression
(Assignee)

Updated

a year ago
Priority: -- → P2
How about adding the atomic mDetached and then never clear the mTaskQueue anymore?

In addition, will you take this bug? or I could give it a try?
Thanks!
(Assignee)

Comment 3

a year ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #2)
> How about adding the atomic mDetached and then never clear the mTaskQueue
> anymore?
if you do that you get bug 1264694 once again...
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee: nobody → jyavenard

Comment 8

a year ago
mozreview-review
Comment on attachment 8917797 [details]
Bug 1407940 - P1. Use Mutex in place of Monitor.

https://reviewboard.mozilla.org/r/188736/#review194242
Attachment #8917797 - Flags: review?(gsquelart) → review+

Comment 9

a year ago
mozreview-review
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

r+ assuming you're confident that OnTaskQueue() always means mTaskQueue is not null (see 2nd issue).
Other suggestions up to you, of course.

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

#ifdef

::: dom/media/mediasource/TrackBuffersManager.h:464
(Diff revision 3)
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

Is this assertion correct?

I'm thinking of the following scenario:
1. A task is queued, that will run ResetTaskQueue(),
2. Another task is queued,
3. Task #1 runs and resets mTaskQueue,
4. Task #2 runs and tests OnTaskQueue(), but now mTaskQueue is null!

If that's not possible, I'd suggest you add a comment to explain that. And maybe change the MOZ_ASSERT(OnTaskQueue) calls into MOZ_DIAGNOSTIC_ASSERTs?

But if it is possible, there are a lot of places where you just dereference mTaskQueue without checking for null, because you assume OnTaskQueue==true implies mTaskQueue!=nullptr, you'll need to change those.

::: dom/media/mediasource/TrackBuffersManager.h:467
(Diff revision 3)
>    }
> +#if DEBUG
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

(Assuming you keep this assertion, after dealing with the previous issue.)
OnTaskQueue() is inside an `#ifdef DEBUG` and will only be called from `MOZ_ASSERT()`, so I think this one could be a MOZ_ASSERT too.
No real difference in the end, but at least we won't get the wrong idea that this test will also happen in RELEASE builds.

::: dom/media/mediasource/TrackBuffersManager.h:471
(Diff revision 3)
> +  void ResetTaskQueue()
> +  {
> +    MutexAutoLock mut(mMutex);
> +    mTaskQueue = nullptr;

This should only be called once from the taskqueue, right? How about a `MOZ_ASSERT(mTaskQueue && mTaskQueue->IsCurrentThreadIn());` for peace of mind?

::: dom/media/mediasource/TrackBuffersManager.h:518
(Diff revision 3)
> +  // mTaskQueue is only ever written after construction on the task queue.
> +  // As such, it can be accessed while on task queue without the need for the
> +  // mutex.
> +  RefPtr<AutoTaskQueue> mTaskQueue;

To enforce the "accessed on task queue without mutex", I would suggest you add a companion method to GetTaskQueueSafe(), e.g.:
  AutoTaskQueue* GetTaskQueueFromTaskQueue() const
  {
#ifdef DEBUG
    RefPtr tq = GetTaskQueueSafe();
    MOZ_ASSERT(!tq || tq->IsCurrentThreadIn());
#endif
    return mTaskQueue.get();
  }

Then you can change all naked `mTaskQueue` accesses to that (except in the constructor and accessors) and avoid risking getting it wrong.


Optionally, you could also have a non-failing one, e.g.:
  NotNull<AutoTaskQueue*> TaskQueueFromTaskQueue() const // <-- notice no "Get" in the name
  {
#ifdef DEBUG
    RefPtr tq = GetTaskQueueSafe();
    MOZ_ASSERT(tq && tq->IsCurrentThreadIn());
#endif
    return WrapNotNull(mTaskQueue.get());
  }
Which you can use when dereferencing without checking, e.g.: `TaskQueueFromTaskQueue()->Dispatch(...`.
But this could be one step too far? :-)
Attachment #8917798 - Flags: review?(gsquelart) → review+

Comment 10

a year ago
mozreview-review
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194284

Thanks!

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

Is necessary to make this function only running on DEBUG?
That would make the release assert become useless.
Attachment #8917798 - Flags: review?(alwu) → review+

Comment 11

a year ago
mozreview-review-reply
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

> #ifdef

Ignore this, I see that `#if DEBUG` works as well and is used a lot.
(Assignee)

Comment 12

a year ago
mozreview-review-reply
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

It is following bug 1247189. Though OnTaskQueue() could be removed from the code called by MediaSourceDemuxer, as we know it's called on Task Queue , but it doesn't matter if mTaskQueue is null.
(Assignee)

Comment 13

a year ago
mozreview-review
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194316

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

indeed.

this function is only ever used on the taskqueue, as it is safe to read mTaskQueue there

::: dom/media/mediasource/TrackBuffersManager.h:464
(Diff revision 3)
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

you cannot have another task being queued after a Detach task has run (which is what called ResetTaskQueue).
so the scenario you describe can't happen.

we also have assertions that the last task is always detach task, and test that the no other task can run following that.

i think the code makes it clear enough that it is indeed the case.

Everywhere mTaskQueue is used, i have made sure it is preceded by MOZ_ASSERT(OnTaskQueue), so i think that case is covered as well.
The assert aren't there for functionnal purposes, only to make it clesr to the reader, where it is we are running.

you can't be OnTaskQueue and have mTaskQueue being null.
(Assignee)

Comment 14

a year ago
mozreview-review-reply
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

> To enforce the "accessed on task queue without mutex", I would suggest you add a companion method to GetTaskQueueSafe(), e.g.:
>   AutoTaskQueue* GetTaskQueueFromTaskQueue() const
>   {
> #ifdef DEBUG
>     RefPtr tq = GetTaskQueueSafe();
>     MOZ_ASSERT(!tq || tq->IsCurrentThreadIn());
> #endif
>     return mTaskQueue.get();
>   }
> 
> Then you can change all naked `mTaskQueue` accesses to that (except in the constructor and accessors) and avoid risking getting it wrong.
> 
> 
> Optionally, you could also have a non-failing one, e.g.:
>   NotNull<AutoTaskQueue*> TaskQueueFromTaskQueue() const // <-- notice no "Get" in the name
>   {
> #ifdef DEBUG
>     RefPtr tq = GetTaskQueueSafe();
>     MOZ_ASSERT(tq && tq->IsCurrentThreadIn());
> #endif
>     return WrapNotNull(mTaskQueue.get());
>   }
> Which you can use when dereferencing without checking, e.g.: `TaskQueueFromTaskQueue()->Dispatch(...`.
> But this could be one step too far? :-)

I like that last option...

I'll probably do that one.
(Assignee)

Comment 15

a year ago
mozreview-review-reply
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194284

> Is necessary to make this function only running on DEBUG?
> That would make the release assert become useless.

indeed....
Hi Jean, 
Can you please tell which versions are affected, because I want to set flags?
Flags: needinfo?(jyavenard)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 19

a year ago
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/05efc1a5fb95
P1. Use Mutex in place of Monitor. r=gerald
https://hg.mozilla.org/integration/autoland/rev/e676d6852012
P2. Only ever access mTaskQueue in a thread-safe fashion. r=alwu,gerald
https://hg.mozilla.org/mozilla-central/rev/05efc1a5fb95
https://hg.mozilla.org/mozilla-central/rev/e676d6852012
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
status-firefox56: --- → wontfix
status-firefox57: --- → wontfix
status-firefox-esr52: --- → wontfix
(Assignee)

Comment 21

a year ago
[Tracking Requested - why for this release]: this is the likely cause to bug 1247189 and many others. While relatively low volume it does cause regular crashes.
Uplifting to 57 would be good
status-firefox57: wontfix → affected
tracking-firefox57: --- → ?
Flags: needinfo?(jyavenard)
Uplifting to 57 would require you filling the uplift request template :)
Flags: needinfo?(jyavenard)
I would not mind this riding the trains as we have this for a long time.
I'll call this fix-optional for 57.
status-firefox57: affected → fix-optional
tracking-firefox57: ? → -
(Assignee)

Comment 25

a year ago
As Sylvestre mentioned, while it would be nice to bury this bug once and for all, it's been there for so long it could ride the train.

And it would require a modification to work with 57.
Flags: needinfo?(jyavenard)
status-firefox57: fix-optional → wontfix
You need to log in before you can comment on or make changes to this bug.