Closed Bug 1407940 Opened 7 years ago Closed 7 years ago

Access to TaskBuffersManager::mTaskQueue is not-threadsafe

Categories

(Core :: Audio/Video: Playback, defect, P2)

55 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- wontfix
firefox56 --- wontfix
firefox57 - wontfix
firefox58 --- fixed

People

(Reporter: jya, Assigned: jya)

References

(Blocks 6 open bugs)

Details

(Keywords: regression)

Attachments

(2 files)

TrackBuffersManager::QueueTask and TrackBuffersManager::ProcessTask reads mTaskQueue on both the main thread and the MediaSourceDemuxer's taskqueue.

However, the mTaskQueue can be cleared on the demuxer's taskqueue.

As such, reading the value of mTaskQueue must be done in a thread-safe fashion.

A race in reading the value of mTaskQueue could explain some of the crashes we see in bug 1247189 or bug 1402681.

I see to solution for it:
1- Only ever access mTaskQueue while holding a mutex
2- Seeing that mTaskQueue is only ever cleared while performing a Detach, we could have an atomic mDetached that is set prior clearing mTaskQueue and always check on the value of mDetached before reading mTaskQueue and abort the current operation.
bug 1264694 was bad...
Blocks: 1264694
Keywords: regression
Priority: -- → P2
How about adding the atomic mDetached and then never clear the mTaskQueue anymore?

In addition, will you take this bug? or I could give it a try?
Thanks!
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #2)
> How about adding the atomic mDetached and then never clear the mTaskQueue
> anymore?
if you do that you get bug 1264694 once again...
Assignee: nobody → jyavenard
Comment on attachment 8917797 [details]
Bug 1407940 - P1. Use Mutex in place of Monitor.

https://reviewboard.mozilla.org/r/188736/#review194242
Attachment #8917797 - Flags: review?(gsquelart) → review+
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

r+ assuming you're confident that OnTaskQueue() always means mTaskQueue is not null (see 2nd issue).
Other suggestions up to you, of course.

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

#ifdef

::: dom/media/mediasource/TrackBuffersManager.h:464
(Diff revision 3)
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

Is this assertion correct?

I'm thinking of the following scenario:
1. A task is queued, that will run ResetTaskQueue(),
2. Another task is queued,
3. Task #1 runs and resets mTaskQueue,
4. Task #2 runs and tests OnTaskQueue(), but now mTaskQueue is null!

If that's not possible, I'd suggest you add a comment to explain that. And maybe change the MOZ_ASSERT(OnTaskQueue) calls into MOZ_DIAGNOSTIC_ASSERTs?

But if it is possible, there are a lot of places where you just dereference mTaskQueue without checking for null, because you assume OnTaskQueue==true implies mTaskQueue!=nullptr, you'll need to change those.

::: dom/media/mediasource/TrackBuffersManager.h:467
(Diff revision 3)
>    }
> +#if DEBUG
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

(Assuming you keep this assertion, after dealing with the previous issue.)
OnTaskQueue() is inside an `#ifdef DEBUG` and will only be called from `MOZ_ASSERT()`, so I think this one could be a MOZ_ASSERT too.
No real difference in the end, but at least we won't get the wrong idea that this test will also happen in RELEASE builds.

::: dom/media/mediasource/TrackBuffersManager.h:471
(Diff revision 3)
> +  void ResetTaskQueue()
> +  {
> +    MutexAutoLock mut(mMutex);
> +    mTaskQueue = nullptr;

This should only be called once from the taskqueue, right? How about a `MOZ_ASSERT(mTaskQueue && mTaskQueue->IsCurrentThreadIn());` for peace of mind?

::: dom/media/mediasource/TrackBuffersManager.h:518
(Diff revision 3)
> +  // mTaskQueue is only ever written after construction on the task queue.
> +  // As such, it can be accessed while on task queue without the need for the
> +  // mutex.
> +  RefPtr<AutoTaskQueue> mTaskQueue;

To enforce the "accessed on task queue without mutex", I would suggest you add a companion method to GetTaskQueueSafe(), e.g.:
  AutoTaskQueue* GetTaskQueueFromTaskQueue() const
  {
#ifdef DEBUG
    RefPtr tq = GetTaskQueueSafe();
    MOZ_ASSERT(!tq || tq->IsCurrentThreadIn());
#endif
    return mTaskQueue.get();
  }

Then you can change all naked `mTaskQueue` accesses to that (except in the constructor and accessors) and avoid risking getting it wrong.


Optionally, you could also have a non-failing one, e.g.:
  NotNull<AutoTaskQueue*> TaskQueueFromTaskQueue() const // <-- notice no "Get" in the name
  {
#ifdef DEBUG
    RefPtr tq = GetTaskQueueSafe();
    MOZ_ASSERT(tq && tq->IsCurrentThreadIn());
#endif
    return WrapNotNull(mTaskQueue.get());
  }
Which you can use when dereferencing without checking, e.g.: `TaskQueueFromTaskQueue()->Dispatch(...`.
But this could be one step too far? :-)
Attachment #8917798 - Flags: review?(gsquelart) → review+
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194284

Thanks!

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

Is necessary to make this function only running on DEBUG?
That would make the release assert become useless.
Attachment #8917798 - Flags: review?(alwu) → review+
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

> #ifdef

Ignore this, I see that `#if DEBUG` works as well and is used a lot.
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

It is following bug 1247189. Though OnTaskQueue() could be removed from the code called by MediaSourceDemuxer, as we know it's called on Task Queue , but it doesn't matter if mTaskQueue is null.
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194316

::: dom/media/mediasource/TrackBuffersManager.h:463
(Diff revision 3)
> -  AbstractThread* GetTaskQueue() const
> +  RefPtr<AutoTaskQueue> GetTaskQueueSafe() const
>    {
> +    MutexAutoLock mut(mMutex);
>      return mTaskQueue;
>    }
> +#if DEBUG

indeed.

this function is only ever used on the taskqueue, as it is safe to read mTaskQueue there

::: dom/media/mediasource/TrackBuffersManager.h:464
(Diff revision 3)
>    bool OnTaskQueue() const
>    {
> -    MOZ_RELEASE_ASSERT(GetTaskQueue());
> -    return GetTaskQueue()->IsCurrentThreadIn();
> +    RefPtr<AutoTaskQueue> taskQueue = GetTaskQueueSafe();
> +    MOZ_RELEASE_ASSERT(taskQueue);

you cannot have another task being queued after a Detach task has run (which is what called ResetTaskQueue).
so the scenario you describe can't happen.

we also have assertions that the last task is always detach task, and test that the no other task can run following that.

i think the code makes it clear enough that it is indeed the case.

Everywhere mTaskQueue is used, i have made sure it is preceded by MOZ_ASSERT(OnTaskQueue), so i think that case is covered as well.
The assert aren't there for functionnal purposes, only to make it clesr to the reader, where it is we are running.

you can't be OnTaskQueue and have mTaskQueue being null.
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194244

> To enforce the "accessed on task queue without mutex", I would suggest you add a companion method to GetTaskQueueSafe(), e.g.:
>   AutoTaskQueue* GetTaskQueueFromTaskQueue() const
>   {
> #ifdef DEBUG
>     RefPtr tq = GetTaskQueueSafe();
>     MOZ_ASSERT(!tq || tq->IsCurrentThreadIn());
> #endif
>     return mTaskQueue.get();
>   }
> 
> Then you can change all naked `mTaskQueue` accesses to that (except in the constructor and accessors) and avoid risking getting it wrong.
> 
> 
> Optionally, you could also have a non-failing one, e.g.:
>   NotNull<AutoTaskQueue*> TaskQueueFromTaskQueue() const // <-- notice no "Get" in the name
>   {
> #ifdef DEBUG
>     RefPtr tq = GetTaskQueueSafe();
>     MOZ_ASSERT(tq && tq->IsCurrentThreadIn());
> #endif
>     return WrapNotNull(mTaskQueue.get());
>   }
> Which you can use when dereferencing without checking, e.g.: `TaskQueueFromTaskQueue()->Dispatch(...`.
> But this could be one step too far? :-)

I like that last option...

I'll probably do that one.
Comment on attachment 8917798 [details]
Bug 1407940 - P2. Only ever access mTaskQueue in a thread-safe fashion.

https://reviewboard.mozilla.org/r/188738/#review194284

> Is necessary to make this function only running on DEBUG?
> That would make the release assert become useless.

indeed....
Hi Jean, 
Can you please tell which versions are affected, because I want to set flags?
Flags: needinfo?(jyavenard)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/05efc1a5fb95
P1. Use Mutex in place of Monitor. r=gerald
https://hg.mozilla.org/integration/autoland/rev/e676d6852012
P2. Only ever access mTaskQueue in a thread-safe fashion. r=alwu,gerald
[Tracking Requested - why for this release]: this is the likely cause to bug 1247189 and many others. While relatively low volume it does cause regular crashes.
Uplifting to 57 would be good
Flags: needinfo?(jyavenard)
Uplifting to 57 would require you filling the uplift request template :)
Flags: needinfo?(jyavenard)
I would not mind this riding the trains as we have this for a long time.
I'll call this fix-optional for 57.
As Sylvestre mentioned, while it would be nice to bury this bug once and for all, it's been there for so long it could ride the train.

And it would require a modification to work with 57.
Flags: needinfo?(jyavenard)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: