Closed Bug 1406118 Opened 7 years ago Closed 9 months ago

Crash in TrackBuffersManager::RemoveFrames

Categories

(Core :: Audio/Video: Playback, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1846957

People

(Reporter: jya, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash, Whiteboard: [bad-ram?] [media-crash], qa-not-actionable)

Crash Data

[@ mozilla::TrackBuffersManager::RemoveFrames ]rare crash, with no obvious explanation. 



https://crash-stats.mozilla.com/report/index/5a6ead0f-8aeb-4bf7-b7b7-764dd0171002
See Also: → 1247189
I think it might also be related with the detached source buffer.

If the detached happen before resolving the promise [1], then we would get that situation.

Maybe we should check whether we have been detached or not every time when the promise is resolved.

[1] http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#1281
Flags: needinfo?(jyavenard)
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #1)
> I think it might also be related with the detached source buffer.
>
> If the detached happen before resolving the promise [1], then we would get
> that situation.
> 
> Maybe we should check whether we have been detached or not every time when
> the promise is resolved.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/
> TrackBuffersManager.cpp#1281

No it won't, because the Detach task won't run until the appendBuffer task has fully completed.
A new task can't start while a current one is running.
http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#179

mCurrentTask is only cleared once the appendTask completes and succeed.
http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#796
Or is rejected.
http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#808

Despite, the bug is about RemoveFrames, nothing to do with demuxing content.

RemoveFrames can only be called before inserting new frames or when processing a RangeRemoval task.

The code you linked to has nothing to do with either of those operations anyway.

When you call Detach, it queues a DetachTask, that task won't run until all other tasks have run and is always the last task to run. You even added assertions to make sure of that.

http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#151
Flags: needinfo?(jyavenard)
I'm thinking about one situation (although I'm not sure it's possible or not).

First, notice that all crashes happened on old version, so the race between main thread and task queue is possible.

Assume the CC has happened, TBM got detached and then we might have a race, same reason as 1402681.

Then, we call TBM::ProcessTasks() on the same time and on the different threads. This would make us bypass the checking [1] and run the different tasks at the same time.

If one task is append task and other is reset task (which might be called from [2]), then we would clear |mQueuedSamples| during the appending process, it would cause the UAF.

Do you think is it possible? 

[1] http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#179
[2] http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/SourceBuffer.cpp#217
Flags: needinfo?(jyavenard)
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #3)
> I'm thinking about one situation (although I'm not sure it's possible or
> not).
> 
> First, notice that all crashes happened on old version, so the race between
> main thread and task queue is possible.

check the description of this commit.

I could have sworn that the crash report I put in the description was *after* bug 1402681 !


> 
> Assume the CC has happened, TBM got detached and then we might have a race,
> same reason as 1402681.
> [snip]
> Do you think is it possible? 

it certainly is possible, but only before bug 1402681.

(I was actually thinking of reports https://crash-stats.mozilla.com/report/index/e4795edc-c34d-40df-ba02-a7d8a0171005)
Flags: needinfo?(jyavenard)
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #3)
> 
> Then, we call TBM::ProcessTasks() on the same time and on the different
> threads. This would make us bypass the checking [1] and run the different
> tasks at the same time.

actually, on second thoughts...

no that's not possible.

When TrackBuffersManager::RemoveFrames() is running, it means that we either have a AppendBuffer or a RangeRemoval task running.

While those two tasks are running, the Detach task hasn't run.
http://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/dom/media/mediasource/TrackBuffersManager.cpp#179

bug 1247189 can occur due to the race that would have occurred *after* the Detach task being run (and having a second Detach task running again, this time on the main thread)

While the Detach task is pending, you can't have the race, as only the Detach task clear mTaskQueue

As such, this bug is different than bug 1247189
Assignee: nobody → alwu
Assignee: alastor0325 → nobody
Severity: normal → critical
Keywords: crash
Don't see how that crash could happen normally, that is with proper ram, no bit flip of any kind etc.

such a rare incidence (a couple every month) i tend to blame cosmic rays
Keywords: stalled
Whiteboard: [bad-ram?]
Priority: P2 → P3
Whiteboard: [bad-ram?] → [bad-ram?] [media-crash]
Whiteboard: [bad-ram?] [media-crash] → [bad-ram?] [media-crash], qa-not-actionable
Severity: critical → S2

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 10 desktop browser crashes on nightly
  • Top 10 AArch64 and ARM crashes on nightly

:jimm, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)
Keywords: topcrash
Status: NEW → RESOLVED
Closed: 9 months ago
Duplicate of bug: 1846957
Flags: needinfo?(jmathies)
Resolution: --- → DUPLICATE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
You need to log in before you can comment on or make changes to this bug.