Null deref in TrackBuffersManager

REOPENED
Assigned to

Status

()

defect
P2
normal
REOPENED
3 years ago
3 months ago

People

(Reporter: jya, Assigned: alwu)

Tracking

(Blocks 2 bugs, {crash})

Trunk
mozilla58
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox56 wontfix, firefox57 wontfix, firefox58 fixed)

Details

(crash signature)

Attachments

(6 attachments)

Reporter

Description

3 years ago
There are various crash reports related to a sample in the track buffer being null.

Example:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ATrackBuffersManager%3A%3AGetSample
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ATrackBuffersManager%3A%3ARemoveFrames
or:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ATrackBuffersManager%3A%3AGetNextRandomAccessPoint

Those bug reports make little sense. It's a physical impossibility to have a sample in our track buffer that is null.

Each demuxer is used, use a sample being nullptr as a condition to exit their loop, as such a nullptr can never be added to trackbuffer:

mp4:
https://dxr.mozilla.org/mozilla-central/source/dom/media/fmp4/MP4Demuxer.cpp#304

webm:
https://dxr.mozilla.org/mozilla-central/source/dom/media/webm/WebMDemuxer.cpp#846


Yet we have crashes there for example:

https://hg.mozilla.org/releases/mozilla-beta/annotate/828db915a076/dom/media/mediasource/TrackBuffersManager.cpp#l2133

addresss of crash is 0x34 which points to sample being null (+offset of mKeyFrame member)

The TrackBuffersManager is only ever accessed from the same task queue. There can't be data races in there.

More disturbing is that the vast majority of those bugs occur on MacOS (82%) the majority being OS X 10.8 (39%)
Reporter

Comment 1

3 years ago
There are over 500 of those crashes a week apparently.

The track buffer uses an infallible nsTArray to store RefPtr<MediaRawData> element.
samples are added in the array there:
https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasource/TrackBuffersManager.cpp#1377

and in particular here:
https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasource/TrackBuffersManager.cpp#1553

note that if sample was ever nullptr, the access to the members just above would have caused a crash well before it was added to the array.
Reporter

Comment 2

3 years ago
the samples are going through a temporary array and are then inserted into our current array there:

https://dxr.mozilla.org/mozilla-central/rev/2dfb45d74f42d2a0010696f5fd47c7a7da94cedb/dom/media/mediasource/TrackBuffersManager.cpp#1694

The data is added at the location contained in trackBuffer.mNextInsertionIndex

trackBuffer.mNextInsertionIndex itself is calculated either there: https://dxr.mozilla.org/mozilla-central/rev/2dfb45d74f42d2a0010696f5fd47c7a7da94cedb/dom/media/mediasource/TrackBuffersManager.cpp#1616

or there:
https://dxr.mozilla.org/mozilla-central/rev/2dfb45d74f42d2a0010696f5fd47c7a7da94cedb/dom/media/mediasource/TrackBuffersManager.cpp#1625

as one can see, this value is always <= array.Length()

Once the samples are added, mNextInsertionIndex is incremented by the number of samples just added there:
https://dxr.mozilla.org/mozilla-central/rev/2dfb45d74f42d2a0010696f5fd47c7a7da94cedb/dom/media/mediasource/TrackBuffersManager.cpp#1695
Reporter

Comment 3

3 years ago
Robert, I was told that you would be the person most likely to provide a reasonable explanation.

I can't at this stage provide any explanations on how this situation could ever happen.

Could there be those macs are hackintosh and running some wierd stuff? 
surprisingly, on windows which has a much greater market share, those bugs aren't very common

thank you
Flags: needinfo?(kairo)
Crash Signature: [@ mozilla::TrackBuffersManager::GetSample ] [@ @0x0 | mozilla::TrackBuffersManager::GetSample ] [@ mozilla::TrackBuffersManager::RemoveFrames ] [@ mozilla::TrackBuffersManager::GetNextRandomAccessPoint ] [@ mozilla::TrackBuffersManager::Seek ]
Reporter

Comment 4

3 years ago
Let see if bug 1247041 has any impacts (I don't believe so, but time will tell)
See Also: → 1247041

Comment 5

3 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #3)
> Robert, I was told that you would be the person most likely to provide a
> reasonable explanation.

Sorry, I have no idea about the code - I may know how to pull stuff out of crash stats, but not how our code works.
Flags: needinfo?(kairo)
Reporter

Comment 6

3 years ago
Note that all signatures but [@ mozilla::TrackBuffersManager::GetSample ] are now disappeared from crash reporter.

The remaining one is mostly in 45. But still two crashes in 48 in the past 7 days.
Both making no sense at all:
https://crash-stats.mozilla.com/report/index/6a357eb6-c8db-42dc-a813-9a7f02160424

as the null deref has been tested just 2 lines above.
Could bug 1281295 (TSan: data race involving ~TrackBuffersManager() -> ShutdownDemuxers() on main thread) be a cause of this?

Imagine ~TrackBuffersManager() completed on the main thread (and therefore is not visible anymore in stack traces), it would have destroyed a few things that are used on the TrackBuffersManager's task queue, and could explain nonsensical issues seen there.

E.g., in https://crash-stats.mozilla.com/report/index/6a357eb6-c8db-42dc-a813-9a7f02160424 the main thread is performing some cycle collection, which could have destroyed TrackBuffersManager and friends.
Depends on: 1281295
No longer depends on: 1281295
Sorry: Bug 1281295 was a red herring, nothing to do with TrackBuffersManager.
Reporter

Comment 9

3 years ago
There's been no crashes since 51 with those signatures, can't think of anything that would fundamentally affect this outcome...
Reporter

Comment 10

3 years ago
still happening with bug 1315631 in :(
Reporter

Comment 11

3 years ago
Only signatures
[@ mozilla::TrackBuffersManager::GetSample ] and [@ mozilla::TrackBuffersManager::RemoveFrames ] are still present in the past week...

So maybe there is hope that bug 1315631 did help
jya, 
Can you take this bug?
Flags: needinfo?(jyavenard)
I happened had a chance to look into this in a glance.
The crash CallStack seems to de-ref a MediaRawData sample here [1].
Then look into the place where sample comes from [2], the reference count is not increased.

So, it's likely that the sample might be released during the following execution. 
"""
    if (!sample) {
      return nullptr;
    }

    RefPtr<MediaRawData> p = sample->Clone();
"""

I'll find out where and when the sample is added into the track to see if it's the case.

[1] http://searchfox.org/mozilla-central/rev/225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/dom/media/mediasource/TrackBuffersManager.cpp#2290-2300
[2] http://searchfox.org/mozilla-central/rev/225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/dom/media/mediasource/TrackBuffersManager.cpp#2251
Reporter

Comment 14

2 years ago
For this crash to occur, there must be a nullptr stored into the array of RefPtr<MediaRawData> which is just impossible.

There are no cases under which a null is stored into the array. none ever.

So either there's memory corruption that happens to corrupt the existing array or it's something else.

> So, it's likely that the sample might be released during the following execution. 

and how could the sample be released in those 4 lines of code ?
Flags: needinfo?(jyavenard)
Keywords: crash
The crash reports opened by VC indicates that "sample-> access null" or "sample-> access 0xFFFFFFFF"(with arrow).

It makes me feel it is UAF case which  the pointer ``sample`` gotten from ``GetSample`` has already corrupted and been released by somewhere. The sample->vtableptr is pointed to some poisoned address or null.
James,
Is there any thing we can move this bug forward?
Flags: needinfo?(jacheng)
Reporter

Comment 17

2 years ago
That bug make no sense to me. UAF aren't possible here as all the objects are ref counted.

Right above the test we also check that the sample isn't null.
https://crash-stats.mozilla.com/report/index/05318c49-97ea-4980-a901-29b5e0170729

The only one crash(gecko 56) I can search from socorro is this one.

But it seems more like UAF not null dereference.

the 'sample' we got from 'GetSample' is a destructed object.

the sample->_vptr = 0x00007FFD77EEF600  which is an invalid address,

The vtable pointer is destroyed from somewhere but we still hold this instance by RefPtr....

I cannot find any 'explicit delete' raw pointer action.

Hi jya,

Do you have any idea where might be the suspicious point that we delete the instance manually?

Thanks.
Flags: needinfo?(jacheng) → needinfo?(jyavenard)
Reporter

Comment 19

2 years ago
I have no idea for this one.

As far as I can tell, UAF aren't possible as all is ref counted, and we never had the use of dangling pointers.

All is always running on the same task queue so we have no possible race.
Pointers are always stored in RefPtr, there's just no logical possibility that the object got used after free.

my only explanation at this stage is memory corruptipn. Feel free to investigate it further !
Flags: needinfo?(jyavenard)
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
Depends on: 1402728
Assignee

Comment 21

2 years ago
In bug1402681, I found the way to reproduce the situation that track buffers manager doesn't have a valid mTaskQueue, and it might be a possible root cause. I'll continue to investigate it.
Assignee: nobody → alwu
帥!
Good to see some light in this mysterious bug.
Assignee

Comment 23

2 years ago
The reason we've accessed the null task queue is because we do the TBM::Detach() before MFR::Shutdown().

The Detach() is called from [1], and it doesn't have direct relationship with MFR::Shutdown(), since MFR::Shutdown() is triggered by MDSM.

After Detach() called, anything related with track demuxer's mManager would be wrong. 

Eg. In MFR::Shutdown calling MediaSourceTrackDemuxer::Reset, it would use the mManager->Seek

[1] http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/media/mediasource/MediaSourceDecoder.cpp#174

---

The first thing I've tried is to make MD::shutdown returns a promise and then do Detach() after MFR::shutdown, but it would cause the UAF because the source demuxer and source track demuxer are both released after MFR::shutdown (losing the ref counted).

Still trying to find the correct solution.
Reporter

Comment 24

2 years ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #23)
> The reason we've accessed the null task queue is because we do the
> TBM::Detach() before MFR::Shutdown().

Yeah, just as I commented in bug 1402681 comment 13

 
> The Detach() is called from [1], and it doesn't have direct relationship
> with MFR::Shutdown(), since MFR::Shutdown() is triggered by MDSM.
> 
> After Detach() called, anything related with track demuxer's mManager would
> be wrong. 

Indeed.

We can simply handle this particular case (checking mTaskQueue hasn't been cleared or adding a new mDetached member) and abort all the operations early...

How the MediaSource / SourceBuffer can be cycle collected prior the MDSM/MFR being shutdown is a worry though...

To avoid cycles only MediaSource holds a strong reference to the MediaSourceDecoder and the MediaSourceDecoder is detached from the MediaSource when cancelled.

http://searchfox.org/mozilla-central/source/dom/media/mediasource/MediaSource.cpp#163

However, this should have first been called from MediaSourceDecoder::Shutdown 
http://searchfox.org/mozilla-central/source/dom/media/mediasource/MediaSourceDecoder.cpp#173

But I can see how this could be racy if the Detach task gets to run quicker than the MDSM async shutdown
Assignee

Comment 25

2 years ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #23)
> 
> The first thing I've tried is to make MD::shutdown returns a promise and
> then do Detach() after MFR::shutdown, but it would cause the UAF because the
> source demuxer and source track demuxer are both released after
> MFR::shutdown (losing the ref counted).
> 
> Still trying to find the correct solution.

I was wrong, at that time the last ref to MediaSourceDemuxer is MediaSourceDecoder, so the thing would work if we release the ref of demuxer after detaching the media source.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 29

2 years ago
Got try fail, need to investigate that.
Assignee

Comment 30

2 years ago
Posted file CC before shutdown
This log seems showing the case that the CC happened before MFR::Shutdown.
Assignee

Updated

2 years ago
Depends on: 1403478
Reporter

Comment 31

2 years ago
mozreview-review
Comment on attachment 8912495 [details]
Bug 1247189 - part1 : should detach TBM after detaching it from demuxers.

https://reviewboard.mozilla.org/r/183818/#review189100

::: dom/media/mediasource/TrackBuffersManager.h:464
(Diff revision 1)
>    {
>      return mTaskQueue;
>    }
>    bool OnTaskQueue() const
>    {
> -    return !GetTaskQueue() || GetTaskQueue()->IsCurrentThreadIn();
> +    return GetTaskQueue() && GetTaskQueue()->IsCurrentThreadIn();

should just assert that GetTaskQueue() isn't returning nullptr.

In fact that should be a MOZ_RELEASE_ASSERT because having taskqueue null indicates an error.
Reporter

Updated

2 years ago
Blocks: 1257921
Assignee

Comment 32

2 years ago
Still trying the fix the try fails, would update more information when I fix them.
Assignee

Comment 33

2 years ago
After trying other different ways, like don't detach in CC or change media decoder's shutdown as a promise, but they couldn't fix the issue correctly. So, I would simply check the whether the TrackBuffersManager is detached to prevent running the task after detach() called, as jya mentioned in comment24.

Since detached could be unavoidable run before MFR::shutdown, eg, calling removeSourceBuffer.
When we detach the source buffer, what we do is to break the cycle between MediaSource/SourceBuffer/TrackBuffersManager and InputDemuxer(mp4/webm), the MediaTrackDemuxer won't know its mManager has been detached or not.

Therefore, we always have the possibility that detach() is called before MFR::Shotdown().
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 37

2 years ago
From [1], the task was executed after finished detach task. I guess it might be caused by queuing two detach tasks in the task queue.

If the previous detach task is still waiting in the task queue when we're calling the second detach(), then we might have two detach tasks in the queue. 

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=134315866&repo=try&lineNumber=2540
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Reporter

Comment 41

2 years ago
On second thought, I'm not sure there's anything we need to worry about should the trackbuffersmanager be accessed after the detach queue has run following bug 1402681.

The various trackbuffers are empty, the code as such will return early with EOS.

It's only the data race on two different threads we had to be concerned about.
Assignee

Comment 42

2 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #41)
> On second thought, I'm not sure there's anything we need to worry about
> should the trackbuffersmanager be accessed after the detach queue has run
> following bug 1402681.
> 
> The various trackbuffers are empty, the code as such will return early with
> EOS.
> 
> It's only the data race on two different threads we had to be concerned
> about.

Cause we would check OnTaskQueue() for lots of tasks, it would fail if we execute the task after TBM detached.
Reporter

Comment 43

2 years ago
Removing [@ mozilla::TrackBuffersManager::RemoveFrames ] signature as this method is always called from JS while the other other signatures are from the Media source demuxer and are due to access to the sourcebuffer after it has been detached.
Crash Signature: [@ mozilla::TrackBuffersManager::GetSample ] [@ @0x0 | mozilla::TrackBuffersManager::GetSample ] [@ mozilla::TrackBuffersManager::RemoveFrames ] [@ mozilla::TrackBuffersManager::GetNextRandomAccessPoint ] [@ mozilla::TrackBuffersManager::Seek ] → [@ mozilla::TrackBuffersManager::GetSample ] [@ @0x0 | mozilla::TrackBuffersManager::GetSample ] [@ mozilla::TrackBuffersManager::GetNextRandomAccessPoint ] [@ mozilla::TrackBuffersManager::Seek ]
Reporter

Updated

2 years ago
See Also: → 1406118
Reporter

Comment 44

2 years ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #42)
> (In reply to Jean-Yves Avenard [:jya] from comment #41)
> > On second thought, I'm not sure there's anything we need to worry about
> > should the trackbuffersmanager be accessed after the detach queue has run
> > following bug 1402681.
> > 
> > The various trackbuffers are empty, the code as such will return early with
> > EOS.
> > 
> > It's only the data race on two different threads we had to be concerned
> > about.
> 
> Cause we would check OnTaskQueue() for lots of tasks, it would fail if we
> execute the task after TBM detached.

We do not rely on the value of mTaskQueue to decide on what to do. OnTaskQueue is only used for for diagnostics and it can now be argued that it returns true if mTaskQueue is null is the appropriate behaviour.
While those changes make it more clear on what is going on they, are unnecessary.
Reporter

Comment 45

2 years ago
mozreview-review
Comment on attachment 8912495 [details]
Bug 1247189 - part1 : should detach TBM after detaching it from demuxers.

https://reviewboard.mozilla.org/r/183818/#review191946

This can't be your first patch. As if you were to commit this as is, it would cause crashes. As we do know that OnTaskQueue is called with mTaskQueue being null.
Attachment #8912495 - Flags: review?(jyavenard) → review+
Reporter

Comment 46

2 years ago
mozreview-review
Comment on attachment 8912496 [details]
Bug 1247189 - part2 : remove reference to TrackBuffersManagers once detached.

https://reviewboard.mozilla.org/r/183820/#review192002

There's aa mechanism in place to notify the MediaSourceDemuxer that aa TBM got detached. Please use this instead. It should be up to the MediaSourceDemuxer to know if the TBM has been detached, not query the TBM.
You'll need to modify Sourcebuffer code so that the MediaSourceDemuxer is notified it is detached prior calling TBM::detach https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasource/SourceBuffer.cpp#284
Those two lines need to be swapped.

II should add that without your part 1 patch, none of this is necessary, the race is no longer possible since bug
Attachment #8912496 - Flags: review?(jyavenard) → review-
Reporter

Comment 47

2 years ago
mozreview-review
Comment on attachment 8914234 [details]
Bug 1247189 - part3 : don't process any task after detached.

https://reviewboard.mozilla.org/r/185552/#review191048

::: dom/media/mediasource/TrackBuffersManager.cpp:179
(Diff revision 1)
>  {
> +  // If the previous detach task is still waiting in the task queue when we're
> +  // calling detach(), then we might have two detach tasks in the queue. If so,
> +  // ignore the later one.
> +  if (IsDetached()) {
> +    RefPtr<SourceBufferTask> task = mQueue.Pop();

You're going to have compilation issues here as your task variable is unused in non debug build. Should place the entire code block in #if DEBUG
Attachment #8914234 - Flags: review?(jyavenard) → review+
Assignee

Comment 48

2 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #44)
> We do not rely on the value of mTaskQueue to decide on what to do.
> OnTaskQueue is only used for for diagnostics and it can now be argued that
> it returns true if mTaskQueue is null is the appropriate behaviour.
> While those changes make it more clear on what is going on they, are
> unnecessary.

But mTaskQueue could clearly show us whether the TBM is detached or not, and we don't want to access TBM after detached. So, we still need to care about its value.

What do you mean returning true when mTaskQueue is null? Then you don't have the way to check which thread you're running on.

(In reply to Jean-Yves Avenard [:jya] from comment #46)
> There's aa mechanism in place to notify the MediaSourceDemuxer that aa TBM
> got detached. Please use this instead. It should be up to the
> MediaSourceDemuxer to know if the TBM has been detached, not query the TBM.
> You'll need to modify Sourcebuffer code so that the MediaSourceDemuxer is
> notified it is detached prior calling TBM::detach
> https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasource/
> SourceBuffer.cpp#284
> Those two lines need to be swapped.

Although MediaSourceDemuxer would be notified the detached TBM, the MediaSourceTrackDemuxer won't. After TBM was detached, the MediaSourceTrackDemuxer still hold a ref to a detached TBM. Then, when MFR call MediaSourceTrackDemuxer.reset(), it would access the detached TBM.

Therefore, we still need to prevent this behavior.
Flags: needinfo?(jyavenard)
Assignee

Comment 49

2 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #47) 
> You're going to have compilation issues here as your task variable is unused
> in non debug build. Should place the entire code block in #if DEBUG

I didn't see any compilation issues on my try result.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1f0def96c77df5b924f179b9a631840012c59c7&selectedJob=134343136
Reporter

Comment 50

2 years ago
MediaSourceDemuxer keeps a table of all its track demuxers, so there should be no issue to notify the track demuxers when the sourcebuffer gets detached 

That the TBM:mTaskQueue could be null is non important so long as the functions are accessed on the task queue. As I mentioned, this bug could be closed now. I'm 99% sure it can no longer occur. 
That TBM:OnTaskQueue returns true when mTaskQueue is null allows to not assert. 


 (In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #49)
> (In reply to Jean-Yves Avenard [:jya] from comment #47) 
> > You're going to have compilation issues here as your task variable is unused
> > in non debug build. Should place the entire code block in #if DEBUG
> 
> I didn't see any compilation issues on my try result.
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=e1f0def96c77df5b924f179b9a631840012c59c7&selectedJob=1
> 34343136

You haven't done PGO and static analyser builds.
Flags: needinfo?(jyavenard)
Reporter

Comment 51

2 years ago
To be more precise.
I'm referring to a point of architecture.
You are asking the TrackBuffersManager if it's dead, and if it says yes you avoid using it. Dead people shouldn't talk :) 



The track demuxers instead should be notified when the TrackBuffersManager got detached and avoid using it. Theres already mechanisms to do that in place. 

Bug again, fundamentally none of this is necessary, it just clarify the picture a bit
Assignee

Comment 52

2 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #50)
> MediaSourceDemuxer keeps a table of all its track demuxers, so there should
> be no issue to notify the track demuxers when the sourcebuffer gets detached 

Could you help me point out where is the mechanism, that MediaSourceTrackDemuxer could know whether its TBM is detached?

Because what I observed (maybe wrong) is, MFR's MediaSourceTrackDemuxer doesn't be notified the TBM it holds is detached or not, then we might access the detached TBM.

eg. MFR calls MediaSourceTrackDemuxer.reset()

> That the TBM:mTaskQueue could be null is non important so long as the
> functions are accessed on the task queue. 

Then, how do you test whether we're running on task queue? or you want to remove that assert?

> As I mentioned, this bug could be
> closed now. I'm 99% sure it can no longer occur. 
> That TBM:OnTaskQueue returns true when mTaskQueue is null allows to not
> assert. 

But we still need the change for [1] to avoid running task on non-task queue, right? because |!GetTaskQueue() | is not a correct checking criteria .

[1] http://searchfox.org/mozilla-central/source/dom/media/mediasource/TrackBuffersManager.h#464
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

2 years ago
Attachment #8912496 - Flags: review?(jyavenard)
Attachment #8915869 - Flags: review?(jyavenard)
Assignee

Comment 57

2 years ago
Comment on attachment 8912495 [details]
Bug 1247189 - part1 : should detach TBM after detaching it from demuxers.

New patch.
Attachment #8912495 - Flags: review+ → review?(jyavenard)
Assignee

Comment 58

2 years ago
Comment on attachment 8912496 [details]
Bug 1247189 - part2 : remove reference to TrackBuffersManagers once detached.

Don't know why it doesn't sync with review board.
Attachment #8912496 - Flags: review?(jyavenard)
Reporter

Comment 59

2 years ago
Damn, it happened again https://crash-stats.mozilla.com/report/index/e4795edc-c34d-40df-ba02-a7d8a0171005

So maybe the current theory on why it happened is still not correct :(
Reporter

Comment 60

2 years ago
mozreview-review
Comment on attachment 8912495 [details]
Bug 1247189 - part1 : should detach TBM after detaching it from demuxers.

https://reviewboard.mozilla.org/r/183818/#review192270
Attachment #8912495 - Flags: review?(jyavenard) → review+
Reporter

Comment 61

2 years ago
mozreview-review
Comment on attachment 8912496 [details]
Bug 1247189 - part2 : remove reference to TrackBuffersManagers once detached.

https://reviewboard.mozilla.org/r/183820/#review192272

::: commit-message-616a6:1
(Diff revision 4)
> +Bug 1247189 - part2 : detach TBM for MediaSourceTrackDemuxer.

"remove reference to TrackBuffersManagers once detached

Should never access the TrackBuffersManager once the SourceBuffer has been detached"

::: dom/media/mediasource/MediaSourceDemuxer.h:121
(Diff revision 4)
>    bool GetSamplesMayBlock() const override
>    {
>      return false;
>    }
>  
> +  bool ShouldDetachOwningManager(TrackBuffersManager* aManager) const;

the name isnt correct. the manager isnt owning.
thr MediaSourceTrackDemuxer keeps a reference on the TrackBuffersManager, not the other way round.

HasManager() would gave been more fitting I think.

::: dom/media/mediasource/MediaSourceDemuxer.h:123
(Diff revision 4)
>      return false;
>    }
>  
> +  bool ShouldDetachOwningManager(TrackBuffersManager* aManager) const;
> +  void DetachTrackBuffersManager();
> +

DetachManager

::: dom/media/mediasource/MediaSourceDemuxer.cpp:407
(Diff revision 4)
>  {
>    RefPtr<MediaSourceTrackDemuxer> self = this;
>    nsCOMPtr<nsIRunnable> task =
>      NS_NewRunnableFunction("MediaSourceTrackDemuxer::BreakCycles", [self]() {
>        self->mParent = nullptr;
>        self->mManager = nullptr;

use the new DetachManager() function here

::: dom/media/mediasource/MediaSourceDemuxer.cpp:554
(Diff revision 4)
> +}
> +
> +void
> +MediaSourceTrackDemuxer::DetachTrackBuffersManager()
> +{
> +  MOZ_ASSERT(mManager);

if it can be called from BreakCycle, can't assert here as it may be called twice
Attachment #8912496 - Flags: review?(jyavenard) → review+
Reporter

Comment 62

2 years ago
mozreview-review
Comment on attachment 8914234 [details]
Bug 1247189 - part3 : don't process any task after detached.

https://reviewboard.mozilla.org/r/185552/#review192278

::: commit-message-c133d:3
(Diff revision 2)
> +Bug 1247189 - part3 : don't process any task after detached.
> +
> +From [1], the task was executed after finished detach task. I guess it might be

don't guess in your commits :)

your assertion is valid, it is theorically possible that a second Detach task was queued prior the first one running.

i dont think it mattered much though, the second detach would be a no-op regardless

::: dom/media/mediasource/TrackBuffersManager.cpp:175
(Diff revision 2)
>  }
>  
>  void
>  TrackBuffersManager::ProcessTasks()
>  {
> +  // If the previous detach task is still waiting in the task queue when we're

it wouldn't be the previous detach task that would still be in the queue. 

"A second Detach task was queued, prior the first one running, ignore it"

if we're sure about it however, we don't want to just assert nor be conditional to debug build. we must find that cause for those crashes.

so let's not make this conditional to debug, and use MOZ_RELEASE_ASSERT
Reporter

Comment 63

2 years ago
mozreview-review
Comment on attachment 8915869 [details]
Bug 1247189 - part4 : ensure we always detach TBM from demuxers.

https://reviewboard.mozilla.org/r/187116/#review192280
Attachment #8915869 - Flags: review?(jyavenard) → review+
Reporter

Comment 64

2 years ago
please re-add your patch that modified OnTaskQueue() so that it MOZ_RELEASE_ASSERT if it's called after a detach task has run.

This will guarantee that those changes work as intended.
Flags: needinfo?(alwu)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 70

2 years ago
Comment on attachment 8917232 [details]
Bug 1247189 - part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not.

carry r+ from comment45, this patch was the previous patch1.
Flags: needinfo?(alwu)
Attachment #8917232 - Flags: review+
Assignee

Comment 71

2 years ago
Comment on attachment 8917232 [details]
Bug 1247189 - part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not.

Even I keep same commit id, the review board didn't keep the old review result for me. Need to ask for review again.
Attachment #8917232 - Flags: review+ → review?(jyavenard)
Reporter

Comment 73

2 years ago
mozreview-review
Comment on attachment 8917232 [details]
Bug 1247189 - part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not.

https://reviewboard.mozilla.org/r/188248/#review193464

::: commit-message-b4d77:3
(Diff revision 1)
> +Bug 1247189 - part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not.
> +
> +If mTaskQueue is nullptr, OnTaskQueue() would always return true even we're not on the task queue.

You can remove this line... No point describing what the code used to do.
(as per https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Committing_Rules_and_Responsibilities#Checkin_comment)

I would however add this line:
"The use of the TrackBuffersManager once detached is explictly forbidden, as such OnTaskQueue() can only be used before the DetachTask ran: we now strongly assert as such"
Attachment #8917232 - Flags: review?(jyavenard) → review+
Comment hidden (mozreview-request)

Comment 75

2 years ago
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/26e324eb052d
part1 : should detach TBM after detaching it from demuxers. r=jya
https://hg.mozilla.org/integration/autoland/rev/5b9af87e76a9
part2 : remove reference to TrackBuffersManagers once detached. r=jya
https://hg.mozilla.org/integration/autoland/rev/36a63b9f8dcb
part3 : don't process any task after detached. r=jya
https://hg.mozilla.org/integration/autoland/rev/85d236f46b00
part4 : ensure we always detach TBM from demuxers. r=jya
https://hg.mozilla.org/integration/autoland/rev/9be5bbb35797
part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not. r=jya
Assignee

Comment 76

2 years ago
uplift reminder.
Flags: needinfo?(alwu)
Assignee

Updated

2 years ago
Flags: needinfo?(alwu)
Backed out for crashing in /media-source/mediasource-errors.html:

https://hg.mozilla.org/integration/autoland/rev/fea45716484e7b504a027558aed895127472d598
https://hg.mozilla.org/integration/autoland/rev/72100c49932866ff29178482b953d34e56b36a78
https://hg.mozilla.org/integration/autoland/rev/40fc05c294b610d38a156ce8cc27ac28492a2d91
https://hg.mozilla.org/integration/autoland/rev/68baa2a6566d64ff98106e94399b174484007552
https://hg.mozilla.org/integration/autoland/rev/7e3f88db34fd472b285c0cb556cc01774694e5e3

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=9be5bbb35797abfae615b3e3096b5ed0f6575b13&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=136154779&repo=autoland

[task 2017-10-11T09:16:58.053Z] 09:16:58     INFO - TEST-START | /media-source/mediasource-errors.html
[task 2017-10-11T09:16:58.101Z] 09:16:58     INFO - PID 5765 | 1507713418094	Marionette	DEBUG	Register listener.js for window 2147483755
[task 2017-10-11T09:16:58.420Z] 09:16:58     INFO - PID 5765 | [Parent 5765, Gecko_IOThread] WARNING: pipe error (55): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[task 2017-10-11T09:16:58.421Z] 09:16:58     INFO - PID 5765 | [Parent 5765, Gecko_IOThread] WARNING: pipe error (61): Connection reset by peer: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[task 2017-10-11T09:16:58.421Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.423Z] 09:16:58     INFO - PID 5765 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x150083,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x150083,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x150083,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
[task 2017-10-11T09:16:58.424Z] 09:16:58     INFO - PID 5765 | 
[task 2017-10-11T09:16:58.480Z] 09:16:58     INFO - PID 5765 | Assertion failure
[task 2017-10-11T09:16:58.480Z] 09:16:58     INFO - PID 5765 | assert@chrome://browser/content/tabbrowser.xml:4630:46
[task 2017-10-11T09:16:58.480Z] 09:16:58     INFO - PID 5765 | finish@chrome://browser/content/tabbrowser.xml:4472:15
[task 2017-10-11T09:16:58.481Z] 09:16:58     INFO - PID 5765 | postActions@chrome://browser/content/tabbrowser.xml:4759:17
[task 2017-10-11T09:16:58.481Z] 09:16:58     INFO - PID 5765 | handleEvent@chrome://browser/content/tabbrowser.xml:5100:15
[task 2017-10-11T09:16:58.482Z] 09:16:58     INFO - PID 5765 | updateBrowserRemoteness@chrome://browser/content/tabbrowser.xml:2045:13
[task 2017-10-11T09:16:58.482Z] 09:16:58     INFO - PID 5765 | onxbloop-browser-crashed@chrome://browser/content/tabbrowser.xml:6182:13
[task 2017-10-11T09:16:58.530Z] 09:16:58     INFO - PID 5765 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2017-10-11T09:16:58.627Z] 09:16:58     INFO - PID 5765 | [Parent 5765, Gecko_IOThread] WARNING: waitpid failed pid:5816 errno:10: file /builds/worker/workspace/build/src/ipc/chromium/src/base/process_util_posix.cc, line 276
[task 2017-10-11T09:16:58.664Z] 09:16:58     INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/QKnIevgNS3y7UCrP04-l5g/artifacts/public/build/target.crashreporter-symbols.zip
[task 2017-10-11T09:16:58.672Z] 09:16:58     INFO - PID 5765 | [Parent 5765, Gecko_IOThread] WARNING: waitpid failed pid:5873 errno:10: file /builds/worker/workspace/build/src/ipc/chromium/src/base/process_util_posix.cc, line 276
[task 2017-10-11T09:16:58.769Z] 09:16:58     INFO - PID 5765 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2017-10-11T09:17:03.621Z] 09:17:03     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpmxvKzU.mozrunner/minidumps/518b10dc-538e-6c6e-8795-312d3272c63e.dmp /tmp/tmpSDf7ED
[task 2017-10-11T09:17:09.726Z] 09:17:09     INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/518b10dc-538e-6c6e-8795-312d3272c63e.dmp
[task 2017-10-11T09:17:09.726Z] 09:17:09     INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/518b10dc-538e-6c6e-8795-312d3272c63e.extra
[task 2017-10-11T09:17:09.766Z] 09:17:09     INFO - PROCESS-CRASH | /media-source/mediasource-errors.html | application crashed [@ mozilla::TrackBuffersManager::ProcessTasks]
[task 2017-10-11T09:17:09.766Z] 09:17:09     INFO - Crash dump filename: /tmp/tmpmxvKzU.mozrunner/minidumps/518b10dc-538e-6c6e-8795-312d3272c63e.dmp
[task 2017-10-11T09:17:09.767Z] 09:17:09     INFO - Operating system: Linux
[task 2017-10-11T09:17:09.767Z] 09:17:09     INFO -                   0.0.0 Linux 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64
[task 2017-10-11T09:17:09.767Z] 09:17:09     INFO - CPU: amd64
[task 2017-10-11T09:17:09.768Z] 09:17:09     INFO -      family 6 model 62 stepping 4
[task 2017-10-11T09:17:09.768Z] 09:17:09     INFO -      4 CPUs
[task 2017-10-11T09:17:09.768Z] 09:17:09     INFO - 
[task 2017-10-11T09:17:09.769Z] 09:17:09     INFO - GPU: UNKNOWN
[task 2017-10-11T09:17:09.769Z] 09:17:09     INFO - 
[task 2017-10-11T09:17:09.770Z] 09:17:09     INFO - Crash reason:  SIGSEGV
[task 2017-10-11T09:17:09.770Z] 09:17:09     INFO - Crash address: 0x0
[task 2017-10-11T09:17:09.770Z] 09:17:09     INFO - Process uptime: not available
[task 2017-10-11T09:17:09.771Z] 09:17:09     INFO - 
[task 2017-10-11T09:17:09.771Z] 09:17:09     INFO - Thread 31 (crashed)
[task 2017-10-11T09:17:09.771Z] 09:17:09     INFO -  0  libxul.so!mozilla::TrackBuffersManager::ProcessTasks [TrackBuffersManager.cpp:9be5bbb35797 : 178 + 0x0]
[task 2017-10-11T09:17:09.772Z] 09:17:09     INFO -     rax = 0x00007f629615b680   rdx = 0x00007f62873557e8
[task 2017-10-11T09:17:09.772Z] 09:17:09     INFO -     rcx = 0x0000000000000003   rbx = 0x00007f62899cc000
[task 2017-10-11T09:17:09.772Z] 09:17:09     INFO -     rsi = 0x0000000000000000   rdi = 0x0000000000633d00
[task 2017-10-11T09:17:09.773Z] 09:17:09     INFO -     rbp = 0x00007f629615b6e0   rsp = 0x00007f629615b670
[task 2017-10-11T09:17:09.773Z] 09:17:09     INFO -      r8 = 0x0000000000000000    r9 = 0x00007f62a6b7d17f
[task 2017-10-11T09:17:09.774Z] 09:17:09     INFO -     r10 = 0x0000000000000001   r11 = 0x0000000000000283
[task 2017-10-11T09:17:09.774Z] 09:17:09     INFO -     r12 = 0x0000000000000000   r13 = 0x00007f629615b700
[task 2017-10-11T09:17:09.774Z] 09:17:09     INFO -     r14 = 0x00007f6287658cf0   r15 = 0x00007f6288fe50a0
[task 2017-10-11T09:17:09.775Z] 09:17:09     INFO -     rip = 0x00007f62a43b7930
[task 2017-10-11T09:17:09.775Z] 09:17:09     INFO -     Found by: given as instruction pointer in context
[task 2017-10-11T09:17:09.775Z] 09:17:09     INFO -  1  libxul.so!mozilla::detail::RunnableMethodImpl<mozilla::TrackBuffersManager*, void (mozilla::TrackBuffersManager::*)(), true, (mozilla::RunnableKind)0u>::Run [nsThreadUtils.h:9be5bbb35797 : 1142 + 0x13]
[task 2017-10-11T09:17:09.776Z] 09:17:09     INFO -     rbx = 0x00007f6288fe50a0   rbp = 0x00007f629615b6f0
[task 2017-10-11T09:17:09.776Z] 09:17:09     INFO -     rsp = 0x00007f629615b6f0   r12 = 0x00007f627f37f6c0
[task 2017-10-11T09:17:09.777Z] 09:17:09     INFO -     r13 = 0x00007f629615b700   r14 = 0x00007f6287658cf0
[task 2017-10-11T09:17:09.777Z] 09:17:09     INFO -     r15 = 0x00007f6288fe50a0   rip = 0x00007f62a43a329a
[task 2017-10-11T09:17:09.777Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.778Z] 09:17:09     INFO -  2  libxul.so!mozilla::TaskQueue::Runner::Run [TaskQueue.cpp:9be5bbb35797 : 246 + 0x7]
[task 2017-10-11T09:17:09.778Z] 09:17:09     INFO -     rbx = 0x00007f6288fe50a0   rbp = 0x00007f629615b7a0
[task 2017-10-11T09:17:09.778Z] 09:17:09     INFO -     rsp = 0x00007f629615b700   r12 = 0x00007f627f37f6c0
[task 2017-10-11T09:17:09.779Z] 09:17:09     INFO -     r13 = 0x00007f629615b700   r14 = 0x00007f6287658cf0
[task 2017-10-11T09:17:09.779Z] 09:17:09     INFO -     r15 = 0x00007f6288fe50a0   rip = 0x00007f62a33a9ee5
[task 2017-10-11T09:17:09.779Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.780Z] 09:17:09     INFO -  3  libxul.so!nsThreadPool::Run() + 0x1d7
[task 2017-10-11T09:17:09.780Z] 09:17:09     INFO -     rbx = 0x00007f62873557c0   rbp = 0x00007f629615b830
[task 2017-10-11T09:17:09.780Z] 09:17:09     INFO -     rsp = 0x00007f629615b7b0   r12 = 0x00007f62a7f72570
[task 2017-10-11T09:17:09.781Z] 09:17:09     INFO -     r13 = 0x00007f62873557e8   r14 = 0x00007f62a8094d90
[task 2017-10-11T09:17:09.781Z] 09:17:09     INFO -     r15 = 0x00007f6288fe50a0   rip = 0x00007f62a55d7a77
[task 2017-10-11T09:17:09.782Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.782Z] 09:17:09     INFO -  4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) + 0x75f
[task 2017-10-11T09:17:09.782Z] 09:17:09     INFO -     rbx = 0x00007f6288ef6270   rbp = 0x00007f629615bda0
[task 2017-10-11T09:17:09.783Z] 09:17:09     INFO -     rsp = 0x00007f629615b840   r12 = 0x0000000000000000
[task 2017-10-11T09:17:09.783Z] 09:17:09     INFO -     r13 = 0x00007f62873557d0   r14 = 0x00007f629615b980
[task 2017-10-11T09:17:09.783Z] 09:17:09     INFO -     r15 = 0x00007f629615b920   rip = 0x00007f62a55d4dff
[task 2017-10-11T09:17:09.784Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.784Z] 09:17:09     INFO -  5  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) + 0x1f
[task 2017-10-11T09:17:09.784Z] 09:17:09     INFO -     rbx = 0x00007f62836980c0   rbp = 0x00007f629615bdd0
[task 2017-10-11T09:17:09.785Z] 09:17:09     INFO -     rsp = 0x00007f629615bdb0   r12 = 0x00007f6288e73b80
[task 2017-10-11T09:17:09.785Z] 09:17:09     INFO -     r13 = 0x00007f6288ef6270   r14 = 0x00007f6288e73ba0
[task 2017-10-11T09:17:09.785Z] 09:17:09     INFO -     r15 = 0x00007f62a7ca9d88   rip = 0x00007f62a55d8fbf
[task 2017-10-11T09:17:09.786Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.786Z] 09:17:09     INFO -  6  libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) + 0xfd
[task 2017-10-11T09:17:09.786Z] 09:17:09     INFO -     rbx = 0x00007f62836980c0   rbp = 0x00007f629615be30
[task 2017-10-11T09:17:09.787Z] 09:17:09     INFO -     rsp = 0x00007f629615bde0   r12 = 0x00007f6288e73b80
[task 2017-10-11T09:17:09.787Z] 09:17:09     INFO -     r13 = 0x00007f6288ef6270   r14 = 0x00007f6288e73ba0
[task 2017-10-11T09:17:09.787Z] 09:17:09     INFO -     r15 = 0x00007f62a7ca9d88   rip = 0x00007f62a5630e8d
[task 2017-10-11T09:17:09.788Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.788Z] 09:17:09     INFO -  7  libxul.so!MessageLoop::Run() + 0x47
[task 2017-10-11T09:17:09.788Z] 09:17:09     INFO -     rbx = 0x00007f6288ef6270   rbp = 0x00007f629615be60
[task 2017-10-11T09:17:09.789Z] 09:17:09     INFO -     rsp = 0x00007f629615be40   r12 = 0x00007f62836980c0
[task 2017-10-11T09:17:09.789Z] 09:17:09     INFO -     r13 = 0x00007f629615be80   r14 = 0x00000000ffffffff
[task 2017-10-11T09:17:09.789Z] 09:17:09     INFO -     r15 = 0x00007f62a7f728c0   rip = 0x00007f62a5dd5f47
[task 2017-10-11T09:17:09.790Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.790Z] 09:17:09     INFO -  8  libxul.so!nsThread::ThreadFunc(void*) + 0x114
[task 2017-10-11T09:17:09.790Z] 09:17:09     INFO -     rbx = 0x00007f6288ef6270   rbp = 0x00007f629615bec0
[task 2017-10-11T09:17:09.790Z] 09:17:09     INFO -     rsp = 0x00007f629615be70   r12 = 0x00007f62836980c0
[task 2017-10-11T09:17:09.791Z] 09:17:09     INFO -     r13 = 0x00007f629615be80   r14 = 0x00000000ffffffff
[task 2017-10-11T09:17:09.791Z] 09:17:09     INFO -     r15 = 0x00007f62a7f728c0   rip = 0x00007f62a5cf4b54
[task 2017-10-11T09:17:09.791Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.792Z] 09:17:09     INFO -  9  libnspr4.so!_pt_root + 0x12c
[task 2017-10-11T09:17:09.792Z] 09:17:09     INFO -     rbx = 0x00007f62836f8c80   rbp = 0x00007f629615bf10
[task 2017-10-11T09:17:09.792Z] 09:17:09     INFO -     rsp = 0x00007f629615bed0   r12 = 0x0000000000000000
[task 2017-10-11T09:17:09.793Z] 09:17:09     INFO -     r13 = 0x000000000000170f   r14 = 0x00007f629615c700
[task 2017-10-11T09:17:09.793Z] 09:17:09     INFO -     r15 = 0x00007f629615c670   rip = 0x00007f62b29e9ddc
[task 2017-10-11T09:17:09.794Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.794Z] 09:17:09     INFO - 10  libpthread-2.23.so + 0x76ba
[task 2017-10-11T09:17:09.794Z] 09:17:09     INFO -     rbx = 0x0000000000000000   rbp = 0x0000000000000000
[task 2017-10-11T09:17:09.795Z] 09:17:09     INFO -     rsp = 0x00007f629615bf20   r12 = 0x0000000000000000
[task 2017-10-11T09:17:09.795Z] 09:17:09     INFO -     r13 = 0x00007ffcf12aa3bf   r14 = 0x00007f629615c9c0
[task 2017-10-11T09:17:09.795Z] 09:17:09     INFO -     r15 = 0x00007f62836f8c80   rip = 0x00007f62b3ecd6ba
[task 2017-10-11T09:17:09.796Z] 09:17:09     INFO -     Found by: call frame info
[task 2017-10-11T09:17:09.796Z] 09:17:09     INFO - 11  libc-2.23.so + 0x1073dd
[task 2017-10-11T09:17:09.796Z] 09:17:09     INFO -     rsp = 0x00007f629615bfc0   rip = 0x00007f62b2f563dd
[task 2017-10-11T09:17:09.797Z] 09:17:09     INFO -     Found by: stack scanning
Flags: needinfo?(alwu)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Will this bug fix the crash?

https://treeherder.mozilla.org/logviewer.html#?job_id=136180824&repo=autoland&lineNumber=4590
0  firefox!mozilla::detail::MutexImpl::~MutexImpl [Mutex_posix.cpp:c742fe352867 : 67 + 0x0]
1  libxul.so!mozilla::TrackBuffersManager::~TrackBuffersManager [Mutex.h:c742fe352867 : 55 + 0xc]
Assignee

Comment 83

2 years ago
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #82)
> Will this bug fix the crash?
> 
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=136180824&repo=autoland&lineNumber=4590
> 0  firefox!mozilla::detail::MutexImpl::~MutexImpl
> [Mutex_posix.cpp:c742fe352867 : 67 + 0x0]
> 1  libxul.so!mozilla::TrackBuffersManager::~TrackBuffersManager
> [Mutex.h:c742fe352867 : 55 + 0xc]

What's this crash from?
Assignee

Comment 85

2 years ago
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #84)
> Bug 1407641.

I think this bug couldn't help that, it just avoids we access the detached TBM.

Comment 86

2 years ago
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/053999c48513
part1 : should detach TBM after detaching it from demuxers. r=jya
https://hg.mozilla.org/integration/autoland/rev/7fd2d15b8fae
part2 : remove reference to TrackBuffersManagers once detached. r=jya
https://hg.mozilla.org/integration/autoland/rev/bba2b7892653
part3 : don't process any task after detached. r=jya
https://hg.mozilla.org/integration/autoland/rev/ce7ff91e2bdb
part4 : ensure we always detach TBM from demuxers. r=jya
https://hg.mozilla.org/integration/autoland/rev/3c8730c5a21f
part5 : use IsCurrentThreadIn() as a main criteria to determine whether we're on the task queue or not. r=jya
Reporter

Updated

2 years ago
Depends on: 1407940
Assignee

Updated

2 years ago
Flags: needinfo?(alwu)
Reporter

Updated

2 years ago
Depends on: 1408987
With this small volume and the fact that we have this for a while, let's wontfix that in 57 and let that ride the train with 58
Reporter

Comment 89

2 years ago
still happening with both this change and bug 1407940 :(

https://crash-stats.mozilla.com/report/index/0bd2a7b7-d9fd-4e4d-afdd-981950171019

Once again we find that nullptr is found in the trackbuffer array
https://hg.mozilla.org/mozilla-central/annotate/c6a2643362a6/dom/media/mediasource/TrackBuffersManager.cpp#l2433

already 4 similar crashes.
Status: RESOLVED → REOPENED
Flags: needinfo?(alwu)
Resolution: FIXED → ---
Assignee

Comment 90

2 years ago
Still can't understand why... I believe now it's not related with the detaching on main thread since we've ended all possible cases. It's impossible to run the task on main thread after bug 1407940.

The only thing I could imagine is the demuxer output the nullptr sample, but i don't know whether it's possible...

Note something I observed, maybe it could help.

1. In 58, the crash would only happen from MediaSourceTrackDemuxer::DoGetSamples(), not seeing the crash from MediaSourceTrackDemuxer::DoSeek (which was often seen in previous version)

2. It would happen when playing webm, since there are the function call from ffvpx [1]
   - so if it's related to demuxer, we could focus on webm demuxer.

3. I noticed that all other MediaPlayback and MediaPDecoder threads are idle at that time, so it might mean we've finished decoding? 

[1] https://crash-stats.mozilla.com/report/index/6fda799d-81a9-41b3-a373-919820171018#allthreads
Flags: needinfo?(alwu)
Assignee

Updated

a year ago
Assignee: alastor0325 → nobody
Who would be a good new owner for this bug?
Flags: needinfo?(jyavenard)
Reporter

Comment 92

a year ago
Alastor is coming back isn't he? And seeing that I've given hope we would ever find the reason behind this...
Flags: needinfo?(jyavenard)
Reporter

Updated

a year ago
See Also: → 1466606

Alastor could please have a look at this again?

Assignee: nobody → alwu
Flags: needinfo?(alwu)
Assignee

Comment 94

4 months ago

Will investigate it again and report what I will find later, keep NI.

Assignee

Comment 95

4 months ago

Except the memory corruption, maybe one situaion could cause this issue. If there is no one holding a reference of track demuxer and we don't even call BreakCycle(), then we might have a pending task (such like DoSeek(), DoGetSamples() which will be executed later on a released track demuxer (which might explain why we got nullptr sample)

However, I can't see any possible call stack that can cause this situation...

Assignee

Comment 96

3 months ago

I'm still not able to get any further finding for this bug. But I will keep this bug to myself, and will check it again if I have any new finding.

Flags: needinfo?(alwu)
You need to log in before you can comment on or make changes to this bug.