Closed Bug 1402681 Opened 7 years ago Closed 7 years ago

Crash in mozilla::net::SpeculativeConnectArgs::Release

Categories

(Core :: Audio/Video: Playback, defect, P2)

58 Branch
Unspecified
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 + fixed
firefox58 --- fixed

People

(Reporter: calixte, Assigned: alwu)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [clouseau])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-dbb1f10d-fa1a-42a8-8bbd-28b440170923. ============================================================= There are 7 crashes in nightly 58 starting with buildid 20170922220129. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1401147. [1] https://hg.mozilla.org/mozilla-central/rev?node=aabb5872e82837b984daf097c75536e8a4910094
Flags: needinfo?(alwu)
This crash stack makes no sense. The crash in on thread 0... 0 mozilla::net::SpeculativeConnectArgs::Release() netwerk/protocol/http/nsHttpConnectionMgr.cpp:459 1 RefPtr<mozilla::FileBlockCache>::~RefPtr<mozilla::FileBlockCache>() mfbt/RefPtr.h:79 2 RefPtr<mozilla::MediaRawData>::`scalar deleting destructor'(unsigned int) 3 nsTArray_Impl<RefPtr<mozilla::MediaRawData>, nsTArrayInfallibleAllocator>::DestructRange(unsigned __int64, unsigned __int64) xpcom/ds/nsTArray.h:2012 4 nsTArray_Impl<RefPtr<mozilla::MediaRawData>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned __int64, unsigned __int64) xpcom/ds/nsTArray.h:2060 5 mozilla::TrackBuffersManager::TrackData::Reset() dom/media/mediasource/TrackBuffersManager.h:387 6 mozilla::TrackBuffersManager::ProcessTasks() dom/media/mediasource/TrackBuffersManager.cpp:215 7 mozilla::dom::SourceBuffer::cycleCollection::Unlink(void*) dom/media/mediasource/SourceBuffer.cpp:585 The track is very incomplete to start with, but upon detach this calls TrackBuffersManager::Detach which does: QueueTask(new DetachTask()); which will dispatch a task on the MSE task queue. TrackBuffersManager::ProcessTasks only ever runs thing on the task queue, never the main thread.
:gerald, I wonder if this isn't related to the unknown crash we've had about the track buffers being emptied and yet were being accessed. Bug 1247189 That things are being accessed outside the task queue would explain, it's like somehow dispatching a task didn't work and the test to find out if we were running on the task queue returning a wrong value
Flags: needinfo?(gsquelart)
Agreed that it may have the same cause as bug 1247189, but I'm still not seeing anything wrong. ... Apart from using a few raw TBM pointers. To eliminate doubt there (and so we don't keep looking at these lines to try to prove to ourselves that they're safe), I will try and convert them to RefPtrs. But I'm not sure it's the actual cause, so I'm doing that in separate bug 1402728. In the best case, it may just be what was needed. in the worst case it won't help, but at least it's one thing we won't have to worry about anymore.
Flags: needinfo?(gsquelart)
Why mTaskQueue doesn't exist could indicate we're on the task queue [1]? It seems wrong. The problem here was caused by OnTaskQueue() returns wrong result after deteched the track buffer. After deteched the TBM, the mTaskQueue would be nullptr [2], and it would make the OnTaskQueue() always return true even we're not on the task queue. Therefore, the TBM::ProcessTasks would be run on unexpected thread. I think we should only use |GetTaskQueue()->IsCurrentThreadIn()| to determine whether we're on the task queue or not. [1] http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/media/mediasource/TrackBuffersManager.h#464 [2] http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/media/mediasource/TrackBuffersManager.cpp#211
Flags: needinfo?(alwu)
Assignee: nobody → alwu
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #4) > Why mTaskQueue doesn't exist could indicate we're on the task queue [1]? It > seems wrong. > > The problem here was caused by OnTaskQueue() returns wrong result after > deteched the track buffer. This isn't explaining on the *why* the crash occurred however and why mTaskQueue is nullptr The only way for mTaskQueue to be null is if TBM::Detach was called more than once. However, this should never happen TBM::Detach can only be called from SourceBuffer::Detach http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/media/mediasource/SourceBuffer.cpp#281 At which time mTrackBuffersManager is set to null Or during Cycle Collection, http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/media/mediasource/SourceBuffer.cpp#583 Which test if mTrackBuffersManager is null. Why TBM::Detach is called twice is the core problem. Just testing if the task queue is already null is just papering over the issue without solving the core problem. > > After deteched the TBM, the mTaskQueue would be nullptr [2], and it would > make the OnTaskQueue() always return true even we're not on the task queue. > Therefore, the TBM::ProcessTasks would be run on unexpected thread. Good point, I had missed that possibility. > > I think we should only use |GetTaskQueue()->IsCurrentThreadIn()| to > determine whether we're on the task queue or not. That's true, but mTaskQueue should never be null to start With! >
Priority: -- → P2
(In reply to Jean-Yves Avenard [:jya] from comment #7) > This isn't explaining on the *why* the crash occurred however and why > mTaskQueue is nullptr About these two questions, Here are my assumptions, (1) why the crash occurred From here [1], we could see the wrapped natives could be unlinked twice if we didn't change this behavior. And the source buffer is exactly a the wrapped natives [2], so we could expect the |manager->detach()| might be called twice. [1] https://bug635044.bugzilla.mozilla.org/show_bug.cgi?id=706281#c7 https://bug635044.bugzilla.mozilla.org/show_bug.cgi?id=669903#c2 [2] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Language_bindings/XPConnect/XPConnect_wrappers (2) why mTaskQueue is nullptr In this case, first detach would clear the mTaskQueue. However, I observed another case could cause null mTaskQueue, see [3]. Since the BreakCycles() is not async, sometime we would use the MediaSourceTrackDemuxer::mManager which have been detached. Eg. see following code order 1. MediaSourceTrackDemuxer::Reset() or GetSamples() - dispatch seek or get-sample task to task queue 2. TrackBuffersManager::ProcessTasks - process detaching and clear mTaskQueue 3. Doing seek or get-sample task on task queue - NO task queue!!!!!!! However, this is a different issue, we could file another bug to fix it. [3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=30dcd347589cb6729b084fa692f21abe51de3bc9&selectedJob=133063838
Flags: needinfo?(jyavenard)
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #8) > (In reply to Jean-Yves Avenard [:jya] from comment #7) > > This isn't explaining on the *why* the crash occurred however and why > > mTaskQueue is nullptr > > About these two questions, Here are my assumptions, > > (1) why the crash occurred > > From here [1], we could see the wrapped natives could be unlinked twice if > we didn't change this behavior. > > And the source buffer is exactly a the wrapped natives [2], so we could > expect the |manager->detach()| might be called twice. > > [1] > https://bug635044.bugzilla.mozilla.org/show_bug.cgi?id=706281#c7 > https://bug635044.bugzilla.mozilla.org/show_bug.cgi?id=669903#c2 > > [2] > https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/ > Language_bindings/XPConnect/XPConnect_wrappers > > (2) why mTaskQueue is nullptr > > In this case, first detach would clear the mTaskQueue. However, I observed > another case could cause null mTaskQueue, see [3]. > > Since the BreakCycles() is not async, sometime we would use the > MediaSourceTrackDemuxer::mManager which have been detached. > Which BreakCycles are you referring to? MediaSourceTrackDemuxer::BreakCycles? It dispatch a task to clear mManager, so there's no issue about sync vs async Also, the track demuxer will never be used *after* a call to BreakCycles() so there can't be a use of MediaSourceTrackDemuxer::mManager after a call to MediaSourceTrackDemuxer::BreakCycles() What could be happening however is that TrackBuffersManager::Detach was called as the SourceBuffer was cycle collected. But then, the SourceBuffer shouldn't be cycle collected as it's used by the MediaSource, which itself hold a strong reference on the MediaSourceDemuxer and if the MediaSource itself was cycle collected, then the MediaFormatReader would have been shutdown prior SourceBuffer::Detach() being called. > Eg. see following code order > 1. MediaSourceTrackDemuxer::Reset() or GetSamples() > - dispatch seek or get-sample task to task queue > 2. TrackBuffersManager::ProcessTasks > - process detaching and clear mTaskQueue > 3. Doing seek or get-sample task on task queue > - NO task queue!!!!!!! > > However, this is a different issue, we could file another bug to fix it. you may have well found the actual reason for bug 1247189! However, I'm not convinced on your analysis on why we got there.
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #9) > > Which BreakCycles are you referring to? > MediaSourceTrackDemuxer::BreakCycles? > > It dispatch a task to clear mManager, so there's no issue about sync vs async > > Also, the track demuxer will never be used *after* a call to BreakCycles() > > so there can't be a use of MediaSourceTrackDemuxer::mManager after a call to > MediaSourceTrackDemuxer::BreakCycles() Yes, I mean MediaSourceTrackDemuxer::BreakCycles(). BreakCycles() is an async function, because it do dispatch a task instead of executing it immediately. So, after BreakCycles() called doesn't mean the cycle has been broken right now, we still need to wait the task being executed. However, after BreakCycles() called, it means the source buffer has been detached and the mTaskQueue has been reset to nullptr, then we should not do any operation for source buffer even the *real* breaking cycle task is still in the waiting queue. Thinking about that, if the next task in task queue is MediaSourceTrackDemuxer's Reset() or GetSamples(), it would access the mManager with null task queue. > What could be happening however is that TrackBuffersManager::Detach was > called as the SourceBuffer was cycle collected. > But then, the SourceBuffer shouldn't be cycle collected as it's used by the > MediaSource, which itself hold a strong reference on the MediaSourceDemuxer > and if the MediaSource itself was cycle collected, then the > MediaFormatReader would have been shutdown prior SourceBuffer::Detach() > being called. Not sure I got your point, but I don't understand what's the effect these operations cause. > you may have well found the actual reason for bug 1247189! > > However, I'm not convinced on your analysis on why we got there. Please see [1], it can be reproduced on both my local build and try server. Those step are not just an analysis, I've seen the code flow like that. It hit the assertion, and the log shows we access the source buffer which has been deteched. Running test_AVC3_mp4.html or test_eme_autoplay.html and then you could see the result. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=30dcd347589cb6729b084fa692f21abe51de3bc9&selectedJob=133063838 --- Back to this issue, do you think the assumption about the twice unlink is impossible? If so, why? I think we should add the checking like IsDeteching() to determine whether we should call TrackBufferesManager::detach(). How do you think? Thanks!
Flags: needinfo?(jyavenard)
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #10) > (In reply to Jean-Yves Avenard [:jya] from comment #9) > > > > Which BreakCycles are you referring to? > > MediaSourceTrackDemuxer::BreakCycles? > > > > It dispatch a task to clear mManager, so there's no issue about sync vs async > > > > Also, the track demuxer will never be used *after* a call to BreakCycles() > > > > so there can't be a use of MediaSourceTrackDemuxer::mManager after a call to > > MediaSourceTrackDemuxer::BreakCycles() > > Yes, I mean MediaSourceTrackDemuxer::BreakCycles(). > > BreakCycles() is an async function, because it do dispatch a task instead of > executing it immediately. indeed, but you wrote "Since the BreakCycles() is not async" so which one do you mean? > > So, after BreakCycles() called doesn't mean the cycle has been broken right > now, we still need to wait the task being executed. you don't because the MFR will never call the MediaSourceTrackDemuxer once BreakCycles is called. > > However, after BreakCycles() called, it means the source buffer has been > detached and the mTaskQueue has been reset to nullptr, then we should not do > any operation for source buffer even the *real* breaking cycle task is still > in the waiting queue. mTaskQueue is an AutoTaskQueue, and it's refcounted. Clearing it in the MediaSource demuxer has no impact on the value in TrackBuffersManager. So calling MediaSourceTrackDemuxer::BreakCycles has NOTHING to do with the problem at hand. I'm not saying that there's no problem in regards to Detach being called twice. I'm saying that your analysis on how it gets there is incorrect. > > Thinking about that, if the next task in task queue is > MediaSourceTrackDemuxer's Reset() or GetSamples(), it would access the > mManager with null task queue. it can't be. MediaSourceTrackDemuxer::BreakCycle() is the last call ever made to the MediaSourceTrackDemuxer, you can no longer use the MediaSourceTrackDemuxer once BreakCycle() has been called, in fact, the MFR couldn't even if it tried BreakCycles() is only called once MFR::Shutdown() is called, there can't be a call to MediaSourceTrackDemuxer::GetSamples(). As BreakCycles() simply queue a task, BreakCycles() is *always* the last task to run. There can't be re-ordering of the tasks once queued. Facts at hands: 1- TrackBuffersManager::mTaskQueue is null 2- This can only happen following a call to TrackBuffersManager::Detach having been called 3- Seeing that TrackBuffersManager::GetSamples() is called with mTaskQueue being null, it means TrackBuffersManager::GetSamples is called after TrackBuffersManager::Detach() has been called. Normally, calling Detach() would have deregistered the TrackBuffersManager from the MediaSourceDemuxer: mMediaSource->GetDecoder()->GetDemuxer()->DetachSourceBuffer( mTrackBuffersManager.get()); > Please see [1], it can be reproduced on both my local build and try server. > > Those step are not just an analysis, I've seen the code flow like that. It > hit the assertion, and the log shows we access the source buffer which has > been deteched. yes, you can reproduce it... but your analysis on the reason being due to how MediaSourceTrackDemuxer::BreakCycles() is asynchronous can't be correct. > > Running test_AVC3_mp4.html or test_eme_autoplay.html and then you could see > the result. > > [1] > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=30dcd347589cb6729b084fa692f21abe51de3bc9&selectedJob=1 > 33063838 > > --- > > Back to this issue, do you think the assumption about the twice unlink is > impossible? If so, why? i didn't say that. it's obviously happening. simplest fix would be to abort early in TrackBuffersManager::QueueTask if mTaskQueue is null, after all if that happens there's nothing more we can do anyway. > I think we should add the checking like IsDeteching() to determine whether > we should call TrackBufferesManager::detach(). > > How do you think? see above. testing mTaskQueue is enough, mTaskQueue is only set to null following a DetachTask.
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #11) > indeed, but you wrote "Since the BreakCycles() is not async" so which one do > you mean? Ah, sorry, I want to say "Since the BreakCycles() is not sync". > mTaskQueue is an AutoTaskQueue, and it's refcounted. > > Clearing it in the MediaSource demuxer has no impact on the value in > TrackBuffersManager. > > So calling MediaSourceTrackDemuxer::BreakCycles has NOTHING to do with the > problem at hand. > > I'm not saying that there's no problem in regards to Detach being called > twice. > > I'm saying that your analysis on how it gets there is incorrect. I'm not talking about the ref counted here, I just want to mention if we access the track buffers manager after BreakCycles() called, it won't have a valid mTaskQueue and the OnTaskQueue() would also return the wrong result. > it can't be. > MediaSourceTrackDemuxer::BreakCycle() is the last call ever made to the > MediaSourceTrackDemuxer, you can no longer use the MediaSourceTrackDemuxer > once BreakCycle() has been called, in fact, the MFR couldn't even if it tried > > BreakCycles() is only called once MFR::Shutdown() is called, there can't be > a call to MediaSourceTrackDemuxer::GetSamples(). > As BreakCycles() simply queue a task, BreakCycles() is *always* the last > task to run. > > There can't be re-ordering of the tasks once queued. It would also be called by TrackBuffersManager::ShutdownDemuxers(), I think it won't affect whether MFR decides calling GetSample() or not? > Facts at hands: > > 1- TrackBuffersManager::mTaskQueue is null > 2- This can only happen following a call to TrackBuffersManager::Detach > having been called > 3- Seeing that TrackBuffersManager::GetSamples() is called with mTaskQueue > being null, it means TrackBuffersManager::GetSamples is called after > TrackBuffersManager::Detach() has been called. > > Normally, calling Detach() would have deregistered the TrackBuffersManager > from the MediaSourceDemuxer: > mMediaSource->GetDecoder()->GetDemuxer()->DetachSourceBuffer( > mTrackBuffersManager.get()); > Hmm... seems it's not possible to insert a task between TrackBuffersManager's detach task and MediaSourceDemuxer::DoDetachSourceBuffer. I need to investigate again to see what's the actual code flow which causes us access the track buffer manager without mTaskQueue. > i didn't say that. > > it's obviously happening. > > simplest fix would be to abort early in TrackBuffersManager::QueueTask if > mTaskQueue is null, after all if that happens there's nothing more we can do > anyway. OK, will try.
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #12) > It would also be called by TrackBuffersManager::ShutdownDemuxers(), I think > it won't affect whether MFR decides calling GetSample() or not? TrackBuffersManager::ShutdownDemuxers() has nothing to do with TrackBuffersMananger::GetSamples and so forth. Different demuxers. The MediaFormatReader uses MediaSourceDemuxer with MediaSourceTrackDemuxers. TrackBuffersManagers keeps a set of demuxers to demux the data passed by appendBuffer. TrackBuffersManager::ShutdownDemuxers() will call BreakCycles on the internal demuxers (either webm or mp4). It's totally unrelated to the use of MediaFormatReader with the MediaSource demuxer. So no, it won't affect if the MFR calls MediaSourceTrackDemuxer::GetSamples() totally unrelated. > > Hmm... seems it's not possible to insert a task between > TrackBuffersManager's detach task and > MediaSourceDemuxer::DoDetachSourceBuffer. > > I need to investigate again to see what's the actual code flow which causes > us access the track buffer manager without mTaskQueue. as I wrote, the only possible explanation is the the SourceBuffer is detached from the MediaSource or the SourceBuffer is cycle collected before the MediaFormatReader and the MediaSourceDemuxer are being shutdown. I don't see how that could happen (if it was happening, it would happen very often and we would have a huge crash rate, and not the few dozen a month we get) > > > i didn't say that. > > > > it's obviously happening. > > > > simplest fix would be to abort early in TrackBuffersManager::QueueTask if > > mTaskQueue is null, after all if that happens there's nothing more we can do > > anyway. > > OK, will try. This will only fix the double call to TrackBuffersManager::Detach() due to double cycle collection. Still need to find out what's happening above... Your try test will greatly help I'm sure. You've obviously identified how to reproduce the problem. Now just need to understand what's happening :)
Will continue to investigate why the mTaskQueue would be null in bug1247189, since it's not directly related with the crash.
Comment on attachment 8911669 [details] Bug 1402681 - part1 : do not queue the task without task queue. https://reviewboard.mozilla.org/r/183066/#review188622
Attachment #8911669 - Flags: review+
Attachment #8912086 - Flags: review?(jyavenard)
Comment on attachment 8912086 [details] Bug 1402681 - part2 : add log. https://reviewboard.mozilla.org/r/183470/#review188638 ::: dom/media/mediasource/SourceBufferTask.h:58 (Diff revision 1) > , mAttributes(aAttributes) > {} > > static const Type sType = Type::AppendBuffer; > Type GetType() const override { return Type::AppendBuffer; } > + nsCString GetTypeName() const override { return nsPrintfCString("AppendBuffer"); } please use the NS_LITERAL_CSTRING macro rather than nsPrintfCString, no point constructing the string at run time on the heap. I would have made it simpler with returning const uint8_t* however as it's only used for debugging.
Attachment #8912086 - Flags: review?(jyavenard) → review+
Comment on attachment 8911669 [details] Bug 1402681 - part1 : do not queue the task without task queue. https://reviewboard.mozilla.org/r/183066/#review188642 ::: dom/media/mediasource/TrackBuffersManager.cpp:148 (Diff revision 2) > } > > void > TrackBuffersManager::QueueTask(SourceBufferTask* aTask) > { > + if (!GetTaskQueue()) { please add a comment about when this case can happen.
Comment on attachment 8911669 [details] Bug 1402681 - part1 : do not queue the task without task queue. https://reviewboard.mozilla.org/r/183066/#review188642 > please add a comment about when this case can happen. and why it's okay to dismissed the task (that it's a DetachTask and it's already been done, to make it clear, I would add a MOZ_ASSERT that the task type is DetachTask
[Tracking Requested - why for this release]: Why the crash will not be seen with this signature, this problem is serious as it cause a potential data race on internal members of TrackBuffersManager, that could lead to memory corruption This needs to be uplifted to 57
Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/145eac8ba706 part1 : do not queue the task without task queue. r=jya https://hg.mozilla.org/integration/autoland/rev/0cac2e15b4ca part2 : add log. r=jya
Uplift reminder, and let's wait for a few days to see whether the crash is fixed.
Flags: needinfo?(alwu)
See Also: → 1403426
Crash Signature: [@ mozilla::net::SpeculativeConnectArgs::Release] → [@ mozilla::net::SpeculativeConnectArgs::Release] [@ mozilla::net::UINT64Wrapper::Release]
Hi Alastor, Blake, could you please nominate a patch for uplift to Beta57? Thanks
Flags: needinfo?(bwu)
It looks this crash is gone. We should uplift the patch. And https://crash-stats.mozilla.com/report/index/2f119d23-f6fb-4178-bbfb-0ba8d0170929 is another problem.
Comment on attachment 8911669 [details] Bug 1402681 - part1 : do not queue the task without task queue. Approval Request Comment [Feature/Bug causing the regression]: this issue might exist for a while but we just found it recently [User impact if declined]: crash [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: doesn't have reproduce steps [List of other uplifts needed for the feature/fix]: both patches in this bug [Is the change risky?]: no [Why is the change risky/not risky?]: just prevent the same task calling twice on the same thread, doesn't do anything new [String changes made/needed]: no
Flags: needinfo?(bwu)
Flags: needinfo?(alwu)
Attachment #8911669 - Flags: approval-mozilla-beta?
Comment on attachment 8911669 [details] Bug 1402681 - part1 : do not queue the task without task queue. Fix a crash, taking it. Should be in 57b5
Attachment #8911669 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1407940
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: