Closed Bug 1578143 Opened 3 months ago Closed 3 months ago

Crash in [@ InvalidArrayIndex_CRASH | mozilla::TrackBuffersManager::DoEvictData]

Categories

(Core :: Audio/Video: Playback, defect, P2, critical)

defect

Tracking

()

VERIFIED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- verified
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- verified
firefox71 --- verified

People

(Reporter: philipp, Assigned: bryce)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug is for crash report bp-d3ffd51e-0095-481c-9f7f-4b1770190901.

Top 10 frames of crashing thread:

0 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:27
1 xul.dll void mozilla::TrackBuffersManager::DoEvictData dom/media/mediasource/TrackBuffersManager.cpp
2 xul.dll void mozilla::TrackBuffersManager::ProcessTasks dom/media/mediasource/TrackBuffersManager.cpp:235
3 xul.dll void mozilla::TrackBuffersManager::QueueTask dom/media/mediasource/TrackBuffersManager.cpp:171
4 xul.dll nsresult mozilla::detail::RunnableMethodImpl<IPC::Channel*, bool  xpcom/threads/nsThreadUtils.h:1174
5 xul.dll nsresult mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:199
6 xul.dll nsresult nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:244
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1175
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
9 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:333

content crash reports with this signature are spiking up since two days ago.
the spike is mostly coming from spanish builds and comments/submitted urls point to (some change on the website) animeflv.net as the source of the crash. MOZ_CRASH reason is recorded as ElementAt(aIndex = 0, aLength = 0).

Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard) → needinfo?(bvandyk)

Spike looks to be across all releases which suggests it's not a new bug in Firefox, but something in what that site is doing which is triggering the crash. That said, it looks most pronounced in release -- possibly just more users.

Managed to catch this in a debug build:

>	xul.dll!InvalidArrayIndex_CRASH(unsigned __int64 aIndex, unsigned __int64 aLength) Line 27	C++
 	xul.dll!nsTArray_Impl<mozilla::media::Interval<mozilla::media::TimeUnit>,nsTArrayInfallibleAllocator>::ElementAt(unsigned __int64 aIndex) Line 1074	C++
 	xul.dll!nsTArray_Impl<mozilla::media::Interval<mozilla::media::TimeUnit>,nsTArrayInfallibleAllocator>::operator[](unsigned __int64 aIndex) Line 1103	C++
 	xul.dll!mozilla::media::IntervalSet<mozilla::media::TimeUnit>::operator[](unsigned __int64 aIndex) Line 482	C++
 	xul.dll!mozilla::TrackBuffersManager::DoEvictData(const mozilla::media::TimeUnit & aPlaybackTime, __int64 aSizeToEvict) Line 523	C++
 	xul.dll!mozilla::TrackBuffersManager::ProcessTasks() Line 237	C++
 	xul.dll!mozilla::TrackBuffersManager::QueueTask(mozilla::SourceBufferTask * aTask) Line 171	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<RefPtr<mozilla::SourceBufferTask> >::applyImpl<mozilla::TrackBuffersManager,void (mozilla::TrackBuffersManager::*)(mozilla::SourceBufferTask *),StoreRefPtrPassByPtr<mozilla::SourceBufferTask> ,0>(mozilla::TrackBuffersManager * o, void(mozilla::TrackBuffersManager::*)(mozilla::SourceBufferTask *) m, mozilla::Tuple<StoreRefPtrPassByPtr<mozilla::SourceBufferTask> > & args, std::integer_sequence<unsigned long long,0>) Line 1124	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<RefPtr<mozilla::SourceBufferTask> >::apply<mozilla::TrackBuffersManager,void (mozilla::TrackBuffersManager::*)(mozilla::SourceBufferTask *)>(mozilla::TrackBuffersManager * o, void(mozilla::TrackBuffersManager::*)(mozilla::SourceBufferTask *) m) Line 1130	C++
 	xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::TrackBuffersManager *,void (mozilla::TrackBuffersManager::*)(mozilla::SourceBufferTask *),1,mozilla::RunnableKind::Standard,RefPtr<mozilla::SourceBufferTask> >::Run() Line 1179	C++
 	xul.dll!mozilla::TaskQueue::Runner::Run() Line 200	C++
 	xul.dll!nsThreadPool::Run() Line 246	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1227	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 486	C++
 	xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 334	C++
 	xul.dll!MessageLoop::RunInternal() Line 315	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	xul.dll!MessageLoop::Run() Line 291	C++
 	xul.dll!nsThread::ThreadFunc(void * aArg) Line 460	C++
 	nss3.dll!_PR_NativeRunThread(void * arg) Line 406	C
 	nss3.dll!pr_root(void * arg) Line 137	C
 	[External Code]	
 	mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,1> >,void (*)(int, void *, void *)>::operator()<int,void *,void *>(int aArgs, void * aArgs, void * aArgs) Line 141	C++
 	mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 603	C++
 	[External Code]	

Which looks similar to some of the better symbolized, older stacks in crash reports. We're dying because there are no buffered ranges in the track here, but we're attempting to access index 0.

Oldest crashes I see are from 67, which would make sense given bug 1531201 landing there and causing this.

Assignee: nobody → bvandyk
Has Regression Range: --- → yes
Flags: needinfo?(bvandyk)
Priority: -- → P2
Regressed by: 1531201

Note on how we appear to be arriving at the crash: we evict data when we're trying to append but that append would put us over a threshold. In these cases the eviction task is queued and processed as in the stack track in comment 1.

This is proving hard to catch in a debugger again, but noted that when the issue took place that the buffered ranges were empty, but the track buffer had several thousand frames in it. All of the frames that I inspecred had mTime and mDuration 0 -- which looks like the result of something going wrong, though I'm not sure what.

I suspect there is a race somewhere between eviction and other removal of frames, combined with something odd this site is doing to trigger a bug where we remove frames and clear buffered ranges, but fail to clear our buffers.

I think I can mitigate the immediate crash, and I can add some asserts to try and track the bug where we're removing the buffered ranges while retaining data in buffers.

Patch incoming to mitigate this crash. I suggest we get this crash out of release builds and follow up in another bug to figure out why we can have data but no buffered ranges.

It appears possible for data eviction to take place despite the manager not
currently having any buffered ranges. This shouldn't happen, but if it does we
don't want to crash the browser in release. This patch mitigates us performing a
bad access in that scenario.

See Also: → 1578848

Edit: log taken using MOZ_LOG=sync,timestamp,MediaSource:5

Attaching a log of the issue taking place. Not sure what to make of it, but figure it may be useful if we want to further identify what's going on here. Note that this log is taken with my patch that will remove all coded frames if we evict with 0 buffered ranges. You can identify when this happens from the line with

RemoveAllCodedFrames: RemoveAllCodedFrames called.

As well as the preceding line that shows that we're evicting with no buffered ranges. Up until that point in the log, I imagine things were running in a similar fashion as to pre-patch, but that Firefox would have crashed as in this bug when reaching that point due to the bad access.

Pushed by bvandyk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fd6a6b881725
Prevent TrackBuffersManager attempting to access buffered ranges if none exist when evicting data. r=jya
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

NI me to remind myself to come back and check stats and request uplift if nightly crashes are fixed.

Flags: needinfo?(bvandyk)

No crashes in nightly since this landed.

Can you request uplift to beta? Thanks!

Comment on attachment 9090482 [details]
Bug 1578143 - Prevent TrackBuffersManager attempting to access buffered ranges if none exist when evicting data. r?jya!

Beta/Release Uplift Approval Request

  • User impact if declined: Websites can crash Firefox by appending 0 duration media to a source buffer.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Would be useful to spend some time trying to repro and see if better steps can be found. I found this difficult to verify. For me the problem was intermittent, and was best reproduced by watching a video on animeflv.net until the video was nearly done, then seeking backwards in the video until the video stalled, after which the bug would sometimes repro. With the fix the stall still takes place, but the crash does not (as best I can tell).
  • List of other uplifts needed: None.
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): Low/Medium: the code path taken to invoke the new behaviour in the patch should be quite rare (pages filling up buffers with 0 duration media), so it's unlikely the new code will execute during normal use. However, the Media Source Extensions code is non trivial, and this is small, but non trivial sized change within that code base. Further, due to the difficulty in reproing the bug, we do not have automated test coverage.
  • String changes made/needed: None.
Flags: needinfo?(bvandyk)
Attachment #9090482 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9090482 [details]
Bug 1578143 - Prevent TrackBuffersManager attempting to access buffered ranges if none exist when evicting data. r?jya!

Crash fix, verified in nightly (from crash-stats)
Let's bring this to beta 10.

Attachment #9090482 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hello!
I've managed to reproduce the issue on the affected build Firefox 71.0a1 (20190902213933) using Windows 10x64. Here is the crash report having the same signature. On my end after several times of seeking the videos on animeflv.net, I've managed to reproduce it by opening several tabs from the same page with playing videos and seeking with the keyboard arrows on random videos until the stalling. Although neither these steps are consistent because the crash wasn't occurring every time. Sometimes the crash occurred on random tabs with playing video.
Using random actions and information from comment 11 I tried reproducing the crash with Firefox 71.0a1 (20190924215237) and Firefox 70.0b10 (20190924213629) on Windows 10x64. The video stalling still occurs but no crashes encountered. I think it's safe to assume that the issue is verified fixed.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

the crash signature is fairly common for users of spanish builds (#5 top tab crash for them on 69 release) probably due to the popularity of the crashing website, so maybe it's still worth keeping the fix for 69 dot release or 68 esr consideration?

Flags: needinfo?(ryanvm)

I discussed this with Bryce and I think it's best to let this ship with 70. There isn't much test coverage around this code and it hasn't had much time to bake yet. That said, I'll set esr68 back to fix-optional so we can look at maybe uplifting this fix there eventually in the future if it appears stable and the crash rate justifies it.

Flags: needinfo?(ryanvm)

Looks like this fix has been solid on 70. Should we consider nominating it for ESR68 uplift? It grafts cleanly as-landed.

Flags: needinfo?(bvandyk)

Comment on attachment 9090482 [details]
Bug 1578143 - Prevent TrackBuffersManager attempting to access buffered ranges if none exist when evicting data. r?jya!

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixes a crash. The fix appears to have worked and been stable on the trains for a month.
  • User impact if declined: Firefox may crash when using certain sites that appear to misuse MSE. There appear to be some popular sites for a Spanish users that do this.
  • Fix Landed on Version: 70
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): Medium. Non-trivial patch in a complicated part of the codebase. Mitigated by that the code has been landed for a month without issue.
  • String or UUID changes made by this patch: None.
Flags: needinfo?(bvandyk)
Attachment #9090482 - Flags: approval-mozilla-esr68?

Seems reasonable. Given that we still have crashes and haven't seen issue with the patch.

Comment on attachment 9090482 [details]
Bug 1578143 - Prevent TrackBuffersManager attempting to access buffered ranges if none exist when evicting data. r?jya!

Patch appears to be working. Approved for 68.3esr.

Attachment #9090482 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Flags: qe-verify+

Hello! After several attempts with Firefox 68.3.0esr (20191126000427) on Windows 10x64, there were no crashes encountered while navigating https://animeflv.net on various videos and using the seek bar.

Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.