Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mp4_demuxer::MoofParser::RebuildFragmentedIndex

VERIFIED FIXED in Firefox -esr52

Status

()

defect
P1
critical
VERIFIED FIXED
2 years ago
2 years ago

People

(Reporter: philipp, Assigned: jya)

Tracking

({crash})

53 Branch
mozilla55
x86
Windows
Points:
---

Firefox Tracking Flags

(platform-rel +, firefox-esr5255+ fixed, firefox53 wontfix, firefox54+ wontfix, firefox55 verified, firefox56 verified)

Details

(Whiteboard: [platform-rel-Twitch], crash signature)

Attachments

(6 attachments)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-22262dd4-ebd3-445e-be9c-4edac0170522.
=============================================================
Crashing Thread (104)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	mozglue.dll 	mozalloc_handle_oom(unsigned int) 	memory/mozalloc/mozalloc_oom.cpp:46
2 	mozglue.dll 	moz_xrealloc 	memory/mozalloc/mozalloc.cpp:107
3 	xul.dll 	nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) 	obj-firefox/dist/include/nsTArray-inl.h:183
4 	xul.dll 	nsTArray_Impl<mp4_demuxer::Moof, nsTArrayInfallibleAllocator>::AppendElement<mp4_demuxer::Moof&, nsTArrayInfallibleAllocator>(mp4_demuxer::Moof&) 	obj-firefox/dist/include/nsTArray.h:2228
5 	xul.dll 	mp4_demuxer::MoofParser::RebuildFragmentedIndex(mp4_demuxer::BoxContext&) 	media/libstagefright/binding/MoofParser.cpp:62
6 	xul.dll 	mp4_demuxer::MoofParser::RebuildFragmentedIndex(mozilla::media::IntervalSet<__int64> const&) 	media/libstagefright/binding/MoofParser.cpp:35
7 	xul.dll 	mozilla::MP4TrackDemuxer::EnsureUpToDateIndex() 	dom/media/fmp4/MP4Demuxer.cpp:285
8 	xul.dll 	mozilla::MP4TrackDemuxer::GetSamples(int) 	dom/media/fmp4/MP4Demuxer.cpp:371
9 	xul.dll 	mozilla::TrackBuffersManager::DoDemuxAudio() 	dom/media/mediasource/TrackBuffersManager.cpp:1243
10 	xul.dll 	mozilla::TrackBuffersManager::DoDemuxVideo() 	dom/media/mediasource/TrackBuffersManager.cpp:1215
11 	xul.dll 	mozilla::TrackBuffersManager::CodedFrameProcessing() 	dom/media/mediasource/TrackBuffersManager.cpp:1183
12 	xul.dll 	mozilla::TrackBuffersManager::SegmentParserLoop() 	dom/media/mediasource/TrackBuffersManager.cpp:737
13 	xul.dll 	mozilla::detail::RunnableMethodImpl<mozilla::WatchManager<mozilla::dom::HTMLMediaElement>::PerCallbackWatcher* const, void ( mozilla::WatchManager<mozilla::dom::HTMLMediaElement>::PerCallbackWatcher::*)(void), 1, 0>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:860
14 	xul.dll 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:232
15 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:225
16 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1240
17 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:390
18 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
19 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
20 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
21 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:490
22 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
23 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
24 	ucrtbase.dll 	_o__CIpow 	
Ø 25 	kernel32.dll 	kernel32.dll@0x13369 	
Ø 26 	ntdll.dll 	ntdll.dll@0x39901 	
Ø 27 	ntdll.dll 	ntdll.dll@0x398d4

this crash signature is on the rise since friday last week - some user comments and url correlations from bug reports indicate that it's related to twitch.tv, so it seems likely that some recent change on the site is facilitating this increase.
those crashes appear to happen across all release channels (52esr to 55.0a1), mainly on 32bit browser versions on windows.
Flags: needinfo?(ajones)
(Assignee)

Comment 1

2 years ago
What type of fragmented MP4 is twitch TV using to cause OOM there!???

Sounds like they would be using one moof per sample, but surely, they don't...

I did see those type of using with MSE in the past, but never from professional sites :(
(Reporter)

Comment 2

2 years ago
hi desigan, do we have contacts with twitch who might be able to shed some light on this?
Flags: needinfo?(ajones) → needinfo?(dchinniah)
(Assignee)

Comment 3

2 years ago
We can improve memory usage by clearing the MP4 demuxer as soon as we're done demuxing. Right now, it keeps adding the moof entries to a table. Normally this isn't a problem because a moof contains a block of around 10s of samples.

IIRC, I had seen some twitch test where they used one moof for each sample, this would go very very big quickly.

Plus it's super inefficient to do that.

It would be much better if twitch used a different muxing algorithm.

Updated

2 years ago
Component: Audio/Video → Audio/Video: Playback
(Assignee)

Updated

2 years ago
Assignee: nobody → jyavenard
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Priority: -- → P1

Comment 10

2 years ago
mozreview-review
Comment on attachment 8873943 [details]
Bug 1367128: P1. Remove unused member.

https://reviewboard.mozilla.org/r/145338/#review149598
Attachment #8873943 - Flags: review?(alwu) → review+
Comment on attachment 8873944 [details]
Bug 1367128: P2. Add methods to trim index.

LGTM, but I can't provide any useful suggestion here.
I think alfredo is more qualified for this review.
Attachment #8873944 - Flags: review?(alwu) → review?(ayang)

Updated

2 years ago
Attachment #8873945 - Flags: review?(alwu) → review?(ayang)
(Assignee)

Updated

2 years ago
Attachment #8873944 - Flags: review?(ayang) → review?(gsquelart)
Attachment #8873945 - Flags: review?(ayang) → review?(gsquelart)
Comment on attachment 8873944 [details]
Bug 1367128: P2. Add methods to trim index.

https://reviewboard.mozilla.org/r/145340/#review149630

r+ as-is, but here's a suggestion for later:

::: media/libstagefright/binding/Index.cpp:419
(Diff revision 2)
> +      }
> +      canEvict = false;
> +      break;
> +    }
> +  }
> +  mMoofParser->RebuildFragmentedIndex(aByteRanges, &canEvict);

The bool* felt a bit awkward to me, and I thought you could just have returned whether things got trimmed... And then I noticed that RebuildFragmentedIndex actually returns something (foundValidMoof), but awkwardly again, it's only used in tests!

It's way out of scope for this bug, so if you agree, I would suggest you open a new bug to revisit RebuildFragmentedIndex later on.
Attachment #8873944 - Flags: review?(gsquelart) → review+
Comment on attachment 8873945 [details]
Bug 1367128: P3. Plumb in new eviction methods with MSE demuxer.

https://reviewboard.mozilla.org/r/145342/#review149632
Attachment #8873945 - Flags: review?(gsquelart) → review+
Track 54+ as there are some crashes in 54.
(Assignee)

Comment 15

2 years ago
mozreview-review-reply
Comment on attachment 8873944 [details]
Bug 1367128: P2. Add methods to trim index.

https://reviewboard.mozilla.org/r/145340/#review149630

> The bool* felt a bit awkward to me, and I thought you could just have returned whether things got trimmed... And then I noticed that RebuildFragmentedIndex actually returns something (foundValidMoof), but awkwardly again, it's only used in tests!
> 
> It's way out of scope for this bug, so if you agree, I would suggest you open a new bug to revisit RebuildFragmentedIndex later on.

it used to be used to determine if the buffered range had to be recalculated a while back...

Comment 16

2 years ago
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fe9f05834bba
P1. Remove unused member. r=alwu
https://hg.mozilla.org/integration/autoland/rev/564b0921883d
P2. Add methods to trim index. r=gerald
https://hg.mozilla.org/integration/autoland/rev/34cf246b51d8
P3. Plumb in new eviction methods with MSE demuxer. r=gerald
(In reply to [:philipp] from comment #2)
> hi desigan, do we have contacts with twitch who might be able to shed some
> light on this?

We do have a Twitch mailing list — however it looks like this is now resolved. If it's re-opened and we need to get in touch — yes, we do have the channels to do so. Just reach out.
Flags: needinfo?(dchinniah)
(Assignee)

Comment 19

2 years ago
The fix is based on something I had seen in twitch beta players. I had hoped at the time that they weren't going to use such design... but obviously they did.

Our architectural design is very inefficient with such data. likely causing rather high CPU usage.
(Assignee)

Comment 20

2 years ago
I'm guessing it's too late to request 54 uplift?
Flags: needinfo?(gchang)
Hi :jya,
Yes. We've already built 54 RC today.
Flags: needinfo?(gchang)
(Assignee)

Comment 22

2 years ago
It may be worth doing a point release with this... I can see how it will likely becoming a much bigger problem now that twitch has moved to html5
(In reply to Jean-Yves Avenard [:jya] from comment #22)
> It may be worth doing a point release with this... I can see how it will
> likely becoming a much bigger problem now that twitch has moved to html5

Would appending an init segment work to reset the demuxer?
Flags: needinfo?(ajones)
platform-rel: --- → +
Whiteboard: [platform-rel-Twitch]
Flags: needinfo?(ajones)
Would manual testing be of help here? We could schedule an exploratory testing session on Twitch for this. If the answer is yes, we'd appreciate any tips in terms of test conditions -- so that we could make the most out of our testing.
Flags: qe-verify?
Flags: needinfo?(jyavenard)
(Assignee)

Comment 25

2 years ago
the issue could likely be tested by using a 32 bits build and watching twitch.tv for a few hours...

You should at least see the memory usage climb (very slowly)
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #25)
> the issue could likely be tested by using a 32 bits build and watching
> twitch.tv for a few hours...
> 
> You should at least see the memory usage climb (very slowly)

Understood, we'll give this a try. Thank you -- flagging for manual testing.
Flags: qe-verify? → qe-verify+
Should we put Release/ESR52 approval requests on this? Sounds like you've felt previously this'd be worth dot-release consideration anyway and ESR52 are certainly OOM-prone given that's where XP/Vista users all are these days.
Flags: needinfo?(jyavenard)
(Assignee)

Comment 28

2 years ago
If they watch twitch... certainly.

Though this hasn't made 54 yet (it would be nice)
Flags: needinfo?(jyavenard)
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1374947
This isn't going to make 54 at this point. I did take a look at ESR52 uplifting and Parts 1 & 2 graft cleanly, but Part 3 will need a bit of rebasing. As far as I know, there haven't been any reports of regressions on 55 (and we're 3 weeks from RC), so I think we should go ahead with ESR52 approval requests.
Flags: needinfo?(jyavenard)
(Assignee)

Comment 31

2 years ago
Oh my..  I should have totally requested uplift on this one....

OOM is guaranteed when using popular live video sites.
Flags: needinfo?(jyavenard)
(Assignee)

Comment 32

2 years ago
Oh misread.. it’s already in 55
Comment on attachment 8873943 [details]
Bug 1367128: P1. Remove unused member.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: These patches fix OOMs that can occur on any live streaming site after extended watching, which is a common enough usage scenario that I think it warrants considation. Especially considering that our ESR52 userbase includes a number of users on XP/Vista machines with limited resources.
User impact if declined: Crashes
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): Per IRC discussion with jya, it's low-risk. It has good automated test coverage and has been shipping to users on Nightly since early June and Beta users for the entire 55 cycle with no known regressions.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8873943 - Flags: approval-mozilla-esr52?
(Assignee)

Comment 34

2 years ago
MozReview-Commit-ID: 1SQaAdePmyf
(Assignee)

Comment 35

2 years ago
The MoofParser's used by the MP4Demuxer, can grow very large as it keeps adding new tables and is never reset.

Once we have demuxed all data present in the MP4 media segment, we no longer require the samples table for that media segment and we can drop it from the index.

Unfortunately. some websites (in particular some using live video) use media segments containing a single sample in order to (incorrectly) reduce latency. While this is a very bad approach from a performance perspective it does happen in the wild.

MozReview-Commit-ID: I66jxcScmKM
(Assignee)

Comment 36

2 years ago
MozReview-Commit-ID: HULNEfD3tRJ
Comment on attachment 8873943 [details]
Bug 1367128: P1. Remove unused member.

Fix an OOM issue and has been baked for a while. Let's uplift it to ESR52.3.
Attachment #8873943 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
I managed to reproduce the issue on Firefox 55.0a1 (2017-05-23) and under Windows 10x86 and under Windows 10x64, using STR form Comment 25.
The issue is no longer reproducible on Firefox 55.0b13, or in Firefox 56.0a1(2017-07-27).
Tests were performed under Windows 10x64 and under Windows 10x86.

I've let twitch.tv running for about 3 hours and noticed that on the fix builds, the memory usage was about the same and for the affected build, the memory usage was higher with over 100MB. 

Since the issue is no longer reproducible, I'm marking this issue as Verified Fixed.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.