Closed Bug 1334445 Opened 3 years ago Closed 3 years ago

Crash in mp4_demuxer::ByteWriter::Write

Categories

(Core :: Audio/Video: Playback, defect, critical)

52 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox51 --- unaffected
firefox52 + fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: philipp, Assigned: gerald)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-8ff71745-4ef7-4651-b087-53afa2170127.
=============================================================
Crashing Thread (69)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mp4_demuxer::ByteWriter::Write(unsigned char const*, unsigned int) 	media/libstagefright/binding/include/mp4_demuxer/ByteWriter.h:67
1 	xul.dll 	mp4_demuxer::AnnexB::ConvertSampleToAnnexB(mozilla::MediaRawData*, bool) 	media/libstagefright/binding/AnnexB.cpp:52
2 	xul.dll 	mozilla::H264Converter::Input(mozilla::MediaRawData*) 	dom/media/platforms/wrappers/H264Converter.cpp:89
3 	xul.dll 	mozilla::MediaFormatReader::HandleDemuxedSamples(mozilla::TrackInfo::TrackType, mozilla::AbstractMediaDecoder::AutoNotifyDecoded&) 	dom/media/MediaFormatReader.cpp:1307
4 	xul.dll 	mozilla::MediaFormatReader::Update(mozilla::TrackInfo::TrackType) 	dom/media/MediaFormatReader.cpp:1604
5 	xul.dll 	mozilla::detail::RunnableMethodImpl<void ( mozilla::MediaFormatReader::*)(mozilla::TrackInfo::TrackType), 1, 0, mozilla::TrackInfo::TrackType>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:810
6 	xul.dll 	mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() 	obj-firefox/dist/include/mozilla/TaskDispatcher.h:193
7 	xul.dll 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:232
8 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:226
9 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1216
10 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:361
11 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:205
14 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:467
15 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
16 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
17 	ucrtbase.dll 	_o__CIpow 	
18 	kernel32.dll 	BaseThreadInitThunk 	
19 	ntdll.dll 	__RtlUserThreadStart 	
20 	ntdll.dll 	_RtlUserThreadStart

These reports since firefox 52 are annotated with MOZ_RELEASE_ASSERT(mPtr.append(aSrc, aCount)), which got added in bug 1307945.
And crashing as it should, it's a OOM

if we had used any of the object container (like nsTArray<uint8_t> it would have crashed just the same.
(In reply to Jean-Yves Avenard [:jya] from comment #1)
> And crashing as it should, it's a OOM
> 
> if we had used any of the object container (like nsTArray<uint8_t> it would
> have crashed just the same.

At least, we get a consistent error report, correct?

Back in bug 1307945, you said " The amount of data worked with is always tiny. If we go OOM there, we're screwed anyway and won't be able to recover."
So I think it is OOM condition after all.
But it is nice to have it get reported properly.

Strange, though, one of these days, FF seems to crash after seeing some youtube videos very often.

TIA
That's the problem with 32 bits build.. It's just a matter of time until we run out of usable address space. Nothing much we can do about that.

I don't believe you can recover nicely from those errors, as to handle them and say display an information message you need memory to start with.

now it could be that it was an invalid content that caused the failure, like attempting to allocate way too much data.
But when you mention YouTube, which has excellent content, and is a reference when it comes to how to encode things. It tells me it's just a plain OOM.

I could be wrong of course
(In reply to Jean-Yves Avenard [:jya] from comment #3)
> That's the problem with 32 bits build.. It's just a matter of time until we
> run out of usable address space. Nothing much we can do about that.
> 
> I don't believe you can recover nicely from those errors, as to handle them
> and say display an information message you need memory to start with.
> 
> now it could be that it was an invalid content that caused the failure, like
> attempting to allocate way too much data.
> But when you mention YouTube, which has excellent content, and is a
> reference when it comes to how to encode things. It tells me it's just a
> plain OOM.

I suppose your observation is correct.

One thing that bothers me, though, is that I use Facebook and
FB has a feature to show the activities of your "friends" basically a posting of these people in a single screen from the latest to the older.:  In the last couple of months, without my doing anything new (not even looking at the posted youtube links), FF tends to lock-up (no response) or crash.
It has been unusable when I want to see a posting a couple of days old. (Of course, it depends how far you need to go back, i.e., how many postings are made in that couple of days.)
But it has now come to the point that I often have to invoke chrome or edge to look at the posting since FF often now becomes unusable. 
(Does the remote debugging work when FF becomes unresponsive? [I am not sure if it is busy doing GC or something, but if so, I am not sure if we can learn what FF is doing by remote debugging.]

TIA

> 
> I could be wrong of course
As Jean-Yves pointed out, in many cases OOMs can't really be recovered from, as the memory is probably near-full and releasing our (supposedly) small amount of local data will probably not help.

However, I think there is some value in giving a best-effort shot at handling excessive writes, in case our local data is in fact unreasonably large (either because of a valid large file, or because of corrupted data that makes us produce a lot of internal data), and quickly releasing this data has some chance of preventing crashes.

And it's not that difficult here, there are only a few calls to check.
Assignee: nobody → gsquelart
Comment on attachment 8832381 [details]
Bug 1334445 - Abort AnnexB parsing on ByteWriter failure -

https://reviewboard.mozilla.org/r/108706/#review109824

the return value of ConvertSampleToAnnexB isn't checked everywhere (like the widevine decoder)... we should fix that
Attachment #8832381 - Flags: review?(jyavenard) → review+
Comment on attachment 8832381 [details]
Bug 1334445 - Abort AnnexB parsing on ByteWriter failure -

https://reviewboard.mozilla.org/r/108706/#review109824

I've only added more returns than there already were, and you're heavily modifying this code, so I'll open a separate bug for that...
Blocks: 1335695
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/feb30777c832
Abort AnnexB parsing on ByteWriter failure - r=jya
https://hg.mozilla.org/mozilla-central/rev/feb30777c832
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
[Tracking Requested - why for this release]: Uplift requests needed for 52 and 53.
Flags: needinfo?(gsquelart)
Comment on attachment 8832381 [details]
Bug 1334445 - Abort AnnexB parsing on ByteWriter failure -

Approval Request Comment
[Feature/Bug causing the regression]: bug 1307945
[User impact if declined]: Crashes due to OOM, mainly on Windows 32-bit, when playing certain H264 videos
[Is this code covered by automated tests?]: Yes, Tries in progress:
aurora: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c368fe23072ca9aec0efbc7c8041b7c848858a0
beta: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f17a891c9179dc46635ae9369baa8e9ae7db57a
[Has the fix been verified in Nightly?]: No, as there has been no crash reports on Nightly; But at least there have been no new crashes after landing 5 days ago
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Low indirect risk
[Why is the change risky/not risky?]: It is only adding early returns from functions that already have other such returns; the main risk is that in case of near-OOMs, crashes may happen in other threads because of the low-memory condition, and it would be harder to link to this code -- though it should still appear in stack traces of such crash reports, just not in the signatures
[String changes made/needed]: None
Flags: needinfo?(gsquelart)
Attachment #8832381 - Flags: approval-mozilla-beta?
Attachment #8832381 - Flags: approval-mozilla-aurora?
Comment on attachment 8832381 [details]
Bug 1334445 - Abort AnnexB parsing on ByteWriter failure -

check for ByteWriter::Write failures, aurora53+, beta52+
Attachment #8832381 - Flags: approval-mozilla-beta?
Attachment #8832381 - Flags: approval-mozilla-beta+
Attachment #8832381 - Flags: approval-mozilla-aurora?
Attachment #8832381 - Flags: approval-mozilla-aurora+
Tracking as new crash in beta52.
Setting qe-verify- per Comment 13.
Flags: qe-verify-
Depends on: 1335684
You need to log in before you can comment on or make changes to this bug.