Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheStream::Read ]

RESOLVED FIXED in Firefox 5

Status

()

Core
Audio/Video
--
critical
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: ashughes, Assigned: cpearce)

Tracking

({crash, regression, valgrind})

Trunk
mozilla5
crash, regression, valgrind
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox5 fixed)

Details

(crash signature, URL)

Attachments

(1 attachment, 1 obsolete attachment)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.2a1pre) Gecko/20110328 Firefox/4.2a1pre

When switching to fullscreen mode while watching a live stream at air.mozilla.org, Minefield crashes.  I've included two crash reports since they seem to be different signatures.

Initial Crash:
https://crash-stats.mozilla.com/report/index/bp-5683be15-e073-42be-8ecd-244e22110328

Subsequent Crash:
https://crash-stats.mozilla.com/report/index/bp-a8c38ded-c983-475e-8cc8-356d22110328
Severity: normal → critical
Keywords: crash
Summary: Crash when switching to fullscreen video on air.mozilla.org → Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheStream::Read ] [@ js::mjit::JITScript::purgePICs ]
(Assignee)

Comment 1

6 years ago
(In reply to comment #0)
> Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.2a1pre) Gecko/20110328
> Firefox/4.2a1pre

> Initial Crash:
> https://crash-stats.mozilla.com/report/index/bp-5683be15-e073-42be-8ecd-244e22110328

Must be a crash in nsMediaCache::NoteBlockUsage() when it's called by nsMediaCacheStream::Read()?
 
> Subsequent Crash:
> https://crash-stats.mozilla.com/report/index/bp-a8c38ded-c983-475e-8cc8-356d22110328

This second crash seems to be unrelated to video; all the video threads are asleep. It looks like it's a problem with poly IC or compartments? dmandelin, any ideas about the stack in this "subsequent crash"?
Spun off the JS crash into bug 646327.
Summary: Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheStream::Read ] [@ js::mjit::JITScript::purgePICs ] → Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheStream::Read ]
Looks like I can reproduce this pretty easily using VLC's streaming HTTP server.  I crashed with a different stack, but I'm pretty sure that's due to the underlying bug being heap corruption.  Valgrind reports:

==25627== Thread 19:
==25627== Syscall param read(buf) points to unaddressable byte(s)
==25627==    at 0x317DE0E4ED: ??? (syscall-template.S:82)
==25627==    by 0x7592BF1: pt_Read (ptio.c:1305)
==25627==    by 0x757A582: PR_Read (priometh.c:141)
==25627==    by 0x58054E2: nsMediaCache::ReadCacheFile(long, void*, int, int*) (nsMediaCache.cpp:686)
==25627==    by 0x5807420: nsMediaCacheStream::Read(char*, unsigned int, unsigned int*) (nsMediaCache.cpp:2165)
==25627==    by 0x5809360: nsMediaChannelStream::Read(char*, unsigned int, unsigned int*) (nsMediaStream.cpp:591)
==25627==    by 0x581F832: PageSync(nsMediaStream*, ogg_sync_state*, int, long, long, ogg_page*, int&) (nsOggReader.cpp:1390)
==25627==    by 0x58203A8: nsOggReader::SeekBisection(long, nsOggReader::SeekRange const&, unsigned int) (nsOggReader.cpp:1532)
==25627==    by 0x58207DA: nsOggReader::SeekInUnbuffered(long, long, long, nsTArray<nsOggReader::SeekRange, nsTArrayDefaultAllocator> const&) (nsOggReader.cpp:1279)
==25627==    by 0x5822A84: nsOggReader::Seek(long, long, long, long) (nsOggReader.cpp:1340)
==25627==    by 0x580F801: nsBuiltinDecoderStateMachine::Run() (nsBuiltinDecoderStateMachine.cpp:1168)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==  Address 0x14c9395c is 0 bytes after a block of size 20,972 alloc'd
==25627==    at 0x4A055AE: realloc (vg_replace_malloc.c:525)
==25627==    by 0x580458B: ogg_sync_buffer (ogg_framing.c:619)
==25627==    by 0x5820121: nsOggReader::ReadOggPage(ogg_page*) (nsOggReader.cpp:792)
==25627==    by 0x5820A2A: nsOggReader::ReadOggPacket(nsOggCodecState*, ogg_packet*) (nsOggReader.cpp:839)
==25627==    by 0x5821F5E: nsOggReader::DecodeVideoFrame(int&, long) (nsOggReader.cpp:704)
==25627==    by 0x580E6C6: nsBuiltinDecoderStateMachine::DecodeLoop() (nsBuiltinDecoderStateMachine.cpp:330)
==25627==    by 0x580FD96: nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run() (nsThreadUtils.h:345)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==    by 0x5E05785: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==25627==    by 0x5E412B6: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
==25627==    by 0x7593FF6: _pt_root (ptthread.c:187)
==25627==    by 0x317DE07760: start_thread (pthread_create.c:301)
==25627== 
==25627== Invalid write of size 1
==25627==    at 0x4A06F87: memcpy (mc_replace_strmem.c:635)
==25627==    by 0x58074D9: nsMediaCacheStream::Read(char*, unsigned int, unsigned int*) (nsMediaCache.cpp:2139)
==25627==    by 0x5809360: nsMediaChannelStream::Read(char*, unsigned int, unsigned int*) (nsMediaStream.cpp:591)
==25627==    by 0x581F832: PageSync(nsMediaStream*, ogg_sync_state*, int, long, long, ogg_page*, int&) (nsOggReader.cpp:1390)
==25627==    by 0x58203A8: nsOggReader::SeekBisection(long, nsOggReader::SeekRange const&, unsigned int) (nsOggReader.cpp:1532)
==25627==    by 0x58207DA: nsOggReader::SeekInUnbuffered(long, long, long, nsTArray<nsOggReader::SeekRange, nsTArrayDefaultAllocator> const&) (nsOggReader.cpp:1279)
==25627==    by 0x5822A84: nsOggReader::Seek(long, long, long, long) (nsOggReader.cpp:1340)
==25627==    by 0x580F801: nsBuiltinDecoderStateMachine::Run() (nsBuiltinDecoderStateMachine.cpp:1168)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==    by 0x5E05785: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==25627==    by 0x5E412B6: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
==25627==    by 0x7593FF6: _pt_root (ptthread.c:187)
==25627==  Address 0x14c95a43 is not stack'd, malloc'd or (recently) free'd

So I'm going to mark bug 646327 as a duplicate of this on the assumption that it's caused by the same heap corruption.
Duplicate of this bug: 646327
I added an assert based on code inspection, and reproducing the bug confirms we hit the assert (and it turns out we already had an assert covering this in the caller):

###!!! ASSERTION: Guess must be before range end: 'guess < endOffset', file /home/kinetik/mozilla-build/content/media/ogg/nsOggReader.cpp, line 1533
###!!! ASSERTION: readHead must not exceed aEndOffset: 'readHead <= aEndOffset', file /home/kinetik/mozilla-build/content/media/ogg/nsOggReader.cpp, line 1387

[Cursing NS_ASSERTION being non-fatal removed.]

endOffset is -1 because the stream length returned by nsMediaStream::GetLength() is -1.

This propagates via a SeekRange from SeekInUnbuffered down to PageSync, where: 

1389	      PRUint32 bytesToRead =
1390	        static_cast<PRUint32>(NS_MIN(static_cast<PRInt64>(PAGE_STEP),
1391	                                     aEndOffset - readHead));
1392	      if (bytesToRead <= 0) {
1393	        return PAGE_SYNC_END_OF_RANGE;
1394	      }

aEndOffset is -1, readHead is 3373, PAGE_STEP is 8192, so NS_MIN(8192, -3374) is -3374, which is then cast to PRUint32 resulting in 4294963922.  The bytesToRead check is almost useless.

The PageSync seeks to readHead, and attempts to read bytesToRead bytes into buffer (which is PAGE_STEP bytes).  This results in a 29395 byte heap overflow (followed by additional overflows, depending on what's in the media cache).
Keywords: valgrind
Created attachment 523514 [details] [diff] [review]
v0

Make SelectSeekRange return a null range when the stream length is unknown.  Make bytesToRead calculation in PageSync more robust when dealing with out-of-range values.  Fix a typo in an assertion and an unused variable warning.

This patch probably also fixes bug 635726.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Attachment #523514 - Flags: review?(chris)
(Assignee)

Comment 7

6 years ago
Maybe we should change our uses of NS_ASSERTION to be NS_ABORT_IF_FALSE instead? The SVG guys were talking about doing this in their code.

SelectSeekRange() should only return a null range if we're looking for an explicit range in which the seek target lies and we can't find one - aExact is PR_TRUE in this case. If aExact is PR_FALSE, we're supposed to use the ranges to narrow down the search space for the bisection search in unbuffered data.

If we're on a live stream, and we can't find the *exact* range in which the seek target lies, we should be failing the seek or doing something else appropriate, we shouldn't try to seek into unbuffered data in a live stream... Should we allow seeking into any range other than the one in which the current playback position resides in a live stream? We haven't really focused on live streams very much.

Why does the condition which sets |eo| in SelectSeekRange() not fire? Does that need to be:
if (r.mTimeEnd >= aTarget && r.mTimeEnd <= et) instead?

What if you change |eo| in SelectSeekRange() to be |ranges[ranges.Length()-1].mOffsetEnd| when we're playing a live stream? I'm a bit dubious that would work, but it might be worth a shot.

I can repro a similar problem when playing a live stream if I make my media cache really small and seek anywhere (i.e. in unbuffered data I assume).
I think changing NS_ASSERTION to NS_ABORT_IF_FALSE makes sense if we're likely to crash when the assertion fires.

But if we're not likely to crash, I think changing to NS_ABORT_IF_FALSE is macho posturing that has mostly negative effects (e.g., reducing the amount of information you might get out of a debugging session).
Duplicate of this bug: 650958
Duplicate of this bug: 651950
Blocks: 652006
(Assignee)

Updated

6 years ago
Duplicate of this bug: 652202
(Assignee)

Updated

6 years ago
Duplicate of this bug: 652128
Is this a regression from Fx4?
tracking-firefox5: --- → ?
tracking-firefox6: --- → ?
(Assignee)

Comment 14

6 years ago
There's been a spate of dupes filed on this, with various stacks.

We should fix this in Aurora.

Simple STR; load:
http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD.html
and seek in the audio element there. It's served without a Content-Length or byte range support, making it behave as a live stream.

Based on bug 629705 comment #4 this is probably a regression from:

d3aeb102da2b    Chris Pearce — Bug 639391 - Ensure Ogg GetBuffered() is
threadsafe. r=roc
Blocks: 639391
Keywords: regression
Even simpler STR (without the GL stuff): load http://cubicvr.org/CubicVR.js/bd3/track3.php and start seeking.
(Assignee)

Comment 16

6 years ago
Created attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

I made a mistake in nsOggReader::GetSeekRanges(), I was iterating over the wrong array, causing GetSeekRanges() to not report any byte ranges as buffered, causing the seek to go down the wrong code path.

This does not fix the wider issue of how we should handle seeking to a target in a live which is no longer buffered, but that's not a regression, whereas this crash is.
Attachment #528177 - Flags: review?(roc)
Comment on attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

Review of attachment 528177 [details] [diff] [review]:
Attachment #528177 - Flags: review?(roc) → review+
A test here would be nice.
(Assignee)

Comment 19

6 years ago
(In reply to comment #16)
> Created attachment 528177 [details] [diff] [review]
> Patch: Typo fix in nsOggReader::GetSeekRanges()

Landed on m-c, so will appear in Fx6.
http://hg.mozilla.org/mozilla-central/rev/3cdd96a6d372

The crash still occurs on Aurora for Fx5 release.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6
(Assignee)

Comment 20

6 years ago
Comment on attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

Can we have approval to land this on Aurora please? This patch is simple, and fixes a crash in video.
Attachment #528177 - Flags: approval-mozilla-aurora?
A crash *regression*, that is.
Would've been nice to take the robustness fixes from my patch...
Assignee: kinetik → chris
Attachment #523514 - Flags: review?(chris)
Attachment #523514 - Attachment is obsolete: true

Updated

6 years ago
Attachment #528177 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

6 years ago
tracking-firefox5: ? → ---
tracking-firefox6: ? → ---
(Assignee)

Comment 23

6 years ago
Landed on Aurora for Fx5.
http://hg.mozilla.org/mozilla-aurora/rev/ea8b7e3f457b
Target Milestone: mozilla6 → mozilla5
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
(Assignee)

Updated

6 years ago
Duplicate of this bug: 629705
(Assignee)

Updated

6 years ago
status-firefox5: --- → fixed
Duplicate of this bug: 652006
Crash Signature: [@ nsMediaCacheStream::Read ]
For historians: bug 652721 was checked in with a commit message incorrectly pointing to this bug because I forgot to fix the commit message in the patch.
Crash Signature: [@ nsMediaCacheStream::Read ] → [@ nsMediaCacheStream::Read ] [@ nsMediaCache::ReadCacheFile(__int64, void*, int, int*) ]
Crash Signature: [@ nsMediaCacheStream::Read ] [@ nsMediaCache::ReadCacheFile(__int64, void*, int, int*) ] → [@ nsMediaCacheStream::Read ]
You need to log in before you can comment on or make changes to this bug.