Last Comment Bug 645773 - Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheStream::Read ]
: Crash when switching to fullscreen video on air.mozilla.org [@ nsMediaCacheSt...
Status: RESOLVED FIXED
: crash, regression, valgrind
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla5
Assigned To: Chris Pearce (:cpearce)
:
Mentors:
https://crash-stats.mozilla.com/repor...
: 629705 646327 650958 651950 652006 652128 652202 (view as bug list)
Depends on:
Blocks: 639391 652006
  Show dependency treegraph
 
Reported: 2011-03-28 11:23 PDT by Anthony Hughes (:ashughes) [GFX][QA][Mentor]
Modified: 2012-03-06 15:03 PST (History)
18 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
v0 (4.54 KB, patch)
2011-03-31 21:22 PDT, Matthew Gregan [:kinetik]
no flags Details | Diff | Review
Patch: Typo fix in nsOggReader::GetSeekRanges() (1.09 KB, patch)
2011-04-25 14:28 PDT, Chris Pearce (:cpearce)
roc: review+
jst: approval‑mozilla‑aurora+
Details | Diff | Review

Description Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-03-28 11:23:12 PDT
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.2a1pre) Gecko/20110328 Firefox/4.2a1pre

When switching to fullscreen mode while watching a live stream at air.mozilla.org, Minefield crashes.  I've included two crash reports since they seem to be different signatures.

Initial Crash:
https://crash-stats.mozilla.com/report/index/bp-5683be15-e073-42be-8ecd-244e22110328

Subsequent Crash:
https://crash-stats.mozilla.com/report/index/bp-a8c38ded-c983-475e-8cc8-356d22110328
Comment 1 Chris Pearce (:cpearce) 2011-03-28 14:36:43 PDT
(In reply to comment #0)
> Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.2a1pre) Gecko/20110328
> Firefox/4.2a1pre

> Initial Crash:
> https://crash-stats.mozilla.com/report/index/bp-5683be15-e073-42be-8ecd-244e22110328

Must be a crash in nsMediaCache::NoteBlockUsage() when it's called by nsMediaCacheStream::Read()?
 
> Subsequent Crash:
> https://crash-stats.mozilla.com/report/index/bp-a8c38ded-c983-475e-8cc8-356d22110328

This second crash seems to be unrelated to video; all the video threads are asleep. It looks like it's a problem with poly IC or compartments? dmandelin, any ideas about the stack in this "subsequent crash"?
Comment 2 Matthew Gregan [:kinetik] 2011-03-29 21:21:09 PDT
Spun off the JS crash into bug 646327.
Comment 3 Matthew Gregan [:kinetik] 2011-03-29 22:38:22 PDT
Looks like I can reproduce this pretty easily using VLC's streaming HTTP server.  I crashed with a different stack, but I'm pretty sure that's due to the underlying bug being heap corruption.  Valgrind reports:

==25627== Thread 19:
==25627== Syscall param read(buf) points to unaddressable byte(s)
==25627==    at 0x317DE0E4ED: ??? (syscall-template.S:82)
==25627==    by 0x7592BF1: pt_Read (ptio.c:1305)
==25627==    by 0x757A582: PR_Read (priometh.c:141)
==25627==    by 0x58054E2: nsMediaCache::ReadCacheFile(long, void*, int, int*) (nsMediaCache.cpp:686)
==25627==    by 0x5807420: nsMediaCacheStream::Read(char*, unsigned int, unsigned int*) (nsMediaCache.cpp:2165)
==25627==    by 0x5809360: nsMediaChannelStream::Read(char*, unsigned int, unsigned int*) (nsMediaStream.cpp:591)
==25627==    by 0x581F832: PageSync(nsMediaStream*, ogg_sync_state*, int, long, long, ogg_page*, int&) (nsOggReader.cpp:1390)
==25627==    by 0x58203A8: nsOggReader::SeekBisection(long, nsOggReader::SeekRange const&, unsigned int) (nsOggReader.cpp:1532)
==25627==    by 0x58207DA: nsOggReader::SeekInUnbuffered(long, long, long, nsTArray<nsOggReader::SeekRange, nsTArrayDefaultAllocator> const&) (nsOggReader.cpp:1279)
==25627==    by 0x5822A84: nsOggReader::Seek(long, long, long, long) (nsOggReader.cpp:1340)
==25627==    by 0x580F801: nsBuiltinDecoderStateMachine::Run() (nsBuiltinDecoderStateMachine.cpp:1168)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==  Address 0x14c9395c is 0 bytes after a block of size 20,972 alloc'd
==25627==    at 0x4A055AE: realloc (vg_replace_malloc.c:525)
==25627==    by 0x580458B: ogg_sync_buffer (ogg_framing.c:619)
==25627==    by 0x5820121: nsOggReader::ReadOggPage(ogg_page*) (nsOggReader.cpp:792)
==25627==    by 0x5820A2A: nsOggReader::ReadOggPacket(nsOggCodecState*, ogg_packet*) (nsOggReader.cpp:839)
==25627==    by 0x5821F5E: nsOggReader::DecodeVideoFrame(int&, long) (nsOggReader.cpp:704)
==25627==    by 0x580E6C6: nsBuiltinDecoderStateMachine::DecodeLoop() (nsBuiltinDecoderStateMachine.cpp:330)
==25627==    by 0x580FD96: nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run() (nsThreadUtils.h:345)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==    by 0x5E05785: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==25627==    by 0x5E412B6: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
==25627==    by 0x7593FF6: _pt_root (ptthread.c:187)
==25627==    by 0x317DE07760: start_thread (pthread_create.c:301)
==25627== 
==25627== Invalid write of size 1
==25627==    at 0x4A06F87: memcpy (mc_replace_strmem.c:635)
==25627==    by 0x58074D9: nsMediaCacheStream::Read(char*, unsigned int, unsigned int*) (nsMediaCache.cpp:2139)
==25627==    by 0x5809360: nsMediaChannelStream::Read(char*, unsigned int, unsigned int*) (nsMediaStream.cpp:591)
==25627==    by 0x581F832: PageSync(nsMediaStream*, ogg_sync_state*, int, long, long, ogg_page*, int&) (nsOggReader.cpp:1390)
==25627==    by 0x58203A8: nsOggReader::SeekBisection(long, nsOggReader::SeekRange const&, unsigned int) (nsOggReader.cpp:1532)
==25627==    by 0x58207DA: nsOggReader::SeekInUnbuffered(long, long, long, nsTArray<nsOggReader::SeekRange, nsTArrayDefaultAllocator> const&) (nsOggReader.cpp:1279)
==25627==    by 0x5822A84: nsOggReader::Seek(long, long, long, long) (nsOggReader.cpp:1340)
==25627==    by 0x580F801: nsBuiltinDecoderStateMachine::Run() (nsBuiltinDecoderStateMachine.cpp:1168)
==25627==    by 0x5E40C00: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==25627==    by 0x5E05785: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==25627==    by 0x5E412B6: nsThread::ThreadFunc(void*) (nsThread.cpp:278)
==25627==    by 0x7593FF6: _pt_root (ptthread.c:187)
==25627==  Address 0x14c95a43 is not stack'd, malloc'd or (recently) free'd

So I'm going to mark bug 646327 as a duplicate of this on the assumption that it's caused by the same heap corruption.
Comment 4 Matthew Gregan [:kinetik] 2011-03-29 22:39:43 PDT
*** Bug 646327 has been marked as a duplicate of this bug. ***
Comment 5 Matthew Gregan [:kinetik] 2011-03-30 21:45:32 PDT
I added an assert based on code inspection, and reproducing the bug confirms we hit the assert (and it turns out we already had an assert covering this in the caller):

###!!! ASSERTION: Guess must be before range end: 'guess < endOffset', file /home/kinetik/mozilla-build/content/media/ogg/nsOggReader.cpp, line 1533
###!!! ASSERTION: readHead must not exceed aEndOffset: 'readHead <= aEndOffset', file /home/kinetik/mozilla-build/content/media/ogg/nsOggReader.cpp, line 1387

[Cursing NS_ASSERTION being non-fatal removed.]

endOffset is -1 because the stream length returned by nsMediaStream::GetLength() is -1.

This propagates via a SeekRange from SeekInUnbuffered down to PageSync, where: 

1389	      PRUint32 bytesToRead =
1390	        static_cast<PRUint32>(NS_MIN(static_cast<PRInt64>(PAGE_STEP),
1391	                                     aEndOffset - readHead));
1392	      if (bytesToRead <= 0) {
1393	        return PAGE_SYNC_END_OF_RANGE;
1394	      }

aEndOffset is -1, readHead is 3373, PAGE_STEP is 8192, so NS_MIN(8192, -3374) is -3374, which is then cast to PRUint32 resulting in 4294963922.  The bytesToRead check is almost useless.

The PageSync seeks to readHead, and attempts to read bytesToRead bytes into buffer (which is PAGE_STEP bytes).  This results in a 29395 byte heap overflow (followed by additional overflows, depending on what's in the media cache).
Comment 6 Matthew Gregan [:kinetik] 2011-03-31 21:22:56 PDT
Created attachment 523514 [details] [diff] [review]
v0

Make SelectSeekRange return a null range when the stream length is unknown.  Make bytesToRead calculation in PageSync more robust when dealing with out-of-range values.  Fix a typo in an assertion and an unused variable warning.

This patch probably also fixes bug 635726.
Comment 7 Chris Pearce (:cpearce) 2011-03-31 22:28:29 PDT
Maybe we should change our uses of NS_ASSERTION to be NS_ABORT_IF_FALSE instead? The SVG guys were talking about doing this in their code.

SelectSeekRange() should only return a null range if we're looking for an explicit range in which the seek target lies and we can't find one - aExact is PR_TRUE in this case. If aExact is PR_FALSE, we're supposed to use the ranges to narrow down the search space for the bisection search in unbuffered data.

If we're on a live stream, and we can't find the *exact* range in which the seek target lies, we should be failing the seek or doing something else appropriate, we shouldn't try to seek into unbuffered data in a live stream... Should we allow seeking into any range other than the one in which the current playback position resides in a live stream? We haven't really focused on live streams very much.

Why does the condition which sets |eo| in SelectSeekRange() not fire? Does that need to be:
if (r.mTimeEnd >= aTarget && r.mTimeEnd <= et) instead?

What if you change |eo| in SelectSeekRange() to be |ranges[ranges.Length()-1].mOffsetEnd| when we're playing a live stream? I'm a bit dubious that would work, but it might be worth a shot.

I can repro a similar problem when playing a live stream if I make my media cache really small and seek anywhere (i.e. in unbuffered data I assume).
Comment 8 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-01 05:15:47 PDT
I think changing NS_ASSERTION to NS_ABORT_IF_FALSE makes sense if we're likely to crash when the assertion fires.

But if we're not likely to crash, I think changing to NS_ABORT_IF_FALSE is macho posturing that has mostly negative effects (e.g., reducing the amount of information you might get out of a debugging session).
Comment 9 Matthew Gregan [:kinetik] 2011-04-21 15:13:18 PDT
*** Bug 650958 has been marked as a duplicate of this bug. ***
Comment 10 Matthew Gregan [:kinetik] 2011-04-21 15:13:23 PDT
*** Bug 651950 has been marked as a duplicate of this bug. ***
Comment 11 Chris Pearce (:cpearce) 2011-04-22 18:26:12 PDT
*** Bug 652202 has been marked as a duplicate of this bug. ***
Comment 12 Chris Pearce (:cpearce) 2011-04-22 18:27:16 PDT
*** Bug 652128 has been marked as a duplicate of this bug. ***
Comment 13 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-04-22 18:31:25 PDT
Is this a regression from Fx4?
Comment 14 Chris Pearce (:cpearce) 2011-04-22 18:38:08 PDT
There's been a spate of dupes filed on this, with various stacks.

We should fix this in Aurora.

Simple STR; load:
http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD.html
and seek in the audio element there. It's served without a Content-Length or byte range support, making it behave as a live stream.

Based on bug 629705 comment #4 this is probably a regression from:

d3aeb102da2b    Chris Pearce — Bug 639391 - Ensure Ogg GetBuffered() is
threadsafe. r=roc
Comment 15 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-22 23:16:37 PDT
Even simpler STR (without the GL stuff): load http://cubicvr.org/CubicVR.js/bd3/track3.php and start seeking.
Comment 16 Chris Pearce (:cpearce) 2011-04-25 14:28:55 PDT
Created attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

I made a mistake in nsOggReader::GetSeekRanges(), I was iterating over the wrong array, causing GetSeekRanges() to not report any byte ranges as buffered, causing the seek to go down the wrong code path.

This does not fix the wider issue of how we should handle seeking to a target in a live which is no longer buffered, but that's not a regression, whereas this crash is.
Comment 17 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-25 16:25:30 PDT
Comment on attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

Review of attachment 528177 [details] [diff] [review]:
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-25 16:26:00 PDT
A test here would be nice.
Comment 19 Chris Pearce (:cpearce) 2011-04-25 16:38:15 PDT
(In reply to comment #16)
> Created attachment 528177 [details] [diff] [review]
> Patch: Typo fix in nsOggReader::GetSeekRanges()

Landed on m-c, so will appear in Fx6.
http://hg.mozilla.org/mozilla-central/rev/3cdd96a6d372

The crash still occurs on Aurora for Fx5 release.
Comment 20 Chris Pearce (:cpearce) 2011-04-25 16:39:13 PDT
Comment on attachment 528177 [details] [diff] [review]
Patch: Typo fix in nsOggReader::GetSeekRanges()

Can we have approval to land this on Aurora please? This patch is simple, and fixes a crash in video.
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-04-25 16:44:24 PDT
A crash *regression*, that is.
Comment 22 Matthew Gregan [:kinetik] 2011-04-25 17:07:12 PDT
Would've been nice to take the robustness fixes from my patch...
Comment 23 Chris Pearce (:cpearce) 2011-04-26 15:36:56 PDT
Landed on Aurora for Fx5.
http://hg.mozilla.org/mozilla-aurora/rev/ea8b7e3f457b
Comment 24 Chris Pearce (:cpearce) 2011-04-26 15:39:21 PDT
*** Bug 629705 has been marked as a duplicate of this bug. ***
Comment 25 Matthew Gregan [:kinetik] 2011-05-06 15:01:33 PDT
*** Bug 652006 has been marked as a duplicate of this bug. ***
Comment 26 Matthew Gregan [:kinetik] 2011-07-25 12:47:02 PDT
For historians: bug 652721 was checked in with a commit message incorrectly pointing to this bug because I forgot to fix the commit message in the patch.

Note You need to log in before you can comment on or make changes to this bug.