Last Comment Bug 500311 - crash while loading video clip [@ memcpy - oggplay_data_handle_theora_frame]
: crash while loading video clip [@ memcpy - oggplay_data_handle_theora_frame]
Status: VERIFIED FIXED
[sg:critical?] uninitialized memory
: crash, regression, testcase, verified1.9.1
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla1.9.2a1
Assigned To: Matthew Gregan [:kinetik]
:
Mentors:
http://tinyvid.tv/show/2fb8pbzg1ahwj
: 501890 520518 (view as bug list)
Depends on:
Blocks: 493678
  Show dependency treegraph
 
Reported: 2009-06-24 15:35 PDT by juan becerra [:juanb]
Modified: 2011-06-13 10:01 PDT (History)
24 users (show)
mbeltzner: blocking1.9.1-
mbeltzner: blocking1.9.1.1-
dveditz: wanted1.9.0.x-
cpearce: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
.4+
.4-fixed


Attachments
testcase (54.53 KB, video/ogg)
2009-06-25 18:55 PDT, Matthew Gregan [:kinetik]
no flags Details
patch v0 (3.14 KB, patch)
2009-07-13 23:20 PDT, Matthew Gregan [:kinetik]
cajbir.bugzilla: review+
Details | Diff | Review
patch v1 (7.77 KB, patch)
2009-07-29 20:31 PDT, Matthew Gregan [:kinetik]
no flags Details | Diff | Review
Patch - add test (66.78 KB, patch)
2009-08-03 22:06 PDT, Chris Pearce (:cpearce)
roc: review+
Details | Diff | Review
patch v1 for 1.9.1 (7.71 KB, patch)
2009-09-10 13:57 PDT, Matthew Gregan [:kinetik]
no flags Details | Diff | Review
patch v1 for 1.9.1 with test (76.13 KB, patch)
2009-09-10 18:30 PDT, Matthew Gregan [:kinetik]
dveditz: approval1.9.1.4+
Details | Diff | Review

Description juan becerra [:juanb] 2009-06-24 15:35:29 PDT
While testing Fx3.5rc3 on Vista, I visited the tinyvid.tv site and clicked on a few links, and one of them consistently crashed the browser while loading. It also crashes RC2, but does NOT crash in b4. Happens on Mac (latest Minefield).

http://crash-stats.mozilla.com/report/index/75648029-65ea-433f-b3d1-ca3d62090624

Steps:
1. Visit tinyvid.tv site
2. Click on clip "The Time Traveler's Wife trailer" link (video link is http://tinyvid.tv/file/2fb8pbzg1ahwj.ogg)

Expected: Play the clip

Actual: Browsers starts loading the clip, then crashes, consistently.
Comment 1 Matthew Gregan [:kinetik] 2009-06-24 15:54:20 PDT
Trunk crashes on OS X, too.  It's a null pointer deref.  Seems like we get a zeroed yuv_buffer from theora_decode_YUVout:

#0  0x193f0d77 in oggplay_data_handle_theora_frame (decode=0x1c3a3db0, buffer=0xb054d9c4) at /Users/kinetik/work/mozilla-central/mozilla/media/liboggplay/src/liboggplay/oggplay_data.c:365
365	    memcpy(p, q, decode->y_width);
(gdb) p q
$15 = (unsigned char *) 0x0
(gdb) p *buffer
$16 = {
  y_width = 0, 
  y_height = 0, 
  y_stride = 0, 
  uv_width = 0, 
  uv_height = 0, 
  uv_stride = 0, 
  y = 0x0, 
  u = 0x0, 
  v = 0x0
}
Comment 2 Chris Pearce (:cpearce) 2009-06-24 16:10:31 PDT
On Windows trunk I get the buffer as full of 0xcdcdcdcd, which MS's debug libraries stamp over free'd memory. As a consequence the malloc on line 345 is trying to allocate 1253318069 bytes, which probably isn't helping either.

Seems like buffer is being used/passed in after its been free'd.
Comment 3 Aaron Train [:aaronmt] 2009-06-24 16:11:17 PDT
Confirmed crash on Linux i686 as well

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820 Firefox/3.5
Comment 4 Aaron Train [:aaronmt] 2009-06-24 16:16:35 PDT
(In reply to comment #3)
> Confirmed crash on Linux i686 as well
> 
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820
> Firefox/3.5

(In reply to comment #3)
> Confirmed crash on Linux i686 as well
> 
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820
> Firefox/3.5

I retract that. I tested again, no crash but the video will not play properly.
Comment 5 fcp 2009-06-24 16:58:28 PDT
(In reply to comment #2)
> On Windows trunk I get the buffer as full of 0xcdcdcdcd, which MS's debug
> libraries stamp over free'd memory. As a consequence the malloc on line 345 is
> trying to allocate 1253318069 bytes, which probably isn't helping either.
> 
> Seems like buffer is being used/passed in after its been free'd.

In the debug heap of Visual C++, 0xCDCDCDCD means that the memory is allocated but not set to any values.  Freed memory is filled with 0xDDDDDDDD.
http://msdn.microsoft.com/en-us/library/bebs9zyz.aspx
Comment 6 Chris Pearce (:cpearce) 2009-06-24 17:34:29 PDT
(In reply to comment #5)
> In the debug heap of Visual C++, 0xCDCDCDCD means that the memory is allocated
> but not set to any values.  Freed memory is filled with 0xDDDDDDDD.
> http://msdn.microsoft.com/en-us/library/bebs9zyz.aspx

You're right, I stand corrected. Though I see 0xEEEEEEEE stamped on free'd memory, which is contrary to the documentation. Madness.
Comment 7 Aaron Train [:aaronmt] 2009-06-24 17:39:12 PDT
I get a different stack with the same crash on load of the above URL in Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5. I know there has been some work related imgCacheEntry. 

KERN_PROTECTION_FAILURE at address: 0x00000000

#0  0xffff0800 in ___memcpy () at /System/Library/Frameworks/System.framework/PrivateHeaders/i386/cpu_capabilities.h:246


#1  0x01470683 in std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr<imgCacheEntry>*, std::vector<nsRefPtr<imgCacheEntry>, std::allocator<nsRefPtr<imgCacheEntry> > > >, int, nsRefPtr<imgCacheEntry>, bool (*)(nsRefPtr<imgCacheEntry> const&, nsRefPtr<imgCacheEntry> const&)> ()
Comment 8 Damon Sicore (:damons) 2009-06-24 18:09:08 PDT
Hard to tell from the comments here on the severity, risk, etc.  Do people think we should block on this?
Comment 9 Matthew Gregan [:kinetik] 2009-06-24 18:10:19 PDT
So, the first time we call theora_decode_packetin for this video, th_decode_packetin returns TH_DUPFRAME (which means there's no frame data to extract), and since this error is > 0, theora_decode_packetin returns 0.  We then call theora_decode_YUVout, which returns no error, then pass the still uninitialized yuv_buffer to oggplay_data_handle_theora_frame and crash.

If I force oggplay_data_handle_theora_frame to return early (with the debugger) for this frame, we do not crash.  The video plays through, and replays from the beginning without crashing.

oggplayer doesn't crash when playing this video (with or without an oggplay_seek after load_metadata).
Comment 10 Matthew Gregan [:kinetik] 2009-06-24 18:12:09 PDT
I'm not sure if we should block.  I'm working on finding the regression range... that may help us understand the severity.
Comment 11 Matthew Gregan [:kinetik] 2009-06-24 20:24:25 PDT
Range is: http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?startdate=2009-05-20&enddate=2009-05-22

...which includes a large number of changes to that code.  Working on narrowing down the culprit now.
Comment 12 Mike Beltzner [:beltzner, not reading bugmail] 2009-06-24 22:07:36 PDT
It would also be helpful to know why it's only one video on tinyvid.tv, and how that video got encoded with the bit that's causing us to crash
Comment 13 Matthew Gregan [:kinetik] 2009-06-25 00:00:23 PDT
And the winner is: http://hg.mozilla.org/mozilla-central/rev/c3bf078aded1 (bug 493678).

The crash happens when loading via HTTP, but not with local files.  I'm not yet sure why that is, given the patch that caused the regression.  I've got a fix which I'm pretty sure is the correct thing to do, but I don't yet have an explanation for why this patch caused us to start crashing.  Someone who understands oggplay better might be able to explain it.  I'll CC Viktor.

The reason why we crash is that the first frame oggplay attempts to decode is a duplicate frame (TH_DUPFRAME, which means it doesn't exist in the Theora stream, and you're expected to reuse the prior frame).  The problem here is that it's the first frame we try to decode, so there's no prior frame data to reuse, and we end up using uninitialized data structures.

Except, it's not really the first frame, it's the second.  oggplay incorrectly skips the first frame during decoding.  When decoding headers, theora_decode_header eventually returns 0, which means all expected headers have been processed *and the current packet is video data*.  The problem is, oggplay returns as if this was the last header packet, without attempting to decode the video data.  This behaviour isn't clear from the old Theora API[0], but it is from the new Theora API[1].  Luckily, Chris Double has been working with the new API recently and pointed this behaviour out.

I'll attach the patch shortly, once I've run through mochitests, reftests, and done some more local testing.

Chris is going to analyze the files on tinyvid.tv and work out how many videos would return TH_DUPFRAME on the second frame (the first that oggplay would try to decode) to work out how many we'd crash on.  This'll give us a better idea of the scope of the bug.

I'm tempted to say that we should make this block anyway, because it'd be easy to construct a file that would invoke this crash, and since it causes us to pass uninitialized stack data to memcpy, there's some chance it could be a security risk.  I'll mark this bug as sensitive, just to be sure.  Feel free to unmark if you think I'm being paranoid.

[0] http://theora.org/doc/libtheora-1.0/group__oldfuncs.html#g02915e63c1bd733ee291f577a8b75a82
[1] http://theora.org/doc/libtheora-1.0/group__decfuncs.html#g006d01d36fbe64768c571e6a12b7fc50
Comment 14 Mike Beltzner [:beltzner, not reading bugmail] 2009-06-25 00:05:13 PDT
roc/doublec: can you take a look at comment 13 and see if you agree? Blocking at this point likely means slipping release :(
Comment 15 Matthew Gregan [:kinetik] 2009-06-25 05:07:10 PDT
The patch changed whether the decoder is initialized as active (it used to be
active, now it's inactive).  When it defaulted to active, it would decode a
bunch of packets before getting to the one causing the crash, so Theora's
internal state happened to be set up correctly.  Now that it defaults to
inactive, none of the incoming packets are fed to the decoder until the decoder
is activated--and when that happens, the next packet is the one we crash on
with this file.

The HTTP vs local difference comes down to the fact that with a local file,
once the decoder is activated, the packets fed to oggplay_callback_theora
restart from packet 0 (so we see packets up to 25, then reset to 0).  With
HTTP, the packets never reset to 0--we see 25, then the decoder activates, we
try to process packet 26 and crash.

The reason it doesn't reset is that we don't call oggplay_seek(mPlayer, 0) for
this video when it's hosted on tinyvid.tv (because it supplies a duration
header), and locally (because my test website doesn't support byte ranges), so
the code at
http://mxr.mozilla.org/mozilla-central/source/content/media/video/src/nsOggDecoder.cpp#1821
bails out and we never hit the oggplay_seek.

So the patch I mentioned earlier is probably not the correct fix for this bug.
Comment 16 Daniel Veditz [:dveditz] 2009-06-25 08:45:18 PDT
This isn't going to be the last unfixed ogg/theora crash for 3.5, shouldn't be a problem if the fix goes into 3.5.1 instead.
Comment 17 Henrik Skupin (:whimboo) 2009-06-25 10:00:18 PDT
Testcase only applies to tests we have attached to Bugzilla or in the repository itself. Chris, can we simply shorten those crashing movies to get a local testcase?
Comment 18 cajbir (:cajbir) 2009-06-25 16:36:51 PDT
It should be possible to shorten. It just needs a video with the first few frames be duplicates I believe.
Comment 19 Matthew Gregan [:kinetik] 2009-06-25 18:55:01 PDT
Created attachment 385294 [details]
testcase

I reduced the linked video a bit.
Comment 20 cajbir (:cajbir) 2009-07-01 15:51:29 PDT
This redhat bug report for an Ogg file that crashes Firefox seems to be this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=501632
Comment 21 cajbir (:cajbir) 2009-07-02 00:47:09 PDT
*** Bug 501890 has been marked as a duplicate of this bug. ***
Comment 22 Matěj Cepl 2009-07-03 23:44:43 PDT
(In reply to comment #20)
> This redhat bug report for an Ogg file that crashes Firefox seems to be this
> bug:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=501632

Just to note that I am not totally persuaded that this bug is a duplicate of the Red Hat one. I am not and I have never been able to crash our Firefox with http://proyectofedora.org/mexico/wp-content/uploads/2009/05/boot.ogg (and the reporter is apparently not able either with our FF 3.5), but clicking on the tinyvid video from the description of this bug kills even our FF 3.5 immediately.
Comment 23 Samuel Sidler (old account; do not CC) 2009-07-10 13:29:49 PDT
Matthew: Are you still working on this bug?
Comment 24 Matthew Gregan [:kinetik] 2009-07-10 16:01:56 PDT
I am.  I'm close to understanding the root cause now, so shouldn't be too far away from the correct fix.
Comment 25 Matthew Gregan [:kinetik] 2009-07-13 23:20:46 PDT
Created attachment 388421 [details] [diff] [review]
patch v0

This isn't a perfect fix, but it's fairly simple and returns liboggplay's behaviour to what we had in the past.

Basically, it reverts the change from bug 493678 and instead fixes that bug a different way by tracking the pseudo-active tracks via active_tracks during liboggplay initialization.
Comment 26 Matthew Gregan [:kinetik] 2009-07-14 00:29:36 PDT
Oops, I thought I had posted information about the root cause before posting the patch.

oggplay_initialise calls oggz_read in a loop, reading and processing (via callbacks) a block of data at a time.  Each time oggz_read returns, oggplay_initialise checks all_tracks_initialised, and bails out of the loop if it's true.  As the loop condition is only checked after each block of data has been processed, it's possible to have processed all of the track headers and begun processing the track data before the loop in oggplay_initialise bails out.

With the patch from bug 493678, as each track was detected and initialized, it started off inactive.  Prior to that patch, they defaulted to active.

The decoder callbacks have an early return that tests if the track is active, and bails out if not.  For Theora decoding, this means that the header decode functions will always be called (for header packets), but the data decode functions will only be called if the decoder is active.

In the situation that leads to the crash, oggplay_initialise returns successfully after having received callbacks for data packets but not processing them (because the track is inactive).  nsOggDecoder activates the tracks, then the first time we call oggplay_step_decode, oggplay decodes the next data packet, which happens to be empty in this case.  As this is the first data packet that is ever processed, there's no previous-frame data to fetch from the Theora decoder, and we crash handling uninitialized data.

My patch fixes the problem the simplest way, by returning oggplay to the old behaviour where tracks were created as active.  When this happens, for tracks that can be marked active, it increments the active_tracks counter so that track deactivation is tracked correctly to avoid bug 493678.

A cleaner way to fix this might be to modify oggplay and oggz so that decoder initialization and the actual decoding were logically separate.  One way might be to have the callbacks able to signal that initialization is complete (they can only signal success and failure right now), so that the initialization loop in oggplay_initialise could bail out as soon as the initial track headers had been processed.

During my analysis of this bug, I found some related incorrect behaviour (not crasher bugs) caused by the way initialization is handled.  For instance, in some cases oggplay will decode the same data packets multiple times during initialization, causing a sequence of duplicate frames to be queued in oggplay's buffers.  Given our plans for 1.9.2, I'm not sure if it's worth trying to fix this in oggplay or not.
Comment 27 David Keeler [:keeler] (use needinfo?) 2009-07-24 15:32:00 PDT
Is this the same as (or related to) bug 474077?  If so, should 474077 be marked as security sensitive? (And furthermore, should it or this be marked as a duplicate?)
Comment 28 David Keeler [:keeler] (use needinfo?) 2009-07-24 15:52:07 PDT
Moreover, I can't get the testcase to crash 3.5.1 or mozilla-central trunk (revision 9dfacae57238).  Has this been fixed?
Comment 29 Matthew Gregan [:kinetik] 2009-07-24 16:50:46 PDT
The patch hasn't landed, so it's not fixed.  I can still reproduce locally with the right configuration.

Sorry I didn't make it clear, but the testcase probably doesn't crash if loaded from Bugzilla.  It needs to be loaded from a webserver that supports byte range request, which I don't think Bugzilla does.

Bug 474077 doesn't have enough information to be sure if it's a dupe of this (or one of the other crashes we've seen in oggplay_data_handle_theora_frame).  It's probably a dupe of one of them, but I don't know which one.
Comment 30 Matthew Gregan [:kinetik] 2009-07-29 20:31:43 PDT
Created attachment 391524 [details] [diff] [review]
patch v1

Same patch, but with README_MOZILLA and update.sh updated, and patch of patch included.
Comment 31 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-07-30 03:07:49 PDT
http://hg.mozilla.org/mozilla-central/rev/a328a2594cd5
Comment 32 Chris Pearce (:cpearce) 2009-08-03 22:06:26 PDT
Created attachment 392425 [details] [diff] [review]
Patch - add test

Adds test which checks that the testcase plays when served with X-Content-Duration header. This ensures the previously crashing code-path is exercised.
Comment 33 Chris Pearce (:cpearce) 2009-08-03 23:27:47 PDT
Testcase pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/c7a55d7c39fa
Comment 34 Henrik Skupin (:whimboo) 2009-08-04 01:17:36 PDT
Verified fixed on OS X and Windows with builds like Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090801 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090801042759
Comment 35 Viktor Gal 2009-08-16 12:24:27 PDT
(In reply to comment #26)
> A cleaner way to fix this might be to modify oggplay and oggz so that decoder
> initialization and the actual decoding were logically separate.  One way might
> be to have the callbacks able to signal that initialization is complete (they
> can only signal success and failure right now), so that the initialization loop
> in oggplay_initialise could bail out as soon as the initial track headers had
> been processed.

This behavior has been implemented in 17ef4ca82df28eab294fb8c41e77add6f96e7ab7.
Basically what it does is that it first tries to read all the streams' headers in oggplay_initialise, and depending on the success of decoding the headers it marks whether the track can be activated. E.g. it does not allow to activate a stream that had bad headers. For more details see the description of the commit in master branch of oggplay.
Comment 36 Samuel Sidler (old account; do not CC) 2009-08-28 12:09:54 PDT
Now that this is fixed on trunk (and before 1.9.2 branched even, I believe), can we get a 1.9.1 branch patch worked up? Or does the one in this bug apply?

Please request approval1.9.1.4 on a 1.9.1-applicable patch.
Comment 37 Samuel Sidler (old account; do not CC) 2009-09-10 11:52:09 PDT
Matthew: This is now marked as blocking 1.9.1.4. Code freeze is scheduled for September 22. Please request approval on a 1.9.1-applicable patch.
Comment 38 Matthew Gregan [:kinetik] 2009-09-10 13:27:11 PDT
This was fixed on 1.9.2 before it branched.  The patch applies to 1.9.1 with a minor fixup to update.sh.  The test needs to be backported to the old test infrastructure--it's not too much work, but on the other hand we already have that test on trunk and 1.9.2.
Comment 39 Matthew Gregan [:kinetik] 2009-09-10 13:57:47 PDT
Created attachment 399820 [details] [diff] [review]
patch v1 for 1.9.1

Rebased patch.  I'll port the test and include it in the next rev.
Comment 40 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-09-10 16:12:10 PDT
In general I think we shouldn't put much effort into backporting tests. If it's easy, great, otherwise it doesn't seem worth it.
Comment 41 Matthew Gregan [:kinetik] 2009-09-10 18:30:28 PDT
Created attachment 399925 [details] [diff] [review]
patch v1 for 1.9.1 with test

Same patch plus ported testcase.  All video/audio mochitests pass with this applied.  Requesting approval for 1.9.1.4t--this has been on trunk for about 6 weeks, and effectively reverts the behaviour of this code to what it was before bug 493678 landed in late May.
Comment 42 Samuel Sidler (old account; do not CC) 2009-09-11 02:20:15 PDT
(In reply to comment #40)
> In general I think we shouldn't put much effort into backporting tests. If it's
> easy, great, otherwise it doesn't seem worth it.

I agree with this, for future reference.
Comment 43 Daniel Veditz [:dveditz] 2009-09-11 11:16:27 PDT
Comment on attachment 399925 [details] [diff] [review]
patch v1 for 1.9.1 with test

Approved for 1.9.1.4, a=dveditz for release-drivers
Comment 44 Matthew Gregan [:kinetik] 2009-09-13 17:24:36 PDT
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/6a56ef5eeae0
Comment 45 Al Billings [:abillings] 2009-09-14 15:55:34 PDT
Verified crash with testcase in 1.9.1.3 and fixed in Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.4pre) Gecko/20090914 Shiretoko/3.5.4pre.
Comment 46 Matthew Gregan [:kinetik] 2009-10-11 16:28:54 PDT
*** Bug 520518 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.