Closed Bug 500311 Opened 15 years ago Closed 15 years ago

crash while loading video clip [@ memcpy - oggplay_data_handle_theora_frame]

Categories

(Core :: Audio/Video, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1
Tracking Status
blocking1.9.1 --- .4+
status1.9.1 --- .4-fixed

People

(Reporter: jbecerra, Assigned: kinetik)

References

()

Details

(4 keywords, Whiteboard: [sg:critical?] uninitialized memory)

Crash Data

Attachments

(4 files, 2 obsolete files)

While testing Fx3.5rc3 on Vista, I visited the tinyvid.tv site and clicked on a few links, and one of them consistently crashed the browser while loading. It also crashes RC2, but does NOT crash in b4. Happens on Mac (latest Minefield).

http://crash-stats.mozilla.com/report/index/75648029-65ea-433f-b3d1-ca3d62090624

Steps:
1. Visit tinyvid.tv site
2. Click on clip "The Time Traveler's Wife trailer" link (video link is http://tinyvid.tv/file/2fb8pbzg1ahwj.ogg)

Expected: Play the clip

Actual: Browsers starts loading the clip, then crashes, consistently.
Flags: blocking-firefox3.5?
Component: General → Video/Audio
Flags: blocking-firefox3.5?
Product: Firefox → Core
QA Contact: general → video.audio
Flags: blocking1.9.1?
Trunk crashes on OS X, too.  It's a null pointer deref.  Seems like we get a zeroed yuv_buffer from theora_decode_YUVout:

#0  0x193f0d77 in oggplay_data_handle_theora_frame (decode=0x1c3a3db0, buffer=0xb054d9c4) at /Users/kinetik/work/mozilla-central/mozilla/media/liboggplay/src/liboggplay/oggplay_data.c:365
365	    memcpy(p, q, decode->y_width);
(gdb) p q
$15 = (unsigned char *) 0x0
(gdb) p *buffer
$16 = {
  y_width = 0, 
  y_height = 0, 
  y_stride = 0, 
  uv_width = 0, 
  uv_height = 0, 
  uv_stride = 0, 
  y = 0x0, 
  u = 0x0, 
  v = 0x0
}
On Windows trunk I get the buffer as full of 0xcdcdcdcd, which MS's debug libraries stamp over free'd memory. As a consequence the malloc on line 345 is trying to allocate 1253318069 bytes, which probably isn't helping either.

Seems like buffer is being used/passed in after its been free'd.
Confirmed crash on Linux i686 as well

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820 Firefox/3.5
(In reply to comment #3)
> Confirmed crash on Linux i686 as well
> 
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820
> Firefox/3.5

(In reply to comment #3)
> Confirmed crash on Linux i686 as well
> 
> Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624012820
> Firefox/3.5

I retract that. I tested again, no crash but the video will not play properly.
(In reply to comment #2)
> On Windows trunk I get the buffer as full of 0xcdcdcdcd, which MS's debug
> libraries stamp over free'd memory. As a consequence the malloc on line 345 is
> trying to allocate 1253318069 bytes, which probably isn't helping either.
> 
> Seems like buffer is being used/passed in after its been free'd.

In the debug heap of Visual C++, 0xCDCDCDCD means that the memory is allocated but not set to any values.  Freed memory is filled with 0xDDDDDDDD.
http://msdn.microsoft.com/en-us/library/bebs9zyz.aspx
(In reply to comment #5)
> In the debug heap of Visual C++, 0xCDCDCDCD means that the memory is allocated
> but not set to any values.  Freed memory is filled with 0xDDDDDDDD.
> http://msdn.microsoft.com/en-us/library/bebs9zyz.aspx

You're right, I stand corrected. Though I see 0xEEEEEEEE stamped on free'd memory, which is contrary to the documentation. Madness.
I get a different stack with the same crash on load of the above URL in Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5. I know there has been some work related imgCacheEntry. 

KERN_PROTECTION_FAILURE at address: 0x00000000

#0  0xffff0800 in ___memcpy () at /System/Library/Frameworks/System.framework/PrivateHeaders/i386/cpu_capabilities.h:246


#1  0x01470683 in std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr<imgCacheEntry>*, std::vector<nsRefPtr<imgCacheEntry>, std::allocator<nsRefPtr<imgCacheEntry> > > >, int, nsRefPtr<imgCacheEntry>, bool (*)(nsRefPtr<imgCacheEntry> const&, nsRefPtr<imgCacheEntry> const&)> ()
Hard to tell from the comments here on the severity, risk, etc.  Do people think we should block on this?
So, the first time we call theora_decode_packetin for this video, th_decode_packetin returns TH_DUPFRAME (which means there's no frame data to extract), and since this error is > 0, theora_decode_packetin returns 0.  We then call theora_decode_YUVout, which returns no error, then pass the still uninitialized yuv_buffer to oggplay_data_handle_theora_frame and crash.

If I force oggplay_data_handle_theora_frame to return early (with the debugger) for this frame, we do not crash.  The video plays through, and replays from the beginning without crashing.

oggplayer doesn't crash when playing this video (with or without an oggplay_seek after load_metadata).
I'm not sure if we should block.  I'm working on finding the regression range... that may help us understand the severity.
Range is: http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?startdate=2009-05-20&enddate=2009-05-22

...which includes a large number of changes to that code.  Working on narrowing down the culprit now.
It would also be helpful to know why it's only one video on tinyvid.tv, and how that video got encoded with the bit that's causing us to crash
And the winner is: http://hg.mozilla.org/mozilla-central/rev/c3bf078aded1 (bug 493678).

The crash happens when loading via HTTP, but not with local files.  I'm not yet sure why that is, given the patch that caused the regression.  I've got a fix which I'm pretty sure is the correct thing to do, but I don't yet have an explanation for why this patch caused us to start crashing.  Someone who understands oggplay better might be able to explain it.  I'll CC Viktor.

The reason why we crash is that the first frame oggplay attempts to decode is a duplicate frame (TH_DUPFRAME, which means it doesn't exist in the Theora stream, and you're expected to reuse the prior frame).  The problem here is that it's the first frame we try to decode, so there's no prior frame data to reuse, and we end up using uninitialized data structures.

Except, it's not really the first frame, it's the second.  oggplay incorrectly skips the first frame during decoding.  When decoding headers, theora_decode_header eventually returns 0, which means all expected headers have been processed *and the current packet is video data*.  The problem is, oggplay returns as if this was the last header packet, without attempting to decode the video data.  This behaviour isn't clear from the old Theora API[0], but it is from the new Theora API[1].  Luckily, Chris Double has been working with the new API recently and pointed this behaviour out.

I'll attach the patch shortly, once I've run through mochitests, reftests, and done some more local testing.

Chris is going to analyze the files on tinyvid.tv and work out how many videos would return TH_DUPFRAME on the second frame (the first that oggplay would try to decode) to work out how many we'd crash on.  This'll give us a better idea of the scope of the bug.

I'm tempted to say that we should make this block anyway, because it'd be easy to construct a file that would invoke this crash, and since it causes us to pass uninitialized stack data to memcpy, there's some chance it could be a security risk.  I'll mark this bug as sensitive, just to be sure.  Feel free to unmark if you think I'm being paranoid.

[0] http://theora.org/doc/libtheora-1.0/group__oldfuncs.html#g02915e63c1bd733ee291f577a8b75a82
[1] http://theora.org/doc/libtheora-1.0/group__decfuncs.html#g006d01d36fbe64768c571e6a12b7fc50
Group: core-security
roc/doublec: can you take a look at comment 13 and see if you agree? Blocking at this point likely means slipping release :(
Severity: normal → critical
Keywords: crash
Summary: crash while loading video clip → crash while loading video clip [@ memcpy - oggplay_data_handle_theora_frame]
The patch changed whether the decoder is initialized as active (it used to be
active, now it's inactive).  When it defaulted to active, it would decode a
bunch of packets before getting to the one causing the crash, so Theora's
internal state happened to be set up correctly.  Now that it defaults to
inactive, none of the incoming packets are fed to the decoder until the decoder
is activated--and when that happens, the next packet is the one we crash on
with this file.

The HTTP vs local difference comes down to the fact that with a local file,
once the decoder is activated, the packets fed to oggplay_callback_theora
restart from packet 0 (so we see packets up to 25, then reset to 0).  With
HTTP, the packets never reset to 0--we see 25, then the decoder activates, we
try to process packet 26 and crash.

The reason it doesn't reset is that we don't call oggplay_seek(mPlayer, 0) for
this video when it's hosted on tinyvid.tv (because it supplies a duration
header), and locally (because my test website doesn't support byte ranges), so
the code at
http://mxr.mozilla.org/mozilla-central/source/content/media/video/src/nsOggDecoder.cpp#1821
bails out and we never hit the oggplay_seek.

So the patch I mentioned earlier is probably not the correct fix for this bug.
This isn't going to be the last unfixed ogg/theora crash for 3.5, shouldn't be a problem if the fix goes into 3.5.1 instead.
Flags: wanted1.9.1.x+
Flags: wanted1.9.0.x-
Keywords: testcase
Whiteboard: [sg:critical?] uninitialized memory
Testcase only applies to tests we have attached to Bugzilla or in the repository itself. Chris, can we simply shorten those crashing movies to get a local testcase?
Flags: in-testsuite?
Hardware: x86 → All
Version: unspecified → Trunk
Flags: blocking1.9.1? → blocking1.9.1-
Whiteboard: [sg:critical?] uninitialized memory → [3.5.1?][sg:critical?] uninitialized memory
It should be possible to shorten. It just needs a video with the first few frames be duplicates I believe.
Attached video testcase
I reduced the linked video a bit.
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Flags: blocking1.9.1.1?
Whiteboard: [3.5.1?][sg:critical?] uninitialized memory → [sg:critical?] uninitialized memory
This redhat bug report for an Ogg file that crashes Firefox seems to be this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=501632
(In reply to comment #20)
> This redhat bug report for an Ogg file that crashes Firefox seems to be this
> bug:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=501632

Just to note that I am not totally persuaded that this bug is a duplicate of the Red Hat one. I am not and I have never been able to crash our Firefox with http://proyectofedora.org/mexico/wp-content/uploads/2009/05/boot.ogg (and the reporter is apparently not able either with our FF 3.5), but clicking on the tinyvid video from the description of this bug kills even our FF 3.5 immediately.
Matthew: Are you still working on this bug?
I am.  I'm close to understanding the root cause now, so shouldn't be too far away from the correct fix.
Attached patch patch v0 (obsolete) — Splinter Review
This isn't a perfect fix, but it's fairly simple and returns liboggplay's behaviour to what we had in the past.

Basically, it reverts the change from bug 493678 and instead fixes that bug a different way by tracking the pseudo-active tracks via active_tracks during liboggplay initialization.
Attachment #388421 - Flags: review?(chris.double)
Oops, I thought I had posted information about the root cause before posting the patch.

oggplay_initialise calls oggz_read in a loop, reading and processing (via callbacks) a block of data at a time.  Each time oggz_read returns, oggplay_initialise checks all_tracks_initialised, and bails out of the loop if it's true.  As the loop condition is only checked after each block of data has been processed, it's possible to have processed all of the track headers and begun processing the track data before the loop in oggplay_initialise bails out.

With the patch from bug 493678, as each track was detected and initialized, it started off inactive.  Prior to that patch, they defaulted to active.

The decoder callbacks have an early return that tests if the track is active, and bails out if not.  For Theora decoding, this means that the header decode functions will always be called (for header packets), but the data decode functions will only be called if the decoder is active.

In the situation that leads to the crash, oggplay_initialise returns successfully after having received callbacks for data packets but not processing them (because the track is inactive).  nsOggDecoder activates the tracks, then the first time we call oggplay_step_decode, oggplay decodes the next data packet, which happens to be empty in this case.  As this is the first data packet that is ever processed, there's no previous-frame data to fetch from the Theora decoder, and we crash handling uninitialized data.

My patch fixes the problem the simplest way, by returning oggplay to the old behaviour where tracks were created as active.  When this happens, for tracks that can be marked active, it increments the active_tracks counter so that track deactivation is tracked correctly to avoid bug 493678.

A cleaner way to fix this might be to modify oggplay and oggz so that decoder initialization and the actual decoding were logically separate.  One way might be to have the callbacks able to signal that initialization is complete (they can only signal success and failure right now), so that the initialization loop in oggplay_initialise could bail out as soon as the initial track headers had been processed.

During my analysis of this bug, I found some related incorrect behaviour (not crasher bugs) caused by the way initialization is handled.  For instance, in some cases oggplay will decode the same data packets multiple times during initialization, causing a sequence of duplicate frames to be queued in oggplay's buffers.  Given our plans for 1.9.2, I'm not sure if it's worth trying to fix this in oggplay or not.
blocking1.9.1: --- → needed
Flags: blocking1.9.1.1? → blocking1.9.1.1-
Attachment #388421 - Flags: review?(chris.double) → review+
Whiteboard: [sg:critical?] uninitialized memory → [sg:critical?] uninitialized memory [needs landing]
Is this the same as (or related to) bug 474077?  If so, should 474077 be marked as security sensitive? (And furthermore, should it or this be marked as a duplicate?)
Moreover, I can't get the testcase to crash 3.5.1 or mozilla-central trunk (revision 9dfacae57238).  Has this been fixed?
The patch hasn't landed, so it's not fixed.  I can still reproduce locally with the right configuration.

Sorry I didn't make it clear, but the testcase probably doesn't crash if loaded from Bugzilla.  It needs to be loaded from a webserver that supports byte range request, which I don't think Bugzilla does.

Bug 474077 doesn't have enough information to be sure if it's a dupe of this (or one of the other crashes we've seen in oggplay_data_handle_theora_frame).  It's probably a dupe of one of them, but I don't know which one.
Attached patch patch v1Splinter Review
Same patch, but with README_MOZILLA and update.sh updated, and patch of patch included.
Attachment #388421 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/a328a2594cd5
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical?] uninitialized memory [needs landing] → [sg:critical?] uninitialized memory
Attached patch Patch - add testSplinter Review
Adds test which checks that the testcase plays when served with X-Content-Duration header. This ensures the previously crashing code-path is exercised.
Attachment #392425 - Flags: review?
Testcase pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/c7a55d7c39fa
Flags: in-testsuite? → in-testsuite+
Target Milestone: --- → mozilla1.9.2a1
Verified fixed on OS X and Windows with builds like Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090801 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090801042759
Status: RESOLVED → VERIFIED
Flags: wanted1.9.1.x+
(In reply to comment #26)
> A cleaner way to fix this might be to modify oggplay and oggz so that decoder
> initialization and the actual decoding were logically separate.  One way might
> be to have the callbacks able to signal that initialization is complete (they
> can only signal success and failure right now), so that the initialization loop
> in oggplay_initialise could bail out as soon as the initial track headers had
> been processed.

This behavior has been implemented in 17ef4ca82df28eab294fb8c41e77add6f96e7ab7.
Basically what it does is that it first tries to read all the streams' headers in oggplay_initialise, and depending on the success of decoding the headers it marks whether the track can be activated. E.g. it does not allow to activate a stream that had bad headers. For more details see the description of the commit in master branch of oggplay.
Now that this is fixed on trunk (and before 1.9.2 branched even, I believe), can we get a 1.9.1 branch patch worked up? Or does the one in this bug apply?

Please request approval1.9.1.4 on a 1.9.1-applicable patch.
blocking1.9.1: needed → .4+
Whiteboard: [sg:critical?] uninitialized memory → [sg:critical?][needs 1.9.1 patch or approval request] uninitialized memory
Matthew: This is now marked as blocking 1.9.1.4. Code freeze is scheduled for September 22. Please request approval on a 1.9.1-applicable patch.
This was fixed on 1.9.2 before it branched.  The patch applies to 1.9.1 with a minor fixup to update.sh.  The test needs to be backported to the old test infrastructure--it's not too much work, but on the other hand we already have that test on trunk and 1.9.2.
Attached patch patch v1 for 1.9.1 (obsolete) — Splinter Review
Rebased patch.  I'll port the test and include it in the next rev.
In general I think we shouldn't put much effort into backporting tests. If it's easy, great, otherwise it doesn't seem worth it.
Same patch plus ported testcase.  All video/audio mochitests pass with this applied.  Requesting approval for 1.9.1.4t--this has been on trunk for about 6 weeks, and effectively reverts the behaviour of this code to what it was before bug 493678 landed in late May.
Attachment #399820 - Attachment is obsolete: true
Attachment #399925 - Flags: approval1.9.1.4?
(In reply to comment #40)
> In general I think we shouldn't put much effort into backporting tests. If it's
> easy, great, otherwise it doesn't seem worth it.

I agree with this, for future reference.
Comment on attachment 399925 [details] [diff] [review]
patch v1 for 1.9.1 with test

Approved for 1.9.1.4, a=dveditz for release-drivers
Attachment #399925 - Flags: approval1.9.1.4? → approval1.9.1.4+
Verified crash with testcase in 1.9.1.3 and fixed in Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.4pre) Gecko/20090914 Shiretoko/3.5.4pre.
Keywords: verified1.9.1
Alias: CVE-2009-3378
Alias: CVE-2009-3378
Blocks: 493678
Keywords: regression
Whiteboard: [sg:critical?][needs 1.9.1 patch or approval request] uninitialized memory → [sg:critical?] uninitialized memory
Group: core-security
Attachment #399925 - Attachment is patch: true
Attachment #399925 - Attachment mime type: application/octet-stream → text/plain
Crash Signature: [@ memcpy - oggplay_data_handle_theora_frame]
You need to log in before you can comment on or make changes to this bug.