Closed Bug 789075 Opened 12 years ago Closed 12 years ago

Video playback causes browser to stop responding (mostly) or crash in Cleopatra tool

Categories

(Core :: Audio/Video, defect)

15 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox15 - affected
firefox16 - wontfix
firefox17 + fixed
firefox18 + verified
firefox19 --- verified
firefox-esr10 - unaffected

People

(Reporter: cwiiis, Assigned: kinetik)

References

()

Details

(Keywords: crash, regression, reproducible, Whiteboard: [adv-track-main17+])

Crash Data

Attachments

(1 file, 1 obsolete file)

Visiting http://people.mozilla.com/~bgirard/cleopatra/?zippedProfile=profiles/sps-profile-1346944481.42.zip&videoCapture=videos/video-1346944481.42.webm in Firefox Nightly in Windows causes the browser to crash, usually, or sometimes become very unresponsive and use lots of memory (possibly the lack of responsiveness is down to paging?)

I've not been able to profile this as I can't get it running reliably enough to do anything with it, hopefully a Windows developer/someone more familiar with Windows can look at this.

Works fine in Firefox 15 on Linux.
Here is the crash report: bp-429c44b5-4a93-439e-9dd2-c7d452120906.
Frame 	Module 	Signature 	Source
0 	xul.dll 	FastConvertYUVToRGB32Row_SSE 	gfx/ycbcr/yuv_row_win.cpp:32
1 	xul.dll 	mozilla::gfx::ConvertYCbCrToRGB32 	gfx/ycbcr/yuv_convert.cpp:96
2 	xul.dll 	mozilla::layers::PlanarYCbCrImage::GetAsSurface 	gfx/layers/ImageContainer.cpp:495

It's similar to bug 672870 and bug 750190.
Severity: normal → critical
Crash Signature: [@ FastConvertYUVToRGB32Row_SSE]
Component: General → Graphics: Layers
Keywords: crash, reproducible
Product: Firefox → Core
bp-cebbaab8-6bf6-42e0-9e3a-a63372120907  : Firefox15.0.1
bp-d152db83-b687-426a-89eb-a45c62120907  : Firefox16beta
bp-fa4ee1ec-38f5-4c86-882d-d45a32120907  : Aurora17.0a2
bp-4c086efa-376a-4ad4-9a51-68c1a2120907  : Nightly18.0a1

Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/305cd10b57d2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/15.0 Firefox/15.0a1 ID:20120601055003
Crash:
http://hg.mozilla.org/mozilla-central/rev/12ab69851e05
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/15.0 Firefox/15.0a1 ID:20120601073103
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=305cd10b57d2&tochange=12ab69851e05



Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/202d63933120
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/15.0 Firefox/15.0a1 ID:20120531112503
Crash:
http://hg.mozilla.org/integration/mozilla-inbound/rev/07cec35eee23
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/15.0 Firefox/15.0a1 ID:20120531202501
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=202d63933120&tochange=07cec35eee23

In local build:
Last good: d9ac6a5b9aea
First Bad: 85a3f4b90adc

Triggered by: 
85a3f4b90adc	Andreas Gal — Bug 714408 Part 2 - Media plugin support for libstagefright - r=doublec
Thanks alice thats a big help!
Blocks: 714408
Keywords: regression
Version: Trunk → 15 Branch
FWIW, in a provided recent Debug Nightly you get following Assertion at Crashtime:

###!!! ASSERTION: must have binding parent when in native anonymous subtree with a parent node: '!IsInNativeAnonymousSubtree() ||GetBindingParent() || !GetParent()', file e:\builds\moz2_slave\m-cen-w32-dbg\build\obj-firefox\dist\include\nsIContent.h, line 230
Gal this regression was introduced by Bug 714408. Any though on this?
(In reply to Chris Lord [:cwiiis] from comment #0)
> Visiting
> http://people.mozilla.com/~bgirard/cleopatra/?zippedProfile=profiles/sps-
> profile-1346944481.42.zip&videoCapture=videos/video-1346944481.42.webm in
> Firefox Nightly in Windows causes the browser to crash, usually, or
> sometimes become very unresponsive and use lots of memory (possibly the lack
> of responsiveness is down to paging?)

Chris/Benoit - is this critical for your testing or do you think this may have significant user impact? We tracked this based upon the assumption that this was critical for testing, but want to double check.
This is a user facing bug. It just happens to reproduce on our testing site because it's a complex page. We don't know the trigger so it could happen anywhere on the web with a <video> tag.
(In reply to Benoit Girard (:BenWa) from comment #7)
> This is a user facing bug. It just happens to reproduce on our testing site
> because it's a complex page. We don't know the trigger so it could happen
> anywhere on the web with a <video> tag.

Please re-nominate if this starts becoming troublesome for internal testing. No need to track given the fact that this crash signature has a very low volume externally.
(In reply to Alex Keybl [:akeybl] from comment #8)
> (In reply to Benoit Girard (:BenWa) from comment #7)
> > This is a user facing bug. It just happens to reproduce on our testing site
> > because it's a complex page. We don't know the trigger so it could happen
> > anywhere on the web with a <video> tag.
> 
> Please re-nominate if this starts becoming troublesome for internal testing.
> No need to track given the fact that this crash signature has a very low
> volume externally.

Most of the time when this bug occurs the only symptom is that the browser becomes super unresponsive (to the point that usually my only resort is to kill it manually) -- I don't think we even have a crash signature for this bug. This could be happening all the time and we wouldn't even notice it... :/
Should also have mentioned at some point that this is 100% reproducible for me on Ubuntu 12.04 using the Aurora builds of Firefox (17)
This doesn't reproduce on mac. How bad is the 'stop responding' vs. crash? If we're not blocking on this because the crash stats are low it's not including the impact of the jank caused. This is a regression and can happen to our users on any site using video :(
This is a serious regression in web content.  I'm pretty disappointed that we let this slide for so long.  It might be already too late for 16 Beta, but we should definitely get this fixed in 17 at least.
Component: Graphics: Layers → Video/Audio
Assignee: nobody → gal
Renaming title to be more accurate. It's not a bug with our testing tool but with the browser support. This crash doesn't happen in webkit.
Summary: Eideticker video profiling tool causes browser to stop responding (mostly) or crash → Video playback causes browser to stop responding (mostly) or crash in Cleopatra tool
(In reply to Ehsan Akhgari [:ehsan] from comment #12)
> This is a serious regression in web content.  I'm pretty disappointed that
> we let this slide for so long.  It might be already too late for 16 Beta,
> but we should definitely get this fixed in 17 at least.

FF15 is marked as affected, but we haven't heard about this in the wild. Why are you concerned about this issue specifically?
(In reply to comment #14)
> (In reply to Ehsan Akhgari [:ehsan] from comment #12)
> > This is a serious regression in web content.  I'm pretty disappointed that
> > we let this slide for so long.  It might be already too late for 16 Beta,
> > but we should definitely get this fixed in 17 at least.
> 
> FF15 is marked as affected, but we haven't heard about this in the wild. Why
> are you concerned about this issue specifically?

Because it's a regression in web content, and a pretty serious one.
Firefox 15 doesn't appear to be affected. Firefox beta is affected.
(In reply to Benoit Girard (:BenWa) from comment #16)
> Firefox 15 doesn't appear to be affected. Firefox beta is affected.

This changes things. Can anybody speak to the prevalence of this issue? I'm hearing we can't go by crash stats since this is a hang, so we need a way to reason about the severity at this point.
With STR of comment 0, Firefox no longer crashes whatever the version.
With STR of bug 795892, Firefox crashes in 15.0.1 and above.
QA Contact: jbecerra
(In reply to Scoobidiver from comment #18)
> With STR of comment 0, Firefox no longer crashes whatever the version.
> With STR of bug 795892, Firefox crashes in 15.0.1 and above.

Confirming this with Windows 7 SP1 32-bit using Firefox 15.0.1 and 17.0a2 2012-10-02. Note that to reproduce the crash all I have to do is load the following URL;
http://people.mozilla.com/~bgirard/cleopatra/?zippedProfile=profiles/sps-profile-1348512228.12.zip&videoCapture=videos/video-1348512228.11.webm
I've been playing around with several YouTube WebM videos and have not experienced a crash yet with Firefox 17.0a2 2012-10-02. I'm not sure of any other websites which utilize WebM video and YouTube is probably the most common use case. It's possible this crash is tied to the specific usecase provided by Cleopatra, but I'm not sure what else to test to prove that claim.

That said, I'm not inclined to call this a Firefox 16 blocker (at least not yet). Besides, if it was a critical issue in Firefox 15.0.1, we would have heard about it earlier through our usual feedback channels.
Not sure why I thought Andreas owned bug 714408!
Assignee: gal → chris.double
Well cleopatra does more advanced usage of the video element. We use CORS and paint it a canvas. The crash report could be explained by trying to paint the video to a canvas. Someone would have to try this and I don't have a windows environment I can ramp it without stopping my current work (dual boot).
(In reply to Ehsan Akhgari [:ehsan] from comment #21)
> Not sure why I thought Andreas owned bug 714408!

Because part 2 was committed with Gal as the owner.
(In reply to Benoit Girard (:BenWa) from comment #22)
> The crash report could be explained by trying to paint the video to a canvas.

Is this a common usecase? Is there any known live sites which do this? 

I believe Alex's ask was figuring out how prevalent this crash is in the wild (ie. how easily a common set of users might hit it in Firefox 16's release and the likelihood that we'd need to chemspill a 16.0.1 for it). QA's exploratory testing is supposed to address this immediate concern and it's not clear to me how or where we can exercise "paint video to canvas".

Any advice on this front?
(In reply to comment #24)
> (In reply to Benoit Girard (:BenWa) from comment #22)
> > The crash report could be explained by trying to paint the video to a canvas.
> 
> Is this a common usecase? Is there any known live sites which do this? 
> 
> I believe Alex's ask was figuring out how prevalent this crash is in the wild
> (ie. how easily a common set of users might hit it in Firefox 16's release and
> the likelihood that we'd need to chemspill a 16.0.1 for it). QA's exploratory
> testing is supposed to address this immediate concern and it's not clear to me
> how or where we can exercise "paint video to canvas".
> 
> Any advice on this front?

As I've mentioned in a private email thread, I think it's best for a developer who knows this code to look into the bug as the regressing patch was not supposed to change behavior on Windows.  We don't need to guess here, really.
(In reply to Benoit Girard (:BenWa) from comment #22)
> Well cleopatra does more advanced usage of the video element. We use CORS
> and paint it a canvas. The crash report could be explained by trying to
> paint the video to a canvas. Someone would have to try this and I don't have
> a windows environment I can ramp it without stopping my current work (dual
> boot).

I can reproduce this semi-reliably just by clicking on points on the graph on the eideticker dashboard (e.g. http://eideticker.wrla.ch/#/lg-g2x/taskjs-scrolling/checkerboard) and playing back the videos that appear. I don't think this is specific to CORS or the canvas, although maybe something about that triggers whatever's wrong here reliably.
(In reply to comment #26)
> (In reply to Benoit Girard (:BenWa) from comment #22)
> > Well cleopatra does more advanced usage of the video element. We use CORS
> > and paint it a canvas. The crash report could be explained by trying to
> > paint the video to a canvas. Someone would have to try this and I don't have
> > a windows environment I can ramp it without stopping my current work (dual
> > boot).
> 
> I can reproduce this semi-reliably just by clicking on points on the graph on
> the eideticker dashboard (e.g.
> http://eideticker.wrla.ch/#/lg-g2x/taskjs-scrolling/checkerboard) and playing
> back the videos that appear. I don't think this is specific to CORS or the
> canvas, although maybe something about that triggers whatever's wrong here
> reliably.

Hmm, so maybe this is a better test case for whoever wants to investigate this bug.
(In reply to Ehsan Akhgari [:ehsan] from comment #25)
> > 
> > Any advice on this front?
> 
> As I've mentioned in a private email thread, I think it's best for a
> developer who knows this code to look into the bug as the regressing patch
> was not supposed to change behavior on Windows.  We don't need to guess
> here, really.

I agree here that this is the best thing to do.

Equally useful is to reduce the test case. The eideticker page is probably easier to reduce then cleopatra. If we can find the simplest HTML page that reproduces the issues then we can find the trigger and discuss how common it might be.
The eideticker page does not work for me, probably because the videos display "Video format or MIME type is not supported".
Nightly crashes for me on Win7 Nightly playing this video:
http://eideticker.wrla.ch/videos/video-1348512228.11.webm
(In reply to Chris Pearce (:cpearce) from comment #30)
> Nightly crashes for me on Win7 Nightly playing this video:
> http://eideticker.wrla.ch/videos/video-1348512228.11.webm

Same here on Windows 7 32-bit with Firefox 17.0a2 2012-10-02.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #31)
> (In reply to Chris Pearce (:cpearce) from comment #30)
> > Nightly crashes for me on Win7 Nightly playing this video:
> > http://eideticker.wrla.ch/videos/video-1348512228.11.webm
> 
> Same here on Windows 7 32-bit with Firefox 17.0a2 2012-10-02.

And also Firefox 15.0.1.
This looks like a stack smashing bug, marking security sensitive for now.
Group: core-security
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #32)
> (In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #31)
> > (In reply to Chris Pearce (:cpearce) from comment #30)
> > > Nightly crashes for me on Win7 Nightly playing this video:
> > > http://eideticker.wrla.ch/videos/video-1348512228.11.webm
> > 
> > Same here on Windows 7 32-bit with Firefox 17.0a2 2012-10-02.
> 
> And also Firefox 15.0.1.

Sounds like we're back to FF15 being affected? If so, this is not a recent regression (and obviously less critical given the lack of user outcry post-release).
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #32)
> (In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #31)
> > (In reply to Chris Pearce (:cpearce) from comment #30)
> > > Nightly crashes for me on Win7 Nightly playing this video:
> > > http://eideticker.wrla.ch/videos/video-1348512228.11.webm
> > 
> > Same here on Windows 7 32-bit with Firefox 17.0a2 2012-10-02.
> 
> And also Firefox 15.0.1.

On this testcase I crash in nightly 2012-05-31-03-05-17 but not in 2012-05-30-13-20-0, which matches the regression range reported above.

I'll build Gal's changeset (85a3f4b90adc) to get a stack to confirm it's the same crash, and not one of the other YCbCr->RGB bugs we've fixed recently.
I've gotten better call stacks, right before we crash.

We crash when the decoder state machine thread releases a video frame:

xul.dll!mozilla::layers::Image::Release()  Line 66 + 0x4f bytes	C++
xul.dll!VideoData::~VideoData()  Line 137 + 0x10 bytes	C++
xul.dll!nsAutoPtr<VideoData>::assign(VideoData * newPtr)  Line 38 + 0xb bytes	C++
xul.dll!nsBuiltinDecoderStateMachine::AdvanceFrame()  Line 2230	C++
xul.dll!nsBuiltinDecoderStateMachine::RunStateMachine()  Line 2027	C++
xul.dll!nsBuiltinDecoderStateMachine::CallRunStateMachine()  Line 2487	C++
xul.dll!nsBuiltinDecoderStateMachine::TimeoutExpired()  Line 2518	C++
xul.dll!nsTimerImpl::Fire()  Line 473 + 0x6 bytes	C++
xul.dll!nsTimerEvent::Run()  Line 558	C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result)  Line 612 + 0xe bytes	C++
xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait)  Line 220 + 0xd bytes	C++

...while the main thread is still trying to use that image:

xul.dll!mozilla::layers::ImageContainer::LockCurrentAsSurface(nsIntSize * aSize, mozilla::layers::Image * * aCurrentImage)  Line 287 + 0x12 bytes	C++
xul.dll!mozilla::layers::AutoLockImage::AutoLockImage(mozilla::layers::ImageContainer * aContainer, gfxASurface * * aSurface)  Line 560 + 0x1b bytes	C++
xul.dll!mozilla::layers::BasicImageLayer::GetAndPaintCurrentImage(gfxContext * aContext, float aOpacity, mozilla::layers::Layer * aMaskLayer)  Line 88	C++
xul.dll!mozilla::layers::BasicImageLayer::Paint(gfxContext * aContext, mozilla::layers::Layer * aMaskLayer)  Line 73 + 0x22 bytes	C++
xul.dll!mozilla::layers::BasicLayerManager::PaintSelfOrChildren(mozilla::layers::PaintContext & aPaintContext, gfxContext * aGroupTarget)  Line 829	C++
xul.dll!mozilla::layers::BasicLayerManager::PaintLayer(gfxContext * aTarget, mozilla::layers::Layer * aLayer, void (mozilla::layers::ThebesLayer *, gfxContext *, const nsIntRegion &, const nsIntRegion &, void *)* aCallback, void * aCallbackData, mozilla::layers::ReadbackProcessor * aReadback)  Line 946	C++
xul.dll!mozilla::layers::BasicLayerManager::PaintSelfOrChildren(mozilla::layers::PaintContext & aPaintContext, gfxContext * aGroupTarget)  Line 841	C++
xul.dll!mozilla::layers::BasicLayerManager::PaintLayer(gfxContext * aTarget, mozilla::layers::Layer * aLayer, void (mozilla::layers::ThebesLayer *, gfxContext *, const nsIntRegion &, const nsIntRegion &, void *)* aCallback, void * aCallbackData, mozilla::layers::ReadbackProcessor * aReadback)  Line 946	C++
xul.dll!mozilla::layers::BasicLayerManager::EndTransactionInternal(void (mozilla::layers::ThebesLayer *, gfxContext *, const nsIntRegion &, const nsIntRegion &, void *)* aCallback, void * aCallbackData, mozilla::layers::LayerManager::EndTransactionFlags aFlags)  Line 589	C++
xul.dll!nsDisplayList::PaintForFrame(nsDisplayListBuilder * aBuilder, nsRenderingContext * aCtx, nsIFrame * aForFrame, unsigned int aFlags)  Line 1069	C++
xul.dll!nsDisplayList::PaintRoot(nsDisplayListBuilder * aBuilder, nsRenderingContext * aCtx, unsigned int aFlags)  Line 957	C++
xul.dll!nsLayoutUtils::PaintFrame(nsRenderingContext * aRenderingContext, nsIFrame * aFrame, const nsRegion & aDirtyRegion, unsigned int aBackstop, unsigned int aFlags)  Line 1749	C++
xul.dll!PresShell::RenderDocument(const nsRect & aRect, unsigned int aFlags, unsigned int aBackgroundColor, gfxContext * aThebesContext)  Line 4361 + 0x20 bytes	C++
xul.dll!nsCanvasRenderingContext2DAzure::DrawWindow(nsIDOMWindow * window, double x, double y, double w, double h, const nsAString_internal & bgColor, unsigned int flags, mozilla::ErrorResult & error)  Line 3980	C++
xul.dll!mozilla::dom::CanvasRenderingContext2DBinding::drawWindow(JSContext * cx, JS::Handle<JSObject *> obj, nsCanvasRenderingContext2DAzure * self, unsigned int argc, JS::Value * vp)  Line 1876	C++
xul.dll!mozilla::dom::CanvasRenderingContext2DBinding::genericMethod(JSContext * cx, unsigned int argc, JS::Value * vp)  Line 2577 + 0x14 bytes	C++
mozjs.dll!js::CallJSNative(JSContext * cx, int (JSContext *, unsigned int, JS::Value *)* native, const JS::CallArgs & args)  Line 370 + 0xf bytes	C++
mozjs.dll!js::InvokeKernel(JSContext * cx, JS::CallArgs args, js::MaybeConstruct construct)  Line 367 + 0x15 bytes	C++
mozjs.dll!js::Interpret(JSContext * cx, js::StackFrame * entryFrame, js::InterpMode interpMode)  Line 2458 + 0x27 bytes	C++
mozjs.dll!js::RunScript(JSContext * cx, JS::Handle<JSScript *> script, js::StackFrame * fp)  Line 324 + 0x7 bytes	C++
mozjs.dll!js::InvokeKernel(JSContext * cx, JS::CallArgs args, js::MaybeConstruct construct)  Line 381	C++
mozjs.dll!js::Invoke(JSContext * cx, const JS::Value & thisv, const JS::Value & fval, unsigned int argc, JS::Value * argv, JS::Value * rval)  Line 411 + 0x21 bytes	C++
mozjs.dll!js::IndirectProxyHandler::call(JSContext * cx, JSObject * proxy, unsigned int argc, JS::Value * vp)  Line 450 + 0x3b bytes	C++
mozjs.dll!js::CrossCompartmentWrapper::call(JSContext * cx, JSObject * wrapper_, unsigned int argc, JS::Value * vp)  Line 722 + 0x12 bytes	C++
mozjs.dll!proxy_Call(JSContext * cx, unsigned int argc, JS::Value * vp)  Line 2999 + 0x23 bytes	C++
mozjs.dll!js::CallJSNative(JSContext * cx, int (JSContext *, unsigned int, JS::Value *)* native, const JS::CallArgs & args)  Line 370 + 0xf bytes	C++
mozjs.dll!js::InvokeKernel(JSContext * cx, JS::CallArgs args, js::MaybeConstruct construct)  Line 360 + 0xb bytes	C++
mozjs.dll!js::Interpret(JSContext * cx, js::StackFrame * entryFrame, js::InterpMode interpMode)  Line 2458 + 0x27 bytes	C++
mozjs.dll!js::RunScript(JSContext * cx, JS::Handle<JSScript *> script, js::StackFrame * fp)  Line 324 + 0x7 bytes	C++
mozjs.dll!js::InvokeKernel(JSContext * cx, JS::CallArgs args, js::MaybeConstruct construct)  Line 381	C++
mozjs.dll!js::CallOrConstructBoundFunction(JSContext * cx, unsigned int argc, JS::Value * vp)  Line 1086 + 0x25 bytes	C++
mozjs.dll!js::CallJSNative(JSContext * cx, int (JSContext *, unsigned int, JS::Value *)* native, const JS::CallArgs & args)  Line 370 + 0xf bytes	C++
mozjs.dll!js::InvokeKernel(JSContext * cx, JS::CallArgs args, js::MaybeConstruct construct)  Line 367 + 0x15 bytes	C++
mozjs.dll!js::Invoke(JSContext * cx, const JS::Value & thisv, const JS::Value & fval, unsigned int argc, JS::Value * argv, JS::Value * rval)  Line 411 + 0x21 bytes	C++
mozjs.dll!JS_CallFunctionValue(JSContext * cx, JSObject * objArg, JS::Value fval, unsigned int argc, JS::Value * argv, JS::Value * rval)  Line 5860 + 0x24 bytes	C++
xul.dll!nsJSContext::CallEventHandler(nsISupports * aTarget, JSObject * aScope, JSObject * aHandler, nsIArray * aargv, nsIVariant * * arv)  Line 1908 + 0x2e bytes	C++
xul.dll!nsGlobalWindow::RunTimeoutHandler(nsTimeout * aTimeout, nsIScriptContext * aScx)  Line 9630 + 0x49 bytes	C++
xul.dll!nsGlobalWindow::RunTimeout(nsTimeout * aTimeout)  Line 9881	C++
xul.dll!nsGlobalWindow::TimerCallback(nsITimer * aTimer, void * aClosure)  Line 10147	C++
xul.dll!nsTimerImpl::Fire()  Line 473 + 0x6 bytes	C++
xul.dll!nsTimerEvent::Run()  Line 558	C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result)  Line 612 + 0xe bytes	C++
xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait)  Line 220 + 0xd bytes	C++
Back to wontfix for FF16, since this bug is not a regression from FF15 and wasn't a critical issue for our users over the last cycle.
I think this has been broken for longer than the regression range suggests, and the changes in the suspected patch changed the allocation patterns/memory layout enough to resurface the crash.  That would explain the much earlier crashes in this code we've seen in bug 672870 and bug 750190.

In the row processing loop in ConvertYCbCrToRGB32, when y == 588, v_ptr points at the end of the fused buffer allocation containing the Y, U, and V planes.  Depending on memory layout, we get either a silent failure or a read fault when FastConvertYUVToRGB32Row_SSE accesses v_ptr.

u_ptr and v_ptr are calculated by halving y, so when y is 588, the u/v index is 294 for a buffer with valid indices in the range 0..293.
reproducible on WinXP, Windows 7 and OSX with http://eideticker.wrla.ch/videos/video-1348512228.11.webm
OS: Windows 7 → All
Hardware: x86 → All
Reassigning.
Assignee: chris.double → kinetik
Status: NEW → ASSIGNED
Attached patch patch v0 (obsolete) — Splinter Review
It looks like we've been handling odd sized WebM videos incorrectly since inception (see bug 566245 comment 8).  The current decoder examples show the correct way to do this (round up, not down), but they were only updated in late 2011: http://git.chromium.org/gitweb/?p=webm/libvpx.git;a=commitdiff;h=6aea309a93d31d775642c6d2b66bc51db6adcbea

Attached patch is Valgrind-clean (except for bug 798989) on x86_64 playing http://eideticker.wrla.ch/videos/video-1348512228.11.webm and a single white 9x5 frame video I'll add a crashtest for.  Try push: https://tbpl.mozilla.org/?tree=Try&rev=02e39ed967db

This *should* explain the bulk of the crashes in this code going back to Firefox 13 or so--if I understand correctly, we started copying the frames into our own buffer rather than converting them directly  out of the libvpx frame buffer in bug 715785.  Prior to that, we used the incorrectly calculated Cb/Cr dimensions but read memory that libvpx had intended us to read anyway.

Disappointingly, this doesn't explain bug 672870 nor bug 750190, neither of which I can reproduce in a current nightly.
Attachment #669013 - Flags: review?
Attachment #669013 - Flags: review? → review?(tterribe)
Comment on attachment 669013 [details] [diff] [review]
patch v0

Review of attachment 669013 [details] [diff] [review]:
-----------------------------------------------------------------

Should we add some ASSERTs to the code that does the copying to guarantee things like this aren't happening elsewhere?
Attachment #669013 - Flags: review?(tterribe) → review+
I added an assert and a crashtest.  The crashtest only fails 1/10 times on Linux (despite trying to engineer a file that would cause this as often as possible), but it'll fail every time on Windows.  With the assert in place, it fails on that, and with the fix in place the test passes.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9fd563bfcf50
It's not clear if this bug needs to be core-security, but there is some exposure: some of the heap following the video frame allocation may leak into the decoded (RGB) video frame via the Cr channel.  Whether it'd still be useful after colour conversion and quantization, I'm not sure.
Target Milestone: --- → mozilla19
https://hg.mozilla.org/mozilla-central/rev/9fd563bfcf50
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Please nominate for uplift with a risk assessment to help determine if we can take this on branches.
Attached patch patch v1Splinter Review
This is what I checked in, which is patch v0 plus the crashtest and assertion.
Attachment #669013 - Attachment is obsolete: true
Comment on attachment 672098 [details] [diff] [review]
patch v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug has existed since the feature was added in bug 566245, but I believe the out-of-bounds reads only started when the buffer copy was changed in bug 715785
User impact if declined: playing odd-sized WebM videos may crash browser, possible heap info/security leak
Testing completed (on m-c, etc.): tested locally, landed on m-c, crashtest and assertion include
Risk to taking this patch (and alternatives if risky): low, highest risk is that assertion results in additional crashes (but, if so, it has converted the potential info/security leak into a safe crash)
String or UUID changes made by this patch: none
Attachment #672098 - Flags: approval-mozilla-beta?
Attachment #672098 - Flags: approval-mozilla-aurora?
Comment on attachment 672098 [details] [diff] [review]
patch v1

Thank you, please go ahead with uplift.
Attachment #672098 - Flags: approval-mozilla-beta?
Attachment #672098 - Flags: approval-mozilla-beta+
Attachment #672098 - Flags: approval-mozilla-aurora?
Attachment #672098 - Flags: approval-mozilla-aurora+
Adding Qawanted: once this lands on Beta we should get some testing on differently sized WebM videos to see if any crashes can be reproduced and to confirm whether they are indeed now safe.
Keywords: qawanted, verifyme
When testing, please make sure you test with HW acceleration disabled so that this code is used for all playback, otherwise you'll hit the HWA path mostly and this code will only run when the tab thumbnails are generated once a second or so.

https://hg.mozilla.org/releases/mozilla-aurora/rev/d6dae702f67e
https://hg.mozilla.org/releases/mozilla-beta/rev/fe7f775dc2d7
Juan, can you please do some testing with this tomorrow in the 17.0b2 candidate builds?
Keywords: qawanted
I'm seeing hangs on my copy of firefox aurora with the videos chris lord posted today on his blog:

http://chrislord.net/blog/Software/Mozilla/progressive-tile-rendering.enlighten

Might make a good alternate test case.
Hangs without a crash could be bug 791344; the testcases in this bug tended to trigger that bug as well.  I'll nominate the patch in that bug for uplift too.
Actually, bug 791344 is on aurora already, so ignore my previous comment.  You could hit that issue on beta, so I've nominated the patch for uplift to beta to aid QA.
(In reply to Alex Keybl [:akeybl] from comment #14)
> (In reply to Ehsan Akhgari [:ehsan] from comment #12)
> > This is a serious regression in web content.  I'm pretty disappointed that
> > we let this slide for so long.  It might be already too late for 16 Beta,
> > but we should definitely get this fixed in 17 at least.
> 
> FF15 is marked as affected, but we haven't heard about this in the wild. Why
> are you concerned about this issue specifically?

Is my blog in-the-wild enough to consider fixing this in 16? (I accidentally came across this, I exclusively use Nightly and so didn't notice)
Verified on Nightly and Aurora. The fix on Beta didn't make it before the 17b2 was cut, so beta 2 is still crashing using the examples in comment #19 and comment #39 taking into account comment #51. We'll verify on Beta 3 candidates next week.
Is bug 714408 the regressing bug, or merely tickling the security issue? Want to find out if we can/should take a fix for the ESR10 branch.
I'm reasonably sure this can only be hit since bug 714408.
Whiteboard: [adv-track-main17+]
Group: core-security
mass remove verifyme requests greater than 4 months old
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: