Closed Bug 703135 Opened 13 years ago Closed 11 years ago

Crash in oc_huff_tree_collapse()


(Core :: Audio/Video, defect)

Windows 8
Not set





(Reporter: gps, Assigned: cpearce)



(Keywords: crash, dogfood, Whiteboard: [metro-mvp])


(7 files, 1 obsolete file)

I built Nightly on Windows 8 and Visual Studio 2011 and was treated to a crash a few seconds after application start.

Unhandled exception at 0x7444A42A (xul.dll) in firefox.exe: 0xC00000005: Access violation writing location 08BBC1BF0F.

The top of the stack is huffdec.c:346:oc_huff_tree_collapse().

It is failing on the "if(_tree!=NULL)_tree[node[l]++]=(ogg_int16_t)nbits;" line.

First few frames are:

Interestingly, the memory address of _tree is 0x052fc0b0 and node is 0x0a67f55c. This is pretty far away from where the write exception occurred, 0x08BBC1BF0F.

The disassembly reveals the error occurred at instruction:

 inc word ptr ds:[8BC1BF0Fh]

which is the first instruction after the annotation for the "if(_tree!=NULL)..." line.

Steps to reproduce:

1) Build Firefox on Windows 8 and Visual Studio 2011 (using vanilla .mozconfig)
2) Load (default page on empty Nightly profile).

I have a Visual Studio crash dump file and a VirtualBox snapshot that reliably produces the bug.

This could be a MSVC2011 and/or Windows 8 bug. Both are developer previews, after all.
Summary: oc_huff_tree_collapse → Crash in oc_huff_tree_collapse()
Wow. Can you attach the full disassembly of oc_huff_tree_collapse so we can see the context?
Crash is at 7444A42A.

EAX = 00000100 EBX = 00000000 ECX = 00000000 EDX = 00000008 ESI = 00000000 EDI = 00000101 EIP = 7444A42A ESP = 0A67F4D0 EBP = 0A67F55C EFL = 00010202 

+		_tree	0x052fc0b0 {27792}	short *
+		_tokens	0x0a67f5c8 {4 '\x4', 4 '\x4'}	unsigned char[2] *
		_ntokens	92	int
+		node	0x0a67f55c {0, 295, 321, 2663, 0, 0, 25036, 29764, -2012, 2663, 32, 0, 91, 0, 0, -4096, 4, 0, -24046, ...}	short[34]
+		depth	0x0a67f514 ""	unsigned char[34]
+		last	0x0a67f538 "[IF"	unsigned char[34]

>	xul.dll!oc_huff_tree_collapse(short * _tree, unsigned char[2] * _tokens, int _ntokens) Line 346	C
 	xul.dll!oc_huff_trees_unpack(oc_pack_buf * _opb, short * * _nodes) Line 403	C
 	xul.dll!oc_dec_headerin(oc_pack_buf * _opb, th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 214	C
 	xul.dll!th_decode_headerin(th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 242	C
 	xul.dll!nsTheoraState::DecodeHeader(ogg_packet * aPacket) Line 267	C++
 	xul.dll!nsOggReader::ReadHeaders(nsOggCodecState * aState) Line 163	C++
 	xul.dll!nsOggReader::ReadMetadata(nsVideoInfo * aInfo) Line 248	C++
 	xul.dll!nsBuiltinDecoderStateMachine::DecodeMetadata() Line 1370	C++
 	xul.dll!nsBuiltinDecoderStateMachine::DecodeThreadRun() Line 428	C++
 	xul.dll!nsRunnableMethodImpl<void (__thiscall nsMediaChannelStream::*)(void),0>::Run() Line 346	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 631	C++
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait) Line 245	C++
 	xul.dll!nsThread::ThreadFunc(void * arg) Line 272	C++
 	nspr4.dll!_PR_NativeRunThread(void * arg) Line 448	C
 	nspr4.dll!pr_root(void * arg) Line 122	C
 	msvcr110.dll!_callthreadstartex() Line 308	C
 	msvcr110.dll!_threadstartex(void * ptd) Line 286	C
 	kernel32.dll!75629391()	Unknown
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!77ed31b0()	Unknown
 	ntdll.dll!77ed3183()	Unknown
The crashing instruction
   inc         word ptr ds:[8BC1BF0Fh]  
is an increment to a hard-coded constant address (qualified with the ds segment register). I am not at all surprised that this causes a crash, but I have no idea where the compiler is getting this from, as there are no increments of a global variable anywhere in this code.
FWIW, the value at that DS address is 2344730383.

When stepping through things in the debugger, I noticed that the current line position was not progressing linearly. Starting at the nbits=oc_huff_tree_collapse_depth(...) line (343), it went:

 343, 345, 344, 345, 344, 345, 346, crash

Looking at the disassembly annotations, it does appear that lines are being evaluated out of order. I guess the optimizer was a little too aggressive? MSVC 2011 bug? Speaking of disassembly source annotations, they are different now! The assembly still looks the same, but the attributions of source lines are off by a few instructions! And, no, the binary didn't change. One would think Visual Studio would be deterministic here. Most interesting.

You want me to perform a debug build (without optimizations)?
Ehsan: you got m-c building on MSVC 2011 on Windows 7, right? Could you see if you experience this bug with a regular/non-debug build? If not, I'd be interested in seeing what your assembly looks like for the infringing function.
Severity: normal → critical
Keywords: crash
Gregory: worth trying to reproduce with a less-optimized build. That would confirm it's an optimizer bug.
I built with --enable-debug --no-optimize and the crash went away. Disassembly attached. The same revision of the source code was used.

I think we have an optimizer bug!
I meant --disable-optimize, duh. Sorry for the bugspam.
Great! So can we report it?
This is showing up on our vs11 builders as well. Appears to be an issue with the use of jemalloc.
Blocks: 74734
Blocks: elm-merge
No longer blocks: 74734
Attached file test case
This is rather strange. In debugging I added a simple print statement to the end of oc_huff_tree_collapse to dump ntree when _tree is NULL. With this addition the crash goes away. Commenting out the print statement brings is back. Some here commented that they thought this might be a compiler bug which seemed somewhat unlikely to me, but with behavior like this I'm wondering if a compiler issue might be the case.
Does the offending inc ds:[-data ptr-] instruction go away with the printf?
Attached patch patch (obsolete) — Splinter Review
This quirky little patch seems to address the problem.
Attached patch patchSplinter Review
minus the makefile change.
Attachment #635417 - Attachment is obsolete: true
We could land something like this or just wait to test the final I suppose. We landed another similar quirk in bug 761279.

I'm going to hold off until our new vs2012RC elm builders are up and running. Then do some additional testing.
(In reply to Jim Mathies [:jimm] from comment #16)
> Created attachment 635418 [details] [diff] [review]
> patch
> minus the makefile change.

This didn't fix the problem on elm, although it did solve it in a local build. Which means this is going to be tricky to track down.
Lat week I landed a few changes - I landed the patch here, I disabled jemalloc, and I disabled the first run start screen with the webm video. Today when I tested the latest builds by loading the video manually there's no crash.

elm builds -

So I'm going enable the start screen again to see if the crash comes back. (When I first tested the jemalloc and patch changes, it didn't address the problem.)

One other point, I don't think we are doing full clobber builds on elm currently which I think needs to change. If that's the case I can't be certain the changes I'm making are getting incorporated into new builds.
Whiteboard: completed-elm
This appears to be fixed in the final release of vc 2012.
Closed: 12 years ago
Resolution: --- → FIXED
Deeply saddened, but this is still happening on elm (pull from 2 days ago) with vs2012 on a samsung slate with win8 (release win8 not dev preview). 

Annoying, this means that nightly crashes promptly on startup, because the first page it tries to run is which contains a video. 

my stack trace is:

    >       gkmedias.dll!oc_huff_tree_collapse(short * _tree, unsigned char[2] * _tokens, int _ntokens) Line 346   C
            gkmedias.dll!oc_huff_trees_unpack(oc_pack_buf * _opb, short * * _nodes) Line 403       C
            gkmedias.dll!oc_dec_headerin(oc_pack_buf * _opb, th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 214       C
            gkmedias.dll!th_decode_headerin(th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 242        C
            xul.dll!nsTheoraState::DecodeHeader(ogg_packet * aPacket) Line 312     C++
            xul.dll!nsOggReader::ReadHeaders(nsOggCodecState * aState) Line 142    C++
            xul.dll!nsOggReader::ReadMetadata(nsVideoInfo * aInfo, nsDataHashtable<nsCStringHashKey,nsCString> * * aTags) Line 258 C++
            xul.dll!nsBuiltinDecoderStateMachine::DecodeMetadata() Line 1733       C++
            xul.dll!nsBuiltinDecoderStateMachine::DecodeThreadRun() Line 490       C++
            xul.dll!nsRunnableMethodImpl<enum tag_nsresult (__thiscall mozilla::places::`anonymous namespace'::VisitedQuery::*)(void),1>::Run() Line 351   C++
            xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 627       C++
            xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait) Line 221       C++
            xul.dll!nsThread::ThreadFunc(void * arg) Line 265      C++
            nspr4.dll!_PR_NativeRunThread(void * arg) Line 417     C
            nspr4.dll!pr_root(void * arg) Line 90  C
            msvcr110.dll!_callthreadstartex() Line 354     C
            msvcr110.dll!_threadstartex(void * ptd) Line 332       C
            kernel32.dll!76f18543()        Unknown
            [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]     
            ntdll.dll!77efac69()   Unknown
            ntdll.dll!77efac3c()   Unknown
Resolution: FIXED → ---
Whiteboard: completed-elm → [metro-mvp]
(In reply to :Ally Naaktgeboren from comment #22)
> my stack trace is:
>     >       gkmedias.dll!oc_huff_tree_collapse(short * _tree, unsigned
> char[2] * _tokens, int _ntokens) Line 346   C

Does the disassembly here look the same as before?
No longer blocks: elm-merge
Keywords: dogfood
Attachment #635418 - Flags: checkin+
I'm hitting this in 100% of the time locally in content/media/test/test_loop.html mochitest, with VS2012/Win8/Opt builds on my Lenovo W530.

Here's the dissasembly. We're still crashing at the "inc         word ptr ds:[8BC1BF0Fh]" instruction.

As far as I can tell my VS2012 is fully updated to version "11.0.5110601 Update 1".
I reported against Visual Studio 2012 using the "Report a bug" feature of Visual Studio, it's filed here:
I uploaded a dump to the bug, and the issue has been "routed to the appropriate VS development team for investigation."
This crash is annoying me. Let's disable compiler optimizations in that function when building with MSVC2012 and later until such a time as a fix is released for MSVC2012.
Attachment #708895 - Flags: review?(tterribe)
Attachment #708895 - Flags: review?(tterribe) → review+
(In reply to Chris Pearce (:cpearce) from comment #27)
> Created attachment 708895 [details] [diff] [review]
> Patch: Disable optimize in oc_huff_tree_collapse() in MSVC2012

Landed this patch:
Assignee: nobody → cpearce
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
I've heard back from Microsoft in the support issue I filed. They suggest we disable the specific optimization that fails only:

Hi Chris,

Thank you for taking the time to provide your feedback, this indeed is a bug in our product. After reviewing your reported issue, in the context of all the issues reported to us, we are considering fixing this bug in one of our future releases.

In the meantime, could you please consider accepting this workaround until you get a product release from us with the fix?
-> As an alternative to disabling all optimizations for the affected function, disable just `/Oy` optimization for the affected function, `oc_huff_tree_collapse`, using the following directive:

#pragma optimize( "y", off ) // disable frame pointer omission
// ...
// function definition
// ...
#pragma optimize( "", on ) // reset optimizations to what was specified at the command line

If this issue is severe, causing critical business situations or blocking your product development or deployment, please go to or call 1-800-MICROSOFT for assistance.

For Microsoft premier customers, please contact your administrator, your Technical Account Manager, or your Microsoft premier account representative.

I am closing this MSConnect item. Feel free to respond if you need anything else.

Daniel Podder
VC++ Team
I don't have an windows build environment to verify the relaxed work-around, but here's a try push.
You need to log in before you can comment on or make changes to this bug.