Crash in oc_huff_tree_collapse()

RESOLVED FIXED in mozilla21

Status

()

--
critical
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: gps, Assigned: cpearce)

Tracking

({crash, dogfood})

Trunk
mozilla21
x86
Windows 8
crash, dogfood
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [metro-mvp])

Attachments

(7 attachments, 1 obsolete attachment)

(Reporter)

Description

7 years ago
I built Nightly on Windows 8 and Visual Studio 2011 and was treated to a crash a few seconds after application start.

Unhandled exception at 0x7444A42A (xul.dll) in firefox.exe: 0xC00000005: Access violation writing location 08BBC1BF0F.

The top of the stack is huffdec.c:346:oc_huff_tree_collapse().

It is failing on the "if(_tree!=NULL)_tree[node[l]++]=(ogg_int16_t)nbits;" line.

First few frames are:
oc_huff_tree_unpack()
oc_dec_headerin()
th_decode_headerin()
nsTheoraState::DecodeHeader()

Interestingly, the memory address of _tree is 0x052fc0b0 and node is 0x0a67f55c. This is pretty far away from where the write exception occurred, 0x08BBC1BF0F.

The disassembly reveals the error occurred at instruction:

 inc word ptr ds:[8BC1BF0Fh]

which is the first instruction after the annotation for the "if(_tree!=NULL)..." line.

Steps to reproduce:

1) Build Firefox on Windows 8 and Visual Studio 2011 (using vanilla .mozconfig)
2) Load https://www.mozilla.org/projects/firefox/prerelease.html (default page on empty Nightly profile).

I have a Visual Studio crash dump file and a VirtualBox snapshot that reliably produces the bug.

This could be a MSVC2011 and/or Windows 8 bug. Both are developer previews, after all.
(Reporter)

Updated

7 years ago
Summary: oc_huff_tree_collapse → Crash in oc_huff_tree_collapse()
Wow. Can you attach the full disassembly of oc_huff_tree_collapse so we can see the context?
(Reporter)

Comment 2

7 years ago
Created attachment 575028 [details]
Disassembly from crash location

Crash is at 7444A42A.

EAX = 00000100 EBX = 00000000 ECX = 00000000 EDX = 00000008 ESI = 00000000 EDI = 00000101 EIP = 7444A42A ESP = 0A67F4D0 EBP = 0A67F55C EFL = 00010202 


+		_tree	0x052fc0b0 {27792}	short *
+		_tokens	0x0a67f5c8 {4 '\x4', 4 '\x4'}	unsigned char[2] *
		_ntokens	92	int
+		node	0x0a67f55c {0, 295, 321, 2663, 0, 0, 25036, 29764, -2012, 2663, 32, 0, 91, 0, 0, -4096, 4, 0, -24046, ...}	short[34]
+		depth	0x0a67f514 ""	unsigned char[34]
+		last	0x0a67f538 "[IF"	unsigned char[34]


>	xul.dll!oc_huff_tree_collapse(short * _tree, unsigned char[2] * _tokens, int _ntokens) Line 346	C
 	xul.dll!oc_huff_trees_unpack(oc_pack_buf * _opb, short * * _nodes) Line 403	C
 	xul.dll!oc_dec_headerin(oc_pack_buf * _opb, th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 214	C
 	xul.dll!th_decode_headerin(th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 242	C
 	xul.dll!nsTheoraState::DecodeHeader(ogg_packet * aPacket) Line 267	C++
 	xul.dll!nsOggReader::ReadHeaders(nsOggCodecState * aState) Line 163	C++
 	xul.dll!nsOggReader::ReadMetadata(nsVideoInfo * aInfo) Line 248	C++
 	xul.dll!nsBuiltinDecoderStateMachine::DecodeMetadata() Line 1370	C++
 	xul.dll!nsBuiltinDecoderStateMachine::DecodeThreadRun() Line 428	C++
 	xul.dll!nsRunnableMethodImpl<void (__thiscall nsMediaChannelStream::*)(void),0>::Run() Line 346	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 631	C++
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait) Line 245	C++
 	xul.dll!nsThread::ThreadFunc(void * arg) Line 272	C++
 	nspr4.dll!_PR_NativeRunThread(void * arg) Line 448	C
 	nspr4.dll!pr_root(void * arg) Line 122	C
 	msvcr110.dll!_callthreadstartex() Line 308	C
 	msvcr110.dll!_threadstartex(void * ptd) Line 286	C
 	kernel32.dll!75629391()	Unknown
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!77ed31b0()	Unknown
 	ntdll.dll!77ed3183()	Unknown
The crashing instruction
   inc         word ptr ds:[8BC1BF0Fh]  
is an increment to a hard-coded constant address (qualified with the ds segment register). I am not at all surprised that this causes a crash, but I have no idea where the compiler is getting this from, as there are no increments of a global variable anywhere in this code.
(Reporter)

Comment 4

7 years ago
FWIW, the value at that DS address is 2344730383.

When stepping through things in the debugger, I noticed that the current line position was not progressing linearly. Starting at the nbits=oc_huff_tree_collapse_depth(...) line (343), it went:

 343, 345, 344, 345, 344, 345, 346, crash

Looking at the disassembly annotations, it does appear that lines are being evaluated out of order. I guess the optimizer was a little too aggressive? MSVC 2011 bug? Speaking of disassembly source annotations, they are different now! The assembly still looks the same, but the attributions of source lines are off by a few instructions! And, no, the binary didn't change. One would think Visual Studio would be deterministic here. Most interesting.

You want me to perform a debug build (without optimizations)?
(Reporter)

Comment 5

7 years ago
Ehsan: you got m-c building on MSVC 2011 on Windows 7, right? Could you see if you experience this bug with a regular/non-debug build? If not, I'd be interested in seeing what your assembly looks like for the infringing function.

Updated

7 years ago
Severity: normal → critical
Keywords: crash
Gregory: worth trying to reproduce with a less-optimized build. That would confirm it's an optimizer bug.
(Reporter)

Comment 7

7 years ago
Created attachment 576308 [details]
Disassembly from crash location, debug/non-optimized

I built with --enable-debug --no-optimize and the crash went away. Disassembly attached. The same revision of the source code was used.

I think we have an optimizer bug!
(Reporter)

Comment 8

7 years ago
I meant --disable-optimize, duh. Sorry for the bugspam.
Great! So can we report it?
This is showing up on our vs11 builders as well. Appears to be an issue with the use of jemalloc.
Blocks: 74734

Updated

7 years ago
Blocks: 747347
No longer blocks: 74734
This is rather strange. In debugging I added a simple print statement to the end of oc_huff_tree_collapse to dump ntree when _tree is NULL. With this addition the crash goes away. Commenting out the print statement brings is back. Some here commented that they thought this might be a compiler bug which seemed somewhat unlikely to me, but with behavior like this I'm wondering if a compiler issue might be the case.

Updated

7 years ago
Duplicate of this bug: 764039
Does the offending inc ds:[-data ptr-] instruction go away with the printf?
Created attachment 635417 [details] [diff] [review]
patch

This quirky little patch seems to address the problem.
Created attachment 635418 [details] [diff] [review]
patch

minus the makefile change.
Attachment #635417 - Attachment is obsolete: true
We could land something like this or just wait to test the final I suppose. We landed another similar quirk in bug 761279.

I'm going to hold off until our new vs2012RC elm builders are up and running. Then do some additional testing.
(In reply to Jim Mathies [:jimm] from comment #16)
> Created attachment 635418 [details] [diff] [review]
> patch
> 
> minus the makefile change.

This didn't fix the problem on elm, although it did solve it in a local build. Which means this is going to be tricky to track down.
Lat week I landed a few changes - I landed the patch here, I disabled jemalloc, and I disabled the first run start screen with the webm video. Today when I tested the latest builds by loading the video manually there's no crash.

elm builds - 
https://bugzilla.mozilla.org/show_bug.cgi?id=703135

So I'm going enable the start screen again to see if the crash comes back. (When I first tested the jemalloc and patch changes, it didn't address the problem.)

One other point, I don't think we are doing full clobber builds on elm currently which I think needs to change. If that's the case I can't be certain the changes I'm making are getting incorporated into new builds.

Updated

7 years ago
Whiteboard: completed-elm
This appears to be fixed in the final release of vc 2012.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Deeply saddened, but this is still happening on elm (pull from 2 days ago) with vs2012 on a samsung slate with win8 (release win8 not dev preview). 

Annoying, this means that nightly crashes promptly on startup, because the first page it tries to run is  www.mozilla.org/projects/firefox/prerelease.html which contains a video. 

my stack trace is:

    >       gkmedias.dll!oc_huff_tree_collapse(short * _tree, unsigned char[2] * _tokens, int _ntokens) Line 346   C
            gkmedias.dll!oc_huff_trees_unpack(oc_pack_buf * _opb, short * * _nodes) Line 403       C
            gkmedias.dll!oc_dec_headerin(oc_pack_buf * _opb, th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 214       C
            gkmedias.dll!th_decode_headerin(th_info * _info, th_comment * _tc, th_setup_info * * _setup, ogg_packet * _op) Line 242        C
            xul.dll!nsTheoraState::DecodeHeader(ogg_packet * aPacket) Line 312     C++
            xul.dll!nsOggReader::ReadHeaders(nsOggCodecState * aState) Line 142    C++
            xul.dll!nsOggReader::ReadMetadata(nsVideoInfo * aInfo, nsDataHashtable<nsCStringHashKey,nsCString> * * aTags) Line 258 C++
            xul.dll!nsBuiltinDecoderStateMachine::DecodeMetadata() Line 1733       C++
            xul.dll!nsBuiltinDecoderStateMachine::DecodeThreadRun() Line 490       C++
            xul.dll!nsRunnableMethodImpl<enum tag_nsresult (__thiscall mozilla::places::`anonymous namespace'::VisitedQuery::*)(void),1>::Run() Line 351   C++
            xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 627       C++
            xul.dll!NS_ProcessNextEvent_P(nsIThread * thread, bool mayWait) Line 221       C++
            xul.dll!nsThread::ThreadFunc(void * arg) Line 265      C++
            nspr4.dll!_PR_NativeRunThread(void * arg) Line 417     C
            nspr4.dll!pr_root(void * arg) Line 90  C
            msvcr110.dll!_callthreadstartex() Line 354     C
            msvcr110.dll!_threadstartex(void * ptd) Line 332       C
            kernel32.dll!76f18543()        Unknown
            [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]     
            ntdll.dll!77efac69()   Unknown
            ntdll.dll!77efac3c()   Unknown
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: completed-elm → [metro-mvp]
(In reply to :Ally Naaktgeboren from comment #22)
> my stack trace is:
>     >       gkmedias.dll!oc_huff_tree_collapse(short * _tree, unsigned
> char[2] * _tokens, int _ntokens) Line 346   C

Does the disassembly here look the same as before?

Updated

6 years ago
No longer blocks: 747347
Keywords: dogfood
Attachment #635418 - Flags: checkin+
(Assignee)

Comment 24

6 years ago
Created attachment 704687 [details]
Disassembly of oc_huff_tree_collapse() VS2012 enable-optimize, disable-debug

I'm hitting this in 100% of the time locally in content/media/test/test_loop.html mochitest, with VS2012/Win8/Opt builds on my Lenovo W530.

Here's the dissasembly. We're still crashing at the "inc         word ptr ds:[8BC1BF0Fh]" instruction.

As far as I can tell my VS2012 is fully updated to version "11.0.5110601 Update 1".
(Assignee)

Comment 25

6 years ago
I reported against Visual Studio 2012 using the "Report a bug" feature of Visual Studio, it's filed here:

https://connect.microsoft.com/VisualStudio/feedback/details/777087/c-compiler-optimization-bug
(Assignee)

Comment 26

6 years ago
I uploaded a dump to the connect.ms bug, and the issue has been "routed to the appropriate VS development team for investigation."
(Assignee)

Comment 27

6 years ago
Created attachment 708895 [details] [diff] [review]
Patch: Disable optimize in oc_huff_tree_collapse() in MSVC2012

This crash is annoying me. Let's disable compiler optimizations in that function when building with MSVC2012 and later until such a time as a fix is released for MSVC2012.
Attachment #708895 - Flags: review?(tterribe)
Attachment #708895 - Flags: review?(tterribe) → review+
(Assignee)

Comment 28

6 years ago
(In reply to Chris Pearce (:cpearce) from comment #27)
> Created attachment 708895 [details] [diff] [review]
> Patch: Disable optimize in oc_huff_tree_collapse() in MSVC2012

Landed this patch:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f24905861be5
https://hg.mozilla.org/mozilla-central/rev/f24905861be5
Assignee: nobody → cpearce
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(Assignee)

Updated

6 years ago
Duplicate of this bug: 804205
(Assignee)

Comment 31

6 years ago
I've heard back from Microsoft in the support issue I filed. They suggest we disable the specific optimization that fails only:




Hi Chris,

Thank you for taking the time to provide your feedback, this indeed is a bug in our product. After reviewing your reported issue, in the context of all the issues reported to us, we are considering fixing this bug in one of our future releases.

In the meantime, could you please consider accepting this workaround until you get a product release from us with the fix?
-> As an alternative to disabling all optimizations for the affected function, disable just `/Oy` optimization for the affected function, `oc_huff_tree_collapse`, using the following directive:

#pragma optimize( "y", off ) // disable frame pointer omission
// ...
// function definition
// ...
#pragma optimize( "", on ) // reset optimizations to what was specified at the command line

If this issue is severe, causing critical business situations or blocking your product development or deployment, please go to http://support.microsoft.com or call 1-800-MICROSOFT for assistance.

For Microsoft premier customers, please contact your administrator, your Technical Account Manager, or your Microsoft premier account representative.

I am closing this MSConnect item. Feel free to respond if you need anything else.

Thanks,
Daniel Podder
VC++ Team
Created attachment 744211 [details] [diff] [review]
Disable only omit-frame-pointer
I don't have an windows build environment to verify the relaxed work-around, but here's a try push. https://tbpl.mozilla.org/?tree=Try&rev=81cfe3121394
You need to log in before you can comment on or make changes to this bug.