Closed Bug 815543 Opened 12 years ago Closed 7 years ago

[10.8] Flash crash in OpenGL@0x3317

Categories

(Core Graveyard :: Plug-ins, defect)

Platform: x86_64, macOS
Type: defect
Priority: Not set
Severity: critical

Tracking

(firefox20-)

RESOLVED WORKSFORME
Tracking Status
firefox20 - ---

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash)

Attachments

(1 file)

It's the #7 top crasher in 18.0a2, #6 in 19.0a1, and #2 in 20.0a1 on Mac OS X.

Signature	OpenGL@0x3317
UUID	6e409894-2610-4c12-9a65-c0dc62121126
Date Processed	2012-11-26 18:27:05
Process Type	plugin Version: Filename: Flash Player.plugin
Uptime	6317
Install Age	1.8 hours since version was first installed.
Install Time	2012-11-26 16:32:47
Product	Firefox
Version	20.0a1
Build ID	20121126030823
Release Channel	nightly
OS	Mac OS X
OS Version	10.8.2 12C3006
Build Architecture	amd64
Build Architecture Info	family 6 model 58 stepping 9
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x26887f00
App Notes	AdapterVendorID: 0x8086, AdapterDeviceID: 0x 166 GL Context? GL Context+ GL Layers? GL Layers+
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x 166

Frame 	Module 	Signature 	Source
0 	OpenGL 	OpenGL@0x3317 	
1 	OpenGL 	OpenGL@0x7ffe 	
2 	OpenGL 	OpenGL@0x8018 	
3 	libmozglue.dylib 	je_malloc 	jemalloc.c:4219
4 	CoreGraphics 	CoreGraphics@0x204d70 	
5 	CoreGraphics 	CoreGraphics@0x1432cc 	
6 	QuartzCore 	QuartzCore@0x52721 	
7 	QuartzCore 	QuartzCore@0x4c2f2 	
8 	libobjc.A.dylib 	libobjc.A.dylib@0x9a34 	
9 	libmozglue.dylib 	arena_dalloc 	jemalloc.c:1679
10 	CoreGraphics 	CoreGraphics@0x143533 	
11 	libsystem_c.dylib 	libsystem_c.dylib@0x1a134 	
12 	CoreGraphics 	CoreGraphics@0x67026 	
13 	CoreGraphics 	CoreGraphics@0x143309 	
14 	CoreGraphics 	CoreGraphics@0x3a9cdd 	
15 	GLEngine 	GLEngine@0x17a2e 	
16 	QuartzCore 	QuartzCore@0x43567

More reports at:
https://crash-stats.mozilla.com/report/list?signature=OpenGL%400x3317

Note the jemalloc stuff in the stacks.

Could this be a jemalloc bug, possibly one specific to OS X 10.8?

The displayed stack is almost certainly incorrect, and it's likely that jemalloc is not on the call stack at all. If it is, that probably just means there is heap corruption. It is extremely unlikely that there is actually an allocator bug here.

If some plugin used malloc() or friends, would that cause jemalloc code to run?

I don't know about the Mac.

In any case, I have confirmation from Ted that, since we're missing symbols for the OS libraries, the stack after the first frame is complete guesswork. He's going to look at uploading the symbols from his Mac, whose system libraries match (or mostly match) the ones in the crash reports, and we can reprocess one or more of these reports to see what the stack actually is.
Flags: needinfo?(justin.lebar+bug)

I put the symbols I could get up on the symbol server. I didn't have the exact same versions of all libraries, but I had a number of them.

(In reply to Steven Michaud from comment #3)
> If some plugin used malloc() or friends, would that cause jemalloc code to
> run?

If the plugin runs in the same process as Gecko, I would expect so.

If the plugin runs OOP, I'm not sure.  Maybe glandium knows.
Flags: needinfo?(justin.lebar+bug)

I also wonder if system libraries/frameworks end up running jemalloc code when they use malloc() and friends.

(In reply to Steven Michaud from comment #7)
> I also wonder if system libraries/frameworks end up running jemalloc code
> when they use malloc() and friends.

They must; otherwise, strdup() would be an allocator mismatch waiting to happen.

(In reply to comment #8)

Makes sense.  So on to the second question :-)

Do you think the references to jemalloc code in the stack from comment #0 are spurious?

(Give me 10 minutes and I'll use atos to translate the addresses to symbols by hand.)

(Following up on comment #10)

Parts of this stack make no sense to me (for example, all the stuff that happens under pthread_mutex_unlock, if that's indeed being called). So it may indeed be corrupt.

What do you think, Benoit?

Comment on attachment 685807
Stack from comment #0 with symbols

It makes no sense for arena_dalloc to be called in any stack which involves je_malloc, so at least that part is wrong.

Yeah, that stack makes no sense.

As for malloc, on OS X, in both the main process and subprocesses, jemalloc is registered as the default zone allocator, which means je_malloc is always used. For realloc and free, however, je_free is only used if the pointer it is given is identified as having been allocated by jemalloc. If not, the free and realloc from the zone that allocated that particular memory are used.
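
The dispatch rule described in the previous comment can be modelled in a few lines of Python. This is a toy, not the libmalloc zone API or Mozilla's actual zone code: the `Zone` and `Allocator` classes, the integer "pointers", and the bump allocation are all hypothetical. It only shows the consequence described above: every new allocation comes from jemalloc, while memory that a system zone handed out earlier is still freed (or reallocated) by that zone.

```python
from typing import Dict, List


class Zone:
    """Hypothetical stand-in for a malloc zone: hands out integer "pointers"
    and remembers which ones it currently owns."""

    def __init__(self, name: str, base: int) -> None:
        self.name = name
        self._next = base
        self._live: Dict[int, int] = {}  # pointer -> requested size

    def malloc(self, size: int) -> int:
        ptr = self._next
        self._next += max(size, 16)  # bump allocation, 16-byte minimum
        self._live[ptr] = size
        return ptr

    def owns(self, ptr: int) -> bool:
        return ptr in self._live

    def free(self, ptr: int) -> None:
        del self._live[ptr]


class Allocator:
    """Toy model of the rule: malloc always goes to the default zone (jemalloc);
    free and realloc go to whichever zone owns the pointer."""

    def __init__(self, default: Zone, others: List[Zone]) -> None:
        self.default = default
        self.zones = [default, *others]
        self.log: List[str] = []

    def _owner(self, ptr: int) -> Zone:
        for zone in self.zones:
            if zone.owns(ptr):
                return zone
        raise ValueError(f"pointer {ptr:#x} is not owned by any zone")

    def malloc(self, size: int) -> int:
        return self.default.malloc(size)

    def free(self, ptr: int) -> None:
        zone = self._owner(ptr)
        self.log.append(f"free({ptr:#x}) -> {zone.name}")
        zone.free(ptr)

    def realloc(self, ptr: int, size: int) -> int:
        # The owning zone handles the resize; this toy copies no contents.
        zone = self._owner(ptr)
        self.log.append(f"realloc({ptr:#x}) -> {zone.name}")
        new_ptr = zone.malloc(size)
        zone.free(ptr)
        return new_ptr


# Memory the system zone handed out before jemalloc became the default zone.
system = Zone("system", 0x1000_0000)
jemalloc = Zone("jemalloc", 0x2000_0000)
early = system.malloc(32)

alloc = Allocator(jemalloc, [system])
late = alloc.malloc(64)  # new allocations always come from jemalloc
alloc.free(early)        # routed to the system zone, not to je_free
alloc.free(late)         # routed to jemalloc
print(alloc.log)
```

In this model the ownership check is what keeps a strdup()-style hand-off (comment #8) consistent: whichever zone allocated a block is the one that releases it.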

That stack is completely bogus. I have a slightly-better stack I symbolized by dumping symbols from my local 10.8.2 install, but it's still not perfect (I don't have the matching version of CoreGraphics). The top two frames are:
Thread 0 (crashed)
 0  OpenGL!glcGetIOAccelService + 0xacb
    rbx = 0x000000011bad1000   r12 = 0x00000000000000f6
    r13 = 0x0000000000000f68   r14 = 0x000000011bad1000
    r15 = 0x00007fff5fbf7f78   rip = 0x00007fff8c5e8317
    rsp = 0x00007fff5fbf7e60   rbp = 0x00007fff5fbf7e70
    Found by: given as instruction pointer in context
 1  OpenGL!CGLUpdateContext + 0x19
    rbx = 0x000000000000211c   r12 = 0x00000000000000f6
    r13 = 0x0000000000000f68   r14 = 0x000000011bad1000
    r15 = 0x00007fff5fbf7f78   rip = 0x00007fff8c5ed019
    rsp = 0x00007fff5fbf7e80   rbp = 0x00007fff5fbf7f00
    Found by: call frame info

It wanders off into some CoreGraphics code after that. It doesn't seem terribly likely that this is allocator related, but I could be wrong. If we can find symbols for that version of CoreGraphics we can probably get a decent stack here. (The crash report says 10.8.2, which I'm on, but perhaps I'm missing one update or something?)
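
The offsets in these frames can be cross-checked with a little arithmetic. Assuming the OpenGL@0x3317 signature is the faulting address relative to the OpenGL framework's load address (the usual crash-stats form for unsymbolized frames, and consistent with frame 0's rip ending in ...8317), the load address and the module-relative start of each function follow directly. The "minus one" lookup for caller frames is an assumption about the stackwalker, but it agrees with the OpenGL@0x8018 frame in the original crash-stats stack. A load address like this is also what atos's -l option expects when translating addresses by hand.

```python
# Values taken from the OpenGL@0x3317 signature and the two frames above.
FRAME0_RIP = 0x00007FFF8C5E8317  # frame 0: the faulting instruction
FRAME0_MODULE_OFFSET = 0x3317    # from the signature OpenGL@0x3317
FRAME0_FUNC_OFFSET = 0xACB       # glcGetIOAccelService + 0xacb
FRAME1_RIP = 0x00007FFF8C5ED019  # frame 1: a return address
FRAME1_FUNC_OFFSET = 0x19        # CGLUpdateContext + 0x19

# Where the OpenGL framework was loaded in the plugin process.
opengl_base = FRAME0_RIP - FRAME0_MODULE_OFFSET

# Module-relative start addresses of the two functions.
glc_start = FRAME0_MODULE_OFFSET - FRAME0_FUNC_OFFSET
frame1_offset = FRAME1_RIP - opengl_base
cgl_start = frame1_offset - FRAME1_FUNC_OFFSET

# Assumption: for caller frames the stackwalker looks up (return address - 1)
# so the address lands inside the call instruction rather than just after it.
frame1_lookup = frame1_offset - 1

for label, value in [
    ("OpenGL load address", opengl_base),
    ("glcGetIOAccelService at module offset", glc_start),
    ("frame 1 return address at module offset", frame1_offset),
    ("CGLUpdateContext at module offset", cgl_start),
    ("frame 1 lookup offset", frame1_lookup),
]:
    print(f"{label}: {value:#x}")

# OpenGL load address: 0x7fff8c5e5000
# glcGetIOAccelService at module offset: 0x284c
# frame 1 return address at module offset: 0x8019
# CGLUpdateContext at module offset: 0x8000
# frame 1 lookup offset: 0x8018
```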

Bug 790390 is likely a dupe of this bug.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #14)
> That stack is completely bogus. I have a slightly-better stack I symbolized
> by dumping symbols from my local 10.8.2 install, but it's still not perfect
> (I don't have the matching version of CoreGraphics). The top two frames are:
> Thread 0 (crashed)
>  0  OpenGL!glcGetIOAccelService + 0xacb
>     rbx = 0x000000011bad1000   r12 = 0x00000000000000f6
>     r13 = 0x0000000000000f68   r14 = 0x000000011bad1000
>     r15 = 0x00007fff5fbf7f78   rip = 0x00007fff8c5e8317
>     rsp = 0x00007fff5fbf7e60   rbp = 0x00007fff5fbf7e70
>     Found by: given as instruction pointer in context
>  1  OpenGL!CGLUpdateContext + 0x19
>     rbx = 0x000000000000211c   r12 = 0x00000000000000f6
>     r13 = 0x0000000000000f68   r14 = 0x000000011bad1000
>     r15 = 0x00007fff5fbf7f78   rip = 0x00007fff8c5ed019
>     rsp = 0x00007fff5fbf7e80   rbp = 0x00007fff5fbf7f00
>     Found by: call frame info
> 
> It wanders off into some CoreGraphics code after that. It doesn't seem
> terribly likely that this is allocator related, but I could be wrong. If we
> can find symbols for that version of CoreGraphics we can probably get a
> decent stack here. (The crash report says 10.8.2, which I'm on, but perhaps
> I'm missing one update or something?)

Perhaps something regressed in Apple's 10.8 OpenGL driver. It's also possible that we're hitting hardware-accelerated CoreGraphics; getting the rest of the CoreGraphics symbols could confirm that. My other guess is that it could be triggered by using HiDPI, or caused by the plugin changes that landed to better support HiDPI.

Let's take a look at the URLs to see if we can get a hint for reproducing this issue (does it happen all over the web or on a particular site?).
Keywords: needURLs

(In reply to comment #14)

> That stack is completely bogus. I have a slightly-better stack I
> symbolized by dumping symbols from my local 10.8.2 install, but it's
> still not perfect (I don't have the matching version of
> CoreGraphics).

I translated the (non-Mozilla) symbols in that stack using atos on my
own (fully current) 10.8.2 install.  (And yes, I specified '-arch
x86_64'.)

How do you know you don't have the matching version of CoreGraphics?

Translating the symbols listed doesn't help much, because the stackwalk itself is broken.

Ted knows that they don't match because the debug ID of his local versions doesn't match the debug ID listed in the crash report.

Right, I ran dump_syms on my local binaries and checked the debug ID against what's in the crash report. Debug IDs on the Mac are either the UUID of the binary (from LC_UUID) or, if that's not present, a hash of the first page of the text section. If you want to dump your CoreGraphics binary to see whether yours matches, it's pretty simple: just run $objdir/dist/host/bin/dump_syms /path/to/CoreGraphics and look at the first line of output. If the ID there matches 9A1324EFC9CB30E4AE9F0AEF69052FAE0, you have the right version.
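
Ted's description of Mac debug IDs can be sketched in Python for the common case. This is a minimal sketch under stated assumptions, not dump_syms: it handles only a thin, little-endian, 64-bit Mach-O (a universal CoreGraphics binary would first need its x86_64 slice extracted, e.g. with lipo -thin x86_64), it does not implement the text-section hash fallback, and the output format (the UUID's 16 bytes as uppercase hex in file order, followed by a trailing 0 "age") is inferred from the 33-character IDs quoted in this bug. `macho_uuid`, `breakpad_style_id`, and `debug_id_of` are hypothetical helper names.

```python
import struct
from typing import Optional

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, read in the file's byte order
LC_UUID = 0x1B            # load command that carries the 16-byte UUID
MACH_HEADER_64_SIZE = 32  # eight uint32 fields: magic ... reserved


def macho_uuid(data: bytes) -> Optional[bytes]:
    """Return the raw LC_UUID bytes of a thin, little-endian, 64-bit Mach-O
    image, or None if the image has no LC_UUID command."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O; extract a slice first")
    (ncmds,) = struct.unpack_from("<I", data, 16)  # mach_header_64.ncmds
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_UUID:
            # uuid_command layout: cmd, cmdsize, then uint8_t uuid[16]
            return data[offset + 8 : offset + 24]
        offset += cmdsize
    return None


def breakpad_style_id(uuid: bytes, age: int = 0) -> str:
    """Assumed format: 32 uppercase hex digits (bytes in file order) plus the age."""
    return uuid.hex().upper() + format(age, "X")


def debug_id_of(path: str) -> Optional[str]:
    with open(path, "rb") as f:
        uuid = macho_uuid(f.read())
    # Without LC_UUID, dump_syms would fall back to hashing the text section,
    # which this sketch does not attempt.
    return breakpad_style_id(uuid) if uuid is not None else None
```

If the formatting assumption turns out to be wrong, the first line printed by dump_syms remains the authoritative value to compare against the crash report.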

> Translating the symbols listed doesn't help much, because the
> stackwalk itself is broken.

Yes.  But I'm mainly interested in finding out how Ted knows his
local CoreGraphics is the "wrong" version.

> Ted knows that they don't match because the debug ID of his local
> versions doesn't match the debug ID listed in the crash report.

So my question now becomes "how do you generate the debug ID"?

Ted has answered it.

CoreGraphics debug ID on my 10.8.2 system (x86_64 architecture) is DCC70C6EAB6D3457A8237569CB29B1070. So I don't have the "right" CoreGraphics, either.

The stacks at https://crash-stats.mozilla.com actually show two different build IDs for OS X 10.8.2: 12C3006 and 12C60. 12C60 is the one I have, and seems to be the latest. The debug IDs for CoreGraphics in the few 12C60 examples I looked at are the same as mine (and presumably yours).

But the various stacks, though they all have jemalloc stuff in them, look quite different aside from the top few lines. Another sign of corruption.

Apple has released several different "updates to OS X 10.8.2". That presumably explains the different build IDs.

http://support.apple.com/kb/DL1580 (Update)
http://support.apple.com/kb/DL1581 (Combo Update)
http://support.apple.com/kb/DL1600 (Supplement)
http://support.apple.com/kb/DL1611 (Supplement 2.0)

Tracking for now to keep this on our radar as 20 moves through the channels, but we should confirm whether this is a dupe of bug 790390 and keep an eye on volume; if this drops off the topcrash list, please update.

I just noticed that I have a fat 32/64-bit binary of Flash, and our plugin code will prefer the matching architecture. Do we know when Adobe started shipping fat binaries? It could explain new instability if we just started loading the 64-bit version.
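
To make "prefer the matching architecture" concrete: a fat (universal) binary starts with a big-endian header listing one entry per architecture slice, and a loader that prefers the host architecture picks the x86_64 slice on a 64-bit Mac whenever one is present. This Python sketch only reads that header; it is not Gecko's plugin-loading code, and the helper names `fat_slices` and `preferred_arch` and the fall-back-to-any-slice policy are assumptions for illustration.

```python
import struct
from typing import Dict, Optional, Tuple

FAT_MAGIC = 0xCAFEBABE        # universal ("fat") header magic, stored big-endian
CPU_TYPE_X86 = 7
CPU_TYPE_X86_64 = 0x01000007  # CPU_TYPE_X86 | CPU_ARCH_ABI64
ARCH_NAMES = {CPU_TYPE_X86: "i386", CPU_TYPE_X86_64: "x86_64"}


def fat_slices(data: bytes) -> Dict[int, Tuple[int, int]]:
    """Map each slice's cputype to its (offset, size) inside a fat file."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat Mach-O file")
    slices = {}
    for i in range(nfat_arch):
        # fat_arch: cputype, cpusubtype, offset, size, align (20 bytes, big-endian)
        cputype, _subtype, offset, size, _align = struct.unpack_from(">iiIII", data, 8 + 20 * i)
        slices[cputype] = (offset, size)
    return slices


def preferred_arch(data: bytes, host_cputype: int) -> Optional[str]:
    """Pick the host's own architecture if the file has it, else any other slice."""
    slices = fat_slices(data)
    if host_cputype in slices:
        return ARCH_NAMES.get(host_cputype, hex(host_cputype))
    for cputype in slices:
        return ARCH_NAMES.get(cputype, hex(cputype))
    return None
```

Under this policy a 32-bit-only plugin would still be loaded as i386, while a fat plugin switches to x86_64 as soon as that slice exists, which is why the date Adobe first shipped fat binaries matters to the hypothesis.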

> Do we know when Adobe started shipping fat binaries?

Version 11 and up (see bug 804606 comment #27).  I don't know when the first 11 release took place, but you may be able to find out at http://helpx.adobe.com/flash-player/kb/archived-flash-player-versions.html.

Flash Player 11.0.1.152 was released on 10/3/2011, so that's not it.

There have been only two crashes in the last week.

Not tracking for release, given comment #29. Please renominate if the crash landscape changes.

I'm marking this bug as WORKSFORME because its crash signature hasn't appeared in a long time (over half a year).
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard