It's the #4 top crasher in 14.0a2 on Mac OS X and occurs only on Mac OS X 10.7 with 32-bit builds.
90% of crashes happen within one minute.
It first appeared in 14.0a1/20120410120625. The regression range might be:
One comment says:
"Trying to load a silverlight movie from netflix.com. restarted in 32 bit mode as required, plugin seems to initialise (silverlight logo appears) but the content fails to load."
Signature coreclr@0x211b5d
Date Processed 2012-05-07 01:40:33
Last Crash 1.9 minutes before submission
Install Age 3.3 minutes since version was first installed.
Install Time 2012-05-07 01:37:09
Build ID 20120506030520
Release Channel nightly
OS Mac OS X
OS Version 10.7.3 11D50
Build Architecture x86
Build Architecture Info family 6 model 23 stepping 10
Crash Reason EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address 0x118fa1b4
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
Frame Module Signature Source
1 coreclr coreclr@0x211b5d
2 coreclr coreclr@0x15349d
3 coreclr coreclr@0x4a0be2
4 coreclr coreclr@0x152424
5 coreclr coreclr@0x1526fe
6 coreclr coreclr@0x15286a
7 coreclr coreclr@0x1da82f
8 coreclr coreclr@0x1dc166
9 agcore agcore@0x89f196
10 agcore agcore@0x7eba27
11 agcore agcore@0xc0390
12 agcore agcore@0x885fa6
13 agcore agcore@0x67f030
14 agcore agcore@0x67a034
15 agcore agcore@0x663ed5
16 agcore agcore@0x664148
17 CoreFoundation __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__
More reports at:
Adding qawanted to get some testing around Silverlight on OS X with Netflix. We should try switching between 32-bit and 64-bit as Scoobidiver suggests.
This spike coincides with the latest Silverlight update - http://www.zdnet.com/blog/microsoft/microsoft-quietly-rolls-out-silverlight-51/12682. We should try with that latest version to see what happens.
I was able to reproduce the browser crash using Silverlight 5.1.10411.0 and Firefox 14.0a2 32bit mode.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20120510 Firefox/14.0a2
Steps to reproduce:
1. Install latest Silverlight
2. Start Firefox in 32bit mode
3. Go to a page that uses Silverlight (e.g.: http://www.vectorlight.net/games/sandmania.aspx)
4. Click an area that uses a plugin to activate the plugins (or make sure the plugins.click_to_play pref is set to false in about:config)
Result: While Silverlight content is loading Firefox crashes.
Note: In 64bit mode, only Silverlight plugin crashes without crashing the browser as well.
Crash report: https://crash-stats.mozilla.com/report/index/bp-fb1c9221-d958-4fca-810b-7df032120511
I also crash (using Mihaela's STR from comment #3), testing with today's mozilla-central nightly, using either Silverlight 5.0.61118.0 or 5.1.10411.0 (the current version).
This only happens on OS X 10.7.3 -- not on 10.6.8. So this may at least partly be an OS bug.
Crash (in 32-bit mode) using Silverlight 5.0.61118.0:
Crash (in 32-bit mode) using Silverlight 5.1.10411.0
As always, Silverlight won't allow itself to be debugged in gdb, so I'll have to translate by hand the symbols in my crash stacks (in a later comment).
Thanks, Mihaela, for your STR!
I suspect this is only coincidentally a startup crash, but I'll leave the whiteboard as is for now.
(In reply to Steven Michaud from comment #5)
> I suspect this is only coincidentally a startup crash, but I'll leave the
> whiteboard as is for now.
It's a crash when the plugin starts, not when Firefox starts.
Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies. Here's the regression range:
Which is of course very close to Scoobidiver's regression range from comment #0.
I'll use hg bisect to find the patch that triggered these crashes.
I have accidentally bumped into this bug on Linux also, so please consider adding that platform to the bug
(In reply to Maniac Vlad Florin (:vladmaniac) from comment #8)
> I have accidentally bumped into this bug on Linux also, so please consider
> adding that platform to the bug
Can you provide your crash ID?
Finding a regression range for this bug is complicated by the fact that a 32-bit only custom build (made on OS X 10.7) crashes as far back as I've currently tested (2012-01-01).
I've got another urgent bug to work on, so I'll put off further work on finding a regression range until next week.
(In reply to comment #8)
So Vlad, you're running the Silverlight plugin on Linux?
(In reply to Steven Michaud from comment #11)
> (In reply to comment #8)
> So Vlad, you're running the Silverlight plugin on Linux?
Yes, http://www.go-mono.com/moonlight/ this one in particular
I'm back to looking for this bug's regression range. But it's not easy.
I've now found that it makes a difference whether you build on 10.7 or 10.6.
I'll keep digging.
It now looks like this bug was triggered by us starting to build mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see bug 720027, and particularly bug 720027 comment #40).
I'll try to find a workaround for this. But that will likely require all kinds of reverse engineering, which will take a while.
So Vlad, it sounds like your Linux crash is unrelated.
Please try to provide a crash id or gdb crash stack.
(Following up comment #14)
Another consequence is that this bug shouldn't affect any FF release -- which for now are still built on OS X 10.6.
That makes this bug much less urgent.
(Following up comment #7)
> Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies.
> Here's the regression range:
The machine that built firefox-2012-04-10-07-56-52-mozilla-central is called "moz2-darwin10-slave51". The machine that built firefox-2012-04-10-12-06-25-mozilla-central is called "bld-lion-r5-051".
This information is from "about:buildconfig".
"darwin10" is actually OS X 10.6. But I find I can only reproduce this bug with builds made on Lion (any trunk revision), and not with builds made on SnowLeopard (any trunk revision).
Created attachment 625296 [details]
Apple crash log
Here's an Apple crash log for these crashes.
I can reproduce this bug with today's mozilla-central and aurora nightlies (both of which were built on OS X Lion), but not with FF 13.0b4 or today's beta-debug nightly (both of which, from the "darwin10" in their build machine names, were built on OS X Snow Leopard).
Since this bug won't effect the FF 14 release, strictly speaking the "status-firefox14" flag should be set to "unaffected".
And now that I think of it, "unaffected" should really be "uneffected" :-)
This bug also exists (with exactly the same characteristics) on OS X 10.8. I tested on the latest build (12A206, the 2nd update from DP3).
*** Bug 749281 has been marked as a duplicate of this bug. ***
(In reply to Steven Michaud from comment #14)
> It now looks like this bug was triggered by us starting to build
> mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see
> bug 720027, and particularly bug 720027 comment #40).
How could it be? Indeed, the first crash occurred in 14.0a1/20120410120625 while the first patch of bug 720027 landed in 14.0a1/20120411030716.
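The ordering argument above can be checked mechanically. The following is a minimal sketch, assuming that the 14-digit build IDs are timestamps of the form YYYYMMDDHHMMSS; the helper name parse_build_id is hypothetical.

```python
from datetime import datetime


def parse_build_id(build_id: str) -> datetime:
    """Parse a 14-digit build ID, assumed to be YYYYMMDDHHMMSS."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


# Build IDs quoted in the comment above.
first_crash = parse_build_id("20120410120625")
first_patch = parse_build_id("20120411030716")
gap = first_patch - first_crash

# The first crashing build predates the first build containing the patch.
print(first_crash < first_patch, gap)  # True 15:00:51
```

Under that assumption, the first crashing build is about 15 hours older than the first build that contains a patch from bug 720027.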
(In reply to comment #23)
See comment #17. It clearly shows that the first build made on Lion is also the first build with this bug.
(In reply to Steven Michaud from comment #24)
> (In reply to comment #23)
> See comment #17. It clearly shows that the first build made on Lion is also
> the first build with this bug.
Comment 17 confirms the regression range in comment 0 and the smaller one in bug 749281 comment 6. bug 720027 doesn't belong to this regression window.
Actually it *doesn't* confirm the one in comment #0, which is too long. Only the smaller one is truly accurate.
It's truly very important that this bug block 720027, and that no release builds be made on OS X 10.7 until this bug is sorted out.
Also, there's lots of other evidence (besides the regression range) that this bug is caused by building on OS X 10.7.
See comment #17 and comment #19. Also note that, even in current code, builds made on OS X 10.6 never have this bug, while builds made on 10.7 always do.
If you want to challenge the accuracy of what I just said, find 1) a way to build on 10.6 that *does* trigger this bug, or 2) a way to build on 10.7 that *doesn't* trigger this bug. That would actually advance the discussion.
*** Bug 756901 has been marked as a duplicate of this bug. ***
Are you working on this? Do you want help debugging what the difference is that causes the crash?
I've put this one off for a week or two, since it can't get into a release (as long as we don't build releases on Lion).
Any help you can give me would be greatly appreciated!
(Following up comment #30)
And (of course) if you find a fix, just take the bug yourself.
I was able to reproduce this building the same firefox rev with the same clang rev and sdk on 10.7 and 10.6. I am debugging it.
The problem is in the linker (or the plugin's expectations of its output). Linking just the firefox binary with ld64-97.17 makes the crash stop. Linking again with ld64-128.2 causes it to crash again.
Now debugging what the difference in the output is.
With lldb I was able to get the backtrace:
frame #0: 0x1479f8f4
frame #1: 0x2c1dac98 coreclr`GetCLRRuntimeHost + 26328
frame #2: 0x2c2b2793 coreclr`GetCLRRuntimeHost + 909779
frame #3: 0x2c2b2f5e coreclr`GetCLRRuntimeHost + 911774
frame #4: 0x2c16b39f coreclr`MetaDataGetDispenser + 110543
frame #5: 0x2c320ab5 coreclr`GetCLRRuntimeHost + 1361141
frame #6: 0x2c137553 coreclr`PAL_InitializeCoreCLR + 58243
frame #7: 0x2c31fb55 coreclr`GetCLRRuntimeHost + 1357205
frame #8: 0x2c31fdfb coreclr`GetCLRRuntimeHost + 1357883
frame #9: 0x2c1d3000 coreclr`MetaDataGetDispenser + 535600
frame #10: 0x2c1d4777 coreclr`GetCLRRuntimeHost + 439
frame #11: 0x19791453 agcore`PackagingCrc32 + 293795
frame #12: 0x1978fc9a agcore`PackagingCrc32 + 287722
frame #13: 0x18eb4009 agcore`DrmException_GetErrorDataFromHResult + 1100137
frame #14: 0x1973b6db agcore`LocalMessageReceive + 550731
frame #15: 0x194d6b01 agcore`StylusPointCollection_InsertItem + 318257
frame #16: 0x194d1b50 agcore`StylusPointCollection_InsertItem + 297856
frame #17: 0x194bacc4 agcore`StylusPointCollection_InsertItem + 204020
frame #18: 0x194baee9 agcore`StylusPointCollection_InsertItem + 204569
frame #19: 0x9ba300ce CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 30
frame #20: 0x9ba3000d CoreFoundation`__CFRunLoopDoObservers + 413
frame #21: 0x9ba02207 CoreFoundation`CFRunLoopRunSpecific + 375
frame #22: 0x9ba02088 CoreFoundation`CFRunLoopRunInMode + 120
frame #23: 0x9b44f723 HIToolbox`RunCurrentEventLoopInMode + 318
frame #24: 0x9b456a8b HIToolbox`ReceiveNextEventCommon + 381
frame #25: 0x9b4568fa HIToolbox`BlockUntilNextEventMatchingListInMode + 88
frame #26: 0x9110c0d8 AppKit`_DPSNextEvent + 678
frame #27: 0x9110b942 AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
frame #28: 0x02f2c371 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 145 at nsAppShell.mm:176
frame #29: 0x91107cb1 AppKit`-[NSApplication run] + 911
frame #30: 0x02f2eadc XUL`nsAppShell::Run() + 236 at nsAppShell.mm:764
frame #31: 0x02b51ecd XUL`nsAppStartup::Run() + 189 at nsAppStartup.cpp:256
frame #32: 0x010128ca XUL`XREMain::XRE_mainRun() + 5898 at nsAppRunner.cpp:3754
frame #33: 0x0101318b XUL`XREMain::XRE_main(int, char**, nsXREAppData const*) + 843 at nsAppRunner.cpp:3831
frame #34: 0x010135cd XUL`XRE_main + 77 at nsAppRunner.cpp:3907
frame #35: 0x000032c5 firefox`_ZL7do_mainiPPc + 1061 at nsBrowserApp.cpp:157
frame #36: 0x00002c3c firefox`main + 764 at nsBrowserApp.cpp:244
coreclr is /Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/CoreCLR.bundle/Contents/MacOS/coreclr
Using the code in http://macfuse.googlecode.com/svn/trunk/filesystems/procfs/procfs.cc to look at the address maps I found:
14700000-14800000 1024K rw-/rwx COPY - DEFAULT uwir=0 sub=0
So coreclr is jumping into non-executable memory. The code doing the jump looks like
000dac90 movl 0x0065f1a6(%ebx),%eax
000dac96 call (%eax)
with ebx being set by
000dab69 calll 0x000dab6e
000dab6e popl %ebx
So the address being called is loaded from (relative to the load address) 0x000dab6e + 0x0065f1a6 = 0x739d14. This is in
otool -Iv says
I am not sure what LOCAL means, but that address has the value 0x65fb90.
I will probably try building the linker or at least reading the code to figure out what this is.
0x65fb90 is just an offset from the load base. In the case where we crash, the load address of coreclr + 0x65fb90 has the value 0x1479f8f4.
The strange thing is that attaching lldb to a running firefox linked with the old linker shows similar values. coreclr + 0x65fb90 has an address inside a non-executable area that is 1MB long. Looks like the difference is just not getting to this point.
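The address arithmetic in the preceding comments can be re-checked with a short sketch. All constants are copied from the disassembly, the lldb backtrace and the address map quoted above. Two things are assumptions and not stated in the thread: that the indirect call through (%eax) is 2 bytes long, so frame #1 is its return address, and that "rw-/rwx" means current/maximum protection. The names below are hypothetical.

```python
# Values copied from the disassembly, backtrace and address map in this thread.
POP_EBX_ADDR = 0x000DAB6E  # value that "popl %ebx" leaves in ebx (load-relative)
DISPLACEMENT = 0x0065F1A6  # displacement in "movl 0x0065f1a6(%ebx),%eax"
CALL_ADDR = 0x000DAC96     # load-relative address of the indirect call
RETURN_ADDR = 0x2C1DAC98   # frame #1 of the lldb backtrace
CRASH_PC = 0x1479F8F4      # frame #0 of the lldb backtrace
REGION = (0x14700000, 0x14800000, "rw-")  # start, end, current protection

CALL_LEN = 2  # ASSUMPTION: the indirect call through (%eax) encodes in 2 bytes


def in_region(addr, region):
    """True if addr lies inside the half-open range [start, end)."""
    start, end, _prot = region
    return start <= addr < end


def is_executable(region):
    """True if the region's current protection includes execute."""
    return "x" in region[2]


# Slot the call target is loaded from, relative to the load address.
slot = POP_EBX_ADDR + DISPLACEMENT
# Load base of coreclr, inferred from the return address of the call.
load_base = RETURN_ADDR - (CALL_ADDR + CALL_LEN)

print(hex(slot))       # 0x739d14
print(hex(load_base))  # 0x2c100000
print(in_region(CRASH_PC, REGION), is_executable(REGION))  # True False
```

Under those assumptions the crashing program counter falls inside the 1024K read/write region, which matches the conclusion that the call lands in memory without execute permission.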
I found the problem, will add a patch in a moment.
Created attachment 626663 [details] [diff] [review]
Pass -allow_heap_execute to the linker
I need some help with autoconf. This patch should fix the bug, but I get the strange warning:
Unknown variable:browser/app/Makefile:9:MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@
even when the variable is substituted with -Wl,-allow_heap_execute!
The pure 32 bit dmg in
now works. The universal dmg still has problems. I guess there is another binary that also needs to be linked with -Wl,-allow_heap_execute.
Starting the universal firefox with
arch -i386 ./FirefoxNightly.app/Contents/MacOS/firefox http://www.vectorlight.net/games/sandmania.aspx
also works now, so it is probably just the out of process plugin loader that is missing the flag now.
The warning is from acoutput-fast.pl. Not sure what the correct thing to do is. I can move the variable to a .mk file and include that from browser/app/Makefile just to avoid the warning, but I am not sure it is worth it.
Comment on attachment 626663 [details] [diff] [review]
Pass -allow_heap_execute to the linker
Review of attachment 626663 [details] [diff] [review]:
@@ +5,5 @@
> DEPTH = ../..
> topsrcdir = @top_srcdir@
> srcdir = @srcdir@
> VPATH = @srcdir@
> +MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@
Generally we put these in config/autoconf.mk.in. That will make it easy for other apps to use it as well.
Created attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker
The attached patch uses autoconf.mk.in.
Talking with Ehsan and BenWa I also changed it to pass the flag only to the plugin wrapper. The logic is
* When running on 10.5 and 10.6 the kernel will ignore the bit.
* When running on >= 10.7, we can support silverlight only via the out of process wrapper.
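The two rules above can be written down as a small model. This is only a sketch of the reasoning, not of any real build or kernel code: it assumes every binary is linked with a newer linker that marks it as not allowing heap execution unless -allow_heap_execute is passed, and the names heap_is_executable and PATCHED_FLAGS are hypothetical.

```python
# Which binaries get -Wl,-allow_heap_execute with this patch (per the comment above).
PATCHED_FLAGS = {"firefox": False, "plugin-container": True}


def heap_is_executable(os_version, linked_with_allow_heap_execute):
    """Model of the stated rule for a binary built with the newer linker.

    os_version is a tuple such as (10, 6) or (10, 7).
    """
    if os_version < (10, 7):
        # On 10.5 and 10.6 the kernel ignores the bit.
        return True
    # On 10.7 and later the bit is honored unless the flag cleared it.
    return linked_with_allow_heap_execute


for version in [(10, 6), (10, 7), (10, 8)]:
    for binary, flag in PATCHED_FLAGS.items():
        print(version, binary, heap_is_executable(version, flag))
```

In this model Silverlight's heap code can only run on 10.7 and later when it is hosted by the plugin wrapper, which is the tradeoff discussed in the following comments.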
works just fine for me. Steven Michaud, can you check that it works on 10.8 too?
> Steven Michaud, can you check that it works on 10.8 too?
It works fine in 64-bit mode (where the 32-bit Silverlight plugin is run out-of-process).
It still crashes in 32-bit mode -- but it sounds like that's deliberate.
> It still crashes in 32-bit mode -- but it sounds like that's deliberate.
Correct, as that lets the main binary be linked without -Wl,-allow_heap_execute. Let me know if you think this is the wrong tradeoff.
> Correct, as that lets the main binary be linked without -Wl,-allow_heap_execute.
> Let me know if you think this is the wrong tradeoff.
Though I know we don't support using plugins in 32-bit mode or in-process, I don't like the idea of crashing when Silverlight is run in process.
Why did you choose not to link the main binary with -allow_heap_execute? Did you think this would degrade security?
I'm not sure the additional security outweighs the disadvantages. Do you know whether Safari and Chrome allow heap execution in the main process?
> Do you know whether Safari and Chrome allow heap execution in the main process?
In non-plugin processes?
I suppose we should bring security people in on this issue.
Dan, what do you think of the issues raised in comment #46 and comment #47? Do you think it's worthwhile to prevent the main binary from executing code on the heap if this means that the Silverlight plugin will always crash when run in-process?
As I understand it, we only have the option of preventing a binary from executing heap code when it's compiled on OS X 10.7 and up (with clang and friends). Mozilla-central and aurora builds are currently made on 10.7, while beta and release builds are made on OS X 10.6.
(In reply to Steven Michaud from comment #48)
> > Do you know whether Safari and Chrome allow heap execution in the main process?
> In non-plugin processes?
For chrome I get
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome : heap execution not allowed
/Applications/Google\ Chrome.app/Contents/Versions/21.0.1145.0/Google\ Chrome\ Helper.app/Contents/MacOS/Google\ Chrome\ Helper : heap execution not allowed
/Applications/Google Chrome.app/Contents/Versions/21.0.1145.0/Google Chrome Helper EH.app/Contents/MacOS/Google Chrome Helper EH : heap execution allowed.
It looks like it is the last one that actually loads the silverlight plugin. We should now be in a similar position to chrome, but one advantage they have is using out of process plugins with 32 bit binaries (we only do it with 64 bits).
For safari I get:
/Applications/Safari.app/Contents/MacOS/Safari: heap execution allowed.
/System/Library/StagedFrameworks/Safari/WebKit2.framework/WebProcess.app/Contents/MacOS/WebProcess: heap execution allowed.
/System/Library/StagedFrameworks/Safari/WebKit2.framework/PluginProcess.app/Contents/MacOS/PluginProcess: heap execution allowed.
so they don't have the bit set in any of the executables, but like chrome they still use an out of process plugin when run as 32 bits.
BTW, why do we run the plugins in process when running in 32 bits?
> As I understand it, we only have the option of preventing a binary from
> executing heap code when it's compiled on OS X 10.7 and up (with clang and
Close, it is actually the linker. The ones in xcode 4.1 and newer set the bit (and have the -allow_heap_execute option if the user doesn't want it).
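For reference, the bit in question can be read straight from a Mach-O header. The sketch below is not the tool used for the listings above; it assumes a thin (non-universal), little-endian Mach-O file and uses the MH_NO_HEAP_EXECUTION value from mach-o/loader.h. The helper names are hypothetical.

```python
import struct

MH_MAGIC = 0xFEEDFACE              # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF           # 64-bit Mach-O
MH_NO_HEAP_EXECUTION = 0x1000000   # flag bit from mach-o/loader.h
HEADER_MIN_SIZE = 28               # seven 32-bit fields; flags is the seventh


def heap_execution_allowed(header: bytes) -> bool:
    """Return True if the Mach-O header does NOT carry MH_NO_HEAP_EXECUTION."""
    if len(header) < HEADER_MIN_SIZE:
        raise ValueError("need at least 28 bytes of Mach-O header")
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        raise ValueError("not a thin little-endian Mach-O header")
    (flags,) = struct.unpack_from("<I", header, 24)
    return not (flags & MH_NO_HEAP_EXECUTION)


def fake_header(flags: int) -> bytes:
    """Build a synthetic 32-bit header (i386 executable) for demonstration."""
    return struct.pack("<7I", MH_MAGIC, 7, 3, 2, 0, 0, flags)


print(heap_execution_allowed(fake_header(0x85)))                         # True
print(heap_execution_allowed(fake_header(0x85 | MH_NO_HEAP_EXECUTION)))  # False
```

To inspect a real file, the first 28 bytes can be read with open(path, "rb").read(28) and passed to heap_execution_allowed. Universal binaries would need their fat header parsed first to locate each architecture slice, which this sketch does not do.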
> BTW, why do we run the plugins in process when running in 32 bits?
If I remember right, it's because we never found a way to run QuickDraw or Carbon event mode plugins out-of-process.
btw, the chrome sources have interesting comments about it in
> Do you think it's worthwhile to prevent the main binary from
> executing code on the heap if this means that the Silverlight plugin
> will always crash when run in-process?
This is a tough question -- tougher than I first realized.
Many "viruses" count on being able to run code on the stack or the
heap. So preventing heap execution is worthwhile.
We currently only support this on the trunk and aurora branches (not
in our release builds), and then only when running on OS X 10.7 and
up. But clearly we're moving in that direction for *all* our builds
(if only because it's hard to support doing builds on old hardware).
So I guess I've changed my mind: I now agree that it's worthwhile
preventing heap execution in the main process, even if that means
Silverlight can't be run in-process.
On the other hand even the plugin-container process runs with the same level of user privileges as the main process, so a "virus" executing in the plugin-container process can still do a lot of damage.
But I'm still comfortable with the decision to not allow Silverlight to run in-process. As far as I know it's the only plugin that tries to execute code on the heap, which isn't exactly kosher.
So let's keep things the way they are, at least for the time being.
There are crashes after the patch landed: bp-d1bf9af1-21cc-4a1e-a137-656112120526.
(In reply to comment #57)
See comment #42 and following.
It was decided to only fix these crashes in the plugin process, and not when Silverlight is running in the main process. Preventing execution of code on the heap is a significant security gain, and the Silverlight plugin really shouldn't be running code on the heap.
Note that the Silverlight plugin is 32-bit-only on OS X.
Under the circumstances, we might want to consider making Silverlight default to running out-of-process even in 32-bit mode on OS X 10.6 and up (as the Flash plugin does).
I filed bug 758931 for remaining crashes. Feel free to close it as WONTFIX or use it for the OOP Silverlight 32-bit mode.
You can also rename this one to make clear it doesn't fix crashes.
(Following up comment #59)
I've opened bug 759364.
Steven: We are seeing crashes in the first Firefox 14 beta in this stack, although the flag shows version 14 as unaffected. Right now I see around 600 crashes in similar stacks in the last week.
Odd. You sure it's the same crash?
These crashes only happen with builds made on OS X 10.7 (and then only with the Silverlight plugin, and then only in 32-bit mode). I didn't think betas were built on 10.7.
But I also see a bunch of these crashes for "14.0b6".
Is that a "real" beta? Is it (unlike other betas) built on OS X 10.7?
> the flag shows version 14 as unaffected
The reason I set the flag that way is that this bug will never get into the FF 14 release (presuming it will be built on OS X 10.6 and not on OS X 10.7).
The same is probably true of FF 15, and of at least several future FF versions. But we probably can't hold off building releases on 10.7 (or above) forever.
This is a build log from the beta branch, and it _is_ being built on 10.7:
14.0b6 = 14.0b1 - the naming convention has to do with aligning with the mobile landscape. So yes, this is the first beta in the 14 cycle.
(In reply to Steven Michaud from comment #64)
> But I also see a bunch of these crashes for "14.0b6".
> Is that a "real" beta? Is it (unlike other betas) built on OS X 10.7?
Comment on attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Starting to do beta builds on 10.7.
User impact if declined: Large numbers of crashes in our 14-branch betas.
Testing completed (on m-c, etc.): Currently on trunk and aurora, with no reported problems
Risk to taking this patch (and alternatives if risky): Minimal risk
String or UUID changes made by this patch: none
See bug 764385.
Comment on attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker
Low risk, startup crash fix. Approved.
*** Bug 764385 has been marked as a duplicate of this bug. ***
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:15.0) Gecko/20100101 Firefox/15.0
Verified the fix using STR from comment #3 on latest Firefox 15.0beta5 (built on bld-lion-r5-055 machine): no crash occurred.
Also, there are no crash reports with this signature on Firefox 15 builds in the past 4 weeks.