It's the #4 top crasher in 14.0a2 on Mac OS X and occurs only on Mac OS X 10.7 with 32-bit builds. 90% of crashes happen within one minute. It first appeared in 14.0a1/20120410120625. The regression range might be: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9ca66ce2672f&tochange=3fa30b0edd15
One comment says: "Trying to load a silverlight movie from netflix.com. restarted in 32 bit mode as required, plugin seems to initialise (silverlight logo appears) but the content fails to load."

Signature: coreclr@0x211b5d
UUID: bc7db362-3598-491d-9c76-99d142120507
Date Processed: 2012-05-07 01:40:33
Uptime: 2
Last Crash: 1.9 minutes before submission
Install Age: 3.3 minutes since version was first installed.
Install Time: 2012-05-07 01:37:09
Product: Firefox
Version: 15.0a1
Build ID: 20120506030520
Release Channel: nightly
OS: Mac OS X
OS Version: 10.7.3 11D50
Build Architecture: x86
Build Architecture Info: family 6 model 23 stepping 10
Crash Reason: EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address: 0x118fa1b4
App Notes: AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
EMCheckCompatibility: True

Frame  Module          Signature
0                      @0x118fa1b4
1      coreclr         coreclr@0x211b5d
2      coreclr         coreclr@0x15349d
3      coreclr         coreclr@0x4a0be2
4      coreclr         coreclr@0x152424
5      coreclr         coreclr@0x1526fe
6      coreclr         coreclr@0x15286a
7      coreclr         coreclr@0x1da82f
8      coreclr         coreclr@0x1dc166
9      agcore          agcore@0x89f196
10     agcore          agcore@0x7eba27
11     agcore          agcore@0xc0390
12     agcore          agcore@0x885fa6
13     agcore          agcore@0x67f030
14     agcore          agcore@0x67a034
15     agcore          agcore@0x663ed5
16     agcore          agcore@0x664148
17     CoreFoundation  __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__

More reports at: https://crash-stats.mozilla.com/report/list?signature=coreclr%400x211b5d
Adding qawanted to get some testing around Silverlight on OS X with Netflix. We should try switching between 32-bit and 64-bit as Scoobidiver suggests.
This spike coincides with the latest Silverlight update - http://www.zdnet.com/blog/microsoft/microsoft-quietly-rolls-out-silverlight-51/12682. We should try with that latest version to see what happens.
I was able to reproduce the browser crash using Silverlight 5.1.10411.0 and Firefox 14.0a2 in 32-bit mode.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20120510 Firefox/14.0a2

Steps to reproduce:
1. Install the latest Silverlight.
2. Start Firefox in 32-bit mode.
3. Go to a page that uses Silverlight (e.g. http://www.vectorlight.net/games/sandmania.aspx).
4. Click an area that uses a plugin to activate the plugins (or make sure the plugins.click_to_play pref is set to false in about:config).

Result: Firefox crashes while the Silverlight content is loading.

Note: In 64-bit mode, only the Silverlight plugin crashes; the browser does not.

Crash report: https://crash-stats.mozilla.com/report/index/bp-fb1c9221-d958-4fca-810b-7df032120511
I also crash (using Mihaela's STR from comment #3) when testing with today's mozilla-central nightly, using either Silverlight 5.0.61118.0 or 5.1.10411.0 (the current version). This only happens on OS X 10.7.3, not on 10.6.8, so this may at least partly be an OS bug.

Crash (in 32-bit mode) using Silverlight 5.0.61118.8: bp-e4a83fa8-6cf3-4c98-9dd7-473f32120511
Crash (in 32-bit mode) using Silverlight 5.1.10411.0: bp-e13772b4-a960-4ba5-8989-dbb162120511

As always, Silverlight won't allow itself to be debugged in gdb, so I'll have to translate the symbols in my crash stacks by hand (in a later comment). Thanks, Mihaela, for your STR!
I suspect this is only coincidentally a startup crash, but I'll leave the whiteboard as is for now.
(In reply to Steven Michaud from comment #5)
> I suspect this is only coincidentally a startup crash, but I'll leave the whiteboard as is for now.

It's a crash when the plugin starts, not when Firefox starts.
Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies. Here's the regression range:

firefox-2012-04-10-07-56-52-mozilla-central
firefox-2012-04-10-12-06-25-mozilla-central
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fe5b0271cd1&tochange=3fa30b0edd15

This is, of course, very close to Scoobidiver's regression range from comment #0. I'll use hg bisect to find the patch that triggered these crashes.
I have accidentally bumped into this bug on Linux also, so please consider adding that platform to the bug
(In reply to Maniac Vlad Florin (:vladmaniac) from comment #8)
> I have accidentally bumped into this bug on Linux also, so please consider adding that platform to the bug

Can you provide your crash ID?
Finding a regression range for this bug is complicated by the fact that a 32-bit-only custom build (made on OS X 10.7) crashes as far back as I've currently tested (2012-01-01). I've got another urgent bug to work on, so I'll put off further work on finding a regression range until next week.
(In reply to comment #8) So Vlad, you're running the Silverlight plugin on Linux?
(In reply to Steven Michaud from comment #11)
> (In reply to comment #8)
>
> So Vlad, you're running the Silverlight plugin on Linux?

Yes, this one in particular: http://www.go-mono.com/moonlight/
I'm back to looking for this bug's regression range. But it's not easy. I've now found that it makes a difference whether you build on 10.7 or 10.6. I'll keep digging.
It now looks like this bug was triggered by us starting to build mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see bug 720027, and particularly bug 720027 comment #40). I'll try to find a workaround for this. But that will likely require all kinds of reverse engineering, which will take a while.
So Vlad, it sounds like your Linux crash is unrelated. Please try to provide a crash id or gdb crash stack.
(Following up comment #14) Another consequence is that this bug shouldn't affect any FF release, since releases are, for now, still built on OS X 10.6. That makes this bug much less urgent.
(Following up comment #7)
> Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies.
> Here's the regression range:
>
> firefox-2012-04-10-07-56-52-mozilla-central
> firefox-2012-04-10-12-06-25-mozilla-central

The machine that built firefox-2012-04-10-07-56-52-mozilla-central is called "moz2-darwin10-slave51". The machine that built firefox-2012-04-10-12-06-25-mozilla-central is called "bld-lion-r5-051". This information is from "about:buildconfig". "darwin10" corresponds to OS X 10.6 (Snow Leopard). I find I can only reproduce this bug with builds made on Lion (any trunk revision), and not with builds made on Snow Leopard (any trunk revision).
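The inference from build machine names can be sketched as a small helper. This is a hypothetical illustration, not part of any Mozilla tool. It assumes that a "darwinN" token in a machine name is the Darwin kernel major version (Darwin 5 through 19 shipped as Mac OS X 10.1 through 10.15, so darwin10 maps to 10.6, the version releases are described as being built on), and that Lion builders carry the codename "lion" as a dash-separated token, as in bld-lion-r5-051.

```python
import re
from typing import Optional

# Hypothetical helper: guess the OS X version of a build host from the
# machine name shown in about:buildconfig.

# Codenames that might appear as a dash-separated token (assumption).
CODENAMES = {"snowleopard": "10.6", "lion": "10.7", "mountainlion": "10.8"}


def darwin_to_osx(major: int) -> str:
    """Map a Darwin kernel major version to Mac OS X 10.(major - 4)."""
    if not 5 <= major <= 19:
        raise ValueError(f"no Mac OS X 10.x release for Darwin {major}")
    return f"10.{major - 4}"


def build_host_os(machine: str) -> Optional[str]:
    """Return the guessed OS X version for a build machine name, or None."""
    match = re.search(r"darwin(\d+)", machine)
    if match:
        return darwin_to_osx(int(match.group(1)))
    for codename, version in CODENAMES.items():
        # Require dash (or string) boundaries so "lion" doesn't match "mountainlion".
        if re.search(rf"(?:^|-){codename}(?:-|$)", machine):
            return version
    return None


for name in ("moz2-darwin10-slave51", "bld-lion-r5-051"):
    print(name, build_host_os(name))
```

Under this mapping, the two builds bracketing the regression range came from a 10.6 host and a 10.7 host respectively.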
I can reproduce this bug with today's mozilla-central and aurora nightlies (both of which were built on OS X Lion), but not with FF 13.0b4 or today's beta-debug nightly (both of which, from the "darwin10" in their build machine names, were built on OS X 10.6 Snow Leopard).
Since this bug won't affect the FF 14 release, strictly speaking the "status-firefox14" flag should be set to "unaffected".
This bug also exists (with exactly the same characteristics) on OS X 10.8. I tested on the latest build (12A206, the 2nd update from DP3).
(In reply to Steven Michaud from comment #14)
> It now looks like this bug was triggered by us starting to build mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see bug 720027, and particularly bug 720027 comment #40).

How could that be? The first crash occurred in 14.0a1/20120410120625, while the first patch of bug 720027 landed in 14.0a1/20120411030716.
(In reply to comment #23) See comment #17. It clearly shows that the first build made on Lion is also the first build with this bug.
(In reply to Steven Michaud from comment #24)
> (In reply to comment #23)
> See comment #17. It clearly shows that the first build made on Lion is also the first build with this bug.

Comment 17 confirms the regression range in comment 0 and the smaller one in bug 749281 comment 6. Bug 720027 doesn't belong to this regression window.
Actually it *doesn't* confirm the one in comment #0, which is too long. Only the smaller one is truly accurate. It's very important that this bug block bug 720027, and that no release builds be made on OS X 10.7 until this bug is sorted out.
Also, there's lots of other evidence (besides the regression range) that this bug is caused by building on OS X 10.7. See comment #17 and comment #19. Also note that, even in current code, builds made on OS X 10.6 never have this bug, while builds made on 10.7 always do. If you want to challenge the accuracy of what I just said, find 1) a way to build on 10.6 that *does* trigger this bug, or 2) a way to build on 10.7 that *doesn't* trigger this bug. That would actually advance the discussion.
Hi Steven, are you working on this? Do you want help debugging what difference causes the crash?
Hi Rafael. I've put this one off for a week or two, since it can't get into a release (as long as we don't build releases on Lion). Any help you can give me would be greatly appreciated!
(Following up comment #30) And (of course) if you find a fix, just take the bug yourself.
I was able to reproduce this by building the same Firefox revision, with the same clang revision and SDK, on both 10.7 and 10.6. I am debugging it.
The problem is in the linker (or in the plugin's expectations of its output). Linking just the firefox binary with ld64-97.17 makes the crash stop. Linking it again with ld64-128.2 causes it to crash again. Now debugging what the difference in the output is.
With lldb I was able to get the backtrace:

frame #0: 0x1479f8f4
frame #1: 0x2c1dac98 coreclr`GetCLRRuntimeHost + 26328
frame #2: 0x2c2b2793 coreclr`GetCLRRuntimeHost + 909779
frame #3: 0x2c2b2f5e coreclr`GetCLRRuntimeHost + 911774
frame #4: 0x2c16b39f coreclr`MetaDataGetDispenser + 110543
frame #5: 0x2c320ab5 coreclr`GetCLRRuntimeHost + 1361141
frame #6: 0x2c137553 coreclr`PAL_InitializeCoreCLR + 58243
frame #7: 0x2c31fb55 coreclr`GetCLRRuntimeHost + 1357205
frame #8: 0x2c31fdfb coreclr`GetCLRRuntimeHost + 1357883
frame #9: 0x2c1d3000 coreclr`MetaDataGetDispenser + 535600
frame #10: 0x2c1d4777 coreclr`GetCLRRuntimeHost + 439
frame #11: 0x19791453 agcore`PackagingCrc32 + 293795
frame #12: 0x1978fc9a agcore`PackagingCrc32 + 287722
frame #13: 0x18eb4009 agcore`DrmException_GetErrorDataFromHResult + 1100137
frame #14: 0x1973b6db agcore`LocalMessageReceive + 550731
frame #15: 0x194d6b01 agcore`StylusPointCollection_InsertItem + 318257
frame #16: 0x194d1b50 agcore`StylusPointCollection_InsertItem + 297856
frame #17: 0x194bacc4 agcore`StylusPointCollection_InsertItem + 204020
frame #18: 0x194baee9 agcore`StylusPointCollection_InsertItem + 204569
frame #19: 0x9ba300ce CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 30
frame #20: 0x9ba3000d CoreFoundation`__CFRunLoopDoObservers + 413
frame #21: 0x9ba02207 CoreFoundation`CFRunLoopRunSpecific + 375
frame #22: 0x9ba02088 CoreFoundation`CFRunLoopRunInMode + 120
frame #23: 0x9b44f723 HIToolbox`RunCurrentEventLoopInMode + 318
frame #24: 0x9b456a8b HIToolbox`ReceiveNextEventCommon + 381
frame #25: 0x9b4568fa HIToolbox`BlockUntilNextEventMatchingListInMode + 88
frame #26: 0x9110c0d8 AppKit`_DPSNextEvent + 678
frame #27: 0x9110b942 AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
frame #28: 0x02f2c371 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 145 at nsAppShell.mm:176
frame #29: 0x91107cb1 AppKit`-[NSApplication run] + 911
frame #30: 0x02f2eadc XUL`nsAppShell::Run() + 236 at nsAppShell.mm:764
frame #31: 0x02b51ecd XUL`nsAppStartup::Run() + 189 at nsAppStartup.cpp:256
frame #32: 0x010128ca XUL`XREMain::XRE_mainRun() + 5898 at nsAppRunner.cpp:3754
frame #33: 0x0101318b XUL`XREMain::XRE_main(int, char**, nsXREAppData const*) + 843 at nsAppRunner.cpp:3831
frame #34: 0x010135cd XUL`XRE_main + 77 at nsAppRunner.cpp:3907
frame #35: 0x000032c5 firefox`_ZL7do_mainiPPc + 1061 at nsBrowserApp.cpp:157
frame #36: 0x00002c3c firefox`main + 764 at nsBrowserApp.cpp:244

coreclr is /Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/CoreCLR.bundle/Contents/MacOS/coreclr

Using the code in http://macfuse.googlecode.com/svn/trunk/filesystems/procfs/procfs.cc to look at the address maps, I found:

14700000-14800000 1024K rw-/rwx COPY - DEFAULT uwir=0 sub=0

so coreclr is jumping into non-executable memory. The code doing the jump looks like

000dac90 movl 0x0065f1a6(%ebx),%eax
000dac96 call (%eax)

with ebx being set by

000dab69 calll 0x000dab6e
000dab6e popl %ebx

So the address being called is loaded from (relative to the load address) 0x000dab6e + 0x0065f1a6 = 0x739d14. This is in

Section
  sectname __pointers
  segname __IMPORT
  addr 0x00739000
  size 0x0000136c

otool -Iv says

0x00739d14 LOCAL

I am not sure what LOCAL means, but that address has the value 0x65fb90. I will probably try building the linker, or at least reading its code, to figure out what this is.
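The address arithmetic above follows the usual i386 position-independent-code idiom: a `calll` to the very next instruction pushes that instruction's address, and `popl %ebx` retrieves it as a base register. A minimal Python sketch re-checking the numbers, with all values taken as offsets relative to coreclr's load address (the helper names are hypothetical):

```python
# Offsets relative to coreclr's load address, taken from the otool output above.
CALL_SITE = 0x000DAB69        # calll 0x000dab6e
DISPLACEMENT = 0x0065F1A6     # movl 0x0065f1a6(%ebx),%eax
POINTERS_ADDR = 0x00739000    # __IMPORT,__pointers
POINTERS_SIZE = 0x0000136C


def pic_base(call_site: int, call_length: int = 5) -> int:
    """`calll rel32` (5 bytes) pushes the address of the next instruction,
    which `popl %ebx` then loads into %ebx."""
    return call_site + call_length


def pointer_slot(ebx: int, displacement: int) -> int:
    """Offset of the 32-bit word that `movl disp(%ebx),%eax` reads."""
    return (ebx + displacement) & 0xFFFFFFFF


def in_section(offset: int, start: int, size: int) -> bool:
    """True if offset lies in the half-open range [start, start + size)."""
    return start <= offset < start + size


ebx = pic_base(CALL_SITE)
slot = pointer_slot(ebx, DISPLACEMENT)
print(hex(ebx), hex(slot))                             # 0xdab6e 0x739d14
print(in_section(slot, POINTERS_ADDR, POINTERS_SIZE))  # True
```

The word loaded into %eax is then used by `call (%eax)`, so the jump goes through the __pointers entry and then through whatever that entry points at.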
0x65fb90 is just an offset from the load base. In the case where we crash, the word at the load address of coreclr + 0x65fb90 has the value 0x1479f8f4. The strange thing is that attaching lldb to a running Firefox linked with the old linker shows similar values: coreclr + 0x65fb90 holds an address inside a non-executable area that is 1MB long. It looks like the difference is just whether execution gets to this point.
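The runtime side can be checked the same way. This sketch rests on an assumption the thread doesn't state: that frame #1's pc in the lldb backtrace (0x2c1dac98) is the return address of the two-byte `call (%eax)` at offset 0xdac96, which would put coreclr's load address at 0x2c100000. It then confirms that the call target 0x1479f8f4 (frame #0) lies in the 14700000-14800000 region, whose current protection is rw-. Helper names are hypothetical.

```python
# Numbers from the lldb session and procfs map quoted in this bug.
CALL_OFFSET = 0x000DAC96      # `call (%eax)` inside coreclr
CALL_LENGTH = 2               # encoded as ff 10
FRAME1_PC = 0x2C1DAC98        # frame #1 of the lldb backtrace
POINTEE_OFFSET = 0x0065FB90   # value in the __pointers entry (offset from load base)
CALL_TARGET = 0x1479F8F4      # word found at load address + 0x65fb90 (frame #0)

REGION_START = 0x14700000     # 14700000-14800000 1024K rw-/rwx
REGION_END = 0x14800000
REGION_PROT = "rw-"           # current protection; "rwx" is only the maximum


def load_address(frame_pc: int, call_offset: int, call_length: int) -> int:
    """Assume frame_pc is the return address of the call at call_offset."""
    return frame_pc - (call_offset + call_length)


def region_contains(addr: int, start: int, end: int) -> bool:
    return start <= addr < end


def is_executable(prot: str) -> bool:
    """prot is an rwx-style string such as "rw-" or "r-x"."""
    return len(prot) == 3 and prot[2] == "x"


base = load_address(FRAME1_PC, CALL_OFFSET, CALL_LENGTH)
print(hex(base))                                               # 0x2c100000
print(hex(base + POINTEE_OFFSET))                              # 0x2c75fb90
print(region_contains(CALL_TARGET, REGION_START, REGION_END))  # True
print(is_executable(REGION_PROT))                              # False
```

Under that assumption, the jump lands in writable, non-executable data, which is consistent with the KERN_PROTECTION_FAILURE crash reason in the original report.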
I found the problem, will add a patch in a moment.
Created attachment 626663
Pass -allow_heap_execute to the linker

I need some help with autoconf. This patch should fix the bug, but I get a strange warning:

Unknown variable:browser/app/Makefile:9:MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@

even when the variable is substituted with -Wl,-allow_heap_execute!

https://tbpl.mozilla.org/?tree=Try&rev=ef93de207ed4
The pure 32-bit dmg at https://firstname.lastname@example.org/try-macosx-debug/firefox-15.0a1.en-US.mac.dmg now works. The universal dmg still has problems. I guess there is another binary that also needs to be linked with -Wl,-allow_heap_execute.
Starting the universal Firefox with arch -i386 ./FirefoxNightly.app/Contents/MacOS/firefox http://www.vectorlight.net/games/sandmania.aspx also works now, so it is probably just the out-of-process plugin loader that is still missing the flag.
The warning is from acoutput-fast.pl. Not sure what the correct thing to do is. I can move the variable to a .mk file and include that from browser/app/Makefile just to avoid the warning, but I am not sure it is worth it.
Comment on attachment 626663
Pass -allow_heap_execute to the linker

Review of attachment 626663:
-----------------------------------------------------------------

::: browser/app/Makefile.in
@@ +5,5 @@
> DEPTH = ../..
> topsrcdir = @top_srcdir@
> srcdir = @srcdir@
> VPATH = @srcdir@
> +MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@

Generally we put these in config/autoconf.mk.in. That will make it easy for other apps to use it as well.
Created attachment 626881
Pass -allow_heap_execute to the linker

The attached patch uses autoconf.mk.in. After talking with Ehsan and BenWa, I also changed it to pass the flag only to the plugin wrapper. The logic is:
* When running on 10.5 and 10.6, the kernel will ignore the bit.
* When running on >= 10.7, we can support Silverlight only via the out-of-process wrapper.

https://tbpl.mozilla.org/?tree=Try&rev=2d90637534bc
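The tradeoff in this patch can be written down as a small truth table. This is a hypothetical model of the reasoning in this comment and the follow-ups, not Mozilla code. It assumes that the 32-bit-only Silverlight plugin runs in the plugin-container process when Firefox runs in 64-bit mode and in the main firefox process in 32-bit mode, that only plugin-container is linked with -Wl,-allow_heap_execute, and that the kernel enforces the no-heap-execution bit only on 10.7 and later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostBinary:
    name: str
    heap_exec_allowed: bool  # False if the linker set the no-heap-execution bit


# Per the patch: only the plugin wrapper is linked with -Wl,-allow_heap_execute.
FIREFOX = HostBinary("firefox", heap_exec_allowed=False)
PLUGIN_CONTAINER = HostBinary("plugin-container", heap_exec_allowed=True)


def silverlight_host(firefox_is_64bit: bool) -> HostBinary:
    """The 32-bit-only Silverlight plugin runs out-of-process only in 64-bit mode."""
    return PLUGIN_CONTAINER if firefox_is_64bit else FIREFOX


def kernel_enforces_bit(osx_minor: int) -> bool:
    """Per the comment above: 10.5 and 10.6 ignore the bit, 10.7+ honor it."""
    return osx_minor >= 7


def silverlight_crashes(osx_minor: int, firefox_is_64bit: bool) -> bool:
    host = silverlight_host(firefox_is_64bit)
    return kernel_enforces_bit(osx_minor) and not host.heap_exec_allowed


for minor in (6, 7, 8):
    for is_64 in (False, True):
        mode = "64" if is_64 else "32"
        result = "crash" if silverlight_crashes(minor, is_64) else "ok"
        print(f"10.{minor} {mode}-bit: {result}")
```

The only "crash" rows are 32-bit mode on 10.7 and later, which matches the behavior reported on the try builds.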
The build https://email@example.com/try-macosx64/firefox-15.0a1.en-US.mac.dmg works just fine for me. Steven Michaud, can you check that it works on 10.8 too?
> Steven Michaud, can you check that it works on 10.8 too?

It works fine in 64-bit mode (where the 32-bit Silverlight plugin is run out-of-process). It still crashes in 32-bit mode -- but it sounds like that's deliberate.
> It still crashes in 32-bit mode -- but it sounds like that's deliberate.

Correct, as that lets the main binary be linked without -Wl,-allow_heap_execute. Let me know if you think this is the wrong tradeoff.
> Correct, as that lets the main binary be linked without -Wl,-allow_heap_execute.
> Let me know if you think this is the wrong tradeoff.

Though I know we don't support using plugins in 32-bit mode or in-process, I don't like the idea of crashing when Silverlight is run in-process. Why did you choose not to link the main binary with -allow_heap_execute? Did you think this would degrade security? I'm not sure the additional security outweighs the disadvantages. Do you know whether Safari and Chrome allow heap execution in the main process?
> Do you know whether Safari and Chrome allow heap execution in the main process?

In non-plugin processes?
I suppose we should bring security people in on this issue. Dan, what do you think of the issues raised in comment #46 and comment #47? Do you think it's worthwhile to prevent the main binary from executing code on the heap if this means that the Silverlight plugin will always crash when run in-process? As I understand it, we only have the option of preventing a binary from executing heap code when it's compiled on OS X 10.7 and up (with clang and friends). Mozilla-central and aurora builds are currently made on 10.7, while beta and release builds are made on OS X 10.6.
(In reply to Steven Michaud from comment #48)
> > Do you know whether Safari and Chrome allow heap execution in the main process?
>
> In non-plugin processes?

For Chrome I get:

/Applications/Google Chrome.app/Contents/MacOS/Google Chrome: heap execution not allowed
/Applications/Google Chrome.app/Contents/Versions/21.0.1145.0/Google Chrome Helper.app/Contents/MacOS/Google Chrome Helper: heap execution not allowed
/Applications/Google Chrome.app/Contents/Versions/21.0.1145.0/Google Chrome Helper EH.app/Contents/MacOS/Google Chrome Helper EH: heap execution allowed

It looks like it is the last one that actually loads the Silverlight plugin. We should now be in a similar position to Chrome, but one advantage they have is that they use out-of-process plugins with 32-bit binaries (we only do it with 64-bit).

For Safari I get:

/Applications/Safari.app/Contents/MacOS/Safari: heap execution allowed
/System/Library/StagedFrameworks/Safari/WebKit2.framework/WebProcess.app/Contents/MacOS/WebProcess: heap execution allowed
/System/Library/StagedFrameworks/Safari/WebKit2.framework/PluginProcess.app/Contents/MacOS/PluginProcess: heap execution allowed

so they don't have the bit set in any of the executables, but, like Chrome, they still use an out-of-process plugin when run as 32-bit. BTW, why do we run plugins in-process when running in 32-bit mode?
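A per-binary check like the one above can be approximated by reading the Mach-O header flags directly. The sketch below is an illustration under stated assumptions, not the tool used in this comment: it tests the MH_NO_HEAP_EXECUTION bit (0x01000000 in <mach-o/loader.h>) in each slice of a thin or universal (fat) binary and reports True for slices where heap execution is allowed. The function names are hypothetical.

```python
import struct

MH_MAGIC = 0xFEEDFACE             # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF          # 64-bit Mach-O
FAT_MAGIC = 0xCAFEBABE            # universal binary (big-endian header)
MH_NO_HEAP_EXECUTION = 0x01000000


def _thin_flags(data: bytes, offset: int) -> int:
    """Return the flags field of the thin Mach-O header at offset."""
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", data, offset)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            # mach_header: magic, cputype, cpusubtype, filetype,
            # ncmds, sizeofcmds, flags (64-bit adds a trailing reserved field).
            return struct.unpack_from(endian + "7I", data, offset)[6]
    raise ValueError(f"no Mach-O header at offset {offset}")


def heap_exec_report(data: bytes) -> list[bool]:
    """One entry per architecture slice: True if heap execution is allowed."""
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic == FAT_MAGIC:
        (nfat,) = struct.unpack_from(">I", data, 4)
        # fat_arch: cputype, cpusubtype, offset, size, align (20 bytes each).
        offsets = [struct.unpack_from(">5I", data, 8 + 20 * i)[2] for i in range(nfat)]
    else:
        offsets = [0]
    return [not (_thin_flags(data, off) & MH_NO_HEAP_EXECUTION) for off in offsets]
```

For a file on disk, pass the bytes read from it, e.g. `with open(path, "rb") as f: print(heap_exec_report(f.read()))`; a universal binary yields one boolean per architecture.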
> As I understand it, we only have the option of preventing a binary from executing heap code when it's compiled on OS X 10.7 and up (with clang and friends).

Close; it is actually the linker. The linkers in Xcode 4.1 and newer set the bit (and have the -allow_heap_execute option if the user doesn't want it).
> BTW, why do we run the plugins in process when running in 32 bits? If I remember right, it's because we never found a way to run QuickDraw or Carbon event mode plugins out-of-process.
BTW, the Chrome sources have interesting comments about this in http://src.chromium.org/svn/trunk/src/build/mac/change_mach_o_flags.py
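Judging by its name, Chrome's change_mach_o_flags.py edits Mach-O header flags after linking. A minimal sketch of that general idea, not a port of the script: it assumes a little-endian thin Mach-O header (i386 or x86_64) already loaded into a bytearray and clears the MH_NO_HEAP_EXECUTION bit in place. The function name is hypothetical. Note that editing a binary this way after code signing would invalidate its signature; linking with -Wl,-allow_heap_execute, as the patch here does, avoids post-link editing altogether.

```python
import struct

MH_MAGIC = 0xFEEDFACE
MH_MAGIC_64 = 0xFEEDFACF
MH_NO_HEAP_EXECUTION = 0x01000000
FLAGS_OFFSET = 24  # flags is the 7th 32-bit field of mach_header / mach_header_64


def allow_heap_execution(header: bytearray) -> bool:
    """Clear MH_NO_HEAP_EXECUTION in a little-endian thin Mach-O header in place.

    Returns True if the bit was set before the call.
    """
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        raise ValueError("expected a little-endian thin Mach-O header")
    (flags,) = struct.unpack_from("<I", header, FLAGS_OFFSET)
    struct.pack_into("<I", header, FLAGS_OFFSET, flags & ~MH_NO_HEAP_EXECUTION)
    return bool(flags & MH_NO_HEAP_EXECUTION)
```

All other flag bits are preserved; calling the function a second time returns False because the bit is already clear.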
> Do you think it's worthwhile to prevent the main binary from executing code on the heap if this means that the Silverlight plugin will always crash when run in-process?

This is a tough question -- tougher than I first realized. Many "viruses" count on being able to run code on the stack or the heap, so preventing heap execution is worthwhile. We currently only support this on the trunk and aurora branches (not in our release builds), and then only when running on OS X 10.7 and up. But clearly we're moving in that direction for *all* our builds (if only because it's hard to support doing builds on old hardware). So I guess I've changed my mind: I now agree that it's worthwhile preventing heap execution in the main process, even if that means Silverlight can't be run in-process.
On the other hand, even the plugin-container process runs with the same level of user privileges as the main process, so a "virus" executing in the plugin-container process can still do a lot of damage. But I'm still comfortable with the decision not to allow Silverlight to run in-process. As far as I know it's the only plugin that tries to execute code on the heap, which isn't exactly kosher. So let's keep things the way they are, at least for the time being.
There are crashes after the patch landed: bp-d1bf9af1-21cc-4a1e-a137-656112120526.
(In reply to comment #57) See comment #42 and following. It was decided to only fix these crashes in the plugin process, and not when Silverlight is running in the main process. Preventing execution of code on the heap is a significant security gain, and the Silverlight plugin really shouldn't be running code on the heap. Note that the Silverlight plugin is 32-bit-only on OS X.
Under the circumstances, we might want to consider making Silverlight default to running out-of-process even in 32-bit mode on OS X 10.6 and up (as the Flash plugin does).
I filed bug 758931 for remaining crashes. Feel free to close it as WONTFIX or use it for the OOP Silverlight 32-bit mode. You can also rename this one to make clear it doesn't fix crashes.
Steven: We are seeing crashes with this stack in the first Firefox 14 beta, although the flag shows version 14 as unaffected. Right now I see around 600 crashes with similar stacks in the last week.
Odd. You sure it's the same crash? These crashes only happen with builds made on OS X 10.7 (and then only with the Silverlight plugin, and then only in 32-bit mode). I didn't think betas were built on 10.7.
But I also see a bunch of these crashes for "14.0b6". Is that a "real" beta? Is it (unlike other betas) built on OS X 10.7?
> the flag shows version 14 as unaffected

The reason I set the flag that way is that this bug will never get into the FF 14 release (presuming it will be built on OS X 10.6 and not on OS X 10.7). The same is probably true of FF 15, and of at least several future FF versions. But we probably can't hold off building releases on 10.7 (or above) forever.
This is a build log from the beta branch, and it _is_ being built on 10.7: https://tbpl.mozilla.org/php/getParsedLog.php?id=12601857&tree=Mozilla-Beta&full=1
(In reply to Steven Michaud from comment #64)
> But I also see a bunch of these crashes for "14.0b6".
>
> Is that a "real" beta? Is it (unlike other betas) built on OS X 10.7?

14.0b6 = 14.0b1; the naming convention has to do with aligning with the mobile landscape. So yes, this is the first beta in the 14 cycle.
Comment on attachment 626881
Pass -allow_heap_execute to the linker

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Starting to do beta builds on 10.7.
User impact if declined: Large numbers of crashes in our 14-branch betas.
Testing completed (on m-c, etc.): Currently on trunk and aurora, with no reported problems.
Risk to taking this patch (and alternatives if risky): Minimal risk.
String or UUID changes made by this patch: none
Comment on attachment 626881
Pass -allow_heap_execute to the linker

[Triage Comment]
Low risk, startup crash fix. Approved.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:15.0) Gecko/20100101 Firefox/15.0

Verified the fix using the STR from comment #3 on the latest Firefox 15.0beta5 (built on the bld-lion-r5-055 machine): no crash occurred. Also, there have been no crash reports with this signature on Firefox 15 builds in the past 4 weeks.