Bug 753248 - [10.7][10.8] crash in coreclr with Silverlight applications with builds made on OS X 10.7 (Lion)
Status: RESOLVED FIXED
Whiteboard: [startupcrash]
Keywords: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: Build Config
Version: 14 Branch
Platform: x86 Mac OS X
Importance: -- critical
Target Milestone: mozilla15
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
QA Contact: Mihaela Velimiroviciu (:mihaelav)
Mentors:
Duplicates: 749281 756901 764385
Depends on:
Blocks: lion-compatibility mountain-lion-compat 720027
Reported: 2012-05-08 23:20 PDT by Scoobidiver (away)
Modified: 2012-08-17 06:18 PDT


Attachments
Apple crash log (78.37 KB, text/plain)
2012-05-18 16:12 PDT, Steven Michaud [:smichaud] (Retired)
no flags
Pass -allow_heap_execute to the linker (3.27 KB, patch)
2012-05-23 18:40 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Pass -allow_heap_execute to the linker (3.40 KB, patch)
2012-05-24 11:00 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
akeybl: approval‑mozilla‑beta+

Description Scoobidiver (away) 2012-05-08 23:20:31 PDT
It's #4 top crasher in 14.0a2 on Mac OS X and occurs only on Mac OS X 10.7 with 32-bit builds.
90% of crashes happen within one minute.

It first appeared in 14.0a1/20120410120625. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9ca66ce2672f&tochange=3fa30b0edd15

One comment says:
"Trying to load a silverlight movie from netflix.com. restarted in 32 bit mode as required, plugin seems to initialise (silverlight logo appears) but the content fails to load."

Signature 	coreclr@0x211b5d
UUID	bc7db362-3598-491d-9c76-99d142120507
Date Processed	2012-05-07 01:40:33
Uptime	2
Last Crash	1.9 minutes before submission
Install Age	3.3 minutes since version was first installed.
Install Time	2012-05-07 01:37:09
Product	Firefox
Version	15.0a1
Build ID	20120506030520
Release Channel	nightly
OS	Mac OS X
OS Version	10.7.3 11D50
Build Architecture	x86
Build Architecture Info	family 6 model 23 stepping 10
Crash Reason	EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address	0x118fa1b4
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 		@0x118fa1b4 	
1 	coreclr 	coreclr@0x211b5d 	
2 	coreclr 	coreclr@0x15349d 	
3 	coreclr 	coreclr@0x4a0be2 	
4 	coreclr 	coreclr@0x152424 	
5 	coreclr 	coreclr@0x1526fe 	
6 	coreclr 	coreclr@0x15286a 	
7 	coreclr 	coreclr@0x1da82f 	
8 	coreclr 	coreclr@0x1dc166 	
9 	agcore 	agcore@0x89f196 	
10 	agcore 	agcore@0x7eba27 	
11 	agcore 	agcore@0xc0390 	
12 	agcore 	agcore@0x885fa6 	
13 	agcore 	agcore@0x67f030 	
14 	agcore 	agcore@0x67a034 	
15 	agcore 	agcore@0x663ed5 	
16 	agcore 	agcore@0x664148 	
17 	CoreFoundation 	__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__

More reports at:
https://crash-stats.mozilla.com/report/list?signature=coreclr%400x211b5d
Comment 1 Alex Keybl [:akeybl] 2012-05-09 16:32:44 PDT
Adding qawanted to get some testing around Silverlight on OS X with Netflix. We should try switching between 32-bit and 64-bit as Scoobidiver suggests.
Comment 2 Marcia Knous [:marcia - use ni] 2012-05-09 17:18:33 PDT
This spike coincides with the latest Silverlight update - http://www.zdnet.com/blog/microsoft/microsoft-quietly-rolls-out-silverlight-51/12682. We should try with that latest version to see what happens.
Comment 3 Mihaela Velimiroviciu (:mihaelav) 2012-05-11 05:49:52 PDT
I was able to reproduce the browser crash using Silverlight 5.1.10411.0 and Firefox 14.0a2 32bit mode.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20120510 Firefox/14.0a2

Steps to reproduce:
1. Install latest Silverlight
2. Start Firefox in 32bit mode
3. Go to a page that uses Silverlight (e.g.:  http://www.vectorlight.net/games/sandmania.aspx)
4. Click an area that uses a plugin to activate the plugins (or make sure the plugins.click_to_play pref is set to false in about:config)

Result: While Silverlight content is loading Firefox crashes.

Note: In 64bit mode, only Silverlight plugin crashes without crashing the browser as well.

Crash report: https://crash-stats.mozilla.com/report/index/bp-fb1c9221-d958-4fca-810b-7df032120511
Comment 4 Steven Michaud [:smichaud] (Retired) 2012-05-11 08:16:07 PDT
I also crash (using Mihaela's STR from comment #3), testing with today's mozilla-central nightly, using either Silverlight 5.0.61118.0 or 5.1.10411.0 (the current version).

This only happens on OS X 10.7.3 -- not on 10.6.8.  So this may at least partly be an OS bug.

Crash (in 32-bit mode) using Silverlight 5.0.61118.0:
bp-e4a83fa8-6cf3-4c98-9dd7-473f32120511

Crash (in 32-bit mode) using Silverlight 5.1.10411.0:
bp-e13772b4-a960-4ba5-8989-dbb162120511

As always, Silverlight won't allow itself to be debugged in gdb, so I'll have to translate by hand the symbols in my crash stacks (in a later comment).

Thanks, Mihaela, for your STR!
Comment 5 Steven Michaud [:smichaud] (Retired) 2012-05-11 08:25:00 PDT
I suspect this is only coincidentally a startup crash, but I'll leave the whiteboard as is for now.
Comment 6 Scoobidiver (away) 2012-05-11 08:31:37 PDT
(In reply to Steven Michaud from comment #5)
> I suspect this is only coincidentally a startup crash, but I'll leave the
> whiteboard as is for now.
It's a crash when the plugin starts, not when Firefox starts.
Comment 7 Steven Michaud [:smichaud] (Retired) 2012-05-11 08:34:20 PDT
Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies.  Here's the regression range:

firefox-2012-04-10-07-56-52-mozilla-central
firefox-2012-04-10-12-06-25-mozilla-central

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fe5b0271cd1&tochange=3fa30b0edd15

Which is of course very close to Scoobidiver's regression range from comment #0.

I'll use hg bisect to find the patch that triggered these crashes.
Comment 8 Maniac Vlad Florin (:vladmaniac) 2012-05-11 10:55:20 PDT
I have accidentally bumped into this bug on Linux also, so please consider adding that platform to the bug
Comment 9 Scoobidiver (away) 2012-05-11 11:03:21 PDT
(In reply to Maniac Vlad Florin (:vladmaniac) from comment #8)
> I have accidentally bumped into this bug on Linux also, so please consider
> adding that platform to the bug
Can you provide your crash ID?
Comment 10 Steven Michaud [:smichaud] (Retired) 2012-05-11 11:32:38 PDT
Finding a regression range for this bug is complicated by the fact that a 32-bit only custom build (made on OS X 10.7) crashes as far back as I've currently tested (2012-01-01).

I've got another urgent bug to work on, so I'll put off further work on finding a regression range until next week.
Comment 11 Steven Michaud [:smichaud] (Retired) 2012-05-11 12:10:43 PDT
(In reply to comment #8)

So Vlad, you're running the Silverlight plugin on Linux?
Comment 12 Maniac Vlad Florin (:vladmaniac) 2012-05-11 12:42:50 PDT
(In reply to Steven Michaud from comment #11)
> (In reply to comment #8)
> 
> So Vlad, you're running the Silverlight plugin on Linux?

Yes, http://www.go-mono.com/moonlight/ this one in particular
Comment 13 Steven Michaud [:smichaud] (Retired) 2012-05-18 11:58:05 PDT
I'm back to looking for this bug's regression range.  But it's not easy.

I've now found that it makes a difference whether you build on 10.7 or 10.6.

I'll keep digging.
Comment 14 Steven Michaud [:smichaud] (Retired) 2012-05-18 15:43:07 PDT
It now looks like this bug was triggered by us starting to build mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see bug 720027, and particularly bug 720027 comment #40).

I'll try to find a workaround for this.  But that will likely require all kinds of reverse engineering, which will take a while.
Comment 15 Steven Michaud [:smichaud] (Retired) 2012-05-18 15:46:57 PDT
So Vlad, it sounds like your Linux crash is unrelated.

Please try to provide a crash id or gdb crash stack.
Comment 16 Steven Michaud [:smichaud] (Retired) 2012-05-18 15:51:19 PDT
(Following up comment #14)

Another consequence is that this bug shouldn't affect any FF release -- which for now are still built on OS X 10.6.

That makes this bug much less urgent.
Comment 17 Steven Michaud [:smichaud] (Retired) 2012-05-18 16:02:35 PDT
(Following up comment #7)

> Mihaela's STR from comment #3 only "works" in recent mozilla-central nightlies.
> Here's the regression range:
>
> firefox-2012-04-10-07-56-52-mozilla-central
> firefox-2012-04-10-12-06-25-mozilla-central

The machine that built firefox-2012-04-10-07-56-52-mozilla-central is called "moz2-darwin10-slave51".  The machine that built firefox-2012-04-10-12-06-25-mozilla-central is called "bld-lion-r5-051".

This information is from "about:buildconfig".

"darwin10" is actually OS X 10.6 (Snow Leopard).  I find I can only reproduce this bug with builds made on Lion (any trunk revision), and not with builds made on Snow Leopard (any trunk revision).
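
The inference above (builder hostname, then build OS, then crash or no crash) can be written down as a small lookup. This is only a sketch in Python: the hostname patterns are an assumption generalized from the two builders named in this comment, the helper name build_os is hypothetical, and the one outside fact relied on is that Darwin 10 is the kernel version of OS X 10.6.

```python
import re

# Hostname patterns generalized from the two builders named above; the mapping
# is an illustrative assumption, not Mozilla release-engineering code.
BUILDER_PATTERNS = [
    (re.compile(r"darwin10"), "10.6"),  # Darwin 10 is the OS X 10.6 kernel
    (re.compile(r"lion"), "10.7"),
]

def build_os(hostname):
    """Return the OS X version a builder hostname suggests, or None."""
    for pattern, version in BUILDER_PATTERNS:
        if pattern.search(hostname):
            return version
    return None

# (nightly, builder, crashes with the STR from comment #3)
NIGHTLIES = [
    ("firefox-2012-04-10-07-56-52-mozilla-central", "moz2-darwin10-slave51", False),
    ("firefox-2012-04-10-12-06-25-mozilla-central", "bld-lion-r5-051", True),
]

for name, host, crashes in NIGHTLIES:
    print(name, build_os(host), "crashes" if crashes else "ok")
```

The last good nightly maps to 10.6 and the first bad one to 10.7, which is the correlation this comment relies on.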
Comment 18 Steven Michaud [:smichaud] (Retired) 2012-05-18 16:12:00 PDT
Created attachment 625296 [details]
Apple crash log

Here's an Apple crash log for these crashes.
Comment 19 Steven Michaud [:smichaud] (Retired) 2012-05-18 16:24:57 PDT
I can reproduce this bug with today's mozilla-central and aurora nightlies (both of which were built on OS X Lion), but not with FF 13.0b4 or today's beta-debug nightly (both of which, from the "darwin10" in their build machine names, were built on OS X Snow Leopard).
Comment 20 Steven Michaud [:smichaud] (Retired) 2012-05-18 16:42:23 PDT
Since this bug won't affect the FF 14 release, strictly speaking the "status-firefox14" flag should be set to "unaffected".

And now that I think of it, "unaffected" should really be "uneffected" :-)
Comment 21 Steven Michaud [:smichaud] (Retired) 2012-05-18 17:06:55 PDT
This bug also exists (with exactly the same characteristics) on OS X 10.8.  I tested on the latest build (12A206, the 2nd update from DP3).
Comment 22 Steven Michaud [:smichaud] (Retired) 2012-05-18 17:16:02 PDT
*** Bug 749281 has been marked as a duplicate of this bug. ***
Comment 23 Scoobidiver (away) 2012-05-18 23:02:58 PDT
(In reply to Steven Michaud from comment #14)
> It now looks like this bug was triggered by us starting to build
> mozilla-central nightlies on OS X 10.7, which happened on 2012-04-10 (see
> bug 720027, and particularly bug 720027 comment #40).
How could it be? Indeed, the first crash occurred in 14.0a1/20120410120625 while the first patch of bug 720027 landed in 14.0a1/20120411030716.
Comment 24 Steven Michaud [:smichaud] (Retired) 2012-05-19 08:56:50 PDT
(In reply to comment #23)

See comment #17.  It clearly shows that the first build made on Lion is also the first build with this bug.
Comment 25 Scoobidiver (away) 2012-05-19 09:30:42 PDT
(In reply to Steven Michaud from comment #24)
> (In reply to comment #23)
> See comment #17.  It clearly shows that the first build made on Lion is also
> the first build with this bug.
Comment 17 confirms the regression range in comment 0 and the smaller one in bug 749281 comment 6. bug 720027 doesn't belong to this regression window.
Comment 26 Steven Michaud [:smichaud] (Retired) 2012-05-19 09:35:42 PDT
Actually it *doesn't* confirm the one in comment #0, which is too long.  Only the smaller one is truly accurate.

It's truly very important that this bug block 720027, and that no release builds be made on OS X 10.7 until this bug is sorted out.
Comment 27 Steven Michaud [:smichaud] (Retired) 2012-05-19 09:39:36 PDT
Also, there's lots of other evidence (besides the regression range) that this bug is caused by building on OS X 10.7.

See comment #17 and comment #19.  Also note that, even in current code, builds made on OS X 10.6 never have this bug, while builds made on 10.7 always do.

If you want to challenge the accuracy of what I just said, find 1) a way to build on 10.6 that *does* trigger this bug, or 2) a way to build on 10.7 that *doesn't* trigger this bug.  That would actually advance the discussion.
Comment 28 Steven Michaud [:smichaud] (Retired) 2012-05-20 14:43:18 PDT
*** Bug 756901 has been marked as a duplicate of this bug. ***
Comment 29 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-22 06:49:38 PDT
Hi Steven,
are you working on this? Do you want help debugging what the difference is that causes the crash?
Comment 30 Steven Michaud [:smichaud] (Retired) 2012-05-22 07:44:41 PDT
Hi Rafael.

I've put this one off for a week or two, since it can't get into a release (as long as we don't build releases on Lion).

Any help you can give me would be greatly appreciated!
Comment 31 Steven Michaud [:smichaud] (Retired) 2012-05-22 08:03:09 PDT
(Following up comment #30)

And (of course) if you find a fix, just take the bug yourself.
Comment 32 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 05:02:21 PDT
I was able to reproduce this building the same firefox rev with the same clang rev and sdk on 10.7 and 10.6.  I am debugging it.
Comment 33 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 06:04:32 PDT
The problem is in the linker (or the plugin's expectations of its output). Linking just the firefox binary with ld64-97.17 makes the crash stop. Linking again with ld64-128.2 causes it to crash again.

Now debugging what the difference in the output is.
Comment 34 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 09:52:39 PDT
With lldb I was able to get the backtrace:

    frame #0: 0x1479f8f4
    frame #1: 0x2c1dac98 coreclr`GetCLRRuntimeHost + 26328
    frame #2: 0x2c2b2793 coreclr`GetCLRRuntimeHost + 909779
    frame #3: 0x2c2b2f5e coreclr`GetCLRRuntimeHost + 911774
    frame #4: 0x2c16b39f coreclr`MetaDataGetDispenser + 110543
    frame #5: 0x2c320ab5 coreclr`GetCLRRuntimeHost + 1361141
    frame #6: 0x2c137553 coreclr`PAL_InitializeCoreCLR + 58243
    frame #7: 0x2c31fb55 coreclr`GetCLRRuntimeHost + 1357205
    frame #8: 0x2c31fdfb coreclr`GetCLRRuntimeHost + 1357883
    frame #9: 0x2c1d3000 coreclr`MetaDataGetDispenser + 535600
    frame #10: 0x2c1d4777 coreclr`GetCLRRuntimeHost + 439
    frame #11: 0x19791453 agcore`PackagingCrc32 + 293795
    frame #12: 0x1978fc9a agcore`PackagingCrc32 + 287722
    frame #13: 0x18eb4009 agcore`DrmException_GetErrorDataFromHResult + 1100137
    frame #14: 0x1973b6db agcore`LocalMessageReceive + 550731
    frame #15: 0x194d6b01 agcore`StylusPointCollection_InsertItem + 318257
    frame #16: 0x194d1b50 agcore`StylusPointCollection_InsertItem + 297856
    frame #17: 0x194bacc4 agcore`StylusPointCollection_InsertItem + 204020
    frame #18: 0x194baee9 agcore`StylusPointCollection_InsertItem + 204569
    frame #19: 0x9ba300ce CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 30
    frame #20: 0x9ba3000d CoreFoundation`__CFRunLoopDoObservers + 413
    frame #21: 0x9ba02207 CoreFoundation`CFRunLoopRunSpecific + 375
    frame #22: 0x9ba02088 CoreFoundation`CFRunLoopRunInMode + 120
    frame #23: 0x9b44f723 HIToolbox`RunCurrentEventLoopInMode + 318
    frame #24: 0x9b456a8b HIToolbox`ReceiveNextEventCommon + 381
    frame #25: 0x9b4568fa HIToolbox`BlockUntilNextEventMatchingListInMode + 88
    frame #26: 0x9110c0d8 AppKit`_DPSNextEvent + 678
    frame #27: 0x9110b942 AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
    frame #28: 0x02f2c371 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 145 at nsAppShell.mm:176
    frame #29: 0x91107cb1 AppKit`-[NSApplication run] + 911
    frame #30: 0x02f2eadc XUL`nsAppShell::Run() + 236 at nsAppShell.mm:764
    frame #31: 0x02b51ecd XUL`nsAppStartup::Run() + 189 at nsAppStartup.cpp:256
    frame #32: 0x010128ca XUL`XREMain::XRE_mainRun() + 5898 at nsAppRunner.cpp:3754
    frame #33: 0x0101318b XUL`XREMain::XRE_main(int, char**, nsXREAppData const*) + 843 at nsAppRunner.cpp:3831
    frame #34: 0x010135cd XUL`XRE_main + 77 at nsAppRunner.cpp:3907
    frame #35: 0x000032c5 firefox`_ZL7do_mainiPPc + 1061 at nsBrowserApp.cpp:157
    frame #36: 0x00002c3c firefox`main + 764 at nsBrowserApp.cpp:244

coreclr is /Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/CoreCLR.bundle/Contents/MacOS/coreclr

Using the code in http://macfuse.googlecode.com/svn/trunk/filesystems/procfs/procfs.cc to look at the address maps I found:

14700000-14800000     1024K rw-/rwx        COPY      -    DEFAULT uwir=0 sub=0

so coreclr is jumping into non executable memory. The code doing the jump looks like

000dac90        movl    0x0065f1a6(%ebx),%eax
000dac96        call    (%eax)

with ebx being set by

000dab69        calll   0x000dab6e
000dab6e        popl    %ebx

So the address being called is loaded from 0x000dab6e + 0x0065f1a6 = 0x739d14 (relative to the load address). This is in

Section
  sectname __pointers
   segname __IMPORT
      addr 0x00739000
      size 0x0000136c

otool -Iv says

0x00739d14 LOCAL

I am not sure what LOCAL means, but that address has the value  0x65fb90.

I will probably try building the linker or at least reading the code to figure out what this is.
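
The address arithmetic in this comment can be double-checked with a few lines of Python. This is only a sketch: the constants are copied from the disassembly and the section listing above, and the helper names pointer_slot and in_section are hypothetical.

```python
# Values copied from the disassembly and section listing above.
CALL_RETURN_ADDR = 0x000dab6e  # pushed by "calll 0x000dab6e", popped into %ebx
DISPLACEMENT = 0x0065f1a6      # from "movl 0x0065f1a6(%ebx),%eax"
SECTION_ADDR = 0x00739000      # __IMPORT,__pointers addr
SECTION_SIZE = 0x0000136c      # __IMPORT,__pointers size

def pointer_slot(pc, displacement):
    """Address (relative to the load address) that the movl reads from."""
    return pc + displacement

def in_section(addr, start, size):
    """True if addr lies inside the half-open range [start, start + size)."""
    return start <= addr < start + size

slot = pointer_slot(CALL_RETURN_ADDR, DISPLACEMENT)
print(hex(slot))                                     # 0x739d14
print(in_section(slot, SECTION_ADDR, SECTION_SIZE))  # True
```

The slot falls inside __IMPORT,__pointers, which matches the otool output quoted above.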
Comment 35 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 10:54:27 PDT
0x65fb90 is just an offset from the load base. In the case where we crash, the load address of coreclr + 0x65fb90 has the value 0x1479f8f4.

The strange thing is that attaching lldb to a running firefox linked with the old linker shows similar values.  coreclr + 0x65fb90 has an address inside of a non-executable area that is 1MB long. Looks like the difference is just not getting to this point.
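
The claim that the call target sits in non-executable memory can be checked against the address-map line from comment 34. A minimal sketch in Python, assuming the protection column reads current/maximum (so "rw-/rwx" means currently read-write, with execute permitted only as a maximum); the helper parse_region is hypothetical.

```python
import re

REGION_LINE = "14700000-14800000     1024K rw-/rwx        COPY      -    DEFAULT uwir=0 sub=0"
TARGET = 0x1479f8f4  # frame #0 in the lldb backtrace

def parse_region(line):
    """Parse 'start-end size cur/max ...' into (start, end, cur_prot, max_prot)."""
    m = re.match(r"([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+\S+\s+([rwx-]{3})/([rwx-]{3})", line)
    if m is None:
        raise ValueError("unrecognized region line: %r" % line)
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3), m.group(4)

start, end, cur_prot, max_prot = parse_region(REGION_LINE)
inside = start <= TARGET < end          # is the call target in this region?
executable_now = "x" in cur_prot        # is the region currently executable?
size_kb = (end - start) // 1024

print(inside, executable_now, size_kb)  # True False 1024
```

The target is inside the 1MB region and the region lacks execute permission, consistent with the KERN_PROTECTION_FAILURE in the crash report.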
Comment 36 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 15:26:34 PDT
I found the problem, will add a patch in a moment.
Comment 37 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-23 18:40:31 PDT
Created attachment 626663 [details] [diff] [review]
Pass -allow_heap_execute to the linker

I need some help with autoconf. This patch should fix the bug, but I get the strange warning:

Unknown variable:browser/app/Makefile:9:MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@

even when the variable is substituted with -Wl,-allow_heap_execute!

https://tbpl.mozilla.org/?tree=Try&rev=ef93de207ed4
Comment 38 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 05:02:14 PDT
The pure 32 bit dmg in

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/respindola@mozilla.com-ef93de207ed4/try-macosx-debug/firefox-15.0a1.en-US.mac.dmg

now works.  The universal dmg still has problems. I guess there is another binary that also needs to be linked with -Wl,-allow_heap_execute.
Comment 39 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 05:06:23 PDT
Starting the universal firefox with

arch -i386 ./FirefoxNightly.app/Contents/MacOS/firefox http://www.vectorlight.net/games/sandmania.aspx

also works now, so it is probably just the out of process plugin loader that is missing the flag now.
Comment 40 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 07:05:29 PDT
The warning is from acoutput-fast.pl. Not sure what the correct thing to do is. I can move the variable to a .mk file and include that from browser/app/Makefile just to avoid the warning, but I am not sure it is worth it.
Comment 41 Ted Mielczarek [:ted.mielczarek] 2012-05-24 09:26:23 PDT
Comment on attachment 626663 [details] [diff] [review]
Pass -allow_heap_execute to the linker

Review of attachment 626663 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/app/Makefile.in
@@ +5,5 @@
>  DEPTH		= ../..
>  topsrcdir	= @top_srcdir@
>  srcdir		= @srcdir@
>  VPATH		= @srcdir@
> +MOZ_ALLOW_HEAP_EXECUTE_FLAGS = @MOZ_ALLOW_HEAP_EXECUTE_FLAGS@

Generally we put these in config/autoconf.mk.in. That will make it easy for other apps to use it as well.
Comment 42 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 11:00:44 PDT
Created attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker

The attached patch uses autoconf.mk.in.

Talking with Ehsan and BenWa I also changed it to pass the flag only to the plugin wrapper. The logic is

* When running on 10.5 and 10.6 the kernel will ignore the bit.
* When running on >= 10.7, we can support silverlight only via the out of process wrapper.
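
The two rules above reduce to a small predicate. This is a sketch of the reasoning in this comment, not of any code in the patch; the function name and parameters are hypothetical, and it assumes every binary is linked with a linker new enough to set the bit by default (a binary from the older linker behaves as if the flag had been passed).

```python
def silverlight_crashes(running_os, host_allows_heap_execute):
    """True if Silverlight's jump into heap memory is expected to fault.

    running_os: (major, minor) of the OS X version Firefox runs on.
    host_allows_heap_execute: whether the executable hosting the plugin
        was linked with -allow_heap_execute.
    """
    kernel_honors_bit = running_os >= (10, 7)  # 10.5 and 10.6 ignore the bit
    return kernel_honors_bit and not host_allows_heap_execute

# With this patch only the plugin wrapper is linked with the flag.
PLUGIN_WRAPPER = True   # out-of-process host
MAIN_BINARY = False     # in-process host (32-bit mode)

for os_version in [(10, 6), (10, 7), (10, 8)]:
    print(os_version,
          "wrapper crashes:", silverlight_crashes(os_version, PLUGIN_WRAPPER),
          "in-process crashes:", silverlight_crashes(os_version, MAIN_BINARY))
```

Under these assumptions the only crashing combination is running in-process on 10.7 or later.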

https://tbpl.mozilla.org/?tree=Try&rev=2d90637534bc
Comment 43 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 12:06:00 PDT
The build 

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/respindola@mozilla.com-2d90637534bc/try-macosx64/firefox-15.0a1.en-US.mac.dmg

works just fine for me.  Steven Michaud, can you check that it works on 10.8 too?
Comment 44 Steven Michaud [:smichaud] (Retired) 2012-05-24 12:14:19 PDT
> Steven Michaud, can you check that it works on 10.8 too?

It works fine in 64-bit mode (where the 32-bit Silverlight plugin is run out-of-process).

It still crashes in 32-bit mode -- but it sounds like that's deliberate.
Comment 45 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 12:15:43 PDT
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=58bfdaaba99c
Comment 46 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 12:17:08 PDT
> It still crashes in 32-bit mode -- but it sounds like that's deliberate.

Correct, as that lets the main binary be linked without  -Wl,-allow_heap_execute. Let me know if you think this is the wrong tradeoff.
Comment 47 Steven Michaud [:smichaud] (Retired) 2012-05-24 12:34:49 PDT
> Correct, as that lets the main binary be linked without -Wl,-allow_heap_execute.
> Let me know if you think this is the wrong tradeoff.

Though I know we don't support using plugins in 32-bit mode or in-process, I don't like the idea of crashing when Silverlight is run in process.

Why did you choose not to link the main binary with -allow_heap_execute?  Did you think this would degrade security?

I'm not sure the additional security outweighs the disadvantages.  Do you know whether Safari and Chrome allow heap execution in the main process?
Comment 48 Steven Michaud [:smichaud] (Retired) 2012-05-24 12:35:20 PDT
> Do you know whether Safari and Chrome allow heap execution in the main process?

In non-plugin processes?
Comment 49 Steven Michaud [:smichaud] (Retired) 2012-05-24 12:41:41 PDT
I suppose we should bring security people in on this issue.

Dan, what do you think of the issues raised in comment #46 and comment #47?  Do you think it's worthwhile to prevent the main binary from executing code on the heap if this means that the Silverlight plugin will always crash when run in-process?

As I understand it, we only have the option of preventing a binary from executing heap code when it's compiled on OS X 10.7 and up (with clang and friends).  Mozilla-central and aurora builds are currently made on 10.7, while beta and release builds are made on OS X 10.6.
Comment 50 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 13:01:26 PDT
(In reply to Steven Michaud from comment #48)
> > Do you know whether Safari and Chrome allow heap execution in the main process?
> 
> In non-plugin processes?

For chrome I get

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome : heap execution not allowed
/Applications/Google\ Chrome.app/Contents/Versions/21.0.1145.0/Google\ Chrome\ Helper.app/Contents/MacOS/Google\ Chrome\ Helper : heap execution not allowed
/Applications/Google Chrome.app/Contents/Versions/21.0.1145.0/Google Chrome Helper EH.app/Contents/MacOS/Google Chrome Helper EH : heap execution allowed.

It looks like it is the last one that actually loads the silverlight plugin.  We should now be in a similar position to chrome, but one advantage they have is using out of process plugins with 32 bit binaries (we only do it with 64 bits).

For safari I get:

/Applications/Safari.app/Contents/MacOS/Safari: heap execution allowed.
/System/Library/StagedFrameworks/Safari/WebKit2.framework/WebProcess.app/Contents/MacOS/WebProcess: heap execution allowed.
/System/Library/StagedFrameworks/Safari/WebKit2.framework/PluginProcess.app/Contents/MacOS/PluginProcess: heap execution allowed.

so they don't have the bit set in any of the executables, but like chrome they still use an out of process plugin when run as 32 bits.

BTW, why do we run the plugins in process when running in 32 bits?
Comment 51 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 13:03:07 PDT
> As I understand it, we only have the option of preventing a binary from
> executing heap code when it's compiled on OS X 10.7 and up (with clang and
> friends).

Close, it is actually the linker. The ones in xcode 4.1 and newer set the bit (and have the -allow_heap_execute option if the user doesn't want it).
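
For reference, the bit in question is the MH_NO_HEAP_EXECUTION flag (0x1000000) in the Mach-O header, as defined in <mach-o/loader.h>. The tool that produced the "heap execution allowed" output in comment 50 is not named in this bug, so the following Python is only a sketch of how such a check could look. It handles thin (single-architecture) files only, and the demo header is synthetic.

```python
import struct

MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF
MH_NO_HEAP_EXECUTION = 0x1000000  # from <mach-o/loader.h>

def heap_execution_allowed(header):
    """Inspect the first 28 bytes of a thin Mach-O file.

    Returns True if MH_NO_HEAP_EXECUTION is clear, False if it is set.
    Fat (universal) files are not handled in this sketch.
    """
    if len(header) < 28:
        raise ValueError("need at least 28 bytes of Mach-O header")
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (MH_MAGIC, MH_MAGIC_64):
            # Fields: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
            flags = struct.unpack(endian + "7I", header[:28])[6]
            return not (flags & MH_NO_HEAP_EXECUTION)
    raise ValueError("not a thin Mach-O file")

def demo_header(flags):
    """Synthetic little-endian i386 MH_EXECUTE header, for illustration only."""
    return struct.pack("<7I", MH_MAGIC, 7, 3, 2, 0, 0, flags)

print(heap_execution_allowed(demo_header(0x85)))                         # True
print(heap_execution_allowed(demo_header(0x85 | MH_NO_HEAP_EXECUTION)))  # False
```

For a real thin binary, pass the first 28 bytes of the file (opened in binary mode) to heap_execution_allowed.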
Comment 52 Steven Michaud [:smichaud] (Retired) 2012-05-24 13:08:48 PDT
> BTW, why do we run the plugins in process when running in 32 bits?

If I remember right, it's because we never found a way to run QuickDraw or Carbon event mode plugins out-of-process.
Comment 53 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-24 13:10:22 PDT
btw, the chrome sources have interesting comments about it in

http://src.chromium.org/svn/trunk/src/build/mac/change_mach_o_flags.py
Comment 54 Steven Michaud [:smichaud] (Retired) 2012-05-24 13:20:37 PDT
> Do you think it's worthwhile to prevent the main binary from
> executing code on the heap if this means that the Silverlight plugin
> will always crash when run in-process?

This is a tough question -- tougher than I first realized.

Many "viruses" count on being able to run code on the stack or the
heap.  So preventing heap execution is worthwhile.

We currently only support this on the trunk and aurora branches (not
in our release builds), and then only when running on OS X 10.7 and
up.  But clearly we're moving in that direction for *all* our builds
(if only because it's hard to support doing builds on old hardware).

So I guess I've changed my mind:  I now agree that it's worthwhile
preventing heap execution in the main process, even if that means
Silverlight can't be run in-process.
Comment 55 Steven Michaud [:smichaud] (Retired) 2012-05-24 13:28:33 PDT
On the other hand even the plugin-container process runs with the same level of user privileges as the main process, so a "virus" executing in the plugin-container process can still do a lot of damage.

But I'm still comfortable with the decision to not allow Silverlight to run in-process.  As far as I know it's the only plugin that tries to execute code on the heap, which isn't exactly kosher.

So let's keep things the way they are, at least for the time being.
Comment 56 Ed Morley [:emorley] 2012-05-25 08:31:02 PDT
https://hg.mozilla.org/mozilla-central/rev/58bfdaaba99c
Comment 57 Scoobidiver (away) 2012-05-26 13:32:10 PDT
There are crashes after the patch landed: bp-d1bf9af1-21cc-4a1e-a137-656112120526.
Comment 58 Steven Michaud [:smichaud] (Retired) 2012-05-26 13:47:12 PDT
(In reply to comment #57)

See comment #42 and following.

It was decided to only fix these crashes in the plugin process, and not when Silverlight is running in the main process.  Preventing execution of code on the heap is a significant security gain, and the Silverlight plugin really shouldn't be running code on the heap.

Note that the Silverlight plugin is 32-bit-only on OS X.
Comment 59 Steven Michaud [:smichaud] (Retired) 2012-05-26 13:51:11 PDT
Under the circumstances, we might want to consider making Silverlight default to running out-of-process even in 32-bit mode on OS X 10.6 and up (as the Flash plugin does).
Comment 60 Scoobidiver (away) 2012-05-26 23:21:14 PDT
I filed bug 758931 for remaining crashes. Feel free to close it as WONTFIX or use it for the OOP Silverlight 32-bit mode.
You can also rename this one to make clear it doesn't fix crashes.
Comment 61 Steven Michaud [:smichaud] (Retired) 2012-05-29 09:20:37 PDT
(Following up comment #59)

I've opened bug 759364.
Comment 62 Marcia Knous [:marcia - use ni] 2012-06-12 14:11:41 PDT
Steven: We are seeing crashes in the first Firefox 14 beta in this stack, although the flag shows version 14 as unaffected. Right now I see around ~600 crashes in similar stacks in the last week.
Comment 63 Steven Michaud [:smichaud] (Retired) 2012-06-12 14:19:45 PDT
Odd.  You sure it's the same crash?

These crashes only happen with builds made on OS X 10.7 (and then only with the Silverlight plugin, and then only in 32-bit mode).  I didn't think betas were built on 10.7.
Comment 64 Steven Michaud [:smichaud] (Retired) 2012-06-12 14:25:41 PDT
But I also see a bunch of these crashes for "14.0b6".

Is that a "real" beta?  Is it (unlike other betas) built on OS X 10.7?
Comment 65 Steven Michaud [:smichaud] (Retired) 2012-06-12 14:32:48 PDT
> the flag shows version 14 as unaffected

The reason I set the flag that way is that this bug will never get into the FF 14 release (presuming it will be built on OS X 10.6 and not on OS X 10.7).

The same is probably true of FF 15, and of at least several future FF versions.  But we probably can't hold off building releases on 10.7 (or above) forever.
Comment 66 :Ehsan Akhgari 2012-06-12 16:46:37 PDT
This is a build log from the beta branch, and it _is_ being built on 10.7:

https://tbpl.mozilla.org/php/getParsedLog.php?id=12601857&tree=Mozilla-Beta&full=1
Comment 67 Marcia Knous [:marcia - use ni] 2012-06-12 17:06:42 PDT
14.0b6 = 14.0b1 - the naming convention has to do with aligning with the mobile landscape. So yes, this is the first beta in the 14 cycle.

(In reply to Steven Michaud from comment #64)
> But I also see a bunch of these crashes for "14.0b6".
> 
> Is that a "real" beta?  Is it (unlike other betas) built on OS X 10.7?
Comment 68 Steven Michaud [:smichaud] (Retired) 2012-06-13 08:37:04 PDT
Comment on attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Starting to do beta builds on 10.7.
User impact if declined: Large numbers of crashes in our 14-branch betas.
Testing completed (on m-c, etc.): Currently on trunk and aurora, with no reported problems
Risk to taking this patch (and alternatives if risky): Minimal risk
String or UUID changes made by this patch: none
Comment 69 Steven Michaud [:smichaud] (Retired) 2012-06-13 08:41:42 PDT
See bug 764385.
Comment 70 Alex Keybl [:akeybl] 2012-06-14 08:51:18 PDT
Comment on attachment 626881 [details] [diff] [review]
Pass -allow_heap_execute to the linker

[Triage Comment]
Low risk, startup crash fix. Approved.
Comment 72 Scoobidiver (away) 2012-06-16 05:54:45 PDT
*** Bug 764385 has been marked as a duplicate of this bug. ***
Comment 73 Mihaela Velimiroviciu (:mihaelav) 2012-08-17 06:18:49 PDT
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:15.0) Gecko/20100101 Firefox/15.0

Verified the fix using STR from comment #3 on latest Firefox 15.0beta5 (built on bld-lion-r5-055 machine): no crash occurred.
Also, there are no crash reports with this signature on Firefox 15 builds in the past 4 weeks.
