Closed Bug 540640 Opened 16 years ago Closed 15 years ago

[10.5] Crashes [@ libclient.dylib@0x119b59] [@ libclient.dylib@0x1192e9] [@ libclient.dylib@0x119289] triggered by bad interaction between JEP and Silverlight -- Apple and Microsoft bugs

Categories

(Core Graveyard :: Plug-ins, defect)

All
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: smichaud, Assigned: smichaud)

References

Details

Attachments

(1 file)

Several different crashes in libclient.dylib happen only on OS X 10.5 and are 100% associated with the Java Embedding Plugin and Silverlight (agcore, coreclr). See recent interesting-modules files at http://people.mozilla.com/crash_analysis/.

I can reproduce them using the STR from bug 532981 comment #22. Bug 532981's crashes happen only on 10.6. Both that bug and this one must have the same underlying cause -- probably an Apple bug.

Here are the STR over again:

1) Start Firefox (3.0.X, 3.5.X, or 3.6) and visit the following URL:
   http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html
   (Close this window or keep it open, as you choose.)

2) Open a new window and visit:
   http://silverlight.services.live.com/invoke/99030/ControlsDemo/iframe.html
   (Close this window or keep it open, as you choose.)

3) Open a new window and visit:
   http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html

At this point you should crash. Or you may need to reload this page.
Here's a Breakpad id for one of my crashes: bp-2c73caac-f1e6-4423-acdb-dd7b62100119
Assignee: nobody → smichaud
> See recent interesting-modules files at
> http://people.mozilla.com/crash_analysis/.

Only the FF 3.5.7 interesting-modules files are big enough (have enough crashes) for the libclient.dylib crashes to show up in them.
With patient digging and a bit of luck, I've figured out why these crashes (and those of bug 532981) happen, and how to work around them. It has to do with Mach exception handling.

The situation is complex enough that it's difficult to know who to blame. Suffice it to say that there are bugs and/or design flaws in Apple's JVM, the Silverlight plugin, and even (at a fundamental level) in Mach exception handling itself.

Applications that use Mach exception handling need (basically) to do two things:

1) Tell the kernel which exceptions are "supported", and which Mach port to send its exception handling messages to;

2) Set up an exception handling thread that accepts Mach messages on the appropriate port, and handles them when they are sent (by the kernel, when it detects a "supported" exception).

The calls to request that the kernel handle certain exceptions are per-task (task_swap_exception_ports()/task_set_exception_ports()) or per-thread (thread_swap_exception_ports()/thread_set_exception_ports()). But Mach exception handling is really designed for use by *applications* (not by plugins) -- the reason is that there's no way (that I can find) to specify multiple Mach ports (and multiple handlers) for the main thread.

Mozilla browsers don't use Mach exception handling. And everything's fine if only one plugin uses it. But Apple's JVM (in the JEP and in Apple's plugins), the Silverlight plugin and the Flash plugin all use it. Furthermore, while the Flash plugin tries to be well-behaved, both Apple's JVM and the Silverlight plugin blithely assume that they're the only "applications" using Mach exception handling -- so both set the main thread's Mach exception port unconditionally (to different values), and leave it that way. (Secondary threads aren't a problem -- each plugin creates its own, and can configure their Mach exception handling as it sees fit.)
So here's how this bug's crashes happen: Apple's JVM is loaded (by the JEP) and sets the main thread's Mach exception port to (say) 0x00001234. Then the Silverlight plugin gets loaded and changes the main thread's Mach exception port to (say) 0x00005678. Now, whenever a "supported" exception happens in the JVM on the main thread, the kernel tries to get it handled on the "wrong" Mach port -- which leads to a crash. You might wonder why nothing happens in the opposite case -- when Apple's JVM stomps on the Silverlight plugin's setting for the main thread's Mach exception port. I'll have more to say about this in the next comment.
As I said in comment #3, there are two steps to setting up Mach exception handling. Apple's JVM, the Flash plugin and the Silverlight plugin all take the first step (tell the kernel which exceptions are "supported", and which Mach port to send its exception handling messages to). And both Apple's JVM and the Flash plugin take the second step (set up an exception handling thread that accepts Mach messages on the appropriate port). But the Silverlight plugin never does this!

The docs I can find on Mach exception handling don't say what's supposed to happen when you take "step 1" without "step 2". My best guess is that the kernel, if it failed to send a handling message on a "supported" exception, would simply crash the application.

But why bother to set up Mach exception handling if you aren't really going to use it? The Silverlight plugin already uses C++ exception handling (__cxa_allocate_exception(), __cxa_throw(), __cxa_begin_catch(), __cxa_end_catch() and friends). So why should it also need Mach exception handling ... especially if it's not really using it?

So Apple's JVM has (so far) one serious bug -- it uses Mach exception handling unconditionally, without regard to other plugins. But Silverlight has two -- it stomps on other plugins' Mach exception handling settings for the main thread, and it does this for no reason at all.

The fact that the Silverlight plugin doesn't use Mach exception handling is probably the reason it doesn't crash (or otherwise malfunction) when Apple's JVM stomps on its Mach exception settings for the main thread.
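The interaction described in the comments so far can be sketched as a toy model in C. This is not Mach code: port_t, plugin_t, thread_model_t and every function name below are invented for illustration, and the rule "no server on the registered port means the exception goes unhandled" encodes the best guess stated above, not documented kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. None of these types or functions are Mach APIs. */
typedef unsigned int port_t;

typedef struct {
    port_t port;       /* port the plugin registers for the main thread ("step 1") */
    bool has_server;   /* true if a thread receives messages on that port ("step 2") */
} plugin_t;

typedef struct {
    port_t exception_port;   /* the main thread has a single slot */
} thread_model_t;

/* "Step 1", done unconditionally: overwrite whatever was there before. */
void plugin_register(thread_model_t *t, const plugin_t *p) {
    assert(t != NULL && p != NULL);
    t->exception_port = p->port;
}

/* An exception raised in `owner`'s code is handled only if the slot still
 * holds owner's port and owner has a server receiving on it. */
bool exception_handled(const thread_model_t *t, const plugin_t *owner) {
    assert(t != NULL && owner != NULL);
    return t->exception_port == owner->port && owner->has_server;
}

/* Replays both load orders and reports whether a JVM exception on the
 * main thread would still reach the JVM's handler. */
bool jvm_exception_handled(bool jvm_loads_first) {
    const plugin_t jvm = { 0x1234, true };          /* example port values used earlier in this bug */
    const plugin_t silverlight = { 0x5678, false }; /* registers a port, never starts a server */
    thread_model_t main_thread = { 0 };

    if (jvm_loads_first) {
        plugin_register(&main_thread, &jvm);
        plugin_register(&main_thread, &silverlight);
    } else {
        plugin_register(&main_thread, &silverlight);
        plugin_register(&main_thread, &jvm);
    }
    return exception_handled(&main_thread, &jvm);
}
```

In this model jvm_exception_handled(true) is false (the STR's load order: Silverlight overwrites the JVM's port) and jvm_exception_handled(false) is true, which mirrors the asymmetry described above.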
But wait, there's more ...

No crashes happen (with the STR from comment #0) on OS X 10.4.11, or using Java 1.4.2 (on OS X 10.5.X with older Apple JVMs). But these older Apple JVMs still use Mach exception handling, and still perform both "steps" (from comment #3) to set it up.

What did Apple do to make the crashes start happening? It seems they changed how the "server" works on their Mach exception handling thread. The crashes started happening when Apple's JVM began using mach_msg_server() (see mach/mach.h), the simple canned server method from libSystem.dylib. Previous Apple JVMs called mach_msg() from some kind of custom server, which somehow worked around the design flaw in Mach exception handling that triggers these crashes. For some reason, older Apple JVMs keep receiving error handling messages (from the kernel) at the "old" Mach exception port for the main thread, even after the Silverlight plugin has changed this setting!

So are there accepted/recommended (if undocumented) ways to work around the design flaw in Mach exception handling on the main thread, and make it possible for several plugins that use Mach exception handling to peacefully coexist even while stomping on each other's settings? But then why didn't Apple notice the problem and go back to using their custom Mach exception handling server? It's just as likely the previous "work around" was accidental, and is no longer possible.

So (in my own workaround for this problem) I'm not going to look for this kind of solution. Neither am I going to try to make Apple's JVM well-behaved. I'm going to make the smallest possible changes to keep Apple's JVM happy when the Silverlight plugin stomps on its main-thread Mach exception handling settings.
Summary: [10.5] Crashes [@ libclient.dylib@0x119b59] [@ libclient.dylib@0x1192e9] [@ libclient.dylib@0x119289] triggered by bad interaction between JEP and Silverlight -- probable Apple bug → [10.5] Crashes [@ libclient.dylib@0x119b59] [@ libclient.dylib@0x1192e9] [@ libclient.dylib@0x119289] triggered by bad interaction between JEP and Silverlight -- Apple and Microsoft bugs
Here's the debugging patch I used to do much of the research behind my previous comments. Though it "fixes" the crashes, it's not a viable solution: the method it uses just happens to work, because Silverlight is the only plugin that uses thread_swap_exception_ports() (as opposed to thread_set_exception_ports()). My "fix" is merely an illustration (part of the evidence that supports my explanation of this bug's crashes).

Here are the three plugins (Java, Silverlight and Flash) I used in my tests:

http://java.sun.com/applets/jdk/1.4/demo/applets/Clock/example1.html
http://gallery.expression.microsoft.com/en-us/CWAStyle (at http://silverlight.net/community/samples/silverlight-samples/)
http://www.playercore.com/bugFiles/ime/imekrjp.swf

Note that it's impossible to debug the Silverlight plugin in gdb -- the browser quickly freezes up, and main thread stack traces stop making sense. This happens in both Firefox and Safari. It's why I went to the trouble to use Apple's Symbolication framework to log stack traces in my patch.

Here's a tryserver build:
https://build.mozilla.org/tryserver-builds/smichaud@pobox.com-bugzilla540640-debug/bugzilla540640-debug-macosx.dmg
And here's yet more ...

This bug's crashes only happen (on the main thread) when Apple's JVM handles very low-level exceptions -- like java.lang.NullPointerException. They don't happen with (for example) java.lang.ArrayIndexOutOfBoundsException.

So (practically speaking) I only need to worry about Java code that might run on the main thread and might try to use try/catch to handle a NullPointerException. There's very little of this in the JEP, and all of it is called via JNI. So I've fixed this bug (and bug 532981) in my current version of the JEP (what will become JEP 0.9.7.3) by wrapping four JNI calls in code that ensures the main thread's exception port is (re)set to whatever value Apple's JVM originally set it to (when JNI_CreateJavaVM() was called as the first Java applet was loaded).

I'm not entirely sure why these crashes don't happen in Safari. But I strongly suspect it's because (by chance) Apple's Java plugin never calls Java code on the main thread that might try to handle a NullPointerException.
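The shape of the workaround described above can be sketched with the same kind of toy model. This is only an illustration, not the JEP's code: thread_model_t, call_with_jvm_port() and the stand-in "JNI call" are hypothetical names, and the real fix would presumably use one of the thread_*_exception_ports() calls named earlier in this bug rather than assigning to a struct field. The sketch only resets the port before the call; whether the real wrapper also restores the other plugin's value afterwards isn't stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. None of these types or functions are Mach or JNI APIs. */
typedef unsigned int port_t;

typedef struct {
    port_t exception_port;   /* the main thread's single exception-port slot */
} thread_model_t;

/* Stand-in for a JNI call into Java code that may catch a NullPointerException. */
typedef int (*jni_call_fn)(const thread_model_t *t, void *ctx);

/* Hypothetical wrapper: make sure the slot holds the port the JVM originally
 * registered (saved when the JVM was created), then make the call. */
int call_with_jvm_port(thread_model_t *t, port_t saved_jvm_port,
                       jni_call_fn fn, void *ctx) {
    assert(t != NULL && fn != NULL);
    if (t->exception_port != saved_jvm_port) {
        t->exception_port = saved_jvm_port;   /* another plugin stomped on it */
    }
    return fn(t, ctx);
}

/* Stand-in "JNI call": returns 1 if the slot holds the port passed via ctx. */
int port_matches(const thread_model_t *t, void *ctx) {
    return t->exception_port == *(const port_t *)ctx;
}

/* Replays the STR: JVM registers, Silverlight overwrites, wrapped call runs. */
int wrapped_call_sees_jvm_port(void) {
    port_t jvm_port = 0x1234;                 /* example value used earlier in this bug */
    thread_model_t main_thread = { jvm_port };
    main_thread.exception_port = 0x5678;      /* Silverlight's unconditional overwrite */
    return call_with_jvm_port(&main_thread, jvm_port, port_matches, &jvm_port);
}
```

Here wrapped_call_sees_jvm_port() returns 1: by the time the stand-in call runs, the slot is back to the JVM's port even though Silverlight overwrote it. Wrapping only the few calls that can hit the problem keeps the change small, which is the stated goal of the workaround.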
Depends on: 551327
I've just released a new version of the Java Embedding Plugin (0.9.7.3) that fixes this bug (by working around it). For more information see bug 551327.
JEP 0.9.7.3 has now landed on the 1.9.2 and 1.9.1 branches, and should be in tomorrow's Firefox 3.6.3pre and 3.1.10pre nightlies (at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/). Please test with them and let us know your results.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard