WebInitForCarbon workaround is problematic for xulrunner embedding consumers

RESOLVED FIXED

Status

()

RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: jaas, Assigned: smichaud)

Tracking

Trunk
All
Mac OS X
Points:
---

Firefox Tracking Flags

(status1.9.2 .5-fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

9 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=509130#c51

we are embedding xulrunner into a carbon application, we used to
call WebInitForCarbon in the very first begining because some other components
in our application need to call WebKit functions.

After we upgrade to xulrunner 1.9.1.3, our applicaion keeps crash in 
[@objc_msgSend | IdleTimerVector ]. But we can't workaround this problem as you
did for this bug, because we can't make sure xulrunner was the first loaded
component...

I tried to remove WebInitForCarbon call in our application, the WebKit
functions stop work in this case..Is there any other workaround for this bug
for embedders? After all, WebInitForCarbon could be called by anyone at anytime
if xulruner is just a embedded component...
(Reporter)

Updated

9 years ago
Blocks: 509130
(Assignee)

Comment 1

9 years ago
I think there must be one or more autorelease pools on the stack
before the embedded xulrunner runs -- this must (I think) be the
reason you see crashes.

My patch for bug 533001 might fix your problem.  Please try the
tryserver build mentioned at bug 533001 comment #9, and let us know
your results.

To take my analysis any further, though, I'll need to know more about
your application (the one that embeds xulrunner).  Please post a link
to the source code and binaries.

Comment 2

9 years ago
hi Steven, our application is based on SWT Widgets(http://www.eclipse.org/swt/). On Mac platform, SWT has 2 kinds of browser widgets: xulrunner-based and Webkit-based, and SWT will call WebInitForCarbon() when Webkit-based browser widget is created.

These 2 kinds of browser widgets are used in different parts of our applications. Which kind of browser will be created first will totally depends on application users.

I tried your new patch for 533001. It disables PoolCleaner(and avoid crashes) if xulrunner-based browser is created before webkit-based browser. But if webkit-based browser is the first loaded component, PoolCleaner will still be enabled.

This is the call stack if xulrunner-based browser is created *after* webkit-based browser:

Stack1:

Breakpoint 1, 0x96950ad6 in PoolCleaner ()
(gdb) bt
#0  0x96950ad6 in PoolCleaner ()
#1  0x96c68fc1 in _ResetAllIdleTimers ()
#2  0x96c30ef0 in ConvertPlatformEventRecordAndPostWithOptions ()
#3  0x96c30826 in PullEventsFromWindowServerOnConnection ()
#4  0x916acff5 in __CFMachPortPerform ()
#5  0x916d16b8 in CFRunLoopRunSpecific ()
#6  0x916d1aa8 in CFRunLoopRunInMode ()
#7  0x96c572ac in RunCurrentEventLoopInMode ()
#8  0x96c56ffe in ReceiveNextEventCommon ()
#9  0x96c56f39 in BlockUntilNextEventMatchingListInMode ()
#10 0x90dfe6d5 in _DPSNextEvent ()
#11 0x90dfdf88 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#12 0x1fc6fd1b in nsAppShell::ProcessNextNativeEvent (this=0x1e7d1be0, aMayWait=0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/cocoa/nsAppShell.mm:634
#13 0x1fcb59a7 in nsBaseAppShell::DoProcessNextNativeEvent (this=0x1e7d1be0, mayWait=0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/xpwidgets/nsBaseAppShell.cpp:151
#14 0x1fcb5ff6 in nsBaseAppShell::OnProcessNextEvent (this=0x1e7d1be0, thr=0x1dcd3ad0, mayWait=0, recursionDepth=0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/xpwidgets/nsBaseAppShell.cpp:278
#15 0x1fc6e694 in nsAppShell::OnProcessNextEvent (this=0x1e7d1be0, aThread=0x1dcd3ad0, aMayWait=0, aRecursionDepth=0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/cocoa/nsAppShell.mm:804
#16 0x1fda0958 in nsThread::ProcessNextEvent (this=0x1dcd3ad0, mayWait=0, result=0xbfffc9d4) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/xpcom/threads/nsThread.cpp:497
#17 0x1fd37048 in NS_ProcessPendingEvents_P (thread=0x1dcd3ad0, timeout=20) at nsThreadUtils.cpp:180
#18 0x1fcb5913 in nsBaseAppShell::NativeEventCallback (this=0x1e7d1be0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/xpwidgets/nsBaseAppShell.cpp:121
#19 0x1fc710b6 in nsAppShell::ProcessGeckoEvents (aInfo=0x1e7d1be0) at /Volumes/M/XULRunner1913DebugBuild/other/mozilla/mac/mozilla-1.9.1/widget/src/cocoa/nsAppShell.mm:411
#20 0x916d13c5 in CFRunLoopRunSpecific ()
#21 0x916d1aa8 in CFRunLoopRunInMode ()
#22 0x1b8dc31a in Java_org_eclipse_swt_internal_carbon_OS_CFRunLoopRunInMode ()
....

Stack2:
Breakpoint 1, 0x96950ad6 in PoolCleaner ()
(gdb) bt
#0  0x96950ad6 in PoolCleaner ()
#1  0x96cc559c in IdleTimerVector ()
#2  0x916d18f5 in CFRunLoopRunSpecific ()
#3  0x916d1aa8 in CFRunLoopRunInMode ()
#4  0x1b8dc31a in Java_org_eclipse_swt_internal_carbon_OS_CFRunLoopRunInMode ()
....

Comment 3

9 years ago
I also saw another crash in XULRunner, is it also related to PoolCleaner?

Thread 0 Crashed:
0   libobjc.A.dylib               	0x93e8f688 objc_msgSend + 24
1   XUL                           	0x20094b92 nsThread::ProcessNextEvent(int, int*) + 838
2   XUL                           	0x2002b0dc NS_ProcessPendingEvents_P(nsIThread*, unsigned int) + 146
3   XUL                           	0x1ffa99a7 nsBaseAppShell::NativeEventCallback() + 181
4   XUL                           	0x1ff65072 nsAppShell::ProcessGeckoEvents(void*) + 518
5   com.apple.CoreFoundation      	0x916d13c5 CFRunLoopRunSpecific + 3141
6   com.apple.CoreFoundation      	0x916d1aa8 CFRunLoopRunInMode + 88
7   com.apple.HIToolbox           	0x96c572ac RunCurrentEventLoopInMode + 283
8   com.apple.HIToolbox           	0x96c56ffe ReceiveNextEventCommon + 175
9   com.apple.HIToolbox           	0x96d9a377 ReceiveNextEvent + 58
10  libswt-pi-carbon-3555.jnilib  	0x1b98eb3a Java_org_eclipse_swt_internal_carbon_OS_ReceiveNextEvent + 188
..
(Assignee)

Comment 4

9 years ago
(In reply to comment #3)

> I also saw another crash in XULRunner, is it also related to
> PoolCleaner?

Probably not (since IdleTimerVector isn't on the stack).

By the way, does XULRunner support Breakpad, at least in principle?

If it does, you should see its dialog when XULRunner crashes outside
of gdb (asking you to report the crash).  (Though even if XULRunner
does normally support Breakpad, SWT might have turned this off.)
(Assignee)

Comment 5

9 years ago
(In reply to comment #2)

Hi, peina.

> hi Steven, our application is based on SWT
> Widgets(http://www.eclipse.org/swt/). On Mac platform, SWT has 2
> kinds of browser widgets: xulrunner-based and Webkit-based, and SWT
> will call WebInitForCarbon() when Webkit-based browser widget is
> created.
>
> These 2 kinds of browser widgets are used in different parts of our
> applications. Which kind of browser will be created first will
> totally depends on application users.

Thanks for the additional information.  Now I understand your
situation much better.  But I'm not sure there's anything we can do
about it :-(

I can only hook public ("external") functions lazily loaded by another
module (like WebInitForCarbon()).  So I can't hook PoolCleaner().
And, if hooking WebInitForCarbon() doesn't "work" (because it must be
hooked before the first call to WebInitForCarbon()), there doesn't
seem to be any other reasonable way to disable PoolCleaner().

All I can suggest is that you find a way to force-load the
xulrunner-based browser widget whenever your app is initialized.

Comment 6

9 years ago
We turned off crash reporter when building XULRunner, because it's a embedded component of our application.

(In reply to comment #4)
> (In reply to comment #3)
> 
> > I also saw another crash in XULRunner, is it also related to
> > PoolCleaner?
> 
> Probably not (since IdleTimerVector isn't on the stack).
> 
> By the way, does XULRunner support Breakpad, at least in principle?
> 
> If it does, you should see its dialog when XULRunner crashes outside
> of gdb (asking you to report the crash).  (Though even if XULRunner
> does normally support Breakpad, SWT might have turned this off.)

Comment 7

9 years ago
Another problem: Sometimes the whole application crashes when initing xulrunner-browser with the following stack:

Thread 0 Crashed:
0   libSystem.B.dylib             	0xffff00c4 __compare_and_swap64 + 4 (cpu_capabilities.h:226)
1   dyld                          	0x8fe176ce ImageLoaderMachO::doBindLazySymbol(unsigned long*, ImageLoader::LinkContext const&) + 430
2   dyld                          	0x8fe06ed5 dyld::bindLazySymbol(mach_header const*, unsigned long*) + 149
3   dyld                          	0x8fe18c2f stub_binding_helper_interface2 + 21
4   ...wt-cocoa-carbon-3555.jnilib	0x1727f202 Java_org_eclipse_swt_internal_cocoa_Cocoa_HICocoaViewCreate + 102

...

I'm pretty sure it's caused by the new patch..do you have any idea of this crash?(Java_org_eclipse_swt_internal_cocoa_Cocoa_HICocoaViewCreate is a JNI call which calls HICocoaViewCreate method(Creates a Carbon view that serves as a wrapper for a Cocoa view)
(Assignee)

Comment 8

9 years ago
> I'm pretty sure it's caused by the new patch.

Which one?  My patch for bug 509130 or my proposed patch for bug
533001?

If it's the patch for bug 533001, I'd really like to try and debug the
crash.  But for that I need to be able to reproduce it -- basically, I
need a copy of your application, plus whatever version of SWT you're
using, plus steps to reproduce the crash.

> do you have any idea of this crash?

No, I've never seen anything like it before.  And it doesn't appear
it's ever been reported in bugzilla (I searched on
'compare_and_swap64' and 'doBindLazySymbol').

There's also nothing at http://crash-stats.mozilla.com/.  But there
wouldn't be any SWT-specific reports there, since you guys have turned
off Breakpad.

Comment 9

9 years ago
The crash happens after I patched xulrunner with patch 533001.

I will try to write a sample app to reproduce it because our app is too huge.
Thanks!

Comment 11

9 years ago
Hi Steven, I sent the sample application to you by mail.

I didn't contain xulrunner binary into this application, so you will need to copy xulrunner into /Application/xulrunner to make it work:

/Applications
    |--------------xulrunner
                       |---------chrome
                       |---------components
	               |---------defaults
                       ....

The sample app works well with xulrunner if it didn't contain any patch, but will crash at startup if I patched xulrunner with patch 533001. And it will only crash if WebInitForCarbon is called *before* XPCOMGlueStartup:

1.Crash if I write code like this:	
                        System.setProperty("org.eclipse.swt.browser.XULRunnerPath","/Applications/xulrunner"); //$NON-NLS-1$\
		Cocoa.WebInitForCarbon();
		Browser browser = new Browser(parent, SWT.MOZILLA);
		browser.setUrl("http://mozilla.org");

2.Not crash if I write code like this:
		System.setProperty("org.eclipse.swt.browser.XULRunnerPath","/Applications/xulrunner"); //$NON-NLS-1$\
		Browser browser = new Browser(parent, SWT.MOZILLA);
		Cocoa.WebInitForCarbon();
		browser.setUrl("http://mozilla.org");

Comment 12

9 years ago
BTW, my test env is Mac OS X 10.5.8. not on 10.4.
Peina:

I'm able to reproduce your crash.  For those not too familiar with
XULRunner (which inludes myself), here's a detailed description of how
I did it.  I tested on OS X 10.5.8.

1) Built XULRunner from current 1.9.1-branch source, after having
   applied a variant of my patch for bug 533001.

   To make it build correctly, I found I had to add an 'extern "C"'
   wrapper to the line that includes getsect.h (in nsToolkit.mm):

   extern "C" {
   #include <mach-o/getsect.h>
   }

2) Added a soft link named 'xulrunner' in the /Applications directory
   to my build's objdir/dist/XUL.framework/Versions/Current directory:

   $ cd /Applications
   $ sudo ln -s /path/to/objdir/dist/XUL.framework/Versions/Current xulrunner

3) Loaded your Mozilla.Crash test application into gdb:

   gdb /path/to/Mozilla.Crash/Eclipse.app/Contents/MacOS/eclipse

3) Ran Eclipse.app:

   (gdb) run

I'm going to be trying to debug the crash ... but probably not right
away.  I've got other, more pressing things to attend to.
I'm still not entirely sure what bug(s) you're reporting.  Here's my
current best guess.  Please tell me where it's accurate or inaccurate.

1) With XULRunner 1.9.1.3 you've started to see crashes at
   [@objc_msgSend | IdleTimerVector ].

   (Note that XULRunner 1.9.1.3 doesn't contain my patch for bug
   509130 or (of course) my patch for bug 533001.)

2) When you add my patch for bug 509130 to 1.9.1-branch XULRunner (or
   if you use 1.9.2-branch or trunk XULRunner, which already contain
   my patch for bug 509130), you have other problems.

   It's not at all clear what those problems are.

3) When you add my proposed patch for bug 533001 to XULRunner
   (1.9.1-branch, 1.9.2-branch, or trunk), you sometimes get the crash
   reported in comment #7 -- if you call WebInitForCarbon() before you
   load XULRunner.

4) Even when you don't crash using my proposed patch for bug 533001,
   the patch only prevents crashes [@objc_msgSend | IdleTimerVector ]
   if you load XULRunner before you call WebInitForCarbon() (i.e. if
   the xulrunner-based browser is created before the webkit-based
   browser).
Here's another, much simpler way to put my questions from comment #14:

Is my patch for bug 533001 (aside from its crash) any better at
solving your problem than my patch for bug 509130?

Comment 16

9 years ago
When a SWT Mozilla Browser is created, SWT will do the following things:

(scenario A)
1. Call XPCOMGlueStartup
2. Call WebInitForCarbon 
3. Do other things to init Mozilla Browser(nsAppShell::init was called here)

SWT will also call WebInitForCarbon when a SWT Webkit Browser is created, so if we create a Webkit browser before Mozilla Browser, the scenario will be:

(scenario B)
1.Call WebInitForCarbon
2.Call XPCOMGlueStartup
3.Do other things to init Mozilla Browser(nsAppShell::init was called here)

Patch for bug 509130 works for none of these 2 scenarios. Because WebInitForCarbon is called by SWT(step 2 in scenario A, step 1 in scenario B) *Before* called by patch 509130.

Patch for bug 533001 works for scenario A, because when WebInitForCarbon is called by SWT, XPCOMGlueStartup have already hooked it up.  But it will cause a constant crash for scenario B(As my sample demonstrates).

So, patch for 533001 is better than 509130 in resolving my problem. What I want is to get a improved version of patch 533001 which will not cause *constant* crash in scenario B...
Thanks!

So you still get crashes in IdleTimerVector with the patch for bug
509130, in both scenario A and scenario B?

I'll be working to revise my patch for bug 533001 so that it no longer
causes your crash from comment #7 in scenario B -- because otherwise I
won't consider my patch safe to land.  But I don't think I'm going to
be able to stop the IdleTimerVector crashes from happening in scenario
B (because in that scenario your code calls WebInitForCarbon() before
my code has had a chance to hook it).

So, even with my revised patch for bug 533001, you'll still need to
prevent scenario B from happening.

Comment 18

9 years ago
The patch for bug 509130 stops crash in IdleTimerVector. Didn't try scenario B because it crashes at the very first beginning..

Comment 19

9 years ago
(In reply to comment #18)
> The patch for bug 509130 stops crash in IdleTimerVector. Didn't try scenario B
                ---------- should be 533001.
> because it crashes at the very first beginning..

Comment 20

9 years ago
Hi Steven, any progress?
> Hi Steven, any progress?

Not yet.

I'm going to try to find a day to work on this (this week or next)
... but I'm not sure that'll be enough.
Created attachment 424165 [details] [diff] [review]
1.9.1-branch version of rev2 patch for bug 533001

Peina:

I've finally had a chance to get back to this.  Furthermore I've found
and fixed the bug that was causing the dyld crashes (like the one you
reported in comment #7).

I've posted a new trunk patch at bug 533001 comment #10.  Here's a
1.9.1-branch version of that patch.

Please test with one or both of them, and let us know your results.

Thanks for your patience.  And thanks for your original report -- this
was a serious bug, and I'm very glad to have fixed it.

Comment 23

9 years ago
Hi Steven, thanks for your new patch! It works well on both 10.5 and 10.6 and no longer causes dyld crashes.
Steven, if we want to try to stop these IdleTimerVector and PoolCleaner crashes in Camino on branches where bug 509130 has already landed, we need what patches? Or is there no solution for cases where apps already have their own autorelease pool in place before Gecko's pools? I'm having trouble following the discussion here.
Smokey, you're right that the patch for bug 509130 probably won't work
in Camino (since Camino has already called NSApplicationMain() by the
time nsAppShell::Init() runs).  Though I haven't tested this.

To fix the IdleTimerVector crashes in Camino, you'll probably need my
patch for bug 533001.  Please try it out and let us know your results.

Open a new bug on the IdleTimerVector crashes in Camino, and make it
depend on bug 533001.  Or if the bug 509130 patch does fix the
crashes, make it depend on that bug.
Oops, I forgot the IdleTimerVector crashes aren't reproducible.

That leaves us in a real quandary -- we can't know whether or not the
patch for bug 509130 fixes the IdleTimerVector crashes in Camino until
the patch gets into a Camino release.  (The nightlies don't have
enough users to settle the question.)
I can regularly (but irreproducibly) trigger the crashes in a 1.9.2 build (which already has the patch in bug 509130) during my normal browsing, whereas I never, ever saw them in a 1.9.0 build.  (Until Monday I seemed to have had an old version of the DivX plug-in installed that something must have installed during the dark ages; I disabled it when I couldn't get my 1.9.2 build to launch without crashing, but something else is still causing the crashes during 1.9.2-based daily browsing.)

At any rate, I'll spin a new build with the patch for bug 533001 and see what happens, and I'll file a new bug to track these crashes with Camino.
This should be fixed by my patch for bug 533001, which landed on the trunk on
2010-04-05 and which just landed on the 1.9.2 branch (and so will appear in FF
3.6.5).
Status: NEW → RESOLVED
Last Resolved: 9 years ago
status1.9.2: --- → .5-fixed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.