Closed Bug 509130 Opened 11 years ago Closed 10 years ago

Crashes [@objc_msgSend | IdleTimerVector ] (OS X) caused by WebKit used by Carbon plugin (DivXBrowserPlugin)

Categories

(Core :: Plug-ins, defect, P2)

x86
macOS
defect

VERIFIED FIXED
Tracking Status
status1.9.2 --- beta3-fixed

People

(Reporter: smichaud, Assigned: smichaud)

References

Details

(Keywords: topcrash, Whiteboard: [crashkill] rdar://problem/7362667)

Crash Data

Attachments

(1 file, 1 obsolete file)

Currently, the #7 topcrasher on OS X has a stack whose top two levels
are objc_msgSend and IdleTimerVector.  The stacks are utterly
mystifying.  But I just checked the first five stacks in the list, and
I found that all contain a "DivXBrowserPlugin" module.

This plugin is presumably provided by DivX Labs
(http://labs.divx.com/).  I've never heard of it before, so it's
presumably fairly uncommon.  So it may be more than a coincidence
that it appears in all crash-stack "modules" lists that I checked.
The module analysis at http://dbaron.org/log/20090922-crashes shows:

  objc_msgSend | IdleTimerVector (54 crashes)
    100% (54/54) vs.   9% (127/1364) libcurl.3.dylib
    100% (54/54) vs.  10% (130/1364) DivXBrowserPlugin
    100% (54/54) vs.  25% (339/1364) WebKit
    100% (54/54) vs.  25% (339/1364) WebCore
     87% (47/54) vs.  16% (217/1364) ApplePixletVideo
     87% (47/54) vs.  16% (218/1364) QuickTimeH264
    100% (54/54) vs.  32% (430/1364) JavaScriptCore
     87% (47/54) vs.  19% (258/1364) QuickTimeFireWireDV
    ...


which indeed suggests that DivXBrowserPlugin may be responsible.
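
The comparison in the listing above is simple arithmetic: for each module, the share of crashes with this signature that list it, against the share of all crashes that list it.  A minimal sketch of that calculation follows.  It is not dbaron's actual script; the `ModuleCount` struct and the function names are hypothetical, and only the counts come from the listing.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record: how often a module appears in crashes with one
// signature versus in all crashes for the platform.
struct ModuleCount {
    std::string module;
    int inSignature;     // crashes with this signature that list the module
    int signatureTotal;  // all crashes with this signature
    int inAll;           // all crashes that list the module
    int allTotal;        // all crashes
};

// Percentage rounded to the nearest whole number.
int Percent(int part, int total) {
    return total == 0 ? 0 : (part * 100 + total / 2) / total;
}

// Gap in percentage points; a large gap flags a suspicious module.
int Overrepresentation(const ModuleCount& m) {
    return Percent(m.inSignature, m.signatureTotal) - Percent(m.inAll, m.allTotal);
}

// Formats one line in the same layout as the listing.
std::string FormatLine(const ModuleCount& m) {
    char buf[256];
    std::snprintf(buf, sizeof buf, "%3d%% (%d/%d) vs. %3d%% (%d/%d) %s",
                  Percent(m.inSignature, m.signatureTotal),
                  m.inSignature, m.signatureTotal,
                  Percent(m.inAll, m.allTotal),
                  m.inAll, m.allTotal, m.module.c_str());
    return buf;
}
```

For DivXBrowserPlugin (54/54 against 130/1364) this gives 100% against 10%, a gap of 90 percentage points.
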
A "DivX Decoder" or "DivX 6 Decoder" also appears to be implicated in
crashes whose top two levels are objc_msgSend and -[ToolbarWindow
sendEvent:]:

http://crash-stats.mozilla.com/report/list?platform=mac&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=objc_msgSend%20%7C%20-%5BToolbarWindow%20sendEvent%3A%5D
Summary: Crashes [@objc_msgSend | IdleTimerVector ] (OS X) possibly caused by DivXBrowserPlugin → Crashes [@objc_msgSend | IdleTimerVector ] and [@objc_msgSend | -[ToolbarWindow sendEvent:] ] (OS X) possibly caused by DivX plugin
blocking 1.9.2+.  P2.
Flags: blocking1.9.2+
Priority: -- → P2
Dbaron, could you perform your module analysis on the [@objc_msgSend | -[ToolbarWindow sendEvent:]] crash stack?
Could you also perform your analysis on [@objc_msgSend | -[NSWindow sendEvent:]]?
I've now looked further into the [@objc_msgSend | IdleTimerVector]
crash and it's not nearly as mysterious as I first thought.

IdleTimerVector() is an undocumented and non-public method of the
HIToolbox framework.  But there's a documented
InstallEventLoopIdleTimer() function, also in the HIToolbox framework
(http://developer.apple.com/legacy/mac/library/documentation/Carbon/Reference/Carbon_Event_Manager_Ref/Reference/reference.html#//apple_ref/doc/uid/TP30000135-CH1g-CJBJCEJJ),
use of which likely triggers calls to IdleTimerVector() from the OS.

We don't use InstallEventLoopIdleTimer() anywhere in the tree.  But
it's something a plugin might use.  And using it might cause crashes
in IdleTimerVector() if (say) a plugin instance called
InstallEventLoopIdleTimer() on creation, but didn't call
RemoveEventLoopTimer() on destruction.

Does anyone here know better than I do how to go about contacting DivX
developers?  Note that the DivX home page is http://labs.divx.com/.
Assignee: nobody → smichaud
Status: NEW → ASSIGNED
I'm also going to be contacting Divx due to bug 519353, I'll loop you in on that.
Duplicate of this bug: 519718
> I'm also going to be contacting Divx due to bug 519353, I'll loop
> you in on that.

Justin, any word from Divx?
Keywords: topcrash
I installed the DivX Web Player version 1.4.1.4 and visited http://demo.pandonetworks.com/acme/divx/ to try to play some videos, but so far no crash using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b2pre) Gecko/20091016 Namoroka/3.6b2pre.
I did get this crash on the trunk when closing down a tab that had the site in comment 10 loaded - http://crash-stats.mozilla.com/report/index/bp-c182296b-a7f3-416f-8dc8-4c1e52091016, but it is a slightly different stack.
It's an entirely different crash :-)  It's in a secondary thread (not the main thread as with the IdleTimerVector crashes).

But the crash *is* in DivX code.  And the thread it happened in probably was created by the DivX plugin.
(In reply to comment #9)

> Justin, any word from Divx?

Not yet, I just got a couple new contacts today from the Plugin Check guys, hopefully will get a response from there.
Whiteboard: [crashkill]
Depends on: 523900
Please update bug 523900 with the DivX contact info when we have it.
I downloaded "DivX 7 (for Mac)" from
http://www.divx.com/en/downloads/divx/mac and installed it (using the
resulting DivXInstaller.dmg download).  Then I visited the same site
Marcia mentioned in comment #10
(http://demo.pandonetworks.com/acme/divx/) and tried the first demo.

(If you do this yourself, you'll find that you have to click on a
button on the right side of the window to (as it says) install another
plugin.  You'll be warned that by installing the plugin you're joining
a "private P2P network".  What actually happens when you click the
button is that you load a signed Java applet, which (if you allow it
full access to your computer) downloads a bunch of stuff and installs
a PandoWebInst.plugin to your /Library/Internet Plug-Ins/ directory.
For all I know it may also do other things.  This whole business makes
me a little uncomfortable.  I did all this on a spare, disposable
partition (running OS X 10.5.8).  I suggest others do the same.)

I did this in a Namoroka build I'd made from recent code, running in
gdb.  I was able to confirm what I said in comment #6, and possibly
also find the bug that's causing these crashes:

DivXBrowserPlugin does call InstallEventLoopIdleTimer() on creation of
the first plugin instance.  Here's the top part of the stack from gdb:

#0  0x9669d7d1 in InstallEventLoopIdleTimer ()
#1  0x9561a5db in WebInitForCarbon ()
#2  0x1a0254b7 in getPluginClass ()
#3  0x19ff6336 in getPluginClass ()
#4  0x19fe11c5 in NPP_GetMIMEDescription ()
#5  0x19fe151f in NPP_GetMIMEDescription ()
#6  0x19fe4049 in Mozilla_NPP_New ()
#7  0x19fe4414 in NPP_New ()
#8  0x029c3a11 in nsNPAPIPluginInstance::InitializePlugin (this=0x174b3530)
...

Very strangely, InstallEventLoopIdleTimer() gets called with its
outTimer parameter set to NULL!  The function doesn't barf (for some
reason), but now the DivXBrowserPlugin has no record of the idle timer
it just installed.

When you close the window (and destroy the plugin instance), the
DivXBrowserPlugin does call RemoveEventLoopTimer():

#0  0x96645598 in RemoveEventLoopTimer ()
#1  0x19f8d68b in getPluginClass ()
#2  0x19f73a63 in NPP_GetMIMEDescription ()
#3  0x19f7707c in Mozilla_NPP_Destroy ()
#4  0x19f7742f in NPP_Destroy ()
#5  0x029c2cea in nsNPAPIPluginInstance::Stop
...

But its inTimer parameter is NULL!  So (presumably) nothing happens.

And IdleTimerVector continues to be called (occasionally), as long as
at least one browser window is open.
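
The sequence observed above can be modeled in a few lines.  This is a self-contained toy, not the Carbon API: `ToyEventLoop`, `InstallIdleTimer`, `RemoveTimer` and `TimerId` are hypothetical stand-ins for the event loop, InstallEventLoopIdleTimer(), RemoveEventLoopTimer() and EventLoopTimerRef.  It assumes, as the gdb sessions suggest, that a NULL out-parameter is tolerated on install and that removing a NULL timer does nothing.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

typedef int TimerId;  // stand-in for EventLoopTimerRef; 0 plays the role of NULL

class ToyEventLoop {
public:
    ToyEventLoop() : nextId_(1) {}

    // Installs an idle callback.  A null outTimer is tolerated (assumption
    // based on the gdb observation): the timer is installed, but the caller
    // receives no handle for it.
    void InstallIdleTimer(std::function<void()> proc, TimerId* outTimer) {
        TimerId id = nextId_++;
        timers_[id] = std::move(proc);
        if (outTimer) *outTimer = id;
    }

    // Removing the null handle is a no-op (assumption); returns true only
    // if a timer was actually removed.
    bool RemoveTimer(TimerId id) {
        return id != 0 && timers_.erase(id) > 0;
    }

    // Simulates the loop going idle: fires every installed timer and
    // returns how many callbacks ran.
    int RunIdle() {
        int fired = 0;
        for (std::map<TimerId, std::function<void()> >::iterator it = timers_.begin();
             it != timers_.end(); ++it) {
            it->second();
            ++fired;
        }
        return fired;
    }

    std::size_t ActiveTimers() const { return timers_.size(); }

private:
    std::map<TimerId, std::function<void()> > timers_;
    TimerId nextId_;
};
```

In this model, a caller that installs with a null `outTimer` and later calls `RemoveTimer(0)` leaves the callback installed, so `RunIdle()` keeps invoking it.
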
Can you also try reproducing with the DivX beta from http://labs.divx.com? Our Divx contact says it's improved and has built-in crash reporting capabilities.
(In reply to comment #16)

There's not much point, I'm afraid -- I can't reproduce these crashes.

A DivX developer should look at what I've reported here (particularly
in comment #6 and comment #15), and see if this identifies a bug in
the DivXBrowserPlugin.  I believe it does.
And I believe that fixing that bug is likely to fix the IdleTimerVector crashes.
I've now installed the DivXPlusWebPlayer243 beta (from
http://labs.divx.com/node/14711#download) and have been able to
reproduce most of what I say in comment #15:

The call to InstallEventLoopIdleTimer() still has its outTimer
parameter set to NULL, and IdleTimerVector still gets called after the
last DivXBrowserPlugin instance has been destroyed.  But now the
plugin never calls RemoveEventLoopTimer() at all.

Note that there's a complication when breaking on
RemoveEventLoopTimer() -- it can also be called from system code by
RemoveActivateTSMDocument_Timer().  These calls happen even when no
plugins are loaded, and aren't relevant to this bug.  Each of them
seems to match a corresponding call to InstallEventLoopTimerInMode()
from InstallActivateTSMDocument_Timer().
I've sent email to the DivX Web Player contact identified at bug
523900.  I've pointed him at this bug and asked him to pass the
problem along to whoever can help.  I've also asked him/them to
respond here, if at all possible.
I'm the product manager for the DivX Web Player. I will file a ticket for it in our bug tracking system.

If you are interested in finding other sites that embed the Web Player to test, try searching for "No video? Get the DivX Web Player".
According to dbaron's module correlation information, the crashes at
-[ToolbarWindow sendEvent:] aren't strongly associated with
DivXBrowserPlugin (or anything else from DivX).

http://people.mozilla.org/~dbaron/crash-stats/20090929-interesting-modules
http://people.mozilla.com/crash_analysis/
Summary: Crashes [@objc_msgSend | IdleTimerVector ] and [@objc_msgSend | -[ToolbarWindow sendEvent:] ] (OS X) possibly caused by DivX plugin → Crashes [@objc_msgSend | IdleTimerVector ] (OS X) possibly caused by DivXBrowserPlugin
(Following up comment #15)

> DivXBrowserPlugin does call InstallEventLoopIdleTimer() on creation
> of the first plugin instance.  Here's the top part of the stack from
> gdb:
>
> #0  0x9669d7d1 in InstallEventLoopIdleTimer ()
> #1  0x9561a5db in WebInitForCarbon ()
> #2  0x1a0254b7 in getPluginClass ()
> #3  0x19ff6336 in getPluginClass ()
> #4  0x19fe11c5 in NPP_GetMIMEDescription ()
> #5  0x19fe151f in NPP_GetMIMEDescription ()
> #6  0x19fe4049 in Mozilla_NPP_New ()
> #7  0x19fe4414 in NPP_New ()
> #8  0x029c3a11 in nsNPAPIPluginInstance::InitializePlugin (this=0x174b3530)
> ...

I've just discovered that InstallEventLoopIdleTimer() is called from
WebKit code (from WebInitForCarbon() in mac/Carbon/CarbonUtils.m).  So
this may turn out to be a WebKit bug.

I'll dig further into this tomorrow.

getPluginClass() is in DivXBrowserPlugin.  It's presumably modeled
after a similarly named method in the testing/sample plugin called
TestNetscapePlugIn.
I've ruled out the possibility that these crashes are caused/triggered
by the browser unloading DivXBrowserPlugin.

Since at least the patch for bug 393928 (landed on 2007-10-09), no
plugin that's a bundle ever gets unloaded (on OS X).  This is because
of the following lines in nsPluginTag::nsPluginTag(nsPluginInfo*
aPluginInfo) (in nsPluginHost.cpp/nsPluginHostImpl.cpp):

  #ifdef XP_MACOSX
      mCanUnloadLibrary(!aPluginInfo->fBundle),
  #else
      mCanUnloadLibrary(PR_TRUE),
  #endif
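
The effect of that initializer can be sketched as a plain function.  This is an illustrative model only: `ToyPluginInfo`, `Platform` and `CanUnloadLibrary` are hypothetical names, and the real code selects the branch at compile time with XP_MACOSX rather than at run time.

```cpp
// Hypothetical stand-in for the one nsPluginInfo field used here.
struct ToyPluginInfo {
    bool isBundle;  // models aPluginInfo->fBundle
};

enum Platform { kMacOSX, kOtherPlatform };

// Mirrors the initializer: on OS X a plugin that is a bundle can never
// be unloaded; on other platforms unloading is always allowed.
bool CanUnloadLibrary(Platform platform, const ToyPluginInfo& info) {
    if (platform == kMacOSX) {
        return !info.isBundle;
    }
    return true;
}
```
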
(Following up comment #24)

This is even with the boolean pref "plugins.unloadASAP" set to 'true'
(it defaults to false).  See the patch for bug 500925.
(In reply to comment #24)

So we are certain there isn't anything that my DivX plug-in is doing to cause this bug, and I can close this out on my end?
(In reply to comment #26)

It's a bit premature for that.

And even if this bug isn't your "fault", there may be something we can ask you to do that would fix/alleviate it.
(Following up comment #24)

I've now also (I think) ruled out the possibility that the
IdleTimerVector crashes are caused/triggered by the WebKit framework
being unloaded.

(As you'll remember (from comment #23), the WebInitForCarbon() method
that calls InstallEventLoopIdleTimer() is in the WebKit framework (in
WebKit/mac/Carbon/CarbonUtils.m).  So is the PoolCleaner() method
whose address WebInitForCarbon() supplies to
InstallEventLoopIdleTimer() as its inTimerProc parameter.)

According to dbaron's module correlation information (links at comment
#22), all the IdleTimerVector crash stacks have 100% correlation with
both DivXBrowserPlugin and the WebKit framework.  Which (I think)
shows that none of these crashes happened when the WebKit framework
wasn't present (because of it having been unloaded).

Interestingly, there are crashes at __CFRunLoopDoTimer that also have
a 100% correlation with both DivXBrowserPlugin and the WebKit
framework.  These crashes appear to be caused by the EventLoopTimerRef
installed by WebInitForCarbon() having somehow prematurely been
disposed of.  But this isn't because WebKit has been unloaded, or
because WebKit code ever explicitly disposes of this
EventLoopTimerRef.

*Very* puzzling.

Sherlock Holmes to the contrary, it looks like eliminating the
impossible leaves us with nothing at all, however improbable.

But wait for the next comment :-)
Once WebKit's WebInitForCarbon() has called
InstallEventLoopIdleTimer(), WebKit's PoolCleaner() gets called every
time the OS detects that no user events are being received.

PoolCleaner() ensures that an autorelease pool is always in place (for
the Cocoa code that may get called from WebKit even when it's being
used from a Carbon app, like DivXBrowserPlugin).  It also periodically
"cleans" or "drains" these autorelease pools, so they don't fill up
with Objective-C objects that have, in effect, been leaked.

Firefox also has code that does more or less the same thing, in
nsAppShell::OnProcessNextEvent() and
nsAppShell::AfterProcessNextEvent() (in
widgets/src/cocoa/nsAppShell.mm).

I wonder if these two "cleaner" strategies can interfere with each
other, and cause objects to be deleted/released unexpectedly.

Whether or not this is true, it's pretty clear only one of them is
needed.  So it shouldn't hurt to turn the other one off.

But this is easier said than done.

It's apparently impossible to tell (programmatically) if
WebInitForCarbon() or InstallEventLoopIdleTimer() has ever been
called.  Though one can tell whether or not the WebKit framework has
been loaded, and whether or not 'PoolCleaner' is a valid symbol.

So one could look for the WebKit framework or the 'PoolCleaner' symbol
after loading each new plugin, and turn off the nsAppShell "cleaner" if
either was found.  But this might be dangerous, because the plugin
using the WebKit framework might not be a Carbon app, and might not
call WebInitForCarbon().  (There doesn't appear to be any reasonable
way of telling (programmatically) whether or not a given binary is a
Carbon app.)

It turns out to be easier to approach this problem from the other
direction -- it's easier to "turn off" WebKit's 'PoolCleaner'.  But
it's much better to do this from the plugin than from the browser.

In my tests (so far only on OS X 10.5.8) I've discovered that only one
"idle timer" can be installed/active on the main event loop at a time.
So it'd be easy for DivXBrowserPlugin to call
InstallEventLoopIdleTimer() directly from getPluginClass(), after its
call to WebInitForCarbon(), and set 'inTimerProc' to the address of an
empty EventLoopIdleTimerProcPtr (one that does nothing at all).  This
address (and the EventLoopTimerRef "returned" by
InstallEventLoopIdleTimer()) would have to stay valid for as long as
DivXBrowserPlugin was in memory (i.e. even after the last plugin
instance had been destroyed).  And there should probably be code to
dispose of the EventLoopTimerRef when/if DivXBrowserPlugin gets
unloaded from memory.
So, Paul, could you make DivXBrowserPlugin do what I described in the
last paragraph of comment #29?

It sounds like you already detect when you're running in a
Mozilla-family browser (hence the calls to Mozilla_NPP_New()
and Mozilla_NPP_Destroy() in the stacks from comment #15).

Could DivXBrowserPlugin, when it detects that it's running in a
Mozilla-family browser, call InstallEventLoopIdleTimer() just after
the call to WebInitForCarbon(), so as to install an idle timer
pointing to an EventLoopIdleTimerProcPtr that does nothing at all?
Paul, how many versions back does your Firefox support go?

What I've asked you to do in comment #30 would only be appropriate for
FF 3.0 and higher.
It'd also be possible for the browser to "turn off" WebKit's
'PoolCleaner'.  But there isn't any particularly good way to do this.

Here's the best I've been able to come up with:

1) Have nsAppShell::Init() create an nsITimer with an interval of 5-10
   seconds.

2) Have the timer's callback use InstallEventLoopIdleTimer() to
   install a new, empty, idle timer, storing the resulting
   EventLoopTimerRef object in the nsAppShell object.

   As appropriate, have it also dispose of the previously
   created/installed EventLoopTimerRef object.

3) Have the nsAppShell destructor (or possibly nsAppShell::Exit())
   destroy the timer and dispose of any remaining EventLoopTimerRef.
Bad news.

> In my tests (so far only on OS X 10.5.8) I've discovered that only
> one "idle timer" can be installed/active on the main event loop at a
> time.

This doesn't seem to be true, after all.  I don't know why my earlier
tests made it seem so.

So, Paul, you can forget my suggestion from comment #30, and what I
say in comment #32 won't work.

Sigh.

Back to the drawing board.
Progress is slow ... but it's still happening.

The latest news is I've discovered that (for complicated reasons)
WebKit's PoolCleaner() doesn't work on OS X 10.6.X.  There also don't
appear to be any IdleTimerVector crashes on OS X 10.6.X.

This confirms that what PoolCleaner() does (it periodically "cleans"
or "drains" autorelease pools) is at the heart of the problem.  Which
makes it more likely that my hunch is correct that FF's (nsAppShell's)
and WebKit's (PoolCleaner's) autorelease pool "cleaners" are somehow
interfering with each other.
I've now made substantial progress on this bug, and should have a fix
soon.

This bug is entirely caused by WebKit's PoolCleaner().  It has nothing
to do with what nsAppShell's own autorelease pool "cleaner" does, or
with any sort of bad interaction between it and WebKit's
PoolCleaner().

So this is a WebKit bug -- not a DivXBrowserPlugin bug or a Firefox
bug.

I've figured out another way to disable WebKit's PoolCleaner() (one
that actually works).  It all depends on "when" the call to
WebInitForCarbon() takes place (I'll have more to say about this
later).

I've also found out that if you call WebInitForCarbon() very early (in
nsAppShell::Init()), and in such a way that PoolCleaner() isn't
disabled, Firefox gets very crashy.  These crashes aren't exactly the
same as this bug's reported crashes -- for example I haven't seen any
crashes under IdleTimerVector().  But they *are* crashes which (like
the IdleTimerVector() crashes) seem to be caused by objects
mysteriously having gotten deleted.  For example the following (on the
main thread):

  Program received signal EXC_BAD_ACCESS, Could not access memory.
  Reason: KERN_INVALID_ADDRESS at address: 0xc0000023
  0x96b10688 in objc_msgSend ()
  (gdb) bt
  #0  0x96b10688 in objc_msgSend ()
  #1  0x946cf026 in -[NSApplication run] ()
  #2  0x02b543fa in nsAppShell::Run (this=0x520c60)
  ...

The crashes keep happening when I disable nsAppShell's autorelease
pool cleaner.  They go away when WebInitForCarbon() is called from
nsAppShell::Init(), but in such a way that WebKit's PoolCleaner() is
disabled.
Attached patch Possible fix/workaround (obsolete) — Splinter Review
Here's a patch that will, I hope, work around this bug (and get rid of
the IdleTimerVector and related crashes).

It's too bad that, in order to do this, we have to always call
WebInitForCarbon() (on OS X 10.5.X and below).  But I don't see any
way around it.

I've already shown that my patch defangs WebKit's PoolCleaner().  The
only other thing we might possibly need to worry about is the call to
HIWebViewRegisterClass() from WebInitForCarbon().

The source for this is in WebKit/mac/Carbon/HIWebView.mm.  It
registers a custom class of events and installs a handler for them
(HIWebViewEventHandler()).  It seems likely it will never be called if
we don't load a plugin that (like DivXBrowserPlugin) actually uses
Carbon WebKit.  If it *is* called, all it apparently does is translate
these custom events into Cocoa events.

Nonetheless, I think we'll need to let this patch bake on the trunk
for a while before we can be sure it does no harm.

A tryserver build will follow in a few hours.
Attachment #409447 - Flags: review?(joshmoz)
Summary: Crashes [@objc_msgSend | IdleTimerVector ] (OS X) possibly caused by DivXBrowserPlugin → Crashes [@objc_msgSend | IdleTimerVector ] (OS X) caused by Carbon WebKit (used by DivXBrowserPlugin)
Comment on attachment 409447 [details] [diff] [review]
Possible fix/workaround

Actually this still needs some work.  I'll post another patch on Monday.
Attachment #409447 - Flags: review?(joshmoz)
We need to focus our efforts on a WebKit/divx fix here. Can we get a WebKit bug filed? Is this something DivX can do anything about?
> Can we get a WebKit bug filed?

I'll do that.

> Is this something DivX can do anything about?

No.  Only the browser has enough control over when WebKit's WebInitForCarbon() is called.

Revised patch coming up shortly.
Turns out the previous patch didn't completely disable PoolCleaner().

A tryserver build should be available in a few hours.
Attachment #409447 - Attachment is obsolete: true
Attachment #409942 - Flags: review?(joshmoz)
Here's a tryserver build made with my rev1 patch:
https://build.mozilla.org/tryserver-builds/smichaud@pobox.com-bugzilla509130/bugzilla509130-macosx.dmg

The tryserver didn't store this in a new directory.  Instead it overwrote the contents of my previous patch's directory.  Maybe I gave it the wrong ID when submitting it.
Safari also works around this bug (and disables PoolCleaner()) ... but
in a different way.

When the DivX plugin calls WebInitForCarbon(), and
WKGetNSAutoreleasePoolCount() is first called, the latter returns '3'
-- which means that there were two autorelease pools on the "stack"
when WKGetNSAutoreleasePoolCount() was first called.

But on each subsequent call to WKGetNSAutoreleasePoolCount() (from
PoolCleaner()), WKGetNSAutoreleasePoolCount() returns '1' -- which
means there are no autorelease pools on the stack.  It also means that
the first pool created (in the call to WebInitForCarbon()) never gets
released (because PoolCleaner() is disabled).

It's not at all clear how Safari does this, and I don't see how we
could do it ourselves.  I suspect some kind of underhanded OS magic.
Summary: Crashes [@objc_msgSend | IdleTimerVector ] (OS X) caused by Carbon WebKit (used by DivXBrowserPlugin) → Crashes [@objc_msgSend | IdleTimerVector ] (OS X) caused by WebKit used by Carbon plugin (DivXBrowserPlugin)
(Following up comment #42)

> It's not at all clear how Safari does this, and I don't see how we
> could do it ourselves.  I suspect some kind of underhanded OS magic.

I've figured out how Safari avoids these crashes when it loads the
DivX plugin (why sPool never gets reallocated by PoolCleaner()).  It's
not anything Safari *does*.  Rather it has to do with how the main
thread's run loop works in WebKit apps.  Which isn't how it works in
Mozilla Corp browsers.  Which means we can't get away with what Safari
"does" (or rather doesn't do).

In WebKit apps, the main thread's run loop never runs in the default
run loop mode (kCFRunLoopDefaultMode) when it's running nested.  So
when PoolCleaner() is called in the default run loop mode, there are
always 0 autorelease pools on the autorelease pool stack -- which is
never the number of pools that were on the stack (2) when the DivX
plugin called WebInitForCarbon().

In Firefox (specifically in nsAppShell) the main thread's run loop can
be running in the default run loop mode even when it's nested several
levels deep.  This is because of how the app shell is designed, on all
platforms:

nsAppShell::ProcessNextNativeEvent() is called while Gecko events are
being processed, in order to prevent native event starvation (since
some Gecko events are processed synchronously).  On all platforms,
this means ProcessNextNativeEvent() will process native events while
at least one earlier call to process a native event is already on the
stack (the number of earlier calls depends on the level of nesting).

ProcessNextNativeEvent() grabs the next native event in the queue.  On
OS X it runs the event in whatever mode happens to be current -- which
may be kCFRunLoopDefaultMode.

The size of the autorelease pool stack depends on the level of
nesting.  So it's easy to see that, no matter how many autorelease
pools were on the stack when WebInitForCarbon() was called (if it was
called from DivXBrowserPlugin), it's likely that PoolCleaner() will be
called at some point, in kCFRunLoopDefaultMode, with the same number
of autorelease pools on the stack.  Which means PoolCleaner() will
reallocate sPool, which (as we've seen) causes objects in the current
autorelease pool to be released unexpectedly.
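
The difference between the two run loop designs can be illustrated with a toy model of the condition described above.  This is not WebKit's source: `IdleTick`, `CleanerReallocatesPool` and `CountReallocations` are hypothetical names, and the only rule modeled is the one stated in this comment, namely that the cleaner reallocates its pool when it fires in the default mode with the same number of pools on the stack as when WebInitForCarbon() was called.

```cpp
#include <cstddef>
#include <vector>

// One observation of the main run loop at the moment the idle timer fires.
struct IdleTick {
    bool defaultMode;    // is the loop running in the default run loop mode?
    unsigned poolDepth;  // autorelease pools currently on the stack
};

// Toy version of the condition described in the thread (an assumption,
// not copied from WebKit): reallocate only in the default mode, and only
// when the depth matches the depth recorded at initialization.
bool CleanerReallocatesPool(unsigned depthAtInit, const IdleTick& tick) {
    return tick.defaultMode && tick.poolDepth == depthAtInit;
}

// Counts how many ticks in a trace would trigger a reallocation.
unsigned CountReallocations(unsigned depthAtInit,
                            const std::vector<IdleTick>& trace) {
    unsigned count = 0;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (CleanerReallocatesPool(depthAtInit, trace[i])) {
            ++count;
        }
    }
    return count;
}
```

With a recorded depth of 2, a Safari-like trace (default mode only at depth 0, nested runs only in other modes) never triggers a reallocation.  A Firefox-like trace, where the default mode can be active at any nesting depth, triggers one every time the depth happens to be 2.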

In a WebKit app (Safari), a call to PoolCleaner() on the main thread
in kCFRunLoopDefaultMode always looks like this:

#0  0x9564fd5f in WKGetNSAutoreleasePoolCount ()
#1  0x9561a675 in PoolCleaner ()
#2  0x966ab59c in IdleTimerVector ()
#3  0x9048d8f5 in CFRunLoopRunSpecific ()
#4  0x9048daa8 in CFRunLoopRunInMode ()
#5  0x9663d2ac in RunCurrentEventLoopInMode ()
#6  0x9663d0c5 in ReceiveNextEventCommon ()
#7  0x9663cf39 in BlockUntilNextEventMatchingListInMode ()
#8  0x946656d5 in _DPSNextEvent ()
#9  0x94664f88 in -[NSApplication
                  nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#10 0x0000c303 in ?? ()
#11 0x9465df9f in -[NSApplication run] ()
#12 0x9462b1d8 in NSApplicationMain ()
#13 0x00002c92 in ?? ()

In Firefox it can look like this:

#0  0x9564fd5f in WKGetNSAutoreleasePoolCount ()
#1  0x9561a675 in PoolCleaner ()
#2  0x966ab59c in IdleTimerVector ()
#3  0x9048d8f5 in CFRunLoopRunSpecific ()
#4  0x9048daa8 in CFRunLoopRunInMode ()
#5  0x9663d2ac in RunCurrentEventLoopInMode ()
#6  0x9663d0c5 in ReceiveNextEventCommon ()
#7  0x9663cf39 in BlockUntilNextEventMatchingListInMode ()
#8  0x946656d5 in _DPSNextEvent ()
#9  0x94664f88 in -[NSApplication
                  nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#10 0x9465df9f in -[NSApplication run] ()
#11 0x02b6edba in nsAppShell::Run (this=0x520710)
#12 0x029eb2c7 in nsAppStartup::Run (this=0xbffff208)
#13 0x02021ebb in XRE_main (argc=1, argv=0xbffff828, aAppData=0x50f520)
#14 0x00001d0c in main (argc=1, argv=0xbffff828)

or like this:

#0  0x9564fd5f in WKGetNSAutoreleasePoolCount ()
#1  0x9561a675 in PoolCleaner ()
#2  0x966ab59c in IdleTimerVector ()
#3  0x9048d8f5 in CFRunLoopRunSpecific ()
#4  0x9048daa8 in CFRunLoopRunInMode ()
#5  0x9663d2ac in RunCurrentEventLoopInMode ()
#6  0x9663cffe in ReceiveNextEventCommon ()
#7  0x9663cf39 in BlockUntilNextEventMatchingListInMode ()
#8  0x946656d5 in _DPSNextEvent ()
#9  0x94664f88 in -[NSApplication
                  nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#10 0x02b70b2c in nsAppShell::ProcessNextNativeEvent
                  (this=0x520710, aMayWait=0)
#11 0x02baae0e in nsBaseAppShell::DoProcessNextNativeEvent ()
#12 0x02baae0e in nsBaseAppShell::OnProcessNextEvent
                  (this=0x520710, thr=0x514810, mayWait=0,
                  recursionDepth=0)
#13 0x02b6f2e5 in nsAppShell::OnProcessNextEvent
                  (this=0x520710, aThread=0x514810, aMayWait=0,
                  aRecursionDepth=0)
#14 0x02c3f88c in nsThread::ProcessNextEvent
                  (this=0x514810, mayWait=0, result=0xbfffe27c)
#15 0x02bfd8c7 in NS_ProcessPendingEvents_P (thread=0x514810, timeout=20)
#16 0x02baaab2 in nsBaseAppShell::NativeEventCallback (this=0x520710)
#17 0x02b6f868 in nsAppShell::ProcessGeckoEvents (aInfo=0x520710)
#18 0x9048d3c5 in CFRunLoopRunSpecific ()
#19 0x9048daa8 in CFRunLoopRunInMode ()
#20 0x9663d2ac in RunCurrentEventLoopInMode ()
#21 0x9663d0c5 in ReceiveNextEventCommon ()
#22 0x9663cf39 in BlockUntilNextEventMatchingListInMode ()
#23 0x946656d5 in _DPSNextEvent ()
#24 0x94664f88 in -[NSApplication
                  nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#25 0x9465df9f in -[NSApplication run] ()
#26 0x02b6edba in nsAppShell::Run (this=0x520710)
#27 0x029eb2c7 in nsAppStartup::Run (this=0xbffff208)
#28 0x02021ebb in XRE_main (argc=1, argv=0xbffff828, aAppData=0x50f520)
#29 0x00001d0c in main (argc=1, argv=0xbffff828)
Attachment #409942 - Flags: review?(joshmoz) → review+
Landed on trunk:
http://hg.mozilla.org/mozilla-central/rev/dd1f2d353b6d
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
It'll be a bit difficult to verify that this bug is fixed.

It's not reproducible.  And (for all that it's a topcrasher) it
doesn't happen often enough to be very visible in nightlies --
currently all the 1.9.2-branch-plus crashes in the last week are in FF
3.6 Beta 1.

It's possible we won't know for sure until beta2 is released.
> It's possible we won't know for sure until beta2 is released.

Actually, it looks like this patch just missed beta2 :-(
(There will most likely be a beta 3 this week or next.)
Whiteboard: [crashkill] → [crashkill] rdar://problem/7362667
This bug's patch got into FF 3.6 beta3 (on the 1.9.2 branch), and is
of course also present in subsequent builds (e.g. beta4).

Just now http://crash-stats.mozilla.com/ found NO crashes with a stack
signature containing "IdleTimerVector" in beta3 or subsequent builds.
I think this means we can verify this bug fixed.
Status: RESOLVED → VERIFIED
Depends on: 533001
Hi Steven, we are embedding xulrunner in a Carbon application. We used to call WebInitForCarbon at the very beginning, because some other components in our application need to call WebKit functions.

After we upgraded to xulrunner 1.9.1.3, our application keeps crashing in [@objc_msgSend | IdleTimerVector ]. But we can't work around this problem as you did for this bug, because we can't make sure xulrunner is the first loaded component...

I tried removing the WebInitForCarbon call from our application, but then the WebKit functions stop working. Is there any other workaround for this bug for embedders? After all, WebInitForCarbon could be called by anyone at any time if xulrunner is just an embedded component...
Depends on: 536684
I filed bug 536684 for you, peina - let's track this issue separately. Thanks for the heads-up.
Crash Signature: [@objc_msgSend | IdleTimerVector ]