Bug 436575 - Moz apps experience unkillable hangs after installing Mac OS X 10.5.3 (loading VerifiedDownloadPlugin.plugin)
Status: VERIFIED FIXED
rdar://5986742
: fixed1.8.1.15, hang, relnote
Product: Core
Classification: Components
Component: Plug-ins
: 1.9.0 Branch
: PowerPC Mac OS X
: -- critical
: ---
Assigned To: Steven Michaud [:smichaud] (Retired)
: 437348 437752 442381
Depends on:
Blocks: 438394
Reported: 2008-05-30 13:09 PDT by Christoph Krant
Modified: 2009-01-26 14:24 PST
44 users
samuel.sidler+old: blocking1.8.1.15+
samuel.sidler+old: wanted1.8.1.x+


Attachments
hang stack (120.66 KB, text/plain)
2008-06-04 05:29 PDT, Carsten Book [:Tomcat]
hang stack (13.43 KB, text/plain)
2008-06-04 12:10 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
Sample of FirefoxDebug (17.22 KB, text/plain)
2008-06-04 19:04 PDT, Bob Clary [:bc:]
Work around Apple's bug (4.23 KB, patch)
2008-06-06 09:32 PDT, Steven Michaud [:smichaud] (Retired)
Fix rev1 (minor cleanup) (4.76 KB, patch)
2008-06-08 10:04 PDT, Steven Michaud [:smichaud] (Retired)
Yet another hang report FF3rc2 10.5.3 (143.87 KB, text/plain)
2008-06-09 01:06 PDT, vangelis
Fix rev2 (use calloc on path to avoid UTF8 issues) (5.01 KB, patch)
2008-06-09 09:14 PDT, Steven Michaud [:smichaud] (Retired)
nelson: review-
Censor VDP when opening plugins (1.17 KB, patch)
2008-06-09 12:06 PDT, Mike Shaver (:shaver -- probably not reading bugmail closely)
DEBUG_shaver gone, NS_WARNING added (1.18 KB, patch)
2008-06-09 13:18 PDT, Mike Shaver (:shaver -- probably not reading bugmail closely)
jaas: review+
brendan: superreview+
dsicore: approval1.9+
Sample after main process killed (1.45 KB, text/plain)
2008-06-09 13:31 PDT, Bob Clary [:bc:]
Fix rev3 (fix leak and other problems) (5.15 KB, patch)
2008-06-09 15:34 PDT, Steven Michaud [:smichaud] (Retired)
nelson: review+
Report created by system after force quiting non-responding Camino (282.21 KB, text/plain)
2008-06-19 14:17 PDT, ronald.gold

Description Christoph Krant 2008-05-30 13:09:54 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

I set up a new MacBook (Mac OS X 10.5.0) and installed the Mac OS X 10.5.3 Combo Update. After that update I installed Firefox 2.0.0.14, made it the default browser, and tried to shut down the MacBook. In this case the MacBook always stops shutting down and freezes until a hard reset is made. If you only install the Mac OS X 10.5.2 update, the problem does not happen. I tested this with two MacBooks, and the bug could be reproduced every time on both systems.
(Please forgive my bad English)

Reproducible: Always

Steps to Reproduce:
1. Install Mac OS X 10.5.0 on a MacBook
2. Use Combo Update to update Mac OS X to Version 10.5.3 
3. Install Firefox 2.0.0.14, make it the default browser, and try to shut down the MacBook
Actual Results:  
System freezes 

Expected Results:  
System should shutdown
Comment 1 Nick Kreeger 2008-06-03 15:43:55 PDT
We're seeing this @ Songbird as well. Here are our steps to reproduce:

1.) Delete your ~/Library/Application Support/Firefox
2.) Start Firefox
3.) Close Firefox
4.) Delete your Firefox profile dir again
5.) Start firefox, after a couple of seconds - the never ending hang happens
Comment 2 Carsten Book [:Tomcat] 2008-06-04 05:26:42 PDT
Confirmed on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9pre)
Gecko/2008060218 Firefox/3.0pre and a Firefox 3 RC2 build, using the steps to reproduce from Nick. Together with comment #0, this seems to affect both Firefox 2 and 3.0 RC2.

Note: When I used the steps to reproduce and then used Force Quit to get out of the hang, I saw:
-> I get a crash/hang report from Apple (I will attach it to this bug).
-> Firefox cannot be shut down from the Dock, even with Force Quit. The Firefox icon also stays "active" in the Dock, but there is no Firefox process in Activity Monitor.
-> When you shut down Mac OS X 10.5.3 (the only way to end Firefox), the OS hangs on shutdown, as described in comment #0.

I have also seen this hang while installing an extension, so there may be other steps to reproduce.
Comment 3 Carsten Book [:Tomcat] 2008-06-04 05:29:04 PDT
Created attachment 323705 [details]
hang stack
Comment 4 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-04 07:57:55 PDT
(In reply to comment #1)
> We're seeing this @ Songbird as well. Here is our steps to reproduce:
> 
> 1.) Delete your ~/Library/Application Support/Firefox
> 2.) Start Firefox
> 3.) Close Firefox
> 4.) Delete your Firefox profile dir again
> 5.) Start firefox, after a couple of seconds - the never ending hang happens

Does this happen when you remove/modify the Application Support folders of other applications?

I noticed an unending hang in Adium two days ago (unresponsive to Force Quit and kill -9, showed as zombie process in grep) and had to do a full system reboot to get around it. Minefield at that time started and shutdown just fine, though.
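
For anyone trying to confirm they are in the same state, here is a minimal sketch. It assumes the process-state letters documented in the macOS ps(1) man page (U for uninterruptible wait, Z for zombie); the helper names are illustrative and not part of any Mozilla tool.

```python
import subprocess

# First character of the STAT column, per the macOS ps(1) man page:
# "U" = uninterruptible wait, "Z" = zombie.
STUCK_STATES = {"U", "Z"}


def find_stuck_processes(ps_output):
    """Return (pid, stat, command) tuples for processes in state U or Z."""
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, stat, command = parts
        if stat[0] in STUCK_STATES:
            stuck.append((int(pid), stat, command.strip()))
    return stuck


def snapshot():
    """Run ps with pid, state and command columns and return its text output."""
    return subprocess.run(
        ["ps", "-axo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Passing the result of snapshot() to find_stuck_processes() right after a failed Force Quit should show whether the stuck app is sitting in U or Z state.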
Comment 5 Samuel Sidler (old account; do not CC) 2008-06-04 08:30:23 PDT
This problem should be reported to Apple, even before we finish our investigation of it. When someone who's seen this reports the problem, please paste the rdar number in this bug.
Comment 6 Carsten Book [:Tomcat] 2008-06-04 08:40:58 PDT
(In reply to comment #4)
> Does this happen when you remove/modify the Application Support folders of
> other applications?

Songbird is also affected by this problem. Nuking the ~/Library/Application Support/Sonbird1 directory and restarting Songbird results in the same hang as Firefox (http://pastebin.mozilla.org/451186), so it's also a problem for other apps.


Comment 7 Carsten Book [:Tomcat] 2008-06-04 09:11:52 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > Does this happen when you remove/modify the Application Support folders of
> > other applications?
> 
> also Songbird is affected by this Problem...nuking the ~/Library/Application
> Support/Sonbird1 Directory and restarting Songbird results in the same
> Problem/Hang as Firefox http://pastebin.mozilla.org/451186 so its also a
> Problem for other Apps...
> 

Camino also hangs with the steps to reproduce. Adium is fine when using the steps to reproduce.
Comment 8 Carsten Book [:Tomcat] 2008-06-04 09:45:44 PDT
As discussed in the Mac meeting, requesting blocking for Firefox 3, because it's a very bad user experience.
Comment 9 Damon Sicore (:damons) 2008-06-04 10:18:48 PDT
(In reply to comment #1)

> 1.) Delete your ~/Library/Application Support/Firefox
> 2.) Start Firefox
> 3.) Close Firefox
> 4.) Delete your Firefox profile dir again
> 5.) Start firefox, after a couple of seconds - the never ending hang happens
> 

Are we seeing this hang in any situation other than the above STR?  During the Mac meeting, I was under the impression the STR was a little simpler than the above.  If there isn't a more direct set of steps that the everyday user would hit more frequently, I'm not sure this would actually cause an RC re-spin, assuming we did have a fix.

Steven?  Carsten?

Comment 10 Marcia Knous [:marcia - use ni] 2008-06-04 10:24:28 PDT
I mentioned to Tomcat on IRC that I saw some odd behavior when trying to run my betatest update testing from RC1->RC2.  Here is what I saw:

1. Download the Italian or French RC 1 build from this directory: http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.0rc1/mac/. Change update channel to betatest in Preferences.
2. Launch the build with a new profile.
3. App would hang (sometimes launching RC 1 would be okay, but it would then hang when updating to RC2). Unable to remove the zombie Dock icon, and I had to restart my machine to remove the offending apps that were stuck in the Dock. This happened with more than one l10n build (I believe with both French and Italian).
Comment 11 Carsten Book [:Tomcat] 2008-06-04 10:31:37 PDT
(In reply to comment #9)
> (In reply to comment #1)
> 
> > 1.) Delete your ~/Library/Application Support/Firefox
> > 2.) Start Firefox
> > 3.) Close Firefox
> > 4.) Delete your Firefox profile dir again
> > 5.) Start firefox, after a couple of seconds - the never ending hang happens
> > 
> 
> Are we seeing this hang in any other situation than the above STR?  During the
> mac meeting, I was under the impression the STR was a little simpler that the
> above.  If there's not a more direct/frequent for the everyday user, set of
> steps to hang, I'm not sure this would actually cause an RC re-spin, assuming
> we did have a fix.
> 
> Steven?  Carsten?
> 

Hi Damon, it seems that deleting the Firefox Application Support directory is
enough to get the hang. I'm not sure whether this also affects new users of
Firefox, because they didn't have a Firefox app dir/user profile before.

I also saw this while installing an extension, so there might be more scenarios that trigger this hang.

Comment 12 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-04 11:24:55 PDT
Do we have confirmation that installing an add-on causes this hang? I can't reproduce that set of steps here.

This is definitely wanted on branch, but as it's entirely unclear if there's even anything we can *do* about this (since as I'm reading things, this might be a problem with 10.5.3 - if someone knows more about the bug could they please comment?) and since the STR aren't exactly common, I'm not sure we want to block and respin for this.
Comment 13 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-04 12:10:26 PDT
Created attachment 323743 [details]
hang stack

Here's a debug build's stack.

The problem is 'VerifiedDownloadPlugin.plugin', which is located in /Library/Internet\ Plugins. A little googling reveals that this plugin was first introduced by Apple in 10.4.7, and I'm guessing that Apple did something bad in 10.5.3. We're dying in OS X's image loader...

Anyway, removing this plugin and rebooting solves the problem for me, but I'm worried that the plugin is part of Apple's download security scheme (maybe it's this plugin that is responsible for marking downloaded executables as coming from the internet). Someone with Apple contacts may be able to figure out what this plugin really does, google doesn't seem to know.
Comment 14 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-04 12:37:52 PDT
smichaud: is ben's analysis similar to yours?
Comment 15 Steven Michaud [:smichaud] (Retired) 2008-06-04 13:09:15 PDT
> smichaud: is ben's analysis similar to yours?

Not entirely.  But I think we've been converging on the same target
from different directions.

By the way, here's a procedure that (in my very limited testing) fixes
this bug (or rather works around it).  A few minutes ago I asked
Tomcat and Marcia to test it, but I haven't yet heard back from them.

Please try it out and let me know your results.  I have some ideas
about why it works ... but I don't want to spend the time explaining
them until we can confirm that it actually does work:

1) Quit Firefox and restart your computer, if necessary by
   force-restarting it (to get a clean slate).

2) Install a new copy of Firefox 3 (or Minefield) -- any version.

3) In Terminal, change to that copy's Contents/MacOS/plugins directory.

4) Move the JavaEmbeddingPlugin.bundle up one level (to the
   Contents/MacOS directory).  I used "mv JavaEmbeddingPlugin.bundle
   ..".

Now try whatever STRs you've been able to find for this hang.
Comment 16 Steven Michaud [:smichaud] (Retired) 2008-06-04 13:15:30 PDT
(Following up comment #15)

Tomcat's just told me (and I've confirmed it) that this only works
once.  Subsequent attempts to make FF hang do work -- it still hangs.
Comment 17 Carsten Book [:Tomcat] 2008-06-04 14:17:39 PDT
Also another example (from steven) for getting this hang is:

-> Create a new Profile on Firefox 2 or 3 (or use an existing profile where you know its okay)
-> Delete ~/Library/Application Support/Firefox/pluginreg.dat and/or (this depend if its a Firefox 2 or Firefox 3 Profile) ~/Library/Application Support/Firefox/Profiles/[xxxxxxxx]/pluginreg.dat.
-> Start Firefox and go to about:plugins
-> Firefox hangs instantly

Tested on Firefox 3 RC2 Build 2 and Firefox 2.0.0.14 

These steps to reproduce are, by the way, not uncommon: we recommend these exact steps as an example at http://support.mozilla.com/en-US/kb/Opening+PDF+files+within+Firefox#Re_initialize_the_plugins_database 
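
To see which pluginreg.dat files exist before trying these steps, a small sketch may help. It only lists files and deletes nothing. The two locations are the ones named above; the function name and the idea of passing the Application Support path as an argument are assumptions for illustration.

```python
from pathlib import Path


def find_pluginreg_files(app_support_dir):
    """List pluginreg.dat at the top level and under Profiles/<profile>/."""
    root = Path(app_support_dir).expanduser()
    candidates = [root / "pluginreg.dat"]
    profiles = root / "Profiles"
    if profiles.is_dir():
        candidates += sorted(
            p / "pluginreg.dat" for p in profiles.iterdir() if p.is_dir()
        )
    # Keep only the files that are actually present.
    return [p for p in candidates if p.is_file()]
```

For example, find_pluginreg_files("~/Library/Application Support/Firefox") would return whichever of the two locations is populated for the installed Firefox version.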
Comment 18 Bob Clary [:bc:] 2008-06-04 15:10:17 PDT
I am seeing a hang during the JavaScript browser tests where the Firefox process enters an uninterruptible wait. I have a sample of the process if that would help.
Comment 19 Steven Michaud [:smichaud] (Retired) 2008-06-04 16:01:58 PDT
(In reply to comment #18)

It can't hurt (even though it might turn out to be completely
unrelated).

But don't bother if your sample wasn't made with a build containing
debug symbols (i.e. don't bother if it was made with any downloadable
distro, all of which have their debug symbols stripped).
Comment 20 John Gaunt (redfive) 2008-06-04 16:13:05 PDT
We have seen this hang in Songbird when installing extensions, but not reliably. I don't have STR at this point but wanted to mention it so more people will try common steps like this. It is clearly an issue with restarting the app for us.
Comment 21 Steven Michaud [:smichaud] (Retired) 2008-06-04 16:33:52 PDT
For what it's worth, I've found (by laboriously tracing in gdb) that
the hangs seem to take place at the line
pluginFilesArray.appendElement(item) in
nsPluginHostImpl::ScanPluginsDirectory(), during the next call to
ScanPluginsDirectory() (from
nsPluginHostImpl::ScanPluginsDirectoryList()) after the JEP
(MRJPlugin.plugin and JavaEmbeddingPlugin.bundle) was loaded (in other
words the JEP was loaded during the previous call to
ScanPluginsDirectory() from ScanPluginsDirectoryList()).

http://mxr.mozilla.org/mozilla/source/modules/plugin/base/src/nsPluginHostImpl.cpp#5101

Does this ring a bell for anyone?

It may be relevant that MRJPlugin.plugin dynamically loads
JavaEmbeddingPlugin.bundle ... though the hangs don't ever (as far as
I can tell) take place exactly when that happens.
Comment 22 Steven Michaud [:smichaud] (Retired) 2008-06-04 16:51:16 PDT
Just so that everyone knows:

If I can find a reasonably easy fix/workaround for this (over the next
few days), I'm going to renom it for blocking1.9.  We can go over the
specific arguments for and against if/when I do that.
Comment 23 Steven Michaud [:smichaud] (Retired) 2008-06-04 17:19:30 PDT
(Following up comment #21)

Actually, the problem has nothing to do with MRJPlugin.plugin
dynamically loading JavaEmbeddingPlugin.bundle, or with either of them
separately:

The hangs still happen if you delete both of them from
Contents/MacOS/plugins (and from /Library/Internet Plug-Ins and
~/Library/Internet Plug-Ins).
Comment 24 Bob Clary [:bc:] 2008-06-04 19:04:41 PDT
Created attachment 323806 [details]
Sample of FirefoxDebug

(In reply to comment #19)
> (In reply to comment #18)
Comment 25 philippe (part-time) 2008-06-04 23:29:51 PDT
OK, I've been experiencing very similar issues with Camino (both Camino 1.6.1 and trunk builds): bug 437348. All started with 10.5.3.

I've submitted a Bug Report @ Apple: Bug ID# 5987623.
Comment 26 Mark Banner (:standard8) (afk until 26th July) 2008-06-05 02:36:57 PDT
I think I too have seen this with 10.5.3, I have slightly different STR which is why I'm posting:

- Build TB with --enable-tests
- run the test mailnews/extensions/bayesian-spam-filter/test/unit/test_bug228675.js (checked in today)
- test passes
- run it again
- xpcshell hangs, can't be killed with Ctrl-C. All subsequent attempts to run the test again hang, MacBook hangs on shutdown/restart.

Looking at the log, it's trying to load the plugins (not a surprise based on other comments). Moving the VerifiedDownloadPlugin.plugin fixes the problem.

Just thought this way may be slightly easier for folks to reproduce/test with.
Comment 27 Chris Lawson (gone) 2008-06-05 08:09:42 PDT
Yeah, I think I saw what Philippe saw in comment 25 yesterday as well. Camino hung, I tried to force-quit it, couldn't get it to force-quit, then it hung the whole OS when I tried to restart.
Comment 28 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-06-05 08:25:38 PDT
Since this seems to be related to plug-ins, --> Core:Plug-ins.
Comment 29 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-06-05 08:34:00 PDT
*** Bug 437348 has been marked as a duplicate of this bug. ***
Comment 30 Chris Lawson (gone) 2008-06-05 09:02:19 PDT
http://unsanity.org/archives/mac_os_x/reminder_verifi.php

explains (sort of) what this plugin does, which seems to be "nothing".

Note the deafening silence in response to this post:

http://lists.apple.com/archives/Webkitsdk-dev/2006/Jun/msg00042.html

It may, in fact, have something to do with Dashboard widget downloads, but for now, I'd guess a safe workaround is to tell people to remove it (not delete it).
Comment 31 Doug Turner (:dougt) 2008-06-05 09:56:41 PDT
fwiw, i see this happening in camino as well as safari.
Comment 32 Steven Michaud [:smichaud] (Retired) 2008-06-05 12:15:06 PDT
> as well as safari

Please tell us how to reproduce this in Safari.
Comment 33 Stuart Morgan 2008-06-05 13:15:16 PDT
In the same timeframe that I've started getting these hangs in Camino, I've had two different computers (one laptop, one desktop, both 10.5.3) lock up in open$NOCANCEL$UNIX2003 during xcodebuild, so I suspect this is a much deeper OS problem.
Comment 34 Steven Michaud [:smichaud] (Retired) 2008-06-05 13:47:39 PDT
> so I suspect this is a much deeper OS problem

I think you're right.  For example see:

http://discussions.apple.com/thread.jspa?threadID=1540494&tstart=0
http://www.macfixit.com/article.php?story=20080604090119930

But it'd be _really_ nice to be able to repro this with Safari.  Then
Apple could hardly ignore it.

For myself, I haven't really tried (I've been busy trying to come up
with a workaround on our side).  Doug Turner hints at having a repro
(in comment #31) ... but I'm not sure he really meant to say that.

Anyone else able to repro this in Safari?
Comment 35 Samuel Sidler (old account; do not CC) 2008-06-05 14:36:14 PDT
Not going to hold 1.8.1.15 for this as it's freezing tomorrow. Keeping nominated for 1.8.1.x and 1.8.1.16, so we can revisit after more investigation.
Comment 36 Daniel Raffel 2008-06-05 22:34:39 PDT
I started seeing this issue in Songbird on 5/31.  Early in the week I reported the issue to Apple as it appeared to be an OS X 10.5.3 bug.  I heard back from their Senior Vice President of Software Engineering saying "Thanks for the heads up.  We are investigating."  On June 4th I also heard from an Apple Software Update Integration Manager.  I have pointed them both at this bug for more info and asked for a RADAR report but haven't heard back.
Comment 37 Steven Michaud [:smichaud] (Retired) 2008-06-06 08:57:15 PDT
I've got a patch and tryserver build coming up -- this is clearly an
Apple bug, but my patch works around it.

As Ben Turner first noticed, the crux of the problem is the
VerifiedDownloadPlugin.  The reason is that, as of OS X 10.5.3, this
plugin's executable can't be dlopened.  The first time you do it
"works" -- but now you'll hang the next time you shutdown or restart
the OS, and the next attempt (of any kind) to access this plugin's
executable will also hang.

So even if you quit the browser and restart it, it will hang the next
time it tries to dlopen the VerifiedDownloadPlugin.  And (for example)
nm will hang if you try to read its symbols ... though, ironically, it
doesn't have any symbols, or even a symbol table!
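
The sequence described above can be captured in a toy model, which may help explain why the STRs all need a second plugin scan within the same boot. This is only a sketch of the reported behavior, not of anything inside OS X; the class and method names are made up for illustration.

```python
class BootSession:
    """Toy model of one OS boot, following the behavior reported on 10.5.3."""

    def __init__(self):
        # Nothing has touched the plugin's executable since boot.
        self.plugin_dlopened = False

    def dlopen_plugin(self):
        """The first dlopen since boot appears to work; any later one hangs."""
        if self.plugin_dlopened:
            return "hang"
        self.plugin_dlopened = True
        return "ok"

    def shutdown(self):
        """Shutdown hangs once the plugin has been dlopened in this boot."""
        return "hang" if self.plugin_dlopened else "ok"
```

A fresh BootSession corresponds to the state after a (forced) restart, which is why a hard reset gives one more "free" plugin scan.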
Comment 38 Steven Michaud [:smichaud] (Retired) 2008-06-06 09:32:46 PDT
Created attachment 324065 [details] [diff] [review]
Work around Apple's bug

As you can see in my patch's comments, it gets around the Apple bug
(that, as of OS X 10.5.3, you can't dlopen the VerifiedDownloadPlugin)
by using nlist() to check for a symbol first, to see if dlopen()
should be used on a given plugin.

As you can see by running nm on the VerifiedDownloadPlugin's
executable, it doesn't contain any symbols -- so no program should
ever need to dlopen the plugin (and Safari never does).

Using nlist() somehow gets around the Apple bug (whatever that
ultimately turns out to be) -- I guess because nlist() accesses the
executable file directly (at least if it hasn't already been
dlopened), bypassing all the dynamic linking infrastructure.
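
The control flow of that check can be sketched as follows. This is not the patch itself (which is C code in NSPR); it is a minimal Python illustration in which has_symbol stands in for the nlist() probe and entry_symbol is a hypothetical symbol name. On POSIX systems ctypes.CDLL loads the library through dlopen(), so it serves as the default loader here.

```python
import ctypes


def load_plugin(path, entry_symbol, has_symbol, dlopen=ctypes.CDLL):
    """Load `path` only if the probe finds `entry_symbol` in the file.

    has_symbol(path, symbol) is assumed to inspect the file on disk without
    involving the dynamic linker, as nlist() is described as doing above.
    Returns the loader's handle, or None when the probe finds nothing.
    """
    if not has_symbol(path, entry_symbol):
        # Nothing to link against, so never hand this file to dlopen().
        return None
    return dlopen(path)
```

The point of the design is ordering: the cheap file-level probe runs first, so a plugin with no symbols never reaches the dynamic linker at all.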

I've tested my patch on 10.5.3, 10.5.2 and 10.4.11, with the Flash,
QuickTime and JEP plugins.  It seems to work fine with all of them, so
I didn't bother to limit the effects of my patch to OS X 10.5.3 and
above.  (For Flash I used http://www.vg.no (crawling with Flash
objects) and the text-entry examples at bug 357670 comment #25.  For
QuickTime I used the movie trailers at http://www.apple.com/trailers.
For Java I used Sun's demo applets at
http://java.sun.com/applets/jdk/1.4/index.html.)

Marcia, Tomcat, Bob Clary, and everyone else:  Please test my tryserver
build as much as possible, as soon as you can.

Camino and Songbird folks:  Please do rebuilds with my patch and see
if your problems are resolved.

I'll wait a few days for the results to come in (and to see if minor
changes to my patch fix whatever problems turn up).  Then, if all goes
well, I'll renominate this for blocking1.9 -- in the hope that there
will be an RC3 and that this patch will be included.

Here's a tryserver build made with this patch:

https://build.mozilla.org/tryserver-builds/2008-06-06_08:40-smichaud@pobox.com-bugzilla436575/smichaud@pobox.com-bugzilla436575-firefox-try-mac.dmg

Side note:  I didn't use the address returned by nlist(), because this
is often just an offset in the executable, and not the actual address
of the symbol.
Comment 39 Nick Kreeger 2008-06-06 09:45:02 PDT
(In reply to comment #38)
> Created an attachment (id=324065) [details]
> Work around Apple's bug

In your comment, you reference the |nlink()| function several times - but I only see a call to |nlist()|. Is this a typo, or have I not had enough coffee yet this morning?
Comment 40 Steven Michaud [:smichaud] (Retired) 2008-06-06 09:48:57 PDT
> In your comment, you reference the |nlink()| function several times
> - but I only see a call to |nlist()|. Is this a typo, or have I not
> had enough coffee yet this morning?

Sorry, dumb mistake.

nlink -> nlist

I'm the one who hasn't had enough coffee ... or sleep :-(
Comment 41 Samuel Sidler (old account; do not CC) 2008-06-06 13:10:36 PDT
Comment on attachment 324065 [details] [diff] [review]
Work around Apple's bug

NSPR bugs needs to be reviewed by NSPR reviewers.

See http://www.mozilla.org/owners.html
Comment 42 Carsten Book [:Tomcat] 2008-06-06 14:01:37 PDT
The tryserver build from steven Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9pre) Gecko/2008060608 Minefield/3.0pre fixes all the known ways to reproduce the crash for me.

No crash on all the tests, also java and quicktime were working fine.
Comment 43 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-06 14:26:37 PDT
Nice job, Steven! What a mess this was...
Comment 44 Justin Dolske [:Dolske] 2008-06-06 16:25:28 PDT
Would it be possible to blocklist the plugin as a workaround for shipping RC2? Or does the blocklisting only happen after the binary has been opened to see what it is?
Comment 45 Bob Clary [:bc:] 2008-06-06 16:27:11 PDT
I applied the patch to a current 1.9.0 tree and successfully completed both opt and debug runs of the browser-based javascript tests without experiencing the hang which was not possible prior to the patch.
Comment 46 philippe (part-time) 2008-06-06 17:32:04 PDT
(In reply to comment #25)
 
> I've submitted a Bug Report @ Apple: Bug ID# 5987623.
> 

Follow up from Apple:
[quote]
This is a follow up to Bug ID# 5987623.  After further investigation it has been determined that this is a known issue, which is currently being investigated by engineering.  This issue has been filed in our bug database under the original Bug ID# 5986742.
[/quote]

PS - I've experienced the same 'Kill is not a kill' after a hang in Terminal.app, while building Camino an hour ago. Force restarting the machine.
I'll test Steven's patch next.
Comment 47 Steven Michaud [:smichaud] (Retired) 2008-06-06 19:09:15 PDT
(In reply to comment #41)

> NSPR bugs needs to be reviewed by NSPR reviewers.

Yes, of course.  But since the patch is in OS-X-specific code, I
thought I'd ask Josh to review first.
Comment 48 Steven Michaud [:smichaud] (Retired) 2008-06-06 19:19:21 PDT
(In reply to comment #44)

> Or does the blocklisting only happen after the binary has been
> opened to see what it is?

I don't know.  The VerifiedDownloadPlugin binary can be opened
(e.g. by nlist()), it just can't be dlopened (i.e. no attempt can be
made to link it in).

> Would it be possible to blocklist the plugin as a workaround for
> shipping RC2?

We'll have to see what the reaction to this bug is in RC2.  But I'm
pretty sure we'll need an RC3.  And I don't think that's too high a
price to pay -- what's a couple weeks delay, on something for which
we've been working our asses off for two years?
Comment 49 Justin Dolske [:Dolske] 2008-06-06 19:35:19 PDT
If you're saying this needs an RC3, I think you need to provide a strong justification of why (impact+risk), what's different/overlooked since this was blocking-'d, and renom ASAP to get this on the driver's radar.
Comment 50 Justin Dolske [:Dolske] 2008-06-06 19:48:56 PDT
Meant to note: I dropped in here after hearing our interns saying that this bug was causing new Firefox installs to hang on exit, but I can't reproduce that on my 10.5.3 machine (via creating a new profile, at least). If reproducing this requires manual twiddling (as described in previous comments), then it doesn't seem like a release blocker to me -- but if it's highly reproducible for normal never-used-firefox-before users, then that's alarming.
Comment 51 Nick Kreeger 2008-06-06 19:59:24 PDT
(In reply to comment #50)
> Meant to note: I dropped in here after hearing our interns saying that this bug
> was causing new Firefox installs to hang on exit, but I can't reproduce that on
> my 10.5.3 machine (via creating a new profile, at least). If reproducing this
> requires manual twiddling (as described in previous comments), then it doesn't
> seem like a release blocker to me -- but if it's highly reproducible for normal
> never-used-firefox-before users, then that's alarming.
> 

If you read up - this isn't just _happening_ on new installs. The hang can be rather random. This happens sometimes during the XPCOM restart after extensions are installed (the double app-launch).

I actually had every Mozilla app (Thunderbird, Songbird, Camino) hang on this (and when I opened Firefox as well) _right_ after the update was installed. 

I should note that the Songbird QA team spent the day running a build with Steven's patch and they could not get the hang to happen again - and we are planning on shipping with it.
Comment 52 Manish Singh 2008-06-07 14:16:03 PDT
(In reply to comment #50)
> Meant to note: I dropped in here after hearing our interns saying that this bug
> was causing new Firefox installs to hang on exit, but I can't reproduce that on
> my 10.5.3 machine (via creating a new profile, at least). If reproducing this
> requires manual twiddling (as described in previous comments), then it doesn't
> seem like a release blocker to me -- but if it's highly reproducible for normal
> never-used-firefox-before users, then that's alarming.
> 

Another datapoint: Freshly installed system, upgraded to 10.5.3, then installed FF3 RC1, and got a hang on startup. No browser window even popped up. Same symptoms as the others: the process was unkillable, and OS shutdown hung as well. I did no manual tweaking.

This also happened to me with iTerm. Both with Firefox and iTerm I was building something out of macports at the time, so the machine was fairly loaded, perhaps that has something to do with it?

If the patch here is a reasonable workaround, I feel it should go in an RC3. Even if it's only mildly reproducible, a first run bug *sucks*, since first impressions count a lot.

I have to wonder what's going on in Apple QA though. Do they not test popular third party apps at all?

Comment 53 Justin Dolske [:Dolske] 2008-06-07 15:07:09 PDT
Preemptively renomming, based on last few comments, to get back on driver's list.
Comment 54 Marcia Knous [:marcia - use ni] 2008-06-07 15:35:50 PDT
I am glad this bug has been renominated by Justin. During my update testing (See Comment 10) I could not even use my Intel PPC Mac machine for testing due to this bug. Once I remove the verifieddownload plugin all is well.

I tested Steven's tryserver build on a PPC Mac running Leopard. I had to download a fresh FF 3 RC2, and when I launched Firefox 2 the same thing happened. So I am concerned that eager new users will have a very unpleasant experience when trying to run Firefox 3 if this bug is not addressed.

We should also try to gather some data to see how many users are actually running 10.5.3. We may be able to get some of this data from the crash stats. Just some ideas...
Comment 55 Marcia Knous [:marcia - use ni] 2008-06-07 15:36:59 PDT
I just noticed in Comment 54 that I referred to my machine as Intel/PPC, which clearly is a mistake (it is an Intel). However, I have seen this bug on both Intel and PPC.
Comment 56 Steven Michaud [:smichaud] (Retired) 2008-06-07 15:48:18 PDT
There appear to be lots of ways this bug's hang can happen, but (as
far as I know) all of them involve re-initializing the plugins
database.  The first time this happens (since you started your
computer) is "free" -- the operation "works", though your computer
will now hang when you next shutdown or restart it.  But the next time
you re-initialize the plugins database (before you restart your
computer), you will hang right away.

The first time you run Firefox 3 after having run Firefox 2.0.0.X (i.e.
the first time you upgrade), your plugins database is always
re-initialized.  And, as Tomcat mentioned in comment #17, instructions
on how to re-initialize the plugins database are included in our
"Troubleshooting plugins" document:

http://support.mozilla.com/en-US/kb/Troubleshooting+plugins#Re_initializing_the_plugins_database

So, though it's not guaranteed that either of these two operations
will cause a hang, it's quite likely they will do so.

And these aren't just ordinary hangs.  Most people probably won't
notice the unkillable zombie processes.  But many _will_ notice the
hangs on restart -- which can be a sign of hard disk problems, and
therefore quite frightening.

Altogether, I don't think we should do a FF 3 release that includes
this bug -- especially since we now have a fix for it.

Justin (in comment #44) suggested it might be possible to blocklist
the VerifiedDownloadPlugin without doing an RC3.  I don't know whether
or not this is possible (or reasonable).  If it's both possible and
reasonable it should be considered.  (It would presumably block the
plugin by name.  But I doubt that many users will be changing its
name, and as far as I know there aren't any other plugins that trigger
this bug.)

But if blocklisting the VerifiedDownloadPlugin isn't feasible (without
doing an RC3), I think we should do an RC3 and land this patch in it.

Many of us have worked our asses off on Firefox 3 for as much as two
years.  I don't think postponing the final release by a few weeks is
such a big deal in comparison.
Comment 57 Steven Michaud [:smichaud] (Retired) 2008-06-07 15:57:16 PDT
One more thing:

What's reported here (including the nature of my workaround) clearly
shows that this is an Apple bug, and not a Mozilla.org bug.  So in
principle we could just say "let Apple fix it".

Judging by what philippe says in comment #46, Apple appears to have
acknowledged that this is their bug.  But how long will it take them
to fix it?
Comment 58 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-07 20:42:02 PDT
I'm betting that we have to know the name and version that the plugin exposes before we can blocklist it, and we have to load the plugin before we can know those. Since the hang is in the dylib loader I think we're sunk.

I think this should block as well, even though I hate saying it.

This is clearly an Apple bug but I'm worried that Firefox will be blamed for it. Even if the blogosphere correctly recognizes Apple's fault the recommended workaround will still probably be "Don't use Firefox".

And as Steven points out we have no idea when Apple will fix it.
Comment 59 Samuel Sidler (old account; do not CC) 2008-06-07 20:54:18 PDT
A patch now exists for this bug. Re-nomming for 1.8.1.15 given the severity. (Don't kill me Dan!)
Comment 60 Brian Polidoro 2008-06-08 05:08:40 PDT
*** Bug 437752 has been marked as a duplicate of this bug. ***
Comment 61 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-08 06:28:50 PDT
This definitely blocks 1.9.0.1 - can we please expedite reviews?

I'm not sure that we should block 3.0 on this; doing so would require at least another week of time, and after looking around for complaints and instances of this happening, it doesn't seem to be very commonplace. Ars Technica recently wrote about it, and most of the comments on that article from OSX users are about how they're *not* seeing this problem.

Leaving it nominated for now to get other driver opinions.
Comment 62 Steven Michaud [:smichaud] (Retired) 2008-06-08 10:04:13 PDT
Created attachment 324203 [details] [diff] [review]
Fix rev1 (minor cleanup)

1) Fix the problem noticed by Nick Kreeger in comment #39 -- change
   "nlink()" in comment to "nlist()".

2) Add to comment explaining why we shouldn't use the symbol returned
   by nlist(), and why using nlist() shouldn't be a performance hit.

3) Expand "path" to make room for a path that's all double-byte UTF8
   characters -- admittedly an edge case, but still a reasonable (and
   almost completely cost-free) precaution.
Comment 63 Steven Michaud [:smichaud] (Retired) 2008-06-08 10:30:22 PDT
(Following up comment #60)

Bug 437752 documents a hang in _CFBundleCopyExecutableURLRaw -- not in
_CFBundleDlfcnLoadBundle as reported here.  (Yes, that bug's trace was
made with a distro whose debug symbols had been stripped, but system
calls are usually accurately reported even in those traces.)

If I remember right, I saw similar hangs while looking for a
workaround for this bug.  This raises the possibility that there are
other ways to see this hang than just a second call to load the
VerifiedDownloadPlugin (and that they will occur more often than we
currently expect).

If I'm right, my patch should also "fix" (i.e. work around) the hang
reported at bug 437752.  I've asked that bug's reporter to test my
tryserver patch from comment #38.
Comment 64 vangelis 2008-06-08 12:46:19 PDT
(In reply to comment #63)
> If I'm right, my patch should also "fix" (i.e. work around) the hang
> reported at bug 437752.  I've asked that bug's reporter to test my
> tryserver patch from comment #38.
> 
I'm testing it right now, will report back soon.
Comment 65 vangelis 2008-06-09 01:06:24 PDT
Created attachment 324252 [details]
Yet another hang report FF3rc2 10.5.3

This is another hang report, generated on a different machine running Firefox 3 RC2 on Mac OS X 10.5.3.
Comment 66 Atsushi Sakai 2008-06-09 05:55:41 PDT
(In reply to comment #62)
> 3) Expand "path" to make room for a path that's all double-byte UTF8
>    characters -- admittedly an edge case, but still a reasonable (and
>    almost completely cost-free) precaution.

Please note that a Unicode character may be up to 4 bytes in UTF-8,
though I think such a long file name is an edge case.
Comment 67 Steven Michaud [:smichaud] (Retired) 2008-06-09 07:20:53 PDT
> Please note that a Unicode character may be up to 4 bytes in UTF-8

Thanks.  I wondered about that.

So it's probably best to use malloc to create a 'path' buffer of exactly the right length.  Revised patch coming up shortly.
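
To make the sizing question concrete: CFStringGetLength() counts UTF-16 code units, and no single code unit needs more than three UTF-8 bytes (a surrogate pair, i.e. two code units, encodes as one four-byte sequence). Multiplying the length by 4 is therefore a safe upper bound, while multiplying by 2 is not. The following is only a sketch in plain C with no CoreFoundation dependency; utf8_bytes_needed() and utf8_worst_case() are hypothetical helpers for illustration and are not part of any patch in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): number of UTF-8 bytes needed to
 * encode a UTF-16 string of `len` code units, excluding the terminating NUL. */
static size_t utf8_bytes_needed(const uint16_t *s, size_t len)
{
    size_t bytes = 0;
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;                 /* ASCII */
        } else if (u < 0x800) {
            bytes += 2;                 /* e.g. accented Latin letters */
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                 /* surrogate pair: one 4-byte sequence */
            i++;                        /* consume the low surrogate too */
        } else {
            bytes += 3;                 /* rest of the BMP; an unpaired
                                           surrogate is counted as 3 here */
        }
    }
    return bytes;
}

/* Upper bound used for allocation: 4 bytes per UTF-16 code unit. */
static size_t utf8_worst_case(size_t len)
{
    return len * 4;
}
```

A buffer of utf8_worst_case(len) + 1 bytes always has room for the converted name plus its terminator; a buffer of len * 2 + 1 bytes would be too small for a name made of CJK characters, each of which takes three bytes.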
Comment 68 Steven Michaud [:smichaud] (Retired) 2008-06-09 07:24:37 PDT
(In reply to comment #65)

I assume this crash _wasn't_ with my tryserver build.

By the way, you needn't report more crashes (with non-patched distros) unless they're significantly different from those that have already been reported.  This crash looks identical to the one at bug 437752.
Comment 69 John Daggett (:jtd) 2008-06-09 07:36:38 PDT
Steven, any clue as to why this suddenly started occurring with RC2?  Was a change made between RC1 and RC2 that affected our behavior with regards to this mysterious plugin?  Or did we just not catch it?  Is it something related to the order in which dynamic libraries are loaded and a change in between RC1 and RC2 somehow affected that?  It sounds like you have a grasp on how to work around the problem but I think it's important to try and figure out why this started happening.

Also, does the crash occur with the 10.5.4 seed?
Comment 70 Steven Michaud [:smichaud] (Retired) 2008-06-09 07:44:33 PDT
(In reply to comment #69)

This doesn't just happen with RC2.  It also happens with Firefox
2.0.0.X, and with all the FF3 nightlies I tested (not very
systematically) back into 2006.

I think there's almost no chance that the 10.5.4 seeds have a fix for
this -- otherwise Apple'd be telling us to wait for 10.5.4.  But I do
have access to them and haven't yet tried any.  I'll do so, probably
later today.
Comment 71 John Daggett (:jtd) 2008-06-09 07:49:49 PDT
(In reply to comment #70)
> This doesn't just happen with RC2.  It also happens with Firefox
> 2.0.0.X, and with all the FF3 nightlies I tested (not very
> systematically) back into 2006.

Ah, ok, that's a good thing I guess.  

I was only able to reproduce this with multiple profiles.  Was there a way to cause this *without* using multiple profiles?  And if not, any clues as to what happens differently when using multiple profiles that would cause this?
Comment 72 Steven Michaud [:smichaud] (Retired) 2008-06-09 07:56:03 PDT
(In reply to comment #71)

What's confusing is that you don't get a hang the _first_ time you dlopen the VerifiedDownloadPlugin -- only the second time (and subsequently).

But, knowing that, reproducing this problem is very easy:  Just make your plugins database be rebuilt more than once.  The easiest way to do that is to delete/rename ~/Library/Application Support/Firefox/Profiles/[xxxxxxxx]/pluginreg.dat and restart Firefox.
Comment 73 vangelis 2008-06-09 08:23:49 PDT
(In reply to comment #68)
> (In reply to comment #65)
> 
> I assume this crash _wasn't_ with my tryserver build.

That's right. I'm testing your tryserver build (https://build.mozilla.org/tryserver-builds/2008-06-06_08:40-smichaud@pobox.com-bugzilla436575/smichaud@pobox.com-bugzilla436575-firefox-try-mac.dmg) on two different machines right now, so far no hangs, and both restart/shut down as expected.

I deleted all FF related files/profiles/preferences before running it for the first time.

> 
> By the way, you needn't report more crashes (with non-patched distros) unless
> they're significantly different from those that have already been reported. 
> This crash looks identical to the one at bug 437752.
> 
OK, just that the new report comes from a different mac, so I posted it as a reference.

Comment 74 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 08:24:23 PDT
Can we get an answer about whether this plugin is blocklistable?  I haven't seen it here, and it would seem to matter a great deal to our forward path.

Given that it's also in FF2, I think it's not an FF3 blocker: we _haven't_ had a firedrill level of outcry from our millions of FF2-on-Mac users.  We'll get a fix in 2.0.0.something and 3.0.1, ideally with confirmation from Apple that using nlist will avoid the problem systematically.
Comment 75 Dave Townsend [:mossop] 2008-06-09 08:41:16 PDT
(In reply to comment #58)
> I'm betting that we have to know the name and version that the plugin exposes
> before we can blocklist it, and we have to load the plugin before we can know
> those. Since the hang is in the dylib loader I think we're sunk.

This is correct. On all platforms other than Windows we have to load the plugin library in order to gather all of the plugin info from it, so if the crash happens immediately on load then I think blocklist is not going to be of any use.

It does look like on Mac some of the info can be got without loading the library and technically it can block just based on a plugin library filename, but right now things aren't hooked up to evaluate the blocklist with only partial information.
Comment 76 Steven Michaud [:smichaud] (Retired) 2008-06-09 08:47:52 PDT
(In reply to comment #74)

> we _haven't_ had a firedrill level of outcry from our millions of FF2-on-Mac
> users

But remember that OS X 10.5.3 hasn't been out very long, that the symptoms of this bug can be quite confusing, and that they (i.e. the hangs on restart) can be mistaken for hard disk problems.
Comment 77 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 09:03:45 PDT
I'm aware of that; the thesis was that we would see OMG SKYFALL if we released FF3 without a workaround for this OS X bug, and I believe that the evidence does not support that thesis.  Given that it's not a regression vs. 2, I'm not sure why we would block the release on it -- nothing will degrade for users upgrading from FF2.

It doesn't get a fix in users' hands much faster, if at all faster, versus releasing on our current path and putting it in a quick 3.0.1 with the Mac Flash IME fix and other high-temperature items -- and we still don't have a reviewed patch.
Comment 78 Steven Michaud [:smichaud] (Retired) 2008-06-09 09:14:31 PDT
Created attachment 324290 [details] [diff] [review]
Fix rev2 (use calloc on path to avoid UTF8 issues)

A new tryserver build will follow shortly.
Comment 79 Steven Michaud [:smichaud] (Retired) 2008-06-09 09:31:48 PDT
Wan-Teh Chang, could you review this patch (attachment 324290 [details] [diff] [review]) as soon
as possible (or find someone else to do it)?

Josh moved last week, and is (apparently) still having a hard time
getting a reliable Internet connection.
Comment 80 Steven Michaud [:smichaud] (Retired) 2008-06-09 09:53:03 PDT
Here's a tryserver build made with my rev2 patch:

https://build.mozilla.org/tryserver-builds/2008-06-09_09:12-smichaud@pobox.com-bugzilla436575-rev2/smichaud@pobox.com-bugzilla436575-rev2-firefox-try-mac.dmg
Comment 81 Justin Dolske [:Dolske] 2008-06-09 10:10:41 PDT
(In reply to comment #74)

> Given that it's also in FF2, I think it's not an FF3 blocker: we _haven't_ had
> a firedrill level of outcry from our millions of FF2-on-Mac users.

Well, when does the browser itself delete (and recreate) pluginreg.dat? If this only commonly happens during a Firefox software update, I would worry that we don't have real-world data on impact yet -- 10.5.3 was released May 28th, and the last Firefox 2 update (2.0.0.14) was on April 17th. Most FF2 users would already have updated before getting 10.5.3, and so would not have triggered this bug. [Maybe this bug needs to be a .15 blocker?]

Comments here indicate current FF2 users can hit this when installing extensions -- but only sometimes (?), so I'm not sure how to interpret the lack of "OMG I installed an addon and FF2 hung" bugs.

The other worry point is that with all the FF3 launch hype, there will presumably be a lot of OS X users trying Firefox for the first time, and it sounds like they're all likely to hit this.
Comment 82 Samuel Sidler (old account; do not CC) 2008-06-09 10:12:38 PDT
Steven, does your patch apply on branch as well or will we need a new one if we decide to make this bug blocking?

Dolske, pluginreg.dat is regenerated whenever a new plugin gets installed or an old one gets upgraded. (See: upgrading from Firefox 2.0.0.14 to Firefox 3 RC2, which includes a newer JEP.)
Comment 83 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-09 10:27:29 PDT
Could we also get a clear answer to the following questions about the effect of this. I've read through the comments, and it's still unclear:
 - is the hang at Firefox startup or shutdown?
 - does the hang prevent a user from starting Firefox again?
 - does the hang prevent a user from starting other applications?
 - once experienced, does the hang always happen every time Firefox is used?

Also, do we have confirmation that this is Mozilla-only, or has it been reported in other applications?

Also, an answer to Justin's comment about "when does Firefox do the things that would trip this problem" would be super-duper-great.
Comment 84 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 10:38:59 PDT
A modest proposal:

- a truly minimal patch that simply checks the plugin filename against "VerifiedDownloadPlugin.plugin" and pretends it couldn't load the plugin (some error code); no changes to the way we load all plugins, no exposure to strange linking patterns on other plugins, etc.

- respin with that on Mac only

- omg, iPhone 2
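
The first item of the proposal can be sketched as follows. This is an illustration in plain C, not the patch itself: the helper names leaf_name_of() and is_censored_plugin() are hypothetical, and the real check would live wherever the plugin loading code first sees the bundle's file name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the last path component of `path`. */
static const char *leaf_name_of(const char *path)
{
    const char *slash;
    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical helper: 1 if this bundle must not be loaded, 0 otherwise.
 * The comparison is an exact, case-sensitive match on the leaf name, so
 * no other plugin is affected. */
static int is_censored_plugin(const char *path)
{
    return strcmp(leaf_name_of(path), "VerifiedDownloadPlugin.plugin") == 0;
}
```

A caller would test is_censored_plugin() before touching the bundle and report an ordinary load failure when it returns 1, so the file is never opened.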
Comment 85 Steven Michaud [:smichaud] (Retired) 2008-06-09 10:50:20 PDT
(In reply to comment #83)

Most of these questions are answered in comments above, sometimes
multiple times.  If you re-read only one of them, it should (I think)
be comment #56.

But, in a bug that's pretty complex and already has lots of comments,
additional repetition won't hurt -- so I'll answer all of your
questions again:

>  - is the hang at Firefox startup or shutdown?

Counting only the Firefox hangs for which we currently have reliable
information, they happen whenever the plugins database is
re-initialized _for the second time_ (the second time before you
reboot your computer).  As far as I know Firefox only ever
re-initializes the plugins database at startup.

But remember that as soon as you've re-initialized the plugins
database even once, you'll hang the next time you shutdown (or
restart) your computer.

>  - does the hang prevent a user from starting Firefox again?

Yes (assuming that the previous hang was caused by Firefox attempting
(and failing) to re-initialize your plugins database).  Restarting
your computer (and dealing with the additional hang on restart) gets
around this problem.

>  - does the hang prevent a user from starting other applications?

In principle this is possible, but we currently don't have any
reports.  The reason it's possible is that the hangs that occur at the
second attempt to access the VerifiedDownloadPlugin's executable are
triggered by _any kind of access_ (not just an attempt to dlopen the
plugin).

(I'll do some more testing on this and let people know my results.)

> - once experienced, does the hang always happen every time Firefox
>   is used?

Yes, until you restart (and hang on restart).  See my answer to your
second question.
Comment 86 Steven Michaud [:smichaud] (Retired) 2008-06-09 10:54:50 PDT
(In reply to comment #84)

So you're proposing making this change and not having an RC3?

That makes sense, in a way.

But you'll need to find someone else to do it -- someone who knows the
plugin loading code better and would know exactly where to intervene.
Comment 87 Steven Michaud [:smichaud] (Retired) 2008-06-09 10:56:14 PDT
(Following up comment #86)

If you do this, please open a new bug for it.
Comment 88 Paul O'Shannessy [:zpao] (not reading much bugmail, email directly) 2008-06-09 11:04:32 PDT
(In reply to comment #83)
>  - is the hang at Firefox startup or shutdown?

Startup.

>  - does the hang prevent a user from starting Firefox again?

In the same login session, yes (normal users anyway. FF can still be started from terminal with different profile or profile manager)

>  - does the hang prevent a user from starting other applications?

No

>  - once experienced, does the hang always happen every time Firefox is used?

Sometimes. With one profile I've gotten it to repeat multiple times (even after reboot). With another it started fine after reboot.

All of my tests were with RC2 using completely new profiles.  I didn't test the case which seems most worrisome - completely new FF users, but I think that was addressed above.
Comment 89 Steven Michaud [:smichaud] (Retired) 2008-06-09 11:06:18 PDT
(In reply to comment #82)

> Steven, does your patch apply on branch as well or will we need a
> new one if we decide to make this bug blocking?

I haven't tried it.  But the "original" code (what my patch changes)
looks exactly the same on the 1.8 branch as it does on the trunk (aka
the 1.9.0 branch).  So I think it should apply (maybe with some
offsets), and work just fine.
Comment 90 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-09 11:10:29 PDT
(In reply to comment #88)
> >  - is the hang at Firefox startup or shutdown?
> 
> Startup.

Say what? When did this change? Comment 0 expresses that the zombification is at shutdown, not startup.
Comment 91 Steven Michaud [:smichaud] (Retired) 2008-06-09 11:15:37 PDT
(In reply to comment #90)

The hang (in Firefox) when the plugins database is re-initialized and the hang at shutdown/restart are two different things.
Comment 92 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-09 11:31:02 PDT
So we hang at startup when we re-init the plugin, and that's a one-time only hang, but then we always hang at shutdown and create a zombie?

Someone break it down for me!
Comment 93 Steven Michaud [:smichaud] (Retired) 2008-06-09 11:40:52 PDT
> So we hang at startup when we re-init the plugin, and that's a
> one-time only hang, but then we always hang at shutdown and create a
> zombie?

Nope.  _Please_ read comment #56.

We hang the second time (and subsequent times) the plugins database is
re-initialized (since starting the computer).

If the plugins database has been re-initialized at least once (even
without a hang), the OS (not Firefox) hangs the next time you
shutdown/restart the computer.

(The zombies (of firefox-bin) are created every time we re-initialize
the plugins database (or hang while trying to do so).  If at least one
unkillable zombie is present, we hang at computer shutdown/restart.)
Comment 94 Samuel Sidler (old account; do not CC) 2008-06-09 11:58:48 PDT
(In reply to comment #84)
> - a truly minimal patch that simply checks the plugin filename against
> "VerifiedDownloadPlugin.plugin" and pretends it couldn't load the plugin (some
> error code); no changes to the way we load all plugins, no exposure to strange
> linking patterns on other plugins, etc.

Marking as blocking1.8.1.15 based on this method of workaround.
Comment 95 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 12:06:26 PDT
Created attachment 324315 [details] [diff] [review]
Censor VDP when opening plugins

I'm about to reboot into 10.5.3, ideally so that someone can tell me how to repro this without blowing away my profile.  It keeps VDP from being loaded, though, and looks to be high-level enough to avoid us touching the file at all.  Try server builds queued, I'll post the link here when they're up.
Comment 96 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 12:20:10 PDT
(In reply to comment #87)
> (Following up comment #86)
> 
> If you do this, please open a new bug for it.

(Why?  The patch addresses exactly this bug, as summarized and reported -- why wouldn't any other bug opened for it be DUP of this one?)
Comment 97 Steven Michaud [:smichaud] (Retired) 2008-06-09 12:27:41 PDT
(In reply to comment #96)

Yeah, you're right.  No need for a new bug.
Comment 98 Steven Michaud [:smichaud] (Retired) 2008-06-09 12:30:23 PDT
For what it's worth, I just found out that the new OS X 10.5.4 seed
(Build 9E6, available to ADC members) doesn't fix this bug.
Comment 99 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 12:44:01 PDT
https://build.mozilla.org/tryserver-builds/2008-06-09_12:04-shaver@mozilla.com-vdpcensor/shaver@mozilla.com-vdpcensor-firefox-try-mac.dmg has the try server build -- I can't make it hang, but haven't tried obliterating my hard-won profile entirely yet...
Comment 100 Mike Schroepfer 2008-06-09 12:56:07 PDT
Not sure exactly what VerifiedDownloadPlugin does - according to http://www.unsanity.org/archives/000465.php there is speculation that it verifies downloads of dashboard widgets...
Comment 101 Josh Aas 2008-06-09 12:58:24 PDT
Comment on attachment 324315 [details] [diff] [review]
Censor VDP when opening plugins

>+#ifdef DEBUG_shaver

I don't like debug stuff with people's names. Please either remove this or take the time to integrate it properly into regular debug builds if the info is important enough (better output formatting and simply ifdef DEBUG at a minimum).

>+     * Don't load the (useless) VDP plugin, to avoid tripping a bug in OS X
>+     * 10.5.3 (see bug 436575).
>+     */

What is your evidence that the plugin is useless? Just because you don't understand it doesn't mean it is useless. I think it is what tags downloaded files for Mac OS X's executable security purposes. I should be able to post more exact info on what the plugin does soon.

That said, this patch will disable that. Have you downloaded a DMG with this patch applied and launched the downloaded app, then compared the behavior you get to what happens in Safari? We should make sure we know exactly what security services we're disabling by doing this.

>+    if (!strcmp(temp.get(), "VerifiedDownloadPlugin.plugin"))
>+        return NS_ERROR_FAILURE;
>+

A printed warning is probably in order here, to remind developers of what we're doing.
Comment 102 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-09 13:07:55 PDT
(In reply to comment #101)
> What is your evidence that the plugin is useless? Just because you don't

I don't think any evidence is necessary. Loading the plugin causes a hang. We can't load it either way, so let's avoid the hanging part. :)
Comment 103 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 13:10:45 PDT
(In reply to comment #101)
> That said, this patch will disable that. Have you downloaded a DMG with this
> patch applied and launched the downloaded app, then compared the behavior you
> get to what happens in Safari?

Sam has, you get the mark-of-the-web warning still.

Safari, as reported here, doesn't load this file.  The recommended workaround is to move this file aside, I'm just emulating that workaround here.  It's entirely possible that simply opening the file triggers something to be mutated in our process, which is why it's there, but simply opening the file also causes us to _hang_, and that's what we need to avoid.  Steven's patch has the same behaviour in terms of whether we load the plugin, but affects more of what we do with other plugins.

application/x-verifieddownload is the content type built into Safari, so I suspect it doesn't use this file at all; if I were a betting man, I'd say it's related to Dashboard widgets.

> We should make sure we know exactly what
> security services we're disabling by doing this.

Sure; I'd love some information on that, but given that the workaround also disables it, and Apple's notorious reticence to actually explain things like this, I don't think we'll _ever_ know "exactly what" it does to our process.

I'll update the patch with an NS_WARNING and remove the DEBUG_shaver, assuming that you meant "r-" by your comments.
Comment 104 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 13:18:37 PDT
Created attachment 324322 [details] [diff] [review]
DEBUG_shaver gone, NS_WARNING added
Comment 105 Carsten Book [:Tomcat] 2008-06-09 13:19:52 PDT
With the new patch from shaver I can still reproduce a Firefox hang on shutdown
but no hang on system shutdown.

Steps to reproduce:
-> new Profile
-> start Firefox via the Command line/Terminal
-> Quit Firefox

you will see in the Terminal :
Tomcats-MacBook-Pro:MacOS carstenbook$ ### MRJPlugin:  getPluginBundle() here.
###
### MRJPlugin:  CFBundleGetBundleWithIdentifier() succeeded. ###
### MRJPlugin:  CFURLGetFSRef() succeeded. ###

You need to press Enter here to get out of this hang and to be able to restart
Firefox again. Otherwise, when you try to restart Firefox via the Application
Menu, you get the message that Firefox is still in use.
Comment 106 Steven Michaud [:smichaud] (Retired) 2008-06-09 13:25:11 PDT
(In reply to comment #103)

> Steven's patch has the same behaviour in terms of whether we load
> the plugin, but affects more of what we do with other plugins.

My patch avoids linking in the plugin's executable (i.e. dlopening the
plugin), but doesn't avoid "loading" it in other respects
(e.g. reading from its Info.plist).

I agree that it's very unlikely that this makes any difference to the
plugin's operation ... but I, too, have no idea how the
VerifiedDownloadPlugin actually "works".

Even if Apple can't tell us all the VerifiedDownloadPlugin's secrets,
they _may_ be able to confirm (or deny) that there's no difference
between my patch and shaver's patch wrt the VerifiedDownloadPlugin's
functionality.
Comment 107 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 13:27:15 PDT
Comment on attachment 324322 [details] [diff] [review]
DEBUG_shaver gone, NS_WARNING added

Optimistically requesting approval.

(Requiring VDP to be loaded to implement a security feature would be very fragile -- if you start up Firefox and don't do anything related to plugin scanning, you never load it.  The people who would be protected by some back door thing triggered by VDP load are the people who are hanging on 10.5.3 -- a minority of sessions to be sure, if not also users.)
Comment 108 Jesse Ruderman 2008-06-09 13:27:29 PDT
Please *don't* add an NS_WARNING.  It's not worth the console noise (bug 341986) for everyone on every startup to "remind" people that their build contains a workaround for this Mac OS X bug.
Comment 109 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 13:30:02 PDT
I don't see that warning on every startup, only when plugin scanning is triggered.  Are you seeing plugin scan triggering on every startup?  (If so, you're tripping this 10.5.3 bug every time, I'd imagine.)

I don't like the WARNING either, I'll admit, but if that's the cost of a review so I can get the patch into the tree and respin, I'll gladly bear it.  I can land a version without the warning in m-c if we come to some other agreement by then.
Comment 110 Bob Clary [:bc:] 2008-06-09 13:31:53 PDT
Created attachment 324325 [details]
Sample after main process killed

In case this will be helpful, this is a sampled stack of the hung process after the goQuitApplication call attempted to kill Firefox after a js test. It does implicate the profile service and file access.
Comment 111 Håkan Waara 2008-06-09 13:37:24 PDT
I'm no fan of warnings, but I can see that if our disabling this plugin causes unexpected problems later on, a warning might help people quickly make the connection between that problem and this bug.
Comment 112 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 13:40:45 PDT
Nobody's going to make that connection, because they'll be numb to the console spew.  And we don't print it every time we don't load the plugin, just when we might have loaded it to scan and unload it.  (We don't load the plugin in the "normal" session case, in which there is no plugin-related activity to trigger a scan; in those cases, we don't print the message.)
Comment 113 Carsten Book [:Tomcat] 2008-06-09 13:48:17 PDT
(In reply to comment #105)
Note that this is an improvement compared to the original problem (and not a
regression from the patch).

Without the patch from Shaver you get this hang too, _but_ Firefox turns into
a zombie process, so without the patch you get the unkillable Firefox process
and also the Mac OS X 10.5.3 system shutdown problem.
Comment 114 Mike Schroepfer 2008-06-09 14:09:13 PDT
(In reply to comment #103)
> (In reply to comment #101)
> > That said, this patch will disable that. Have you downloaded a DMG with this
> > patch applied and launched the downloaded app, then compared the behavior you
> > get to what happens in Safari?
> 
> Sam has, you get the mark-of-the-web warning still.

I've tried the new build, the old build, and Safari, and get the same behavior with regard to the warning when downloading files.

Comment 115 Nelson Bolyard (seldom reads bugmail) 2008-06-09 14:18:31 PDT
Comment on attachment 324290 [details] [diff] [review]
Fix rev2 (use calloc on path to avoid UTF8 issues)

Some review comments from an NSPR module co-owner:



>+        CFStringRef executable = (CFStringRef)
>+            CFBundleGetValueForInfoDictionaryKey(lm->bundle, CFSTR("CFBundleExecutable"));

Nit: Please wrap that too-long line to live within 80 columns.

>+        if (executable) {
>+            char *subdir = "/Contents/MacOS/";

Better would be
              static const char subdir[] = "/Contents/MacOS/";

>+            char *path = NULL;

>+            // Allow for maximum possible length of executable's UTF8 equivalent.
>+            exec_len = CFStringGetLength(executable) * 4;
>+            path_len = strlen(lm->name) + strlen(subdir) + exec_len;
>+            path = (char *) calloc(path_len + 1, sizeof(char));

Because of the way that the path array is filled in below, it seems 
unnecessary to use calloc here.  malloc would suffice.

>+            if (path) {
>+                strcpy(path, lm->name);
>+                strcat(path, subdir);
>+                CFStringGetCString(executable, path + strlen(path),
>+                                   exec_len, kCFStringEncodingUTF8);
>+                nl[0].n_un.n_name = name;
>+                nlist_rv = nlist(path, nl);

There are no further references to path below here.  path is leaked.
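
Taken together, the review points above (a static const subdir, malloc instead of calloc, and freeing path after use) suggest a shape like the following. This is a minimal sketch in plain C under stated assumptions, not the revised patch: build_executable_path() is a hypothetical helper, both inputs are assumed to be already-converted C strings, and the nlist() lookup is left out so the example stays portable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char subdir[] = "/Contents/MacOS/";

/* Hypothetical helper: returns a malloc'ed "<bundle><subdir><exec>" string,
 * or NULL on allocation failure.  The caller owns the result and must
 * free() it. */
static char *build_executable_path(const char *bundle_path,
                                   const char *exec_name)
{
    size_t len;
    char *path;
    assert(bundle_path != NULL && exec_name != NULL);
    len = strlen(bundle_path) + (sizeof subdir - 1) + strlen(exec_name);
    path = malloc(len + 1);        /* every byte is written below, so
                                      zero-filling with calloc is not needed */
    if (!path)
        return NULL;
    snprintf(path, len + 1, "%s%s%s", bundle_path, subdir, exec_name);
    return path;
}
```

The caller would use the returned path for its symbol lookup and then call free(path) on every exit route, which is the step missing from the reviewed patch.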
Comment 116 Steven Michaud [:smichaud] (Retired) 2008-06-09 14:33:19 PDT
> path is leaked

Sorry, you're right.  I'll post a new patch that fixes this (and addresses your other comments).
Comment 117 Damon Sicore (:damons) 2008-06-09 15:08:02 PDT
Comment on attachment 324322 [details] [diff] [review]
DEBUG_shaver gone, NS_WARNING added

a1.9+=damons
Comment 118 Brendan Eich [:brendan] 2008-06-09 15:17:09 PDT
Comment on attachment 324322 [details] [diff] [review]
DEBUG_shaver gone, NS_WARNING added

Nit, ignore if not prevailing style: usually more readable if the major comment has a blank line before it (alternative here is to move temp's decl and init lines down under the comment, since there is a blank line above those two lines).

/be
Comment 119 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 15:19:11 PDT
Thanks, all.  Landed on trunk, please feel free to reland anywhere else that wants the fix.  (m-c is closed right now, I didn't feel that this could wait.)

/cvsroot/mozilla/modules/plugin/base/src/nsPluginsDirDarwin.cpp,v  <--  nsPluginsDirDarwin.cpp
new revision: 1.14; previous revision: 1.13
done
Comment 120 Steven Michaud [:smichaud] (Retired) 2008-06-09 15:34:07 PDT
Created attachment 324352 [details] [diff] [review]
Fix rev3 (fix leak and other problems)

This is probably academic now ... but just in case.

Here's a tryserver build:

https://build.mozilla.org/tryserver-builds/2008-06-09_14:51-smichaud@pobox.com-bugzilla436575-rev3/smichaud@pobox.com-bugzilla436575-rev3-firefox-try-mac.dmg
Comment 121 John Gaunt (redfive) 2008-06-09 15:53:22 PDT
(In reply to comment #105)
> With the new patch from shaver i can still reproduce a Firefox hang on shutdown
> but no hang on system shutdown.
> 

Is there no concern over this?

I'm a little shocked that a new patch was written and dropped into the bug, landed, and the bug resolved to fixed in such quick fashion. Hurray for efficiency, but there was an existing patch that fixed the issue with no reports of still seeing the bug and had 3 days of eyeballs on it instead of 3+ hours. Perhaps I'm just missing some more detailed understanding of the issue at hand, where shaver's patch is so much better.

Can anyone explain the reasoning here?
Comment 122 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-09 16:11:44 PDT
John, please see comment 113, where Carsten indicates that needing to press Enter only happens if you launch Firefox from the command line, and isn't actually related to Shaver's patch or this bug.

Marking blocking1.9+; we're going to figure out the best way to get this fix out there. In the meantime, Reed's clobbered OSX nightlies.
Comment 123 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-09 16:19:57 PDT
As Tomcat reported, the Firefox shutdown hang is distinct from the system hang that this bug is about; it's resolved by pressing Enter, and exists with or without my patch.  Why would there be concern over it?

My patch affects only the specific case that we are sure is a problem, and does not risk side-effects from changing how we load _all_ libraries and plugins on the Mac.  The version of Steven's patch that you're testing does have problems related to UTF-8 (see comment 78) and the subsequent one has a string leak on, I believe, all library loads.  The most recent patch is in fact newer than mine, and requires analysis of many more possible effects on the system.  (Does nlist handle symbols from dependent libraries the same way, for example?)

Are you really saying that you think we should have taken a change to NSPR's library-loading system instead of blocking the specific file that we don't want to open?  I certainly disagree, and the other endgame drivers seem to have the same opinion I do.
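
The "block the specific file" approach described above can be sketched roughly as follows. This is only an illustration, not the landed patch (which is not quoted in this part of the thread): is_blocked_plugin() is a hypothetical name, and the sketch assumes the plugin scanner has a plain C path string to hand, without a trailing slash.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when the last component of a plugin
 * path is the one bundle known to trigger the hang. Every other plugin
 * and library is left alone, which is the point of the narrow approach.
 * Assumes plugin_path has no trailing '/'. */
static bool is_blocked_plugin(const char *plugin_path)
{
    static const char blocked[] = "VerifiedDownloadPlugin.plugin";

    /* Find the leaf name: the text after the last '/', or the whole
     * string if there is no '/'. */
    const char *leaf = strrchr(plugin_path, '/');
    leaf = leaf ? leaf + 1 : plugin_path;

    return strcmp(leaf, blocked) == 0;
}
```

A scanner would call this before trying to open each bundle and skip the entry when it returns true. The comparison is exact and case-sensitive, so a renamed copy of the plugin would not be caught by a check like this.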
Comment 124 Justin Dolske [:Dolske] 2008-06-09 18:17:28 PDT
Comment on attachment 324322 [details] [diff] [review]
DEBUG_shaver gone, NS_WARNING added

Checked in on GECKO19_20080529_RELBRANCH, a=schrep.

Checking in modules/plugin/base/src/nsPluginsDirDarwin.cpp;
  new revision: 1.13.16.1; previous revision: 1.13
Comment 125 Wan-Teh Chang 2008-06-09 18:24:59 PDT
Comment on attachment 324352 [details] [diff] [review]
Fix rev3 (fix leak and other problems)

Steven, thanks for the patch.  Why do you say we're using dlopen()
in the comment?  We're not using dlopen().

If we don't need this patch any more, I'd prefer to not check it in.
Comment 126 Steven Michaud [:smichaud] (Retired) 2008-06-09 19:21:16 PDT
(In reply to comment #125)

> We're not using dlopen()

Actually we are, though only indirectly -- it's called (at several
removes) from CFBundleLoadExecutable() (and
CFBundleGetFunctionPointerForName()).  See the traces under "hang
stack" (attachment 323743 [details]) and "Sample of FirefoxDebug" (attachment
323806) above.

> If we don't need this patch any more, I'd prefer to not check it in.

Right now it doesn't look like it's needed.  But we haven't yet heard
the end of the story (from users or Apple).  So it _might_ be needed.

In any case I don't intend to try to land it now.
Comment 127 vangelis 2008-06-10 02:25:23 PDT
(In reply to comment #63)
> (Following up comment #60)
> 
> Bug 437752 documents a hang in _CFBundleCopyExecutableURLRaw -- not in
> _CFBundleDlfcnLoadBundle as reported here.  (Yes, that bug's trace was
> made with a distro whose debug symbols had been stripped, but system
> calls are usually accurately reported even in those traces.)
> 
> If I remember right, I saw similar hangs while looking for a
> workaround for this bug.  This raises the possibility that there are
> other ways to see this hang than just a second call to load the
> VerifiedDownloadPlugin (and that they will occur more often than we
> currently expect).
> 
> If I'm right, my patch should also "fix" (i.e. work around) the hang
> reported at bug 437752.  I've asked that bug's reporter to test my
> tryserver patch from comment #38.
> 

I had the _CFBundleCopyExecutableURLRaw unkillable hang (Bug 437752) on two macs and have been using the tryserver patch from comment #38 for 3 days now on these two machines. Firefox is running very smoothly, with no hangs and the system (10.5.3) also shuts down/restarts as expected.

Does it make sense to test tryserver rev3 from comment #120?

The only error I can "see" so far is:

Error: Warning: unrecognized command line flag -foreground
Source File: file:///Applications/Minefield.app/Contents/MacOS/components/nsBrowserContentHandler.js
Line: 661
Comment 128 Carsten Book [:Tomcat] 2008-06-10 04:19:29 PDT
(In reply to comment #127)
> The only error I can "see" so far is:
> 
> Error: Warning: unrecognized command line flag -foreground
> Source File:
> file:///Applications/Minefield.app/Contents/MacOS/components/nsBrowserContentHandler.js
> Line: 661
> 

Different Bug -> Bug 369147
Comment 129 Steven Michaud [:smichaud] (Retired) 2008-06-10 08:04:30 PDT
(In reply to comment #127)

> Does it make sense to test tryserver rev3 from comment #120?

Not for the moment, since a different patch for this bug has now
landed.  It's better to test with a current nightly build (today's
or later), which should have that patch.  For example one of the
following:

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-06-10-04-mozilla1.9.0/firefox-3.0pre.en-US.mac.dmg
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-06-10-04-firefox3.0rc3/firefox-3.0.en-US.mac.dmg

If you see hangs or have plugin problems with these, come back and
test the tryserver rev3 build from comment #120 -- to see if it makes
any difference.
Comment 130 Carsten Book [:Tomcat] 2008-06-10 10:52:05 PDT
verified fixed using the patch from shaver on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9) Gecko/2008061004 Firefox/3.0 (Firefox 3 RC3 Candidate Build) and the known steps to reproduce. 

Filed Bug 438373 for the problem mentioned in comment #105
 
Comment 131 Daniel Veditz [:dveditz] 2008-06-10 12:51:13 PDT
Checked attachment 324322 [details] [diff] [review] into the 1.8 branch
Comment 132 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-10 12:53:08 PDT
I filed bug 438394 to track the landing of this on mozilla-central; do we want to also file a new bug on a non-workaround version?
Comment 133 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-06-10 12:56:45 PDT
Anything we do in our application will be a workaround; any real fix has to come from Apple, on their timeline.
Comment 134 Al Billings [:abillings] 2008-06-11 14:31:21 PDT
Sam, can you verify this fix for branch?
Comment 135 Mike Schroepfer 2008-06-17 15:26:23 PDT
Did this get landed on mozilla-central?
Comment 136 Håkan Waara 2008-06-17 15:29:01 PDT
(In reply to comment #135)
> Did this get landed on mozilla-central?

That's bug 438394 and it hasn't as far as I'm aware.
Comment 137 ronald.gold 2008-06-19 14:11:18 PDT
After 3 weeks of no problems with OS X 10.5.3, this morning on my Core 2 Duo MacBook, Camino 1.6.1, Camino 2.0a1-Pre, Safari, and Eudora all hang. Removing VerifiedDownloadPlugin.plugin does not solve the problem for me. I will attach the Reports of the hang for both Camino and Eudora.
Comment 138 ronald.gold 2008-06-19 14:17:52 PDT
Created attachment 325827 [details]
Report created by system after force quitting non-responding Camino
Comment 139 Steven Michaud [:smichaud] (Retired) 2008-06-19 14:40:44 PDT
(In reply to comment #137 and comment #138)

This is a new problem, unrelated to what's reported here.

Please open a new bug and attach your report to it.

Since the hang's in Camino, you should classify it as a Camino bug.
(If things turn out otherwise it can be reclassified.)
Comment 140 Simon Fraser 2008-06-19 18:02:28 PDT
That last hang looks like it's caused by 1Password.
Comment 141 Chris Lawson (gone) 2008-06-19 18:06:42 PDT
(In reply to comment #140)
> That last hang looks like it's caused by 1Password.

Bingo. See bug 440566, which was filed in response to comment 139.
Comment 142 ronald.gold 2008-06-20 03:54:21 PDT
Actually, (see Bug 440566), my problem was not caused by 1Password. Upgrading Apple Airport Utility to version 5.3.2 did something which resulted in hanging of all of my Internet applications, not just Camino, but also Safari, Apple Software Update, and Eudora. Reinstalling OS X 10.5.3 so that Airport Utility 5.2.2 was reinstalled solved the problem. No problems apparent with 1Password installed.
Comment 143 Samuel Sidler (old account; do not CC) 2008-07-01 00:50:59 PDT
Apple fixed this in 10.5.4, by my tests.
Comment 144 Chris Lawson (gone) 2008-08-13 20:04:01 PDT
*** Bug 442381 has been marked as a duplicate of this bug. ***
