Closed Bug 456705 Opened 16 years ago Closed 15 years ago

Firefox 2.0.0.17 crashes when opening a https-site or on shutdown with FoxyProxy 2.8.5 [@ nsSSLThread::Run]

Categories: Core :: Security: PSM, defect, P2
Version: 1.8 Branch

Tracking: RESOLVED FIXED, mozilla1.9.2a1

People: Reporter: 1001110, Assigned: mayhemer

Details: 5 keywords, Whiteboard: [semi dupe of bug 427715?]

Attachments (8 files, 10 obsolete files):
- 9.58 KB, text/plain
- 34.68 KB, text/plain
- 27.19 KB, text/plain
- 57.04 KB, patch
- 2.44 KB, patch
- 28.75 KB, patch (mayhemer: review+)
- 29.62 KB, patch (mayhemer: review+)
- 29.66 KB, patch
User-Agent:       Opera/9.27 (Windows NT 5.1; U; en)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.17) Gecko/20080829 Firefox/2.0.0.17

Firefox 2.0.0.17 crashes if I open an https-secured site. I tried different sites and all led to Firefox crashing. It worked OK in the previous version. Unsecured sites work OK.

Reproducible: Always

Steps to Reproduce:
1. start up Firefox
2. type in a URL starting with https:// or click a link starting with https:// (e.g. https://bugzilla.mozilla.org/)

Actual Results:  
Firefox instantly crashes

Expected Results:  
Firefox should display the sites w/o crashing

I use Firefox with the default theme and the following extensions:
Adblock Plus 0.7.5.5
Adblock Plus: Element Hiding Helper 1.0.5
FoxyProxy 2.8.5

The URLs I tried were NOT configured to run through FoxyProxy.
When you crash, do you submit any Talkback crash data?

Does the crash happen if you make a new profile for a test?
- http://support.mozilla.com/en-US/kb/Managing+profiles
Keywords: crash
Version: unspecified → 2.0 Branch
Firefox did not crash when I created a new profile. It also does NOT crash if I open https sites in the new profile.

I did not send Talkback crash data; I've enabled it now. Starting Firefox with my default profile makes it crash again on entering https sites. It seems to be a problem with one of the extensions.
I have sent a crash report with the following description:
Crash related to:
https://bugzilla.mozilla.org/show_bug.cgi?id=456705

I hope that helps
The Feedback Agent just told me that it was unable to send the report since it is unable to connect to the server. I think there is some firewall on my side that is blocking it.
First of all: I'm not a developer. 

As you know that it's a plugin that causes the crash, it would now be helpful to know which of those three plugins it is. To find out, disable one add-on at a time and restart Firefox.
First go to Tools > Add-ons. Right-click an entry under 'Add-Ons' and click 'Disable' in the context menu. Restart Firefox.

Then try again to access an https web site.

When you know the add-on that crashes, go to the appropriate web site of this add-on and report the issue there. 

Adblock Plus: http://adblockplus.org/forum 
FoxyProxy: http://foxyproxy.mozdev.org/drupal (Section 'Bugs').
For future reference, plugins are specifically things like flash, java, etc, whilst extensions are specifically the firefox addons.
(In reply to comment #5)
> For future reference, plugins are specifically things like flash, java, etc,
> whilst extensions are specifically the firefox addons.

You're right and of course I know the difference. Just muddled these two up this time.
It is a problem with Adblock Plus: Element Hiding Helper 1.0.5 and FoxyProxy 2.8.5. If one of these two extensions is enabled in any combination, Firefox will crash (on https).

Adblock Plus 0.7.5.5 alone will work fine. FoxyProxy (even if enabled alone) will crash Firefox. Adblock Plus: Element Hiding Helper will run only with Adblock Plus. Running the two Adblock Plus extensions together will crash Firefox.

I will go to the extension forums with this problem.
Additional info: This is a 100% reproducible problem. Using XP Pro SP3, Fx 2.0.0.17 of course and FoxyProxy 2.8.5. I have 42 installed extensions. Of those one is Adblock Plus 0.7.5.5 which as a previous poster stated causes no problems. 

When running Fx in Safe Mode, no problems. Running with all add-ons disabled, no problems. Running with all add-ons enabled except FoxyProxy, no problems. Problems occur only if FoxyProxy is enabled, whether it is the only add-on installed or other extensions are installed alongside it (enabled or disabled).

I have also reported this problem to the author at:
http://foxyproxy.mozdev.org/drupal/content/foxyproxy-crashes-firefox-20017-shutdown
Is this a regression? Did things work in 2.0.0.16?
An addon should not be able to crash the browser unless the addon is using binary contents (not only written in JS/xul)
(In reply to comment #9)
> Is this a regression? Did things work in 2.0.0.16

In my case things worked in 2.0.0.16, failed after upgrading to 2.0.0.17
Keywords: regression
Same here, things were fine in Firefox 2.0.0.16.
(In reply to comment #11)
> An addon should not be able to crash the browser unless the addon is using
> binary contents (not only written in JS/xul)

True, and yet just as a web page "should not" be able to crash the browser it does sometimes happen.
Flags: blocking1.8.1.18?
I confirmed the bug. FoxyProxy 2.8.5 crashes Firefox 2.0.0.17 on the first run after installation. On subsequent runs, I'm not seeing it crash.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Firefox 2.0.0.16 does not crash on shutdown with FoxyProxy 2.8.5, but 2.0.0.17 does. Talkback Id for this crash: TB49885565Z
I see lots of crashes like TB49885565 at [0x00000000 8560629a] with no stack (i.e. useless), at least one of which mentions FoxyProxy and a couple mention it's constant since upgrading. Also lots of crashes at PL_DHashTableOperate doing SSL, like TB49869527. Might be related or maybe two different problems.
Summary: Firefox crashes when opening a https-site → Firefox 2.0.0.17 crashes when opening a https-site or on shutdown with FoxyProxy 2.8.5
Attached file stack Mac 10.5.5.
Stack using FoxyProxy and a 2.0.0.17 debug build on Mac (Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.18pre) Gecko/2008092519 Firefox/2.0.0.18pre).
OS: Windows XP → All
Hardware: PC → All
Summary: Firefox 2.0.0.17 crashes when opening a https-site or on shutdown with FoxyProxy 2.8.5 → Firefox 2.0.0.17 crashes when opening a https-site or on shutdown with FoxyProxy 2.8.5 [@ nsSSLThread::Run]
As author of FoxyProxy, I can tell you that I have no idea how to fix this. Could really use some help. FoxyProxy has no binary components.
As mentioned in bug 456705, this is affecting roughly 35,000 users. I've confirmed that FoxyProxy with Firefox 2.0.0.17 crashes whenever visiting a SSL site.
The crash does not occur with 2.x versions of Firefox before 2.0.0.17, and it doesn't occur with Firefox 3.x.
Attached file backtrace on ubuntu
Attached is the backtrace for Ubuntu.
This seems a consistent enough crash that we can narrow down the regression window -- that'd be a good start.
Product: Firefox → Core
QA Contact: general → general
Version: 2.0 Branch → 1.8 Branch
I will work on a regression range for this bug.
well, here are some problems (based on attachment 340417 [details]):

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp&rev=1.97.2.20&mark=1527#1487
nsSSLIOLayerHelpers::Init() can fail.

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSComponent.cpp&rev=1.126.2.9&mark=292#274
nsNSSComponent::nsNSSComponent() assumes it doesn't (and can't really do much if it does...) - nsNSSComponent::InitializeNSS should probably be used instead....

this code is broken:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp&rev=1.97.2.20&mark=1356,542#531

it should null out the mutex, because it's a static rather than a per-instance class member.
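To make the static-lifetime point concrete, here is a minimal C++ sketch, assuming a helper whose lock lives in a `static` member shared by every instance. The class `IOLayerHelpersSketch`, its `Init`/`Cleanup`/`IsInitialized` methods, and the use of `std::mutex` in place of the NSPR lock are all hypothetical; the actual PSM code at the linked revision differs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper whose lock is a static member, shared by every user,
// rather than a per-instance field.
class IOLayerHelpersSketch {
 public:
  static std::mutex* sMutex;  // static: outlives any single object

  // Refuses to initialize twice: a non-null pointer means "already set up".
  static bool Init() {
    if (sMutex) return false;
    sMutex = new std::mutex();
    return true;
  }

  static void Cleanup() {
    delete sMutex;
    // Without this reset, sMutex would keep pointing at freed memory, and a
    // later Init() would wrongly conclude that setup had already happened.
    sMutex = nullptr;
  }

  static bool IsInitialized() { return sMutex != nullptr; }
};

std::mutex* IOLayerHelpersSketch::sMutex = nullptr;
```

With the reset in place, an Init/Cleanup/Init sequence succeeds; without it, the second `Init()` would see a stale non-null pointer.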

attachment 340425 [details] deals w/ a different mutex which seems less likely to be dead in the same way, afaict it should be alive here:
1022       nsAutoLock threadLock(ssl_thread_singleton->mMutex);
and unhappy here:
1046     nsAutoLock threadLock(ssl_thread_singleton->mMutex);

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsSSLThread.cpp&rev=1.2.2.8&mark=1022,1046,61#1020

note that conceivably if the lifespan of the thread is wrong, bad things could happen, however i'm not able to find an obvious path for this (and finding a pretty source browser for foxyproxy was hard, so i gave up [yes, i downloaded the addon itself, but i have to pack for vacation or something...]).

attachment 340463 [details] is different. table is null. it could stem from failing to check the Init method:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp&rev=1.97.2.20&mark=1537#1533

but again this is unlikely (although it is a bug).
eric, this comment's for you (comment 28 was for kaie):

proxy.js has:
    fileProtocolHandler = CC["@mozilla.org/network/protocol;1?name=file"].createInstance(CI["nsIFileProtocolHandler"]);
which is wrong. protocol handlers are singletons. the proper way to get one can be found here:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/toolkit/components/downloads/src/nsDownloadManager.cpp&rev=1.53.2.13&mark=1663-1666#1660

mook points out that nsIProtocolHandlers aren't usefully threadsafe, you must get a proxy for them, and in fact, you really want them to give you proxied objects, otherwise what you get is fairly useless.

you should look at some patches i've done involving nsIURIs and crashing (i think i may have even written some of them at your place)
Kai: comment 28 was for you
@timeless: pretty source browser for foxyproxy is here: http://trac.leahscape.com/trac/foxyproxy/browser. Many thanks for the help.

I've replaced all references of CC["@mozilla.org/network/protocol;1?name=file"].createInstance(CI["nsIFileProtocolHandler"]) with CC["@mozilla.org/network/protocol;1?name=file"].getService(CI["nsIFileProtocolHandler"]) and, at least in my initial testing, the crashes are fixed.

Instead of explicitly creating proxy objects for the service (can that be done in JS?), I've tried to ensure that use of the nsIFileProtocolHandler service is single-threaded by creating a reference to the service each and every time it's used--instead of storing references in variables.

IOW, this kind of code:

var fph = CC["@mozilla.org/network/protocol;1?name=file"].getService(CI["nsIFileProtocolHandler"]);
function doStuff() {
  // use fph here. fcn may be called whenever and by whomever
}

has been converted to this kind of code:

function doStuff() {
  var fph = CC["@mozilla.org/network/protocol;1?name=file"].getService(CI["nsIFileProtocolHandler"]);
  // use fph here. fcn may be called whenever and by whomever
}

Is that sufficient to guarantee single-threaded use of the service within JS?
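A toy C++ model of the two lookup styles may help here. It assumes only the usual service semantics: `getService` hands back one cached object per contract ID, while `createInstance` constructs a fresh object on every call. The `ComponentRegistry` class, its methods, and the contract ID used in the test are hypothetical and are not the XPCOM component manager.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Component {
  int serial;  // which construction produced this object
};

// Toy registry mapping a contract ID to a factory (illustrative only).
class ComponentRegistry {
 public:
  using Factory = std::function<std::shared_ptr<Component>()>;

  void Register(const std::string& contractID, Factory factory) {
    factories_[contractID] = std::move(factory);
  }

  // createInstance-style: every call builds a brand-new object.
  std::shared_ptr<Component> CreateInstance(const std::string& contractID) {
    return factories_.at(contractID)();
  }

  // getService-style: the first call builds and caches the object,
  // later calls return that same cached object.
  std::shared_ptr<Component> GetService(const std::string& contractID) {
    auto it = services_.find(contractID);
    if (it != services_.end()) return it->second;
    auto obj = CreateInstance(contractID);
    services_[contractID] = obj;
    return obj;
  }

 private:
  std::map<std::string, Factory> factories_;
  std::map<std::string, std::shared_ptr<Component>> services_;
};
```

In this model, fetching the service inside `doStuff()` instead of caching it in a variable returns the very same object, so that change alone does not alter which threads end up calling into it.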
Found the regression window:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.17pre) Gecko/2008082603 BonEcho/2.0.0.17pre - works on SSL sites with FoxyProxy installed -> no crash
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.17pre) Gecko/2008082703 BonEcho/2.0.0.17pre - fails on SSL sites with FoxyProxy installed -> crash

Bonsai Query for this Timeframe -> http://tinyurl.com/3hb75u
(In reply to comment #32) 
> Bonsai Query for this Timeframe -> http://tinyurl.com/3hb75u

This crash goes away if I back out the fix for bug 445890 ("XMLHttpRequest.responseXml not accessible from signed remote XUL code").

FoxyProxy uses XMLHttpRequest to load a PAC file, load strings from a chrome: URI, and load an .xml settings file out of the prefs. I don't know what XMLHttpRequest and multiple instances of the file protocol handler have to do with crashing on SSL connections.
Blocks: 445890
> and load an .xml settings file out of the prefs

I meant "profile [directory]".
(In reply to comment #33)
> This crash goes away if I back out the fix for bug 445890
> ("XMLHttpRequest.responseXml not accessible from signed remote XUL code").
> 
> FoxyProxy uses XMLHttpRequest to load a PAC file, load strings from a chrome:
> URI, and load an .xml settings file out of the prefs. I don't know what
> XMLHttpRequest and multiple instances of the file protocol handler have to do
> with crashing on SSL connections.

I don't know if this helps, but FoxyProxy does define a custom protocol handler (see components/relativeprotocolhandler.js). It is used to handle PAC files specified with the FoxyProxy-specific scheme "relative://". Although this crash occurs when accessing SSL connections, I thought I'd mention the custom protocol handler since it's not obvious to those unfamiliar with FoxyProxy and *is* related to protocol handlers.

From the FoxyProxy help:

"For PAC files on an ftp server, use the ftp:// scheme. For example, ftp://leahscape.com/path/proxy.pac. You can also use http://, https://, file:// and any other supported scheme. Finally, if you need to refer to a PAC file on the local file system using a relative path, you can use the special FoxyProxy relative:// scheme. This is useful if your PAC file resides on a thumb drive. Supported relative paths are documented here (http://foxyproxy.mozdev.org/relativescheme.html). Special strings in the relative:// URL you specify are replaced with their corresponding values. For example, relative://ProfD/pacs/proxy3.pac points to the file proxy3.pac in the subdirectory of the Firefox profile directory named pacs/. With Portable Firefox on Windows, the Firefox profile directory is always a relative path because the drive letter can change across computers. Please note that special strings (ProfD, Home, TempD, etc.) are case-sensitive."
With FoxyProxy enabled I get lots of these even before I try to visit an SSL site:

###!!! ASSERTION: nsNSSComponent is a singleton, but instantiated multiple times!: '(0 == mInstanceCount)', file /Users/daniel/dev/ff2/mozilla/security/manager/ssl/src/nsNSSComponent.cpp, line 297
###!!! ASSERTION: nsSSLThread is a singleton, caller attempts to create another instance!: '!ssl_thread_singleton', file /Users/daniel/dev/ff2/mozilla/security/manager/ssl/src/nsSSLThread.cpp, line 54
Break: at file /Users/daniel/dev/ff2/mozilla/security/manager/ssl/src/nsSSLThread.cpp, line 54
###!!! ASSERTION: nsCertVerificationThread is a singleton, caller attempts to create another instance!: '!verification_thread_singleton', file /Users/daniel/dev/ff2/mozilla/security/manager/ssl/src/nsCertVerificationThread.cpp, line 100
Break: at file /Users/daniel/dev/ff2/mozilla/security/manager/ssl/src/nsCertVerificationThread.cpp, line 100

...mostly for nsNSSComponent. With the FoxyProxy addon removed i don't see those.
Now I can tie the NSS assertion, FoxyProxy, and the XMLHttpRequest change together. In 2.0.0.17 w/FoxyProxy the first call to the nsNSSComponent constructor is due to FoxyProxy doing a XMLHttpRequest to load "chrome://foxyproxy/content/strings.xml" during its AppStartup notification. Along the way this calls nsJARChannel::GetOwner(), and that has to check to see if the jar is signed which fires up nsNSSComponent. Apparently it's too early to initialize NSS? nsNSSComponent::Init() fails because it can't get NS_APP_USER_PROFILE_50_DIR. This results in nsNSSComponent service not getting added to the component manager's service hashtable, which means it'll try again later even though nsNSSComponent has now been created and partially initialized.
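The failure mode described above can be reduced to a few lines of C++. This is a toy model under stated assumptions, not PSM code: `PSMComponent`, `GetService`, and the global observer list are hypothetical, and the `profileAvailable` flag stands in for `NS_APP_USER_PROFILE_50_DIR` not being available yet. The accessor caches only a successfully initialized object, but `Init()` has already handed a strong reference to the observer list, so the failed object stays alive and a later retry builds a second one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct PSMComponent;
// Strong references held by a (hypothetical) observer list.
std::vector<std::shared_ptr<PSMComponent>> gObservers;

struct PSMComponent : std::enable_shared_from_this<PSMComponent> {
  static int liveInstances;
  PSMComponent() { ++liveInstances; }
  ~PSMComponent() { --liveInstances; }

  // Registers as an observer first, then hits a step that can fail (the
  // profile directory is not known yet) and does not undo the registration.
  bool Init(bool profileAvailable) {
    gObservers.push_back(shared_from_this());
    return profileAvailable;
  }
};
int PSMComponent::liveInstances = 0;

std::shared_ptr<PSMComponent> gCachedService;

// getService-like accessor: only a successfully initialized object is cached,
// so after a failure the next caller constructs a new instance.
std::shared_ptr<PSMComponent> GetService(bool profileAvailable) {
  if (gCachedService) return gCachedService;
  auto inst = std::make_shared<PSMComponent>();
  if (!inst->Init(profileAvailable)) return nullptr;  // inst survives via gObservers
  gCachedService = inst;
  return inst;
}
```

After one early failure and one later success, two instances are alive at once, which is the situation the "singleton, but instantiated multiple times" assertion reports.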

The change in bug 445890 tried to set an owner on the channel, but mPrincipal was null. If there had been an owner that would have avoided the jar signature check. In other uses (loading our own resources) the jar channels usually have an explicit system principal owner -- maybe XMLHttpRequest should be using that when called from chrome?

Not always though: when we load chrome://global/skin/globalBindings.xml#tabbrowser-tabs we don't have an owner on the jar channel there either, so that becomes the second instantiation of nsNSSComponent to verify the jar. This one also returns an error, so it doesn't get added to the service table. If FoxyProxy is removed, this chrome load is still the one that starts up NSS, but the service initialization succeeds in that case.

If foxyproxy were "flat" instead of jarred this wouldn't come up, but that's ducking the issue.

Using XMLHttpRequest on a chrome: resource seems like an abuse of the feature, but I suppose it's an attractive nuisance to have a feature that does so much for you. Stringbundle is the feature made for loading localized strings, but that's a little more code. Is it a common pattern for addons to avoid stringbundles by using XML entities and XHR instead?
(In reply to comment #37)

> Stringbundle is the feature made for loading localized strings, but
> that's a little more code. Is it a common pattern for addons to avoid
> stringbundles by using XML entities and XHR instead?

The use of localized XML entities here is a way to avoid redundant translation strings. To elaborate:

I do indeed use a stringbundle (search for chrome://foxyproxy/locale/foxyproxy.properties in foxyproxy.js). However, it's my understanding that stringbundles can only load property files. They cannot load DTD files. Often a string is used in both a XUL file and in a JS file not associated with a XUL dialog. Rather than duplicate the string (once in a DTD file and once in a property file), requiring translators double work, this technique is used to access DTD-based strings from JS.

I would speculate this pattern isn't done frequently, but I do know at least one other major extension which does it--FoxClocks. Andy McDonald, FoxClocks author, was the one who coined this idea AFAIK.

I am open to alternative uses of stringbundles/DTD files/etc to avoid double translation issues... please let me know if Andy and I have missed the obvious!

> If foxyproxy were "flat" instead of jarred this wouldn't come up, but that's
> ducking the issue.

I realize you're trying to fix the underlying cause. At the moment, I'm trying to patch FoxyProxy so Firefox 2.x users can use it. With that in mind, do you know if this would come up if I avoid XHR for reading strings.xml and instead use a file input stream?
(In reply to comment #38)

> I am open to alternative uses of stringbundles/DTD files/etc to avoid double
> translation issues... please let me know if Andy and I have missed the obvious!

p.s. using document.getElementById("myLocalizedLabel").value isn't possible here because the XUL dialog may not be open when the string is needed in JS
(In reply to comment #37)

> If foxyproxy were "flat" instead of jarred this wouldn't come up, but that's
> ducking the issue.

I've built a "flat" (jar-less) XPI of foxyproxy and, indeed, the crashing behavior disappears on Windows and Ubuntu; can't test OS/X. I am releasing this shortly to AMO, so please don't blacklist FoxyProxy for Firefox 2.0.0.17.
To the OP and other FoxyProxy users: please get FoxyProxy 2.8.6 at https://addons.mozilla.org/en-US/firefox/addon/2464. It has the work-around mentioned by Dan in comment 37 and does not crash Firefox 2.0.0.17.
To comment #41: I just updated and got 2.8.7 which allows me to enable FoxyProxy and still surf pages like this (HTTPS). I'll have to keep an eye on the crashes and hangs on exit, but would otherwise say "WORKS FOR ME" (Thank you!).
I have now installed FoxyProxy 2.8.8 and it works fine for me. No crashes on https:// and no crashes or hangs on exit.
(In reply to comment #36)
> ###!!! ASSERTION: nsNSSComponent is a singleton, but instantiated multiple
> times!: '(0 == mInstanceCount)', file

This should get changed into a hard abort.
Flags: blocking1.8.1.18? → blocking1.8.1.18+
(In reply to comment #37)
> Now I can tie the NSS assertion, FoxyProxy, and the XMLHttpRequest change
> together. In 2.0.0.17 w/FoxyProxy the first call to the nsNSSComponent
> constructor is due to FoxyProxy doing a XMLHttpRequest to load
> "chrome://foxyproxy/content/strings.xml" during its AppStartup notification.
> Along the way this calls nsJARChannel::GetOwner(), and that has to check to see
> if the jar is signed which fires up nsNSSComponent. Apparently it's too early
> to initialize NSS? nsNSSComponent::Init() fails because it can't get
> NS_APP_USER_PROFILE_50_DIR. This results in nsNSSComponent service not getting
> added to the component manager's service hashtable, which means it'll try again
> later even though nsNSSComponent has now been created and partially
> initialized.

Yeah as a general rule of thumb you should ignore really doing anything in the app-startup notification. The profile has not been selected at that point and I think it isn't even determined that the app will not be restarted to install/uninstall extensions as well.

The normal pattern is to use an app-startup observer to register for the profile-after-change notification in the observer service. This notification fires after essentially everything is available. In Firefox 3.1 you will even be able to directly register for profile-after-change in the category manager and avoid app-startup entirely.
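A rough C++ sketch of that pattern, using a toy observer service that only mimics the topic/callback idea of `nsIObserverService` (the `ObserverService` and `ExtensionComponent` classes and the `LoadStrings` placeholder are hypothetical). Both the constructor and the app-startup handler only subscribe; the real work waits for profile-after-change. Keeping the constructor cheap matters too, since it runs before app-startup is even delivered.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal observer service: topic -> list of callbacks (hypothetical).
class ObserverService {
 public:
  void AddObserver(const std::string& topic, std::function<void()> cb) {
    observers_[topic].push_back(std::move(cb));
  }
  void Notify(const std::string& topic) {
    auto it = observers_.find(topic);
    if (it == observers_.end()) return;
    // Copy first: a callback may register further observers while we iterate.
    auto callbacks = it->second;
    for (auto& cb : callbacks) cb();
  }

 private:
  std::map<std::string, std::vector<std::function<void()>>> observers_;
};

// Component following the recommended pattern: no real work in the
// constructor or in app-startup, only a subscription to profile-after-change.
class ExtensionComponent {
 public:
  explicit ExtensionComponent(ObserverService& obs) : obs_(obs) {
    obs_.AddObserver("app-startup", [this] { OnAppStartup(); });
  }
  bool stringsLoaded = false;

 private:
  void OnAppStartup() {
    obs_.AddObserver("profile-after-change", [this] { LoadStrings(); });
  }
  void LoadStrings() { stringsLoaded = true; }  // placeholder for the real work
  ObserverService& obs_;
};
```

Notifying app-startup alone leaves `stringsLoaded` false; only the later profile-after-change notification triggers the load.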
(In reply to comment #47)
> (In reply to comment #37)
> > Now I can tie the NSS assertion, FoxyProxy, and the XMLHttpRequest change
> > together. In 2.0.0.17 w/FoxyProxy the first call to the nsNSSComponent
> > constructor is due to FoxyProxy doing a XMLHttpRequest to load
> > "chrome://foxyproxy/content/strings.xml" during its AppStartup notification.
> > Along the way this calls nsJARChannel::GetOwner(), and that has to check to see
> > if the jar is signed which fires up nsNSSComponent. Apparently it's too early
> > to initialize NSS? nsNSSComponent::Init() fails because it can't get
> > NS_APP_USER_PROFILE_50_DIR. This results in nsNSSComponent service not getting
> > added to the component manager's service hashtable, which means it'll try again
> > later even though nsNSSComponent has now been created and partially
> > initialized.
> 
> Yeah as a general rule of thumb you should ignore really doing anything in the
> app-startup notification. The profile has not been selected at that point and I
> think it isn't even determined that the app will not be restarted to
> install/uninstall extensions as well.
> 
> The normal pattern is to use an app-startup observer to register for the
> profile-after-change notification in the observer service. This notification
> fires after essentially everything is available. In Firefox 3.1 you will even
> be able to directly register for profile-after-change in the category manager
> and avoid app-startup entirely.

The only thing FoxyProxy does in app-startup is:

gObsSvc.addObserver(this, "quit-application", false);
gObsSvc.addObserver(this, "domwindowclosed", false);
gObsSvc.addObserver(this, "profile-after-change", false);

The XHR to load chrome://foxyproxy/content/strings.xml is done in profile-after-change.
OK.  I've finally had a chance to read through this whole thing...

Looks like dveditz is spot on in comment 37.  We should probably fix PSM/NSS so that it doesn't leave itself in an inconsistent state.

Re: comment 38, using a file input stream would in fact help the problem too.

Re: comment 48, the XHR happens from _loadStrings(), which is called from the constructor, which is executed when the core code creates the component so that it can send it the app-startup notification.  So it's not happening _in_ FoxyProxy's app-startup, but immediately before it.

Over to PSM to fix the initialization issue...
Assignee: nobody → kaie
Component: General → Security: PSM
QA Contact: general → psm
Flags: blocking1.9.1?
(In reply to comment #49)
 
> Re: comment 38, using a file input stream would in fact help the problem too.

OK, thanks.

> Re: comment 48, the XHR happens from _loadStrings(), which is called from the
> constructor

Yep, you're right. I was looking at loadSettings(), not _loadStrings(). Sorry for the confusion.
Last week Boris helped me understand this bug better; I promised to summarize the findings here.

In addition to the bug already reported above, Doug ran into another scenario that triggers the failure, with the assertion

###!!! ASSERTION: nsNSSComponent is a singleton, but instantiated multiple times!: '(0 == mInstanceCount)', file c:/builds/mobile/mozilla-central/security/manager/ssl/src/nsNSSComponent.cpp, line 300


When we first try to init the XPCOM nsNSSComponent, we sometimes fail. No service object gets registered. As a result it will be retried at a later time, when someone else asks for the service.

Unfortunately the nsNSSComponent fails to clean up correctly. This is because there are strong references to it, despite the init failure, and therefore the XPCOM manager fails to clean up the failed instance.

Boris identified that the references come from RegisterObserver. We must unregister on failure. The init code should also be checked for any other activity that needs to be undone.

I'd also like to add a self-protection mechanism: if we still run into multiple instances after the cleanup above, the secondary instances should refrain from any activity.
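The two measures proposed here, unregistering on failure and keeping secondary instances inert, can be sketched in C++ as follows. Everything is hypothetical: `NSSComponentSketch` is not `nsNSSComponent`, the global set stands in for the observer service, and `nssInitSucceeds` simulates whether NSS initialization works.

```cpp
#include <cassert>
#include <set>

// Stand-in for the observer service: the set of currently registered observers.
std::set<const void*> gRegisteredObservers;

class NSSComponentSketch {
 public:
  // Only the first live instance is allowed to do anything.
  NSSComponentSketch() : isPrimary_(instanceCount_++ == 0) {}
  ~NSSComponentSketch() {
    gRegisteredObservers.erase(this);
    --instanceCount_;
  }
  NSSComponentSketch(const NSSComponentSketch&) = delete;
  NSSComponentSketch& operator=(const NSSComponentSketch&) = delete;

  // On failure, undo every side effect so nothing outside keeps the object
  // referenced and it can be released normally.
  bool Init(bool nssInitSucceeds) {
    if (!isPrimary_) return false;       // secondary instance: stay inert
    gRegisteredObservers.insert(this);   // stands in for RegisterObserver()
    if (!nssInitSucceeds) {
      gRegisteredObservers.erase(this);  // unregister on failure
      return false;
    }
    return true;
  }

 private:
  static int instanceCount_;
  bool isPrimary_;
};
int NSSComponentSketch::instanceCount_ = 0;
```

A failed first attempt leaves no registration behind, a later retry succeeds, and an accidental extra instance refuses to initialize.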
Kai: does the 1.9 branch have a similar initialization problem, even if it's not causing crashes?
Flags: blocking1.9.0.5?
Flags: blocking1.8.1.19?
Flags: blocking1.8.1.18+
Flags: blocking1.9.1? → blocking1.9.1+
Kai: I think we need to clean up the initialization issues here. FoxyProxy may not be triggering this crash anymore, but having to un-jar the addon is an ugly workaround that shows it's really a core issue we have to clean up. There might be other paths that similarly initialize NSS too early.
Flags: wanted1.9.0.x+
Flags: wanted1.8.1.x+
Flags: blocking1.9.0.5?
Flags: blocking1.9.0.5+
Flags: blocking1.8.1.19?
Flags: blocking1.8.1.19+
Whiteboard: [needs 1.8/1.9 patches]
Whiteboard: [needs 1.8/1.9 patches] → [needs 1.8/1.9 patches][semi dupe of bug 462806?]
Kai, what's the status of this bug? I could work on it if you don't have cycles.
Boris is doing some work to mitigate this in bug 462806, but that doesn't solve NSS issue.
Flags: blocking1.8.1.19+
The pressure is off the immediate problem with Boris's fix, but we'll still want NSS to initialize/shutdown cleanly. Will take branch patches when this blocking bug is fixed on mozilla-central.
Flags: blocking1.9.0.5+
-> me
Assignee: kaie → honzab.moz
Status: NEW → ASSIGNED
Attached patch v1 for 1.8 branch (obsolete)
This is branch version of my fix for the psm/nss initialize.
- EnsureNSSInitialized is now designed to drop the 'haveLoaded' flag when PSM init fails (and when the service is released) to allow a retry later
- Moved thread creation from the constructor to a safe position in the Init() method; a second creation caused a socket deadlock
- The instance is now removed from the observer service as Boris suggested; this is sufficient to let the service be released
- NSS generic constructors now fail when "@psm;1" could not be initialized; this probably also fixes bug 427715 (I have no STR to check)
- This patch is missing a way to block a second creation of the nsNSSComponent instance (by an accidental call to createInstance); should that also be introduced?

Tested w/ and w/o Boris' patch for bug 462806, with FoxyProxy installed. No crashes.
Attachment #349913 - Flags: review?(kaie)
Attached patch v1 for 1.9.1 (obsolete)
Same as v1 for 1.8 branch, just merged.
Attachment #349914 - Flags: review?(kaie)
Blocks: 427715
(In reply to comment #59)

> Tested w/ and w/o Boris' patch for bug 462806 and isntalled FoxyProxy. No
> crashes.

Please be sure you're using FoxyProxy 2.8.5 or earlier. 2.8.6 and higher work around this bug by not using a jar in the XPI. You can get 2.8.5 at https://addons.mozilla.org/en-US/firefox/addons/versions/2464
I did use 2.8.5.
Priority: -- → P2
I'm assuming that this fix will make it into a new nss/nspr release sometime soon? I currently have nss-3.12.2.0 on Fedora 9, which clearly does not have the fix (a plugin on my MUA uses curl to fetch RSS feeds which uses nss, it crashes regularly). Can someone suggest when an updated nss will appear ready for distro packagers to include it in updates?
Both patches are PSM fixes. There is no change to NSS.
bdm: please file a bug with a stack trace (from gdb, not a system call trace!), psm is based on xpcom which is not something that curl would typically use.
Should I be filing this as a Mozilla bug? Or a Fedora bug against nss?

I'm actually using Claws Mail with the RSSyl RSS plugin that calls curl which then uses nss for https:// RSS feeds.
Or a Claws Mail bug, or an RSSyl plugin bug?

I'd start by filing against the thing using NSS in this case, and seeing what they say.
I Bugzilla'd this on the Fedora Bugzilla against nss, so far I have not seen any response. There is a stack trace there as requested.

I suppose I could put it in the Mozilla bugzilla, but I don't understand the difference between nss and xpcom or the way in which they fit with the Fedora packages.
The Fedora bug link is below:

https://bugzilla.redhat.com/show_bug.cgi?id=470779
This bug is not about NSSRWLock_LockRead_Util crashes and isn't likely to fix them. The redhat bug is something else (which we're also suffering as a topcrash, so I'm sure we've got a bug on it too).
The workaround for this bug described by Dan in comment #37 (packaging the extension flat instead of jar'd) causes another issue: plugins (not add-ons) are mysteriously enabled after restart, except the Java plugin. That is, if you disable all plugins and restart FF, all plugins (except Java) are re-enabled. FoxyProxy does nothing to plugins explicitly. Jar'ing the XPI fixes the problem. There is a full description here: http://foxyproxy.mozdev.org/drupal/content/foxyproxy-286-force-enables-all-plugins

Please let me know if I should open a new issue for this.

Thanks,
Eric
You should open a new issue for that! Best to err on the side of unnecessarily opening a new bug in general, too - they're cheap!
(In reply to comment #72)
> You should open a new issue for that! Best to err on the side of unnecessarily
> opening a new bug in general, too - they're cheap!

Done. Bug 471245.
Keywords: qawanted
Whiteboard: [needs 1.8/1.9 patches][semi dupe of bug 462806?] → [has 1.8/1.9 patches][semi dupe of bug 427715?][needs review kaie]
Attachment #349913 - Flags: review?(kaie) → review?(wtc)
Attachment #349914 - Flags: review?(kaie) → review?(wtc)
     NS_ADDREF(inst);                                                          \
     rv = inst->_InitMethod();                                                 \
     if(NS_SUCCEEDED(rv)) {                                                    \
         rv = inst->QueryInterface(aIID, aResult);                             \
+        if (triggeredByNSSComponent)                                          \
+            EnsureNSSInitialized(ensureCalledByNSSComponent);                 \
     }                                                                         \
+    else                                                                      \
+        EnsureNSSInitialized(ensureReset);                                    \


Are you sure you always want to reset? What about:
+        if (triggeredByNSSComponent)                                          \
+            EnsureNSSInitialized(ensureReset);                                \

Or is there a reason I don't see?
In my understanding, we can arrive here while NSS is initialized, but we failed to create some other object (maybe hash wrapper object etc.)


>-// We must ensure that the nsNSSComponent has been loaded before
>-// creating any other components.
>-static void EnsureNSSInitialized(PRBool triggeredByNSSComponent)
>-{
>-  static PRBool haveLoaded = PR_FALSE;
>-  if (haveLoaded)
>-    return;
>-
>-  haveLoaded = PR_TRUE;
>-  
>-  if (triggeredByNSSComponent) {
>-    // We must prevent a recursion, as nsNSSComponent creates
>-    // additional instances
>-    return;
>-  }
>-  
>-  nsCOMPtr<nsISupports> nssComponent 
>-    = do_GetService(PSM_COMPONENT_CONTRACTID);
>-}


I'm wondering about races on the variable haveLoaded.
With the old code, the variable started at false and, once set, always remained true, so no lock/mutex was needed.

I see a possible race with the new code because of your new "reset" feature.

Let's avoid unnecessary changes to the variable and change it only when the nssComponent object is involved.


>+// We must ensure that the nsNSSComponent has been loaded before
>+// creating any other components.
>+PRBool EnsureNSSInitialized(nssEnsureNSSInitializedOp op)
>+{
>+  static PRBool haveLoaded = PR_FALSE;
>+
>+  if (op == ensureReset) {
>+    haveLoaded = PR_FALSE;
>+    return PR_FALSE;
>+  }
>+
>+  if (haveLoaded)
>+    return PR_TRUE;
>+
>+  haveLoaded = PR_TRUE;

I propose to remove this line

>+  
>+  if (op == ensureCalledByNSSComponent) {
>+    // We must prevent a recursion, as nsNSSComponent creates
>+    // additional instances

add
      haveLoaded = PR_TRUE;
here

>+    return PR_TRUE;
>+  }
>+  
>+  nsCOMPtr<nsISupports> nssComponent 
>+    = do_GetService(PSM_COMPONENT_CONTRACTID);
>+
>+  // Check if something didn't fail during nss init, if so,
>+  // uncheck the haveLoaded flag to try again later.
>+  if (!nssComponent)
>+    haveLoaded = PR_FALSE;

revert this to:

    if (nssComponent)
      haveLoaded = PR_TRUE;

>+
>+  return haveLoaded;
>+}

Does this make sense?
>+  // Check if something didn't fail during nss init, if so,
>+  // uncheck the haveLoaded flag to try again later.

I think you have a typo here: an accidental double negation.
And with the new code, maybe you want to write
  "Check if NSS init succeeded"
or maybe it's obvious now and you can delete the comment.
I said

    >+  if (op == ensureCalledByNSSComponent) {
    >+    // We must prevent a recursion, as nsNSSComponent creates
    >+    // additional instances

    add
          haveLoaded = PR_TRUE;
    here


Actually, can we remove that assignment completely?
If we only do the final

    if (nssComponent)
      haveLoaded = PR_TRUE;

does it still work?
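A minimal, self-contained sketch of the variant proposed here, where haveLoaded is set only after a successful lookup. FakeGetNSSService(), ensureNormal, gNSSInitWillSucceed and gGetServiceCalls are hypothetical stand-ins for do_GetService(PSM_COMPONENT_CONTRACTID) and the XPCOM plumbing, and the flag is file-scope rather than function-local only so it can be inspected; this is an illustration, not the patch itself.

```cpp
#include <cassert>

// ensureReset and ensureCalledByNSSComponent appear in the patch;
// ensureNormal is a hypothetical name for an ordinary caller.
enum nssEnsureNSSInitializedOp { ensureNormal, ensureCalledByNSSComponent, ensureReset };

static bool gNSSInitWillSucceed = true;  // hypothetical test knob: does NSS init work?
static int gGetServiceCalls = 0;         // number of service lookups performed
static bool haveLoaded = false;          // file-scope here only for testing

// Hypothetical stand-in for do_GetService(PSM_COMPONENT_CONTRACTID):
// returns true when the NSS component could be created.
static bool FakeGetNSSService() {
  ++gGetServiceCalls;
  return gNSSInitWillSucceed;
}

bool EnsureNSSInitialized(nssEnsureNSSInitializedOp op) {
  if (op == ensureReset) {
    haveLoaded = false;
    return false;
  }
  if (haveLoaded)
    return true;
  if (op == ensureCalledByNSSComponent) {
    // Called while the NSS component creates its own helpers: skip the
    // lookup to avoid recursion and leave the flag untouched.
    return true;
  }
  // Set the flag only once the component really exists, so a failed
  // NSS init leaves it false and a later caller retries.
  if (FakeGetNSSService())
    haveLoaded = true;
  return haveLoaded;
}
```

Because the ensureCalledByNSSComponent path no longer sets the flag, the next ordinary caller performs one extra lookup; that is the small overhead discussed in the reply to this review.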
Comment on attachment 349913 [details] [diff] [review]
v1 for 1.8 branch

I will respond to the review soon.
Attachment #349913 - Flags: review?(wtc) → review-
Attachment #349914 - Flags: review?(wtc) → review-
Attached patch v2 for 1.8 (obsolete)
There is no need to worry about races: we are protected by the monitor of nsComponentManagerImpl, which makes other threads wait until the first thread has completely finished do_GetService, i.e. including the execution of Init.
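A simplified model of the serialization described above, assuming only that one lock is held for the whole of the service lookup, including construction and Init(). FakeComponentManager and FakeNSSComponent are hypothetical; the real nsComponentManagerImpl is more elaborate, and this sketch does not model re-entrant lookups from the same thread.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the NSS component: counts Init() calls.
struct FakeNSSComponent {
  static std::atomic<int> initCount;
  bool Init() {
    ++initCount;
    return true;
  }
};
std::atomic<int> FakeNSSComponent::initCount{0};

// Hypothetical component manager: the lock is held across creation and
// Init(), so concurrent callers block until the first one is done.
class FakeComponentManager {
 public:
  FakeNSSComponent* GetService() {
    std::lock_guard<std::mutex> lock(mMonitor);  // other threads wait here
    if (!mService) {
      auto inst = std::make_unique<FakeNSSComponent>();
      if (!inst->Init())
        return nullptr;  // failed: the next caller will try again
      mService = std::move(inst);
    }
    return mService.get();
  }

 private:
  std::mutex mMonitor;
  std::unique_ptr<FakeNSSComponent> mService;
};
```

Under these assumptions, any number of threads asking for the service at once still see exactly one Init() call.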

What you suggest in comment 76 works (first tests confirm it). The only difference: if we first create the NSS component service independently and then create a component that ensures the NSS service, we call do_GetService for it a second time. That is probably very little overhead.

Tested again on the 1.8.1 branch with FoxyProxy 2.8.5 and the patch for bug 462806 reversed.
Attachment #349913 - Attachment is obsolete: true
Attachment #361403 - Flags: review?(kaie)
Attached patch v2 for 1.9.1 and trunk (obsolete)
Attachment #349914 - Attachment is obsolete: true
Attachment #361404 - Flags: review?(kaie)
Honza, I have a problem with your patch, but I don't know yet where the problem is. I'm currently working on a patch for bug 390036, which introduces additional SSL worker threads.

Whenever I merge your patch here with the patch from there, I get assertions that multiple instances of nsNSSComponent are created (when session restore reopens an https page).

My patch alone: works fine
Your patch alone: works fine

The cause could be the combination itself, your changed order of init calls, the changed logic of the XPCOM constructor macro, or a side effect in my patch. So far I have been unable to find it.
Attached patch merge test (a)
This is your trunk patch with my new feature patch from bug 390036 merged.
>diff --git a/security/manager/ssl/src/nsNSSComponent.cpp b/security/manager/ssl/src/nsNSSComponent.cpp
>--- a/security/manager/ssl/src/nsNSSComponent.cpp
>+++ b/security/manager/ssl/src/nsNSSComponent.cpp
>@@ -1747,9 +1762,25 @@ nsNSSComponent::Init()
>   rv = InitializeNSS(PR_TRUE); // ok to show a warning box on failure
>   if (NS_FAILED(rv)) {
>     PR_LOG(gPIPNSSLog, PR_LOG_ERROR, ("Unable to Initialize NSS.\n"));
>+
>+    DeregisterObservers();
>+    mPIPNSSBundle = nsnull;
>     return rv;
>   }
> 
>+  nsSSLIOLayerHelpers::Init();
>+  nsSSLThreadControl::Init();
>+  nsSSLThreadControl::Start();
>+  mCertVerificationThread = new nsCertVerificationThread();
>+  if (mCertVerificationThread)
>+    mCertVerificationThread->startThread();
>+
>+  if (!mSSLThread || !mCertVerificationThread)
>+  {
>+    PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("NSS init, could not create threads\n"));
>+    return NS_ERROR_OUT_OF_MEMORY;
>+  }

You left !mSSLThread in the condition here. Should it be nsSSLThreadControl::mThreads? Also, AFAIU mSSLThread should be removed entirely, right?

Also, it is clearly my fault that I do not deregister the observers on this failure; that is why you got two instances. I have to add that to my patch for bug 456705.
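One plausible reading of the "two instances" symptom, sketched under assumptions: observers are registered before NSS init, and if a failed Init() leaves them registered, a retry adds a second live registration. ObserverRegistry and FakeComponent are hypothetical stand-ins for the observer service and nsNSSComponent; only the cleanup pattern (deregister before the early return) mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical observer registry standing in for the observer service.
struct ObserverRegistry {
  std::vector<const void*> observers;
  void Add(const void* o) { observers.push_back(o); }
  void Remove(const void* o) {
    observers.erase(std::remove(observers.begin(), observers.end(), o),
                    observers.end());
  }
};

// Hypothetical component: registers observers first, then initializes NSS.
struct FakeComponent {
  ObserverRegistry& registry;
  bool cleanUpOnFailure;  // true models the fixed failure path

  bool Init(bool nssInitSucceeds) {
    registry.Add(this);  // like RegisterObservers()
    if (!nssInitSucceeds) {
      if (cleanUpOnFailure)
        registry.Remove(this);  // like DeregisterObservers() before returning
      return false;
    }
    return true;
  }
};
```

Without the cleanup, a failed attempt followed by a successful one leaves two registered components; with it, only the successful one remains.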
Thanks Honza! That helps me.

Will you attach a new patch to this bug, where you fix the observers?
(In reply to comment #83)
> Will you attach a new patch to this bug, where you fix the observers?

Yes, probably today.
Attached patch v2.1 for 1.9.1 and trunk (obsolete)
Fixes the cleanup code before the early return in nsNSSComponent::Init().
Attachment #361404 - Attachment is obsolete: true
Attachment #362318 - Flags: review?(kaie)
Attachment #361404 - Flags: review?(kaie)
Attached patch v2.1 for 1.8 (obsolete)
Attachment #361403 - Attachment is obsolete: true
Attachment #362319 - Flags: review?(kaie)
Attachment #361403 - Flags: review?(kaie)
Attachment #362318 - Flags: review?(kaie) → review+
Comment on attachment 362318 [details] [diff] [review]
v2.1 for 1.9.1 and trunk

r=kaie
Attachment #362319 - Flags: review?(kaie) → review+
Comment on attachment 362319 [details] [diff] [review]
v2.1 for 1.8

r=kaie
Comment on attachment 362318 [details] [diff] [review]
v2.1 for 1.9.1 and trunk

I'll land this soon on trunk.
Attachment #362318 - Flags: approval1.9.1?
Attachment #362319 - Flags: approval1.8.1.next?
http://hg.mozilla.org/mozilla-central/rev/9ad175f8b25f
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #362318 - Attachment description: v2.1 for 1.9.1 and trunk → v2.1 for 1.9.1 and trunk [Checked-in on trunk comment 90]
This causes re-entrance into getService, which I should have caught. Reverting to the first version of the patch, which prevents it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
OK, I have backed out for now. Returning to the first version of the patch is not that simple; I have to retest everything again, which will take hours...
Status: REOPENED → ASSIGNED
Attachment #362318 - Attachment description: v2.1 for 1.9.1 and trunk [Checked-in on trunk comment 90] → v2.1 for 1.9.1 and trunk
Whiteboard: [has 1.8/1.9 patches][semi dupe of bug 427715?][needs review kaie] → [has 1.8/1.9 patches][semi dupe of bug 427715?]
Whiteboard: [has 1.8/1.9 patches][semi dupe of bug 427715?] → [needs trunk/1.9.1 landing before branch approval][semi dupe of bug 427715?]
Attached patch v3, trunk, 1.9.1 (obsolete)
I added one more flag that is set to TRUE while the NSS component is being initialized. This prevents re-entering do_GetService for it (which leads to an assertion failure) and cleans up the whole code.

I now change the flags only when called from the NSS component constructor, which prevents any race conditions: we are protected by the XPCOM component manager's monitor.
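A single-threaded sketch of the "in progress" flag idea described above, assuming the component manager's monitor already serializes threads. Apart from nssInitFailed (named in a later comment), the op names, globals and ConstructNSSComponent() are hypothetical; the latter stands in for do_GetService constructing the NSS component, whose Init() creates helpers that call EnsureNSSInitialized again.

```cpp
#include <cassert>

enum EnsureOp { nssEnsure, nssLoadingComponent, nssInitSucceeded, nssInitFailed };

static bool gLoading = false;         // NSS component is being constructed
static bool gLoaded = false;          // NSS component finished Init()
static bool gInitWillSucceed = true;  // hypothetical knob standing in for NSS init
static int gConstructions = 0;        // how often the component was built

bool EnsureNSSInitialized(EnsureOp op);

// Hypothetical stand-in for constructing the NSS component via the
// service lookup; the nested call models the re-entrant case.
static bool ConstructNSSComponent() {
  ++gConstructions;
  EnsureNSSInitialized(nssLoadingComponent);
  bool helpersOk = EnsureNSSInitialized(nssEnsure);  // nested call from Init()
  bool ok = helpersOk && gInitWillSucceed;
  EnsureNSSInitialized(ok ? nssInitSucceeded : nssInitFailed);
  return ok;
}

bool EnsureNSSInitialized(EnsureOp op) {
  // Flags change only for calls made on behalf of the NSS component itself.
  if (op == nssLoadingComponent) {
    if (gLoading)
      return false;  // refuse re-entrant construction
    gLoading = true;
    return true;
  }
  if (op == nssInitSucceeded) {
    gLoading = false;
    gLoaded = true;
    return true;
  }
  if (op == nssInitFailed) {
    gLoading = false;
    gLoaded = false;  // allow a later retry
    return false;
  }
  // nssEnsure: an ordinary component wants NSS to be up.
  if (gLoaded || gLoading)
    return true;  // done, or in progress: no nested service lookup
  return ConstructNSSComponent();
}
```

In this model the nested call made during Init() returns early instead of starting a second lookup, and a failed init clears both flags so a later caller can retry.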
Attachment #362318 - Attachment is obsolete: true
Attachment #362319 - Attachment is obsolete: true
Attachment #363903 - Flags: review?(kaie)
Attachment #362319 - Flags: approval1.8.1.next?
Attachment #362318 - Flags: approval1.9.1?
Attached patch v3, for 1.8.1 (obsolete)
Tested again with bz's fix reversed (patch -R) and FoxyProxy 2.8.5.
Attachment #363906 - Flags: review?(kaie)
Attached patch v3.1, trunk, 1.9.1 (obsolete)
Found a small mistake in the constructor: I was calling EnsureNSSInitialized(nssInitFailed) whenever the NS_NEWXPCOM allocation failed for any component; it needs to be called only when the constructor is invoked for the NSS component.
Attachment #363903 - Attachment is obsolete: true
Attachment #363906 - Attachment is obsolete: true
Attachment #363916 - Flags: review?(kaie)
Attachment #363903 - Flags: review?(kaie)
Attachment #363906 - Flags: review?(kaie)
Attached patch v3.1, 1.8.1 (obsolete)
...
Attachment #363917 - Flags: review?(kaie)
Comment on attachment 363916 [details] [diff] [review]
v3.1, trunk, 1.9.1

Honza, thanks a lot.

r=kaie
Attachment #363916 - Flags: review?(kaie) → review+
Comment on attachment 363917 [details] [diff] [review]
v3.1, 1.8.1

r=kaie
Attachment #363917 - Flags: review?(kaie) → review+
http://hg.mozilla.org/mozilla-central/rev/5dde4d86be49
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Backed out: we get an assertion failure on the leak test boxes; see the bottom of
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1235744234.1235747596.5195.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
OK, no more assertions; the re-entrance protection was too protective.
Attachment #363916 - Attachment is obsolete: true
Attachment #364570 - Flags: review+
Status: REOPENED → ASSIGNED
http://hg.mozilla.org/mozilla-central/rev/3ea8539640f5
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [needs trunk/1.9.1 landing before branch approval][semi dupe of bug 427715?] → [needs 1.9.1 landing before branch approval][semi dupe of bug 427715?]
v3.2 successfully landed on mozilla-central, so we can try to land it on 1.8.1. Thoroughly tested locally, since the STR for this bug apply to FF 2.0.
Attachment #363917 - Attachment is obsolete: true
Attachment #364968 - Flags: review+
Attachment #364968 - Flags: approval1.8.1.next?
Comment on attachment 364570 [details] [diff] [review]
v3.2 [Checkin mozilla-central comment 102][Checkin mozilla-1.9.1 comment 106]

Successfully landed on mozilla-central.
Attachment #364570 - Attachment description: v3.2 → v3.2 [Checkin mozilla-central comment 102]
Attachment #364570 - Flags: approval1.9.1?
Attachment #365009 - Flags: approval1.9.0.8?
Attachment #364570 - Attachment description: v3.2 [Checkin mozilla-central comment 102] → v3.2 [Checkin mozilla-central comment 102][Checkin mozilla-1.9.1 comment 106]
Attachment #364570 - Flags: approval1.9.1?
Comment on attachment 364570 [details] [diff] [review]
v3.2 [Checkin mozilla-central comment 102][Checkin mozilla-1.9.1 comment 106]

(As it's blocking 1.9.1, it doesn't need approval.)

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/e1111e6b81f3
Keywords: fixed1.9.1
Whiteboard: [needs 1.9.1 landing before branch approval][semi dupe of bug 427715?] → [semi dupe of bug 427715?]
Attachment #365009 - Flags: approval1.9.0.8? → approval1.9.0.9?
Attachment #364968 - Flags: approval1.8.1.next? → approval1.8.1.next+
Comment on attachment 364968 [details] [diff] [review]
v3.2 for 1.8.1 [Checkin comment 109]

Approved for 1.8.1.22, a=dveditz for release-drivers
If this fixes topcrash bug 427715, then it'd be worth taking for sure.
Flags: blocking1.9.0.10?
Comment on attachment 364968 [details] [diff] [review]
v3.2 for 1.8.1 [Checkin comment 109]

Checking in nsNSSComponent.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSComponent.cpp,v  <--  nsNSSComponent.cpp
new revision: 1.126.2.10; previous revision: 1.126.2.9
done
Checking in nsNSSComponent.h;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSComponent.h,v  <--  nsNSSComponent.h
new revision: 1.38.4.4; previous revision: 1.38.4.3
done
Checking in nsNSSIOLayer.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp,v  <--  nsNSSIOLayer.cpp
new revision: 1.97.2.21; previous revision: 1.97.2.20
done
Checking in nsNSSIOLayer.h;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.h,v  <--  nsNSSIOLayer.h
new revision: 1.27.28.6; previous revision: 1.27.28.5
done
Checking in nsNSSModule.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSModule.cpp,v  <--  nsNSSModule.cpp
new revision: 1.38.4.2; previous revision: 1.38.4.1
done
Attachment #364968 - Attachment description: v3.2 for 1.8.1 → v3.2 for 1.8.1 [Checkin comment 109]
Keywords: fixed1.8.1
Flags: blocking1.9.0.10? → blocking1.9.0.10+
Attachment #365009 - Flags: approval1.9.0.10? → approval1.9.0.10+
Comment on attachment 365009 [details] [diff] [review]
v3.2 for 1.9.0 [Checkin comment 111]

Approved for 1.9.0.10, a=dveditz for release-drivers
Comment on attachment 365009 [details] [diff] [review]
v3.2 for 1.9.0 [Checkin comment 111]

Landed on 1.9.0.

Checking in nsNSSComponent.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSComponent.cpp,v  <--  nsNSSComponent.cpp
new revision: 1.168; previous revision: 1.167
done
Checking in nsNSSComponent.h;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSComponent.h,v  <--  nsNSSComponent.h
new revision: 1.54; previous revision: 1.53
done
Checking in nsNSSIOLayer.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp,v  <--  nsNSSIOLayer.cpp
new revision: 1.165; previous revision: 1.164
done
Checking in nsNSSIOLayer.h;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSIOLayer.h,v  <--  nsNSSIOLayer.h
new revision: 1.47; previous revision: 1.46
done
Checking in nsNSSModule.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsNSSModule.cpp,v  <--  nsNSSModule.cpp
new revision: 1.52; previous revision: 1.51
done
Attachment #365009 - Attachment description: v3.2 for 1.9.0 → v3.2 for 1.9.0 [Checkin comment 111]
Keywords: fixed1.9.0
Target Milestone: --- → mozilla1.9.2a1
This sent fxdbug-linux-tbox all leaky on 1.9.0.
(In reply to comment #112)
> This sent fxdbug-linux-tbox all leaky on 1.9.0.

Since it's never more than 92.0B, you're likely seeing bug 454837.
I believe that leak is not caused by my landing. While watching the tree, I saw it fail the same way before. Just after my check-in it happened more often, but not every time; there were also green runs. I'll check the leaks, since my patch may be somehow related.
Verified for 1.9.0.11 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11pre) Gecko/2009051305 GranParadiso/3.0.11pre (.NET CLR 3.5.30729).
Crash Signature: [@ nsSSLThread::Run]
Blocks: 320954