Closed Bug 75947 Opened 19 years ago Closed 18 years ago

Security DLLs are now loaded at startup, hurting startup time

Categories

(Core Graveyard :: Security: UI, defect, P4)

1.0 Branch
defect

Tracking

(Not tracked)

VERIFIED FIXED
psm2.2

People

(Reporter: sfraser_bugs, Assigned: kaie)

References

Details

(Whiteboard: nsenterprise)

Attachments

(5 files, 6 obsolete files)

95.88 KB, text/html
Details
138.09 KB, patch
kaie
: review+
Details | Diff | Splinter Review
9.24 KB, patch
Details | Diff | Splinter Review
1.35 KB, patch
javi
: review+
Details | Diff | Splinter Review
7.47 KB, patch
javi
: review+
Details | Diff | Splinter Review
With the PSM 2 landing we started loading the security DLLs PIPNSS and NSSckbi at 
startup. They get pulled in by nsGlobalWindow's ctor, which does a GetService of 
the nsIEntropyCollector.

Loading these two DLLs is going to hurt startup time somewhat.
Security:Crypto is the component for PSM issues.
Assignee: mstoltz → ddrinan
Component: Security: General → Security: Crypto
QA Contact: ckritzer → junruh
On my 19.8 second win32 startup (after a reboot) PSM2 accounts for 0.7 seconds, 
about 3.5% slowdown.
NSS is also eargerly loading a string bundle on startup.
Moving to Component "PSM".


Component: Security: Crypto → Client Library
Product: Browser → PSM
Target Milestone: --- → 2.0
Version: other → 2.0
I also set the target to PSM 2.0 in the hopes that we can make the mozilla 0.9.1
release.  
Blocks: 71781
I believe that the proposal on the table is to break entropy collecting into its
own DLL which will get loaded at launch time.  Then the rest of PSM will get
loaded when called for the first time.

I'll let ddrinan comment further.
Before PSM 2.0, the entropy collector was loaded as well (meaning psm-glue was
loaded).  So if we can make it such that only pipnss.dll is loaded, then we
should be about the same as PSM 1 days.  What's hurting start-up is the
initialization of NSS at the same time the entropy collector is loaded.

IMO, if we can delay the initialization of NSS until SSL or a cert action is
required the start up time would be on par with pre-PSM 2 days.

Breaking the entropy collector into its own dll won't do much to solve the
problem because you're still loading another module.  But making the entropy
collector not initialize NSS right away will.
I'd also be happy with just delaying kicking all this off by about 60 seconds
with a timer (or something like the category service.)  It might be the easier
path and I don't think 60 seconds less entropy is going to hurt.

No recent comments in this bug, is it going to be fixed this week?
I'm finishing off PSM feature work. I hope to get to this by friday (5/11).
Did it happen on Friday?  :-)
Adding thayes to cc-list.

After discussing the various proposals with other members of the NSS team, we've 
come to the conclusion that in order to reduce start up time while *not* 
reducing the amount of entropy collected, the following criteria must be met:
 - start entropy collection when browser loads and buffer the entropy.
 - delay initialization of NSS.
 - when NSS has be initialized (SSL, SDR, keygen etc), feed it the stored 
entropy.
 - contine to feed NSS entropy as it is collected.

Since the amount of engineering work is much greater than anticipated, we should 
talk tomorrow and figure out what the best course of action is.

While investigating this bug, we are of the opinion that NSS is not being fed 
enough entropy by the browser and that other sources of entropy should be added 
(e.g. mouse moves). I'll open a bug on this. 

Also, NSS should be initialized once and only once during the lifetime of the  
browser (with the one exception of profile switching) because each time NSS is 
initialized, it loses all of its previous entropy and other state information. 
I'll open a bug on this also.
We've had a number of meetings on this matter.

The basic problem is that we need to start collecting entropy as soon as the
client launches, and the proposals for delaying or changing the startup sequence
would further reduce the amount of entropy we have in the system. 

As you know, we need to feed a good deal of entropy into the PRNG system in
order to make sure that we have enough randomness for operations like SSL,
keygen, and the encrypted wallet feature.  Without a good source of randomness,
a long symetric key of 128-bits might only be worth a few bits.  (That would be
*REALLY* bad, btw)

Some of you may recall a prior incident where the browser did not have enough
entropy for SSL connections.  Those of you who don't remember the matter can
read about it in Steven Levy's new book "Crypto".  :-(

As a result of these meetings, I don't think we can both collect and use the
right amount of entropy *and* change the startup sequence.  It's not impossible,
but it will require some non-trivial amount of work that we haven't scheduled.  
Target -> PSM 2.1


Target Milestone: 2.0 → 2.1
Blocks: 77757
Keywords: nsenterprise
remove nsenterprise
Keywords: nsenterprise
Whiteboard: nsenterprise
Changing platform
Hardware: All → Macintosh
OS: Mac System 8.5 → All
Hardware: Macintosh → All
P2
Priority: -- → P2
P4
Priority: P2 → P4
*** Bug 91235 has been marked as a duplicate of this bug. ***
Moving all P3 and P4 bugs targetted to 2.1 to future.
Target Milestone: 2.1 → Future
start time is one of the key performance problems that still need to 
be addressed.   please find a way to bump the priority back up
on this one and move the milestone back in or risk the rath
of my 5-iron...
Mass assigning QA to ckritzer.
QA Contact: junruh → ckritzer
Depends on: 96058
The fix for bug 76915 (which loads security components from nsAppRunner.cpp)
makes this bug a good bit harder to fix.
Now that bug 76915 is fixed, is this bug still relevant?
QA Contact: ckritzer → junruh
Version: 2.0 → 2.1
No, the fix for bug 76915 makes this a lot worse and should probably be removed
and redone in a better way (and one that doesn't prevent the entire app from
working if PSM is messed up).
Changing my e-mail address.
Removing old e-mail address.
our profiling data showing nss_init() is showing 5% cost of time on startup.
we need this bug fixed ASAP.
Target Milestone: Future → 2.2
come talk to brendan and chofmann about this :-)
Can we see the actual profiling data. The bug says the problem is in the size of
the library itself, cathleen indicates that it's NSS_Initialize() itself. If we
have good accurate profiling info, we can get a better idea on what the problem
is and design ways around it. Thanks,

bob
One of the things you do right off the bat is read the cert7.db, which if you've
got a migrated profile with gobs of 4.x SMIME certs can be quite large (mine is
nearly 3Mb). Do you really need to read all these right away just to start
collecting entropy?
Both the loading of the dlls and the initialization are problems, and both could
be delayed.  nss_Init alone was 5.3% of the time (excluding polling) in a jprof
profile (and loading the NSS string bundle was a bit more), which is timer based
but may miss some of the time spent in loading DLLs.

I'll attach a sub-profile showing the time spent within nss_Init.  The total
number of non-polling timer hits was 13142, this profile covers 694, or 5.3%.
Bingo! Thanks.... The profile makes it very clear the massive amount of NSS
startup time is reading all the built-in certs into the temparary DB. This is
something we're changing in NSS 3.4, where all the certs aren't unloaded from a
token at startup time.

So the good news is this shouldn't be a problem for Mach5. The bad news there is
not safe fix in the near term.

bob
cc lord.
I'll let Bob Relyea's comments stands.
Loading the libraries is still an issue, though.
At least one library needs to get loaded immediately in order to start
accumulating input for seeding the cryptographic random number generator.  This
function (accumulating data) could be moved into the core libraries or modules,
however nobody has volunteered to do this.  If it's not done in the core, a PSM
component library will need to be loaded.

Is there an advantage to loading a small library instead of a larger one (in
terms of start-up time)?
Maybe the DOM code that sends mouse event entropy could start with the 100th
event rather than the 1st event (since it only sends one out of every hundred),
so we can get a window up without loading the security DLLs?
It should not be necessary to load the security libs to collect entropy.
Collection can be decoupled from consumption (that is, from feeding
the collected entropy into the PRNG).  

See my comments dated 2001-05-16 13:45 in bug 80841 to see how PSM 1.x
did this.
when is 2.2 milestone for PSM?
The next code name.
I'll attach a patch that removes the hack from nsAppRunner.cpp and replaces it
with a hack in the NSS module and provides the architecture for an entropy
collector that is not in the NSS module.  The gap in this patch is that I'm not
sure of the correct algorithm for accumulating the entropy.  That needs to be
filled in.

However, with this patch, there's still something that's loading the library
from JS code.
It's probably creating the  "secure browser ui" object for the window.  This
object determines the security status of the window (secure, mixed, insecure),
sets the state of the lock icon, and displays warnings about posting forms to
insecure sites.  It also provides the data that is used for displaying the
security status of the window in "Page Info".
We could probably move nsSecureBrowserUIImpl out of libpipnss, couldn't we? 
mozilla/xpfe/browser seems like a logical place to put it.  It doesn't depend on
any NSS code, and no PSM code aside from throwing the dialogs for transitions
between secure and non-secure sites.
Does that mean we'll have to move the definition of said dialogs (which are
probably defined in manager/ssl/public)?
Javi, currently nsSecureBrowserUIImpl uses a direct function call to
getNSSDialogs to request the pointer on which to execute the dialogs. But the
obtained pointer is already defined as an interface.

If we move this function call to an interface, could this be sufficient to move
nsSecureBrowserUIImpl out of the NSS component?
1)
I answer my own question. It seems to work. I'm attaching a patch to move the
file over as bryner suggested.

It is not yet a perfect patch, but is sufficient for testing whether moving this
over reduces startup time. One thing I have not yet cared for is the string
bundle. The moved source file only requests one single string from the bundle,
so this string could easily be moved over, too.

To test this patch, first apply it, and then "cd" to the mozilla directory, and
execute:
  mv security/manager/ssl/src/nsSecureBrowserUIImpl.* xpfe/browser/src/

dbaron, do you want to test this? Or can I help?


2) I looked at your patch. One thing that you have left out (and that we should
re-add if we take your patch) is the error message + aborting code, that exits
Mozilla if the security component can not init the certificate databases.
Not starting the entire browser if the certificate database is corrupt is silly.
 My patch makes it so we can start the browser, but just not use any security
code.  That is sufficient to prevent the crash.  If you'd like to add UI for
that as well, go ahead, but please don't make it so that it's impossible to
start the browser, and please don't do it by gross violations of modularity.
dbaron: i have a patch somewhere that enables mozilla w/ psm to run w/ a
readonly security db because last i checked writability was a nice and fatal
requirement :)
kaie, doesn't this create a build-time dependency on the nsINSSDialogs interface?
dbaron: Not starting the browser may seem "silly" to you, but that code was
inserted to deal with a top crasher that was reported via Talkback and we got
lots of pressure to deal with.  Before you go ripping out code, please ask to
find out why it was put there in the first place, and when appropriate replace
it with code that provides the same functionality.
I know that.  It *was* a silly fix for the given problem, and my patch still
fixes that problem.
  See bug 76915 for background on terminating the browser when the db is corrupt.

  A profile with a corrupted database is pretty useless.  How long will it take
for that user to run into serious problems? What will the failure modes of a
browser session with no security (hangs, page not loading, what?) how many bugs
would this generate? how many calls to a support center? How easy would it be to
associate whatever problem is reported to the fact that the db cannot be open? 
At least now, the failure mode is easily documented, that is to say, supportable.

  The chosen fix should address this issues.
It will act exactly as the browser does when security isn't installed.  That's a
legitimate mode of operation that we support.
Mozilla supports embedding customers not including the security libraries.

When the security libraries are part of the product (e.g., the Mozilla browser)
and something happens that disables them, that's different.  The patch to bug
76915 addressed that situation and should stay in.
Of course, your fix to bug 76915 doesn't work for any embedders that do include
the security libraries, since it's breaks modularity in a way that the system
was not designed to work.
If I start Mozilla and my global history database is corrupt, the program should
not abort.  If Mozilla finds a corrupted cache file, it should not abort.  If it
finds a bad BSD mailbox file, it should carry on as best it can.  Likewise for
the cert db.  How much pressure was on someone to fix a topcrash
notwithstanding!  No design by emergency bug-avoidance hacking, please.

/be
Commenting on the patch:
Why is there no user feedback as to the fact that security initialization failed?  

If the code is running, then the product intends for security to be available. 
Up until now we've *always* opened certificate databases read/write and failure
to do so was considered an error.  The alert is a way to provide feedback to the
user that something is wrong so that a user can tell the help desk technician
"The browser says security failed to initialize."  This is a big win when
debugging problems and we should continue to provide users with such error messages.

If embeders want the ability to run with read/only or no cert db's on disk then
we should provide a way for the embeders to specify that.  (File an RFE and
it'll get prioritized.)  We shouldn't just continue as if nothing is wrong if
NSS failed to initialize in the manner we've been doing since Comm 4.x days. 
This can lead all products to have a false sense of security, which IMHO would
be bad.  At the least we need to let the user know something is wrong, so the
poblem can be fixed.
 
I think I now understand what David says. I haven't tried it yet, but looking at
his code, the following is what he did:

David changed the way the instantiation of PSM objects works. The standard code
(used by any other component) just creates an instance.

David suggests to use a changed version for our module. He suggests to let
instantiation happen conditionally.

The first time any PSM object is requested, a test code is executed, and the
result of the test is remembered. This test tries to create the nsNSSComponent
singleton, which tries to init NSS.

If something goes wrong during the first test, the error condition will be
remembered. For all requests, by any part of the application, for PSM services,
the security module will return a failure.

David also made sure that both of our libraries, pki and nss, both are either
working or nonworking.

This means, if we can not init NSS, the application would run as if no security
were linked with the application.

This is much more clever solution than what I did in bug 76915.

We are free to add a GUI notification. We need to decide whether we still need
the low level GUI routine for user feedback, or whether we consider it now
unlikely that the failure happens in a very early stage of the application -
where XUL is not yet available (as we saw in bug 76915). But if move the entropy
collector out of the security library, it indeed seems unlikely that XUL is not
available when the first problem shows up. We are free to add an error message
to the constructor of NSS-Component - which will be shown once only.

Davids patch is part of the work of solving this particular bug. If we want to
delay loading of the security component, we no longer know when the first PSM
instance will be created. And with his patch he solves that problem.
> This is a big win when debugging problems and we should continue to provide
> users with such error messages.

Fine, but do it from within your module rather than cluttering up
nsAppRunner.cpp, which is designed to start the app (and only Mozilla the
browser, not any embedded apps), not to be a dumping ground for any code anybody
feels like putting there.

taking
Status: NEW → ASSIGNED
hmm, sorry for the spam, I thought clicking assign would reassign it to me...
Assignee: ddrinan → kaie
Status: ASSIGNED → NEW
I did!
bug 105167 2001-10-17 kaie@netscape.com NEW Client L All
junruh@netscape.comMozilla w/ NSS won't start w/ Readonly profile

There's even a patch.
I will work on a patch that puts all pieces together.

The entropy collection will be moved out of the security components completely.
David suggested to put it into the dom directory, so be it.

To gather the entropy I will use the following strategy (unless somebody
objects). The entropy collector will be used immediately. It will use an
internal cyclic cache of entropy collected so far.

I will extend the interface of EntropyCollector with the new function:
RegisterEntropyConsumer.

As soon as the security component is loaded, it will run its init code. During
that code, the entropy collector service is accessed, and the security component
registers itself as a consumer.

For now, to simplify things, we will not support multiple consumers, but only
one. Once a consumer is registered, other registrations will be rejected. (The
security component will unregister itself when it is unloaded, to allow for
later re-registration, in case this will ever be necessary.)

The method RegisterEntropyConsumer will pass over the contract id that should be
used to access the consumer. The collector will use GetService and cache the
pointer to the consumer. Immediately the collector will flush its cache, by
invoking a method on the consumer to hand over all entropy collected so far
(that is still in the cache).

Question: How many bytes should the entropy collector use for its cache?

This should be the number of bytes wanted by the security component immediately,
and will define the additional memory usage before the security component is loaded.

Once the collector has a registered consumer, it will no longer use a cache, but
free the memory, and pass over all collected entropy immediately.
Status: NEW → ASSIGNED
These comments are in response to the question asked just above.

PSM 1.0 had a very good design for this.  I suggest you follow that design
where it makes sense.

4 KB entropy buffer.  If more than 4KB of data gets put into it, then it
wraps around to the beginning of the buffer again.  It doesn't just overwrite
the old data with the new.  It combines them like so:

static unsigned char buf[4096];
static unsigned int in;
static unsigned long totlen;

update(unsigned char * src, unsigned int    len)
{
  totlen += len;
  while (len > 0) {
    unsigned int old = buf[in];
    buf[in] = ((old << 1) | (old >> 7)) ^ *src++;
    in = (in + 1) & 4095;
  }
}   

When the consumer starts, it is given MIN(in, totlen) bytes from buf.
The buffer then starts over, e.g. 
        in = totlen = 0;

> Once the collector has a registered consumer, it will no longer use a cache, 
> but free the memory, and pass over all collected entropy immediately.

In PSM 1.x, even after the consumer is started, input data continues to be
buffered.  Data is only given to the consumer when PSM is going to do 
something that might need new random values (like start a new SSL connection, 
or sign or encrypt a new message).  This improves efficiency, fewer function 
calls per byte consumed on average.
Attachment #55670 - Attachment is obsolete: true
Attachment #56710 - Attachment is obsolete: true
Attached patch Suggested fix (obsolete) — Splinter Review
This patch should fix this bug.

It uses dbaron's strategy to delay loading at the module level.

I extended the buffering entropy collector, using Nelson's strategy.

In one point I did not yet follow Nelson's suggestion. My understanding is:
With Nelson's strategy (to buffer always), the security component must be
actively requesting entropy. Currently, PSM doesn't do that. It just forwards
entropy to NSS using PK11_RandomUpdate as it becomes available, but never
actively requests entropy for consumption. Therefore we disable buffering (and
switch to forwarding) as soon as PSM is loaded.

I added a warning message to the constructor of the NSS component. Once this
gets loaded, and initializing NSS fails, a warning will be shown to the user.
We might want to enhace this warning text later. As a base I used the text we
already had in AppRunner, and added a statement that it is recommended to quit
the browser and fix the problem. But if we fail, the application will continue
to run. The warning message is:

"Could not initialize the browser's security component. The most likely cause
is problems with files in your browser's profile directory. Please check that
this directory has no read/write restrictions and your hard disk is not full or
close to full. It is recommended that you exit the browser and fix the problem.
If you continue to use this browser session, you will see incorrect browser
behaviour when accessing security features."

I removed the additional files for the AppRunner warning text.
I moved over nsSecureBrowserUI to the dom module, the same place where the
buffering entropy collector lives.
I split two interface files. Only two small interfaces had to move to dom, and
mozilla/dom/src now depends on uriloader.
I tested that the buffering and forwarding code works.
I tested that building works with --disable-crypto, too.

There is one drawback, that I think is acceptable temporarily, but should be
fixed soon (that's a new bug):

PSM uses XUL overlays that are hooked into the preferences. These are loaded
and displayed to the user, even if PSM can not load. Basically we have now
broken functionality. The controls are shown, but they are shown incorrectly,
not filled with content, and behave incorrectly.
I guess we can not prevent that those preferences are shown. If we can, please
tell me how to :-)
If we can't, we could do the following: Let's disable all the controls
initially in XUL. We might extend the onLoad handlers for every overlay. There
we try to access the PSM service. If that doesn't work, we keep the controls
disabled. If it works, we enable the controls to the initially wanted state.

The good thing is that I didn't see any crashes. When PSM is not available,
security is just not available. I only saw JavaScript exceptions caused by the
preferences JS code.

Note that files nsSecureBrowserUI.h and .cpp are included twice in the patch
for easier reviewing. Once completely at the new location. Another time at the
old location showing only what was changed in the files.

Who can review this patch?
adding patch, review keywords
Keywords: patch, review
I found another issue that applying this patch will create.

The constructor of nsNSSComponent registers content listeners for 4 mime types
in order to support downloading and importing certificates from websites.

If a user tries to download a certificate (for example in preparation to accept
a new Certification Authority) before PSM has initialized, the mime type will be
unknown and the user will be prompted to save the certificate to disk (which is
not what is wanted).

If I understand this correctly, to make it work in that case, the downloading of
certificates has to be re-implemented using nsIContentHandler and registered
with NS_CONTENT_HANDLER_CONTRACTID_PREFIX"mimetype" contract ids (See also bug
108600).
I actually need to make this bug dependent on bug 108600.

On my system, if I apply this patch, Mozilla crashes when I try to download a
certificate from a website, prior to loading PSM. My patch in bug 108600 fixes
the problem.
Depends on: 108600
Comment on attachment 57094 [details] [diff] [review]
Suggested fix

Make sure licenses for new files are correct.  Get someone on dom team to
review dom changes.

Other than that:

r=javi for security changes.
I *really* don't like the idea of having all this non-DOM related code in the
DOM library, I'm happy to provide whatever hooks it takes to make this work as
we want it to work, but moving this code into the DOM seems just wrong.

Would it not make more sense to create a little static library in the security
code and make the DOM link statically with that library?
jst told me Mozilla does already use this strategy in mozilla/content/shared,
which builds a static lib, and gkcontent.dll and gklayout.dll links with the
static shared lib.

I will look at the example and create a new patch.
Depends on: 109777
Javi asked me to test whether obtaining a certificate with this patch still works.

While testing, I found bug 109770, which blocks this one.
Depends on: 109770
When considering the library structure, remember that PSM is (currently) an
optional (extension) module.  Currently, NOTHING in the PSM area is built if
BUILD_PSM2 is not set.  That would include any static library and header files
that you put there.

There is no way to avoid putting some amount of code (or interfaces) in the base
modules.  If it's not in the DOM code, then fine, but it needs to live someone
that is always included in the build.
jst:

I looked at the mentioned example shared library. It does not implement any
interfaces, but we need to do that in our code.

I think we would have to add some code to your module anyway, at least to
nsDOMFactory.cpp. Your wish would make your and our code more complicated, and
we don't have a good example in the Mozilla code already lying around, but would
have to invent something new.

But basically I dislike the idea of inventing new hooks. We already have a
method to use hooks, that's XPCom. But I would like to avoid adding yet another
XPCom module. In my opinion, having many XPCom modules has a negative impact on
the overall application overhead and should be avoided, too.

If we are required to load one security related XPCom module during startup,
what about the following suggestion:

Currently, security does already use two different modules.

What about moving all stuff that is required initially to the module PKI? That
would mean, one small security module will be loaded during startup. But it will
not be the the NSS module, it will be the smaller PKI module.

dbaron: Do you think it would fix this bug if we only load the PKI library
immediately, and move the code from the patch to PKI instead to DOM?

To mention some numbers: Currently PKI module has about 10% of all security
code. Not counting NSS, which reduces this number even more.
Do not add any implementations to the pki module that will force it to *always*
be loaded.  The purpose of the pki module is to provide default UI for security
related tasks and nothing else.  Embedders will implement the subset of UI
operations they care about and I'm afraid that adding interfaces to pki that
force it to get loaded every time will cause undesired conflicts with the
implementations of embedders.
Thanks, Javi.

I think that means we have to go with a 3rd XPCom module for security, that will
(for now) only contain the code that we intended to move to dom.

This means, the new small security library will always be loaded during startup,
the other two libraries on demand.

If you think, this is not acceptable, but a solution is sought where no security
library is loaded at all during startup, please object now.
So we're going to create a brand new XPCOM interface to do that?
XPCOM is one of the reasons why the application has performance problems.
We're fixing this bug for performance reason as the request of the performance
group.

Could we revisit the options of having the code in the DOM code? Just "seems
wrong" may not be sufficient reason to go to all this trouble.

jst: can you elaborate why this is so bad to have this bit of code live in the DOM?

Performance folks: if we're going to fix this for performance reasons, shouldn't
we avoid adding bloat based on a code location issue.

What we're trying to do is to get executable code executed at the proper
location in the life of the application. In my humble opinion, achieving this at
the lowest engineering cost, and in the most performance/footprint efficient
manner may very well supercede objections at to where the text code lives. The
end user doesn't care about the latter.

My 2c.
We do need to make sure to preserve modularity where it makes sense.  However, I
would say since the dom module already provides the nsIEntropyCollector
interface, it would make sense for it to provide a "stub" implementation.
Please see also discussion in bug 105167, which discusses whether automatic
read/only mode for NSS should be used in case of unavailable read/write mode.

This can influence the patch we create in this bug. It would simplify it a
little, as the trick to "disable PSM from loading if NSS can not initialize"
would become obsolete. This trick would not work for the "switch profile during
runtime" case, where NSS needs to be re-initialized.

However, all the other above questions still apply.
I will re-implement this using a new 3rd XPCom module.
Fixing this bug will even have a greater impact than I thought.

I saw that currently, if one has at least two profiles, NSS is actually
initialized twice. Init - profile selection box shows up - shutdown - init again.

Once we fix this bug, both inits will be avoided during startup.

However, as NSS init takes some time, the same destruct and init sequence might
happen again at later times. I wonder whether we should add code that makes sure
NSS is at most initialized once per session (at least not destructed until the
profile is changed).

To make sure the NSS service does not get destructed, because currently nobody
holds a reference to it, should we make NSS increment its own reference count?
As a sidenote, I wonder whether this applies to other global services as well,
being created and destroyed multiple times, hurting performance.
Blocks: 99566
I had a chat with Conrad Carlen. He said during profile creation,
unloading/loading of NSS will happen even more times, around 4 times.

Anyway, even if it is only two times it is two much I think. I will use the
trick, to use NS_ADDREF_THIS in the constructor of nsNSSComponent.

This instance can live for the whole session.

In "turbo mode", it can live between profile switches, too, and will care on its
own for re-initializing of NSS (as implemented in bug 99566).

This imposes the risk that we leak on shutdown. Actually we are already doing
this, caused by code in nsNSSIOLayer.cpp (InitNSSMethods). I will remove that
other leaking reference and just make sure that NSS has already been initialized.

But we should avoid the leak and make sure we actually clean up on shutdown.
That includes making sure that NSS_Shutdown is actually called when we exit the
application. We don't do that currently.

To accomplish this, my idea is to register the nsNSSComponent with the observer
service and watch for an application shut down event. We we see that event, we
call NS_RELEASE_THIS.

Conrad suggested the event NS_XPCOM_SHUTDOWN_OBSERVER_ID should be appropriate.
>Anyway, even if it is only two times it is two much I think. I will use the
trick, to use NS_ADDREF_THIS in the constructor of nsNSSComponent.
>
>This instance can live for the whole session.

Why not just make it a service?
Hmm, what exactly do you suggest? I thought it is already a "service" because it
is only constructed using do_GetService, and designed to be a singleton.

However, the first reference to this service only lives for a short time during
startup. And when this reference is freed, the instance is destroyed, NSS unloaded.

Is there a possibility to declare something as a service, that won't be freed
once it is loaded, which I'm not aware of?
> He said during profile creation, unloading/loading of NSS will happen even
> more times, around 4 times.

The reason this happens is that the creation of the component fails so it's
never assigned to the service mgr which would maintain a ref to it.

Blocks: 105167
This is a reimplementation of the previous patch using a new 3rd XPCom module
that lives inside mozilla/security.

I was working on several related bugs together this week and I realized they
are correlated and the patches would have had conflicts. Therefore, in this
patch are included the fixes for bug 99566 and bug 105167, too.

Because of 105167, we now always support some kind of read-only mode. Not only
we have been asked to support that - there are good reasons to do it because of
profile switching. The "NSS read/write works" environment can cange between
different profiles. 

As NSS/PSM init no longer depends on the profile directory or good certificate
databases, I removed the trick with IsPSMWorking from the factory construction
code.

We still need to make sure that PSM behaves nicely while in NSS fallback mode
(this includes, as described above, XUL preference overlays enhancements), but
that's a different bug.
Attachment #57094 - Attachment is obsolete: true
Comment on attachment 58021 [details] [diff] [review]
Updated patch / security directory only

r=javi (With caveat of removing the debug statements from Makefile.in and
removing the DLL variables in the IDL makefile.win.)
Attachment #58021 - Flags: review+
No longer depends on: 96058
I was going to ask about making a plugin to handle the certificate mime types. 
If the plugin were called the plugin could init psm on demand.

But then i remembered, our arch doesn't trigger plugins for links, only inline 
content.
timeless: the right bug for discussion would be 109777. I got new ideas from
bbaetz, he suggested to use nsIDocumentLoaderFactory. I will look into it and
post my results in bug 109777.
Attached patch Fixed patch (obsolete) — Splinter Review
After testing some changes were necessary, here is the complete list:
- makefile changes as requested by Javi
- updated REQUIRES list in Windows makefile
- add missing NS_INIT_ISUPPORTS() to nsEntropyCollector::ctor
- move mutex deletion from beginning to end of nsNSSComponent::dtor
- return success from nsNSSComponent::Initialize, even when
  working in NSS fallback mode
- When I removed the special factory macro, I ignored the
  fact that NSS must still be loaded before any other instance in
  the NSS module is created. I re-added the adjusted factory
  macro - but now triggering init only, no check being made, and
  to module nss only, not to module pki.
Attachment #58021 - Attachment is obsolete: true
+  else if (nsCRT::strcmp(aTopic, NS_XPCOM_SHUTDOWN_OBSERVER_ID) == 0
+          || nsCRT::strcmp(aTopic, "quit-application") == 0) {
+    PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("nsNSSComponent: XPCom shutdown
observed\n"));

Are you sure you want to observe "quit-application" for this as well? I think
that notification gets sent when the user exits from the file menu. With turbo
mode, this doesn't mean that the app is really shutting down. I think observing
xpcom shutdown is enough.
Comment on attachment 58081 [details] [diff] [review]
Fixed patch

r=javi
Attachment #58081 - Flags: review+
Conrad: Ok, if you think "xpcom-shutdown" is enough, I'll remove the
"quit-application" observation code before I check it in. I registered for that
additional event, because I saw problems while testing, not receiving any
notifications. But I meanwhile learned that there is a known issue - see bug 110226.
Note that in my previous attachment, about 50% of the patch is caused by two
files who move to a new directory.

The files are nsSecureBrowserUIImpl.h and .cpp.

When reviewing, you can ignore those 4 hunks in the patch, and look at this
smaller file instead. It shows you the changes that I made to those files.
In light of bug 110226, I'll have to take that back :-/ If that isn't fixable
right away, just check that "quit-application" doesn't come at an unexpected
time with turbo mode.
conrad was right the first time - use "xpcom-shutdown"
Now the trick is that you don't want to ever remove yourself from the
xpcom-shutdown notification - just let yourself be removed by the observer
service when the product shuts down. That bug about observer service corruption
happens when an observer is removed during notification. ignore
"quit-application" - its a bit of a hack, anyway.
Problem is, with bug 110226, if the observer *before* this one removes itself,
we wont get notified. We'd have to yank calls to RemoveObserver() in response to
xpxom-shutdown all over the place.
Attachment #58081 - Attachment is obsolete: true
Attachment #58122 - Attachment is obsolete: true
Comment on attachment 58153 [details] [diff] [review]
To make reviewing easier / Changes of moved files only

ok, it all looks good. sr=alecf
I want to make one additional change, which is necessary. In my patch, in method
GetNSSDialogs, I use weak typing, by using nsISupports** as an argument. I need
to change that to be strongly typed to nsISecurityWarningDialogs**.
Comment on attachment 58152 [details] [diff] [review]
Updated Patch / as discussed with alecf

Updating this patch to indicate that it has r=javi and sr=alecf.
Attachment #58152 - Flags: superreview+
Attachment #58152 - Flags: review+
Comment on attachment 59366 [details] [diff] [review]
The additional little change I need to make

sr=alecf
Attachment #59366 - Flags: superreview+
Comment on attachment 59366 [details] [diff] [review]
The additional little change I need to make

r=javi
Attachment #59366 - Flags: review+
Sorry for the patchwork.

Conrad helped me do more testing. We found two issues, which the attached patch
corrects.

1) I did not expect that NSS could be initialized even before a profile is
selected. This happens in the "commercial application activation" phase.
Therefore I'm adding safety checks to the profile event observation code.

2) Our special factory code needs to be more clever. If the nsNSSComponent
itself is the first object requested, the "haveLoaded" flag needs to be set
immediately, in order to prevent a recursion. This can occur because the
nsNSSComponent class triggers the factory code in its init phase.
Comment on attachment 59439 [details] [diff] [review]
Another required incremental patch

Seems that adding the parameter to the NS_NSS_GENERIC_FACTORY_CONSTRUCTOR
 is unnecessary since all instances of the macro pass in PR_FALSE anyway.  I
don't see how  that parameter will ever be true, since teh value is just passed
through to sub callers.  How does this solve the problem?
Javi, the patch contains the following line which has the parameter set to true:

+NS_NSS_GENERIC_FACTORY_CONSTRUCTOR_INIT(PR_TRUE, nsNSSComponent, Init)

Comment on attachment 59439 [details] [diff] [review]
Another required incremental patch

OK.  r=javi
Attachment #59439 - Flags: review+
Comment on attachment 59439 [details] [diff] [review]
Another required incremental patch

sr=alecf but I really think you ought to re-think
NS_NSS_GENERIC_FACTORY_CONSTRUCTOR_INIT and consider using something like
NS_IMPL_NSGETMODULE_WITH_CTOR_DTOR 


 I really don't understand why this has to be so complicated. 

where are you passing in PR_TRUE to this?
Attachment #59439 - Flags: superreview+
If we can find a simpler way, sure.

Let me summarize what we want:
-----------------------------------------------------
Class nsNSSComponent, which implements several interfaces, does the core
initialization.

We must not instantiate any other class from this module, until the
nsNSSComponent service has been created, because of the required (conditional)
NSS init.

Therefore, we need to hook into instance creation. We do that by using a
modified version of NS_GENERIC_FACTORY_CONSTRUCTOR. We modified it to include a
call to EnsureNSSInitialized.

If anything else but nsNSSComponent is requested, the function create its first,
before the requested object is created.

Before the previous patch, we used the standard factory constructor the
nsNSSComponent.

But, this lead to the following:
- nsNSSComponent is created
- before the service had the chance to return from its init phase and become
globally registered, the class itself needs more instances from our module (for
content listener registration)
- in order to create the other instances, our special factory code is reached
- this sees that the indicator variable "haveLoaded" is not yet set, and tries
to create it again => recursion

The previous fixes that, it makes sure our haveLoaded indicator variable is
already set during nsNSSComponent creation, so that later factory code
invocations will not try to do it again.
-----------------------------------------------------

That said:

I wasn't aware of the existence of NS_IMPL_NSGETMODULE_WITH_CTOR_DTOR.
It seems you can execute code when the module is loaded.
In theory, we could use this to do our initialization.
But without rewriting our code, the following pseudo were required within that
module constructor:
  do_GetService(nsNSSComponent)

Without knowing better, I'd guess it is not possible to create an instance from
the load-in-progress module at this point in time.

If you say, this is possible, I can make a simple change to get rid of the
modified macros and can test whether it works.

If you say, it is NOT possible, we would have to rewrite the way the
nsNSSComponent does its initialization. But note the NSS init is not trivial. We
need to obtain misc information and query other services in order to do the
right thing, which all must be allowed from within the module constructor.
Blocks: 101263
all patches checked in, fixed
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
this is awesome!!!

tinderbox is showing:
-78.9% on leak ( 1.30MB down to 274KB )
-12.4% on heap size ( 9.75MB down to 8.54MB )
-5% on startup ( 6.85s down to 6.59s )

Thanks to kaie, dbaron and everyone else who helped!!!
The leak fix might well be from waterson's checkin for bug 45353, you should ask
him if it's about that magnitude. If PSM is really leaking that much then it's
something we've got to fix -- it will eventually start up, this only delays it.
No, there was another tinderbox building without --enable-crypto (on Ports) that
showed a leak drop of 28K due to waterson's checkin (and no bloat (MH) gain). 
That implies this is almost entirely this checkin, which is what was expected
from our previous measurements of NSS memory usage.
Verified.
Status: RESOLVED → VERIFIED
Product: PSM → Core
Version: psm2.1 → 1.0 Branch
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.