Closed Bug 501605 Opened 15 years ago Closed 15 years ago

very slow startup for Firefox 3.5 due to accessing IE Internet Temporary Files and Windows Temp folder

Categories

(NSS :: Libraries, defect, P1)

3.12.3
x86
Windows XP
defect

Tracking

(status1.9.2 beta1-fixed, status1.9.1 .2-fixed)

RESOLVED FIXED
3.12.3.1
Tracking Status
status1.9.2 --- beta1-fixed
status1.9.1 --- .2-fixed

People

(Reporter: amir.spamable, Assigned: nelson)

Details

(4 keywords, Whiteboard: [summary in comment 121-122] [fixed in mozilla-1.9.1 nightlies] [PLEASE NO MORE "ME TOO"] [FIPS Thaw][ts])

Attachments

(7 files, 4 obsolete files)

382.25 KB, application/zip
346.90 KB, application/x-zip-compressed
484.19 KB, application/octet-stream
9.69 KB, patch
6.20 KB, patch
2.85 KB, patch
3.17 KB, patch (nelson: review+)
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1) Gecko/20090624 Firefox/3.5

I tried the final version of Firefox 3.5 today. It starts up very, very slowly (almost more than two minutes). I have a very powerful machine at work (Q6600 with 4 GB RAM). Here is what I have done to try to resolve the issue:

- Cleared all the cache (also from IE, just in case, as I read somewhere that the IE cache can also have an impact on startup)
- Disabled the update check for all three things
- Tried to compact the database with the VACUUM command
and finally, when things didn't go well, I removed the profile folder from Documents and Settings (removed the Mozilla folder).
Even after doing all of the above I still have a very slow first startup of Firefox after a reboot; subsequent startups are fine (3-4 secs). I have also been experimenting with the earlier releases (beta 2, 3, 4 and RC1); as far as I can remember, the problem started after beta 4.

Reproducible: Always

Steps to Reproduce:
1. Start Firefox after reboot
Actual Results:  
starts up in 2 mins

Expected Results:  
should start up in under 10 secs

I also tried a clean install with no plugins/extensions; that didn't help either.
You could help by finding the regression range. 
See https://wiki.mozilla.org/MozillaQualityAssurance:Triage#How_to_Help_with_Regressions_--_Finding_Regression_Windows

Trunk series look like: 200x-xx-xx-mozilla-central/
and branch series: 200x-xx-xx-mozilla-1.9.1/

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
Blocks: 501563
Keywords: perf
Version: unspecified → 3.5 Branch
I am using Firefox 3.5 as well and I am having the same problem. Mine takes several minutes to start-up and during that time it uses a lot of CPU, sometimes up to 50%. I am having this problem on my Windows Vista Home Premium (32-bit) operating system, and I have all the latest updates installed. 

Here's what I have tried in an attempt to fix the problem I'm having:

- I created a new profile and had Firefox use that but unfortunately there was no change in the speed of start-up.
- I cleared the browsing history, download history, form & search history, cookies, active logins, and the cache. This did not improve the speed of start-up.
- I tried starting Firefox in safe mode, but no change happened in speed.
- I ran Disk Cleanup, made sure all the boxes were checked and cleared everything. No change happened in speed.
- I changed my homepage to "about:blank". This did make the browser start up faster but it also froze it and I was forced to use Task Manager to kill firefox.exe.

I hope this problem gets fixed very soon because I don't want to be forced to have to use a different web browser. :/
Which antivirus do you have? Have you tried uninstalling and reinstalling the antivirus?
no AV here :)
I am using Avira AntiVir Personal.
Same problem as the others here: it appeared after updating to 3.5, and I have tried everything listed above. It takes 4 to 10 minutes on a Core 2 Duo P8400 with 8GB RAM and a 500GB hard drive that is 80% free. Internet Explorer 8.0 opens in less than one second. Windows Vista Home Premium 64

Antivirus: Kaspersky Internet Security 8.0
I've seen other reports of the issue on Italian forums, but I can assure you that here 3.5 starts up faster than 3.0.x, so something external must be involved in the issue.
I suggest following the steps in http://support.mozilla.com/kb/Basic Troubleshooting to see if a clean install solves the issue.

Also make sure your antivirus is updated to the latest version, and if necessary temporarily disable its web surfing checks.
Can anyone who is experiencing this issue please check the Error Console for any suspicious entries?
Anybody on Windows seeing the issue, please try the following:

In the add-ons manager's plugins section, try disabling all plugins one by one; neither safe mode nor a new profile disables your plugins. If this is due to a plugin, you should be able to find out which one.

Be sure you don't have any malware; some time ago Conficker created problems for many users. Try this removal tool: http://www.symantec.com/security_response/writeup.jsp?docid=2009-011316-0247-99

Use Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx): it will tell you which files are in use and by which process, so if anything is accessing the temp folders you should be able to see it.
I haven't tried disabling the plugins yet. However, I ran Auslogics BoostSpeed, a program that combines a registry cleaner, disk cleaner, disk defragmenter and registry defragmenter, and after running all four I noticed it only took 52 seconds for Firefox to start up. So that might have been part of the problem.

And you say it might be plug-ins? Maybe it will help if I provide a list. Here is a list of all the plugins I have, with their versions in "(())" and any additional information in "[]":

- Adobe Acrobat ((9.1.0.163))
- Dyyno Player Plugin ((0.8.6.2))
- getPlus for Adobe 15235 ((1.5.2.35))
- Google Update ((1.2.183.7))
- Google Updater ((2.4.1536.6592))
- iTunes Application Detector ((1.0.1.1))
- Java(TM) Platform SE 6 U10 ((6.0.100.25))
- Java(TM) Platform SE 6 U10 ((6.0.100.25)) [the first says it is a binary, the second says it is a plugin for Netscape Navigator and a DLL helper]
- MetaStream 3 Plugin ((3.5.0.36))
- Microsoft Windows Media Player Firefox Plugin ((1.0.0.8))
- Move Media Player ((713.3.0.6))
- Mozilla Default Plugin ((1.0.0.15))
- My Web Search Plugin Stub ((1.0.0.0))
- QuickTime Plug-in 7.6.2 ((7.6.2.0))
- RealJukebox NS Plugin ((1.0.2.3146))
- RealNetworks Rhapsody Player Engine ((1.0.2.603))
- RealPlayer Version Plugin ((6.0.11.3006))
- RealPlayer(tm) G2 LiveConnect-Enabled Plug-In (32-bit) ((6.0.11.3008))
- Shockwave Flash ((10.0.22.87))
- Shockwave for Director ((11.5.0.595))
- Silverlight Plug-In ((2.0.40115.0))
- Turner Media Player Plugin 1.0.0.9 ((0.1.517.1555))
- Windows Presentation Foundation ((3.5.30729.1))
52 seconds is still too much; my Celeron laptop with Vista (a really slow machine) starts the browser in 3s, so please continue to investigate the issue.

I don't have the following plugins from your list, so I would try disabling these first:
- Dyyno Player Plugin ((0.8.6.2))
- getPlus for Adobe 15235 ((1.5.2.35))
- Google Updater ((2.4.1536.6592))
- MetaStream 3 Plugin ((3.5.0.36))
- Move Media Player ((713.3.0.6))
- My Web Search Plugin Stub ((1.0.0.0))
- RealJukebox NS Plugin ((1.0.2.3146))
- RealNetworks Rhapsody Player Engine ((1.0.2.603))
- Turner Media Player Plugin 1.0.0.9 ((0.1.517.1555))
I disabled the plug-ins you listed and the browser took about 48 seconds to start up. I'm going to try disabling all of the plug-ins and see if that helps.
46 seconds with all plugins disabled. I guess it isn't the plugins.
Could you test how Firefox behaves for a new Windows user account? Does it show the same timings on startup?
I really don't want to mess around with my computer's user account settings, I don't feel comfortable doing it and don't want to risk messing it up. :x  

I don't know what the problem could possibly be. My CPU isn't maxed out or anything and I have physical memory available.

I don't know if this helps or not, but I looked at firefox.exe in Task Manager and it says that the memory (private working set) is 83,216K. I'm not sure if it's supposed to be that high or not, so I thought I'd mention it.
Can someone please try installing it at a different location, e.g. a folder named Firefox 3.5, and see what happens? I did so and it started in under 2 secs.

But I should mention that I still need to restart my computer and check the startup time again....
Mine is working fine now. I did several things:

- Installed it at a different location than the default one.
- Ran one-click clean with BoostSpeed

It's starting in 3-4 secs

Thanks guys and good luck
Gavin pointed out to me that NSS 3.12.3 uses the temporary internet files to generate seeds, so this bug could be due to that.

Reasoning:
- 3.0 does not show the behavior, and downgrading to 3.0 solves the issue
- clearing the temporary internet files from IE greatly reduces the problem (startup goes from minutes to seconds)
- NSS appears to be accessing those files

Even though I can't reproduce it, I'm confirming due to the high number of reports in various forums, on mozillazine, and here.
Assignee: nobody → nobody
Status: UNCONFIRMED → NEW
Component: General → Libraries
Ever confirmed: true
Flags: blocking1.9.1.1?
Product: Firefox → NSS
QA Contact: general → libraries
Summary: very slow startup for 3.5 → very slow startup for Firefox 3.5 due to accessing IE Internet Temporary Files
Version: 3.5 Branch → 3.12.3
the related NSS change looks like bug 466745
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Nelson, will the fix for that bug be enough to solve the issue? Reading that bug, it looks like we are doing the work twice, but some users have reported 10 minutes for a startup; even halving that would still be a big number.
Common issue but I'm not 100% convinced that this is the cause.  It may be a compounding problem.
NSS counts the files in Temporary Internet Files, and then opens and reads
in no more than 10 files (and no more than some number of bytes per file) 
which are uniformly spaced throughout the directory, e.g. if the directory
contains 150 files, it reads every 15th file.  Doing this has been reported
to take a measurable number of *milliseconds*, and doing it twice was just 
wasteful.  But I cannot imagine how that process could take multiple MINUTES.
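
For readers following along, here is a minimal Python sketch of the count-then-sample walk described above. It is not the code in win_rand.c: the function names, the use of os.scandir, and the 1024-byte per-file cap are assumptions made for illustration only. The point it tries to show is that both passes enumerate every name in the directory, even though at most 10 files are opened.

```python
import os


def sample_indices(total, max_files=10):
    """Positions of at most max_files entries, evenly spaced through a listing."""
    if total <= max_files:
        return list(range(total))
    # 150 files and 10 picks gives positions 0, 15, 30, ..., 135.
    return [(i * total) // max_files for i in range(max_files)]


def read_sampled(directory, max_files=10, max_bytes_per_file=1024):
    """Two-pass walk: count the files, then read a few evenly spaced ones.

    max_bytes_per_file is an assumed cap; the thread only says
    "some number of bytes per file".
    """
    # Pass 1: enumerate every entry just to count the regular files.
    with os.scandir(directory) as entries:
        total = sum(1 for e in entries if e.is_file(follow_symlinks=False))

    wanted = set(sample_indices(total, max_files))

    # Pass 2: enumerate every entry again and read only the selected ones.
    chunks = []
    position = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if position in wanted:
                try:
                    with open(entry.path, "rb") as handle:
                        chunks.append(handle.read(max_bytes_per_file))
                except OSError:
                    pass  # locked or vanished file: skip it
            position += 1
    return chunks
```

With 150 files, sample_indices(150) selects every 15th position. The number of reads stays small, but the enumeration work grows with the size of the directory.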
I can recall that merely opening that folder in Explorer on some XP systems took more than 10 minutes. Those folders often contain hundreds of thousands of files, so maybe even counting and enumerating them could be slow.
Btw, some user talked about many more than 10 files accessed (hundreds), plus any antivirus will hook in and add its own time.
In bug 500613, it was reported that it took 5 minutes to start Firefox with a temp folder containing 65,000 files (not the Temporary Internet Files folder).
indeed the folders used by NSS are:
C:\Documents and Settings\*user*\Local Settings\History
C:\Documents and Settings\*user*\Local Settings\Temporary Internet Files
C:\Documents and Settings\*user*\My Recent Documents
C:\Documents and Settings\*user*\Temp\
Recycle Bin
Network Neighborhood
Summary: very slow startup for Firefox 3.5 due to accessing IE Internet Temporary Files → very slow startup for Firefox 3.5 due to accessing IE Internet Temporary Files and Windows Temp folder
Nelson's lack of imagination as to what causes this bug does not justify treating it as a duplicate of bug 489811.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Thanks Jesse for re-opening this. I don't know why Nelson decided to close it, after all the problem was not resolved.

When I started Firefox 3.5 today, it started up in 10-20 seconds. But after browsing the web, I decided to test its speed out. I discovered that Firefox 3.5 took 1 minute and 45 seconds to start up. The only thing I can think of is the temporary internet files and maybe the Windows Temp folder as is mentioned in the title of this bug.

The odd thing is that when I searched for "Temp" I found three folders named "Temp" on Drive D, and my operating system is on Drive C. The second and third folders had one folder each in them, but that was it. I also found the Temporary Internet Files folder. That folder was located on Drive D as well and it was completely empty. I'm confused. Again, I have Windows Vista and I know the operating system this bug was reported on was Windows XP so I know the locations of the files/folders are different.
I am experiencing this problem as well. Yesterday I updated Firefox 3.0.x to 3.5.  My installation of 3.0 would typically take about 5-10 seconds to launch after a reboot, and 2-3 seconds to launch after it had already been running.  After I updated to 3.5, it takes 75-80 seconds to launch after a reboot, and 5-10 seconds to launch after it has already been running.

I just spoke with a contributor in the live support chat, who suggested that I clear the browser cache and clear the Windows temp folder.  I did this, rebooted, launched Firefox 3.5, and it now takes about 70-73 seconds to launch.  An improvement, yes, but still really bad.
In addition to the main Windows temp folder, I just went in and deleted the contents of the folders mentioned in comment #30 by Marco.  Amazingly, this actually increased the launch time: it took 92 seconds to launch.
> some user talked about many more than 10 files accessed (hundreds),
Got real evidence?

It's not difficult to impose limits on the number of files counted 
(in addition to the existing limits) but the affected code is now undergoing 
FIPS evaluation, and is frozen.  In the event that the FIPS evaluation finds
a flaw, there may be a window of opportunity to "thaw" the tree and fix it.
We'll see.
Whiteboard: FIPS Thaw
(In reply to comment #36)
> > some user talked about many more than 10 files accessed (hundreds),
> Got real evidence?

Process Explorer was the evidence... If I could reproduce it in any way I would, and I'd bring more. Unfortunately, as of now we have to rely on the large number of users reporting the problem, and evaluate their information.
OK, krystaiceman@yahoo.com has procmon logs of a slow startup available for anyone who wants them.  A zip of the CSV version of said log is attached.

What jumps out at me from the log are the many many _directories_ there are in the temp folders.  As far as I can tell, the following happens:

1)  Our limit on bytes read is per system folder, not overall, as far as I can
    see.
2)  Our walk counting files involves reading all the filenames from all the
    many directories; this takes some time.
3)  Our walk reading files involves exactly the same thing again (due to the
    "read some random files scattered through the list") behavior.

Does Windows have some way of measuring the size of just the directories in the relevant temp folders that someone who sees a slow startup could try?
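
For anyone seeing a slow startup who wants to report numbers, a small script can count the files and subdirectories under a temp folder. This is only a sketch: count_entries is a hypothetical helper, and pointing it at tempfile.gettempdir() is an assumption (that call returns the current user's temp directory, which is only one of the folders discussed in this bug).

```python
import os
import tempfile


def count_entries(root):
    """Return (file_count, directory_count) for everything below root."""
    files = 0
    dirs = 0
    # os.walk silently skips directories it cannot list.
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    root = tempfile.gettempdir()
    files, dirs = count_entries(root)
    print("%s: %d files, %d directories" % (root, files, dirs))
```

Other folders can be measured by passing their paths to count_entries directly.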
Same problem for me. Firefox has always been slower to start than IE or Chrome. Firefox 3 took around 10 seconds to start. But 3.5 takes over 60 seconds. Meanwhile it hogs all CPU and memory, so any other running programs freeze. Same issue even if no other apps are running.

I have no themes and 3 simple add-ons. Even disabling add-ons has no effect.

Using Win XP SP3 on Core Duo, clean system with no themes, chat windows or anything other than the necessary system background apps running.
(In reply to comment #38)
> What jumps out at me from the log are the many many _directories_ there 
> are in the temp folders.  

Yes, NSS bounds the *depth* into which it will descend in such folders, but
not the breadth.  Definitely need to limit that.  

> As far as I can tell, the following happens:
> 
> 1)  Our limit on bytes read is per system folder, not overall, as far as 
>     I can see.

Why do you say that?  The limit is definitely for the entire scan.  In fact,
If you look at all the references to totalFileBytes, e.g. 
http://mxr.mozilla.org/mozilla1.9.1/search?string=totalFileBytes&find=win_rand.c
you will see that the count is never reset, to zero or any other value.  
It just keeps being added.  

When NSS hits the "limit", the effect is not to stop right there, but to change 
the behavior to read at most one 1KB buffer of data from at most one file per 
directory after that.  This is by design.  The idea is that, if we read a 
certain number of bytes before having read in the 20-21 files, then we greatly
reduce the number of bytes gathered from each of the remaining of the 20-21 
files to be read.  

> 2)  Our walk counting files involves reading all the filenames from all the
>     many directories; this takes some time.
> 3)  Our walk reading files involves exactly the same thing again (due to the
>     "read some random files scattered through the list") behavior.

It reads file uniformly spaced through out the list.  The only randomness in the set of files that are read comes from the OS'es alteration of the directories' contents from run to run.    

Correcting what I wrote above, NSS actually reads the first 10 files, and 
then 10 more files spaced uniformly throughout the list.  This is intended 
to maximize the probability that it does not read exactly the same files each 
time Firefox is started.  It is intended to never read more than 20 or 21 
files total.  
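
A hedged sketch of the corrected selection rule (first 10 files, then 10 more spaced uniformly through the rest of the list). The function name and the exact spacing arithmetic are assumptions, not taken from win_rand.c, and this version caps at 20 files rather than modelling the "20 or 21" edge case or the byte limit.

```python
def select_positions(total, head=10, spaced=10):
    """Positions to read: the first `head` entries, then up to `spaced`
    more spread evenly over the remainder of the listing."""
    head_count = min(head, total)
    picked = list(range(head_count))
    rest = total - head_count
    if rest <= spaced:
        # Few entries left: take them all.
        picked.extend(range(head_count, total))
    else:
        # Spread the remaining picks evenly over the tail of the listing.
        picked.extend(head_count + (i * rest) // spaced for i in range(spaced))
    return picked
```

Because the spaced positions depend on the total count, a directory that grows between runs yields a different second half of the selection, which matches the stated goal of not reading exactly the same files at every start.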

> Does Windows have some way of measuring the size of just the directories in 
> the relevant temp folders that someone who sees a slow startup could try?

I wish I knew.  As Windows directories grow, they always grow at the tail.  
So, if we limit the scan to the FIRST N files in each directory, then as the 
directories reach that limit and continue to grow, we always scan/read the 
same files with each process start.  We really try to avoid that.  If there 
was some efficient way to limit the scan to the LAST N files in each directory, 
that would be much better.  If we could efficiently get a count of files in 
the directory, and then efficiently skip to the N'th next file in the directory, that would undoubtedly speed this up a lot.  

I will analyze the log file more this weekend.
BTW, this whole scheme is not new in FF 3.5.  NSS has used it for years.
What's new in FF 3.5, introduced by Fennec guys, is that we search more 
system directories, and we use the Unicode UTF16 (W suffix) functions rather 
than the multi-byte character set (A suffix) functions to manipulate file
names.  We do some conversion between UTF16 and UTF8, and I think I see LOTS 
of room for improvement in the efficiency there.  We only need to do that 
conversion for the files we actually open/read, but now we do it for all the
files we enumerate.  I'll write a patch for that this weekend.
I apologize for the potential cross-post.

Mine is doing the same thing.  
When multiple instances of Firefox 3.5 are running, you can open new instances
of the program, and they load very quickly.  Even if you have the downloads
window open, a new instance of the program will load quickly.  

When all instances of Firefox3.5 are closed, however, it takes approximately 30
seconds-1 minute to load.  

I have not tried any of the workarounds listed above, since frankly, it should
never take this long to load.

Computer = Gateway with Intel Centrino Duo, 2GB RAM, Vista Home Premium
(In reply to comment #44)

You can't open multiple instances of Firefox (unless you use the -no-remote flag on the commandline, but if you did that, then I don't need to explain what follows).

Each window will actually be opened inside the same Firefox application, even if you try opening a new application by clicking on the icon on your desktop. The new application closes immediately, and it's the old one that opens the window. That's why the normal startup routine is skipped.
I think this has been discovered before, but here's what I found:

After noticing Firefox was taking a long time to start today, I ran Auslogics BoostSpeed's Disk Cleanup utility. 

What I found was that in my "C:\Users\Jonathan\AppData\Local\Temp" directory, there were 25,279 files. After removing all these files, Firefox started in less than 10 seconds.

What I would love to know is why so many files accumulate there and if there is a way for Firefox to automatically delete all those files when it is closed.
They're not created by Firefox, but by ALL your other applications (and the operating system). Some are left by lazy applications (which forget to delete their temporary files), some are left by applications that have crashed (and so could never delete them in the normal way), and some by applications that are still running (often impossible to delete). Sometimes you see a few files or folders that are created by Firefox, like the plugtmp folders or temporary files created during downloading. Unless Firefox crashes, they will be cleaned up when the application closes.

Firefox shouldn't delete those files just like that. There are special applications for that purpose, like the Auslogics BoostSpeed Disk Cleanup utility (if they weren't necessary, they wouldn't exist). Or you can go into that directory and clean it manually.
For people with HUGE temp directories, it would be useful to know what 
percentage of those files are:
a) more than 7 days old
b) more than 48 hours old, but less than 7 days old
c) 24-48 hours old
d) less than 24 hours old.  

Firefox (not NSS) might do users a service by cleaning up really old temp 
files if this is a wide spread problem.  
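
For anyone willing to gather the numbers requested above, a rough Python sketch follows. It assumes that a file's last-modified time is a fair proxy for its age, which the request does not specify, and the names bucket_for and age_report are made up for this example.

```python
import os
import time

BUCKETS = (
    "more than 7 days",
    "48 hours to 7 days",
    "24 to 48 hours",
    "less than 24 hours",
)


def bucket_for(age_seconds):
    """Map a file age in seconds to one of the four buckets (a) to (d)."""
    hours = age_seconds / 3600.0
    if hours > 7 * 24:
        return BUCKETS[0]
    if hours > 48:
        return BUCKETS[1]
    if hours >= 24:
        return BUCKETS[2]
    return BUCKETS[3]


def age_report(root, now=None):
    """Percentage of files under root in each age bucket, by modification time."""
    now = time.time() if now is None else now
    counts = dict.fromkeys(BUCKETS, 0)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished or is inaccessible
            counts[bucket_for(now - mtime)] += 1
    total = sum(counts.values())
    return {
        label: (100.0 * count / total if total else 0.0)
        for label, count in counts.items()
    }
```

Calling age_report on the temp folder path returns a dictionary of percentages that can be pasted into a comment.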

PLEASE, NO MORE "ME TOO" COMMENTS!
Whiteboard: FIPS Thaw → FIPS Thaw - PLEASE, NO MORE "ME TOO" COMMENTS!
Sorry for the OFF:
Could someone enlighten me why Firefox is counting and reading some files in temp folder?
re comment 50, iirc nss uses it as a source of randomness

re comment 49, no, applications have no business deleting files they don't own without asking, and a browser is not the tool to offer to clean up random files.
> The limit is definitely for the entire scan.

Ah, indeed.  I was definitely wrong on that one.

> It reads file uniformly spaced through out the list. 

Right; I was a bit imprecise there.

> If there was some efficient way to limit the scan to the LAST N files in each
> directory

ccing some folks who might know...

> What's new in FF 3.5, introduced by Fennec guys, is that we search more 
> system directories

Hmm...  Do the new directories tend to have more files or something?  Looking at the old code, if I read it correctly at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/freebl/win_rand.c&rev=1.15&mark=239-245,256-257#237 we looked in three different directories on WinCE but only in whatever GetSystemDirectory returned on desktop Windows.  So we were not in fact reading these huge temp directories... Nor were we recursively descending into subdirectories, except on WinCE.  Again, unless I'm missing something. Am I?  (I wouldn't be surprised if I am; I'm not that familiar with the Windows API calls involved...)

> We do some conversion between UTF16 and UTF8, and I think I see LOTS 
> of room for improvement in the efficiency there.

That's possible, but the numbers I'm seeing in this bug for startup and temp dir sizes are on the order of a second or three per thousand files in the temp dir.  It shouldn't take on the order of a millisecond to do an encoding conversion like this.  What _could_ take on the order of a millisecond (or at least an appreciable fraction of that), though, is a stat() call or equivalent.  We're now doing two per file per system directory we look at, right?  One when counting files and one when reading data; we always do the stat because we need to decide whether the file is a directory...  Or is that information stored in the parent directory on Windows?
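
One way to check the per-file cost guessed at above is to time a directory enumeration with and without an explicit stat on each entry. The harness below is a sketch in Python rather than against the Windows API NSS uses; time_enumeration is a hypothetical helper, and results will vary with the filesystem, cache state, and any antivirus hooks.

```python
import os
import time


def time_enumeration(directory, with_stat):
    """Enumerate directory once; return (entry_count, elapsed_seconds).

    with_stat=True forces an os.stat() call per entry; with_stat=False
    only asks the directory entry whether it is a directory.
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if with_stat:
                try:
                    os.stat(entry.path)
                except OSError:
                    pass  # entry vanished or is inaccessible
            else:
                entry.is_dir(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start
```

Dividing the elapsed time by the entry count gives a per-file figure to compare against the "second or three per thousand files" estimate. Running each variant twice helps separate cold-cache from warm-cache behavior.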

I pushed the fix for bug 489811 to try server; hopefully that will produce some Windows builds that people can try to see the impact.  It would also be really useful to get some file access logs from a build with that change; that might help reduce the noise in terms of sorting out the remaining issue.  Once those builds are available I'll post the link to them here.

Once we have Nelson's fix for the encoding conversion thing, we should try creating a test build with that too...

Nelson, thanks for looking into this!
Boris, in answer to some of your questions in comment 52, 
The idea of the new directories is not that they have more files, but rather 
that their contents are more dynamic, not as static, hence more entropy is 
gathered.  

Windows' file-name enumeration API that we use returns info about whether 
the file name is a directory, so we don't do any explicit stats as we count
files. Based on the log file, I'd say there's no stat buried down in that API 
either.

The log file shows results for two processes, one of which spawns the other.
There are 10384 unique lines in the first  log (pid 7140), and 
there are 13645 unique lines in the second log (pid 3428).
The first  process takes 28.0 seconds elapsed time. 
The second process takes 59.x seconds. 

In the ~10k lines in the first log, freebl's enumerating of files doesn't
begin until line ~7050, which is 2/3 of the way through the file, and it 
ends at line ~9256, so it takes only a total of ~2200 lines out of ~10k.
It begins 22.4 seconds into the process lifetime, and ends 33.7 seconds 
into the process lifetime, so it takes only ~11.3 out of 28 seconds, which 
is ~40.4% of the elapsed time.  

To me that says that, even if you could reduce the time spent on these files
to zero, the startup time would still be very big.  It says that this file
enumeration and reading is only part of the problem, and not even the majority.

I am still analyzing these logs.  It's quite tedious.  I will attach my 
filtered and simplified version of the logs to this bug soon.
The log files show that the code is reading in the expected number of files.
They also seem to show something else: Windows sorts the file names in the
directories into an alphabetical order when enumerating the file names.  
I imagine that's pretty slow in huge directories, even if the code then only
reads a few files.  I wonder if there's an alternative API that doesn't sort
the file names.  

But still, this is less than half the total startup time for process 7140.
I will study the log for the second process next.

IINM, Firefox comes up, and decides what profile to use, and starts a new 
second process to actually use the selected profile.  It seems unfortunate
to startup NSS's softoken in both processes.  I wonder if NSS is needed at 
all in the first process.
In the log file for the second of the two firefox processes, PID 3428, 
freebl begins its scanning of files 13.3 seconds into the process lifetime,
at about line 6530, and ends its scanning 9.7 seconds later, at about line 8710.
I don't know how much time elapses after that scan finishes before the user 
perceives that the process is started up and ready.  But let's assume it's zero
(worst case for NSS).  (The log actually goes on for ~5000 more lines and 
~40 more seconds after that, but I don't know when it was stopped.)

Here again in this process, the entire time spent by NSS's processing of files 
is less than half of the total startup time, perhaps much less.  I'm not trying
to excuse NSS's startup time.  I'm merely saying that Firefox must not assume
that NSS inefficiency is the entire problem, or that NSS improvements will 
make startup time acceptable.
> The idea of the new directories is not that they have more files,

Right; I'm asking whether in practice they end up having more files, thus leading to the slower startups we're seeing.

The log file times from comment 54-56 are curious; that data doesn't match up well with the reports that clearing the temp directory in fact speeds up startup by a significant amount...  And yes, I don't think this issue can account for all of our known startup slowness: even with temp directories cleared it takes 10 seconds or so to start up.

The other thing that might be interesting to try is putting up a test build with the freebl scan disabled just to get direct measurements of how long that takes from a user-perception-of-startup viewpoint.  Maybe there's something else reading those temp directories too (though that seems unlikely).

I can't speak to not initializing NSS for the profile thing; it's odd that we do it at all there.  Ccing Benjamin in case he knows what's going on there.

One other interesting question is whether this file scan could happen on a background thread instead of the main thread, but it seems like that should be a last resort even if it's possible...
(In reply to comment #57)
> I'm asking whether in practice they end up having more files, 

The old scheme went through directories that were large (many files, ~5900 
on my system) but were quite static, changing little and seldom.  The new
scheme goes through directories that may be empty or may be huge, and that
change frequently.  I think that, for many (most?) users, the numbers of 
files in these directories is no more than in the old scheme, but for some
the new directories are MUCH larger, and we hear from them.  :)

I am surprised we didn't hear more about this from the Beta users.

> The log file times from comment 54-56 are curious; 

I suspect that the logging affected the times pretty significantly.
I've attached my filtered log files here. They're about half the size as 
before, due to the elimination of much redundant info.  

> might be interesting to try putting up a test build with the freebl scan 
> disabled just to get direct measurements of how long that takes 
> from a user-perception-of-startup viewpoint.  

I agree that test would be good, for two purposes:
a) to get absolute measurements of the effect, and 
b) to see how it affects perceived times.
But we would not want people to do any banking or other crypto using such 
a build.

> this file scan could happen on a background thread 

The PRNG initialization needs to happen early in the NSS initialization, 
so we can't let NSS initialization complete until it's done, but maybe all
of NSS initialization could be done on a background thread.  The questions
are:
a) what (in FF) needs NSS, and 
b) Is it OK if it is delayed until after the rest of the browser has completed
initialization?  

But I agree about that being a last resort.  We've got lots of other things to
try first.
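
Purely to illustrate the "last resort" idea, here is a generic Python sketch of running a slow initialization on a background thread while callers block only when they first need the result. It says nothing about how NSS or Firefox are actually structured; BackgroundInit and its methods are hypothetical names.

```python
import threading


class BackgroundInit:
    """Run init_fn on a worker thread; get() blocks until it has finished."""

    def __init__(self, init_fn):
        self._done = threading.Event()
        self._result = None
        self._error = None
        worker = threading.Thread(target=self._run, args=(init_fn,), daemon=True)
        worker.start()

    def _run(self, init_fn):
        try:
            self._result = init_fn()
        except Exception as exc:
            self._error = exc  # re-raised to the first caller of get()
        finally:
            self._done.set()  # always release waiters, even on failure

    def get(self, timeout=None):
        """Return the result, waiting for initialization if necessary."""
        if not self._done.wait(timeout):
            raise TimeoutError("initialization still running")
        if self._error is not None:
            raise self._error
        return self._result
```

The design question raised in (a) and (b) above maps onto who calls get() and when: if something on the startup path needs the result immediately, the wait simply moves rather than disappears.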
Assignee: nobody → nelson
Priority: -- → P1
Since that Mozillazine topic/thread is now "locked" (no more postings are 
being accepted) I cannot post a workaround there, so I'll post it here.

Windows lets the user control the maximum amount of disk space used for 
the "Temporary Internet Files".  The default setting is extremely generous.
On my old 250GB drive, it was over 8GB.  Setting it down to say 100MB or 
200MB should REALLY help reduce the sheer number of files to be scanned.  

To adjust that, go to Control Panel, Internet Options, General Tab.
Under the Temporary Internet Files heading, click the "Settings" button.
Change the number under "Amount of disk space to use" to 100 or 200.
Click OK. Then, back in the Internet Options, General tab, under the 
Temporary Internet Files heading, click the "Delete Files" button.
Put a check in the box that says "Delete all offline content". Click OK.
Click OK to dismiss the Internet Options panel.  

Wish I could put that into the Mozillazine article. :(
On both Windows Vista systems it is 50 MB for the Temporary Internet Files. I never changed this and didn't import settings, so I'm pretty sure this must be the default value.
> 
> I am surprised we didn't hear more about this from the Beta users.
> 

I'm curious as to why the Test machines didn't see a jump in Ts, unless it's
measuring something not related, which is likely since I'm not a coder, just a
simple tester, and don't have any idea how the tests are constructed and what
they test.

I only use IE for testing against Minefield, so files in my temp folders are
for the most part non-existent, which is why I didn't see any slow-downs.
(In reply to comment #60)
> On both Windows Vista systems it is 50 MB for the Temporary Internet Files. I
> never changed this and didn't import settings, so I'm pretty sure this must be
> the default value.

Also Windows 7 has 50 MB as the default value, and on Windows XP it is likely 640 MB by default (so a lot more).
(In reply to comment #61)
> I'm curious as to why the Test machines didn't see a jump in Ts, unless it's
> measuring something not related, which is likely since I'm not a coder, just a
> simple tester, and don't have any idea how the tests are constructed and what
> they test.
> 
> I only use IE for testing against Minefield, so files in my temp folders are
> for the most part non-existent, which is why I didn't see any slow-downs.

The Talos machines likely saw no slowdown for the exact same reason. I doubt IE has ever been run on our Windows Talos machines.
> Wish I could put that into the Mozillazine article. :(

I had a mod in MZ support forum add the info to the 3.5 FAQ thread that is stickied. 

http://forums.mozillazine.org/viewtopic.php?f=38&t=1326475
I guess that people only do a Disk Cleanup if they want to free up space on their disk, but modern computers have large disks. Also, the option to clean up Temporary Files is not checked by default, so this might be forgotten.
Could also be that people seeing this have a badly fragmented disk, or one that is particularly slow for some reason (a bad cluster hitting one of those folders?). We should not rely on the fact that "we don't read many files", since that could still be too much for a system with slow I/O. NSS is clearly not the main reason for Firefox being slow at startup; still, the change from 3.0 to 3.5 is huge for many users, and this should make us think about the real benefit of reading a bunch of files at startup. I wonder if there can't be a better way to generate randomness (reading random data from memory?).
(In reply to comment #64)
> I had a mod in MZ support forum add the info to the 3.5 FAQ thread that is
> stickied. 

Really, there's a need for a new thread in which to post the try builds and get feedback from users suffering from the issue.
FYI on (XP) systems with tens of thousands of temporary Internet files it can take minutes just to dir/s/a | find "File(s)" to count them.
I kicked off some more try server builds with various parts of the file scan commented out; I'll post the links here once those are available for people to test.

Some questions from looking at the code some more:

1)  It makes sense for NSS in general to look at CSIDL_INTERNET_CACHE, but does it make sense for NSS-as-part-of-Firefox?  If that folder is only used as a cache by IE, then I would expect it to not change much if Firefox is the browser being used.
2)  Does looking at CSIDL_COMPUTERSNEARME end up reading data (possibly lots of it) over the network if it does anything useful?  That's certainly what I'd expect given the MSDN description of that value.

The three builds I kicked off were: 1) disable the file scan completely, 2) disable just CSIDL_INTERNET_CACHE, 3) disable just CSIDL_COMPUTERSNEARME.  I'd really appreciate people testing startup (and only startup; see comment 58) with those builds once I have links to them.
Build with file scan completely disabled:
https://build.mozilla.org/tryserver-builds/bzbarsky@mozilla.com-try-008471217120/try-008471217120-win32.zip

Build with scan of CSIDL_INTERNET_CACHE disabled:
https://build.mozilla.org/tryserver-builds/bzbarsky@mozilla.com-try-10b1ac4e10b1/try-10b1ac4e10b1-win32.zip

Build with scan of CSIDL_COMPUTERSNEARME disabled:
https://build.mozilla.org/tryserver-builds/bzbarsky@mozilla.com-try-e4cce7663841/try-e4cce7663841-win32.zip

These are all against m-c, not 1.9.1, sadly... my apologies for that.  In any case, I would really appreciate it if people would test those and report on what they see.
Just to give a short warning to everyone who will test those builds... You should create a FRESH PROFILE and not run your daily profile with such a build. Nightly builds on trunk are pre-alpha1 and can cause data loss. Please check the following link for how to create a new profile: http://support.mozilla.com/en-US/kb/Managing+profiles.
(In reply to comment #55)
> The log files show that the code is reading in the expected number of files.
> They also seem to show something else: Windows sorts the file names in the
> directories into an alphabetical order when enumerating the file names.  
> I imagine that's pretty slow in huge directories, even if the code then only
> reads a few files.  I wonder if there's an alternative API that doesn't sort
> the file names.  

The MSDN documentation page for FindNextFileW <http://msdn.microsoft.com/en-us/library/aa364428%28VS.85%29.aspx> mentions this:

<quote>
The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.
</quote>

Note that it isn't spelled out that clearly in the page for FindFirstFileW <http://msdn.microsoft.com/en-us/library/aa364418%28VS.85%29.aspx>.
(In reply to comment #66)
> I wonder if there can't be a better way to generate randomness (reading random data from memory?).
We have a suggestion here by someone who's been developing for over 15 yrs:
http://tinyurl.com/mm6kwr

Further question: what does NSS check on Mac? Mac has definitely been left out in the cold, and there are numerous reports for it as well: http://tinyurl.com/l2lqsv
And so far one Linux report (http://tinyurl.com/nvzyae). What's checked on Linux?

Btw, I can't believe devs are still defaulting to MZ forums when you made your "official" variant of firefox forums here: 
http://support.mozilla.com/tiki-view_forum.php?forumId=1

You act as if it doesn't exist. Like I said, Mozilla should have just adopted MZ forums, paid its hosting & bandwidth, instead of deeming it unconnected to the organization & making it raise funds with ads.

2nd topic @ the "official" forums:
Firefox 3.5 slow to start troubleshooting &  info - http://tinyurl.com/n7fctb

Thanks bz for the tryserver builds. I'll try them myself. I feel like posting this for the average Joes too, but explaining how to create a 2nd profile and a Firefox shortcut pointing to the tryserver build's firefox.exe with the right parameters will probably give them a heart attack. :P I'll still post it, since I did find some techies there, and ask for their results.
Thanks Noah, it is interesting that a user is also reporting reading of all fonts in c:/windows/fonts; that could be another issue hitting people with many fonts, but it is not due to NSS. Instead it is probably a scan by thebes gfxWindowsPlatform::FindFonts().
Ick, wrong link for mac. Those posts are from 2008.

http://tinyurl.com/nw9v4u - one of them but the mac reports are spread out.

(In reply to comment #74)
> Thanks Noah, it is interesting that a user is also reporting reading of all
> fonts in c:/windows/fonts; that could be another issue hitting people with 
> many fonts, but it is not due to NSS. Instead it is probably a scan by thebes
> gfxWindowsPlatform::FindFonts().
Indeed. That has been long standing but I thought we knew the risks associated with that and didn't give a damn. There was a time (4/30/2009) in #firefox in irc.mozilla.org where I recall a performance problem:
        wavetrip	you wouldnt believe
 	wavetrip	i solved it
 	Mardeg	        how?
 	wavetrip	it was not a browser problem
 	wavetrip	i remember 1 month ago
 	wavetrip	some colleague of mine was installing tons of fonts on my machine with adobe type manager
 	wavetrip	i disabled a set of fonts - about 1,500
 	wavetrip	and now its fast
 	Mardeg	I keep forgetting doing that can affect performance
	wavetrip	i also noticed that fotosop and flash were starting slower but i didnt made the connection
(In reply to comment #73)
> (In reply to comment #66)
> > I wonder if there can't be a better way to generate randomness (reading random data from memory?).
> We have a suggestion here by someone who's been developing for over 15 yrs:
> http://tinyurl.com/mm6kwr
> 

The unix version (unix_rand.c) also accesses /tmp, /usr/tmp, /var/tmp and similar, but doesn't actually try to read files in those locations - it only reads the directory structure as a file, and uses that to collect entropy. /tmp/* files would still contribute, but aren't really read.

Note that it also mixes in data from /dev/urandom - maybe we can do the same on Windows (using CryptGenRandom), to collect enough entropy without doing the whole enumeration. And without relying completely on CryptGenRandom too.
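To make the "mix it in, don't rely on it alone" idea concrete, here is a minimal POSIX sketch. It is not NSS code: `mix_into_pool`, `gather_seed`, `POOL_SIZE` and the XOR mixing are hypothetical stand-ins (a real PRNG would hash its inputs), and on Windows the /dev/urandom read would be replaced by a CryptGenRandom call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for getpid() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define POOL_SIZE 32  /* hypothetical pool size for this sketch */

/* Hypothetical mixer: XOR input bytes into the pool at a rolling position.
 * This only shows that several sources can be folded into one pool. */
void mix_into_pool(unsigned char *pool, size_t *pos,
                   const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    assert(pool != NULL && pos != NULL && *pos < POOL_SIZE);
    for (i = 0; i < len; i++) {
        pool[*pos] ^= p[i];
        *pos = (*pos + 1) % POOL_SIZE;
    }
}

/* Fill the pool from cheap system values plus the OS random device.
 * Returns the number of bytes obtained from /dev/urandom (0 if unavailable),
 * so the caller can tell whether the OS source contributed at all. */
size_t gather_seed(unsigned char pool[POOL_SIZE])
{
    size_t pos = 0, got = 0;
    unsigned char os_bytes[POOL_SIZE];
    time_t now = time(NULL);
    pid_t pid = getpid();
    FILE *f;

    memset(pool, 0, POOL_SIZE);
    mix_into_pool(pool, &pos, &now, sizeof now);  /* weak: guessable */
    mix_into_pool(pool, &pos, &pid, sizeof pid);  /* weak: small range */

    f = fopen("/dev/urandom", "rb");
    if (f != NULL) {
        got = fread(os_bytes, 1, sizeof os_bytes, f);
        fclose(f);
        mix_into_pool(pool, &pos, os_bytes, got);
    }
    return got;
}
```

The point of the design is that no single source is trusted: if the OS device is missing, the pool still holds the other inputs, and if those are guessable, the OS bytes still cover them.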
filed bug 502821 about fonts.
Blocks: 479078
Quoted from Noah's SUMO link:
"I did discover however that when I disabled my security suite's realtime scanning (F-Secure) all the problems disappeared and firefox was working at peak speed with no hangups, freezes or crashes which have been so annoying for the past few months."

So, note that real-time virus scanners can exacerbate the filesystem access, which could be why developers may find this difficult to reproduce.
It has occurred to me that part of the reason many users seem to have huge temp
files, and temp internet files is likely due to the fact they are using the
IETab, or IEView Addon, not realizing that IE is being used, but merely opening
it in a Tab in Firefox.   

While this does not change the impact of this issue, it might help explain why
some users have such large files.  I do realize there is little the devs can
do about this, but thought it might be food for thought as to why some
experience the start-up issue, and not others. 

Sadly, most users of IEtab think they are secure, and are ultra-paranoid and
have no clue they are leaving traces all over the place when they use IETab. 
They go all crazy deleting cookies/history and everything else, but are
clueless they have extensive history still in IE. :(
Linux has also its own startup bug: Bug 502727.
In reply to comment 76, if we could read the directory contents, the way 
that we do on Unix, avoiding a sort, I would be interested in that.
(In reply to comment #81)
> In reply to comment 76, if we could read the directory contents, the way 
> that we do on Unix, avoiding a sort, I would be interested in that.

I'm normally not developing under Windows, but I tried it in a spare moment. fopen() doesn't work, but CreateFile() does, if you pass FILE_FLAG_BACKUP_SEMANTICS, and if you use BackupRead() to read the data instead of ReadFile(). But it only reads a few hundred bytes per directory. This interface is only used to back up the directory itself, not the contents. This wouldn't deliver more entropy than a regular stat() call on the directory.

What about using CryptGenRandom to receive some extra entropy from the operating system (similar to reading from /dev/urandom) ? It's currently only used in RNG_SystemRNG.
It looks like RNG_SystemInfoForRNG is called twice from nsc_CommonInitialize:

>	freebl3.dll!ReadSystemFiles()  Line 262
 	freebl3.dll!RNG_SystemInfoForRNG()  Line 357
 	freebl3.dll!rng_init()  Line 420
 	nspr4.dll!PR_CallOnce()  Line 806
 	freebl3.dll!RNG_RNGInit()  Line 462 + 0x10 bytes
 	softokn3.dll!RNG_RNGInit()  Line 835
 	softokn3.dll!nsc_CommonInitialize()  Line 2582 + 0x5 bytes


>	freebl3.dll!ReadSystemFiles()  Line 262
 	freebl3.dll!RNG_SystemInfoForRNG()  Line 357
 	softokn3.dll!RNG_SystemInfoForRNG()  Line 1570
 	softokn3.dll!nsc_CommonInitialize()  Line 2602

http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/softoken/pkcs11.c#2562
(In reply to comment #83)
> It looks like RNG_SystemInfoForRNG is called twice from nsc_CommonInitialize:

That's bug 489811
Ok.  I don't think eliminating the 2nd call will help much though.

On my system, I had about 500 files in those folders and that
took ReadSystemFiles() about 20 millisec to complete [each call].
After creating 16000 empty files in the TEMP folder, the first call
takes 1200ms and the second call 60ms.  Creating another 16000 empty
files: 2300ms/80ms.  Another 16000 files: 3400ms/125ms.
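As a quick check on those first-call numbers, the arithmetic can be sketched like this. The helper names are made up for illustration, and the linear model is only an assumption that happens to fit these three measurements.

```c
#include <assert.h>

/* Incremental cost, in milliseconds per additional file, between two
 * measurements (ms_a at files_a files, ms_b at files_b files). */
double ms_per_file(double ms_a, long files_a, double ms_b, long files_b)
{
    assert(files_b != files_a);
    return (ms_b - ms_a) / (double)(files_b - files_a);
}

/* Linear projection from one measured point, assuming a constant
 * per-file cost. Purely an extrapolation, not a measurement. */
double projected_ms(double ms_a, long files_a, double cost, long files)
{
    return ms_a + cost * (double)(files - files_a);
}
```

Both intervals (16000 to 32000 files, and 32000 to 48000 files) add 1100 ms, i.e. 1100/16000 = 0.06875 ms per extra file on the first call, so the growth looks linear in the number of files. The second-call numbers (60/80/125 ms) don't follow the same pattern, presumably because the directory data is already cached by then.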
@Mats we are not living in a perfect world. What if the user starts Firefox while the boot sequence is still being executed? You never know if the ReadSystemFiles results will still be in the disk cache. Depending on that, the worst case could be 2300ms/2300ms.
I am not sure that my slow start is due to this:
daily nightly first start after boot 45-50sec
after it was once loaded < 3sec
first start after boot Antivir OFF 30 sec.

bz build "Build with file scan completely disabled":
25 sec.

the same build, Antivir disabled: 25 sec

This is my main profile, places.sqlite has 27MB
Blocks: 503012
In comment #35, I mentioned that manually deleting the files from the various temp folders (as described in comment #30) was not having much of an impact on my Firefox startup speed (and, after doing it again, actually prolonged the startup speed even further).

Just out of curiosity, I went into IE7 (which I never use), went to the Tools menu, and selected Delete Browsing History.  From there, I clicked the buttons for Delete Files, Delete Cookies, Delete History, Delete Forms, and Delete Passwords.

I rebooted the computer, and launched Firefox.

Much to my surprise, Firefox only took about 15 seconds to launch, compared to 70-90 seconds to launch after manually cleaning out those temp folders.  Somehow, cleaning out the folders via IE's own interface seemed to have a much, much bigger impact than doing it manually.

Anyway, I'm thrilled that the launch time is down to 15 seconds.  But I hope that eventually you guys will be able to get it down to 5 seconds or less, which is what I was accustomed to when I was using 3.0.x.  Thanks. :-)
I don't see any slowdown due to temporary directories. I cleaned them, but it had no effect (although I didn't do an exact timing).

When I made the log, my computer had trouble keeping it all in memory. So, log might not be completely fair.

I see gaps near thread exit.
Trick #88 seems to help for me. It starts faster. I did the cleanup from IE8. It had a lot to clean up, because it took a while.
What about setting the home page to about:blank and then checking the startup?
I always had 'about:blank', so loading the initial page was never included. I didn't do exact measurements, but it seems much faster now, after a cold startup (reboot of XP).
Could someone who is seeing this bug _please_ test the test builds I put up instead of just "doing the cleanup"?  That would provide a much more useful data point in terms of getting the bug fixed!
Previously, NSS's file/directory scanning code imposed a limit on the depth
of directories in the tree into which it would recurse looking for files.
It also imposed a limit on the number of bytes to be read, after which it 
greatly reduced the additional input it would gather.  

This patch adds two more limits.  These limit breadth, not depth.  
The first limits the number of subdirectories of any one directory into which
the code will recurse.  The new limit is 4, which may be too small.
The second limits the number of files that will be counted in a 
directory to no more than 1024, which may be too high.  

I am rather confident that this will reduce the amount of time that NSS 
spends gathering this info.  I am concerned about the effect of these changes
on the gathered entropy.  My concern is that, in very full directories, 
it may be that the same files and subdirectories will nearly always be read 
in each instance of the process. :(  It's possible to measure that.  I guess
I should attempt to do that next. 

Boris, perhaps you'd like to apply this and make yet another try build, but 
I'd suggest that you wait until you see the results from the build that has removed the NSS file scanning altogether.
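For readers who want to picture the two breadth limits, here is a rough, portable sketch of a bounded directory walk. It is not the attached patch: it uses POSIX opendir/readdir instead of the Windows FindFirstFile API, `scan_dir` is a made-up name, and the `MAX_DEPTH` value is an assumption, since no depth value is given above. Only the 4 and 1024 come from the description.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for lstat() under strict -std= modes */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_DEPTH   2     /* assumed value; not stated in the comment */
#define MAX_SUBDIRS 4     /* subdirectories entered per directory */
#define MAX_FILES   1024  /* files counted per directory */

/* Walk `path`, counting regular files, but give up on a directory once
 * MAX_FILES files have been counted in it, and enter at most MAX_SUBDIRS
 * of its subdirectories. Returns the total files counted in the walk. */
long scan_dir(const char *path, int depth)
{
    DIR *dir;
    struct dirent *ent;
    long total = 0;
    int files_here = 0, subdirs_here = 0;

    assert(path != NULL);
    if (depth > MAX_DEPTH || (dir = opendir(path)) == NULL)
        return 0;

    while ((ent = readdir(dir)) != NULL) {
        char child[4096];
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (snprintf(child, sizeof child, "%s/%s", path, ent->d_name)
                >= (int)sizeof child)
            continue;                   /* path too long, skip it */
        if (lstat(child, &st) != 0)
            continue;                   /* vanished or unreadable */

        if (S_ISDIR(st.st_mode)) {
            if (subdirs_here < MAX_SUBDIRS) {   /* breadth limit 1 */
                subdirs_here++;
                total += scan_dir(child, depth + 1);
            }
        } else if (S_ISREG(st.st_mode)) {
            total++;
            if (++files_here >= MAX_FILES)      /* breadth limit 2 */
                break;                  /* stop enumerating this directory */
        }
    }
    closedir(dir);
    return total;
}
```

The sketch also makes the entropy concern visible: if the enumeration order is stable, a directory with far more than 1024 files will yield the same first 1024 entries on every run.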
Attachment #387557 - Flags: review?(julien.pierre.boogz)
Re: comment 93: Boris, I did test your build against a nightly; what's wrong with that? Since this always requires a full reboot, could you please define the experimental setup:
- fat/new profile
- cold start/warm start  I do not understand why the entropy gathering would differ in both cases.
- antivirus off/on?
Attachment #387557 - Flags: review?(julien.pierre.boogz) → review+
I think the antivirus bit is fairly significant. I have used anti-viruses in the past that were worse than viruses in terms of performance impact - they made the computer so slow as to be useless. Some corporate policies had to be violated in order to get any work done.
Bernd, sorry I missed your comment.  My impression was that people were seeing very slow warm startups, not just cold startups.  Is that incorrect?  Sounds like in your case the antivirus+file scan is certainly having an impact on your cold startup, but not so much on warm, right?
(In reply to comment #93)
Here are the cold start times; each is the average of at least 3 tests.
Windows XP SP3, Pentium4 2.8G, 1.5G memory

Firefox 3.5 : 34.3 sec
Build 1 (file scan completely disabled) : 19.7 sec
Build 2 (CSIDL_INTERNET_CACHE disabled) : 20.0 sec
Build 3 (CSIDL_COMPUTERSNEARME disabled) : 19.7 sec

As you can see, your builds are faster than 3.5, and I guess the difference between builds is measurement deviation.
Both 3.5 and each of the builds can warm start within 1 or 2 secs.
There's no significant impact with antivirus on/off on my side. (Avast! 4.8)
Hope this helps.
In reply to comment 98, some questions about these builds.
Did Build 2 have all scans enabled EXCEPT the IE cache scan?
Did Build 3 have all scans enabled EXCEPT the network neighborhood scan?
If so, I cannot explain why build 1 times would be so similar to builds 2 & 3.
> Did Build 2 have all scans enabled EXCEPT the IE cache scan?
> Did Build 3 have all scans enabled EXCEPT the network neighborhood scan?

Yes.  Here's what the diffs for them looked like:

-       CSIDL_INTERNET_CACHE, 
+       //      CSIDL_INTERNET_CACHE, 

and

-       CSIDL_COMPUTERSNEARME, 
+       //      CSIDL_COMPUTERSNEARME, 

respectively.

> If so, I cannot explain why build 1 times would be so similar to builds 2 & 3.

Me neither.
Does Firefox Preloader help Firefox start faster? For me, 3.0.x is working.
the real bug here is bug 338601. turns out that using system temp directories has no sound cryptographic properties.
Flags: wanted1.9.1.x?
I'm not sure if it was spelled out directly before, but opening Network Neighbourhood can easily take over a minute on many Windows computers.

I'm a bit hazy on the details now but I remember one thing: windows keeps an address of the computer which last advertised itself as (NetBIOS?) cache (or something along the lines) and will attempt to ask this computer for a list of networked computers, with a long (30 s?) timeout. Only after that fails, it will attempt to discover names itself, which can easily take another minute.

Anyone who has this problem: can you please open explorer -> My Network Places -> Entire Network? Does it freeze anywhere along the way?
(In reply to comment #88)
> Much to my surprise, Firefox only took about 15 seconds to launch, compared to
> 70-90 seconds to launch after manually cleaning out those temp folders. 
> Somehow, cleaning out the folders via IE's own interface seemed to have a much,
> much bigger impact than doing it manually.

That's because you first deleted the files in Windows Explorer, but forgot that they were not deleted, but moved into the recycle bin. That directory is *also* part of the system folders that are scanned, so moving them around didn't help anything.
I fail to see the need to get the seed from files off disk. Why not get it from a combination of the output options of something like netstat, date, system uptime, and current process information?

I had not run into this problem until I read about it yesterday. My IE temp directories are set to 20MB max and cleaned up regularly. In fact, my IE temporary folder is empty now and probably most of the time. Recycle bin is emptied immediately. Also, I hardly ever log off and my machine is up almost all the time.

Hearing that it reads stuff from network neighborhood is worrying. I disable a lot of my network shares at work because of network latency. When does the code to look in machines around me trigger?
It already does that, see the function RNG_SystemInfoForRNG. Reading those
files is only a small part of collecting entropy. It doesn't really matter what
data is read, as long as it's as random as possible. The more the better
(reading the entire disk would be a good idea for instance, except that it
takes far too long), as long as it doesn't depend on a single item, which might
not deliver enough entropy or which might present an attack vector. Relying
*only* on these files would be a bad idea (disk cloning, empty directories).
Relying only on netstat would also be bad (if you don't have a network). Relying
only on date & time is very easy to guess for an outsider, etc ... Note that
the NSS library is already very old, and has been enhanced lots of times over
the years. It also has been FIPS certified. The maintainers really know what
they're doing.

windows implementation (calls to lots of stuff, including the ReadSystemFiles):
http://mxr.mozilla.org/mozilla/source/security/nss/lib/freebl/win_rand.c#282

unix & mac implementation (including netstat and /dev/urandom) :
http://mxr.mozilla.org/mozilla/source/security/nss/lib/freebl/unix_rand.c#854
(In reply to comment #102)
> using system temp directories has no sound cryptographic properties.

So on the one hand, I'm quite certain that the NSS team cares a great deal about the value of the entropy they collect, and the resultant security the library offers. But on the other hand, this is my point of confusion, too:

win_rand already makes some use of the CryptoAPI calls on windows, but I'm having trouble finding any analysis that suggests these calls are insufficient on their own, or that the file sampling contributes a meaningful amount of entropy to the pool. Have we done these analyses?

Put another way: removing the file sampling logic completely and relying exclusively on cryptoAPI calls on Windows platforms that support it would certainly make us faster. Can someone quantify what we give up, if we do so?

The CryptGenRandom docs (http://msdn.microsoft.com/en-us/library/aa379942%28VS.85%29.aspx) suggest that their entropy pool is collected from "process ID and thread ID, the system clock, the system time, the system counter, memory status, free disk clusters, the hashed user environment block" along with entropy contributed by callers - this seems like the list of things I'd (perhaps naively?) expect to see.

I apologize if this has been asked and answered elsewhere, I don't really see it in this bug though, and before we tweak too carefully the number of files we read or the number of bytes per file, I think it would help the discussion to understand why the platform API is insufficient (and possibly inform Microsoft, if so).

In a perfect world, I'd rather not have any disk i/o on startup at all, and I imagine that if the NSS team had all the entropy they needed, they would prefer the same; hence my confusion.
(In reply to comment #108)
> The CryptGenRandom docs
> (http://msdn.microsoft.com/en-us/library/aa379942%28VS.85%29.aspx) suggest that
> their entropy pool is collected from "process ID and thread ID, the system
> clock, the system time, the system counter, memory status, free disk clusters,
> the hashed user environment block" along with entropy contributed by callers -
> this seems like the list of things I'd (perhaps naively?) expect to see.

Please see http://mxr.mozilla.org/mozilla/source/security/nss/lib/freebl/win_rand.c#282 : what do you think RNG_SystemInfoForRNG collects ?
(In reply to comment #100)
> > If so, I cannot explain why build 1 times would be so similar to builds 2 & 3.
> 
> Me neither.

I found the explanation using Process Monitor.
Build 1 reads neither the IE cache nor the history.
Build 2 reads the history, but doesn't scan the IE cache.
In my case, there are few history entries, so it doesn't have much impact (+0.3sec?).
Build 3 does read the IE cache, but the reading occurs *after* the Fx window is
displayed (later than the font folder scan).
So the startup time I measured was almost the same as Build 1.
If you can make a build which scans only the IE cache or something similar, I'm
willing to test.
(In reply to comment #104)
> (In reply to comment #88)
> > Much to my surprise, Firefox only took about 15 seconds to launch, compared to
> > 70-90 seconds to launch after manually cleaning out those temp folders. 
> > Somehow, cleaning out the folders via IE's own interface seemed to have a much,
> > much bigger impact than doing it manually.
> 
> That's because you first deleted the files in Windows Explorer, but forgot that
> they were not deleted, but moved into the recycle bin. That directory is *also*
> part of the system folders that are scanned, so moving them around didn't help
> anything.

Not true.  I emptied the recycle bin and rebooted.  I never leave anything in the recycle bin; I'm kind of anal about that, actually.
(In reply to comment #109)
> (In reply to comment #108)
> > The CryptGenRandom docs
> > (http://msdn.microsoft.com/en-us/library/aa379942%28VS.85%29.aspx) suggest that
> > their entropy pool is collected from "process ID and thread ID, the system
> > clock, the system time, the system counter, memory status, free disk clusters,
> > the hashed user environment block" along with entropy contributed by callers -
> > this seems like the list of things I'd (perhaps naively?) expect to see.
> 
> Please see
> http://mxr.mozilla.org/mozilla/source/security/nss/lib/freebl/win_rand.c#282 :
> what do you think RNG_SystemInfoForRNG collects ?

Yes, Jo - my confusion wasn't whether those things were happening, my question was why they are collectively insufficient, such that disk i/o needs to also be introduced. I expect that there are a few possible reasons for that, namely:

 - that the NSS team performed an entropy analysis of the bits they were getting from these comparatively low-latency sources, and found them insufficient
 - that the code for disk i/o predated other code, and that it was formerly necessary to perform this work in order to ensure appropriate entropy, though perhaps now it isn't
 - that some other constraint (FIPS validation, perhaps?) made it unacceptable to use the low-latency sources of entropy on their own.

This is a really terrible experience for a significant chunk of Firefox users (and likely other NSS consumers), but I'm much more interested in finding a way to solve it while maintaining security than I am about laying blame at the moment; I hope no one feels a need to get defensive - we all want the same outcome: a fast and secure NSS.

If there are strong reasons why the entropy from file accesses is needed, we need to know about that so that we can continue to try to minimize the damage they cause. My questions are an attempt to establish that this is the case, that we would be materially less secure if we just turned off the disk reads altogether.

If not, if we can turn those off without harming our entropy pool in a meaningful way, then I think the solution to this bug is clear.
How is this not a dup of bug 441355? See also bug 287116.

I'm fully tired of this particular trail of tears. There's no good reason to waste users' time at startup pretending to scrape entropy off the filesystem.

/be
(In reply to comment #113)
> How is this not a dup of bug 441355?

That bug is about mitigating the cost of NSS initialization on Firefox startup, but this bug seems to have been morphed into reducing NSS initialization (given the component change, at least), which makes it more a dupe of bug 338601, I think. Anecdotal evidence suggests bug 466745 may have worsened the problem, though.
As Vlad wrote in bug 441355 comment #8 (June this year):

"The core issue is NSS rummaging around temp files; I don't think we need to avoid initializing NSS at all, but we should use OS-provided randomness for init if it exists, especially on win32.  Doing the temp file trawling is taking > 500ms for me on a cold startup of firefox under win7."

We don't need two bugs to track one problem. If bug 441355 has a summary that overspecifies a fix, let's fix it and dup this bug against it -- or just dup it against this bug.

I don't care about the older bug having priority, just about keeping one bug per problem to fix. Bugs start from symptoms and evolve to cures, and split as needed. Since no one thinks NSS initialization should be deferred, only sped up, let's cure the slow NSS init symptom, ideally with one bug -- this one or bug 441355.

/be
(In reply to comment #115)
> Since no one thinks NSS initialization should be deferred, only sped
> up

I wasn't aware that had been settled. Seems to me like there are clear benefits to both potential changes, and that if it were feasible to do both we would, so I see value in keeping the bugs separate. But I'm not particularly close to the issue, and I don't feel strongly about it.
Joji Ikeda, it's interesting that you see NSS init after the browser window appears.  That's certainly something we aimed for, and more or less achieve sometimes...  But it sounds like with Firefox 3.5 that's not what you see?

I just checked, and NSS init will happen after the first browser window appears if the homepage is about:blank and we're not going to be showing a first-run page.  At least in some cases; extensions may affect this, as might speed of the computer (the init basically happens off a timer, as far as I can tell).  If you were using the same profile for all three startups, that might explain what you were seeing...

Brendan, with all due respect you're not helping here.  We want to both delay NSS init and make it faster.  This bug is about making it faster; please don't conflate the issues.
(In reply to comment #117)
> Brendan, with all due respect you're not helping here.  We want to both delay
> NSS init and make it faster.  This bug is about making it faster; please don't
> conflate the issues.

Then please reply to Vlad in bug 441355 comment 8. If we did have fast NSS init, then why would we want to defer it?

bz, I don't see why my shorter comments are singled out for your criticism. The many longer preceding comments don't help either, however well-intended. There is a myth that groveling through the filesystem can come up with useful bits of entropy. Why is this myth still around?

/be
(In reply to comment #118)
> Then please reply to Vlad in bug 441355 comment 8. If we did have fast NSS
> init, then why would we want to defer it?

NSS initialization will never be free, so there will always be a potential benefit to deferring it until after the UI is responsive. Whether that benefit outweighs the cost of doing so is another question, but I think that should be tracked independently.
(In reply to comment #107)
> Relying *only*
> on these files would be a bad idea (disk cloning, empty directories). Relying
> only on netstat would also be bad (if you don't have a network). Relying only on
> date & time is very easy to guess for an outsider, etc ...

Which was why I wrote a combination of..., not relying on a single metric. The UNIX version already uses netstat; doesn't NSS require a network in order to be useful?

Although I am not affected by this issue, I can sympathize with those who wait more than a minute before FF3.5 launches.
Summarizing what I'm reading so far

 - the test builds with file-walking OFF (either entirely or partially) start up much faster (as per comment 98)
 - we're having trouble reproducing consistently
 - clearing IE cache improves startup time
 - potential involvement with antivirus applications

Do we know if the same system is much slower on Firefox 3.5 as compared to Firefox 3? Do we have a good list of what we changed in this codepath between those two versions?

While I lean towards the camp who wonders why we need additional entropy on Windows, and why we think touching files that are placed on the disk by other applications is a kosher thing to be doing (seems like a potentially problematic dependency for us to create for ourselves), I'm also quite curious to know why this changed version-over-version.

Ultimately I think that a new bug should be filed on any suggestion to replace the file-walking with a pure call for native system entropy; this bug could be made to depend on that new one, potentially, but it feels like we should figure out why it's so slow in 3.5 compared to 3.0 as well as perhaps work up a solution that doesn't require file-walking at all.
Someone replied off the bug:

 - yes, it's slower in 3.5 than in 3.0
 - yes, we know what we changed; in order to get additional entropy on WinCE, it was felt that additional directories had to be added (IE cache, etc).

Can we get a build that IFDEFs the new WinCE related behaviour for WinCE builds only on the 191 branch, or just have it reverted since we're not targeting WinCE builds of Firefox 3.5 / Gecko 1.9.1?
Whiteboard: FIPS Thaw - PLEASE, NO MORE "ME TOO" COMMENTS! → [summary in comment 121-122] [PLEASE NO MORE "ME TOO"] [FIPS Thaw]
Attached patch patch v2 (Work in progress) (obsolete) — Splinter Review
NSS needs to gather adequate entropy, even on systems that lack good PRNGs.
While WinXP and later systems are all believed to always include support for 
a built-in high-quality PRNG, it was optional or entirely non-existent on 
older versions of Windows.  NSS supports Windows systems all the way back to Win2K.  That is why NSS retains the file scanning code to this day.

The entropy gathered from such file scanning suffices to make key generation
adequately secure against remote attackers.  That is the principal objective.
It may be that a local attacker (another process on the same system) can 
gather most of the same info as NSS gathers, which means that this method is
not as strong against local attackers as against remote attackers, but the 
primary objective is to thwart remote attackers.  

On other OSes than windows, we use a system PRNG (if present) with a fallback
to other means.  I think we can do the same thing for Windows.  The patch 
attached herewith does that.
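The approach described above (ask the system PRNG first, scan files only when it is missing or fails) can be sketched roughly as follows. This is a portable illustration, not the code in the attached patch: `gather_seed`, the two callback types and the `stub_*` helpers are hypothetical names, and the stubs produce placeholder bytes rather than real randomness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types; NSS's real functions have other signatures. */
typedef bool   (*system_prng_fn)(void *dest, size_t len);  /* true on success */
typedef size_t (*slow_scan_fn)(void *dest, size_t len);    /* bytes gathered  */

/*
 * Fill dest with seed material. Try the system PRNG first; only when it is
 * missing or fails, fall back to the slow scan. *used_prng reports which
 * path ran, so a caller can skip further expensive gathering.
 */
size_t gather_seed(void *dest, size_t len,
                   system_prng_fn prng, slow_scan_fn scan, bool *used_prng)
{
    assert(dest != NULL && used_prng != NULL);
    *used_prng = false;
    if (prng != NULL && prng(dest, len)) {
        *used_prng = true;
        return len;
    }
    return (scan != NULL) ? scan(dest, len) : 0;
}

/* Stand-ins for illustration only; they produce no real randomness. */
bool   stub_prng_ok(void *dest, size_t len)   { memset(dest, 0xA5, len); return true; }
bool   stub_prng_fail(void *dest, size_t len) { (void)dest; (void)len; return false; }
size_t stub_scan(void *dest, size_t len)      { memset(dest, 0x5A, len); return len; }
```

Passing a NULL `prng` models a system where no PRNG export is found, which is the only case in which the slow scan would still run.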
Sorry for bugspam; realized that I'm carrying a lot of context in my head, here.

I'm approaching this from a "we must fix this for Firefox 3.5.1" perspective. Marking blocking1.9.1.1+ as per. It's entirely possible that we want a very different solution on the trunk (such as avoiding filewalking altogether) but my immediate concern is salving the wounds suffered by our existing audience. I imagine that we'd want to spawn a new bug (as suggested in comment 121) for trunk.
Flags: blocking1.9.2+
Flags: blocking1.9.1.1?
Flags: blocking1.9.1.1+
I like using CryptGenRandom -- but note that on WINCE, it's in coredll.dll, not in advapi32.dll.
(In reply to comment #123)
> Created an attachment (id=387944) [details]
> patch v2 (Work in progress)
...
> On other OSes than windows, we use a system PRNG (if present) with a fallback
> to other means.  I think we can do the same thing for Windows.  The patch 
> attached herewith does that.

Thanks a bunch, Nelson.  I've thrown this patch at try server, though it will
be a few hours before it clears.
> The entropy gathered from such file scanning suffices to make key generation
> adequately secure against remote attackers. 

I am skeptical.  Someone who is able to watch the traffic between you and an SSL server can probably also watch enough of your traffic to get an idea of what your IE cache looks like.  And someone who can MITM can arrange the IE cache to his liking, making Firefox think it has collected a large amount of entropy when it has in fact collected none.
Johnathan, does your try build also include the fix for bug 489811 ?
Both are needed.
OK.  A build with the attached patch but _not_ the fix for bug 489811 is available at https://build.mozilla.org/tryserver-builds/bzbarsky@mozilla.com-try-94008cd09801/try-94008cd09801-win32.zip

I'll spin up another build with both patches.
Re: comment 120,

No, NSS does not require a network to be useful. It can be used for offline applications. You could also use the NSS softoken purely as a source of random numbers if your application so wished. The collection of entropy is highly platform specific. On Unix, we only use netstat on those platforms which don't have a /dev/urandom or another sufficiently random way to collect data.  On Solaris 8, which doesn't have /dev/urandom, I implemented entropy collection using Solaris kstats. Solaris 9 and above have /dev/urandom, so Solaris effectively never uses netstat anymore. Hopefully, at some point, we will be able to drop all platforms that don't have a random number source, but that day has not come yet. We still have NSS server products running on Win2k which need entropy generation, and for them, entropy data needs to be collected from files.
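As a rough illustration of the Unix side described above, reading seed bytes from a random device could look like the sketch below. `read_random_device` is a hypothetical helper, not an NSS function; a return value of 0 stands for "no device here", which is where a platform-specific fallback such as kstat or netstat would take over.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Read up to len bytes from a random device such as /dev/urandom.
 * Returns the number of bytes read; 0 means the device is unavailable and
 * the caller has to use a platform-specific fallback instead.
 */
size_t read_random_device(const char *path, void *dest, size_t len)
{
    FILE *dev = fopen(path, "rb");
    size_t got;

    if (dev == NULL)
        return 0;
    setvbuf(dev, NULL, _IONBF, 0);   /* unbuffered: do not drain extra bytes */
    got = fread(dest, 1, len, dev);
    fclose(dev);
    return got;
}
```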
A build with both patches is available at https://build.mozilla.org/tryserver-builds/bzbarsky@mozilla.com-try-cb6f5c8eca67/try-cb6f5c8eca67-win32.zip

This build and the one from comment 129 are both against 1.9.1 branch, so are basically what will become Firefox 3.5.1 plus the two patches (the one attached to this bug and the one in bug 489811).  They should be safe to use against the Firefox 3.5 profile if really desired.

Testing would be very much appreciated!
(In reply to comment #125)
> I like using CryptGenRandom -- but note that on WINCE, it's in coredll.dll, 
> not in advapi32.dll.

... and its name is CeGenRandom.  NSS uses that on WinCE.  Notice the ifdefs
separating the WinCE code from the other Windows code.
Boris and Mike, 
The NSS team would like FF 3.5.1 to take a new tag of NSS 3.12.4 with these
fixes.  That will also get the CRLDP feature.  If that's not desirable for
3.5.1, then an alternative is for the NSS team to make a tag that is 
essentially NSS 3.12.3 (same as FF 3.5.0) except for the shared libraries 
that contain the code that is now in FIPS validation (essentially libsoftokn3, 
libfreebl3, and nssdbm3).  

The code being changed by the patch for this bug and the patch for bug 489811
is all in the FIPS validated part of NSS.  We want to make this change, and 
get the sources with those changes off to the FIPS evaluation, and have FF 
use it, so that when NSS is finally FIPS validated, FF will already be using
that code).
Nelson, would you mind posting in .planning with a summary of the changes between 3.12.3 and 3.12.4 other than this bug and bug 489811, and with bug links?  Anything you can give in terms of risk analysis would also be very much appreciated...  We can pick up that conversation there.
I also emailed you more in detail about this, but just in case you didn't receive it:

(In reply to comment #123)
> Created an attachment (id=387944) [details]
> patch v2 (Work in progress)
> ...
> On other OSes than windows, we use a system PRNG (if present) with a fallback
> to other means.  I think we can do the same thing for Windows.  The patch 
> attached herewith does that.

Examining the diff, it seems the patch code isn't reached because at line 524 in win_rand.c the if expression is true: the function RtlGenRandom (through its undocumented counterpart SystemFunction036) is found and called. At least on Windows XP, advapi32.dll exports this function, so it will be found and bound. This means that the boolean flag is never set, as the routine jumps (via goto) to the end right after calling RtlGenRandom (through SystemFunction036).

IMHO, cutting out the call to RtlGenRandom (per Microsoft's advice from their own docs about this function) and simply checking whether CryptGenRandom can be called is better, as Microsoft strongly advises calling CryptGenRandom in the first place anyway (as RtlGenRandom is an undocumented OS-only function). The code for calling CryptGenRandom is already in place, so it won't take much to change this.

Disclaimer: I haven't run it through a debugger, as I'm not familiar with Mozilla's build system, so I haven't tested this assumption. From reading the code, though, I'm pretty confident the patch should take into account the call to RtlGenRandom to be effective.
(In reply to comment #131)
Here are the results of my tests, with both a fresh profile without add-ons and my daily profile with 14 add-ons (and a lot of bookmarks); each figure is the average of at least 3 tests.
Shiretoko refers to the build with both patches.

A. Fresh profile
Firefox 3.5 : 27.7 sec
Build 1-3 : 19.8 sec (comment #98)
Shiretoko : 11.0 sec

B. My daily profile
Firefox 3.5 : 34.3 sec (comment #98)
Shiretoko : 24.3 sec

Homepage is about:blank on both profiles (including Build 1-3 tests).
FYI, my Content.IE5 (IE cache) has 16 subfolders, 2468 files.
Posted with Shiretoko.
Attached patch Patch v3 - for review (obsolete) — Splinter Review
In reply to comment 135, yes, there are two code paths that use one of 
Windows' PRNGs, and the previous patch only set the flag in one of them.

Also agreed that, for many users, the result was that patch v2 was effectively
no different than patch v1 with respect to the amount of file scanning done.
So, all the time differences reported in comment 136 must be due to the patch
for bug 489811, I think.

Boris or Johnathan, please spin a new try build with this patch and the fix
for bug 489811. Thanks.
Attachment #387944 - Attachment is obsolete: true
Attachment #388148 - Flags: review?(julien.pierre.boogz)
(In reply to comment #137)
> Created an attachment (id=388148) [details]
> Patch v3 - for review
> In reply to comment 135, yes, there are two code paths that use one of 
> Windows' PRNGs, and the previous patch only set the flag in one of them.
> 
> Also agreed that, for many users, the result was that patch v2 was effectively
> no different than patch v1 with respect to the amount of file scanning done.
> So, all the time differences reported in comment 136 must be due to the patch
> for bug 489811, I think.

Checking the patch, I see you simply added an extra line in the other code path. Please don't take this the wrong way, but isn't it better to simply swap the two code paths? That is: FIRST check for the necessary exported methods/structs for CryptGenRandom, and if available call that method and be done; if one or more elements aren't available (i.e. old version of Windows), fall back onto the RtlGenRandom code path; and if THAT's also not available, simply leave it as is.

This has the advantage that there's no risk of running into an obscure future bug when Microsoft decides to change RtlGenRandom's shadow function SystemFunction036, e.g. by adding more parameters or by changing the parameters, so that NSS crashes. The problem is that now the RtlGenRandom code path is _always_ taken, as modern Windows systems always export SystemFunction036 in advapi32, while one really should call CryptGenRandom when available.

When swapping the two code paths (or removing the RtlGenRandom reliance altogether), the code gets more robust, IMHO, as it doesn't break when MS decides to change RtlGenRandom, something they warn about in the docs of RtlGenRandom.
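The ordering proposed in this comment amounts to an ordered list of providers in which the first one that is present and succeeds wins. A minimal sketch of that idea, assuming hypothetical names (`rng_provider`, `first_working_provider`, `stub_ok`) and a placeholder stub instead of the real Windows calls:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*rng_provider_fn)(void *dest, size_t len);

struct rng_provider {
    const char     *name;
    rng_provider_fn fn;    /* NULL when the export is not found on this system */
};

/* Try providers in list order; return the name of the first that succeeds,
 * or NULL when none is usable and the caller must gather entropy otherwise. */
const char *first_working_provider(const struct rng_provider *list, size_t count,
                                   void *dest, size_t len)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (list[i].fn != NULL && list[i].fn(dest, len))
            return list[i].name;
    }
    return NULL;
}

/* Placeholder provider for illustration; it does not produce random bytes. */
bool stub_ok(void *dest, size_t len) { memset(dest, 0, len); return true; }
```

Swapping the two code paths then becomes a matter of reordering the table entries, and a missing export is represented by a NULL function pointer.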
Attachment #388148 - Flags: review?(julien.pierre.boogz) → review+
(In reply to comment #139)
In the results below, ShiretokoPv2 refers to the build in comment #131 (Shiretoko in comment #136), and ShiretokoPv3 refers to the build in comment #139.

Firefox 3.5 : 27.7 sec (comment #136)
Build 1-3 : 19.8 sec (comment #98)
ShiretokoPv2 : 11.0 sec (comment #136)
ShiretokoPv3 : 14.3 sec

ShiretokoPv3 was tested with another fresh profile, whose homepage is about:blank.
I haven't tested ShiretokoPv3 with my daily profile.
While either of these is an obvious improvement over what we have in 3.5.1, can someone break down why v3 of the patch is apparently slower than v2?
Mike, see also comments 98-100.  Several changes that have CLEARLY reduced 
the amount of work being done significantly have nonetheless been reported 
as producing slower results by the one person who seems to be testing these 
patches.  I think the startup time variance from run to run is VERY high,
perhaps even higher than the effects of the changes we're trying to make. 
Consequently, small (tiny) samples of results are not necessarily conclusive.
I think the short answer is: we need a larger sample of tests from affected 
users.
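One crude way to judge whether two sets of startup timings differ by more than their run-to-run spread is to compare the gap between the means with the sample variances. The sketch below is only a rule of thumb (a gap within roughly two standard errors is treated as noise), not a proper significance test, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arithmetic mean of n timings (0.0 for an empty set). */
double timing_mean(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance; 0.0 when fewer than two samples exist. */
double sample_variance(const double *x, size_t n)
{
    double m, acc = 0.0;
    size_t i;
    if (n < 2)
        return 0.0;
    m = timing_mean(x, n);
    for (i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* True when the gap between the two means is within about two standard
 * errors, or when either set is too small to estimate a spread at all. */
bool difference_within_noise(const double *a, size_t na,
                             const double *b, size_t nb)
{
    double d, se2;
    if (na < 2 || nb < 2)
        return true;
    d = timing_mean(a, na) - timing_mean(b, nb);
    se2 = sample_variance(a, na) / (double)na + sample_variance(b, nb) / (double)nb;
    return d * d <= 4.0 * se2;   /* compare squares to avoid needing sqrt() */
}
```

With only three runs per build the variance estimate is itself very rough, which is another way of saying that more samples are needed.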
I see a patch with review; let's get it on trunk and bake it then get it on 1.9.1 and ask affected users to download a latest-nightly?
Mike, please see comment 133.
Status: REOPENED → ASSIGNED
Attachment #388148 - Flags: superreview?(wtc)
Comment on attachment 388148 [details] [diff] [review]
Patch v3 - for review

Wan-Teh has agreed to review this patch.
(In reply to comment #141)
> While either of these is an obvious improvement over what we have in 3.5.1, can
> someone break down why v3 of the patch is apparently slower than v2?

There's no info on how the patch was tested. The only way to test this patch properly IMHO is through file monitoring first, and then measuring on cold boots (so every measurement is done after a cold boot), and preferably through a profiler as well.

Also, the patch still contains two code paths for Windows, one of which is redundant: on modern Windows, CryptGenRandom is never called while it _should_ be.
I tested both (3.0.11 and ShiretokoPv3 from comment 139) after a fresh boot, full profile, and after I first ran it on a cold boot to check for plugins so that delay wasn't taken into account on both tests. My temp folder contains many files. I waited till the userinit process was done (Windows XP, sp3). Profile has a default, simple startpage obtained from a local server in our network. 

times from icon click to fully usable window
Firefox 3.0.11: 11 seconds
ShiretokoPv3: 14 seconds

No virusscanner active. Both after a cold boot and no plugin verification delay, as I ran the particular instance first, then booted, then tested the startup time. 

There's a lot of disk activity during ShiretokoPv3's startup, which can be checked using filemon. I don't have the time right now to do that, though my previous filemon checks with FF 3.5 revealed a lot of seeks within sqlite db files (perhaps copy these to a ramdisk) and font file enumeration (besides the temp folder traversals). I don't expect startup speeds like IE has, which cheats by letting the shell pre-fetch essential dlls, though I think it's necessary to stay as close as possible to the startup time of FF 3.0.

Without profiling it's not really feasible to pinpoint the location that accounts for the majority of the time spent before the window shows up. As I'm only familiar with Microsoft's C++ profiler and Intel's profiler, I can't check where the bottlenecks are, but I'm sure others can. If anyone doing development wants me to run filemon again on ShiretokoPv3, let me know.
(In reply to comment #145)
> (From update of attachment 388148 [details] [diff] [review])
> Wan-Teh has agreed to review this patch.

We'd like to get this into 3.5.1 which may have a very short timeline.  WTC - are you able to review this today?

Nelson, are you able to provisionally put together a special-purpose tag that has the currently shipping 3_12_3 tag + this patch?  While I'd like to see 3.12.4 ready to land on branch, I don't think we should have this scoped change pressure us to take a full point release delta in a short-turnaround release. (I've filed bug 504080 to track the requirements for landing 3.12.4 on 1.9.1).

The other option we could take, since we need to fix this in 3.5.1, is to apply the (reviewed) patch directly to our pull of NSS on branch, and then update to an official tag once one is available. Obviously it would be more desirable to avoid this, but time is of the essence, really.
FWIW, in comment 148 Johnathan is referring to the fact that 3.5 now has a public exploit, and so we've gone into immediate-release-required mode. We'd very much like to include this, but no longer have the appropriate time to discuss the readiness of the other changes, regardless of our desire to take those as well.
Comment on attachment 388148 [details] [diff] [review]
Patch v3 - for review

r=wtc.  I have some comments and suggested changes.  The
only serious issue is that we should also add
    usedWindowsPRNG = PR_TRUE;
to WINCE's RNG_SystemRNG function, if the CeGenRandom
call succeeds.

>+static PRBool   usedWindowsPRNG;

The use of this global variable across RNG_SystemRNG and
RNG_SystemInfoForRNG is error-prone because it assumes
that we always call RNG_SystemRNG before calling
RNG_SystemInfoForRNG.  It is necessary to verify that
this is true now and to maintain this property in the
future.

While verifying that, I found that you didn't address
the issue that we call RNG_SystemInfoForRNG twice during
NSS softoken initialization: one is inside RNG_Init and
the other is a direct call.

> #define MAX_DEPTH 2
>+#define MAX_FOLDERS 4U
>+#define MAX_FILES 1024U

Nit: If the U suffix is not necessary, please remove it.
You can clearly see the inconsistency with MAX_DEPTH.

>     int                 iContinue;
>+    unsigned int        iFolders  = 0;
>+    unsigned int        iFiles    = 0;

Nit: We probably should use the 'u' prefix for iFolders
and iFiles because they have the 'unsigned int' type.

>+    // tack *.* on the end so we actually look for files. this will not overflow

Nit: I suggest wrapping this line.  It is longer than or exactly
80 characters.

>@@ -161,15 +163,12 @@ EnumSystemFilesInFolder(Handler func, PR
> 	} else {
> 	    // pass the full pathname to the callback
> 	    _snwprintf(szFileName, _MAX_PATH, L"%s\\%s", szSysDir, 
>-		       fdData.cFileName);
>+	    	       fdData.cFileName);

Nit: Would be nice to eliminate this whitespace change to make
patch smaller.

> #ifndef WINCE		     
> 	CSIDL_INTERNET_CACHE, 
>-	CSIDL_COMPUTERSNEARME, 
>+/*	CSIDL_COMPUTERSNEARME,  */
> 	CSIDL_HISTORY,
> #endif

Please delete the CSIDL_COMPUTERSNEARME line.

>+ReadOneFile(const PRUnichar *szFileName)
> {
...
>+	if (success)
>+	    ReadSingleFile(narrowFileName);
>+    	else
>+	    dwFileToRead++; /* couldn't read this one, read the next one. */

We don't need the "else" part because we aren't incrementing
dwFileToRead in ReadSingleFile if the PR_Open or PR_Read call
fails.  It would be nice to be consistent.

>@@ -348,8 +362,8 @@ void RNG_SystemInfoForRNG(void)
>     }
> #endif
> 
>-    // now let's do some files
>-    ReadSystemFiles();
>+    if (!usedWindowsPRNG)
>+	ReadSystemFiles();

I think we can just have
    if (usedWindowsPRNG)
        return;
at the very beginning of RNG_SystemInfoForRNG.  If you think
that is too drastic, please add a comment to explain why only
ReadSystemFiles() is skipped.

>@@ -524,6 +537,7 @@ size_t RNG_SystemRNG(void *dest, size_t 
>     if (pRtlGenRandom) {
> 	if (pRtlGenRandom(dest, maxLen)) {
> 	    bytes = maxLen;
>+	    usedWindowsPRNG = PR_TRUE;
> 	} else {
> 	    bytes = rng_systemFromNoise(dest,maxLen);
> 	}

Please make the same change to the WINCE definition of
RNG_SystemRNG, after the CeGenRandom() call.  (Search
for "bytes = maxLen" to find that line.)
Attachment #388148 - Flags: superreview?(wtc) → superreview+
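The ordering concern and the early return suggested in the review can be illustrated with a small model. `used_system_prng` plays the role of `usedWindowsPRNG`; the `sketch_*` functions and the scan counter are hypothetical stand-ins, not the NSS implementations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool     used_system_prng;   /* plays the role of usedWindowsPRNG */
static unsigned slow_scans;         /* counts simulated ReadSystemFiles() runs */

/* Stand-in for RNG_SystemRNG: records whether the system PRNG delivered. */
size_t sketch_system_rng(void *dest, size_t len, bool prng_works)
{
    used_system_prng = false;
    if (!prng_works)
        return 0;
    memset(dest, 0x42, len);        /* placeholder bytes, not random */
    used_system_prng = true;
    return len;
}

/* Stand-in for RNG_SystemInfoForRNG with the early return from the review. */
void sketch_system_info_for_rng(void)
{
    if (used_system_prng)
        return;                     /* system PRNG already provided the seed */
    slow_scans++;                   /* the expensive gathering would run here */
}

unsigned sketch_slow_scan_count(void) { return slow_scans; }
```

If `sketch_system_info_for_rng` is called before `sketch_system_rng` has ever run, the flag is still false and the slow path executes, which is exactly the call-order dependency the review asks to verify.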
Thanks for the SR, Wan-Teh.

Nelson - can you tell us when we should have a tag that is Firefox3.5.0 + these changes (here and in bug 489811 with the double-init)?

Once we have that tag, I can pull and apply it to the 1.9.1 branch using the instructions here, which I believe to be current ( https://developer.mozilla.org/en/Updating_NSPR_or_NSS_in_mozilla-central ).

Alternately, if you attach a patch here with the review comments addressed, I can just apply that + bug 489811 to our 1.9.1 source tree directly, if that's quicker.

We are currently holding the release of 3.5.1 on these fixes.
(In reply to comment #150)
> [...] you didn't address the issue that we call RNG_SystemInfoForRNG twice 
> during NSS softoken initialization: one is inside RNG_Init and
> the other is a direct call.

Wan-Teh,
That is bug 489811.  I believe bug 489811 is already fixed on the NSS trunk.
Do you believe it is not?
(In reply to comment #150)
In your review you forgot a very important problem: the RtlGenRandom call is always taken on modern windows (Windows XP and up) and CryptGenRandom is never called on these systems. This is a problem because Microsoft will change SystemFunction036 in the future. 

I figure most developers here are non-Windows developers, and I have the feeling some don't really care that much about Windows, as their main platform is Linux or another Unix platform. However, take into account that the problem this bug is all about is Windows specific, and the fix should therefore also be the best possible fix for Windows.

It's easy to fix this: swap the calls to RtlGenRandom and CryptGenRandom (so swap the code blocks). As a windows developer this really starts to get annoying: please people don't drag your feet because some problem is windows specific. Thanks.
Frans, Please stop beating that drum. 
The current code behavior with respect to Windows PRNGs is intentional, 
not accidental.  Your request is denied.
(In reply to comment #154)
> Frans, Please stop beating that drum. 
> The current code behavior with respect to Windows PRNGs is intentional, 
> not accidental.  Your request is denied.
Could you at least explain why you aren't doing that instead of saying no without reason (unless I'm missing a bug comment where you already did this...)?
Comment on attachment 388148 [details] [diff] [review]
Patch v3 - for review

Another issue I discovered while reviewing this patch is
that the Unix definition of RNG_SystemInfoForRNG reads from
the system RNG (/dev/urandom), but the Windows definition of
RNG_SystemInfoForRNG doesn't.  This is an inconsistency across
platforms.  You don't need to address this issue in this bug.

Re: comment 152: Nelson, I looked at the NSS source code in
mozilla-central, which is NSS 3.12.4.1.  I confirm that bug 489811
is fixed on the NSS trunk.  Sorry about the confusion.

Re: comment 153: Frans, our decision to use RtlGenRandom was
based on Michael Howard's blog:
http://blogs.msdn.com/michael_howard/archive/2005/01/14/353379.aspx
Michael Howard is the software security guy at Microsoft.
Could you cite a reference as trustworthy as Howard's blog
that says RtlGenRandom shouldn't be used?  Thanks.
(In reply to comment #156)
> Re: comment 153: Frans, our decision to use RtlGenRandom was
> based on Michael Howard's blog:
> http://blogs.msdn.com/michael_howard/archive/2005/01/14/353379.aspx
> Michael Howard is the software security guy at Microsoft.
> Could you cite a reference as trustworthy as Howard's blog
> that says RtlGenRandom shouldn't be used?  Thanks.

Of course: Quoted from the MSDN documentation of RtlGenRandom:
(in red:)
"[The RtlGenRandom function is available for use in the operating systems listed in the Requirements section. It may be altered or unavailable in subsequent versions. Instead, use the CryptGenRandom function.]"
ref: http://msdn.microsoft.com/en-us/library/aa387694(VS.85).aspx

Thanks for giving background info about this, Wan-Teh Chang, much appreciated.
Attached patch patch v4 - for review (obsolete) — Splinter Review
This patch addresses Wan-Teh's review comments.  At his suggestion, I also 
changed ReadSingleFile to make ReadOneFile consistent.

Shawn, the rationale for the existing code is discoverable via bonsai's 
CVS blame pages.  That's why I didn't repeat it in this bug.
Attachment #387557 - Attachment is obsolete: true
Attachment #388148 - Attachment is obsolete: true
Attachment #388555 - Flags: superreview?(wtc)
Johnathan, 
The NSS team would very much like Firefox to take an NSS 3.12.4 tag, 
at least for the portion of NSS that is now undergoing FIPS evaluation, 
so that Firefox can claim to be running the version under evaluation.

If you judge that time does not permit that for 3.5.1, then I suggest 
you apply the patch for bug 489811 and the patch for this bug to the 
FF 1.9.1 branch directly, but plan to take the FIPS tag for 3.5.2.  

I am not confident that the NSS source in the FF 1.9.1 tree presently 
matches any known NSS tag, (I'm actually pretty sure it does not).
Consequently I doubt that I can produce an NSS tag in short order that 
matches what is now in the 1.9.1 tree plus these two changes.  So, if 
that's what you want, I think your best bet is to apply those patches 
yourself.
> Johnathan, 
> The NSS team would very much like Firefox to take an NSS 3.12.4 tag, 
> at least for the portion of NSS that is now undergoing FIPS evaluation, 
> so that Firefox can claim to be running the version under evaluation.
> 
> If you judge that time does not permit that for 3.5.1, then I suggest 
> you apply the patch for bug 489811 and the patch for this bug to the 
> FF 1.9.1 branch directly, but plan to take the FIPS tag for 3.5.2.  

That's my thinking here, exactly - that while 3.12.4 has some desirable fixes (and hopefully FIPS compliance), time does not permit that decision to be made here for 3.5.1

> I am not confident that the NSS source in the FF 1.9.1 tree presently 
> matches any known NSS tag, (I'm actually pretty sure it does not).
> Consequently I doubt that I can produce an NSS tag in short order that 
> matches what is now in the 1.9.1 tree plus these two changes.  So, if 
> that's what you want, I think your best bet is to apply those patches 
> yourself.

Okay, thanks for that.  Based on that comment, I've provisionally attached the rollup patch here that I will land against 1.9.1 if Wan-Teh sr's the new patch. This includes your v4, as well as the fix from bug 489811.
Comment on attachment 388570 [details] [diff] [review]
Rollup patch for 1.9.1.1 only landing

I've asked johnath to land this on 1.9.1, awaiting wtc's review. If the review comes back negative, we'll back it out. This just lets us run it against unit tests while we wait.
Attachment #388570 - Flags: approval1.9.1.1+
Talked this over with shaver and beltzner, we're going to push this and get post-facto sr from wan-teh. If there's a stop-ship issue discovered by the review, we'll address it, but if there isn't such an issue, we'll be N hours further along in the process.

Pushed to 1.9.1 only as:

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d40be78ef515  (including the bug 489811 fix)

Marking this bug fixed1.9.1.1 but not FIXED, since I believe this hasn't yet landed on NSS cvs.
Keywords: fixed1.9.1.1
Comment on attachment 388555 [details] [diff] [review]
patch v4 - for review

r=wtc.  There is one minor bug below.

>+    unsigned int        iFolders  = 0;
>+    unsigned int        iFiles    = 0;

Nit: I suggest declaring these as the 'int' type or using
'u' instead of 'i' in the Hungarian-notation variable names.

>-    // tack *.* on the end so we actually look for files. this will
>-    // not overflow
>-    wcscpy(szFileName, szSysDir);
>-    wcscat(szFileName, L"\\*.*");
>+    // append *.* so we actually look for files. this will not overflow
>+    _snwprintf(szFileName, _MAX_PATH, L"%s\\*.*", szSysDir);

Nit: Remove "this will not overflow" from the comment because
it addresses a concern about the wcscpy and wcscat calls in
the original code.

>@@ -411,6 +427,7 @@ size_t RNG_SystemRNG(void *dest, size_t 
>     size_t bytes = 0;
>     if (CeGenRandom(maxLen, dest)) {
> 	    bytes = maxLen;
>+	    usedWindowsPRNG = PR_TRUE;
>     }

We need to set usedWindowsPRNG to PR_FALSE before the CeGenRandom
call.  Sorry I didn't notice this earlier.
Attachment #388555 - Flags: superreview?(wtc) → superreview+
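The hunk discussed above replaces an unbounded copy-and-append with a single bounded format call. As a portable stand-in for `_snwprintf` on wide strings, the sketch below uses standard `snprintf` on narrow strings and reports truncation explicitly; `build_search_pattern` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<dir>\*.*" into out. Returns true only when the whole pattern,
 * including the terminating NUL, fit into out_size bytes.
 */
bool build_search_pattern(char *out, size_t out_size, const char *dir)
{
    int needed;

    if (out == NULL || out_size == 0 || dir == NULL)
        return false;
    needed = snprintf(out, out_size, "%s\\*.*", dir);
    /* snprintf returns the length it wanted to write, excluding the NUL */
    return needed >= 0 && (size_t)needed < out_size;
}
```

Checking the return value makes the "will not overflow" assumption testable instead of leaving it as a comment.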
Checking in win_rand.c; new revision: 1.26; previous revision: 1.25

This bug and bug 338601 are mostly duplicates. The same patch fixes both.
However, bug 338601 was about 3.11 and this bug is about 3.12.3 which is 
much worse. So I'm going to mark them fixed separately.
Attachment #388555 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Depends on: 338601
Resolution: --- → FIXED
Target Milestone: --- → 3.12.4
(In reply to comment #163)
[...]
> We need to set usedWindowsPRNG to PR_FALSE before the CeGenRandom
> call.  Sorry I didn't notice this earlier.

For the record: This missing initialization only affects Windows CE.
It is fixed in the NSS patch, but not in 1.9.1.1.
And it only affects the PRNG the SECOND time it is reseeded during a process
lifetime, not the first time.  It's not a biggie, IMO.
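Why the missing initialization only matters on a second reseed can be seen in a small model: without a reset, a success on the first call leaves the flag set even if a later call fails. The names below are hypothetical, and `stub_gen_random` merely stands in for a PRNG call that can fail, such as CeGenRandom.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool used_prng;   /* plays the role of usedWindowsPRNG */

/* Hypothetical stand-in for a PRNG call that can fail. */
static bool stub_gen_random(void *dest, size_t len, bool succeed)
{
    if (succeed)
        memset(dest, 0x42, len);   /* placeholder bytes, not random */
    return succeed;
}

/* Mirrors the quoted WinCE hunk: the flag is only ever set, never cleared. */
size_t rng_without_reset(void *dest, size_t len, bool succeed)
{
    size_t bytes = 0;
    if (stub_gen_random(dest, len, succeed)) {
        bytes = len;
        used_prng = true;
    }
    return bytes;
}

/* Shape requested in the review: clear the flag before every attempt. */
size_t rng_with_reset(void *dest, size_t len, bool succeed)
{
    used_prng = false;
    return rng_without_reset(dest, len, succeed);
}

bool prng_flag(void) { return used_prng; }
```

On the very first call the static flag starts out false, so both variants behave the same; the difference appears only when a later call fails after an earlier success.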
This is a diff between what Johnathan landed on 1.9.1 (comment #162) and what Nelson landed on NSS trunk (comment #164). The differences match those described in comment #163.

I plan to land it on 1.9.1 shortly.
> I plan to land it on 1.9.1 shortly.
Groovy!  Thanks, Gavin.
As my proposed change didn't make the cut for some bizarre silly reason, and because with every 'patch-tuesday' when MS releases the security fixes of that month it can be over for this little routine, what shall I do now? Post a new bug report about this? I have a hard time believing that that will ever be addressed.
Yes, all remaining issues should be filed as separate bug reports, so we don't lose track of them and can discuss them separately.  Please note the bug number in a comment here.
Please file a new bug and cc me. I fully support your request to get rid of
RtlGenRandom in favor of CryptGenRandom. It doesn't seem to make any sense, but
if there is a valid reason, I'd like to see that discussed properly without
vague mentions of going to cvs blame, please.
(In reply to comment #171)
> Yes, all remaining issues should be filed as separate bug reports, so we don't
> lose track of them and can discuss them separately.  Please note the bug number
> in a comment here.
Here ya go:
https://bugzilla.mozilla.org/show_bug.cgi?id=504270
To any reporter: please try builds from the latest mozilla-1.9.1 nightly directory, located here:

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.1/

The fix we're planning on rolling out for Firefox 3.5.1 is contained in those builds. If it is insufficient, please let us know here; the response may be to open a new follow-up bug, or to re-open this bug.
Whiteboard: [summary in comment 121-122] [PLEASE NO MORE "ME TOO"] [FIPS Thaw] → [summary in comment 121-122] [fixed in mozilla-1.9.1 nightlies] [PLEASE NO MORE "ME TOO"] [FIPS Thaw]
Target Milestone: 3.12.4 → ---
(In reply to comment #174)
Don't know if it's insufficient or not; here are the test results.
Firefox 3.5.1pre1 refers to the build in comment 174, zip version.

A. Fresh profile
Firefox 3.5 : 27.7 sec
ShiretokoPv2 : 11.0 sec
ShiretokoPv3 : 14.3 sec
Firefox 3.5.1pre1 : 13.7 sec

B. My daily profile
Firefox 3.5 : 34.3 sec
ShiretokoPv2 : 24.3 sec
ShiretokoPv3 : n/a
Firefox 3.5.1pre1 : 28.0 sec

Process Monitor shows there's no IE cache nor IE history traversal with Firefox 3.5.1pre1, on both cold start and warm start.
IMHO, why patch v2 build is the fastest is worth investigating, but this issue requires more testers, and should be a new bug.
Mike, don't forget that in a test that disabled NSS file scanning completely,
startup times for some users were still in excess of 15 seconds.  It would be
a mistake to assume that NSS is the cause of that.
Joji, thanks for reporting back with your results.

To anyone else who has experienced this problem: could you please try the 3.5.1 release candidate and check whether it improves the startup time?

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.1-candidates/build1/
Marking verified1.9.1.1 as per comment 175
Keywords: 4xp, verified1.9.1.1
Keywords: 4xp
(In reply to comment #177)
> Joji, thanks for reporting back with your results.
> 
> To anyone else who has experienced this problem: could you please try the 3.5.1
> release candidate and check whether it improves the startup time?
> 
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.1-candidates/build1/

Been having what appears to be the same problem over at bug 491100 and this new build does indeed reduce the startup time to what I would expect it to be. Thanks for this!
Keywords: fixed1.9.1.1
We should update the NSS version strings in mozilla-1.9.1 for
proper bookkeeping.  This is important because the NSS packages
in many Linux distributions are the NSS used in Firefox.

The NSS in mozilla-1.9.1 is exactly NSS_3_12_3_WITH_CKBI_1_75_RTM
plus
- a build fix for security/nss/lib/ckfw/Makefile
  http://hg.mozilla.org/mozilla-central/rev/b5f6a4b00c7f
- the patches in this bug

We should create a NSS_3_12_3_1_RTM CVS tag to identify this
version of NSS.

If you have already released Firefox 3.5.1, please approve this
patch for 1.9.1.2.
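
As a rough illustration of why a distinct version string matters for bookkeeping, the sketch below orders dotted version numbers so that 3.12.3.1 sorts between 3.12.3 and 3.12.4. The helper names are hypothetical and this is not how NSS itself compares versions; it only assumes a tag of the form NSS_<numbers>_RTM, as proposed above.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.12.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def tag_to_version(tag):
    """Extract the dotted version from a tag named like NSS_3_12_3_1_RTM.

    Assumes the tag has the form NSS_<numbers separated by underscores>_RTM;
    tags with extra words in the middle are not handled by this sketch.
    """
    prefix, suffix = "NSS_", "_RTM"
    if not (tag.startswith(prefix) and tag.endswith(suffix)):
        raise ValueError(f"unexpected tag format: {tag}")
    return tag[len(prefix):-len(suffix)].replace("_", ".")


if __name__ == "__main__":
    patched = parse_version(tag_to_version("NSS_3_12_3_1_RTM"))
    # Tuples compare element by element, and a shorter tuple that is a
    # prefix of a longer one sorts first.
    print(parse_version("3.12.3") < patched < parse_version("3.12.4"))
```

With a separate 3.12.3.1 string, a packager can tell the patched NSS apart from both the plain 3.12.3 release and the later 3.12.4.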
Attachment #388938 - Flags: review?(nelson)
Attachment #388938 - Flags: approval1.9.1.1?
Comment on attachment 388570 [details] [diff] [review]
Rollup patch for 1.9.1.1 only landing

These patches should also be pushed to mozilla-central.
Don't we have a requirement of "baking on the trunk"
before a patch can be checked in to a release branch?
Comment on attachment 388938 [details] [diff] [review]
Update NSS version strings in mozilla-1.9.1

Wan-Teh, 
I agree with you that, if FF 3.5.1 has not yet been released, then the three
_VERSION strings should be modified in the 1911 tree according to your 
attached patch for that release.  r=nelson for the 1.9.1.1 Hg tree.

Also, if Firefox is going to carry this code back to the 1.9.0 branch for 
FF 3.0.x, then this patch should also go to that branch.

But, for FF 3.5.2 (1.9.1.2) we expect Mozilla to take all of NSS 3.12.4 RTM
Attachment #388938 - Flags: review?(nelson) → review+
I'm happy to approve this for 1.9.1.2, so that it lands on mozilla-central. I'm also happy for this to go post-facto into the relbranch for 3.5.1 if we think that a distro will be using that (reed/ss/caillon care to comment?) but please know that we are NOT planning on respinning 3.5.1 and that patch would NOT have been part of it.
Attachment #388938 - Flags: approval1.9.1.1? → approval1.9.1.2+
(In reply to comment #181)
> We should update the NSS version strings in mozilla-1.9.1 for
> proper bookkeeping.  This is important because the NSS packages
> in many Linux distributions are the NSS used in Firefox.
> 
> The NSS in mozilla-1.9.1 is exactly NSS_3_12_3_WITH_CKBI_1_75_RTM
> plus
> - a build fix for security/nss/lib/ckfw/Makefile
>   http://hg.mozilla.org/mozilla-central/rev/b5f6a4b00c7f
> - the patches in this bug

Everything you listed above is all Windows-only, no? Then why do the Linux distros care? It shouldn't matter, correct?
(In reply to comment #183)
> But, for FF 3.5.2 (1.9.1.2) we expect Mozilla to take all of NSS 3.12.4 RTM

I'm not sure what gave you that impression. Is there a bug on file to do that nominated to block? I don't recall making any statement saying that this was true, fwiw!
(In reply to comment #186)
> (In reply to comment #183)
> > But, for FF 3.5.2 (1.9.1.2) we expect Mozilla to take all of NSS 3.12.4 RTM
> 
> I'm not sure what gave you that impression. Is there a bug on file to do that
> nominated to block? I don't recall making any statement saying that this was
> true, fwiw!

That's bug 504080, which is where I think we should take this conversation.
(In reply to comment #185)
>
> Everything you listed above is all Windows-only, no? Then why do the Linux
> distros care? It shouldn't matter, correct?

The "rollup patch for 1.9.1.1 only landing" (attachment 388570 [details] [diff] [review])
in this bug contains the patch for bug 489811, which is cross-platform.
I don't normally post comments because, honestly, I am too ignorant regarding the development of software and coding.  But I do read bug threads quite often to learn what I can from the great people at Mozilla and the excellent team of volunteers who contribute their time and talent for the betterment of us all.

After reading comment #175, I'd at least like to have a better understanding of how this bug is considered fixed when the time for a user's "typical" startup is still nearly 30 seconds?  (Before you flame me, yes, much more data is needed before any REAL conclusions are made.)  I'm not a developer, but am a very faithful user.  From a user's perspective the first experience, or first impression, of a browser is how fast it responds when you hit "go".  First impressions are very important, especially in this age of fast hardware.  People want responsiveness, and not only AFTER the application is open.

I am concerned from the limited amount of data available that the users who are really wanting this "fix" will be a bit disappointed.  The expectation is that this fix should _significantly_ improve startup performance, and the sample data is not showing this to be true (it does for a fresh profile, but not for a "typical" startup once the profile has been in use for a while).  

Just a user's two cents.... if you get this far, thank you for reading.
In reply to Mike's comment 184,
Mike, are you saying that the fix for the slow NSS startup time will not be
in 3.5.1? Or are you merely saying that Wan-Teh's latest addendum, which 
modifies version numbers, will not be part of 3.5.1?  

In reply to Paul's comment 189,
Paul, the total problem with FF 3.5 startup time has numerous components. 
The problem is not all in one area, but is spread across several areas.
This bug is about one of those areas, which (for some Windows users) was the 
biggest single component of that slowness.  Fixing this bug does not mean 
that the entire startup slowness problem is fixed, but rather that this 
component is fixed. See the bugs that are "blocked" by this bug, and the 
other bugs that block them, to see the other components.
Hey, I have tried 3.5.1 but things have not improved a lot. Cold start is still more than a minute.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #191)
> Hey, I have tried 3.5.1 but things have not improved a lot. Cold start is still
> more than a minute.

Could you please try this utility:
http://technet.microsoft.com/en-us/sysinternals/bb896642.aspx

It's a small utility which shows you hard disk activity. Please use the following procedure:
- boot your machine and login
- start filemon
- click the filter option and specify firefox.exe
- start firefox for the first time. 

You'll see a lot of data logged in the window. Much of the disk activity is necessary, but if you see that Firefox still traverses files in your temp folders, that's a problem. The timestamps at the front of the lines indicate how long the process was busy with each action, so you get an indication of what happened. The problem may not be caused by this particular bug (NSS traversing temp folders) but by another issue. Using filemon can at least show whether it is still this bug (if you see temp folder traversal) or NOT this bug.
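
If the log is long, the check described above can be scripted. The following is a minimal sketch that assumes the log has been exported as CSV with "Process Name", "Operation" and "Path" columns; those column names, the folder substrings and the sample rows are all assumptions for illustration, so adjust them to whatever your tool actually writes.

```python
import csv
import io

# Substrings that identify the folders this bug is about. The exact paths
# differ per user and Windows version, so these are illustrative assumptions.
SUSPECT_FOLDERS = ("\\temporary internet files\\", "\\temp\\")


def temp_folder_hits(log_file, process="firefox.exe"):
    """Return the rows where `process` touched a path inside a suspect folder."""
    hits = []
    for row in csv.DictReader(log_file):
        if row["Process Name"].lower() != process:
            continue
        path = row["Path"].lower()
        if any(folder in path for folder in SUSPECT_FOLDERS):
            hits.append(row)
    return hits


# A tiny made-up log, standing in for a real export.
SAMPLE_LOG = """Process Name,Operation,Path
firefox.exe,ReadFile,C:\\Program Files\\Mozilla Firefox\\xul.dll
firefox.exe,QueryDirectory,C:\\Documents and Settings\\user\\Local Settings\\Temp\\a.tmp
explorer.exe,ReadFile,C:\\Documents and Settings\\user\\Local Settings\\Temp\\b.tmp
firefox.exe,QueryDirectory,C:\\Documents and Settings\\user\\Local Settings\\Temporary Internet Files\\Content.IE5\\x.htm
"""

if __name__ == "__main__":
    for row in temp_folder_hits(io.StringIO(SAMPLE_LOG)):
        print(row["Operation"], row["Path"])
```

No hits for firefox.exe suggests the slowness is not this bug; many hits during startup suggest it still is.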

I also agree with Paul; however, further investigation is needed, e.g. by using proper profilers and monitors. Mozilla should step up and do deeper research into what firefox.exe is doing at startup, using normal tools like profilers and monitors like filemon. As the vast majority of users use Windows, it's not sufficient to check whether the latest Debian-based Linux flavor works OK with Firefox. It's also not sufficient to measure this with a stopwatch: the activities performed have to be investigated and checked. Only then can things be rescheduled or dropped entirely (like the ridiculous enumeration of all the font files, which is unnecessary).
(In reply to comment #191)
> Hey, I have tried 3.5.1 but things have not improved a lot. Cold start is still
> more than a minute.

Closing this again as FIXED, because you didn't show that this particular issue (slow start reading the IE Internet Temporary Files and Windows Temp folder) has not been fixed. Do you have any proof that it is still reading those files? See the suggestion by Frans.

There are other bugs which deal with the general start-up performance problems, like bug 479078 and bug 501563. Please refer to them and their dependents.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #193)
> (In reply to comment #191)
> > Hey, I have tried 3.5.1 but things have not improved a lot. Cold start is still
> > more than a minute.
> 
> Closing this again as FIXED, because you didn't show that this particular issue
> (slow start reading the IE Internet Temporary Files and Windows Temp folder)
> has not been fixed. Do you have any proof that it is still reading those files?
> See the suggestion by Frans.
> 
> There are other bugs which deal with the general start-up performance
> problems, like bug 479078 and bug 501563. Please refer to them and their
> dependents.

Bug 479078 shows little to no activity over several MONTHS (i.o.w.: no-one is really interested in getting this fixed) and bug 501563 is a bug related to Mac OS X, open for 17 days now and no-one has chimed in or showed any interest in working on it. 

You can point to these bugs to get this closed and move on, that doesn't make it go away for some users. I'm not suggesting that this bug should stay open because some user re-opens it with perhaps unrelated feedback, though closing it directly shows IMHO a 'you don't know what you're talking about'-attitude I've seen many times before in a lot of OSS projects. 

Anyway, to the one who's really in charge of Firefox development at Mozilla, I'd like to suggest the following: on a Windows XP/Vista box, install Opera 9.x/10.x and Firefox 3.5, next to IE8. Boot the machine and start each browser. Guess which one takes a long time to present itself compared to the other ones. Then step back, get a good Italian coffee and think for a minute or two about how the many million users out there think about this difference in startup speed every single day. I.o.w.: you can invest a lot of time into new shiny features, but if this delay (be it 15 seconds, 20 or even longer) isn't addressed very soon, more and more people will fly back to IE8 or Opera. I mean, OK, IE8 cheats, but Opera doesn't. It also loads after a cold boot in 1 second. A user doesn't give a hoot how fancy the browser is internally, s/he simply wants to visit a website. Now.
(In reply to comment #192)
> I also agree with Paul, however further investigation is needed, e.g. by using
> proper profilers and monitors. Mozilla should step up and do deeper
> research about what firefox.exe is doing at startup using normal tools like
> profilers and monitors like filemon. As the vast majority of the users use
> windows, it's not sufficient to check whether the latest debian based linux
> flavor works OK with firefox. It's also not sufficient to measure this with a
> stopwatch: the activities performed have to be investigated and checked. Only
> then things can be rescheduled to take place or dropped entirely (like the
> ridiculous enumeration of all the font files, which is unnecessary)

If you think we don't do this already then you are mistaken. For as long as I have been involved there have been drives to reduce startup time (even more so now that we have to deal with slower mobile devices). We get continual feedback from performance tests on every change that is made to the source code (though we are always striving to improve these, and this bug will probably result in some new tests being added). Unlike Opera etc., Firefox is not just a web browser; it is a platform with many capabilities that need initialising at startup. It is those capabilities that make possible all the great add-ons that have made Firefox a success. There are probably hundreds of bugs filed and fixed on individual causes of slow startup that developers have identified using the tools you mention. There is no one individual cause for slowness, just lots of small ones, and each one must be tackled carefully. You are welcome to break out a profiler and see if we have missed anything, but you'd be wrong to assume that we aren't doing this already and on all platforms.

Further discussion doesn't belong in this bug, which is about one specific cause of slowness that has been resolved. The mozilla.dev.platform newsgroup is probably a good place to start a discussion on the subject if necessary.
Blocks: BH-2009
Comment on attachment 388938 [details] [diff] [review]
Update NSS version strings in mozilla-1.9.1

I pushed this patch to mozilla-1.9.1 in changeset 091341acf43d:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/091341acf43d
Target Milestone: --- → 3.12.3.1
Flags: wanted1.9.1.x?
Mass change: adding fixed1.9.2 keyword

(This bug was identified as a mozilla1.9.2 blocker which was fixed before the mozilla-1.9.2 repository was branched (August 13th, 2009) as per this query: http://is.gd/2ydcb - if this bug is not actually fixed on mozilla1.9.2, please remove the keyword. Apologies for the bugspam)
Keywords: fixed1.9.2
We're still getting people whining about startup times but I don't think we have people who are stalled for many many MINUTES anymore.  Marking common-issue- and I'll file a new bug if we find a common cause of slowstarts.
"and I'll file a new bug if we find a common cause of slowstarts."
I can give you one: checking for updates. When I start any program, including a web browser, it means I want to use it _immediately_, not 5 minutes, 1 minute or 10 seconds later, but NOW. Check for updates when the app is idle, not when the user most probably wants to use it.
See bug 502068.

I have a Pentium M 1.73 laptop and it takes 1-2 minutes to open Firefox, which is an improvement but still too long. It was better in the early days.
I have Firefox 3.5.3 - the problem does not appear to be fixed for me.  I still have to run Firefox Preloader, which has its own problems.
Val, it sounds like you were seeing a different problem, then.  Please file a separate bug on your problem and cc me on it?
Val, if you can help us identify the reason for the slowdown on your system, that would help. Since this bug has been fixed, we can exclude big temporary folders. (As suggested by Boris, you can open a new bug.)
You could work with the http://support.mozilla.org folks to find the reason. I'd suggest checking whether you have any security software that could interfere with the startup of Firefox. Also check whether it's slow even with a new profile (http://support.mozilla.com/kb/Profiles#new); there could be some really huge file in your profile that kills startup time, or an extension that is slowing startup.
Whiteboard: [summary in comment 121-122] [fixed in mozilla-1.9.1 nightlies] [PLEASE NO MORE "ME TOO"] [FIPS Thaw] → [summary in comment 121-122] [fixed in mozilla-1.9.1 nightlies] [PLEASE NO MORE "ME TOO"] [FIPS Thaw][ts]