nsFastLoadFileReader::ComputeChecksum is ~10% of cold-startup time

RESOLVED DUPLICATE of bug 412796

Status

()

Core
XPCOM
P2
normal
RESOLVED DUPLICATE of bug 412796
12 years ago
6 years ago

People

(Reporter: Darin Fisher, Assigned: (dormant account))

Tracking

(Depends on: 1 bug, {perf})

Trunk
x86
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [ts][patches in dependent bugs])

(Reporter)

Description

12 years ago
I profiled a cold-start of trunk Firefox on windows (static build w/ symbols), and I found that we spend about 10% of the time computing checksums on the two fastload files.

There's a comment in the code saying that the 8k buffer size ought to be tuned.  Maybe we can do even better.  Maybe we can avoid having to read in the entire fastload file since otherwise the file might not be read in its entirety (assuming it might contain data for windows that are not opened at startup).
(Reporter)

Updated

12 years ago
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P4

Comment 1

12 years ago
Wow 10% is a lot. Having faster startup time is something that's VERY important for a lot of users.
For the suite, there could be lots of data to checksum not needed for first browser window.  For Firefox, it seems to me the dialogs and any non-browser window data will be relatively small.

I suspect if we mmapped the whole file and ran over it to checksum it, warming it up for deserialization, we would see the profile data shift away from i/o.  I'm assuming the checksum computation is tiny -- it should be.

/be
(Reporter)

Comment 3

11 years ago
-> reassign to default owner
Assignee: darin.moz → nobody
Status: ASSIGNED → NEW
Taking, want to deCOMtaminate FastLoad.

/be
Assignee: nobody → brendan

Updated

11 years ago
Depends on: 104170

Updated

11 years ago
Blocks: 195010
Changing target to Future since 1.9a1 has long shipped. Brendan, is this still on the radar for 1.9?
Target Milestone: mozilla1.9alpha1 → Future
Flags: wanted1.9.1?
Whiteboard: [ts]

Comment 6

9 years ago
results from the FunctionTimer patch bug 480735:

ddahl-t500 1~/code/moz/mozilla-central/mozilla % ../obj-i686-pc-linux-gnu-optimize/dist/bin/firefox -P cold -no-remote
ddahl-t500 ~/code/moz/mozilla-central/mozilla % cat ~/logs/startup.txt | grep Compute                                
-1243674240[b5a18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: <     2 ms (    2 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: <     1 ms (    1 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
ddahl-t500 ~/code/moz/mozilla-central/mozilla %

Comment 7

9 years ago
sorry the last log data was warm startup, here are the cold startup numbers on Linux/Trunk:

ddahl-t500 139~/code/moz/mozilla-central/mozilla % cat ~/logs/startup.txt|grep Compute
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <    76 ms (   76 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <    33 ms (   33 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <     0 ms (    0 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)

Comment 8

9 years ago
I've also done some profiling on this.  I stuck some PRIntervalTimes into nsFastLoadFileReader::ComputeChecksum and ran 10 runs under simulated cold startup on OS X (i.e., sync && purge), where I'm getting 9-10s overall wall clock startup time.  Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a1pre) Gecko/20090826 Minefield/3.7a1pre

Upshot is, I'm seeing ~3% of wall clock startup time, not 10%, with ~83% of that 3% in reads and ~2% of that 3% in NS_AccumulateFastLoadChecksum, confirming comment 2.  (Despite using PR_IntervalToMicroseconds, I'm getting only ms granularity; besides CPU, the remaining ~15% of that 3% must be lost precision, including intervals < 1ms (Seeks and Tells).)

Data follows.  Numbers are times in ms.  "total" is total time spent in nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.  "read" is total time spent in Read calls from nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.  "NS_AccumulateFastLoadChecksum" is total time spent in NS_AccumulateFastLoadChecksum calls from nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.

> total <- c(314.0,94.0,92.0,94.0,610.0,516.0,256.0,94.0,154.0,564.0)
> mean(total)
[1] 278.8
> median(total)
[1] 205
> sd(total)
[1] 211.339
> min(total)
[1] 92
> max(total)
[1] 610
> read <- c(307.0,79.0,82.0,90.0,602.0,116.0,246.0,83.0,148.0,555.0)
> mean(read)
[1] 230.8
> median(read)
[1] 132
> sd(read)
[1] 198.8359
> min(read)
[1] 79
> max(read)
[1] 602
> NS_AccumulateFastLoadChecksum <- c(5.0,11.0,4.0,1.0,5.0,8.0,9.0,8.0,4.0,4.0)
> mean(NS_AccumulateFastLoadChecksum)
[1] 5.9
> median(NS_AccumulateFastLoadChecksum)
[1] 5
> sd(NS_AccumulateFastLoadChecksum)
[1] 2.998148
> min(NS_AccumulateFastLoadChecksum)
[1] 1
> max(NS_AccumulateFastLoadChecksum)
[1] 11
> mean(read) / mean(total)
[1] 0.8278336
> mean(NS_AccumulateFastLoadChecksum) / mean(total)
[1] 0.02116212
> wall_clock <- 9500
> mean(total) / wall_clock
[1] 0.02934737

Comment 9

9 years ago
BTW, mmapping per comment 2 is being worked on in bug 412796.
(Assignee)

Comment 10

9 years ago
Found out that we checksum 7mb worth of stuff every startup, only to read 200K of it.

I wonder if we should disable fastload checksums while we rewrite the fastload system to be less slow?
(Assignee)

Comment 11

9 years ago
http://people.mozilla.com/~tglek/less_fasl.diff saves 12% of warm startup on n810. Unfortunately cold startup time went to hell due to throughput decreasing by 90%. 

perhaps doing more aggressive mmap than bug 412796, with correct madvise flags will fix the throughput.
Fastload writes the file when we are slow-loading, so are we really pushing out XDR'ed bytes for more files than we read in when we next hit the .mfasl file? Or are we pushing out more bytes for some reason?

/be
(Assignee)

Comment 13

9 years ago
(In reply to comment #12)
> Fastload writes the file when we are slow-loading, so are we really pushing out
> XDR'ed bytes for more files than we read in when we next hit the .mfasl file?
> Or are we pushing out more bytes for some reason?
> 
> /be

yes during component registration every file is read in...which happens to coincide with when fastload is created(i think).
Argh, how asymmetric -- easy to fix?

/be
(Assignee)

Comment 15

9 years ago
(In reply to comment #14)
> Argh, how asymmetric -- easy to fix?
> 
> /be

well if I do what you are implying, it shouldn't be hard to not write fasl files during comporent registration. But then we'll get burned during execution since other components and modules would be slowloaded.

I think the proper fix is to do what zip files do and store checksums per fastloaded object. Furthermore, we should set the file ordering to be what startup looks like so it's a short sequential read on startup. That doesn't sound easy at all.
(Assignee)

Comment 16

9 years ago
(In reply to comment #10)
> Found out that we checksum 7mb worth of stuff every startup, only to read 200K
> of it.

That wasn't completely correct. The XPC.mfasl file is only ~ 2mb, so we checksum 2mb of it. But in the end due to spreading files all over the place and buffering aggressively we read in 7mb out of the 2mb file during startup.
(Assignee)

Updated

9 years ago
Depends on: 517956
(In reply to comment #16)
> (In reply to comment #10)
> > Found out that we checksum 7mb worth of stuff every startup, only to read 200K
> > of it.
> 
> That wasn't completely correct. The XPC.mfasl file is only ~ 2mb, so we
> checksum 2mb of it. But in the end due to spreading files all over the place
> and buffering aggressively we read in 7mb out of the 2mb file during startup.

Sounds like bug 195010 is indeed biting.

/be
(Assignee)

Comment 18

9 years ago
Conclusion:
Did some more digging. I think I got confused by xdr api on friday. In some cases we do indeed read the entire fastload file on startup, but it's not as bad as i thought.

 If a fastload file is written during normal(non-component-reg) startup then we do not overread due to checksum/etc. 

If a fastload is written during compreg we can overread, by around 2.25x due to checksoming. 

Together with overreading due to buffering as suggested in bug 195010, we end up with 2.5x more io, than filesize.

I think mmap io(something like bug 504864, but more aggressive) is going to solve the reading-multiple times problem.
Assignee: brendan → tglek
Depends on: 412796
Severity: minor → normal
Priority: P4 → P2
Target Milestone: Future → ---
Whiteboard: [ts] → [ts][patches in dependent bugs]
(Assignee)

Updated

9 years ago
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 412796
Flags: wanted1.9.1?
You need to log in before you can comment on or make changes to this bug.