Bug 21184 (Closed)
Disk cache writes are very slow on NT
Opened 25 years ago; closed 24 years ago

Categories: Core :: Networking: Cache, defect, P3
Target Milestone: Future
People: Reporter: fur; Assigned: gordon
Keywords: perf
Whiteboard: [PDT-][nsbeta3-][pdtp3]
Attachments (3 files):
  5.39 KB, text/plain
  11.09 KB, text/plain
  1.15 KB, patch

Description
There's a low-level cache test program that tests cache module functionality and
performance. Cache read performance is several MB/s, but the test reports
*extremely* low cache write performance on NT, e.g. 20KB/second. (Use
'TestRawCache -f' to reproduce.) According to Yixiong Zou, the Intel engineer
who wrote the disk cache, the same test on Linux reports write rates of several
megabytes per second.
I think the file transport or the event loop must be totally busted on NT.
Comment 1 • 25 years ago (Reporter)
CC'ing sdagley since he's concerned about cache performance on the Mac, too.
Updated • 25 years ago (Reporter)
Summary: Disk cache (file transport) writes are very slow on NT → Disk cache writes are very slow on NT
Comment 2 • 25 years ago (Reporter)
I should add that I'm not certain this is entirely a file transport problem,
since the cache dbm database is also being manipulated, but it's still awfully
strange that Linux is two orders of magnitude faster than NT.
A good way to isolate the problem would be to instrument TestFileTransport.cpp
to see if it really is a file transport problem.
Updated • 25 years ago
Target Milestone: M14
Bulk move of all Necko (to be deleted component) bugs to new Networking
component.
Comment 4 • 25 years ago (Reporter)
I created TestFileWrite.cpp (attached) to benchmark file transport synchronous
write speed. My results for NT4 seem to support the conclusion that there's a
performance problem with file transport writes on NT, though it might be
endemic to the OS rather than a file transport performance bug.
Synchronous Write() speed, not including overhead for OpenOutputStream():

File size    Rate
=============================
10KB         50 KB/s
50KB         220 KB/s
100KB        490 KB/s
1000KB       490 KB/s
If I run the tests twice in a row, write speed goes up for the second run,
i.e. overwriting an existing file rather than writing a new file:
File size    Rate
=============================
10KB         150 KB/s
50KB         770 KB/s
100KB        1140 KB/s
1000KB       3117 KB/s
The slow write speed for small files is problematic since 95% of cache files are
less than 50KB in size.
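For illustration, here is a minimal sketch of this kind of synchronous-write
benchmark using the NSPR file API. It is not the attached TestFileWrite.cpp;
the function name, file name, buffer contents, and output format are
assumptions.

    #include <stdio.h>
    #include <string.h>
    #include "nspr.h"

    /* Write totalBytes to path in chunkSize pieces and report the rate.
       chunkSize must not exceed sizeof(buf). */
    static void TimeWrite(const char* path, PRInt32 totalBytes, PRInt32 chunkSize)
    {
        char buf[4096];
        PRInt32 remaining = totalBytes;
        memset(buf, 'x', sizeof(buf));

        PRIntervalTime start = PR_IntervalNow();
        PRFileDesc* fd = PR_Open(path, PR_WRONLY | PR_CREATE_FILE | PR_TRUNCATE,
                                 0644);
        if (!fd)
            return;
        while (remaining > 0) {
            PRInt32 n = remaining < chunkSize ? remaining : chunkSize;
            if (PR_Write(fd, buf, n) != n)      /* synchronous write */
                break;
            remaining -= n;
        }
        PR_Close(fd);

        PRUint32 ms = PR_IntervalToMilliseconds(PR_IntervalNow() - start);
        if (ms > 0)
            printf("%s: %ld bytes in %u ms (%.1f KB/s)\n", path,
                   (long)totalBytes, ms, (totalBytes / 1024.0) / (ms / 1000.0));
    }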
Comment 5 • 25 years ago (Reporter)
Comment 6 • 25 years ago
I'd like to get this test case working on the Mac to see how bad the
performance is there. Maybe Steve and Gordon can help. Fixing bug 10438 may
help a lot.
Updated • 25 years ago
Component: Networking → Networking: Cache
Comment 7 • 25 years ago
Before PDT can make a call, we need to hear some sort of update on this bug.
Comment 8 • 25 years ago
I'm trying to gather some data now, but the problem is now compounded by the
need to implement buffering for the new file stream classes that came along
with nsIFile (bug 19233).
Comment 9 • 25 years ago
I got some really interesting data on this on NT. First I tried Scott's test
case, but it only writes 500 bytes at a time. So I wrote another one
(TestWriteSpeed.cpp) to isolate exactly what a good transfer size should be.
I'll check them both in to netwerk/tests.
Here's what I found: disk write performance grows linearly with the buffer size
being written, for buffers up to about 104k. At 16k, the transfer rate is
16k/ms; at 104k, it is 98.8k/ms (these seem like awfully round numbers --
someone should check my test program when I check it in). After 100k,
performance drops off sharply and then peaks again at 168k. There's another
peak at 248k, and then at 324k performance drops off to a whopping 3k/ms and
just stays there no matter how big the buffer gets (sounds like an NT bug).
[I'll include my table of numbers as an attachment.]
From this analysis, I'm going to make the write buffer size be 64k. Why? It's
smaller than 100k and yet is still large enough to get us pretty good
performance. That means that every buffered output stream would need a 64k chunk
of memory hanging around. Maybe I can find a way to pool the last one used so
that we save the malloc time for the buffer too.
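As a rough sketch of that buffering strategy (not the actual file transport
code; the class and member names are illustrative): small writes accumulate in
a 64k block that is flushed with a single PR_Write when it fills.

    #include <string.h>
    #include "nspr.h"

    class BufferedWriter {
    public:
        explicit BufferedWriter(PRFileDesc* fd) : mFD(fd), mUsed(0) {}
        ~BufferedWriter() { Flush(); }

        // Copy small writes into the buffer; hit the disk only in
        // kBufSize-sized chunks.
        PRBool Write(const char* data, PRUint32 len) {
            while (len > 0) {
                PRUint32 space = kBufSize - mUsed;
                PRUint32 n = (len < space) ? len : space;
                memcpy(mBuf + mUsed, data, n);
                mUsed += n; data += n; len -= n;
                if (mUsed == kBufSize && !Flush())
                    return PR_FALSE;
            }
            return PR_TRUE;
        }

        PRBool Flush() {
            if (mUsed == 0)
                return PR_TRUE;
            PRBool ok = (PR_Write(mFD, mBuf, (PRInt32)mUsed) == (PRInt32)mUsed);
            mUsed = 0;
            return ok;
        }

    private:
        enum { kBufSize = 65536 };  // 64k, per the analysis above
        PRFileDesc* mFD;
        char mBuf[kBufSize];
        PRUint32 mUsed;
    };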
I asked Wan-Teh whether we could use PR_Writev to write discontiguous buffers,
thereby saving the memory footprint for unused buffer space, yet still getting
the performance of large contiguous buffers, but (a) nspr doesn't implement it
for files, just for pipes, and (b) Windows and Mac don't support it anyway so it
would have to be emulated by nspr. It could improve unix performance, but I'm
not going to bother with it for now.
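For reference, a gathering write with PR_Writev would look roughly like the
following; per the above, it wouldn't work for plain files, and headerBuf and
bodyBuf are hypothetical buffers.

    #include "nspr.h"

    /* Write two discontiguous buffers with one call. NSPR (at the time)
       implemented PR_Writev only for pipes/sockets, not files. */
    static PRInt32 GatherWrite(PRFileDesc* fd,
                               char* headerBuf, int headerLen,
                               char* bodyBuf, int bodyLen)
    {
        PRIOVec iov[2];
        iov[0].iov_base = headerBuf;
        iov[0].iov_len  = headerLen;
        iov[1].iov_base = bodyBuf;
        iov[1].iov_len  = bodyLen;
        return PR_Writev(fd, iov, 2, PR_INTERVAL_NO_TIMEOUT);
    }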
Next... to test PR_Read performance
Comment 10 • 25 years ago
Comment 11 • 25 years ago
P.S. The write columns (mean and stddev) are the times just for PR_Write. Iters
is the number of iterations averaged for that size. The total columns (mean and
stddev) include the time to call PR_Open and PR_Close too.
Comment 12 • 25 years ago
Here's what the TestWriteStream test is saying now with buffering in place.
For the first run (when the files don't exist):

File size    Rate
=============================
10KB         8.6 MB/s
50KB         16.9 MB/s
100KB        17.4 MB/s
1000KB       2.8 MB/s
For subsequent runs, without deleting the test files:
File size    Rate
=============================
10KB         16.3 MB/s
50KB         23.6 MB/s
100KB        25.3 MB/s
1000KB       2.7 MB/s
[Scott: You must have changed the test to report MB instead of KB -- are your
old numbers really correct?]
There is a weird drop-off in transfer rates for large files, but that's got to
be an NT bug. It's the same thing I saw in my TestWriteSpeed test output
(attached). I thought it was related to the buffer size passed to PR_Write, but
now I realize it's just a function of the size of the file that's being created
(because now with buffering turned on in the file transport, we're writing in
64k chunks).
Comment 13 • 25 years ago
Windows has some file flags that you can specify when opening a file to
optimize sequential file access. If you always write or read sequentially,
these flags may help.
Unfortunately, PR_Open does not allow you to specify these flags because they
are not universally supported. I'd try them anyway -- use Win32 CreateFile to
open these files and then call PR_ImportFile (a "private" function) to convert
the HANDLEs to PRFileDescs.
You can look at mozilla/nsprpub/pr/src/md/windows/w95io.c, function
_PR_MD_OPEN, to see how NSPR calls CreateFile.
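A sketch of that suggestion (the function name is illustrative, and the cast
assumes, per w95io.c, that the osfd NSPR stores for Win32 files is the HANDLE
itself):

    #include <windows.h>
    #include "nspr.h"

    /* Open with the Win32 sequential-access hint, then wrap the HANDLE
       in a PRFileDesc via the "private" PR_ImportFile. */
    static PRFileDesc* OpenSequential(const char* path)
    {
        HANDLE h = CreateFileA(path,
                               GENERIC_WRITE,
                               FILE_SHARE_READ,
                               NULL,                 /* default security */
                               CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                               NULL);
        if (h == INVALID_HANDLE_VALUE)
            return NULL;
        return PR_ImportFile((PROsfd)h);
    }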
Comment 14 • 25 years ago
Wouldn't it be better to add a PR_SEQUENTIAL flag that just gets ignored on
other platforms? That would avoid our need to ifdef for windows.
Comment 15 • 25 years ago
> [Scott: You must have changed the test to report MB instead of KB -- are your
> old numbers really correct?]
I think the numbers that I originally reported were correct. (The sync
write test took a significant fraction of a minute to run, even though only a
few MB were being written to disk.)
> There is a weird drop-off in transfer rates for large files, but that's got to
> be an NT bug.
There will probably be a drop-off in performance as soon as NT's disk cache
fills. At that point, transfer rate will drop to the level of the raw disk
throughput, i.e. probably somewhere between 2 and 8 MB/s, depending on your
particular HW configuration and the degree to which the I/O is sequential or
random. A fairer test might be to ensure that the same amount of data is
written out, regardless of file size, e.g. compare rates when writing out 30
files of size 50K and 15 files of size 100K.
Comment 16 • 25 years ago
PR_SEQUENTIAL is a good idea. I'd still be
interested in seeing performance numbers.
The Windows file flag that I talked about is
FILE_FLAG_SEQUENTIAL_SCAN. Another flag that
may be useful is FILE_FLAG_RANDOM_ACCESS.
Comment 17 • 25 years ago
Warren: can you give us status on this, along with your recommendation?
Comment 18 • 25 years ago
Wan-Teh: Realistically, when can we expect to see these nspr flags?
Comment 19 • 25 years ago
Comment 20 • 25 years ago
Patch looks good, except for 1 question: I've noticed that it's in w95io.c --
does that apply to NT also? If the answer is yes, then r=warren.
Comment 21 • 25 years ago
w95io.c is the generic win32 code. It applies to all win32 platforms.
Could you give the patch a try and see if it helps the disk cache writes?
Comment 22 • 25 years ago
Will do.
Updated • 25 years ago
Target Milestone: M14 → M15
Comment 24 • 25 years ago
Moving non-essential, non-beta2 and performance-related bugs to M17.
Target Milestone: M15 → M17
Comment 25 • 25 years ago
Here is some data on the performance of PR_Write on Linux and Windows
platforms when writing in 16K chunks:

Platform       File_Size   Data_Rate
=====================================
Linux RH 6.1   32 KB       8 MB/sec
               64 KB       9 MB/sec
               1 MB        10 MB/sec
Win 98         32 KB       3 MB/sec
               64 KB       21 MB/sec
               1 MB        20 MB/sec
Win NT         32 KB       10 MB/sec
               64 KB       16 MB/sec
               1 MB        25 MB/sec
This data is in line with the data reported by Warren (for NT) on 2000-02-04,
except that there is no performance anomaly when writing a 1MB file on NT.
For a fixed file size of 1MB and a variable chunk size, the transfer rates on
NT are:

Chunk_Size   Data_Rate
=======================
128 KB       30 MB/sec
256 KB       29 MB/sec
324 KB       30 MB/sec
384 KB       31 MB/sec
512 KB       30 MB/sec
The transfer rate varies somewhat, but there is no big drop in performance when
using large chunks (324 KB, 512 KB, etc). This is different from Warren's data
reported on 2000-02-02.
When using the FILE_FLAG_SEQUENTIAL_SCAN flag on Windows NT, the results are:

File_Size   Data_Rate
======================
64 KB       20 MB/sec
1 MB        30 MB/sec

And when using the PR_SYNC flag (without the FILE_FLAG_SEQUENTIAL_SCAN flag):

File_Size   Data_Rate
======================
64 KB       2 MB/sec
1 MB        1.9 MB/sec

Again with the PR_SYNC flag, but with a chunk size of 500 bytes:

File_Size   Data_Rate
======================
64 KB       145 KB/sec
1 MB        135 KB/sec
Conclusions:
1. On Linux and Windows platforms, the data transfer rate with PR_Write ranges
from a few MB/sec to a few tens of MB/sec, depending on the file size and
platform. For normal writes, the operating system on all of these platforms
caches the file data; the transfer rates are not out of the ordinary.
2. On NT, use of the FILE_FLAG_SEQUENTIAL_SCAN flag improves performance, but
not by an order of magnitude; i.e., use of this flag is not a solution to the
original problem of the order-of-magnitude difference in performance between
Linux and NT.
3. Use of the PR_SYNC flag (i.e., synchronous writes to the disk) on NT reduces
the data transfer rate by an order of magnitude. When the chunk size is reduced
from 16K to 500 bytes, the rate goes down, not surprisingly, by another order
of magnitude.
4. Use of larger chunk sizes increases the transfer rate up to a point, after
which there is not much difference.
5. The data reported here is for the first creation of a file; overwriting an
existing file results in higher data transfer rates.
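To make conclusion 3 concrete: the only difference between the normal and
synchronous runs above is the PR_SYNC open flag. A minimal sketch (function
name is illustrative):

    #include "nspr.h"

    static PRFileDesc* OpenForTest(const char* path, PRBool synchronous)
    {
        PRIntn flags = PR_WRONLY | PR_CREATE_FILE | PR_TRUNCATE;
        if (synchronous)
            flags |= PR_SYNC;   /* force writes to disk: ~10x slower above */
        return PR_Open(path, flags, 0644);
    }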
Comment 26 • 25 years ago
My suggestion would be to run the file transport test program in a controlled
environment. Ideally, the system should be a dual-boot Linux/NT machine, with
the tests run while the system is relatively idle.
Profiling the test run may also help. Even a high-level measurement, such as
the ratios of real time, system time, and user time for the test execution,
can give useful data. For example, a significant ratio of idle time to total
time indicates the presence of disk I/O operations or conflicts with other
activity on the system.
A normal file write operation is largely compute-bound, because it consists of
system calls and memory copies (of data from user space to the file cache).
The presence of disk I/O operations indicates a lack of sufficient room in the
memory file cache, or the use of the synchronous flag.
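On NT, one way to get that high-level measurement is Win32's GetProcessTimes;
a sketch (the function name and reporting format are assumptions):

    #include <windows.h>
    #include <stdio.h>

    /* Compare wall-clock time against this process's user/kernel CPU time.
       A large gap (idle time) suggests the test is blocked on disk I/O. */
    static void ReportTimes(DWORD wallMs)
    {
        FILETIME creation, exited, kernel, user;
        if (GetProcessTimes(GetCurrentProcess(),
                            &creation, &exited, &kernel, &user)) {
            ULARGE_INTEGER k, u;
            k.LowPart = kernel.dwLowDateTime; k.HighPart = kernel.dwHighDateTime;
            u.LowPart = user.dwLowDateTime;   u.HighPart = user.dwHighDateTime;
            /* FILETIME counts 100-ns units; 10000 per millisecond. */
            printf("wall %lu ms, kernel %lu ms, user %lu ms\n",
                   (unsigned long)wallMs,
                   (unsigned long)(k.QuadPart / 10000),
                   (unsigned long)(u.QuadPart / 10000));
        }
    }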
Updated • 25 years ago
Target Milestone: M17 → M18
Comment 27 • 25 years ago
Putting on the nsbeta3 radar to see if anyone is still interested.
I was playing with some performance benchmarks at
http://i-bench.zdnet.com/ibench/testlist/run.html
and noticed that we really don't seem to be much faster on a 56k modem during
the second and third cycles, which test cache performance. I don't have any
hard numbers yet but may have some over the weekend.
Keywords: nsbeta3
Comment 28 • 25 years ago
You gonna fix it?...
Updated • 25 years ago
Whiteboard: [PDT-] → [PDT-][nsbeta3+]
Comment 29 • 25 years ago
Here is a summary. Does anyone else see similar results? The first column is
from a 4.x run over a 56k modem; the second is from Seamonkey PR2.

                                  4.x        Seamonkey PR2
==========================================================================
Load complex (graphics?) pages
  All iterations                  279,240    1,951,060     6.99x slower
  First iteration (downloaded)    186,360    284,680       1.53x slower
  Subsequent iterations (cached)  13,269     238,680       17.88x slower
Load text-based pages
  All iterations                  711,670    936,970       1.32x slower
  First iteration (downloaded)    661,800    527,060       26% faster
  Subsequent iterations (cached)  49,870     409,910       8.21x slower
Processing
  Java VM                         27,320     28,205        1.03x slower
  JavaScript (sec)                8.96       35.65         3.98x slower
Comment 30 • 25 years ago
Reassigning to Gagan because it looks like the cache isn't working.
Assignee: warren → gagan
Comment 32 • 25 years ago
Marking P2. We are supposedly 17x slower according to i-bench tests done by
chofman.
Priority: P3 → P2
Comment 33 • 25 years ago
pdtp3 - It's too late to be chasing this NT perf improvement. If you already
have an obvious fix in mind, please come explain the impact to the pdt.
Priority: P2 → P3
Whiteboard: [PDT-][nsbeta3+] → [PDT-][nsbeta3+][pdtp3]
Target Milestone: M18 → Future
Comment 34 • 25 years ago
Not holding PR3 for this. Marking nsbeta3-. Please nominate for RTM if we
really have to fix this before shipping Seamonkey.
Whiteboard: [PDT-][nsbeta3+][pdtp3] → [PDT-][nsbeta3-][pdtp3]
Comment 36 • 24 years ago (Assignee)
This bug was for the old cache, but there is some good discussion that may apply
to mozilla file i/o in general, so I'm marking it a duplicate of bug 73295 (the
new cache performance bug), so we don't lose all these comments.
*** This bug has been marked as a duplicate of 73295 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE