Closed Bug 1211090 Opened 9 years ago Closed 9 years ago

Thousands of very small safe browsing file operations during startup account for >50% of system calls

Categories

(Toolkit :: Safe Browsing, defect)


Tracking


RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: gps, Assigned: gcp)

Details

(Keywords: perf)

Attachments

(4 files, 1 obsolete file)

I ran firefox.exe under Sysinternals' Process Monitor on Windows. One of the things that stood out was that the safe browsing pset files (such as safebrowsing\goog-badbinurl-shavar.pset) in my profile see literally tens of thousands of ReadFile() system calls during startup, most of them reading less than 100 bytes at a time. Of the 70,378 system calls captured by procmon during firefox.exe startup, 42,461 (60.3%) involve files in the safebrowsing directory. goog-badbinurl-shavar.pset - a 466kb file - accounts for 8,806 of them, and all but 20-30 of those are ReadFile(). This stands in stark contrast to almost all other I/O during startup, which performs chunked 32kb or 64kb reads on my machine.

I haven't added up all the I/O operations, and I'm not sure whether these are concurrent with other operations. But reading goog-badbinurl-shavar.pset took 40-50ms on my machine, and that's with a brand new i7-6700K processor and a modern SSD. By contrast, reading in the DLLs at startup in 64kb chunks only takes about 1ms (those were likely in the page cache, though). I reckon switching pset file reading to use fewer system calls would have a positive impact on startup time.
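To put rough numbers on the syscall amplification described above, here is a minimal Python sketch (not the Gecko code; the file contents and sizes are made up to match the 466kb pset file) that reads a file with 100-byte unbuffered reads versus 64kb reads and counts the read calls:

```python
import os
import tempfile

def count_reads(path, chunk_size):
    # os.read() maps to one read syscall per call, so counting calls
    # approximates the ReadFile() counts procmon would show.
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1  # counts the final zero-byte (EOF) read too
            if not data:
                break
    finally:
        os.close(fd)
    return calls

# A scratch file roughly the size of goog-badbinurl-shavar.pset (466kb).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 466 * 1024)
    path = f.name

tiny = count_reads(path, 100)        # thousands of syscalls
bulk = count_reads(path, 64 * 1024)  # a handful of syscalls
print(tiny, bulk)
os.remove(path)
```

With several thousand calls versus fewer than ten for the same data, the per-call overhead (user/kernel transition, filesystem bookkeeping) dominates the tiny-read case even when everything is in the page cache.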
Looks like safe browsing .sbstore files are also written out using very small buffer sizes as well. 5,635 of the 70,378 system calls captured by this trace were WriteFile(). 5,526 of those are files in the safebrowsing directory.
Of the 140.5 MB of read I/O during Firefox startup, safe browsing files accounted for 69.8 MB. In contrast, xul.dll was 49.9 MB. There were also 32.8 MB of writes involving safe browsing files. Only 33.5 MB of writes were captured by this profile. A "safebrowsing-backup" directory accounted for all of these.
This is happening on every startup for me. AFAICT from a *very* cursory glance at the safe browsing code, the .sbstore files are supposed to be some kind of cache/database built from the .pset files. Having that rebuild occur on every startup definitely smells a bit fishy to me. But I don't know the URL classifier code; I could be totally wrong here.
Summary: Thousands of very small safe browsing pset file read operations during startup account for >50% of system calls → Thousands of very small safe browsing file operations during startup account for >50% of system calls
I should have added this is with the official 2015-10-02 64-bit Nightly on a profile that has been used with Nightly for years.
We do a full update about 5 seconds after every startup because of bug 1175562. In light of that, the IO amount is as expected.

> I think the .sbstore files are supposed to be some kind of cache/database built from the .pset files.

No, they're complementary databases. We should look at reading/writing the files in large buffers, though.
(In reply to Gregory Szorc [:gps] from comment #0)
> I haven't added up all the I/O operations. And I'm not sure if these are
> concurrent with other operations. But reading goog-badbinurl-shavar.pset
> took 40-50ms on my machine. But that's with a brand new i7-6700K processor
> and a modern SSD.

Note there's a telemetry probe for this: https://dxr.mozilla.org/mozilla-central/rev/5f16c6c2b969f70e8da10ee34853246d593af412/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp?offset=300#350

I think .pset writing accidentally regressed with bug 1046038, which replaced a single write of a large data structure with a bunch of writes of smaller ones. Maybe we can see the actual effect in the telemetry data? Time to check those alerts...

> We do a full update about 5 seconds after every startup because of bug 1175562.

Actually, because of bug 779008 and bug 778855 (for which I haven't seen any proposal I'd actually want to implement), we do an update of the small test- tables every startup. SafeBrowsing updates are atomic, with this code: https://dxr.mozilla.org/mozilla-central/rev/5f16c6c2b969f70e8da10ee34853246d593af412/toolkit/components/url-classifier/Classifier.cpp#476

Which means we're going to be writing out the full database 2 times during startup (the update and the backup copy). Those are on a background thread, using OS calls, and happen about 2 seconds after startup, so I'm not sure how much we care about it?

I think a case could be made to split the safebrowsing storage per provider, now that we actually track those. That would make the test- updates tiny and just seems sensible in general. Maybe something for our protover v4 work.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
> Maybe we can see the actual effect in the telemetry data? Time
> to check those alerts...

Alert for URLCLASSIFIER_PS_FILELOAD_TIME (Histogram Regression Detector) on 2014-08-04: http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/85/alerts/?from=2014-08-04&to=2014-08-04

Too bad the URL doesn't work :( But yes, that's the same day bug 1046038 landed.
If it makes you feel any better, I found a similar issue with small reads in the startup cache and filed bug 1211106. I also see similar issues with reading from files like <AppData>\Local\Temp\mozilla-temp-files\mozilla-temp-16771. Even files in <Profile>\cache2\entries have some suspicious activity. I'm starting to see a pattern here.

In a profile I just captured that covered a lot of random Firefox interaction, for all non-DLL file I/O I recorded 43,803 read operations reading 110,269,539 bytes for an average of 2,517 bytes/read, and 22,199 write operations writing 94,337,211 bytes for an average of 4,249 bytes/write. And many of the operations in this set use 32kb and 64kb sizes, raising the average significantly.

Given what I've casually found in a night's worth of poking around, I think a performance engineer who actually knows what they are doing would have an absolute feast finding performance issues by tracing system calls [on Windows] :)
Cc'ing vladan, as this is likely relevant to his interests.
Flags: needinfo?(vladan.bugzilla)
Chris, can you reproduce this and grab a profile? Instructions on obtaining a profile: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

Please update anything that's out of date on that wiki page; it gets referenced frequently.
Flags: needinfo?(vladan.bugzilla) → needinfo?(chutten)
Assignee: nobody → gpascutto
Profiles of 32-bit Nightly do not contain main thread I/O calls against safebrowsing files between time 2300 and 34000. Maybe they happen earlier or later, or on a non-main thread. http://people.mozilla.org/~bgirard/cleopatra/#report=e579ea3e6f59ca9ae8c71fb69cc2b28380e4dcb4
Flags: needinfo?(chutten)
When I reproduced with a fresh profile, I think I had to manually trigger safebrowsing file download somehow in order for the file reads to occur near startup. Even then, I think it took a little bit for the downloads to complete. I basically let the profile sit around a few minutes, verified from debug logs (which I had to enable) that it was done. Then restarted.
I'm not sure if the downloads are complete, but the large malware shavar sets from google are in the profiles I was profiling (overloaded English word alert) at 1.2MB for pset and 2.2MB on sbstore. This is roughly in line with the profile I started on Monday and have been browsing in heavily... Would there be any reason they wouldn't be read in on start? I made sure to visit a site so it'd have to ask the table if it was okay. Is there any reason the profiler wouldn't pick up on multitudes of reads? The linked bug suggests that it's reading 2B at a time, which means that it ought to be pretty obvious. I filed a bug against that many reads of update.mar, so I know it can report _some_ opens/reads/writes.
(In reply to Chris H-C :chutten from comment #11)
> Maybe they happen earlier or later, or on a non-main thread.

As already stated in comment 6, they are on a background thread. The fix for bug 1175562 (which landed during this discussion) will lower the amount of IO you see shortly after startup.
Seems tricky to use nsIBufferedOutputStream together with fallocate (we know the filesize in advance).
WIP. Need to think about what to do with the fallocate calls.
Use a Buffered Input/OutputStream, 64k buffer.
Attachment #8698905 - Flags: review?(nfroyd)
Attachment #8696005 - Attachment is obsolete: true
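The effect of the buffered-stream patch can be sketched outside Gecko. This is a hypothetical Python stand-in (io.BufferedWriter playing the role of the 64k nsIBufferedOutputStream; the class and file names are made up) showing many tiny logical writes collapsing into one large write on the underlying file:

```python
import io
import os
import tempfile

class CountingRawFile(io.RawIOBase):
    """Unbuffered file that counts write() calls, standing in for the
    raw nsIFileOutputStream underneath the buffered wrapper."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        self.writes = 0
    def writable(self):
        return True
    def write(self, b):
        self.writes += 1
        return os.write(self.fd, bytes(b))
    def close(self):
        if not self.closed:
            os.close(self.fd)
        super().close()

path = tempfile.mktemp()
raw = CountingRawFile(path)
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)  # 64k, as in the patch
for _ in range(10000):
    buffered.write(b"\x01\x02\x03\x04")  # 10,000 tiny logical writes
buffered.close()  # flushes the buffer and closes the raw stream
print(raw.writes)
os.remove(path)
```

In procmon terms this is the difference between ~10,000 WriteFile() calls and a single bulk one, which is the shape of the fix for the sbstore/pset writes in this bug.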
Extend nsIFileOutputStream to support preallocating disk storage.
Attachment #8698906 - Flags: review?(nfroyd)
Comment on attachment 8698907 [details] [diff] [review]
Don't fail to open databases if we can't do buffered IO

Review of attachment 8698907 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a Necko peer, but I think the nsNetUtil* changes are pretty straightforward. r=me
Attachment #8698907 - Flags: review?(nfroyd) → review+
Comment on attachment 8698906 [details] [diff] [review]
Add fallocate support to nsIOutputFileStream and use it

Review of attachment 8698906 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below addressed.

::: netwerk/base/nsFileStreams.cpp
@@ +869,5 @@
> +NS_IMETHODIMP
> +nsFileOutputStream::Fallocate(int64_t aLength)
> +{
> +  if (mFD && mozilla::fallocate(mFD, aLength)) {
> +    return NS_OK;

I think we should just fail if !mFD, and return NS_OK even if mozilla::fallocate failed:

  if (!mFD) {
    return NS_ERROR_FAILURE;
  }
  mozilla::fallocate(...);
  return NS_OK;

We don't have a good way to say NS_I_TRIED_BUT_FAILED_BUT_ITS_OK.

::: netwerk/base/nsFileStreams.h
@@ +16,5 @@
>  #include "nsILineInputStream.h"
>  #include "nsCOMPtr.h"
>  #include "nsIIPCSerializableInputStream.h"
>  #include "nsReadLine.h"
> +#include "mozilla/FileUtils.h"

Please include this in the .cpp file rather than the header if we don't need it in the header.

::: netwerk/base/nsIFileStreams.idl
@@ +104,5 @@
>      void init(in nsIFile file, in long ioFlags, in long perm,
>                in long behaviorFlags);
>  
>      /**
> +     * @param length storage to preallocate in bytes

This is one of those comments written to satisfy the documentation requirements, isn't it? :) Please add a brief description, even something like "Provide a hint to the operating system that the underlying file for this stream should be at least |length| bytes long. As this is merely a hint, this method does not throw an error if the operating system indicates that it cannot preallocate storage." Along with a "@throws NS_ERROR_FAILURE if the file is not opened."

@@ +106,5 @@
>  
>      /**
> +     * @param length storage to preallocate in bytes
> +     */
> +    void fallocate(in long long length);

I don't want to call this fallocate on the ground that the name is unix-specific (mozilla::fallocate is sort of a historic accident). But I don't really have much better, beyond hintAllocationSize.
::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +332,5 @@
>    mIndexDeltas.SetLength(indexSize);
>  
>    mTotalPrefixes = indexSize;
>  
> +  uint32_t toRead = indexSize*sizeof(uint32_t);

Maybe this should have been in a previous patch?
Attachment #8698906 - Flags: review?(nfroyd) → review+
Comment on attachment 8698905 [details] [diff] [review]
Use Buffered IO for PrefixSet load/store.

Review of attachment 8698905 [details] [diff] [review]:
-----------------------------------------------------------------

Such happiness at using interface methods instead of raw NSPR calls.

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +376,5 @@
>  {
> +  // Now re-open
> +  nsCOMPtr<nsIOutputStream> localOutFile;
> +  nsresult rv = NS_NewLocalFileOutputStream(getter_AddRefs(localOutFile), aFile,
> +                                            PR_WRONLY | PR_TRUNCATE | PR_CREATE_FILE);

Maybe it'd be best to specify 0644 for the permissions here as well, just to ensure we're not relying on special behavior of NS_NewLocalFileOutputStream?
Attachment #8698905 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #23)
> Maybe it'd be best to specify 0644 for the permissions here as well, just to
> ensure we're not relying on special behavior of NS_NewLocalFileOutputStream?

I think we do want to use the default. NS_NewLocalFileOutputStream explicitly uses the default (-1) if nothing is specified, and nsIFileOutputStream goes on to claim defaults are a thing and that the default is 0664. That seems to be true: https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsFileStreams.cpp#863

I believe that should actually be 0666 (which is then & ~umask, giving mostly 0664 or 0644). If we want, we can find out what the rationale here was; blaming through our gecko-dev tree shows the culprit to be:

commit 0b0346079c102067e8a018b2450c3dcbddbee8ae
Author: dougt%netscape.com <dougt%netscape.com>
Date:   Mon Jan 24 21:28:28 2000 +0000
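The mode arithmetic mentioned above is just a bit-mask. A tiny illustrative sketch (the 0o022 and 0o002 umask values are common examples, not anything this bug prescribes):

```python
def effective_mode(requested, umask):
    # The kernel clears the umask bits from the requested create mode.
    return requested & ~umask

print(oct(effective_mode(0o666, 0o022)))  # typical single-user umask
print(oct(effective_mode(0o666, 0o002)))  # typical group-writable umask
```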
(In reply to Nathan Froyd [:froydnj] from comment #22)
> I think we should just fail if !mFD, and return NS_OK even if
> mozilla::fallocate failed:
...
> We don't have a good way to say NS_I_TRIED_BUT_FAILED_BUT_ITS_OK.

I think we should propagate the error. The thing is, fallocate is *not* a hint. It actually changes the file size on disk (this is very obvious if you look at the non-Unix implementations), so the observable behavior is different if the call fails. On the other hand, I agree most writers won't or even shouldn't care, but they still might want to know if they're stuck with a broken file handle, at the very least in debug mode. Maybe it's more appropriate to throw NS_ERROR_INVALID_ARG for null handles, though. Does the above info change your opinion?

> I don't want to call this fallocate on the ground that the name is
> unix-specific (mozilla::fallocate is sort of a historic accident). But I
> don't really have much better, beyond hintAllocationSize.

How about the simple "Preallocate"?
Flags: needinfo?(nfroyd)
>I agree most writers won't or even shouldn't care, but still might want to know if they're stuck with a broken file handle To clarify, I mean writers shouldn't care if the fallocate itself fails, but might want to know that they're accidentally passing an invalid handle.
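The point that preallocation is observable rather than a hint is easy to see from user space. A sketch using Python's os.posix_fallocate, assuming a Linux host (the Gecko mozilla::fallocate wrappers behave analogously but are not this code):

```python
import os
import tempfile

path = tempfile.mktemp()
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

assert os.fstat(fd).st_size == 0  # empty on creation

# posix_fallocate() actually reserves storage and extends the file,
# so its success or failure changes what later readers observe.
os.posix_fallocate(fd, 0, 1024 * 1024)

size = os.fstat(fd).st_size
print(size)

os.close(fd)
os.remove(path)
```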
Let's propagate the error from fallocate, but return two different error codes for null-mFD and fallocate failure. Preallocate works as a name; let's make that [noscript], though. I don't know why JS would want to have that much control over something.
Flags: needinfo?(nfroyd)
https://hg.mozilla.org/mozilla-central/rev/59144b59cd11

Added restoring the seek pointer to fallocate, but it missed handling some (very obscure) error cases.
Attachment #8700670 - Flags: review?(nfroyd)
Comment on attachment 8700670 [details] [diff] [review]
Catch some obscure failure cases when restoring the seek pointer

Review of attachment 8700670 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure what the right behavior is here, though given that fallocate() callers ignore the return value, maybe it doesn't matter.

::: xpcom/glue/FileUtils.cpp
@@ +57,5 @@
>  
> +  int64_t returnpos = PR_Seek64(aFD, oldpos, PR_SEEK_SET);
> +  if (returnpos != oldpos) {
> +    return false;
> +  }

Can this actually fail? Looking at MSDN for SetFilePointer, it doesn't look like it. And even if it does fail, we're already screwed, because the caller can't tell whether SetEndOfFile failed or whether we couldn't restore the file position (hello, incoming file corruption).

@@ +119,5 @@
>      }
>      iWrite += nBlk;
>    }
>  
> +  int64_t returnpos = PR_Seek64(aFD, oldpos, PR_SEEK_SET);

I don't think this call can ever fail on Unix. Certainly the failure conditions listed in lseek(2) and fseek(3) (Linux manpages, at least) can never happen at this point, because we have a valid, open file descriptor, and we're seeking to a known-good point in the file.
Attachment #8700670 - Flags: review?(nfroyd)
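What the follow-up patch guards against can be modelled with a hypothetical Python version of a write-zero-blocks preallocation fallback (the general approach the non-fallocate mozilla::fallocate paths take; function and variable names here are made up), which moves the file pointer and must restore it afterwards:

```python
import os
import tempfile

def preallocate_with_writes(fd, length, blocksize=4096):
    # Remember the caller's position before writing zero blocks.
    oldpos = os.lseek(fd, 0, os.SEEK_CUR)
    end = os.lseek(fd, 0, os.SEEK_END)
    remaining = length - end
    while remaining > 0:
        remaining -= os.write(fd, b"\0" * min(blocksize, remaining))
    # Restore the caller's position; comparing the returned offset to
    # oldpos is how a failed restore would be detected.
    return os.lseek(fd, oldpos, os.SEEK_SET) == oldpos

path = tempfile.mktemp()
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"header")                 # file position is now 6
ok = preallocate_with_writes(fd, 64 * 1024)
pos = os.lseek(fd, 0, os.SEEK_CUR)      # caller's position, restored
size = os.fstat(fd).st_size             # file grown to the target size
print(ok, pos, size)
os.close(fd)
os.remove(path)
```

As the review above notes, the restoring seek on a valid descriptor essentially cannot fail on Unix, which is why the extra error handling only matters in obscure cases.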
Alert for URLCLASSIFIER_PS_FILELOAD_TIME (Histogram Regression Detector) on 2015-12-22. Alert details: http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/78/alerts/?from=2015-12-22&to=2015-12-22

Looks like a startup win of about 10-70ms for most users. I think that shows that the original regression is fixed.