Closed Bug 412796 Opened 17 years ago Closed 15 years ago

Optimize fastload system (mmap fileIO, endianness, packed structs)

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: skumar, Assigned: taras.mozilla)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [ts])

Attachments

(4 files, 9 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.13pre) Gecko/20070505 (Debian-1.8.0.14~pre071019c-0etch1) Epiphany/2.14
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.13pre) Gecko/20070505 (Debian-1.8.0.14~pre071019c-0etch1) Epiphany/2.14

I am working on improving start-up speed of firefox. One of the suggestions I found at the wiki page was improve fastload system using mmap file IO, arch-specific code to avoid endian agnostic code. As I found no document that explains fastload system, I read the code and wanted to experiment first to see how much these things improve the start-up time. Below is what I did as part of the experiment.

1. Instead of modifying the fastload code, I implemented a new interface called nsIMMAPFileInputStream, which mimics nsFileInputStream, but implements the methods using PR_MemMap and PR_MemUnmap.

2. Implemented nsIMmapInputStream to mimic nsIBufferedInputStream. This is because the fastload users (nsXULPrototypeCache and mozJSComponentLoader) wrap nsIBufferedInputStream on top of nsFileInputStream and pass that to FastLoad system.

3.Above two will not do any actual seek on the File Desc, as it is not needed for  mmap file IO. nsMMAPFileInputStream does seek on FD only for SetEOF ().

4. I did not modify the Writing part (nsFastLoadFileWriter and its used-s), as writing is rarely done and because I am just experimenting)

5. I implemented Read16, Read32, Read64 for nsFastLoadFileReader as these methods implemented by nsBinaryInputStream convert the numbers back to big-endian. Implemented Write16, Write32, Write64 for nsFastLoadFileWriter as well. And commented other places where the endian conversion was done.

6. Made nsFastLoadHeader, nsFastLoadFooterPrefix, nsFastLoadSharpObjectInfo structures as packed and modified the corresponding read and write methods to just read the structures instead of reading individual struct members.

7. Modified nsXULPrototypeCache.cpp and mozJSComponentLoader.cpp to use nsIMMAPFileInputStream and nsIFileInputStream.

8. Verified that my code is executed using printfs and via gdb. I also verified removing XUL.mfasl and XPC.mfasl as well.

9. The performance gain I am observing for a debug build is around 2 secs, but for non-debug build it is few hundred milliseconds.

10. I expected a proportinal gain on N800, but there too the gain is 750-500 ms (for 15 sec startup time)

I know of the following optimizations, but worried about insignificant performance gain.

A. Rewrite Fastload code to avoid Seeks altogether (code under HAVE_PR_MEMMAP). For mmap file IO, seeks are not needed. Can directly pass the offset for read.

B. deCOMtaminate fastload system.

Any comments, suggestions? 

1. About my approach
2. Why so little gain on a device like N800?
3. What would be the performance gain even if we implement A & B and any other suggestions? on N800.

4. If I do --enable-timeline, I observed a line in the output - "00014.219 (00c7b590):  Navigator Window visible now" - Does that mean the time taken to show the navigator window is 14.219 seconds? (Will attach some timeline log files)


Reproducible: Always

Steps to Reproduce:
1. --enable-timeline

Actual Results:  
Very little performance gain observed.

Expected Results:  
at least 3 secs improvement on N800
Component: Startup and Profile System → XPCOM
Product: Firefox → Core
QA Contact: startup → xpcom
Status: UNCONFIRMED → NEW
Ever confirmed: true
sudheer, do you have a patch that I can use?
Please note that this is experimental code and not ready for primetime.
Keywords: perf
Sounds like this has the potential to be a huge win in startup time, and maybe also page load time if we can do the same for the HTTP disk cache.
Assignee: nobody → skumar
Flags: blocking1.9?
Keywords: footprint
OS: Linux → All
Hardware: PC → All
Attached patch Clean patch (obsolete) — Splinter Review
Cleaned commented out code. Added macros HAVE_PR_MEMMAP (in nscore.h), ALWAYS_BIG_ENDIAN, USE_PACKED_STRUCTS to guard code related to mmap io, endian conversion and packed structs respectively.

HAVE_PR_MEMMAP is defined in nscore.h
USE_PACKED_STRUCTS is defined in nsFastLoadFile.h
ALWAYS_BIG_ENDIAN is not defined.
Attachment #301693 - Attachment is obsolete: true
Attached patch Clean patch (obsolete) — Splinter Review
Fixed a mistake in previous patch.
Attachment #303170 - Attachment is obsolete: true
Version: unspecified → Trunk
Sudheer, were you planning on asking for review at some point?
How are you handling the same profile directory being used on systems of different endianness (e.g. a home directory in NFS or AFS and a user who uses both SPARC and x86 machines)?  I guess it's probably acceptable to just invalidate the fastload file on endianness mismatch, but _something_ definitely needs to happen to handle this case.
Comment on attachment 303172 [details] [diff] [review]
Clean patch

Great to see something like this being attacked. General comments (wish I had been consulted on design):

* Don't use XPCOM boilerplate and add seemingly-reusable stream interfaces and implemnetations.

* Do aggressively inline the mmap usage into nsFastLoadFile.cpp.

* Consider making the stream types concrete: nsFastLoadFileReader instead of nsIObjectInputStream, etc. so we can inline method calls as appropriate.

* Rather than #ifdef the code, just always write and read as packed as you can.

* As bz says, you'll need to extend the fastload file format to include some kind of byte order mark, so it can invalidate based on changing order.

/be
Thanks very much for the comments be & bz.

I am not thinking this patch as candidate to go in. I wanted to see quickly if this gives any performance gain. I wanted some comments, suggestions from experts. I am glad that I am getting some now. 

>> wish I had been consulted on design

Great! Please give me more input. Was just experimenting. Will be more than happy to start with a new approach.

>>Don't use XPCOM boilerplate and add seemingly-reusable stream interfaces and
implemnetations.

You mean the newly added classes/ interfaces does not need to be similar to nsFileInputStream and nsBufferedInputStream? 

>>Do aggressively inline the mmap usage into nsFastLoadFile.cpp.

I like this idea. But, how will fastload code work for systems not having mmap support if the code assumes that the stream is mmaped?

>> * Consider making the stream types concrete: nsFastLoadFileReader instead of
nsIObjectInputStream, etc. so we can inline method calls as appropriate.

You mean make users of fastload (nsXULPrototypeCache.cpp and mozJSComponentLoader.cpp) use nsFastLoadFileReader directly than thru nsIObjectInputStream?

>> extend the fastload file format to include some kind of byte order mark

Will do. 
But, nsFastLoadFileReader::Open() should fail for Version or FileSize or Checksum? Isn't it?

Sorry I am relatively new to Mozilla.
Status: NEW → ASSIGNED
Not a known Ts win, so not a blocker. See Comment 0:

Actual Results:  
Very little performance gain observed.
Flags: blocking1.9? → blocking1.9-
Bug 196843 – Fastload CSS style sheets, might be a bigger win.
Sudheer said in comment 0 that he saw 5% Ts perf win, at least on the N800.  Maybe that's "very little performance gain" compared to the 20% he expected, but it's still huge.
Flags: blocking1.9- → blocking1.9?
Adding wanted for sure, but I'm not sure we'd block on this, especially if Sayrer thinks bigger wins are available.

Sudheer: when your comment is ready for review, please ask brendan
Flags: wanted1.9+
yeah, definitely a nice patch, we want it. not going to block on it, though.
Flags: blocking1.9?
Mobile devices usually use flash memory instead of magnetic disk which means less seek time cost.
Comment on attachment 303172 [details] [diff] [review]
Clean patch

please don't add doubled blank lines:
 #include "nsIPropertyBag2.h"
 
+
 // Helper, to simplify getting the I/O service.

+nsMmapInputStream::Create(nsISupports +    nsMmapInputStream* stream = new nsMmapInputStream();

this could use nsRefPtr instead of NS_ADDREF/NS_RELEASE

+nsMmapInputStream::Init(nsIInputStream* stream, PRUint32 bufferSize)
+{
+    NS_ASSERTION(stream, "need to supply a stream");

I'd rather NS_ENSURE_STATE or something.

+nsMmapInputStream::ReadSegments(nsWriteSegmentFun writer, void *closure,
+                                    PRUint32 count, PRUint32 *result)

PRUint32 count should line up with nsWriteSegmentFun

+    char * temp_buf = new char [count];

our style doesn't usually leave *s alone, pick a side (whichever side is prevailing)

+    PRUint32 read = 0;
+
+    Source ()->Read (temp_buf, count, result);
no space before ()s in function calls.

+    delete temp_buf;
that should probably be delete[]  (new[] and delete aren't compatible, nor is new with delete[]).

(Personally, I prefer malloc+free w/ a null check.)

+    nsresult rv = NS_OK;
+
+    rv = writer(this, closure, temp_buf, 0, *result, &read);
use one line, skip the = NS_OK.

+    return (*result > 0) ? NS_OK : rv;

very odd, is it worth adding comment explaining this logic?

+    if (mStream)
+        return Source()->IsNonBlocking(aNonBlocking);
+    return NS_ERROR_NOT_INITIALIZED;
reverse this:

if (!mStream)
  return  ...
return Source()->...


+    *aStream = mStream;
+    NS_IF_ADDREF(*aStream);

we generally combine this as:

NS_IF_ADDREF(*aStream = mStream;);

+    if (mStream == nsnull)
+        return NS_BASE_STREAM_CLOSED;

not sure about file style, but generally I think !mStream is preferred over == nsnull (someday we want to get rid of nsnull).

+class nsMmapInputStream : public nsBufferedStream,
+                              public nsIMmapInputStream,

more alignment issues (public, public). - last flag for this by me.

+// XXX Some code copied from nsFileInputStream. If there is a bug fix in
+// nsFileInputStream, please check this too and apply.
+
+
doubled line (no reason, last flag about this by me). Is this comment cross linked in nsFileInputStream? if not, please do.

+        // file descriptor is still referencing the file.  since we've already

one space after periods, and capitalize the first word in sentences (first and only flag for both).

+            // If REOPEN_ON_REWIND is not happenin', we haven't saved the file yet

Normal English is appreciated.

+        mMap = PR_CreateFileMap (mFD, fsz, PR_PROT_READONLY);

I'm going to assume this can fail.

+        mmapBuf = PR_MemMap (mMap, 0, cnt);

and that this would crash.

+        if (mBehaviorFlags & REOPEN_ON_REWIND) {

reverse this to return early with the current else case.

+        return NS_OK;
+    }
+    else

"else after return". drop the |else| token, its corresponding braces and unindent the remainder.
(first and only warning)

+        if (newoffset <= mmapLen)
...
+        else
+            return NS_ERROR_FAILURE;

return early with the else error case. Can you provide a better error result?

+    PRFileMap * mMap;
+
+    PRUint32 mmapLen;
+
+    PRUint32 mCurOffset;

+    PRInt32 mIOFlags;
+    /**
+     * The permissions passed to Init() for the file open.

your choices of double spacing and not aren't ideal. Please leave a line between a variable and the next comment block. Personally I wouldn't leave a blank line between two variable declarations.

"the file open [call]"?

+     * Flags describing our behavior.  See the IDL file for possible values.

Please use @see <...>

+#ifdef ALWAYS_BIG_ENDIAN
I think the code assumes that the compiler isn't stupid and will optimize the entire block away if it's a NOP. Are the compilers we're using really stupid enough to "not get it"?

The ifdef itself is scary, because it took me too long to figure out what you were doing.

+    rv = Read(reinterpret_cast<char*>(a16), sizeof *a16, &bytesRead);
+    if (NS_FAILED(rv)) return rv;
+    if (bytesRead != sizeof *a16)

I think we typically write sizeof PRUint16 instead of *a16.

+}
+NS_IMETHODIMP

blank line between methods, please (only warning).

+#ifdef  USE_PACKED_STRUCTS
too many spaces

+    if (NS_FAILED(rv))
+        return rv;
+
+    if (bytesRead != sizeof *aFooterPrefix) {
+        return NS_ERROR_UNEXPECTED;
+    }

drop the braces, none of the other returns have them.

I hope PACKED structs causes changed version information. Please keep in mind that we expect the fastload file to be shared across platforms... which means we're going to need to eat 2 versions for each version bump or something.

+#ifndef USE_PACKED_STRUCTS

some consistency between adding blocks above and adding blocks below would be appreciated.

+#ifndef USE_PACKED_STRUCTS
     nsresult rv;

I think I'd rather the nsresult declared before the ifdef if it's shared...

+#ifdef ALWAYS_BIG_ENDIAN
         checksum = NS_SWAP32(checksum);

I'm going to assert that the compiler can't possibly be this stupid and ask that these ifdefs be removed. They make reading things annoying.

+#define USE_PACKED_STRUCTS

+#ifdef USE_PACKED_STRUCTS
+__attribute__((__packed__))

how portable is this and precisely what are the consequences of mixing msvc, sun studio, gcc and icc?

+    nsresult Read32 (PRUint32 * a32);
+    nsresult Read64(PRUint64* a64);
+    nsresult Read16(PRUint16* a16);

The space is obviously wrong. Why this order?

+#include <typeinfo>

why?

+/* Some platforms don't have an implementation of PR_MemMap(). */
+/* See bug 318077 for WinCE.                                   */
+#if !defined(XP_BEOS) && !defined(XP_OS2) && !defined(WINCE)
+#define HAVE_PR_MEMMAP
+#endif

i don't like the idea that nscore does this. what's wrong with configure?

+#ifndef HAVE_PR_MEMMAP
         rv = NS_NewLocalFileInputStream(getter_AddRefs(fileInput), mFile);
         if (NS_FAILED(rv)) return rv;
 
         rv = NS_NewBufferedInputStream(getter_AddRefs(mInputStream),
                                        fileInput,
                                        XUL_DESERIALIZATION_BUFFER_SIZE);
+#else
+        rv = NS_NewMmapFileInputStream(getter_AddRefs(fileInput), mFile);
+        if (NS_FAILED(rv)) return rv;
+
+        rv = NS_NewMmapInputStream(getter_AddRefs(mInputStream),
+                                       fileInput,
+                                       XUL_DESERIALIZATION_BUFFER_SIZE);
+#endif

is it possible for Mmap to fail even if the platform supports it? If so, I'd prefer putting the MEMMAP code first like this:

+#ifdef HAVE_PR_MEMMAP
+        rv = NS_NewMmapFileInputStream(getter_AddRefs(fileInput), mFile);
+        if (NS_SUCCEEDED(rv)) +        rv = NS_NewMmapInputStream(getter_AddRefs(mInputStream),
+                                       fileInput,
+                                       XUL_DESERIALIZATION_BUFFER_SIZE);
+#endif
+if (!mInputStream) {
         rv = NS_NewLocalFileInputStream(getter_AddRefs(fileInput), mFile);
         if (NS_FAILED(rv)) return rv;
 
         rv = NS_NewBufferedInputStream(getter_AddRefs(mInputStream),
                                        fileInput,
                                        XUL_DESERIALIZATION_BUFFER_SIZE);
+}
Sudheer, don't worry about the drive-by review -- the high-level points about not adding general XPCOM stream classes, inlining more aggressively, etc., are what to focus on.

Timeless, that was bot-like. How about a few kind words and disclaimers?

/be
Thanks be. :-)
Thanks Timeless for your thorough and good comments. Learnt a lot about mozilla coding style. Also, as be pointed out, we are going to redesign most of this. We will definitely keep your comments in mind and will not repeat those mistakes (Sorry. I guess I am too accustomed to our company coding style)
sudheer, can you post another patch w/ brendan's suggestions?
It is sort of nice to be able to do:

#ifndef HAVE_PR_MEMMAP  (or whatever define we want)

         nsresult rv = NS_NewLocalFileInputStream(getter_AddRefs(fileInput),
                                                  mFile);
#else
        nsresult rv = NS_NewMmapFileInputStream(getter_AddRefs(fileInput),
                                                 mFile);
#endif //HAVE_PR_MEMMAP

If we remove the general stream classes, you will not be able to do this.    
I was advising to avoid the nsI*Stream interfaces altogether and use a buffer, either mmapped or malloced and read/written.

/be
Just curious, any news on that front?
Taras has been looking into using mmap for JAR stuff too. He might be interested in this.
Blocks: 459117
so. i'm adding the coverity keyword because some engineer somewhere foolishly stole the current patch and integrated it into a tree that coverity reviewed. coverity then proceeded to flag:

+    delete temp_buf;
that should probably be delete[]  (new[] and delete aren't compatible, nor is
new with delete[]).

As for bot like. either i flag it early, or i suffer when someone commits it to a tree, and then i get to botlike convert the coverity report into tiny bugs (which have the opportunity to get ignored).

kind words: thanks for listening, please do be careful, because people will be foolish and steal your work and use it even if it isn't close to ready.

fwiw, i've been giving direct reviews in email to other people, it gets me less bad press, but the thing is, that if i stick comments into a bug, then other people can learn. and i can yell at my coworkers for not reading them. if i had sent my comments by email, then there's no way for others to be aware that the patch isn't ready.
Keywords: coverity
Comment on attachment 303172 [details] [diff] [review]
Clean patch

recording coverity:review-
Attachment #303172 - Flags: review-
Attached patch Unbitrotted (obsolete) — Splinter Review
Just updated to trunk so I could test on n810
Attachment #303172 - Attachment is obsolete: true
Comment on attachment 390872 [details] [diff] [review]
Unbitrotted

Ugh, it still has some crappy timing code in it too
mfinkle says:minimal difference, barely noticeable. I think we all agree with Brendan that this patch needs to be more aggressive.
Depends on: 518230
Blocks: 518230
No longer depends on: 518230
I'm going to work on this to address issues found in bug 340148.

From initial inspection looks like nsFastLoadFileReader should own the mmap buffer.
Assignee: skumar → tglek
Blocks: 517956
Blocks: 340148
Attached patch proof of concept (obsolete) — Splinter Review
this patch reads stuff via mmap, i got rid of some big-endian nonsense, but most of it remains.
Attachment #390872 - Attachment is obsolete: true
(In reply to comment #32)
> Created an attachment (id=402937) [details]
> proof of concept

I ran some tests with this patch, looks good.  Cold startup wall clock dropped 1s from 11s to 10s.  Number of bytes read() from the two mfasl files unsurprisingly dropped from ~7.5MB to 0, a loss of ~200ms during those reads().  Not sure where the remainder of the 1s drop in wall clock comes from.  I didn't notice any change outside of noise in warm startup, but warm is only 1s for me.
Attachment #402937 - Flags: review?(brendan)
Comment on attachment 402937 [details] [diff] [review]
proof of concept

This is a bit of a hack, since I know how to piggyback on existing streams to setup mmap(ie how to get at mFD of nsFileStream), I pass in an nsIFile in order to be able to 

Would like to get a review indicating how much(or little) more I need to do to get this landed to get some perf wins.

In the longer term I'd like to rewrite fastload completely.

I was planning to only do minimal decomtamination in this revision of the landable patch. Unfortunately important things like making faslreader not inherite from Binary/SeekableStreams causes a domino effect of decom.

I'm with rewriting fasl completely if we think it's reasonable to spend a month or two on it.
(In reply to comment #34)
> (From update of attachment 402937 [details] [diff] [review])
> This is a bit of a hack, since I know how to piggyback on existing streams to
> setup mmap(ie how to get at mFD of nsFileStream), I pass in an nsIFile in order
> to be able to 
That should read:
This is a bit of a hack, since I don't see how to piggyback on existing streams to setup mmap(ie how to get at mFD of nsFileStream), I hackily pass in an nsIFile to avoid fighting that battle.
Comment on attachment 402937 [details] [diff] [review]
proof of concept

>diff --git a/content/xul/document/src/nsXULPrototypeCache.cpp b/content/xul/document/src/nsXULPrototypeCache.cpp
>--- a/content/xul/document/src/nsXULPrototypeCache.cpp
>+++ b/content/xul/document/src/nsXULPrototypeCache.cpp
>@@ -628,6 +628,14 @@ class nsXULFastLoadFileIO : public nsIFa
> NS_IMPL_THREADSAFE_ISUPPORTS1(nsXULFastLoadFileIO, nsIFastLoadFileIO)
> 
> 
>+

Nit: no more blank lines -- two is enough.

>+NS_IMETHODIMP
>+nsXULFastLoadFileIO::GetFile(nsIFile** aResult)
>+{
>+    NS_ADDREF(*aResult = mFile);
>+    return NS_OK;
>+}
>+
> NS_IMETHODIMP
> nsXULFastLoadFileIO::GetInputStream(nsIInputStream** aResult)
> {
>diff --git a/js/src/xpconnect/loader/mozJSComponentLoader.cpp b/js/src/xpconnect/loader/mozJSComponentLoader.cpp
>--- a/js/src/xpconnect/loader/mozJSComponentLoader.cpp
>+++ b/js/src/xpconnect/loader/mozJSComponentLoader.cpp
>@@ -382,6 +382,12 @@ ReportOnCaller(JSCLContextHelper &helper
> NS_IMPL_ISUPPORTS1(nsXPCFastLoadIO, nsIFastLoadFileIO)
> 
> NS_IMETHODIMP
>+nsXPCFastLoadIO::GetFile(nsIFile **ret) {

Left brace on own line nit.

>+    NS_ADDREF(*ret = mFile);
>+    return NS_OK;
>+}

Is this something we could deCOMtaminate into nsFastLoadFile.{h,cpp}?

>@@ -975,6 +981,7 @@ mozJSComponentLoader::StartFastLoad(nsIF
>                 if (NS_SUCCEEDED(rv)) {
>                     /* Get the JS bytecode version number and validate it. */
>                     PRUint32 version;
>+                    // feels like this should be done elsewhere

Is this comment useful? I say FIXME: bug nnnnnnn or it didn't happen :-P.

>@@ -251,8 +252,6 @@ NS_AddFastLoadChecksums(PRUint32 sum1, P
> #undef ONES_COMPLEMENT_ACCUMULATE
> #undef FLETCHER_ACCUMULATE
> 
>-static const char magic[] = MFL_FILE_MAGIC;
>-

Does this really need to move? Just looking for more minimal patch.

>-            rv = mSeekableInput->Seek(nsISeekableStream::NS_SEEK_SET,
>-                                      entry->mSaveOffset);
>-            if (NS_FAILED(rv))
>-                return rv;
>-
>+            mPos = entry->mSaveOffset;

Suggestions:

* replace mSeekable*->Foo calls with inline helpers, not open-coded replacements.

* Suggest more consistent and consistently-terse names: mFileData, mFilePos, mFileLen, etc. Or mData/Pos/Len but those seem a bit too short, and given mFile I see why you did mFileData.

>-
>-    rv = nsBinaryInputStream::Read(aBuffer, aCount, aBytesRead);
>-
>-    if (NS_SUCCEEDED(rv) && entry) {
>+    PRUint32 count = PR_MIN(mLen - mPos, aCount);
>+    memcpy(aBuffer, mFileData+mPos, count);
>+    *aBytesRead = count;
>+    mPos += count;
>+    if (entry) {

Here's a better example. The inline helper can certainly be infallible, or you could ignore the rv. Or would it help get this patch in to leave the nsresult rv and its test as is:

More in a bit, unless you hit me with a new patch.

/be
brendan: taras: agree on rewriting fastload -- i still favor undump insanity at some level, but whatever you do, be aggressive, win big perf
[3:29pm] brendan: and allow other parts of the system to partake more easily than by adding nsISerializable cruft

To expand a bit, I say clean up the internal layering with inline "read", "seek", "tell" etc. helpers that infallibly update the members to-do with the memory mapped file data, and let's land that.

Then, let's have a new fasl design that kicks ass big-time. The old design was from eight years ago, and XPCOM- and time-constrained. We can do much better.

/be
I put magic back in. it's not checksummed because it is memcmp()ed on startup anyways.

This is the most minimal fastload patch that gets rid of file io in the normal startup cases. It isn't worth optimizing further without a big api redo(in order to take advantage of mmap more, in the future we should store continious chunks in fasl files)

I changed nsBinaryStreams because it 
a) Allowed me to get rid of ReadSegments uglyness
b) Once I changed strings to be native-endian, it seemed like I should make the rest of the data types native too.
Attachment #402937 - Attachment is obsolete: true
Attachment #403851 - Flags: review?(brendan)
Attachment #402937 - Flags: review?(brendan)
found a problem with patch submitted for review nsFastLoadFileUpdater requests an outputstream...but since I no longer rely on nsInputStream to read the fasl stuff, the output stream uses PR_TRUNCATE, which in turn causes rug to be pulled from under mmap.

So my proposed solutions:
a) Get rid of PR_TRUNCATE in http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#409 . This may cause a bloated fasl file and more checksumming overhead because of it.
b) Add a isReading property nsIFastLoadFileIO. This would make PR_TRUNCing work as it did before
c) Modify nsIFileInputStream(or nsFileInputStream) to have an fd property such that it can return a nspr descriptor which would allow me to do mmap on InputStreams...this would be useful when we decided to add mmap elsewhere.

I'm leaning towards option c) as it would be a useful API addition
If we're going to store endian-specific data in the fastload file, part of its basic integrity checking at the beginning should include a a check that the endianness matches (e.g., an checking an integer that was written out).  Or do you consider the endianness of the 32-bit checksum sufficient?

Also, shouldn't you be changing MFL_VILE_VERSION?
(In reply to comment #40)
> Or do
> you consider the endianness of the 32-bit checksum sufficient?

Right.

> 
> Also, shouldn't you be changing MFL_VILE_VERSION?

I changed MFL_FILE_VERSION, is that insufficient?
(In reply to comment #41)
> (In reply to comment #40)
> > Or do
> > you consider the endianness of the 32-bit checksum sufficient?
> 
> Right.

Assuming a randomly distributed checksum, there's a 1/65536 chance that it fails.

> > Also, shouldn't you be changing MFL_VILE_VERSION?
> 
> I changed MFL_FILE_VERSION, is that insufficient?

That's what I meant.  I thought I didn't see it in the patch, but now I do see it.
(In reply to comment #42)
> 
> Assuming a randomly distributed checksum, there's a 1/65536 chance that it
> fails.

Actually there is a 1/65536 chance that the checksum will be endianness-independent. But you also have 65536/4,294,967,295 chance of hitting such a number(assuming a number of likely faulty assumptions). 

I think it's safe to say that if someone's profile dir changes endianness AND their checksum matches, they are a very lucky person.

There is also the chances of someone's endianness changing :) I think I'm not going to worry about someone fuzzing their mfasl files to making their browser crash.
Endianness changing is a real possibility in the case of:
 * profiles on shared/network drives shared between systems
 * migration between systems (e.g., getting a new Intel mac to replace a PPC one)
> There is also the chances of someone's endianness changing

Happens to most MIT students on a daily basis (likely multiple times), since the homedir is in AFS and the campus workstations are half SPARC and have x86.
(In reply to comment #45)
> > There is also the chances of someone's endianness changing
> 
> Happens to most MIT students on a daily basis (likely multiple times), since
> the homedir is in AFS and the campus workstations are half SPARC and have x86.

Well then, they'll be suffering frequent slow startup from this change. I think BE machines should ifdef in a diff filename for fasl if that really is a problem.
> Well then, they'll be suffering frequent slow startup from this change.

Yes.  But if there's an endianness marker in the file they at least won't suffer startup crashes...

Having the two separate fastload files with different endianness might be a good followup, though.
> Having the two separate fastload files with different endianness might be a
> good followup, though.
Can't you just append "LE"/"BE" to the file name (depending on the endianness of the writer), making it unnecessary to add an indicator into the file itself and fixing this followup at the same time?
Attached patch rev 2 (obsolete) — Splinter Review
This does the filename trick on BE platforms. It also adds an api to avoid nsFastFileUpdater truncating+crash/burn behavior in the last patch.
Attachment #403851 - Attachment is obsolete: true
Attachment #404096 - Flags: review?(brendan)
Attachment #403851 - Flags: review?(brendan)
Do the files need to be specialized based on OS or 32-vs-64-bit, too?
(In reply to comment #50)
> Do the files need to be specialized based on OS or 32-vs-64-bit, too?

No, the code use pruint32 and other types that have a fixed size. I checked that structs pack the same on my 32 and 64bit x86. Though it does seem like a good idea to put in assertions regarding struct sizes.
PR_STATIC_ASSERT, presumably!
Actually. I'm thinking of not doing endian-nativeness in this patch. It's a fairly insignificant win compared to the io cost savings. Given that strings aren't null-terminated yet, we don't even get reduced mem usage.

My main interest in native-endianness is that it seemed cleaner, and getting rid of ReadSegments uglyness was nice. 

On the other hand, there seem to be .mfasl portability concerns and my nsBinaryStream changes upset xpcshell-tests test_charset_conversion.js. From visual inspection looks like opera migration code relies on endianness too.
Comment on attachment 404096 [details] [diff] [review]
rev 2

cancelling review for now. Got a patch with less binarystream + more deCOM coming
Attachment #404096 - Flags: review?(brendan)
Attached patch minimal mmap (obsolete) — Splinter Review
Ok, no deCOM. As soon as I try to remove nsIFastloadIO, the house of OO cards collapses on me and I end up with an extra 50K of changes. I think i'd rather replace fasl stuff in bug 520309, instead of poorly retrofitting an existing system.
Attachment #404096 - Attachment is obsolete: true
Attachment #404728 - Flags: review?(brendan)
This probably coud use another madvise call specifying MADV_NORMAL in the bottom of ComputeChecksum, but I don't have a way to measure that. 


With this change, ComputeChecksum is almost 2x faster, than the code currently on trunk
Attachment #404728 - Attachment is obsolete: true
Attachment #404911 - Flags: review?(brendan)
Attachment #404728 - Flags: review?(brendan)
tracking-fennec: --- → ?
Keywords: mobile
Comment on attachment 404911 [details] [diff] [review]
rev 4, now with madvise


>+    if (ret)
>+        perror("madvise");
>+#endif
>+

Accidentally left some debugging stuff in. This patch seems to result in 20% startup speedup on n810. Would like this landed soon so we could look into 192ing it.
(In reply to comment #55)
> Created an attachment (id=404728) [details]
> minimal mmap
> 
> Ok, no deCOM. As soon as I try to remove nsIFastloadIO, the house of OO cards
> collapses on me and I end up with an extra 50K of changes. I think i'd rather
> replace fasl stuff in bug 520309, instead of poorly retrofitting an existing
> system.

This is good, leveraged thinking. Often there's a win of some kind without pulling on the XPCOM ball of string, and there's a bigger win with more work going around the whole ball of string.

/be
Comment on attachment 404911 [details] [diff] [review]
rev 4, now with madvise

>+    PRUint32 count = PR_MIN(mFileLen - mFilePos, aCount);
>+    // errors returned from the writer get ignored

Nit, may apply elsewhere in the patch, please check: Full sentences in comments, meaning capitalize and put a period at the end. Also a blank line before to avoid crowding the comment from above. ;-)

>+    aWriter(this, aClosure, (char*)(mFileData + mFilePos), 0,
>+            count, aResult);

Why ignore the error? Seems unsafe but I haven't thought it through fully -- note that Fletcher's checksum won't catch all deletion errors.

>     PRUint32 checksum = 0;
[deleted lines]
>+    // skip first 2 fields

Same comment nits here, e.g. Last nag from me about 'em tho.

>+    if (mHeader.mVersion != MFL_FILE_VERSION
>+        || mHeader.mFooterOffset == 0
>+        || memcmp(mHeader.mMagic, magic, MFL_FILE_MAGIC_SIZE))
>         return NS_ERROR_UNEXPECTED;

Nits: || at end of lines, brace the consequent cuz the condition is multiline.

>@@ -1925,6 +1830,11 @@ nsFastLoadFileWriter::Open()
>     return Init();
> }
> 
>+/**
>+ * FIXME: bug #411579 (tune this macro!) Last updated: Jan 2008
>+ */
>+#define MFL_CHECKSUM_BUFSIZE    (6 * 8192)
>+

This is deadwood, remove it.

>+    nsIFile *mFile; // .mfasl file
>+    PRFileDesc *mFd; // OS file-descriptor
>+    PRUint32 mFileLen; // length of file
>+    PRUint32 mFilePos; // current position within file
>+    PRFileMap *mFileMap; // nspr datastructure for mmap
>+    PRUint8 *mFileData; // pointer to mmaped file

Please indent // comments to start in the same column, on a 1 mod 4 boundary if not 1 mod 8 (column numbering starts at 1).

Forgot to ask whether the PR_MemMap call should have its return value checked.

r=me with the above addressed, thanks.

/be
Attachment #404911 - Flags: review?(brendan) → review+
http://hg.mozilla.org/mozilla-central/rev/98099ee84126(In reply to comment #59)
> Why ignore the error? Seems unsafe but I haven't thought it through fully --
> note that Fletcher's checksum won't catch all deletion errors.

That write is just the readsegments callback. Other readsegements behave similarly. 
> >+/**
> >+ * FIXME: bug #411579 (tune this macro!) Last updated: Jan 2008
> >+ */
> >+#define MFL_CHECKSUM_BUFSIZE    (6 * 8192)
> >+
> 
> This is deadwood, remove it.

Removed the comment, buffer is still used for checksumming after writes. I tried to make the writer use faslreader for checksumming(since one would exist 90% of the time, but that change snowballed out of control)

> Forgot to ask whether the PR_MemMap call should have its return value checked.

Nope. Actual work + errors happen in PR_CreateFileMap.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #405580 - Flags: review?(brendan)
Attachment #405580 - Attachment is obsolete: true
Attachment #405582 - Flags: review?(brendan)
Attachment #405580 - Flags: review?(brendan)
Comment on attachment 405582 [details] [diff] [review]
misc fixes spotted by sun's compiler(without libffi noise)

>diff --git a/xpcom/io/nsFastLoadFile.cpp b/xpcom/io/nsFastLoadFile.cpp
>--- a/xpcom/io/nsFastLoadFile.cpp
>+++ b/xpcom/io/nsFastLoadFile.cpp
>@@ -871,7 +871,7 @@ nsFastLoadFileReader::Open()
> 
>     mFileLen = (PRUint32) size;
> 
>-    PRFileMap *mFileMap = PR_CreateFileMap(mFd, mFileLen, PR_PROT_READONLY);

Whoops, very embarrassing to have missed that!

>@@ -881,7 +881,7 @@ nsFastLoadFileReader::Open()
>         return NS_ERROR_FAILURE;
>     
> #ifdef XP_UNIX
>-    madvise(mFileData, mFileLen, MADV_WILLNEED);
>+    madvise((char*)mFileData, mFileLen, MADV_WILLNEED);

Maybe a space before the * to let things breathe, if not also a space after the cast. Local style and all that.

r=me of course.

/be
Attachment #405582 - Flags: review?(brendan) → review+
Depends on: 522141
Taras, I'm confusing.

Your patch is to use
+    madvise((char*)mFileData, mFileLen, MADV_WILLNEED);

But your commit is to use
+#if defined(XP_UNIX) && !defined(SOLARIS)

Why?
(In reply to comment #66)
> Taras, I'm confusing.
> 
> Your patch is to use
> +    madvise((char*)mFileData, mFileLen, MADV_WILLNEED);
> 
> But your commit is to use
> +#if defined(XP_UNIX) && !defined(SOLARIS)
> 
> Why?

Hesitation. Wasn't sure if I should rely on madvise on solaris since 
a) I realized that everywhere else(ie mac+linux) it takes a void*
b) I don't know if it makes any difference on solaris

After chatting with Brendan I realized that I probably should've pushed the (char*) version.
Depends on: 526176
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: