Open Bug 288473 (js-ccache) Opened 17 years ago Updated 3 years ago

Cache compiled javascripts

Categories

(Core :: DOM: Core & HTML, enhancement)

enhancement
Not set
normal

Tracking

()

People

(Reporter: erik, Assigned: millerdevel)

References

Details

(Keywords: perf)

Attachments

(2 files, 4 obsolete files)

Much like bug 219276 (copy pasted text and changed stylesheet to javascript)

We could put parsed javascripts in the cache to avoid reparsing them. When
surfing a site you will probably reload the same JS file(s) for pretty much
every page you visit so being able to just grab a parsed javascript seems like a
nice thing.

There are of course some issues. First of all it's a perf vs. memory issue. Is
it reall better to keep other things in the cache then two copies (the normal
filestream and the parsed objects) then to just reparse the file every time? And
is it worth the extra code?

...

Is js parse tree already cached in the XUL cache?
Severity: normal → enhancement
OS: Windows XP → All
Hardware: PC → All
This is not a JS engine bug, SpiderMonkey knows nothing about an HTTP-conforming
cache, Necko's or some other embedding's.

Trying DOM, but perhaps DOM level 0 is better.

/be
Component: JavaScript Engine → DOM
Assignee: general → general
QA Contact: general → ian
Duplicate of this bug: 445147
brendan, ideally we'd be able to cache the compiled script... except that we need to be able to change the principals if the script is loaded from a different page.  I suppose we could force the cache to be per-domain...

This came up today with gmail: about 20% of the gmail pageload time is under js_CompileScript.
How much of this is under eval? See bug 454184.

/be
Also, I hope the code being compiled is only ever loaded from one origin. Explicit load balancing hostname-1 through hostname-n.foo.com is so 1994. Erik, is gmail loading from a fixed origin?

/be
None.  It's directly under JS_EvaluateUCScriptForPrincipals.  I'd say it's an inline script, but it might have an @src and just have gotten prefetched...
A couple of notes.

JS API exists to precompile and (re-)execute scripts, used by XUL and XBL. The DOM could use this, but it would lose some JSOPTION_COMPILE_N_GO optimizations, and it would need to index the cache by a key that includes the origin or other ingredients of the principals compiled into the script.

We could add new API to "re-link" against different global objects and principals to relieve the cache from having to worry about this, at some cost in re-execute performance (probably small).

Making the cache data-associative rather than name-associative would be good for all those Ajax library copies (uncrunched, or crunched the same way) that are not yet on the Google or Yahoo! CDN. The idea is to hash the script as it loads, save the hash and any other hints in the compiled-script cache, and if it reloads from anywhere with the same hash and other hings (need SHA-1 or better, not MD5), use the cached bytecode.

Sites do have to reload the same script when they cross a domain boundary, I'm told (metaweb.com pals). So this helps intentional replication across origins as well as all the organic .js file copy propagation in the field. With the latter kind of copying, we don't know (at least I know of no studies) how often the copy is mutated or re-crunched.

All of this is do-able and would make a good *not*-my-first-bug project for some game hacker.

/be
> The DOM could use this, but it would lose some JSOPTION_COMPILE_N_GO
> optimizations

Details?  Would doing this only for scripts over a certain size make sense?  Or are the optimizations not constant-time wins?

Hashing by data would be a win in terms of not having to hook into the necko cache and such.  Just get the data from necko, however it gets it, then look in or private cache.  I assume said private cache could be fastloaded if we wanted to persist it across browser invocations...  That said, would fastloading a script (from disk, with attendant disk access) be any win over compiling an in-memory string if you already have the latter?  It's not obvious.  Could use data!

We do have SHA-1 via NSS, of course.  The only question there is performance, again.
(In reply to comment #5)
> Also, I hope the code being compiled is only ever loaded from one origin.
> Explicit load balancing hostname-1 through hostname-n.foo.com is so 1994. Erik,
> is gmail loading from a fixed origin?
> 
> /be

Gmail loads all of its JS from the same domain. Gmail is a bit of a special case though. We load all the js through an html file in an iframe to be able to do the loading progress bar. If we could get a 20% speedup by using a single js file again we would of course look into other ways of providing user feedback during page load.
Assignee: general → nobody
QA Contact: ian → general
Duplicate of this bug: 512358
Alias: js-ccache
Summary: Cache parsed javascripts → Cache compiled javascripts
As it is now, the cache metadata facility cannot be used to store serialized bytecode as it requires the both the key and value to be null terminated strings. This patch changes the value so that it is a (length, blob) pair instead of a null terminated string. To avoid API breakage, the existing get and set functions are kept as expecting a null terminated value, and the new functionality is exposed through the Get...Len and Set...Len functions.

This is a pre-requisite for being able to attach XDR serialized bytecode to cache entries via the existing metadata facility.
API breakage seems permissible here, since it's mechanically easy to work through and is less likely to result in data loss.
Note that we could also store base64-encoded xdr if we want to keep it null-free.  We do this for other binary blobs we need to stick in cache metadata (e.g. security info).  But that has obvious performance costs; I'm just as happy to have a proper blob api.
Make it a proper API! Be bold! Live the adventure! (Get me to sr it.)
Attachment #454605 - Flags: review?(michal.novotny)
Attached patch metadata patch v2 (obsolete) — Splinter Review
The metadata visitor was broken as-is because it was expecting all values to be null terminated strings. I've added a Len suffixed version of the visitor function, as I suspect that a user would want to handle the display of strings and binary data differently.
Attachment #454605 - Attachment is obsolete: true
Attachment #455709 - Flags: review?
Attachment #454605 - Flags: review?(michal.novotny)
Attachment #455709 - Flags: review? → review?(michal.novotny)
Attached patch xpcom io streams v1 (obsolete) — Splinter Review
Two new xpcom io streams needed to be added that would stream into and out of a buffer.

Serializing a script that has principals via XDR causes the XDR code to expect there to be a nsIObject{Input,Output}Stream as the xdr->userdata. As I want all of the serialized data to be dumped to a buffer in memory, this required writing some custom nsI{Input,Output}Streams.
Attachment #460722 - Flags: feedback?
Attachment #460722 - Flags: feedback? → feedback?(jwalden+bmo)
Attached patch js bytecode caching v1 (obsolete) — Splinter Review
This is the current status of the caching compiled js. It is no where near landable quality, contains some form of debug logging about every 3 lines, and I haven't remotely bothered to apply style clean-ups to it. That said, if anyone looks and sees massive issues please be loud about it.

For reasons of simplicity, it currently only handles external js files (anything that matches "http.*\.js"). If anyone has a good idea how to uniquely identify a piece of script within a webpage based on any of the information that's available in the parameters, let me know.

It does work though, and I can generate perf numbers for anyone that would be interested.
As a random example, loading jquery off of mozilla.org resulted in:
  No caching of bytecode: 74ms (this is how it is now)
  With caching of bytecode:
    not in cache:
        compile: 45ms
        execute: 40ms
        saving: 6ms
      Total: 91ms
    in cache:
        loading: 15ms
        running: 40ms
      Total: 55ms

The one-time cost of cashing is noticeable on something like gmail, so I wouldn't recommended it to always be enabled. On the other hand, the effect of having much of gmail cached is equally noticeable, especially in terms of how long it takes gmail to get past its initial loading bar screen.
> Two new xpcom io streams needed to be added 

We already have an input stream that reads from a byte buffer: nsIStringInputStream (admittedly not all that well-named).  Was there a reason not to use it?

For the output stream, you could create a new class (though it would need to be a lot more careful than the current one about various things, actually implement the stream API, etc), or you could use a pipe, pass the output end to the serializer and then ReadSegments the result into your string, doing only one allocation based on the (now known) length.  Either way would work, I think.
Another question: 45 + 40 > 74, right?  Is that due to lack of COMPILE_N_GO optimizations, or something else?
(In reply to comment #19)
> We already have an input stream that reads from a byte buffer:
> nsIStringInputStream (admittedly not all that well-named).  Was there a reason
> not to use it?

Not really... I was unaware of its existence and I didn't look into it in my quick glance over xpcom/io/ due to its name.


(In reply to comment #20)
> Another question: 45 + 40 > 74, right?  Is that due to lack of COMPILE_N_GO
> optimizations, or something else?

Yes. The main difference in the code is that COMPILE_N_GO is missing. A few people have mentioned the possibility of a 'linking' phase that binds a script to a global object so that we can get the COMPILE_N_GO optimizations back, but I'm not aware of that code existing yet.
Comment on attachment 460722 [details] [diff] [review]
xpcom io streams v1

I think bz is probably right about reusing nsStringInputStream here.  The older class certainly does more, but it's about the right fit for what's being done here (its not-quite-Panglossian naming aside).

A few brief comments on what you did write, for edification:

>diff -r a4a0c84f62d3 xpcom/io/nsIMemoryInputStream.idl

>+ * Portions created by the Initial Developer are Copyright (C) 2001

There's a license boilerplate page on mozilla.org somewhere, easily findable, makes it harder to avoid mistakes like the above copyright year.


>+[scriptable, uuid(64356aa1-bfb1-4241-bfe6-c5d1520df1c0)]
>+interface nsIMemoryInputStream : nsIInputStream
>+{
>+    [noscript] void setBuffer(in string value, in PRUint32 len);
>+};

Interfaces containing only unscriptable methods generally shouldn't be scriptable themselves.  :-)


>diff -r a4a0c84f62d3 xpcom/io/nsMemoryStream.cpp

>+nsMemoryOutputStream::nsMemoryOutputStream()
>+{
>+    mBuffer = (char*)malloc( STARTING_BUFFER_SIZE );

When writing these sorts of things for XPCOM, you need to be aware of allocator/deallocator mismatch issues -- try to allocate something with Mozilla's malloc implementation [jemalloc] and free it with msvcrt80.dll!free and things will go wrong.  NS_alloc and NS_free (I think there's NS_realloc, but I'm not sure) are the proper XPCOM idioms for allocating/freeing raw memory.


>+    if (aCount + mBufferLen > mMaxSize)
>+    {
>+        //FIXME: If (aCount+mBufferLen) has the highest bit set, then mMaxSize will overflow when
>+        //       trying to find the next power-of-two that's larger. As this would require >2GB
>+        //       of data to be sent into the output stream, I'm not particularly worried, but
>+        //       should I be accounting for it possibly happening anyway?

I consider paranoia a virtue when working on code whose input may be (even partially) determined by malicious third parties.  Unlikely, near-impossible, sure, but it's still an assumption waiting to be violated.


>+    nsresult rv = aWriter(this, aClosure, mBuffer, 0, toConsume, _retval);
>+    //FIXME: if the writer failed, should we still count any bytes it might have consumed,
>+    //       or should we skip over these next two lines?
>+    mBuffer += *_retval;
>+    mBufferLeft -= *_retval;
>+    return rv;

Out parameters are garbage when the method being called fails, so you'd skip in cases like this.
Attachment #460722 - Flags: feedback?(jwalden+bmo)
(In reply to comment #22)
> >+[scriptable, uuid(64356aa1-bfb1-4241-bfe6-c5d1520df1c0)]
> >+interface nsIMemoryInputStream : nsIInputStream
> >+{
> >+    [noscript] void setBuffer(in string value, in PRUint32 len);
> >+};
> 
> Interfaces containing only unscriptable methods generally shouldn't be
> scriptable themselves.  :-)

Disagree: we may still want script to be able to pass them around, even if they don't have any more methods in the most-derived interface than in the parent.  For streams it's pretty common for script to create-and-hand-off, without calling methods on them directly.
Perhaps.  I still think it stands as a general rule, with streams being a reasonable exception.
Fixed a bug where an out-param wasn't being assigned to.
Attachment #455709 - Attachment is obsolete: true
Attachment #463373 - Flags: review?
Attachment #455709 - Flags: review?(michal.novotny)
Attachment #463373 - Flags: review? → review?(michal.novotny)
Many updates: (If you're a tl;dr'er, just read the first sentence of each bullet.)

 * All review comments and known issues have been fixed. The entire xpcom io streams patch has died. RIP. I've tried to do all of the necessary cleanups and error checking. As usual, if you see any issues in the code, let me know.

 * The code should now detect if the cached entry has been updated and update the corresponding compiled version so that the two stay in sync. However, there's a slight issue. Whenever we update the metadata, the Last Modified time on the cache entry gets updated. What I'd like to do is attach the time that the time that the cache entry was last updated to the cache entry as metadata. So if I record the modification time, and set it as a metadata on the cache entry, this will cause the Last Modified time on the cache entry to be changed. This produces a chicken-and-the-egg situation. My current solution is to record the last modified time as the cache entry's last modified time plus a small constant so that it will always be slightly larger (and thus newer) than the real last modified time. This feels VERY hack-ish, but I don't know of a better way to do it. Suggestions welcome.

 * COMPILE_N_GO is on for the initial run. For giggles, I enabled it while compiling the script. XDR'ing still works just fine, and the XDR'd script can be loaded and run again (although presumably without the COMPILE_N_GO optimizations applied). I compared the time of parsing and running with COMPILE_N_GO enabled and disabled, and couldn't find any performance difference (at least on the order of milliseconds).

 * XDR'd scripts can be massive. For your average script, the XDR'd equivalent is on the order of 1x to 2x larger. However, for your large, industrial-grade, minified js files, it hurts badly. A 55k minified js file bloats up to 240k when XDR'd. A 125k js file pulled in off of facebook's cdn bloated up to 750k when XDR'd. It's still unknown what effects this will have on the current network cache code, as none of the metadata code was written with the expectation that it would be heavily used or contain large amounts of data. (The only current uses of the metadata are holding if the http was a GET or POST request and holding onto the http headers. This sums to less than 500 bytes.)

 * I can no longer reproduce getting ~74ms with all of my code disabled. I've gone back through and re-done all of my perf measurements. Apparently I shouldn't be comparing perf data I measured from the beginning of my internship with perf data I'm getting with my patches applied to a current m-c. A run from current m-c tip results in EvaluateString() taking 83ms to compile and run jQuery (this is _with_ COMPILE_N_GO enabled).
   That said, this means that the overhead of caching bytecode is fairly minimal. The main overhead is saving the bytecode, which is done _after_ we've run the script. Hopefully this should keep the amount of user-noticeable page load time differences to a minimum. The current code seems to be able to execute multiple loaded scripts in parallel (I've seen printf's from different runs of the function overlapping), so I'm staying hopeful that the extra time shouldn't be that bad. If anyone knows a way to measure page load time in a way that's more concrete than "it seems faster/slower", please speak up.
Attachment #460722 - Attachment is obsolete: true
Attachment #460731 - Attachment is obsolete: true
Duplicate of this bug: 640907
Michal, do you know when you'll get a look at this patch?
I'll do it during this week.
FWIW, I've sped up JS parsing quite a bit recently, and have a few more improvements up my sleeve;  see bug 564369, bug 584595, bug 588648, bug 639420, bug 637549.  cdleary has also done some improvements, eg. bug 549658.  So that should be taken into consideration for this bug.
Comment on attachment 463373 [details] [diff] [review]
metadata patch v3

First of all, I don't like that the distinction between string and binary data is based on a sign. I see 2 alternatives:

1) If we really need to remember how the metadata was stored, we should reserve some bit in the newly added PRUint32 (e.g. see Cache Location Format in nsDiskCacheMap.h). It should be more than enough to reserve 24 bits for the size. Btw. we should limit the size of the whole metadata, Now we limit the data but not the metadata.

2) We could simply decide whether call VisitMetaDataElement or VisitMetaDataElementLen on the basis of following condition:

if (data[datalen] == 0 && datalen == (strlen(data)+1))

I.e. any null terminated string that doesn't contain a null can be passed to VisitMetaDataElement.


> +/* The origional code here was created to store metadata as contiguous blocks of
> + * 
> + *     +------------+--------------+
> + *     | string key | string value |
> + *     +------------+--------------+

You don't need to describe how it used to be. Just document the current format.


> +    mBuffer->AppendLiteral("[binary data]");

Why not to dump binary data like in case of the cached content?


> + * The Original Code is test_bug248970_cache.js.
> + *
> + * The Initial Developer of the Original Code is
> + * Ehsan Akhgari.
> + * Portions created by the Initial Developer are Copyright (C) 2008

Correct the wrong information.


Data passed to UnflattenMetadata() needs to be checked for correctness. See bugs #634920 and #529733.


You also need somewhere to swap and unswap the value size (probably in FlattenMetadata and UnflattenMetadata). AFAIK all data structures in the cache do this, so one profile can be used on big-endian as well as on little-endian architecture.
Attachment #463373 - Flags: review?(michal.novotny) → review-
(In reply to Nicholas Nethercote [:njn] from comment #30)
> FWIW, I've sped up JS parsing quite a bit recently, and have a few more
> improvements up my sleeve;  see bug 564369, bug 584595, bug 588648, bug
> 639420, bug 637549.  cdleary has also done some improvements, eg. bug
> 549658.  So that should be taken into consideration for this bug.

This may be why we could no longer measure a speed up from storing preparsed startup cache entries in bug 696095.

I would be careful with this change as increasing size of cache entries(as this change is very likely to) would increase amount of disk IO negating any cpu-usage benefits.
(In reply to Taras Glek (:taras) from comment #32)
> (In reply to Nicholas Nethercote [:njn] from comment #30)
> > FWIW, I've sped up JS parsing quite a bit recently, and have a few more
> > improvements up my sleeve
> 
> This may be why we could no longer measure a speed up from storing preparsed
> startup cache entries in bug 696095.

The overall improvement from my changes was a ~1.8x speed-up in parsing speed: https://blog.mozilla.com/nnethercote/2011/07/01/faster-javascript-parsing/.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.