Closed Bug 900789 Opened 11 years ago Closed 10 years ago

Add cache to the load function/CLI of the JS Shell.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: nbp, Assigned: nbp)

References

Details

Attachments

(3 files, 2 obsolete files)

We need to add a cache JS API to emulate the future interactions between the network cache (Bug 900255) and the JS Engine.  We should also make the JS shell load function use this cache API if a command line option is enabled, such as we can test cache interactions on benchmarks in order to evaluate the benefit on these.
No longer blocks: 900669
The idea is to be able to do some testing on our ability to cache the bytecode used during the start-up without having to build a browser.  This might also help at minimizing test cases.

One of the challenge(?) would be to enable compileAndGo script in the XDR Decoder.
Assignee: general → nicolas.b.pierron
Summary: Add a Cache JS API to improve loading time. → Add cache to the load function/CLI of the JS Shell.
Depends on: 920322
Attachment #810272 - Flags: review?(luke)
Attachment #810273 - Flags: review?(luke)
Oops, forgot about these reviews, sorry for the delay.

It seems like this patch would leave a bunch of .cache files scattered around next to the test files which seems undesirable.  With asm.js caching, I added a --js-cache option which creates a directory to contain all the cache files.  This is useful in fixing, e.g., bug 932940.  Additionally, jit_tests clears the directory before every test run and, when running multithreaded, creates a new subdir for each process so they don't clobber each other's cache files (which lets you do things like assert that a certain sequence results in a cache hit).  Anyhow, I named it --js-cache instead of --asmjs-cache so that we could reuse it for other types of caching, like this.
Comment on attachment 810273 [details] [diff] [review]
RunFile instrumentation of the cache.

Cancelling for now pending response to comment 4.
Attachment #810273 - Flags: review?(luke)
Attachment #810272 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #4)
> I added a --js-cache option which creates a directory to contain all the
> cache files.  This is useful in fixing, e.g., bug 932940.  Additionally,
> jit_tests clears the directory before every test run […]

Thanks, but the reason why I added it that way was exactly to be able to have it persist across runs of the test suite, as I am trying to XDR the test suite, as a practice test.  Basically, at the moment I am running:

  jit-tests --args="--bytecode-cache=save" -j1;  jit-tests --args="--bytecode-cache=load"

Later, I will add a function to run a file similar to what you did with AsmJS cache.  But in the mean time, I keep the above one as this is a good way for finding bugs in my code.
This patch is taking another approach, as opposed to asmjs cache we do not need to coalesce asmjs modules into a single entry as the bytecode cache is supposed to be coming from the httpd cache.

So instead of doing some file manipulation as I first did for improving XDR based on the test suite.  I modified the evaluate function to store the result of the cache in an ArrayBuffer.
Attachment #8356625 - Flags: review?(luke)
Comment on attachment 8356625 [details] [diff] [review]
Instrument evaluate functions to save/load the bytecode.

Change this from review to feedback, as even if this patch is practical for the test suite, it does not protect against fuzzers which might corrupt the content of the ArrayBuffer which serve as a fake cache.

I will add a JSClass with a reserved slots to hide the cache content from fuzzers.  Later when XDR is finished to be fuzzed for correctness, we can look into checking for corruptions.
Attachment #8356625 - Flags: review?(luke) → feedback?(luke)
Comment on attachment 8356625 [details] [diff] [review]
Instrument evaluate functions to save/load the bytecode.

The approach sounds good (particularly the part you mentioned about making a new opaque object type so that this API can get fuzzed).  Also it's great to avoid doing file I/O if you don't need it.

To wit: passing --js-ccache without --js-cache-per-process will persistently reuse an existing cache directory which I think would suffice for ad hoc (non-parallel-jit_test) testing.
Attachment #8356625 - Flags: feedback?(luke) → feedback+
Depends on: 958172
Attachment #8356625 - Attachment is obsolete: true
Attachment #8358410 - Flags: review?(luke)
Could you explain what these new evaluate options are?  Could you also add this explanation to the JS_FN_HELP("evaluate", ...) entry in js.cpp?

Second: at a high-level, why is it necessary to have a weak-map acting as a cache?  I was thinking a simpler setup could suffice: an evaluate option 'serializeXdr' that, when set, made the return value of Evaluate() be an opaque object owning the serialized bytes, and then you overload Evaluate() such that, if it is given this opaque object as the 'code' argument, it would deserialize and then run it.  I'm not sure how modeling a real cache via WeakMap would be useful (on the contrary, since its weak, it seems like it would add non-determinism to the results).
(In reply to Luke Wagner [:luke] from comment #11)
> Could you explain what these new evaluate options are?  Could you also add
> this explanation to the JS_FN_HELP("evaluate", ...) entry in js.cpp?

I add 3 options to evaluate:
 - saveBytecode: Save the bytecode corresponding to the given source.
 - loadBytecode: Load the bytecode corresponding to the given source.
 - assertEqBytecode: Assert if the encoded code is not identical to the decoded code.

> Second: at a high-level, why is it necessary to have a weak-map acting as a
> cache?  I was thinking a simpler setup could suffice: an evaluate option
> 'serializeXdr' that, when set, made the return value of Evaluate() be an
> opaque object owning the serialized bytes, and then you overload Evaluate()
> such that, if it is given this opaque object as the 'code' argument, it
> would deserialize and then run it.  I'm not sure how modeling a real cache
> via WeakMap would be useful (on the contrary, since its weak, it seems like
> it would add non-determinism to the results).

I am no longer exposing the key as part of the interface of the evaluate function, because if the key is visible, then fuzzers would be able to provide non-matching key & scripts.  We cannot make the cache entry details visible by fuzzers otherwise they might call the evaluate function with a cache entry which does not correspond with the source, which would be terrible.  This is the reason why I removed the name from saveBytecodeTo and loadBytecodeFrom.

The String object is used as a unique key for the WeakMap.  I thought that the WeakMap were keeping entries if their keys were still alive?  Another option would be to make an opaque type similar to StringObject which have 2 reserved pointers (String & ArrayBuffer), and mutate the array buffer with the last saved bytecode (as if this was a cache entry).  In fact this might easier than manipulating weakmaps.

Note that, if we return a cache entry, we still need the source for handling lazy functions, in case the script does not behave identically the second time.  Returning the cache object instead of the computed value implies that we cannot check if the returned values are identical without using the global.
Delta:
 - Remove WeakMap modification & usage.
 - Make a CacheEntryObject instead of a BytecoceCacheObject.
 - Document the 3 new options.

Usage:
 - Wrap the source in a CacheEntryObject, then give it as argument of the evaluate function.
 - Set the saveBytecode / loadBytecode to true to save/load it in the CacheEntry given with the source.
 - The cache entry hide both the source and the XDR output, as we do not want fuzzers to mutate them.

Note: This approach is easier to understand than the previous one, and we do not care about the caching mechanism implementation, we only care about entries within the hypothetical cache.
Attachment #8358410 - Attachment is obsolete: true
Attachment #8358410 - Flags: review?(luke)
Attachment #8361213 - Flags: review?(luke)
Comment on attachment 8361213 [details] [diff] [review]
Instrument evaluate functions to save/load the bytecode.

Review of attachment 8361213 [details] [diff] [review]:
-----------------------------------------------------------------

This is a nice simplification, but I think we can go simpler still and make only two minor changes to Evaluate:
 - add an option "encode" which tells Evaluate() to call JS_EncodeScript after JS::Compile then immediately return that encoded object without calling JS_ExecuteScript.
 - if the 'code' argument to Evaluate is a serialized script object, instead of calling JS::Compile, it does JS_DecodeScript and then proceeds like normal.

For the assertEqBytecode option, there is no need to involve Evaluate.  Just make a simple JSNative AssertSerializationIdentity that takes as an argument a serialized object, calls JS_Encode(JS_Decode(obj)) and asserts the equality.  With these changes, you won't need the JS_InitClass, prototype, etc.  Just a js::Class and plain C++ function that constructs one, given the XDR'd bytecode as an argument.

::: js/src/jit-test/lib/bytecode-cache.js
@@ +13,5 @@
> +
> +  // The generation counter is used to represent environment variations which
> +  // might cause the program to run differently, and thus to have a different
> +  // set of functions executed.
> +  ctx.global.generation = 0;

I don't understand what generation is doing here.

::: js/src/shell/js.cpp
@@ +46,5 @@
>  #include "jsprf.h"
>  #include "jsscript.h"
>  #include "jstypes.h"
>  #include "jsutil.h"
> +#include "jsweakmap.h"

I think you can remove this.

@@ +903,5 @@
> +
> +static const uint32_t cacheEntry_SOURCE = 0;
> +static const uint32_t cacheEntry_BYTECODE = 1;
> +
> +static const JSClass cacheEntry_class = {

Since there isn't any real cache any more, I'd call this SerializedScript (and name this SerializedScriptClass).

@@ +1273,5 @@
> +                    JS_ReportErrorNumber(cx, my_GetErrorMessage, nullptr,
> +                                         JSSMSG_CACHE_SINGLETON_FAILED);
> +                    return false;
> +                }
> +                JS::CompartmentOptionsRef(cx).cloneSingletonsOverride().set(true);

I don't understand this.
Attachment #8361213 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #14)
> Comment on attachment 8361213 [details] [diff] [review]
> Instrument evaluate functions to save/load the bytecode.
> 
> Review of attachment 8361213 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is a nice simplification, but I think we can go simpler still and make
> only two minor changes to Evaluate:
>  - add an option "encode" which tells Evaluate() to call JS_EncodeScript
> after JS::Compile then immediately return that encoded object without
> calling JS_ExecuteScript.

Hum … I am not sure you understand what I am trying to achieve here.  The goal is to encode the script after its execution, not before.  In a similar way as we would populate the cache, we fill the cache entry as the application is idle and waiting for user events.

Encoding the cache before would be a non-sense as only the top-level functions would be compiled and all the rest would only be lazy scripts which would have to be look up in the source file, which would be inefficient knowing that the goal is to postpone the time when we would have to read the source.

>  - if the 'code' argument to Evaluate is a serialized script object, instead
> of calling JS::Compile, it does JS_DecodeScript and then proceeds like
> normal.

Unless you mean that the serialized script object should include the source, I would say No.  The reason is to make a cache which deliberately is smaller than the source.  And the way to achieve this is by encoding LazyScripts.  But if the evaluated script relies on external inputs, such as the generation variable, then we might have to de-serialize a function which is not part of the cache.

Also, we want to encode the script a second time, precisely to handle cases where the start-up behave differently, such as over time, the cache entry might cover all start-up path of the script.  Which is also why I am not always checking that we get the same output all the time, as some scripts might have been de-lazified.

> For the assertEqBytecode option, there is no need to involve Evaluate.  Just
> make a simple JSNative AssertSerializationIdentity that takes as an argument
> a serialized object, calls JS_Encode(JS_Decode(obj)) and asserts the
> equality.

I admit this is a simple test that I can add, but this is not identical because the goal is precisely to run the script before encoding it again.

>  With these changes, you won't need the JS_InitClass, prototype,
> etc.  Just a js::Class and plain C++ function that constructs one, given the
> XDR'd bytecode as an argument.

I am not sure to understand, do you have an example?
(In reply to Luke Wagner [:luke] from comment #14)
> ::: js/src/jit-test/lib/bytecode-cache.js
> @@ +13,5 @@
> > +
> > +  // The generation counter is used to represent environment variations which
> > +  // might cause the program to run differently, and thus to have a different
> > +  // set of functions executed.
> > +  ctx.global.generation = 0;
> 
> I don't understand what generation is doing here.

"generation" is here to fake a different start-up path for an application, this is a determinist environment dependent input, similar to Date.now().

> @@ +903,5 @@
> > +
> > +static const uint32_t cacheEntry_SOURCE = 0;
> > +static const uint32_t cacheEntry_BYTECODE = 1;
> > +
> > +static const JSClass cacheEntry_class = {
> 
> Since there isn't any real cache any more, I'd call this SerializedScript
> (and name this SerializedScriptClass).

I thought of CacheEntry, as this is only the Key-Value pair of one entry.

> @@ +1273,5 @@
> > +                    JS_ReportErrorNumber(cx, my_GetErrorMessage, nullptr,
> > +                                         JSSMSG_CACHE_SINGLETON_FAILED);
> > +                    return false;
> > +                }
> > +                JS::CompartmentOptionsRef(cx).cloneSingletonsOverride().set(true);
> 
> I don't understand this.

This is made to preserve singletons such as we can encode the bytecode of functions after their executions.
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> Hum … I am not sure you understand what I am trying to achieve here.

To reduce the amount of stuff added to js.cpp.

> The goal is to encode the script after its execution, not before.

It seems more direct to just have add an orthogonal lazyParsing option to evaluate.  I guess that doesn't allow you to simulate partial evaluation.  Anyhow, this detail of whether evaluate() evaluates the script or not is mostly orthogonal to the API changes I was suggesting.  If you want to have Evaluate evaluate the script, that's fine.

> >  - if the 'code' argument to Evaluate is a serialized script object, instead
> > of calling JS::Compile, it does JS_DecodeScript and then proceeds like
> > normal.
> 
> Unless you mean that the serialized script object should include the source,

Of course, the source is logically part of the serialized script when you serialize LazyScripts.

> "generation" is here to fake a different start-up path for an application,
> this is a determinist environment dependent input, similar to Date.now().

Ah, this makes sense in the context of lazyscripts; I thought this was somehow supposed to affect the XDR encoding itself.

> > Since there isn't any real cache any more, I'd call this SerializedScript
> > (and name this SerializedScriptClass).
> 
> I thought of CacheEntry, as this is only the Key-Value pair of one entry.

But if you take away the cache...
Ok, thinking about this a little more with the new understanding that what you're trying to test here is the different variations of serialization you can get based on different runtime behavior, I think your API in the patch does make sense.
Attachment #8361213 - Flags: review?(luke)
Comment on attachment 8361213 [details] [diff] [review]
Instrument evaluate functions to save/load the bytecode.

Review of attachment 8361213 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/shell/js.cpp
@@ +965,5 @@
> +static bool
> +cacheEntry_createProto(JSContext *cx, HandleObject glob)
> +{
> +    /* Initialize CacheEntryObject. */
> +    RootedObject bcProto(cx, JS_InitClass(cx, glob, nullptr, &cacheEntry_class,

I don't think you need to JS_InitClass and have a constructor/props array.  Just have a simple global constructor function that creates an instance of CacheEntryClass.

@@ +977,5 @@
> +    return InitCacheEntryObject(cx, bcProto, undef);
> +}
> +
> +static bool
> +cacheEntry_isCacheEntry(JSObject *cache)

Unless you can see differently, I think the right style would be CacheEntry_* for all these names.

@@ +1229,5 @@
> +        // We cannot load or save the bytecode if we have no object where the
> +        // bytecode cache is stored.
> +        if (loadBytecode || saveBytecode || assertEqBytecode) {
> +            if (!cacheEntry)
> +                return false;

Need to report an error before returning false.

@@ +1336,5 @@
> +                JS_ReportErrorNumber(cx, my_GetErrorMessage, nullptr, JSSMSG_CACHE_EQ_SIZE_FAILED,
> +                                     loadLength, saveLength);
> +            } else {
> +                for (uint32_t i = 0; i < loadLength; i++) {
> +                    uint8_t lhs = loadBuffer[i];

Can you just use PodEqual instead of the manual loop?  It doesn't seem necessary to print the index of the mismatch: if you have a mismatch, it doesn't seem like you'd be able to debug it from the console output; you'd need to open up a debugger anyway.

@@ +1353,5 @@
> +            js_free(saveBuffer);
> +            return false;
> +        }
> +
> +        js_free(saveBuffer);

Can you make saveBuffer a ScopedFreePtr instead to avoid the manual js_free calls that are easy to forget?
Attachment #8361213 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/f317d2cb357b
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: