Closed Bug 900784 (js-startup-cache) Opened 11 years ago Closed 7 years ago

[meta] Add start-up cache for any JavaScript code.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Performance Impact ?
Tracking Status
platform-rel --- -
firefox57 + disabled

People

(Reporter: nbp, Assigned: nbp)

References

(Blocks 2 open bugs, )

Details

(Keywords: meta, perf, Whiteboard: [shumway:p1] [platform-rel-Facebook])

Attachments

(26 files, 5 obsolete files)

3.65 KB, text/plain
Details
23.67 KB, text/plain
Details
20.80 KB, patch
Details | Diff | Splinter Review
1.71 KB, text/x-python
Details
116.00 KB, patch
nbp
: checkin-
Details | Diff | Splinter Review
1.12 KB, patch
nbp
: review+
Details | Diff | Splinter Review
3.68 KB, patch
jonco
: review+
Details | Diff | Splinter Review
4.08 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
5.82 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
8.27 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
1.57 KB, patch
nbp
: review+
Details | Diff | Splinter Review
13.65 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
1.35 KB, patch
jonco
: review+
Details | Diff | Splinter Review
15.77 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
3.04 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
73.01 KB, patch
jonco
: feedback+
mrbkap
: feedback+
Details | Diff | Splinter Review
2.73 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
6.72 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
9.14 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
12.14 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
14.08 KB, patch
francois
: review+
mrbkap
: review+
Details | Diff | Splinter Review
7.75 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
24.16 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
24.90 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
6.29 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
1.37 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
Gaia's applications are being redesigned to use one iframe per-view. One of the reason of this redesign is to be able to reclaim the memory of views which are no longer loaded and thus collecting the code and the objects of a removed iframe. On the other hand, this redesign implies that every view will need to load a set of script which used to be already loaded and already initialized, thus causing longer transition time which is critical to get as low as possible. In addition to improve Gaia, these optimizations will likely improve the loading time of web-content and of applications on both Desktop (multiple gmail tabs) and FirefoxOS (start-up time of applications). Also, the way sunspider and kraken benchmarks are implemented means that these changes should probably also be visible on these benchmarks as the browser versions are using iframes. On the other hand, the shell version of these benchmarks are using the load function. Thus the cache modifications should be experimented under the "load" function and under a flag, such as we can enable it in the shell when it will go wild in the browser. (to keep similar awfy results as experienced in the browser) We need to consider and evaluate the following order: 1/ Saving the Lazy Script, such as we can avoid validating and parsing the sources. 2/ Saving the bytecode of compileAndGo functions to avoid going through the bytecode emitter. This implies that we have to remove the restriction on compileAndGo from XDR, which probably means removing compileAndGo flag after all … 3/ Saving the hotness of baseline compiled functions. (or even the code ?) We can attach a flag to the function locations, to mention that we want them to be eagerly compiled with baseline. This might have some side effects such as adding/monitoring initial values which will change the behavior.
Depends on: 900789
Note discussion in bug 679942?
(In reply to Boris Zbarsky [:bz] from comment #1) > Note discussion in bug 679942? Thanks for the pointer. As Luke mentioned in Bug 679942 comment 7, this is mostly covered by sharing the scripts in the same runtime (so I feel that Bug 679942 title is miss-leading). The goal of this bug is really to make use of the cache to save *a subset* of the bytecode to improve start-up time of applications/web-pages. Note that caching on a phone is more tricky than on a desktop as we are even more constraint by the I/O latency/bandwidth, and the idea here is to be save at the right time such as the we can load the source later (not on the critical path).
These stats are made by starting refactored applications of Haida[1] and wait until every application becomes idle to start another one. These stats are highlighting the approximate size of bytecode / lazyScript that we should expect to cache (write/read). These stats are the total number of scripts created, and not the total number of scripts remaining. Thus the de-lazified scripts should be removed from the remaining lazyScripts, as we do not intend to save both. [1] https://github.com/vingtetun/gaia/tree/content
This list all the source files seen in each process in the order in which they have been loaded. Note that there is a common set of chrome files which corresponds to what is coming from the Preallocated app which is initialized with preload.js.
Depends on: 917996
For information, Gecko already make use of a JavaScript cache[1] for XUL applications. This cache relies on XDR to cache and load the bytecode. [1] http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/loader/mozJSSubScriptLoader.cpp
Summary: [meta] Improve loading time of loaded/cached scripts. → [meta] Add start-up cache for any JavaScript code.
This would be awesome for games in the webapp runtime, otherwise we need to compile a lot of code during each startup. I guess once this is in the browser, we'll get the feature in the webapp runtime for free?
(In reply to Marco Castelluccio [:marco] from comment #8) > This would be awesome for games in the webapp runtime, otherwise we need to > compile a lot of code during each startup. I guess once this is in the > browser, we'll get the feature in the webapp runtime for free? The primary goal[1] is to handled packaged apps for 1.4. Hopefully, we should be able to rely on the new cache abstraction suggested in Bug 900255. [1] https://wiki.mozilla.org/Javascript:SpiderMonkey:StartupCache
Depends on: 900255
(In reply to Nicolas B. Pierron [:nbp] from comment #9) > (In reply to Marco Castelluccio [:marco] from comment #8) > > This would be awesome for games in the webapp runtime, otherwise we need to > > compile a lot of code during each startup. I guess once this is in the > > browser, we'll get the feature in the webapp runtime for free? > > The primary goal[1] is to handled packaged apps for 1.4. Hopefully, we > should be able to rely on the new cache abstraction suggested in Bug 900255. > > [1] https://wiki.mozilla.org/Javascript:SpiderMonkey:StartupCache We support packaged apps in the Webapp Runtime. I'll follow the development of this StartupCache to see if we can easily support it in the Webapp Runtime too.
Depends on: 918809
Blocks: 885526
Whiteboard: [Shumway]
Whiteboard: [Shumway] → [Shumway:P1]
Alias: js-startup-cache
Depends on: 946843
Depends on: 946849
Depends on: 958172
Depends on: 959268
Keywords: perf
Depends on: 973889
Depends on: 977011
Depends on: 983687
No longer blocks: shumway-m4
Whiteboard: [Shumway:P1] → [shumway:p1]
No longer blocks: shumway-m4
Assignee: nicolas.b.pierron → nobody
Depends on: 1231565
platform-rel: --- → ?
Whiteboard: [shumway:p1] → [shumway:p1] [platform-rel-Facebook]
Depends on: 1286009
Depends on: 1288104
I pushed the latest patches I made to github [1], such that people can follow the progress and comment on it. As soon as I manage to get something working, I will submit patches for feedback/review. I tested the current set of patches so far, and faced issues from Necko patches from the beginning of July, so I should rebase my patches on top of the latest one of Bug 1258747 & Bug 1231565. [1] https://github.com/nbp/gecko-dev/compare/bugzil.la/js-startup-cache/base...nbp:bugzil.la/js-startup-cache/wip
This patch is based on Bug 1258747 and Bug 1231565 patches which are implementing the alternate data stream for channels. This WIP patch includes patches for Bug 1286009, Bug 1288104 and Bug 900874. This patch does not yet use any form of Idle observer for saving the content of alternate data (= bytecode).
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Attachment #8784805 - Flags: checkin-
Depends on: 1307633
(reviewed by jonco over irc)
Attachment #8800633 - Flags: review+
Attachment #8800634 - Flags: review?(jcoppeard) → review+
This is a clean-up patch comming after Bug 1194526 and Bug 1268147 which removed the need for the aType argument which used to be required by the CSP.
Moving aScriptFromHead to a field of the ScriptLoadRequest ensure that all the state to call StartLoad is either in the ScriptLoader or in the ScriptLoadRequest. Thus, we can later add new calls to StartLoad, if we have to restart a request because of some decoding issues.
Replace mScriptTextBug and mScriptTextLEngth of the nsScriptLoadRequest by an equivalent Vector. Also remove the aString argument from the nsScriptLoader::PrepareLoadedRequest and nsScriptLoader::OnStreamComplete, as the nsScriptLoadHandler is now taking a reference to the Vector hold by the nsScriptLoadRequest. Doing so is needed to avoid adding more arguments to OnStreamComplete, as we are going to add the bytecode cache input in a similar way as the text input.
Register alternative data type when requesting a script from the HTTPChannel. Detect the streamed data type, and fill either the source buffer, or the bytecode buffer, depending on the input type.
Attachment #8800635 - Flags: review?(bkelly)
Attachment #8800636 - Flags: review?(bkelly)
Attachment #8800635 - Flags: review?(bkelly) → review+
Attachment #8800637 - Flags: review?(bkelly)
Attachment #8800638 - Flags: review?(bkelly)
Comment on attachment 8800640 [details] [diff] [review] part 0.8 - nsJSUtils::EvaluateString, add the produced JSScript as an out-param. r= As we would have to cache the compiled bytecode cache, we have to be able to reference to the top-level of the JSScript computed by EvaluateString function.
Attachment #8800640 - Flags: review?(jcoppeard)
Attachment #8800640 - Flags: review?(bkelly)
Attachment #8800641 - Flags: review?(jcoppeard)
Comment on attachment 8800636 [details] [diff] [review] part 0.4 - Move aScriptFromHead to the ScriptLoadRequest. r= Review of attachment 8800636 [details] [diff] [review]: ----------------------------------------------------------------- More readable too since we don't have bare booleans at the call sites any more. r=me with comments addressed. ::: dom/base/nsScriptLoader.h @@ +176,5 @@ > int32_t mLineNo; > const mozilla::CORSMode mCORSMode; > const mozilla::dom::SRIMetadata mIntegrity; > mozilla::net::ReferrerPolicy mReferrerPolicy; > + bool mScriptFromHead; // Synchronous head script block loading of other non js/css content. Please initialize to a reasonable default in the nsScriptLoadRequest constructor. nit: You could also improve the packing here if you move the bool to just after the `nsScriptKind mKind` enum class member. Might need to specify the size of nsScriptKind as uint8_t.
Attachment #8800636 - Flags: review?(bkelly) → review+
Comment on attachment 8800640 [details] [diff] [review] part 0.8 - nsJSUtils::EvaluateString, add the produced JSScript as an out-param. r= Review of attachment 8800640 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsJSUtils.cpp @@ +195,5 @@ > } > } > > if (ok && aOffThreadToken) { > + aScript.set(JS::FinishOffThreadScript(aCx, *aOffThreadToken)); What happens if we didn't compile the script off main thread? It looks like aScript never gets set in this case.
Comment on attachment 8800637 [details] [diff] [review] part 0.5 - Replace manual handling of the ScriptText buffer of nsScriptLoadRequest. r= Review of attachment 8800637 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.h @@ +685,5 @@ > // Unicode decoder for charset. > nsCOMPtr<nsIUnicodeDecoder> mDecoder; > > // Accumulated decoded char buffer. > + mozilla::Vector<char16_t>& mBuffer; Holding a reference to a buffer owned by a different ref-counted object seems pretty dangerous to me. I see that we hold a ref to the request, so it should work, but might trip people up making further changes. Could we just replace all uses of `mBuffer` with `mRequest->mBuffer`? That's more explicit about what is happening and also saves us a few bytes.
Attachment #8800637 - Flags: review?(bkelly) → review-
Attachment #8800638 - Flags: review?(bkelly) → review+
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= This part is a big larger than expected, but it is hard to split without re-doing the patch, it does the following things: - In the StartLoad function: - Request the bytecode cache from the necko cache (Bug 1231565), if not requested otherwise. - In nsScriptLoaderHandler: - Detect if we are loading bytecode or source, and record it on the mScriptLoaderRequest. - In case of failure, call RestartLoad, to abort the current channel, and load the source. - Import the SRI hash from the bytecode cache. - In nsScriptLoadRequest: - Add a vector to store the bytecode. - Distinguish Loading (source / bytecode) from Loading_Source, needed to load the source if we failed to load the bytecode. - Add mCacheInfo, to be able to save the bytecode after some execution time, and make a reference for it in the cycle collector, as it is implemented by the HTTPChannel.
Attachment #8800642 - Flags: review?(valentin.gosu)
Attachment #8800642 - Flags: review?(jcoppeard)
Attachment #8800642 - Flags: review?(francois)
Attachment #8800642 - Flags: review?(bkelly)
Comment on attachment 8800641 [details] [diff] [review] part 0.9 - Add ScriptLoader LazyLogModule. r= Review of attachment 8800641 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +72,5 @@ > + MOZ_LOG(gScriptLoaderLog, mozilla::LogLevel::Debug, args) > +#define SLWARN(args) \ > + MOZ_LOG(gScriptLoaderLog, mozilla::LogLevel::Warning, args) > +#define SLERROR(args) \ > + MOZ_LOG(gScriptLoaderLog, mozilla::LogLevel::Error, args) These macros should probably be called LOG_something for clarity.
Attachment #8800641 - Flags: review?(jcoppeard) → review+
Comment on attachment 8800643 [details] [diff] [review] part 1.1 - Prevent the mutation of parsed objects after JSOP_OBJECT. r= In order to be able to XDR JSScript, we need to prevent the mutation of objects returned by the JSOP_OBJECT opcode. Otherwise, these opcode return the object contained in the JSScript, and these object by be mutated before being recorded by XDR serialization. This make XDR serialization possible after the execution, by preventing us from reusing the objects created by the parser, but by making a deep clone of them when we execute JSOP_OBJECT opcodes.
Attachment #8800643 - Flags: review?(luke)
Comment on attachment 8800644 [details] [diff] [review] part 1.2 - Add EvaluateBytecode function to nsJSUtils.h. r= This function decode synchronously the JSScript before starting the execution of them, as we would do with EvaluateString. This patch is unlikely to be the final version as the JS::DecodeScript function might take some time to decode the content of the bytecode cache. I am going to open a new bug to make JS_DecodeScript / JS_EncodeScript working off-thread.
Attachment #8800644 - Flags: feedback?(luke)
Attachment #8800644 - Flags: feedback?(jcoppeard)
Comment on attachment 8800640 [details] [diff] [review] part 0.8 - nsJSUtils::EvaluateString, add the produced JSScript as an out-param. r= Review of attachment 8800640 [details] [diff] [review]: ----------------------------------------------------------------- r=me with comments addressed. ::: dom/base/nsGlobalWindow.cpp @@ +12296,5 @@ > .setVersion(JSVERSION_DEFAULT); > JS::Rooted<JSObject*> global(aes.cx(), FastGetGlobalJSObject()); > + JS::Rooted<JSScript*> jsScript(aes.cx()); > + nsresult rv = nsJSUtils::EvaluateString(aes.cx(), script, global, > + &jsScript, options); Do we want to require the aScript out-param in EvaluateString()? Here we root the value just to throw it away. Or will all these values get used in a later patch? ::: dom/base/nsJSUtils.cpp @@ +195,5 @@ > } > } > > if (ok && aOffThreadToken) { > + aScript.set(JS::FinishOffThreadScript(aCx, *aOffThreadToken)); Also, if we intend not to set it, should we null it out to ensure a consistent out parameter value? ::: dom/base/nsJSUtils.h @@ +79,5 @@ > // TakeOwnershipOfErrorReporting() called on it. > static nsresult EvaluateString(JSContext* aCx, > const nsAString& aScript, > JS::Handle<JSObject*> aEvaluationGlobal, > + JS::MutableHandle<JSScript*> aScriptHandle, Can this out-param be placed down next to aRetValue? It seems like its intermixed with various input parameters. It might also help to name this something like aScriptHandleOut. Documenting it in the comment would be nice too.
Attachment #8800640 - Flags: review?(bkelly) → review+
Comment on attachment 8800645 [details] [diff] [review] part 1.3 - Save and Evaluate bytecode synchronously. r= Similarly to the previous patch, this code currently handle the serialization and saving of the SRI & Bytecode after the evaluation. In practice, we would want to save it on another thread. (f? jonco, luke, valentin) From this patch I only expect to keep the SRI encoding part in the bytecode buffer, within nsScriptLoader::OnStreamComplete. (r? francois)
Attachment #8800645 - Flags: review?(francois)
Attachment #8800645 - Flags: feedback?(valentin.gosu)
Attachment #8800645 - Flags: feedback?(luke)
Attachment #8800645 - Flags: feedback?(jcoppeard)
Attachment #8800645 - Flags: checkin-
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- I'm no expert in this area but this looks good to me. One thing though, it would be good to have tests to exercise RestartLoad. ::: dom/base/nsScriptLoader.cpp @@ +1201,5 @@ > +nsScriptLoader::RestartLoad(nsScriptLoadRequest *aRequest) > +{ > + // Start a new channel from which we explicitly request to stream the source > + // instead of the bytecode. > + aRequest->mProgress = nsScriptLoadRequest::Progress::Loading_Source; Can we assert the previous state here? ::: dom/base/nsScriptLoader.h @@ +86,5 @@ > mIsCanceled(false), > mWasCompiledOMT(false), > mOffThreadToken(nullptr), > mScriptText(), > + mScriptBytecode(), We probably don't need initialisers for mScriptText or mScriptBytecode.
Attachment #8800642 - Flags: review?(jcoppeard) → review+
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- I'd like to understand how the SRI data is serialized and deserialized better. In particular, we can have multiple things loading the same script with some using SRI and some not using SRI. ::: dom/base/nsScriptLoader.cpp @@ +3145,5 @@ > + SLLOG(("ScriptLoadRequest (%p): Source length = %u", > + mRequest.get(), unsigned(mBuffer.length()))); > + > + // If SRI is required for this load, appending new bytes to the hash. > + if (mSRIDataVerifier && NS_SUCCEEDED(mSRIStatus)) { So, what happens if you have different <script> tags loading the same URL where one uses SRI and the other doesn't? It seems you must always store the SRI information in case a future script tag requires SRI. Or do you have some other key in the cache which ensures this sri-vs-no-sri collision never happens? ::: dom/base/nsScriptLoader.h @@ +208,5 @@ > + > + // Holds the Cache entry information, which is used to register the bytecode > + // on the cache entry, such that we can load it the next time. > + // > + // Note, to avoid cycles throught the channel, we should not set this field nit: s/throught/through/g
Attachment #8800642 - Flags: review?(bkelly) → review-
Is there any overlap between this bug and efforts to serve pre-compiled sources from the web server? Do we know what mimetype will be used there?
Comment on attachment 8800643 [details] [diff] [review] part 1.1 - Prevent the mutation of parsed objects after JSOP_OBJECT. r= Review of attachment 8800643 [details] [diff] [review]: ----------------------------------------------------------------- I have no real intuition about CreateNativeGlobalForInner() (is that the right function); probably best to pick a DOM reviewer who can understand what invariant it is you're trying to preserve. (Personally, I wished we could've removed JSOP_OBJECT's literal-sharing behavior.)
Attachment #8800643 - Flags: review?(luke)
Comment on attachment 8800644 [details] [diff] [review] part 1.2 - Add EvaluateBytecode function to nsJSUtils.h. r= Review of attachment 8800644 [details] [diff] [review]: ----------------------------------------------------------------- Again, I feel like you need a DOM peer here to know what are the ambient invariants.
Attachment #8800644 - Flags: feedback?(luke)
Comment on attachment 8800645 [details] [diff] [review] part 1.3 - Save and Evaluate bytecode synchronously. r= Review of attachment 8800645 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +2261,5 @@ > + > + // singletonsAsTemplates is used to verify that the current compartment > + // can be used for encoding object literals in the bytecode cache, > + // otherwise, object literals are mutable and we cannot safely encode > + // them as we would encode the mutated state. This is kindof nuts that this impl detail is leaking into the SM+DOM interface. It means that seeming harmless SM-internal changes may create bugs that only show up when this bytecode cache is used. I really think we need to find a way for scripts to be static, stateless entities.
Comment on attachment 8800644 [details] [diff] [review] part 1.2 - Add EvaluateBytecode function to nsJSUtils.h. r= Review of attachment 8800644 [details] [diff] [review]: ----------------------------------------------------------------- Ditto what luke said about needing DOM peer review. ::: dom/base/nsJSUtils.cpp @@ +287,5 @@ > + > + MOZ_ASSERT_IF(aCompileOptions.versionSet, > + aCompileOptions.version != JSVERSION_UNKNOWN); > + MOZ_ASSERT(aCx == nsContentUtils::GetCurrentJSContext()); > + MOZ_ASSERT(!aBytecodeBuf.empty()); Presumably the bytecode starts a aBytecodeIndex, so we should check the buffer length is greater than that here. @@ +291,5 @@ > + MOZ_ASSERT(!aBytecodeBuf.empty()); > + MOZ_ASSERT(js::GetGlobalForObjectCrossCompartment(aEvaluationGlobal) == > + aEvaluationGlobal); > + MOZ_ASSERT(NS_IsMainThread()); > + MOZ_ASSERT(nsContentUtils::IsInMicroTask()); There are several blocks of assertions like this in the file. It would be nice to common them up if possible. @@ +297,5 @@ > + // Unfortunately, the JS engine actually compiles scripts with a return value > + // in a different, less efficient way. Furthermore, it can't JIT them in many > + // cases. So we need to be explicitly told whether the caller cares about the > + // return value. Callers can do this by calling the other overload of > + // EvaluateString() which calls this function with This comment thinks it's still part of EvaluateString. @@ +349,5 @@ > + } > + } > + > + // Wrap the return value into whatever compartment aCx was in. > + if (ok && !aCompileOptions.noScriptRval) { aCompileOptions.noScriptRval is set to true at the top of the function.
Attachment #8800644 - Flags: feedback?(jcoppeard) → feedback+
(In reply to Ben Kelly [:bkelly] from comment #37) > Comment on attachment 8800642 [details] [diff] [review] > part 1.0 - Add ability to load bytecode from the cache. r= > > Review of attachment 8800642 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'd like to understand how the SRI data is serialized and deserialized > better. In particular, we can have multiple things loading the same script > with some using SRI and some not using SRI. Bug 1288104 adds the implementation for the SRI functions. What we do is that we always store SRI information, even an empty one if no metadata are provided. We always decode/import the hash from the bytecode cache, and skip it at the end of OnStreamComplete. > ::: dom/base/nsScriptLoader.cpp > @@ +3145,5 @@ > > + SLLOG(("ScriptLoadRequest (%p): Source length = %u", > > + mRequest.get(), unsigned(mBuffer.length()))); > > + > > + // If SRI is required for this load, appending new bytes to the hash. > > + if (mSRIDataVerifier && NS_SUCCEEDED(mSRIStatus)) { > > So, what happens if you have different <script> tags loading the same URL > where one uses SRI and the other doesn't? It seems you must always store > the SRI information in case a future script tag requires SRI. Or do you > have some other key in the cache which ensures this sri-vs-no-sri collision > never happens? In the following scenario: - Load without SRI metadat - Save Bytecode cache - Load bytecode cache with SRI metadata When we verify the hash, we should notice that we have no SRI computed hash pre-registered, so we would reload the sources from the cache (not replaced by the bytecode cache) and compute the SRI computed hash. Later we would override the bytecode cache with new SRI computed hash. You can follow the design ideas in Bug 1288104 comment 0 and Bug 1288104 comment 1.
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the explanation. I'll look at this again with that context.
Attachment #8800642 - Flags: review- → review?(bkelly)
Delta: - Follow comment 29, replaces mBuffer reference by mRequest->mScriptText. Note, I did the same modification in part 1.0, for the bytecode buffer.
Attachment #8801167 - Flags: review?(bkelly)
Attachment #8800637 - Attachment is obsolete: true
(In reply to Ben Kelly [:bkelly] from comment #34) > ::: dom/base/nsGlobalWindow.cpp > @@ +12296,5 @@ > > .setVersion(JSVERSION_DEFAULT); > > JS::Rooted<JSObject*> global(aes.cx(), FastGetGlobalJSObject()); > > + JS::Rooted<JSScript*> jsScript(aes.cx()); > > + nsresult rv = nsJSUtils::EvaluateString(aes.cx(), script, global, > > + &jsScript, options); > > Do we want to require the aScript out-param in EvaluateString()? Here we > root the value just to throw it away. Or will all these values get used in > a later patch? Yes, this patch does not use them yet. Part 1.3 starts to use these values to serialize the JSScript with JS_EncodeScript after the first evaluation. Part 1.3 is not the yet definitive version, and we would have to store the JSScript somewhere (nsScriptLoadRequest?) where we can save it to be used as argument of JS_EncodeScript. I yet have to figure how to run JS_EncodeScript on a different thread, and where to store this JSScript. > ::: dom/base/nsJSUtils.cpp > @@ +195,5 @@ > > } > > } > > > > if (ok && aOffThreadToken) { > > + aScript.set(JS::FinishOffThreadScript(aCx, *aOffThreadToken)); > > Also, if we intend not to set it, should we null it out to ensure a > consistent out parameter value? I guess we could, Otherwise Rooted are null by default [1][2] if they are not initialized with any value. [1] http://searchfox.org/mozilla-central/source/js/public/RootingAPI.h#788-789 [2] http://searchfox.org/mozilla-central/source/js/public/GCPolicyAPI.h#117,127
Comment on attachment 8801167 [details] [diff] [review] part 0.5 - Replace manual handling of the ScriptText buffer of nsScriptLoadRequest. Review of attachment 8801167 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8801167 - Flags: review?(bkelly) → review+
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- I think it would be helpful to add a block comment summarizing the design chosen from bug 1288104 comment 0. Also, do you plan to add telemetry for the restarts? r=me with comments addressed. ::: dom/base/nsScriptLoader.cpp @@ +1285,5 @@ > > NS_ENSURE_SUCCESS(rv, rv); > > + aRequest->mCacheInfo = nullptr; > + if (!aRequest->IsLoadingSource()) { MOZ_ASSERT(aRequest->IsLoading()) here? @@ +2885,5 @@ > + mBytecodeBuffer(aRequest->mScriptBytecode) > +{ > + MOZ_ASSERT(mRequest->IsUnknownDataType()); > + if (mRequest->IsLoadingSource()) > + mRequest->mDataType = nsScriptLoadRequest::DataType::Source; Please use curly braces even for single-line if-statements. @@ +2937,5 @@ > + return NS_ERROR_OUT_OF_MEMORY; > + } > + > + *aConsumedLength = aDataLength; > + rv = MaybeDecodeSRI(); What guarantees do we have that we will receive enough data in this OnIncrementalData() call to decode the SRI? Isn't it possible we'll have the correct SRI value, but only receive part of the data in this chunk? It seems like maybe we should wait to do this in OnStreamComplete(). Although, I see we do MaybeDecodeSRI() in OnStreamComplete() as well. Can we just remove it from here? Perhaps with a comment that SRI will be extracted in OnStreamComplete()? @@ +3068,5 @@ > +nsresult > +nsScriptLoadHandler::MaybeDecodeSRI() > +{ > + if (!mSRIDataVerifier || mSRIDataVerifier->IsComplete() || NS_FAILED(mSRIStatus)) > + return NS_OK; curly braces on single line if-statements @@ +3071,5 @@ > + if (!mSRIDataVerifier || mSRIDataVerifier->IsComplete() || NS_FAILED(mSRIStatus)) > + return NS_OK; > + > + if (mBytecodeBuffer.length() <= mSRIDataVerifier->DataSummaryLength()) > + return NS_OK; curly braces on single line if-statements @@ +3162,5 @@ > + // If we abort while decoding the SRI, we fallback on explictly requesting > + // the source. Thus, we should not continue in > + // nsScriptLoader::OnStreamComplete, which removes the request from the > + // waiting lists. > + NS_ENSURE_SUCCESS(rv, mScriptLoader->RestartLoad(mRequest)); Using NS_ENSURE_SUCCESS() for this control flow is a bit non-standard. I can't remember another place we use a method with side-effects in the second parameter. Can you change this to a more typical if-statement? Also, NS_ENSURE_SUCCESS will trigger a warning here for an expected case which is probably not what we want. (:erahm, this comment is for you!) @@ +3168,5 @@ > + // Record the offset at which the bytecode starts. > + rv = SRICheckDataVerifier::HashLength(mBytecodeBuffer.length(), > + mBytecodeBuffer.begin(), > + &mRequest->mBytecodeOffset); > + NS_ENSURE_SUCCESS(rv, mScriptLoader->RestartLoad(mRequest)); Again, standard if-statement please.
Attachment #8800642 - Flags: review?(bkelly) → review+
(In reply to Ben Kelly [:bkelly] from comment #48) > Comment on attachment 8800642 [details] [diff] [review] > part 1.0 - Add ability to load bytecode from the cache. r= > > Review of attachment 8800642 [details] [diff] [review]: > ----------------------------------------------------------------- > […] > Also, do you plan to add telemetry for the restarts? I am thinking about it, but I have no yet investigated how to do that. I am waiting until I get something which is push-able behind a pref before looking at any telemetry. I will also have to look on how to add a preference to pref-off the alternate data choice, and the encoding part. > ::: dom/base/nsScriptLoader.cpp > @@ +1285,5 @@ > > > > NS_ENSURE_SUCCESS(rv, rv); > > > > + aRequest->mCacheInfo = nullptr; > > + if (!aRequest->IsLoadingSource()) { > > MOZ_ASSERT(aRequest->IsLoading()) here? Jon already added this assertion at the top of nsScriptLoader::StartLoad, in Bug 1240072. > @@ +2937,5 @@ > > + return NS_ERROR_OUT_OF_MEMORY; > > + } > > + > > + *aConsumedLength = aDataLength; > > + rv = MaybeDecodeSRI(); > > What guarantees do we have that we will receive enough data in this > OnIncrementalData() call to decode the SRI? Isn't it possible we'll have > the correct SRI value, but only receive part of the data in this chunk? It > seems like maybe we should wait to do this in OnStreamComplete(). > > Although, I see we do MaybeDecodeSRI() in OnStreamComplete() as well. Can > we just remove it from here? Perhaps with a comment that SRI will be > extracted in OnStreamComplete()? MaybeDecodeSRI attempts to decode the hash, if the content is large enough to be decoded, and if we have not done so yet (->IsComplete() check). The original idea was to be able to abort the streaming as soon as we are able to detect that we would fail. On the other hand, I admit that the failure conditions for MaybeDecodeSRI are unlikely, and might not be worth the extra call in OnIncrementalData. Among the failures, the most likely one is the lack of SRI hash in the bytecode cache, while expected by the verifier. So, I will rename MaybeDecodeSRI to DecodeSRI, and remove the call from OnIncrementalData.
(In reply to Jon Coppeard (:jonco) from comment #28) > Comment on attachment 8800640 [details] [diff] [review] > part 0.8 - nsJSUtils::EvaluateString, add the produced JSScript as an > out-param. r= > > Review of attachment 8800640 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/base/nsJSUtils.cpp > @@ +195,5 @@ > > if (ok && aOffThreadToken) { > > + aScript.set(JS::FinishOffThreadScript(aCx, *aOffThreadToken)); > > What happens if we didn't compile the script off main thread? It looks like > aScript never gets set in this case. Indeed. Currently the choice for using this code path (synchronous parsing) is based on the size of the script, if the script is small enough, then we would go through the parser & interpreter without giving away the top-level JSScript. So, this was not my priority in this patch. I guess I can split the synchronous part to first parse (and save the JSScript), and then run the code. Due to this and potential failures of the parser, the saving-part (part 1.3) checks that we have a JSScript set before making any attempt to encode the bytecode for it. I guess in terms of API, this is terrible that we are not reporting the JSScript in all cases of valid executions. In terms of implementation, this was the easiest way forward, to skip the synchronous case. I guess I should make a part which also split the synchronous case in 2, in order to report the JSScript in all valid cases, to make a proper API. What do you think?
Flags: needinfo?(jcoppeard)
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- The channel bits look good, with one minor nit. ::: dom/base/nsScriptLoader.cpp @@ +1290,5 @@ > + // Inform the HTTP cache that we prefer to have information coming from the > + // bytecode cache instead of the sources, if such entry is already registered. > + nsCOMPtr<nsICacheInfoChannel> cic(do_QueryInterface(channel)); > + if (cic) { > + cic->PreferAlternativeDataType(NS_LITERAL_CSTRING("javascript/moz-bytecode")); Is there any chance of the bytecode representation changing? In that case maybe something like javascript/moz-bytecode-[version] would be better. Also, define a const for the type, and use it in EnsureKnownDataType as well, so we don't introduce a bug when bumping up the version.
Attachment #8800642 - Flags: review?(valentin.gosu) → review+
Depends on: 1311020
Comment on attachment 8800645 [details] [diff] [review] part 1.3 - Save and Evaluate bytecode synchronously. r= Review of attachment 8800645 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +2275,5 @@ > + // Open the output stream to the cache entry alternate data > + // storage. This might fail if the stream is already open by another > + // request, in which case, we just ignore the current one. > + nsCOMPtr<nsIOutputStream> output; > + rv = aRequest->mCacheInfo->OpenAlternativeOutputStream(NS_LITERAL_CSTRING("javascript/moz-bytecode"), Use a const for the typename. @@ +2283,5 @@ > + uint32_t n; > + rv = output->Write(reinterpret_cast<char*>(aRequest->mScriptBytecode.begin()), > + aRequest->mScriptBytecode.length(), &n); > + SLLOG(("ScriptLoadRequest (%p): Write bytecode cache (length = %u, written = %u)", > + aRequest, unsigned(aRequest->mScriptBytecode.length()), n)); Maybe log the rv as well.
Attachment #8800645 - Flags: feedback?(valentin.gosu) → feedback+
(In reply to Nicolas B. Pierron [:nbp] from comment #50) > Currently the choice for using this code path (synchronous parsing) is based > on the size of the script, if the script is small enough, then we would go > through the parser & interpreter without giving away the top-level JSScript. Ah OK, it's intentional. It's pretty confusing from an API point of view. > I guess I should make a part which also split the synchronous case in 2, in > order to report the JSScript in all valid cases, to make a proper API. What > do you think? I think nsJSUtils::EvaluateString has a pretty strange API already and maybe this should be two functions, depending on whether you're executing a pre-compiled script or a fresh string.
Flags: needinfo?(jcoppeard)
Comment on attachment 8800642 [details] [diff] [review] part 1.0 - Add ability to load bytecode from the cache. r= Review of attachment 8800642 [details] [diff] [review]: ----------------------------------------------------------------- I don't have anything else to add to what bkelly has already suggested with respect to the SRI bits.
Attachment #8800642 - Flags: review?(francois) → review+
Comment on attachment 8800645 [details] [diff] [review] part 1.3 - Save and Evaluate bytecode synchronously. r= Review of attachment 8800645 [details] [diff] [review]: ----------------------------------------------------------------- The SRI bits look good to me.
Attachment #8800645 - Flags: review?(francois) → review+
Attachment #8800640 - Flags: review?(jcoppeard)
Comment on attachment 8800645 [details] [diff] [review] part 1.3 - Save and Evaluate bytecode synchronously. r= Review of attachment 8800645 [details] [diff] [review]: ----------------------------------------------------------------- I'm not an expert in this area but this looks like it will be work. As commented elsewhere we could perhaps generate the bytecode at the same time as parsing the code.
Attachment #8800645 - Flags: feedback?(jcoppeard) → feedback+
Here are some test results of the bytecode cache, in its current state (as of all the patches submitted up to now). tl;dr: 10% slow-down on the first load, 43% improvements by decoding instead of parsing. To get these results, I instrumented the EncodeScript and DecodeScript function of XDR to report results to the TraceLogger¹. Then I followed the following procedure: We ran firefox², with a profile with which I have not yet visited the targeted website. At this stage, we expect Firefox to be parsing and compiling a few lazy script, then encoding these compiled script in the bytecode cache. mach run https://quotidien.framapad.org/p/yL57AlAcJq Than after 30 seconds, we close firefox. Then I rerun the same command² a second time. This time, we expect firefox to be loading the resources from the cache (bytecode cache), and to decode the bytecode of the function compiled previously. mach run https://quotidien.framapad.org/p/yL57AlAcJq Then, from the result of the TraceLogger, we need to identify the 2 sections of code where we did similar things. In the current case, I managed to isolate these sections from the following properties: - beginning: EncodeScript / DecodeScript running across both runs, these events appears in the log at roughtly the same number of events. Event 867:- EncodeScript Event 866:- DecodeScript Both were characterized by large gap between the execution of the ParseCompileLazy function calls before them. As we do not know exactly the differences between the two executions, I discarded these lines and anything before them. - ending: The last EncodeScript / DecodeScript were followed by 2 large and consecutive ParseCompileLazy sections. I removed anything after as there is no way these would have been captured in the bytecode cache. Once the bytecode cache should be complete, i-e includes the code of the function being executed after the initialization of the page, we should expect larger speed-ups. Currently the bytecode cache only contains code which is used when the top-level of JS file is executed, i-e no code used during onload events. With the current implementation, we got (roughtly) the following results: First run: ParserCompileLazy: 4.8ms EncodeScript: 0.42 ms DecodeScript: 0 ms Second run: ParserCompileLazy: 0.75 ms EncodeScript: 0 ms DecodeScript: 2.29 ms So, the current set of patches are saving about something which is roughly twice as large as the source in the alternate-data stream, and reduce the load time caused by the parser by 43%. One of the big surprises, is that the encoding of the bytecode represents only a 10% slow down. ¹ The patch will be attached soonish. ² with the following environment variables TLLOG=ParserCompileFunction,ParserCompileLazy,ParserCompileModule,DecodeScript,DecodeFunction,EncodeScript,EncodeFunction TLOPTIONS=EnableMainThread,EnableOffThread,EnableGraph
Depends on: 1316078
Depends on: 1316081
(In reply to Nicolas B. Pierron [:nbp] from comment #57) > ² with the following environment variables > TLLOG=ParserCompileFunction,ParserCompileLazy,ParserCompileModule, > DecodeScript,DecodeFunction,EncodeScript,EncodeFunction > TLOPTIONS=EnableMainThread,EnableOffThread,EnableGraph Note, I would have to redo the previous measurements, as the previous results do not include the cost of the top-level parsing of the script, but only the cost of a few lazy functions which got compile. We have to add TLLOG=ParserCompileScript,… to the previous list. Whatever these would add to the parser time, this does not impact the fact that a JS start-up bytecode cache would be beneficial.
Comment on attachment 8808739 [details] [diff] [review] part 0.11 - Instrument XDR Encode/Decode function with the TraceLogger. Review of attachment 8808739 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Xdr.cpp @@ +118,5 @@ > XDRState<mode>::codeFunction(MutableHandleFunction funp) > { > + AutoTraceLog tl( > + TraceLoggerForMainThread(cx()->runtime()), > + mode == XDR_DECODE ? TraceLogger_DecodeFunction : TraceLogger_EncodeFunction); This seems like a strange style? I would have thought it should be: > AutoTraceLog tl(TraceLoggerForMainThread(cx()->runtime()), mode == XDR_DECODE ? TraceLogger_DecodeFunction : TraceLogger_EncodeFunction); Not sure though. Maybe create variable names to make sure AutoTraceLog fit on one line? @@ +146,5 @@ > XDRState<mode>::codeScript(MutableHandleScript scriptp) > { > + AutoTraceLog tl( > + TraceLoggerForMainThread(cx()->runtime()), > + mode == XDR_DECODE ? TraceLogger_DecodeScript : TraceLogger_EncodeScript); Ditto You can log the scripts name as following: http://searchfox.org/mozilla-central/source/js/src/jit/Ion.cpp#519
Attachment #8808739 - Flags: review?(hv1989) → review+
Do you have plans to use the code cache on Worker threads? It would be a great benefit to service worker startup speed which directly impacts page load time.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Ben Kelly [:bkelly] from comment #61) > Do you have plans to use the code cache on Worker threads? It would be a > great benefit to service worker startup speed which directly impacts page > load time. To make things simple at first (even if it is not), I would say not yet. The alternate-data provided by necko should still return the raw source instead of the bytecode cache content, if we did not specified any preferred alternative data type. So these changes should not conflict even if the same source is used in a document and in a worker thread. On a related subject, we would not be able to load any bytecode content from services workers until we implement support for alternate-data on the service workers. (similar to Bug 1231565) Which from what I understood was different than the necko implementation.
Flags: needinfo?(nicolas.b.pierron)
> On a related subject, we would not be able to load any bytecode content from services workers until we implement support for alternate-data on the service workers. Can you expand on what you mean by this? I'm a bit concerned that this architecture will never work with service workers enabled on a site because it stores the cached data in http cache. Since service worker script sits between http cache and the script loader, this means that we must always load the real source out of http cache. We have no way of knowing if the SW script will inspect the bytes, so we can't serve the compiled "alternate" data stream. In contrast, if this was implemented as a higher level cache above the network layer then it would just work with service workers. For example, I believe the asmjs cache works this way today. Nicolas, do you have a proposal for how to make this work in a service worker enabled site?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Ben Kelly [:bkelly] from comment #63) > > On a related subject, we would not be able to load any bytecode content from services workers until we implement support for alternate-data on the service workers. > > Can you expand on what you mean by this? Bug 1231565 added a few functions to the nsICacheInfoChannel, which let us provide the prefer alternate data type. At the moment, this is only implemented by Necko as far as I know. > I'm a bit concerned that this architecture will never work with service > workers enabled on a site because it stores the cached data in http cache. > Since service worker script sits between http cache and the script loader, > this means that we must always load the real source out of http cache. We > have no way of knowing if the SW script will inspect the bytes, so we can't > serve the compiled "alternate" data stream. From what I understand the service worker is a proxy, which behaves like a cache. I do not know the implementation of service workers, but I do not yet see any reasons why service workers should not implement nsICacheInfoChannel interface as well. > In contrast, if this was implemented as a higher level cache above the > network layer then it would just work with service workers. For example, I > believe the asmjs cache works this way today. The reason I prefer it being implemented as part of the lower level, such as the network layer, is that the network layer is doing invalidation of the cached content, either based on user inputs, or based on remote responses. Asm.js cache is working assuming that we already fetched the bytes from the sources, and the content of the cache is indexed by the hash & source comparison of the content. This is made to collect information about generated/patched function, which is a different use case than the start-up bytecode cache. In the case of the start-up bytecode cache, we are trying to avoid loading the original source, in exchange of some other format of the data. In such case the indexing is not made based on the hash & source comparison, but based on whatever the cache implementation needs to identify a valid cached resource from an URL. HTTP caching rules seems to be complex enough from my point of view to prefer reusing the existing code than making my own. I think that having this mechanism part of the nsIChannel, as an optional interface is the best choice as the invalidation of the alternate data depends only on the network cache. Also, avoiding the indexing based on the hash&source reduce the IPDL overhead. > Nicolas, do you have a proposal for how to make this work in a service > worker enabled site? I do not know much from the service worker. I guess they provide some-kind of implementation for nsIChannel, and I guess similarly they could provide an implementation for the nsICacheInfoChannel.
Flags: needinfo?(nicolas.b.pierron)
The problem with service workers is that they pass the byte stream through page-controlled script, essentially. The page can then modify the bytes or whatever. So you can't just create an out-of-band thing and not read the "real" data from the cache.
While I was resistant to this idea originally, upon further reflection I think we could do something like: 1) Attach an optional nsICacheInfoChannel to InternalResponse::GetCacheInfoChannel() 2) When fetch() returns a Response we hold on to the nsIChannel's nsICacheInfoChannel if present so we can expose it in the API from (1). 3) When a Response is stored in Cache API it will look for any secondary data stream if nsICacheInfoChannel is attached to the Response. If there, it will store it next to the main data stream in its disk storage area. 4) Cache API implements its own, reduced version of nsICacheInfoChannel. This is attached to Response objects returned from cache.match(), etc. 5) The secondary data stream from (3) can be accessed via the Cache API version of nsICacheInfoChannel. 6) The Cache API version of nsICacheInfoChannel would also allow code to set the secondary data stream in the Cache API entry. Things like script mutating the bytestream, cloning Responses, etc might lose the nsICacheInfoChannel. In those cases we fall back to the current case of no bytecode cache. One problem that would make this harder is if nsICacheInfoChannel is main thread only. Cache API works very hard to avoid the main thread today.
Anyway, the plan in comment 66 would be a non-trivial follow-up bug.
Depends on: 1330373
Depends on: 1331662
Attachment #8800645 - Flags: feedback?(luke)
Comment on attachment 8800643 [details] [diff] [review] part 1.1 - Prevent the mutation of parsed objects after JSOP_OBJECT. r= This part 1.1 is now obsolete because we are going to encode the bytecode incrementally (Bug 1316081). The incremental encoding solves 2 problems: 1. Adds back the ability to (sort-of) interrupt the encoding phase. 2. Does not require cloning the objects of runOnce scripts.
Attachment #8800643 - Attachment is obsolete: true
platform-rel: ? → -
I am adding the "leave-open" keyword, as I will land "part 0.*" patches, and we should keep this bug open until all non-obsolete "part 1.*" patches are landed.
Keywords: leave-open
Pushed by npierron@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2c06f923117b part 0.1 - Fix typo in nsScriptLoader::StartLoad. r=jonco https://hg.mozilla.org/integration/mozilla-inbound/rev/64e1c07735d8 part 0.2 - Rename nsScriptLoadHandler::TryDecodeRawData to DecodeRawData. r=jonco https://hg.mozilla.org/integration/mozilla-inbound/rev/a56f86dc86e1 part 0.3 - Remove unused aType argument from nsScriptLoader::StartLoad. r=bkelly https://hg.mozilla.org/integration/mozilla-inbound/rev/3df44b58188d part 0.4 - Move aScriptFromHead to the ScriptLoadRequest. r=bkelly https://hg.mozilla.org/integration/mozilla-inbound/rev/c1d70cbf5402 part 0.5 - Replace manual handling of the ScriptText buffer of nsScriptLoadRequest. r=bkelly https://hg.mozilla.org/integration/mozilla-inbound/rev/55cf573e353a part 0.6 - Remove useless static_cast from nsScriptLoader::OnStreamComplete. r=bkelly https://hg.mozilla.org/integration/mozilla-inbound/rev/3f430c5ce617 part 0.7 - Remove trailing whitespace in nsScriptLoader.h comments. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/fa256821beac part 0.9 - Add ScriptLoader LazyLogModule. r=jonco https://hg.mozilla.org/integration/mozilla-inbound/rev/aeaf50c6d238 part 0.10 - Fix MOZ_LOG arguments indentation in nsScriptLoader.cpp. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/81c659b2e7e2 part 0.11 - Instrument XDR Encode/Decode function with the TraceLogger. r=h4writer
Depends on: 1341343
Depends on: 1342419
I ran tp5n benchmark (minus 3 crashing web-sites) with the set of patches which is currently available on my github account. I first tested with: a/ mach talos-test -a tp5n -tpcycles 5 --setpref dom.script_loader.bytecode_cache.enabled=true --no-download b/ mach talos-test -a tp5n -tpcycles 5 --setpref dom.script_loader.bytecode_cache.enabled=false --no-download I was not able to notice any major differences. The main reason seems to be that the current implementation only encode the bytecode of scripts which are large enough to be parsed out of the main thread. Logging the number of times this happened revealed that this was at most twice for a website, and most of the time this was between 0 and 1. To check if saving the bytecode of any scripts is interesting, I toggled the flag that I am using for testing, which enforce all scripts to be parsed out of the main thread, and thus being candidates for having their bytecode encoded and saved. c/ mach talos-test -a tp5n -tpcycles 5 --setpref dom.script_loader.bytecode_cache.enabled=true --setpref dom.script_loader.force_bytecode_cache=true --no-download d/ mach talos-test -a tp5n -tpcycles 5 --setpref dom.script_loader.bytecode_cache.enabled=false --setpref dom.script_loader.force_bytecode_cache=true --no-download This approximation is not exact but should highlight some trends as the preference (probably) causes the main thread to wait for the parser & decoder thread to finish before processing the results. I can see 3 different kind of results (5 runs), but a-part from a few regressions or noisy results, we can see a few improvements such as the following *best* examples where the results stands out of the noise level: * thesartorialist.blogspot.com no bytecode cache | bytecode cache 216.0, | 285.0, 204.0, | 143.0, 134.0, | 184.0, 209.0, | 160.0, 202.0 | 148.0 * orange.fr no bytecode cache | bytecode cache 165.0, | 173.0, 156.0, | 125.0, 139.0, | 123.0, 124.0, | 126.0, 144.0 | 117.0 * mail.ru no bytecode cache | bytecode cache 279.0, | 295.0, 279.0, | 274.0, 280.0, | 243.0, 282.0, | 270.0, 280.0 | 266.0 These results highlight the slow-down of the first run, compared to the speed-up of the following runs.
tl;dr: This is an overview of the full set of changes. Do not waste time reviewing this big patch. Feel free to suggest better reviewers. This diff contains the full set of part 1.* patches which I will attach on this bug later on, as I split this big patch in smaller pieces which are easier to review. This patch does multiples things: - Add a preference to enable/disable the bytecode cache. - Add ExecutionContext methods to: - decode the bytecode on the main thread. - decode the bytecode out of the main thread. - encode the bytecode when parsed out of the main thread. - Add Mime type for the alternate data streams: - with the build-id baked in, to avoid some decoding failures. - Stream either from the source / bytecode. - Add a nsScriptLoadRequest::mDataType to distinguish between the source and the bytecode. - Restart the nsScriptLoadRequest on SRI failures. - Add a way to enforce loading the source (provide a no-op mime type to be able to save the bytecode later) - Add saving of the bytecode: - Add a list of queued script to be encoded on the nsScriptLoader. - Scripts are queued when the evaluation is completed. - Scripts are encoded when the onload event got fired and there is no more script queued to be evaluated. (even if we do not save their bytecode) - Add test cases: - Add preferences to force saving the bytecode. (by forcing off-thread parsing) - Check restart of the request on SRI failures. - Check that bytecode is encoded and decoded as expected. - Add TRACE_FOR_TEST macro for reporting events on the script tag for tracing the code-path execution of the test cases. This work is still missing a few pieces: - Add telemetry probes to monitor Parsing time and decoding time (would be done in a follow-up bug) - Add request->mJSVersion as part of the Mime type, such that we can avoid decoding failures.
Attachment #8850091 - Flags: feedback?(jcoppeard)
Attachment #8850091 - Flags: feedback?(bugs)
Attachment #8850091 - Flags: feedback?(mrbkap)
Comment on attachment 8850091 [details] [diff] [review] part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. mrbkap really wants to give feedback to this one ;)
Attachment #8850091 - Flags: feedback?(bugs)
Flags: needinfo?(mrbkap)
Depends on: 1347120
Comment on attachment 8850091 [details] [diff] [review] part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. The test case provided in this patch works fine locally, because I run mochitest harness on my laptop which features multiple cores. However, it fails on try linux64, because the tc-M* are executed with only a single core. The test case fails on a single core because we do not even attempt to parse any scripts out of the main thread, which is the pre-requisite for the current implementation. I will change the ExecutionContext::CompileAndExec to report the JSScript pointer too, such that we can cache the bytecode even if it is being executed on the main thread.
Depends on: 1351357
Comment on attachment 8850091 [details] [diff] [review] part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. Review of attachment 8850091 [details] [diff] [review]: ----------------------------------------------------------------- I think I followed what's going on here and it seems pretty reasonable. I have a bunch of nits, comment typos, and a few code-cleanliness suggestions. I don't know the SRI code super-well, but assuming things work as they seem to be labeled, I think that also should be fine. ::: dom/base/nsScriptLoader.cpp @@ +97,5 @@ > +// This macro is used to wrap a tracing mechanism which is scheduling events > +// which are then used by the JavaScript code of test cases to track the code > +// path to verify the optimizations are working as expected. > +#define TRACE_FOR_TEST(elem, str) \ > + do { \ Nit: Might as well use PR_BEGIN_MACRO/PR_END_MACRO. @@ +1406,5 @@ > + // If we are explicitly loading from the sources, such as after a > + // restarted request, we might still want to save the bytecode after. > + // > + // The following tell the cache to look for an alternative data type which > + // does not exists, such that we can later save the bytecode with a "does not exist" @@ +2521,5 @@ > + !mXSLTRequests.isEmpty() || > + !mLoadedAsyncRequests.isEmpty() || > + !mNonAsyncExternalScriptInsertedRequests.isEmpty() || > + !mDeferRequests.isEmpty() || > + !mPendingChildLoaders.IsEmpty()) This is probably worthy of a helper given that it's duplicated below. @@ +2530,5 @@ > + // Create a new runnable dedicated to encoding the content of the bytecode > + // of all enqueued scripts. In case of failure, we give-up on encoding the > + // bytecode. > + nsCOMPtr<nsIRunnable> encoder = > + NewRunnableMethod(// "nsScriptLoader::EncodeBytecode", ? @@ +2652,5 @@ > + Unused << JS::FinishIncrementalEncoding(aes.cx(), script); > + request->mScriptBytecode.clearAndFree(); > + request->mCacheInfo = nullptr; > + request->mScript = nullptr; > + DropJSObjects(request.get()); I think I've seen this pattern four or so times by now. Seems like it'd be worth adding a helper on Request objects that deals with the Hold/DropJSObjetcs. @@ +2658,5 @@ > + return; > + } > + } > + > + RefPtr<nsScriptLoadRequest> request; Why is this declared outside of the loop? @@ +3224,5 @@ > aRequest->SetReady(); > > // If this is currently blocking the parser, attempt to compile it off-main-thread. > + if (aRequest->IsSource() && aRequest == mParserBlockingRequest && > + (NumberOfProcessors() > 1)) Nit: no need for these parens (you didn't add them, but might as well get rid of them while we're here). @@ +3600,5 @@ > + } > + > + nsCOMPtr<nsIRequest> req; > + nsresult rv = aLoader->GetRequest(getter_AddRefs(req)); > + NS_ASSERTION(req, "StreamLoader's request went away prematurely"); MOZ_ASSERT? ::: dom/base/nsScriptLoader.h @@ +159,5 @@ > return mProgress == Progress::Ready; > } > bool IsLoading() const { > + return mProgress == Progress::Loading || > + mProgress == Progress::Loading_Source; Nit: The second, overhung, line should line up with the first line. ::: dom/base/test/file_js_cache_save_after_load.js @@ +1,1 @@ > +function foo() { These files probably need public-domain license blocks. Why not just call this function send_ping()? ::: dom/base/test/test_script_loader_js_cache.html @@ +1,5 @@ > +<!DOCTYPE html> > +<html> > +<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=900784 --> > +<!-- The JS bytecode cache is not supposed to be observable. To make it > + observable, the nsScriptLoader is intrumented to trigger events on the instrumented @@ +49,5 @@ > + cacheTesting.flush(() => { resolve(); }); > + }); > + }; > + > + async function WaitForScriptTagEvent(url) { For my own curiosity: why make this an async function? It doesn't wait on anything and its caller (which is already an async function) is already async. @@ +70,5 @@ > + if (evt.target.id != "watchme") > + return; > + > + // Suppose that there is only one script tag. > + dump("## ScriptLoader event: " + evt.type + "\n"); These dumps should probably be taken out eventually or turned into `info()` calls. @@ +83,5 @@ > + result = `${result} & ping(=${ping})`; > + } > + stateMachineResolve(result); > + } else if (stateMachine === undefined) { > + // We followed an unknwon transition, report the known history. unknown
Attachment #8850091 - Flags: feedback?(mrbkap) → feedback+
Flags: needinfo?(mrbkap)
(In reply to Blake Kaplan (:mrbkap) from comment #76) > Comment on attachment 8850091 [details] [diff] [review] > part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. > > Review of attachment 8850091 [details] [diff] [review]: > ----------------------------------------------------------------- > > @@ +2530,5 @@ > > + // Create a new runnable dedicated to encoding the content of the bytecode > > + // of all enqueued scripts. In case of failure, we give-up on encoding the > > + // bytecode. > > + nsCOMPtr<nsIRunnable> encoder = > > + NewRunnableMethod(// "nsScriptLoader::EncodeBytecode", > > ? This is me working on an old version of mozilla-central, in which the naming capability did not exist yet. So as I want to name the runnable, I leave this comment here in order to add this argument later. > @@ +2658,5 @@ > > + return; > > + } > > + } > > + > > + RefPtr<nsScriptLoadRequest> request; > > Why is this declared outside of the loop? Because this is some habit I got from the the Rooting API, as declaring a Rooted register it-self into a linked list, they are better defined outside loops. > @@ +49,5 @@ > > + cacheTesting.flush(() => { resolve(); }); > > + }); > > + }; > > + > > + async function WaitForScriptTagEvent(url) { > > For my own curiosity: why make this an async function? It doesn't wait on > anything and its caller (which is already an async function) is already > async. There is no need, this is a legacy of some incremental modifications, until I decided to settle on the state machine based on the sequence of events. This function returns a Promise which is enough for using the await on the result of this function. As it does not have any yield/await keyword, thus there is no need for an async keyword.
Whiteboard: [shumway:p1] [platform-rel-Facebook] → [shumway:p1] [platform-rel-Facebook][qf:p1]
(In reply to Blake Kaplan (:mrbkap) from comment #76) > @@ +70,5 @@ > > + if (evt.target.id != "watchme") > > + return; > > + > > + // Suppose that there is only one script tag. > > + dump("## ScriptLoader event: " + evt.type + "\n"); > > These dumps should probably be taken out eventually or turned into `info()` > calls. I don't seem to have any "info" function available at hand. Where is it located?
This patch adds a few assertions checking mProgress state.
Attachment #8853096 - Flags: review?(mrbkap)
info() is a thing that exists in SimpleTest-based mochitests. This is a testharness-based mochitest. I don't think testharness has anything useful here. :(
This patch adds a few functions, which are following the new style added in Bug 1331662, which are using JS api functions to parse/decode (& maybe start the incremental XDR encoding added in Bug 1316081) and execute the code. Decoding errors should not happen as the next patch will only request the alternate data type when the version is the default version, and the build-id is asserted by the fact that it is append to the alternate data mime type.
Attachment #8853100 - Flags: review?(mrbkap)
This patch add a new state variable, to check if we are loading source or bytecode, or if we do not know yet. The next patches would make use of the bytecode state, to distinguish which function should be use for parsing/decoding.
Attachment #8853101 - Flags: review?(mrbkap)
This patch adds a flag to turn on/off the bytecode cache. For the moment this patch only request the stream from the necko cache, and always fallback on loading the source. 2 Mime types are used: javascript/moz-bytecode-20173003213625 javascript/null The first one includes the build-id, which ensure that newer version of Firefox would not use previously registered bytecode which have incompatible bytecode format (double checked by the decoding function). The second one is used to request from necko the fact to maintain the nsCacheChannelInfo, which would be used in part 1.5 for encoding. When Loading bytecode, the same loader is used to either stream the source or the bytecode (one or the other), and EnsureKnownDataType is used to distinguish the type of the input, and register it in the mDataType field added in the previous patch. When a stream is canceled, we fallback again in StartLoad function, but we want to maintain the fact that the script is queued in one of the nsScriptLoader lists. Thus, in nsScriptLoader::OnStreamComplete, we check the channel status against the error code returned by RestartLoad. (the OnStreamComplete is called once more if we fail from either OnIncrementalData or OnStreamComplete)
Attachment #8853107 - Flags: review?(valentin.gosu)
Attachment #8853107 - Flags: review?(mrbkap)
This patch adds the code necessary to import/export the SRI hash in the buffer used by the bytecode cache. (see Bug 1288104) The bytecode cache assumes that the first bytes are reserved for the SRI hash, and if the source's document does not contain any SRI hash, then an empty one is stored instead. As the bytecode cache content can be quite large, we want to restart the stream as soon as we have enough bytes to verify if the SRI is correct. If it is not, this could be either because the SRI hash is wrong or because the SRI hash is of a different algorithm. In any case, this is better verified with the source.
Attachment #8853111 - Flags: review?(mrbkap)
Attachment #8853111 - Flags: review?(francois)
Attachment #8800642 - Attachment is obsolete: true
Attachment #8800644 - Attachment is obsolete: true
Attachment #8800645 - Attachment is obsolete: true
This patch makes use of function added as part of part 1.0, and of the API added by Bug 1316078 to decode and execute the bytecode. The bytecode is considered from the offset after the SRI decoding to the end. The bytecode can be decoded out of the main thread, in which case the same heuristic as the parser is used at the moment. (to be changed later?)
Attachment #8853112 - Flags: review?(mrbkap)
This patch contains the code to encode the content which is incrementally stored by the XDR Incremental Encoder into the mScriptBytecode buffer held by the nsScriptLoadRequest. (Bug 1316081) To save the bytecode we need the mCacheChannelInfo that we collect at the end of the nsScriptLoadHandler::OnStreamComplete, and the mScript that we collect (to be tuned later) when parsing code out of the main thread. Both of these fields have to be added as part of the GC/CC marking and tracing. This patch implements the following heuristics: - Any script which for which we can save the bytecode, is added to the mBytecodeEncodingQueue. This reuses the fact that nsScriptLoadRequest are inlined list elements, which are popped before the EvaluateScript function call. - We do not want to encode anything if there is still a script which has to be executed, even if we do not save the btyecode of this script. If there is a script waiting to be executed, this means that the start-up is not complete yet. (MaybeTriggerBytecodeEncoding) - We do not want to encode anything until we get some event telling us that the page is loaded, (mLoadEventFired) trigered by the nsDocumentViewer::LoadComplete function. In case of any failure we try to stop gracefully (GiveUpBytecodeEncoding), by stopping the incremental recording of the bytecode. Otherwise, EncodeBytecode and EncodeRequestBytecode are used to finalize the incremental recording of the bytecode, and save it on the necko cache as the alternate data type, with the mime type javascript/moz-bytecode-... . (sorry this is still an old tree, and I cannot uncomment NewRunnableMethod first argument yet)
Attachment #8853114 - Flags: review?(valentin.gosu)
Attachment #8853114 - Flags: review?(mrbkap)
As the bytecode cache is not supposed to be observable except for performance differences. To check that the code is working as expected, this code adds a preference to toogle the emission of events on the script tag. These events are used to trace the code execution order to reconstruct the code path taken by the nsScriptLoader. The test case added in this patch are following the order in which these events are emitted and reconstruct a simple word which summarize this state. The test case asserts that we are able to encode, fallback, re-encode, and decode in multiple iframe at the same time. It also checks that we do record "ping" events from the onload listener, before encoding any bytecode. Note, I disabled the test case on Linux, as task-cluster mochitest are being executed on a single core, which breaks the heuristics which assume that we want to encode the bytecode of scripts which are parse out of the main thread. (we are not parsing any script out of the main thread when we have a single core apparently) Question, where is the "info" function?
Attachment #8853118 - Flags: review?(mrbkap)
Comment on attachment 8853111 [details] [diff] [review] part 1.3 - Add SRI hash import/export code from/to JS bytecode cache. Review of attachment 8853111 [details] [diff] [review]: ----------------------------------------------------------------- The SRI-related bits look good. ::: dom/base/nsScriptLoader.cpp @@ +3285,5 @@ > uint32_t aDataLength, > const uint8_t* aData) > { > nsresult rv = NS_OK; > + nsAutoCString url; nit: `url` is only used in the `LOG()` call, so this variable could be contained inside an `if (LOG_ENABLED) {}` block.
Attachment #8853111 - Flags: review?(francois) → review+
Comment on attachment 8853107 [details] [diff] [review] part 1.2 - Request JS bytecode if any is present in the cache. Review of attachment 8853107 [details] [diff] [review]: ----------------------------------------------------------------- The usage of the alt-data mechanism is correct. I don't have a lot of context to know if the rest of the code is OK.
Attachment #8853107 - Flags: review?(valentin.gosu)
Comment on attachment 8853114 [details] [diff] [review] part 1.5 - Encode JS bytecode when no more scripts are executed. Review of attachment 8853114 [details] [diff] [review]: ----------------------------------------------------------------- The usage of the alt-data mechanism is correct.
Attachment #8853114 - Flags: review?(valentin.gosu)
Comment on attachment 8850091 [details] [diff] [review] part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. Review of attachment 8850091 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the slow reply. This looks good as far as I understand it. ::: dom/base/nsScriptLoader.cpp @@ +1409,5 @@ > + // The following tell the cache to look for an alternative data type which > + // does not exists, such that we can later save the bytecode with a > + // different alternative data type. > + LOG(("ScriptLoadRequest (%p): Request saving bytecode later", aRequest)); > + cic->PreferAlternativeDataType(kNullMimeType); I don't understand this part. Is this to tell the cache that we *don't* want any cached bytecode? If so we could probably come up with a simpler API to express this (outside the scope of the bug). @@ +2031,5 @@ > > + size_t len = aRequest->IsSource() > + ? aRequest->mScriptText.length() > + : aRequest->mScriptBytecode.length(); > + if (!JS::CanCompileOffThread(cx, options, len)) { Suggestion: We might want to have different thresholds for compiling bytecode / script text off thread. @@ +2440,5 @@ > + rv = exec.SyncAndExec(&aRequest->mOffThreadToken, &script); > + } else { > + TRACE_FOR_TEST(aRequest->mElement, "scriptloader_encode_and_execute"); > + MOZ_ASSERT(aRequest->mBytecodeOffset == > + aRequest->mScriptBytecode.length()); I don't understand this assert. @@ +2451,5 @@ > + // Queue the current script load request to later save the bytecode. > + if (NS_SUCCEEDED(rv) && aRequest->mCacheInfo) { > + aRequest->mScript = script; > + HoldJSObjects(aRequest); > + mozilla::HoldJSObjects(aRequest); It would be better to have Set/ClearScript methods on the request (I think mrbkap suggested this too). @@ +2475,5 @@ > } > + > + // Even if we are not saving the bytecode of the current script, we have > + // to trigger the encoding of the bytecode, as the current script can > + // call functions of a script for which we are recording the bytecode. I don't understand this comment. Do the other script's encoded bytecode depend on the encoded bytecode for the current script? @@ +3632,5 @@ > const uint8_t* aData) > { > + nsresult rv = NS_OK; > + nsAutoCString url; > + mRequest->mURI->GetAsciiSpec(url); Should be conditional on logging.
Attachment #8850091 - Flags: feedback?(jcoppeard) → feedback+
I will get to this review tomorrow.
Attachment #8853096 - Flags: review?(mrbkap) → review+
Comment on attachment 8853100 [details] [diff] [review] part 1.0 - Add nsJSUtils functions for encoding and decoding the bytecode. Review of attachment 8853100 [details] [diff] [review]: ----------------------------------------------------------------- r+ with some massaging of these names. ::: dom/base/nsJSUtils.h @@ +160,5 @@ > + > + // After getting a notification that an off-thread decoding terminated, > + // this function will synchronize the result by moving it to the main thread > + // before starting the execution of the script. > + MOZ_MUST_USE nsresult DecodeSyncAndExec(void **aOffThreadToken); This name is pretty confusing. In most uses in Gecko, "Sync" usually means "synchronous" instead of "synchronize" and it's hard to see past that. Maybe something like "DecodeAndExecOnMainThread"? @@ +162,5 @@ > + // this function will synchronize the result by moving it to the main thread > + // before starting the execution of the script. > + MOZ_MUST_USE nsresult DecodeSyncAndExec(void **aOffThreadToken); > + > + // Similar to SyncAndExec, except that in addition to fecth the source, we "in addition to fetching the source"
Attachment #8853100 - Flags: review?(mrbkap) → review+
Attachment #8853101 - Flags: review?(mrbkap) → review+
Comment on attachment 8853107 [details] [diff] [review] part 1.2 - Request JS bytecode if any is present in the cache. Review of attachment 8853107 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +1227,5 @@ > + // Start a new channel from which we explicitly request to stream the source > + // instead of the bytecode. > + aRequest->mProgress = nsScriptLoadRequest::Progress::Loading_Source; > + nsresult rv = StartLoad(aRequest); > + if (NS_FAILED(rv)) Nit: Prevailing style is to always brace if statements. @@ +2641,5 @@ > } > > + if (aChannelStatus == NS_BINDING_RETARGETED) { > + // When loading bytecode, we verify the SRI hash. If it does not match the > + // one from the document we restart the load, enforcing us to load the *forcing* us to load the source...
Attachment #8853107 - Flags: review?(mrbkap) → review+
Attachment #8853111 - Flags: review?(mrbkap) → review+
Attachment #8853112 - Flags: review?(mrbkap) → review+
Comment on attachment 8853118 [details] [diff] [review] part 1.6 - Add a test case for the JS start-up bytecode cache. Review of attachment 8853118 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +2308,5 @@ > MOZ_ASSERT(elementVal.isObject()); > aOptions->setElement(&elementVal.toObject()); > } > > + // When testing, we want to enforce uses of the bytecode cache. enforce -> force uses -> use
Attachment #8853118 - Flags: review?(mrbkap) → review+
Comment on attachment 8853114 [details] [diff] [review] part 1.5 - Encode JS bytecode when no more scripts are executed. Review of attachment 8853114 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +2385,5 @@ > + } > + } > + > + // Queue the current script load request to later save the bytecode. > + if (NS_SUCCEEDED(rv) && aRequest->mCacheInfo) { It seems like this if/else could be moved inside of the previous else branch and become if (NS_SUCCEEDED(rv)) { ... } else { aRequest->mCacheInfo = nullptr; } @@ +2388,5 @@ > + // Queue the current script load request to later save the bytecode. > + if (NS_SUCCEEDED(rv) && aRequest->mCacheInfo) { > + aRequest->mScript = script; > + HoldJSObjects(aRequest); > + mozilla::HoldJSObjects(aRequest); No need to double-hold aRequest here. @@ +2460,5 @@ > + // Create a new runnable dedicated to encoding the content of the bytecode > + // of all enqueued scripts. In case of failure, we give-up on encoding the > + // bytecode. > + nsCOMPtr<nsIRunnable> encoder = > + NewRunnableMethod(// "nsScriptLoader::EncodeBytecode", Friendly reminder to uncomment this once you update your tree. @@ +2585,5 @@ > + request = mBytecodeEncodingQueue.StealFirst(); > + LOG(("ScriptLoadRequest (%p): Cannot serialize bytecode", request.get())); > + // Note: Do not clear the mScriptBytecode buffer, because the > + // incremental encoder owned by the ScriptSource object still has a > + // reference to this buffer. This lost me. Where is this extra reference taken and dropped?
Attachment #8853114 - Flags: review?(mrbkap)
(In reply to Blake Kaplan (:mrbkap) from comment #97) > @@ +2585,5 @@ > > + request = mBytecodeEncodingQueue.StealFirst(); > > + LOG(("ScriptLoadRequest (%p): Cannot serialize bytecode", request.get())); > > + // Note: Do not clear the mScriptBytecode buffer, because the > > + // incremental encoder owned by the ScriptSource object still has a > > + // reference to this buffer. > > This lost me. Where is this extra reference taken and dropped? When we start the execution while encoding, we add a new object on the ScriptSource object, which is the XDRIncrementalEncoder. This XDRIncrementalEncoder hold a reference to the mScriptBytecode vector, as it is its output. When we call JS::FinishIncrementalEncoding, we command the XDRIncrementalEncoder to dump everything it has on the mScriptBytecode vector, and the ScriptSource to release the XDRIncrementalEncoder instance. This branch is used if we are unable to find all the argument necessary to call FinishIncrementalEncoding, and thus needed to release the XDRIncremetalEncoder instance from the ScriptSource. Thus the ScriptSource still has a reference to the mScriptBytecode. In practice the XDRIncrementalEncoder does not manipulate this vector unless the FinishIncrementalEncoding is called but this is not enforced by the API. Thus I prefer to keep the mScriptBytecode vector alive until both the ScriptSource and the nsScriptLoadRequest are released. I guess, I could make a follow-up patch to provide the output vector to the XDRIncrementalEncoder only when the FinishIncrementalEncoding function is called, I will investigate that later.
(In reply to Jon Coppeard (:jonco) from comment #92) > Comment on attachment 8850091 [details] [diff] [review] > part 1.* (squashed) - Add JS start-up bytecode cache to the nsScriptLoader. > > ::: dom/base/nsScriptLoader.cpp > @@ +1409,5 @@ > > + // The following tell the cache to look for an alternative data type which > > + // does not exists, such that we can later save the bytecode with a > > + // different alternative data type. > > + LOG(("ScriptLoadRequest (%p): Request saving bytecode later", aRequest)); > > + cic->PreferAlternativeDataType(kNullMimeType); > > I don't understand this part. Is this to tell the cache that we *don't* > want any cached bytecode? If so we could probably come up with a simpler > API to express this (outside the scope of the bug). Yes, this is to tell the cache that we do not want any stream other than the source but we might save some alternative data type later. > @@ +2031,5 @@ > > > > + size_t len = aRequest->IsSource() > > + ? aRequest->mScriptText.length() > > + : aRequest->mScriptBytecode.length(); > > + if (!JS::CanCompileOffThread(cx, options, len)) { > > Suggestion: We might want to have different thresholds for compiling > bytecode / script text off thread. I agree. I will look at tuning the bytecode cache heuristics in follow-up patches. > @@ +2440,5 @@ > > + rv = exec.SyncAndExec(&aRequest->mOffThreadToken, &script); > > + } else { > > + TRACE_FOR_TEST(aRequest->mElement, "scriptloader_encode_and_execute"); > > + MOZ_ASSERT(aRequest->mBytecodeOffset == > > + aRequest->mScriptBytecode.length()); > > I don't understand this assert. The mScriptBytecode vector already contains some bytes for the SRI hash of the original source. We cannot compute this hash from the bytecode cache content because we change the encoding of the source as soon as it is streamed to the child process. the mBytecodeOffset is used to save the location where the bytecode is supposed to start (i-e where the SRI hash is ending). This assertion asserts that the vector does not contain anything else. This is also a safe-guard, as the encoding API currently assumes that it has to append data in the mScriptBytecode buffer. > @@ +2475,5 @@ > > } > > + > > + // Even if we are not saving the bytecode of the current script, we have > > + // to trigger the encoding of the bytecode, as the current script can > > + // call functions of a script for which we are recording the bytecode. > > I don't understand this comment. Do the other script's encoded bytecode > depend on the encoded bytecode for the current script? Other scripts encoding depend on the execution of the current script. <script src=a.js /> <!-- to be encoded --> <script src=b.js /> <!-- not to be encoded (whatever heuristics) --> In this example, "a.js" could be a library which contains tons of functions, and "b.js" could be the code responsible for calling "a.js" functions to do the initialization of the page. We want to record the bytecode of functions stored in "a.js", after the execution of "b.js".
Comment on attachment 8853114 [details] [diff] [review] part 1.5 - Encode JS bytecode when no more scripts are executed. Review of attachment 8853114 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsScriptLoader.cpp @@ +2585,5 @@ > + request = mBytecodeEncodingQueue.StealFirst(); > + LOG(("ScriptLoadRequest (%p): Cannot serialize bytecode", request.get())); > + // Note: Do not clear the mScriptBytecode buffer, because the > + // incremental encoder owned by the ScriptSource object still has a > + // reference to this buffer. Please add a comment explaining that the buffer will be cleaned up by the GC.
Attachment #8853114 - Flags: review+
(In reply to Blake Kaplan (:mrbkap) from comment #94) > ::: dom/base/nsJSUtils.h > @@ +160,5 @@ > > + > > + // After getting a notification that an off-thread decoding terminated, > > + // this function will synchronize the result by moving it to the main thread > > + // before starting the execution of the script. > > + MOZ_MUST_USE nsresult DecodeSyncAndExec(void **aOffThreadToken); > > This name is pretty confusing. In most uses in Gecko, "Sync" usually means > "synchronous" instead of "synchronize" and it's hard to see past that. Maybe > something like "DecodeAndExecOnMainThread"? I opted for "Join" instead of "Sync", which should not be ambiguous: SyncAndExec -> JoinAndExec
Harald, Wanted to ping you on this bug. nbp is pretty close to landing this. He's testing perf gains against pretty old versions of sites present on TP5. Can we get the final bit of tuning/polish on this bug being based on tests against latest versions of the Top-5 sites we care about?
Flags: needinfo?(hkirschner)
ni? :bsmedberg who is working on a new page load benchmark. Should we try this out on the new page load benchmark process? :xeonchen could run also run it against live network on WebPageTest as secondary test to see first/repeat view impact on a larger set of sites.
Flags: needinfo?(hkirschner) → needinfo?(benjamin)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a4567f7deafa301e02483231b1ab0c2ef6679f3 Apart from Android tc-M(3), which I would also add in the skip-list, for the same reason as Linux (i-e running on a single core). The test case are running as expected on OSX and Windows (M(1), M-e10s(1), tc-M(1), tc-M-e10s(1)). Apparently, I got a new error message when executing bytecode, which does not seems to break the test case. I will investigate from where this error is coming from and fix it before landing.
Depends on: 1358521
(In reply to Nicolas B. Pierron [:nbp] from comment #104) > Apparently, I got a new error message when executing bytecode, which does > not seems to break the test case. I will investigate from where this error > is coming from and fix it before landing. This issue is fixed in Bug 1358521 patch.
The new pageload set and visibility via Talos automation is expected to be ready EOM with only moderate confidence. But yes, we should try this out on that benchmark when available. Also I'm assuming this can be pref-flipped on or off, so it's a good candidate for an A/B test, in particular against the first-nonblank-paint metric in telemetry.
Flags: needinfo?(benjamin)
Pushed by npierron@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/be672e5ce84f part 0.12 - Assert nsScriptLoadRequest::mProgress state in nsScriptLoader methods. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/4d4a187ac115 part 1.0 - Add nsJSUtils functions for encoding and decoding the bytecode. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/404de81c3ca1 part 1.1 - Add nsScriptLoader::mDataType to track the type of the loaded content. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/152abfff2280 part 1.2 - Request JS bytecode if any is present in the cache. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/c08b6c15db29 part 1.3 - Add SRI hash import/export code from/to JS bytecode cache. r=francois,mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/fe6dcdf96501 part 1.4 - Decode JS bytecode either off/on-main-thread. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/e2208bf78a46 part 1.5 - Encode JS bytecode when no more scripts are executed. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/7773b641e9e0 part 1.6 - Add a test case for the JS start-up bytecode cache. r=mrbkap
(In reply to Benjamin Smedberg [:bsmedberg] from comment #106) > The new pageload set and visibility via Talos automation is expected to be > ready EOM with only moderate confidence. But yes, we should try this out on > that benchmark when available. > > Also I'm assuming this can be pref-flipped on or off, so it's a good > candidate for an A/B test, in particular against the first-nonblank-paint > metric in telemetry. Yes there is a preference named: dom.script_loader.bytecode_cache.enabled However they are a few pitfalls for benchmarks which is that saving the bytecode only happens after some time after the on-load event. So the performance will depend on when the page is refreshed. tp5n benchmark worked so far, except for a few failures on some pages. The heuristics, only encoding scripts which are parsed off-main-thread, used today is also subject to future optimization, and we would have to test that.
Depends on: 1299115
Depends on: 1362114
(In reply to Nicolas B. Pierron [:nbp] from comment #108) > Yes there is a preference named: > dom.script_loader.bytecode_cache.enabled Is there a way to set this pref from the browser itself? I see this pref is referenced in dom/base/nsScriptLoader.cpp, but there is no entry in modules/libpref/init/all.js and I don't see it when searching for it in about:config.
You can set it via about:config: See the "New" item in the context menu for the pref list. But yes, it would be simpler and more discoverable if it were in all.js.
Depends on: 1364117
Depends on: 1364118
No longer blocks: TimeToFirstPaint_FB
I did some testing with a simple heuristic, which enables the JS start-up bytecode cache after the 5th visit. Overall the results are *more than encouraging* (≃ -2.52% on tp5o opt e10s): tp5o opt e10s break-down: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=541bf10775c7&newProject=try&newRevision=d55684f881f6&originalSignature=80984697abf1f1ff2b058e2d9f0b351fd9d12ad9&newSignature=80984697abf1f1ff2b058e2d9f0b351fd9d12ad9&framework=1 Looking at the graph of each test, highlight that many of these results are below the average results from mozilla-inbound and other try runs, which gives me more confidences in these results than the single run from mozilla-inbound. try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d55684f881f634adbe7de0487378efbcba29ac96
Depends on: 1368675
Depends on: 1368887
Depends on: 1368992
Depends on: 1369051
Depends on: 1369044
Depends on: 1369803
Depends on: 1370345
Depends on: 1370869
Depends on: 1372115
Depends on: 1372207
Depends on: 1372643
Depends on: 1372942
Depends on: 1372993
Depends on: 1373241
Depends on: 1376634
Depends on: 1377268
Depends on: 1377369
Depends on: 1377681
Depends on: 1377682
Depends on: 1373934
Depends on: 1377906
Depends on: 1378308
Depends on: 1378327
Whiteboard: [shumway:p1] [platform-rel-Facebook][qf:p1] → [shumway:p1] [platform-rel-Facebook][qf:meta]
Latest telemetry results: - The encoding overhead on the script execution is between 2% - 10%. - The decoding improvement on the script execution is between 30% - 48%. These do not account for the parser time being replaced by the decoding time, but only for the removal of the LazyParsing of functions during the start-up of pages. Last time I measured, the decoding time was significantly smaller than the parsing time, but I do not have any telemetry to measure that at the moment.
https://sql.telemetry.mozilla.org/embed/query/5891/visualization/11969?api_key=wnsEoFnv9l9HmLA9UXMQ56zuk0nbmLcBiwzeCVwP This graph highlights the modification of the TIME_TO_DOM_COMPLETE_MS metric by the JSBC on Nightly. The initial bell curve from the control group seems to have been divided in 3 components, where one moved slightly towards faster loads, and another one moved towards slower loads, which might be related to the encoding. The encoding part is expected to be over-emphasized on nightly versions of Firefox as users are changing much more frequently their versions of Firefox, thus trashing the previously encoded bytecode after each update.
Comparing[1] the distribution data of TIME_TO_DOM_COMPLETE_MS reports between the control group and the JSBC-enabled group highlights that this features *saves around 33ms per page load*. Some regressions are expected, these regressions have a relative high impact (≤ 16ms) on pages which used to load quickly, they have a smaller relative impact on pages which used to load slowly (-8.4% max). On the other hand, we have some relative improvements before the median (+14.6% max), and some relative improvements over the long tail (+22.2% max) of really slow DOM_COMPLETE events. Also, 9.6% of the long-tail (≥ 50s) are now under 50s, and now appear as part of the graph. [1] https://docs.google.com/a/mozilla.com/spreadsheets/d/1pXtSdDmY4aneEmTgVSiqaK_g9R-C1OqHJSf9PPEYiko/edit?usp=sharing
Comment on attachment 8890456 [details] [diff] [review] Tune the bytecode cache heuristic based on telemetry results. Review of attachment 8890456 [details] [diff] [review]: ----------------------------------------------------------------- I don't know if I followed your math and logic from the telemetry results to the final heuristics in this patch. That being said, I'm pretty sure I followed you to the conclusion that this is definitely worth turning on. Are you planning on watching the Telemetry results on Beta to confirm your hypothesis about different Nightly versions causing more cache misses than would be seen in Release? ::: dom/script/ScriptLoader.cpp @@ +1957,5 @@ > sourceLengthMin = 1024; > + // Based only on the overhead of the encoding and the improvements of the > + // decoding, we should set this value to either 1 or 2. Considering the > + // fact that the bytecode is stored in addition to the source, and is > + // larger than the source, we use a higher threshold. This comment isn't very clear. What's it trying to say?
Attachment #8890456 - Flags: review?(mrbkap) → review+
Attachment #8890458 - Flags: review?(mrbkap) → review+
(In reply to Blake Kaplan (:mrbkap) from comment #119) > I don't know if I followed your math and logic from the telemetry results to > the final heuristics in this patch. That being said, I'm pretty sure I > followed you to the conclusion that this is definitely worth turning on. > > ::: dom/script/ScriptLoader.cpp > @@ +1957,5 @@ > > sourceLengthMin = 1024; > > + // Based only on the overhead of the encoding and the improvements of the > > + // decoding, we should set this value to either 1 or 2. Considering the > > + // fact that the bytecode is stored in addition to the source, and is > > + // larger than the source, we use a higher threshold. > > This comment isn't very clear. What's it trying to say? The math on telemetry are only based on the time overhead & improvement. I decided to not go as far as enabling it for such smaller hit-counts because there is also a memory overhead that I do not know how to take it into account in this threshold. Looking at DOM_SCRIPT_FETCH_COUNT sheet [1], which implements the first equation from Bug 1362114 comment 3, and highlights the minimal (normalized-)cost based on the factors reported in comment 114. These results suggest that in the worse case (= Cost (worse)) we should start encoding when the FETCH_COUNT reaches 2, and in the best case (= Cost (best)) when it reaches 1. These results also suggests that using a slightly larger (less than ~9) threshold does not impact dramatically the benefits that we have from this optimization. [1] https://docs.google.com/a/mozilla.com/spreadsheets/d/1pXtSdDmY4aneEmTgVSiqaK_g9R-C1OqHJSf9PPEYiko/edit?usp=sharing > Are you planning on watching the Telemetry results on Beta to confirm your > hypothesis about different Nightly versions causing more cache misses than > would be seen in Release? Yes. I might also have to tune the heuristics for mobile, as the pref-flip experiment was only running on desktop, and I do not know if we have the same fetch count distribution on mobile, nor the same performance improvements as reported in comment 114.
Pushed by npierron@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d2cf47e965d7 Tune the bytecode cache heuristic based on telemetry results. r=mrbkap https://hg.mozilla.org/integration/mozilla-inbound/rev/bc37ab3e9abc Enable the JavaScript Start-up Bytecode Cache. r=mrbkap
Depends on: 1385310
Backout by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7c9e0fb89d84 Backed out changeset bc37ab3e9abc for leaks.
Nicolas and I chatted about this bug. I'd like to track this as blocking 57. Our hope is to enable this on 100% of Nightly57 by 8/22 with the memleak issues addressed. If not, we will still enable it for 25% of nightly57 population on 8/22.
Depends on: 1390863
Depends on: 1390856
Depends on: 1392225
Pushed by npierron@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/049ef1d4eb26 Enable the JavaScript Start-up Bytecode Cache. r=mrbkap
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Depends on: 1392969
Depends on: 1385304
Backout by archaeopteryx@coole-files.de: https://hg.mozilla.org/mozilla-central/rev/b911a4c97fde Backed out changeset 049ef1d4eb26 on suspicion of letting devtools' browser_dbg_blackboxing-05.js frequently fail. r=backout
Backed out on suspicion of letting devtools' browser_dbg_blackboxing-05.js frequently fail: Failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=ec5e6cc98d9b8aaac237eb89abd27534f1d758c3&fromchange=dd0ce550bf64fa9f41c7f90329102d5c246ecaad&filter-searchStr=46b95f96e04e3285e9d3c9cb76755393f96119a4&selectedJob=125107266 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=125107266&repo=mozilla-inbound [task 2017-08-23T09:04:25.046274Z] 09:04:25 INFO - Debugger panel shown successfully. [task 2017-08-23T09:04:25.047256Z] 09:04:25 INFO - Waiting for editor event: 'cursorActivity' to fire: 1 time(s). [task 2017-08-23T09:04:25.048241Z] 09:04:25 INFO - Waiting for debugger event: 'Debugger:EditorSourceShown' to fire: 1 time(s). [task 2017-08-23T09:04:25.049209Z] 09:04:25 INFO - Buffered messages finished [task 2017-08-23T09:04:25.050200Z] 09:04:25 INFO - TEST-UNEXPECTED-FAIL | devtools/client/debugger/test/mochitest/browser_dbg_blackboxing-05.js | Test timed out -
Status: RESOLVED → REOPENED
Flags: needinfo?(nicolas.b.pierron)
Resolution: FIXED → ---
Before the backout, these AWSY regressions were noticed: == Change summary for alert #8954 (as of August 22 2017 19:26 UTC) == Regressions: 3% Heap Unclassified summary windows10-64 pgo 47,291,882.52 -> 48,864,383.51 3% Heap Unclassified summary windows10-64 opt 47,473,708.38 -> 49,053,052.95 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8954
Depends on: 1394442
Depends on: 1394132
Depends on: 1391495
Depends on: 1395233
on the backout, we saw a regression in talos: == Change summary for alert #9210 (as of August 23 2017 14:50 UTC) == Regressions: 4% tp5o summary windows10-64 opt e10s 271.94 -> 283.21 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9210
Depends on: 1397062
Depends on: 1397355
No longer depends on: 1397062
Depends on: 1401939
Blocks: 1377268
No longer depends on: 1377268
Blocks: 1342419
No longer depends on: 1342419
No longer depends on: 1377906
Depends on: 1405738
I believe the final decision we made a few weeks ago was to disable JSBC pref by default in beta57. Updating tracking and status flag to reflect that.
Blocks: 1416066
No longer depends on: 1416066
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
No longer depends on: 1385304
Flags: needinfo?(nicolas.b.pierron)
Resolution: --- → FIXED
Depends on: 1456834
No longer blocks: 1416066
Depends on: 1476319
Performance Impact: --- → ?
Whiteboard: [shumway:p1] [platform-rel-Facebook][qf:meta] → [shumway:p1] [platform-rel-Facebook]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: