Closed Bug 1382329 Opened 7 years ago Closed 7 years ago

Fix XDR encoding issues in script preloader

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox55 + fixed
firefox56 --- fixed

People

(Reporter: kmag, Assigned: kmag)

References

Details

(Keywords: regression)

Attachments

(4 files)

A few related issues with the XDR encoding and decoding of preloaded scripts have come up recently. The patches in this bug should address them.
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

https://reviewboard.mozilla.org/r/158952/#review164356

I don't entirely understand what you are doing, but it sounds reasonable...

::: commit-message-9b447:11
(Diff revision 1)
> +we'll be storing code in the ScriptPreloader cache.
> +
> +The main side-effect of this is that scripts which are used in a content
> +process do not use lazy source, but *do* use lazy parsing. And since we need
> +to clone pre-loaded scripts into their target compartment before executing, we
> +end up eagerly compiling those lazy functions on the main athreads soon as we

nit: "athreads"
Attachment #8888059 - Flags: review?(continuation) → review+
Comment on attachment 8888060 [details]
Bug 1382329: Part 2 - Eagerly encode scripts for the startup cache.

https://reviewboard.mozilla.org/r/158954/#review164384

This is probably fine, just a few points I'd like clarified.

::: js/xpconnect/loader/ScriptPreloader.cpp:549
(Diff revision 1)
>          if (!(script->mProcessTypes == script->mOriginalProcessTypes)) {
>              // Note: EnumSet doesn't support operator!=, hence the weird form above.
>              found = true;
>          }
>  
>          if (!script->mSize && !script->XDREncode(jsapi.cx())) {

Are we going to try to encode twice if the compilation fails in `NoteScript`?

::: js/xpconnect/loader/ScriptPreloader.cpp:1007
(Diff revision 1)
>      if (code == JS::TranscodeResult_Ok) {
>          mXDRRange.emplace(Buffer().begin(), Buffer().length());
> +        mSize = Range().length();
>          return true;
>      }
> +    mXDRData.destroy();

What's `mSize` at this point if the encode fails?
Attachment #8888060 - Flags: review?(erahm) → review+
Comment on attachment 8888060 [details]
Bug 1382329: Part 2 - Eagerly encode scripts for the startup cache.

https://reviewboard.mozilla.org/r/158954/#review164384

> Are we going to try to encode twice if the compilation fails in `NoteScript`?

Yes, but the only reason the encode should ever fail is OOM, so there's not really any harm in it. Not aside from the possibility of triggering one of corner cases this is trying to fix by encoding earlier, anyway, but the possibility of both those things happing at once should be astronomically small.

> What's `mSize` at this point if the encode fails?

`mSize` is always 0 when we don't have any XDR data, which means it's 0 before we attempted the encode, and it stays 0 if it fails.
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164394

The flow of this code has changed enough since the last time I looked at it that I'm not super confident in what's going on. Clearing the review for now pending clarification.

::: js/xpconnect/loader/ScriptPreloader.cpp:876
(Diff revision 1)
> +ScriptPreloader::FinishPendingParses(MonitorAutoLock& aMal)
> +{
> +    mMonitor.AssertCurrentThreadOwns();
> +
> +    FinishOffThreadDecode(false);
> +    while (!mPendingScripts.isEmpty()) {

I'm not 100% sure I follow the flow here, but this doesn't seem quite right.

We're passing `false` to `FinishOffThreadDecode` which means we don't call `DecodeNextBatch`, which means `mPendingScripts` is possibly never emptied leading to an infite loop.

::: js/xpconnect/loader/ScriptPreloader.cpp:877
(Diff revision 1)
> +{
> +    mMonitor.AssertCurrentThreadOwns();
> +
> +    FinishOffThreadDecode(false);
> +    while (!mPendingScripts.isEmpty()) {
> +        aMal.Wait();

What exactly are we waiting on here? The decode callback?
Attachment #8888061 - Flags: review?(erahm)
Comment on attachment 8888062 [details]
Bug 1382329: Part 4 - Hold mMonitor while accessing scripts in the write thread.

https://reviewboard.mozilla.org/r/158958/#review164396
Attachment #8888062 - Flags: review?(erahm) → review+
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164394

> I'm not 100% sure I follow the flow here, but this doesn't seem quite right.
> 
> We're passing `false` to `FinishOffThreadDecode` which means we don't call `DecodeNextBatch`, which means `mPendingScripts` is possibly never emptied leading to an infite loop.

You're right, this is meant to be checking `mParsingScripts` rather than `mPendingScripts`. After this is `FinishPendingParses` is called, the rest of the scripts in `mPendingScripts` are not meant to be decoded, so it should probably just be cleared rather than adding an extra boolean arg.

> What exactly are we waiting on here? The decode callback?

Yes. The off-thread decode callback notifes on `mMonitor` whenever an off-thread decode operation finishes.
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

https://reviewboard.mozilla.org/r/158952/#review164612

::: commit-message-9b447:4
(Diff revision 1)
> +Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader. r?mccr8,nbp
> +
> +In general, we always want to compile with lazy source whenever we intend to
> +store code in the startup bytecode cache. Currently, we only do so when the

I disagree.  Setting lazySource will disable the lazy parsing of function, thus cause the parser to generate the bytecode for all functions.  Which would consume much more memory than keeping the source.

The bytecode is way-way bigger than the source, and if we encode incrementally or after the execution, then we should not set the lazySource flag, in order to only capture the bytecode of functions which got evaluated in the previous runs.

Do we really want to save the complete bytecode, even of non-executed functions?  What is the impact of this change on the memory of Firefox?
Comment on attachment 8888060 [details]
Bug 1382329: Part 2 - Eagerly encode scripts for the startup cache.

https://reviewboard.mozilla.org/r/158954/#review164600

::: js/xpconnect/loader/ScriptPreloader.cpp:692
(Diff revision 1)
>  ScriptPreloader::NoteScript(const nsCString& url, const nsCString& cachePath,
>                              JS::HandleScript jsscript)

This function can still be called after the CloneAndExecuteScript, which would only clone the function if that target object is not in the same compartment as the JSContext.

In the following case:
http://searchfox.org/mozilla-central/source/js/xpconnect/loader/mozJSSubScriptLoader.cpp#182-183,229
http://searchfox.org/mozilla-central/source/js/xpconnect/loader/mozJSSubScriptLoader.cpp#351,386

The target object is already the global object, and we already made the switch to the same compartment.

Thus the XDREncode within this NoteScript function is still running after the execution of the JSScript.

::: js/xpconnect/loader/ScriptPreloader.cpp:721
(Diff revision 1)
> +    // exist in the child cache, encode it now, before it's ever executed.
> +    //
> +    // Ideally, we would like to do the encoding lazily, during idle slices.
> +    // There are subtle issues with encoding scripts which have already been
> +    // executed, though, which makes that somewhat risky. So until that
> +    // situation is improved, and thoroughly tested, we need to encode eagerly.

nit: add a hint to JS::TranscodeResult_Failure_RunOnceNotSupported within XDRScript function.
Attachment #8888060 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164616

::: js/xpconnect/loader/ScriptPreloader.cpp:879
(Diff revision 2)
> +    mMonitor.AssertCurrentThreadOwns();
> +
> +    mPendingScripts.clear();
> +
> +    FinishOffThreadDecode();
> +    while (!mParsingScripts.empty()) {

Why do we need this loop? I would expect the cleanup of FinishOffThreadDecode to clear the mParsingScripts list.
Attachment #8888061 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164618

You should probably ask shu for review since you asked him the last time, and I do not recall seeing these files before.
Attachment #8888061 - Flags: review+
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

https://reviewboard.mozilla.org/r/158952/#review164620
Attachment #8888059 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> (Diff revision 1)
> > +Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader. r?mccr8,nbp
> > +
> > +In general, we always want to compile with lazy source whenever we intend to
> > +store code in the startup bytecode cache. Currently, we only do so when the
>
> I disagree.  Setting lazySource will disable the lazy parsing of function,
> thus cause the parser to generate the bytecode for all functions.  Which
> would consume much more memory than keeping the source.

In ideal circumstances, that would be true. But in practice, we wind up
generating the bytecode when we clone the script to execute it, anyway.

> The bytecode is way-way bigger than the source, and if we encode
> incrementally or after the execution, then we should not set the lazySource
> flag, in order to only capture the bytecode of functions which got evaluated
> in the previous runs.

There are a few reasons this doesn't currently work:

1) We're now eagerly encoding the bytecode before it's been executed, so we
don't actually know which functions are executed during startup.

2) We currently eagerly generate bytecode for lazy functions every time we
clone a script. We currently need to clone scripts for a different global
before we execute them.

3) Soon, we'll need to clone scripts for a non-syntactic scope before
executing them, in order to support hosting all JSMs in a single global. And
scripts compiled for non-syntactic scopes don't support lazy parsing, in any
case.

> Do we really want to save the complete bytecode, even of non-executed
> functions?  What is the impact of this change on the memory of Firefox?

This shouldn't significantly increase memory usage, since we're currently
generating bytecode for lazy functions at startup regardless of this change.
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> ::: js/xpconnect/loader/ScriptPreloader.cpp:879
> (Diff revision 2)
> > +    mMonitor.AssertCurrentThreadOwns();
> > +
> > +    mPendingScripts.clear();
> > +
> > +    FinishOffThreadDecode();
> > +    while (!mParsingScripts.empty()) {
>
> Why do we need this loop? I would expect the cleanup of
> FinishOffThreadDecode to clear the mParsingScripts list.

FinishOffThreadDecode is a no-op unless a completed off-thread decode is
pending, so we need to loop until it's actually finished.

(In reply to Nicolas B. Pierron [:nbp] from comment #16)
> You should probably ask shu for review since you asked him the last time,
> and I do not recall seeing these files before.

Shu is mostly unavailable for the moment. Eric can review the specifics of
this file, but I'd also like a review from someone more familiar with the
JSAPI side of things.
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

https://reviewboard.mozilla.org/r/158952/#review164684
Attachment #8888059 - Flags: review+
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164686

Ok, I missed the mToken check in FinishOffThreadDecode in my previous review.
Attachment #8888061 - Flags: review+
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164910

Looks like there's still an issue, or at least we should add a comment clarifying what's going on.

::: js/xpconnect/loader/ScriptPreloader.cpp:879
(Diff revision 2)
> +    mMonitor.AssertCurrentThreadOwns();
> +
> +    mPendingScripts.clear();
> +
> +    FinishOffThreadDecode();
> +    while (!mParsingScripts.empty()) {

I agree the loop doesn't make sense at this point, `mParsingScripts` is always going to be empty after `FinishOffThreadDecode` returns due to the scoped cleanup.
Attachment #8888061 - Flags: review?(erahm) → review-
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review164910

> I agree the loop doesn't make sense at this point, `mParsingScripts` is always going to be empty after `FinishOffThreadDecode` returns due to the scoped cleanup.

That's only true if it doesn't return early because `mToken` is null. At the moment, it's more or less guaranteed not to be after the first `Wait()`, since we only notify on that monitor after a parse task finishes, and we only handle one off-thread parse task at a time. But that's not behavior that I want to rely on.
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

https://reviewboard.mozilla.org/r/158956/#review166348

r+ pending adding a comment. Generally it seems like the loop should move into `FinishOffThreadDecode`, but maybe there are callers that don't want to block. Arguably in that case we should change the name of the function to `MaybeFinishOffThreadDecode` or something like that.
Attachment #8888061 - Flags: review- → review+
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #24)
> Comment on attachment 8888061 [details]
> Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing
> scripts.
> 
> https://reviewboard.mozilla.org/r/158956/#review166348
> 
> r+ pending adding a comment. Generally it seems like the loop should move
> into `FinishOffThreadDecode`, but maybe there are callers that don't want to
> block. Arguably in that case we should change the name of the function to
> `MaybeFinishOffThreadDecode` or something like that.

In general, only the caller knows what end condition it's waiting for. Currently, there's only one off-thread decode at a time, so if we knew that was what we were waiting for, we could just loop in FinishOfftheadDecode. But that wasn't originally the case, it may change again, and there are still theoretically corner cases where we may need to schedule a subsequent decode and wait for it to finish before the script we need is decoded.

I'll update the comment, though.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d46324978a6db2a584a17c2ddfaba0caba8dec3b
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader. r=mccr8,nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7e1331a26642f19832142a29eae4d2bb052dd75
Bug 1382329: Part 2 - Eagerly encode scripts for the startup cache. r=erahm,nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/e1c94b661c79eee24f65a1fc0c122287ac527b3b
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts. r=erahm,nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/914753bd9224d104e4ffe5669480a5c511555d12
Bug 1382329: Part 4 - Hold mMonitor while accessing scripts in the write thread. r=erahm
Do you think we can uplift this to 55 (ideally today so we can get some testing in 55.0b13 before building release candidates)?
Flags: needinfo?(kmaglione+bmo)
Looking at crash-stat, I cannot find any new crashes on nightly for the known signatures associated with this bug. (for the moment)
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1359653
[User impact if declined]: We believe this issue is responsible for startup crashes when attempting to clone scripts with lazy functions.
[Is this code covered by automated tests?]: Not directly, but it is thoroughly exercised by the rest of the test suite.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No. We don't have reliable steps to reproduce. 
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly trivial change which mostly just makes behavior consistent between parent and child processes.
[String changes made/needed]: None.
Flags: needinfo?(kmaglione+bmo)
Attachment #8888059 - Flags: approval-mozilla-beta?
Comment on attachment 8888060 [details]
Bug 1382329: Part 2 - Eagerly encode scripts for the startup cache.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1359653
[User impact if declined]: We believe this issue is responsible for startup crashes when decoding certain scripts.
[Is this code covered by automated tests?]: Not directly, but it is thoroughly exercised by the rest of the test suite.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No. We don't have reliable steps to reproduce. 
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly trivial change, which mainly executes certain actions earlier, when they're less risky.
[String changes made/needed]: None.
Attachment #8888060 - Flags: approval-mozilla-beta?
Comment on attachment 8888061 [details]
Bug 1382329: Part 3 - Wait for pending parse tasks to finish before freeing scripts.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1359653
[User impact if declined]: This fixes crashes in certain cases that mostly affect extremely short sessions in automation.
[Is this code covered by automated tests?]: Not directly, but it is thoroughly exercised by the rest of the test suite.
[Has the fix been verified in Nightly?]: No, but I verified it in local builds.
[Needs manual test from QE? If yes, steps to reproduce]: No, this mostly affects automation, so that should be sufficient. 
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a relatively simple change that only affects certain uncommon corner cases, mainly during shutdown in extremely short sessions.
[String changes made/needed]: None.
Attachment #8888061 - Flags: approval-mozilla-beta?
Comment on attachment 8888062 [details]
Bug 1382329: Part 4 - Hold mMonitor while accessing scripts in the write thread.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1359653
[User impact if declined]: This fixes crashes in certain corner cases where the cache is invalidated during a write.
[Is this code covered by automated tests?]: Not directly, but it is thoroughly exercised by the rest of the test suite.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No, this is a race condition which is not easy to reliably reproduce. 
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly trivial change to ensure that we're holding the correct lock when necessary.
[String changes made/needed]: None.
Attachment #8888062 - Flags: approval-mozilla-beta?
Comment on attachment 8888059 [details]
Bug 1382329: Part 1 - Enable lazy source whenever compiling for the preloader.

fix startup crash regression in beta55.  should be in 55.0b13.

Thanks Kris!
Attachment #8888059 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8888060 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8888061 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8888062 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
No longer blocks: 1373934
Blocks: 1359653
Keywords: regression
You need to log in before you can comment on or make changes to this bug.