Closed Bug 1329901 Opened 6 years ago Closed 6 years ago

[GDocs] Extreme amount of time spent in FlagAllOperandsAsHavingRemovedUses().

Categories

(Core :: JavaScript Engine: JIT, defect)

53 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: sstangl, Assigned: sstangl)

References

()

Details

Attachments

(3 files)

STR in Bug 1269695 Comment 28 (let page load, then press Ctrl+End).

Using "perf" to profile the JS compilation helper thread shows an extreme amount of compilation time spent in FlagAllOperandsAsHavingRemovedUses(). About 35% of the time during the test, the compilation thread is actively compiling Ion code; of that time, 80% is apparently spent in that one function.

Although the compilation thread doesn't block main thread execution, it's possible that abnormally-long compilation times are masking other compilation opportunities.

Flame chart of Ion compilation attached. Note that although items are proportional to the amount of time taken, the x-axis is not time (items are just sorted alphabetically).

Nicolas, do any ideas come to mind?
Flags: needinfo?(nicolas.b.pierron)
FlagAllOperandsAsHavingRemovedUses() has the following loop-within-a-loop:

> for (size_t i = 0, e = rp->numOperands(); i < e; i++) {
>     if (mir->shouldCancel("FlagAllOperandsAsHavingRemovedUses inner loop"))
>         return false;

>     if (!rp->isObservableOperand(i))
>         continue;
>     rp->getOperand(i)->setUseRemovedUnchecked();
> }

Looking at the generated assembly with perf, the compiler totally messes up optimization because the guard isObservableOperand() is actually way more expensive than the behavior it's intended to protect. Instead of a tight loop, it's unrolled and all over the place, with tons of branches, touching memory all over the place, probably taking tons of page faults.

Additionally, shouldCancel() is called even though it's in the outer loop: this is only necessary because of isObservableOperand().

If we remove isObservableOperand(), then we can also remove shouldCancel(), and then the loop is able to be compiled extremely tightly. Re-measurement with perf shows that the function completely disappears with this change. Compilation time is greatly reduced.

Although the "JS Helper" thread now takes significantly fewer resources, the actual main thread doesn't get faster with this patch. But it's still an improvement.
Flags: needinfo?(nicolas.b.pierron)
Attachment #8825303 - Flags: review?(nicolas.b.pierron)
Same chart after the patch: total time in OptimizeMIR() reduced to ~5% from ~32%.

The number of samples taken with GenerateLIR() on the stack is pretty close in both of these runs, so they probably are measuring the same thing, modulo timing differences since I had to attach perf manually before pressing End.
Just as a sanity check, since it seems somewhat insane that this function could blow up so much, I took more perf recordings of Nightly on the testcase, and FlagAllOperandsAsHavingRemovedUses() dominated the compilation thread every time.

Still insane, but at least reproducible.
Assignee: nobody → sstangl
Comment on attachment 8825303 [details] [diff] [review]
Remove expensive isObservableOperand loop guard

Review of attachment 8825303 [details] [diff] [review]:
-----------------------------------------------------------------

Last time I tried this, this was a performance regression on benchmark, because the generated compiled code was keeping too many instruction live, preventing other optimizations to kick in.

Another approach would be to cache the ranges of observable slots on the CompileInfo, either by creating a boolean array ahead when they are constructed, or something else.
Attachment #8825303 - Flags: review?(nicolas.b.pierron)
If an operand is not observable, why would it matter whether it's marked as having uses removed?

This also affects Google Sheets performance: we spend ~10% of compilation time just within FlagAllOperandsAsHavingRemovedUses().

Would you happen to be able to remember which benchmark regresses? Since this is affecting a major P0 real-world use case, we may judge it to be more important than a benchmark, depending on the severity of regression.
Flags: needinfo?(nicolas.b.pierron)
I ran benchmarks locally and did not notice any regressions.

Could we land it and then see what regresses on AWFY?
Comment on attachment 8825303 [details] [diff] [review]
Remove expensive isObservableOperand loop guard

Review of attachment 8825303 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I miss read the conditions.

You are right, if the operand is observable, its computation should be kept around anyway as observable operands are considered to always have removed uses.

Which makes me wonder what condition did I added which was necessary to get some benchmark improvements with

http://searchfox.org/mozilla-central/source/js/src/jit/IonAnalysis.cpp#1042
http://searchfox.org/mozilla-central/source/js/src/jit/RangeAnalysis.cpp#2973
Attachment #8825303 - Flags: review+
Flags: needinfo?(nicolas.b.pierron)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c98510adc780
Remove expensive isObservableOperand() loop guards. r=nbp
Keywords: checkin-needed
This patch regressed some tests on AWFY: ss-fannkuck and 4 tests from the assorted list.
(In reply to Guilherme Lima from comment #9)
> This patch regressed some tests on AWFY: ss-fannkuck and 4 tests from the
> assorted list.

The good news is that the generated MIR is identical, so the difference is just the amount of time spent within FlagAllOperandsAsHavingRemovedUses().
Ah, it has nothing to do with the execution time of FlagAllOperandsAsHavingRemovedUses(). If you modify access-fannkuch.js to have n=11 at the bottom, the difference is magnified linearly with n. But that function does not show up at all in the profile.

The problem does not reproduce with --ion-offthread-compile=off, so I can't use the JITSpewer to compare the MIRGraphs.
Apparently I can use the JITSpewer to compare the MIRGraphs. The generated MIR is exactly the same, to the extent that the PNG files iongraph generates match byte-for-byte.

The only point that really looks like it could be having some change in behavior is in range analysis, but I instrumented it and there is no change in behavior.
https://hg.mozilla.org/mozilla-central/rev/c98510adc780
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Sean Stangl [:sstangl] from comment #11)
> The problem does not reproduce with --ion-offthread-compile=off, so I can't
> use the JITSpewer to compare the MIRGraphs.

The JITSpewer should work even with off-thread compilation (IONFLAGS=logs supports that now)

Otherwise, if you cannot reproduce it this with the logging enabled, you can record the perf issue under rr, and use the iongraph command added in gdb to display the final MIRGraph from CodeGenerator::generateBody.
Depends on: 1330667
Depends on: 1342016
Sean, does this bug have automated coverage? Would it benefit from manual testing?
Flags: qe-verify?
You need to log in before you can comment on or make changes to this bug.