Compile very large functions

RESOLVED FIXED in mozilla21

Status

()

defect
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: azakai, Unassigned)

Tracking

(Blocks 1 bug)

unspecified
mozilla21
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [games])

Attachments

(1 attachment, 2 obsolete attachments)

Reporter

Description

7 years ago
Now that we have background compilation, we can try to compile much larger functions than before when we were afraid of stalling the main thread. This could be very significant for large emscripten-compiled project for example.

Note that some emscripten-compiled projects like BananaBread are compiled with inlining turned off, because there was no benefit to it - because we don't compile large functions. I can make an inlined build for testing purposes if that would be useful here.

The emscripten test suite, which is on awfy-assorted tests, is compiled with inlining.
Reporter

Updated

7 years ago
Blocks: 644244
See also bug 807464 which has some data on script sizes. It should be pretty easy to experiment with now - we can just change/eliminate MAX_SCRIPT_SIZE in CheckScriptSize() in Ion.cpp.
Posted patch increase limit to 20000 (obsolete) — Splinter Review
I ran into this while poking at the Mandreel benchmark in Octane.  The baseline x86 scores we get for that benchmark are:

d8:     11592
Ion:    8760
Ion+bc: 9663   (bc == background compilation)

Looking at this in the JIT inspector it seems like the hottest functions (on lines 16894 and 17365) are running in JM rather than Ion, even though those hot functions come in under the MAX_SCRIPT_SIZE limit and do not abort/bailout.  The functions which call these are too large to Ion compile, so this may be some JM/Ion integration issue, and increasing the MAX_SCRIPT_SIZE makes this issue go away.  Increasing the limit to 20000 from 2000 yields:

Ion:    11396
Ion+bc: 12850

The concern in bug 807464 for dealing with these larger scripts is that they have unpredictable compilation times.  That is more an underlying issue than one to address with script size limits, and is mitigated to a good degree by background compilation.

The attached patch isn't shooting for any sort of optimal value for this benchmark.  Using 10000 or 40000 works just as well, and 10000 increases the score in the single threaded case to 12500.  I'm generally concerned about overtraining on benchmarks though (including the Emscripten ones), and 20000 seems like a fine value to give autogenerators some more room while excluding insanely large scripts which we still do not want to Ion compile.
Attachment #696502 - Flags: review?(dvander)
Posted patch increase limit to 20000 (obsolete) — Splinter Review
Bah, wrong patch.
Attachment #696502 - Attachment is obsolete: true
Attachment #696502 - Flags: review?(dvander)
Attachment #696503 - Flags: review?(dvander)
Compiling more with Ion is also good news for the baseline compiler because the code it generates will likely be slower than JM+TI (it won't have to recompile though).

We probably also want to raise the MAX_LOCALS_AND_ARGS limit, Emscripten for instance uses a ton of different locals. We should be careful though, more locals means more phis, more resume point operands etc.
Reporter

Comment 5

7 years ago
> I'm generally concerned about overtraining on benchmarks though (including the Emscripten ones), and 20000 seems like a fine value to give autogenerators some more room while excluding insanely large scripts which we still do not want to Ion compile.

Wouldn't we always want to eventually compile every function, no matter how large, if it's called enough? If (time to compile it + our estimate of how fast it will run when compiled * our estimate of how much it will be called) <= (our estimate of how fast it runs when not compiled * our estimate of how much it will be called) then it seems like we should compile it, and that will eventually be true for any arbitrarily large function is called often enough?
(In reply to Alon Zakai (:azakai) from comment #5)
> > I'm generally concerned about overtraining on benchmarks though (including the Emscripten ones), and 20000 seems like a fine value to give autogenerators some more room while excluding insanely large scripts which we still do not want to Ion compile.
> 
> Wouldn't we always want to eventually compile every function, no matter how
> large, if it's called enough? If (time to compile it + our estimate of how
> fast it will run when compiled * our estimate of how much it will be called)
> <= (our estimate of how fast it runs when not compiled * our estimate of how
> much it will be called) then it seems like we should compile it, and that
> will eventually be true for any arbitrarily large function is called often
> enough?

Well, not necessarily.  Compilation time for Ion is not linear in script size, and even if the underlying algorithms were improved there is a basic issue that compilation time will scale up with both script size and the total number of locals and args in the script (MAX_LOCALS_AND_ARGS above) --- for every basic block / bailout point the compiler needs to keep track of the state for every local/arg and active stack value.  This may be improved by disallowing bailouts (asm.js) though that can hurt performance in certain ways (e.g. hoistable bounds checks become unhoistable bitmasks).  Some scripts out there are just so outlandishly large that trying to compile them could tie up the compilation thread more or less indefinitely.  (I'm thinking of the 2MB script with 20k locals in bug 735974.)  While we can continue bumping or restructuring the thresholds to allow compiling larger scripts, I think there should still be some absolute limit on what can be compiled.
Reporter

Comment 7

7 years ago
I see that there might be some scripts that currently take too long to compile, yes. But even if compilation time is nonlinear, if the function is run often enough, it will eventually be productive to compile it - assuming we can feasibly get reasonable estimates of how long it would take to compile it, how often it is run, and how much faster it would be if compiled. And also if we have a way to handle a surprisingly long compilation, that as you say can tie up the compilation thread in a bad way (aborting the compilation?). Which of these is the trickiest?
(In reply to Jan de Mooij [:jandem] from comment #4)
> Compiling more with Ion is also good news for the baseline compiler because
> the code it generates will likely be slower than JM+TI (it won't have to
> recompile though).
> 
> We probably also want to raise the MAX_LOCALS_AND_ARGS limit, Emscripten for
> instance uses a ton of different locals. We should be careful though, more
> locals means more phis, more resume point operands etc.

These values also are tied to the encoding of snapshots/safepoints so we have to be careful. I think V8 caps them to something even lower, maybe for similar reasons.
Attachment #696503 - Flags: review?(dvander) → review+
Brian, is it worth using the old limit on single-core CPUs, or do you think it won't matter?
I am concerned about this patch; comments inline.

(In reply to Brian Hackett (:bhackett) from comment #2)
> The concern in bug 807464 for dealing with these larger scripts is that they
> have unpredictable compilation times.  That is more an underlying issue than
> one to address with script size limits, and is mitigated to a good degree by
> background compilation.

Background compilation obviously does not permit us to compile scripts of unlimited size. From the (admittedly limited) analysis in Bug 807464, the expectation for a script of size 20000, given complexity that scales with its size, is a compilation time of a minute and a half. Note that when I was running tests there were certain scripts that compiled significantly more slowly than the expectation, so this patch permits scripts of which the compilation time may be measured in several minutes.

Any one function that takes several minutes to complete is effectively an infinite compilation. Really, any compilation that takes longer than a few seconds should probably be treated as infinite, because resources are limited, and having several threads spinning forever affects computer responsiveness.

I think it would be reasonable to optimistically assume that every function is compilable, and thereby greatly increase the maximum script size limit as done in this patch, provided that we have the capability to detect scripts that are not reasonably compilable. This should at least include a time-based cutoff and a deterministic version of the time-based cutoff. Anything less than this is setting us up for a world of frustration.

> The attached patch isn't shooting for any sort of optimal value for this
> benchmark.  Using 10000 or 40000 works just as well...

This should be of significant concern: these values have meaningful impact on performance, and we obviously have no idea what the maximum ramifications of such high values are. The expected time to compile a function of length 40000 is at least 4 minutes, which is absurd.

As an additional note, per Bug 807464, a maximum script size > 2000 is detrimental to performance when parallel compilation is disabled. Since parallel compilation is a run-time option even in the browser, the 2000 limit should be kept when parallel compilation is disabled.
(In reply to Sean Stangl from comment #10)
> I am concerned about this patch; comments inline.
> 
> (In reply to Brian Hackett (:bhackett) from comment #2)
> > The concern in bug 807464 for dealing with these larger scripts is that they
> > have unpredictable compilation times.  That is more an underlying issue than
> > one to address with script size limits, and is mitigated to a good degree by
> > background compilation.
> 
> Background compilation obviously does not permit us to compile scripts of
> unlimited size. From the (admittedly limited) analysis in Bug 807464, the
> expectation for a script of size 20000, given complexity that scales with
> its size, is a compilation time of a minute and a half. Note that when I was
> running tests there were certain scripts that compiled significantly more
> slowly than the expectation, so this patch permits scripts of which the
> compilation time may be measured in several minutes.
> 
> Any one function that takes several minutes to complete is effectively an
> infinite compilation. Really, any compilation that takes longer than a few
> seconds should probably be treated as infinite, because resources are
> limited, and having several threads spinning forever affects computer
> responsiveness.
> 
> I think it would be reasonable to optimistically assume that every function
> is compilable, and thereby greatly increase the maximum script size limit as
> done in this patch, provided that we have the capability to detect scripts
> that are not reasonably compilable. This should at least include a
> time-based cutoff and a deterministic version of the time-based cutoff.
> Anything less than this is setting us up for a world of frustration.
> 
> > The attached patch isn't shooting for any sort of optimal value for this
> > benchmark.  Using 10000 or 40000 works just as well...
> 
> This should be of significant concern: these values have meaningful impact
> on performance, and we obviously have no idea what the maximum ramifications
> of such high values are. The expected time to compile a function of length
> 40000 is at least 4 minutes, which is absurd.
> 
> As an additional note, per Bug 807464, a maximum script size > 2000 is
> detrimental to performance when parallel compilation is disabled. Since
> parallel compilation is a run-time option even in the browser, the 2000
> limit should be kept when parallel compilation is disabled.

I think you miscalculated the expected times in this post.  Using the formula you provided in bug 807464 comment 3, I get expected compilation times of 93ms for a 20,000 byte script and 243ms for a 40,000 byte script.  Those are long but not unreasonable in a background context.  They are unreasonable in a single threaded context though, I'm fine with leaving the limit at 2,000 when doing main thread compilation.

Of course, the other aspect of bug 807464's analysis is that compilation times are unpredictable for these larger scripts, which suggests not that we shouldn't compile them but that using a script size cap is a pretty junky heuristic.  It would be better if the main thread just canceled compilations that are taking too long, for some definition of "too long" (scaling up with use count presumably, but with a maximum cap).  The main thread can already cancel off thread compilations (we do that if the compilation is being invalidated due to GC or new type info), but a facility for this sort of eager canceling doesn't exist.  That could be added now, but I'd prefer to wait for the baseline compiler as that will require all sorts of retuning in this area anyways.
(In reply to Brian Hackett (:bhackett) from comment #11)
> I think you miscalculated the expected times in this post.  Using the
> formula you provided in bug 807464 comment 3, I get expected compilation
> times of 93ms for a 20,000 byte script and 243ms for a 40,000 byte script. 
> Those are long but not unreasonable in a background context.  They are
> unreasonable in a single threaded context though, I'm fine with leaving the
> limit at 2,000 when doing main thread compilation.
> 
> Of course, the other aspect of bug 807464's analysis is that compilation
> times are unpredictable for these larger scripts, which suggests not that we
> shouldn't compile them but that using a script size cap is a pretty junky
> heuristic.  It would be better if the main thread just canceled compilations
> that are taking too long, for some definition of "too long" (scaling up with
> use count presumably, but with a maximum cap).  The main thread can already
> cancel off thread compilations (we do that if the compilation is being
> invalidated due to GC or new type info), but a facility for this sort of
> eager canceling doesn't exist.  That could be added now, but I'd prefer to
> wait for the baseline compiler as that will require all sorts of retuning in
> this area anyways.

Ah, yeah. Sorry about the miscalculation. With compile times around 100ms, we should be fine with taking this patch and leaving the 2000 limit in the sequential case.
This is different enough to need a rereview.  More complicated than I'd like due to not being able to compile off thread during an IGC.  (This restriction is on the TODO list for removal after bug 804676.)
Attachment #696503 - Attachment is obsolete: true
Attachment #697972 - Flags: review?(dvander)
Comment on attachment 697972 [details] [diff] [review]
variable script length patch

Review of attachment 697972 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Ion.cpp
@@ +1200,5 @@
> +OffThreadCompilationEnabled(JSContext *cx)
> +{
> +    return js_IonOptions.parallelCompilation
> +        && cx->runtime->useHelperThreads()
> +        && cx->runtime->helperThreadCount() == 0;

Should that be != 0?
Attachment #697972 - Flags: review?(dvander) → review+
Hmm, looks fine on awfy (which uses threadsafe builds / off thread compilation).
https://hg.mozilla.org/mozilla-central/rev/cb1f63ecc1cd
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
This caused a huge regression on misc-skinning on awfy-assorted (from 0.8 to 80 seconds) Do we know what's happening there?

http://arewefastyet.com/#machine=11&view=breakdown&suite=misc
Reporter

Comment 20

7 years ago
Thanks for noticing. That's a very important benchmark, it's real-world code from imvu - please let's look into this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
If we aren't inclined to believe regressions from talos' Kraken runs, can we just switch them off? It seems a complete waste otherwise...
(In reply to Jan de Mooij [:jandem] from comment #19)
> This caused a huge regression on misc-skinning on awfy-assorted (from 0.8 to
> 80 seconds) Do we know what's happening there?
> 
> http://arewefastyet.com/#machine=11&view=breakdown&suite=misc

Filed bug 829175.

Comment 23

6 years ago
This bug is fixed, isn't?
Reporter

Comment 24

6 years ago
Should be, yes.
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago6 years ago
Resolution: --- → FIXED
Whiteboard: [games]
Blocks: gecko-games
(In reply to Brian Hackett (:bhackett) from comment #17)
> Hmm, looks fine on awfy (which uses threadsafe builds / off thread
> compilation).

It looks like this was incorrect; AWFY actually did show a Kraken regression of around 2% when this landed on inbound on 2013-01-08.  It may have been masked by noise when looking at individual runs, but it's clear when looking at the moving average.  Should we file a bug for that regression?
(In reply to Matt Brubeck (:mbrubeck) from comment #25)
> It looks like this was incorrect; AWFY actually did show a Kraken regression
> of around 2% when this landed on inbound on 2013-01-08.  It may have been
> masked by noise when looking at individual runs, but it's clear when looking
> at the moving average.  Should we file a bug for that regression?

In particular, AWFY showed a 31% increase in kraken-beat-detection in this range:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0a8823187cdb&tochange=00e758fb0e1e

That was followed two weeks later by a 38% decrease in kraken-beat-detection thanks to bug 832578.  I don't know if that "fixed" the original regression, or just canceled it out.
You need to log in before you can comment on or make changes to this bug.