Closed Bug 977853 Opened 10 years ago Closed 10 years ago

Perpetual bailouts in parallel code due to ConvertElementsToDouble optimization

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla30

People

(Reporter: nmatsakis, Assigned: nmatsakis)

References

Details

Attachments

(1 file, 2 obsolete files)

Examining bug 966567 reveals the following pathological bailout cycle. The test case includes a bit of code that winds up creating a 4-element array in which the 4th element is always an integer:

    var position = [..., 1.0];
    var normal = position;

It then later passes this array as an argument to a function PJS_Div4():

    normal[0] = ...;
    normal[1] = ...;
    normal[2] = ...; // Note: normal[3] not modified
    ... PJS_Div4(normal, ...) ...

During the warmup runs, which never get as far as ion but stay within the interpreter and baseline, `normal[3]` is always the integer `1`. So when PJS_Div4() reads it, the type set of that read is just `int` (whereas the typesets for the other reads are `int|double`).

When we begin parallel execution, we use ion compilation, and it decides that since `position` has mixed ints and doubles, it should create a uniformly double array. Hence, `normal[3]` is not the integer 1 but the double 1.0. In PJS_Div4(), we insert a barrier for the read `normal[3]` because the observed type set is just an integer, but the heap type set is `int|double`. This barrier naturally fails. (Note: we also disable the "convert array to doubles upon access" code because it is not threadsafe.)
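The scenario can be reduced to a plain-JS sketch along these lines (names and values are illustrative stand-ins, not taken from the actual test case in bug 966567):

```javascript
// Hypothetical reduction of the pattern (illustrative names and values).
function makePosition() {
  // Mixed ints and doubles; the 4th element is always the integer 1.
  return [0.5, 1.5, 2.5, 1];
}

// Stand-in for PJS_Div4: during warmup, v[3] is only ever observed as an
// int here, so the observed type set for that read is just `int`, while
// the reads of v[0..2] observe `int|double`.
function div4(v, n) {
  return [v[0] / n, v[1] / n, v[2] / n, v[3] / n];
}

var position = makePosition();
var normal = position;
normal[0] = 2;      // ints and doubles flow into elements 0..2
normal[1] = 0.75;
normal[2] = 1.25;   // note: normal[3] is never modified
var out = div4(normal, 2);
```

Once ion decides to store `position` uniformly as doubles, the read of `v[3]` sees the double 1.0 against an int-only observed type set, and the barrier fails.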

When the barrier fails, we exit parallel execution and fall back to sequential. We do not yet tier up into ion, though, but rather stay in the interpreter and baseline, which means that the type set is not updated and remains just integer. This cycle repeats indefinitely.

With this patch, in parallel execution mode, we detect the scenario where the observed type set is smaller than the heap type set, but the heap type set is just `int|double`, and we ignore the observed type set and instead just coerce the value that we read into a double. This causes the bailouts to go away.

As far as I can tell, these same bailouts happen in sequential mode, but they are quickly resolved because we bail out to a partially executed PJS_Div with a pre-constructed array. This means that the type set is extended to include double. In parallel mode, this doesn't happen because we bail out all the way from the kernel. Therefore, I have limited these changes to parallel execution mode, though I imagine they might be equally useful in sequential. I defer that judgement call to jandem.
Blocks: 966567
Attached patch Bug977853.diff (obsolete) — Splinter Review
Attachment #8383302 - Flags: review?(jdemooij)
Attachment #8383302 - Flags: feedback?(shu)
Comment on attachment 8383302 [details] [diff] [review]
Bug977853.diff

Review of attachment 8383302 [details] [diff] [review]:
-----------------------------------------------------------------

Is this patch working around a ConvertToDoubles issue where even though we don't convert an array to doubles in parallel execution, we still ToDouble when we setelem/initelem its elements? Can we just fix that logic instead?
Attachment #8383302 - Flags: feedback?(shu)
We could also update that logic. I thought it'd be better to bail less frequently and carry on, but perhaps I am wrong, and it's better to bail so that the type sets can be fully updated?

I was also concerned about the case where a converted array flows in from the outside; but I guess in that event, the warmup should fix the typesets properly.
Put another way: it's good to specialize on what you have observed vs what you consider theoretically possible, but it seems that if the gap between the two is small, it may be better to just generate code that is not quite so narrowly tailored to what you have observed than to force a bailout. Or that was my thinking.
(In reply to Niko Matsakis [:nmatsakis] from comment #3)
> We could also update that logic. I thought it'd be better to bail less
> frequently and carry on, but perhaps I am wrong, and it's better to bail so
> that the type sets can be fully updated?
> 

It's my understanding that we are bailing out at all due to inconsistent handling of the ConvertToDoubles logic in Ion. In parallel execution, we turn off the conversion optimization on the GETELEM side, but not on the INIT/SETELEM side. The result is that for an Array marked as ConvertToDoubles, we still convert the values we write to doubles, even though the GETELEM side does not expect this.

So we can avoid bailing out either by adding a ToDouble on the GETELEM side, or by removing the ToDouble on the SETELEM side. The optimization bets that writes are less frequent than reads, which is why we do the conversion once on the storage and not on the reads. I think that's the intuition I was going with, which makes me think your patch is maybe not the right approach.
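The read/write asymmetry that intuition rests on can be sketched in plain JS (the conversion itself is internal to the engine; this only illustrates the access pattern the optimization targets):

```javascript
// Illustrative sketch: one double write can flip an array's whole backing
// store to doubles (ConvertElementsToDoubles), a cost paid once at write
// time so that the many subsequent reads need no per-element
// int->double conversion.
var arr = [1, 2, 3];  // elements may be tracked as int32 internally
arr[0] = 0.5;         // the double write triggers the one-time conversion

var sum = 0;
for (var i = 0; i < arr.length; i++) {
  sum += arr[i];      // reads dominate, and each one is now a plain
                      // double load with no conversion check
}
```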
(In reply to Shu-yu Guo [:shu] from comment #5)
>  I think that's the intuition I was going with which makes me think your patch is maybe not the
> right approach.

Yes, I understood, and it's a fair point. My thought was that by being more accepting, we would probably address similar problems that can occur in regular code as well -- that is, you might just have code that, during warmup, only saw integers, but which will later see doubles, a transition that is easily predicted. Admittedly, we haven't seen such code yet, and of course the normal bailout-and-rerun mechanism ought to address this.

In other words, just changing the SETELEM side will, I believe, be sufficient to restore the normal bailout-and-rerun mechanism so that it functions. Changing the GETELEM side *might* permit us to improve on the bailout-and-rerun mechanism. But that's a bit of a gamble.

So, essentially, I think I'm swayed that your change is the more conservative one, and we should keep the idea of inflating type sets as a means of pre-empting bailouts in our back pocket as a tool we may want some day.
Looking at the code again I remembered something I had forgotten. The other part of my reasoning was that we want to interoperate with normal ion code. Therefore, this patch allowed us to accept arrays that sequential or parallel ion had produced, but by continuing to convert to double beforehand, we produced the kinds of arrays that normal ion expected. That said, there seems to be no harm in failing to convert to double, since normal ion will insert a call to iterate through the array and convert from int to double anyway.
After a long discussion in IRC, Shu and I came to various conclusions (Shu: let me know if you think this is not an accurate summary):

1. The approach in this patch is close but not quite right. In the event where the array has "convert elements to double" set, and the heap type set is |int+double|, it would be better to load the element and set its type set to |int+double|. What was a bit unclear to me is whether it is better to freeze the type set and omit the barrier or to include the barrier and leave the type set unfrozen. I guess the latter is closer to what we do normally so that is probably the best approach to default to.

Somewhat separately:

2. We could integrate the ConvertElementsToDoubles optimization into par exec better. For example, we could permit the MConvertElements MIR to be added even in par mode, but simply bail out if (1) the input is not already converted and (2) the input is not thread-local.

3. However, there is some danger, because part of the optimization modifies the template object flags for new arrays to set (or clear) the ShouldConvertElements flag. This seems like a data race and makes me nervous, since the same script could (in principle) be compiled simultaneously in both seq and par modes (and possibly definite prop mode as well?).
Ah, another point:

The current code only applies the ConvertElements optimization if the knownType is DOUBLE, but it would probably make sense to also apply it if the knownType is INT.
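A hypothetical illustration of why the INT case is safe: on an array flagged for conversion, even int writes land in storage as doubles, and at the JS level the values are indistinguishable (names and values here are illustrative only):

```javascript
// Hypothetical sketch: an array flagged "convert elements to doubles"
// stores int writes as doubles internally, so a read site whose
// knownType is INT would still see doubles in storage. Observable
// semantics are unchanged, because in JS the int 3 and the double 3.0
// are the same value.
var a = [0.5, 2];  // the double literal may flag the array for conversion
a[1] = 3;          // an int write; storage may hold it as the double 3.0
var v = a[1];      // reads back as 3 either way
```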
Attachment #8383302 - Flags: review?(jdemooij)
Canceling review because I plan to revamp the patch tomorrow.
Attached patch Bug977853.diff (obsolete) — Splinter Review
In the case of a read from an array that is set to "always convert to doubles", but where only ints have been observed, skip the barrier and just use a broader observed type set. This is similar to the previous attempt, except that we use a broader type set rather than always coercing to double.
Attachment #8383302 - Attachment is obsolete: true
Attachment #8383612 - Flags: review?(jdemooij)
Attachment #8383612 - Flags: feedback?(shu)
Note: there is more that could be done here, but this is the minimum needed to stop PJS from bailing out. Some thoughts for further improvements:

1. Convert to doubles in this case as well, rather than skipping the optimization entirely when the barrier is true, at least in seq mode.

2. Make convert to doubles work in par mode as long as input is already converted or thread-local.

I think if we did both 1 and 2 it would also work to avoid the bailouts in PJS, though it feels like there is still some room for this patch. I am concerned about the case where the array originates from outside ion, in which case we can't convert it in threaded exec; but if the seq variant doesn't run ion, it won't convert it either. Therefore, I could imagine that in parallel mode we *try* to convert the array to doubles but don't bail out on failure, and instead just do nothing. This would then in turn require that we be conservative on the reads. Maybe pointless, since we would pay the price of converting to doubles but not gain the benefits.

So another option would be to do #1 and then keep the code in the current patch for parallel execution, possibly only parallel execution. (I guess, though, that in seq mode similar situations can arise if the read is not in a loop, etc., but I think they are corrected more quickly, as noted previously.)
Sorry, in previous comment, I wrote "also work to avoid the bailouts in PJS" but I really meant "avoid the bailouts for bug 966567"
Testing on Octane revealed no particular effect. Nonetheless, per discussion with jandem on IRC, we've decided to limit this change to parallel mode. Here are the benchmark results from my machine:

Before:

Richards: 28909
DeltaBlue: 28808
Crypto: 26157
RayTrace: 29144
EarleyBoyer: 27193
RegExp: 2538
Splay: 19020
SplayLatency: 14647
NavierStokes: 28761
PdfJS: 15037
Mandreel: 26877
MandreelLatency: 26517
Gameboy: 45852
CodeLoad: 16468
Box2D: 34158
zlib: 43745
Typescript: 16570
----
Score (version 9): 22078

After:

Richards: 28838
DeltaBlue: 29006
Crypto: 26095
RayTrace: 28371
EarleyBoyer: 27513
RegExp: 2457
Splay: 18914
SplayLatency: 14368
NavierStokes: 28493
PdfJS: 15007
Mandreel: 26481
MandreelLatency: 27391
Gameboy: 44601
CodeLoad: 16509
Box2D: 34090
zlib: 43661
Typescript: 16089
----
Score (version 9): 21919

So a net drop of 0.7%.
Attached patch Bug977853.diffSplinter Review
Limit optimization to parallel mode. Check that new_ doesn't fail to allocate.
Attachment #8383612 - Attachment is obsolete: true
Attachment #8383612 - Flags: review?(jdemooij)
Attachment #8383612 - Flags: feedback?(shu)
Attachment #8383707 - Flags: review?(jdemooij)
Assignee: nobody → nmatsakis
Comment on attachment 8383707 [details] [diff] [review]
Bug977853.diff

Review of attachment 8383707 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, this looks good. It would be good to add a small testcase to jit-tests that exercises this code path though.
Attachment #8383707 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/6f9de0cc867b
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla30