Closed Bug 628776 Opened 13 years ago Closed 6 years ago

Mops performance improvement on x86 - patching mopAddrToRangeCheckedRealAddrAndDisp

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), enhancement, P3)

x86
All
enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sold2, Unassigned)

References

Details

User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6.6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
Build Identifier: tamarin-redux-c72c9b1c20a9

Well, we don't really need range checks anymore. This function, however, also uses to extract a constant displacement that is used as the displacement field for the 'mov' instruction, sparing us some 'add's, so we want to keep this going. A few things to think about:

* The JIT apperantly optimizes away unnecessary loads, that is, loads whose value is never used. That's very desirable generally, only that in this case it changes the semnatics of the program a little bit, since we throw away an implicit bounds check (didn't have this problem before, since the bounds check was explicit). i.e.:
   pushint -1;
   li32;
   pop;
You'd expect this to throw a RangeError, but it doesn't, because no 'mov' takes place. Not THAT bad, since no practical code should do such a thing anyway, but still... Is using 'LOAD_VOLATILE' for EVERY load mop the way to go? Seems too violent.

* We don't need a base address now, which means it's very possible that we'll have a load/store with only a displacement. Either I'm doing something wrong or the JIT has some issue with displacement-only stores. Loads are ok, I get:
   mov ..., [disp];
but for stores, I get something like:
   mov ecx, disp;
   mov [ecx], ...
(Actually, only checked this one on central. Need to see if it still happens on redux)

* How much is it going to take to have the JIT take adventage of the SIB field? Extracting the displacement is nice, but we can potentially encode any pattern that is a subset of 'reg1 + reg2 * {1,2,4,8} + disp' in a single 'mov'. Now that we don't have 'reg1' occupied by the base addres, this becomes very relevant for code that does array access, and that's what compilers usually turn it into. This is useful in general too, because of 'lea'. I've seen some 'lea's jitted around but they all seem to involve only a register and a displacement.

Reproducible: Always
Blocks: 628350
(In reply to comment #0)
> Either I'm doing something wrong or the JIT has some issue with
> displacement-only stores. Loads are ok ... (Actually, only checked 
> this one on central. Need to see if it still happens on redux)

This happens on redux too. When the 'disp' and 'mopAddr' can be combined, I'm doing: (Corresponding to the section that starts with the comment "if mopAddr is a compiletime constant...")
   const int32_t address = mopAddr->immI() + *disp;
   *disp = 0;
   return InsConstPtr(reinterpret_cast<void*>(address));

(Any reason why there's no InsPtr, and does it matter?)
(In reply to comment #0)
> How much is it going to take to have the JIT take adventage of the SIB field?

Apperantly, the JIT does use SIB, in a restricted manner. I noticed 'reg1 + reg2 + disp', but no scaling.
Component: Virtual Machine → JIT Compiler (NanoJIT)
(In reply to comment #0)
> User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0;
> Trident/4.0; GTB6.6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ;
> SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.21022;
> .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
> Build Identifier: tamarin-redux-c72c9b1c20a9
> 
> Well, we don't really need range checks anymore. This function, however, also
> uses to extract a constant displacement that is used as the displacement field
> for the 'mov' instruction, sparing us some 'add's, so we want to keep this
> going. A few things to think about:
> 
> * The JIT apperantly optimizes away unnecessary loads, that is, loads whose
> value is never used. That's very desirable generally, only that in this case it
> changes the semnatics of the program a little bit, since we throw away an
> implicit bounds check (didn't have this problem before, since the bounds check
> was explicit). i.e.:
>    pushint -1;
>    li32;
>    pop;
> You'd expect this to throw a RangeError, but it doesn't, because no 'mov' takes
> place. Not THAT bad, since no practical code should do such a thing anyway, but
> still... Is using 'LOAD_VOLATILE' for EVERY load mop the way to go? Seems too
> violent.
>

Violent, yes, but the alternative is nondetermnistic behavior (imagine a hotspot style dynamic jit that suddenly erases, or injects, or moves an exception).

This is a bug -- optimizations should not change program semantics.  proof of the bug would be if you run the program with -Dinterp (interpreter), or with all optimizations disabled (requires hacking CodegenLIR.cpp).

There's probably a latent documentation bug that li32 (and others) are underspecified.

> * How much is it going to take to have the JIT take adventage of the SIB field?

Probably not a lot.  Both the x86 and x64 backends have support code for SIB encodings, and already recognize these addressing mode patterns in LIR code.

See Nativei386.cpp (e.g. MODRMsib()) and NativeX64.cpp (e.g. mod_rxb())
(In reply to comment #3)
> Probably not a lot.  Both the x86 and x64 backends have support code for SIB
> encodings, and already recognize these addressing mode patterns in LIR code.
> See Nativei386.cpp (e.g. MODRMsib()) and NativeX64.cpp (e.g. mod_rxb())

Yeah, you're right. The JIT does SIB just fine, I overlooked it. This comment can be ignored.

> This is a bug -- optimizations should not change program semantics.  proof of
> the bug would be if you run the program with -Dinterp (interpreter), or with
> all optimizations disabled (requires hacking CodegenLIR.cpp).

I think you misunderstood me. The li32 instruction (at the abc level) is not what being discarded. The load instruction (at the LIR level) is. The JIT apperantly discards loads whose value is never used (which is a good thing). Right now, mopAddrToRangeCheckedRealAddrAndDisp generates explicit code that does the bounds checking, in addition to the actual load. The load itself can be discarded by the JIT and the bounds checking still takes place.
With segmentation, the bounds checking is implied by the load instruction itself. If the load is being discarded, so does the bounds checking.
Btw, as it turns out, using LOAD_VOLATILE doesn't help here either.
Got it.  yeah, my point was that ABC OP_li32(-1) should trigger an error one way or the other regardless of whether the result is used.  If we're relying on a segfault to catch the error, we need nanojit to preserve the load.  we have to explicity mark it live (LIR_live can do that) unless we somehow prove it can't fault.
Flags: flashplayer-qrb+
Flags: flashplayer-bug-
Priority: -- → P3
QA Contact: vm → sold2
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.