628776 - Mops performance improvement on x86 - patching mopAddrToRangeCheckedRealAddrAndDisp

Reporter

Description

•

13 years ago

User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6.6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
Build Identifier: tamarin-redux-c72c9b1c20a9

Well, we don't really need range checks anymore. This function, however, also uses to extract a constant displacement that is used as the displacement field for the 'mov' instruction, sparing us some 'add's, so we want to keep this going. A few things to think about:

* The JIT apperantly optimizes away unnecessary loads, that is, loads whose value is never used. That's very desirable generally, only that in this case it changes the semnatics of the program a little bit, since we throw away an implicit bounds check (didn't have this problem before, since the bounds check was explicit). i.e.:
   pushint -1;
   li32;
   pop;
You'd expect this to throw a RangeError, but it doesn't, because no 'mov' takes place. Not THAT bad, since no practical code should do such a thing anyway, but still... Is using 'LOAD_VOLATILE' for EVERY load mop the way to go? Seems too violent.

* We don't need a base address now, which means it's very possible that we'll have a load/store with only a displacement. Either I'm doing something wrong or the JIT has some issue with displacement-only stores. Loads are ok, I get:
   mov ..., [disp];
but for stores, I get something like:
   mov ecx, disp;
   mov [ecx], ...
(Actually, only checked this one on central. Need to see if it still happens on redux)

* How much is it going to take to have the JIT take adventage of the SIB field? Extracting the displacement is nice, but we can potentially encode any pattern that is a subset of 'reg1 + reg2 * {1,2,4,8} + disp' in a single 'mov'. Now that we don't have 'reg1' occupied by the base addres, this becomes very relevant for code that does array access, and that's what compilers usually turn it into. This is useful in general too, because of 'lea'. I've seen some 'lea's jitted around but they all seem to involve only a register and a displacement.

Reproducible: Always

sold2

Reporter

Updated

•

13 years ago

Blocks: 628350

sold2

Reporter

Comment 1

•

13 years ago

(In reply to comment #0)
> Either I'm doing something wrong or the JIT has some issue with
> displacement-only stores. Loads are ok ... (Actually, only checked 
> this one on central. Need to see if it still happens on redux)

This happens on redux too. When the 'disp' and 'mopAddr' can be combined, I'm doing: (Corresponding to the section that starts with the comment "if mopAddr is a compiletime constant...")
   const int32_t address = mopAddr->immI() + *disp;
   *disp = 0;
   return InsConstPtr(reinterpret_cast<void*>(address));

(Any reason why there's no InsPtr, and does it matter?)

sold2

Reporter

Comment 2

•

13 years ago

(In reply to comment #0)
> How much is it going to take to have the JIT take adventage of the SIB field?

Apperantly, the JIT does use SIB, in a restricted manner. I noticed 'reg1 + reg2 + disp', but no scaling.

Component: Virtual Machine → JIT Compiler (NanoJIT)

Edwin Smith

Comment 3

•

13 years ago

(In reply to comment #0)
> User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0;
> Trident/4.0; GTB6.6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ;
> SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.21022;
> .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
> Build Identifier: tamarin-redux-c72c9b1c20a9
> 
> Well, we don't really need range checks anymore. This function, however, also
> uses to extract a constant displacement that is used as the displacement field
> for the 'mov' instruction, sparing us some 'add's, so we want to keep this
> going. A few things to think about:
> 
> * The JIT apperantly optimizes away unnecessary loads, that is, loads whose
> value is never used. That's very desirable generally, only that in this case it
> changes the semnatics of the program a little bit, since we throw away an
> implicit bounds check (didn't have this problem before, since the bounds check
> was explicit). i.e.:
>    pushint -1;
>    li32;
>    pop;
> You'd expect this to throw a RangeError, but it doesn't, because no 'mov' takes
> place. Not THAT bad, since no practical code should do such a thing anyway, but
> still... Is using 'LOAD_VOLATILE' for EVERY load mop the way to go? Seems too
> violent.
>

Violent, yes, but the alternative is nondetermnistic behavior (imagine a hotspot style dynamic jit that suddenly erases, or injects, or moves an exception).

This is a bug -- optimizations should not change program semantics.  proof of the bug would be if you run the program with -Dinterp (interpreter), or with all optimizations disabled (requires hacking CodegenLIR.cpp).

There's probably a latent documentation bug that li32 (and others) are underspecified.

> * How much is it going to take to have the JIT take adventage of the SIB field?

Probably not a lot.  Both the x86 and x64 backends have support code for SIB encodings, and already recognize these addressing mode patterns in LIR code.

See Nativei386.cpp (e.g. MODRMsib()) and NativeX64.cpp (e.g. mod_rxb())

sold2

Reporter

Comment 4

•

13 years ago

(In reply to comment #3)
> Probably not a lot.  Both the x86 and x64 backends have support code for SIB
> encodings, and already recognize these addressing mode patterns in LIR code.
> See Nativei386.cpp (e.g. MODRMsib()) and NativeX64.cpp (e.g. mod_rxb())

Yeah, you're right. The JIT does SIB just fine, I overlooked it. This comment can be ignored.

> This is a bug -- optimizations should not change program semantics.  proof of
> the bug would be if you run the program with -Dinterp (interpreter), or with
> all optimizations disabled (requires hacking CodegenLIR.cpp).

I think you misunderstood me. The li32 instruction (at the abc level) is not what being discarded. The load instruction (at the LIR level) is. The JIT apperantly discards loads whose value is never used (which is a good thing). Right now, mopAddrToRangeCheckedRealAddrAndDisp generates explicit code that does the bounds checking, in addition to the actual load. The load itself can be discarded by the JIT and the bounds checking still takes place.
With segmentation, the bounds checking is implied by the load instruction itself. If the load is being discarded, so does the bounds checking.
Btw, as it turns out, using LOAD_VOLATILE doesn't help here either.

Edwin Smith

Comment 5

•

13 years ago

Got it.  yeah, my point was that ABC OP_li32(-1) should trigger an error one way or the other regardless of whether the result is used.  If we're relying on a segfault to catch the error, we need nanojit to preserve the load.  we have to explicity mark it live (LIR_live can do that) unless we somehow prove it can't fault.

Dan Smith

Updated

•

13 years ago

Flags: flashplayer-qrb+

Flags: flashplayer-bug-

Priority: -- → P3

QA Contact: vm → sold2

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Status: UNCONFIRMED → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Bugzilla

Quick Search

Mops performance improvement on x86 - patching mopAddrToRangeCheckedRealAddrAndDisp

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: sold2, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Updated