Closed Bug 527754 Opened 15 years ago Closed 15 years ago

CseFilter not able to handle downstream modification of instructions

Categories

(Core Graveyard :: Nanojit, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rreitmai, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 1 obsolete file)

The new cse filter as introduced by bug 502778 subtly changed behaviour for downstream filters.

It is no longer possible to return a different instruction from a filter than the one received; e.g SoftFloatFilter input=call output=qjoin.

As an example, the following code from CseFilter::insCall()
expects that out->insCall() returns an ins where isCall() is true.

           return exprs->add(LInsCall, out->insCall(ci, args), k);
(In reply to comment #0)
> 
> It is no longer possible to return a different instruction from a filter than
> the one received; e.g SoftFloatFilter input=call output=qjoin.

That's almost right... it's possible to return a different instruction so long as it has the same LInsHashKind.  But in the example case, LIR_qjoin has kind LIns2 instead of LInsC so the qjoin is incorrectly put into the LInsC hashtable.

Ugh.  This wasn't intentional, sorry.  Thanks for the diagnosis, I never would have worked that out with Tracemonkey because it doesn't do that kind of optimisation AFAIK.

I can see a fix that involves an opcode-to-hashkind lookup table which would change the code above to this:

           LIns* ret = out->insCall(ci, args);
           exprs->add(hashkind[ret->opcode()], ret, k);
           return ret;

Either we do that or we have to revert to a single hash table for all instructions.  I think the opcode-to-hashkind table sucks less.  And I might be able to remove LInsHashKind by using LInsRepKind... I'll take a look.
Assignee: nobody → nnethercote
I've determined that this problem existed even before the patch for bug
502778 was landed.  Consider this function from before the landing:

    LIns* CseFilter::insImm(int32_t imm)
    {
        uint32_t k;
        LInsp found = exprs.find32(imm, k);
        if (found)
            return found;
        return exprs.add(out->insImm(imm), k);
    }

find32() looks up the hash table, returning a previous identical instruction
(if found) or NULL (if not).  'k' is a reference argument that gets set to
the index where the instruction was found, or the empty slot where it wasn't
found and thus should be inserted.  In the not-found case, add() then sticks
the result of 'out->insImm(imm)' directly into that slot without rehashing.
This assumes that 'out->insImm(imm)' belongs in the same slot, ie. that
no downstream stages have changed the instruction.  So if SoftFloatFilter
changes a LIR_call to a LIR_qjoin, that assumption is violated.

So it's always been a problem -- CseFilter doesn't correctly handle
downstream filters changing instructions -- it just hasn't been noticed.    
                                                              
Probably what was happening pre-502778 was that the LIR_qjoin was being put  
into the wrong place in the hash table, which means that any subsequent      
identical LIR_qjoin wasn't being CSEd... unless that identical LIR_qjoin had 
been transformed from a correspondingly identical LIR_call(!)  It probably   
worked most/all of the time because the bug occurred consistently and in the 
same direction (if you see what I mean).                      
                                                              
With the 502778 landing the problem was made more obvious because the        
transformed instructions were not just being put in the wrong slot, but      
instead in the wrong hashtable.                               
                                                              
The obvious fix is, in the not-found case, to rehash the result of           
'out->insImm(imm)' when adding it.  Unfortunately, this will probably cause  
a non-negligible slowdown as the cost of hashing is already non-negligible,  
and this fix would almost double the amount of hashing.  And it's annoying   
to have to do, because downstream transformations are extremely rare and so  
the rehash will almost always give the same value as the hash.

Another possibility is to check that the instruction hasn't been transformed 
downstream, something like this:                              
                                                              
    LIns* CseFilter::insImm(int32_t imm)                      
    {                                                         
        uint32_t k;                                           
        LInsp found = exprs.find32(imm, k);                   
        if (found)                                            
            return found;                                     
        LInsp new = out->insImm(imm);                         
        if (new->opcode == LIR_int && new->imm32() == imm)    
            return exprs.add(new, k);   // put straight into slot k          
        else                                                  
            return exprs.add(new);      // rehash before adding
    }                                                         
                                                              
The equality test is probably faster than a rehash, but it still sucks.     
                                                              
I'm not sure what to do.  CSE is a black art, and the writer pipeline's     
behaviour is a lot more subtle and complex than it first appears.  I can    
backout 502778 and the follow-up 527288 if people think that is a good idea.
That at least gets us back to a better-behaving buggy version.
I'm comfortable disallowing downstream transforms, but it would be good to have a mechanism in place to enforce it.

I suppose one option is to introduce a filter whose sole purpose is to ensure this behaviour.  We'd then simply place it upstream of the CseFilter; probably for debug builds only.
Ah, that makes things easier.  An extra pass is overkill, we can just do this:

    LIns* CseFilter::insImm(int32_t imm)                      
    {                                                         
        uint32_t k;                                           
        LInsp found = exprs.find32(imm, k);                   
        if (found)                                            
            return found;                                     
        LInsp new = out->insImm(imm);                         
        NanoAssert(new->opcode == LIR_int && new->imm32() == imm);    
        return exprs.add(new, k);        
    }
OK, works for me.   

The advantage of an external filter though is that it won't be tied to cse-only thus allowing it to be re-used in alternate situations where a similar issue may arise.

But either approach is fine by me.
Attached patch patch (obsolete) — Splinter Review
This patch adds assertions that check that filters downstream from CseFilter
don't change any instructions, because CseFilter assumes they don't.  It
also changes the signature of findImmf() in order to minimise the amount of
uint64_t/double type-punning required.

(I didn't use an external filter as comment 5 suggests;  in-CseFilter assertions are less work for now, it can be added if it's warranted in the future.)

Rick, Brendan:  this means that SoftFloatFilter can no longer be downstream
from CseFilter.  Is this a problem for TR/TM?  I don't have patches for TR/TM.  AFAIK SoftFloatFilter is broken for TM anyway.
Attachment #412787 - Flags: review?(rreitmai)
Attachment #412787 - Flags: review?(brendan)
Comment on attachment 412787 [details] [diff] [review]
patch

graydon could review too, I'm gonna duck on account of my TM bugs.

/be
Attachment #412787 - Flags: review?(brendan) → review?(gal)
Attachment #412787 - Flags: review?(rreitmai) → review+
Comment on attachment 412787 [details] [diff] [review]
patch

Looks good and I already moved the filter upstream in TR.
Comment on attachment 412787 [details] [diff] [review]
patch

Nice. Please switch the order of the filters in TM. Duping a bug I just filed against this.
Attachment #412787 - Flags: review?(gal) → review+
Blocks: 531165
Actually, this one depends on 531165 -- if we commit this patch first then platforms using SoftFloatFilter will hit the assertions in this patch.  We need to switch the pass order first.
No longer blocks: 531165
Depends on: 531165
This patch fixes the WinNT bustage, whereby a uint64_t arg was being passed to hashImmf() which expected a double.  Not sure why Linux GCC didn't complain about that.

But the ARM problem remains.  This bug is all about forbidding anything downstream of CseFilter from modifying instructions.  But LirBufWriter::insLoad() does exactly that -- since bug 527178's patch went in, insLoad() can change a load's base by adding a LIR_piadd to it.  The same thing is true of LirBufWriter::insStorei().  Argh.

It seems like changing the load base should happen further upstream, that LirBufWriter shouldn't be changing instructions as its putting them in the buffer.  Graydon, any thoughts?
Attachment #412787 - Attachment is obsolete: true
Depends on: 527178
Component: JIT Compiler (NanoJIT) → Nanojit
Product: Tamarin → Core
QA Contact: nanojit → nanojit
(In reply to comment #13)
> This bug is all about forbidding anything
> downstream of CseFilter from modifying instructions. 

This limitation is going to be a problem in the long run, IMHO... it effectively means that we can't do peephole optimizations downstream of cse, and trying to do them upstream seems to be painful (if not impossible) without some elaborate machinery to return faux LInsp values (so that things upstream of said peephole filter still "see" the instructions they expect).

Granted, this is probably more a general issue with the reader/writer architecture we have, which really seems to be optimized for single-instruction-at-a-time optimizations; adding optimizations that are N-to-1 transformations seems to be quite painful.
Whiteboard: fixed-in-nanojit
http://hg.mozilla.org/mozilla-central/rev/04951a8e1c25
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Sayre, this bug hasn't been fixed in NJ-central or TM yet -- I pushed a patch but backed it out.  (See comment 11, comment 12 and comment 15.)  Looks like you merged the push but not the backout?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #17)
> Sayre, this bug hasn't been fixed in NJ-central or TM yet -- I pushed a patch
> but backed it out.  (See comment 11, comment 12 and comment 15.)  Looks like
> you merged the push but not the backout?

I pushed the backout as well, the comment just had the nanojit-central cset id, so I didn't notice it was backing out this check-in.

http://hg.mozilla.org/mozilla-central/rev/1de57866b40b
(In reply to comment #14)
> (In reply to comment #13)
> > This bug is all about forbidding anything
> > downstream of CseFilter from modifying instructions. 
> 
> This limitation is going to be a problem in the long run, IMHO... it
> effectively means that we can't do peephole optimizations downstream of cse,
> and trying to do them upstream seems to be painful (if not impossible) without
> some elaborate machinery to return faux LInsp values (so that things upstream
> of said peephole filter still "see" the instructions they expect).

The limitation was always there, it's just that it only became obvious in really obscure cases before bug 502778 landed.  (Eg. it was lucky that SoftFloatFilter worked, see comment 2.)  The limitation can be revisited in the future if necessary, but it will involve a compile-time performance hit.

As for peephole optimisations, a LIR_nop could help a lot.

> Granted, this is probably more a general issue with the reader/writer
> architecture we have, which really seems to be optimized for
> single-instruction-at-a-time optimizations; adding optimizations that are
> N-to-1 transformations seems to be quite painful.

Yes, the current pipelines are very fast but very restrictive.  Eg. in the writer pipeline, instructions can't be modified (without great difficulty) once they've been written into the LirBuffer.  And the only optimisations that the reader pipeline can do is to ignore instructions.
http://hg.mozilla.org/tracemonkey/rev/507718e9dcf8
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey
(In reply to comment #19)
> As for peephole optimisations, a LIR_nop could help a lot.

Yep. I'm also hoping that bug 522593 could make some optimizations a lot more doable, as it would seem to allow construction of meta-opcodes without overhead...
http://hg.mozilla.org/tamarin-redux/rev/d1bb609bb3fe
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
http://hg.mozilla.org/mozilla-central/rev/507718e9dcf8
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: