Closed Bug 463137 Opened 11 years ago Closed 10 years ago

TM/nanojit: type-check LIR

Categories

(Core :: JavaScript Engine, enhancement)

x86
Linux
enhancement
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: jimb, Assigned: njn)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(4 files)

It would be nice if TraceMonkey developers could spend less of their time tracking down code generation errors.  A verification pass that ran after LIR generation could catch some kinds of incorrect code immediately.  If such a pass ran after native code generation, it could catch bugs in the assembler as well.  If the pass used a type system that was sound, it would catch all segmentation faults due to JITted code, in addition to other kinds of bugs.
The verifier would be an optional pass, perhaps enabled by default in debugging builds, that would disassemble machine code into LIR, analyze the LIR to infer the types of the values being operated on, and use those types to validate memory reference, arithmetic operations, and so on.

The results here would be:

- a type system for validation, whose types describe what the validator can know about values.  These types would include what JavaScript calls types, shapes, and perhaps internally used values like shape numbers.

- type rules: for each primitive operation (add; subtract; fetch), a function that looks at the types of the operands and computes the type of the result --- or complains if the operation is ill-typed.

- An "abstract interpreter" for LIR that starts with the known types of the inputs and then walks the code applying the type rules for the operations in the code.

We could stage the work as follows:

- Start with a trivial type system, perhaps with types "tagged value", "integer", "pointer", "float".  Ensure that we never ld the result of a float-valued LIR, etc.  This allows us to get the interpreter working on the LIR vocabulary.

- Elaborate the type system to know about objects, shapes, pointers to floats, dense arrays, etc.

- Disassemble machine code to LIR, to bring the assembly pass under scrutiny as well.

The type system would necessarily be specific to the layout of the objects it describes.  For example, in TraceMonkey, passing a guard that checks the shape of an object would refine the validator's understanding of type of that object and permit subsequent member access.  So different clients of nanojit would need different type systems.

This could mean that the abstract interpreter and disassembler should be part of nanojit, and the type system should be provided by nanojit's client.  Or, the type system could be part of nanojit and provide all the types all nanojit's clients need.  Or perhaps the whole verifier would be TraceMonkey-specific.
Summary: TraceMonkey should verify the compiler's output → TM: TraceMonkey should verify the compiler's output
Severity: normal → enhancement
Assignee: general → jim
Flags: wanted1.9.2?
There's a hitch in the idea of translating machine code to LIR.

Edwin Smith has said that LIR is SSA without phi nodes: before a join, all live values need to be spilled to memory.  If we try to translate arbitrary graphs of machine code to LIR, there's no reason the incoming machine code would be expressible in this form.  If the machine code modifies a register on both branches above a join, and then refers to that register after the join, there's no way to express the operand of that last reference: it can't point to both originators at once.

Perhaps LIR produced from machine code *could* include phi nodes.  The validator needs to check them anyway.  The 'phi' LIR would be one that the validator would understand, but not the compiler.
Yes you probably want to introduce a LIR_phi node and have it point to expressions (hint you could use LIR_2 as the operand, aka con/cons style, to chain the refs indefinitely). 

Unlike the more general case; with the current compiler we'll find that all expressions across a branch get sunk to a store.
It sounds like a simple type system would be easy.  Disassembling native code would be a lot harder, as would more advanced type systems.


(In reply to comment #2)
> (hint you could use LIR_2 as the operand, aka con/cons style, to
> chain the refs indefinitely). 

I was hoping to get rid of LIR_2... I think it's currently only used for cmov and qcmov, because they are trinary and current LIR only handles binary ops, but with variable-width LIR (bug 492866) trinary ops could be supported.  But that's not all that important.
[clueless] Why isn't a phi node an ordinary two-operand instruction?
comment #3 true, was speaking from the perspective of the current tamarin branch which is fixed LIR.

comment #4: it could be, it depends on how we want to define it.  I was thinking of the case where we have n incoming paths and want to use a single LIR_phi to represent the merged value.  I would assume then that the stores on each path would be the operands for the phi (or more likely the expression being stored).
(In reply to comment #5)
> comment #4: it could be, it depends on how we want to define it.  I was
> thinking of the case where we have n incoming paths and want to use a single
> LIR_phi to represent the merged value.  I would assume then that the stores on
> each path would be the operands for the phi (or more likely the expression
> being stored).

How is this different from simply having a phi node whose operands are phi nodes?  If one forbids the left operand of a phi node from also being a phi node after the same most recent join, then an n-ary phi node is simply a linked list of binary phi nodes chained along the right children, like a lisp list.  And the semantics "just work".
Minor divergence (not about how to represent phi nodes):

LIR as it stands forces all scalar (non-memory, I mean) operands to
be into memory at a join point, because there are no phi nodes.

In that case, why bother to continue the fragment?  Wouldn't it be
the same cost to jump to another fragment, which has to pull all that
stuff back out of memory, as it would be to continue in the same
fragment after the join point?

If there's no real desire to do phi-node-ing on scalar values, then
this might be a nice simplication of LIR (== get rid of forward jumps
altogether, since they're just as expensive as jumping to a new 
fragment).  Sure, phi-nodes will make LIR more general, but they'll
also make the code that has to process it significantly more complex.

So I guess my assumptions here are borked .. but which ones?
In case the term "abstract interpreter" in the description misleads anyone: we're just talking about the obvious type checking algorithm, traversing the LIR and checking each node.

But when the type rules are handled by callbacks from nanojit into jstracer, the nanojit code for operations other than branches ends up looking like an interpreter for LIR that delegates all the actual operations (adding two values; fetching) to a class provided by jstracer.  At branch and join points we do non-interpreter-like things, but making everything else look as much like an interpreter for LIR as possible gives you confidence that your type checking is true to the semantics of the language.
Assignee: jim → jseward
Should we rename this bug as "type-check LIR", and push other kinds of checking into another bug?
Comment #10 contains a first attempt at a typechecker for LIR.

This is a WIP patch, but is pretty usable.  I'm putting it up 
for people to look at and try out.

It can run through the whole of trace-test.js without difficulty,
and it found the type error reported in
https://bugzilla.mozilla.org/show_bug.cgi?id=503427

It also reports some other invalid-LIns* reference errors which
I have not yet investigated.

How to use: firstly, is only built as part of a debug build.
There are two settings:

* TMFLAGS=lircheck.  Runs the checker, but is silent unless 
  errors are reported, in which case all error output is prefixed
  by the string LIR_CHECKER, for easy greppage.

* TMFLAGS=lircheckverbose.  Runs the checker and gives a very
  verbose listing of everything that goes through it.  This is
  necessary for diagnosing type errors.

When is the checker run?  The basic algorithm is such that it could
be placed at any reader or writer pipeline stage, although at the moment
the implementation inherits from LirWriter, so it can only be used
in a writer position right now.

Currently the checking is done right at the front of the writer
pipeline, at the point where the tracer and the regexp compiler
generate new LIRs.

The type system distinguishes 32-bit from 64-bit quantities and
float from integer quantities.  It works by collecting up all the
constraints imposed on both results and arguments from all the LIR
nodes, and trying to resolve them -- if they can't be resolved then
a type error is reported.  This roundabout mechanism is needed 
because LIR is overloads load and store data types.

For example, LIR_ldq produces a 64-bit something, and that could
be fed either to a floating or integer operation (but not both).
So we have to see if later uses of the value further constrain it
to 64-bit int or 64-bit float, and report any cases where it gets
constrained to both as being a type error.
Summary: TM: TraceMonkey should verify the compiler's output → TM/nanojit: type-check LIR
Comment on attachment 388356 [details] [diff] [review]
First attempt at a typechecker for LIR


>+            case LTy_CMIN:  return "CMIN(no-constraint)";

I'd print this as "Any" or "AnyXX".

>+    /* If possible, merge two type constraints together, generating a
>+       new type constraint.  Or fail to.  'res' is unchanged whenever
>+       this returns false. */
>+    bool mergeLTys ( /*OUT*/LTy* res, LTy t1, LTy t2, LogControl* logc )

I'd really really like to see the types described (where they are defined, in LIR.h) as a lattice.  This function would then be meetLTys() or glbLTys() (or joinLTys() or lubLTys(), depending how the lattice is drawn).  Then CMAX/CMIN would be top/bottom (or bottom/top).  And it'd be good to have a brief comment explaining the lattice for those who aren't familiar with them :)  

And you can avoid the need for two return values by just returning CMAX where you currently return false, and then callers can check for CMAX instead of false.

Also, C++ references are arguably better than pointers for out-parameters.

>+          case LIR_flt:
>+          case LIR_fle:
>+          case LIR_fgt:
>+          case LIR_fge:
>+          case LIR_feq:
>+              *resFml    = LTy_Int32;
>+              argFmls[0] = LTy_Flt64;
>+              args[0]    = ip->oprnd1();
>+              argFmls[1] = LTy_Flt64;
>+              args[1]    = ip->oprnd2();
>+              return 2;

Nit: I'd reorder this (and similar cases) so that argFmls[] assignments are all together, and likewise for args[] assignments.

>+          case LIR_alloc:
>+              *resFml = getHostWordLTy();
>+              return 0;

LIR_alloc takes an integer argument, no?

>+          case LIR_cmov:
>+              // kludge; args 2 and 3 are in LIR_2 pointed to by
>+              // the second operand.
>+              *resFml    = LTy_Any32;
>+              argFmls[0] = LTy_Any32; /* cond */
>+              args[0]    = ip->oprnd1();
>+              return 1;

You should check that operand 2 is a LIR_2 and then constrain its operands to have the same types.  (This will go away when bug 501232 is landed, but it can be handled in this form.)

>+          case LIR_fcall:
>+          case LIR_call: {
>+              ArgSize sizes[10];

Use MAXARGS.  (Occurs more than once.)

>+              *resFml = opc == LIR_fcall ? LTy_Flt64  
>+                                         : LTy_Int32;

Can LIR_fcall return a 64-bit integer?

>+        /* Algorithm: associate with each LIR, a Ty indicating the

Nit: Use "LIns" or "LIR instruction" instead of "LIR".  (Occurs more than once.)

>+   LIns* LirChecker::ins0(LOpcode v) {
>+        LIns* i = out->ins0(v);
>+        checkLIns(i);
>+        return i;
>+    }
>+    LIns* LirChecker::ins1(LOpcode v, LIns* a) {
>+        LIns* i = out->ins1(v, a);
>+        checkLIns(i);
>+        return i;
>+    }

This repetition is a shame.  I wonder if it's worth making a class, say, LirWriter0, beneath LirWriter that has no virtual methods other than the constructor, make LirWriter inherit from LirWriter0, and make the pipeline a LirWriter0* instead of a LirWriter.  (And feel free to use better names than LirWriter/LirWriter0.)

It'd be good to know about places where the type-checking has holes.  Eg. I think LIR_ov is one.  And so is LIR_callh -- actually, you don't seem to have LIR_callh in there anywhere...
(In reply to comment #12)
> >+          case LIR_fcall:
> >+          case LIR_call: {
> >+              ArgSize sizes[10];
> 
> Use MAXARGS.  (Occurs more than once.)

Actually, you probably want MAXARGS+1, to allow for the return value?
(In reply to comment #11)

> For example, LIR_ldq produces a 64-bit something, and that could
> be fed either to a floating or integer operation (but not both).
> So we have to see if later uses of the value further constrain it
> to 64-bit int or 64-bit float, and report any cases where it gets
> constrained to both as being a type error.

tamarin-tracing (but not -redux or tracemonkey) used a value encoding
where all non-double values were tagged in a way that made them look
like NaN's.  Then, you'd have code like this:

value = ldq ...
if (value & mask < limit) { // tag test uses value as int64
   // value is a double
   ... = fadd(value, ...) // use value as double
} else {
   // value is tagged-something
   ... = qlo(value)  // <-- use value as int64
}

I think the type checker would reject code like this?

possible solutions:

1. ignore -- TT is obsolete, not a problem in TR or TM (but we may
want to bring back this encoding in TR)

2. require a no-op cast instruction in one case or the other.  
(this would correspond to the implicit GPU<->FPU copy that the
assembler would have to emit, in one case or the other)

3. can we allow int64|double to coexist without being an error and
without losing useful type checking?
Is on hold during the nanojit merging, since it visits all of the LIR.
Will resume when that's completed.
QA Contact: nanojit → general
Blocks: 495569
Depends on: 534309
Depends on: 520714
No longer depends on: 534309
Stealing.  I have a mostly-working patch that I'll put up soon.  It's a lot easier now because LIR has become better typed, eg. int64 and float64 operations are distinguished (thanks in part to bug 520714).
Assignee: jseward → nnethercote
Attached patch NJ patchSplinter Review
This NJ patch reimplements the LIR type-checker, adding new stages
ValidateWriter and ValidateReader to NJ.  It's much simpler than Julian's type
checker, because LIR has been cleaned up so there's no need for merging type
constraints and all that -- we can just check types immediately.  There are a
couple of obscure cases where LIR isn't type-clean, these are currently
unchecked and marked with XXX, and discussed in detail in bug 540368.

- ValidateWriter does a combination of type-checking and also "structure
  checking", ie. ad-hoc stuff mostly required due to hacks in LIR's design
  (also some 32-bit vs 64-bit differences).  It replaces SanityFilter.

- Unlike Julian's patch, ValidateWriter checks implicit LIR instructions, ie.
  those specified as args to functions like insLoad(), rather than explicit
  LIns objects.  This is less clean -- you can't do it all in a nice single
  checkLIns() function -- but it is necessary if you want to check the LIR
  coming directly from the compiler at the start of the writer pipeline, before
  optimisations are applied, which I think is important.

- ValidateReader is very similar to TR's, it just checks some things that
  ValidateWriter cannot check.

- StackFilter doesn't need a LirBuffer, I removed this field.

- StackFilter is now allocated via 'alloc' rather than on the stack, like the
  other reader pipeline stages.
    
- TR's ValidateWriter used to abort if it hit a guard, because TR doesn't use
  guards.  That check is now gone, it's hard to do within NJ.

- TR's ValidateWriter checks that the arg to LIR_ret is word-sized, but
  requiring it to be int-sized is more consistent with everything else (we
  should have LIR_qret and LIR_pret if necessary) and how it's used in TM and
  TR doesn't seem to use it.  I changed the comment in LIRopcode.tbl
  accordingly as well.
        
- I moved two CallInfo methods from Assembler.cpp to LIR.cpp, a more sensible
  place for them.

- I removed a bogus LIR_ov instance in ExprFilter -- it's unary but was being
  treated as binary.
  
- In lirasm, I fixed some misplaced opcodes.  And I fixed ill-typed
  qilsh/qirsh/qursh usage in --random mode.

  There's one lirasm test failure, for multfrag3.in on 64-bit, because it uses
  'param' when it wants some int32 args, but that gives it int64 args.  Not
  sure how to fix this, I'm not even sure if it supposed to be possible to get
  32-bit int args on 64-bit platforms.
  
- In LIRopcode.tbl I more clearly documented the nastiest instructions.

- I added LIR::retType().
Attachment #422162 - Flags: review?(jseward)
Attachment #422162 - Flags: review?(rreitmai)
Attached patch TM patchSplinter Review
This TM patch adjust for the above NJ patch.  In particular:

- Fixes a type error in js_SetCall{Arg,Val}() - an int-sized arg was being
  passed where a word-sized one was expected.  Turns out it didn't change
  anything because on X64 the codegen for 64-bit integers that fit into 32-bits
  is the same as that for 32-bit integers.  But it's good to have fixed anyway.
Attachment #422163 - Flags: review?(jseward)
Attachment #422162 - Attachment description: patch → NJ patch
Attached patch TR patchSplinter Review
This is the TR patch.  SeqReader has become a bit ugly but I couldn't see
a better way to do it.  It passes the acceptance tests.
Attachment #422164 - Flags: review?(rreitmai)
Attachment #422164 - Flags: review?(rreitmai) → review+
Comment on attachment 422162 [details] [diff] [review]
NJ patch

Excellent clean-up.  

As you mention in the comments, we are validating the LIR that's pushed into the Writer, it would be interesting to see if some of this code could be co-opted to validate the LIR that comes out the other end.  

Thinking out loud, could we could wire up an instruction pump that pulls LIR from a ValidateReader and feeds it to a ValidateWriter that then pushes it to a null sink?
Attachment #422162 - Flags: review?(rreitmai) → review+
(In reply to comment #20)
> 
> As you mention in the comments, we are validating the LIR that's pushed into
> the Writer, it would be interesting to see if some of this code could be
> co-opted to validate the LIR that comes out the other end.  

Something like that is certainly possible, but I think we'd be getting into diminishing returns territory.  In TM we have a ValidateWriter at the start of the pipeline and the end of the pipeline, and I don't think any of the stages modify LIns objects as they are returned.  There's also a certain amount of sanity checking in the backends.
(In reply to comment #17)

> - TR's ValidateWriter checks that the arg to LIR_ret is word-sized, but
>   requiring it to be int-sized is more consistent with everything else (we
>   should have LIR_qret and LIR_pret if necessary) and how it's used in TM and
>   TR doesn't seem to use it.  I changed the comment in LIRopcode.tbl
>   accordingly as well.

I think this new ValidateWriter is great, but does the above comment mean we can't use it in 64-bit TR builds until we add qret/pret?  I'm also in favor of those new opcodes, but if using ValidateWriter depends on them existing, then we (TR folks) should hurry up and work on the qret patch.
(In reply to comment #22)
> 
> I think this new ValidateWriter is great, but does the above comment mean we
> can't use it in 64-bit TR builds until we add qret/pret?  I'm also in favor of
> those new opcodes, but if using ValidateWriter depends on them existing, then
> we (TR folks) should hurry up and work on the qret patch.

There isn't a dependency, because the patch just doesn't type-check LIR_ret and the two other ill-typed opcodes (LIR_qlo, LIR_live).  Look for the 'XXX' marks in the patch.  Bug 540368 exists for those and it can be fixed after this bug.
Comment on attachment 422162 [details] [diff] [review]
NJ patch

Nicely done; seems a clean structure.  My only comment, which definitely falls into the hair-splitting department, is that you use the term "formals" to refer to the expected argument types of primitive operations.  I've only seen it used to refer to the arguments & types of subroutines.
Attachment #422162 - Flags: review?(jseward) → review+
Comment on attachment 422163 [details] [diff] [review]
TM patch

Looks ok.
Attachment #422163 - Flags: review?(jseward) → review+
(In reply to comment #24)
> My only comment, which definitely falls
> into the hair-splitting department, is that you use the term "formals" to refer
> to the expected argument types of primitive operations.  I've only seen it used
> to refer to the arguments & types of subroutines.

Dude, I copied that from your original patch!
(In reply to comment #26)

> Dude, I copied that from your original patch!

Oh .. um .. er .. my apologies ..
/me feels a bit silly now :)
I'm trying to pull this into TR, but there's a serious failure in our acceptance tests, and it boils down to the revised ValidateReader. In ValidateReader::read(), we have 

        case LIR_jtbl: 
            ...
            LIns* target = ins->getTarget();

but in LIns::getTarget(), we have

    LIns* LIns::getTarget() const {
        NanoAssert(isBranch() && !isop(LIR_jtbl));
        return oprnd2();
    }

it's not clear to me which of these is expected to be true...
Should have debugged before commenting... the line

                LIns* target = ins->getTarget();

should be

                LIns* target = ins->getTarget(i);
I took the liberty to push the fix to nj as http://hg.mozilla.org/projects/nanojit-central/rev/0a2444656722 on self-review, as this looks trivial and safe.
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Hooray for tag checking :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.