TM: allow code printing between stages in the writer pipeline

RESOLVED WONTFIX

Status


Core
JavaScript Engine
9 years ago
7 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Trunk
x86
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
Any good compiler allows its code representation to be printed between every compile stage.  We have this facility in the reader pipeline in Nanojit, but not in the writer pipeline in TM.  I propose to add it.  This will make it easier to see what the FuncFilter, ExprFilter and CseFilter are doing.
(Assignee)

Comment 1

9 years ago
Hmm, the writer pipeline is designed in such a way that this is rather 
difficult.  I have an in-progress patch that almost gets there but stumbles
at the final(?) hurdle.

First, we only have machinery for pretty-printing an LIns.  But the writer
pipeline mostly doesn't deal in LIns objects, but rather in "requests" like
ins2(LIR_add, a, b).  In order to pretty-print such a request, we first have
to convert it into an LIns:

- An easy but memory-wasteful way is to add a LirBufWriter within
  VerboseWriter.  Each LIR instruction gets written into one buffer per
  writer stage (modulo optimisations removing the need for some of the
  instructions), but that's not the end of the world for debug-only output.

- Another way is to just create temporary LIns objects, e.g. on the stack.
  This avoids the memory overhead.

My patch does the former.  Either way, we then hit two problems with the
naming of instructions in the output.

The first problem is that some instructions get special names.  We have a
function addName() which allows you to associate a name with each LIns.  The
name is stored in the LirNameMap.  If that LIns is printed (either as an
lvalue or an operand), the given name will be printed.  This leads to output
like this:

  01: state = iparam 0 ecx
  02: sp = ld state[0]
  03: rp = ld state[4]
  04: cx = ld state[8]

rather than the more plainly-named:

  01: iparam1 = iparam 0 ecx
  02: ld1 = ld state[0]
  03: ld2 = ld state[4]
  04: ld3 = ld state[8]

However, because addName() associates each name with a single LIns pointer, it
is fundamentally incompatible with the above-described approaches, whereby a
single LIR request is temporarily (or permanently) stored as multiple LIns
objects, one per pipeline stage.  The name only gets added for the final
LIns* that is returned by the pipeline.  The intermediate VerboseWriters
never get the name added, so they print the plainly-named form.  (At least,
they do for lvalues; the operands get the pretty names because they use the
final LIns* values, not the intermediate LIns* values.)
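
The mismatch can be reproduced in a few lines.  This is only a sketch under
simplifying assumptions: Ins, SpecialNames, lookup() and demoSpecialName() are
hypothetical stand-ins, not Nanojit's LIns or LirNameMap, and the map is
reduced to a plain pointer-keyed table.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for LIns; only object identity matters here.
struct Ins { std::string op; };

// Hypothetical, much-reduced model of a name map keyed by instruction pointer.
class SpecialNames {
    std::map<const Ins*, std::string> names;
public:
    void addName(const Ins* ins, const std::string& name) { names[ins] = name; }

    // Return the special name if one was registered for this exact pointer,
    // otherwise fall back to the plain name supplied by the caller.
    std::string lookup(const Ins* ins, const std::string& plain) const {
        auto it = names.find(ins);
        return it != names.end() ? it->second : plain;
    }
};

struct DemoResult { std::string finalName; std::string stageName; };

// One LIR request, two objects: the copy an intermediate verbose stage
// holds, and the instruction returned at the end of the pipeline.
DemoResult demoSpecialName() {
    Ins stageCopy{"ld"};
    Ins finalIns{"ld"};
    SpecialNames m;
    m.addName(&finalIns, "sp");   // only the final pointer is ever named
    return { m.lookup(&finalIns, "ld1"), m.lookup(&stageCopy, "ld1") };
}
```

Here finalName comes back as the special name while stageName falls back to
the plain one, which is the lvalue behaviour described above.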

The solution I used in the patch was to get rid of addName(), and instead
pass the name through all the ins*() functions.  Actually, not all of them,
just those that can define expressions.  I used some macro hackery to avoid
the overhead of this extra argument for non-debug builds.  And the argument
has a default value of NULL so you don't have to specify it if you aren't
using a special name.  This works; it increases the complexity of the writer
stages a bit, but not too much.
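
A rough sketch of that idea follows.  It is not the code from the patch: the
macro names (VERBOSE_NAMES, NAME_PARAM, NAME_FWD, NAMED), the Writer struct
and demoNamedLoad() are all made up for illustration, and a single insLoad()
stands in for the family of ins*() functions.

```cpp
#include <string>
#include <vector>

// Assumption: VERBOSE_NAMES stands in for whatever switch marks a debug build.
#define VERBOSE_NAMES 1

#if VERBOSE_NAMES
#  define NAME_PARAM , const char* name = nullptr   // extra trailing parameter
#  define NAME_FWD   , name                         // forward it to the next stage
#  define NAMED(s)   , s                            // supply it at a call site
#else
#  define NAME_PARAM
#  define NAME_FWD
#  define NAMED(s)
#endif

// Hypothetical writer stage that prints each request, then passes it on.
struct Writer {
    Writer* out;                        // next stage, or nullptr at the end
    std::vector<std::string> printed;   // lines this stage would have printed

    explicit Writer(Writer* next) : out(next) {}

    // A load "request"; the optional name travels with it through every stage.
    void insLoad(int offset NAME_PARAM) {
#if VERBOSE_NAMES
        std::string lhs = name ? name : "ld";
#else
        std::string lhs = "ld";
#endif
        printed.push_back(lhs + " = ld state[" + std::to_string(offset) + "]");
        if (out) out->insLoad(offset NAME_FWD);
    }
};

// Send one named request through a two-stage pipeline and collect what
// each stage printed.
std::vector<std::string> demoNamedLoad() {
    Writer last(nullptr);
    Writer first(&last);
    first.insLoad(0 NAMED("sp"));
    return { first.printed.at(0), last.printed.at(0) };
}
```

Because the name is part of the request rather than attached to one pointer,
every stage sees it.  With VERBOSE_NAMES set to 0 the parameter disappears
entirely and call sites compile unchanged.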

The second problem is with the non-special names the remaining instructions
are given.  Specifically, we see things like this:

   ld1 = ld sp[-16]
   $var0 = i2f ld2

when it should be this:

   ld1 = ld sp[-16]
   $var0 = i2f ld1

The problem is that the 'ld1' name is assigned to the LIns created for the
VerboseWriter's local buffer.  But that LIns is different to the final LIns
returned by the writer pipeline.  When the final LIns is printed in the
second line, it hasn't been seen by the naming machinery, so it is given a
new name 'ld2'.
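
The same effect shows up in a toy model of the auto-naming.  Again this is a
sketch under assumptions: Ins, AutoNames, nameOf() and demoOperandName() are
hypothetical, and the real naming machinery is more involved than a counter
per opcode.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for LIns.
struct Ins { std::string op; };

// Hypothetical auto-namer: the first time a pointer is seen it gets the next
// free "<opcode><n>" name; later lookups of the same pointer reuse it.
class AutoNames {
    std::map<const Ins*, std::string> names;
    std::map<std::string, int> counts;
public:
    std::string nameOf(const Ins* ins) {
        auto it = names.find(ins);
        if (it != names.end()) return it->second;
        std::string n = ins->op + std::to_string(++counts[ins->op]);
        names[ins] = n;
        return n;
    }
};

// The lvalue is printed from the copy in the verbose stage's local buffer,
// but the later operand refers to the instruction the pipeline returned.
std::string demoOperandName() {
    Ins bufferCopy{"ld"};
    Ins finalIns{"ld"};
    AutoNames m;
    std::string lvalue  = m.nameOf(&bufferCopy);  // first sighting of a "ld"
    std::string operand = m.nameOf(&finalIns);    // different pointer, new name
    return lvalue + " " + operand;
}
```

The two pointers describe the same load, yet the namer has no way to know
that, so the operand gets a fresh name.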

The naming scheme relies heavily on each LIns existing exactly once, with no
copies.  (Having only one copy of each LIns is important when we're dealing
with a data-flow-based IR where each LIns has pointers back to other LInses.)

One possibility is to pass an auto-generated name down through the ins()
pipeline, much like how it's done for manually-chosen names.  But that would
clutter up jstracer/jsregexp.  (Also, you could conceivably end up with a
name based on the original LIR opcode even though the opcode got changed by
an optimisation; not a great problem.)  I haven't come up with a satisfactory
solution for this problem.
(Assignee)

Comment 2

9 years ago
Created attachment 394984 [details] [diff] [review]
in-progress, broken patch
(Assignee)

Comment 3

7 years ago
WONTFIXing this because I can't see how to do it nicely, and we've all lived without it long enough that it's obviously not a dire need.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WONTFIX