nanojit: LuaJIT-style CSE



Core Graveyard
8 years ago
4 years ago


(Reporter: njn, Assigned: njn)


Firefox Tracking Flags

(Not tracked)



(1 attachment)

Created attachment 456991 [details] [diff] [review]
rough patch

This patch implements LuaJIT-style CSE.  From, describing LuaJIT's CSE:

  "Skip-list chains: The IR is threaded with segregated, per-opcode
  skip-list chains. The links are stored in a multi-purpose 16 bit
  field in the instruction. This facilitates low-overhead lookup
  for CSE, DSE and alias analysis. Back-linking enables short-cut
  searches (average overhead is less than 1 lookup). Incremental
  build-up is trivial. No hashes, no sets, no complex updates."

"skip-list chains" turns out to be a misleading description, it has nothing to do with Pugh-style skip lists (a la  There's just a single linked list per opcode which links all instructions containing that opcode in the fragment.

There's a key idea missing in the above paragraph -- in SSA form when searching backwards through the list, if you find yourself at a point earlier than the def-point of any of the instruction's operands, you can stop.  Eg. if you have "add a, b" and you go back past the def-point of 'a' you can stop searching.  This usually works really well, but it doesn't apply for immediates, so the attached patch keeps the hash table for immediates.

The LuaJIT approach is simpler than the hash table approach.  And it gives a slight speed-up on SS, even though it's quite bad for crypto-md5.js, which is a pathological case because it contains the following function that is called (and so traced into) dozens of times in the main big loop:

function safe_add(x, y)
  var lsw = (x & 0xFFFF) + (y & 0xFFFF);
  var msw = (x >> 16) + (y >> 16) + (lsw >> 16);
  return (msw << 16) | (lsw & 0xFFFF);

Normally you get lookup counts of less than one, but for this test the average is something like 40, and it peaks at 700(!)

But the patch can't be applied currently.  For the "stop when you hit the def-point of an operand" test to work, LIR instructions must be laid out in strictly ascending order in memory, and NJ currently doesn't have that.  So I'm just attaching this patch for future's sake, if we ever gain that property.

Comment 1

8 years ago
Could you explain the comment "...but it doesn't apply for immediates". Why does an immediate instruction not act like a def?
(In reply to comment #1)
> Could you explain the comment "...but it doesn't apply for immediates". Why
> does an immediate instruction not act like a def?

Sorry, an immediate instruction does act like a def, so if you have 'add i1, i2' where i1 and i2 are immediates you can stop searching backwards once you pass the def-point of either i1 or i2.

What I meant is that immediates don't have any operands, so when you insert one you can't cut the list search short at any point.  Ie. if you've already inserted N immediates and you insert a new one that hasn't been seen before, you'll compare it against all N previous immediates.  So a hash table is still better for immediates.
I'll mark this WONTFIX.  There's no real advantage, and it precludes the use of LIR_skip to jump to side-blocks of code which can be useful, eg. see bug 545406.
Last Resolved: 7 years ago
Resolution: --- → WONTFIX
Component: Nanojit → Nanojit
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.