Closed Bug 502778 Opened 11 years ago Closed 10 years ago

nanojit: speed up CseFilter

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: njn, Assigned: njn)

References

Details

(Whiteboard: fixed-in-nanojit fixed-in-tracemonkey)

Attachments

(2 files, 1 obsolete file)

Attached patch patch (obsolete) — Splinter Review
This patch splits the CseFilter's LInsHashSet into seven, one for each
instruction group.  This save somewhere between 3--6ms in SS.  It also
increases the starting size of these LInsHashSets from 64--256, which seems
to help another couple of ms, but the difference is small enough that it's difficult to claim it's decisive.

The patch also does a little bit of cleaning up:
- renames some of the hash*() and find*() functions so that the suffixes
  match the ins*() functions.
- avoids an unnecessary argument to hash() by computing hashcode() inside it
- reorders add() so that the call to find() isn't necessary (it adds the
  item then checks for growth, so the find() within grow() covers the newly
  added element)
- Removes CseReader, which is dead in TM and Tamarin
- removes LInsHashSet::replace();  also dead

I tried some further changes -- splitting add(), find() and grow() into more
specific kinds (addImm(), addImmq(), etc) but it didn't help speed at all.
Attachment #387130 - Flags: review?(edwsmith)
Attachment #387130 - Flags: review?(edwsmith) → review+
Comment on attachment 387130 [details] [diff] [review]
patch

A comment in CseFilter for the rationale behind N tables vs 1 would be nice, otherwise looks great.
The previous patch's effect somehow disappeared in the time since I first wrote
it.  Here is a new patch.  It's more extensive.  (Nb: If you read LIR.h changes
before LIR.cpp changes it'll make more sense.)

- The separation of one hash table into nine is now done within LInsHashSet
  rather than outside it.  This was necessary for some of the subsequent
  changes.  The initial size of each one can now be controlled independently
  from the others, which is good as some of them get more use than others.

- Previously, we had a mixture of specialised and generic operations:
  findXYZ() was specialised for the instruction kind (Imm, Immq, etc) when
  inserting, but add()/find()/grow()/equals()/hashcode() were generic, working
  with any instruction kind.  This was a bit sucky because certain operations
  were effectively coded twice, most notably find(), and if the
  implementations didn't match then problems would occur.  Also, we had to
  do switches in various places, most notably hashcode() and equals().
  
  The patch gets rid of find(), equals() and hashcode() by making add() and
  grow() specialised for the instruction kind (simply by taking a LInsKind
  argument, which then causes the appropriate findXYZ() function to be used).
  This required introducing additional findXYZ() wrapper functions that take
  a LInsp argument, and putting them the m_find[] array.

- Now handling insImmf() separately from insImmq().

- Made variable naming more consistent:  'ins' for instructions, 'k' for
  hashtable indices.  And used the LInsKind names (eg. "Imm",
  "Immq", "Call") more consistently (eg. avoided "imm", "call").

As well as improving the code's readability, it's a performance win because
the switches are avoided in hashcode() and equals(), and because the hash
tables are split and sized appropriately for the instruction kinds which
means that grow() is called about 1/2 as often.  I'm currently getting about
a 4ms SS speedup.  This might be optimistic -- some of the tests (eg.
3d-morph) have sped up surprisingly -- but the tests that should speed up
have (crypto-md5, 3d-raytrace, date-format-xparb) and I've confirmed with
Cachegrind that fewer instructions are being run so it's a clear win.
Attachment #387130 - Attachment is obsolete: true
Attachment #399129 - Flags: review?(edwsmith)
Blocks: 505662
Comment on attachment 399129 [details] [diff] [review]
completely overhauled patch

it would be nice to allow kInitialCaps[kind] to be zero.  a growth formula like this might work:

newsize = oldsize + oldsize/2 + K

where K is a largish initial bump, and the overall 1.5x growth factor has less waste when the tables get big

(and the math works nice for an initial size of 0)
Attachment #399129 - Flags: review?(edwsmith) → review+
Ed, the current hashing algorithm requires that the table sizes are multiples of two, because they use a bitmask.  Keeping multiples of two seems simpler.
Blocks: 524632
http://hg.mozilla.org/tracemonkey/rev/1f4284d6be93
Whiteboard: fixed-in-nanojit → fixed-in-nanojit fixed-in-tracemonkey
Depends on: 527288
This code is failing on tamarin when SOFTFLOAT is defined (e.g. under winmo).

In LInsHashSet.grow() the loop over oldList there is a LIR_qjoin entry.  I would imagine that this is incorrect and we should only have call's in the list.

It appears that the SoftFloatFilter is interacting poorly with this new cse filter (i.e. moving SoftFloatFilter downstream of the cse filter cures these symptoms).

Will require further investigation as moving the filter is not a solution, it will effectively disable optimization of any SoftFloatFilter changes.
Rick, the patch wasn't supposed to change functional behaviour of the CseFilter at all.  But it did.  Can you try the patch in bug 527288?
still see the issue post-patch.
Blocks: 527754
(In reply to comment #10)
> still see the issue post-patch.

Bug 527754 is the follow-up bug for the SoftFloat issue.
http://hg.mozilla.org/mozilla-central/rev/1f4284d6be93
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.