Closed Bug 502778 Opened 11 years ago Closed 10 years ago
nanojit: speed up Cse
This patch splits the CseFilter's LInsHashSet into seven, one for each instruction group. This save somewhere between 3--6ms in SS. It also increases the starting size of these LInsHashSets from 64--256, which seems to help another couple of ms, but the difference is small enough that it's difficult to claim it's decisive. The patch also does a little bit of cleaning up: - renames some of the hash*() and find*() functions so that the suffixes match the ins*() functions. - avoids an unnecessary argument to hash() by computing hashcode() inside it - reorders add() so that the call to find() isn't necessary (it adds the item then checks for growth, so the find() within grow() covers the newly added element) - Removes CseReader, which is dead in TM and Tamarin - removes LInsHashSet::replace(); also dead I tried some further changes -- splitting add(), find() and grow() into more specific kinds (addImm(), addImmq(), etc) but it didn't help speed at all.
Attachment #387130 - Flags: review?(edwsmith)
Comment on attachment 387130 [details] [diff] [review] patch A comment in CseFilter for the rationale behind N tables vs 1 would be nice, otherwise looks great.
The previous patch's effect somehow disappeared in the time since I first wrote it. Here is a new patch. It's more extensive. (Nb: If you read LIR.h changes before LIR.cpp changes it'll make more sense.) - The separation of one hash table into nine is now done within LInsHashSet rather than outside it. This was necessary for some of the subsequent changes. The initial size of each one can now be controlled independently from the others, which is good as some of them get more use than others. - Previously, we had a mixture of specialised and generic operations: findXYZ() was specialised for the instruction kind (Imm, Immq, etc) when inserting, but add()/find()/grow()/equals()/hashcode() were generic, working with any instruction kind. This was a bit sucky because certain operations were effectively coded twice, most notably find(), and if the implementations didn't match then problems would occur. Also, we had to do switches in various places, most notably hashcode() and equals(). The patch gets rid of find(), equals() and hashcode() by making add() and grow() specialised for the instruction kind (simply by taking a LInsKind argument, which then causes the appropriate findXYZ() function to be used). This required introducing additional findXYZ() wrapper functions that take a LInsp argument, and putting them the m_find array. - Now handling insImmf() separately from insImmq(). - Made variable naming more consistent: 'ins' for instructions, 'k' for hashtable indices. And used the LInsKind names (eg. "Imm", "Immq", "Call") more consistently (eg. avoided "imm", "call"). As well as improving the code's readability, it's a performance win because the switches are avoided in hashcode() and equals(), and because the hash tables are split and sized appropriately for the instruction kinds which means that grow() is called about 1/2 as often. I'm currently getting about a 4ms SS speedup. This might be optimistic -- some of the tests (eg. 3d-morph) have sped up surprisingly -- but the tests that should speed up have (crypto-md5, 3d-raytrace, date-format-xparb) and I've confirmed with Cachegrind that fewer instructions are being run so it's a clear win.
Comment on attachment 399129 [details] [diff] [review] completely overhauled patch it would be nice to allow kInitialCaps[kind] to be zero. a growth formula like this might work: newsize = oldsize + oldsize/2 + K where K is a largish initial bump, and the overall 1.5x growth factor has less waste when the tables get big (and the math works nice for an initial size of 0)
Attachment #399129 - Flags: review?(edwsmith) → review+
Ed, the current hashing algorithm requires that the table sizes are multiples of two, because they use a bitmask. Keeping multiples of two seems simpler.
Whiteboard: fixed-in-nanojit → fixed-in-nanojit fixed-in-tracemonkey
This code is failing on tamarin when SOFTFLOAT is defined (e.g. under winmo). In LInsHashSet.grow() the loop over oldList there is a LIR_qjoin entry. I would imagine that this is incorrect and we should only have call's in the list. It appears that the SoftFloatFilter is interacting poorly with this new cse filter (i.e. moving SoftFloatFilter downstream of the cse filter cures these symptoms). Will require further investigation as moving the filter is not a solution, it will effectively disable optimization of any SoftFloatFilter changes.
Rick, the patch wasn't supposed to change functional behaviour of the CseFilter at all. But it did. Can you try the patch in bug 527288?
still see the issue post-patch.
(In reply to comment #10) > still see the issue post-patch. Bug 527754 is the follow-up bug for the SoftFloat issue.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.