Closed Bug 545270 Opened 10 years ago Closed 10 years ago

nanojit: use chained hash tables in CseFilter

Categories

(Core Graveyard :: Nanojit, defect)

x86
macOS

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: njn, Assigned: njn)

References

Details

Attachments

(1 file)

CseFilter currently uses open addressing + quadratic probing hash tables, and doubles them in size when the load factor reaches 0.75.  This is fine because we only ever (a) insert entries, (b) lookup entries, and (c) clear the whole table.
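The scheme described above can be sketched roughly as follows. This is illustrative code, not the actual LInsHashSet: keys are plain integers with 0 reserved as the empty sentinel, the probe sequence uses triangular increments (which visit every slot of a power-of-two table), and the table doubles when the load factor reaches 0.75.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal open-addressing hash set with quadratic probing, mirroring the
// scheme described above (names illustrative, not nanojit's).  Keys must be
// nonzero because 0 is the "empty slot" sentinel here.
class ProbingSet {
    std::vector<long> slots;
    size_t used = 0;

    size_t findSlot(long key) const {
        size_t mask = slots.size() - 1;
        size_t i = size_t(key) & mask;
        // Quadratic probe sequence: i, i+1, i+3, i+6, ... (mod table size).
        for (size_t n = 1; slots[i] != 0 && slots[i] != key; ++n)
            i = (i + n) & mask;
        return i;
    }

    void grow() {
        std::vector<long> old;
        old.swap(slots);
        slots.assign(old.size() * 2, 0);
        for (long k : old)              // rehash every live entry
            if (k != 0) slots[findSlot(k)] = k;
    }

public:
    explicit ProbingSet(size_t cap = 8) : slots(cap, 0) {}

    bool contains(long key) const { return slots[findSlot(key)] == key; }

    void insert(long key) {
        size_t i = findSlot(key);
        if (slots[i] == key) return;
        slots[i] = key;
        // Double the table once the load factor reaches 0.75.
        if (++used * 4 >= slots.size() * 3) grow();
    }

    // Clearing the whole table is cheap: just wipe every slot.
    void clear() { slots.assign(slots.size(), 0); used = 0; }
};
```

With only insert/lookup/clear, this works well; the trouble starts when individual entries must be removed, since a probe chain cannot simply have a hole punched in it.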

However, bug 517910 and (I think) bug 542905 will require selective removal of groups of entries.  This requires (a) iteration over all elements in a hash table, and (b) deletion of entries.  These operations suck in open addressing hash tables (preliminary work on bug 517910 confirms this).

I propose converting the tables to use chaining instead.  I've already seen that it makes no difference currently, but helps greatly once selective removal is added.
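The attraction of chaining for this workload can be seen in a small sketch (again illustrative, not the actual patch): each bucket is a singly linked list, so iterating live entries and unlinking a matching subset is a simple pointer walk, with no tombstones and no scanning of empty slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the chaining approach: each bucket holds a linked list, so
// selective removal is just list surgery.  The real patch arena-allocates
// its nodes; plain new/delete is used here for self-containment.
struct Node {
    long key;
    Node* next;
};

class ChainedSet {
    std::vector<Node*> buckets;

    size_t bucketOf(long key) const { return size_t(key) % buckets.size(); }

public:
    explicit ChainedSet(size_t nbuckets = 64) : buckets(nbuckets, nullptr) {}

    ~ChainedSet() {
        for (Node* head : buckets)
            while (head) { Node* next = head->next; delete head; head = next; }
    }

    void insert(long key) {
        Node*& head = buckets[bucketOf(key)];
        head = new Node{key, head};   // prepend to the bucket's chain
    }

    bool contains(long key) const {
        for (Node* n = buckets[bucketOf(key)]; n; n = n->next)
            if (n->key == key) return true;
        return false;
    }

    // Selective removal: unlink every entry matching a predicate.  With open
    // addressing this would instead require tombstones or rehashing.
    template <class Pred>
    void removeIf(Pred pred) {
        for (Node*& head : buckets) {
            Node** link = &head;
            while (*link) {
                if (pred((*link)->key)) {
                    Node* dead = *link;
                    *link = dead->next;   // splice the node out of the chain
                    delete dead;
                } else {
                    link = &(*link)->next;
                }
            }
        }
    }
};
```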
Blocks: 517910
Blocks: 542905
Attached patch patchSplinter Review
This patch changes the LInsHashSet hash tables to use chaining.

- Performance changes are so small they are in the noise for both SS and V8.

- The new version is substantially simpler, saving 125 lines of code.  Much
  of this is because we don't need a grow() function (which required
  m_find[] and second versions of findXYZ()).

- Renames 'k' as the more obvious 'hash' throughout.

- Sets the LInsImmq table to size=0 on 32-bit platforms.
Attachment #426137 - Flags: review?(edwsmith)
FYI, jsdhash/pldhash (double-hashing, so open addressing) garbage-collects removed sentinels, AKA tombstones, by keeping count of them and informing the load-factor-based growth/shrinkage alg from that counter as well as the live entry counter.

/be
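The bookkeeping Brendan describes amounts to something like the following sketch. The thresholds are illustrative, not jsdhash's actual constants: the resize decision counts tombstones as occupied, but only grows if the live entries alone justify it; otherwise rehashing at the same size is enough to purge the tombstones.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch of jsdhash-style tombstone accounting: the table keeps
// a removed-sentinel count alongside the live-entry count, and both feed the
// load-factor decision.
struct TableStats {
    size_t capacity;
    size_t liveCount;
    size_t removedCount;
};

// Rehash when live + tombstone slots exceed 3/4 of capacity, since both
// lengthen probe chains equally.
bool needsRehash(const TableStats& t) {
    return (t.liveCount + t.removedCount) * 4 >= t.capacity * 3;
}

// Only double the table if live entries alone are crowding it; otherwise a
// same-size rehash reclaims the tombstone slots.
bool shouldGrow(const TableStats& t) {
    return t.liveCount * 2 >= t.capacity;
}
```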
Nice idea -- I'm learning a lot about hash tables -- but jsdhash/pldhash isn't available from Nanojit.  The idea could be reused, and it sounds like it would help with the deletion of entries, but iteration hitting lots of empty slots would probably still be a problem (it was generally a bigger problem than the deletions).

So assuming nobody has strong objections to chaining I'll stick to my original proposal as it's working well.
Nick, your plan sounds fine. My only cautionary tale involves showing scars where I ignored asymptotic complexities, even such things as "malloc each chained hash table entry" vs. "malloc or realloc the entire open-addressed hash table". That last was what motivated jsdhash.c in the old days.

Perhaps scale is better bounded for this bug's chained hash table application.

/be
Is jsdhash/pldhash relatively small and dependency-free? If so, perhaps it could be transplanted into nanojit.
Yeah, it's self-contained in the jsdhash.[ch] files, except for some simple types from NSPR.  I bet you could name that tune in 4 notes!
Man, I get a 5% speedup in V8 and you guys want to complicate things!

I'm not certain that chaining is a better idea in general.  It's very difficult to say, it depends greatly on what the LIR looks like.  Once I add the aliasing stuff, for SS and V8 it definitely beats the existing open addressing scheme + my crappy implementation of deletions.  I can imagine pathological cases where chaining won't do well -- eg. if you have heaps of loads close together without any intervening aliasing stores or labels.  I'm not sure how realistic that is, but jstracer.cpp already throws some pretty horrendous code at NJ (eg. crypto-md5.js has a 20,000+ LIR instruction fragment).

Another complicating factor is that for CSE the lookups are not just "compare one word" but "compare the opcode and the two operands", for example.  I'm not sure if jsdhash handles that.  (I was none the wiser after a quick look at jsdhash.{h,cpp}.)

So I could import the better deletion ideas into the existing NJ open addressing hash table and try that.

As for allocations, we're using tempAlloc which AFAICT is a bump-pointer-style arena, so allocations should be fast.

Maybe I'll try doing smarter deletions in the open addressing hash table and see how that goes.
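For context on the allocation point above, a bump-pointer arena of the kind tempAlloc appears to be can be sketched like this (illustrative, not nanojit's actual Allocator): each allocation just advances a cursor within a chunk, so per-node allocation for chained hash entries costs only a few instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a bump-pointer arena.  Assumes individual requests fit in one
// chunk; growth, alignment beyond 8 bytes, and per-allocation freeing are
// deliberately omitted -- the whole arena is freed at once.
class Arena {
    static const size_t CHUNK = 4096;
    std::vector<char*> chunks;
    char* cursor = nullptr;
    char* limit = nullptr;

public:
    ~Arena() { for (char* c : chunks) delete[] c; }

    void* alloc(size_t n) {
        n = (n + 7) & ~size_t(7);               // round up to 8-byte alignment
        if (n > size_t(limit - cursor)) {       // current chunk exhausted
            chunks.push_back(new char[CHUNK]);
            cursor = chunks.back();
            limit = cursor + CHUNK;
        }
        void* p = cursor;
        cursor += n;                            // bump the pointer
        return p;
    }
};
```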
jsdhash has some rather lame indirections and C-tricks. Luke has written a template-based alternative in C++ that reads much nicer. If anything we should use that.
Heh, I was just saying this on the parent bug.  Bug 545270 has the hash table.  It's been backed out because MSVC PGO crashes.  The hash table would also need to be de-JS'ed to work in nanojit, but not too severely.
Summarising my options are:

a) go ahead with chaining

b) experiment with better deletion in the old open addressing NJ hash table

c) experiment with jsdhash and/or Luke's table (assuming they handle the unusual
   key requirements)

I'm leaning towards (b) more than (c).  Let's assume I do that and it works well and gives similar performance to (a), ie. the expensive deletions/iterations are mitigated.  Then what?  It would be hard to know which of (a) or (b) is better... in general they are similar (as the SS/V8 measurements would show), so then it comes down to "which is less prone to blow-ups in nasty cases?"

jsdhash.h has some analysis of when chaining is better than open addressing, but it's quite complex and you have to know the relative frequencies of different operations.  I could get those frequencies for the normal (eg. SS/v8) cases, but it's the potential nasty cases that are the concern and I don't know what those cases look like.

I do know that (ignoring jsdhash and Luke's table) the code for chaining is simpler than the code for open addressing.  With all else equal, that pushes me towards (a).
It's hard to justify too much time on this "which hashtable" issue. I'd say go with (a) as an improvement over what was there before, by all measures.

FWIW, the jsdhash.h analysis as I remember writing it was not that involved in terms of operation mixes -- the punchline was "small entry size and no address stability requirement? dhash; else chaining." But that does leave out some of the subtleties (delete-heavy workloads, thrashing the table growth/shrinkage code).

/be
5% speedup on V8 is superb, btw!

/be
Looks nice.  I wanted to point your attention to the templates in Containers.h, in case you'd find them useful.

LInsHashNode is equivalent to Seq<LIns*> for example.  HashMap<> is a chain-based implementation but it probably lacks some operations you need and hasn't been tuned.
Attachment #426137 - Flags: review?(edwsmith) → review+
I've been thinking about this some more, and doing some experiments.

First, I created a pathological case by running lirasm --random 1000000 after changing it to generate no labels.  This results in massive amounts of CSE occurring.  I can't remember the exact numbers, but with open addressing it took something like 0.5s, with chaining it took about 10s.  This doesn't please me.

Also, I discovered I'd botched the implementation of tombstones in my open addressing version.  I fixed that and also added some suggested improvements (overwrite tombstones when adding new nodes, and don't tombstone entries that haven't been collided with) and things look much better.
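The two improvements mentioned can be sketched as follows (illustrative code, not the actual nanojit patch; growth is omitted for brevity): insertions remember the first tombstone they pass and land there instead of at the end of the probe chain, and each slot carries a "collided" flag so that a deleted entry no probe ever passed over can revert to EMPTY rather than becoming a tombstone, letting lookups terminate sooner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open-addressing set with the improved tombstone scheme described above.
// Triangular probe increments over a power-of-two table, as before.
class TombstoneSet {
    enum State : unsigned char { EMPTY, LIVE, TOMBSTONE };
    struct Slot {
        long key;
        State state;
        bool collided;   // true if some insertion's probe passed over this slot
    };
    std::vector<Slot> slots;

    size_t mask() const { return slots.size() - 1; }

public:
    explicit TombstoneSet(size_t cap = 64) : slots(cap, Slot{0, EMPTY, false}) {}

    bool contains(long key) const {
        size_t i = size_t(key) & mask();
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask())
            if (slots[i].state == LIVE && slots[i].key == key)
                return true;
        return false;    // probe hit an empty slot: key is absent
    }

    void insert(long key) {
        size_t i = size_t(key) & mask();
        size_t firstDead = size_t(-1);
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask()) {
            if (slots[i].state == LIVE && slots[i].key == key)
                return;                        // already present
            if (slots[i].state == TOMBSTONE && firstDead == size_t(-1))
                firstDead = i;                 // remember a reusable tombstone
            slots[i].collided = true;          // this probe passed over the slot
        }
        if (firstDead != size_t(-1))
            i = firstDead;                     // overwrite the tombstone
        slots[i].key = key;
        slots[i].state = LIVE;
    }

    void remove(long key) {
        size_t i = size_t(key) & mask();
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask()) {
            if (slots[i].state == LIVE && slots[i].key == key) {
                // If no probe ever collided here, EMPTY is safe; otherwise a
                // tombstone is needed to keep other probe chains intact.
                slots[i].state = slots[i].collided ? TOMBSTONE : EMPTY;
                return;
            }
        }
    }
};
```

The collided flag is conservative (it can stay set after the colliding entries are gone), but that only costs the occasional unnecessary tombstone, never a missed lookup.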

Furthermore, LInsHashSet has eight hash tables but only one of them currently needs iteration/deletion.  So changing them all is silly.

My plan now is to add proper tombstones to the one hashtable that needs it and leave the others untouched.  This will be done as part of bug 517910 because it doesn't make sense to do it separately.
Oh, so I'm marking this as WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard