Closed Bug 545270 Opened 10 years ago Closed 10 years ago

nanojit: use chained hash tables in CseFilter

Categories

(Core Graveyard :: Nanojit, defect)

x86
macOS

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: njn, Assigned: njn)

References

Details

Attachments

(1 file)

CseFilter currently uses open addressing + quadratic probing hash tables, and doubles them in size when the load factor reaches 0.75.  This is fine because we only ever (a) insert entries, (b) lookup entries, and (c) clear the whole table.
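The scheme described above can be sketched roughly as follows. This is illustrative code, not the actual LInsHashSet: keys are plain integers with 0 reserved as the empty sentinel, the probe sequence uses triangular increments (which visit every slot of a power-of-two table), and the table doubles when the load factor reaches 0.75.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal open-addressing hash set with quadratic probing, mirroring the
// scheme described above (names illustrative, not nanojit's).  Keys must be
// nonzero because 0 is the "empty slot" sentinel here.
class ProbingSet {
    std::vector<long> slots;
    size_t used = 0;

    size_t findSlot(long key) const {
        size_t mask = slots.size() - 1;
        size_t i = size_t(key) & mask;
        // Quadratic probe sequence: i, i+1, i+3, i+6, ... (mod table size).
        for (size_t n = 1; slots[i] != 0 && slots[i] != key; ++n)
            i = (i + n) & mask;
        return i;
    }

    void grow() {
        std::vector<long> old;
        old.swap(slots);
        slots.assign(old.size() * 2, 0);
        for (long k : old)              // rehash every live entry
            if (k != 0) slots[findSlot(k)] = k;
    }

public:
    explicit ProbingSet(size_t cap = 8) : slots(cap, 0) {}

    bool contains(long key) const { return slots[findSlot(key)] == key; }

    void insert(long key) {
        size_t i = findSlot(key);
        if (slots[i] == key) return;
        slots[i] = key;
        // Double the table once the load factor reaches 0.75.
        if (++used * 4 >= slots.size() * 3) grow();
    }

    // Clearing the whole table is cheap: just wipe every slot.
    void clear() { slots.assign(slots.size(), 0); used = 0; }
};
```

With only insert/lookup/clear, this works well; the trouble starts when individual entries must be removed, since a probe chain cannot simply have a hole punched in it.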

However, bug 517910 and (I think) bug 542905 will require selective removal of groups of entries.  This requires (a) iteration over all elements in a hash table, and (b) deletion of entries.  These operations suck in open addressing hash tables (preliminary work on bug 517910 confirms this).

I propose converting the tables to use chaining instead.  I've already seen that it makes no difference currently, but helps greatly once selective removal is added.
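The attraction of chaining for this workload can be seen in a small sketch (again illustrative, not the actual patch): each bucket is a singly linked list, so iterating live entries and unlinking a matching subset is a simple pointer walk, with no tombstones and no scanning of empty slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the chaining approach: each bucket holds a linked list, so
// selective removal is just list surgery.  The real patch arena-allocates
// its nodes; plain new/delete is used here for self-containment.
struct Node {
    long key;
    Node* next;
};

class ChainedSet {
    std::vector<Node*> buckets;

    size_t bucketOf(long key) const { return size_t(key) % buckets.size(); }

public:
    explicit ChainedSet(size_t nbuckets = 64) : buckets(nbuckets, nullptr) {}

    ~ChainedSet() {
        for (Node* head : buckets)
            while (head) { Node* next = head->next; delete head; head = next; }
    }

    void insert(long key) {
        Node*& head = buckets[bucketOf(key)];
        head = new Node{key, head};   // prepend to the bucket's chain
    }

    bool contains(long key) const {
        for (Node* n = buckets[bucketOf(key)]; n; n = n->next)
            if (n->key == key) return true;
        return false;
    }

    // Selective removal: unlink every entry matching a predicate.  With open
    // addressing this would instead require tombstones or rehashing.
    template <class Pred>
    void removeIf(Pred pred) {
        for (Node*& head : buckets) {
            Node** link = &head;
            while (*link) {
                if (pred((*link)->key)) {
                    Node* dead = *link;
                    *link = dead->next;   // splice the node out of the chain
                    delete dead;
                } else {
                    link = &(*link)->next;
                }
            }
        }
    }
};
```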
Blocks: 517910
Blocks: 542905
Attached patch patchSplinter Review
This patch changes the LInsHashSet hash tables to use chaining.

- Performance changes are so small they are in the noise for both SS and V8.

- The new version is substantially simpler, saving 125 lines of code.  Much
  of this is because we don't need a grow() function (which required
  m_find[] and second versions of findXYZ()).

- Renames 'k' as the more obvious 'hash' throughout.

- Sets the LInsImmq table to size=0 on 32-bit platforms.
Attachment #426137 - Flags: review?(edwsmith)
FYI, jsdhash/pldhash (double-hashing, so open addressing) garbage-collects removed sentinels, AKA tombstones, by keeping count of them and informing the load-factor-based growth/shrinkage alg from that counter as well as the live entry counter.

/be
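The bookkeeping Brendan describes amounts to something like the following sketch. The thresholds are illustrative, not jsdhash's actual constants: the resize decision counts tombstones as occupied, but only grows if the live entries alone justify it; otherwise rehashing at the same size is enough to purge the tombstones.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch of jsdhash-style tombstone accounting: the table keeps
// a removed-sentinel count alongside the live-entry count, and both feed the
// load-factor decision.
struct TableStats {
    size_t capacity;
    size_t liveCount;
    size_t removedCount;
};

// Rehash when live + tombstone slots exceed 3/4 of capacity, since both
// lengthen probe chains equally.
bool needsRehash(const TableStats& t) {
    return (t.liveCount + t.removedCount) * 4 >= t.capacity * 3;
}

// Only double the table if live entries alone are crowding it; otherwise a
// same-size rehash reclaims the tombstone slots.
bool shouldGrow(const TableStats& t) {
    return t.liveCount * 2 >= t.capacity;
}
```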
Nice idea -- I'm learning a lot about hash tables -- but jsdhash/pldhash isn't available from Nanojit.  The idea could be reused, and it sounds like it would help with the deletion of entries, but iteration hitting lots of empty slots would probably still be a problem (it was generally a bigger problem than the deletions).

So assuming nobody has strong objections to chaining I'll stick to my original proposal as it's working well.
Nick, your plan sounds fine. My only cautionary tale involves showing scars where I ignored asymptotic complexities, even such things as "malloc each chained hash table entry" vs. "malloc or realloc the entire open-addressed hash table". That last was what motivated jsdhash.c in the old days.

Perhaps scale is better bounded for this bug's chained hash table application.

/be
Is jsdhash/pldhash relatively small and dependency-free? If so, perhaps it could be transplanted into nanojit.
Yeah, it's self-contained in the jsdhash.[ch] files, except for some simple types from NSPR.  I bet you could name that tune in 4 notes!
Man, I get a 5% speedup in V8 and you guys want to complicate things!

I'm not certain that chaining is a better idea in general.  It's very difficult to say, it depends greatly on what the LIR looks like.  Once I add the aliasing stuff, for SS and V8 it definitely beats the existing open addressing scheme + my crappy implementation of deletions.  I can imagine pathological cases where chaining won't do well -- eg. if you have heaps of loads close together without any intervening aliasing stores or labels.  I'm not sure how realistic that is, but jstracer.cpp already throws some pretty horrendous code at NJ (eg. crypto-md5.js has a 20,000+ LIR instruction fragment).

Another complicating factor is that for CSE the lookups are not just "compare one word" but "compare the opcode and the two operands", for example.  I'm not sure if jsdhash handles that.  (I was none the wiser after a quick look at jsdhash.{h,cpp}.)

So I could import the better deletion ideas into the existing NJ open addressing hash table and try that.

As for allocations, we're using tempAlloc which AFAICT is a bump-pointer-style arena, so allocations should be fast.

Maybe I'll try doing smarter deletions in the open addressing hash table and see how that goes.
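For context on the allocation point above, a bump-pointer arena of the kind tempAlloc appears to be can be sketched like this (illustrative, not nanojit's actual Allocator): each allocation just advances a cursor within a chunk, so per-node allocation for chained hash entries costs only a few instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a bump-pointer arena.  Assumes individual requests fit in one
// chunk; growth, alignment beyond 8 bytes, and per-allocation freeing are
// deliberately omitted -- the whole arena is freed at once.
class Arena {
    static const size_t CHUNK = 4096;
    std::vector<char*> chunks;
    char* cursor = nullptr;
    char* limit = nullptr;

public:
    ~Arena() { for (char* c : chunks) delete[] c; }

    void* alloc(size_t n) {
        n = (n + 7) & ~size_t(7);               // round up to 8-byte alignment
        if (n > size_t(limit - cursor)) {       // current chunk exhausted
            chunks.push_back(new char[CHUNK]);
            cursor = chunks.back();
            limit = cursor + CHUNK;
        }
        void* p = cursor;
        cursor += n;                            // bump the pointer
        return p;
    }
};
```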
jsdhash has some rather lame indirections and C-tricks. Luke has written a template-based alternative in C++ that reads much nicer. If anything we should use that.
Heh, I was just saying this on the parent bug.  Bug 545270 has the hash table.  It's been backed out because MSVC PGO crashes.  The hash table would also need to be de-JS'ed to work in nanojit, but not too severely.
Summarising my options are:

a) go ahead with chaining

b) experiment with better deletion in the old open addressing NJ hash table

c) experiment with jsdhash and/or Luke's table (assuming they handle the unusual
   key requirements)

I'm leaning towards (b) more than (c).  Let's assume I do that and it works well and gives similar performance to (a), ie. the expensive deletions/iterations are mitigated.  Then what?  It would be hard to know which of (a) or (b) is better... in general they are similar (as the SS/V8 measurements would show), so then it comes down to "which is less prone to blow-ups in nasty cases?"

jsdhash.h has some analysis of when chaining is better than open addressing, but it's quite complex and you have to know the relative frequencies of different operations.  I could get those frequencies for the normal (eg. SS/v8) cases, but it's the potential nasty cases that are the concern and I don't know what those cases look like.

I do know that (ignoring jsdhash and Luke's table) the code for chaining is simpler than the code for open addressing.  With all else equal, that pushes me towards (a).
It's hard to justify too much time on this "which hashtable" issue. I'd say go with (a) as an improvement over what was there before, by all measures.

FWIW, the jsdhash.h analysis as I remember writing it was not that involved in terms of operation mixes -- the punchline was "small entry size and no address stability requirement? dhash; else chaining." But that does leave out some of the subtleties (delete-heavy workloads, thrashing the table growth/shrinkage code).

/be
5% speedup on V8 is superb, btw!

/be
Looks nice.  I wanted to point your attention to the templates in Containers.h, in case you'd find them useful.

LInsHashNode is equivalent to Seq<LIns*> for example.  HashMap<> is a chain-based implementation but it probably lacks some operations you need and hasn't been tuned.
Attachment #426137 - Flags: review?(edwsmith) → review+
I've been thinking about this some more, and doing some experiments.

First, I created a pathological case by running lirasm --random 1000000 after changing it to generate no labels.  This results in massive amounts of CSE occurring.  I can't remember the exact numbers, but with open addressing it took something like 0.5s, with chaining it took about 10s.  This doesn't please me.

Also, I discovered I'd botched the implementation of tombstones in my open addressing version.  I fixed that and also added some suggested improvements (overwrite tombstones when adding new nodes, and don't tombstone entries that haven't been collided with) and things look much better.
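The two improvements mentioned can be sketched as follows (illustrative code, not the actual nanojit patch; growth is omitted for brevity): insertions remember the first tombstone they pass and land there instead of at the end of the probe chain, and each slot carries a "collided" flag so that a deleted entry no probe ever passed over can revert to EMPTY rather than becoming a tombstone, letting lookups terminate sooner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open-addressing set with the improved tombstone scheme described above.
// Triangular probe increments over a power-of-two table, as before.
class TombstoneSet {
    enum State : unsigned char { EMPTY, LIVE, TOMBSTONE };
    struct Slot {
        long key;
        State state;
        bool collided;   // true if some insertion's probe passed over this slot
    };
    std::vector<Slot> slots;

    size_t mask() const { return slots.size() - 1; }

public:
    explicit TombstoneSet(size_t cap = 64) : slots(cap, Slot{0, EMPTY, false}) {}

    bool contains(long key) const {
        size_t i = size_t(key) & mask();
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask())
            if (slots[i].state == LIVE && slots[i].key == key)
                return true;
        return false;    // probe hit an empty slot: key is absent
    }

    void insert(long key) {
        size_t i = size_t(key) & mask();
        size_t firstDead = size_t(-1);
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask()) {
            if (slots[i].state == LIVE && slots[i].key == key)
                return;                        // already present
            if (slots[i].state == TOMBSTONE && firstDead == size_t(-1))
                firstDead = i;                 // remember a reusable tombstone
            slots[i].collided = true;          // this probe passed over the slot
        }
        if (firstDead != size_t(-1))
            i = firstDead;                     // overwrite the tombstone
        slots[i].key = key;
        slots[i].state = LIVE;
    }

    void remove(long key) {
        size_t i = size_t(key) & mask();
        for (size_t n = 1; slots[i].state != EMPTY; i = (i + n++) & mask()) {
            if (slots[i].state == LIVE && slots[i].key == key) {
                // If no probe ever collided here, EMPTY is safe; otherwise a
                // tombstone is needed to keep other probe chains intact.
                slots[i].state = slots[i].collided ? TOMBSTONE : EMPTY;
                return;
            }
        }
    }
};
```

The collided flag is conservative (it can stay set after the colliding entries are gone), but that only costs the occasional unnecessary tombstone, never a missed lookup.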

Furthermore, LInsHashSet has eight hash tables but only one of them currently needs iteration/deletion.  So changing them all is silly.

My plan now is to add proper tombstones to the one hashtable that needs it and leave the others untouched.  This will be done as part of bug 517910 because it doesn't make sense to do it separately.
Oh, so I'm marking this as WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard