Closed Bug 515812 Opened 15 years ago Closed 14 years ago

Double hashing template

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jorendorff, Assigned: luke)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(5 files, 2 obsolete files)

Attached patch WIP 1Splinter Review
Just about every hash table we create in js/src has pointer-sized keys and values. Sometimes the key is the value. Double hashing should win for these tables.

The attachment is a first cut. Clearly needs tuning judging by early results (below-- note the weird curve, suggesting a silly fixed cost-- the test case is from bug 506410 comment 3 and is the kind of case where double hashing should win). Something is wrong, but I don't have time for it at the moment.

Outer 100000, Inner 100
[unordered_map] 1.29971
[HashMap] 0.80612
[HashSet] 0.76645

Outer 1000, Inner 10000
[unordered_map] 1.39445
[HashMap] 0.47037
[HashSet] 0.36216

Outer 100, Inner 100000
[unordered_map] 1.58923
[HashMap] 0.86247
[HashSet] 0.60625
Attached patch templated JS_DHashTable (obsolete) — Splinter Review
This is an initial version of a fairly direct translation of JS_DHashTable algorithm into slightly more modern C++ style.  In my benchmarks, it is 1.5x - 3.2x faster than JS_DHashTable.  Basic use looks like:

typedef js::HashMap<int, double> HM;
HM hm(cx);        // default allocator uses JSContext::malloc
if (!hm.init())   // fallible initialization :-(
  return false;

HM::AddPtr p = hm.lookupForAdd(1);
if (p)
  cout << p->key << ' ' << p->value << '\n';
else
  if (!hm.add(p, 1, 3.0))
    return false;

if (HM::Ptr p = hm.lookup(1))
  hm.remove(p);

for (HM::Range r = hm.all(); !r.empty(); r.popFront())
  cout << r.front().key << ' ' << r.front().value << '\n';

with shorthand:

if (!hm.put(5, 30.2))
  return false;
hm.has(5);
hm.remove(5);

Syntactic and other comments welcome!

A few notes on this implementation vs. the one in attachment 399878 [details] [diff] [review]:
 - I borrowed the HashTable/HashMap/HashSet design, so there is also a HashSet :)
 - I initially dropped the whole collision-bit business (since it adds a small overhead to all operations), however one of my synthetic benchmarks from bug 506410 showed js::HashMap to be 30% slower than JS_DHashTable.  Shark showed that practically all this extra time was spent following collision paths.  Adding a collision-bit resulted in a 2.5x speedup, so I'm going leaving it in.
 - Recomputing hashes (instead of storing them) is not an option when hashing is expensive.  I initially added a compile-time constant to the HashingPolicy that allowed the user to either request cached hashes or provide an interface, like attachment 399878 [details] [diff] [review], for detecting/marking live/free/tombstoned values.  In general, this did not show up in benchmarks except for specific iteration sizes where, presumably, the 2x smaller version allowed the table to fit in cache.  When I added the collision-bit, I took this whole feature out since the two do not go well together.
 - Having an inline buffer would avoid the need for a fallible constructor/init.  However, after working with them in bug 506410, it seemed like a waste of 16x2x4 bytes per hash table since they hash tables are usually long-lived and big.
Assignee: general → lw
Status: NEW → ASSIGNED
(In reply to comment #1)
> for (HM::Range r = hm.all(); !r.empty(); r.popFront())
>   cout << r.front().key << ' ' << r.front().value << '\n';

The Range class (or its version) needs a method to remove the key under enumeration. That is used in a few places with the current hashtable enumeration callbacks.
(In reply to comment #2)
> (In reply to comment #1)
> > for (HM::Range r = hm.all(); !r.empty(); r.popFront())
> >   cout << r.front().key << ' ' << r.front().value << '\n';
> 
> The Range class (or its version) needs a method to remove the key under
> enumeration. That is used in a few places with the current hashtable
> enumeration callbacks.

Right, I had been thinking about this, but there were some issues, so I left it out of this initial "release".  But you're right; this should be dealt with now.

Simply sticking a "removeFront" member in either the table or the Range runs into the problem that the Range is invalidated if the table is resized (even if the Range is re-pointed to the new table).  To address this, JS_DHashTableEnumerate avoids resizing until the enumeration is complete, and then possibly resizes/rehashes.  However, with external iteration, the closest thing we have is the Range destructor.  This could work, but then the Range destructor is a lot more "active" than the average destructor, which might lead to unexpected behavior.

This behavior could be made more explicit by having two enumeration types:

struct Range { /* unchanged */ };
struct Enumeration : Range { void removeFront(); void endEnumeration(); };

Range all() const;
Enumeration beginEnumeration();

and having ~Enumeration() call endEnumeration() if it hasn't already been.

Thoughts, anyone?
(In reply to comment #3)

Upon a little more discussion with dvander, I'm further inclined toward the Enumeration/beginEnumeration strategy, since it allows Range to have a copy constructor and Enumeration to hide its copy constructor.
This patch includes the ability to remove during enumeration like so:

HashSet<int> hs;
for (HashSet<int>::Enum e(hs.enumerate()); !e.empty(); e.popFront())
  if (dontlike(e.front()))
    e.removeFront();

As a use case, the patch also replaces JS_DHashTable usage in jstracer.cpp and jsarray.cpp.  In each case, the resulting code is, IMHO, easier to read.  None of these are on hot paths so TSS is not affected.
Attachment #415527 - Attachment is obsolete: true
Looking at the patch in bug 453929 (thanks Brendan), I noticed a slight generalization was needed to capture existing JS_DHashTable usage where the type of the value used to lookup an entry was different from the type of the key stored in the entry.  This is easy to do with an extra "Lookup" associated type in the template hashing policy.

The patch try's fine; about ready for review now.
Attachment #415900 - Attachment is obsolete: true
Comment on attachment 416317 [details] [diff] [review]
added "lookup type"

Great patch, it's like when your silver-age comic heroes are reimagined by a non-wanker hawt new artist, with new costumes, gnarlier powers, and more sax&violins ;-) -- stylish!

> TraceRecorder::guardShape(LIns* obj_ins, JSObject* obj, uint32 shape, const char* guardName,
>                           LIns* map_ins, VMSideExit* exit)
> {
>     // Test (with add if missing) for a remembered guard for (obj_ins, obj).
>+    GuardedShapeTable::AddPtr p = guardedShapeTable.lookupForAdd(obj_ins);
>+    if (p) {
>+        JS_ASSERT(p->value == obj);
>         return RECORD_CONTINUE;
>+    } else {
>+        if (!guardedShapeTable.add(p, obj_ins, obj))
>+            return RECORD_ERROR;
>+    }

Nit: else after return non-sequitur.

/be
Attachment #416317 - Flags: review?(brendan)
Comment on attachment 416317 [details] [diff] [review]
added "lookup type"

I like what I've read, but I'm a terrible reviewer. I only play at C++ hacker now and then. Jason, could you give this a proper review? Thanks, or bounce back if you can't get to it soonish.

/be
Attachment #416317 - Flags: review?(brendan) → review?(jorendorff)
Comment on attachment 416317 [details] [diff] [review]
added "lookup type"

In jstracer.cpp:
>+        if (!set.init())
>+            abort();

Call OutOfMemoryAbort just for consistency?

Also, this patch has some tab characters in jstracer.h.

r=me, sorry for the slow review.
Attachment #416317 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/784ceadd60e5
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
Bugs aren't supposed to be marked RESOLVED FIXED until checked into mozilla-central.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Oops, got a bit carried away.
Apparently stdarg.h was being included for debug builds, but not opt.
http://hg.mozilla.org/tracemonkey/rev/584af5971ccf
Disregard comment 13, meant for a different bug.
Backed out, as this patch seems to cause PGO to die
http://hg.mozilla.org/tracemonkey/rev/fcf42ef466a1
This patch fixes the unintuitive behavior of the HashMap::put convenience function to overwrite when the key is found.
Attachment #427373 - Flags: review?(jorendorff)
Attachment #427373 - Flags: review?(jorendorff) → review+
Initially I was returning const Entry & to HashMap users thinking that 'mutable' would allow mutation of Entry::value, but, I failed to realize, this only works for Entry members.
Attachment #427801 - Flags: review?(dvander)
Attachment #427801 - Flags: review?(dvander) → review+
The patch eliminates the need to call an extra enumerate() function when looping over hashtable elements changing usage pattern from:

   for (HashMapType::Enum e(hashmap.enumerate()); !e.empty(); e.popFront()) ...

to less verbose version: 

   for (HashMapType::Enum e(hashmap); !e.empty(); e.popFront()) ...
Attachment #429099 - Flags: review?(lw)
Comment on attachment 429099 [details] [diff] [review]
less verbose enumeration

Shorter and simpler; what's not to like. Thanks!
Attachment #429099 - Flags: review?(lw) → review+
The same change that was done for HashTable::Enum could also be done for HashTable::Range allowing to do non-deleting enumerations over Hashtable without calling the all method changing a pattern like:

  for (HM::Range r = h.all(); !r.empty(); r.popFront()) ...

into

  for (HM::Range r(h); !r.empty(); r.popFront()) ...

luke, any objections?
(In reply to comment #22)

While that is attractive since it makes Range like Enum, they are rather different: an Enum object has identity and its lifetime has semantic meaning while a Range is a normal value type like an iterator that can be initialized and copied around with =.  So, to me, its fine that they look different; they are different.  Also, for value types, people generally seem to prefer initializer syntax over constructor syntax.
Enumerate implicitly seems to suggest exhaustiveness; Range doesn't to me.  First change is nice, second doesn't quite seem right to me.  But I'm not looking too closely here, maybe I'm missing the flavor of the second case.
Small problems with warnings in the Range class.
I might silence with #pragma for now but this needs fixing...

#
js\src\jshashtable.h(164) : warning C4512: 'js::detail::HashTable<T,HashPolicy,AllocPolicy>::Range' : assignment operator could not be generated
#
        with
#
        [
#
            T=JSObject *const ,
#
            HashPolicy=js::HashSet<JSObject *>::SetOps,
#
            AllocPolicy=js::ContextAllocPolicy
#
        ]
#
        js\src\jshashtable.h(134) : see declaration of 'js::detail::HashTable<T,HashPolicy,AllocPolicy>::Range'
#
        with
#
        [
#
            T=JSObject *const ,
#
            HashPolicy=js::HashSet<JSObject *>::SetOps,
#
            AllocPolicy=js::ContextAllocPolicy
#
        ]
#
        js\src\jshashtable.h(176) : see reference to class template instantiation 'js::detail::HashTable<T,HashPolicy,AllocPolicy>::Range' being compiled
#
        with
#
        [
#
            T=JSObject *const ,
#
            HashPolicy=js::HashSet<JSObject *>::SetOps,
#
            AllocPolicy=js::ContextAllocPolicy
#
        ]
#
        js\src\jshashtable.h(537) : see reference to class template instantiation 'js::detail::HashTable<T,HashPolicy,AllocPolicy>::Enum' being compiled
#
        with
#
        [
#
            T=JSObject *const ,
#
            HashPolicy=js::HashSet<JSObject *>::SetOps,
#
            AllocPolicy=js::ContextAllocPolicy
#
        ]
#
        js\src\jshashtable.h(873) : see reference to class template instantiation 'js::detail::HashTable<T,HashPolicy,AllocPolicy>' being compiled
#
        with
#
        [
#
            T=JSObject *const ,
#
            HashPolicy=js::HashSet<JSObject *>::SetOps,
#
            AllocPolicy=js::ContextAllocPolicy
#
        ]
#
        js/src/jscntxt.h(1361) : see reference to class template instantiation 'js::HashSet<T>' being compiled
#
        with
#
        [
#
            T=JSObject *
#
        ]
Ah, thanks Mike.  No default operator= when one member is a const.
http://hg.mozilla.org/tracemonkey/rev/bc0a50164c21
http://hg.mozilla.org/mozilla-central/rev/784ceadd60e5
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Depends on: 563326
Depends on: 626526
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: