If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Keep PLDHashTable's second hash small to reduce the size of jumps across memory and improve cache behavior

ASSIGNED
Assigned to

Status

()

Core
XPCOM
ASSIGNED
6 months ago
7 days ago

People

(Reporter: dbaron, Assigned: dbaron)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qf:p2])

Attachments

(1 attachment)

(Assignee)

Description

6 months ago
PLDHashTable takes the result of the hash function and multiplies it by
kGoldenRatio to ensure that it has a good distribution of bits across
the 32-bit hash value, and then zeroes out the low bit so that it can be
used for the collision flag.  This result is called hash0.  From hash0
it computes two different numbers used to find entries in the table
storage:  hash1 is used to find an initial position in the table to
begin searching for an entry; hash2 is then used to repeatedly offset
that position (mod the size of the table) to build a chain of positions
to search.

In a table with capacity 2^c entries, hash1 is simply the upper c bits
of hash0.  This patch does not change this.

Prior to this patch (but only since bug 1352889), hash2 was the low c
bits of hash0.  This manner of computing hash2 is problematic because
chain can jump across very large areas of memory in a way that is
unfriendly to CPU caches.

So this patch changes the hash2 computation by capping the size of hash2
to try to make it more cache-friendly.

TODO: Measure to find an optimal value of kHash2MaskMaxBits, or whether
we should have a maximum at all!

NOTE: This patch is on top of the as-yet-unlanded patch in bug 1352889.

MozReview-Commit-ID: 7FgybeBgjLF
(Assignee)

Comment 1

6 months ago
Created attachment 8853818 [details] [diff] [review]
Keep PLDHashTable's second hash small to reduce the size of jumps across memory and improve cache behavior
Comment on attachment 8853818 [details] [diff] [review]
Keep PLDHashTable's second hash small to reduce the size of jumps across memory and improve cache behavior

Review of attachment 8853818 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/ds/PLDHashTable.cpp
@@ +270,5 @@
>    //
> +  // Since the result of Hash2 controls how far we jump around the table
> +  // to build a chain after starting at a location determined by Hash1,
> +  // we'd like to keep it small, to improve cache behavior.
> +  // Keep the jumps from the second hash small, to improve cache behavior.

The second sentence in this paragraph basically repeats the first sentence.
(Assignee)

Updated

6 months ago
Blocks: 1341750
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
FWIW perf was extremely helpful in profiling this.  I profiled bug 1357621 for example under perf, and here is the full profile on the Search() function, with the C++ source and asm view intermixed:

PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>  /home/ehsan/moz/src/obj-ff-opt.noindex/toolkit/library/lib
       │231           }                                                                                            ▒
       │                                                                                                           ▒
       │233           char* Get() { return mEntryStore; }                                                          ▒
       │aaf9ef:   mov    0x18(%r14),%rdi                                                                           ▒
       │235       _ZN12PLDHashTable12AddressEntryEj():                                                             ▒
       │298           mEntryStore.Get() + aIndex * mEntrySize);                                                    ▒
       │aaf9f3:   mov    0xc(%r14),%r8d                                                                            ▒
       │aaf9f7:   mov    %r8d,%r10d                                                                                ▒
       │aaf9fa:   imul   %r15d,%r10d                                                                               ▒
       │302       _ZN12PLDHashTable11EntryIsFreeEP15PLDHashEntryHdr():                                             ▒
       │                                                                                                           ▒
       │505         static const PLDHashNumber kCollisionFlag = 1;                                                 ▒
       │                                                                                                           ▒
       │507         static bool EntryIsFree(PLDHashEntryHdr* aEntry)                                               ▒
       │508         {                                                                                              ▒
       │509           return aEntry->mKeyHash == 0;                                                                ◆
       │aaf9fe:   mov    (%rdi,%r10,1),%ebx                                                                        ▒
 53.85 │aafa02:   test   %ebx,%ebx                                                                                 ▒
       │aafa04: ↓ je     aafaca <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │513       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │365         if (EntryIsFree(entry)) {                                                                      ▒
       │366           return (Reason == ForAdd) ? entry : nullptr;                                                 ▒
       │367         }                                                                                              ▒
       │                                                                                                           ▒
       │369         // Hit: return entry.                                                                          ▒
       │370         PLDHashMatchEntry matchEntry = mOps->matchEntry;                                               ▒
       │aafa0a:   mov    (%r14),%rax                                                                               ▒
  1.92 │aafa0d:   mov    0x8(%rax),%r9                                                                             ▒
       │373       _ZN12PLDHashTable17MatchEntryKeyhashEP15PLDHashEntryHdrj():                                      ▒
       │290         return (aEntry->mKeyHash & ~kCollisionFlag) == aKeyHash;                                       ▒
       │aafa11:   and    $0xfffffffe,%ebx                                                                          ▒
       │292       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │366         if (MatchEntryKeyhash(entry, aKeyHash) &&                                                      ▒
  1.92 │aafa14:   cmp    %edx,%ebx                                                                                 ▒
       │aafa16:   mov    %edx,-0x2c(%rbp)                                                                          ▒
       │aafa19:   mov    %r14,-0x38(%rbp)                                                                          ▒
       │aafa1d: ↓ jne    aafa4c <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │371       _ZN12PLDHashTable12AddressEntryEj():                                                             ▒
       │298           mEntryStore.Get() + aIndex * mEntrySize);                                                    ▒
  5.77 │aafa1f:   add    %r10,%rdi                                                                                 ▒
       │aafa22:   mov    %rsi,%r12                                                                                 ▒
       │aafa25:   mov    %r9,%r13                                                                                  ▒
       │aafa28:   mov    %rdi,%rbx                                                                                 ▒
       │303       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │367             matchEntry(entry, aKey)) {                                                                 ▒
       │aafa2b: → callq  *%r13                                                                                     ▒
       │366         if (MatchEntryKeyhash(entry, aKeyHash) &&                                                      ▒
  1.92 │aafa2e:   test   %al,%al                                                                                   ▒
       │aafa30: ↓ jne    aafacc <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │aafa36:   mov    0x8(%r14),%cx                                                                             ▒
       │aafa3b:   mov    0x18(%r14),%rdi                                                                           ▒
       │aafa3f:   mov    0xc(%r14),%r8d                                                                            ▒
       │aafa43:   mov    -0x2c(%rbp),%edx                                                                          ▒
       │aafa46:   mov    %r12,%rsi                                                                                 ▒
       │aafa49:   mov    %r13,%r9                                                                                  ▒
       │375       _ZN12PLDHashTable5Hash2EjRjS0_():                                                                ▒
       │260         uint32_t sizeLog2 = kHashBits - mHashShift;                                                    ▒
  3.85 │aafa4c:   mov    $0x20,%eax                                                                                ▒
       │aafa51:   sub    %ecx,%eax                                                                                 ▒
       │261         uint32_t sizeMask = (PLDHashNumber(1) << sizeLog2) - 1;                                        ▒
       │aafa53:   mov    $0x1,%r13d                                                                                ▒
       │aafa59:   mov    %eax,%ecx                                                                                 ▒
       │aafa5b:   shl    %cl,%r13d                                                                                 ▒
       │aafa5e:   dec    %r13d                                                                                     ▒
       │276         aHash2Out = (aHash0 & sizeMask) | 1;                                                           ▒
  1.92 │aafa61:   mov    %r13d,%ecx                                                                                ▒
       │aafa64:   and    %edx,%ecx                                                                                 ▒
       │aafa66:   or     $0x1,%ecx                                                                                 ▒
       │280       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │389             } else {                                                                                   ▒
       │390               entry->mKeyHash |= kCollisionFlag;                                                       ▒
       │391             }                                                                                          ▒
       │392           }                                                                                            ▒
       │                                                                                                           ▒
       │394           hash1 -= hash2;                                                                              ▒
       │aafa69:   sub    %ecx,%r15d                                                                                ▒
       │390           hash1 &= sizeMask;                                                                           ▒
       │aafa6c:   and    %r13d,%r15d                                                                               ▒
       │392       _ZN12PLDHashTable12AddressEntryEj():                                                             ▒
       │298           mEntryStore.Get() + aIndex * mEntrySize);                                                    ▒
       │aafa6f:   mov    %r8d,%ebx                                                                                 ▒
       │aafa72:   imul   %r15d,%ebx                                                                                ▒
       │301       _ZN12PLDHashTable11EntryIsFreeEP15PLDHashEntryHdr():                                             ▒
       │aafa76:   mov    (%rdi,%rbx,1),%eax                                                                        ▒
  7.69 │aafa79:   test   %eax,%eax                                                                                 ▒
       │506       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │aafa7b: ↓ je     aafaca <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │508       _ZN12PLDHashTable12AddressEntryEj():                                                             ▒
       │aafa7d:   add    %rdi,%rbx                                                                                 ▒
       │aafa80:   mov    %ecx,-0x30(%rbp)                                                                          ▒
       │300       _ZN12PLDHashTable17MatchEntryKeyhashEP15PLDHashEntryHdrj():                                      ▒
       │290         return (aEntry->mKeyHash & ~kCollisionFlag) == aKeyHash;                                       ◆
       │aafa83:   and    $0xfffffffe,%eax                                                                          ▒
       │292       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │398           if (EntryIsFree(entry)) {                                                                    ▒
       │399             return (Reason == ForAdd) ? (firstRemoved ? firstRemoved : entry)                          ▒
       │400                                       : nullptr;                                                       ▒
       │401           }                                                                                            ▒
       │                                                                                                           ▒
       │403           if (MatchEntryKeyhash(entry, aKeyHash) &&                                                    ▒
       │aafa86:   cmp    %edx,%eax                                                                                 ▒
       │aafa88: ↓ jne    aafab2 <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │399               matchEntry(entry, aKey)) {                                                               ▒
  1.92 │aafa8a:   mov    %rbx,%rdi                                                                                 ▒
       │aafa8d:   mov    %rsi,%r12                                                                                 ▒
       │aafa90:   mov    %r9,%r14                                                                                  ▒
       │aafa93: → callq  *%r9                                                                                      ▒
       │398           if (MatchEntryKeyhash(entry, aKeyHash) &&                                                    ▒
       │aafa96:   test   %al,%al                                                                                   ▒
       │aafa98: ↓ jne    aafacc <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │aafa9a:   mov    -0x38(%rbp),%rax                                                                          ▒
       │aafa9e:   mov    0x18(%rax),%rdi                                                                           ▒
       │aafaa2:   mov    0xc(%rax),%r8d                                                                            ▒
       │aafaa6:   mov    -0x2c(%rbp),%edx                                                                          ▒
       │aafaa9:   mov    %r12,%rsi                                                                                 ▒
       │aafaac:   mov    %r14,%r9                                                                                  ▒
       │aafaaf:   mov    -0x30(%rbp),%ecx                                                                          ▒
       │389           hash1 -= hash2;                                                                              ▒
  1.92 │aafab2:   sub    %ecx,%r15d                                                                                ▒
       │390           hash1 &= sizeMask;                                                                           ▒
       │aafab5:   and    %r13d,%r15d                                                                               ▒
       │392       _ZN12PLDHashTable12AddressEntryEj():                                                             ▒
       │298           mEntryStore.Get() + aIndex * mEntrySize);                                                    ▒
       │aafab8:   mov    %r8d,%eax                                                                                 ▒
       │aafabb:   imul   %r15d,%eax                                                                                ▒
       │aafabf:   lea    (%rdi,%rax,1),%rbx                                                                        ▒
       │302       _ZN12PLDHashTable11EntryIsFreeEP15PLDHashEntryHdr():                                             ▒
       │aafac3:   mov    (%rdi,%rax,1),%eax                                                                        ▒
       │505       _ZN12PLDHashTable11SearchTableILNS_12SearchReasonE0EEEP15PLDHashEntryHdrPKvj():                  ▒
       │393           if (EntryIsFree(entry)) {                                                                    ▒
  7.69 │aafac6:   test   %eax,%eax                                                                                 ▒
       │aafac8: ↑ jne    aafa83 <PLDHashEntryHdr* PLDHashTable::SearchTable<(PLDHashTable::SearchReason)0>(void con▒
       │aafaca:   xor    %ebx,%ebx                                                                                 ▒
       │406           }                                                                                            ▒
       │407         }                                                                                              ▒
       │                                                                                                           ▒
       │409         // NOTREACHED                                                                                  ▒
       │410         return nullptr;                                                                                ▒
       │411       }                                                                                                ▒
  1.92 │aafacc:   mov    %rbx,%rax                                                                                 ▒
       │aafacf:   add    $0x18,%rsp                                                                                ▒
       │aafad3:   pop    %rbx                                                                                      ▒
       │aafad4:   pop    %r12                                                                                      ▒
       │aafad6:   pop    %r13                                                                                      ▒
       │aafad8:   pop    %r14                                                                                      ▒
  1.92 │aafada:   pop    %r15                                                                                      ▒
       │aafadc:   pop    %rbp                                                                                      ▒
       │aafadd: ← retq                                                                                             ▒

This profile confirms that we are indeed being hurt a lot by the jumps across memory.  It also shows that in this work load, for example, half of the cost is paid by the first jump.

More profiling of workloads like this under perf would be more useful.  Replicating what I did is pretty simple:

$ perf record -g -F 1000 -p <content-process-pid>
$ reload the page, wait till done, Ctrl+C
$ perf report -g graph,0 
  Hit / to search, type PLDHashTable::SearchTable, press Enter, press a to go to annotate view!
Note that perf can have some issues with displaying what instruction is "responsible" for the cost; I know I have seen profiles where the "expensive" instruction is the `add` after a `div`...but the `div` is what's eating all the time.  In this profile, it seems likely that it's not the `test` instruction, but the `mov` prior to it, where we're missing in the cache or something like that.

Comment 5

5 months ago
FWIW the VTune manual mentions the same caveat: "Due to the collection mechanism, the captured IP address points to an instruction AFTER the one that is actually consuming most of the time."
(In reply to Nathan Froyd [:froydnj] from comment #4)
> Note that perf can have some issues with displaying what instruction is
> "responsible" for the cost; I know I have seen profiles where the
> "expensive" instruction is the `add` after a `div`...but the `div` is what's
> eating all the time.  In this profile, it seems likely that it's not the
> `test` instruction, but the `mov` prior to it, where we're missing in the
> cache or something like that.

That's not an issue with perf, it is due to the nature of the data dependencies on the processor, perf shows you where the program counter is when the samples are taken but the reason could be that the CPU was blocked waiting for data to be fetched from the target register of the mov instruction you're referring to.  You can see this happening in the profile above, here is an example:

       │507         static bool EntryIsFree(PLDHashEntryHdr* aEntry)                                               ▒
       │508         {                                                                                              ▒
       │509           return aEntry->mKeyHash == 0;                                                                ◆
       │aaf9fe:   mov    (%rdi,%r10,1),%ebx                                                                        ▒
 53.85 │aafa02:   test   %ebx,%ebx                                                                                 ▒

Here, is the test instruction doing something particularly expensive?  Not at all!  It is the mov instruction that is fetching the value from memory, but the value is actually not fetched when the processor "executes" mov because it's lazy and it doesn't actually "do" anything (by "do" here I mean it doesn't get blocked on anything finishing), and it is not before test is being executed that %ebx is needed and then the memory dependency is actually needed so you get blocked on it.

And in fact one very important thing that I forgot to mention previously here is that if you look at the callstack view in this profile you'll see that the reason why this is so expensive is that there were a lot of page faults happening in the first EntryIsFree() check in Search()...  I'm not really sure why.  But that may be worthwhile digging further into.  Anecdotally I saw the same behavior profiling a few other test cases as well, mostly focused on reflow like workloads...
(Assignee)

Comment 7

5 months ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #6)
> And in fact one very important thing that I forgot to mention previously
> here is that if you look at the callstack view in this profile you'll see
> that the reason why this is so expensive is that there were a lot of page
> faults happening in the first EntryIsFree() check in Search()...  I'm not
> really sure why.  But that may be worthwhile digging further into. 
> Anecdotally I saw the same behavior profiling a few other test cases as
> well, mostly focused on reflow like workloads...

So if it's predominantly the *first* EntryIsFree() check, then that's just the first access to the table, rather than the jumps around the table.  The question here is is more around the *later* checks, particularly on very large tables (such as the cycle collector table that we access when doing reference counting of cycle collected objects).

And perf probably isn't going to be useful in telling us what the tradeoff is between improved memory locality in the jumps for the second hash versus slightly less random distribution within the hashtable entry store.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #7)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #6)
> > And in fact one very important thing that I forgot to mention previously
> > here is that if you look at the callstack view in this profile you'll see
> > that the reason why this is so expensive is that there were a lot of page
> > faults happening in the first EntryIsFree() check in Search()...  I'm not
> > really sure why.  But that may be worthwhile digging further into. 
> > Anecdotally I saw the same behavior profiling a few other test cases as
> > well, mostly focused on reflow like workloads...
> 
> So if it's predominantly the *first* EntryIsFree() check, then that's just
> the first access to the table, rather than the jumps around the table.  The
> question here is is more around the *later* checks, particularly on very
> large tables (such as the cycle collector table that we access when doing
> reference counting of cycle collected objects).

I saw a bigger cost usually taken by the first check compared to the jumping around due to the page fault, but note that I was looking at the overall times for the function, not at the usage if a specific hashtable.  So yeah I agree that I've sort of scope-creeped this bug, sorry about that.  I needed a place for posting this profile.  :-)

The cost of the jumping around still looked significant enough that it seemed worth fixing.  Basically it seemed about half-ish the cost of hashtable lookups in the profiles I was eyeballing.

> And perf probably isn't going to be useful in telling us what the tradeoff
> is between improved memory locality in the jumps for the second hash versus
> slightly less random distribution within the hashtable entry store.

Yes, very true.  In general I'm afraid of what things we may be missing in our measurements here... I certainly make no claim of the completeness of comment 3, mostly that I thought it's interesting as some starting points for further investigations.
If hash table cache locality is a concern, have you tried profiling linear or quadratic probing instead of double hashing?
So I accidentally ended up creating a *super* reproducible test case for hashtable lookups with really bad collision rates in bug 1377999.  The lookups were around 10% of *total time*.  I tried applying this patch as is, and it made the lookups take only a bit more time than before increasing the collision rate.  This was almost as good as the real fix (bug 1379282).

Based on that experiment, I think we should perhaps consider landing this patch with kHash2MaskMaxBits as 6.

Note that I'm hoping that before 57 I'd get a chance to switch PLDHashTable to a more efficient hashtable algorithm based on robin hood hashing which basically eliminates this problem by doing only a short linear scan in the lookup algorithm in the case of collisions, but it's nice to be able to take incremental improvements here.

Updated

7 days ago
Whiteboard: [qf:p1] → [qf:p2]

Comment 11

7 days ago
I am moving this bug to P2 for post 57 work.
(Assignee)

Comment 12

7 days ago
Oops, I didn't see comment 10 until now.

Maybe it's worth trying to measure using that testcase, or maybe it's worth just landing with the value of 6?  In any case, it probably should be post-57 at this point.
You need to log in before you can comment on or make changes to this bug.