Open Bug 1815266 Opened 2 years ago Updated 1 year ago

Experiment: cache string hash values

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

People

(Reporter: sfink, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [sp3])

Attachments

(4 files)

We compute a string's hash when atomizing, and when looking it up for various purposes. We could use a bit in the header to say whether it has been hashed, and store the hash after the string data.

It would cost space if we need to bump up the allocation size. Though note that sometimes we'll have enough slop bytes in the allocation to fit it in for free.

Also note that almost every string will get hashed if it survives to be tenured, since the string deduplication code computes its hash to look it up.

This would be useful to start doing that from the stencil, as the hashes, while being the same are re-computed instead of being reused.

If a longer string is atomized repeatedly, we avoid rehashing thanks to the StringToAtomCache.

At some point I prototyped storing either a hash code (for atoms) or a pointer to the corresponding atom (for non-atoms) in JSString. This is pretty nice because it removes the branching to get the hash out of a NormalAtom vs FatInlineAtom and is faster than looking it up in the StringToAtomCache (and works better from JIT code). I'm not sure if it's worth the memory overhead though.

Severity: -- → S3
Priority: -- → P3
Whiteboard: [sp3]
Depends on: 1825675
Depends on: 1848884

This didn't make much of a difference in speedometer. I wrote some microbenchmarks and looked at what was going on.

One thing I ran into quickly is that theStringToAtomCache that jandem mentioned in comment 2 is also used to indicate that a string should be atomized during deduplication. So the obvious microbenchmark of creating a bunch of string keys and repeatedly looking them up in an object doesn't take advantage of this, because when they get tenured they all get atomized anyway. If I am following along correctly, that means that anytime you create a nursery string and then use it to look up a property, it'll end up getting atomized, which seems like it'll cover a large majority of cases—it'll probably be fairly rare to create a string, hang onto it long enough for it to survive a minor GC, and only then use it as a key.

A microbenchmark to look at (non-matching) string comparisons shows improvements (24 -> 15ns/comparison) but only if I make the strings pretty long. An example string is "a really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long key with a number at the end: 13". If I replace all of those "really"s with just one, then the performance difference disappears. (And if the string is even shorter, it'll fit in an inline string, and I am not storing hashes for those.)

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: