Closed Bug 1275048 Opened 8 years ago Closed 8 years ago

SharedImmutableStringsCache::Lookup should eagerly hash string contents

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: fitzgen, Assigned: fitzgen)

References

Details

Attachments

(1 file, 1 obsolete file)

We construct lookups before taking the lock, but the hashing still happens while we hold the lock. Constructing the lookup should eagerly compute the string's hash, before we take the lock.
This commit does two things to speed up hashing in the
SharedImmutableStringsCache:

* We eagerly compute a string's hash when we create a lookup for it. This means
  that we compute the hash before we take the lock, which reduces the length of
  time the lock is held.

* For strings longer than SHORT_STRING_MAX_LENGTH (currently 8192), we only hash
  the first N and last N characters in the string, where N =
  SHORT_STRING_MAX_LENGTH / 2. This increases the risk of collisions, but in
  practice it should be rare, and this yields a large speedup for hashing long
  strings.
Attachment #8755564 - Flags: review?(luke)
Comment on attachment 8755564 [details] [diff] [review]
Speed up hashing in SharedImmutableStringsCache

Review of attachment 8755564 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SharedImmutableStringsCache.h
@@ +279,5 @@
>              size_t length_;
>  
>            public:
>              Lookup(const char* chars, size_t length)
> +              : hash_(Hasher::hashLongString(chars, length))

I'm a little wary about hiding big O(n) operations in ctors/dtors/op= of small-looking data types.  I'm not familiar with the surrounding code; is there a different place in the API where this can be more explicit so that one could see both the hash operation and the lock in the same function?
(In reply to Luke Wagner [:luke] from comment #4)
> Comment on attachment 8755564 [details] [diff] [review]
> Speed up hashing in SharedImmutableStringsCache
> 
> Review of attachment 8755564 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/SharedImmutableStringsCache.h
> @@ +279,5 @@
> >              size_t length_;
> >  
> >            public:
> >              Lookup(const char* chars, size_t length)
> > +              : hash_(Hasher::hashLongString(chars, length))
> 
> I'm a little wary about hiding big O(n) operations in ctors/dtors/op= of
> small-looking data types.  I'm not familiar with the surrounding code; is
> there a different place in the API where this can be more explicit so that
> one could see both the hash operation and the lock in the same function?

A few things:

* This type is internal and not part of the public API for the cache, which makes it a little less scary to me :)

* Technically, it's not O(n) anymore since it is hashing at most 8192 characters :P

* More seriously: we could make the Lookup constructor take a HashNumber parameter, have the public SharedImmutableStringsCache::getOrCreate method[0] that constructs lookups and takes the lock pass the hash in explicitly, and maybe assert in the Lookup constructor that we were given the correct hash for the chars in debug builds. But now we have traded hiding the eager hashing with worrying about matching the hash and string. Is that trade worth it? I defer to you.

[0] https://dxr.mozilla.org/mozilla-central/rev/16663eb3dcfa759f25b5e27b101bc79270c156f2/js/src/vm/SharedImmutableStringsCache-inl.h#16
Yes, passing an explicit HashNumber sounds good to me.  Even if it's only internal, it'll help anyone doing any seemingly-harmless refactorings like moving a variable declaration inside the scope of the lock.

Another question: in that case I saw earlier where we hash in the dtor, can be save the HashNumber up-front and reuse it in the dtor?
(In reply to Luke Wagner [:luke] from comment #6)
> Yes, passing an explicit HashNumber sounds good to me.  Even if it's only
> internal, it'll help anyone doing any seemingly-harmless refactorings like
> moving a variable declaration inside the scope of the lock.

Ok.

> Another question: in that case I saw earlier where we hash in the dtor, can
> be save the HashNumber up-front and reuse it in the dtor?

I landed a patch in bug 1269451 to avoid re-hashing in the dtor by boxing cache entries and making the SharedImmutableString dtor hold a pointer to the boxed entry instead of doing another lookup in the table. The patch is on inbound right now, not all the way in m-c yet.
Pass the hash as a parameter to the lookup, as discussed.
Attachment #8755589 - Flags: review?(luke)
Attachment #8755564 - Attachment is obsolete: true
Attachment #8755564 - Flags: review?(luke)
Attachment #8755589 - Flags: review?(luke) → review+
Great, thanks!
https://hg.mozilla.org/mozilla-central/rev/bee9ad2202c2
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: