Closed Bug 290032 Opened 19 years ago Closed 14 years ago

Some files are never cached due to hash collisions which are quite common due to weak string hash function

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: tomasare, Assigned: michal)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 16 obsolete files)

217.54 KB, text/plain
Details
2.03 KB, application/x-compressed-tar
Details
7.03 KB, patch
Details | Diff | Splinter Review
77.35 KB, patch
bzbarsky
: review+
benjamin
: review+
Details | Diff | Splinter Review
1.47 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
5.44 KB, patch
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0

It seems that the files ListBar.js, ListBar.css, MenuBar.js, and MenuBar.css are
never cached.  On the page http://nf.opoint.com/main/ff_cache.html there are
other css and js files that are cached, but never the ones named ListBar and
MenuBar.  If I rename these files, they get cached; e.g. List.css is a copy of
ListBar.css.


Reproducible: Always

Steps to Reproduce:
1. Open url
2. Reload the page (e.g. click on the reload-link)
3. With each page reload the four files ListBar.js, ListBar.css, MenuBar.js, and
MenuBar.css get fetched from the server (use e.g. LiveHTTPHeaders)

Actual Results:  
The files are not cached

Expected Results:  
They should be cached like the other js and css files

Also tested on:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217
Could you create a log as described at
http://www.mozilla.org/projects/netlib/http/http-debugging.html and attach it to
this bug?
Browser used: Mozilla 1.8b2, Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2)
Gecko/20050516
1. Cleared the browser cache before loading the page.
2. Reloaded the page (by clicking on the reload-link)
3. The log file verifies that the files MenuBar.css, ListBar.css, MenuBar.js,
and ListBar.js are not properly cached (they always report 'got cache entry
[access=2]')
Confirming based on that log.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 310656
-> default owner
Assignee: darin → nobody
The problem is that the cache entry can't be opened due to a hash collision.

Breakpoint 1, nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
422             *collision = PR_TRUE;
(gdb) bt 10
#0  nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
#1  0x0105fec1 in nsCacheService::SearchCacheDevices (this=0xb791e6b0, key=0xb6d17770, policy=0, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1540
#2  0x01062e5c in nsCacheService::ActivateEntry (this=0xb791e6b0, request=0xb6d1ef20, result=0xbfb9c958)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1452
#3  0x01062394 in nsCacheService::ProcessRequest (this=0xb791e6b0, request=0xb6d1ef20, calledFromOpenCacheEntry=1, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1340
#4  0x01063161 in nsCacheService::OpenCacheEntry (session=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, listener=0x0, 
    result=0xb33afb20) at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1420
#5  0x01065cab in nsCacheSession::OpenCacheEntry (this=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheSession.cpp:106
#6  0x010c1308 in nsHttpChannel::OpenCacheEntry (this=0xb33afa20, offline=0, delayed=0xbfb9cac8)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:1417
#7  0x010c626c in nsHttpChannel::Connect (this=0xb33afa20, firstTime=1)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:286
#8  0x010c697f in nsHttpChannel::AsyncOpen (this=0xb33afa20, listener=0xb6d14dc0, context=0x0)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:3745
#9  0x013652b2 in CSSLoaderImpl::LoadSheet (this=0xb5f780c0, aLoadData=0xb01afe20, aSheetState=eSheetNeedsParser)
    at /opt/moz/TRUNK_new3/mozilla/layout/style/nsCSSLoader.cpp:1454
(More stack frames follow...)
(gdb) l
417         nsDiskCacheEntry * diskEntry = mCacheMap.ReadDiskCacheEntry(&record);
418         if (!diskEntry) return nsnull;
419         
420         // compare key to be sure
421         if (strcmp(diskEntry->Key(), key->get()) != 0) {
422             *collision = PR_TRUE;
423             return nsnull;
424         }
425         
426         nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
(gdb) p diskEntry->Key()
$1 = 0xb44f7624 "HTTP:http://localhost/bug338190/3/XSL/bss.xsl"
(gdb) p *key
$2 = {<nsACString_internal> = {<nsCSubstring_base> = {<No data fields>}, mData = 0xb4677948 "HTTP:http://localhost/bug338190/3/CSS/bss.css", 
    mLength = 45, mFlags = 5}, <No data fields>}
(gdb) p/x hashNumber
$3 = 0xc4b04670


The hash function (http://mxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDevice.cpp#240), which is used in many places in the Mozilla code, is really weak, and collisions are quite common.

$ ./hashit "ABA/xxx.aba"
8ffac222
$ ./hashit "XyZ/xxx.xYz"
8ffac222
$ ./hashit "CSS/xxx.css"
8ffac222
$ ./hashit "JPG/xxx.jpg"
8ffac222


$ ./hashit modules_newsfeeds/MenuBar/MenuBar.css
15c23729
$ ./hashit modules_newsfeeds/ListBar/ListBar.css
15c23729

$ ./hashit modules_newsfeeds/MenuBar/MenuBar.js
a15c23e5
$ ./hashit modules_newsfeeds/ListBar/ListBar.js
a15c23e5
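
For reference, below is a minimal stand-in for the hashit tool used above, assuming it wraps the same rotate-XOR algorithm as nsDiskCache::Hash (the PL_HashString scheme quoted later in this bug); it reproduces the values shown above, e.g. 8ffac222 for "ABA/xxx.aba". The weakness is easy to see: each character's contribution is rotated left by 4 bits per position, so characters 8 positions apart land on the same rotation (8 * 4 = 32 bits), and any difference that repeats with period 8, like the 3-letter prefix/extension pairs above, cancels out completely.

#include <stdint.h>
#include <stdio.h>

// The rotate-XOR string hash (equivalent to PL_HashString):
// h = ROTATE_LEFT32(h, 4) ^ byte, starting from h = 0.
static uint32_t hash_string(const char* s)
{
    uint32_t h = 0;
    for (; *s; ++s)
        h = ((h << 4) | (h >> 28)) ^ (uint8_t)*s;
    return h;
}

int main(int argc, char** argv)
{
    // Print the hash of each argument, like ./hashit does above.
    for (int i = 1; i < argc; ++i)
        printf("%08x  %s\n", (unsigned)hash_string(argv[i]), argv[i]);
    return 0;
}

Compiling this and running it on the strings above reproduces the colliding values.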


IMHO the hash function should be changed.
Michal: that is very interesting. Can you update the summary?
We've run into issues with collisions in this hash function before.  Changing it sounds like a great idea.
Summary: some js and css files that are never cached → Some files are never cached due to hash collisions which are quite common due to weak string hash function
(In reply to comment #7)
> We've run into issues with collisions in this hash function before.  Changing
> it sounds like a great idea.
> 

I've done some search and this hash function seems quite good for this purpose:

http://burtleburtle.net/bob/hash/evahash.html

What do you think?
Sorry it took me so long to reply; I was out for a week.

At first glance, that hash function looks pretty reasonable, particularly for hashing mostly-ascii strings as we are.

Let's give it a shot.  As long as it doesn't impact performance negatively, it's got to be a win in terms of collision-avoidance.
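For readers following along, here is a condensed transcription of that function (Bob Jenkins' lookup2, a.k.a. "evahash") in fixed-width C types. The patches below adapt essentially this code, with the details discussed in the review (a seed argument, the low-order byte of c reserved for the length); treat it as an illustrative sketch, not the exact patch contents.

#include <stdint.h>

// Mix three 32-bit state words (Jenkins' reversible mixing step).
#define MIX(a, b, c) {                      \
    a -= b; a -= c; a ^= (c >> 13);         \
    b -= c; b -= a; b ^= (a << 8);          \
    c -= a; c -= b; c ^= (b >> 13);         \
    a -= b; a -= c; a ^= (c >> 12);         \
    b -= c; b -= a; b ^= (a << 16);         \
    c -= a; c -= b; c ^= (b >> 3);          \
    a -= b; a -= c; a ^= (c >> 5);          \
    b -= c; b -= a; b ^= (a << 10);         \
    c -= a; c -= b; c ^= (b >> 15);         \
}

static uint32_t jenkins_hash(const uint8_t* k, uint32_t length, uint32_t initval)
{
    uint32_t a = 0x9e3779b9;   // the golden ratio; an arbitrary value
    uint32_t b = 0x9e3779b9;
    uint32_t c = initval;      // seed, or a previous hash value
    uint32_t len = length;

    // Handle most of the key, 12 bytes at a time.
    while (len >= 12) {
        a += k[0] + ((uint32_t)k[1] << 8) + ((uint32_t)k[2] << 16) + ((uint32_t)k[3] << 24);
        b += k[4] + ((uint32_t)k[5] << 8) + ((uint32_t)k[6] << 16) + ((uint32_t)k[7] << 24);
        c += k[8] + ((uint32_t)k[9] << 8) + ((uint32_t)k[10] << 16) + ((uint32_t)k[11] << 24);
        MIX(a, b, c);
        k += 12;
        len -= 12;
    }

    // Handle the last 11 bytes; the low-order byte of c holds the length.
    c += length;
    switch (len) {             // all cases fall through
        case 11: c += (uint32_t)k[10] << 24;
        case 10: c += (uint32_t)k[9] << 16;
        case 9:  c += (uint32_t)k[8] << 8;
        case 8:  b += (uint32_t)k[7] << 24;
        case 7:  b += (uint32_t)k[6] << 16;
        case 6:  b += (uint32_t)k[5] << 8;
        case 5:  b += k[4];
        case 4:  a += (uint32_t)k[3] << 24;
        case 3:  a += (uint32_t)k[2] << 16;
        case 2:  a += (uint32_t)k[1] << 8;
        case 1:  a += k[0];
    }
    MIX(a, b, c);
    return c;
}

Note the 12-byte block loop plus the final length fold: this is why the function wants the string length up front, a point that comes up again later in this bug.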
Attached patch changed hash function in cache (obsolete) — Splinter Review
Assignee: nobody → michal
Status: NEW → ASSIGNED
Attachment #335539 - Flags: review?(bzbarsky)
I thought it would make sense to replace not only the cache hash function but also the other identical hash functions with the new one. I've rewritten PL_HashString() and tried to use it everywhere I found the old function. But I'm stuck with nsCRT::HashCode(). There is a version for (char *) and also one for (PRUnichar *), and the hash value should be the same for "SomeText" and L"SomeText". I can achieve this with the new hash function only by discarding the most significant byte of each PRUnichar, and I don't think that is correct. Is it necessary to keep this feature?
Yes, it is, because we hash utf8 and unicode atoms in the same table.
Can we pad the char* out with nulls instead of dropping the MSB of the PRUnichar?  How would performance look?  That code is _very_ perf-sensitive.  It's possible that a better hash function would even improve performance here, of course.  ;)

That said, this is the hashcode computation, right?  So discarding the MSB would just mean slightly more collisions, which might be acceptable for the atom table (almost all atoms are in fact pure-ascii).
Comment on attachment 335539 [details] [diff] [review]
changed hash function in cache

>Index: netwerk/cache/src/nsDiskCacheDevice.cpp
>+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)

Why PLHashNumber and not just PRUint32?

It's probably worth adding a pointer to where the algorithm came from or something.

>+#define hashmix(a,b,c) \

How about:

 static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);

?  Or static inline void, if desired.  Somewhat preferable to using a macro, in my opinion.

>+nsDiskCache::Hash(const char * key, PLHashNumber initval)
>+  PRUint8 *k = (PRUint8*)key;

reinterpret_cast.

>+  PRUint32 a,b,c,len,length;

Spaces after the commas, please?

>+  c = initval;         /* the previous hash value */

It is?  Or is it more of a "seed" value or something?

>+    a += (k[0] +((PRUint32)k[1]<<8) +((PRUint32)k[2]<<16) +((PRUint32)k[3]<<24));

I think this would be more readable (at least in terms of making sure it's correct) with C++-style casts and without the outermost parens, and with spaces around the plus signs:

  a += k[0] + (PRUint32(k[1])<<8) + (PRUint32(k[2])<<16) + (PRUint32(k[3])<<24);

Similar for the other three statements.  Also use C++ casts in the switch.

>+    /* the first byte of c is reserved for the length */

s/first/low-order/

>+/*-------------------------------------------- report the result */

I don't think this comment is needed.  ;)

>Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
>+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);

Where does this last magic number come from?  Why is this the only consumer that needs to pass a second arg in here?  dcamp should probably look at this part of the change.

I was wondering whether it would make sense to stick this somewhere in xpcom (maybe even nsCRT, with a different name from the existing functions) so others can also use it.  Or if we fix nsCRT to use this algorithm, just use that directly here?
(In reply to comment #14)

OK, I'll either pad the char* with 0 or discard the MSB of each PRUnichar. Do the hash functions nsCRT::HashCode() and PL_HashString() have to return the same value for the same input text?
I don't think they need to, no. Though I'd be interested in how the new hash function compares to the ones in https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data easily.
(In reply to comment #17)
> I don't think they need to, no. Though I'd be interested in how the new hash
> function compares to the ones in
> https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data
> easily.
> 

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
Sweet!  Sounds to me like it's definitely worth getting this into NSPR if they'll take it.
(In reply to comment #15)
> (From update of attachment 335539 [details] [diff] [review])
> >Index: netwerk/cache/src/nsDiskCacheDevice.cpp
> >+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)
> 
> Why PLHashNumber and not just PRUint32?

The return value is PLHashNumber, so it seems reasonable for the initial value to have the same type.

>  static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);
> 
> ?  Or static inline void, if desired.  Somewhat preferable to using a macro, in
> my opinion.

OK, I've done a quick perf test and static inline is as fast as the macro. Btw, according to my simple test the new hash function is about 10% slower than the current one.

> >+  c = initval;         /* the previous hash value */
> 
> It is?  Or is it more of a "seed" value or something?

It is a comment from original source, but you are right.

> >Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
> >+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);
> 
> Where does this last magic number come from?

dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

> Why is this the only consumer that needs to pass a second arg in here?

This is the recommended way to obtain a 64-bit hash.

> I was wondering whether it would make sense to stick this somewhere in xpcom
> (maybe even nsCRT, with a different name from the existing functions) so others
> can also use it.  Or if we fix nsCRT to use this algorithm, just use that
> directly here?

I wanted to do this, but I don't like the fact that nsDiskCache::kCurrentVersion would depend on a hash algorithm implemented somewhere else. There should be at least some automated test that fails if kCurrentVersion isn't updated whenever the hash function changes.
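
A minimal sketch of the kind of guard test suggested here: pin the hash of a fixed key to a golden value, so the test fails (and reminds the author to bump kCurrentVersion) whenever the algorithm changes. The function below is the old rotate-XOR hash, and the golden value is the one hashit reported for "ABA/xxx.aba" earlier in this bug; a real test would live next to the disk cache code and call nsDiskCache::Hash instead.

#include <assert.h>
#include <stdint.h>

// Old rotate-XOR string hash (the PL_HashString scheme).
static uint32_t hash_string(const char* s)
{
    uint32_t h = 0;
    for (; *s; ++s)
        h = ((h << 4) | (h >> 28)) ^ (uint8_t)*s;
    return h;
}

int main(void)
{
    // Golden value recorded when kCurrentVersion was last bumped;
    // if the algorithm changes, this assertion fires.
    assert(hash_string("ABA/xxx.aba") == 0x8ffac222);
    return 0;
}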
> Return value is PLHashNumber so it seems reasonable to have initial value of
> the same type.

Return value is PLDHashNumber, actually. I can see doing that.  And yes, the PLHash/PLDHash naming kinda sucks...  Document that the arg should just be any "reasonably random" number and that this arg is there to construct a 64-bit hash?

> dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

Document that this is just a randomly chosen number, then?  It looks very purposeful as things stand.

> but I don't like the fact that nsDiskCache::kCurrentVersion
> depends on hash algorithm that is implemented somewhere else.

That's a really good point.  OK, let's do the disk cache separately and the other consumers separately.

As far as that goes, by the way, padding char* with 0 after every byte should be pretty easy: you just end up with a bunch of guaranteed zeroes in various places in the algorithm and can simplify it somewhat.  Not sure what that does to the randomness of the result.
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #335539 - Attachment is obsolete: true
Attachment #335821 - Flags: review?(bzbarsky)
Attachment #335539 - Flags: review?(bzbarsky)
Comment on attachment 335821 [details] [diff] [review]
patch v2

Please look at the change in DCacheHash(const char * key).
Attachment #335821 - Flags: review?(dcamp)
Comment on attachment 335821 [details] [diff] [review]
patch v2

r+sr=bzbarsky
Attachment #335821 - Flags: superreview+
Attachment #335821 - Flags: review?(bzbarsky)
Attachment #335821 - Flags: review+
(In reply to comment #21)
> As far as that goes, by the way, padding char* with 0 after every byte should
> be pretty easy: you just end up with a bunch of guaranteed zeroes in various
> places in the algorithm and can simplify it somewhat.  Not sure what that does
> to the randomness of the result.

I've added it to the test and this is a result:

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- newhash padded LE
1.134 -- newhash padded BE

LE - little endian (zeroes are inserted after characters)
BE - big endian (zeroes are inserted before characters)

So maybe dropping MSB from unicode is better solution?
Yeah, especially since then we don't have to worry about endianness issues.  Let's do that.
With these changes I had problems building xpcom/tests, browser/components/build, and embedding/browser/gtk/tests. I've changed the Makefile.in files so that everything compiles now, but I'm almost sure this isn't the right way.
Attachment #336502 - Flags: review?(bzbarsky)
Some quick comments:

1)  This will need review from bsmedberg (xpcom peer, build system stuff) and the
    NSPR peers.  I can probably review the content/layout/misc changes.
2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect that
    making it copying as you do is bad for performance.
(In reply to comment #28)
> 2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect
> that making it copying as you do is bad for performance.

Yes, I know, but the problem is that I need to know the length of the string before hashing. I've tried to rewrite the function so that it doesn't need to know the length, but in that form it is circa 2.5x slower.
Hmm...  Could you do some sort of hash object thingie which you basically toss 4-byte-or-less chunks of data into, with a "final chunk" at the end which implements this hash function?  The existing consumers could just quickly toss things in 4 bytes at a time (as they do now, anyway)...

Or is that what you tried?
Attached file performance test
(In reply to comment #30)
> Hmm...  Could you do some sort of hash object thingie which you basically toss
> 4-byte-or-less chunks of data into, with a "final chunk" at the end which
> implements this hash function?  The existing consumers could just quickly toss
> things in 4 bytes at a time (as they do now, anyway)...
> 
> Or is that what you tried?

My first attempt was quite simple, see perf_test_sequentially.cpp in the archive. Then I tried to work with 4 bytes at a time as you proposed (perf_test_class.cpp), but it is still much slower (2x). This implementation works on little endian only and only allows the last piece of hashed data to be less than 4 bytes.

Do "make test" for results.
Attachment #336502 - Flags: review?(wtc)
Attachment #336502 - Flags: review?(benjamin)
Hrm....  It actually looks even worse at -Os, which is what we usually build with.

Would it make any sense to convert other consumers, leave atoms using the existing code or new functions that do what the existing code does, then do some whole-system timing of atoms?  If the new hash function + copy is still better than the level of hash collisions we get now, it's worth it.  Otherwise it's not.
The atom thing would be a followup bug, I suspect.  In fact, we could land the cache changes and move all the other stuff to followups if you'd prefer that.
Sorry for my silly question, but what exactly do you mean by "atom"?
Oh, sorry.  The AtomTable is the consumer of that HashCodeAsUTF8 method.  It relies on hashing UTF-16 and UTF-8 strings and looking up the same nsIAtom objects based on both without copying.  So it uses a non-copying hashcode generator and non-copying comparison function.
Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

In nsprpub/lib/ds/plhash.c, use strlen instead of PL_strlen to
avoid the need to include "plstr.h".  (<string.h> is already
included.)
Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

I did not review NSPR or NSS changes. Please split those changes out into separate patches because they are different projects.

>Index: browser/components/build/Makefile.in

>+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \

This directory uses frozen linkage and cannot link against these libs.

You should probably remove nsCRT::HashCode and put the hash impl in xpcom/glue/nsCRTGlue.{h,cpp}

>Index: embedding/browser/gtk/tests/Makefile.in

>+		$(XPCOM_LIBS) \

This uses frozen-independent linkage and must not link against XPCOM or NSPR.

> PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)

I would be very surprised if we could tolerate this, performance-wise: we hash 16-bit atoms a lot. Have you profiled this code to show that it is not a performance hotspot?


>+  /**
>+   * nsCRT::BufferHashCode implements the same hash algorithm as PL_HashString.
>+   * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
>+   * value as wide strings, e.g. "Hello" and L"Hello".
>+   */

For what set of characters? ASCII-only? This comment needs to be much more precise.

>Index: xpcom/glue/nsTHashtable.cpp

>+#include "nsCRT.h"

nsTHashtable is in the glue and must not include nsCRT.h. I'm surprised it even compiled.

>Index: xpcom/tests/Makefile.in

Similar glue issues here.

>Index: xpcom/tests/TestHash.cpp

>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998
>+ * the Initial Developer. All Rights Reserved.

really?

>+int main(int argc, char** argv)

This test needs to use the standard framework for reporting errors. For best results use TestHarness.h
Attachment #336502 - Flags: review?(wtc)
Attachment #336502 - Flags: review?(bzbarsky)
Attachment #336502 - Flags: review?(benjamin)
Attachment #336502 - Flags: review-
Attached patch xpcom part of new patch (obsolete) — Splinter Review
Attachment #336502 - Attachment is obsolete: true
Attachment #347777 - Flags: review?(benjamin)
Attached patch security part (obsolete) — Splinter Review
Attachment #347778 - Flags: review?(wtc)
Attached patch rdf part (obsolete) — Splinter Review
Attachment #347779 - Flags: review?(benjamin)
Attached patch parser part (obsolete) — Splinter Review
Attachment #347780 - Flags: review?(jst)
Attached patch nspr part (obsolete) — Splinter Review
Attachment #347781 - Flags: review?(wtc)
Attached patch netwerk part (obsolete) — Splinter Review
Attachment #347784 - Flags: review?(bzbarsky)
Attached patch extensions/spellcheck (obsolete) — Splinter Review
I couldn't find any owner/peer of this code. Boris, can you please do the review?
Attachment #347785 - Flags: review?(bzbarsky)
Attached patch embedding part (obsolete) — Splinter Review
Attachment #347786 - Flags: review?(benjamin)
Attached patch content part (obsolete) — Splinter Review
Attachment #347787 - Flags: review?(jst)
Attached patch caps part (obsolete) — Splinter Review
Attachment #347788 - Flags: review?(bzbarsky)
(In reply to comment #37)
> (From update of attachment 336502 [details] [diff] [review])
> I did not review NSPR or NSS changes. Please split those changes out into
> separate patches because they are different projects.

I've split whole patch by modules.

> >Index: browser/components/build/Makefile.in
> 
> >+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \
> 
> This directory uses frozen linkage and cannot link against these libs.
> 
> You should probably remove nsCRT::HashCode and put the hash impl in
> xpcom/glue/nsCRTGlue.{h,cpp}

I did it and it solved all problems with linking...

> > PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
> 
> I would be very surprised if we could tolerate this, performance-wise: we hash
> 16-bit atoms a lot. Have you profiled this code to show that it is not a
> performance hotspot?

I've reverted this change because I couldn't include nsString.h in nsCRTGlue.cpp. I've created a class, NS_HashByBytes, that can hash data fed to it in pieces.
Comment on attachment 347784 [details] [diff] [review]
netwerk part

Why not pass lengths from all the callsites (or heck, have an inline NS_HashCode that takes const nsCString)?  Also, the fact that on OOM we'll return a different hashcode kinda bothers me, but I can live with it, I guess...
Attachment #347785 - Flags: review?(bzbarsky) → review+
Comment on attachment 347788 [details] [diff] [review]
caps part

OK, but again if we had an nsCString version we could use it here.
Attachment #347788 - Flags: review?(bzbarsky) → review+
Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

>diff --git a/xpcom/glue/nsCRTGlue.cpp b/xpcom/glue/nsCRTGlue.cpp

>+/**
>+ * |NS_HashCode| is identical to |PL_HashString|.
>+ */
>+PRUint32 NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+{
>+  if (!str) return 0;
>+
>+  PRUint32 len = strlen(str);
>+
>+  if ( resultingStrLen )
>+    *resultingStrLen = len;
>+
>+  return PL_HashData(str, len, 0);
>+}

You still have a problem of the standalone glue linking against NSPR symbols.

>+PRUint32 NS_HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
>+{
>+  const PRUnichar* s = start;
>+  const PRUnichar* end = start + length;
>+
>+  PRUint16 W1 = 0;      // the first UTF-16 word in a two word tuple
>+  PRUint32 U = 0;       // the current char as UCS-4
>+  int code_length = 0;  // the number of bytes in the UTF-8 sequence for the current char
>+
>+  NS_HashByBytes hash;
>+
>+  PRUint16 W;
>+  while ( s < end )
>+    {
>+      W = *s++;
>+        /*
>+         * On the fly, decoding from UTF-16 (and/or UCS-2) into UTF-8 as per
>+         *  http://www.ietf.org/rfc/rfc2781.txt
>+         *  http://www.ietf.org/rfc/rfc3629.txt
>+         */

is it impossible to reuse our existing UTF8CharEnumerator class (nsUTF8Utils.h)? I don't particularly like duplicating all this logic.

>diff --git a/xpcom/glue/nsCRTGlue.h b/xpcom/glue/nsCRTGlue.h

>+// Computes the hashcode for a c-string, returns the string length as
>+// an added bonus.
>+NS_COM_GLUE PRUint32
>+NS_HashCode(const char* str, PRUint32* resultingStrLen = nsnull);

Please use javadoc-style comments.

>+// Computes the hashcode for a length number of bytes of c-string data.
>+NS_COM_GLUE PRUint32 NS_HashCode(const char* start, PRUint32 length);

nit, function name goes on the next line

> PRUint32
> HashString(const char *str)
> {
>-  PRUint32 code = 0;
>-
>-  while (*str) {
>-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
>-    ++str;
>-  }
>-
>-  return code;
>+  return NS_HashCode(str);
> }

Why keep this function if it just forwards to NS_HashCode? Even if you don't do it in this pass, please file a followup bug to rename all uses of this function to the new version.
Attachment #347777 - Flags: review?(benjamin) → review-
Attachment #347779 - Flags: review?(benjamin) → review+
Attachment #347786 - Flags: review?(benjamin) → review+
(In reply to comment #51)
> >+  return PL_HashData(str, len, 0);
> >+}
> 
> You still have a problem of the standalone glue linking against NSPR symbols.

Hmm, so I need to copy the code from PL_HashData into some function in nsCRTGlue.cpp, right?
Comment on attachment 347781 [details] [diff] [review]
nspr part

Thanks for the patch.  I'm afraid that we can't change
PL_HashString because of backward compatibility.  As your
patch v2 (attachment 335821 [details] [diff] [review]) shows, you need to bump the
kCurrentVersion of nsDiskCache when you change its hash
function.  So similarly, there could be users of
PL_HashString that have stored persistent data based on
the current PL_HashString, and they will be broken when
they upgrade to a new version of NSPR and PL_HashString
produces different hashes.

Please correct me if I'm wrong.

You can add a new function PL_HashString2 (for lack of
better name) to deal with this issue.  You also need to
add the new function(s) to nsprpub/lib/ds/plds.def in a
new NSPR_4.8 section at the end of the file.

Nit: NSPR has a different formatting style from Mozilla.
Please imitate the prevalent formatting style in that
file.  (I can fix that for you when I check in the patch.)
Attachment #347781 - Flags: review?(wtc) → review-
Comment on attachment 347778 [details] [diff] [review]
security part

These changes should be OK.  We'll need to make sure
we only use these two hash functions for in-memory
hash tables that are rebuilt in every process startup.
We need to watch out for the problem I described in
comment 53.
Attachment #347778 - Flags: superreview?(rrelyea)
Comment on attachment 347778 [details] [diff] [review]
security part

r=wtc.  I verified that these two hash functions
nss_item_hash and nss_certificate_hash are only
passed to PL_NewHashTable.
Attachment #347778 - Flags: review?(wtc) → review+
Attachment #347787 - Flags: superreview+
Attachment #347787 - Flags: review?(jst)
Attachment #347787 - Flags: review+
Attachment #347778 - Flags: superreview?(rrelyea) → superreview+
Comment on attachment 347778 [details] [diff] [review]
security part

r+ rrelyea
Attachment #347780 - Flags: superreview?(mrbkap)
Attachment #347780 - Flags: review?(jst)
Attachment #347780 - Flags: review+
Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

+/**
+ * NS_BufferHashCode implements the same hash algorithm as PL_HashString.
+ * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
+ * value as wide strings, e.g. "Hello" and L"Hello". This is true for ASCII and
+ * all other strings where character in narrow string is equal to LSB
+ * of character in wide string.

Do we know for a fact that the above assumption is safe for the atom table? Any valid UTF8 string must hash to the same value as its UTF16 representation for the atom hash table to work right AFAIC.
Attachment #347780 - Flags: superreview?(mrbkap) → superreview+
(In reply to comment #57)
> Do we know for a fact that the above assumption is safe for the atom table? Any
> valid UTF8 string must hash to the same value as its UTF16 representation for
> the atom hash table to work right AFAIC.

The old behaviour was IMHO the same: there was no guarantee that hash values were equal for non-Latin-1 strings.
That is, nsCRT::HashCode(PRUnichar*) and nsCRT::HashCode(char*) only behaved the same for latin1, right?  nsCRT::HashCodeAsUTF8(PRUnichar*) behaved the same as nsCRT::HashCode(char* /* UTF8 */) and that's preserved in this patch.
(In reply to comment #51)
> > PRUint32
> > HashString(const char *str)
> > {
> >-  PRUint32 code = 0;
> >-
> >-  while (*str) {
> >-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
> >-    ++str;
> >-  }
> >-
> >-  return code;
> >+  return NS_HashCode(str);
> > }
> 
> Why keep this function if it just forwards to NS_HashCode? Even if you don't do
> it in this pass, please file a followup bug to rename all uses of this function
> to the new version.

I'll do it within this bug. I've added NS_HashCode(const nsAString &), and now all the HashString() functions can be replaced with NS_HashCode(). Due to this replacement I've found some places that still use the old hash algorithm: JSJ_HashString(), JS_HashString(), js_HashString(). I guess that I must duplicate the code again instead of calling the NSPR hash function, right? Is there the same problem as described in comment #53? I.e., should I create a new function JS_HashString2()?
Have you considered Paul Hsieh's SuperFastHash?
http://www.azillionmonkeys.com/qed/hash.html

I'm also not sure if we need to replace the hash functions
that are used with PLHashTable such as PL_HashString.
PLHashTable handles collisions by chaining, so having
collisions is not disastrous.  It seems that in the extra
time I spend in this new hash function, I can follow a
lot of chaining pointers, so I'm not sure if using the
new hash function is overall a win.
Bug 412345 and bug 435802 are also looking at string hash
functions.
Blocks: 412345
Blocks: 435802
(In reply to comment #61)
> Have you considered Paul Hsieh's SuperFastHash?
> http://www.azillionmonkeys.com/qed/hash.html

I didn't know about this function. It seems to be much faster but I did a quick comparison with https://bugzilla.mozilla.org/attachment.cgi?id=26596 and the result isn't too good.

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- superfasthash

> I'm also not sure if we need to replace the hash functions
> that are used with PLHashTable such as PL_HashString.
> PLHashTable handles collisions by chaining, so having
> collisions is not disastrous.  It seems that in the extra
> time I spend in this new hash function, I can follow a
> lot of chaining pointers, so I'm not sure if using the
> new hash function is overall a win.

Yes, that may be true. Let's try it and see what happens. At least the hash function will now live in only a few places, so trying some other function will be quick and simple.
Michal, can you answer the questions from comment 49?
(In reply to comment #49)
> (From update of attachment 347784 [details] [diff] [review])
> Why not pass lengths from all the callsites (or heck, have an inline
> NS_HashCode that takes const nsCString)?

You mean nsACString, right? It will be changed in a new patch. But it won't be inlined; it didn't compile when I tried to define it as inline.

> Also, the fact that on OOM we'll
> return a different hashcode kinda bothers me, but I can live with it, I
> guess...

I'll fix it. I'll change it so that it uses NS_HashByBytes.
> You mean nsACString, right?

No, I really meant nsCString, assuming all callers have one and assuming this is somewhat "internal" API that doesn't have to worry about external strings.  If not, I suppose we could do an nsACString one instead.

I have no idea what the issue with compiling is there; Benjamin might know...
Attachment #347784 - Flags: review?(bzbarsky)
Comment on attachment 347784 [details] [diff] [review]
netwerk part

Removing review request pending said updated patch.
new hash function is PL_HashString2
Attachment #347778 - Attachment is obsolete: true
Attachment #347781 - Attachment is obsolete: true
Attachment #353743 - Flags: review?(wtc)
Attached patch patch2 v3 all except nss+nspr (obsolete) — Splinter Review
changes in the new version:
- NS_HashCodeAsUTF8 uses UTF16CharEnumerator
- HashString functions were removed and replaced with NS_HashCode
- JS_HashString/JSJ_HashString/js_HashString changed to new hash function
- PL_HashString usage replaced with PL_HashString2
- StringHash in nsHttp.cpp uses NS_HashByBytes instead of NS_strdup+NS_HashCode
- NS_HashCode(const char*, PRUint32*) is exported as a C function. NS_HashCode is needed in pldhash.c, and I found this solution better than creating a new function just for this usage.
Attachment #347777 - Attachment is obsolete: true
Attachment #347779 - Attachment is obsolete: true
Attachment #347780 - Attachment is obsolete: true
Attachment #347784 - Attachment is obsolete: true
Attachment #347785 - Attachment is obsolete: true
Attachment #347786 - Attachment is obsolete: true
Attachment #347787 - Attachment is obsolete: true
Attachment #347788 - Attachment is obsolete: true
Attachment #353746 - Flags: review?(bzbarsky)
Attachment #353746 - Flags: review?(benjamin)
(In reply to comment #66)
> No, I really meant nsCString, assuming all callers have one and assuming this
> is somewhat "internal" API that doesn't have to worry about external strings. 
> If not, I suppose we could do an nsACString one instead.

NS_HashCode(const nsACString &) is now needed because all HashString(...) calls were replaced by NS_HashCode(...). So the question now is whether it's worth having an inlined NS_HashCode(const nsCString&) to speed up the calls in a few places.

> I have no idea what the issue with compiling is there; Benjamin might know...

This is the error when the function is inlined:
/usr/bin/ld: TestHash: hidden symbol `NS_HashCode(nsCString const&)' isn't defined
/usr/bin/ld: final link failed: Nonrepresentable section on output

Benjamin, do you have an idea what's wrong?
nsACString == nsCSubstring nowadays, so I don't think there's any benefit to making the API use nsCString& instead of nsACString&

jsdhash is C++ now... please make pldhash be C++ also, and we can get rid of the extern "C" marking. In particular, you have multiple functions with the same name, one of which is extern "C"... this is a recipe for future type mismatches.
Attachment #353746 - Attachment is obsolete: true
Attachment #353867 - Flags: review?(benjamin)
Attachment #353746 - Flags: review?(bzbarsky)
Attachment #353746 - Flags: review?(benjamin)
Attachment #353867 - Flags: review?(bzbarsky)
>+++ b/xpcom/ds/nsStaticNameTable.cpp
> caseInsensitiveStringHashKey(PLDHashTable *table, const void *key)
>+        nsAutoString keyCopy(tableKey->mKeyStr.m2b->get());

This is what this code used to do before it was rewritten to work as it does now.  The result showed up in CSS parsing profiles.  I'd rather not change this code as part of a huge patch; changing it separately, so that the performance impact can be gauged, including on the microbenchmarks used in the bug that added this code, is better.

>+++ b/xpcom/glue/nsCRTGlue.cpp
>+NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+  if ( resultingStrLen )

Ditch the spaces around resultingStrLen?

>+NS_HashCode(const PRUnichar* str, PRUint32* resultingStrLen)
>+  const PRUnichar* k = str;
>+  while (*k)
>+    k++;

NS_strlen?

>+  if ( resultingStrLen )

Ditch the spaces.

>+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)

The naming here is a bit weird... NS_StrHashCode?

I'm not seeing an implementation of the NS_HashCode that takes char* and a length.  Where should I be looking?

Also, did you use the sed script to create pldhash.cpp?  Or did you edit it manually?
(In reply to comment #73)
> >+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)
> 
> The naming here is a bit weird... NS_StrHashCode?

It was nsCRT::BufferHashCode, so I kept the name. But I can't see any reason why the name shouldn't also be NS_HashCode; this is just the PRUnichar version of NS_HashCode(char *, int). But I can change it to NS_StrHashCode if you wish.

> I'm not seeing an implementation of the NS_HashCode that takes char* and a
> length.  Where should I be looking?

It is just after NS_BufferHashCode (now renamed to NS_HashCode too) and before NS_HashByBytes::NS_HashByBytes.

> Also, did you use the sed script to create pldhash.cpp?  Or did you edit it
> manually?

I did the changes manually, but I've updated (and tested) the sed script. However, somebody else already made a change that isn't in the script.

$ diff -w pldhash.cpp pldhash.cpp_sed 
72c72
< #define JSDHASH_SINGLE_LINE_ASSERTION PR_ASSERT
---
> #define JSDHASH_ONELINE_ASSERT PR_ASSERT
Attachment #353867 - Attachment is obsolete: true
Attachment #353911 - Flags: review?(bzbarsky)
Attachment #353867 - Flags: review?(bzbarsky)
Attachment #353867 - Flags: review?(benjamin)
Attachment #353911 - Flags: review?(benjamin)
Attachment #353911 - Flags: review?(bzbarsky) → review+
Comment on attachment 353911 [details] [diff] [review]
patch2 v5 - changes according to comment #73

NS_HashCode(PRUnichar*) sounds fine to me.  I thought for some reason that there was a void* overload too; no idea what made me think that.

> But somebody else already did some change that isn't in the script.

Yeah, that was me forgetting to run the script after changing it in response to some review comments.  :(  Just run the script and include that change in what you check in, please.

r=bzbarsky
Comment on attachment 353913 [details] [diff] [review]
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp

I'd like to see those benchmark results before reviewing this.
(In reply to comment #77)
> (From update of attachment 353913 [details] [diff] [review])
> I'd like to see those benchmark results before reviewing this.

What bug contains those benchmarks? I couldn't find it. I tried to profile caseInsensitiveStringHashKey using callgrind.

without patch 1021712 Ir
Ir     Count
988730 5497  caseInsensitiveStringHashKey (self)
 27282 4547  nsString::get()
  5700  950  nsCString::get()


with patch 7245502 Ir
Ir      Count
 935904 5497  caseInsensitiveStringHashKey (self)
2982761 4547  nsAutoString::nsAutoString(unsigned short const*, unsigned)
1970683 4547  NS_HashCode(unsigned short const *, unsigned)
 416645  950  NS_HashCode(char const *, unsigned)
 378815  950  nsCAutoString::nsCAutoString(char const*, unsigned)
 291008 4547  nsAutoString::~nsAutoString()
 145504 4547  nsAString_internal::BeginWriting()
  60800  950  nsCAutoString::~nsCAutoString()
  30400  950  nsACString_internal::BeginWriting()
  27282 4547  nsString::get()
   5700  950  nsCString::get()
Hmm.  I thought I'd linked to testcases from bug 348691.

How about this:

<body style="display: none">
  <script>
    var s = document.body.style;
    for (var i = 0; i < 1000000; ++i) {
      s.setPropertyValue("left", "auto");
    }
  </script>
</body>

Or just timing the parsing of a largish CSS file, for that matter.
Simple measurement using timestamps gives (1000000x setProperty):
 - without patch 48550ms
 - with patch 49506ms

Time spent in caseInsensitiveStringHashKey using callgrind (10000x setProperty):
 - without patch 1721124 Ir
 - with patch 19109906 Ir
So this change significantly slows down caseInsensitiveStringHashKey but in that testcase only about 2% of the total cost is in that function?  That sounds about right to me, actually.  The cost would be higher in a stylesheet parse, though probably not over 5%.
You can't land attachment 353911 [details] [diff] [review] without 353913, right? Is there a practical reason we can't avoid the string copy in the caseInsensitiveStringHashKey case, kinda the same way it works today, by hashing it a character at a time using the same algorithm?
(In reply to comment #81)
> So this change significantly slows down caseInsensitiveStringHashKey but in
> that testcase only about 2% of the total cost is in that function?

Yes, valgrind shows about 1.3% in caseInsensitiveStringHashKey.
(In reply to comment #82)
> You can't land attachment 353911 [details] [diff] [review] without 353913, right?

I think that 353913 isn't needed for 353911. I guess it is OK to keep the old hash function only in nsStaticCaseInsensitiveNameTable...

> Is there a practical
> reason we can't avoid the string copy in the caseInsensitiveStringHashKey case,
> kinda the same way it works today, by hashing it a character at a time using
> the same algorithm?

Good point. This new patch uses NS_HashByBytes and is faster: the same test as in comment #80 gives 48930ms and 10591143 Ir.
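
For clarity, a sketch of the approach comment #82 suggested and this new patch takes: fold case one character at a time and feed each byte into the incremental hasher, avoiding the nsAutoString/nsCAutoString copies of the previous patch. It reuses the IncrementalHash sketch from earlier in this bug; the names, the uint16_t stand-in for PRUnichar, and the ASCII-only case folding are illustrative assumptions, not the actual NS_HashByBytes interface.

// Hash a PRUnichar string case-insensitively without copying it first.
static uint32_t CaseInsensitiveHash(const uint16_t* key, uint32_t length)
{
    IncrementalHash hash;                // sketch class shown earlier in this bug
    for (uint32_t i = 0; i < length; ++i) {
        uint16_t ch = key[i];
        if (ch >= 'A' && ch <= 'Z')      // ASCII-only folding; assumes the static
            ch += 0x20;                  // name tables contain ASCII names only
        uint8_t low = uint8_t(ch);       // drop the MSB, as elsewhere in the patch
        hash.Add(&low, 1);
    }
    return hash.Done();
}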
Attachment #353913 - Attachment is obsolete: true
Attachment #355597 - Flags: review?(benjamin)
Attachment #353913 - Flags: review?(bzbarsky)
Still almost 10x slower than the old code, no?  At least according to Ir?
(In reply to comment #85)
> Still almost 10x slower than the old code, no?  At least according to Ir?

Of course. I meant that it is faster than the previous patch, not than the original code...
Attachment #355597 - Flags: review?(benjamin) → review+
Attachment #335821 - Flags: review?(dcamp) → review+
Keywords: checkin-needed
Whiteboard: only patch #335821 should be checked in now
Attachment #353911 - Flags: review?(benjamin) → review+
I'm now investigating bug #321361, and it seems to be similar to #355567. Checking in patch #335821 would make reproducing that bug very hard, so I'm postponing the checkin until bug #321361 is fixed and properly tested...
Depends on: 321361
Keywords: checkin-needed
Whiteboard: only patch #335821 should be checked in now
Blocks: 441052
OS: Linux → All
Hardware: x86 → All
Keywords: checkin-needed
Whiteboard: only patch #335821 should be checked in now
Could someone post, or point to, the specific hash implementation here that is the reason for the collisions?  It would be good to pre-test files as cacheable before deploying sites.
Comment on attachment 335821 [details] [diff] [review]
patch v2


Could you update this CVS patch to a current hg one? Thanks.
Whiteboard: only patch #335821 should be checked in now → only patch #368619 should be checked in now
Comment on attachment 368619 [details] [diff] [review]
patch v2 - diff against hg rev. 4b2a90a7726e
[Checkin: Comment 94]


http://hg.mozilla.org/mozilla-central/rev/999999125311
Attachment #368619 - Attachment description: patch v2 - diff against hg rev. 4b2a90a7726e → patch v2 - diff against hg rev. 4b2a90a7726e [Checkin: Comment 94]
Keywords: checkin-needed
Whiteboard: only patch #368619 should be checked in now
Target Milestone: --- → mozilla1.9.2a1
Blocks: 489882
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Could one of you please do the review?
Attachment #353743 - Flags: review?(ted.mielczarek)
Michal, have you tried mailing wtc directly?
Yes, a few months ago...
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Hi Michal,

Sorry about the late response.  I didn't have time to follow
this bug closely.  Part of the reason is that this bug doesn't
seem to affect NSPR or NSS that much.

PL_HashString is an extremely simple hash function, whereas
PL_HashString2 is quite complicated.  This has two implications.

1. We'd need to give guidance to NSPR users on which string
hash function is appropriate for what purposes.

2. PLHashTable resolves hash collisions by chaining.  I guess
the advantage of PL_HashString2 is that it reduces hash collisions,
but intuitively with the extra computation performed by
PL_HashString2, I could follow a lot of chaining links in the
same amount of time.  So it's not clear PL_HashString2 is a win
for PLHashTable.

I know you want to close this bug, so I propose the following:

1. I delegate to a Mozilla developer I trust such as bz the review
of your patch.  Then I just need some comments in plhash.h to
tell NSPR users which string hash function should be used for
what purposes.

or

2. Abandon this patch for nspr+nss.

Thanks.
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

I'm not at all qualified to review this. Please follow wtc's suggestions.
Attachment #353743 - Flags: review?(ted.mielczarek)
Blocks: 175600
Blocks: 193911
Blocks: http_cache
(In reply to comment #98)
> (From update of attachment 353743 [details] [diff] [review])
> Hi Michal,
...
> I know you want to close this bug, so I propose the following:
> 
> 1. I delegate to a Mozilla developer I trust such as bz the review
> of your patch.  Then I just need some comments in plhash.h to
> tell NSPR users which string hash function should be used for
> what purposes.

So Michal, why don't you request review from bzbarsky?
Blocks: 216490
Renaming since the HTTP cache part of this bug already landed (comment 94)
Summary: Some files are never cached due to hash collisions which are quite common due to weak string hash function → Provide alternative to NSPR's default string hash function
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Seeing if bz will take review as WTC requested.  Otherwise this bug is WONTFIX, yes?
Attachment #353743 - Flags: review?(wtc) → review?(bzbarsky)
> Renaming since the HTTP cache part of this bug already landed (comment 94)

Imho this bug should have been resolved fixed at that point, with followup bugs filed for followup work.  One bug per issue....  Can we please do that now?

I can do the review, but it'll likely be a week or more before I get to it; lots of other stuff in the queue ahead of it.
No longer blocks: http_cache
Renaming back to original issue, which has landed, and marking FIXED.  Michal, if you like please open a followup NSPR bug with patch for bz.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: Provide alternative to NSPR's default string hash function → Some files are never cached due to hash collisions which are quite common due to weak string hash function
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Wrong version of the patch was submitted for review.
Attachment #353743 - Flags: review?(bzbarsky)
Blocks: 729940