Last Comment Bug 290032 - Some files are never cached due to hash collisions which are quite common due to weak string hash function
: Some files are never cached due to hash collisions which are quite common due...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All All
: -- normal with 4 votes (vote)
: mozilla1.9.2a1
Assigned To: Michal Novotny (:michal)
:
Mentors:
http://nf.opoint.com/main/ff_cache.html
: 338190 484182 533701 (view as bug list)
Depends on: 310656 321361
Blocks: 412345 435802 175600 193911 216490 441052 489882 729940
  Show dependency treegraph
 
Reported: 2005-04-12 03:54 PDT by Tomas A. Haavet
Modified: 2012-02-23 07:32 PST (History)
36 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
HTTP log when reproducing the bug (217.54 KB, text/plain)
2005-05-16 07:03 PDT, Tomas A. Haavet
no flags Details
changed hash function in cache (5.42 KB, patch)
2008-08-26 09:30 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch v2 (6.07 KB, patch)
2008-08-27 18:25 PDT, Michal Novotny (:michal)
bzbarsky: review+
dcamp: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
patch for plhash, nsCRT and others (31.95 KB, patch)
2008-09-02 09:24 PDT, Michal Novotny (:michal)
benjamin: review-
Details | Diff | Splinter Review
performance test (2.03 KB, application/x-compressed-tar)
2008-09-03 09:36 PDT, Michal Novotny (:michal)
no flags Details
xpcom part of new patch (32.06 KB, patch)
2008-11-12 07:43 PST, Michal Novotny (:michal)
benjamin: review-
Details | Diff | Splinter Review
security part (1.61 KB, patch)
2008-11-12 07:45 PST, Michal Novotny (:michal)
wtc: review+
rrelyea: superreview+
Details | Diff | Splinter Review
rdf part (2.12 KB, patch)
2008-11-12 07:48 PST, Michal Novotny (:michal)
benjamin: review+
Details | Diff | Splinter Review
parser part (633 bytes, patch)
2008-11-12 07:50 PST, Michal Novotny (:michal)
jst: review+
mrbkap: superreview+
Details | Diff | Splinter Review
nspr part (3.43 KB, patch)
2008-11-12 07:53 PST, Michal Novotny (:michal)
wtc: review-
Details | Diff | Splinter Review
netwerk part (2.97 KB, patch)
2008-11-12 07:55 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
extensions/spellcheck (753 bytes, patch)
2008-11-12 07:59 PST, Michal Novotny (:michal)
bzbarsky: review+
Details | Diff | Splinter Review
embedding part (804 bytes, patch)
2008-11-12 08:01 PST, Michal Novotny (:michal)
benjamin: review+
Details | Diff | Splinter Review
content part (1.57 KB, patch)
2008-11-12 08:04 PST, Michal Novotny (:michal)
jst: review+
jst: superreview+
Details | Diff | Splinter Review
caps part (638 bytes, patch)
2008-11-12 08:06 PST, Michal Novotny (:michal)
bzbarsky: review+
Details | Diff | Splinter Review
patch2 v3 nspr+nss (7.03 KB, patch)
2008-12-18 15:52 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch2 v3 all except nss+nspr (75.30 KB, patch)
2008-12-18 16:09 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch2 v4 - pldhash.c changed to C++ (79.02 KB, patch)
2008-12-19 12:30 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch2 v5 - changes according to comment #73 (77.35 KB, patch)
2008-12-19 16:38 PST, Michal Novotny (:michal)
bzbarsky: review+
benjamin: review+
Details | Diff | Splinter Review
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp (1.60 KB, patch)
2008-12-19 16:42 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp v2 (1.47 KB, patch)
2009-01-06 09:10 PST, Michal Novotny (:michal)
benjamin: review+
Details | Diff | Splinter Review
patch v2 - diff against hg rev. 4b2a90a7726e [Checkin: Comment 94] (5.44 KB, patch)
2009-03-20 17:23 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review

Description Tomas A. Haavet 2005-04-12 03:54:57 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0

It seems that the files ListBar.js, ListBar.css, MenuBar.js, and MenuBar.css are
never cached.  On page page http://nf.opoint.com/main/ff_cache.html there are
other css- and js-files that are cached, but never the ones named ListBar and
MenuBar.  If I rename these files, they get cached, e.g. List.css is a copy of
ListBar.css.


Reproducible: Always

Steps to Reproduce:
1. Open url
2. Reload page (e.g. klikk on the reload-link)
3. With each page reload the four files ListBar.js, ListBar.css, MenuBar.js, and
MenuBar.css get fetched from the server (use e.g. LiveHTTPHeaders)

Actual Results:  
The files are not cached

Expected Results:  
They should be cached as the other js- and css-files

Also tested on:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217
Comment 1 Boris Zbarsky [:bz] (TPAC) 2005-05-15 22:34:49 PDT
Could you create a log as described at
http://www.mozilla.org/projects/netlib/http/http-debugging.html and attach it to
this bug?
Comment 2 Tomas A. Haavet 2005-05-16 07:03:34 PDT
Created attachment 183731 [details]
HTTP log when reproducing the bug

Browser used: Mozilla 1.8b2, Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2)
Gecko/20050516
1. Cleared the browser cache before loading the page.
2. Reloaded the page (by clicking on the reload-link)
3. The log file verifies that the files MenuBar.css, ListBar.css, MenuBar.js,
and ListBar.js are not properly cached (they always report 'got cache entry
[access=2]')
Comment 3 Boris Zbarsky [:bz] (TPAC) 2005-05-16 08:54:10 PDT
Confirming based on that log.
Comment 4 Darin Fisher 2006-06-21 15:08:42 PDT
-> default owner
Comment 5 Michal Novotny (:michal) 2008-07-17 16:47:49 PDT
Problem is that cache entry can't be opened due to hash collision.

Breakpoint 1, nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
422             *collision = PR_TRUE;
(gdb) bt 10
#0  nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
#1  0x0105fec1 in nsCacheService::SearchCacheDevices (this=0xb791e6b0, key=0xb6d17770, policy=0, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1540
#2  0x01062e5c in nsCacheService::ActivateEntry (this=0xb791e6b0, request=0xb6d1ef20, result=0xbfb9c958)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1452
#3  0x01062394 in nsCacheService::ProcessRequest (this=0xb791e6b0, request=0xb6d1ef20, calledFromOpenCacheEntry=1, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1340
#4  0x01063161 in nsCacheService::OpenCacheEntry (session=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, listener=0x0, 
    result=0xb33afb20) at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1420
#5  0x01065cab in nsCacheSession::OpenCacheEntry (this=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheSession.cpp:106
#6  0x010c1308 in nsHttpChannel::OpenCacheEntry (this=0xb33afa20, offline=0, delayed=0xbfb9cac8)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:1417
#7  0x010c626c in nsHttpChannel::Connect (this=0xb33afa20, firstTime=1)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:286
#8  0x010c697f in nsHttpChannel::AsyncOpen (this=0xb33afa20, listener=0xb6d14dc0, context=0x0)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:3745
#9  0x013652b2 in CSSLoaderImpl::LoadSheet (this=0xb5f780c0, aLoadData=0xb01afe20, aSheetState=eSheetNeedsParser)
    at /opt/moz/TRUNK_new3/mozilla/layout/style/nsCSSLoader.cpp:1454
(More stack frames follow...)
(gdb) l
417         nsDiskCacheEntry * diskEntry = mCacheMap.ReadDiskCacheEntry(&record);
418         if (!diskEntry) return nsnull;
419         
420         // compare key to be sure
421         if (strcmp(diskEntry->Key(), key->get()) != 0) {
422             *collision = PR_TRUE;
423             return nsnull;
424         }
425         
426         nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
(gdb) p diskEntry->Key()
$1 = 0xb44f7624 "HTTP:http://localhost/bug338190/3/XSL/bss.xsl"
(gdb) p *key
$2 = {<nsACString_internal> = {<nsCSubstring_base> = {<No data fields>}, mData = 0xb4677948 "HTTP:http://localhost/bug338190/3/CSS/bss.css", 
    mLength = 45, mFlags = 5}, <No data fields>}
(gdb) p/x hashNumber
$3 = 0xc4b04670


Hash function http://mxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDevice.cpp#240 that is used in mozilla code a lot is really weak and collisions are quite common.

$ ./hashit "ABA/xxx.aba"
8ffac222
$ ./hashit "XyZ/xxx.xYz"
8ffac222
$ ./hashit "CSS/xxx.css"
8ffac222
$ ./hashit "JPG/xxx.jpg"
8ffac222


$ ./hashit modules_newsfeeds/MenuBar/MenuBar.css
15c23729
$ ./hashit modules_newsfeeds/ListBar/ListBar.css
15c23729

$ ./hashit modules_newsfeeds/MenuBar/MenuBar.js
a15c23e5
$ ./hashit modules_newsfeeds/ListBar/ListBar.js
a15c23e5


IMHO the hash function should be changed.
Comment 6 benc 2008-07-20 08:28:55 PDT
Michal: that is very interesting. Can you update the summary?
Comment 7 Boris Zbarsky [:bz] (TPAC) 2008-07-20 18:37:52 PDT
We've run into issues with collisions in this hash function before.  Changing it sounds like a great idea.
Comment 8 Michal Novotny (:michal) 2008-08-07 14:58:58 PDT
*** Bug 338190 has been marked as a duplicate of this bug. ***
Comment 9 Michal Novotny (:michal) 2008-08-11 05:43:46 PDT
(In reply to comment #7)
> We've run into issues with collisions in this hash function before.  Changing
> it sounds like a great idea.
> 

I've done some search and this hash function seems quite good for this purpose:

http://burtleburtle.net/bob/hash/evahash.html

What do you think?
Comment 10 Boris Zbarsky [:bz] (TPAC) 2008-08-19 20:47:51 PDT
Sorry it took me so long to reply; I was out for a week.

At first glance, that hash function looks pretty reasonable, particularly for hashing mostly-ascii strings as we are.

Let's give it a shot.  As long as it doesn't impact performance negatively, it's got to be a win in terms of collision-avoidance.
Comment 11 Michal Novotny (:michal) 2008-08-26 09:30:25 PDT
Created attachment 335539 [details] [diff] [review]
changed hash function in cache
Comment 12 Michal Novotny (:michal) 2008-08-26 11:19:43 PDT
I thought that it make sense to replace not only cache hash function but also other same hash functions with the new one. I've rewritten PL_HashString() and tried to use it everywhere, where I've found the old function. But I'm stuck with nsCRT::HashCode(). There is version for (char *) and also for (PRUnichar *). Hash value should be the same for "SomeText" and L"SomeText". I can do this with the new hash function only by discarding most significant byte of PRUnichar and I don't think that it is correct. Is it necessary to keep this feature?
Comment 13 Benjamin Smedberg [:bsmedberg] 2008-08-26 11:24:14 PDT
Yes, it is, because we hash utf8 and unicode atoms in the same table.
Comment 14 Boris Zbarsky [:bz] (TPAC) 2008-08-26 11:54:00 PDT
Can we pad the char* out with nulls instead of dropping the MSB of the PRUnichar?  How would performance look?  That code is _very_ perf-sensitive.  It's possible that a better hash function would even improve performance here, of course.  ;)

That said, this is the hashcode computation, right?  So discarding the MSB would just mean slightly more collisions, which might be acceptable for the atom table (almost all atoms are in fact pure-ascii).
Comment 15 Boris Zbarsky [:bz] (TPAC) 2008-08-26 12:42:27 PDT
Comment on attachment 335539 [details] [diff] [review]
changed hash function in cache

>Index: netwerk/cache/src/nsDiskCacheDevice.cpp
>+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)

Why PLHashNumber and not just PRUint32?

It's probably worth adding a pointer to where the algorithm came from or something.

>+#define hashmix(a,b,c) \

How about:

 static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);

?  Or static inline void, if desired.  Somewhat preferable to using a macro, in my opinion.

>+nsDiskCache::Hash(const char * key, PLHashNumber initval)
>+  PRUint8 *k = (PRUint8*)key;

reinterpret_cast.

>+  PRUint32 a,b,c,len,length;

Spaces after the commas, please?

>+  c = initval;         /* the previous hash value */

It is?  Or is it more of a "seed" value or something?

>+    a += (k[0] +((PRUint32)k[1]<<8) +((PRUint32)k[2]<<16) +((PRUint32)k[3]<<24));

I think this would be more readable (at least in terms of making sure it's correct) with C++-style casts and withoutthe outermost parens, and with spaces around the plus signs:

  a += k[0] + (PRUint32(k[1])<<8) + (PRUint32(k[2])<<16) + (PRUint32(k[3])<<24);

Similar for the other three statements.  Also use C++ casts in the switch.

>+    /* the first byte of c is reserved for the length */

s/first/low-order/

/*-------------------------------------------- report the result */

I don't think this comment is needed.  ;)

>Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
>+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);

Where does this last magic number come from?  Why is this the only consumer that needs to pass a second arg in here?  dcamp should probably look at this part of the change.

I was wondering whether it would make sense to stick this somewhere in xpcom (maybe even nsCRT, with a different name from the existing functions) so others can also use it.  Or if we fix nsCRT to use this algorithm, just use that directly here?
Comment 16 Michal Novotny (:michal) 2008-08-26 13:08:12 PDT
(In reply to comment #14)

OK, I'll either pad char* with 0 or discard MSB in PRUnichar. Do hash functions nsCRT::HashCode() and PL_HashString() have to return same value for the same input text?
Comment 17 Boris Zbarsky [:bz] (TPAC) 2008-08-26 13:19:31 PDT
I don't think they need to, no. Though I'd be interested in how the new hash function compares to the ones in https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data easily.
Comment 18 Michal Novotny (:michal) 2008-08-26 13:36:16 PDT
(In reply to comment #17)
> I don't think they need to, no. Though I'd be interested in how the new hash
> function compares to the ones in
> https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data
> easily.
> 

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
Comment 19 Boris Zbarsky [:bz] (TPAC) 2008-08-26 14:17:15 PDT
Sweet!  Sounds to me like it's definitely worth getting this into NSPR if they'll take it.
Comment 20 Michal Novotny (:michal) 2008-08-26 15:45:19 PDT
(In reply to comment #15)
> (From update of attachment 335539 [details] [diff] [review])
> >Index: netwerk/cache/src/nsDiskCacheDevice.cpp
> >+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)
> 
> Why PLHashNumber and not just PRUint32?

Return value is PLHashNumber so it seems reasonable to have initial value of the same type.

>  static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);
> 
> ?  Or static inline void, if desired.  Somewhat preferable to using a macro, in
> my opinion.

OK, I've done a quick perf test and static inline is as fast as macro. Btw according to my simple test this new hash function is about 10% slower than the current one.

> >+  c = initval;         /* the previous hash value */
> 
> It is?  Or is it more of a "seed" value or something?

It is a comment from original source, but you are right.

> >Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
> >+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);
> 
> Where does this last magic number come from?

dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

> Why is this the only consumer that needs to pass a second arg in here?

This is recommended way to obtain 64-bit hash.

> I was wondering whether it would make sense to stick this somewhere in xpcom
> (maybe even nsCRT, with a different name from the existing functions) so others
> can also use it.  Or if we fix nsCRT to use this algorithm, just use that
> directly here?

I wanted to do this but I don't like the fact that nsDiskCache::kCurrentVersion depends on hash algorithm that is implemented somewhere else. There should be at least some automated test that would fail if kCurrentVersion isn't updated whenever hash function changes.
Comment 21 Boris Zbarsky [:bz] (TPAC) 2008-08-26 18:57:18 PDT
> Return value is PLHashNumber so it seems reasonable to have initial value of
> the same type.

Return value is PLDHashNumber, actually. I can see doing that.  And yes, the PLHash/PLDHash naming kinda sucks...  Document that the arg should just be any "reasonably random" number and that this arg is there to construct a 64-bit hash?

> dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

Document that this is just a randomly chosen number, then?  It looks very purposeful as things stand.

> but I don't like the fact that nsDiskCache::kCurrentVersion
> depends on hash algorithm that is implemented somewhere else.

That's a really good point.  OK, let's do disk cache separately and the other separately.

As far as that goes, by the way, padding char* with 0 after every byte should be pretty easy: you just end up with a bunch of guaranteed zeroes in various places in the algorithm and can simplify it somewhat.  Not sure what that does to the randomness of the result.
Comment 22 Michal Novotny (:michal) 2008-08-27 18:25:03 PDT
Created attachment 335821 [details] [diff] [review]
patch v2
Comment 23 Michal Novotny (:michal) 2008-08-27 18:29:09 PDT
Comment on attachment 335821 [details] [diff] [review]
patch v2

Please, look at the change in DCacheHash(const char * key).
Comment 24 Boris Zbarsky [:bz] (TPAC) 2008-08-28 06:19:10 PDT
Comment on attachment 335821 [details] [diff] [review]
patch v2

r+sr=bzbarsky
Comment 25 Michal Novotny (:michal) 2008-08-28 16:07:47 PDT
(In reply to comment #21)
> As far as that goes, by the way, padding char* with 0 after every byte should
> be pretty easy: you just end up with a bunch of guaranteed zeroes in various
> places in the algorithm and can simplify it somewhat.  Not sure what that does
> to the randomness of the result.

I've added it to the test and this is a result:

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- newhash padded LE
1.134 -- newhash padded BE

LE - little endian (zeroes are inserted after characters)
BE - big endian (zeroes are inserted before characters)

So maybe dropping MSB from unicode is better solution?
Comment 26 Boris Zbarsky [:bz] (TPAC) 2008-08-28 17:33:08 PDT
Yeah, especially since then we don't have to worry about endianness issues.  Let's do that.
Comment 27 Michal Novotny (:michal) 2008-09-02 09:24:29 PDT
Created attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

With these changes I've had problems building xpcom/tests, browser/components/build and embedding/browser/gtk/tests. I've changed Makefile.in so that it compiles now, but I'm almost sure that this isn't the right way.
Comment 28 Boris Zbarsky [:bz] (TPAC) 2008-09-02 09:47:21 PDT
Some quick comments:

1)  This will need review from bsmedberg (xpcom peer, build system stuff) and the
    NSPR peers.  I can probably review the content/layout/misc changes.
2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect that
    making it copying as you do is bad for performance.
Comment 29 Michal Novotny (:michal) 2008-09-02 15:23:02 PDT
(In reply to comment #28)
> 2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect
> that making it copying as you do is bad for performance.

Yes I know, but the problem is that I need to know the length of the string before hashing. I've tried to rewrite the function so that it doesn't need to know the length, but in that form it is cca 2.5x slower.
Comment 30 Boris Zbarsky [:bz] (TPAC) 2008-09-02 17:53:00 PDT
Hmm...  Could you do some sort of hash object thingie which you basically toss 4-byte-or-less chunks of data into, with a "final chunk" at the end which implements this hash function?  The existing consumers could just quickly toss things in 4 bytes at a time (as they do now, anyway)...

Or is that what you tried?
Comment 31 Michal Novotny (:michal) 2008-09-03 09:36:36 PDT
Created attachment 336665 [details]
performance test

(In reply to comment #30)
> Hmm...  Could you do some sort of hash object thingie which you basically toss
> 4-byte-or-less chunks of data into, with a "final chunk" at the end which
> implements this hash function?  The existing consumers could just quickly toss
> things in 4 bytes at a time (as they do now, anyway)...
> 
> Or is that what you tried?

My first attempt was quite simple, see perf_test_sequentially.cpp in the archive. Then I tried to work with 4 bytes as you've proposed (perf_test_class.cpp), but it is still much slower (2x). This implementation works on little endian only and accepts only the last piece of hased data to be less than 4 bytes.

Do "make test" for results.
Comment 32 Boris Zbarsky [:bz] (TPAC) 2008-09-03 20:15:52 PDT
Hrm....  It actually looks even worse at -Os, which is what we usually build with.

Would it make any sense to convert other consumers, leave atoms using the existing code or new functions that do what the existing code does, then do some whole-system timing of atoms?  If the new hash function + copy is still better than the level of hash collisions we get now, it's worth it.  Otherwise it's not.
Comment 33 Boris Zbarsky [:bz] (TPAC) 2008-09-03 20:16:46 PDT
The atom thing would be a followup bug, I suspect.  In fact, we could land the cache changes and move all the other stuff to followups if you'd prefer that.
Comment 34 Michal Novotny (:michal) 2008-09-04 08:21:59 PDT
Sorry for my silly question, but what exactly do you mean by "atom"?
Comment 35 Boris Zbarsky [:bz] (TPAC) 2008-09-04 09:04:09 PDT
Oh, sorry.  The AtomTable is the consumer of that HashCodeAsUTF8 method.  It relies on hashing UTF-16 and UTF-8 strings and looking up the same nsIAtom objects based on both without copying.  So it uses a non-copying hashcode generator and non-copying comparison function.
Comment 36 Wan-Teh Chang 2008-09-05 00:07:03 PDT
Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

In nsprpub/lib/ds/plhash.c, use strlen instead of PL_strlen to
avoid the need to include "plstr.h".  (<string.h> is already
included.)
Comment 37 Benjamin Smedberg [:bsmedberg] 2008-09-09 07:13:22 PDT
Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

I did not review NSPR or NSS changes. Please split those changes out into separate patches because they are different projects.

>Index: browser/components/build/Makefile.in

>+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \

This directory uses frozen linkage and cannot link against these libs.

You should probably remove nsCRT::HashCode and put the hash impl in xpcom/glue/nsCRTGlue.{h,cpp}

>Index: embedding/browser/gtk/tests/Makefile.in

>+		$(XPCOM_LIBS) \

This uses frozen-independent linkage and must not link against XPCOM or NSPR.

> PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)

I would be very surprised if we could tolerate this, performance-wise: we hash 16-bit atoms a lot. Have you profiled this code to show that it is not a performance hotspot?


>+  /**
>+   * nsCRT::BufferHashCode implements the same hash algorithm as PL_HashString.
>+   * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
>+   * value as wide strings, e.g. "Hello" and L"Hello".
>+   */

For what set of characters? ASCII-only? This comment needs to be much more precise.

>Index: xpcom/glue/nsTHashtable.cpp

>+#include "nsCRT.h"

nsTHashtable is in the glue and must not include nsCRT.h. I'm surprised it even compiled.

>Index: xpcom/tests/Makefile.in

Similar glue issues here.

>Index: xpcom/tests/TestHash.cpp

>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998
>+ * the Initial Developer. All Rights Reserved.

really?

>+int main(int argc, char** argv)

This test needs to use the standard framework for reporting errors. For best results use TestHarness.h
Comment 38 Michal Novotny (:michal) 2008-11-12 07:43:29 PST
Created attachment 347777 [details] [diff] [review]
xpcom part of new patch
Comment 39 Michal Novotny (:michal) 2008-11-12 07:45:41 PST
Created attachment 347778 [details] [diff] [review]
security part
Comment 40 Michal Novotny (:michal) 2008-11-12 07:48:09 PST
Created attachment 347779 [details] [diff] [review]
rdf part
Comment 41 Michal Novotny (:michal) 2008-11-12 07:50:39 PST
Created attachment 347780 [details] [diff] [review]
parser part
Comment 42 Michal Novotny (:michal) 2008-11-12 07:53:23 PST
Created attachment 347781 [details] [diff] [review]
nspr part
Comment 43 Michal Novotny (:michal) 2008-11-12 07:55:43 PST
Created attachment 347784 [details] [diff] [review]
netwerk part
Comment 44 Michal Novotny (:michal) 2008-11-12 07:59:27 PST
Created attachment 347785 [details] [diff] [review]
extensions/spellcheck

I couldn't find any owner/peer of this code. Boris, can you please do the review?
Comment 45 Michal Novotny (:michal) 2008-11-12 08:01:54 PST
Created attachment 347786 [details] [diff] [review]
embedding part
Comment 46 Michal Novotny (:michal) 2008-11-12 08:04:26 PST
Created attachment 347787 [details] [diff] [review]
content part
Comment 47 Michal Novotny (:michal) 2008-11-12 08:06:34 PST
Created attachment 347788 [details] [diff] [review]
caps part
Comment 48 Michal Novotny (:michal) 2008-11-12 08:20:58 PST
(In reply to comment #37)
> (From update of attachment 336502 [details] [diff] [review])
> I did not review NSPR or NSS changes. Please split those changes out into
> separate patches because they are different projects.

I've split whole patch by modules.

> >Index: browser/components/build/Makefile.in
> 
> >+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \
> 
> This directory uses frozen linkage and cannot link against these libs.
> 
> You should probably remove nsCRT::HashCode and put the hash impl in
> xpcom/glue/nsCRTGlue.{h,cpp}

I did it and it solved all problems with linking...

> > PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
> 
> I would be very surprised if we could tolerate this, performance-wise: we hash
> 16-bit atoms a lot. Have you profiled this code to show that it is not a
> performance hotspot?

I've reverted this change, because I couldn't include nsString.h in nsCRTGlue.cpp. I've created class NS_HashByBytes that can hash data in sequence.
Comment 49 Boris Zbarsky [:bz] (TPAC) 2008-11-12 12:58:59 PST
Comment on attachment 347784 [details] [diff] [review]
netwerk part

Why not pass lengths from all the callsites (or heck, have an inline NS_HashCode that takes const nsCString)?  Also, the fact that on OOM we'll return a different hashcode kinda bothers me, but I can live with it, I guess...
Comment 50 Boris Zbarsky [:bz] (TPAC) 2008-11-12 13:00:05 PST
Comment on attachment 347788 [details] [diff] [review]
caps part

OK, but again if we had an nsCString version we could use it here.
Comment 51 Benjamin Smedberg [:bsmedberg] 2008-11-13 07:20:42 PST
Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

>diff --git a/xpcom/glue/nsCRTGlue.cpp b/xpcom/glue/nsCRTGlue.cpp

>+/**
>+ * |NS_HashCode| is identical to |PL_HashString|.
>+ */
>+PRUint32 NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+{
>+  if (!str) return 0;
>+
>+  PRUint32 len = strlen(str);
>+
>+  if ( resultingStrLen )
>+    *resultingStrLen = len;
>+
>+  return PL_HashData(str, len, 0);
>+}

You still have a problem of the standalone glue linking against NSPR symbols.

>+PRUint32 NS_HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
>+{
>+  const PRUnichar* s = start;
>+  const PRUnichar* end = start + length;
>+
>+  PRUint16 W1 = 0;      // the first UTF-16 word in a two word tuple
>+  PRUint32 U = 0;       // the current char as UCS-4
>+  int code_length = 0;  // the number of bytes in the UTF-8 sequence for the current char
>+
>+  NS_HashByBytes hash;
>+
>+  PRUint16 W;
>+  while ( s < end )
>+    {
>+      W = *s++;
>+        /*
>+         * On the fly, decoding from UTF-16 (and/or UCS-2) into UTF-8 as per
>+         *  http://www.ietf.org/rfc/rfc2781.txt
>+         *  http://www.ietf.org/rfc/rfc3629.txt
>+         */

is it impossible to reuse our existing UTF8CharEnumerator class (nsUTF8Utils.h)? I don't particularly like duplicating all this logic.

>diff --git a/xpcom/glue/nsCRTGlue.h b/xpcom/glue/nsCRTGlue.h

>+// Computes the hashcode for a c-string, returns the string length as
>+// an added bonus.
>+NS_COM_GLUE PRUint32
>+NS_HashCode(const char* str, PRUint32* resultingStrLen = nsnull);

Please use javadoc-style comments.

>+// Computes the hashcode for a length number of bytes of c-string data.
>+NS_COM_GLUE PRUint32 NS_HashCode(const char* start, PRUint32 length);

nit, function name goes on the next line

> PRUint32
> HashString(const char *str)
> {
>-  PRUint32 code = 0;
>-
>-  while (*str) {
>-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
>-    ++str;
>-  }
>-
>-  return code;
>+  return NS_HashCode(str);
> }

Why keep this function if it just forwards to NS_HashCode? Even if you don't do it in this pass, please file a followup bug to rename all uses of this function to the new version.
Comment 52 Michal Novotny (:michal) 2008-11-13 17:37:27 PST
(In reply to comment #51)
> >+  return PL_HashData(str, len, 0);
> >+}
> 
> You still have a problem of the standalone glue linking against NSPR symbols.

Hmm, so I need to copy code from PL_HashData to some function in nsCRTGlue.cpp, right?
Comment 53 Wan-Teh Chang 2008-11-15 05:36:00 PST
Comment on attachment 347781 [details] [diff] [review]
nspr part

Thanks for the patch.  I'm afraid that we can't change
PL_HashString because of backward compatibility.  As your
patch v2 (attachment 335821 [details] [diff] [review]) shows, you need to bump the
kCurrentVersion of nsDiskCache when you change its hash
function.  So similarly, there could be users of
PL_HashString that have stored persistent data based on
the current PL_HashString, and they will be broken when
they upgrade to a new version of NSPR and PL_HashString
produces different hashes.

Please correct me if I'm wrong.

You can add a new function PL_HashString2 (for lack of
better name) to deal with this issue.  You also need to
add the new function(s) to nsprpub/lib/ds/plds.def in a
new NSPR_4.8 section at the end of the file.

Nit: NSPR has a different formatting style from Mozilla.
Please imitate the prevalent formatting style of in that
file.  (I can fix that for you when I check in the patch.)
Comment 54 Wan-Teh Chang 2008-11-15 05:44:15 PST
Comment on attachment 347778 [details] [diff] [review]
security part

These changes should be OK.  We'll need to make sure
we only use these two hash functions for in-memory
hash tables that are rebuilt in every process startup.
We need to watch out for the problem I described in
comment 53.
Comment 55 Wan-Teh Chang 2008-11-15 05:50:53 PST
Comment on attachment 347778 [details] [diff] [review]
security part

r=wtc.  I verified that these two hash functions
nss_item_hash and nss_certificate_hash are only
passed to PL_NewHashTable.
Comment 56 Robert Relyea 2008-11-17 14:13:08 PST
Comment on attachment 347778 [details] [diff] [review]
security part

r+ rrelyea
Comment 57 Johnny Stenback (:jst, jst@mozilla.com) 2008-11-17 15:57:03 PST
Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

+/**
+ * NS_BufferHashCode implements the same hash algorithm as PL_HashString.
+ * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
+ * value as wide strings, e.g. "Hello" and L"Hello". This is true for ASCII and
+ * all other strings where character in narrow string is equal to LSB
+ * of character in wide string.

Do we know for a fact that the above assumption is safe for the atom table? Any valid UTF8 string must hash to the same value as its UTF16 representation for the atom hash table to work right AFAIC.
Comment 58 Michal Novotny (:michal) 2008-11-17 18:45:28 PST
(In reply to comment #57)
> Do we know for a fact that the above assumption is safe for the atom table? Any
> valid UTF8 string must hash to the same value as its UTF16 representation for
> the atom hash table to work right AFAIC.

The old behaviour was IMHO the same. There was no guarantee that hash values are equal for non-latin1 strings.
Comment 59 Boris Zbarsky [:bz] (TPAC) 2008-11-18 05:40:31 PST
That is, nsCRT::HashCode(PRUnichar*) and nsCRT::HashCode(char*) only behaved the same for latin1, right?  nsCRT::HashCodeAsUTF8(PRUnichar*) behaved the same as nsCRT::HashCode(char* /* UTF8 */) and that's preserved in this patch.
Comment 60 Michal Novotny (:michal) 2008-11-18 16:49:32 PST
(In reply to comment #51)
> > PRUint32
> > HashString(const char *str)
> > {
> >-  PRUint32 code = 0;
> >-
> >-  while (*str) {
> >-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
> >-    ++str;
> >-  }
> >-
> >-  return code;
> >+  return NS_HashCode(str);
> > }
> 
> Why keep this function if it just forwards to NS_HashCode? Even if you don't do
> it in this pass, please file a followup bug to rename all uses of this function
> to the new version.

I'll do it within this bug. I've added NS_HashCode(const nsAString &) and now all functions HashString() can be replaced with NS_HashCode(). Due to this replacement I've found some places that still uses old hash algorithm: JSJ_HashString(), JS_HashString(), js_HashString(). I guess that I must duplicate the code again instead of calling NSPR hash function, right? Is there the same problem as described in comment #53? I.e. should I create a new function JS_HashString2()?
Comment 61 Wan-Teh Chang 2008-11-19 18:04:45 PST
Have you considered Paul Hsieh's SuperFastHash?
http://www.azillionmonkeys.com/qed/hash.html

I'm also not sure if we need to replace the hash functions
that are used with PLHashTable such as PL_HashString.
PLHashTable handles collisions by chaining, so having
collisions is not disastrous.  It seems that in the extra
time I spend in this new hash function, I can follow a
lot of chaining pointers, so I'm not sure if using the
new hash function is overall a win.
Comment 62 Wan-Teh Chang 2008-11-20 06:59:32 PST
Bug 412345 and bug 435802 are also looking at string hash
functions.
Comment 63 Michal Novotny (:michal) 2008-11-21 13:04:41 PST
(In reply to comment #61)
> Have you considered Paul Hsieh's SuperFastHash?
> http://www.azillionmonkeys.com/qed/hash.html

I didn't know about this function. It seems to be much faster but I did a quick comparison with https://bugzilla.mozilla.org/attachment.cgi?id=26596 and the result isn't too good.

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- superfasthash

> I'm also not sure if we need to replace the hash functions
> that are used with PLHashTable such as PL_HashString.
> PLHashTable handles collisions by chaining, so having
> collisions is not disastrous.  It seems that in the extra
> time I spend in this new hash function, I can follow a
> lot of chaining pointers, so I'm not sure if using the
> new hash function is overall a win.

Yes, this is maybe true. Let's try it and see what happens. At least the hash function will be now only at few places and trying some other function will be quick and simple.
Comment 64 Boris Zbarsky [:bz] (TPAC) 2008-11-30 19:20:25 PST
Michal, can you answer the questions from comment 49?
Comment 65 Michal Novotny (:michal) 2008-12-01 15:37:12 PST
(In reply to comment #49)
> (From update of attachment 347784 [details] [diff] [review])
> Why not pass lengths from all the callsites (or heck, have an inline
> NS_HashCode that takes const nsCString)?

You mean nsACString, right? It will be changed in a new patch. But it won't be inlined. It didn't compile when I tried to define it as inlined.

> Also, the fact that on OOM we'll
> return a different hashcode kinda bothers me, but I can live with it, I
> guess...

I'll fix it. I'll changed it so that it will use NS_HashByBytes.
Comment 66 Boris Zbarsky [:bz] (TPAC) 2008-12-02 09:18:10 PST
> You mean nsACString, right?

No, I really meant nsCString, assuming all callers have one and assuming this is somewhat "internal" API that doesn't have to worry about external strings.  If not, I suppose we could do an nsACString one instead.

I have no idea what the issue with compiling is there; Benjamin might know...
Comment 67 Boris Zbarsky [:bz] (TPAC) 2008-12-05 09:20:59 PST
Comment on attachment 347784 [details] [diff] [review]
netwerk part

Removing review request pending said updated patch.
Comment 68 Michal Novotny (:michal) 2008-12-18 15:52:52 PST
Created attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

new hash function is PL_HashString2
Comment 69 Michal Novotny (:michal) 2008-12-18 16:09:08 PST
Created attachment 353746 [details] [diff] [review]
patch2 v3 all except nss+nspr

changes in the new version:
- NS_HashCodeAsUTF8 uses UTF16CharEnumerator
- HashString functions were removed and replaced with NS_HashCode
- JS_HashString/JSJ_HashString/js_HashString changed to new hash function
- PL_HashString usage replaced with PL_HashString2
- StringHash in nsHttp.cpp uses NS_HashByBytes instead of NS_strdup+NS_HashCode
- NS_HashCode(const char*, PRUint32*) is exported as C function. NS_HashCode is needed in pldhash.c and I've found this solution better then creating new function just for this usage.
Comment 70 Michal Novotny (:michal) 2008-12-18 16:23:39 PST
(In reply to comment #66)
> No, I really meant nsCString, assuming all callers have one and assuming this
> is somewhat "internal" API that doesn't have to worry about external strings. 
> If not, I suppose we could do an nsACString one instead.

NS_HashCode(const nsACString &) is now needed because all HashString(...) were replaced by NS_HashCode(...). So it is now a question if it's worth to havenlined NS_HashCode(const nsCString&) to speed up calling at few places.

> I have no idea what the issue with compiling is there; Benjamin might know...

This is the error when the function is inlined:
/usr/bin/ld: TestHash: hidden symbol `NS_HashCode(nsCString const&)' isn't defined
/usr/bin/ld: final link failed: Nonrepresentable section on output

Benjamin, do you have an idea what's wrong?
Comment 71 Benjamin Smedberg [:bsmedberg] 2008-12-18 19:01:26 PST
nsACString == nsCSubstring nowadays, so I don't think there's any benefit to making the API use nsCString& instead of nsACString&

jsdhash is C++ now... please make pldhash be C++ also, and we can get rid of the extern "C" marking. In particular, you have multiple functions with the same name, one of which is extern "C"... this is a recipe for future type mismatches.
Comment 72 Michal Novotny (:michal) 2008-12-19 12:30:09 PST
Created attachment 353867 [details] [diff] [review]
patch2 v4 - pldhash.c changed to C++
Comment 73 Boris Zbarsky [:bz] (TPAC) 2008-12-19 14:02:44 PST
>+++ b/xpcom/ds/nsStaticNameTable.cpp
> caseInsensitiveStringHashKey(PLDHashTable *table, const void *key)
>+        nsAutoString keyCopy(tableKey->mKeyStr.m2b->get());

This is what this code used to do before it was rewritten to work as now.  The result showed up in CSS parsing profiles.  I'd rather not change this code as part of a huge patch; changing it separated so that the performance impact can be gauged, including on the microbenchmarks used in the bug that added this code, is better.

>+++ b/xpcom/glue/nsCRTGlue.cpp
>+NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+  if ( resultingStrLen )

Ditch the spaces around resultingStrLen?

>+NS_HashCode(const PRUnichar* str, PRUint32* resultingStrLen)
>+  const PRUnichar* k = str;
>+  while (*k)
>+    k++;

NS_strlen?

>+  if ( resultingStrLen )

Ditch the spaces.

>+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)

The naming here is a bit weird... NS_StrHashCode?

I'm not seeing an implementation of the NS_HashCode that takes char* and a length.  Where should I be looking?

Also, did you use the sed script to create pldhash.cpp?  Or did you edit it manually?
Comment 74 Michal Novotny (:michal) 2008-12-19 16:38:41 PST
Created attachment 353911 [details] [diff] [review]
patch2 v5 - changes according to comment #73

(In reply to comment #73)
> >+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)
> 
> The naming here is a bit weird... NS_StrHashCode?

It was nsCRT::BufferHashCode so I kept the name. But I can't see any reason why the name shouldn't be also NS_HashCode. This is just PRUnichar version of NS_HashCode(char *, int). But I can change it to NS_StrHashCode if you wish.

> I'm not seeing an implementation of the NS_HashCode that takes char* and a
> length.  Where should I be looking?

It is just after NS_BufferHashCode (now renamed to NS_HashCode too) and before NS_HashByBytes::NS_HashByBytes.

> Also, did you use the sed script to create pldhash.cpp?  Or did you edit it
> manually?

I did the changes manually but I've updated (and tested) the sed script. But somebody else already did some change that isn't in the script.

$ diff -w pldhash.cpp pldhash.cpp_sed 
72c72
< #define JSDHASH_SINGLE_LINE_ASSERTION PR_ASSERT
---
> #define JSDHASH_ONELINE_ASSERT PR_ASSERT
Comment 75 Michal Novotny (:michal) 2008-12-19 16:42:56 PST
Created attachment 353913 [details] [diff] [review]
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp
Comment 76 Boris Zbarsky [:bz] (TPAC) 2008-12-19 17:16:46 PST
Comment on attachment 353911 [details] [diff] [review]
patch2 v5 - changes according to comment #73

NS_HashCode(PRUnichar*) sounds fine to me.  I thought for some reason that there was a void* overload too; no idea what made me think that.

> But somebody else already did some change that isn't in the script.

Yeah, that was me forgetting to run the script after changing it in response to some review comments.  :(  Just run the script and include that change in what you chek in, please.

r=bzbarsky
Comment 77 Boris Zbarsky [:bz] (TPAC) 2008-12-19 17:17:23 PST
Comment on attachment 353913 [details] [diff] [review]
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp

I'd like to see those benchmark results before reviewing this.
Comment 78 Michal Novotny (:michal) 2008-12-23 14:20:06 PST
(In reply to comment #77)
> (From update of attachment 353913 [details] [diff] [review])
> I'd like to see those benchmark results before reviewing this.

What bug contains those benchmarks? I couldn't find it. I tried to profile caseInsensitiveStringHashKey using callgrind.

without patch 1021712 Ir
Ir     Count
988730 5497  caseInsensitiveStringHashKey (self)
 27282 4547  nsString::get()
  5700  950  nsCString::get()


with patch 7245502 Ir
Ir      Count
 935904 5497  caseInsensitiveStringHashKey (self)
2982761 4547  nsAutoString::nsAutoString(unsigned short const*, unsigned)
1970683 4547  NS_HashCode(unsigned short const *, unsigned)
 416645  950  NS_HashCode(char const *, unsigned)
 378815  950  nsCAutoString::nsCAutoString(char const*, unsigned)
 291008 4547  nsAutoString::~nsAutoString()
 145504 4547  nsAString_internal::BeginWriting()
  60800  950  nsCAutoString::~nsCAutoString()
  30400  950  nsACString_internal::BeginWriting()
  27282 4547  nsString::get()
   5700  950  nsCString::get()
Comment 79 Boris Zbarsky [:bz] (TPAC) 2008-12-23 14:41:09 PST
Hmm.  I thought I'd linked to testcases from bug 348691.

How about this:

<body style="display: none">
  <script>
    var s = document.body.style;
    for (var i = 0; i < 1000000; ++i) {
      s.setPropertyValue("left", "auto");
    }
  </script>
</body>

Or just timing the parsing of a largish CSS file, for that matter.
Comment 80 Michal Novotny (:michal) 2008-12-23 17:58:55 PST
Simple measurement using timestamps gives (1000000x setProperty):
 - without patch 48550ms
 - with patch 49506ms

Time spent in caseInsensitiveStringHashKey using callgrind (10000x setProperty):
 - without patch 1721124 Ir
 - with patch 19109906 Ir
Comment 81 Boris Zbarsky [:bz] (TPAC) 2008-12-26 23:08:36 PST
So this change significantly slows down caseInsensitiveStringHashKey but in that testcase only about 2% of the total cost is in that function?  That sounds about right to me, actually.  The cost would be higher in a stylesheet parse, though probably not over 5%.
Comment 82 Benjamin Smedberg [:bsmedberg] 2009-01-05 12:27:12 PST
You can't land attachment 353911 [details] [diff] [review] without 353913, right? Is there a practical reason we can't avoid the string copy in the caseInsensitiveStringHashKey case, kinda the same way it works today, by hashing it a character at a time using the same algorithm?
Comment 83 Michal Novotny (:michal) 2009-01-06 07:09:46 PST
(In reply to comment #81)
> So this change significantly slows down caseInsensitiveStringHashKey but in
> that testcase only about 2% of the total cost is in that function?

Yes, valgrind shows ca 1.3% in caseInsensitiveStringHashKey.
Comment 84 Michal Novotny (:michal) 2009-01-06 09:10:36 PST
Created attachment 355597 [details] [diff] [review]
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp v2

(In reply to comment #82)
> You can't land attachment 353911 [details] [diff] [review] without 353913, right?

I thing that 353913 isn't needed for 353911. I guess that it is OK to have old hash function only in nsStaticCaseInsensitiveNameTable...

> Is there a practical
> reason we can't avoid the string copy in the caseInsensitiveStringHashKey case,
> kinda the same way it works today, by hashing it a character at a time using
> the same algorithm?

Good point. This new patch uses NS_HashByBytes and it is faster. The same test as in comment #80 gives 48930ms and 10591143 Ir.
Comment 85 Boris Zbarsky [:bz] (TPAC) 2009-01-06 09:15:11 PST
Still almost 10x slower than the old code, no?  At least according to Ir?
Comment 86 Michal Novotny (:michal) 2009-01-06 09:49:19 PST
(In reply to comment #85)
> Still almost 10x slower than the old code, no?  At least according to Ir?

Of course. I meant that it is faster than previous patch, not original code...
Comment 87 Michal Novotny (:michal) 2009-02-01 16:44:38 PST
I'm now investigating bug #321361 and it seems to be similar to #355567. Checking in patch #335821 would make reproducing the bug very hard. So I'm postponing the checkin until bug #321361 is fixed and properly tested...
Comment 88 WADA 2009-02-09 19:34:04 PST
Linkifying Bug 355567 & Bug 335821.
Comment 89 Joe Drew (not getting mail) 2009-03-19 08:50:18 PDT
*** Bug 484182 has been marked as a duplicate of this bug. ***
Comment 90 jedierikb 2009-03-19 16:44:39 PDT
Could someone post or point the specific hash implementation here which is the reason for the collisions?  It would be good to pre-test files as cache-able before deploying sites.
Comment 92 Serge Gautherie (:sgautherie) 2009-03-20 16:26:35 PDT
Comment on attachment 335821 [details] [diff] [review]
patch v2


Could you update this cvs patch to a current hg one ? Thanks.
Comment 93 Michal Novotny (:michal) 2009-03-20 17:23:07 PDT
Created attachment 368619 [details] [diff] [review]
patch v2 - diff against hg rev. 4b2a90a7726e
[Checkin: Comment 94]
Comment 94 Serge Gautherie (:sgautherie) 2009-03-27 11:02:56 PDT
Comment on attachment 368619 [details] [diff] [review]
patch v2 - diff against hg rev. 4b2a90a7726e
[Checkin: Comment 94]


http://hg.mozilla.org/mozilla-central/rev/999999125311
Comment 95 Michal Novotny (:michal) 2009-06-05 04:09:22 PDT
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Please could anyone of you do the review?
Comment 96 Boris Zbarsky [:bz] (TPAC) 2009-06-05 14:21:22 PDT
Michal, have you tried mailing wtc directly?
Comment 97 Michal Novotny (:michal) 2009-06-05 14:58:11 PDT
Yes, few months ago...
Comment 98 Wan-Teh Chang 2009-06-08 10:48:52 PDT
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Hi Michal,

Sorry about the late response.  I didn't have time to follow
this bug closely.  Part of the reason is that this bug doesn't
seem to affect NSPR or NSS that much.

PL_HashString is an extremely simple hash function, whereas
PL_HashString2 is quite complicated.  This has two implications.

1. We'd need to give guidance to NSPR users on which string
hash function is appropriate for what purposes.

2. PLHashTable resolves hash collisions by chaining.  I guess
the advantage of PL_HashString2 is that it reduces hash collisions,
but intuitively with the extra computation performed by
PL_HashString2, I could follow a lot of chaining links in the
same amount of time.  So it's not clear PL_HashString2 is a win
for PLHashTable.

I know you want to close this bug, so I propose the following:

1. I delegate to a Mozilla developer I trust such as bz the review
of your patch.  Then I just need some comments in plhash.h to
tell NSPR users which string hash function should be used for
what purposes.

or

2. Abandon this patch for nspr+nss.

Thanks.
Comment 99 Ted Mielczarek [:ted.mielczarek] 2009-06-22 08:27:22 PDT
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

I'm not at all qualified to review this. Please follow wtc's suggestions.
Comment 100 Jeff Muizelaar [:jrmuizel] 2009-12-09 16:50:41 PST
*** Bug 533701 has been marked as a duplicate of this bug. ***
Comment 101 Steffen Wilberg 2010-04-16 06:25:54 PDT
(In reply to comment #98)
> (From update of attachment 353743 [details] [diff] [review])
> Hi Michal,
...
> I know you want to close this bug, so I propose the following:
> 
> 1. I delegate to a Mozilla developer I trust such as bz the review
> of your patch.  Then I just need some comments in plhash.h to
> tell NSPR users which string hash function should be used for
> what purposes.

So Michal, why don't you request review from bzbarsky?
Comment 102 Jason Duell [:jduell] (needinfo me) 2010-04-16 12:49:20 PDT
Renaming since the HTTP cache part of this bug already landed (comment 94)
Comment 103 Jason Duell [:jduell] (needinfo me) 2010-04-16 12:52:03 PDT
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Seeing if bz will take review as WTC requested.  Otherwise this bug is WONTFIX, yes?
Comment 104 Boris Zbarsky [:bz] (TPAC) 2010-04-16 14:04:55 PDT
> Renaming since the HTTP cache part of this bug already landed (comment 94)

Imho this bug should have been resolved fixed at that point, with followup bugs filed for followup work.  One bug per issue....  Can we please do that now?

I can do the review, but it'll likely be a week or more before I get to it; lots of other stuff in the queue ahead of it.
Comment 105 Jason Duell [:jduell] (needinfo me) 2010-04-16 14:41:53 PDT
Renaming back to original issue, which has landed, and marking FIXED.  Michal, if you like please open a followup NSPR bug with patch for bz.
Comment 106 Jason Duell [:jduell] (needinfo me) 2010-05-02 00:58:19 PDT
Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

wrong version of patch submitted for review.

Note You need to log in before you can comment on or make changes to this bug.