Closed Bug 290032 Opened 19 years ago Closed 14 years ago

Some files are never cached due to hash collisions which are quite common due to weak string hash function

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9.2a1

People

(Reporter: tomasare, Assigned: michal)

References

(Blocks 1 open bug,
URL
)

Details

Attachments

(6 files, 16 obsolete files)

HTTP log when reproducing the bug 19 years ago Tomas A. Haavet 217.54 KB, text/plain		Details
changed hash function in cache 16 years ago Michal Novotny [:michal] 5.42 KB, patch		Details \| Diff \| Splinter Review
patch v2 16 years ago Michal Novotny [:michal] 6.07 KB, patch	bzbarsky : review+ dcamp : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
patch for plhash, nsCRT and others 16 years ago Michal Novotny [:michal] 31.95 KB, patch	benjamin : review-	Details \| Diff \| Splinter Review
performance test 16 years ago Michal Novotny [:michal] 2.03 KB, application/x-compressed-tar		Details
xpcom part of new patch 16 years ago Michal Novotny [:michal] 32.06 KB, patch	benjamin : review-	Details \| Diff \| Splinter Review
security part 16 years ago Michal Novotny [:michal] 1.61 KB, patch	wtc : review+ rrelyea : superreview+	Details \| Diff \| Splinter Review
rdf part 16 years ago Michal Novotny [:michal] 2.12 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
parser part 16 years ago Michal Novotny [:michal] 633 bytes, patch	jst : review+ mrbkap : superreview+	Details \| Diff \| Splinter Review
nspr part 16 years ago Michal Novotny [:michal] 3.43 KB, patch	wtc : review-	Details \| Diff \| Splinter Review
netwerk part 16 years ago Michal Novotny [:michal] 2.97 KB, patch		Details \| Diff \| Splinter Review
extensions/spellcheck 16 years ago Michal Novotny [:michal] 753 bytes, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
embedding part 16 years ago Michal Novotny [:michal] 804 bytes, patch	benjamin : review+	Details \| Diff \| Splinter Review
content part 16 years ago Michal Novotny [:michal] 1.57 KB, patch	jst : review+ jst : superreview+	Details \| Diff \| Splinter Review
caps part 16 years ago Michal Novotny [:michal] 638 bytes, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
patch2 v3 nspr+nss 16 years ago Michal Novotny [:michal] 7.03 KB, patch		Details \| Diff \| Splinter Review
patch2 v3 all except nss+nspr 16 years ago Michal Novotny [:michal] 75.30 KB, patch		Details \| Diff \| Splinter Review
patch2 v4 - pldhash.c changed to C++ 16 years ago Michal Novotny [:michal] 79.02 KB, patch		Details \| Diff \| Splinter Review
patch2 v5 - changes according to comment #73 16 years ago Michal Novotny [:michal] 77.35 KB, patch	bzbarsky : review+ benjamin : review+	Details \| Diff \| Splinter Review
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp 16 years ago Michal Novotny [:michal] 1.60 KB, patch		Details \| Diff \| Splinter Review
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp v2 16 years ago Michal Novotny [:michal] 1.47 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
patch v2 - diff against hg rev. 4b2a90a7726e [Checkin: Comment 94] 15 years ago Michal Novotny [:michal] 5.44 KB, patch		Details \| Diff \| Splinter Review

Tomas A. Haavet

Reporter

Description

•

19 years ago

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; chrome://navigator/locale/navigator.properties; rv:1.7.5) Gecko/20041107 Firefox/1.0

It seems that the files ListBar.js, ListBar.css, MenuBar.js, and MenuBar.css are
never cached.  On page page http://nf.opoint.com/main/ff_cache.html there are
other css- and js-files that are cached, but never the ones named ListBar and
MenuBar.  If I rename these files, they get cached, e.g. List.css is a copy of
ListBar.css.


Reproducible: Always

Steps to Reproduce:
1. Open url
2. Reload page (e.g. klikk on the reload-link)
3. With each page reload the four files ListBar.js, ListBar.css, MenuBar.js, and
MenuBar.css get fetched from the server (use e.g. LiveHTTPHeaders)

Actual Results:  
The files are not cached

Expected Results:  
They should be cached as the other js- and css-files

Also tested on:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217

Boris Zbarsky [:bzbarsky]

Comment 1

•

19 years ago

Could you create a log as described at
http://www.mozilla.org/projects/netlib/http/http-debugging.html and attach it to
this bug?

Tomas A. Haavet

Reporter

Comment 2

•

19 years ago

Attached file HTTP log when reproducing the bug — Details

Browser used: Mozilla 1.8b2, Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2)
Gecko/20050516
1. Cleared the browser cache before loading the page.
2. Reloaded the page (by clicking on the reload-link)
3. The log file verifies that the files MenuBar.css, ListBar.css, MenuBar.js,
and ListBar.js are not properly cached (they always report 'got cache entry
[access=2]')

Boris Zbarsky [:bzbarsky]

Comment 3

•

19 years ago

Confirming based on that log.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Caleb

Updated

•

19 years ago

Depends on: 310656

Darin Fisher

Comment 4

•

18 years ago

-> default owner

Assignee: darin → nobody

Michal Novotny [:michal]

Assignee

Comment 5

•

16 years ago

Problem is that cache entry can't be opened due to hash collision.

Breakpoint 1, nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
422             *collision = PR_TRUE;
(gdb) bt 10
#0  nsDiskCacheDevice::FindEntry (this=0xb33ab880, key=0xb6d17770, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsDiskCacheDevice.cpp:422
#1  0x0105fec1 in nsCacheService::SearchCacheDevices (this=0xb791e6b0, key=0xb6d17770, policy=0, collision=0xbfb9c900)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1540
#2  0x01062e5c in nsCacheService::ActivateEntry (this=0xb791e6b0, request=0xb6d1ef20, result=0xbfb9c958)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1452
#3  0x01062394 in nsCacheService::ProcessRequest (this=0xb791e6b0, request=0xb6d1ef20, calledFromOpenCacheEntry=1, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1340
#4  0x01063161 in nsCacheService::OpenCacheEntry (session=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, listener=0x0, 
    result=0xb33afb20) at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheService.cpp:1420
#5  0x01065cab in nsCacheSession::OpenCacheEntry (this=0xb6d1eee0, key=@0xbfb9ca0c, accessRequested=2, blockingMode=0, result=0xb33afb20)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/cache/src/nsCacheSession.cpp:106
#6  0x010c1308 in nsHttpChannel::OpenCacheEntry (this=0xb33afa20, offline=0, delayed=0xbfb9cac8)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:1417
#7  0x010c626c in nsHttpChannel::Connect (this=0xb33afa20, firstTime=1)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:286
#8  0x010c697f in nsHttpChannel::AsyncOpen (this=0xb33afa20, listener=0xb6d14dc0, context=0x0)
    at /opt/moz/TRUNK_new3/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:3745
#9  0x013652b2 in CSSLoaderImpl::LoadSheet (this=0xb5f780c0, aLoadData=0xb01afe20, aSheetState=eSheetNeedsParser)
    at /opt/moz/TRUNK_new3/mozilla/layout/style/nsCSSLoader.cpp:1454
(More stack frames follow...)
(gdb) l
417         nsDiskCacheEntry * diskEntry = mCacheMap.ReadDiskCacheEntry(&record);
418         if (!diskEntry) return nsnull;
419         
420         // compare key to be sure
421         if (strcmp(diskEntry->Key(), key->get()) != 0) {
422             *collision = PR_TRUE;
423             return nsnull;
424         }
425         
426         nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
(gdb) p diskEntry->Key()
$1 = 0xb44f7624 "HTTP:http://localhost/bug338190/3/XSL/bss.xsl"
(gdb) p *key
$2 = {<nsACString_internal> = {<nsCSubstring_base> = {<No data fields>}, mData = 0xb4677948 "HTTP:http://localhost/bug338190/3/CSS/bss.css", 
    mLength = 45, mFlags = 5}, <No data fields>}
(gdb) p/x hashNumber
$3 = 0xc4b04670


Hash function http://mxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDevice.cpp#240 that is used in mozilla code a lot is really weak and collisions are quite common.

$ ./hashit "ABA/xxx.aba"
8ffac222
$ ./hashit "XyZ/xxx.xYz"
8ffac222
$ ./hashit "CSS/xxx.css"
8ffac222
$ ./hashit "JPG/xxx.jpg"
8ffac222


$ ./hashit modules_newsfeeds/MenuBar/MenuBar.css
15c23729
$ ./hashit modules_newsfeeds/ListBar/ListBar.css
15c23729

$ ./hashit modules_newsfeeds/MenuBar/MenuBar.js
a15c23e5
$ ./hashit modules_newsfeeds/ListBar/ListBar.js
a15c23e5


IMHO the hash function should be changed.

benc

Comment 6

•

16 years ago

Michal: that is very interesting. Can you update the summary?

Boris Zbarsky [:bzbarsky]

Comment 7

•

16 years ago

We've run into issues with collisions in this hash function before.  Changing it sounds like a great idea.

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Summary: some js and css files that are never cached → Some files are never cached due to hash collisions which are quite common due to weak string hash function

Michal Novotny [:michal]

Assignee

Comment 9

•

16 years ago

(In reply to comment #7)
> We've run into issues with collisions in this hash function before.  Changing
> it sounds like a great idea.
> 

I've done some search and this hash function seems quite good for this purpose:

http://burtleburtle.net/bob/hash/evahash.html

What do you think?

Boris Zbarsky [:bzbarsky]

Comment 10

•

16 years ago

Sorry it took me so long to reply; I was out for a week.

At first glance, that hash function looks pretty reasonable, particularly for hashing mostly-ascii strings as we are.

Let's give it a shot.  As long as it doesn't impact performance negatively, it's got to be a win in terms of collision-avoidance.

Michal Novotny [:michal]

Assignee

Comment 11

•

16 years ago

Attached patch changed hash function in cache (obsolete) — Details — Splinter Review

Assignee: nobody → michal

Status: NEW → ASSIGNED

Attachment #335539 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Comment 12

•

16 years ago

I thought that it make sense to replace not only cache hash function but also other same hash functions with the new one. I've rewritten PL_HashString() and tried to use it everywhere, where I've found the old function. But I'm stuck with nsCRT::HashCode(). There is version for (char *) and also for (PRUnichar *). Hash value should be the same for "SomeText" and L"SomeText". I can do this with the new hash function only by discarding most significant byte of PRUnichar and I don't think that it is correct. Is it necessary to keep this feature?

Benjamin Smedberg

Comment 13

•

16 years ago

Yes, it is, because we hash utf8 and unicode atoms in the same table.

Boris Zbarsky [:bzbarsky]

Comment 14

•

16 years ago

Can we pad the char* out with nulls instead of dropping the MSB of the PRUnichar?  How would performance look?  That code is _very_ perf-sensitive.  It's possible that a better hash function would even improve performance here, of course.  ;)

That said, this is the hashcode computation, right?  So discarding the MSB would just mean slightly more collisions, which might be acceptable for the atom table (almost all atoms are in fact pure-ascii).

Boris Zbarsky [:bzbarsky]

Comment 15

•

16 years ago

Comment on attachment 335539 [details] [diff] [review]
changed hash function in cache

>Index: netwerk/cache/src/nsDiskCacheDevice.cpp
>+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)

Why PLHashNumber and not just PRUint32?

It's probably worth adding a pointer to where the algorithm came from or something.

>+#define hashmix(a,b,c) \

How about:

 static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);

?  Or static inline void, if desired.  Somewhat preferable to using a macro, in my opinion.

>+nsDiskCache::Hash(const char * key, PLHashNumber initval)
>+  PRUint8 *k = (PRUint8*)key;

reinterpret_cast.

>+  PRUint32 a,b,c,len,length;

Spaces after the commas, please?

>+  c = initval;         /* the previous hash value */

It is?  Or is it more of a "seed" value or something?

>+    a += (k[0] +((PRUint32)k[1]<<8) +((PRUint32)k[2]<<16) +((PRUint32)k[3]<<24));

I think this would be more readable (at least in terms of making sure it's correct) with C++-style casts and withoutthe outermost parens, and with spaces around the plus signs:

  a += k[0] + (PRUint32(k[1])<<8) + (PRUint32(k[2])<<16) + (PRUint32(k[3])<<24);

Similar for the other three statements.  Also use C++ casts in the switch.

>+    /* the first byte of c is reserved for the length */

s/first/low-order/

/*-------------------------------------------- report the result */

I don't think this comment is needed.  ;)

>Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
>+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);

Where does this last magic number come from?  Why is this the only consumer that needs to pass a second arg in here?  dcamp should probably look at this part of the change.

I was wondering whether it would make sense to stick this somewhere in xpcom (maybe even nsCRT, with a different name from the existing functions) so others can also use it.  Or if we fix nsCRT to use this algorithm, just use that directly here?

Michal Novotny [:michal]

Assignee

Comment 16

•

16 years ago

(In reply to comment #14)

OK, I'll either pad char* with 0 or discard MSB in PRUnichar. Do hash functions nsCRT::HashCode() and PL_HashString() have to return same value for the same input text?

Boris Zbarsky [:bzbarsky]

Comment 17

•

16 years ago

I don't think they need to, no. Though I'd be interested in how the new hash function compares to the ones in https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data easily.

Michal Novotny [:michal]

Assignee

Comment 18

•

16 years ago

(In reply to comment #17)
> I don't think they need to, no. Though I'd be interested in how the new hash
> function compares to the ones in
> https://bugzilla.mozilla.org/attachment.cgi?id=26596 if you can get that data
> easily.
> 

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash

Boris Zbarsky [:bzbarsky]

Comment 19

•

16 years ago

Sweet!  Sounds to me like it's definitely worth getting this into NSPR if they'll take it.

Michal Novotny [:michal]

Assignee

Comment 20

•

16 years ago

(In reply to comment #15)
> (From update of attachment 335539 [details] [diff] [review])
> >Index: netwerk/cache/src/nsDiskCacheDevice.cpp
> >+ *  nsDiskCache::Hash(const char * key, PLHashNumber initval)
> 
> Why PLHashNumber and not just PRUint32?

Return value is PLHashNumber so it seems reasonable to have initial value of the same type.

>  static void hashmix(PRUint32& a, PRUint32& b, PRUint32& c);
> 
> ?  Or static inline void, if desired.  Somewhat preferable to using a macro, in
> my opinion.

OK, I've done a quick perf test and static inline is as fast as macro. Btw according to my simple test this new hash function is about 10% slower than the current one.

> >+  c = initval;         /* the previous hash value */
> 
> It is?  Or is it more of a "seed" value or something?

It is a comment from original source, but you are right.

> >Index: netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
> >+  return ((PRUint64)nsDiskCache::Hash(key, 0) << 32) | nsDiskCache::Hash(key, 0x7416f295);
> 
> Where does this last magic number come from?

dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

> Why is this the only consumer that needs to pass a second arg in here?

This is recommended way to obtain 64-bit hash.

> I was wondering whether it would make sense to stick this somewhere in xpcom
> (maybe even nsCRT, with a different name from the existing functions) so others
> can also use it.  Or if we fix nsCRT to use this algorithm, just use that
> directly here?

I wanted to do this but I don't like the fact that nsDiskCache::kCurrentVersion depends on hash algorithm that is implemented somewhere else. There should be at least some automated test that would fail if kCurrentVersion isn't updated whenever hash function changes.

Boris Zbarsky [:bzbarsky]

Comment 21

•

16 years ago

> Return value is PLHashNumber so it seems reasonable to have initial value of
> the same type.

Return value is PLDHashNumber, actually. I can see doing that.  And yes, the PLHash/PLDHash naming kinda sucks...  Document that the arg should just be any "reasonably random" number and that this arg is there to construct a 64-bit hash?

> dd if=/dev/urandom count=1 | sha1sum | cut -c 1-8

Document that this is just a randomly chosen number, then?  It looks very purposeful as things stand.

> but I don't like the fact that nsDiskCache::kCurrentVersion
> depends on hash algorithm that is implemented somewhere else.

That's a really good point.  OK, let's do disk cache separately and the other separately.

As far as that goes, by the way, padding char* with 0 after every byte should be pretty easy: you just end up with a bunch of guaranteed zeroes in various places in the algorithm and can simplify it somewhat.  Not sure what that does to the randomness of the result.

Michal Novotny [:michal]

Assignee

Comment 22

•

16 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

Attachment #335539 - Attachment is obsolete: true

Attachment #335821 - Flags: review?(bzbarsky)

Attachment #335539 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Comment 23

•

16 years ago

Comment on attachment 335821 [details] [diff] [review]
patch v2

Please, look at the change in DCacheHash(const char * key).

Attachment #335821 - Flags: review?(dcamp)

Boris Zbarsky [:bzbarsky]

Comment 24

•

16 years ago

Comment on attachment 335821 [details] [diff] [review]
patch v2

r+sr=bzbarsky

Attachment #335821 - Flags: superreview+

Attachment #335821 - Flags: review?(bzbarsky)

Attachment #335821 - Flags: review+

Michal Novotny [:michal]

Assignee

Comment 25

•

16 years ago

(In reply to comment #21)
> As far as that goes, by the way, padding char* with 0 after every byte should
> be pretty easy: you just end up with a bunch of guaranteed zeroes in various
> places in the algorithm and can simplify it somewhat.  Not sure what that does
> to the randomness of the result.

I've added it to the test and this is a result:

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- newhash padded LE
1.134 -- newhash padded BE

LE - little endian (zeroes are inserted after characters)
BE - big endian (zeroes are inserted before characters)

So maybe dropping MSB from unicode is better solution?

Boris Zbarsky [:bzbarsky]

Comment 26

•

16 years ago

Yeah, especially since then we don't have to worry about endianness issues.  Let's do that.

Michal Novotny [:michal]

Assignee

Comment 27

•

16 years ago

Attached patch patch for plhash, nsCRT and others (obsolete) — Details — Splinter Review

With these changes I've had problems building xpcom/tests, browser/components/build and embedding/browser/gtk/tests. I've changed Makefile.in so that it compiles now, but I'm almost sure that this isn't the right way.

Attachment #336502 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 28

•

16 years ago

Some quick comments:

1)  This will need review from bsmedberg (xpcom peer, build system stuff) and the
    NSPR peers.  I can probably review the content/layout/misc changes.
2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect that
    making it copying as you do is bad for performance.

Michal Novotny [:michal]

Assignee

Comment 29

•

16 years ago

(In reply to comment #28)
> 2)  nsCRT::HashCodeAsUTF8 is non-copying for performance reasons.  I suspect
> that making it copying as you do is bad for performance.

Yes I know, but the problem is that I need to know the length of the string before hashing. I've tried to rewrite the function so that it doesn't need to know the length, but in that form it is cca 2.5x slower.

Boris Zbarsky [:bzbarsky]

Comment 30

•

16 years ago

Hmm...  Could you do some sort of hash object thingie which you basically toss 4-byte-or-less chunks of data into, with a "final chunk" at the end which implements this hash function?  The existing consumers could just quickly toss things in 4 bytes at a time (as they do now, anyway)...

Or is that what you tried?

Michal Novotny [:michal]

Assignee

Comment 31

•

16 years ago

Attached file performance test — Details

(In reply to comment #30)
> Hmm...  Could you do some sort of hash object thingie which you basically toss
> 4-byte-or-less chunks of data into, with a "final chunk" at the end which
> implements this hash function?  The existing consumers could just quickly toss
> things in 4 bytes at a time (as they do now, anyway)...
> 
> Or is that what you tried?

My first attempt was quite simple, see perf_test_sequentially.cpp in the archive. Then I tried to work with 4 bytes as you've proposed (perf_test_class.cpp), but it is still much slower (2x). This implementation works on little endian only and accepts only the last piece of hased data to be less than 4 bytes.

Do "make test" for results.

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Attachment #336502 - Flags: review?(wtc)

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Attachment #336502 - Flags: review?(benjamin)

Boris Zbarsky [:bzbarsky]

Comment 32

•

16 years ago

Hrm....  It actually looks even worse at -Os, which is what we usually build with.

Would it make any sense to convert other consumers, leave atoms using the existing code or new functions that do what the existing code does, then do some whole-system timing of atoms?  If the new hash function + copy is still better than the level of hash collisions we get now, it's worth it.  Otherwise it's not.

Boris Zbarsky [:bzbarsky]

Comment 33

•

16 years ago

The atom thing would be a followup bug, I suspect.  In fact, we could land the cache changes and move all the other stuff to followups if you'd prefer that.

Michal Novotny [:michal]

Assignee

Comment 34

•

16 years ago

Sorry for my silly question, but what exactly do you mean by "atom"?

Boris Zbarsky [:bzbarsky]

Comment 35

•

16 years ago

Oh, sorry.  The AtomTable is the consumer of that HashCodeAsUTF8 method.  It relies on hashing UTF-16 and UTF-8 strings and looking up the same nsIAtom objects based on both without copying.  So it uses a non-copying hashcode generator and non-copying comparison function.

Wan-Teh Chang

Comment 36

•

16 years ago

Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

In nsprpub/lib/ds/plhash.c, use strlen instead of PL_strlen to
avoid the need to include "plstr.h".  (<string.h> is already
included.)

Benjamin Smedberg

Comment 37

•

16 years ago

Comment on attachment 336502 [details] [diff] [review]
patch for plhash, nsCRT and others

I did not review NSPR or NSS changes. Please split those changes out into separate patches because they are different projects.

>Index: browser/components/build/Makefile.in

>+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \

This directory uses frozen linkage and cannot link against these libs.

You should probably remove nsCRT::HashCode and put the hash impl in xpcom/glue/nsCRTGlue.{h,cpp}

>Index: embedding/browser/gtk/tests/Makefile.in

>+		$(XPCOM_LIBS) \

This uses frozen-independent linkage and must not link against XPCOM or NSPR.

> PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)

I would be very surprised if we could tolerate this, performance-wise: we hash 16-bit atoms a lot. Have you profiled this code to show that it is not a performance hotspot?


>+  /**
>+   * nsCRT::BufferHashCode implements the same hash algorithm as PL_HashString.
>+   * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
>+   * value as wide strings, e.g. "Hello" and L"Hello".
>+   */

For what set of characters? ASCII-only? This comment needs to be much more precise.

>Index: xpcom/glue/nsTHashtable.cpp

>+#include "nsCRT.h"

nsTHashtable is in the glue and must not include nsCRT.h. I'm surprised it even compiled.

>Index: xpcom/tests/Makefile.in

Similar glue issues here.

>Index: xpcom/tests/TestHash.cpp

>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.
>+ * Portions created by the Initial Developer are Copyright (C) 1998
>+ * the Initial Developer. All Rights Reserved.

really?

>+int main(int argc, char** argv)

This test needs to use the standard framework for reporting errors. For best results use TestHarness.h

Attachment #336502 - Flags: review?(wtc)

Attachment #336502 - Flags: review?(bzbarsky)

Attachment #336502 - Flags: review?(benjamin)

Attachment #336502 - Flags: review-

Michal Novotny [:michal]

Assignee

Comment 38

•

16 years ago

Attached patch xpcom part of new patch (obsolete) — Details — Splinter Review

Attachment #336502 - Attachment is obsolete: true

Attachment #347777 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Comment 39

•

16 years ago

Attached patch security part (obsolete) — Details — Splinter Review

Attachment #347778 - Flags: review?(wtc)

Michal Novotny [:michal]

Assignee

Comment 40

•

16 years ago

Attached patch rdf part (obsolete) — Details — Splinter Review

Attachment #347779 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Comment 41

•

16 years ago

Attached patch parser part (obsolete) — Details — Splinter Review

Attachment #347780 - Flags: review?(jst)

Michal Novotny [:michal]

Assignee

Comment 42

•

16 years ago

Attached patch nspr part (obsolete) — Details — Splinter Review

Attachment #347781 - Flags: review?(wtc)

Michal Novotny [:michal]

Assignee

Comment 43

•

16 years ago

Attached patch netwerk part (obsolete) — Details — Splinter Review

Attachment #347784 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Comment 44

•

16 years ago

Attached patch extensions/spellcheck (obsolete) — Details — Splinter Review

I couldn't find any owner/peer of this code. Boris, can you please do the review?

Attachment #347785 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Comment 45

•

16 years ago

Attached patch embedding part (obsolete) — Details — Splinter Review

Attachment #347786 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Comment 46

•

16 years ago

Attached patch content part (obsolete) — Details — Splinter Review

Attachment #347787 - Flags: review?(jst)

Michal Novotny [:michal]

Assignee

Comment 47

•

16 years ago

Attached patch caps part (obsolete) — Details — Splinter Review

Attachment #347788 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Comment 48

•

16 years ago

(In reply to comment #37)
> (From update of attachment 336502 [details] [diff] [review])
> I did not review NSPR or NSS changes. Please split those changes out into
> separate patches because they are different projects.

I've split whole patch by modules.

> >Index: browser/components/build/Makefile.in
> 
> >+	$(DEPTH)/xpcom/ds/$(LIB_PREFIX)xpcomds_s.$(LIB_SUFFIX) $(DEPTH)/xpcom/string/src/$(LIB_PREFIX)string_s.$(LIB_SUFFIX) \
> 
> This directory uses frozen linkage and cannot link against these libs.
> 
> You should probably remove nsCRT::HashCode and put the hash impl in
> xpcom/glue/nsCRTGlue.{h,cpp}

I did it and it solved all problems with linking...

> > PRUint32 nsCRT::HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
> 
> I would be very surprised if we could tolerate this, performance-wise: we hash
> 16-bit atoms a lot. Have you profiled this code to show that it is not a
> performance hotspot?

I've reverted this change, because I couldn't include nsString.h in nsCRTGlue.cpp. I've created class NS_HashByBytes that can hash data in sequence.

Boris Zbarsky [:bzbarsky]

Comment 49

•

16 years ago

Comment on attachment 347784 [details] [diff] [review]
netwerk part

Why not pass lengths from all the callsites (or heck, have an inline NS_HashCode that takes const nsCString)?  Also, the fact that on OOM we'll return a different hashcode kinda bothers me, but I can live with it, I guess...

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Attachment #347785 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 50

•

16 years ago

Comment on attachment 347788 [details] [diff] [review]
caps part

OK, but again if we had an nsCString version we could use it here.

Attachment #347788 - Flags: review?(bzbarsky) → review+

Benjamin Smedberg

Comment 51

•

16 years ago

Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

>diff --git a/xpcom/glue/nsCRTGlue.cpp b/xpcom/glue/nsCRTGlue.cpp

>+/**
>+ * |NS_HashCode| is identical to |PL_HashString|.
>+ */
>+PRUint32 NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+{
>+  if (!str) return 0;
>+
>+  PRUint32 len = strlen(str);
>+
>+  if ( resultingStrLen )
>+    *resultingStrLen = len;
>+
>+  return PL_HashData(str, len, 0);
>+}

You still have a problem of the standalone glue linking against NSPR symbols.

>+PRUint32 NS_HashCodeAsUTF8(const PRUnichar* start, PRUint32 length)
>+{
>+  const PRUnichar* s = start;
>+  const PRUnichar* end = start + length;
>+
>+  PRUint16 W1 = 0;      // the first UTF-16 word in a two word tuple
>+  PRUint32 U = 0;       // the current char as UCS-4
>+  int code_length = 0;  // the number of bytes in the UTF-8 sequence for the current char
>+
>+  NS_HashByBytes hash;
>+
>+  PRUint16 W;
>+  while ( s < end )
>+    {
>+      W = *s++;
>+        /*
>+         * On the fly, decoding from UTF-16 (and/or UCS-2) into UTF-8 as per
>+         *  http://www.ietf.org/rfc/rfc2781.txt
>+         *  http://www.ietf.org/rfc/rfc3629.txt
>+         */

is it impossible to reuse our existing UTF8CharEnumerator class (nsUTF8Utils.h)? I don't particularly like duplicating all this logic.

>diff --git a/xpcom/glue/nsCRTGlue.h b/xpcom/glue/nsCRTGlue.h

>+// Computes the hashcode for a c-string, returns the string length as
>+// an added bonus.
>+NS_COM_GLUE PRUint32
>+NS_HashCode(const char* str, PRUint32* resultingStrLen = nsnull);

Please use javadoc-style comments.

>+// Computes the hashcode for a length number of bytes of c-string data.
>+NS_COM_GLUE PRUint32 NS_HashCode(const char* start, PRUint32 length);

nit, function name goes on the next line

> PRUint32
> HashString(const char *str)
> {
>-  PRUint32 code = 0;
>-
>-  while (*str) {
>-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
>-    ++str;
>-  }
>-
>-  return code;
>+  return NS_HashCode(str);
> }

Why keep this function if it just forwards to NS_HashCode? Even if you don't do it in this pass, please file a followup bug to rename all uses of this function to the new version.

Attachment #347777 - Flags: review?(benjamin) → review-

Benjamin Smedberg

Updated

•

16 years ago

Attachment #347779 - Flags: review?(benjamin) → review+

Benjamin Smedberg

Updated

•

16 years ago

Attachment #347786 - Flags: review?(benjamin) → review+

Michal Novotny [:michal]

Assignee

Comment 52

•

16 years ago

(In reply to comment #51)
> >+  return PL_HashData(str, len, 0);
> >+}
> 
> You still have a problem of the standalone glue linking against NSPR symbols.

Hmm, so I need to copy code from PL_HashData to some function in nsCRTGlue.cpp, right?

Wan-Teh Chang

Comment 53

•

16 years ago

Comment on attachment 347781 [details] [diff] [review]
nspr part

Thanks for the patch.  I'm afraid that we can't change
PL_HashString because of backward compatibility.  As your
patch v2 (attachment 335821 [details] [diff] [review]) shows, you need to bump the
kCurrentVersion of nsDiskCache when you change its hash
function.  So similarly, there could be users of
PL_HashString that have stored persistent data based on
the current PL_HashString, and they will be broken when
they upgrade to a new version of NSPR and PL_HashString
produces different hashes.

Please correct me if I'm wrong.

You can add a new function PL_HashString2 (for lack of
better name) to deal with this issue.  You also need to
add the new function(s) to nsprpub/lib/ds/plds.def in a
new NSPR_4.8 section at the end of the file.

Nit: NSPR has a different formatting style from Mozilla.
Please imitate the prevalent formatting style of in that
file.  (I can fix that for you when I check in the patch.)

Attachment #347781 - Flags: review?(wtc) → review-

Wan-Teh Chang

Comment 54

•

16 years ago

Comment on attachment 347778 [details] [diff] [review]
security part

These changes should be OK.  We'll need to make sure
we only use these two hash functions for in-memory
hash tables that are rebuilt in every process startup.
We need to watch out for the problem I described in
comment 53.

Attachment #347778 - Flags: superreview?(rrelyea)

Wan-Teh Chang

Comment 55

•

16 years ago

Comment on attachment 347778 [details] [diff] [review]
security part

r=wtc.  I verified that these two hash functions
nss_item_hash and nss_certificate_hash are only
passed to PL_NewHashTable.

Attachment #347778 - Flags: review?(wtc) → review+

Johnny Stenback (:jst)

Updated

•

16 years ago

Attachment #347787 - Flags: superreview+

Attachment #347787 - Flags: review?(jst)

Attachment #347787 - Flags: review+

Robert Relyea

Updated

•

16 years ago

Attachment #347778 - Flags: superreview?(rrelyea) → superreview+

Robert Relyea

Comment 56

•

16 years ago

Comment on attachment 347778 [details] [diff] [review]
security part

r+ rrelyea

Johnny Stenback (:jst)

Updated

•

16 years ago

Attachment #347780 - Flags: superreview?(mrbkap)

Attachment #347780 - Flags: review?(jst)

Attachment #347780 - Flags: review+

Johnny Stenback (:jst)

Comment 57

•

16 years ago

Comment on attachment 347777 [details] [diff] [review]
xpcom part of new patch

+/**
+ * NS_BufferHashCode implements the same hash algorithm as PL_HashString.
+ * MSB in PRUnichar is dropped and thus narrow strings are hashed to the same
+ * value as wide strings, e.g. "Hello" and L"Hello". This is true for ASCII and
+ * all other strings where character in narrow string is equal to LSB
+ * of character in wide string.

Do we know for a fact that the above assumption is safe for the atom table? Any valid UTF8 string must hash to the same value as its UTF16 representation for the atom hash table to work right AFAIC.

Blake Kaplan (:mrbkap) (inactive)

Updated

•

16 years ago

Attachment #347780 - Flags: superreview?(mrbkap) → superreview+

Michal Novotny [:michal]

Assignee

Comment 58

•

16 years ago

(In reply to comment #57)
> Do we know for a fact that the above assumption is safe for the atom table? Any
> valid UTF8 string must hash to the same value as its UTF16 representation for
> the atom hash table to work right AFAIC.

The old behaviour was IMHO the same. There was no guarantee that hash values are equal for non-latin1 strings.

Boris Zbarsky [:bzbarsky]

Comment 59

•

16 years ago

That is, nsCRT::HashCode(PRUnichar*) and nsCRT::HashCode(char*) only behaved the same for latin1, right?  nsCRT::HashCodeAsUTF8(PRUnichar*) behaved the same as nsCRT::HashCode(char* /* UTF8 */) and that's preserved in this patch.

Michal Novotny [:michal]

Assignee

Comment 60

•

16 years ago

(In reply to comment #51)
> > PRUint32
> > HashString(const char *str)
> > {
> >-  PRUint32 code = 0;
> >-
> >-  while (*str) {
> >-    code = PR_ROTATE_LEFT32(code, 4) ^ PRUint32(*str);
> >-    ++str;
> >-  }
> >-
> >-  return code;
> >+  return NS_HashCode(str);
> > }
> 
> Why keep this function if it just forwards to NS_HashCode? Even if you don't do
> it in this pass, please file a followup bug to rename all uses of this function
> to the new version.

I'll do it within this bug. I've added NS_HashCode(const nsAString &) and now all functions HashString() can be replaced with NS_HashCode(). Due to this replacement I've found some places that still uses old hash algorithm: JSJ_HashString(), JS_HashString(), js_HashString(). I guess that I must duplicate the code again instead of calling NSPR hash function, right? Is there the same problem as described in comment #53? I.e. should I create a new function JS_HashString2()?

Wan-Teh Chang

Comment 61

•

16 years ago

Have you considered Paul Hsieh's SuperFastHash?
http://www.azillionmonkeys.com/qed/hash.html

I'm also not sure if we need to replace the hash functions
that are used with PLHashTable such as PL_HashString.
PLHashTable handles collisions by chaining, so having
collisions is not disastrous.  It seems that in the extra
time I spend in this new hash function, I can follow a
lot of chaining pointers, so I'm not sure if using the
new hash function is overall a win.

Wan-Teh Chang

Comment 62

•

16 years ago

Bug 412345 and bug 435802 are also looking at string hash
functions.

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Blocks: 412345

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Blocks: 435802

Michal Novotny [:michal]

Assignee

Comment 63

•

16 years ago

(In reply to comment #61)
> Have you considered Paul Hsieh's SuperFastHash?
> http://www.azillionmonkeys.com/qed/hash.html

I didn't know about this function. It seems to be much faster but I did a quick comparison with https://bugzilla.mozilla.org/attachment.cgi?id=26596 and the result isn't too good.

1.1635 -- hashpjw
1.1622 -- PL_HashString
1.1814 -- nsCRT::HashCode
1.2 -- newhash
1.1564 -- superfasthash

> I'm also not sure if we need to replace the hash functions
> that are used with PLHashTable such as PL_HashString.
> PLHashTable handles collisions by chaining, so having
> collisions is not disastrous.  It seems that in the extra
> time I spend in this new hash function, I can follow a
> lot of chaining pointers, so I'm not sure if using the
> new hash function is overall a win.

Yes, this is maybe true. Let's try it and see what happens. At least the hash function will be now only at few places and trying some other function will be quick and simple.

Boris Zbarsky [:bzbarsky]

Comment 64

•

16 years ago

Michal, can you answer the questions from comment 49?

Michal Novotny [:michal]

Assignee

Comment 65

•

16 years ago

(In reply to comment #49)
> (From update of attachment 347784 [details] [diff] [review])
> Why not pass lengths from all the callsites (or heck, have an inline
> NS_HashCode that takes const nsCString)?

You mean nsACString, right? It will be changed in a new patch. But it won't be inlined. It didn't compile when I tried to define it as inlined.

> Also, the fact that on OOM we'll
> return a different hashcode kinda bothers me, but I can live with it, I
> guess...

I'll fix it. I'll changed it so that it will use NS_HashByBytes.

Boris Zbarsky [:bzbarsky]

Comment 66

•

16 years ago

> You mean nsACString, right?

No, I really meant nsCString, assuming all callers have one and assuming this is somewhat "internal" API that doesn't have to worry about external strings.  If not, I suppose we could do an nsACString one instead.

I have no idea what the issue with compiling is there; Benjamin might know...

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Attachment #347784 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 67

•

16 years ago

Comment on attachment 347784 [details] [diff] [review]
netwerk part

Removing review request pending said updated patch.

Michal Novotny [:michal]

Assignee

Comment 68

•

16 years ago

Attached patch patch2 v3 nspr+nss — Details — Splinter Review

new hash function is PL_HashString2

Attachment #347778 - Attachment is obsolete: true

Attachment #347781 - Attachment is obsolete: true

Attachment #353743 - Flags: review?(wtc)

Michal Novotny [:michal]

Assignee

Comment 69

•

16 years ago

Attached patch patch2 v3 all except nss+nspr (obsolete) — Details — Splinter Review

changes in the new version:
- NS_HashCodeAsUTF8 uses UTF16CharEnumerator
- HashString functions were removed and replaced with NS_HashCode
- JS_HashString/JSJ_HashString/js_HashString changed to new hash function
- PL_HashString usage replaced with PL_HashString2
- StringHash in nsHttp.cpp uses NS_HashByBytes instead of NS_strdup+NS_HashCode
- NS_HashCode(const char*, PRUint32*) is exported as C function. NS_HashCode is needed in pldhash.c and I've found this solution better then creating new function just for this usage.

Attachment #347777 - Attachment is obsolete: true

Attachment #347779 - Attachment is obsolete: true

Attachment #347780 - Attachment is obsolete: true

Attachment #347784 - Attachment is obsolete: true

Attachment #347785 - Attachment is obsolete: true

Attachment #347786 - Attachment is obsolete: true

Attachment #347787 - Attachment is obsolete: true

Attachment #347788 - Attachment is obsolete: true

Attachment #353746 - Flags: review?(bzbarsky)

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Attachment #353746 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Comment 70

•

16 years ago

(In reply to comment #66)
> No, I really meant nsCString, assuming all callers have one and assuming this
> is somewhat "internal" API that doesn't have to worry about external strings. 
> If not, I suppose we could do an nsACString one instead.

NS_HashCode(const nsACString &) is now needed because all HashString(...) were replaced by NS_HashCode(...). So it is now a question if it's worth to havenlined NS_HashCode(const nsCString&) to speed up calling at few places.

> I have no idea what the issue with compiling is there; Benjamin might know...

This is the error when the function is inlined:
/usr/bin/ld: TestHash: hidden symbol `NS_HashCode(nsCString const&)' isn't defined
/usr/bin/ld: final link failed: Nonrepresentable section on output

Benjamin, do you have an idea what's wrong?

Benjamin Smedberg

Comment 71

•

16 years ago

nsACString == nsCSubstring nowadays, so I don't think there's any benefit to making the API use nsCString& instead of nsACString&

jsdhash is C++ now... please make pldhash be C++ also, and we can get rid of the extern "C" marking. In particular, you have multiple functions with the same name, one of which is extern "C"... this is a recipe for future type mismatches.

Michal Novotny [:michal]

Assignee

Comment 72

•

16 years ago

Attached patch patch2 v4 - pldhash.c changed to C++ (obsolete) — Details — Splinter Review

Attachment #353746 - Attachment is obsolete: true

Attachment #353867 - Flags: review?(benjamin)

Attachment #353746 - Flags: review?(bzbarsky)

Attachment #353746 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Attachment #353867 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 73

•

16 years ago

>+++ b/xpcom/ds/nsStaticNameTable.cpp
> caseInsensitiveStringHashKey(PLDHashTable *table, const void *key)
>+        nsAutoString keyCopy(tableKey->mKeyStr.m2b->get());

This is what this code used to do before it was rewritten to work as now.  The result showed up in CSS parsing profiles.  I'd rather not change this code as part of a huge patch; changing it separated so that the performance impact can be gauged, including on the microbenchmarks used in the bug that added this code, is better.

>+++ b/xpcom/glue/nsCRTGlue.cpp
>+NS_HashCode(const char* str, PRUint32* resultingStrLen)
>+  if ( resultingStrLen )

Ditch the spaces around resultingStrLen?

>+NS_HashCode(const PRUnichar* str, PRUint32* resultingStrLen)
>+  const PRUnichar* k = str;
>+  while (*k)
>+    k++;

NS_strlen?

>+  if ( resultingStrLen )

Ditch the spaces.

>+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)

The naming here is a bit weird... NS_StrHashCode?

I'm not seeing an implementation of the NS_HashCode that takes char* and a length.  Where should I be looking?

Also, did you use the sed script to create pldhash.cpp?  Or did you edit it manually?

Michal Novotny [:michal]

Assignee

Comment 74

•

16 years ago

Attached patch patch2 v5 - changes according to comment #73 — Details — Splinter Review

(In reply to comment #73)
> >+NS_BufferHashCode(const PRUnichar* s, PRUint32 length)
> 
> The naming here is a bit weird... NS_StrHashCode?

It was nsCRT::BufferHashCode so I kept the name. But I can't see any reason why the name shouldn't be also NS_HashCode. This is just PRUnichar version of NS_HashCode(char *, int). But I can change it to NS_StrHashCode if you wish.

> I'm not seeing an implementation of the NS_HashCode that takes char* and a
> length.  Where should I be looking?

It is just after NS_BufferHashCode (now renamed to NS_HashCode too) and before NS_HashByBytes::NS_HashByBytes.

> Also, did you use the sed script to create pldhash.cpp?  Or did you edit it
> manually?

I did the changes manually but I've updated (and tested) the sed script. But somebody else already did some change that isn't in the script.

$ diff -w pldhash.cpp pldhash.cpp_sed 
72c72
< #define JSDHASH_SINGLE_LINE_ASSERTION PR_ASSERT
---
> #define JSDHASH_ONELINE_ASSERT PR_ASSERT

Attachment #353867 - Attachment is obsolete: true

Attachment #353911 - Flags: review?(bzbarsky)

Attachment #353867 - Flags: review?(bzbarsky)

Attachment #353867 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Attachment #353911 - Flags: review?(benjamin)

Michal Novotny [:michal]

Assignee

Comment 75

•

16 years ago

Attached patch separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp (obsolete) — Details — Splinter Review

Attachment #353913 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Updated

•

16 years ago

Attachment #353911 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 76

•

16 years ago

Comment on attachment 353911 [details] [diff] [review]
patch2 v5 - changes according to comment #73

NS_HashCode(PRUnichar*) sounds fine to me.  I thought for some reason that there was a void* overload too; no idea what made me think that.

> But somebody else already did some change that isn't in the script.

Yeah, that was me forgetting to run the script after changing it in response to some review comments.  :(  Just run the script and include that change in what you chek in, please.

r=bzbarsky

Boris Zbarsky [:bzbarsky]

Comment 77

•

16 years ago

Comment on attachment 353913 [details] [diff] [review]
separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp

I'd like to see those benchmark results before reviewing this.

Michal Novotny [:michal]

Assignee

Comment 78

•

16 years ago

(In reply to comment #77)
> (From update of attachment 353913 [details] [diff] [review])
> I'd like to see those benchmark results before reviewing this.

What bug contains those benchmarks? I couldn't find it. I tried to profile caseInsensitiveStringHashKey using callgrind.

without patch 1021712 Ir
Ir     Count
988730 5497  caseInsensitiveStringHashKey (self)
 27282 4547  nsString::get()
  5700  950  nsCString::get()


with patch 7245502 Ir
Ir      Count
 935904 5497  caseInsensitiveStringHashKey (self)
2982761 4547  nsAutoString::nsAutoString(unsigned short const*, unsigned)
1970683 4547  NS_HashCode(unsigned short const *, unsigned)
 416645  950  NS_HashCode(char const *, unsigned)
 378815  950  nsCAutoString::nsCAutoString(char const*, unsigned)
 291008 4547  nsAutoString::~nsAutoString()
 145504 4547  nsAString_internal::BeginWriting()
  60800  950  nsCAutoString::~nsCAutoString()
  30400  950  nsACString_internal::BeginWriting()
  27282 4547  nsString::get()
   5700  950  nsCString::get()

Boris Zbarsky [:bzbarsky]

Comment 79

•

16 years ago

Hmm.  I thought I'd linked to testcases from bug 348691.

How about this:

<body style="display: none">
  <script>
    var s = document.body.style;
    for (var i = 0; i < 1000000; ++i) {
      s.setPropertyValue("left", "auto");
    }
  </script>
</body>

Or just timing the parsing of a largish CSS file, for that matter.

Michal Novotny [:michal]

Assignee

Comment 80

•

16 years ago

Simple measurement using timestamps gives (1000000x setProperty):
 - without patch 48550ms
 - with patch 49506ms

Time spent in caseInsensitiveStringHashKey using callgrind (10000x setProperty):
 - without patch 1721124 Ir
 - with patch 19109906 Ir

Boris Zbarsky [:bzbarsky]

Comment 81

•

16 years ago

So this change significantly slows down caseInsensitiveStringHashKey but in that testcase only about 2% of the total cost is in that function?  That sounds about right to me, actually.  The cost would be higher in a stylesheet parse, though probably not over 5%.

Benjamin Smedberg

Comment 82

•

16 years ago

You can't land attachment 353911 [details] [diff] [review] without 353913, right? Is there a practical reason we can't avoid the string copy in the caseInsensitiveStringHashKey case, kinda the same way it works today, by hashing it a character at a time using the same algorithm?

Michal Novotny [:michal]

Assignee

Comment 83

•

16 years ago

(In reply to comment #81)
> So this change significantly slows down caseInsensitiveStringHashKey but in
> that testcase only about 2% of the total cost is in that function?

Yes, valgrind shows ca 1.3% in caseInsensitiveStringHashKey.

Michal Novotny [:michal]

Assignee

Comment 84

•

16 years ago

Attached patch separate patch for matchNameKeysCaseInsensitive in xpcom/ds/nsStaticNameTable.cpp v2 — Details — Splinter Review

(In reply to comment #82)
> You can't land attachment 353911 [details] [diff] [review] without 353913, right?

I thing that 353913 isn't needed for 353911. I guess that it is OK to have old hash function only in nsStaticCaseInsensitiveNameTable...

> Is there a practical
> reason we can't avoid the string copy in the caseInsensitiveStringHashKey case,
> kinda the same way it works today, by hashing it a character at a time using
> the same algorithm?

Good point. This new patch uses NS_HashByBytes and it is faster. The same test as in comment #80 gives 48930ms and 10591143 Ir.

Attachment #353913 - Attachment is obsolete: true

Attachment #355597 - Flags: review?(benjamin)

Attachment #353913 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 85

•

16 years ago

Still almost 10x slower than the old code, no?  At least according to Ir?

Michal Novotny [:michal]

Assignee

Comment 86

•

16 years ago

(In reply to comment #85)
> Still almost 10x slower than the old code, no?  At least according to Ir?

Of course. I meant that it is faster than previous patch, not original code...

Benjamin Smedberg

Updated

•

16 years ago

Attachment #355597 - Flags: review?(benjamin) → review+

Dave Camp (:dcamp)

Updated

•

16 years ago

Attachment #335821 - Flags: review?(dcamp) → review+

Michal Novotny [:michal]

Assignee

Updated

•

16 years ago

Keywords: checkin-needed

Whiteboard: only patch #335821 should be checked in now

Benjamin Smedberg

Updated

•

16 years ago

Attachment #353911 - Flags: review?(benjamin) → review+

Michal Novotny [:michal]

Assignee

Comment 87

•

15 years ago

I'm now investigating bug #321361 and it seems to be similar to #355567. Checking in patch #335821 would make reproducing the bug very hard. So I'm postponing the checkin until bug #321361 is fixed and properly tested...

Depends on: 321361

Keywords: checkin-needed

Whiteboard: only patch #335821 should be checked in now

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Blocks: 441052

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

15 years ago

OS: Linux → All

Hardware: x86 → All

WADA:World Anti-bad-Duping Agency

Comment 88

•

15 years ago

Linkifying Bug 355567 & Bug 335821.

Michal Novotny [:michal]

Assignee

Updated

•

15 years ago

Keywords: checkin-needed

Whiteboard: only patch #335821 should be checked in now

jedierikb

Comment 90

•

15 years ago

Could someone post or point the specific hash implementation here which is the reason for the collisions?  It would be good to pre-test files as cache-able before deploying sites.

Michal Novotny [:michal]

Assignee

Comment 91

•

15 years ago

(In reply to comment #90)

The problematic hash is:
http://hg.mozilla.org/mozilla-central/annotate/e74b36d547de/netwerk/cache/src/nsDiskCacheDevice.cpp#l249

For more info see: 
https://bugzilla.mozilla.org/show_bug.cgi?id=355567#c28

Serge Gautherie (:sgautherie)

Comment 92

•

15 years ago

Comment on attachment 335821 [details] [diff] [review]
patch v2


Could you update this cvs patch to a current hg one ? Thanks.

Michal Novotny [:michal]

Assignee

Comment 93

•

15 years ago

Attached patch patch v2 - diff against hg rev. 4b2a90a7726e [Checkin: Comment 94] — Details — Splinter Review

Attachment #335821 - Attachment is obsolete: true

Michal Novotny [:michal]

Assignee

Updated

•

15 years ago

Whiteboard: only patch #335821 should be checked in now → only patch #368619 should be checked in now

Serge Gautherie (:sgautherie)

Comment 94

•

15 years ago

Comment on attachment 368619 [details] [diff] [review]
patch v2 - diff against hg rev. 4b2a90a7726e
[Checkin: Comment 94]


http://hg.mozilla.org/mozilla-central/rev/999999125311

Attachment #368619 - Attachment description: patch v2 - diff against hg rev. 4b2a90a7726e → patch v2 - diff against hg rev. 4b2a90a7726e [Checkin: Comment 94]

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Keywords: checkin-needed

Whiteboard: only patch #368619 should be checked in now

Target Milestone: --- → mozilla1.9.2a1

Marco Perez [:mpk]

Updated

•

15 years ago

Blocks: 489882

Michal Novotny [:michal]

Assignee

Comment 95

•

15 years ago

Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Please could anyone of you do the review?

Attachment #353743 - Flags: review?(ted.mielczarek)

Boris Zbarsky [:bzbarsky]

Comment 96

•

15 years ago

Michal, have you tried mailing wtc directly?

Michal Novotny [:michal]

Assignee

Comment 97

•

15 years ago

Yes, few months ago...

Wan-Teh Chang

Comment 98

•

15 years ago

Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Hi Michal,

Sorry about the late response.  I didn't have time to follow
this bug closely.  Part of the reason is that this bug doesn't
seem to affect NSPR or NSS that much.

PL_HashString is an extremely simple hash function, whereas
PL_HashString2 is quite complicated.  This has two implications.

1. We'd need to give guidance to NSPR users on which string
hash function is appropriate for what purposes.

2. PLHashTable resolves hash collisions by chaining.  I guess
the advantage of PL_HashString2 is that it reduces hash collisions,
but intuitively with the extra computation performed by
PL_HashString2, I could follow a lot of chaining links in the
same amount of time.  So it's not clear PL_HashString2 is a win
for PLHashTable.

I know you want to close this bug, so I propose the following:

1. I delegate to a Mozilla developer I trust such as bz the review
of your patch.  Then I just need some comments in plhash.h to
tell NSPR users which string hash function should be used for
what purposes.

or

2. Abandon this patch for nspr+nss.

Thanks.

(not currently active) Ted Mielczarek

Comment 99

•

15 years ago

Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

I'm not at all qualified to review this. Please follow wtc's suggestions.

Attachment #353743 - Flags: review?(ted.mielczarek)

Jason Duell

Updated

•

14 years ago

Blocks: 175600

Jason Duell

Updated

•

14 years ago

Blocks: 193911

Jason Duell

Updated

•

14 years ago

Blocks: http_cache

Steffen Wilberg

Comment 101

•

14 years ago

(In reply to comment #98)
> (From update of attachment 353743 [details] [diff] [review])
> Hi Michal,
...
> I know you want to close this bug, so I propose the following:
> 
> 1. I delegate to a Mozilla developer I trust such as bz the review
> of your patch.  Then I just need some comments in plhash.h to
> tell NSPR users which string hash function should be used for
> what purposes.

So Michal, why don't you request review from bzbarsky?

Steve Chapel

Updated

•

14 years ago

Blocks: 216490

Jason Duell

Comment 102

•

14 years ago

Renaming since the HTTP cache part of this bug already landed (comment 94)

Summary: Some files are never cached due to hash collisions which are quite common due to weak string hash function → Provide alternative to NSPR's default string hash function

Jason Duell

Comment 103

•

14 years ago

Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

Seeing if bz will take review as WTC requested.  Otherwise this bug is WONTFIX, yes?

Attachment #353743 - Flags: review?(wtc) → review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 104

•

14 years ago

> Renaming since the HTTP cache part of this bug already landed (comment 94)

Imho this bug should have been resolved fixed at that point, with followup bugs filed for followup work.  One bug per issue....  Can we please do that now?

I can do the review, but it'll likely be a week or more before I get to it; lots of other stuff in the queue ahead of it.

Jason Duell

Updated

•

14 years ago

No longer blocks: http_cache

Jason Duell

Comment 105

•

14 years ago

Renaming back to original issue, which has landed, and marking FIXED.  Michal, if you like please open a followup NSPR bug with patch for bz.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Summary: Provide alternative to NSPR's default string hash function → Some files are never cached due to hash collisions which are quite common due to weak string hash function

Jason Duell

Comment 106

•

14 years ago

Comment on attachment 353743 [details] [diff] [review]
patch2 v3 nspr+nss

wrong version of patch submitted for review.

Attachment #353743 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Updated

•

12 years ago

Blocks: 729940

You need to log in before you can comment on or make changes to this bug.