Closed Bug 435802 Opened 16 years ago Closed 6 years ago

How does Judyhash compare to the mozilla hash table implementation?

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: skumar, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

Judyhash (http://judyhash.sourceforge.net/) implementation claims better performance for resizing and collisions. Just curious if there will be any performance improvement if mozilla makes use of judyhash, as mozilla uses a large number of hash tables and hash table operations show up as hotspots in performance profiling (eg: SearchTable, PL_DHashTableOperate).

Not sure if judyhash can be made free of std:: since mozilla does not use std::.
Any thoughts?

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Component: General → XPCOM
Product: Firefox → Core
QA Contact: general → xpcom
The licensing is wrong, I think; it's not really a viable path, based on what I recall from the last time people looked at it, but instrumenting the hash tables would tell us what operations dominate (lookup, insert, resize, etc.).

http://www.azillionmonkeys.com/qed/hash.html might let us squeeze a bit more out of the hash tables we do use, if you're looking to run an experiment.  More likely, I suspect, we'll want to look at changing how we use hash tables rather than changing the tables we use, once we run out of other things to speed up. :)
Keywords: perf
Version: unspecified → Trunk
I did some profiling of Mozilla hash function vs Hsieh's hash function that mshaver pointed out. 

Hsieh's hash seems to do better.

I did this analysis on Mac. I used the disabled dump meter functionality with some modifications to collect this data.

I am not sure about analyzing this data completely, but following are the simple things I observed.

(In the attached XL workbook, there are 4 sheets. pl for the dump from pldhash.c with original code. js - for jsdhash.c, pl-hs for pldhash.c modified to use HS hash and js-hs for jsdhash.c using HS hash)

Test case:
1.Launch.
2. cnn.com
3. Open a tab and goto maps.google.com and search richardson, tx to whistler, bc
4. Open another tab and check gmail.
5. In the same tab, check out eenadu.net, rediff.com and slashdot.org
6. close

A)
from pl:
131072	78610	0	277829	125491	152338	0.598516	1.4051	2.07939	15	0	78610	0	125491	0	0	0	0	0	2	0	0	0

from pl-hs:
131072	52820	0	220284	93736	126548	0.455135	1.24432	1.02539	10	0	52820	0	93736	0	0	0	0	0	2	0	0	0


Here, pl-hs is showing 0.455135 mean steps per search where as pl is showing 0.598516 Most other metrics seem to be smaller for HS hash. (see XL sheet for description of these numbers)


B) 

I measured the total time spent in PL_DHashStringKey () with original code and with modification to use HS hash function.

It took 306365928 micro seconds for 89391 calls with original code. (av = 3427.2569)
It took 268095645 micro seconds for 84175 calls with HS hash code. (av = 3184.9794)

(Attached the D-Script used. Basically I used vtimestamp in DTrace, which is supposed to give exact time spent in this function excluding any context switches, etc. I accumulated the difference of vtimestamp when the function returned vs entry of the function and measured the number of times the function got called.)

C) There is no way to exactly link the output from the dumps with original code and HS hash code - except with the size of hash tables.

D) Firefox uses so many hash tables!

E) EVen the number of tables dumped seem to be smaller with HS hash. The number of entries are also smaller. These might not relate to the hash functions?

F) There are other places where hashing is implemented. (nsCRT.cpp, nsTHashTable.cpp, nsStaticNameTable.cpp, xpcmaps.cpp). I tried to replace these too with HS hash, but I got some random crashes, which I did not debug much.
See also bug 290032, which needs a better string hash function
for nsDiskCache::Hash.
Depends on: 290032
Given the limited key/value types judyhash supports, I don't think we'd consider using judyhash; our effort is likely to be better spent on improving our own hash table implementation.
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: