Closed Bug 213813 Opened 21 years ago Closed 16 years ago

FindMember could probably use optimizing

Categories

(Core :: XPConnect, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX
Target Milestone: Future

People

(Reporter: dbradley, Assigned: dbradley)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

FindMember has shown up in some cases where repeated calls into XPConnect
occur, with what I believe are objects that have many interfaces and/or
interfaces with many members. Currently this is a two-phase linear search:
one over interface names, and another over method names within each interface.
Blocks: 212831
Status: NEW → ASSIGNED
Summary: FindMember could probably use a hash → FindMember could probably use optimizing
Attached patch Something to play with (obsolete) — Splinter Review
I've attached a patch for people with DHTML issues to play with and see if they
notice any benefit.

Originally I was thinking of using a hash. I think this might be a better
solution, though. The idea is to "cache" the most commonly hit methods. The
assumption is that most objects have just a few methods at a time that get hit
frequently. My fear was that a cache might create too much overhead.

The cache basically tracks the last 4 methods used, making searching the cache
extremely quick and misses not too expensive. The downside is that this adds
another 52 bytes per XPCNativeSet. There are generally no more than 300 of
these, at least for the runs I've done so far, so we're talking an additional
15,600 bytes, which I don't think is all that much in the grand scheme of
things (assuming there were 300 active at any given time). The cache also
tracks the number of misses; the idea is that if we're getting several misses
in a row, we turn the cache off.
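
To make the idea concrete, here is a minimal, self-contained sketch of that
small-cache scheme. This is not the actual patch: names like MemberCache are
made up, uintptr_t stands in for jsval, and MAX_MISSES is an assumed tuning
constant.

    #include <cstdint>
    #include <cstring>

    typedef uintptr_t NameKey; // stand-in for jsval

    static const int MEMBER_CACHE_SIZE = 4;
    static const int MAX_MISSES = 8; // assumed tuning constant

    struct MemberCacheEntry
    {
        NameKey mName;  // 0 marks an empty slot
        int     mIndex; // could also cache -1 for a known "not found"
    };

    class MemberCache
    {
    public:
        MemberCache() : mMisses(0), mEnabled(true)
        {
            memset(mEntries, 0, sizeof(mEntries));
        }

        // Returns true and fills *aIndex on a hit, false on a miss.
        bool Lookup(NameKey aName, int* aIndex)
        {
            if(!mEnabled)
                return false;
            for(MemberCacheEntry* e = mEntries;
                e < mEntries + MEMBER_CACHE_SIZE; ++e)
            {
                if(e->mName == aName)
                {
                    mMisses = 0;
                    *aIndex = e->mIndex;
                    return true;
                }
                if(e->mName == 0) // end of the used entries
                    break;
            }
            // Too many misses in a row: assume poor locality, stop caching.
            if(++mMisses == MAX_MISSES)
                mEnabled = false;
            return false;
        }

        // Record a result, shifting older entries down (oldest falls off).
        void Add(NameKey aName, int aIndex)
        {
            if(!mEnabled)
                return;
            memmove(mEntries + 1, mEntries,
                    (MEMBER_CACHE_SIZE - 1) * sizeof(MemberCacheEntry));
            mEntries[0].mName = aName;
            mEntries[0].mIndex = aIndex;
        }

    private:
        MemberCacheEntry mEntries[MEMBER_CACHE_SIZE];
        int mMisses;
        bool mEnabled;
    };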

I don't have any way to accurately profile this right now, so I thought I'd
get it out so others can take a look, maybe with jprof or the like.

The patch also optimizes some of the loops in XPCNativeInterface and
XPCNativeSet. In many of the loops it was incrementing a pointer as well as an
index. I removed the index and changed the code to work with just the pointer.
It's possible some clever compiler's optimizer consolidated this, but I doubt
it. It also makes the loops more like the container/iterator pattern, which I
think is a plus. I've run the code, so I don't think I broke anything in doing
this, but let me know if you encounter crashes or screwy behavior.

In just starting up and shutting down I don't see a lot of improvement. Hits
are slightly less frequent than misses. I'm hoping that at least the gains
from the hits cover the overhead of the misses. It's hard to say, because the
stats I'm currently keeping don't tell me about that. I'm going to try to add
more stats to better determine how effective this approach is.

Hashing might still be a better alternative, but I was thinking this might be
a simpler approach with less memory allocation overhead.

Stats are kept only if you compile with DEBUG_xpc_hacker defined; otherwise
no stats are kept. They are printed at program termination.
Attached patch Fixed release build issues (obsolete) — Splinter Review
Comment on attachment 128862 [details] [diff] [review]
Fixed release build issues

>-    const XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember const * member = mMembers; member < mMembers + mMemberCount;
>+         ++member)

Decent compilers can tell that mMembers and mMemberCount are loop-invariant,
but you might want to store mMembers + mMemberCount in an end local, similar to
the way you do later, just to be sure.
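
For illustration, the hoisted-bound form being suggested might look like this
(a sketch against the quoted loop, not the final patch):

    // Compute the loop bound once, outside the loop:
    const XPCNativeMember* const end = mMembers + mMemberCount;
    for(const XPCNativeMember* member = mMembers; member < end; ++member)
    {
        // ... loop body unchanged ...
    }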

>-    XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember* member = mMembers; member < mMembers + mMemberCount;
>+         ++member)

Ditto.

>+    inline
>+    void TurnCacheOff() const
>+    {
>+#ifdef DEBUG_xpc_hacker
>+        ++NonConst()->mCacheOff;
>+#endif
>+        NonConst()->mMemberCache->mName = JSVAL_VOID;

Nit: mMemberCache[0].mName might be better style, since mMemberCache is an
array.

>+    MemberCacheEntry * entry;
>+    if(IsCacheOn())
>+    {
>+        MemberCacheEntry const * const end = mMemberCache + MEMBER_CACHE_SIZE;
>+        for(entry = NonConst()->mMemberCache; entry < end; ++entry)
>+        {
>+            if(name == entry->mName)
>+            {
>+                if(entry->mIndex == -1)
>+                {
>+                    CacheHit();
>+                    NonConst()->mMisses = 0;                    
>+                    return JS_FALSE;
>+                }
>+                if(pMember)
>+                    *pMember = entry->mMember;
>+                if(pInterfaceIndex)
>+                    *pInterfaceIndex = entry->mIndex;
>+                CacheHit();
>+                NonConst()->mMisses = 0;
>+                return JS_TRUE;
>+            }
>+            // If we're at the end of the list
>+            else if(entry->mName == JSVAL_NULL)

No sense in else after return.

>+                break;
>+        }
>+        // If we're missing too many turn off the cache
>+        if (entry == mMemberCache + MEMBER_CACHE_SIZE &&
>+            ++NonConst()->mMisses == MAX_MISSES)
>+        {
>+            TurnCacheOff();
>+        }
>+        CacheMiss();

There's no way to turn the cache back on when you start getting better
locality of reference again. Idea: use a direct-mapped cache, not linear
search through a small array. Use a good hash function and do not probe on
collision, replacing instead. Then the cache cost is constant and smaller
(it's nearly constant now, but may be bigger than you like, especially if
MEMBER_CACHE_SIZE is tuned up).

With a direct-mapped cache, you could increase MEMBER_CACHE_SIZE to a larger
size, depending on your bloat tolerance, and not have to worry about time
complexity getting out of hand.

/be
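
For reference, here is a minimal, self-contained sketch of the direct-mapped
idea described above: hash once, probe exactly one slot, and let collisions
simply overwrite. All names are made up, uintptr_t stands in for jsval, and
the hash constants are illustrative.

    #include <cstdint>
    #include <cstring>

    typedef uintptr_t NameKey; // stand-in for jsval

    static const uint32_t MEMBER_CACHE_SIZE = 16; // power of two; tune for bloat

    struct Entry { NameKey mName; int mIndex; };

    class DirectMappedCache
    {
    public:
        DirectMappedCache() { memset(mEntries, 0, sizeof(mEntries)); }

        bool Lookup(NameKey aName, int* aIndex) const
        {
            const Entry& e = mEntries[Hash(aName)];
            if(e.mName != aName)
                return false; // empty slot, or evicted by a collision
            *aIndex = e.mIndex;
            return true;
        }

        void Add(NameKey aName, int aIndex)
        {
            Entry& e = mEntries[Hash(aName)]; // no probing: just replace
            e.mName = aName;
            e.mIndex = aIndex;
        }

    private:
        static uint32_t Hash(NameKey aName)
        {
            // Cheap multiplicative hash; shift off the low tag bits first
            // (a 2-bit tag is assumed here).
            return (uint32_t(aName >> 2) * 0x9E3779B9u) &
                   (MEMBER_CACHE_SIZE - 1);
        }

        Entry mEntries[MEMBER_CACHE_SIZE];
    };

The lookup cost is one hash plus one compare no matter how large
MEMBER_CACHE_SIZE gets, which is why the size can be tuned up without the
time cost getting out of hand.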
Thanks for the ideas, Brendan. I meant to use "end" in all cases; that
probably got lost when I zapped my tree, forgot I hadn't saved my most recent
changes, and reverted back to my last saved patch.

Yes, the direct-mapped cache would most likely yield better results. I was
going simple for now to see if this would even be worth pursuing.

I'm trying to figure out how you would know when to turn the cache back on.
Wouldn't the direct-mapped cache eliminate the need for this anyway? You'd
need to know that the misses have subsided, and I'm not sure how to do that
without utilizing the cache. I did notice the cache gets turned off a fair
bit; whether that forfeits a lot of potential cache hits, I don't have the
data to say.

One thought might be to turn the cache back on after a GC during the mark
phase, or maybe when the set gets reused through GetNewOrUsed.
Would be great if you could provide a win32 build somewhere for testing.
Blocks: 195408
This uses a simple hash. I'm looking for input on a better algorithm. I'm
basically stripping off the tag bits of the jsval and then masking off the
last few bits to create an index into the cache. I'm seeing a fair number of
collisions, but that may just be par for the course.
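
In code, the simple hash described above would look roughly like this (a
reconstruction for illustration, not necessarily the exact patch code):

    inline
    static PRUint32 HashName(jsval name)
    {
        // Strip the jsval tag bits, then mask down to a cache index;
        // MEMBER_CACHE_SIZE is assumed to be a power of two.
        return (NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS) &
               PRUint32(MEMBER_CACHE_SIZE - 1);
    }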

I'm still experimenting with the cache size and the minimum number of members.

This seems to give some DHTML pages a 25% boost in speed. I suspect we'll see
similar gains for any JavaScript-based loops that make calls to relatively
inexpensive native functions.

I've been running with the patch for a while now and it seems pretty stable.
Attachment #128815 - Attachment is obsolete: true
Attachment #128862 - Attachment is obsolete: true
You could try out this implementation of HashName (I just
happened to see this bug at about the same time I wanted to
evaluate the FNV-1a hash, so...)

So, the two questions here are:
1) Does this appreciably decrease collisions and misses?
2) Does the expense of this HashName() compared to the old
version outweigh the benefits gained from a greater hit ratio?

I don't have a reasonable testcase or instrumentation to
answer those questions, I think.

    inline
    static PRUint32 HashName(jsval name)
    {
        // fnv-1a hash
        // fnv magic numbers
#define FNV_32_PRIME ((PRUint32)0x01000193)
#define FNV1_32_INIT ((PRUint32)0x811c9dc5)
        PRUint32 rtn = FNV1_32_INIT;
        PRUint32 value = NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS;
        int i = sizeof(value); // number of bytes of value to mix
        while (i--) {
            rtn ^= (unsigned char) value; // mix in low 8 bits of value
            value >>= 8;
            rtn *= FNV_32_PRIME;
        }
        rtn ^= rtn >> 16; // fold some of the higher-bit entropy down
        rtn = rtn & PRUint32(MEMBER_CACHE_SIZE - 1);
        return rtn;
    }
Blocks: 213943
David,

Though I haven't taken a look at the XPCOM code (I haven't got any time,
sorry), I have an algorithm you might be interested in. I assume that in the
two-phase linear search you're talking about in your original report you are
performing a string compare. I've used this algorithm successfully in a .NET
environment, but it should work in C++ as well (though C++ is already more
efficient by nature). It's fast, but I'm not sure if it beats a hashtable;
that perhaps depends on the number of items in your list.

I did the following. I generated a 32-bit code for every string in my list in
the following way:

Turn on bit 0 if the string contains an 'a' or 'A'
 "          1          "       "        'b' or 'B'
...
 "          25         "       "        'z' or 'Z'
 "          26         "       "     a digit
 "          27         "       "     a '_' or '+' or '-' ...

Other schemes are of course also possible, whatever suits your needs. This
code works best if the strings in your list are sufficiently different, and
it can be generated very quickly by scanning the string once. For even more
speed you can use a small 128-byte lookup table containing the bit values,
which you can OR together.

When you have to find a string in the list of strings, you generate this same
code for the string to be found. You can then compare this code against the
other codes. Once two codes match, you still need to check whether the strings
are really equal, but in general this reduces the string compare to an int
compare, which is usually much faster. If you keep the list of codes sorted
you can even do a binary search (beware that a given code can occur multiple
times); this massively reduces the number of compares if the list is
sufficiently large.

Good luck.

ps. This algorithm works *very* well if you have to find a substring in a
list of strings (you AND the codes together in the compare to see whether all
the characters of the substring occur in the string). A sketch of the basic
scheme is below.
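
A minimal C++ sketch of the signature scheme described above (all names
hypothetical; the linear-scan variant is shown rather than the sorted,
binary-search one):

    #include <cctype>
    #include <cstdint>
    #include <cstring>

    // One pass over the string: bits 0-25 for letters a-z (case-insensitive),
    // bit 26 for any digit, bit 27 for anything else.
    static uint32_t StringSignature(const char* s)
    {
        uint32_t sig = 0;
        for(; *s; ++s)
        {
            unsigned char c = (unsigned char) *s;
            if(isalpha(c))
                sig |= 1u << (tolower(c) - 'a');
            else if(isdigit(c))
                sig |= 1u << 26;
            else
                sig |= 1u << 27;
        }
        return sig;
    }

    // Find |needle| in |list|, whose signatures were precomputed into |sigs|.
    // The int compare filters out most candidates before any strcmp runs.
    static int FindString(const char* needle, const char* const* list,
                          const uint32_t* sigs, int count)
    {
        const uint32_t needleSig = StringSignature(needle);
        for(int i = 0; i < count; ++i)
            if(sigs[i] == needleSig && strcmp(list[i], needle) == 0)
                return i;
        return -1;
    }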
FindMember is comparing jsvals, which are basically integers and are unique
within the system, so the simple hash of jsvals is pretty fast. They
essentially represent strings.
Quick update: I'm hoping within the next week to have some performance
numbers and recommendations for setting the couple of constants for this
patch. Then hopefully we can get this in; I think we'll see some marked
improvements in DHTML as well as other areas.
Please compare the drop-in fnv-1a hash in comment 7 too.  I'm curious as to
whether the better distribution outweighs the increased complexity.  I suspect
not, but I don't have adequate instrumentation.
I did a while back, and the improvement to the hit/collision rate was small
to non-existent compared with using the magic number Brendan mentioned. And
I'm pretty sure the generated code was more expensive. I'll double-check,
though, as it's been a while and my memory may be in error.
I've no doubt that the hash is more expensive; the one it replaces is
super-light. If there's little discernible improvement in distribution then
there's no benefit at all.
Blocks: 139986, 203448
Blocks: 210602
Blocks: 21762
Any chance of trying this for 1.7a?
I did 100 runs of browser startup/shutdown for each cache setting. This
setting was the best; it also performed better than the version without the
patch. The improvement in startup/shutdown time is minimal, around half a
percent.

I'd still be interested in feedback from people who apply this patch and run
various DHTML sites.

I'd like to leave the hacker code in; it will only affect people mucking with
XPConnect.

So the patch is ready for review. Note that this patch is more than just the
FindMember cache optimization; it also optimizes many of the FindMember loops.
Attachment #129404 - Attachment is obsolete: true
I've installed the patch and everything seems to work fine. I tried the
following sites. I couldn't measure any significant improvement, but at least
I didn't see any regression ;)

http://www.world-direct.com/mozilla/dhtml/3dcube/index.htm
http://www.world-direct.com/mozilla/dhtml/funo/dhtmltest_10to20ms.htm
http://www.world-direct.com/mozilla/dhtml/75121/anim-test.htm
http://www.world-direct.com/mozilla/dhtml/timeouttest_jpg_withoutscaling.htm

It might of course be that these tests make hardly any use of the new code.
If so, please point me to some tests where this patch can show its muscles;
I'll be happy to give you some numbers.
Should it speed up getElementById?
Yes, I would expect improvement from repeated calls from JavaScript to
getElementById, unless the cache entry gets overwritten. This is a simple
direct hash, so if another function with the same hash value as
getElementById were called in the loop, you could actually see a slight
degradation, since it would be a constant miss.
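
As a tiny illustration of that degradation mode, assuming the 2-bit tag and
16-entry shift-and-mask hash from the sketches above (all values made up):

    #include <cassert>
    #include <cstdint>

    // Assumed hash: strip 2 tag bits, mask down to 16 slots.
    static uint32_t Hash(uintptr_t v) { return uint32_t(v >> 2) & 15u; }

    int main()
    {
        // Two distinct keys that land in the same slot: alternating lookups
        // of the two names would evict each other and miss every time.
        assert(Hash(0x0104) == Hash(0x0144));
        return 0;
    }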
Attachment #139865 - Flags: review?(jband_mozilla)
Are we ready to bake that on the trunk yet?
Is this still planned for the future, or is it a lost cause?
I think there may be some value in doing this, but there hasn't been any 
definitive proof this helps anything. If someone steps up and says this boosts 
some aspect of performance then I'd be willing to move forward with it. It's 
not a trivial change so there needs to be some reward for this risk.
Target Milestone: --- → Future
Are there any results available yet from a build with this patch running the
DHTML tinderbox tests?
DHTML tinderbox tests?
Has anyone run the DHTML tinderbox tests with this patch applied?
QA Contact: pschwartau → xpconnect
Comment on attachment 139865 [details] [diff] [review]
Same patch as before, but with cache settings set to optimal values

No need to review right now, until someone identifies a case where this provides a benefit
Attachment #139865 - Flags: review?(jband_mozilla)
(In reply to comment #24)
> (From update of attachment 139865 [details] [diff] [review] [edit])
> No need to review right now, until someone identifies a case where this
> provides a benefit

David, can you supply a win32 build ?
I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to update. Then the next hurdle is finding some place to host the build.
(In reply to comment #26)
> I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to
> update. Then the next hurdle is finding some place to host the build.
> 
The top hit on Google looks good: http://www.mediafire.com/
100% Free
Unlimited Disk Space
Up to 100MB per File
No Sign Up Required
Or you can just cheat and submit your patch to the Try server and let mozilla.org host the build:
http://wiki.mozilla.org/Build:TryServer

as a bonus, you get builds for Win32/Mac/Linux from that.
Can someone attach a minimal testcase for this bug, or set testcase-needed in Whiteboard?
Ah, I forgot: David Bradley, are you still working on this bug? If not, please reassign it to NOBODY.
I had hoped to see some feedback on the patch as to whether it improved any specific paths. From my findings there was a slight degradation in performance, with only modest gains in certain paths.

At this point, unless someone with perf numbers can prove otherwise, I think this is a wontfix, at least using the tiny hash approach.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX