Closed Bug 213813 Opened 21 years ago Closed 16 years ago

FindMember could probably use optimizing

Categories

(Core :: XPConnect, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX
Target Milestone: Future

People

(Reporter: dbradley, Assigned: dbradley)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

FindMember has shown up in some cases where repeated calls into XPConnect
occur, with what I believe are objects that have many interfaces and/or
interfaces with many members. Currently this is a two-phase linear search:
one over interface names, and another over method names within each interface.
Blocks: 212831
Status: NEW → ASSIGNED
Summary: FindMember could probably use a hash → FindMember could probably use optimizing
Attached patch Something to play with (obsolete) — Splinter Review
I've attached a patch for people with DHTML issues to play with and see if they
notice any benefit.

Originally I was thinking of using a hash. I think this might be a better
solution, though. The idea is to "cache" the most commonly hit methods. The
assumption is that most objects have just a few methods at a time that get hit
frequently. My fear was that a cache might create too much overhead.

The cache basically tracks the last 4 methods used, making searching the cache
extremely quick and misses not too expensive. The downside is that this adds
another 52 bytes per XPCNativeSet. There are generally no more than 300 of
these, at least for the runs I've done so far, so we're talking an additional
15,600 bytes, which I don't think is all that much in the grand scheme of
things (assuming there were 300 active at any given time). The cache also
tracks the number of misses; the idea is that if we're getting several misses
in a row, we turn the cache off.
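
To make the idea concrete, here is a minimal, self-contained sketch of that
small-cache scheme. This is not the actual patch: names like MemberCache are
made up, uintptr_t stands in for jsval, and MAX_MISSES is an assumed tuning
constant.

    #include <cstdint>
    #include <cstring>

    typedef uintptr_t NameKey; // stand-in for jsval

    static const int MEMBER_CACHE_SIZE = 4;
    static const int MAX_MISSES = 8; // assumed tuning constant

    struct MemberCacheEntry
    {
        NameKey mName;  // 0 marks an empty slot
        int     mIndex; // could also cache -1 for a known "not found"
    };

    class MemberCache
    {
    public:
        MemberCache() : mMisses(0), mEnabled(true)
        {
            memset(mEntries, 0, sizeof(mEntries));
        }

        // Returns true and fills *aIndex on a hit, false on a miss.
        bool Lookup(NameKey aName, int* aIndex)
        {
            if(!mEnabled)
                return false;
            for(MemberCacheEntry* e = mEntries;
                e < mEntries + MEMBER_CACHE_SIZE; ++e)
            {
                if(e->mName == aName)
                {
                    mMisses = 0;
                    *aIndex = e->mIndex;
                    return true;
                }
                if(e->mName == 0) // end of the used entries
                    break;
            }
            // Too many misses in a row: assume poor locality, stop caching.
            if(++mMisses == MAX_MISSES)
                mEnabled = false;
            return false;
        }

        // Record a result, shifting older entries down (oldest falls off).
        void Add(NameKey aName, int aIndex)
        {
            if(!mEnabled)
                return;
            memmove(mEntries + 1, mEntries,
                    (MEMBER_CACHE_SIZE - 1) * sizeof(MemberCacheEntry));
            mEntries[0].mName = aName;
            mEntries[0].mIndex = aIndex;
        }

    private:
        MemberCacheEntry mEntries[MEMBER_CACHE_SIZE];
        int mMisses;
        bool mEnabled;
    };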

I don't have any way to accurately profile this right now, so I thought I'd
get it out so others can take a look, maybe with jprof or the like.

The patch also optimizes some of the loops in XPCNativeInterface and
XPCNativeSet. In many of the loops it was incrementing a pointer as well as an
index. I removed the index and changed the code to work with just the pointer.
It's possible some clever compiler's optimizer consolidated this, but I doubt
it. It also makes the loops more like the container/iterator pattern, which I
think is a plus. I've run the code, so I don't think I broke anything in doing
this, but let me know if you encounter crashes or screwy behavior.

In just starting up and shutting down I don't see a lot of improvement. Hits
are slightly less frequent than misses. I'm hoping that at least the gains
from the hits cover the overhead of the misses. It's hard to say, because the
stats I'm currently keeping don't tell me about that. I'm going to try to add
more stats to better determine how effective this approach is.

Hashing might still be a better alternative, but I was thinking this might be
a simpler approach with less memory allocation overhead.

Stats are kept only if you compile with DEBUG_xpc_hacker defined; otherwise
no stats are kept. They are printed at program termination.
Attached patch Fixed release build issues (obsolete) — Splinter Review
Comment on attachment 128862 [details] [diff] [review]
Fixed release build issues

>-    const XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember const * member = mMembers; member < mMembers + mMemberCount;
>+         ++member)

Decent compilers can tell that mMembers and mMemberCount are loop-invariant,
but you might want to store mMembers + mMemberCount in an end local, similar to
the way you do later, just to be sure.
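
For illustration, the hoisted-bound form being suggested might look like this
(a sketch against the quoted loop, not the final patch):

    // Compute the loop bound once, outside the loop:
    const XPCNativeMember* const end = mMembers + mMemberCount;
    for(const XPCNativeMember* member = mMembers; member < end; ++member)
    {
        // ... loop body unchanged ...
    }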

>-    XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember* member = mMembers; member < mMembers + mMemberCount;
>+         ++member)

Ditto.

>+    inline
>+    void TurnCacheOff() const
>+    {
>+#ifdef DEBUG_xpc_hacker
>+        ++NonConst()->mCacheOff;
>+#endif
>+        NonConst()->mMemberCache->mName = JSVAL_VOID;

Nit: mMemberCache[0].mName might be better style, since mMemberCache is an
array.

>+    MemberCacheEntry * entry;
>+    if(IsCacheOn())
>+    {
>+        MemberCacheEntry const * const end = mMemberCache + MEMBER_CACHE_SIZE;
>+        for(entry = NonConst()->mMemberCache; entry < end; ++entry)
>+        {
>+            if(name == entry->mName)
>+            {
>+                if(entry->mIndex == -1)
>+                {
>+                    CacheHit();
>+                    NonConst()->mMisses = 0;                    
>+                    return JS_FALSE;
>+                }
>+                if(pMember)
>+                    *pMember = entry->mMember;
>+                if(pInterfaceIndex)
>+                    *pInterfaceIndex = entry->mIndex;
>+                CacheHit();
>+                NonConst()->mMisses = 0;
>+                return JS_TRUE;
>+            }
>+            // If we're at the end of the list
>+            else if(entry->mName == JSVAL_NULL)

No sense in else after return.

>+                break;
>+        }
>+        // If we're missing too many turn off the cache
>+        if (entry == mMemberCache + MEMBER_CACHE_SIZE &&
>+            ++NonConst()->mMisses == MAX_MISSES)
>+        {
>+            TurnCacheOff();
>+        }
>+        CacheMiss();

There's no way to turn the cache back on when you start getting better
locality of reference again. Idea: use a direct-mapped cache, not linear
search through a small array. Use a good hash function and do not probe on
collision, replacing instead. Then the cache cost is constant and smaller
(it's nearly constant now, but may be bigger than you like, especially if
MEMBER_CACHE_SIZE is tuned up).

With a direct-mapped cache, you could increase MEMBER_CACHE_SIZE to a larger
size, depending on your bloat tolerance, and not have to worry about time
complexity getting out of hand.

/be
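
For reference, here is a minimal, self-contained sketch of the direct-mapped
idea described above: hash once, probe exactly one slot, and let collisions
simply overwrite. All names are made up, uintptr_t stands in for jsval, and
the hash constants are illustrative.

    #include <cstdint>
    #include <cstring>

    typedef uintptr_t NameKey; // stand-in for jsval

    static const uint32_t MEMBER_CACHE_SIZE = 16; // power of two; tune for bloat

    struct Entry { NameKey mName; int mIndex; };

    class DirectMappedCache
    {
    public:
        DirectMappedCache() { memset(mEntries, 0, sizeof(mEntries)); }

        bool Lookup(NameKey aName, int* aIndex) const
        {
            const Entry& e = mEntries[Hash(aName)];
            if(e.mName != aName)
                return false; // empty slot, or evicted by a collision
            *aIndex = e.mIndex;
            return true;
        }

        void Add(NameKey aName, int aIndex)
        {
            Entry& e = mEntries[Hash(aName)]; // no probing: just replace
            e.mName = aName;
            e.mIndex = aIndex;
        }

    private:
        static uint32_t Hash(NameKey aName)
        {
            // Cheap multiplicative hash; shift off the low tag bits first
            // (a 2-bit tag is assumed here).
            return (uint32_t(aName >> 2) * 0x9E3779B9u) &
                   (MEMBER_CACHE_SIZE - 1);
        }

        Entry mEntries[MEMBER_CACHE_SIZE];
    };

The lookup cost is one hash plus one compare no matter how large
MEMBER_CACHE_SIZE gets, which is why the size can be tuned up without the
time cost getting out of hand.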
Thanks for the ideas, Brendan. I meant to use "end" in all cases; that
probably got lost when I zapped my tree, forgot I hadn't saved my most recent
changes, and reverted back to my last saved patch.

Yes, the direct-mapped cache would most likely yield better results. I was
going simple for now to see if this would even be worth pursuing.

I'm trying to figure out how you would know when to turn the cache back on.
Wouldn't the direct-mapped cache eliminate the need for this anyway? You'd
need to know that the misses have subsided, and I'm not sure how to do that
without utilizing the cache. I did notice the cache gets turned off a fair
bit; whether that forfeits a lot of potential cache hits, I don't have the
data to say.

One thought might be to turn the cache back on after a GC during the mark
phase, or maybe when the set gets reused through GetNewOrUsed.
Would be great if you could provide a win32 build somewhere for testing.
Blocks: 195408
This uses a simple hash. I'm looking for input on a better algorithm. I'm
basically stripping off the tag bits of the jsval and then masking off the
last few bits to create an index into the cache. I'm seeing a fair number of
collisions, but that may just be par for the course.
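
In code, the simple hash described above would look roughly like this (a
reconstruction for illustration, not necessarily the exact patch code):

    inline
    static PRUint32 HashName(jsval name)
    {
        // Strip the jsval tag bits, then mask down to a cache index;
        // MEMBER_CACHE_SIZE is assumed to be a power of two.
        return (NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS) &
               PRUint32(MEMBER_CACHE_SIZE - 1);
    }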

I'm still experimenting with the cache size and the minimum number of members.

This seems to give some DHTML pages a 25% boost in speed. I suspect we'll see
similar gains for any JavaScript-based loops that make calls to relatively
inexpensive native functions.

I've been running with the patch for a while now and it seems pretty stable.
Attachment #128815 - Attachment is obsolete: true
Attachment #128862 - Attachment is obsolete: true
You could try out this implementation of HashName (I just
happened to see this bug at about the same time I wanted to
evaluate the FNV-1a hash, so...)

So, the two questions here are:
1) Does this appreciably decrease collisions and misses?
2) Does the expense of this HashName() compared to the old
version outweigh the benefits gained from a greater hit ratio?

I don't have a reasonable testcase or instrumentation to
answer those questions, I think.

    inline
    static PRUint32 HashName(jsval name)
    {
        // fnv-1a hash
        // fnv magic numbers
#define FNV_32_PRIME ((PRUint32)0x01000193)
#define FNV1_32_INIT ((PRUint32)0x811c9dc5)
        PRUint32 rtn = FNV1_32_INIT;
        PRUint32 value = NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS;
        int i = sizeof(value); // number of bytes of value to mix
        while (i--) {
            rtn ^= (unsigned char) value; // mix in low 8 bits of value
            value >>= 8;
            rtn *= FNV_32_PRIME;
        }
        rtn ^= rtn >> 16; // fold some of the higher-bit entropy down
        rtn = rtn & PRUint32(MEMBER_CACHE_SIZE - 1);
        return rtn;
    }
Blocks: 213943
David,

Though I haven't taken a look at the XPCOM code (I haven't got any time,
sorry), I have an algorithm you might be interested in. I assume that in the
two-phase linear search you're talking about in your original report you are
performing a string compare. I've used this algorithm successfully in a .NET
environment, but it should work in C++ as well (though C++ is already more
efficient by nature). It's fast, but I'm not sure if it beats a hashtable;
that perhaps depends on the number of items in your list.

I did the following. I generated a 32-bit code for every string in my list in
the following way:

Turn on bit 0 if the string contains an 'a' or 'A'
 "          1          "       "        'b' or 'B'
...
 "          25         "       "        'z' or 'Z'
 "          26         "       "     a digit
 "          27         "       "     a '_' or '+' or '-' ...

Other schemes are of course also possible, whatever suits your needs. This
code works best if the strings in your list are sufficiently different, and
it can be generated very quickly by scanning the string once. For even more
speed you can use a small 128-byte lookup table containing the bit values,
which you can OR together.

When you have to find a string in the list of strings, you generate this same
code for the string to be found. You can then compare this code against the
other codes. Once two codes match, you still need to check whether the strings
are really equal, but in general this reduces the string compare to an int
compare, which is usually much faster. If you keep the list of codes sorted
you can even do a binary search (beware that a given code can occur multiple
times); this massively reduces the number of compares if the list is
sufficiently large.

Good luck.

ps. This algorithm works *very* well if you have to find a substring in a
list of strings (you AND the codes together in the compare to see whether all
the characters of the substring occur in the string). A sketch of the basic
scheme is below.
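
A minimal C++ sketch of the signature scheme described above (all names
hypothetical; the linear-scan variant is shown rather than the sorted,
binary-search one):

    #include <cctype>
    #include <cstdint>
    #include <cstring>

    // One pass over the string: bits 0-25 for letters a-z (case-insensitive),
    // bit 26 for any digit, bit 27 for anything else.
    static uint32_t StringSignature(const char* s)
    {
        uint32_t sig = 0;
        for(; *s; ++s)
        {
            unsigned char c = (unsigned char) *s;
            if(isalpha(c))
                sig |= 1u << (tolower(c) - 'a');
            else if(isdigit(c))
                sig |= 1u << 26;
            else
                sig |= 1u << 27;
        }
        return sig;
    }

    // Find |needle| in |list|, whose signatures were precomputed into |sigs|.
    // The int compare filters out most candidates before any strcmp runs.
    static int FindString(const char* needle, const char* const* list,
                          const uint32_t* sigs, int count)
    {
        const uint32_t needleSig = StringSignature(needle);
        for(int i = 0; i < count; ++i)
            if(sigs[i] == needleSig && strcmp(list[i], needle) == 0)
                return i;
        return -1;
    }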
FindMember is comparing jsvals, which are basically integers and are unique
within the system, so the simple hash of jsvals is pretty fast. They
essentially represent strings.
Quick update: I'm hoping within the next week to have some performance
numbers and recommendations for setting the couple of constants for this
patch. Then hopefully we can get this in; I think we'll see some marked
improvements in DHTML as well as other areas.
Please compare the drop-in fnv-1a hash in comment 7 too.  I'm curious as to
whether the better distribution outweighs the increased complexity.  I suspect
not, but I don't have adequate instrumentation.
I did a while back, and the improvement to the hit/collision rate was small
to non-existent compared with using the magic number Brendan mentioned. And
I'm pretty sure the generated code was more expensive. I'll double-check,
though, as it's been a while and my memory may be in error.
I've no doubt that the hash is more expensive; the one it replaces is
super-light. If there's little discernible improvement in distribution then
there's no benefit at all.
Blocks: 139986, 203448
Blocks: 210602
Blocks: 21762
Any chance of trying this for 1.7a?
I did 100 runs of browser startup/shutdown for each cache setting. This
setting was the best; it also performed better than the version without the
patch. The improvement in startup/shutdown time is minimal, around half a
percent.

I'd still be interested in feedback from people who apply this patch and run
various DHTML sites.

I'd like to leave the hacker code in; it will only affect people mucking with
XPConnect.

So the patch is ready for review. Note that this patch is more than just the
FindMember cache optimization; it also optimizes many of the FindMember loops.
Attachment #129404 - Attachment is obsolete: true
I've installed the patch and everything seems to work fine. I tried the
following sites. I couldn't measure any significant improvement, but at least
I didn't see any regression ;)

http://www.world-direct.com/mozilla/dhtml/3dcube/index.htm
http://www.world-direct.com/mozilla/dhtml/funo/dhtmltest_10to20ms.htm
http://www.world-direct.com/mozilla/dhtml/75121/anim-test.htm
http://www.world-direct.com/mozilla/dhtml/timeouttest_jpg_withoutscaling.htm

It might of course be that these tests make hardly any use of the new code.
If so, please point me to some tests where this patch can show its muscles;
I'll be happy to give you some numbers.
Should it speed up getElementById?
Yes, I would expect improvement from repeated calls from JavaScript to
getElementById, unless the cache entry gets overwritten. This is a simple
direct hash, so if another function with the same hash value as
getElementById were called in the loop, you could actually see a slight
degradation, since it would be a constant miss.
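
As a tiny illustration of that degradation mode, assuming the 2-bit tag and
16-entry shift-and-mask hash from the sketches above (all values made up):

    #include <cassert>
    #include <cstdint>

    // Assumed hash: strip 2 tag bits, mask down to 16 slots.
    static uint32_t Hash(uintptr_t v) { return uint32_t(v >> 2) & 15u; }

    int main()
    {
        // Two distinct keys that land in the same slot: alternating lookups
        // of the two names would evict each other and miss every time.
        assert(Hash(0x0104) == Hash(0x0144));
        return 0;
    }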
Attachment #139865 - Flags: review?(jband_mozilla)
Are we ready to bake that on the trunk yet?
Is this still planned for the future, or is it a lost cause?
I think there may be some value in doing this, but there hasn't been any 
definitive proof this helps anything. If someone steps up and says this boosts 
some aspect of performance then I'd be willing to move forward with it. It's 
not a trivial change so there needs to be some reward for this risk.
Target Milestone: --- → Future
Are there any results available yet from a build with this patch running the
DHTML tinderbox tests?
DHTML tinderbox tests?
Has anyone run the DHTML tinderbox tests with this patch applied?
QA Contact: pschwartau → xpconnect
Comment on attachment 139865 [details] [diff] [review]
Same patch as before, but with cache settings set to optimal values

No need to review right now, until someone identifies a case where this provides a benefit
Attachment #139865 - Flags: review?(jband_mozilla)
(In reply to comment #24)
> (From update of attachment 139865 [details] [diff] [review] [edit])
> No need to review right now, until someone identifies a case where this
> provides a benefit

David, can you supply a win32 build ?
I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to update. Then the next hurdle is finding some place to host the build.
(In reply to comment #26)
> I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to
> update. Then the next hurdle is finding some place to host the build.
> 
The top hit on Google looks good: http://www.mediafire.com/
100% Free
Unlimited Disk Space
Up to 100MB per File
No Sign Up Required
Or you can just cheat and submit your patch to the Try server and let mozilla.org host the build:
http://wiki.mozilla.org/Build:TryServer

as a bonus, you get builds for Win32/Mac/Linux from that.
Can someone attach a minimal testcase for this bug, or set testcase-needed in Whiteboard?
Ah, I forgot: David Bradley, are you still working on this bug? If not, please reassign it to NOBODY.
I had hoped to see some feedback on the patch as to whether it improved any specific paths. From my findings there was a slight degradation in performance, with only modest gains in certain paths.

At this point, unless someone with perf numbers can prove otherwise, I think this is a wontfix, at least using the tiny hash approach.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX