Last Comment Bug 140758 - [FIX]Cache results of getElementsByTagName on the document
: [FIX]Cache results of getElementsByTagName on the document
Status: VERIFIED FIXED
: embed, perf, topembed-
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: Trunk
: All All
: P1 normal (vote)
: mozilla1.0
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on:
Blocks: 118933 132974
  Show dependency treegraph
 
Reported: 2002-04-28 10:04 PDT by Boris Zbarsky [:bz]
Modified: 2008-07-31 02:32 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v 1.0 (15.20 KB, patch)
2002-05-03 21:16 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.1 (14.44 KB, patch)
2002-05-05 13:13 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.2 (14.85 KB, patch)
2002-05-06 16:49 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.3 (14.67 KB, patch)
2002-05-06 19:24 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.4 (14.91 KB, patch)
2002-05-06 21:14 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
kind of like this? (14.84 KB, patch)
2002-05-06 22:30 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.73 (14.64 KB, patch)
2002-05-07 13:28 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Patch v 1.732 (15.99 KB, patch)
2002-05-07 17:24 PDT, Boris Zbarsky [:bz]
jst: superreview+
Details | Diff | Review
Patch v 1.73205 (17.29 KB, patch)
2002-05-07 22:57 PDT, Boris Zbarsky [:bz]
jst: superreview+
jud: approval+
Details | Diff | Review
Add a global one-list cache that holds on to the last requested list... (2.64 KB, patch)
2002-05-21 09:27 PDT, Johnny Stenback (:jst, jst@mozilla.com)
bzbarsky: review+
rbs: superreview+
Details | Diff | Review

Description Boris Zbarsky [:bz] 2002-04-28 10:04:10 PDT
Right now, every call to getElementsByTagName produces a new nsContentList
object.  This is very expensive if getElementsByTagName is being called in a
loop or even just often between garbage collections.

Per discussion with jst, we should cache these content lists and return them,
instead of creating new objects.
Comment 1 Boris Zbarsky [:bz] 2002-04-28 10:08:20 PDT
adding some bugs that show how we could benefit from this to dependency list. 
In particular, the table at
http://bugzilla.mozilla.org/show_bug.cgi?id=118933#c72 shows how much we have to
go and the comments at http://bugzilla.mozilla.org/show_bug.cgi?id=132974#c5
shows a possible win from this caching.
Comment 2 Boris Zbarsky [:bz] 2002-05-03 21:16:42 PDT
Created attachment 82313 [details] [diff] [review]
Patch v 1.0
Comment 3 Boris Zbarsky [:bz] 2002-05-03 21:19:55 PDT
This patch makes us about 2.5 times faster on tight loops which call
getElementsByTagName (tested on
http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm and the
page in bug 132974).

Reviews?  Testing?  I have not run this through a rigorous "did I break
getElementsByTagName?" test yet, so if someone wants to do that... :)
Comment 4 Daniel Bratell 2002-05-04 05:18:15 PDT
I don't see how you keep the cache consistent when someone adds or removes tags
from the page dynamically. Am I missing something? I don't know much about this
code.
Comment 5 Boris Zbarsky [:bz] 2002-05-04 07:29:39 PDT
Addition/removal of tags is taken care of by the content lists themselves (they
observe the document).  This patch just changes the following situation:

var a = document.getElementsByTagName("a");
var b = document.getElementsByTagName("a");

Before this change |a| and |b| would end up as two separate objects, both
observing the document and responding to content changes.  With this patch, we
cache the content list on the first call and just pass back a pointer to the
cached content list on the second call, so |a| and |b| end up being the same object.
Comment 6 Fabian Guisset 2002-05-04 07:42:33 PDT
Comment on attachment 82313 [details] [diff] [review]
Patch v 1.0

>+      nsContentList* contentList = entry->mContentList;
>+
>+      if (contentList &&
>+          contentList->GetDocument() == aDocument &&
>+          contentList->GetMatchAtom() == aMatchAtom &&
>+          contentList->GetMatchNamespaceId() == aMatchNameSpaceId &&
>+          contentList->GetRootContent() == aRootContent) {
>+        *aInstancePtrResult = contentList;
>+      }

Why are all those checks needed? The hash key encodes all those properties of
the content
content list, so they must be equal, must they not?

Apart from the fact that I don't know the hash table code, this looks great :-)
Comment 7 Boris Zbarsky [:bz] 2002-05-04 08:48:09 PDT
> The hash key encodes all those properties of the content
> content list, so they must be equal, must they not?

Well, no.  There's the possibility of hash collisions....  Not very likely, but
possible.
Comment 8 Boris Zbarsky [:bz] 2002-05-05 13:13:40 PDT
Created attachment 82421 [details] [diff] [review]
Patch v 1.1

Same thing, but doesn't touch nsIContentList; no one who doesn't use raw
nsContentList objects needs to know about this.
Comment 9 rbs 2002-05-05 16:35:58 PDT
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

+PR_STATIC_CALLBACK(PLDHashNumber)
+ContentListHashHashKey(PLDHashTable *table, const void *key)

Calling these functions |ContentListHashTable...etc| will avoid
a weirdname such as the above.

================
+  if (--gContentListCount == 0 && gContentListHashTableInited) {
+    PL_DHashTableFinish(&gContentListHashTable);
+    gContentListHashTableInited = PR_FALSE;
+  }

The patch is not adding all content lists to the HT (only one 
constructor is calling AddToHashTable(), so |gContentListCount|
is not guaranteed to be in sync with what is in the HT.

Are there any users of the other constructors? Since you now want
users to create a list via the single entry-point of the factory
function that you just added, you might as well fold the other
constructors under the umbrella of the factory to avoid confusion --
even if they don't add to the hash. 

================
 nsContentList::~nsContentList()
 {
+  RemoveFromHashtable();
   if (mDocument) {

 NS_IMETHODIMP
 nsContentList::DocumentWillBeDestroyed(nsIDocument *aDocument)
 {
   if (mDocument) {
+    // Our key will change... Best remove ourselves before that happens.
+    RemoveFromHashtable();

Why is the first one removed unconditionnally and the second one isn't?
In the first one, what happens in the case of a shared list as you noted
in comment #5 (in a C++ side, |b| could die when going out of certain
scope for example). You should perhaps refcount the entries instead?
(this could cater for the |gContentListCount| mentioned above).

================
+  inline nsIDocument* GetDocument(void) { return mDocument; }
+  inline nsIAtom* GetMatchAtom(void) { return mMatchAtom; }
+  inline PRInt32 GetMatchNamespaceId(void) { return mMatchNameSpaceId; }
+  inline nsIContent* GetRootContent(void) { return mRootContent; }

Not necessary to add the |inline| keyword since they are already inline
by virtue of being unrolled in the class declaration itself.
Comment 10 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-05 16:36:15 PDT
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

>+// Hashtable for storing nsContentLists
>+static PLDHashTable gContentListHashTable;
>+static PRBool gContentListHashTableInited = PR_FALSE;

Since the standard guarantees null-ness of static memory,
you can test gContentListHashTable.ops to see whether it's
been initialized.  (If you maintain the invariant that you
destroy the table when its size hits zero, you could also
use entryCount.)

>+static PRUint32 gContentListCount = 0;

Can you just use gContentListHashTable.entryCount ?

>+class ContentListHashEntry : public PLDHashEntryHdr

Perhaps |struct|?

>+{
>+public:
>+  ContentListHashEntry(const PRUint32 aKey)
>+    : mKey(aKey), mContentList(nsnull)
>+  {
>+  }

pldhash will zero-initialize the entry space for you
(although your customized clearEntry breaks that), so
you don't need to null-initialize.  It's more traditional
not to use ctors/dtors, skip the initEntry callback, and
just fill in your key manually where you fill in the value.

>+  ~ContentListHashEntry()
>+  {
>+  }

Why bother writing an empty destructor just so you can
write a customized clearEntry callback that calls it?

>+PR_STATIC_CALLBACK(const void *)
>+ContentListHashGetKey(PLDHashTable *table, PLDHashEntryHdr *entry)
>+{
>+  ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry);
>+  return &e->mKey;

In a patch that I haven't got working yet, I used NS_INT32_TO_PTR
here (and it and NS_PTR_TO_INT32 in other places, rather than using
pointer-to-integer).

However, this is a moot point given the advice below.

>+// Calculate the hash key.
>+
>+// The most variable part of the key will likely be the aMatchAtom, so
>+// take the low 12 bits of that, drop the last two, and use those 10
>+// as the low 10 bits of our key.
>+// The next 6 bits of the key will be the six low bits of the namespace id
>+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of
>+// aRootContent
>+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of
>+// aDocument
>+static const PRUint32
>+CalculateHashKey(nsIDocument* aDocument, nsIAtom* aMatchAtom,
>+                 PRInt32 aMatchNameSpaceId, nsIContent* aRootContent)
>+{
>+  NS_ASSERTION(sizeof(PRWord) == sizeof(void *),
>+               "PRWord and void* not the same, so this is likely going to go astray");
>+  return
>+    ((NS_REINTERPRET_CAST(PRWord, aMatchAtom) & 0xFFF) >> 2) |
>+    ((aMatchNameSpaceId & 0x3F) << 10) |
>+    (((NS_REINTERPRET_CAST(PRWord, aRootContent) & 0x3FF) >> 2) << 16) |
>+    (((NS_REINTERPRET_CAST(PRWord, aDocument) & 0x3FF) >> 2) << 24);
>+}

OK, now I see what you're doing.  So what you really want to do here
is have 2-word entries rather than three word entries, and you can
use more of the stub ops since you'll look exactly like
PLDHashEntryStub (except for a typed |ContentList* mContentList|
instead of void *key).	You'll then use the nsContentList as your
key, essentially, although I think the best way to do this might to
create |nsContentListKey| with those 4 members, and make nsContentList
derive from that (protected inheritance, with a |GetHashKey()| member
like the ones you added), and make your |getKey| callback do an NS_STATIC_CAST
to nsContentListKey* from the content list, and then you can put
the hash calculation stuff into the |hashKey| callback and just construct
a temporary nsContentListKey on the stack when you want to do a lookup.
This seems a bit simpler / more elegant to me, although it does have the
key construction on the stack (which should be simple enough, though).
(BTW, you're probably better off not ignoring so many bits and just
XORing on top of things.)  In fact, you *have* to do this because otherwise
key collisions within your 32-bit int space will cause entries to be lost.

>+NS_EXPORT nsresult
>+NS_GetContentList(nsContentList** aInstancePtrResult,
>+                  nsIDocument* aDocument, nsIAtom* aMatchAtom,
>+                  PRInt32 aMatchNameSpaceId, nsIContent* aRootContent)
>+{
>+  *aInstancePtrResult = nsnull;
>+  // First we look in our hashtable.  Then we create a content list if needed
>+  if (gContentListHashTableInited) {

Again, check .ops here.

>+    const PRUint32 hashKey = CalculateHashKey(aDocument, aMatchAtom,
>+                                              aMatchNameSpaceId, aRootContent);
>+    
>+    ContentListHashEntry *entry =
>+      NS_STATIC_CAST(ContentListHashEntry *,
>+                     PL_DHashTableOperate(&gContentListHashTable,
>+                                          &hashKey,
>+                                          PL_DHASH_LOOKUP));
>+    if (PL_DHASH_ENTRY_IS_LIVE(entry)) {
>+      nsContentList* contentList = entry->mContentList;
>+
>+      if (contentList &&
>+          contentList->GetDocument() == aDocument &&
>+          contentList->GetMatchAtom() == aMatchAtom &&
>+          contentList->GetMatchNamespaceId() == aMatchNameSpaceId &&
>+          contentList->GetRootContent() == aRootContent) {

Every single one of these checks should be true if you write
matchEntry correctly, given the above changes to the entry structure.

>+        *aInstancePtrResult = contentList;
>+      }
>+    }
>+  }
>+
>+  if (!*aInstancePtrResult) {
>+    // We need to create one
>+    *aInstancePtrResult = new nsContentList(aDocument, aMatchAtom,
>+                                            aMatchNameSpaceId, aRootContent);

Do you want to add it to the hashtable here, in general?  (If so,
then the simple solution is to do a PL_DHASH_ADD instead of a 
PL_DHASH_LOOKUP, and just fill it in if you have to create it.)

>+  if (--gContentListCount == 0 && gContentListHashTableInited) {
>+    PL_DHashTableFinish(&gContentListHashTable);
>+    gContentListHashTableInited = PR_FALSE;

If this is all you're using the |gContentListCount| for,
then it certainly could just be the entryCount on the
table.

>+// Support functions for hashing content lists
>+void
>+nsContentList::AddToHashtable()
>+{
>+  static PLDHashTableOps hash_table_ops =
>+  {
>+    PL_DHashAllocTable,
>+    PL_DHashFreeTable,
>+    ContentListHashGetKey,
>+    ContentListHashHashKey,
>+    ContentListHashMatchEntry,
>+    PL_DHashMoveEntryStub,
>+    ContentListHashClearEntry,
>+    PL_DHashFinalizeStub,
>+    ContentListHashInitEntry
>+  };
>+    
>+  if (!gContentListHashTableInited) {

You can check against the table's .ops here.

>+    gContentListHashTableInited = PL_DHashTableInit(&gContentListHashTable,
>+                                                    &hash_table_ops, nsnull,
>+                                                    sizeof(ContentListHashEntry),
>+                                                    16);
>+
>+    if (!gContentListHashTableInited)

You'll want to null .ops on failure here, though.

>+      return;
>+  }
>+
>+  const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom,
>+                                            mMatchNameSpaceId, mRootContent);
>+  
>+  ContentListHashEntry *entry =
>+    NS_STATIC_CAST(ContentListHashEntry *,
>+                   PL_DHashTableOperate(&gContentListHashTable,
>+                                        &hashKey, PL_DHASH_ADD));
>+  if (!entry)
>+    return;
>+
>+  entry->mContentList = this;

Again, this could be much less complicated (see above).

>+void
>+nsContentList::RemoveFromHashtable()
>+{
>+  if (!gContentListHashTableInited)
>+    return;
>+  
>+  const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom,
>+                                            mMatchNameSpaceId, mRootContent);
>+  
>+  ContentListHashEntry *entry =
>+    NS_STATIC_CAST(ContentListHashEntry *,
>+                   PL_DHashTableOperate(&gContentListHashTable,
>+                                        &hashKey,
>+                                        PL_DHASH_LOOKUP));
>+  if (!PL_DHASH_ENTRY_IS_LIVE(entry))
>+    return;
>+  
>+  if (entry->mContentList != this)
>+    return;  // Not us!
>+
>+  PL_DHashTableRawRemove(&gContentListHashTable, entry);

Given the changes in key structure, this could just be
a PL_DHASH_REMOVE.
Comment 11 Boris Zbarsky [:bz] 2002-05-05 19:11:06 PDT
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

Re: comment 9:

> |gContentListCount| is not guaranteed to be in sync with what is in the HT.

It need not be.  I just wanted to make sure we called finish() on the hashtable
before shutdown.. think of it as a poor man's shutdown observer... :)

David's comments supercede this, in any case.

> Are there any users of the other constructors?

There are, but I'm not sure I want to define multiple overloaded copies of this
function most of which will do nothing but call the constructor.

Also, note that this is not exactly a factory function because it does not
addref the out param, hence it not being called NS_New*.

> Why is the first one removed unconditionnally and the second one isn't?

Because there is no reason to RemoveFromHashtable() in the case when our hash
key is not going to change?  On the other hand, removing ourselves in the
destructor is a must -- the hash table holds no reference and we want to avoid
stale pointers here.

> In the first one, what happens in the case of a shared list 

Absolutely nothing.  As long as _someone_'s holding a reference to the
nsContentList (these are refcounted, btw) the destructor is not going to be
called.  If the destructor is being called that means no one is holding a
reference to the content list anymore.

Good call on the inline thing.

Re: Comment 10 and comment 11:

Ooh.  Nice.  :)  Updated patch coming up.
Comment 12 Boris Zbarsky [:bz] 2002-05-06 16:49:28 PDT
Created attachment 82565 [details] [diff] [review]
Patch v 1.2

Changes:

Address David's comments, using the nsContentList as the key.  Use .ops to test
whether the table is initialized and .entryCount to decide when to finish the
table.	Move all the hash-addition code to NS_GetContentList, so people who
want to use the hashtable can call that and people who don't can use the raw
|new|.
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 18:28:41 PDT
Comment on attachment 82565 [details] [diff] [review]
Patch v 1.2

>+#define ROTATE(num, bits) (((num) << bits) | ((num) >> (32-bits)))
>+
>+PRUint32
>+nsContentListKey::GetHash(void) const
>+{
>+  // mMatchNameSpaceId is not likely to get big, so give it only 6 bits.
>+  // mMatchAtom is likely the most variable part, so give it 10 bits.
>+  // The other two pointers get 8 bits each.
>+  return
>+    NS_PTR_TO_INT32(mMatchAtom.get()) ^
>+    ROTATE(mMatchNameSpaceId, 10) ^
>+    ROTATE(NS_PTR_TO_INT32(mRootContent), 16) ^
>+    ROTATE(NS_PTR_TO_INT32(mDocument), 24);
>+}

It's probably ok to just do

return NS_PTR_TO_INT32(mMatchAtom.get()) ^
       (NS_PTR_TO_INT32(mRootContent) << 8) ^
       (NS_PTR_TO_INT32(mDocument) << 16) ^
       mMatchNameSpaceId << 24;

>+PR_STATIC_CALLBACK(const void *)
>+ContentListHashtableGetKey(PLDHashTable *table, PLDHashEntryHdr *entry)
>+{
>+  ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry);
>+  return e->mContentList;

Hmm.  I would think this needs to be

    return NS_STATIC_CAST(nsContentListKey*, e->mContentList);

although actually you might need a member variable on nsContentList to
do that since it's protected inheritance.  I'm surprised this
worked (or did this cost you the perf gain?).

>+  ContentListHashEntry *entry = nsnull;
>+  // First we look in our hashtable.  Then we create a content list if needed
>+  if (gContentListHashTable.ops) {
>+    nsContentListKey hashKey(aDocument, aMatchAtom,
>+                             aMatchNameSpaceId, aRootContent);
>+    
>+    // A PL_DHASH_ADD is equivalent to a PL_DHASH_LOOKUP for cases
>+    // when the entry is already in the hashtable.
>+    entry =
>+      NS_STATIC_CAST(ContentListHashEntry *,
>+                     PL_DHashTableOperate(&gContentListHashTable,
>+                                          &hashKey,
>+                                          PL_DHASH_ADD));
>+      *aInstancePtrResult = entry->mContentList;

This is misindented, and it assumes you didn't hit out-of-memory
by dereferencing entry (which you're more carefuly about below).

>+void
>+nsContentList::RemoveFromHashtable()
>+{
>+  if (!gContentListHashTable.ops)
>+    return;
>+  
>+  nsContentListKey hashKey(mDocument, mMatchAtom, mMatchNameSpaceId, mRootContent);

You don't need to construct a key -- you can just use
NS_STATIC_CAST(nsContentListKey*, this)

> class nsContentList : public nsBaseContentList,
>+                      protected nsContentListKey,

Given the |protected| members above, maybe this could
just be |public|, at least if you give the |public|
member functions les confusing names.
Comment 14 Boris Zbarsky [:bz] 2002-05-06 19:06:24 PDT
> I'm surprised this worked

I am too, now that you point it out... :)
This is why I was using stack-allocated keys when it was not really needed --
otherwise it did _not_ work.  Thanks for pointing out why....

I hate casts, some days... Patch with all that fixed coming up.

Comment 15 Boris Zbarsky [:bz] 2002-05-06 19:24:39 PDT
Created attachment 82586 [details] [diff] [review]
Patch v 1.3

Addresses comment 13's issues on casting and OOM.  In fact, I've dropped my
custom hash entry struct and the custom getKey callback, since the stubs work
just fine once I do the casting right.

I'd rather keep the inheritance |protected|, but it's jst's call on what he
wants in his module...
Comment 16 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 19:39:44 PDT
How can you drop the getKey callback?  The one you had before was equivalent to
the stub, but I said in comment 13 that it was broken.
Comment 17 Boris Zbarsky [:bz] 2002-05-06 20:11:44 PDT
The brokenness was in having an nsContentList* member that I was returning
without casting to nsContentListKey*.  Since nsContentListKey is not the first
class nsContentList inherits from, taking that return value and casting to
nsContentListKey* was bad (it gave a pointer to the wrong data).  Essentially,
it was doing

NS_STATIC_CAST(nsContentListKey*, NS_STATIC_CAST(void*, nsContentList* foo))

which is bad.

The new version makes sure that I always cast to nsContentListKey* before
writing the key and always cast to nsContentListKey* after reading the key.  So
the conversion is always nsContentListKey* --> void* or the other way, which is
consistent.  So this should be correct, unless I deeply misunderstand how
casting works in C++ (which is entirely possible, btw).
Comment 18 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 20:44:56 PDT
You still need a |getKey| callback.  It's only used when resizing the table, so
you might not notice the bugs that the lack of it would cause.  And, as I
mentioned yesterday, you might want to switch which of the things you shift when
computing the hash, since you know that only the low bits of the namespace are
significant, and you can expect that the document is somewhat less significant
given that you're already hashing on the root content node.
Comment 19 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 20:46:13 PDT
(The reason you need the getKey callback is to enforce the casting that you
mention in comment 17.  And it should just be what I mentioned in comment 13.)
Comment 20 Boris Zbarsky [:bz] 2002-05-06 21:14:46 PDT
Created attachment 82596 [details] [diff] [review]
Patch v 1.4

OK, that makes sense.  Adds the getKey callback.
Comment 21 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 21:35:58 PDT
Comment on attachment 82596 [details] [diff] [review]
Patch v 1.4

OK, now that I look more closely, it still isn't right, because you're actually
using the PLDHashEntryStub type, which has a |void *key|.  You're probably
better off defining your own type that has the exact same layout as the stub
type (and is derived from PLDHashEntryHdr), but that uses a typed
|nsContentList *mContentList|, so that you can use fewer casts.
Comment 22 Boris Zbarsky [:bz] 2002-05-06 22:30:46 PDT
Created attachment 82608 [details] [diff] [review]
kind of like this?

Moves all the icky casting into the callback functions.
Comment 23 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-06 22:58:09 PDT
Comment on attachment 82608 [details] [diff] [review]
kind of like this?

Yeah, the hashtable stuff looks good to me now.

>+    if (entry)
>+      entry->mContentList = *aInstancePtrResult;

If you're worried about allocation failure for the content
list, you might want to change this to be:

if (!*aInstancePtrResult) {
  if (entry)
    PL_DHashTableRawRemove(gContentListTable, entry)
  return NS_ERROR_OUT_OF_MEMORY;
}
if (entry)
  entry->mContentList = *aInstancePtrResult;

(and perhaps even use a local variable instead of
*aInstancePtrResult).

>+  inline PRBool Equals(const nsContentListKey& aContentListKey) const;
>+  inline PRUint32 GetHash(void) const;

Maybe just write the inline functions inline in the class?  Or at
least use the inline keyword at the definition as well?

Other than that, r=dbaron if you get module-owner review as your sr=.
Comment 24 Boris Zbarsky [:bz] 2002-05-07 13:28:36 PDT
Created attachment 82681 [details] [diff] [review]
Patch v 1.73

Move inlines to header, remove the entries if a new list cannot be allocated.
Comment 25 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-07 13:49:16 PDT
Comment on attachment 82681 [details] [diff] [review]
Patch v 1.73

>+    list = new nsContentList(aDocument, aMatchAtom,
>+                                            aMatchNameSpaceId, aRootContent);

weird indentation.

Other than that r=dbaron, as I said 2 comments back.
Comment 26 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-07 13:58:32 PDT
Comment on attachment 82681 [details] [diff] [review]
Patch v 1.73

+nsContentListKey::nsContentListKey(nsIDocument *aDocument,
+				    nsIAtom* aMatchAtom, 
+				    PRInt32 aMatchNameSpaceId,
+				    nsIContent* aRootContent)
+  : mMatchAtom(aMatchAtom),
+    mMatchNameSpaceId(aMatchNameSpaceId),
+    mDocument(aDocument),
+    mRootContent(aRootContent)
+{
+}
+
+nsContentListKey::nsContentListKey(const nsContentListKey& aContentListKey)
+  : mMatchAtom(aContentListKey.mMatchAtom),
+    mMatchNameSpaceId(aContentListKey.mMatchNameSpaceId),
+    mDocument(aContentListKey.mDocument),
+    mRootContent(aContentListKey.mRootContent)
+{
+}

I vould've made the above inline, but either way...

- In NS_GetContentList():

+  nsContentList* list;

You'll need to initialize |list| to null or you could end up accessing it w/o
ever having set it in error cases.

And NS_GetContentList() really needs to refcount the list that it returns, not
doing that in a method whose signature looks like it really would do that (even
if the type of the out param is a concrete class) is not cool. Also, you're
potentially passing back shared content lists, so there's no way for the caller
to do the right thing in error cases.

+    list = new nsContentList(aDocument, aMatchAtom,
+					     aMatchNameSpaceId, aRootContent);

Fix the next-line indentation.

- In nsContentList::RemoveFromHashtable():

+  (void) PL_DHashTableOperate(&gContentListHashTable,
+			       NS_STATIC_CAST(nsContentListKey*, this),
+			       PL_DHASH_REMOVE);

Loose the (void) noncense here :-)

- In nsContentList.h:

+  nsIDocument* mDocument;
+  nsIContent* mRootContent;

Add "// Weak" comments after these...

Other than that, this looks good so far. Fix the above and I'll have one more
look...
Comment 27 Boris Zbarsky [:bz] 2002-05-07 17:24:31 PDT
Created attachment 82739 [details] [diff] [review]
Patch v 1.732

Fix those issues.
Comment 28 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-07 17:47:38 PDT
Comment on attachment 82739 [details] [diff] [review]
Patch v 1.732

sr=jst
Comment 29 Boris Zbarsky [:bz] 2002-05-07 21:22:37 PDT
So... the Win32 compiler won't let me cast entry->mContentList to
nsContentListKey*, presumably because of the protected inheritance...  I suppose
I could just make it publicly inherited; any reason not to do that?
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2002-05-07 21:39:47 PDT
The simple alternative is to add a GetKey member function:

nsContentListKey* GetKey() {
  return NS_STATIC_CAST(nsContentListKey*, this);
}

The only reason I can think of to avoid making it public is that the public
member functions of the key class might not make sense as names of members of
nsContentList.
Comment 31 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-07 22:29:20 PDT
dbaron's suggestion sounds good to me.
Comment 32 Boris Zbarsky [:bz] 2002-05-07 22:57:47 PDT
Created attachment 82782 [details] [diff] [review]
Patch v 1.73205

Make this compile on Windows/Mac/Irix/etc, per dbaron's suggestion.
Comment 33 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-08 09:35:22 PDT
Comment on attachment 82782 [details] [diff] [review]
Patch v 1.73205

sr=jst
Comment 34 Boris Zbarsky [:bz] 2002-05-08 18:39:42 PDT
Er.. the last part of that diff (to nsRuleNode.cpp) has nothing to do with this
bug.  The rest has been checked in on the trunk; I'll ask for branch approval in
a day or two.
Comment 35 Doron Rosenberg (IBM) 2002-05-09 10:46:07 PDT
nominating for 1.0.0 and making topembed.
Comment 36 Winnie Lam 2002-05-15 10:30:59 PDT
topembed-, embed.  not directly impacting embedding customers but it'd be a good
fix for performance.
Comment 37 Doron Rosenberg (IBM) 2002-05-16 18:55:39 PDT
nominating for nsbeta1, making us faster than IE is always good for releases :)
Comment 38 Boris Zbarsky [:bz] 2002-05-19 19:04:22 PDT
Drivers are leery of taking this for 1.0. Marking fixed to get off my radar, as
it's fixed on the trunk.
Comment 39 rbs 2002-05-20 00:35:11 PDT
Cc:ing boullet who did some timings of Mozilla vs. IE in bug 118933 comment 92
(where |getElementsByTagName| is about 4x slower compared to IE 5.5), maybe getting
some real numbers here can help drivers to quantify the benefit/risk w.r.t. the
competition.
Comment 40 Boris Zbarsky [:bz] 2002-05-20 08:05:27 PDT
As a note, I sent information on that in the approval request (I had people on
#mozillazine test both IE and Mozilla on those exact testcases).

This may be landing for 1.0.1, but 1.0 is being very much locked down at this point.
Comment 41 rbs 2002-05-20 13:53:42 PDT
It would be interesting to have an idea of much improvement the patch brought
about (there is still life after m1.0 :-) Got some measurements to share with
us?
Comment 42 Boris Zbarsky [:bz] 2002-05-20 15:18:38 PDT
Sure thing.  Testing with stock nightly builds from the morning and the evening
of the day I checked this in, on Linux, on the testcase at
http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm:



2002-05-08-09

Average (1runs) :1655ms
Average (5runs) :1876ms
Average (10runs) :2047ms

2002-05-08-21

Average (1runs) :787ms
Average (5runs) :736ms
Average (10runs) :739ms

So a factor of 2-3 improvement.
Comment 43 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-20 15:40:18 PDT
... and that's on a tiny document, the savings on a larger document should be an
order of magnitude in normal useage (depending on the useage patterns, of course).

Oh, and by the way Mr. "not reading bugmail till June 12", stop reading your
bugmail! :-)
Comment 44 Marc Boullet 2002-05-21 07:14:32 PDT
Here are the results with 2002052008
Mozilla is now very close to IE5.5 on getElementByTagName and I don't see
anymore the GC effect that was making each run time very different.

+---------------------+--------+----------+-------+----------+
|Test                 | before | w/ 82782 | Gain% |   IE5.5  |
|---------------------|--------|----------|-------|----------|
|getElementById       |1169 ms | 1095 ms  |  ---  |  972 ms  |
|---------------------|--------|----------|-------|----------|
|getElementsByTagName |4119 ms | 1185 ms  |  71%  | 1093 ms  |
|---------------------|--------|----------|-------|----------|
|createElement        |2193 ms | 2116 ms  |  ---  |  760 ms  |
|---------------------|--------|----------|-------|----------|
|getAttribute         |1472 ms | 1398 ms  |  ---  |  496 ms  |
|---------------------|--------|----------|-------|----------|
|setAttribute         |1425 ms | 1336 ms  |  ---  |  541 ms  |
+---------------------+--------+----------+-------+----------+
Comment 45 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-21 09:03:54 PDT
That's awesome! The "GC effect" that we used to see here was due to the fact
that we create so many new content lists and GC'ing all those took a long time,
and the fact that we created so many of them could also have been the reason for
running the GC in the first place. Either way, even if we still end up running
the GC during these tests, it should be really really fast, and even if it does
potentially (and intentionally) invalidate our cached result from
getElementsByTagName(), we're still much much faster since even in a trivial
case where we execute one getElementsByTagName() per branch we'd now end up
walking the tree looking for those elements once every 4096 branch executions
(assuming the GC is run that often even if we create no new objects per branch
execution), where we used to walk the tree for every single branch execution
(and create a new JSObject, not to mention adding one document observer per
branch execution as well).

We could, and I think we might want to do this, create an internal cache that
would hold on to the last requested content list that we get from
getElementsByTagName(), that way the last created content list would survive a
GC run, even if the XPConnect wrapper for that content list might go away. That
would save us the tree walking-after-GC that we might now end up doing in these
testcases at least. Not a big deal on real-life pages I would imagine, but doing
this would also have the benefit of possibly helping poorly written code that's
running in a non-GC environment (read C++), i.e. a similar testcase written in
C++ where getElementsByTagName() would be called in a tight loop and the result
would be released between every call, the current cache would do us no good in
such a case, but if we'd cache the last requested content list, we'd help code
like that.

Thoughts? Bz (or someone else), wanna add one more tiny cache to this code?
Comment 46 Boris Zbarsky [:bz] 2002-05-21 09:20:00 PDT
Would this "one-list cache" be global, per-document, or what?
Comment 47 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-21 09:24:54 PDT
I'd say a global one-list cache would give us pretty good mileage. I couldn't
resist trying this out, so I have a patch. Not tested yet though...
Comment 48 Johnny Stenback (:jst, jst@mozilla.com) 2002-05-21 09:27:20 PDT
Created attachment 84447 [details] [diff] [review]
Add a global one-list cache that holds on to the last requested list...
Comment 49 Prashant Desale 2002-05-24 16:41:49 PDT
Verified on 2002-05-23-08-trunk.
Comment 50 Boris Zbarsky [:bz] 2002-05-29 23:07:50 PDT
Sent drivers an approval request for 1.0.1
Comment 51 Judson Valeski 2002-05-30 09:06:57 PDT
please checkin to the 1.0.1 branch ASAP. once there please remove the
mozilla1.0.1+ keyword and add the fixed1.0.1 keyword.
Comment 52 Boris Zbarsky [:bz] 2002-05-30 13:14:55 PDT
Checked in on branch.
Comment 53 Boris Zbarsky [:bz] 2002-06-20 14:53:38 PDT
Comment on attachment 84447 [details] [diff] [review]
Add a global one-list cache that holds on to the last requested list...

r=bzbarsky.  Let's land that.
Comment 54 rbs 2002-07-05 12:20:04 PDT
Comment on attachment 84447 [details] [diff] [review]
Add a global one-list cache that holds on to the last requested list...

sr=rbs
Comment 55 Boris Zbarsky [:bz] 2002-07-09 00:19:06 PDT
Checked in that one-list cache on trunk.
Comment 56 Sivakiran Tummala 2002-07-12 16:31:36 PDT
There is an improvement on windows but on linux there is no considerable
improvement. Here are the results for 10 testruns with 2002-07-10-07-1.0 and
2002-07-11-08-1.0 branch builds.

+---------------------------------------------------------+
|Test                 | w/2002-05-08 |  Windows |  Linux  |
|                     | comment # 44 |          |         |
|---------------------|-------------------------|---------|
|getElementById       |   1095 ms    |  961ms   |  1474ms | 
|---------------------|-------------------------|---------|
|getElementsByTagName |   1185 ms    |  1096ms  |  1788ms |  
|---------------------|-------------------------|---------|
|createElement        |   2116 ms    |  2136ms  |  2483ms |  
|---------------------|-------------------------|---------|
|getAttribute         |   1398 ms    |  1252ms  |  1966ms |
|---------------------|-------------------------|---------|
|setAttribute         |   1336 ms    |  1205ms  |  1917ms | 
+---------------------------------------------------------+
Comment 57 Boris Zbarsky [:bz] 2002-07-12 16:40:50 PDT
I would expect absolutely no effect from that one-list cache for scripts run
from JS, on any platform.  It'll only be noticeable if someone calls
getElementsByTagName from C++ code in a particularly dumb way.  ;)
Comment 58 Sivakiran Tummala 2002-07-25 03:46:40 PDT
marking verified. 

Note You need to log in before you can comment on or make changes to this bug.