Closed Bug 140758 Opened 22 years ago Closed 22 years ago

[FIX]Cache results of getElementsByTagName on the document

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

(Keywords: embed, perf, topembed-)

Attachments

(2 files, 8 obsolete files)

Right now, every call to getElementsByTagName produces a new nsContentList
object.  This is very expensive if getElementsByTagName is being called in a
loop or even just often between garbage collections.

Per discussion with jst, we should cache these content lists and return them,
instead of creating new objects.
adding some bugs that show how we could benefit from this to dependency list. 
In particular, the table at
http://bugzilla.mozilla.org/show_bug.cgi?id=118933#c72 shows how much we have to
go and the comments at http://bugzilla.mozilla.org/show_bug.cgi?id=132974#c5
shows a possible win from this caching.
Blocks: 118933, 132974
Keywords: perf
Priority: -- → P1
Target Milestone: --- → mozilla1.2alpha
Attached patch Patch v 1.0 (obsolete) — Splinter Review
This patch makes us about 2.5 times faster on tight loops which call
getElementsByTagName (tested on
http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm and the
page in bug 132974).

Reviews?  Testing?  I have not run this through a rigorous "did I break
getElementsByTagName?" test yet, so if someone wants to do that... :)
I don't see how you keep the cache consistent when someone adds or removes tags
from the page dynamically. Am I missing something? I don't know much about this
code.
Addition/removal of tags is taken care of by the content lists themselves (they
observe the document).  This patch just changes the following situation:

var a = document.getElementsByTagName("a");
var b = document.getElementsByTagName("a");

Before this change |a| and |b| would end up as two separate objects, both
observing the document and responding to content changes.  With this patch, we
cache the content list on the first call and just pass back a pointer to the
cached content list on the second call, so |a| and |b| end up being the same object.
Comment on attachment 82313 [details] [diff] [review]
Patch v 1.0

>+      nsContentList* contentList = entry->mContentList;
>+
>+      if (contentList &&
>+          contentList->GetDocument() == aDocument &&
>+          contentList->GetMatchAtom() == aMatchAtom &&
>+          contentList->GetMatchNamespaceId() == aMatchNameSpaceId &&
>+          contentList->GetRootContent() == aRootContent) {
>+        *aInstancePtrResult = contentList;
>+      }

Why are all those checks needed? The hash key encodes all those properties of
the content
content list, so they must be equal, must they not?

Apart from the fact that I don't know the hash table code, this looks great :-)
> The hash key encodes all those properties of the content
> content list, so they must be equal, must they not?

Well, no.  There's the possibility of hash collisions....  Not very likely, but
possible.
Attached patch Patch v 1.1 (obsolete) — Splinter Review
Same thing, but doesn't touch nsIContentList; no one who doesn't use raw
nsContentList objects needs to know about this.
Attachment #82313 - Attachment is obsolete: true
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

+PR_STATIC_CALLBACK(PLDHashNumber)
+ContentListHashHashKey(PLDHashTable *table, const void *key)

Calling these functions |ContentListHashTable...etc| will avoid
a weirdname such as the above.

================
+  if (--gContentListCount == 0 && gContentListHashTableInited) {
+    PL_DHashTableFinish(&gContentListHashTable);
+    gContentListHashTableInited = PR_FALSE;
+  }

The patch is not adding all content lists to the HT (only one 
constructor is calling AddToHashTable(), so |gContentListCount|
is not guaranteed to be in sync with what is in the HT.

Are there any users of the other constructors? Since you now want
users to create a list via the single entry-point of the factory
function that you just added, you might as well fold the other
constructors under the umbrella of the factory to avoid confusion --
even if they don't add to the hash. 

================
 nsContentList::~nsContentList()
 {
+  RemoveFromHashtable();
   if (mDocument) {

 NS_IMETHODIMP
 nsContentList::DocumentWillBeDestroyed(nsIDocument *aDocument)
 {
   if (mDocument) {
+    // Our key will change... Best remove ourselves before that happens.
+    RemoveFromHashtable();

Why is the first one removed unconditionnally and the second one isn't?
In the first one, what happens in the case of a shared list as you noted
in comment #5 (in a C++ side, |b| could die when going out of certain
scope for example). You should perhaps refcount the entries instead?
(this could cater for the |gContentListCount| mentioned above).

================
+  inline nsIDocument* GetDocument(void) { return mDocument; }
+  inline nsIAtom* GetMatchAtom(void) { return mMatchAtom; }
+  inline PRInt32 GetMatchNamespaceId(void) { return mMatchNameSpaceId; }
+  inline nsIContent* GetRootContent(void) { return mRootContent; }

Not necessary to add the |inline| keyword since they are already inline
by virtue of being unrolled in the class declaration itself.
Attachment #82421 - Flags: needs-work+
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

>+// Hashtable for storing nsContentLists
>+static PLDHashTable gContentListHashTable;
>+static PRBool gContentListHashTableInited = PR_FALSE;

Since the standard guarantees null-ness of static memory,
you can test gContentListHashTable.ops to see whether it's
been initialized.  (If you maintain the invariant that you
destroy the table when its size hits zero, you could also
use entryCount.)

>+static PRUint32 gContentListCount = 0;

Can you just use gContentListHashTable.entryCount ?

>+class ContentListHashEntry : public PLDHashEntryHdr

Perhaps |struct|?

>+{
>+public:
>+  ContentListHashEntry(const PRUint32 aKey)
>+    : mKey(aKey), mContentList(nsnull)
>+  {
>+  }

pldhash will zero-initialize the entry space for you
(although your customized clearEntry breaks that), so
you don't need to null-initialize.  It's more traditional
not to use ctors/dtors, skip the initEntry callback, and
just fill in your key manually where you fill in the value.

>+  ~ContentListHashEntry()
>+  {
>+  }

Why bother writing an empty destructor just so you can
write a customized clearEntry callback that calls it?

>+PR_STATIC_CALLBACK(const void *)
>+ContentListHashGetKey(PLDHashTable *table, PLDHashEntryHdr *entry)
>+{
>+  ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry);
>+  return &e->mKey;

In a patch that I haven't got working yet, I used NS_INT32_TO_PTR
here (and it and NS_PTR_TO_INT32 in other places, rather than using
pointer-to-integer).

However, this is a moot point given the advice below.

>+// Calculate the hash key.
>+
>+// The most variable part of the key will likely be the aMatchAtom, so
>+// take the low 12 bits of that, drop the last two, and use those 10
>+// as the low 10 bits of our key.
>+// The next 6 bits of the key will be the six low bits of the namespace id
>+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of
>+// aRootContent
>+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of
>+// aDocument
>+static const PRUint32
>+CalculateHashKey(nsIDocument* aDocument, nsIAtom* aMatchAtom,
>+                 PRInt32 aMatchNameSpaceId, nsIContent* aRootContent)
>+{
>+  NS_ASSERTION(sizeof(PRWord) == sizeof(void *),
>+               "PRWord and void* not the same, so this is likely going to go astray");
>+  return
>+    ((NS_REINTERPRET_CAST(PRWord, aMatchAtom) & 0xFFF) >> 2) |
>+    ((aMatchNameSpaceId & 0x3F) << 10) |
>+    (((NS_REINTERPRET_CAST(PRWord, aRootContent) & 0x3FF) >> 2) << 16) |
>+    (((NS_REINTERPRET_CAST(PRWord, aDocument) & 0x3FF) >> 2) << 24);
>+}

OK, now I see what you're doing.  So what you really want to do here
is have 2-word entries rather than three word entries, and you can
use more of the stub ops since you'll look exactly like
PLDHashEntryStub (except for a typed |ContentList* mContentList|
instead of void *key).	You'll then use the nsContentList as your
key, essentially, although I think the best way to do this might to
create |nsContentListKey| with those 4 members, and make nsContentList
derive from that (protected inheritance, with a |GetHashKey()| member
like the ones you added), and make your |getKey| callback do an NS_STATIC_CAST
to nsContentListKey* from the content list, and then you can put
the hash calculation stuff into the |hashKey| callback and just construct
a temporary nsContentListKey on the stack when you want to do a lookup.
This seems a bit simpler / more elegant to me, although it does have the
key construction on the stack (which should be simple enough, though).
(BTW, you're probably better off not ignoring so many bits and just
XORing on top of things.)  In fact, you *have* to do this because otherwise
key collisions within your 32-bit int space will cause entries to be lost.

>+NS_EXPORT nsresult
>+NS_GetContentList(nsContentList** aInstancePtrResult,
>+                  nsIDocument* aDocument, nsIAtom* aMatchAtom,
>+                  PRInt32 aMatchNameSpaceId, nsIContent* aRootContent)
>+{
>+  *aInstancePtrResult = nsnull;
>+  // First we look in our hashtable.  Then we create a content list if needed
>+  if (gContentListHashTableInited) {

Again, check .ops here.

>+    const PRUint32 hashKey = CalculateHashKey(aDocument, aMatchAtom,
>+                                              aMatchNameSpaceId, aRootContent);
>+    
>+    ContentListHashEntry *entry =
>+      NS_STATIC_CAST(ContentListHashEntry *,
>+                     PL_DHashTableOperate(&gContentListHashTable,
>+                                          &hashKey,
>+                                          PL_DHASH_LOOKUP));
>+    if (PL_DHASH_ENTRY_IS_LIVE(entry)) {
>+      nsContentList* contentList = entry->mContentList;
>+
>+      if (contentList &&
>+          contentList->GetDocument() == aDocument &&
>+          contentList->GetMatchAtom() == aMatchAtom &&
>+          contentList->GetMatchNamespaceId() == aMatchNameSpaceId &&
>+          contentList->GetRootContent() == aRootContent) {

Every single one of these checks should be true if you write
matchEntry correctly, given the above changes to the entry structure.

>+        *aInstancePtrResult = contentList;
>+      }
>+    }
>+  }
>+
>+  if (!*aInstancePtrResult) {
>+    // We need to create one
>+    *aInstancePtrResult = new nsContentList(aDocument, aMatchAtom,
>+                                            aMatchNameSpaceId, aRootContent);

Do you want to add it to the hashtable here, in general?  (If so,
then the simple solution is to do a PL_DHASH_ADD instead of a 
PL_DHASH_LOOKUP, and just fill it in if you have to create it.)

>+  if (--gContentListCount == 0 && gContentListHashTableInited) {
>+    PL_DHashTableFinish(&gContentListHashTable);
>+    gContentListHashTableInited = PR_FALSE;

If this is all you're using the |gContentListCount| for,
then it certainly could just be the entryCount on the
table.

>+// Support functions for hashing content lists
>+void
>+nsContentList::AddToHashtable()
>+{
>+  static PLDHashTableOps hash_table_ops =
>+  {
>+    PL_DHashAllocTable,
>+    PL_DHashFreeTable,
>+    ContentListHashGetKey,
>+    ContentListHashHashKey,
>+    ContentListHashMatchEntry,
>+    PL_DHashMoveEntryStub,
>+    ContentListHashClearEntry,
>+    PL_DHashFinalizeStub,
>+    ContentListHashInitEntry
>+  };
>+    
>+  if (!gContentListHashTableInited) {

You can check against the table's .ops here.

>+    gContentListHashTableInited = PL_DHashTableInit(&gContentListHashTable,
>+                                                    &hash_table_ops, nsnull,
>+                                                    sizeof(ContentListHashEntry),
>+                                                    16);
>+
>+    if (!gContentListHashTableInited)

You'll want to null .ops on failure here, though.

>+      return;
>+  }
>+
>+  const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom,
>+                                            mMatchNameSpaceId, mRootContent);
>+  
>+  ContentListHashEntry *entry =
>+    NS_STATIC_CAST(ContentListHashEntry *,
>+                   PL_DHashTableOperate(&gContentListHashTable,
>+                                        &hashKey, PL_DHASH_ADD));
>+  if (!entry)
>+    return;
>+
>+  entry->mContentList = this;

Again, this could be much less complicated (see above).

>+void
>+nsContentList::RemoveFromHashtable()
>+{
>+  if (!gContentListHashTableInited)
>+    return;
>+  
>+  const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom,
>+                                            mMatchNameSpaceId, mRootContent);
>+  
>+  ContentListHashEntry *entry =
>+    NS_STATIC_CAST(ContentListHashEntry *,
>+                   PL_DHashTableOperate(&gContentListHashTable,
>+                                        &hashKey,
>+                                        PL_DHASH_LOOKUP));
>+  if (!PL_DHASH_ENTRY_IS_LIVE(entry))
>+    return;
>+  
>+  if (entry->mContentList != this)
>+    return;  // Not us!
>+
>+  PL_DHashTableRawRemove(&gContentListHashTable, entry);

Given the changes in key structure, this could just be
a PL_DHASH_REMOVE.
Attachment #82421 - Flags: needs-work+
Comment on attachment 82421 [details] [diff] [review]
Patch v 1.1

Re: comment 9:

> |gContentListCount| is not guaranteed to be in sync with what is in the HT.

It need not be.  I just wanted to make sure we called finish() on the hashtable
before shutdown.. think of it as a poor man's shutdown observer... :)

David's comments supercede this, in any case.

> Are there any users of the other constructors?

There are, but I'm not sure I want to define multiple overloaded copies of this
function most of which will do nothing but call the constructor.

Also, note that this is not exactly a factory function because it does not
addref the out param, hence it not being called NS_New*.

> Why is the first one removed unconditionnally and the second one isn't?

Because there is no reason to RemoveFromHashtable() in the case when our hash
key is not going to change?  On the other hand, removing ourselves in the
destructor is a must -- the hash table holds no reference and we want to avoid
stale pointers here.

> In the first one, what happens in the case of a shared list 

Absolutely nothing.  As long as _someone_'s holding a reference to the
nsContentList (these are refcounted, btw) the destructor is not going to be
called.  If the destructor is being called that means no one is holding a
reference to the content list anymore.

Good call on the inline thing.

Re: Comment 10 and comment 11:

Ooh.  Nice.  :)  Updated patch coming up.
Attachment #82421 - Flags: needs-work+
Attached patch Patch v 1.2 (obsolete) — Splinter Review
Changes:

Address David's comments, using the nsContentList as the key.  Use .ops to test
whether the table is initialized and .entryCount to decide when to finish the
table.	Move all the hash-addition code to NS_GetContentList, so people who
want to use the hashtable can call that and people who don't can use the raw
|new|.
Attachment #82421 - Attachment is obsolete: true
Comment on attachment 82565 [details] [diff] [review]
Patch v 1.2

>+#define ROTATE(num, bits) (((num) << bits) | ((num) >> (32-bits)))
>+
>+PRUint32
>+nsContentListKey::GetHash(void) const
>+{
>+  // mMatchNameSpaceId is not likely to get big, so give it only 6 bits.
>+  // mMatchAtom is likely the most variable part, so give it 10 bits.
>+  // The other two pointers get 8 bits each.
>+  return
>+    NS_PTR_TO_INT32(mMatchAtom.get()) ^
>+    ROTATE(mMatchNameSpaceId, 10) ^
>+    ROTATE(NS_PTR_TO_INT32(mRootContent), 16) ^
>+    ROTATE(NS_PTR_TO_INT32(mDocument), 24);
>+}

It's probably ok to just do

return NS_PTR_TO_INT32(mMatchAtom.get()) ^
       (NS_PTR_TO_INT32(mRootContent) << 8) ^
       (NS_PTR_TO_INT32(mDocument) << 16) ^
       mMatchNameSpaceId << 24;

>+PR_STATIC_CALLBACK(const void *)
>+ContentListHashtableGetKey(PLDHashTable *table, PLDHashEntryHdr *entry)
>+{
>+  ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry);
>+  return e->mContentList;

Hmm.  I would think this needs to be

    return NS_STATIC_CAST(nsContentListKey*, e->mContentList);

although actually you might need a member variable on nsContentList to
do that since it's protected inheritance.  I'm surprised this
worked (or did this cost you the perf gain?).

>+  ContentListHashEntry *entry = nsnull;
>+  // First we look in our hashtable.  Then we create a content list if needed
>+  if (gContentListHashTable.ops) {
>+    nsContentListKey hashKey(aDocument, aMatchAtom,
>+                             aMatchNameSpaceId, aRootContent);
>+    
>+    // A PL_DHASH_ADD is equivalent to a PL_DHASH_LOOKUP for cases
>+    // when the entry is already in the hashtable.
>+    entry =
>+      NS_STATIC_CAST(ContentListHashEntry *,
>+                     PL_DHashTableOperate(&gContentListHashTable,
>+                                          &hashKey,
>+                                          PL_DHASH_ADD));
>+      *aInstancePtrResult = entry->mContentList;

This is misindented, and it assumes you didn't hit out-of-memory
by dereferencing entry (which you're more carefuly about below).

>+void
>+nsContentList::RemoveFromHashtable()
>+{
>+  if (!gContentListHashTable.ops)
>+    return;
>+  
>+  nsContentListKey hashKey(mDocument, mMatchAtom, mMatchNameSpaceId, mRootContent);

You don't need to construct a key -- you can just use
NS_STATIC_CAST(nsContentListKey*, this)

> class nsContentList : public nsBaseContentList,
>+                      protected nsContentListKey,

Given the |protected| members above, maybe this could
just be |public|, at least if you give the |public|
member functions les confusing names.
> I'm surprised this worked

I am too, now that you point it out... :)
This is why I was using stack-allocated keys when it was not really needed --
otherwise it did _not_ work.  Thanks for pointing out why....

I hate casts, some days... Patch with all that fixed coming up.

Attached patch Patch v 1.3 (obsolete) — Splinter Review
Addresses comment 13's issues on casting and OOM.  In fact, I've dropped my
custom hash entry struct and the custom getKey callback, since the stubs work
just fine once I do the casting right.

I'd rather keep the inheritance |protected|, but it's jst's call on what he
wants in his module...
Attachment #82565 - Attachment is obsolete: true
How can you drop the getKey callback?  The one you had before was equivalent to
the stub, but I said in comment 13 that it was broken.
The brokenness was in having an nsContentList* member that I was returning
without casting to nsContentListKey*.  Since nsContentListKey is not the first
class nsContentList inherits from, taking that return value and casting to
nsContentListKey* was bad (it gave a pointer to the wrong data).  Essentially,
it was doing

NS_STATIC_CAST(nsContentListKey*, NS_STATIC_CAST(void*, nsContentList* foo))

which is bad.

The new version makes sure that I always cast to nsContentListKey* before
writing the key and always cast to nsContentListKey* after reading the key.  So
the conversion is always nsContentListKey* --> void* or the other way, which is
consistent.  So this should be correct, unless I deeply misunderstand how
casting works in C++ (which is entirely possible, btw).
You still need a |getKey| callback.  It's only used when resizing the table, so
you might not notice the bugs that the lack of it would cause.  And, as I
mentioned yesterday, you might want to switch which of the things you shift when
computing the hash, since you know that only the low bits of the namespace are
significant, and you can expect that the document is somewhat less significant
given that you're already hashing on the root content node.
(The reason you need the getKey callback is to enforce the casting that you
mention in comment 17.  And it should just be what I mentioned in comment 13.)
Attached patch Patch v 1.4 (obsolete) — Splinter Review
OK, that makes sense.  Adds the getKey callback.
Attachment #82586 - Attachment is obsolete: true
Comment on attachment 82596 [details] [diff] [review]
Patch v 1.4

OK, now that I look more closely, it still isn't right, because you're actually
using the PLDHashEntryStub type, which has a |void *key|.  You're probably
better off defining your own type that has the exact same layout as the stub
type (and is derived from PLDHashEntryHdr), but that uses a typed
|nsContentList *mContentList|, so that you can use fewer casts.
Attached patch kind of like this? (obsolete) — Splinter Review
Moves all the icky casting into the callback functions.
Comment on attachment 82608 [details] [diff] [review]
kind of like this?

Yeah, the hashtable stuff looks good to me now.

>+    if (entry)
>+      entry->mContentList = *aInstancePtrResult;

If you're worried about allocation failure for the content
list, you might want to change this to be:

if (!*aInstancePtrResult) {
  if (entry)
    PL_DHashTableRawRemove(gContentListTable, entry)
  return NS_ERROR_OUT_OF_MEMORY;
}
if (entry)
  entry->mContentList = *aInstancePtrResult;

(and perhaps even use a local variable instead of
*aInstancePtrResult).

>+  inline PRBool Equals(const nsContentListKey& aContentListKey) const;
>+  inline PRUint32 GetHash(void) const;

Maybe just write the inline functions inline in the class?  Or at
least use the inline keyword at the definition as well?

Other than that, r=dbaron if you get module-owner review as your sr=.
Attached patch Patch v 1.73 (obsolete) — Splinter Review
Move inlines to header, remove the entries if a new list cannot be allocated.
Attachment #82596 - Attachment is obsolete: true
Attachment #82608 - Attachment is obsolete: true
Comment on attachment 82681 [details] [diff] [review]
Patch v 1.73

>+    list = new nsContentList(aDocument, aMatchAtom,
>+                                            aMatchNameSpaceId, aRootContent);

weird indentation.

Other than that r=dbaron, as I said 2 comments back.
Attachment #82681 - Flags: review+
Comment on attachment 82681 [details] [diff] [review]
Patch v 1.73

+nsContentListKey::nsContentListKey(nsIDocument *aDocument,
+				    nsIAtom* aMatchAtom, 
+				    PRInt32 aMatchNameSpaceId,
+				    nsIContent* aRootContent)
+  : mMatchAtom(aMatchAtom),
+    mMatchNameSpaceId(aMatchNameSpaceId),
+    mDocument(aDocument),
+    mRootContent(aRootContent)
+{
+}
+
+nsContentListKey::nsContentListKey(const nsContentListKey& aContentListKey)
+  : mMatchAtom(aContentListKey.mMatchAtom),
+    mMatchNameSpaceId(aContentListKey.mMatchNameSpaceId),
+    mDocument(aContentListKey.mDocument),
+    mRootContent(aContentListKey.mRootContent)
+{
+}

I vould've made the above inline, but either way...

- In NS_GetContentList():

+  nsContentList* list;

You'll need to initialize |list| to null or you could end up accessing it w/o
ever having set it in error cases.

And NS_GetContentList() really needs to refcount the list that it returns, not
doing that in a method whose signature looks like it really would do that (even
if the type of the out param is a concrete class) is not cool. Also, you're
potentially passing back shared content lists, so there's no way for the caller
to do the right thing in error cases.

+    list = new nsContentList(aDocument, aMatchAtom,
+					     aMatchNameSpaceId, aRootContent);

Fix the next-line indentation.

- In nsContentList::RemoveFromHashtable():

+  (void) PL_DHashTableOperate(&gContentListHashTable,
+			       NS_STATIC_CAST(nsContentListKey*, this),
+			       PL_DHASH_REMOVE);

Loose the (void) noncense here :-)

- In nsContentList.h:

+  nsIDocument* mDocument;
+  nsIContent* mRootContent;

Add "// Weak" comments after these...

Other than that, this looks good so far. Fix the above and I'll have one more
look...
Attachment #82681 - Flags: review+ → needs-work+
Attached patch Patch v 1.732 (obsolete) — Splinter Review
Fix those issues.
Attachment #82681 - Attachment is obsolete: true
Comment on attachment 82739 [details] [diff] [review]
Patch v 1.732

sr=jst
Attachment #82739 - Flags: superreview+
So... the Win32 compiler won't let me cast entry->mContentList to
nsContentListKey*, presumably because of the protected inheritance...  I suppose
I could just make it publicly inherited; any reason not to do that?
The simple alternative is to add a GetKey member function:

nsContentListKey* GetKey() {
  return NS_STATIC_CAST(nsContentListKey*, this);
}

The only reason I can think of to avoid making it public is that the public
member functions of the key class might not make sense as names of members of
nsContentList.
dbaron's suggestion sounds good to me.
Attached patch Patch v 1.73205Splinter Review
Make this compile on Windows/Mac/Irix/etc, per dbaron's suggestion.
Attachment #82739 - Attachment is obsolete: true
Comment on attachment 82782 [details] [diff] [review]
Patch v 1.73205

sr=jst
Attachment #82782 - Flags: superreview+
Er.. the last part of that diff (to nsRuleNode.cpp) has nothing to do with this
bug.  The rest has been checked in on the trunk; I'll ask for branch approval in
a day or two.
nominating for 1.0.0 and making topembed.
Keywords: mozilla1.0, topembed
Target Milestone: mozilla1.2alpha → mozilla1.0
Summary: Cache results of getElementsByTagName on the document → [FIX]Cache results of getElementsByTagName on the document
topembed-, embed.  not directly impacting embedding customers but it'd be a good
fix for performance.
Keywords: topembedembed, topembed-
nominating for nsbeta1, making us faster than IE is always good for releases :)
Keywords: nsbeta1
Drivers are leery of taking this for 1.0. Marking fixed to get off my radar, as
it's fixed on the trunk.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Cc:ing boullet who did some timings of Mozilla vs. IE in bug 118933 comment 92
(where |getElementsByTagName| is about 4x slower compared to IE 5.5), maybe getting
some real numbers here can help drivers to quantify the benefit/risk w.r.t. the
competition.
As a note, I sent information on that in the approval request (I had people on
#mozillazine test both IE and Mozilla on those exact testcases).

This may be landing for 1.0.1, but 1.0 is being very much locked down at this point.
It would be interesting to have an idea of much improvement the patch brought
about (there is still life after m1.0 :-) Got some measurements to share with
us?
Sure thing.  Testing with stock nightly builds from the morning and the evening
of the day I checked this in, on Linux, on the testcase at
http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm:



2002-05-08-09

Average (1runs) :1655ms
Average (5runs) :1876ms
Average (10runs) :2047ms

2002-05-08-21

Average (1runs) :787ms
Average (5runs) :736ms
Average (10runs) :739ms

So a factor of 2-3 improvement.
... and that's on a tiny document, the savings on a larger document should be an
order of magnitude in normal useage (depending on the useage patterns, of course).

Oh, and by the way Mr. "not reading bugmail till June 12", stop reading your
bugmail! :-)
Here are the results with 2002052008
Mozilla is now very close to IE5.5 on getElementByTagName and I don't see
anymore the GC effect that was making each run time very different.

+---------------------+--------+----------+-------+----------+
|Test                 | before | w/ 82782 | Gain% |   IE5.5  |
|---------------------|--------|----------|-------|----------|
|getElementById       |1169 ms | 1095 ms  |  ---  |  972 ms  |
|---------------------|--------|----------|-------|----------|
|getElementsByTagName |4119 ms | 1185 ms  |  71%  | 1093 ms  |
|---------------------|--------|----------|-------|----------|
|createElement        |2193 ms | 2116 ms  |  ---  |  760 ms  |
|---------------------|--------|----------|-------|----------|
|getAttribute         |1472 ms | 1398 ms  |  ---  |  496 ms  |
|---------------------|--------|----------|-------|----------|
|setAttribute         |1425 ms | 1336 ms  |  ---  |  541 ms  |
+---------------------+--------+----------+-------+----------+
That's awesome! The "GC effect" that we used to see here was due to the fact
that we create so many new content lists and GC'ing all those took a long time,
and the fact that we created so many of them could also have been the reason for
running the GC in the first place. Either way, even if we still end up running
the GC during these tests, it should be really really fast, and even if it does
potentially (and intentionally) invalidate our cached result from
getElementsByTagName(), we're still much much faster since even in a trivial
case where we execute one getElementsByTagName() per branch we'd now end up
walking the tree looking for those elements once every 4096 branch executions
(assuming the GC is run that often even if we create no new objects per branch
execution), where we used to walk the tree for every single branch execution
(and create a new JSObject, not to mention adding one document observer per
branch execution as well).

We could, and I think we might want to do this, create an internal cache that
would hold on to the last requested content list that we get from
getElementsByTagName(), that way the last created content list would survive a
GC run, even if the XPConnect wrapper for that content list might go away. That
would save us the tree walking-after-GC that we might now end up doing in these
testcases at least. Not a big deal on real-life pages I would imagine, but doing
this would also have the benefit of possibly helping poorly written code that's
running in a non-GC environment (read C++), i.e. a similar testcase written in
C++ where getElementsByTagName() would be called in a tight loop and the result
would be released between every call, the current cache would do us no good in
such a case, but if we'd cache the last requested content list, we'd help code
like that.

Thoughts? Bz (or someone else), wanna add one more tiny cache to this code?
Would this "one-list cache" be global, per-document, or what?
I'd say a global one-list cache would give us pretty good mileage. I couldn't
resist trying this out, so I have a patch. Not tested yet though...
Verified on 2002-05-23-08-trunk.
Status: RESOLVED → VERIFIED
Sent drivers an approval request for 1.0.1
Keywords: mozilla1.0.1
please checkin to the 1.0.1 branch ASAP. once there please remove the
mozilla1.0.1+ keyword and add the fixed1.0.1 keyword.
Attachment #82782 - Flags: approval+
Checked in on branch.
Comment on attachment 84447 [details] [diff] [review]
Add a global one-list cache that holds on to the last requested list...

r=bzbarsky.  Let's land that.
Attachment #84447 - Flags: review+
Attachment #84447 - Flags: superreview+
Comment on attachment 84447 [details] [diff] [review]
Add a global one-list cache that holds on to the last requested list...

sr=rbs
Checked in that one-list cache on trunk.
There is an improvement on windows but on linux there is no considerable
improvement. Here are the results for 10 testruns with 2002-07-10-07-1.0 and
2002-07-11-08-1.0 branch builds.

+---------------------------------------------------------+
|Test                 | w/2002-05-08 |  Windows |  Linux  |
|                     | comment # 44 |          |         |
|---------------------|-------------------------|---------|
|getElementById       |   1095 ms    |  961ms   |  1474ms | 
|---------------------|-------------------------|---------|
|getElementsByTagName |   1185 ms    |  1096ms  |  1788ms |  
|---------------------|-------------------------|---------|
|createElement        |   2116 ms    |  2136ms  |  2483ms |  
|---------------------|-------------------------|---------|
|getAttribute         |   1398 ms    |  1252ms  |  1966ms |
|---------------------|-------------------------|---------|
|setAttribute         |   1336 ms    |  1205ms  |  1917ms | 
+---------------------------------------------------------+
I would expect absolutely no effect from that one-list cache for scripts run
from JS, on any platform.  It'll only be noticeable if someone calls
getElementsByTagName from C++ code in a particularly dumb way.  ;)
marking verified. 
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: