Closed Bug 76994 Opened 23 years ago Closed 20 years ago

O(n^2) algorithm in GenericElementCollection::Item()

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8alpha1

People

(Reporter: jst, Assigned: bzbarsky)

References

Details

(Keywords: perf)

Attachments

(2 files, 3 obsolete files)

This hasn't shown up in any profiles yet AFAIK, brendan and aaronl found this by
looking at the code...
Status: NEW → ASSIGNED
Keywords: perf
Target Milestone: --- → mozilla1.0
Updating QA contact to Shivakiran Tummala.
QA Contact: desale → stummala
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
what effects does this "bug" have / visible in which scenarious?
Only a performance problem in edge cases, hasn't shown up as a real issue yet.
changing priority to P5
Priority: -- → P5
Target Milestone: mozilla1.0.1 → ---
Mass-reassigning bugs.
Assignee: jst → dom_bugs
Status: ASSIGNED → NEW
jst, why do we even have this class? Wouldn't it be simpler to modify
nsContentList to not descend into kids if needed, use that for all places where
GenericElementCollection is used, and scrap this class altogether?  The only
things this class is used for are HTMLMapElement.areas and the various
table-related lists (tBodies, DOM operations like insertRow (which can be O(N)
in number of rows because of this bug!), HTMLTableSectionElement.rows, etc).

nsContentList ends in two 8-bit members, so we could add a PRPackedBool for
"descend into children" with no trouble.

I can do this if people think the proposal is reasonable.
That sounds very reasonable to me!
Yay for killing obscure content classes! :-)
Hmm.. his could be less trivial than I thought due to the nasty issue of
mRootContent management in nsContentList (there isn't any; we just sort of pray).
Depends on: 237566
No longer depends on: 237566
This removes GenericElementCollection, replaces all uses with nsContentList,
and fixes nsContentList to be able to handle "shallow" tree traversals.

The only really hard changes are to nsContentList.  The basic summary is:

1)  Change IsDescendantOfRoot() to return true only if the container is the
root
    in the shallow case.
2)  Make MatchSelf() return without recursing in the shallow case.

Those two changes make us do the right thing for all the nsIDocumentObserver
notifications as far as deciding whether the notification affects us.

3)  Change PopulateWith to not recurse down when it's looking at a node it
might
    populate with (when aIncludeRoot is true)
4)  Change PopulateWithStartingAfter to only look at the kids of aStartRoot if
    aStartRoot == mRootContent in the shallow case.  This ensures we don't pick

    up grandkids.

I believe these changes are sufficient to handle things correctly, but I'll do
some testing in a debug build to see whether I trigger the asserts and whether
the behavior is correct.

Any suggestions of testcases to try?  Things using .rows, .cells, etc would be
good, I think.
Attached file Some tests
That tests that everything is at least happy in a static environment (with
test7 being a nod to testing some response to DOM mutations).

So this basically tests everything reasonably well except for the nsContentList
changes....
Attached patch Fix typo (obsolete) — Splinter Review
This patch passes those tests, runs fine as far as I can tell, doesn't assert
on various things like that test, getElementsByTagName calls, etc.
Attachment #145404 - Attachment is obsolete: true
Attached patch Same as diff -b. FOR REVIEW ONLY (obsolete) — Splinter Review
Comment on attachment 145406 [details] [diff] [review]
Same as diff -b.  FOR REVIEW ONLY

I'm sorry the fix for bug 237213 snuck in there....

Anyway, would you review?  The changes to nsContentList need special attention,
as I said above.
Attachment #145406 - Flags: superreview?(jst)
Attachment #145406 - Flags: review?(bugmail)
Complete list of instances of HTMLCollection in the DOM 2 HTML Spec are:

document.images
document.applets
document.links
document.forms
document.anchors
HTMLFormElement.elements
HTMLMapElement.areas
HTMLTableElement.rows
HTMLTableElement.tBodies
HTMLTableSectionElement.rows
HTMLTableRowElement.cells

I could do short tests cases (with lots of iterations) that measured the time
required to perform an insert and see if the time looked O(1), O(N), or
something else. Would that help? Should each of the above be tested?

http://dom-ts.bclary.com/build/ecmascript/level2/html/alltests.html also has a
number of tests as well.
The only things I changed, hopefully, were:

HTMLMapElement.areas
HTMLTableElement.rows
HTMLTableElement.tBodies
HTMLTableSectionElement.rows
HTMLTableRowElement.cells

though the code I touched is used by all HTMLCollection and a number of NodeList
instances.

What's needed is not performance testing (that's easily doable by code tracing
in this case) but correctness testing.  Things like accessing .rows during
pageload several times and seeing whether you get the right results....

Thanks for the pointer to alltests; I'll give it a spin.
Actually http://dom-ts.bclary.com/build/ecmascript/level1/html/alltests.html is
more complete than level 2 tests these days.
Hmm...  so I ran those tests and I don't see any failures that I'm not getting
without my changes.

But how do I view the actual page the test failed on?  Some of the failures are
in code that really should work.
remove the alltests.html and you can view a directory listing of the tests. Note
that the tests load test documents from the files/ directory and will load the
.xml, .xhtml or .html versions depending on the content type you have chosen.
There are some tests which are only appropriate for .html or .xhtml (e.g
HTMLBody* in Level 2 HTML) so try the tests with .html to remove that potential
issue. Am on IRC now if you want to discuss it.
No IRC client here....

That was enough for me to figure out what's going on with the test that was
confuxing me.  The test at
http://dom-ts.bclary.com/build/ecmascript/level2/html/HTMLTableElement39.html is
just wrong.  Inserting a row at -1 will insert it in the same section as the
last row in logical order, which would be in the tfoot.  So the number of rows
in the tbody won't change.
Attachment #145403 - Attachment is obsolete: true
Attachment #145404 - Attachment is obsolete: false
Actually, in nsContentList::nsContentList the 

   mFunc = aFunc;

line can be removed...
Blocks: 240186
Comment on attachment 145406 [details] [diff] [review]
Same as diff -b.  FOR REVIEW ONLY

The name IsDescendantOfRoot isn't quite right any more. Maybe something like
"IsInSearchedSet" or something might be better. No biggie though.

How does nsContentLists returned from GetElementsByTagName handle the root
element going away? The new calls to RootDestroyed looks fine but i'm a little
surpriced that there wasn't already a mechanism in place for that.

>Index: content/base/src/nsContentList.cpp
> nsContentList::nsContentList(nsIDocument *aDocument,
>                              nsContentListMatchFunc aFunc,
>                              const nsAString& aData,
>-                             nsIContent* aRootContent)
>-  : nsBaseContentList(), nsContentListKey(aDocument, nsnull, kNameSpaceID_Unknown, aRootContent)
>+                             nsIContent* aRootContent,
>+                             PRBool aDeep)
>+  : nsBaseContentList(),
>+    nsContentListKey(aDocument, nsnull, kNameSpaceID_Unknown, aRootContent),
>+    mFunc(aFunc),
>+    mMatchAll(PR_FALSE),
>+    mState(LIST_DIRTY),
>+    mDeep(aDeep)
> {
>+  NS_ASSERTION(mDeep || mRootContent, "Must have root content for non-deep list!");
>   mFunc = aFunc;
>   if (!aData.IsEmpty()) {
>     mData = new nsString(aData);
>     // If this fails, fail silently
>   }
>   else {
>     mData = nsnull;
>   }
>-  mMatchAtom = nsnull;
>-  mRootContent = aRootContent;
>   mMatchAll = PR_FALSE;
>   mState = LIST_DIRTY;
>   Init(aDocument);
> }

Did you delete the wrong lines here? I don't see mMatchAtom or mRootContent
being set any more, but mFunc, mMatchAll and mState are being set twice.

> void 
> nsContentList::PopulateWith(nsIContent *aContent, PRBool aIncludeRoot,
>                             PRUint32 & aElementsToAppend)
> {
>+  NS_PRECONDITION(mDeep || aContent == mRootContent ||
>+                  aContent->GetParent() == mRootContent,
>+                  "PopulateWith called on nodes we can't possibly match");
>+  NS_PRECONDITION(mDeep || aIncludeRoot || aContent == mRootContent,
>+                  "Bogus root passed to PopulateWith in non-deep list");
>+  

You're not testing for non-deep searches on the root element. I.e. mDeep being
false, aIncludeRoot being true and aContent == mRootContent.

>Index: content/html/content/src/nsHTMLTableElement.cpp
> // we re-count every call.  A better implementation would be to set
> // ourselves up as an observer of contentAppended, contentInserted,
> // and contentDeleted
> NS_IMETHODIMP 
> TableRowsCollection::GetLength(PRUint32* aLength)

Can the above comment be removed now?

>+// Returns the number of items in the row group, only if *aItem ends
>+// up null.  Otherwise, returns 0.
>+static PRUint32
>+GetItemInRowGroup(nsIDOMHTMLTableSectionElement* aRowGroup,
>+                  PRUint32 aIndex, nsIDOMNode** aItem)
>+{
>+  NS_PRECONDITION(aItem, "Null out param");
>+  
>+  PRUint32 length = 0;
>+  
>+  if (aRowGroup) {
>+    nsCOMPtr<nsIDOMHTMLCollection> rows;
>+    aRowGroup->GetRows(getter_AddRefs(rows));
>+    
>+    if (rows) {
>+      rows->Item(aIndex, aItem);
>+      if (!*aItem) {
>+        rows->GetLength(&length);
>+      }
>+    }
>+  }
>+  
>+  return length;
>+}

You should null out *aItem in case aRowGroup or rows is null.

With that, r=me
Attachment #145406 - Flags: review?(bugmail) → review+
(In reply to comment #23)
> The name IsDescendantOfRoot isn't quite right any more. Maybe something like
> "IsInSearchedSet" or something might be better. No biggie though.

Hmm.. How about MayContainRelevantNodes(), since we should always be calling
this on containers of nodes we may care about?

> How does nsContentLists returned from GetElementsByTagName handle the root
> element going away?

It doesn't.  See the comment at
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsContentList.h#143
(which this patch leaves there, since it's still relevant).  I'll file a
followup bug on this now that we have infrastructure to handle it cleanly.

> Did you delete the wrong lines here? I don't see mMatchAtom or mRootContent
> being set any more

They're set by the constructor of the superclass.

> but mFunc, mMatchAll and mState are being set twice.

Yeah, see comment 22.  I'll remove the extra sets before checking in.

> You're not testing for non-deep searches on the root element. I.e. mDeep being
> false, aIncludeRoot being true and aContent == mRootContent.

Hmm... right you are.  I'll add an assert to that effect.

> >Index: content/html/content/src/nsHTMLTableElement.cpp
> > // we re-count every call.  A better implementation would be to set
> Can the above comment be removed now?

Not quite, since we still recount (walk the table sections, etc).  Although it's
enough better that maybe we don't want to worry about it anymore.

> You should null out *aItem in case aRowGroup or rows is null.

Will do.

jst, let me know whether you want an updated patch with those changes....
Comment on attachment 145406 [details] [diff] [review]
Same as diff -b.  FOR REVIEW ONLY

+// Returns the number of items in the row group, only if *aItem ends
+// up null.  Otherwise, returns 0.
+static PRUint32
+GetItemInRowGroup(nsIDOMHTMLTableSectionElement* aRowGroup,
+		   PRUint32 aIndex, nsIDOMNode** aItem)

How about naming that GetItemOrCountInRowGroup to closer reflect what it really
does?

- In content/html/content/src/nsHTMLTableRowElement.cpp:

+PR_STATIC_CALLBACK(PRBool)
+IsCell(nsIContent *aContent, nsString* aData)
+{
+  nsINodeInfo *ni = aContent->GetNodeInfo();
+
+  return (ni && (ni->Equals(nsHTMLAtoms::td) || ni->Equals(nsHTMLAtoms::th))
&&
+	   aContent->IsContentOfType(nsIContent::eHTML));
+}

This would be slightly less code, and no slower, if written as:

+  nsIAtom *tag = aContent->Tag();
+
+  return ((tag == nsHTMLAtoms::td || tag == nsHTMLAtoms::th) &&
+	   aContent->IsContentOfType(nsIContent::eHTML));

With that, and with the earlier comments addressed, sr=jst
Attachment #145406 - Flags: superreview?(jst) → superreview+
Assignee: general → bzbarsky
Attachment #145405 - Attachment is obsolete: true
Attachment #145406 - Attachment is obsolete: true
Status: NEW → ASSIGNED
er, that last diff doesn't include a change to nsContentList.h to rename
IsDescendantOfRoot to MayContainRelevantNodes there too, which I just checked in.
Patch checked in for 1.8a, tree is green, birds would be singing if it were
earlier in the day.

Marking fixed, since the offending code is simply no more.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Priority: P5 → P2
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8alpha
Bug 240471 filed on the issue of mRootContent going away.
Woohoo! Another class bites the dust!
Component: DOM: HTML → DOM: Core & HTML
QA Contact: stummala → general
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: