Closed Bug 515338 Opened 15 years ago Closed 15 years ago

[HTML5] Make HTML5 parser internals not hold nsIContent or regular dynamic atoms

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 6 obsolete files)

The tokenizer and tree builder must not hold main-thread-only objects like nsIContent and nsAtomImpl.
Blocks: 515941
Attachment #400756 - Flags: review?(bnewman)
Comment on attachment 400756 [details] [diff] [review]
Change the memory management regime for atoms, nodes and attributes, v2

This is ready for review, except changes over in bug 514661 will affect how exactly this patch will call the scoped atom table stuff. However, those changes shouldn't affect the substance of this patch.
Does this patch depend on any others?  It's not applying cleanly for me, even on top of 514661.  Maybe I'm just confused.
Keywords: 4xp
It depends on bug 499642, bug 514661 and bug 515142. Bug 515142 is a spec correctness change to the Java bits, so I'm assuming it can go in as rs=sicking.
Depends on: 499642, 515142
My comments:

diff --git a/parser/html/nsHtml5Portability.cpp b/parser/html/nsHtml5Portability.cpp
--- a/parser/html/nsHtml5Portability.cpp
+++ b/parser/html/nsHtml5Portability.cpp
@@ -115,20 +115,8 @@ nsHtml5Portability::releaseLocal(nsIAtom
 {
   NS_IF_RELEASE(local);
 }
 
-// XXX Useless code
-void
-nsHtml5Portability::retainElement(nsIContent** element)
-{
-}
-
-// XXX Useless code
-void
-nsHtml5Portability::releaseElement(nsIContent** element)
-{
-}
-
 PRBool
 nsHtml5Portability::localEqualsBuffer(nsIAtom* local, PRUnichar* buf, PRInt32 offset, PRInt32 length)
 {
   return local->Equals(nsDependentSubstring(buf + offset, buf + offset + length));
diff --git a/parser/html/nsHtml5Portability.h b/parser/html/nsHtml5Portability.h
--- a/parser/html/nsHtml5Portability.h
+++ b/parser/html/nsHtml5Portability.h
@@ -68,10 +68,11 @@ class nsHtml5Portability
     static jArray<PRUnichar,PRInt32> newCharArrayFromString(nsString* string);
     static void releaseString(nsString* str);
     static void retainLocal(nsIAtom* local);
     static void releaseLocal(nsIAtom* local);
-    static void retainElement(nsIContent** elt);
-    static void releaseElement(nsIContent** elt);
+    // Advertise the uselessness here?
+    static void retainElement(nsIContent** elt) {}
+    static void releaseElement(nsIContent** elt) {}
     static PRBool localEqualsBuffer(nsIAtom* local, PRUnichar* buf, PRInt32 offset, PRInt32 length);
     static PRBool lowerCaseLiteralIsPrefixOfIgnoreAsciiCaseString(const char* lowerCaseLiteral, nsString* string);
     static PRBool lowerCaseLiteralEqualsIgnoreAsciiCaseString(const char* lowerCaseLiteral, nsString* string);
     static PRBool literalEqualsString(const char* literal, nsString* string);
diff --git a/parser/html/nsHtml5TreeBuilder.cpp b/parser/html/nsHtml5TreeBuilder.cpp
--- a/parser/html/nsHtml5TreeBuilder.cpp
+++ b/parser/html/nsHtml5TreeBuilder.cpp
@@ -617,8 +617,9 @@ nsHtml5TreeBuilder::startTag(nsHtml5Elem
                   selfClosing = PR_FALSE;
                 } else {
                   appendToCurrentNodeAndPushElementMayFosterCamelCase(currNs, elementName, attributes);
                 }
+                // Why do these assignments not use nsHtml5HtmlAttributes::EMPTY_ATTRIBUTES?
                 attributes = nsnull;
                 goto starttagloop_end;
               } else {
                 attributes->adjustForMath();
diff --git a/parser/html/nsHtml5TreeOpExecutor.cpp b/parser/html/nsHtml5TreeOpExecutor.cpp
--- a/parser/html/nsHtml5TreeOpExecutor.cpp
+++ b/parser/html/nsHtml5TreeOpExecutor.cpp
@@ -75,8 +75,10 @@ NS_IMPL_RELEASE_INHERITED(nsHtml5TreeOpE
 
 NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN_INHERITED(nsHtml5TreeOpExecutor, nsContentSink)
   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mFlushTimer)
   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mScriptElement)
+  // This is gonna traverse a lot of elements.  Maybe there's no other way, but
+  // I hope you're pretty sure this is necessary.
   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMARRAY(mOwnedElements)
   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMARRAY(mOwnedNonElements)
 NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END
 
diff --git a/parser/html/nsHtml5TreeOpExecutor.h b/parser/html/nsHtml5TreeOpExecutor.h
--- a/parser/html/nsHtml5TreeOpExecutor.h
+++ b/parser/html/nsHtml5TreeOpExecutor.h
@@ -112,8 +112,12 @@ class nsHtml5TreeOpExecutor : public nsI
     nsTArray<nsHtml5TreeOperation>       mOpQueue;
     nsTArray<nsIContentPtr>              mElementsSeenInThisAppendBatch;
     nsTArray<nsHtml5PendingNotification> mPendingNotifications;
     nsHtml5StreamParser*                 mStreamParser;
+
+    // Any idea how often duplicate elements are inserted?
+    // From a brief glance at debugging spew, I didn't notice any obvious
+    // repetition, so the hash doesn't seem immediately necessary.
     nsCOMArray<nsIContent>               mOwnedElements; // maybe change to hash
     
     // This could be optimized away by introducing more tree ops so that 
     // non-elements wouldn't use the handle setup but the text node / comment
diff --git a/parser/html/nsHtml5TreeOperation.cpp b/parser/html/nsHtml5TreeOperation.cpp
--- a/parser/html/nsHtml5TreeOperation.cpp
+++ b/parser/html/nsHtml5TreeOperation.cpp
@@ -58,15 +58,15 @@
 #include "nsIStyleSheetLinkingElement.h"
 #include "nsIDOMDocumentType.h"
 
 #define TO_CONTENT(_field) \
-  (*((nsIContent**)_field))
+  (*(static_cast<nsIContent**>(_field)))
 
 #define TO_ATOM(_field) \
-  ((nsIAtom*)_field)
+  (static_cast<nsIAtom*>(_field))
 
 #define TO_ATTRIBUTES(_field) \
-  ((nsHtml5HtmlAttributes*)_field)
+  (static_cast<nsHtml5HtmlAttributes*>(_field))
 
 nsHtml5TreeOperation::nsHtml5TreeOperation()
  : mOpCode(eTreeOpAppend)
 {
@@ -188,9 +188,9 @@ nsHtml5TreeOperation::Perform(nsHtml5Tre
               }
             }
           }
           
-          stateMask = stateMask ^ PRUint32(node->IntrinsicState());
+          stateMask ^= PRUint32(node->IntrinsicState());
           if (stateMask && document) {
             MOZ_AUTO_DOC_UPDATE(document, UPDATE_CONTENT_STATE, PR_TRUE);
             document->ContentStatesChanged(node, nsnull, stateMask);
           }
@@ -365,9 +365,9 @@ nsHtml5TreeOperation::Perform(nsHtml5Tre
       aBuilder->StartLayout(); // this causes a flush anyway
       return rv;
     }
     case eTreeOpDocumentMode: {
-      aBuilder->DocumentMode(*((nsHtml5DocumentMode*)mPtrOne));
+      aBuilder->DocumentMode(*static_cast<nsHtml5DocumentMode*>(mPtrOne));
       return rv;
     }
     default: {
       NS_NOTREACHED("Bogus tree op");
diff --git a/parser/html/nsHtml5TreeOperation.h b/parser/html/nsHtml5TreeOperation.h
--- a/parser/html/nsHtml5TreeOperation.h
+++ b/parser/html/nsHtml5TreeOperation.h
@@ -99,8 +99,9 @@ class nsHtml5TreeOperation {
 
     ~nsHtml5TreeOperation();
 
     inline void Init(nsIContent** aNode, nsIContent** aParent) {
+      // Why abandon the old meaningful variable names (mNode, mParent)?
       mPtrOne = (void*)aNode;
       mPtrTwo = (void*)aParent;
     }
 
@@ -130,8 +131,9 @@ class nsHtml5TreeOperation {
     inline void Init(nsHtml5DocumentMode aMode) {
       mOpCode = eTreeOpDocumentMode;
       switch (aMode) {
         case QUIRKS_MODE:
+          // Oh, I see.
           mPtrOne = (void*) &sQuirks;
           return;
         case ALMOST_STANDARDS_MODE:
           mPtrOne = (void*) &sAlmostStandards;
@@ -201,8 +203,9 @@ class nsHtml5TreeOperation {
     // 3) Make the queue take items the size of pointer and make the op code
     //    decide how many operands it dequeues after it.
     eHtml5TreeOperation mOpCode;
     PRInt32 mInt;
+    // Can you at least make these union types?
     void* mPtrOne;
     void* mPtrTwo;
     void* mPtrThree;
 };
This trick essentially gives the value stored in mPtrOne a bunch of typed aliases.  As each name corresponds to a specific type, the need for void* casting goes away.

I still think having multiple members (m1, m2, m3) is useful, although technically you could have union members like struct { nsIContent **node, **parent }.  Since the options provided by m1..3 will probably be distinct, the division ensures somewhat more type safety than having a single union type.
> -// XXX Useless code
> -void
> -nsHtml5Portability::retainElement(nsIContent** element)
> -{
> -}
> -
> -// XXX Useless code
> -void
> -nsHtml5Portability::releaseElement(nsIContent** element)
> -{
> -}

I've now moved the ugliness to the Java source and the translator. I didn't remove these from the Java source, yet, because it seemed premature to remove the possibility of element refcounting C++ targets for non-Gecko uses.

However, making the translator change take effect with this patch would be inconvenient, because by now, the translator also has other changes for the next iteration and those would break this patch.

OK if those useless methods go away with the next landing after this one?

> diff --git a/parser/html/nsHtml5TreeBuilder.cpp
> b/parser/html/nsHtml5TreeBuilder.cpp
> --- a/parser/html/nsHtml5TreeBuilder.cpp
> +++ b/parser/html/nsHtml5TreeBuilder.cpp
> @@ -617,8 +617,9 @@ nsHtml5TreeBuilder::startTag(nsHtml5Elem
>                    selfClosing = PR_FALSE;
>                  } else {
>                    appendToCurrentNodeAndPushElementMayFosterCamelCase(currNs,
> elementName, attributes);
>                  }
> +                // Why do these assignments not use
> nsHtml5HtmlAttributes::EMPTY_ATTRIBUTES?
>                  attributes = nsnull;

It felt more natural to use null as the marker that the the object's ownership has been adopted away. This is a bit awkward, but the attribute holder will cross the thread boundary and I wanted to avoid situation where an object participates in refcounting on two threads. Also, I wanted to avoid constructs that can't be expressed easily in the Java source (reference to pointer or pointer pointer that'd allow the method doing the adoption write into a stack variable of its invocator).

> diff --git a/parser/html/nsHtml5TreeOpExecutor.cpp
> b/parser/html/nsHtml5TreeOpExecutor.cpp
> --- a/parser/html/nsHtml5TreeOpExecutor.cpp
> +++ b/parser/html/nsHtml5TreeOpExecutor.cpp
> @@ -75,8 +75,10 @@ NS_IMPL_RELEASE_INHERITED(nsHtml5TreeOpE
> 
>  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN_INHERITED(nsHtml5TreeOpExecutor,
> nsContentSink)
>    NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mFlushTimer)
>    NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mScriptElement)
> +  // This is gonna traverse a lot of elements.  Maybe there's no other way,
> but
> +  // I hope you're pretty sure this is necessary.
>    NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMARRAY(mOwnedElements)
>    NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMARRAY(mOwnedNonElements)
>  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

It's necessary if one wants the cycle collector to be able to correctly traverse the whole object graph. It's rather useless if it's useless to have the cycle collector fully traverse the object graph during the parse, though.

Therefore, I'm not at all sure that this is necessary. I suspect it's unnecessary, but I'm not 100% sure it's unnecessary. That is, I don't know if there are cases where the parser stops in mid-parse and the cycle collector is relied upon for detecting the situation and collecting the whole graph.

> diff --git a/parser/html/nsHtml5TreeOpExecutor.h
> b/parser/html/nsHtml5TreeOpExecutor.h
> --- a/parser/html/nsHtml5TreeOpExecutor.h
> +++ b/parser/html/nsHtml5TreeOpExecutor.h
> @@ -112,8 +112,12 @@ class nsHtml5TreeOpExecutor : public nsI
>      nsTArray<nsHtml5TreeOperation>       mOpQueue;
>      nsTArray<nsIContentPtr>              mElementsSeenInThisAppendBatch;
>      nsTArray<nsHtml5PendingNotification> mPendingNotifications;
>      nsHtml5StreamParser*                 mStreamParser;
> +
> +    // Any idea how often duplicate elements are inserted?
> +    // From a brief glance at debugging spew, I didn't notice any obvious
> +    // repetition, so the hash doesn't seem immediately necessary.
>      nsCOMArray<nsIContent>               mOwnedElements; // maybe change to
> hash

What I had in mind was trying to deref the nodes earlier in the future. In such a scenario, it would also have to be possible to addref the "form pointer" element. I've removed the comment.

> -  (*((nsIContent**)_field))
> +  (*(static_cast<nsIContent**>(_field)))

Casting replaced with use of unions.

> -          stateMask = stateMask ^ PRUint32(node->IntrinsicState());
> +          stateMask ^= PRUint32(node->IntrinsicState());

Fixed.

> +      // Why abandon the old meaningful variable names (mNode, mParent)?

Because many tree ops use the members for other purposes.

> +    // Can you at least make these union types?

Made them into unions.

> I still think having multiple members (m1, m2, m3) is useful, although
> technically you could have union members like struct { nsIContent **node,
> **parent }.  Since the options provided by m1..3 will probably be distinct, the
> division ensures somewhat more type safety than having a single union type.

Type safety is up to the programmer here anyway, so I went with a single union type for the three members.

Also changed the atom stuff to use the new patch for bug 514661.
Attachment #400756 - Attachment is obsolete: true
Attachment #401834 - Flags: review?(bnewman)
Attachment #400756 - Flags: review?(bnewman)
Attachment #401834 - Attachment is obsolete: true
Attachment #401876 - Flags: review?(bnewman)
Attachment #401834 - Flags: review?(bnewman)
Actually, avoiding leaks by using the destructor is wrong, because nsTArray copies its contents upon append. The best fix would be having a way to append an nsTArray to another by moving so that array items would neither be copy-constructed nor destructed by the append.
It seems the AppendElements() stuff wasn't in this patch but in the next one in my queue, so the copy-construction case doesn't arise yet in this patch.
Blocks: 516406
Attached patch Clean up compiler warnings (obsolete) — Splinter Review
Attachment #401876 - Attachment is obsolete: true
Attachment #402824 - Flags: review?(bnewman)
Attachment #401876 - Flags: review?(bnewman)
Comment on attachment 402962 [details] [diff] [review]
feedback for "Avoid leaks when tree ops are discarded without executing them

(In reply to comment #13)
> Created an attachment (id=402962) [details]
> feedback for "Avoid leaks when tree ops are discarded without executing them

Moving the union type into nsHtml5TreeOperation, using nsAutoPtr and nsAutoArrayPtr instead of the ~nsHtml5TreeOperation (seems cleaner to me), and const-ifying the nsHtml5Parser::mAtomTable member (since it never gets reassigned).
Attachment #402962 - Attachment is patch: true
Attachment #402962 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #14)
> (From update of attachment 402962 [details] [diff] [review])
> (In reply to comment #13)
> > Created an attachment (id=402962) [details] [details]
> > feedback for "Avoid leaks when tree ops are discarded without executing them
> 
> Moving the union type into nsHtml5TreeOperation, using nsAutoPtr and
> nsAutoArrayPtr instead of the ~nsHtml5TreeOperation (seems cleaner to me), 

Won't this fail when we have speculation and tree ops in failed speculations get destroyed without ever calling Perform() them? (Even now forced Teminated() should lead to that situation.)

> and
> const-ifying the nsHtml5Parser::mAtomTable member (since it never gets
> reassigned).

Never reassiging is a bug, actually. It needs to get reassigned or otherwise emptied when the innerHTML parser is recycled.
Attached patch Address review comments (obsolete) — Splinter Review
I made the other changes except I didn't move the tree op member deletions due to concerns stated above.
Attachment #402824 - Attachment is obsolete: true
Attachment #402824 - Flags: review?(bnewman)
This step along the way introduced string leaks. Must be a failure to delete an attribute holder somewhere.
Bug 518104 is now in the queue ahead of this one, since I had to put one or the other first.
Depends on: 518104
Attached patch Fix memory leaksSplinter Review
Attachment #403779 - Attachment is obsolete: true
Attachment #404803 - Flags: review?(bnewman)
Comment on attachment 404803 [details] [diff] [review]
Fix memory leaks

Looks good to me.
Attachment #404803 - Flags: review?(bnewman) → review+
Blocks: 522512
Status: NEW → ASSIGNED
Blocks: 515278
Thank you for the review.

Pushed as http://hg.mozilla.org/mozilla-central/rev/34bc9a8eb49e
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: