[HTML5] Split the HTML5 parser into stream parser, doc.write parser and tree op executor

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

Other Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 13 obsolete attachments)

175.22 KB, patch
mozilla+ben
: review+
mrbkap
: superreview+
Details | Diff | Splinter Review
In order to prepare for off-the-main-thread parsing, the HTML5 parser should be split into 
 * The network stream parser (will move off the main thread)
 * document.write() parser (duplicates tokenizer & tree builder; will stay on the main thread)
 * Tree operation executor (will stay on the main thread)
 * Some kind of coordinator that implements nsIParser and wraps these.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
This stuff is more intertwingled than I thought. :-(
Attachment #395047 - Attachment is obsolete: true
Up next, splitting off the stream parser from the rest of the parser.

I think I'm going to leave the doc.write parser inside the coordinator object, since splitting those seems to carry no benefit.
Attachment #395575 - Attachment is obsolete: true
Note to self: diffed against rev 32d7abba0d2b.
Attachment #395595 - Attachment is obsolete: true
Requires htmlparser/translator svn rev 576 now.
Posted patch Done except rebasing (obsolete) — Splinter Review
Attachment #396703 - Attachment is obsolete: true
Posted patch Hopefully done (obsolete) — Splinter Review
Rebased, fixed document mode. I guess this one needs to go through tryserver and dogfooding a bit.
Attachment #396721 - Attachment is obsolete: true
Sigh. This is broken when there's a late charset switch.
Comment on attachment 397002 [details] [diff] [review]
Maybe now. With zeroing operator new.

Seems to work finally. Rebased to 9798dcc4a19d.

The only deliberate behavior change is ignoring late meta charset switches from document.written content.
Attachment #397002 - Flags: review?(bnewman)
Note: This patch deliberately doesn't introduce dual tokenizers and tree builders yet. nsHtml5StreamParser just has pointers to the tokenizer and tree builder owned by nsHtml5Parser.
(In reply to comment #9)
> Sigh. This is broken when there's a late charset switch.

What breaks, out of curiosity?
Nits mostly, some of them probably beside the point at this stage.

My biggest concern is with the lack of QueryInterface calls (you use static_cast instead).  I think there may be something wrong with the interface table declarations of nsHtml5{TreeOpExecutor,StreamParser} that prevents QueryInterface from working properly (I get errors about nsISupports being an ambiguous base), but I'm not sure how to fix it.

I assume ignoring late charset switches is a permanent design decision?  Is there any treatment of that in the HTML5 spec (or elsewhere)?

r=me if you'll consider and/or accept the following suggestions:

diff --git a/parser/html/nsHtml5Parser.cpp b/parser/html/nsHtml5Parser.cpp
--- a/parser/html/nsHtml5Parser.cpp
+++ b/parser/html/nsHtml5Parser.cpp
@@ -89,24 +89,24 @@ NS_INTERFACE_TABLE_HEAD(nsHtml5Parser)
 NS_INTERFACE_MAP_END
 
 NS_IMPL_CYCLE_COLLECTING_ADDREF(nsHtml5Parser)
 NS_IMPL_CYCLE_COLLECTING_RELEASE(nsHtml5Parser)
 
 NS_IMPL_CYCLE_COLLECTION_CLASS(nsHtml5Parser)
 
 NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(nsHtml5Parser)
-  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mExecutor, nsIContentSink);
-  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mStreamParser, nsIStreamListener);
+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mExecutor, nsIContentSink)
+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mStreamParser, nsIStreamListener)
   tmp->mTreeBuilder->DoTraverse(cb);
 NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END
 
 NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(nsHtml5Parser)
-  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mExecutor);
-  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mStreamParser);
+  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mExecutor)
+  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mStreamParser)
   tmp->mTreeBuilder->DoUnlink();
 NS_IMPL_CYCLE_COLLECTION_UNLINK_END
 
 nsHtml5Parser::nsHtml5Parser()
   : mFirstBuffer(new nsHtml5UTF16Buffer(0))
   , mLastBuffer(mFirstBuffer)
   , mExecutor(new nsHtml5TreeOpExecutor())
   , mTreeBuilder(new nsHtml5TreeBuilder(mExecutor))
@@ -130,16 +130,17 @@ nsHtml5Parser::SetContentSink(nsIContent
 {
   NS_ASSERTION(aSink == static_cast<nsIContentSink*> (mExecutor), 
                "Attempt to set a foreign sink.");
 }
 
 NS_IMETHODIMP_(nsIContentSink*)
 nsHtml5Parser::GetContentSink(void)
 {
+  // Why can't this be done with a QueryInterface?
   return static_cast<nsIContentSink*> (mExecutor);
 }
 
 NS_IMETHODIMP_(void)
 nsHtml5Parser::GetCommand(nsCString& aCommand)
 {
   aCommand.Assign("view");
 }
@@ -155,16 +156,17 @@ nsHtml5Parser::SetCommand(eParserCommand
 {
   NS_ASSERTION(aParserCommand == eViewNormal, 
                "Parser command was not eViewNormal.");
 }
 
 NS_IMETHODIMP_(void)
 nsHtml5Parser::SetDocumentCharset(const nsACString& aCharset, PRInt32 aCharsetSource)
 {
+  // Why don't we call mExecutor->SetDocumentCharset(aCharset) here?
   NS_PRECONDITION(mExecutor->GetLifeCycle() == NOT_STARTED,
                   "Document charset set too late.");
   NS_PRECONDITION(mStreamParser, "Tried to set charset on a script-only parser.");
   mStreamParser->SetDocumentCharset(aCharset, aCharsetSource);
 }
 
 NS_IMETHODIMP_(void)
 nsHtml5Parser::SetParserFilter(nsIParserFilter* aFilter)
@@ -190,16 +192,17 @@ nsHtml5Parser::GetDTD(nsIDTD** aDTD)
 }
 
 NS_IMETHODIMP
 nsHtml5Parser::GetStreamListener(nsIStreamListener** aListener)
 {
   if (!mStreamParser) {
     mStreamParser = new nsHtml5StreamParser(mTokenizer, mExecutor, this);
   }
+  // Needs a QueryInterface:
   NS_ADDREF(*aListener = mStreamParser);
   return NS_OK;
 }
 
 NS_IMETHODIMP
 nsHtml5Parser::ContinueParsing()
 {
   UnblockParser();
@@ -505,18 +508,19 @@ nsHtml5Parser::CancelParsingEvents()
 }
 
 void
 nsHtml5Parser::Reset()
 {
   mExecutor->Reset();
   mLastWasCR = PR_FALSE;
   mFragmentMode = PR_FALSE;
-  mBlocked = PR_FALSE;
+  UnblockParser();
   mSuspending = PR_FALSE;
+  // Isn't this redundant with mExecutor->Reset()?
   mExecutor->SetLifeCycle(NOT_STARTED);
   mStreamParser = nsnull;
   // mUninterruptibleDocWrite = PR_FALSE;
   mRootContextKey = nsnull;
   mContinueEvent = nsnull;  // weak ref
   // Portable parser objects
   while (mFirstBuffer->next) {
     nsHtml5UTF16Buffer* oldBuf = mFirstBuffer;
diff --git a/parser/html/nsHtml5Parser.h b/parser/html/nsHtml5Parser.h
--- a/parser/html/nsHtml5Parser.h
+++ b/parser/html/nsHtml5Parser.h
@@ -316,17 +316,17 @@ class nsHtml5Parser : public nsIParser {
     /**
      * The event loop will spin ASAP
      */
     PRBool                        mSuspending;
 
     // script execution
 
     /**
-     * 
+     * Can this be killed off?
      */
     PRBool                        mUninterruptibleDocWrite;
 
     // Gecko integration
     void*                         mRootContextKey;
     nsIRunnable*                  mContinueEvent;  // weak ref
 
     // Portable parser objects
@@ -344,22 +344,22 @@ class nsHtml5Parser : public nsIParser {
     /**
      * The tree operation executor
      */
     nsRefPtr<nsHtml5TreeOpExecutor> mExecutor;
 
     /**
      * The HTML5 tree builder
      */
-    nsAutoPtr<nsHtml5TreeBuilder> mTreeBuilder;
+    const nsAutoPtr<nsHtml5TreeBuilder> mTreeBuilder;
 
     /**
      * The HTML5 tokenizer
      */
-    nsAutoPtr<nsHtml5Tokenizer>   mTokenizer;
+    const nsAutoPtr<nsHtml5Tokenizer>   mTokenizer;
 
     /**
      * The stream parser.
      */
     nsRefPtr<nsHtml5StreamParser> mStreamParser;
 
 };
 #endif
diff --git a/parser/html/nsHtml5StreamParser.cpp b/parser/html/nsHtml5StreamParser.cpp
--- a/parser/html/nsHtml5StreamParser.cpp
+++ b/parser/html/nsHtml5StreamParser.cpp
@@ -193,16 +193,18 @@ nsHtml5StreamParser::FinalizeSniffing(co
       }
       rv = detector->Done();
       NS_ENSURE_SUCCESS(rv, rv);
       // fall thru; callback may have changed charset
     } else {
       NS_ERROR("Could not instantiate charset detector.");
     }
   }
+  // Are we relying on the zeroing operator new to set mCharsetSource to
+  // kCharsetUninitialized by default?
   if (mCharsetSource == kCharsetUninitialized) {
     // Hopefully this case is never needed, but dealing with it anyway
     mCharset.Assign("windows-1252");
     mCharsetSource = kCharsetFromWeakDocTypeDefault;
     mExecutor->SetDocumentCharset(mCharset);
   }
   return SetupDecodingAndWriteSniffingBufferAndCurrentSegment(aFromSegment, aCount, aWriteCount);
 }
@@ -417,32 +419,33 @@ nsHtml5StreamParser::OnStopRequest(nsIRe
     PRUint32 writeCount;
     rv = FinalizeSniffing(nsnull, 0, &writeCount, 0);
     NS_ENSURE_SUCCESS(rv, rv);
   }
   switch (mExecutor->GetLifeCycle()) {
     case TERMINATED:
       break;
     case NOT_STARTED:
+      // What's the deal with these XXX!!! lines?
       // mExecutor->SetParser(this); XXX!!!
       // mTreeBuilder->setScriptingEnabled(mExecutor->IsScriptEnabled()); XXX!!!
       mTokenizer->start();
       mExecutor->SetLifeCycle(STREAM_ENDING);
       break;
     case STREAM_ENDING:
       NS_ERROR("OnStopRequest when the stream lifecycle was already ending.");
       break;
     default:
       mExecutor->SetLifeCycle(STREAM_ENDING);
       break;
   }
 #ifdef DEBUG
   mStreamListenerState = eOnStop;
 #endif
-  if (!mExecutor->IsScriptExecutingImpl()) {
+  if (!mExecutor->IsScriptExecuting()) {
     ParseUntilSuspend();
   }
   if (mObserver) {
     mObserver->OnStopRequest(aRequest, aContext, status);
   }
   return NS_OK;
 }
 
@@ -479,17 +482,17 @@ nsHtml5StreamParser::OnDataAvailable(nsI
   mExecutor->MaybeFlush();
   NS_PRECONDITION(eOnStart == mStreamListenerState ||
                   eOnDataAvail == mStreamListenerState,
             "Error: OnStartRequest() must be called before OnDataAvailable()");
   NS_ASSERTION(mRequest == aRequest, "Got data on wrong stream.");
   PRUint32 totalRead;
   nsresult rv = aInStream->ReadSegments(ParserWriteFunc, static_cast<void*> (this), aLength, &totalRead);
   NS_ASSERTION(totalRead == aLength, "ReadSegments read the wrong number of bytes.");
-  if (!mExecutor->IsScriptExecutingImpl()) {
+  if (!mExecutor->IsScriptExecuting()) {
     ParseUntilSuspend();
   }
   return rv;
 }
 
 void
 nsHtml5StreamParser::internalEncodingDeclaration(nsString* aEncoding)
 {
diff --git a/parser/html/nsHtml5StreamParser.h b/parser/html/nsHtml5StreamParser.h
--- a/parser/html/nsHtml5StreamParser.h
+++ b/parser/html/nsHtml5StreamParser.h
@@ -152,16 +152,18 @@ class nsHtml5StreamParser : public nsISt
     inline void Suspend() {
       mSuspending = PR_TRUE;
     }
 
     void ParseUntilSuspend();
     
   // The stuff below here is conceptually private but needs to be reachable
   // from a C-style callback. 
+  // If you made ParserWriteFunc a static method of nsHtml5StreamParser, then
+  // you wouldn't need to publicize SniffStreamBytes and WriteStreamBytes.
 
     /**
      * True when there is a Unicode decoder already
      */
     inline PRBool HasDecoder() {
       return !!mUnicodeDecoder;
     }
 
diff --git a/parser/html/nsHtml5TreeOpExecutor.cpp b/parser/html/nsHtml5TreeOpExecutor.cpp
--- a/parser/html/nsHtml5TreeOpExecutor.cpp
+++ b/parser/html/nsHtml5TreeOpExecutor.cpp
@@ -59,17 +59,17 @@
 
 #define NS_HTML5_TREE_OP_EXECUTOR_MAX_QUEUE_TIME 3000UL // milliseconds
 #define NS_HTML5_TREE_OP_EXECUTOR_DEFAULT_QUEUE_LENGTH 200
 #define NS_HTML5_TREE_OP_EXECUTOR_MIN_QUEUE_LENGTH 100
 #define NS_HTML5_TREE_OP_EXECUTOR_MAX_TIME_WITHOUT_FLUSH 5000 // milliseconds
 
 NS_IMPL_CYCLE_COLLECTION_CLASS(nsHtml5TreeOpExecutor)
 
-NS_INTERFACE_TABLE_HEAD_CYCLE_COLLECTION_INHERITED(nsHtml5TreeOpExecutor) \
+NS_INTERFACE_TABLE_HEAD_CYCLE_COLLECTION_INHERITED(nsHtml5TreeOpExecutor)
   NS_INTERFACE_TABLE_INHERITED1(nsHtml5TreeOpExecutor, 
                                 nsIContentSink)
 NS_INTERFACE_TABLE_TAIL_INHERITING(nsContentSink)
 
 NS_IMPL_ADDREF_INHERITED(nsHtml5TreeOpExecutor, nsContentSink)
 
 NS_IMPL_RELEASE_INHERITED(nsHtml5TreeOpExecutor, nsContentSink)
 
@@ -94,16 +94,18 @@ nsHtml5TreeOpExecutor::nsHtml5TreeOpExec
   , mFlushTimer(do_CreateInstance("@mozilla.org/timer;1"))
   , mNeedsCharsetSwitch(PR_FALSE)
 {
 
 }
 
 nsHtml5TreeOpExecutor::~nsHtml5TreeOpExecutor()
 {
+  // Why not just assert emptiness here?  Can the executor ever be destroyed
+  // when there are outstanding operations in the queue?
   mOpQueue.Clear();
   if (mFlushTimer) {
     mFlushTimer->Cancel(); // XXX why is this even necessary? it is, though.
   }
 }
 
 static void
 TimerCallbackFunc(nsITimer* aTimer, void* aClosure)
@@ -500,16 +502,18 @@ nsHtml5TreeOpExecutor::MaybeSuspend() {
   }
 }
 
 void
 nsHtml5TreeOpExecutor::MaybeExecuteScript() {
   if (mScriptElement) {
     // mUninterruptibleDocWrite = PR_FALSE;
     ExecuteScript();
+    // This seems to be the only use of mStreamParser in nsHtml5TreeOpExecutor.
+    // Would it be possible to avoid letting the executor hold this reference?
     mStreamParser->Suspend();
   }    
 }
 
 void
 nsHtml5TreeOpExecutor::DeferredTimerFlush() {
   if (mTreeBuilder->GetOpQueueLength() > 0) {
     mNeedsFlush = PR_TRUE;
diff --git a/parser/html/nsHtml5TreeOpExecutor.h b/parser/html/nsHtml5TreeOpExecutor.h
--- a/parser/html/nsHtml5TreeOpExecutor.h
+++ b/parser/html/nsHtml5TreeOpExecutor.h
@@ -174,16 +174,17 @@ public:
     /**
      * Unimplemented. For interface compat only.
      */
     NS_IMETHOD WillResume();
 
     /**
      * Sets the parser.
      */
+    // Can the TreeOpExecutor really handle arbitrary nsIParser*s, or just nsHtml5Parser*s?
     NS_IMETHOD SetParser(nsIParser* aParser);
 
     /**
      * No-op for backwards compat.
      */
     virtual void FlushPendingNotifications(mozFlushType aType);
 
     /**
@@ -201,16 +202,18 @@ public:
     virtual void UpdateChildCounts();
     virtual nsresult FlushTags();
     virtual void PostEvaluateScript(nsIScriptElement *aElement);
     /**
      * Sets up style sheet load / parse
      */
     void UpdateStyleSheet(nsIContent* aElement);
 
+    // Shouldn't need all these 'inline' keywords inside the class definition...
+
     // Getters and setters for fields from nsContentSink
     inline nsIDocument* GetDocument() {
       return mDocument;
     }
     inline nsNodeInfoManager* GetNodeInfoManager() {
       return mNodeInfoManager;
     }
     inline nsIDocShell* GetDocShell() {
@@ -346,16 +349,17 @@ public:
     inline eHtml5ParserLifecycle GetLifeCycle() {
       return mLifeCycle;
     }
     
     inline void SetLifeCycle(eHtml5ParserLifecycle aLifeCycle) {
       mLifeCycle = aLifeCycle;
     }
     
+    // Does this need to be public?
     nsHtml5Tokenizer* GetTokenizer();
     
     void MaybeExecuteScript();
     
     inline void MaybePreventExecution() {
       if (mScriptElement) {
         nsCOMPtr<nsIScriptElement> script = do_QueryInterface(mScriptElement);
         NS_ASSERTION(script, "mScriptElement didn't QI to nsIScriptElement!");
(In reply to comment #14)
> My biggest concern is with the lack of QueryInterface calls (you use
> static_cast instead).

I was thinking of QueryInterface as a dynamic_cast substitute and these cases are casts where it's known what the other object is.

> I assume ignoring late charset switches is a permanent design decision?  Is
> there any treatment of that in the HTML5 spec (or elsewhere)?
> 
> r=me if you'll consider and/or accept the following suggestions:

Unfortunately, there was still at least one substantial bug left: It was possible for nsHtml5Parser to get refcounted to zero and released while nsHtml5StreamParser was still alive. I've tried to address that issue in a new patch by introducing a reference cycle that gets broken in the end just like the parser–sink cycle.

Re-requesting review with this substantial difference. (But bnewman seems to be away right now.) Also, requesting sr for the nsIParser interface change.

> -  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mExecutor,
> nsIContentSink);
> -  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mStreamParser,
> nsIStreamListener);
> +  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mExecutor,
> nsIContentSink)
> +  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mStreamParser,
> nsIStreamListener)
[...]
> -  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mExecutor);
> -  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mStreamParser);
> +  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mExecutor)
> +  NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mStreamParser)

Fixed.

>  NS_IMETHODIMP_(nsIContentSink*)
>  nsHtml5Parser::GetContentSink(void)
>  {
> +  // Why can't this be done with a QueryInterface?
>    return static_cast<nsIContentSink*> (mExecutor);
>  }

It probably could, but it seems it doesn't need to. Is it poor form to do it this way?

>  NS_IMETHODIMP_(void)
>  nsHtml5Parser::SetDocumentCharset(const nsACString& aCharset, PRInt32
> aCharsetSource)
>  {
> +  // Why don't we call mExecutor->SetDocumentCharset(aCharset) here?
>    NS_PRECONDITION(mExecutor->GetLifeCycle() == NOT_STARTED,
>                    "Document charset set too late.");
>    NS_PRECONDITION(mStreamParser, "Tried to set charset on a script-only
> parser.");
>    mStreamParser->SetDocumentCharset(aCharset, aCharsetSource);
>  }

Fixed.

>  NS_IMETHODIMP
>  nsHtml5Parser::GetStreamListener(nsIStreamListener** aListener)
>  {
>    if (!mStreamParser) {
>      mStreamParser = new nsHtml5StreamParser(mTokenizer, mExecutor, this);
>    }
> +  // Needs a QueryInterface:
>    NS_ADDREF(*aListener = mStreamParser);
>    return NS_OK;
>  }

Since mStreamParser is a class pointer, doesn't the compiler to the right thing here without QI?

>  void
>  nsHtml5Parser::Reset()
>  {
>    mExecutor->Reset();
>    mLastWasCR = PR_FALSE;
>    mFragmentMode = PR_FALSE;
> -  mBlocked = PR_FALSE;
> +  UnblockParser();

Fixed.

>    mSuspending = PR_FALSE;
> +  // Isn't this redundant with mExecutor->Reset()?
>    mExecutor->SetLifeCycle(NOT_STARTED);

Yes. Removed.

>      /**
> -     * 
> +     * Can this be killed off?
>       */
>      PRBool                        mUninterruptibleDocWrite;

Yes. Killed.

>      /**
>       * The HTML5 tree builder
>       */
> -    nsAutoPtr<nsHtml5TreeBuilder> mTreeBuilder;
> +    const nsAutoPtr<nsHtml5TreeBuilder> mTreeBuilder;
> 
>      /**
>       * The HTML5 tokenizer
>       */
> -    nsAutoPtr<nsHtml5Tokenizer>   mTokenizer;
> +    const nsAutoPtr<nsHtml5Tokenizer>   mTokenizer;

Fixed.

> +  // Are we relying on the zeroing operator new to set mCharsetSource to
> +  // kCharsetUninitialized by default?
>    if (mCharsetSource == kCharsetUninitialized) {

Yes.

>    switch (mExecutor->GetLifeCycle()) {
>      case TERMINATED:
>        break;
>      case NOT_STARTED:
> +      // What's the deal with these XXX!!! lines?
>        // mExecutor->SetParser(this); XXX!!!
>        // mTreeBuilder->setScriptingEnabled(mExecutor->IsScriptEnabled());
> XXX!!!

Oops. Fixed with NS_NOTREACHED, because this is now handled in Parse() in nsHtml5Parser.

> -  if (!mExecutor->IsScriptExecutingImpl()) {
> +  if (!mExecutor->IsScriptExecuting()) {
[...]
> -  if (!mExecutor->IsScriptExecutingImpl()) {
> +  if (!mExecutor->IsScriptExecuting()) {

Fixed.

>    // The stuff below here is conceptually private but needs to be reachable
>    // from a C-style callback. 
> +  // If you made ParserWriteFunc a static method of nsHtml5StreamParser, then
> +  // you wouldn't need to publicize SniffStreamBytes and WriteStreamBytes.

Fixed.

> -NS_INTERFACE_TABLE_HEAD_CYCLE_COLLECTION_INHERITED(nsHtml5TreeOpExecutor) \
> +NS_INTERFACE_TABLE_HEAD_CYCLE_COLLECTION_INHERITED(nsHtml5TreeOpExecutor)

Fixed.

>  nsHtml5TreeOpExecutor::~nsHtml5TreeOpExecutor()
>  {
> +  // Why not just assert emptiness here?  Can the executor ever be destroyed
> +  // when there are outstanding operations in the queue?
>    mOpQueue.Clear();

Fixed.

>  void
>  nsHtml5TreeOpExecutor::MaybeExecuteScript() {
>    if (mScriptElement) {
>      // mUninterruptibleDocWrite = PR_FALSE;
>      ExecuteScript();
> +    // This seems to be the only use of mStreamParser in
> nsHtml5TreeOpExecutor.
> +    // Would it be possible to avoid letting the executor hold this reference?
>      mStreamParser->Suspend();
>    }    
>  }

My plan was for this to go away in a later iteration.

>      /**
>       * Sets the parser.
>       */
> +    // Can the TreeOpExecutor really handle arbitrary nsIParser*s, or just
> nsHtml5Parser*s?
>      NS_IMETHOD SetParser(nsIParser* aParser);

It can't, but this method signature comes from nsIContentSink and didn't seem useful to introduce a second method that takes nsHtml5Parser*.

> +    // Shouldn't need all these 'inline' keywords inside the class
> definition...

Removed. Do compilers these days inline the methods anyway?

> +    // Does this need to be public?
>      nsHtml5Tokenizer* GetTokenizer();

Fixed.
Attachment #397002 - Attachment is obsolete: true
Attachment #397854 - Flags: superreview?(mrbkap)
Attachment #397854 - Flags: review?(mrbkap)
Attachment #397002 - Flags: review?(bnewman)
Attachment #397854 - Attachment is obsolete: true
Attachment #397854 - Flags: superreview?(mrbkap)
Attachment #397854 - Flags: review?(mrbkap)
Comment on attachment 397854 [details] [diff] [review]
Address review commets, make stream parser hold a strong ref back to parser

The old problem that script global object doesn't exist until first OnDataAvailable came back to bite me.

This patch is broken. Since the whole point is moving OnDataAvailable elsewhere, this situation is a bit of a problem.
I thought about putting a tree op for WillBuildModel on the queue, but that doesn't work right if the network stream ends and is removed from the load group before the queue is flushed.

Maybe it's time to actually fix things so that script global object is initialized before calling into the parser.
Changing the docshell seemed too difficult and dangerous, so I fixed the issue (again) on the parser side this time leaving around comments to prevent the issue for reoccurring.
Attachment #398354 - Flags: superreview?(mrbkap)
Attachment #398354 - Flags: review?(mrbkap)
Attachment #398354 - Attachment is obsolete: true
Attachment #398354 - Flags: superreview?(mrbkap)
Attachment #398354 - Flags: review?(mrbkap)
Comment on attachment 398354 [details] [diff] [review]
Fix script global object order

Still not baked enough. This stuff is incredible brittle. :-(
> I assume ignoring late charset switches is a permanent design decision?  Is
> there any treatment of that in the HTML5 spec (or elsewhere)?

This is to address bug 487949 comment 69. It's not in the spec yet.

(Now is a convenient time to address this, because changing this as a separate patch would be harder.)
Sorry about the bug spam. Managed to attached the wrong patch twice.
Attachment #399054 - Attachment is obsolete: true
Attachment #399056 - Flags: superreview?(mrbkap)
Attachment #399056 - Flags: review?(bnewman)
Two quick questions before I grant the review:

diff --git a/parser/html/nsHtml5Parser.cpp b/parser/html/nsHtml5Parser.cpp
--- a/parser/html/nsHtml5Parser.cpp
+++ b/parser/html/nsHtml5Parser.cpp
@@ -292,16 +292,19 @@ nsHtml5Parser::Parse(const nsAString& aS
                      PRBool aLastCall,
                      nsDTDMode aMode) // ignored
 {
   NS_PRECONDITION(!mFragmentMode, "Document.write called in fragment mode!");
 
   // Maintain a reference to ourselves so we don't go away
   // till we're completely done.
   nsCOMPtr<nsIParser> kungFuDeathGrip(this);
+  // I'm all for more kung fu, but it seems as though the death grip on the
+  // parser should protect the parser's nsRefPtr members, so these additional
+  // grips shouldn't be necessary.  Any theory as to why they help?
   nsRefPtr<nsHtml5StreamParser> streamKungFuDeathGrip(mStreamParser);
   nsRefPtr<nsHtml5TreeOpExecutor> treeOpKungFuDeathGrip(mExecutor);
 
   // Return early if the parser has processed EOF
   switch (mExecutor->GetLifeCycle()) {
     case TERMINATED:
       return NS_OK;
     case NOT_STARTED:
diff --git a/parser/html/nsHtml5StreamParser.cpp b/parser/html/nsHtml5StreamParser.cpp
--- a/parser/html/nsHtml5StreamParser.cpp
+++ b/parser/html/nsHtml5StreamParser.cpp
@@ -394,16 +394,17 @@ nsHtml5StreamParser::OnStartRequest(nsIR
   mStreamListenerState = eOnStart;
 #endif
   mRequest = aRequest;
 
   /*
    * If you move the following line, be very careful not to cause 
    * WillBuildModel to be called before the document has had its 
    * script global object set.
+   * Can this be enforced with an assertion in nsHtml5TreeOpExecutor::WillBuildModel?
    */
   mTokenizer->start();
   mExecutor->SetLifeCycle(PARSING);
   
   if (mCharsetSource < kCharsetFromChannel) {
     // we aren't ready to commit to an encoding yet
     // leave converter uninstantiated for now
     return NS_OK;
(In reply to comment #24)
> Two quick questions before I grant the review:
> 
> diff --git a/parser/html/nsHtml5Parser.cpp b/parser/html/nsHtml5Parser.cpp
> --- a/parser/html/nsHtml5Parser.cpp
> +++ b/parser/html/nsHtml5Parser.cpp
> @@ -292,16 +292,19 @@ nsHtml5Parser::Parse(const nsAString& aS
>                       PRBool aLastCall,
>                       nsDTDMode aMode) // ignored
>  {
>    NS_PRECONDITION(!mFragmentMode, "Document.write called in fragment mode!");
> 
>    // Maintain a reference to ourselves so we don't go away
>    // till we're completely done.
>    nsCOMPtr<nsIParser> kungFuDeathGrip(this);
> +  // I'm all for more kung fu, but it seems as though the death grip on the
> +  // parser should protect the parser's nsRefPtr members, so these additional
> +  // grips shouldn't be necessary.  Any theory as to why they help?
>    nsRefPtr<nsHtml5StreamParser> streamKungFuDeathGrip(mStreamParser);
>    nsRefPtr<nsHtml5TreeOpExecutor> treeOpKungFuDeathGrip(mExecutor);

In the other case where the old parser does a kungFuDeathGrip, the new split parser kept crashing until I made it grip the stream parser and executor also. It's not clear to me why exactly this is, since I didn't audit the code outside the parser that holds references to the parts of the parser.

In the case of this method, I don't have a theory of whether gripping the stream parser and executor is absolutely necessary. It's not fully clear to me why the old parser grips itself in this method.

I added these grips, because the other old grip required gripping the other objects, too, and I wanted to avoid this landing getting delayed by a crash caused by failure to have the same grip pattern here as well. (My planned schedule was already disrupted by the other grip case.)

I moved the grips to the top of the method just in case.

Added comments about this.

> +   * Can this be enforced with an assertion in
> nsHtml5TreeOpExecutor::WillBuildModel?

Added.
Attachment #399056 - Attachment is obsolete: true
Attachment #399974 - Flags: superreview?(mrbkap)
Attachment #399974 - Flags: review?(bnewman)
Attachment #399056 - Flags: superreview?(mrbkap)
Attachment #399056 - Flags: review?(bnewman)
Attachment #399974 - Flags: review?(bnewman) → review+
(In reply to comment #25)
> > +  // I'm all for more kung fu, but it seems as though the death grip on the
> > +  // parser should protect the parser's nsRefPtr members, so these additional
> > +  // grips shouldn't be necessary.  Any theory as to why they help?
> >    nsRefPtr<nsHtml5StreamParser> streamKungFuDeathGrip(mStreamParser);
> >    nsRefPtr<nsHtml5TreeOpExecutor> treeOpKungFuDeathGrip(mExecutor);
> 
> In the other case where the old parser does a kungFuDeathGrip, the new split
> parser kept crashing until I made it grip the stream parser and executor also.
> It's not clear to me why exactly this is, since I didn't audit the code outside
> the parser that holds references to the parts of the parser.

My analysis was flawed, actually: the grips could well be necessary if mStreamParser or mExecutor are ever reassigned during parsing.  It doesn't appear trivial to make those data members const, so the grips should definitely stay.  And, as you argue, they don't hurt.

mrbkap: nudge?
Attachment #399974 - Flags: superreview?(mrbkap) → superreview+
Thank you for the r&sr. Pushed as:
http://hg.mozilla.org/mozilla-central/rev/ac68245b5363
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
     case eTreeOpProcessOfflineManifest: {
-      rv = aBuilder->ProcessOfflineManifest(mNode);
+      aBuilder->ProcessOfflineManifest(mNode);
       return rv;
     }

This doesn't look right to me.
(In reply to comment #28)
>      case eTreeOpProcessOfflineManifest: {
> -      rv = aBuilder->ProcessOfflineManifest(mNode);
> +      aBuilder->ProcessOfflineManifest(mNode);
>        return rv;
>      }
> 
> This doesn't look right to me.

nsContentSink::ProcessOfflineManifest is void:
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsContentSink.h#251
The parser was written with the assumption that no-fail malloc would land long before the parser, so the parser drops OOM-or-OK type nsresults on the floor all over the place.
You need to log in before you can comment on or make changes to this bug.