Closed Bug 516406 Opened 11 years ago Closed 11 years ago

[HTML5] Make document.write() parser and stream parser have distinct tokenizers

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

Attachments

(1 file, 14 obsolete files)

170.64 KB, patch
mozilla+ben
: review+
Details | Diff | Splinter Review
nsHtml5Parser and nsHtml5StreamParser need to have their own copies of nsHtml5Tokenizer and nsHtml5TreeBuilder.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attached patch Backup of WIP (obsolete) — Splinter Review
Depends on: 515338
Depends on: 517943
Blocks: 518104
It turns out that supporting defer scripts the Gecko way is trouble with off-the-main thread parsing. Need to break them here and bring them back with HTML5 compliant behavior in bug 518104.
Attachment #401835 - Attachment is obsolete: true
Blocks: 516411
Comment on attachment 402587 [details] [diff] [review]
Backup of WIP. Almost there. Code polish needed.

Todo: release memory properly from the new tree ops.
Something here makes BlockOnload() and UnblockOnload() counts not match on www.cnn.com and EndLoad() in nsHTMLDocument is called too early.
Attached patch Make www.cnn.com less broken (obsolete) — Splinter Review
Attachment #402587 - Attachment is obsolete: true
Attachment #403778 - Attachment is obsolete: true
Attached patch Flush stream parser more often (obsolete) — Splinter Review
Attachment #403998 - Attachment is obsolete: true
No longer blocks: 518104
Attachment #404024 - Attachment is obsolete: true
Attachment #404804 - Flags: review?(bnewman)
Comment on attachment 404804 [details] [diff] [review]
Fix merge conflicts caused by earlier patches in the queue

This doesn't handle tokenizer state copying just right.
Attachment #404804 - Attachment is obsolete: true
Attachment #404804 - Flags: review?(bnewman)
Attached patch Make state syncing better (obsolete) — Splinter Review
At this point, the tokenizer still isn't quite right when someone document.writes a partial doctype token in the middle of the document but now it's harder to backport that parser core fix to this patch, so I'd rather fix it in a follow-up bug.
(In reply to comment #11)
> Created an attachment (id=405485) [details]
> Make state syncing better
> 
> At this point, the tokenizer still isn't quite right when someone
> document.writes a partial doctype token in the middle of the document but now
> it's harder to backport that parser core fix to this patch, so I'd rather fix
> it in a follow-up bug.

Does that mean you still want me to review the latest patch?  (Sorry for the delay.)
It would be good if you could review this patch (with the assumption that the state syncing is wrong inside doctype tokens and will get fixed later).
Attachment #405485 - Attachment is obsolete: true
Attachment #406218 - Flags: review?(bnewman)
Attached patch Fix doctype syncing (obsolete) — Splinter Review
I realized there was nothing blocking me from fixing the doctype syncing at this point in the patch queue, so here it is.
Attachment #406218 - Attachment is obsolete: true
Attachment #406446 - Flags: review?(bnewman)
Attachment #406218 - Flags: review?(bnewman)
Summary:

- nsHtml5StreamParser::mSnapshot can get overwritten before the corresponding tree operation has been delivered, leading to memory corruption.

- Need to add back the call to mExecutor->SetLifeCycle(TERMINATED) in nsHtml5Parser::ParseFragment, I think.

- I'd prefer a little more assurance that uninitialized eHtml5TreeOperations are actually intended to be eTreeOpAppends.  If you agree that this could be a source of bugs, I have an idea for a DEBUG-only wrapper for mOpCode that ensures it is reassigned exactly once.

- The previous patch didn't apply completely cleanly because the aBuilder->HaveNotified(node) expression here

  case eTreeOpDoneAddingChildren: {
    nsIContent* node = *(mOne.node);
    node->DoneAddingChildren(aBuilder->HaveNotified(node));
    return rv;
  }

was simply PR_FALSE.  Since nsHtml5TreeBuilder::HaveNotified appears no longer to exist, and PR_FALSE seems like the safe value to pass, I assume the PR_FALSE version is what you intended, but maybe I messed up the patch sequence.
Attached patch Address review comments (obsolete) — Splinter Review
(In reply to comment #15)
> Created an attachment (id=406644) [details]
> Feedback on "406446: Fix doctype syncing"
> 
> Summary:
> 
> - nsHtml5StreamParser::mSnapshot can get overwritten before the corresponding
> tree operation has been delivered, leading to memory corruption.

Yeah, sorry, this wasn't well thought out at all, because it was going to go away in a subsequent patch. 

Fixed.

> - Need to add back the call to mExecutor->SetLifeCycle(TERMINATED) in
> nsHtml5Parser::ParseFragment, I think.

Fixed.

> - I'd prefer a little more assurance that uninitialized eHtml5TreeOperations
> are actually intended to be eTreeOpAppends.  If you agree that this could be a
> source of bugs, I have an idea for a DEBUG-only wrapper for mOpCode that
> ensures it is reassigned exactly once.

Fixed by adding a debug-only enum value.

> - The previous patch didn't apply completely cleanly because the
> aBuilder->HaveNotified(node) expression here
> 
>   case eTreeOpDoneAddingChildren: {
>     nsIContent* node = *(mOne.node);
>     node->DoneAddingChildren(aBuilder->HaveNotified(node));
>     return rv;
>   }
> 
> was simply PR_FALSE.  Since nsHtml5TreeBuilder::HaveNotified appears no longer
> to exist, and PR_FALSE seems like the safe value to pass, I assume the PR_FALSE
> version is what you intended, but maybe I messed up the patch sequence.

That happened, because I landed the patch for bug 509851 out of sequence. The patch on this bug is correctly synced. The attached patch on bug 515338 might have a conflict on that line. Conceptually, the change from bug 509851 should be preserved.

I noticed a new problem:

Starting with this patch, framesets are rendered twice. The frameset visible in the viewport appears empty. There's another rendering of the frameset below it (scrolled out of view). The lower frameset rendering renders the frame content. Resizing frames in the upper rendering resizes frames in the lower one, too.

Seems like a double CSS frame construction problem, but I don't yet have no idea what changed in a way that'd cause double CSS frame construction.
Attachment #406446 - Attachment is obsolete: true
Attachment #406446 - Flags: review?(bnewman)
Attached patch Stylistic improvements (obsolete) — Splinter Review
Still haven't figured out the frameset double rendering.
Attachment #406667 - Attachment is obsolete: true
Attached patch Really doing the improvements (obsolete) — Splinter Review
Wrong patch in the previous upload. Sorry for the spam.
Attachment #406669 - Attachment is obsolete: true
This fixes the frameset issue.
Attachment #406644 - Attachment is obsolete: true
Attachment #406670 - Attachment is obsolete: true
Attachment #406683 - Flags: review?(bnewman)
These two assertions (I added them) are failing for me on facebook pages:

  diff --git a/parser/html/nsHtml5TreeOpExecutor.cpp b/parser/html/nsHtml5TreeOpExecutor.cpp
  --- a/parser/html/nsHtml5TreeOpExecutor.cpp
  +++ b/parser/html/nsHtml5TreeOpExecutor.cpp
  @@ -437,16 +437,18 @@ nsHtml5TreeOpExecutor::ExecuteScript()
     // need executing.
     nsresult rv = mScriptElement->DoneAddingChildren(PR_TRUE);
     // If the act of insertion evaluated the script, we're fine.
     // Else, block the parser till the script has loaded.
     if (rv == NS_ERROR_HTMLPARSER_BLOCK) {
       mScriptElements.AppendObject(sele);
       mParser->BlockParser();
     } else {
  +    NS_ASSERTION(mLifeCycle != TERMINATED, "Didn't we, like, just check for this?");
  +    NS_ASSERTION(mParser, "Uh-oh.");
       // This may have already happened if the script executed, but in case
       // it didn't then remove the element so that it doesn't get stuck forever.
       htmlDocument->ScriptExecuted(sele);
       static_cast<nsHtml5Parser*> (mParser.get())->MaybePostContinueEvent();
     }
     mScriptElement = nsnull;
   }
   
Any idea if this represents a larger problem, or if the appropriate fix is just to re-check mLifeCycle and/or mParser after script execution?
(In reply to comment #20)
> These two assertions (I added them) are failing for me on facebook pages:

Added a termination check. It can go away later in the patch queue, though.

> Any idea if this represents a larger problem, or if the appropriate fix is just
> to re-check mLifeCycle and/or mParser after script execution?

It's a larger problem. I posted about it to .platform newsgroup. The general problem is that nsHtml5TreeOpExecutor::Flush() calls out to something and somewhere outside the parser this causes a call back to nsIParser::Terminate(). I don't yet understand the assumptions the rest of Gecko has about parser behavior well enough to know what the best fix is (deferring aspects of termination vs. checking for termination all the time).

I'd like to address that general problem as a follow-up bug after flushing the current patch queue. It's an old problem that isn't introduced by these patches.
Attachment #406683 - Attachment is obsolete: true
Attachment #406986 - Flags: review?(bnewman)
Attachment #406683 - Flags: review?(bnewman)
Comment on attachment 406986 [details] [diff] [review]
Add a termination check 

Looks good to me!
Attachment #406986 - Flags: review?(bnewman) → review+
(In reply to comment #22)
> (From update of attachment 406986 [details] [diff] [review])
> Looks good to me!

Er, now I'm not so sure.  I can reliably trigger

  ###!!! ASSERTION: Had a script element and DidBuildModel call: '!mCallDidBuildModel', file /Users/ben/dev/debug/parser/html/nsHtml5TreeOpExecutor.cpp, line 320

by visiting this page:

  http://people.mozilla.org/~bnewman/bugs/516406/springer.html
(In reply to comment #23)
> (In reply to comment #22)
> > (From update of attachment 406986 [details] [diff] [review] [details])
> > Looks good to me!
> 
> Er, now I'm not so sure.  I can reliably trigger
> 
>   ###!!! ASSERTION: Had a script element and DidBuildModel call:
> '!mCallDidBuildModel', file
> /Users/ben/dev/debug/parser/html/nsHtml5TreeOpExecutor.cpp, line 320
> 
> by visiting this page:
> 
>   http://people.mozilla.org/~bnewman/bugs/516406/springer.html

This goes away with the later patches in the queue. I have applied patches for bug 514661, bug 515338, bug 516406, bug 482918, bug 482919, bug 503473, bug 521970 and bug 523087, and I don't see the problem.

Bug 503473 and bug 523087 cover a whole lot of problems.

At this point, I wonder if the the above queue should be treated as an atomic review and landing unit. It seems like a bad use of time to keep backporting stuff to the earlier patches in the queue. Yesterday, I lost some time because I forgot to revert a change I did at comment 16 here but that needed to go away again at bug 482919.
(In reply to comment #24)
> At this point, I wonder if the the above queue should be treated as an atomic
> review and landing unit. It seems like a bad use of time to keep backporting
> stuff to the earlier patches in the queue. Yesterday, I lost some time because
> I forgot to revert a change I did at comment 16 here but that needed to go away
> again at bug 482919.

Yeah, I definitely agree that this should all land at once.  I'll try to do a better job of considering the whole stack when I review.  No need to fold it all together yet, though; I've got all the patches applied in the right order on my machine, and that's easier to manage than one huge patch.
Thank you for the review.

Landed as http://hg.mozilla.org/mozilla-central/rev/7380c012628e
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.