Closed Bug 258082 Opened 20 years ago Closed 20 years ago

CTextToken::ConsumeUntil consumes too much text

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mrbkap, Assigned: mrbkap)

Details

Attachments

(1 file, 4 obsolete files)

I'm going to fix up the function to stop at the </tag> instead of eating it (and
thus losing its representation). Then the tokenizer can actually call
ConsumeEndTag and not fake a close tag.
bz, I've almost got this done, however, I've run into a couple of problems. One
is that with my patch we now hand the DTD entirely empty attributes (no value,
no key). I had a fix for this until I saw bug 109152. My original fix didn't
take the case provided by that bug into account, I'll fix that tomorrow. I'm
trying to get this patch finished by tomorrow because I won't have time to work
on this over the week.
Attached patch patch v1 (obsolete) — Splinter Review
This is my first attempt. There is *a lot* of funkyness with invalid tokens
near the end of the document, and in cases like:
<body><script>foo()</script </body (there needs to be a space after the
</body>!)
in view source, I see </body></body> (without the space, the </body> token gets
dropped completely, but that's another bug). Otherwise, this patch works to
spec. The one change I made to CAttributeToken::Consume is so that we don't
generate entirely empty (aka nonexistent) attributes.
As a side effect, this patch also fixes bug 83221 (but the </script>,</style>
still gets appended in non-view-source mode!).
Attached patch real patch (obsolete) — Splinter Review
The other patch had an unnecessary change (and I attached the wrong one).
Attachment #158021 - Attachment is obsolete: true
Attachment #158027 - Flags: review?(bzbarsky)
> in view source, I see </body></body>

You mean you see:

<body><script>whatever</script></body></body> ?

Is that with the patch, or without?  What happens without?  And is this
something to Not Worry about given the work on the new view-source tokenizer?

I'll try to review later tonight.
Ok. I overreacted slightly on the duplicated end tag. I see it more often now
because we call ConsumeEndTag more often (namely on <script> and friends). 

You should be able to reproduce it without the patch. I'm not sure if I should
bother filing a new bug on this, but the funkyness at the end of the document
breaks down as follows:
"</tag" (no whitespace) disappears entirely
"</tag " (with whitespace) turns into "</tag></tag>"
Comment on attachment 158027 [details] [diff] [review]
real patch

>Index: src/nsHTMLTokens.cpp

>-        CopyUnicodeTo(ltOffset.advance(2),gtOffset,aEndTagName);

Make aEndTagName const?  Then in the caller, you can use an
nsDependentSubstring instead of copying the text into an nsAutoString...

>+      aScanner.SetPosition(ltOffset);

Want to change the header to document that this is where the scanner is left?

>       // We found </SCRIPT>...permit flushing -> Ref: Bug 22485

And fix this lying little comment to acknowledge that </style> happens too?

>-  //I changed a bit of this method to use aRetain so that we do the right
>-  //thing in viewsource. The ws/cr/lf sequences are now maintained, and 

Why remove this comment?

>Index: src/nsHTMLTokenizer.cpp

>+            PRUnichar theChar;
>+            result = aScanner.Peek(theChar);

This is only used in the first branch below... why bother with it?

For that matter, if you plan to get unconditionally, why bother with the peek? 
Just get and assert after getting?

>+              // Get the /
>+              result = aScanner.Peek(theChar);
>+              NS_ENSURE_SUCCESS(result, result);
>+              NS_ASSERTION(theChar == kForwardSlash, "CTextToken::ConsumeUntil is broken!");

Make that whole block ifdef DEBUG.

>+              result = ConsumeEndTag(theChar,endToken,aScanner);

And pass |PRUnichar('/')| for the first arg here (since you're already
asserting that's what the char is).

With those changes, looks good....
Attached patch patch v2 (obsolete) — Splinter Review
Updated to comments.
Attachment #158027 - Attachment is obsolete: true
Attachment #158027 - Flags: review?(bzbarsky)
Attached patch patch v3 (obsolete) — Splinter Review
More comments addressed.
Attachment #158081 - Attachment is obsolete: true
Comment on attachment 158083 [details] [diff] [review]
patch v3

>Index: src/nsHTMLTokenizer.cpp

>+              result = aScanner.GetChar(theChar);

You never check that "result".	I'd say change the assert to

NS_ASSERTION(NS_SUCCEEDED(result) && theChar == kLessThan, ...)

With that, r=bzbarsky.
Attachment #158083 - Flags: review+
Comment on attachment 158083 [details] [diff] [review]
patch v3

jst, please give this patch sr=. I also need this to be checked in for me.
Attachment #158083 - Flags: superreview?(jst)
Comment on attachment 158083 [details] [diff] [review]
patch v3

sr=jst
Attachment #158083 - Flags: superreview?(jst) → superreview+
Fix checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: