Closed Bug 258082 Opened 21 years ago Closed 21 years ago

CTextToken::ConsumeUntil consumes too much text

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mrbkap, Assigned: mrbkap)

Details

Attachments

(1 file, 4 obsolete files)

I'm going to fix up the function to stop at the </tag> instead of eating it (and thus losing its representation). Then the tokenizer can actually call ConsumeEndTag and not fake a close tag.
bz, I've almost got this done, however, I've run into a couple of problems. One is that with my patch we now hand the DTD entirely empty attributes (no value, no key). I had a fix for this until I saw bug 109152. My original fix didn't take the case provided by that bug into account, I'll fix that tomorrow. I'm trying to get this patch finished by tomorrow because I won't have time to work on this over the week.
Attached patch patch v1 (obsolete) — Splinter Review
This is my first attempt. There is *a lot* of funkyness with invalid tokens near the end of the document, and in cases like: <body><script>foo()</script </body (there needs to be a space after the </body>!) in view source, I see </body></body> (without the space, the </body> token gets dropped completely, but that's another bug). Otherwise, this patch works to spec. The one change I made to CAttributeToken::Consume is so that we don't generate entirely empty (aka nonexistent) attributes. As a side effect, this patch also fixes bug 83221 (but the </script>,</style> still gets appended in non-view-source mode!).
Attached patch real patch (obsolete) — Splinter Review
The other patch had an unnecessary change (and I attached the wrong one).
Attachment #158021 - Attachment is obsolete: true
Attachment #158027 - Flags: review?(bzbarsky)
> in view source, I see </body></body> You mean you see: <body><script>whatever</script></body></body> ? Is that with the patch, or without? What happens without? And is this something to Not Worry about given the work on the new view-source tokenizer? I'll try to review later tonight.
Ok. I overreacted slightly on the duplicated end tag. I see it more often now because we call ConsumeEndTag more often (namely on <script> and friends). You should be able to reproduce it without the patch. I'm not sure if I should bother filing a new bug on this, but the funkyness at the end of the document breaks down as follows: "</tag" (no whitespace) disappears entirely "</tag " (with whitespace) turns into "</tag></tag>"
Comment on attachment 158027 [details] [diff] [review] real patch >Index: src/nsHTMLTokens.cpp >- CopyUnicodeTo(ltOffset.advance(2),gtOffset,aEndTagName); Make aEndTagName const? Then in the caller, you can use an nsDependentSubstring instead of copying the text into an nsAutoString... >+ aScanner.SetPosition(ltOffset); Want to change the header to document that this is where the scanner is left? > // We found </SCRIPT>...permit flushing -> Ref: Bug 22485 And fix this lying little comment to acknowledge that </style> happens too? >- //I changed a bit of this method to use aRetain so that we do the right >- //thing in viewsource. The ws/cr/lf sequences are now maintained, and Why remove this comment? >Index: src/nsHTMLTokenizer.cpp >+ PRUnichar theChar; >+ result = aScanner.Peek(theChar); This is only used in the first branch below... why bother with it? For that matter, if you plan to get unconditionally, why bother with the peek? Just get and assert after getting? >+ // Get the / >+ result = aScanner.Peek(theChar); >+ NS_ENSURE_SUCCESS(result, result); >+ NS_ASSERTION(theChar == kForwardSlash, "CTextToken::ConsumeUntil is broken!"); Make that whole block ifdef DEBUG. >+ result = ConsumeEndTag(theChar,endToken,aScanner); And pass |PRUnichar('/')| for the first arg here (since you're already asserting that's what the char is). With those changes, looks good....
Attached patch patch v2 (obsolete) — Splinter Review
Updated to comments.
Attachment #158027 - Attachment is obsolete: true
Attachment #158027 - Flags: review?(bzbarsky)
Attached patch patch v3 (obsolete) — Splinter Review
More comments addressed.
Attachment #158081 - Attachment is obsolete: true
Comment on attachment 158083 [details] [diff] [review] patch v3 >Index: src/nsHTMLTokenizer.cpp >+ result = aScanner.GetChar(theChar); You never check that "result". I'd say change the assert to NS_ASSERTION(NS_SUCCEEDED(result) && theChar == kLessThan, ...) With that, r=bzbarsky.
Attachment #158083 - Flags: review+
Comment on attachment 158083 [details] [diff] [review] patch v3 jst, please give this patch sr=. I also need this to be checked in for me.
Attachment #158083 - Flags: superreview?(jst)
Comment on attachment 158083 [details] [diff] [review] patch v3 sr=jst
Attachment #158083 - Flags: superreview?(jst) → superreview+
Fix checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: