Closed Bug 142771 Opened 23 years ago Closed 20 years ago

Underlined text doesn't stop

Categories

(Core :: DOM: HTML Parser, defect, P4)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED
Future

People

(Reporter: sander, Assigned: mrbkap)

References

()

Details

(Whiteboard: custrtm-[fix in hand])

Attachments

(3 files, 3 obsolete files)

Using mozilla build 2002050705 on Windows NT Looking at the page the underlined text doesnt stop were it is supposed to do so. Apparently this html was made by word. The problem seems related to the illegal use of the <p> tag. I made a testcase demonstrating the behaviour with bold instead of underline. Explorer shows the page just fine.
Confirming this bug. But is it really a bug ? Or bad html writing ? ;-)
To parser, but I'm betting this is wontfix...
Assignee: Matti → harishd
Status: UNCONFIRMED → NEW
Component: Browser-General → Parser
Ever confirmed: true
QA Contact: imajes-qa → moied
Summary: Underlined text doesnt stop → Underlined text doesn't stop
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: --- → mozilla1.0.1
Attached file Simplified testcase
Slight simplification of the test case.
Keywords: nsbeta1+
Whiteboard: [ADT2 RTM]
The root cause of the problem is nsHTMLTokenizer::ScanDocStructure that would allow inline elements to contain block level if the mark up is wellformed. Unfortunately, we cannot remove this feature, at this stage, because of the performance implications. On the other hand it's much easier to fix the content. Setting this bug to future.
Target Milestone: mozilla1.0.1 → Future
mmmmm...there is a way to fix this without compromising the feature. Investigating. Moving back to 1.0.1
Target Milestone: Future → mozilla1.0.1
Mark a token wellformed, nsHTMLTokenizer::ScanDocStructure, only if it's really wellformed. Ex. <p><b><i><p></b></p></i> Before the patch: On encountering </b>, when scanning for wellformedness, we check to see if the immediate open tag was actually <b> and since it wasn't we marked <b>, on the stack, as malformed. Now, on encountering </p> we see <p> on the stack and pop it off the stack after marking it wellformed. And similarly with </i>. This was the bug. Neither <p> nor <i> is wellformed. After the patch: On encountering </b>, when scanning for wellformedness, we check to see if the immediate open tag was actually <b> and since it wasn't we mark <p> ( which is the immediate open tag ) as malformed. Now, on encountering </p> we do see <p> on the stack, however, since it was marked malformed we don't pop it off the stack ( because it wasn't really wellformed ). Because of this </i> does not see <i> as the immediate open tag and therefore is not mistaken for a wellformed markup.
FYI: The above patch passed parser regression test.
Whiteboard: [ADT2 RTM] → [ADT2 RTM],custrtm-
This looks too risky for RTM, and seeing there are no dupes I think we should take this on the trunk only.
Keywords: nsbeta1+nsbeta1-
Whiteboard: [ADT2 RTM],custrtm- → custrtm-
Whiteboard: custrtm- → custrtm-[fix in hand]
Target Milestone: mozilla1.0.1 → mozilla1.1beta
Target Milestone: mozilla1.1beta → Future
Choess, do you understand this code?
No, but if you hum a few bars...I'll try to wrap my head around it, but I'm not sure when I'll have time for that.
Assigning to me so I remember to look at this. Choess, if you get a chance to do so before me, feel free to take back. I probably won't get to this for a while, but I don't want to lose it.
Assignee: harishd → mrbkap
Status: ASSIGNED → NEW
Attached patch updated to trunk (obsolete) — Splinter Review
This is the patch updated to the current trunk. I think it makes sense, but I'd like to investigate it more fully before asking for reviews.
Attachment #85329 - Attachment is obsolete: true
Comment on attachment 163937 [details] [diff] [review] updated to trunk So, I think this patch is mostly right. My one concern is that if there is a malformed inline-containing-block, then our stack can never shrink past the size of the tags leading up to it, so the stack could potentially grow to theMaxStackDepth on a large document with lots of errors. Once that happens we'd simply do RS handling on all inlines containing blocks, which would work, but would be a performance hit. Does anybody have opinions on whether this is a problem worth fixing (i.e., figuring out if/when we can pop tokens off the stack) or if this is fine?
Attached patch alternative approach (obsolete) — Splinter Review
This is an alternative approach that I like a lot more. Basically, the idea is that given <b><i><p>foo</b></p></i>, when we see </b> any tag on the stack between it and <b> is malformed (in this case <i><p>) so we mark them as malformed on our way down to <b> (which also gets marked as malformed). We need to keep track of the tags so that tags lower on the stack don't get affected by bad content in between, so in: <p>foo <b><i><p>foo</b></i></p></p>, even though we don't care if <p> is malformed, we don't really want it to be set as malformed. My only concern with this patch is that it does more than the old code, so it might be a perf hit.
Attachment #163937 - Attachment is obsolete: true
Comment on attachment 165236 [details] [diff] [review] alternative approach rbs, what do you think of this? (note: this patch is fairly ugly since I also did a bit of whitespace cleanup that was there in the original patch also). A diff -w isn't any nicer-looking.
Attachment #165236 - Flags: review?(rbs)
Status: NEW → ASSIGNED
Blocks: 276804
Attachment #165236 - Flags: review?(rbs) → review?(bzbarsky)
It'll take me a bit (a week? more?) to get to this review... :(
Comment on attachment 165236 [details] [diff] [review] alternative approach A diff -b, in addition to this diff, would have been a LOT easier to review, I think. >Index: src/nsHTMLTokenizer.cpp >+ CHTMLToken* theToken = (CHTMLToken*)mTokenDeque.ObjectAt(mTokenScanPos); Could you possibly document what mTokenScanPos is (in the relevant header), and maybe document in more detail what this function does, if you understand it? I see that this is preserving the old behavior, based on the code after the "now we know where to start" comment, but.. >+ while(theToken && (theStackDepth < theMaxStackDepth)) { No need for the extra parens. >+ theStack.Pop(); // pop off theLastToken for real. >+ do { >+ theLastToken->SetContainerInfo(eMalformed); >+ tempStack.Push(theLastToken); >+ theLastToken = NS_STATIC_CAST(CHTMLToken*, theStack.Pop()); >+ } while(theLastToken && theTag != theLastToken->GetTypeID()); >+ >+ NS_ASSERTION(theLastToken, >+ "FindLastIndexOfTag lied to us!" >+ " We couldn't find theTag on theStack"); >+ theLastToken->SetContainerInfo(eMalformed); >+ >+ // Great, now push all of the other tokens back onto the >+ // stack to preserve the general structure of the document. >+ while(tempStack.GetSize() != 0) { >+ theStack.Push(tempStack.Pop()); > } This doesn't push the last value of theLastToken onto theStack, does it? Is that correct?
(In reply to comment #18) > (From update of attachment 165236 [details] [diff] [review] [edit]) > A diff -b, in addition to this diff, would have been a LOT easier to review, I > think. > > >Index: src/nsHTMLTokenizer.cpp > >+ CHTMLToken* theToken = (CHTMLToken*)mTokenDeque.ObjectAt(mTokenScanPos); > > Could you possibly document what mTokenScanPos is (in the relevant header), Sure > This doesn't push the last value of theLastToken onto theStack, does it? Is > that correct? Yes. By this point theLastToken is the perceived start tag for this current end tag. So given: <b><p>Foo bar</b></p> We pop the <p> telling it it's malformed (even though we don't care), and we then pop the <b> and tell it it's malformed. However, since we have found the start token that we care about (which was just closed by the end tag) we don't want to push that one back on. Instead, we just push all of the rest of the tokens back onto the stack so that given: <p><b><p>Foo bar</b></p></p> the </p> doesn't close the wrong </p>. (after, we should have <p><p>Foo bar</p></p>).
(In reply to comment #19) > doesn't close the wrong </p>. (after, we should have <p><p>Foo bar</p></p>). This should be: doesn't close the wrong <p>. I'll attach a new diff -b in a couple of minutes.
This addresses the comments above.
Attachment #165236 - Attachment is obsolete: true
Attachment #171548 - Flags: review?(bzbarsky)
Attachment #165236 - Flags: review?(bzbarsky)
Comment on attachment 171548 [details] [diff] [review] updated to comments (diff -wb) OK. Add a comment about why we're not pushing theLastToken back on, and r=bzbarsky
Attachment #171548 - Flags: review?(bzbarsky) → review+
Attachment #171548 - Flags: superreview?(rbs)
Comment on attachment 171548 [details] [diff] [review] updated to comments (diff -wb) + // Find theTarget in the stack, marking each (malformed!) + // tag in our way. + theStack.Pop(); // pop off theLastToken for real. + do { + theLastToken->SetContainerInfo(eMalformed); + tempStack.Push(theLastToken); + theLastToken = NS_STATIC_CAST(CHTMLToken*, theStack.Pop()); + } while(theLastToken && theTag != theLastToken->GetTypeID()); I have this feeling that your otherwise careful approach doesn't cater for the case of unknown tags, since GetTypeID would return the catch-all eHTMLTag_userdefined even when they have different tag names. Do you need to strenghten the test for such cases?
(In reply to comment #23) > I have this feeling that your otherwise careful approach doesn't cater for the > case of unknown tags, since GetTypeID would return the catch-all > eHTMLTag_userdefined even when they have different tag names. Do you need to > strenghten the test for such cases? I think that for now, this is OK since the first userdefined tag we find will be the one that will be closed anyway (see bug 57217). Once bug 57217 is fixed, this test will need to be strengthened to match it. I'll add a comment to that effect.
Comment on attachment 171548 [details] [diff] [review] updated to comments (diff -wb) sr=rbs
Attachment #171548 - Flags: superreview?(rbs) → superreview+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
No longer blocks: 276804
*** Bug 276804 has been marked as a duplicate of this bug. ***
Testcase https://bugzilla.mozilla.org/attachment.cgi?id=83535 works for me now; it shows "More normal text" in regular font; without bold attributes. Verified FIXED using build 2005-01-21-05 Seamonkey trunk, Windows XP.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: