Underlined text doesn't stop

VERIFIED FIXED in Future

Status

()

Core
HTML: Parser
P4
normal
VERIFIED FIXED
16 years ago
13 years ago

People

(Reporter: Sander van Rijnswou, Assigned: mrbkap)

Tracking

Trunk
Future
x86
Windows NT
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: custrtm-[fix in hand], URL)

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

16 years ago
Using mozilla build 2002050705 on Windows NT

Looking at the page the underlined text doesnt stop were it is supposed to do
so. Apparently this html was made by word. 

The problem seems related to the illegal use of the <p> tag. I made a testcase
demonstrating the behaviour with bold instead of underline. 

Explorer shows the page just fine.
(Reporter)

Comment 1

16 years ago
Created attachment 82639 [details]
Testcase showing text that is bold but should not be

Comment 2

16 years ago
Confirming this bug. But is it really a bug ? Or bad html writing ? ;-)
To parser, but I'm betting this is wontfix...
Assignee: Matti → harishd
Status: UNCONFIRMED → NEW
Component: Browser-General → Parser
Ever confirmed: true
QA Contact: imajes-qa → moied

Updated

16 years ago
Summary: Underlined text doesnt stop → Underlined text doesn't stop

Updated

16 years ago
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: --- → mozilla1.0.1
(Reporter)

Comment 4

16 years ago
Created attachment 83535 [details]
Simplified testcase

Slight simplification of the test case.
Keywords: nsbeta1+
Whiteboard: [ADT2 RTM]

Comment 5

16 years ago
The root cause of the problem is nsHTMLTokenizer::ScanDocStructure that would
allow inline elements to contain block level if the mark up is wellformed.
Unfortunately, we cannot remove this feature, at this stage, because of the
performance implications. On the other hand it's much easier to fix the content.
Setting this bug to future.
Target Milestone: mozilla1.0.1 → Future

Comment 6

16 years ago
mmmmm...there is a way to fix this without compromising the feature.
Investigating. Moving back to 1.0.1
Target Milestone: Future → mozilla1.0.1

Comment 7

16 years ago
Created attachment 85329 [details] [diff] [review]
patch v1.0 [ needs extensive testing ]

Mark a token wellformed, nsHTMLTokenizer::ScanDocStructure, only if it's really
wellformed. 

Ex. <p><b><i><p></b></p></i>

Before the patch: 
On encountering </b>, when scanning for wellformedness, we check to see if the
immediate open tag was actually <b> and since it wasn't we marked <b>, on the
stack, as malformed. Now, on encountering </p> we see <p> on the stack and pop
it off the stack after marking it wellformed. And similarly with </i>. This was
the bug. Neither <p> nor <i> is wellformed.

After the patch:
On encountering </b>, when scanning for wellformedness, we check to see if the
immediate open tag was actually <b> and since it wasn't we mark <p> ( which is
the immediate open tag ) as malformed. Now, on encountering </p> we do see <p>
on the stack, however, since it was marked malformed we don't pop it off the
stack ( because it wasn't really wellformed ). Because of this </i> does not
see <i> as the immediate open tag and therefore is not mistaken for a
wellformed markup.

Comment 8

16 years ago
FYI: The above patch passed parser regression test.

Updated

16 years ago
Whiteboard: [ADT2 RTM] → [ADT2 RTM],custrtm-
This looks too risky for RTM, and seeing there are no dupes I think we should
take this on the trunk only.
Keywords: nsbeta1+ → nsbeta1-
Whiteboard: [ADT2 RTM],custrtm- → custrtm-

Updated

16 years ago
Whiteboard: custrtm- → custrtm-[fix in hand]
Target Milestone: mozilla1.0.1 → mozilla1.1beta
Target Milestone: mozilla1.1beta → Future
Choess, do you understand this code?
No, but if you hum a few bars...I'll try to wrap my head around it, but I'm not
sure when I'll have time for that.
(Assignee)

Comment 12

14 years ago
Assigning to me so I remember to look at this. Choess, if you get a chance to do
so before me, feel free to take back.

I probably won't get to this for a while, but I don't want to lose it.
Assignee: harishd → mrbkap
Status: ASSIGNED → NEW
(Assignee)

Comment 13

14 years ago
Created attachment 163937 [details] [diff] [review]
updated to trunk

This is the patch updated to the current trunk. I think it makes sense, but I'd
like to investigate it more fully before asking for reviews.
Attachment #85329 - Attachment is obsolete: true
(Assignee)

Comment 14

14 years ago
Comment on attachment 163937 [details] [diff] [review]
updated to trunk

So, I think this patch is mostly right. My one concern is that if there is a
malformed inline-containing-block, then our stack can never shrink past the
size of the tags leading up to it, so the stack could potentially grow to
theMaxStackDepth on a large document with lots of errors. Once that happens
we'd simply do RS handling on all inlines containing blocks, which would work,
but would be a performance hit.

Does anybody have opinions on whether this is a problem worth fixing (i.e.,
figuring out if/when we can pop tokens off the stack) or if this is fine?
(Assignee)

Comment 15

14 years ago
Created attachment 165236 [details] [diff] [review]
alternative approach

This is an alternative approach that I like a lot more. Basically, the idea is
that given <b><i><p>foo</b></p></i>, when we see </b> any tag on the stack
between it and <b> is malformed (in this case <i><p>) so we mark them as
malformed on our way down to <b> (which also gets marked as malformed).

We need to keep track of the tags so that tags lower on the stack don't get
affected by bad content in between, so in: <p>foo <b><i><p>foo</b></i></p></p>,
even though we don't care if <p> is malformed, we don't really want it to be
set as malformed.

My only concern with this patch is that it does more than the old code, so it
might be a perf hit.
Attachment #163937 - Attachment is obsolete: true
(Assignee)

Comment 16

14 years ago
Comment on attachment 165236 [details] [diff] [review]
alternative approach

rbs, what do you think of this? (note: this patch is fairly ugly since I also
did a bit of whitespace cleanup that was there in the original patch also). A
diff -w isn't any nicer-looking.
Attachment #165236 - Flags: review?(rbs)
(Assignee)

Updated

13 years ago
Status: NEW → ASSIGNED
(Assignee)

Updated

13 years ago
Blocks: 276804
(Assignee)

Updated

13 years ago
Attachment #165236 - Flags: review?(rbs) → review?(bzbarsky)
It'll take me a bit (a week? more?) to get to this review... :(
Comment on attachment 165236 [details] [diff] [review]
alternative approach

A diff -b, in addition to this diff, would have been a LOT easier to review, I
think.

>Index: src/nsHTMLTokenizer.cpp
>+  CHTMLToken* theToken = (CHTMLToken*)mTokenDeque.ObjectAt(mTokenScanPos);

Could you possibly document what mTokenScanPos is (in the relevant header), and
maybe document in more detail what this function does, if you understand it?  I
see that this is preserving the old behavior, based on the code after the "now
we know where to start" comment, but..

>+  while(theToken && (theStackDepth < theMaxStackDepth)) {

No need for the extra parens.

>+                    theStack.Pop(); // pop off theLastToken for real.
>+                    do {
>+                      theLastToken->SetContainerInfo(eMalformed);
>+                      tempStack.Push(theLastToken);
>+                      theLastToken = NS_STATIC_CAST(CHTMLToken*, theStack.Pop());
>+                    } while(theLastToken && theTag != theLastToken->GetTypeID());
>+
>+                    NS_ASSERTION(theLastToken,
>+                                 "FindLastIndexOfTag lied to us!"
>+                                 " We couldn't find theTag on theStack");
>+                    theLastToken->SetContainerInfo(eMalformed);
>+
>+                    // Great, now push all of the other tokens back onto the
>+                    // stack to preserve the general structure of the document.
>+                    while(tempStack.GetSize() != 0) {
>+                      theStack.Push(tempStack.Pop());
>                     }

This doesn't push the last value of theLastToken onto theStack, does it?  Is
that correct?
(Assignee)

Comment 19

13 years ago
(In reply to comment #18)
> (From update of attachment 165236 [details] [diff] [review] [edit])
> A diff -b, in addition to this diff, would have been a LOT easier to review, I
> think.
> 
> >Index: src/nsHTMLTokenizer.cpp
> >+  CHTMLToken* theToken = (CHTMLToken*)mTokenDeque.ObjectAt(mTokenScanPos);
> 
> Could you possibly document what mTokenScanPos is (in the relevant header),
Sure

> This doesn't push the last value of theLastToken onto theStack, does it?  Is
> that correct?

Yes. By this point theLastToken is the perceived start tag for this current end
tag. So given:
<b><p>Foo bar</b></p>

We pop the <p> telling it it's malformed (even though we don't care), and we
then pop the <b> and tell it it's malformed. However, since we have found the
start token that we care about (which was just closed by the end tag) we don't
want to push that one back on. Instead, we just push all of the rest of the
tokens back onto the stack so that given: <p><b><p>Foo bar</b></p></p> the </p>
doesn't close the wrong </p>. (after, we should have <p><p>Foo bar</p></p>).
(Assignee)

Comment 20

13 years ago
(In reply to comment #19)
> doesn't close the wrong </p>. (after, we should have <p><p>Foo bar</p></p>).

This should be: doesn't close the wrong <p>. I'll attach a new diff -b in a
couple of minutes.
(Assignee)

Comment 21

13 years ago
Created attachment 171548 [details] [diff] [review]
updated to comments (diff -wb)

This addresses the comments above.
Attachment #165236 - Attachment is obsolete: true
Attachment #171548 - Flags: review?(bzbarsky)
(Assignee)

Updated

13 years ago
Attachment #165236 - Flags: review?(bzbarsky)
Comment on attachment 171548 [details] [diff] [review]
updated to comments (diff -wb)

OK.  Add a comment about why we're not pushing theLastToken back on, and
r=bzbarsky
Attachment #171548 - Flags: review?(bzbarsky) → review+
(Assignee)

Updated

13 years ago
Attachment #171548 - Flags: superreview?(rbs)

Comment 23

13 years ago
Comment on attachment 171548 [details] [diff] [review]
updated to comments (diff -wb)

+  // Find theTarget in the stack, marking each (malformed!)
+  // tag in our way.
+  theStack.Pop(); // pop off theLastToken for real.
+  do {
+    theLastToken->SetContainerInfo(eMalformed);
+    tempStack.Push(theLastToken);
+    theLastToken = NS_STATIC_CAST(CHTMLToken*, theStack.Pop());
+  } while(theLastToken && theTag != theLastToken->GetTypeID());

I have this feeling that your otherwise careful approach doesn't cater for the
case of unknown tags, since  GetTypeID would return the catch-all
eHTMLTag_userdefined even when they have different tag names. Do you need to
strenghten the test for such cases?
(Assignee)

Comment 24

13 years ago
(In reply to comment #23)
> I have this feeling that your otherwise careful approach doesn't cater for the
> case of unknown tags, since  GetTypeID would return the catch-all
> eHTMLTag_userdefined even when they have different tag names. Do you need to
> strenghten the test for such cases?

I think that for now, this is OK since the first userdefined tag we find will be
the one that will be closed anyway (see bug 57217). Once bug 57217 is fixed,
this test will need to be strengthened to match it. I'll add a comment to that
effect.

Comment 25

13 years ago
Comment on attachment 171548 [details] [diff] [review]
updated to comments (diff -wb)

sr=rbs
Attachment #171548 - Flags: superreview?(rbs) → superreview+
(Assignee)

Comment 26

13 years ago
Fix checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
(Assignee)

Updated

13 years ago
No longer blocks: 276804
(Assignee)

Comment 27

13 years ago
*** Bug 276804 has been marked as a duplicate of this bug. ***
Testcase https://bugzilla.mozilla.org/attachment.cgi?id=83535 works for me now;
it shows "More normal text" in regular font; without bold attributes.

Verified FIXED using build 2005-01-21-05 Seamonkey trunk, Windows XP.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.