Closed Bug 54834 Opened 24 years ago Closed 23 years ago

Unknown tags (inside pre?) assumed to need closing tags

Categories

(Core :: DOM: HTML Parser, defect, P3)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: rickg)

References

()

Details

(Keywords: html4)

Attachments

(4 files)

A <whatever> tag inside a <pre> causes </pre> to not close the pre section, 
unless there is a closing </whatever> tag inside the pre as well.  This isn't 
happening with all tags (try replacing "whatever" with "input"), so I'm 
assuming it's happening with all unknown tags.  This would make the problem a 
major bug, because ignoring unknown tags is a central part of HTML's ability ot 
be forward-compatible.

(Probably a dup, but I couldn't find the bug.)
Attached file testcase
I don't agree that this is a MAJOR issue; I'm not sure how often this type of 
problem will occur. I don't see this type of case very often. Nevertheless, the 
fix is trivial.
Status: NEW → ASSIGNED
Fixed by 1-line change to elementtable.
Keywords: rtm
Whiteboard: [RTM+] fix in hand
We should accept this for RTM. There are a variety of authoring, database, and
site-maintenance tags that insert proprietary extension tags into HTML markup
for all kinds of purposes (autocontent generation, database linking, etc.). We
should not be making any assumptions about tags we know nothing about; this
could trip up high-profile web authoring, web site, and web content generation
tools. It's also html4.0 compliance--we're supposed to ignore tags silently if
we don't recognize them.
Keywords: html4
rickg - does your patch fix bug 55042 as well?
PDT marking [rtm need info] since no code reviews are listed for this patch.
Whiteboard: [RTM+] fix in hand → [RTM need info] fix in hand
Here's the patch:

Index: nsElementTable.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsElementTable.cpp,v
retrieving revision 3.105
diff -r3.105 nsElementTable.cpp

> /**
>  * This tests whether all the bits in the parentbits
>  * are included in the given set. It may be too 
>  * broad a question for most cases.
>  *
>  * @update	gess12/13/98
>  * @param 
>  * @return
>  */
> PRBool nsHTMLElement::ContainsSet(PRInt32 aSet) const{
>   return TestBits(mParentBits,aSet);
1909c1921
<         if(!gHTMLElements[theTag].IsMemberOf(kSpecial|kFontStyle|kPhrase)) {
---
>         if(!gHTMLElements[theTag].ContainsSet(kSpecial|kFontStyle|kPhrase)) {
1952c1964
<   else if(IsMemberOf(kFormControl|kExtensions|kPreformatted)){
---
>   else if(ContainsSet(kFormControl|kExtensions|kPreformatted)){  //bug54834...
Whiteboard: [RTM need info] fix in hand → [RTM+] fix in hand
Here's the patch:

Index: nsElementTable.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsElementTable.cpp,v
retrieving revision 3.105
diff -r3.105 nsElementTable.cpp
1166c1166
<       /*req-parent excl-parent*/          eHTMLTag_unknown,eHTMLTag_unknown,
---
>       /*req-parent excl-parent*/          eHTMLTag_table,eHTMLTag_unknown,  
//fix bug 54840...
Oops, please ignore the 2nd patch, it was for bug 54840.
The code has been reviewed and superreviewed by buster.
PDT marking [rtm++]
Whiteboard: [RTM+] fix in hand → [RTM++] fix in hand
*** Bug 55042 has been marked as a duplicate of this bug. ***
Fixed by correcting the elementtable, where eHTMLTag_table's parent was 
incorrectly specified.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Reopening, since the alledged fix caused a nasty regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Restoring rtm need info status.
Please be sure super-review and reviewer are distinct folks if you go for 
reconsideration.
Thanks,
Jim
Whiteboard: [RTM++] fix in hand → [RTM need info] fix in hand
Thanks Jim. Looking into this shortly, but I've also got a new crasher to attend
to first.
Status: REOPENED → ASSIGNED
Jar: I'm a bit insulted. I do consider myself a distinct person.  If you think
rick needs a review from someone of greater distinction, maybe we can hire Knuth
away from that quaint little layout project he has going...
I know for certain that buster is distinct, because I've seen him on the 
Munsters. Aside from that, this bug is now fixed in my tree, and the unfortunate 
regression it caused has been avoided. I'm doing my (improved) regression tests 
now. r=attinasi, sr=buster.

Here's the patch:

retrieving revision 3.109
diff -r3.109 nsElementTable.cpp
1964c1964,1965
<   else if(IsMemberOf(kFormControl|kExtensions|kPreformatted)){  //bug54834...
---
>   else if(ContainsSet(kPreformatted) ||
>           IsMemberOf(kFormControl|kExtensions|kPreformatted)){  //bug54834...
Whiteboard: [RTM need info] fix in hand → [RTM+] fix in hand
rtm-
Whiteboard: [RTM+] fix in hand → [RTM-] fix in hand
Please reconsider, given that this impacts bug 56665 as well. I wont bother you 
with this again if you say no -- but backward compatibility is a passion with 
me.
Whiteboard: [RTM-] fix in hand → [RTM+] fix in hand
I thought we were taking out the fix for this bug, since that fix caused bug
56665, and then leaving it alone for RTM. Did I misunderstand? Marking [need info]
Whiteboard: [RTM+] fix in hand → [RTM need info] fix in hand
I'm at fault for not making this issue clear. 54834 and 56665 are not the same 
problem, but they are similar in nature. What I was trying to say is that 56665 
is fixed, and I have a fix for this (I know I regressed this once, but 
really...). I'm hoping that since we have had this working (in my tree and 
harish's) for a week now, that you would  reconsider allowing me to land the 
right fix. It's backward compatibility with PRE, which is an extremenly common 
element. 

As I said, I'm begging for one last consideration; after than I'll just go away 
broken hearted. :)
Whiteboard: [RTM need info] fix in hand → [RTM+] fix in hand
Doesn't this also effect forward compatibility and some unprocessed server-side 
scripts?
So you're saying we should try to fix this bug again the right way because the
problem is really common. If so, could you provide some examples of pages which
look really bad due to this lack of compatibility? If it's not common, we'd
really rather look at crash bugs.
Whiteboard: [RTM+] fix in hand → [RTM need info] fix in hand
Thanks Phil: that's fair. I'll give you data tomorrow, as I'm going to bed early 
with the flu.
I downloaded and parsed 50 of the top 100 sites (see attachment). The 
result is that ~75% of the sites have <PRE> tags, and 60% of those have nested 
style tags (which is the point of this bug). 

Given this data, I'd argue this is a significant enough trend to earn checkin 
rights (on the grounds of compatibility, not crashing). As I said, I won't 
bother you guys again. r=attinasi, sr=buster. 

I'll attach the patch and the sample results momentarily.
Whiteboard: [RTM need info] fix in hand → [RTM+] fix in hand
Ok, that's the list of style inside <pre>, but we tried a couple of those pages,
and they look fine. Seems like the pages also have to have unknown tags, and
have those tags not be closed. Since it doesn't seem like this is a common case,
marking rtm-. Slap us if we've misunderstood.
Whiteboard: [RTM+] fix in hand → [RTM-] fix in hand
This is the sound of rickg giving up.
*** Bug 22913 has been marked as a duplicate of this bug. ***
*** Bug 58576 has been marked as a duplicate of this bug. ***
Is this bug going into NS6 or not?

Thanks
It's unlikely to land in N6.
*** Bug 61077 has been marked as a duplicate of this bug. ***
Bug 61077 has this happening with an unclosed <p> tag inside a <pre> block.
*** Bug 61433 has been marked as a duplicate of this bug. ***
Here's a link that looks really bad due to this bug.  Sorry I came late to the
party.  This may not be a "top site" by any stretch but it's a really simple
example of what can go wrong.

http://c2.com/cgi/wiki?FunnyThingsSeenInSourceCodeAndDocumentation

PRE overflowing makes long paragraphs scroll pretty much all the way off the
screen. 
OS: Windows 98 → All
Hardware: PC → All
*** Bug 62655 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
This is really common when "bare-metal" webmasters cut-and-paste email (with
bracketed addresses) into their web pages. Also when they paste C langaguage
files with
  #include <stdio.h>
lines.
It might be interesting (but lower priority) to indicate "silently ignore tag
near line #" in the status line. Most people would ignore that status line. That
line would remind webmasters to properly escape angle brackets and fix some
other kinds of invalid HTML code. That line would tell Mozilla hackers and
webmasters when valid HTML code uses tags that Mozilla does not (yet) support.
Attached file another test case
rickg: is there any reason why this patch can't go into the trunk now?

Gerv
updated qa contact.
QA Contact: janc → bsharma
rickg: ping :-) 

There's a two-line patch sitting here which missed the boat for NS 6 RTM. It 
would be a shame if it missed the next boat as well :-)

Gerv
fix was checked in 3.115 <rickg@netscape.com> 07 Jan 2001 19:37

tested using w32 3/10-08 15918, 22877 appear as internally described.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago23 years ago
Resolution: --- → FIXED
Whiteboard: [RTM-] fix in hand
Verified on:
build: 2001-04-02-09-Mtrunk
platform: WinNT

The above url loads fine and the test case loads fine.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: