Invalid tags destroyed/discarded by composer




18 years ago
17 years ago


(Reporter: timeless, Assigned: Akkana Peck)




Firefox Tracking Flags

(Not tracked)


(Whiteboard: relnote-user, URL)



18 years ago
from bug 41959 test case

My understanding is that this should render as [centered]test[/] on a starry 
background.  We have the background working, but the text `test` is missing.
Composer shows no text. FWIW, the cursor is centered.
<body class="mainpage"><br>

Comment 1

18 years ago
This blocks verification of the testcase from 41959.
Blocks: 41959

Comment 2

18 years ago
This page is invalid. You've provided a <doctype HTML 4.0 strict> but your page 
is not compliant. To make it compliant, you must do this:

  <TITLE>The Nexus</TITLE>
    HREF="" TYPE="text/css">
<BODY CLASS="mainpage">
Last Resolved: 18 years ago
Resolution: --- → INVALID

Comment 3

18 years ago
Netscape 4's composer gladly preserves the following html [it adds random 
gibberish, but it doesn't delete anything]:

If a mistake is made in writing an html page [eg omitting a <p>] the flawed html 
should survive composer so that it can be fixed.  If the answer is obvious, 
maybe it should be fixed [add, don't remove].
Component: Parser → Compositor
Keywords: 4xp
OS: Windows 2000 → All
Hardware: PC → All
Resolution: INVALID → ---
Summary: Text lost in <body> tag → Invalid tags destroyed/discarded by composer

Comment 4

18 years ago
change component to editor (was compositor)
Component: Compositor → Editor

Comment 5

18 years ago
The chief issue here is that the doctype specifies STRICT. In mozilla, documents 
with a strict <doctype> MUST conform to the w3c STRICT dtd. (This is an 
essential part of our standards story).

The spec is clear about unknown tags -- the user agent is free to do with them 
as they see fit. In the strict world, they get dropped because we couldn't 
possibly predict the intended use of the tag. 

Marking invalid.
Last Resolved: 18 years ago
Resolution: --- → INVALID

Comment 6

18 years ago
Dawn just called this bug to my attention.  I just today checked in a fix for
bug 38154, which fixes the output system so that if we get an unknown tag like
<foo></foo>, we'll preserve it into the output (and strip off the _moz-unknown
attribute that got added somewhere along the line).  That was an output system

But of course, that assumes that we get the tag in the first place.  If the
parser isn't putting the tag into the content model because the DTD is specified
as strict, then the editor and output system won't know anything about it and
the fix for 38154 won't help.

You might want to check on that bug and see if that was really the bug you were
hoping to see fixed, and see if things are working better for you now.
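
The output-side behavior described in this comment can be illustrated with a small sketch. This is not the Mozilla output system; it is a stand-in built on Python's standard `html.parser`, and the class name `RoundTrip` and the function `serialize` are hypothetical. It assumes that both marker spellings seen in this thread (`_moz-unknown` and `_moz-userdefined`) are internal bookkeeping that should not reach the saved file. Doctype declarations and comments are not re-emitted by this sketch.

```python
from html import escape
from html.parser import HTMLParser

# Marker attribute names as they are spelled in this thread; treating both as
# internal bookkeeping is an assumption of this sketch.
MARKERS = {"_moz-unknown", "_moz-userdefined"}


class RoundTrip(HTMLParser):
    """Re-emit every tag, recognized or not, minus the marker attributes."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if name in MARKERS:
                continue  # strip internal bookkeeping only
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        # Unknown end tags are written back exactly like known ones.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def serialize(source):
    """Parse `source` and write it back out, keeping unknown elements."""
    parser = RoundTrip()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

With this sketch, `serialize('<foo _moz-unknown=""></foo>')` gives back `<foo></foo>`: the unknown element survives and only the marker is removed. As the comment notes, none of this helps if the parser never puts the tag into the content model in the first place.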

Comment 7

18 years ago
As akkana points out, we only drop tags in the strict/transitional DTDs. HTML 
3.2 documents will store unknown tags as containers in the content model.

Comment 8

18 years ago
If someone lies about their document's doctype (as can easily happen with 
shared documents) we shouldn't punish them by quietly deleting stuff from
their document. A much better solution would be to change the doctype.

Comment 9

18 years ago
I really hate to see us drop something just because we don't know what it is.  
Sounds rather harsh to me.

Comment 10

18 years ago
Since the user agent is free to do with unknown tags as it pleases, why not 
treat them as unknown containers? Just because you can't predict its intended 
use, does that mean it should completely be stripped out?

Actually, at 
it's recommended that: "If a user agent encounters an element it does not 
recognize, it should try to render the element's content.", to facilitate 
experimentation and interoperability between implementations of various versions 
of HTML.

Extrapolating that to composer, I would interpret it as keeping the unknown 

One instance where it helps being able to use custom tags is in templates.
For a nice example project, see, which uses 
custom tags for things like <if foo="bar">some html code</if>. Being able to 
edit templates in composer (and leaving the custom tags alone) greatly enhances 
the value of Mozilla.

Yes, in the above example, one could leave out the DTD, or specify a 3.2 DTD, 
but if the rest of the code is 4.0 DTD Strict, and after parsing and generating 
from the template one does get pure 4.0 DTD Strict html, and taking into account 
the recommendation quoted above, what would be wrong with being lenient towards 
these unknown tags, even in 4.0 DTD Strict?

Comment 11

18 years ago
The parser is stripping out invalid elements based on the DTD specified, marking 
"Conservative in what you generate, but lenient in what you accept" is an 
ancient protocol design maxim.  Peter Annema makes this point eloquently, and 
with support from relevant standards.  I don't think it's right for this bug and 
its commentary to be marked invalid.  I'm reopening and reassigning for further 

Resolution: INVALID → ---
David, are you willing to adjudicate this conflict?

Assignee: rickg → dbaron
There are serious problems with "Be conservative in what you generate, but
lenient in what you accept" when the large majority of those who are generating
determine the validity of what they generate by whether it is accepted by a few
(or even one) user-agent.  This is why more recent web standards (e.g., CSS,
XML) have moved towards much stricter error handling rules.  However, there
certainly is a tradition of lenient handling of HTML and we have some degree of
obligation to meet past expectations that future browsers would continue to be

However, I have to say I'm somewhat reluctant to support throwing out character
data (as opposed to tags) that are not allowed, although I'm not sure what the
parser should do to make them valid.

Comment 15

18 years ago
If the document reads strict it should be treated as such, the content provider 
has to specifically code in strict. A strict doctype is not the default. The 
default is transitional. If a content provider states strict, then that is 
exactly how the document should be processed. If the provider wants a more 
global, universal rendering of the document, then usage of the default is the 
appropriate doctype to use. Remember -- it is a conscious decision to state 
strict. And the spec is pretty clear on how strict should be handled. If the 
expectation is to allow anything and support anything regardless of the doctype 
specified, then what is the point behind being standards compliant? If the 
content provider wants a 4.x level of standards support, then stick with the 
I'd like to see the spec that is "pretty clear on how strict should be handled."

Comment 17

18 years ago
Adjudication isn't necessary; we're doing the right thing by preserving the 
content and ignoring the tags. 

Comment 18

18 years ago
Trying to guess how to treat unknown tags is dubious at best. We *could* just 
assume unknown tags are container elements with required endtags, of a type 
that is appropriate within their parent container. This would allow them to be 
maintained in the document. If push comes to shove, it's trivial to do this.

However, authors that specify strict are explicitly saying "hold my markup to a 
higher standard". In the XML world, if someone gets their markup wrong the 
browser simply refuses to display the document (and shows an error message 
instead). In HTML, the rule is to always show something, but since they've asked 
us to be strict we should make them comply with the DTD. 

Dawn: if someone tells us to apply the strictDTD, even by mistake, we should 
honor their request. They'll bring the document up in the browser and see that 
it (does or doesn't) render as they expect. No one is lying, and no one is being 
punished. We're following the spec as we've been asked to do. People will 
inevitably make mistakes, but that doesn't mean we should throw out the 

Comment 19

18 years ago
David: simple -- look at the DTD, look at the content model for the elements 
specified, it's clear as to what elements are acceptable within any and all 
allowable elements of the DTD.
beppe:  You can say the exact same thing for HTML4 transitional, HTML 3.2, and 
HTML 2.0.

Comment 21

18 years ago
Suppose I edit someone else's page (which I do often, especially when i'm 
trying to figure out why moz/nc don't like a page). I want to be able to edit 
the page.

If the first chance i get to alter the source, the page is already destroyed, 
that's really helpful</sarcasm> [please don't strip that].  It would be really 
nice if Mozilla Composer told me that the document failed to match its dtd and 
`would it be ok if the dtd was dropped and the document reparsed?`

I also like the idea of being able to use a composer to see problems with a 
document, often our browser view-source doesn't work, and even if it does 
there's no guarantee the source is even close to readable in the source window.

In netscape4 we had yellow tags for stuff we didn't recognize, if we also had 
say red tags for stuff we thought was broken (imo this should include the dtd, 
but apparently the <head> stuff isn't really editable in the composer view.) 
people could see problems and fix them.

If you don't like that red tag stuff i could make it a separate RFE, but i 
really think that composer should be able to say the dtd isn't valid and offer 
to handle it some other way.

yes I know that people can save a document and then strip the dtd, and then use 
composer, but that's silly when composer could enable us to do it.  Remember, 
this isn't Browser, this is Composer, we're supposed to let people work on a 
document, including fixing stuff.

Comment 22

18 years ago
David: and that is the point, the DTD clearly specifies what is acceptable 
structure within the file, if you wish to alter that structure, then either 1. 
alter the DTD and publish it, and point to the new 'public' DTD in the doctype, 
or 2. add additional rules to the file that define the new elements. Simple, 
basic SGML constructs.

Comment 23

18 years ago
timeless: yes, it would be nice to somehow mark the areas that are bad in the 
file, especially in Composer. It would also be nice to throw up a dialog that 
let the user know that the DTD specified and the elements used are not in 
synch with each other. Giving the user the opportunity to either 1. remove the 
doctype, or 2. show them where the offending elements are located.

The file however, goes through the parser before it reaches us. So, that will 
take coordination with the parser folks to see if we can work through that 
issue. I do like the red unknown tag idea, and maybe have the background greyed 
out or something to show them how far reaching the offending code is. We should 
certainly look into that, but we won't be able to really investigate this at the 

One issue that we need to take into account, is that if there is a doctype 
specified and if the user specifies to keep the doctype in, then we should not 
output any element that is not within the specified DTD. We should only emit, 
valid documents in respect to the specified DTD and that becomes more and more 
important as we begin to support other element structures such as xml.
beppe (responding to 10:22 comment):  What does the DTD have to do with this?  
I'm not saying the documents aren't wrong.  What specifies the *error handling* 
behavior?  What says error handling for documents with HTML 4.0 Strict DTDs 
should be different from error handling for documents with HTML 2.0 DTDs?  
AFAIK, there is absolutely no spec-based argument to make a distinction between 
HTML 4.0 strict and other DTDs, and you implied above that there was.

The only reason we're doing stricter error handling for documents with HTML 4.0 
strict DTDs and not other DTDs is because documents with HTML 4.0 strict DTDs 
are not yet common on the web.  We assume that (almost) any strict HTML written 
on the web will be written considering our behavior.  This will help the web in 
general *if* people move to HTML 4.0 strict that works on Mozilla.  If the 
assumption is wrong that HTML 4.0 strict documents will be written with Mozilla 
in mind, then perhaps we need to reconsider our behavior.  (Right now I think 
that assumption is close enough that we're OK.  But, I'm not so sure that people 
will write strict HTML for Mozilla.)

Comment 25

18 years ago
David: you're joking right? The DTD has everything to do with how we handle the 
structure of the file. That is what is used to determine what gets stripped and 
what doesn't. Why do you think we have a parser? That is what this topic is 
about -- the parser stripping out elements. And where do you think the parser 
gets that set of rules from? Yep, you betcha -- the DTD.
No, I'm not joking.  Yes, the rules in the HTML 4 strict DTD are slightly 
different from other DTDs.  However, we're not doing this type of error handling 
for other DTDs.  The DTD does *not* specify error handling behavior.  It only 
describes which documents are valid and which are not.

Comment 27

18 years ago
Oh, I see what you're trying to ask. Rick did try to preserve the Transitional 
constructs but we had to revert back to quirks mode because of the issue with 
4.x Composer placing the doctype. In addition, we had to make a call as to how 
we could be backwards compatible and still move forward. I believe the decision 
was that transitional would be more forgiving and allow for the awful page 
construction that is out there. If transitional did adhere to the letter of the 
law, then a vast majority of the pages would not parse. So, an unfortunate 
choice had to be made -- do you break the vast majority of pages or do you 
provide two levels of adherence? Since transitional is the anything goes 
support, the strict is left for the true standards compliance support.

Doctype statements that are less than the current level should also trigger a 
dialog informing the user that they are editing a document with an obsoleted DTD 

Comment 28

18 years ago
In non-strict mode, the parser adds a _moz-userdefined attribute to tags it
doesn't recognize (which the output system then strips out upon output).  We
could use this to set up a style sheet (similar to the "show all tags" mode the
editor already has) which showed "red tag" warnings for tags with this
attribute.  This should be filed as a separate RFE; cmanske would probably best
know how to do this, but may not have time, so it might require help from
outside.  If we could get the tags in strict mode, or offer a dialog suggesting
to the user that the doctype needs to be changed or else we'll throw out
nonconforming data, this might solve most of the problems.

Comment 29

18 years ago
beppe, wasn't the suggestion with regard to the Composer 4.x compatibility 
issue, that <!doctype ...> is case sensitive, and Composer 4.x uses the 
incorrect case, so it wouldn't/shouldn't be treated as Transitional anyway?

Comment 30

18 years ago
yes that was mentioned, but that is not how the problem was resolved, setting 
transitional back to quirks is what happened.
What needs to be done for this bug?  Some (random?) thoughts:

I think SGML-based HTML is dead.  That is, I think it will always be tag soup,
and attempts to get people to conform to SGML won't help.  Our browser doesn't
even accept lots of correct HTML.  However, I don't want to force this dismal
forecast on others and make it a reality, so I'm willing to accept the Strict
DTD's changes, as far as the browser is concerned, as something that *could*
help improve the quality of SGML-based HTML on the web.

However, for the Editor, these changes cause serious problems.  I like Dawn
Endico's suggestion of 2000-06-16 16:39 (if there are errors when loading a page
into the editor, we should ask the user whether the page should be parsed
loosely and the DOCTYPE changed).  It seems like a simple solution that wouldn't
disturb much else.  It requires notification of errors from the parser and some
code in the editor to check if there were parser errors.  I imagine it can't be
too hard for the parser to notice when it's dropping things.
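
One way to picture the proposal in this paragraph is the sketch below. It is an illustration only, not Mozilla parser or editor code: `ALLOWED` is a tiny stand-in for a strict DTD, and `StrictScanner`, `load_for_editing`, and the `confirm` callback are hypothetical names. The scanner records each element it would have dropped, and the editor-side function asks the user (through `confirm`) whether to fall back to loose parsing.

```python
from html.parser import HTMLParser

# Tiny stand-in for the element set of a strict DTD (assumption, not the real DTD).
ALLOWED = {"html", "head", "title", "link", "body", "p", "br", "b"}


class StrictScanner(HTMLParser):
    """Remember which start tags a strict parse would have discarded."""

    def __init__(self):
        super().__init__()
        self.dropped = []  # list of (tag, line, column)

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            line, column = self.getpos()
            self.dropped.append((tag, line, column))


def load_for_editing(source, confirm):
    """Return "loose" if errors were found and the user agrees, else "strict".

    `confirm` is a hypothetical callback that receives the list of dropped
    tags and returns True when the user accepts loose parsing.
    """
    scanner = StrictScanner()
    scanner.feed(source)
    scanner.close()
    if scanner.dropped and confirm(scanner.dropped):
        return "loose"
    return "strict"
```

A document with no unknown elements never triggers the prompt, so well-formed strict pages would load exactly as before. The same `dropped` list could equally be sent to an error console instead of a dialog.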

Notification of errors would also be a very good thing for authors.  It would at
least give us a place to point when authors complain that we're dropping
markup/data.  These errors could be shown in something like the JS console.
Should the JS error console be turned into a general error console, or should
there be separate ones for different things?  I tend to like the idea of one big
error console.  Is there a console service that allows these errors to be shown
For lack of any responses to my previous comment...

Since I think any proposed solution to this bug that involves keeping the Strict
DTD requires some sort of error notification from the DTD, I'm assigning this to
Harish so he can make the parser remember that there were errors (and maybe even
what they were).
Assignee: dbaron → harishd

Comment 33

18 years ago
Reasonable: Dawn Endico 2000-06-16 19:39; David Baron 2000-07-01 00:11; Akkana 
2000-06-21 13:11 -------
I still stand by: 2000-06-16 01:45; 
2000-06-21 10:17 
wrt David Baron 2000-07-01 00:11
Yes there are console services, but i'm not sure i like that.  I'd prefer red 
tags that have hints describing why they're red.  I agree w/ your conclusion 
that parser needs to tell editor if the document failed and then allow it to 
be parsed as loose.

Maybe editor could also use the little widget describing quality of 
conformance. [I don't remember the bug #]

Comment 34

18 years ago
Strict DTD will not be supported in mozilla. Marking bug INVALID.
Last Resolved: 18 years ago
Resolution: --- → INVALID

Comment 35

18 years ago
Please don't mark this bug as invalid. If parser has fixed its faults in this 
then feel free to reassign this to Akkana in Editor to be marked as fixed, iirc 
we currently preserve the junk (something akin to my ongoing requests and 
comments by jag and dbaron)

/me is sick of people invalidating bugs.
Resolution: INVALID → ---

Comment 36

18 years ago
Timeless, since the bug was on my plate I took the liberty of marking this bug 
INVALID. In my understanding strict DTD is the cause of the problem and since 
this DTD will not be supported in Mozilla the problem in question should go 
away. Isn't that correct?

Per timeless, reassigning to akkana.
Assignee: harishd → akkana

Comment 37

18 years ago
This really was a parser bug, not an editor bug.  I'll mark it fixed if you
want, and if you say the parser is no longer destroying invalid tags, but I'm
not really the right person to do it: I don't know any more than having read
your various comments saying we're not supporting strict DTD (and I'm not
entirely clear what that means -- does it mean that we've fixed this bug by
deciding not to discard unknown tags?  Or something else?)

Timeless, what are you seeing?  Is it working now for your test case?

Comment 38

18 years ago
------- Additional Comments From 2000-06-20 23:34 -------
Adjudication isn't necessary; we're doing the right thing by preserving the 
content and ignoring the tags. 
->test is still destroyed. IMO test is content.
------- Additional Comments From David Baron 2000-07-01 00:11 -------
What needs to be done for this bug?  Some (random?) thoughts:
->Editor: Please trash the DTD before giving it to Parser.  That's the end of 
the story.
------- Additional Comments From 2000-08-27 12:21 -------
Strict DTD will not be supported in mozilla. Marking bug INVALID.
->I'm not going to try to figure out what a strict DTD is.
->Editor: Don't Support DTDs I'll be happy.
->Parser: Ignore any dtd's editor gives you. I'll be happy
->Strict: Die before parser mangles the content. I'll be happy
->Coffee: I need some.
------- Additional Comments From Akkana 2000-08-28 15:52 -------
Timeless, what are you seeing?  Is it working now for your test case?
->Editor: Parser is killing test.
->Parser: Not supporting something shouldn't mean authorization to mangle it.
->Akkana: please reassign to someone in Editor to arrange for the DTD to be 
->Brendan: These people don't believe in leniency. This is a serious problem.

Marking relnoteRTM. Shredding documents must be documented. We should not 
shred documents. Not supporting something does not justify shredding it.
Priority: Critical due to dataloss.

Editor, if you don't like the idea of trashing the dtd, preserve it but don't 
give it to parser.
Severity: normal → critical
Keywords: relnoteRTM
It was decided that we would no longer use the strict DTD for HTML.

Comment 40

18 years ago
so, we don't support strict anymore -- Harish, does that now mean unknown and/or 
invalid elements will not be stripped and that the element and data will be 

If so, then Harish this bug is your call to respond to
Assignee: akkana → harishd

Comment 41

18 years ago
Harish, does that now mean unknown and/or invalid elements will not be stripped 
and that the element and data will be preserved?

That's correct. But this will happen only when bug 50070 gets fixed.

Comment 42

18 years ago
BTW, timeless, the editor doesn't have control over what it gives to the
parser.  The editor doesn't even get created until the document is finished
loading and the dom tree is fully created.  So we're at the parser's mercy on
this one.

Comment 43

18 years ago
A long time ago
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 44

18 years ago
changing qa contact to sujay
QA Contact: janc → sujay

Comment 45

18 years ago
Refuse to verify fixed
result: document is shredded.
result: I can't actually view the source for that page :O

I don't understand what in the world is going on here.

QA: thanks for reminding me about this bug, i'm sorry that you and everyone 
else are stuck dealing w/ it.
Could you please file and cc bugs or find bugs matching the two above problems? 
I'm tired of this and i only just looked at it for 5 mins. --very sorry--

maybe i should never use editor. --very sorry, running away-- --very sorry--
Keywords: testcase

Comment 46

18 years ago
timeless, REOPEN if you think this bug is not fixed....

Comment 47

18 years ago
Here is the content model for the simple testcase that timeless provided:

html@02EF4798 refcount=6<
  head@02EF46A8 refcount=2<
  Text@02D0EAD0 refcount=3<\n >
  body@02D0B4B8 refcount=3<
    Text@02D42800 refcount=3<\n  >
    invalid@02D43FC8 _moz-userdefined= refcount=3<
      Text@02D43D50 refcount=3<\n >

The invalid tag is in the content model. That is, parser did not discard it!
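
For readers who want to see the idea behind a dump like the one above, here is a minimal sketch, assuming unknown tags are kept as ordinary containers and flagged with the `_moz-userdefined` attribute. It uses Python's standard `html.parser`, not the Mozilla parser; `KNOWN`, `VOID`, `Node`, `TreeBuilder`, `build`, and `dump` are hypothetical names, and the element sets are deliberately tiny.

```python
from html.parser import HTMLParser

KNOWN = {"html", "head", "title", "body", "p", "br", "b"}  # illustrative subset
VOID = {"br"}  # elements that never take children


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a tree in which unknown tags become flagged containers."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        if tag not in KNOWN:
            node.attrs["_moz-userdefined"] = ""  # flag it, do not drop it
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", {"value": data.strip()}))


def build(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return an indented listing of the tree, one line per node."""
    if node.tag == "#text":
        label = f'Text "{node.attrs["value"]}"'
    elif "_moz-userdefined" in node.attrs:
        label = node.tag + " _moz-userdefined"
    else:
        label = node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines
```

Calling `dump(build("<html><body><invalid>test</body></html>"))` lists `invalid _moz-userdefined` under `body`, with the text `test` as its child. The unclosed `<invalid>` is closed implicitly when `</body>` arrives, which is the container treatment discussed earlier in the thread.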

This, IMO, is a composer issue not parser.  Giving bug to Akkana and reopening
the bug.
Resolution: FIXED → ---

Comment 48

18 years ago
Reassigning to akkana.
Assignee: harishd → akkana

Comment 49

18 years ago
Can someone explain what the bug is at this point?  If I create a file
containing timeless' example, and (in a branch build, haven't tested on the
trunk) I edit that file, then when I output it, I get (after adding a title):


In other words, the only differences are formatting, title, and the addition of
a </invalid> close tag.

If I go into html source mode in the editor, or do OutputHTML, I see the same thing.

Is the close tag the problem?  (I'm not clear how the output system can
determine from the content model whether a close tag is needed on an unknown
tag; we have a few special cases, like <p>, where we know not to add a close
tag.  Is there a bug in the trunk that I'm not seeing on the branch?  What am I

Adding anthonyd since he's inheriting output system bugs.

Comment 50

18 years ago
Sorry, I haven't been creating files.

To reproduce:
run composer
view html source
enter my testcase [or yours]
view normal edit mode.

Yours shows that the page is being considered (the title survives).
this is w/ 11/01-04 w32 talkback.
Whiteboard: relnote-user

Comment 51

18 years ago
so, timeless -- are you still seeing the elements getting stripped? In a current
build (release build) I displayed the sample page from the original entry in the
browser, selected to edit page and it all renders correctly and I'm allowed to
edit without incident.
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 52

18 years ago
Timeless, please verify....thanks...

Comment 53

18 years ago
I'm having a real hard time verifying because composer seems to be remembering 
random things after they're destroyed.
2000111004 [i know it isn't current]

Start with the original testcase in Navigator.
File> Edit Page.
The content has survived.

View>HTML Source
Delete the <link> entry.

The source should now be
<title>The Nexus</title>

<body class="mainpage">

Select View>Normal Edit Mode

afaik, the background should be gone because there is no longer a style sheet 
that gives it features for 'mainpage'.

In practice, the page looks like it did when i first loaded it in Composer.

Back to HTML Source.
cut ' class="mainpage"'

The page should now look like:
<title>The Nexus</title>


Go to normal edit mode
I see:
test [left aligned as expected], on a cyan blue background w/ purple stars.

?? I am very confused.  Yes I know this is really a separate bug, but the thing 
is the way I intended to verify was to simply enter my testcase in Show HTML, 
when in fact that is still practicing voodoo.

Next. Create a new blank html document [File>New Composer Page]
Go to HTML Source
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
Replace with: <html><body><invalid></body></html>
Go to Normal Edit Mode. Return to HTML Source.
Result: the text you've entered is no longer there, you have the default blank 
page text. <invalid> is definitely not there.

I will gladly verify this bug, if and when I can follow the steps I have just 
described and do get the expected output.

Go back to normal edit mode.
type <invalid>
go back to HTML Source

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
Now each time i switch back and forth I get another <br>.

These are probably all bugs unrelated to my complaint, except for my concern 
that I can't insert <invalid> in composer.

Switch to View>Show All Tags
Result: nothing happens.

I can reproduce most of these steps on netscape6rtm [i'm not going to reproduce 
them all now, but I am certain they will].  My computer has a session available 
for interaction, if you want to watch as I take these steps, or show me what I 
should do then feel free to contact me on irc [Asa and others can also show 
you how to use my computer].

Reopening per total lack of ability to insert <invalid>

I did verify that inserting <b>test</b> does work. So insert html can work.
Resolution: FIXED → ---

Comment 54

18 years ago
Timeless --  can you please file a separate bug for the new issue?  I see it
too, and we need to look at it, but it's definitely not the same bug as this,
and we need a new bug to track it.

Go ahead and assign it to me (but cc cmanske, he does a lot of work with view
source), and I'll triage.
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 55

18 years ago
I talked to Charley about this issue: he said it wasn't surprising that this
happened, and that it might be very hard to fix this, because the style
information is separate from the rest of the document, and doesn't get destroyed
when the editor reloads the document.  Definitely a separate bug, which should
probably be assigned to him ( since it sounded like he was
already thinking about ways to reload the document more completely.  But cc me

Comment 56

18 years ago
Timeless, please verify this one and mark verified fixed..

I am crossing my fingers this time.
The basic issue here seems to be fixed.  When I first open the file, the unknown
tags are there (linux build 2001-01-29-08).  There are ways to make composer
drop them, and we will likely end up with a bunch of bugs on that (I just filed
bug 67007, for example).

Verified Linux build 2001012908.
timeless says it works for him on windows as well.

Marking verified.