The parser adds a <tbody> tag to all tables read in. That means that the editor
has no choice about adding tbody tags to a user's existing html file, since we
have no way of telling a tbody which was part of the original file from one
which the parser added. We have other bugs saying that the editor should not
add tbody (see bug 7648).
I talked to Rick about this a month or two ago, and he said there was no reason
we really needed the tbody tag, and that he might be open to taking that out.
I'd like to see it removed -- keep the tbody if it's there, but don't add one.
Forwarding to ChrisK, to make sure his code won't die horribly if we leave out
he tbody. Chris -- send this back with your answer.
In theory this will work, but it will force the CSS2 table frame construction
code to add the tbody frame if it is missing. This code is fairly untested and
there are known bugs that have been delayed (since it is CSS2). I hope that we
are not planning to do this for beta1. However, if we are, then the
table regression tests would probably tell us if there are problems. Rick, I'm
giving the bug back to you.
No, it's too late for beta1, since the change might have side effects. But it
would be great if we could get it in well before beta2 and get that frame
construction code tested.
Summary of a discussion between myself and RickG: we decided that the best thing
to do was not to add the tbody except in the case where a doctype of HTML 4.0 is
explicitly specified, indicating that we should be very precise about making
sure the document is correct according to the 4.0 standard. In the case where
no doctype is specified, we should not insert the tbody. This assumes that the
backend can live without the tbody node.
Chris Karnaze says that table layout needs tbody *frames*, but does not need a
tbody node in the content tree. In fact, there's code there now to generate
tbody frames in the case that there's no tbody content node, but due to bug
2497, sometimes too many tbody frames (one for each row) are generated. So this
bug (suppressing the creation of the tbody node) should not be implemented until
after 2479 is fixed, and I've set up a dependancy accordingly.
The editor bug which depends on this one, 7648, is implicated in the "roundtrip
issues" feature item in the current beta 2 feature list, because our old
composer does bad things with a tbody; and this means that it's a mail compose
interoperability issue. So Chris and I decided the thing to do was to put beta2
in the keywords field of all three bugs, and let the PDT committee decide
whether this is a beta2 or beta3 feature.
How will not having the TBODY in the content model affect DOM1 compliance? In
particular, how will it affect the row manipulation functions:
Ok -- the parser is ready to do this. ChrisK: when you're ready, send this bug
back I'll enable it from the parser.
Rick, I'm ready, but you should let me do some testing with the parser changes
before checking them in.
The changes are in (controlled by a #define in CNavDTD.cpp). Back to Chris.
Putting on [nsbeta2-] radar.
This bug was nominated for beta2 and not accepted. Is it really important or
not. I'm marking it future to press the issue.
Well, I think these are the pertinent issues:
1. TBODY is not a valid element in HTML 3.2. If these tags get added to an HTML
3.2 document, it will invalidate the document.
2. My understanding is that it has been a design goal for the HTML composer not
to arbitrarily add markup without explicit user intervention.
If the HTML composer is supposed to be able to edit HTML 3.2 documents, #1 would
seem to be a showstopper. #2 is fudgable, I think, as long as we're only dealing
with HTML 4 documents, since this doesn't change the content model.
if the tbody stays in, a possible workaround would be to either include or
update a doctype that specified 4.0 transitional -- if the document did not
already have a doctype then we would need to insert one, and if it did have a
doctype specified, we would ensure it was 4.0 Transitional, unless of course if
it states 4.0 strict
If that workaround is chosen, a warning dialog is definitely in order. The
document type should not be changed surreptitiously.
I'd like to get back to the issue David Baron raised some time ago: DOM
compliance. I haven't confirmed this, but I strongly suspect from this bug that
if I attempted to do a DOM tree walk of an HTML 3.2 document in Mozilla, I would
hit a TBODY element. Clearly, that should not happen.
Also, I strongly suspect this is a cross-platform bug, though it's only
registered for Linux. Can anyone confirm elsewhere?
Cmanske (a Windows user) was the one who told me that older versions of composer
couldn't edit tables with tbody properly, so if the user sends mail with a tbody
(i.e. anything with a table created in mozilla composer), the contents aren't
editable in a reply from 4.x. That seems like a fairly serious interoperability
problem and a good reason to make this nsbeta2 or at least nsbeta3. Perhaps
Charley can elaborate (cc'ing).
This is definitely a cross-platform issue. Bug 7648 covers the issue of what
we generate in Composer: Mostly, we will NOT insert a <tbody> since we use
the 4.0 Transitional DTD by default. We will insert only when we encounter a
"4.01 Strict" DTD. So the parser should also test the DTD and follow the same
rules. This is an important issue and must be done before shipping; whether it
should be done for nsbeta2 depends on our judgement of how important it is
to not write table code that Mail Composer 4.x doesn't like.
The DOM compliance issues are strange because the HTML DOM is designed for HTML
4.0 and up, so applying the HTML DOM to HTML 3.2 doesn't make much sense. The
core DOM could be applied, but...
Yes, the DOM Core was what I had in mind. The HTML binding to the DOM is clearly
targetted at HTML 4, and is probably inappropriate for anything other than HTML
I can guess that something like "... why would anyone want to???" might follow
your trailing "but ...". That's a valid question, alright ... I'm just not sure
how pertinent it is. If people *can* do something, they *will* do it. Whether by
accident or deliberately, I can't predict, and it really doesn't matter. When
they do it, or rather, when they *try* to do it, the Right Thing ought to
If the verdict is that Mozilla won't support the DOM for HTML 3.2 documents, the
Right Thing is total failure. Otherwise, DOM Core support for HTML 3.2 documents
Basically, I completely agree with you that applying the HTML DOM to HTML 3.2
documents doesn't make much sense. And all I'm saying here is that Mozilla
shouldn't do that.
Future? I thought this wasn't expected to be hard once 2479 was fixed. Our
editor bug 7648 is still blocked by this ... please reconsider?
There are some DOM1-HTML compliance issues involved here too. See
Chris, can't we at least look to see if there is an existing doctype to check
and see if it is pre-4.0, and if so, not include the tbody? The inclusion of the
tbody in docs that are not 4.0 does make the file invalid. If there isn't a
doctype specified, or if a 4.0 doctype is specified, then adding the tbody
should be ok -- however, if the doctype is missing, one must be added
Removing future and adding nsbeta3, correctness to keywords to get it back on
After discussion with ekrock this was deemed less important than other table
issues. Denying beta3 approval for this reason. Editor folks, the dependent bug
is not approved for beta3 as I write this; if this blocks that bug and it is
approved for beta3 we should re-evaluate the disposition of this bug.
Did everyone consider the 4.x Message Composer issue? When a page with tbody
is read in mail composer, it will move it into a caption. Thus multiple round
trips to/from 4.x and mozilla will accumulate lots of "unknown" tags (yellow
icon in 4.x) in mail messages. Concerning bug 7648 for editor, that is easy to
fix since we can get the DTD and insert a tbody only with HTML 4 strict, as
discussed above. Is it that hard for the parser to skip inserting it as well?
Chris needs to look back at this and see if the table code can handle not having
the TBODY. Also, we need to see if the parser code that was skipping the TBODY
is still in there. I'll mark this NSBETA3+ to get chris to look at it, but if it
is not an easy one it will be reverted to NSBETA3- and we will ahve to live with
the junky yellow 'unknown' tags.
assigning to rickg.
I ran the table regression tests with ALLOW_TR_AS_CHILD_OF_TABLE defined in the
parser. The regression data is very different because of this change, which
prevents automated comparisons. But I looked at a lot of pages and they all
seem fine. So I think we can go ahead with this.
The code is in (and works), but I'm waiting for Karnaze to confirm (next week).
My code is done and ready to enable with a #define. Karnaze has asked me not to
do so for now. So I'm marking WONTFIX, even though it IS fixed, but not enabled.
Umm, shouldn't you mark it as "future" milestone and *not* "wontfix"?
We need to resolve this sometime!
Charlie: what I'm trying to say is that my work is done. If karnaze wants to use
it, it's simply a #define. So I'm technically done, while a policy issue
I agree with Charles re. Future being more appropriate than WONTFIX. This bug is
blocking bug 7648, which tracks hosage of HTML 3.2 documents with tables when
saved in the editor. Maybe pass it off to Karnaze... but it seems important that
this not fall *too* far off the radar.
I agree, too. It's not fixed as long as the code is disabled. Reopening ...
... and passing to Karnaze.
I'm marking this nsbeta3- with "future" milestone. Beth agreed that the risk
outweights the gain.
updated qa contact.
Trying to breath life into this again
IMHO the RightThing would be to always have the parser add the tbody element
into the DOM since that is what the DOM-spec says. But then have the serializer
remove the <tbody> tags when serializing to html3.2 or whenever the tbody
dosn't have any attributes
The dom tree would still be invalid in that case. Why not fix it the right way?
So what is the right way then? I thought that my proposal was...
Why would the domtree be invalid?
I assumed that something that was illegal when serialized was also illegal when
sitting in a dom tree. Is that wrong -- is the legality of tags only applicable
when they're in serialized form?
More important, the serializers currently have no knowledge of doctypes, dtd or
how to ask a dtd what's legal. Putting special-case code into the serializer
for one particular tag in one particular dtd is clearly the wrong solution (it
might be a quick fix for this one case, but it gets ugly fast if you try to
generalize it). The right solution is for something that has access to the dtd
classes, and can ask the dtd what's legal and what isn't, to make these decisions.
Is this for HTML or XHTML ?
this isnt a tble bug specifically.
Is this bug a reason why in code saved via 'Save As' is added <TBODY> tags?
yes, teh insertion is due to the doctype used when we save. We insert the 4.01
transitional doctype statement, sice doctypes are required. The <tbody> element
is a required element in the 4.01 DTD.
<!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>
<!ELEMENT CAPTION - - (%inline;)* -- table caption -->
<!ELEMENT THEAD - O (TR)+ -- table header -->
<!ELEMENT TFOOT - O (TR)+ -- table footer -->
<!ELEMENT TBODY O O (TR)+ -- table body -->
<!ELEMENT COLGROUP - O (COL)* -- table column group -->
<!ELEMENT COL - O EMPTY -- table column -->
<!ELEMENT TR - O (TH|TD)+ -- table row -->
<!ELEMENT (TH|TD) - O (%flow;)* -- table header cell, table data cell-->
While the TBODY element is required, its tags are optional since it can be
inferred. So, Adam, your HTML 4.0 document's table has a TBODY element whether
you can see it in the code or not. Just like it has HTML, HEAD, and BODY
elements, even if you've omitted those tags. Most authors prefer the practice of
explicitly including tags even when they can be omitted, and this is also more
straightforward in terms of implementation in a browser.
I think this bug is too narrow in its scope. I think the only good reasons to
zero in on TBODY tags as a problem are articulated in bug 7648. In general,
though, it would make more sense to treat TBODY tags as any other optional tags.
If Mozilla has made a decision to include them, this is Not A Bug, except as
described in bug 7648.
It is not sufficient for Mozilla simply to omit TBODY tags as long as the author
has omitted them *unless it does that for all optional tags*. Mozilla should
look at the doctype (HTML 4.0 vs. 3.2) and do the Right Thing.
I could agree with your opinions (after reading some part of specification), but
I prefer in actions like 'Save As' to have similar code as is possible to
original. But it isn't really problem for me.
A related problem with XHTML: in XML we cannot insert a missing tbody, which
results in buggy layout because a lot of the table layout logic is implemented
and is dependent on tbody being there (see bug 68061, I think).
Braden: I agree with you on most of the points. However I don't see why this
has to be an "all or nothing" thing? Why can't mozilla output some optional
tags while leaving out others?
I think that we should output something that looks as "good" as possible.
However the definition of "good" is very subjective. IMHO "good" is something
that is readable and what people are used to see. So for example IMHO <tbody>
start and end tags should not be serialized, while ending <td> tags should be
serialized even though they are optional.
Actually, i would say that the first "good" qualifier is something that is as
close to the originally parsed markup, however i don't think that we would get
very close to that, so IMHO it might be better to drop this requirement all
"Why can't mozilla output some optional tags while leaving out others?"
It could, but under what rationale? How would you make it clear to users what to
What about <html>, </html>, <head>, </head>, <body>, </body>, </p>? What kind of
code "looks good" to HTML developers? That's going to be a highly subjective
thing, and you're not going to find agreement. Unless you want to include a
meticulous UI that lets users switch on and off whether the tag is output when
it's optional, all-or-nothing is the best bet.
removing myself from the cc list
Authors write TBODY tags only when they also write TFOOT or THEAD tags, or
IMO everybody will be happy is, when saving an HTML file, the TBODY is written
there only if:
- it has an attribute.
- there are tfoot or thead nodes too.
- there are other tbody's.
I don't suppose we can assume that thead and tfoot will always come before (in
the dom iterator sense) tbody? The serializer only gets one node at a time, so
it can't "look around" for sibling nodes. It could theoretically do something
like save the whole contents inside <table> to a separate placeholder string,
and then add a tbody if it sees tfoot or thead or another tbody, but that would
end up being fairly complicated code compared to just doing it at parse time
when the parser has all the information it needs.
akkana: Yes, you can.
From HTML 4.01 Transitional:
<!ELEMENT TABLE - -
(CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>
From XHTML 1.0 Transitional:
(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
(The productions are slightly different because XML doesn't have a notion of
One other thing... The parser *should* add TBODY to HTML 4.0 tables. There,
TBODY is an inferred element. Summary changed accordingly.
Is layout ready for this change in the parser? Last I checked, a lot of the
table layout logic was implemented in the tbody element code, which is not used
if there is no tbody element...
I don't see anything of interest in nsHTMLTableSectionElement.cpp. We should be
constructing frames even if the element isn't present.
There would still be an nsTableRowGroupFrame (but it would be anonymous) if the
parser stopped adding a table section element. However, a lot of anonymous frame
construction code would be getting used more often and a lot of bugs could
but removing the <tbody> while serializing html shouldn't affect the content-
model since the <tbody> content-node will be inserted while parsing even if the
<tbody>-tags arn't there.
But no, layout can't fully handle a table without a <tbody> content-node. See
this bug does not apply to XHTML since a <tbody> in a xhtml table should be
serialized since it was explicilty added (either in the original xml-markup or
through the DOM)
Jonas: Yes, this applies to XHTML. If an XHTML table is read that *does not*
include a TBODY element, it is incorrect for the parser to add it (since that
changes the content model).
Similarly, adding a TBODY element during serializing is the Wrong Thing to do
> If an XHTML table is read that *does not*
> include a TBODY element, it is incorrect for the parser to add it (since that
> changes the content model).
Yep, and we don't do that, so we do the right thing.
> Similarly, adding a TBODY element during serializing is the Wrong Thing to do
> for XHTML.
And we don't do that either. Since xhtml is serialized using the xml
serializer, which has no knowledge about tables.
Leaving it up to you to remove the xhtml-stuff if you agree with me.
Jonas: I created a simple XHTML file that included a TABLE with no TBODY, loaded
it into Composer, and went to the "HTML Source" tab. Sure enough, the code had
sprouted TBODY tags.
This, admittedly, is using 0.9.9. Have things changed in more recent builds?
A clarification: for Mozilla there is only one kind of XHTML, and that is when
it is served with an XML mime type. In that case no tbody is added, nor removed,
like Jonas said.
If you serve XHTML with HTML mime type, we do normal HTML processing in strict
mode (triggered by doctype), and in that case the HTML parser inserts the tbody.
Removing XHTML keyword.
Heikki: What about when a local XHTML file is edited using Composer?
Surely the behavior in that case is a bug; should another bug report be opened
on that? How should it be phrased differently from this one?
Composer doesn't *have* an XHTML editor, as far as I know. It has an HTML
editor. Any loading of a document into composer is as text/html.
David: Then Composer needs to issue a warning to users who mistakenly load an
XHTML document into the editor rather than silently corrupt the file (just as it
needs to do for text/html that is not HTML 4).
If Composer pretends to be able to load and edit an XHTML document, users will
assume it can do so competently.
in any event, composers behaviour should be discussed elsewhere, in a different
bug or in some newsgroup.
Adjusting summary, this bug applies to html4 as well, but not to xhtml
Composer opens XML documents in plain text mode.
Jonas: How is the a problem with HTML 4? Since TBODY is an inferred element in
HTML 4, the parser *must* add it.
Heikki: More accurately, Composer opens files that end in ".xml" in plain text
mode. I strongly suspect that it will be quite ordinary for users to use ".html"
for XHTML files. Composer will modify those files in an unanticipated way.
hmm.. is this bug about the parser or the serializer? the summary says parser,
but most of the discussion has been about the serializer.
I agree that the parser has to add it, and it does. But the serializer shouldn't
Jonas: Surely the serializer should include TBODY if it was present in the
original input. Is it possible for the serializer to know whether that was the
case or not?
no it can't. At least not in the general case, composer might be, i'm not sure.
see the for example comments 50-52 for discussion about when <tbody> should be
How about if the parser flagged ( using a proprietary attribute ) the fabricated
TBODY? Can't the serializer make use of that information?
Adding "3.2" back to the summary.
I've filed bug 141334 about Composer opening XHTML files named "*.html" as
Jonas has suggested that the serializer should not output inferred elements that
didn't occur explicitly in the original input. I don't agree that this is a
problem, so someone else can file a bug on that if they want to.
Harish: if the parser would flag inferred elements, the serializer could very
easily omit them at output time. That would be a great solution if the parser
can be made to do it.
but please don't add a DOM-attribute (using SetAttribute() or one of it's
friends) to the element. Set some internal flag through some internal mozilla-
akkana: That would cure a symptom, but not the problem: it does not solve the
problem of a DOM tree generated from an HTML 3.2 document containing a TBODY
(a) HTML 4.01 spec says in section 11.2.3 :
The TBODY start tag is always required except when the table contains
only one table body and no table head or foot sections. The TBODY end
tag may always be safely omitted.
(b) legacy browsers choke on tbody
(c) I receive user complaints by email
Just to correct some incorrect statements made earlier in this bug:
1) A DOM that would be an invalid document when serialized is not an invalid
DOM. The DOM spec explicitly says that you can create such DOMs via the DOM
2) Mozilla's HTML parser does not enforce <tfoot> and <thead> coming before
<tbody> and authors often put <tfoot> after <tbody>, so you cannot expect to
see the <thead>/<tfoot> before <tbody> when serializing.
Created attachment 238468 [details] [diff] [review]
patch: #define-replacement by new (boolean) preferences value
Wouldn't it hence be neat to let the user decide from preferences at runtime, if he wants to allow <tr> to be child of <table>, instead of #define'ing this at compiletime?
Have a look at the attached patch, default behaviour (i.e. in case of unset "parser.allowtraschildoftable") is kept, so no one should get into trouble.
I don't see any point to the user deciding, and giving users two options just makes things harder for authors.
The point is that inserted tbodies do not belong there. If the document's author liked them, he would have inserted them himself. I don't think it's consistent either, to force users to accept unnecessarily inserted elements just for compatibility reasons with broken browsers. At least those who care should be able to switch to a more natural behaviour easily - on their own risk, of course.
I admit, the cleanest solution would for sure be, to detect the document type, insert <tbody> in HTML 4 (if necessary), remove it in HTML 3.2 (if necessary), and don't touch it in XHTML. My patch would become obsolete.
There are reasonable arguments for having this be AUTHOR-controlled, but none for having it be USER-controlled. However, I don't think the arguments for having it be author-controlled are strong enough to add versioning to HTML.
(In reply to comment #82)
> There are reasonable arguments for having this be AUTHOR-controlled, but none
> for having it be USER-controlled. However, I don't think the arguments for
> having it be author-controlled are strong enough to add versioning to HTML.
Agreed a zillion %.
Inferring tbody elements in HTML is done by any modern browser, from IE to Opera and Safari (in Quirks Mode or with an HTML 3.x doctype). I think changing the behaviour would do more harm than good, because some sites would break in Mozilla only.
Don't fix a running system.
Besides, we're nearing HTML 5, thus a model of HTML in which a HTML3.2 document (which doesn't really exist) would automatically become a HTML5 document.
In this sense, I'll close this. Please punish me if that's too rashly.
By the way: There are some parser hooks like "eQuirks3" that don't seem to do anything. These should probably get removed.