Closed Bug 30378 Opened 25 years ago Closed 16 years ago

Parser should not add <tbody> to HTML 3.2 tables

Categories

(Core :: DOM: HTML Parser, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: akkzilla, Assigned: harishd)

References

Details

(Whiteboard: [nsbeta3-] fix waiting)

Attachments

(1 file)

The parser adds a <tbody> tag to all tables read in. That means that the editor has no choice about adding tbody tags to a user's existing html file, since we have no way of telling a tbody which was part of the original file from one which the parser added. We have other bugs saying that the editor should not add tbody (see bug 7648). I talked to Rick about this a month or two ago, and he said there was no reason we really needed the tbody tag, and that he might be open to taking that out. I'd like to see it removed -- keep the tbody if it's there, but don't add one.
Blocks: 7648
Forwarding to ChrisK, to make sure his code won't die horribly if we leave out he tbody. Chris -- send this back with your answer.
Assignee: rickg → karnaze
In theory this will work, but it will force the CSS2 table frame construction code to add the tbody frame if it is missing. This code is fairly untested and there are known bugs that have been delayed (since it is CSS2). I hope that we are not planning to do this for beta1. However, if we are, then the table regression tests would probably tell us if there are problems. Rick, I'm giving the bug back to you.
Assignee: karnaze → rickg
No, it's too late for beta1, since the change might have side effects. But it would be great if we could get it in well before beta2 and get that frame construction code tested.
Depends on: 2479
Keywords: beta2
Summary of a discussion between myself and RickG: we decided that the best thing to do was not to add the tbody except in the case where a doctype of HTML 4.0 is explicitly specified, indicating that we should be very precise about making sure the document is correct according to the 4.0 standard. In the case where no doctype is specified, we should not insert the tbody. This assumes that the backend can live without the tbody node. Chris Karnaze says that table layout needs tbody *frames*, but does not need a tbody node in the content tree. In fact, there's code there now to generate tbody frames in the case that there's no tbody content node, but due to bug 2497, sometimes too many tbody frames (one for each row) are generated. So this bug (suppressing the creation of the tbody node) should not be implemented until after 2479 is fixed, and I've set up a dependancy accordingly. The editor bug which depends on this one, 7648, is implicated in the "roundtrip issues" feature item in the current beta 2 feature list, because our old composer does bad things with a tbody; and this means that it's a mail compose interoperability issue. So Chris and I decided the thing to do was to put beta2 in the keywords field of all three bugs, and let the PDT committee decide whether this is a beta2 or beta3 feature.
How will not having the TBODY in the content model affect DOM1 compliance? In particular, how will it affect the row manipulation functions: http://www.w3.org/TR/REC-DOM-Level-1/level-one-html.html
Ok -- the parser is ready to do this. ChrisK: when you're ready, send this bug back I'll enable it from the parser.
Assignee: rickg → karnaze
Rick, I'm ready, but you should let me do some testing with the parser changes before checking them in.
Assignee: karnaze → rickg
Target Milestone: --- → M16
The changes are in (controlled by a #define in CNavDTD.cpp). Back to Chris.
Assignee: rickg → karnaze
Keywords: nsbeta2
Putting on [nsbeta2-] radar.
Whiteboard: [nsbeta2-]
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
This bug was nominated for beta2 and not accepted. Is it really important or not. I'm marking it future to press the issue.
Target Milestone: M17 → Future
Well, I think these are the pertinent issues: 1. TBODY is not a valid element in HTML 3.2. If these tags get added to an HTML 3.2 document, it will invalidate the document. 2. My understanding is that it has been a design goal for the HTML composer not to arbitrarily add markup without explicit user intervention. If the HTML composer is supposed to be able to edit HTML 3.2 documents, #1 would seem to be a showstopper. #2 is fudgable, I think, as long as we're only dealing with HTML 4 documents, since this doesn't change the content model.
if the tbody stays in, a possible workaround would be to either include or update a doctype that specified 4.0 transitional -- if the document did not already have a doctype then we would need to insert one, and if it did have a doctype specified, we would ensure it was 4.0 Transitional, unless of course if it states 4.0 strict
If that workaround is chosen, a warning dialog is definitely in order. The document type should not be changed surreptitiously.
I'd like to get back to the issue David Baron raised some time ago: DOM compliance. I haven't confirmed this, but I strongly suspect from this bug that if I attempted to do a DOM tree walk of an HTML 3.2 document in Mozilla, I would hit a TBODY element. Clearly, that should not happen. Also, I strongly suspect this is a cross-platform bug, though it's only registered for Linux. Can anyone confirm elsewhere?
Cmanske (a Windows user) was the one who told me that older versions of composer couldn't edit tables with tbody properly, so if the user sends mail with a tbody (i.e. anything with a table created in mozilla composer), the contents aren't editable in a reply from 4.x. That seems like a fairly serious interoperability problem and a good reason to make this nsbeta2 or at least nsbeta3. Perhaps Charley can elaborate (cc'ing).
OS: Linux → All
Hardware: PC → All
This is definitely a cross-platform issue. Bug 7648 covers the issue of what we generate in Composer: Mostly, we will NOT insert a <tbody> since we use the 4.0 Transitional DTD by default. We will insert only when we encounter a "4.01 Strict" DTD. So the parser should also test the DTD and follow the same rules. This is an important issue and must be done before shipping; whether it should be done for nsbeta2 depends on our judgement of how important it is to not write table code that Mail Composer 4.x doesn't like.
The DOM compliance issues are strange because the HTML DOM is designed for HTML 4.0 and up, so applying the HTML DOM to HTML 3.2 doesn't make much sense. The core DOM could be applied, but...
Yes, the DOM Core was what I had in mind. The HTML binding to the DOM is clearly targetted at HTML 4, and is probably inappropriate for anything other than HTML 4.0x/XHTML 1.0. I can guess that something like "... why would anyone want to???" might follow your trailing "but ...". That's a valid question, alright ... I'm just not sure how pertinent it is. If people *can* do something, they *will* do it. Whether by accident or deliberately, I can't predict, and it really doesn't matter. When they do it, or rather, when they *try* to do it, the Right Thing ought to happen. If the verdict is that Mozilla won't support the DOM for HTML 3.2 documents, the Right Thing is total failure. Otherwise, DOM Core support for HTML 3.2 documents seems appropriate. Basically, I completely agree with you that applying the HTML DOM to HTML 3.2 documents doesn't make much sense. And all I'm saying here is that Mozilla shouldn't do that.
Future? I thought this wasn't expected to be hard once 2479 was fixed. Our editor bug 7648 is still blocked by this ... please reconsider?
Chris, can't we at least look to see if there is an existing doctype to check and see if it is pre-4.0, and if so, not include the tbody? The inclusion of the tbody in docs that are not 4.0 does make the file invalid. If there isn't a doctype specified, or if a 4.0 doctype is specified, then adding the tbody should be ok -- however, if the doctype is missing, one must be added Removing future and adding nsbeta3, correctness to keywords to get it back on the radar
Keywords: nsbeta2correctness, nsbeta3
Whiteboard: [nsbeta2-]
Target Milestone: Future → ---
After discussion with ekrock this was deemed less important than other table issues. Denying beta3 approval for this reason. Editor folks, the dependent bug is not approved for beta3 as I write this; if this blocks that bug and it is approved for beta3 we should re-evaluate the disposition of this bug.
Whiteboard: [nsbeta3-]
Target Milestone: --- → Future
Did everyone consider the 4.x Message Composer issue? When a page with tbody is read in mail composer, it will move it into a caption. Thus multiple round trips to/from 4.x and mozilla will accumulate lots of "unknown" tags (yellow icon in 4.x) in mail messages. Concerning bug 7648 for editor, that is easy to fix since we can get the DTD and insert a tbody only with HTML 4 strict, as discussed above. Is it that hard for the parser to skip inserting it as well?
Whiteboard: [nsbeta3-]
Target Milestone: Future → ---
Chris needs to look back at this and see if the table code can handle not having the TBODY. Also, we need to see if the parser code that was skipping the TBODY is still in there. I'll mark this NSBETA3+ to get chris to look at it, but if it is not an easy one it will be reverted to NSBETA3- and we will ahve to live with the junky yellow 'unknown' tags.
Whiteboard: [nsbeta3+]
assigning to rickg. I ran the table regression tests with ALLOW_TR_AS_CHILD_OF_TABLE defined in the parser. The regression data is very different because of this change, which prevents automated comparisons. But I looked at a lot of pages and they all seem fine. So I think we can go ahead with this.
Assignee: karnaze → rickg
Status: ASSIGNED → NEW
The code is in (and works), but I'm waiting for Karnaze to confirm (next week).
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3+] → [nsbeta3+] fix waiting
My code is done and ready to enable with a #define. Karnaze has asked me not to do so for now. So I'm marking WONTFIX, even though it IS fixed, but not enabled.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Umm, shouldn't you mark it as "future" milestone and *not* "wontfix"? We need to resolve this sometime!
Charlie: what I'm trying to say is that my work is done. If karnaze wants to use it, it's simply a #define. So I'm technically done, while a policy issue remains.
I agree with Charles re. Future being more appropriate than WONTFIX. This bug is blocking bug 7648, which tracks hosage of HTML 3.2 documents with tables when saved in the editor. Maybe pass it off to Karnaze... but it seems important that this not fall *too* far off the radar.
I agree, too. It's not fixed as long as the code is disabled. Reopening ...
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
... and passing to Karnaze.
Assignee: rickg → karnaze
Status: REOPENED → NEW
I'm marking this nsbeta3- with "future" milestone. Beth agreed that the risk outweights the gain.
Whiteboard: [nsbeta3+] fix waiting → [nsbeta3-] fix waiting
Target Milestone: --- → Future
updated qa contact.
QA Contact: janc → bsharma
Trying to breath life into this again IMHO the RightThing would be to always have the parser add the tbody element into the DOM since that is what the DOM-spec says. But then have the serializer remove the <tbody> tags when serializing to html3.2 or whenever the tbody dosn't have any attributes
The dom tree would still be invalid in that case. Why not fix it the right way?
So what is the right way then? I thought that my proposal was... Why would the domtree be invalid?
I assumed that something that was illegal when serialized was also illegal when sitting in a dom tree. Is that wrong -- is the legality of tags only applicable when they're in serialized form? More important, the serializers currently have no knowledge of doctypes, dtd or how to ask a dtd what's legal. Putting special-case code into the serializer for one particular tag in one particular dtd is clearly the wrong solution (it might be a quick fix for this one case, but it gets ugly fast if you try to generalize it). The right solution is for something that has access to the dtd classes, and can ask the dtd what's legal and what isn't, to make these decisions.
QA Contact: bsharma → moied
Is this for HTML or XHTML ?
HTML.
this isnt a tble bug specifically.
Assignee: karnaze → harishd
Is this bug a reason why in code saved via 'Save As' is added <TBODY> tags?
yes, teh insertion is due to the doctype used when we save. We insert the 4.01 transitional doctype statement, sice doctypes are required. The <tbody> element is a required element in the 4.01 DTD. http://www.w3.org/TR/html4/sgml/loosedtd.html <!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)> <!ELEMENT CAPTION - - (%inline;)* -- table caption --> <!ELEMENT THEAD - O (TR)+ -- table header --> <!ELEMENT TFOOT - O (TR)+ -- table footer --> <!ELEMENT TBODY O O (TR)+ -- table body --> <!ELEMENT COLGROUP - O (COL)* -- table column group --> <!ELEMENT COL - O EMPTY -- table column --> <!ELEMENT TR - O (TH|TD)+ -- table row --> <!ELEMENT (TH|TD) - O (%flow;)* -- table header cell, table data cell-->
While the TBODY element is required, its tags are optional since it can be inferred. So, Adam, your HTML 4.0 document's table has a TBODY element whether you can see it in the code or not. Just like it has HTML, HEAD, and BODY elements, even if you've omitted those tags. Most authors prefer the practice of explicitly including tags even when they can be omitted, and this is also more straightforward in terms of implementation in a browser. I think this bug is too narrow in its scope. I think the only good reasons to zero in on TBODY tags as a problem are articulated in bug 7648. In general, though, it would make more sense to treat TBODY tags as any other optional tags. If Mozilla has made a decision to include them, this is Not A Bug, except as described in bug 7648. It is not sufficient for Mozilla simply to omit TBODY tags as long as the author has omitted them *unless it does that for all optional tags*. Mozilla should look at the doctype (HTML 4.0 vs. 3.2) and do the Right Thing.
I could agree with your opinions (after reading some part of specification), but I prefer in actions like 'Save As' to have similar code as is possible to original. But it isn't really problem for me.
A related problem with XHTML: in XML we cannot insert a missing tbody, which results in buggy layout because a lot of the table layout logic is implemented and is dependent on tbody being there (see bug 68061, I think).
Braden: I agree with you on most of the points. However I don't see why this has to be an "all or nothing" thing? Why can't mozilla output some optional tags while leaving out others? I think that we should output something that looks as "good" as possible. However the definition of "good" is very subjective. IMHO "good" is something that is readable and what people are used to see. So for example IMHO <tbody> start and end tags should not be serialized, while ending <td> tags should be serialized even though they are optional. Actually, i would say that the first "good" qualifier is something that is as close to the originally parsed markup, however i don't think that we would get very close to that, so IMHO it might be better to drop this requirement all together
"Why can't mozilla output some optional tags while leaving out others?" It could, but under what rationale? How would you make it clear to users what to expect? What about <html>, </html>, <head>, </head>, <body>, </body>, </p>? What kind of code "looks good" to HTML developers? That's going to be a highly subjective thing, and you're not going to find agreement. Unless you want to include a meticulous UI that lets users switch on and off whether the tag is output when it's optional, all-or-nothing is the best bet.
removing myself from the cc list
Authors write TBODY tags only when they also write TFOOT or THEAD tags, or several TBODY's. IMO everybody will be happy is, when saving an HTML file, the TBODY is written there only if: - it has an attribute. - there are tfoot or thead nodes too. - there are other tbody's.
I don't suppose we can assume that thead and tfoot will always come before (in the dom iterator sense) tbody? The serializer only gets one node at a time, so it can't "look around" for sibling nodes. It could theoretically do something like save the whole contents inside <table> to a separate placeholder string, and then add a tbody if it sees tfoot or thead or another tbody, but that would end up being fairly complicated code compared to just doing it at parse time when the parser has all the information it needs.
akkana: Yes, you can. From HTML 4.01 Transitional: <!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)> From XHTML 1.0 Transitional: <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))> (The productions are slightly different because XML doesn't have a notion of inferred elements.)
One other thing... The parser *should* add TBODY to HTML 4.0 tables. There, TBODY is an inferred element. Summary changed accordingly.
Summary: Parser should not add <tbody> to tables → Parser should not add <tbody> to HTML 3.2 and XHTML tables
Keywords: xhtml
Is layout ready for this change in the parser? Last I checked, a lot of the table layout logic was implemented in the tbody element code, which is not used if there is no tbody element...
Depends on: 68061
No longer depends on: 68061
I don't see anything of interest in nsHTMLTableSectionElement.cpp. We should be constructing frames even if the element isn't present.
There would still be an nsTableRowGroupFrame (but it would be anonymous) if the parser stopped adding a table section element. However, a lot of anonymous frame construction code would be getting used more often and a lot of bugs could arise.
but removing the <tbody> while serializing html shouldn't affect the content- model since the <tbody> content-node will be inserted while parsing even if the <tbody>-tags arn't there. But no, layout can't fully handle a table without a <tbody> content-node. See bug 68061
this bug does not apply to XHTML since a <tbody> in a xhtml table should be serialized since it was explicilty added (either in the original xml-markup or through the DOM)
Keywords: xhtml
Summary: Parser should not add <tbody> to HTML 3.2 and XHTML tables → Parser should not add <tbody> to HTML 3.2 tables
Jonas: Yes, this applies to XHTML. If an XHTML table is read that *does not* include a TBODY element, it is incorrect for the parser to add it (since that changes the content model). Similarly, adding a TBODY element during serializing is the Wrong Thing to do for XHTML.
Keywords: xhtml
Summary: Parser should not add <tbody> to HTML 3.2 tables → Parser should not add <tbody> to HTML 3.2 or XHTML tables
> If an XHTML table is read that *does not* > include a TBODY element, it is incorrect for the parser to add it (since that > changes the content model). Yep, and we don't do that, so we do the right thing. > Similarly, adding a TBODY element during serializing is the Wrong Thing to do > for XHTML. And we don't do that either. Since xhtml is serialized using the xml serializer, which has no knowledge about tables. Leaving it up to you to remove the xhtml-stuff if you agree with me.
Jonas: I created a simple XHTML file that included a TABLE with no TBODY, loaded it into Composer, and went to the "HTML Source" tab. Sure enough, the code had sprouted TBODY tags. This, admittedly, is using 0.9.9. Have things changed in more recent builds?
Depends on: 60861
A clarification: for Mozilla there is only one kind of XHTML, and that is when it is served with an XML mime type. In that case no tbody is added, nor removed, like Jonas said. If you serve XHTML with HTML mime type, we do normal HTML processing in strict mode (triggered by doctype), and in that case the HTML parser inserts the tbody. Removing XHTML keyword.
Keywords: xhtml
Heikki: What about when a local XHTML file is edited using Composer? Surely the behavior in that case is a bug; should another bug report be opened on that? How should it be phrased differently from this one?
Composer doesn't *have* an XHTML editor, as far as I know. It has an HTML editor. Any loading of a document into composer is as text/html.
David: Then Composer needs to issue a warning to users who mistakenly load an XHTML document into the editor rather than silently corrupt the file (just as it needs to do for text/html that is not HTML 4). If Composer pretends to be able to load and edit an XHTML document, users will assume it can do so competently.
in any event, composers behaviour should be discussed elsewhere, in a different bug or in some newsgroup. Adjusting summary, this bug applies to html4 as well, but not to xhtml
Summary: Parser should not add <tbody> to HTML 3.2 or XHTML tables → Parser should not add <tbody> to HTML tables
Composer opens XML documents in plain text mode.
Jonas: How is the a problem with HTML 4? Since TBODY is an inferred element in HTML 4, the parser *must* add it. Heikki: More accurately, Composer opens files that end in ".xml" in plain text mode. I strongly suspect that it will be quite ordinary for users to use ".html" for XHTML files. Composer will modify those files in an unanticipated way.
hmm.. is this bug about the parser or the serializer? the summary says parser, but most of the discussion has been about the serializer. I agree that the parser has to add it, and it does. But the serializer shouldn't
Jonas: Surely the serializer should include TBODY if it was present in the original input. Is it possible for the serializer to know whether that was the case or not?
no it can't. At least not in the general case, composer might be, i'm not sure. see the for example comments 50-52 for discussion about when <tbody> should be serialized
How about if the parser flagged ( using a proprietary attribute ) the fabricated TBODY? Can't the serializer make use of that information?
Adding "3.2" back to the summary. I've filed bug 141334 about Composer opening XHTML files named "*.html" as text/html. Jonas has suggested that the serializer should not output inferred elements that didn't occur explicitly in the original input. I don't agree that this is a problem, so someone else can file a bug on that if they want to.
Depends on: 68061
No longer depends on: 60861
Summary: Parser should not add <tbody> to HTML tables → Parser should not add <tbody> to HTML 3.2 tables
Harish: if the parser would flag inferred elements, the serializer could very easily omit them at output time. That would be a great solution if the parser can be made to do it.
but please don't add a DOM-attribute (using SetAttribute() or one of it's friends) to the element. Set some internal flag through some internal mozilla- interface instead.
Depends on: 148810
akkana: That would cure a symptom, but not the problem: it does not solve the problem of a DOM tree generated from an HTML 3.2 document containing a TBODY element.
(a) HTML 4.01 spec says in section 11.2.3 : The TBODY start tag is always required except when the table contains only one table body and no table head or foot sections. The TBODY end tag may always be safely omitted. (b) legacy browsers choke on tbody (c) I receive user complaints by email
Just to correct some incorrect statements made earlier in this bug: 1) A DOM that would be an invalid document when serialized is not an invalid DOM. The DOM spec explicitly says that you can create such DOMs via the DOM apis. 2) Mozilla's HTML parser does not enforce <tfoot> and <thead> coming before <tbody> and authors often put <tfoot> after <tbody>, so you cannot expect to see the <thead>/<tfoot> before <tbody> when serializing.
Summing up, tbody is mandatory only in html4, for <html4 it's invalid, for >html4 (i.e. xhtml) it's optional. So there might be many situations, in which a user would prefer a stricter parsing of the html-source. E.g. when evaluating XPath expressions against the dom-tree (via javascript), these become wrong in case of almost always unnecessarily inserted tbodies. Wouldn't it hence be neat to let the user decide from preferences at runtime, if he wants to allow <tr> to be child of <table>, instead of #define'ing this at compiletime? Have a look at the attached patch, default behaviour (i.e. in case of unset "parser.allowtraschildoftable") is kept, so no one should get into trouble.
I don't see any point to the user deciding, and giving users two options just makes things harder for authors.
The point is that inserted tbodies do not belong there. If the document's author liked them, he would have inserted them himself. I don't think it's consistent either, to force users to accept unnecessarily inserted elements just for compatibility reasons with broken browsers. At least those who care should be able to switch to a more natural behaviour easily - on their own risk, of course. I admit, the cleanest solution would for sure be, to detect the document type, insert <tbody> in HTML 4 (if necessary), remove it in HTML 3.2 (if necessary), and don't touch it in XHTML. My patch would become obsolete.
There are reasonable arguments for having this be AUTHOR-controlled, but none for having it be USER-controlled. However, I don't think the arguments for having it be author-controlled are strong enough to add versioning to HTML.
(In reply to comment #82) > There are reasonable arguments for having this be AUTHOR-controlled, but none > for having it be USER-controlled. However, I don't think the arguments for > having it be author-controlled are strong enough to add versioning to HTML. Agreed a zillion %.
Inferring tbody elements in HTML is done by any modern browser, from IE to Opera and Safari (in Quirks Mode or with an HTML 3.x doctype). I think changing the behaviour would do more harm than good, because some sites would break in Mozilla only. Don't fix a running system. Besides, we're nearing HTML 5, thus a model of HTML in which a HTML3.2 document (which doesn't really exist) would automatically become a HTML5 document. In this sense, I'll close this. Please punish me if that's too rashly. By the way: There are some parser hooks like "eQuirks3" that don't seem to do anything. These should probably get removed.
Status: NEW → RESOLVED
Closed: 24 years ago16 years ago
Resolution: --- → WONTFIX
Blocks: 455043
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: