HTML parser considers BGSOUND must be inside HEAD

RESOLVED FIXED

Status

()

RESOLVED FIXED
14 years ago
13 years ago

People

(Reporter: glazou, Assigned: mrbkap)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

14 years ago
The HTML parser assumes the presence of BGSOUND element in the BODY is an error
and the element is automagically moved inside HEAD. That is a problem for the
editor, more specifically for people wanting to edit documents made for MSIE
since MSIE explicitely allows BGSOUND anywhere. See URL attached to bug for
reference about this tag.

Did NS4.X have a restricted implementation of BGSOUND allowed only in the HEAD?
That's the only reason I can find for this code...
http://lxr.mozilla.org/seamonkey/source/parser/htmlparser/src/nsElementTable.cpp#348
It appears that parser's conception of <bgsound> is just plain wrong. According
to everything that I can find, <bgsound> is a non-container that can appear
anywhere (including the head).

A simple fix is to make <bgsound>'s Initialize()ation in the element table a
copy of <sound>, which has the same behavior. I'm going to attach a patch that
cleans up the code a bit more, though.
Assignee: parser → mrbkap
Status: NEW → ASSIGNED
Actually, this fix isn't nearly so simple. As far as I can see, htmlparser
doesn't have a way to say that a tag _may_ appear in the head. Only "it appears
in the head" and "it doesn't appear in the head." Unfortunately, all of the
stuff we do for whitespace and userdefined tags doesn't extend well to
non-userdefined tags.

I'm working on a patch that will allow the parser to do this correctly.

Note that the parser needs eHTMLTag_bgsound since it is not a container.

Comment 3

14 years ago
OBJECT is allowed in both the HEAD and BODY elements per HTML 4.01. Can't that
code be reused?
If I try: <html><head><object></object></head><body></body></html>
I get a content model of: <html><head></head><body><object></object></body>

So the code to handle this properly is non-existant. There may be a bug on this
already existing, but I can't find it.

Comment 5

14 years ago
Ugh. In that case a patch to allow something like this would be really useful.
Especially for standards compliance.
(Reporter)

Comment 6

14 years ago
Sorry to ask, but if we parse that tag, we don't seem to have any sort of code
to HANDLE it... Ask LXR (url below). So my opinion here is that all references
to bgsound should be removed to the parser...

http://lxr.mozilla.org/seamonkey/search?string=bgsound
Daniel, if we remove all references to bgsound in the parser, then we will mess
up people's pages since it will suddenly become a userdefined tag, which is a
container, instead of a leaf like it is now (then composer would probably add in
</bgsound> tags, which people would probably find strange). Since we have
"supported" it until now, and the code to handle it (more) correctly is needed
for better standards compliance anyway, I feel that we shouldn't break
compatibility by removing this tag from the parser.
(Reporter)

Comment 8

14 years ago
(In reply to comment #7)

> Daniel, if we remove all references to bgsound in the parser, then we will mess
> up people's pages since it will suddenly become a userdefined tag, which is a
> container, instead of a leaf 

Ah, right, good argument, I missed that.
Apparently, it can also be a container for MSIE...
Created attachment 177103 [details] [diff] [review]
patch v1

This patch introduces the idea of tags that _may_ be in the <head> of a
document. I've attempted to follow IE's lead for them (except for <script>,
which has other problems).

In nsElementTable.cpp, a parent type of kHeadContent means that this tag *must*
be in the head (i.e., always should be put into the head of a document). For
tags such as <style>, this is obvious. <script> has a parent content of
kHeadContent, however, this is simply to make sure that <script> doesn't get
moved out of the head into the <body> prematurely, it will be placed properly
due to my hack in CNavDTD::HandleStartToken(). A parent type of kHeadMisc means
that this tag _may_ be in the <head>. This is for tags such as <bgsound>.

My changes to nsHTMLElement::IsChildOfHead() reflect these changes
(aExclusively == PR_TRUE <==> this tag _must_ follow the return value).

In CNavDTD::HandleToken(), the first thing to note is that because each of
eHTMLTag_whitespace, eHTMLTag_comment, eHTMLTag_newline, and
eHTMLTag_userdefined have parents of kHeadMisc. This means that they are
children of head with an exclusivity of PR_FALSE. This means that in the
|default:| case, they fall into the |if| statement that says
|if(mMisplacedContent.GetSize() == 0) {|, meaning that there is no change in
behavior for these tags.

The second thing to note in this function is that if we have not received any
indication that the body might have been tentatively open (such as a <br>),
then tags that *could* be in the head (such as <bgsound>) are allowed to remain
in the head, otherwise we put them in the misplaced content queue. Tags that
*must* be in the head are never even given this consideration, they are just
passed right along.

In CNavDTD::HandleStartToken(), I munge around with the exclusivity for
<script>. If we have *not* seen a body, then we treat <script> as kHeadContent.
Otherwise, we treat it as kHeadMisc, which in this case always means we call
CNavDTD::HandleDefaultStartToken() on it.
Attachment #177103 - Flags: review?(bzbarsky)
> In nsElementTable.cpp, a parent type of kHeadContent means that this tag *must*
> be in the head (i.e., always should be put into the head of a document). For
> tags such as <style>, this is obvious.

So we'll ship <style> tags in the <body> into <head>?  That will break pages. 
Same for <base> (and even more so).
Comment on attachment 177103 [details] [diff] [review]
patch v1

Update nsElementTable.h with docs on what kHeadMisc and kHeadContent mean?  And
make the comments there match the new reality?

>Index: src/nsElementTable.cpp
>+ * @param aExclusively [out]Whether or not this tag can *only* appear in the
>+ *                     head (as opposed to things like <object> which can be
>+                       either in the body or the head).

Document that this is only reliable if PR_TRUE is returned?

> PRBool nsHTMLElement::IsChildOfHead(eHTMLTags aChild,PRBool& aExclusively) {
>+  PRBool result = PR_FALSE;
>+  aExclusively = PR_TRUE;
>+  // Is this a head-only tag?
>+  if (gHTMLElements[aChild].mParentBits & kHeadContent) {
>+    result = PR_TRUE;
>+  }

How about just "return PR_TRUE" inside that if?  No need for result then...

>+  if (!result) {

And no need for this if check.	Just unconditionally set aExclusively to
PR_FALSE here and return (gHTMLElements[aChild].mParentBits & kHeadMisc) != 0.

> the first thing to note is that because each of eHTMLTag_whitespace,
> eHTMLTag_comment, eHTMLTag_newline, and eHTMLTag_userdefined have parents of
> kHeadMisc. 

Assert that, somewhere?  e.g. assert that the tag is not one of these in
CNavDTD if we get a doesn't belong in head or get an exclusive. 

>Index: src/CNavDTD.cpp
>               PRBool theExclusive=PR_FALSE;

How about isExclusive instead?	If that changes too much code, ignore me.

>+          theExclusive = !(mFlags & NS_DTD_FLAG_HAD_BODY);

I assume HAD_BODY instead of HAS_OPEN_BODY helps out with cases where the body
got closed?

>-        if(theHeadIsParent || ((mFlags & NS_DTD_FLAG_HAS_OPEN_HEAD)  && 
>-                               (eHTMLTag_newline == theChildTag    || 
>-                                eHTMLTag_whitespace == theChildTag ||
>-                                eHTMLTag_userdefined == theChildTag))) {
>-            result = AddHeadLeaf(theNode);
>+        if(theHeadIsParent &&
>+            (theExclusive || (mFlags & NS_DTD_FLAG_HAS_OPEN_HEAD))) {

Why the logic change here (from || to && and the other one)?  That is, we used
to automatically stick in head if theHeadIsParent; was that wrong?
(In reply to comment #11)
> (From update of attachment 177103 [details] [diff] [review] [edit])
> Update nsElementTable.h with docs on what kHeadMisc and kHeadContent mean?  And
> make the comments there match the new reality?

Sure.

> 
> Document that this is only reliable if PR_TRUE is returned?
> 

The way I wrote this function, this isn't true (i.e., a return of PR_FALSE and
aExclusive == PR_TRUE would mean it exclusively is not a child of the head), but
with your changes, this will change, so I'll do this.

> How about just "return PR_TRUE" inside that if?  No need for result then...

Sure.

> Assert that, somewhere?  e.g. assert that the tag is not one of these in
> CNavDTD if we get a doesn't belong in head or get an exclusive. 

I could do this, but I'd rather put a comment in nsElementTable.cpp instead,
since there isn't really a good place to maintain this invarient.


> How about isExclusive instead? If that changes too much code, ignore me.

I'll make that change everywhere (whoever wrote this code liked variable names
to begin with the).

> 
> >+          theExclusive = !(mFlags & NS_DTD_FLAG_HAD_BODY);
> 
> I assume HAD_BODY instead of HAS_OPEN_BODY helps out with cases where the body
> got closed?

As it turns out, these flags are the same (we never unset HAS_OPEN_BODY). So it
doesn't really matter which is used. Perhaps one should be removed at some point?

> >+        if(theHeadIsParent &&
> >+            (theExclusive || (mFlags & NS_DTD_FLAG_HAS_OPEN_HEAD))) {
> 
> Why the logic change here (from || to && and the other one)?  That is, we used
> to automatically stick in head if theHeadIsParent; was that wrong?
> 

Before, since we didn't use the exclusivity of the element, theHeadIsParent
meant that we wanted to put the element in the head. Now, we want to make sure
the element actually wants to go in the head (either it _has_ to exclusively, or
we're in the head anyway and it *can* go in the head).

I'll make a new patch sometime tonight or tomorrow that addresses the comments
up to now.
> I could do this, but I'd rather put a comment in nsElementTable.cpp instead,

Ok.  Could do both, but either way.  ;)

> Perhaps one should be removed at some point?

Sounds like it.

Created attachment 177269 [details] [diff] [review]
patch v2

This should be updated to all of the comments. It also has an additional hunk
that I forgot to patch earlier (in CNavDTD::WillHandleStarToken()). I'm not
sure that it is necessary, but it deserved updating too.

Note that I made nsHTMLElement::IsChildOfHead()'s aExclusively parameter
meaningful for all return values.
Attachment #177103 - Attachment is obsolete: true
Attachment #177269 - Flags: review?(bzbarsky)
Attachment #177103 - Flags: review?(bzbarsky)
Comment on attachment 177269 [details] [diff] [review]
patch v2

r=bzbarsky
Attachment #177269 - Flags: review?(bzbarsky) → review+
Attachment #177269 - Flags: superreview?(jst)
Comment on attachment 177269 [details] [diff] [review]
patch v2

sr=jst
Attachment #177269 - Flags: superreview?(jst) → superreview+
Fix checked in. Note, I accidentally included the patch for bug 284587 in the
patch for this bug.
Status: ASSIGNED → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED
Depends on: 309040
You need to log in before you can comment on or make changes to this bug.