Closed Bug 136529 Opened 18 years ago Closed 18 years ago

CSS does not get loaded through LINK if HTML version is 4.0 Strict or 4.01 Transitional/Strict/Frameset

Categories

(Core :: CSS Parsing and Computation, defect, major)

x86
Linux
defect
Not set
major

VERIFIED INVALID

People

(Reporter: kripe, Assigned: dbaron)

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020313
BuildID:    2002031312

http://whhh.wellington.net.nz/lime-401.html - HTML 4.01 Strict document that
should get a lime background from the CSS /lime.css.
http://whhh.wellington.net.nz/lime-40.html - Same document with HTML 4.0
Transitional header - works in Mozilla 0.9.9.
Both used to work in Mozilla 0.9.2.1.

Reproducible: Always
Steps to Reproduce:
1. Check out the URLs above.


Actual Results:  You will see a document with white background, even though the
stylesheet defines a lime background.

Expected Results:  Lime background.
The CSS file is being served with a MIME type of text/plain, so we will ignore
it if we are in standards mode.  See http://mozilla.org/docs/web-developer/quirks/ .
Status: UNCONFIRMED → RESOLVED
Closed: 18 years ago
Resolution: --- → INVALID
VERIFIED that lime.css is incorrectly sent as text/plain
Status: RESOLVED → VERIFIED
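A minimal sketch of the behavior described in the resolution above. It is an illustration only, not Mozilla's code: the names `media_type` and `stylesheet_is_applied` and the `standards_mode` flag are hypothetical, and the assumption that quirks mode tolerates a mislabeled sheet is inferred from the report that the HTML 4.0 Transitional page still works.

```python
def media_type(content_type_header):
    """Return the bare, lowercased media type from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def stylesheet_is_applied(content_type_header, standards_mode):
    """Hypothetical model: standards mode requires text/css from the server."""
    if media_type(content_type_header) == "text/css":
        return True
    # Assumption: quirks mode accepts a mislabeled sheet, standards mode does not.
    return not standards_mode


# lime.css as served in this report (text/plain):
print(stylesheet_is_applied("text/plain", standards_mode=True))   # False
print(stylesheet_is_applied("text/plain", standards_mode=False))  # True
# The same sheet once the server labels it correctly:
print(stylesheet_is_applied("text/css; charset=utf-8", standards_mode=True))  # True
```

Under these assumptions the fix belongs on the server: once lime.css is sent as text/css, both test pages get the lime background.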
Specifying the type="text/css" attribute in the <link rel="stylesheet" .../>
element should solve the problem in any case, even if the web server reports
the wrong MIME type, or does not report any MIME type (for example if the link
references a local file or an FTP resource).
Actually, it should only affect it if the server sends no MIME type. If the
server sends a MIME type then per the HTML and HTTP specs, the server's MIME
type overrides the MIME type given on the <link> element.
"per the HTML and HTTP specs"...
Where did you read that?

Certainly not in the HTTP specs, which are FULLY independent of HTML and thus
do not mandate how an application uses the documents it downloads.

Certainly not in the HTML specs, which only say SHOULD or MAY, mandate nothing
regarding the transport layer, and are FULLY independent of HTTP or any other
transport protocol.

So we must read the specs regarding the <link> and <style> elements as they
are, without reference to any protocol. The HTML spec only says that a browser
should decide whether or not to download the referenced document by examining
the type attribute in advance, to see whether it supports this content type.
If it sees that it supports the "text/css" content type, then it must stand by
that decision when it downloads the stylesheet, and it clearly must handle that
document according to that decision, regardless of what the protocol layer may
later say.

This also means that the following:

<img src="myimage.bin" type="image/jpeg" width="32" height="32" xml:lang="en" 
title="My company logo"/>

will download and handle the referenced image as a JPEG image regardless of the
fact that the server returns the "myimage.bin" document with another media type
(here, "application/binary"), and regardless of the fact that the server may
return another language indicator, another title for the image, and other
dimensions for its best representation.

This is the only way to keep the transport layer and the HTML spec independent
of each other.
# type 
#   This attribute gives an advisory hint as to the content type of the content 
#   available at the link target address. It allows user agents to opt to use a 
#   fallback mechanism rather than fetch the content if they are advised that 
#   they will get content in a content type they do not support.
#
#   Authors who use this attribute take responsibility to manage the risk that 
#   it may become inconsistent with the content available at the link target 
#   address.
  -- http://www.w3.org/TR/html4/struct/links.html#adef-type-A

So if the <link> says type="text/css" then we fetch it, because we support
text/css. However, if the server says "This is a text/html file" then we say
"oh, well, we don't know how to handle text/html stylesheets" so we ignore it.
And this is what you call "managing the risk":
Yes, we don't know how to handle text/html stylesheets, but does that matter
if we already know that we can handle the anticipated text/css type?
Why not use the CSS parser on the content to decide whether it is actually
valid CSS?
HTML is likely to contain tags, and will typically contain at least one "<"
character, which is invalid anywhere in a CSS selector.
If HTML does not contain any "<" character then it is transitional HTML, and it
will not contain stylesheets (those require at least a <style> tag, which is an
invalid CSS selector).
If HTML appears to be a valid CSS file, it will contain no tags and cannot be
considered strictly conforming HTML (it is missing the <html> tag).
If the referenced file is sent as text/x-unknown, then we know at least that it
is of the text family, so we can run the CSS parser on it to validate it.

I remember the conforming rules defined in the CSS specs:
1) if a CSS selector is invalid, the whole style sheet must be ignored
2) if a CSS selector is valid, the braces match, and their content follows
the basic syntax (a semicolon-separated list of declarations, each a keyword
followed by a colon and a value of one or more words), then the CSS rule can be
accepted, provided that the style sheet validates as a whole. Individual
declarations inside the braces may simply be dropped if not recognized (for
backward compatibility of CSS specs), but the rule can still be accepted
3) empty CSS rules are valid (a valid CSS selector followed by braces whose
content is empty or whose value pairs have all been ignored under the previous
rule.)

So I see absolutely no problem in trying to validate the CSS file that has been
retrieved.
The only case where the "text/html" MIME type should not be parsed is when the
<link rel="stylesheet"> or anchor element references the stylesheet without
specifying the MIME content type. In that case, the content type from the
server can be checked and honored before trying to use the CSS parser.

If neither the server nor the link contains any indication of the content type,
I really think that we must not parse the content (because it may be in a
still unknown but valid content type for stylesheets defined in future specs,
and parsing it with a CSS1, CSS2 or CSS3 parser may cause problems).
In that situation (for example, FTP URI references to stylesheets), the
designer will need to specify the type attribute, and this will actually be used
by the browser for the stylesheet validation.
> I remember the conforming rules defined in the CSS specs:
> 1) if a CSS selector is invalid, the whole style sheet must be ignored

No, just the rule.

We already have a bug somewhere on parsing a 404 error page (for a stylesheet
load) that had a STYLE element within it and getting legitimate style rules out
of it.  It's not true that one can't get style rules out of HTML.
This debate has been happening in three different bugs (this one, bug 113399,
and here).  It certainly doesn't help to convince me when you comment in
multiple bugs in the hope that the owner of one of them will decide to change
the behavior, but I think this bug is the most appropriate place for the
discussion of the three.
s/and here/and bug 53112/
Yes, just the rule! But how do you delimit the rule if the syntax that defines
it (i.e. a valid selector and a braced body) can't be found?
Where do you resume for the next rule?
In my opinion you can ignore just the rule only when the basic delimiting
characters are OK. This is not the case in the presence of a "<" in a selector,
and resuming at the character that follows the invalid character is not what
the standard says: a "<" character is not a rule, so you can't ignore it.
This is what the W3C CSS validator does: it does not validate ANY rule that may
appear further in the text if the parse error occurs in a selector.
There are several levels of errors in CSS:
- fatal parse errors (illegal characters in selectors)
- non-matching braces, which are also fatal parse errors.
Only then can you divide the file into independent rules that can be ignored
selectively. Each has a selector part and a definition part.
- correct syntax but illegal usage of an operator: ignore the rule
- invalid definitions between matching braces, such as absence of a name before
the colon or multiple names before it: ignore the rule
- unknown attribute name: ignore the rule
- invalid attribute value: ignore the rule.
Ignoring a rule is only possible when you have a context from which to recover
to the next rule. If not, the CSS file is fully invalid, all previously parsed
rules that were accidentally accepted in that file must be ignored, and parsing
must abort with a fatal parse error.
In the following page:

<html>
  <title>my page</title>
  <style type="text/css">

  body { display: block; }

  p { color: red; }
  </style>
</head>
<body>...</body></html>

There are two CSS rules followed by some garbage at the end.  The first one has
an invalid selector, "<html>" ... "body", and is ignored.  The { indicates the
end of the selector and the } the end of the rule.  The "p { color: red; }" is a
valid rule, and then there's some garbage at the end that could be an invalid
selector for the next rule.
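
The recovery described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Gecko's parser: `parse_rules` and `VALID_SELECTOR` are hypothetical names, the character whitelist is a crude stand-in for real selector parsing (it rejects at-rules, for instance), and braces inside strings or comments are not handled.

```python
import re

# Assumption: a rough character whitelist stands in for real selector parsing.
VALID_SELECTOR = re.compile(r"""[A-Za-z0-9_\-\s.#:*>+,\[\]="'~|()]+""")


def parse_rules(text):
    """Split text into (selector, declarations) pairs, dropping bad rules."""
    rules = []
    pos = 0
    while True:
        open_brace = text.find("{", pos)
        if open_brace == -1:
            break  # trailing text with no block: nothing more to recover
        # Find the matching close brace, allowing nested blocks.
        depth = 0
        end = None
        for i in range(open_brace, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end is None:
            break  # unbalanced braces: the rest is ignored
        selector = text[pos:open_brace].strip()
        body = text[open_brace + 1:end].strip()
        if VALID_SELECTOR.fullmatch(selector):
            rules.append((selector, body))
        # Valid or not, parsing resumes right after the closing brace.
        pos = end + 1
    return rules


PAGE = """<html>
  <title>my page</title>
  <style type="text/css">

  body { display: block; }

  p { color: red; }
  </style>
</head>
<body>...</body></html>
"""

print(parse_rules(PAGE))  # [('p', 'color: red;')]
```

The first block is discarded because everything from "<html>" through "body" is treated as its selector, the "p" rule survives, and the trailing markup never reaches a "{" so it yields nothing.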
responding to bug 53112 comment 10:

> Isn't it an exceptional circumstance when the standard suggests that there's a 
> risk of ambiguity?

There is no ambiguity. The specs fully define the behaviour for every
combination of Content-Type and type="".


> I see absolutely no problem in trying to use the CSS parser on what the server
> reports as "text/html" or "text/plain" or "text/unknown" 

See the case dbaron is arguing above.


> or "text/xml" 

XSLT stylesheets, MIME type text/xml, have semantics different to CSS
stylesheets, MIME type text/css. Trying to interpret text/xml stylesheets using
a CSS parser is incorrect.

We even support text/xml stylesheets; and in the unlikely event of finding a
text/xml stylesheet at the end of a link with type="text/css", we should handle
it using the XSL engine, not the CSS one. (If we don't, file a bug.)


> or "text/javascript"

Javascript Stylesheets, MIME type text/javascript, have semantics different to
CSS stylesheets, MIME type text/css. Trying to interpret text/javascript
stylesheets using a CSS parser is incorrect.


> just avoiding to use it on content types that the server says not being of 
> the "text/*" family (because this is a basic requirement for a CSS parser 
> which first of all is a text parser).

image/svg+xml and text/xml are virtually indistinguishable, so why should
text/xml be considered more like text/css than image/svg+xml?
You say "the specs say" ... Where did you see that ? I'm very curious to know 
which point is accurate.

I have carefully searched both the HTTP and HTML specs and did not find any such
assumption (simply because the two specs are completely independent of each
other, and because HTML places no requirement on the underlying transport
protocol used to retrieve external entities)...

Nor did I see that you could recover from parse errors by searching for "}" in 
case of bad characters in CSS selector (the specs only gives the special case 
of At-Keyword blocks, which here requires a specific pseudo-selector).
So the specs only leave to the designer of the HTML page the risk of managing
the case where the two types don't match, and the spec does not seem to
recommend any behavior. In that case you should still trust the designer and
honor the type="" attribute, even in strict mode (but in that mode you will run
the CSS parser in strict mode too, so you won't recover from parse errors
except in the only cases which are explicitly given by the CSS spec).

In quirks mode, you can still run the CSS parser with a more relaxed error
recovery, but yes, you should also trust the HTML page designer (who assumes
the risk of the page becoming broken in case of unwanted external entity
corruption, or because the server for the external entity incorrectly delivers
a file of the wrong type in spite of the Accept header sent by the HTTP client).

What is your rule then if HTTP does not provide any MIME type in a Content-Type
header (this case often occurs when the retrieved file is served or generated
through a server-side script or program), or with FTP (which does not carry any
MIME type information), or from an application-bound namespace (such as
Microsoft HTML Help files)?

> You say "the specs say" ... Where did you see that ? I'm very curious to know 
> which point is accurate.

HTTP 1.1:

# 7.2.1 Type
#
#    When an entity-body is included with a message, the data type of that
#    body is determined via the header fields Content-Type and Content-
#    Encoding. [...]
#
#    Content-Type specifies the media type of the underlying data. [...]
#
#    If and only if the media type is not given by a Content-Type field, the
#    recipient MAY attempt to guess the media type via inspection of its
#    content and/or the name extension(s) of the URL used to identify the
#    resource. If the media type remains unknown, the recipient SHOULD
#    treat it as type "application/octet-stream".
#
# [...]
#
# 14.18 Content-Type
#
#    The Content-Type entity-header field indicates the media type of the
#    entity-body sent to the recipient
  -- http://www.w3.org/Protocols/rfc2068/rfc2068


HTML 4.01:

# type = content-type [CI]
#   This attribute gives an advisory hint as to the content type of the content 
#   available at the link target address. It allows user agents to opt to use a
#   fallback mechanism rather than fetch the content if they are advised that 
#   they will get content in a content type they do not support.
#   Authors who use this attribute take responsibility to manage the risk that 
#   it may become inconsistent with the content available at the link target 
#   address.
  -- http://www.w3.org/TR/html401/struct/links.html#adef-type-A


Put together, these specs tell us, firstly, that the type attribute is _only_
there to give a hint to the UA about whether the UA should bother trying to
fetch the resource, and does not in any way imply anything about the content
type of the target document, and secondly, that the Content-Type header
returned should be honoured.

So when we look at the <link> element we establish whether or not we are
interested in the resource, and if we are (type="text/css" or type="text/xml")
then we fetch it. When we get the resource back, we look at its actual MIME type
(either from Content-Type, if present, or using content sniffing, if not) and if
we establish that it is text/css, we pass it to our CSS parser.
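
A minimal sketch of the two-step decision described above, assuming only the two stylesheet types named in this thread. The names `PARSERS`, `wants_fetch`, `actual_type` and `choose_parser` are hypothetical, and guessing from the URL's extension with Python's `mimetypes` module stands in for real content sniffing.

```python
import mimetypes

# Assumption: the two stylesheet types this thread says the browser supports.
PARSERS = {"text/css": "css", "text/xml": "xslt"}


def wants_fetch(type_attr):
    """The type attribute is only a hint: skip types we cannot handle."""
    if type_attr is None:
        return True  # no hint, so fetch and decide afterwards
    return type_attr.strip().lower() in PARSERS


def actual_type(content_type_header, url):
    """Prefer the server's Content-Type; guess only when it is absent."""
    if content_type_header:
        return content_type_header.split(";", 1)[0].strip().lower()
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed or "application/octet-stream"


def choose_parser(type_attr, content_type_header, url):
    """Return "css", "xslt", or None when the sheet is ignored."""
    if not wants_fetch(type_attr):
        return None
    return PARSERS.get(actual_type(content_type_header, url))


url = "http://whhh.wellington.net.nz/lime.css"
print(choose_parser("text/css", "text/plain", url))  # None: server type wins
print(choose_parser("text/css", "text/css", url))    # css
print(choose_parser("text/css", "text/xml", url))    # xslt
print(choose_parser("text/css", None, url))          # css (guessed from ".css")
```

The type attribute only gates the fetch here. Once a Content-Type header comes back, it alone selects the parser, which is why the text/plain response in this report ends up ignored.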


> Nor did I see that you could recover from parse errors by searching for "}" in 
> case of bad characters in CSS selector [...]

Section 4.2 of CSS2 is not completely clear on this, but the intention of the
CSS working group (of which David and I are members) is as he described in
comment 12. I have sent an e-mail to the working group mailing list so that we
may add an erratum to the spec to make this clearer.
> Nor did I see that you could recover from parse errors by searching for "}" 

As Ian was kind enough to point out in another bug, this is CSS2, Section 4.1.7 
(http://www.w3.org/TR/REC-CSS2/syndata.html#q8), paragraph 3.  It seems pretty 
clear to me...