Closed Bug 136529 Opened 23 years ago Closed 23 years ago

CSS does not get loaded through LINK if HTML version is 4.0 Strict or 4.01 Transitional/Strict/Frameset

Categories

(Core :: CSS Parsing and Computation, defect)

x86
Linux
defect
Not set
major

VERIFIED INVALID

People

(Reporter: kripe, Assigned: dbaron)

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020313
BuildID: 2002031312
http://whhh.wellington.net.nz/lime-401.html - HTML 4.01 Strict document that should get a lime background from the CSS /lime.css.
http://whhh.wellington.net.nz/lime-40.html - Same document with HTML 4.0 Transitional header - works in Mozilla 0.9.9.
Both used to work in Mozilla 0.9.2.1.
Reproducible: Always
Steps to Reproduce:
1. Check out the URLs above.
Actual Results: You will see a document with white background, even though the stylesheet defines a lime background.
Expected Results: Lime background.
The CSS file is being served with a MIME type of text/plain, so we will ignore it if we are in standards mode. See http://mozilla.org/docs/web-developer/quirks/ .
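The check described in this comment can be sketched roughly as follows. This is only an illustration, not Gecko's actual code: the function name and parameters are hypothetical, and it assumes that quirks mode tolerates a mislabelled stylesheet, which is what the behaviour of the two test pages suggests.

```python
def should_apply_stylesheet(served_type: str, standards_mode: bool) -> bool:
    """Return True if a linked stylesheet should be handed to the CSS parser.

    Hypothetical helper: `served_type` is the Content-Type value sent by the
    server, `standards_mode` says whether the linking document is rendered
    in standards mode.
    """
    # Strip parameters such as "; charset=utf-8" and normalise case.
    media_type = served_type.split(";", 1)[0].strip().lower()
    if media_type == "text/css":
        return True
    # Assumption: quirks mode accepts a mislabelled stylesheet,
    # standards mode ignores it.
    return not standards_mode


# lime-401.html (standards mode) with lime.css served as text/plain
print(should_apply_stylesheet("text/plain", standards_mode=True))   # False
# lime-40.html (quirks mode) with the same stylesheet
print(should_apply_stylesheet("text/plain", standards_mode=False))  # True
```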
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → INVALID
VERIFIED that lime.css is incorrectly sent as text/plain
Status: RESOLVED → VERIFIED
Specifying the type="text/css" attribute in the <link rel="stylesheet" .../> element should solve the problem in any case, even if the web server reports the wrong MIME type, or does not report any MIME type (for example if the link references a local file or an FTP resource).
Actually, it should only affect it if the server sends no MIME type. If the server sends a MIME type then per the HTML and HTTP specs, the server's MIME type overrides the MIME type given on the <link> element.
"per the HTML and HTTP specs"... where did you read that? Certainly not in the HTTP specs, which are FULLY independent of HTML and thus do not mandate any use of the downloaded documents by an application. Certainly not in the HTML specs, which just say SHOULD or MAY but do not mandate anything regarding the transport layer, and which are FULLY independent of HTTP or any other transport protocol. So we must read the specs regarding the <link> and <style> elements as they are, without reference to any protocol.
The HTML spec only says that a browser should decide whether or not to download the referenced document by examining the type attribute in advance, to see if it supports this content type. If it sees that it supports the "text/css" content type, then it must maintain that decision when it downloads the stylesheet, and it clearly must handle that document according to that decision, regardless of what the protocol layer may later say.
This also means that the following:
<img src="myimage.bin" type="image/jpeg" width="32" height="32" xml:lang="en" title="My company logo"/>
will download and handle the referenced image as a JPEG image regardless of the fact that the server returns the "myimage.bin" document with another media type (such as here "application/binary"), and regardless of the fact that the server may return another language indicator, another title for the image, and other dimensions for its best representation. This is the only way to keep the transport layer and the HTML spec independent.
# type
# This attribute gives an advisory hint as to the content type of the content
# available at the link target address. It allows user agents to opt to use a
# fallback mechanism rather than fetch the content if they are advised that
# they will get content in a content type they do not support.
#
# Authors who use this attribute take responsibility to manage the risk that
# it may become inconsistent with the content available at the link target
# address.
-- http://www.w3.org/TR/html4/struct/links.html#adef-type-A
So if the <link> says type="text/css" then we fetch it, because we support text/css. However, if the server says "This is a text/html file" then we say "oh, well, we don't know how to handle text/html stylesheets" so we ignore it.
And this is what you call "managing the risk": yes, we don't know how to handle text/html stylesheets, but does that matter if we already know that we can handle the anticipated text/css type? Why not use the CSS parser on the content to decide whether it is actually valid CSS?
- HTML is likely to contain tags, and will typically contain at least one "<" character, which is invalid anywhere in a CSS selector.
- If the HTML does not contain any "<" character, then it is transitional HTML, and it will not contain stylesheets (these require at least a <style> tag, which is an invalid CSS selector).
- If the HTML appears to be a valid CSS file, it will contain no tags and it cannot be considered strictly conforming HTML (it is missing the <html> tag).
- If the referenced file is sent as text/x-unknown, then we know at least that it is of the text family, so we can run the CSS parser on it to validate it.
I remember the conforming rules defined in the CSS specs:
1) if a CSS selector is invalid, the whole style sheet must be ignored
2) if a CSS selector is valid, the braces match, and their content follows the basic syntax (a semicolon-separated list of keywords, each followed by a colon and a value of one or more words), then the CSS rule can be accepted, provided that the style sheet validates as a whole. Individual definitions inside the braces may simply be dropped if not recognized (for backward compatibility of CSS specs), but the rule can be accepted
3) empty CSS rules are valid (a valid CSS selector followed by braces whose content is empty or whose value pairs have all been ignored under the previous rule).
So I see absolutely no problem in trying to validate the CSS file that has been retrieved. The only case where the "text/html" MIME type should not be parsed is when the <link rel="stylesheet"> or anchor element references the stylesheet without specifying the MIME content type. In that case, the content type of the server can be checked and honored before trying to use the CSS parser.
If neither the server nor the link contains any indication of the content type, I really think that we must not parse the content (because it may be in a still unknown but valid content type for stylesheets defined in further specs, whose parsing through a CSS1 or CSS2 or CSS3 parser may cause problems). In that situation (for example, FTP URI references to stylesheets), the designer will need to specify the type attribute, and this will actually be used by the browser for the stylesheet validation.
> I remember the conforming rules defined in the CSS specs:
> 1) if a CSS selector is invalid, the whole style sheet must be ignored
No, just the rule.
We already have a bug somewhere on parsing a 404 error page (for a stylesheet load) that had a STYLE element within it and getting legitimate style rules out of it. It's not true that one can't get style rules out of HTML.
This debate has been happening in three different bugs (this one, bug 113399, and here). It certainly doesn't help to convince me when you comment in multiple bugs in the hope that the owner of one of them will decide to change the behavior, but I think this bug is the most appropriate place for the discussion of the three.
Yes, just the rule! But how do you delimit the rule if the syntax that defines it (i.e. a valid selector and a braced body) can't be found? Where do you resume for the next rule? In my opinion you must ignore just the rule only when the basic delimiting characters are OK. This is not the case in the presence of a "<" in a selector, and resuming at the character that follows the invalid character is not what the standard says: a "<" character is not a rule, so you can't ignore it. This is what the W3C CSS validator does: it does not validate ANY rule that may appear further in the text if the parse error occurs in selectors.
There are several levels of errors in CSS:
- fatal parse errors (illegal characters in selectors)
- non-matching braces are fatal parse errors. Only then can you divide the file into independent rules that can be ignored selectively. Each has a selector part and a definition part.
- correct syntax but illegal usage of an operator: ignore the rule
- invalid definitions between matching braces, such as the absence of a name before the colon or multiple names before it: ignore the rule
- unknown attribute name: ignore the rule
- invalid attribute value: ignore the rule.
Ignoring a rule is only possible when you have a context in which to recover to the next rule. If not, the CSS file is fully invalid, and all previously parsed rules that were accidentally accepted in that file must be ignored and parsing must abort with a fatal parse error.
In the following page:
<html>
<title>my page</title>
<style type="text/css">
body { display: block; }
p { color: red; }
</style>
</head>
<body>...</body></html>
There are two CSS rules followed by some garbage at the end. The first one has an invalid selector, "<html>" ... "body", and is ignored. The { indicates the end of the selector and the } the end of the rule. The "p { color: red; }" is a valid rule, and then there's some garbage at the end that could be an invalid selector for the next rule.
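The recovery behaviour described in this comment can be illustrated with a toy splitter. It is a sketch under heavy assumptions, not a real CSS tokenizer: `recover_rules` is a hypothetical name, nested blocks, strings, comments and at-rules are not handled, and "contains a `<`" is used as a crude stand-in for the real selector grammar.

```python
from typing import List, Tuple


def recover_rules(text: str) -> List[Tuple[str, str]]:
    """Split `text` into (selector, declarations) pairs, dropping any rule
    whose selector is invalid and resuming after its closing brace."""
    rules = []
    pos = 0
    while True:
        open_brace = text.find("{", pos)
        if open_brace == -1:
            break  # trailing garbage without a block: no further rule
        close_brace = text.find("}", open_brace)
        if close_brace == -1:
            break  # unterminated block: stop here
        # Everything up to "{" is the selector, everything up to "}" the body.
        selector = " ".join(text[pos:open_brace].split())
        body = " ".join(text[open_brace + 1:close_brace].split())
        # Simplification: any "<" makes the selector invalid, so only this
        # one rule is skipped and parsing continues after its "}".
        if selector and "<" not in selector:
            rules.append((selector, body))
        pos = close_brace + 1
    return rules


page = """<html> <title>my page</title> <style type="text/css">
body { display: block; }
p { color: red; }
</style> </head> <body>...</body></html>"""

print(recover_rules(page))  # [('p', 'color: red;')]
```

The first rule is dropped because its selector starts at "<html>", while the second one survives, which is how style rules can end up being extracted from an HTML page.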
responding to bug 53112 comment 10:
> Isn't it an exceptional circumstance when the standard suggests that there's a
> risk of ambiguity?
There is no ambiguity. The specs fully define the behaviour for every combination of Content-Type and type="".
> I see absolutely no problem in trying to use the CSS parser on what the server
> reports as "text/html" or "text/plain" or "text/unknown"
See the case dbaron is arguing above.
> or "text/xml"
XSLT stylesheets, MIME type text/xml, have semantics different to CSS stylesheets, MIME type text/css. Trying to interpret text/xml stylesheets using a CSS parser is incorrect. We even support text/xml stylesheets; and in the unlikely event of finding a text/xml stylesheet at the end of a link with type="text/css", we should handle it using the XSL engine, not the CSS one. (If we don't, file a bug.)
> or "text/javascript"
Javascript Stylesheets, MIME type text/javascript, have semantics different to CSS stylesheets, MIME type text/css. Trying to interpret text/javascript stylesheets using a CSS parser is incorrect.
> just avoiding to use it on content types that the server says not being of
> the "text/*" family (because this is a basic requirement for a CSS parser
> which first of all is a text parser).
image/svg+xml and text/xml are virtually indistinguishable, so why should text/xml be considered more like text/css than image/svg+xml?
You say "the specs say" ... Where did you see that ? I'm very curious to know which point is accurate. I have carefully searched in both HTTP and HTML specs and did not find any such assumption (simply because both specs are completely independent from each other, and because HTML does not place any requirement on the underlying transport protocol used to retrieve external entities)... Nor did I see that you could recover from parse errors by searching for "}" in case of bad characters in CSS selector (the specs only give the special case of At-Keyword blocks, which here requires a specific pseudo-selector).
So the specs only leave to the designer of the HTML page the risk of managing the case where both types don't match, and the spec does not seem to recommend any behavior. In that case you should still trust the designer and honor the type="" attribute, even in strict mode (but in that mode you will run the CSS parser in strict mode too, so you won't recover from parse errors except in the only cases which are explicitly given by the CSS spec). In quirks mode, you can still run the CSS parser with a more relaxed error recovery, but yes, you should also trust the HTML page designer (who assumes the risk of the page becoming broken in case of unwanted external entity corruption, or because the server for the external entity incorrectly delivers a file of a wrong type in spite of the Accept header sent by the HTTP client).
What is your rule then if HTTP does not provide any MIME type in a Content-Type header (this case often occurs when the retrieved file is served or generated through a server-side script or program), or with FTP (which does not carry any MIME type information), or from an application-bound namespace (such as Microsoft HTML Help files)?
> You say "the specs say" ... Where did you see that ? I'm very curious to know
> which point is accurate.
HTTP 1.1:
# 7.2.1 Type
#
# When an entity-body is included with a message, the data type of that
# body is determined via the header fields Content-Type and Content-
# Encoding. [...]
#
# Content-Type specifies the media type of the underlying data. [...]
#
# If and only if the media type is not given by a Content-Type field, the
# recipient MAY attempt to guess the media type via inspection of its
# content and/or the name extension(s) of the URL used to identify the
# resource. If the media type remains unknown, the recipient SHOULD
# treat it as type "application/octet-stream".
#
# [...]
#
# 14.18 Content-Type
#
# The Content-Type entity-header field indicates the media type of the
# entity-body sent to the recipient
-- http://www.w3.org/Protocols/rfc2068/rfc2068
HTML 4.01:
# type = content-type [CI]
# This attribute gives an advisory hint as to the content type of the content
# available at the link target address. It allows user agents to opt to use a
# fallback mechanism rather than fetch the content if they are advised that
# they will get content in a content type they do not support.
# Authors who use this attribute take responsibility to manage the risk that
# it may become inconsistent with the content available at the link target
# address.
-- http://www.w3.org/TR/html401/struct/links.html#adef-type-A
Put together, these specs tell us, firstly, that the type attribute is _only_ there to give a hint to the UA about whether the UA should bother to try to fetch the resource, and does not in any way imply anything about the content type of the target document, and secondly, that the Content-Type header returned should be honoured.
So when we look at the <link> element we establish whether or not we are interested in the resource, and if we are (type="text/css" or type="text/xml") then we fetch it. When we get the resource back, we look at its actual MIME type (either from Content-Type, if present, or using content sniffing, if not) and if we establish that it is text/css, we pass it to our CSS parser.
> Nor did I see that you could recover from parse errors by searching for "}" in
> case of bad characters in CSS selector [...]
Section 4.2 of CSS2 is not completely clear on this, but the intention of the CSS working group (of which David and I are members) is as he described in comment 12. I have sent an e-mail to the working group mailing list so that we may add an erratum to the spec to make this clearer.
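The two-step reading of the specs given in this comment could look roughly like the sketch below. All names are hypothetical and nothing here is taken from Mozilla's source. Fetching when no type attribute is given is an assumption, and the "css" and "xsl" labels simply stand for the CSS parser and the XSL engine mentioned in the thread.

```python
from typing import Optional

# Stylesheet types the user agent is assumed to support.
SUPPORTED_STYLESHEET_TYPES = {"text/css", "text/xml"}


def worth_fetching(type_attr: Optional[str]) -> bool:
    """Step 1: the type attribute is only a hint about whether to fetch."""
    if type_attr is None:
        return True  # assumption: without a hint, fetch and inspect the result
    return type_attr.strip().lower() in SUPPORTED_STYLESHEET_TYPES


def effective_type(content_type: Optional[str], sniffed: str) -> str:
    """Step 2: the served Content-Type wins; sniff only when it is absent."""
    if content_type:
        return content_type.split(";", 1)[0].strip().lower()
    return sniffed


def parser_for(type_attr: Optional[str],
               content_type: Optional[str],
               sniffed: str = "application/octet-stream") -> Optional[str]:
    """Return which engine handles the resource, or None to ignore it."""
    if not worth_fetching(type_attr):
        return None
    actual = effective_type(content_type, sniffed)
    return {"text/css": "css", "text/xml": "xsl"}.get(actual)


print(parser_for("text/css", "text/html"))                # None
print(parser_for("text/css", "text/css; charset=utf-8"))  # css
print(parser_for("text/css", None, sniffed="text/css"))   # css
```

Under this reading, type="text/css" never changes how the fetched resource is interpreted; it only decides whether the fetch happens at all.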
> Nor did I see that you could recover from parse errors by searching for "}"
As Ian was kind enough to point out in another bug, this is CSS2, Section 4.1.7 (http://www.w3.org/TR/REC-CSS2/syndata.html#q8), paragraph 3. It seems pretty clear to me...