Closed Bug 120789 Opened 23 years ago Closed 22 years ago

checking for text/css sheets should only happen for http with a type header

Categories

(Core :: CSS Parsing and Computation, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.5alpha

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

(Keywords: helpwanted)

Attachments

(1 file, 6 obsolete files)

We should not be checking the type for file:// channels, etc. We should also not be checking the type for http:// if there is no Content-Type header.
Attached patch patch to restrict the checking (obsolete) — Splinter Review
Attached patch Same patch, better comments. (obsolete) — Splinter Review
Attachment #65625 - Attachment is obsolete: true
Nit: HTTP isn't the only scheme which delivers a media-type. We should also check data:, for instance.
Just talked to Hixie. The fundamental problem here is that all channels return a type. We could special-case channels till the cows come home here... So the proposal is: 1) Create an nsITypedChannel interface. This will have a GetOriginalType() method which will return the "original" type before any sniffing happened. 2) QI to nsITypedChannel in the CSSLoader 3) If the QI succeeds and the type is nonempty, _then_ look at the type. Darin, bbaetz, does #1 seem reasonable? At the moment, just http: and data: channels would implement that interface.
Why can't you just ask for the content-type on the channel, and allow text/css _or_ an unknown type. Or does the mime matching stuff happen before this?
The unknown decoder happens far, far before this. Also, asking the channel will involve the MIME service, and we don't want to trust the mime service here (since it just extension-sniffs).
Isn't css one of the extentions we hard code, along with html and xul? Actually, how does xul handle this? Do we fail to start up if I have an OS mapping of XUL to something else? Or does that code not do content-type checking?
We _do_ hardcode .xul and .css extensions. You cannot override them with OS settings. That's beside the point. We should not be forcing people to have their stylesheets in .css files (as long as they configure their web server correctly once they upload them).
Oh, I see. So a file:/// stylesheet with an extention of .foo is OK, because when accessed via the web .foo could be mapped to text/css ? In that case, just QI to nsIHttpChannel before testing. The only other protocol which gives us mime type info is mail/news, and I don't know if the effort is worth the bother to get that edge case correct - most html-via-mail uses inline stylesheets, I think.
no, data: does as well. And therein lies the problem. Adding a new protocol should not require changes all over the codebase to change hardcoded QIs. Channels should be able to return their real content-type, their sniffed content-type, their real character set, and their sniffed character set. Code that has fallbacks -- e.g. <link> elements with type="" attributes, <link> elements with charset="" attributes, etc, should then be able to say "ok, if you don't have a content-type use this one". There is no way that hard coding knowledge of specific protocols in the CSS loader is good style. I have the feeling we have this problem all over the place, a problem which probably results in weird "but it works for this protocol..." bugs.
Actually a simple way of implementing this which will better ensure good use throughout the codebase might just be to change the Content-Type API to require a "default" argument. Then it's up to the protocols to return the normative content-type or the default (if non-null) or the sniffed content type, using that order for fallbacks. So here we would do something like type = channel->contentType(element->typeAttribute()); Similarly with the charset APIs. This would also make it unnecessary for us to add a bunch of different interfaces for typed channels, charset channels, etc.
i'm really against the idea of adding another common channel interface. i think nsIChannel should be modified instead to do what we need. BTW nsIHttpChannel has a charset attribute that could probably be moved into nsIChannel.
So Darin, would you say that we should just add an "originalType" attribute to nsIChannel that will be "" for file:// and the like and just return the type for data: and the type header for http:? (Note that a data channel of type application/x-unknown-content-type will sniff, blow the type away, and have no chance of getting it back... Not sure what we want to do about this, if anything).
i'm not really sure what the best change would be... i still need to think about this one some more.
Priority: -- → P3
Target Milestone: --- → mozilla1.0
Comment on attachment 65627 [details] [diff] [review] Same patch, better comments. this patch breaks if there is a charset param. needs work.
Attachment #65627 - Attachment is obsolete: true
Attachment #65627 - Flags: needs-work+
I'm going out of town in two days and not getting back till after 1.0 freeze. Setting realistic milestones for all 1.0 bugs that depend on other bugs or on possible interface work. Any help on getting these fixed would be much appreciated...
Keywords: helpwanted
Target Milestone: mozilla1.0 → mozilla1.1alpha
*** Bug 134871 has been marked as a duplicate of this bug. ***
1.1alpha is frozen. Unsetting milestone and will retriage in a few days when I can make a realistic assessment of the situation.
Target Milestone: mozilla1.1alpha → ---
Talked about this with Darin for a bit. He raises a good point -- the job of Necko is to determine the content type to the best of its ability and expose it on nsIChannel. If it determines the wrong content type, that's bad. Exposing the fact that some channels get the type from the MIME service, some get it from the unknown content decoder, and some get it from the network headers is not going to happen via Necko APIs. Which leaves us with a few options here: 1) Make the unknoon content decoder detect CSS (I have no idea how to actually do this, since CSS files have no distinctive features I can think of). 2) We explicitly check the channel type for http/data and check that the Content-Type header was actually sent if it's HTTP. 3) We relax our current restriction to say something like "if it's not a styleshet language we recognize, go with the advisory type". That is, if a <link rel="stylesheet"> comes back with a text/xml document we can feed it to XSLT, but if it comes back with text/plain we go with the advisory type. 4) Leave things as they are. Option #2 or option #3 will lead to the best behavior with regard to real content, I think.... it's just that option #2 is a little ugly. ;) Thoughts?
What are the problems with the suggestions in comment 4 and comment 11?
Comment 11 is no longer imlementable because the API in question is frozen. Comment 4, I get the impression that Darin, as module owner, feels this would be a bad idea. I could be wrong, however.
option #3 seems good to me, but i may be missing something. it just seems good to fallback on trusting the document author... he said "text/css" in the <link> tag, so even if necko comes back and says that the document is not "text/css", let's trust that it actually is. btw: what does IE do?
Option 3 is _wrong_. If we do that, we will be violating the specs. Does the API being frozen also mean we can't _add_ to the API? It seems to me that the suggestion in comment 11 most has the smallest semantic gap between what we are trying to do and what we can do. IE does totally the wrong thing and almost always ignores the Content-Type header, causing many authors and users grief and preventing some well-written sites from working correctly.
frozen means that we cannot make any changes to the interface :( i understand that option #3 is not what the spec says we should do, but it still seems like the sane thing to do. i mean, if the content-type specifies a type that we don't handle, then why not try the advisory type? what could it really hurt? i mean, otherwise we'd just ignore the document. is that what we need to do in order to maintain compliance with the spec?
Yes. Imagine there are two different stylesheet languages: text/css, and text/x-cll. The following is text/x-cll for making the footer body hidden when printing: @footer; body { display: none; when: printing; } A standard-compliant browser will correctly ignore that file, since it is sent as text/x-cll, and thus render any pages that use it correctly. A browser that is not following HTTP and treated that document as text/css despite the server specifying that it is not text/css would end up hiding the entire document. (Because if you handle that as a CSS stylesheet, it has the effect of removing the <body> element from display.) Given that we currently do the right thing in this case, I think it is not acceptable for us to regress our behaviour. Ok, so the interface is frozen. Can't we increase the version number of the interface and fix the issue there? How are we going to handle other features of HTTP that we haven't implemented yet?
It sounds like part of the problem here is that for the file:// case, we can get a Content-Type *only* by extension n' content-sniffing (assuming MIME-types aren't part of the file metadata the OS can tell us), when we don't actually want to perform that process until after checking for the advisory type. Is that correct?
hixie: nsIHttpChannel is not frozen... i was only talking about nsIChannel being frozen. ic your point about overriding the server specified content-type, but aren't there some brownie points at least for honoring the advisory type? i mean, surely the advisory type has to be given some credibility. ok, the spec says otherwise, but in this case the spec seems out-of-touch with reality. people write documents that say they are importing a text/css document. they don't have their server setup correctly or they didn't save the css document with the proper extension, but as far as they know they did tell the browser that the content-type is text/css. so, why is it that we want to completely disregard this info? because the standard says so? seems to me like the spec needs to be revised, but then maybe i'm missing something really fundamental here.
Continuing to muse aloud... as best I can figure, nsIChannel is used for retrieving a URI, taken from some URI scheme. However, not all protocols associated with a URI scheme are able to convey a MIME-type for the browser to use in displaying that URI. HTTP authorizes the browser to guess only in cases where there is no MIME-type; but doing sniffing for a file:// resource is a function that lies *outside* of the file:// protocol. To summarize: HTTP is a special case of a protocol where we're allowed to sniff for Content-Type *as part of the protocol*. (data:, in contradistinction, should fall back on text/plain.) For things like file://, the protocol should tell whatever's calling it that it doesn't know how to determine the MIME-type of the file coming through the channel, and let the caller fall back on its mechanisms (in HTML, advisory type followed by sniffing; in most other cases, probably just sniffing). I realize that this probably bears no resemblance whatsoever to our current architecture, but that's the closest-to-spec behavior, AFAICT.
Darin: If the server doesn't tell us what the type is, then yes, I'm all for falling back on the advisory type. But if we do know what it is, then we must not try to second guess the server. Our interface doesn't match the model we should be implementing; maybe it is time to work on a better nsIChannel. data: should never be sniffed (it always knows its type) ftp, file, chrome: should use the advisory type (whatever it is we are expecting), or, if we don't know what we are expecting, should sniff. http: should return the type given by the server, or, if none, then use the advisory or expected type, or, if none, should sniff. So it seems the model we need is one with two ways of getting the type: nsIChannel::contentType() returns the content type, if known, or else the sniffed type. nsICompliantChannel::contentTypeWithFallback(fallback) returns the actual type, if known, else the fallback, else the sniffed type. ...where nsICompliantChannel is whatever we call the new interface which supplements nsIChannel, and which all nsIChannel implementors should also implement. (I don't know how XPCOM works, maybe there is a better way of doing that or something.)
ok, so the simplest patch would be to just QI to nsIHttpChannel and call GetResponseHeader("content-type"). if you get an empty string back, use the advisory type. else, fallback on nsIChannel::contentType. if not HTTP, then use nsIChannel::contentType. (is it really critical that we do anything special for data URLs?) i understand that this patch would mean adding knowledge of the HTTP channel implementation in the CSS loader. yes, that is bad, but for the moment it is really only the HTTP protocol that has to be treated in a special way according to the spec, right? if so, then it seems like special casing is not so bad. nsHttpChannel has a big enough vtable as it is... is it really worth it to add yet another interface to its vtable? maybe, but i'm not convinced. btw: the unknown content decoder is not used when loading text/css. so the only sniffing that happens is sniffing of the file extension. and that is not done for the data protocol. the data channel defaults to text/plain if the content-type is not specified in the URL. if we see text/plain can we use the advisory type? seems reasonable to me.
> nsHttpChannel has a big enough vtable as it is... is it really worth it to add > yet another interface to its vtable? maybe, but i'm not convinced. Could this new interface be derived from nsIChannel, and nsIHttpChannel could derive from it?
I don't see why we should special case text/plain just because we can't get our architecture correct. The reason I think hardcoding support for HTTP is silly is because it makes adding new protocols very hard. (You have to go in and edit all the code that has hardcoded assumptions about protocols instead of just implementing the various interfaces.) Anyway, there are already other protocols that require similar fallback mechanisms for Content-Type -- IMAP comes to mind. Also, I'm concerned about your comment about the fact that we have two different sniffing mechanisms... surely there should only be one?
nope, there are two different sniffing mechanisms ;-) which perhaps points us at the best solution for this bug. perhaps all we really need to do is move the sniffing out of necko completely. currently, docloader invokes a content sniffer to actually look at the content bytes to determine the content type of a stream if the channel doesn't specify a content type. perhaps docloader should talk to the MIME service before invoking the content sniffer. that would have the benefit of putting all of the content-type sniffing code in one place. the only real problem with this design is the fact that not all necko consumers go through the docloader, including CSS, JS, and inline images. in the case of CSS we are happy to not have necko talk to the MIME service. in the case of JS the same is probably true. and with images, i believe imagelib does some sniffing of its own to figure out what to do. still, there are probably other consumers that i'm forgetting. clearly some folks may be utilizing the fact that file:// URLs come back with a content-type based on the file-extension. anyhow, if we go down this road, we probably will end up with something a lot cleaner, but i don't expect the path getting there to be so smooth. we're likely to see a lot of regressions from a change like this.
> perhaps docloader should talk to the MIME service before invoking the > content sniffer FTP and file:// do it that way.... That is, GetContentType() on those channels talks to the MIME service and only returns "unknown" if the MIME service says nothing useful, at which point the docloader sniffs. Is the suggestion to move the MIME service access out of the channel impls and into the docloader? Imagelib content-sniffs based on magic numbers, since the image formats have well-defined magic numbers. It would be unaffected by any changes we make to necko since it does not trust the necko type in any case.
"Imglib...does not trust the necko type in any case". Oh dear. So if I serve a PNG as image/gif, will it still be decoded as a PNG?
Yes. For one thing, lots of sites actually do that. For another, feeding a png to the gif decoder would likely crash it or something like that....
bz: yup, i was suggesting moving queries to the MIME service out of necko. i know it isn't a trivial change, but it might yield the cleanest end result.
Yeah, I buy that. That way consumers could choose for themselves whether to query the unknown content decoder and whether to query the MIME service... the docloader would, the style system would not... I'd be willing to work on that if people feel that's a reasonable approach. (Just have to make sure plugins still work from file:// urls after I'm done ;).)
Boris: stats? I'm skeptical...and this seems as heinous as any of the MIME-type-ignoring issues we've fulminated against. Perhaps a better solution would be to check the magic #s, and if they don't correspond to the necko-supplied type, display a broken image?
The libpr0n image handling issue is a separate (and, I believe, already filed) bug; let's not discuss it here. Removing the content sniffing from Necko altogether sounds perfect. So the usage model would be: If channel->contentType() is known, use it, else, do whatever is appropriate at this point. (e.g. for stuff fetched from <link> or <a>, use the type="" advisory information, otherwise, sniff.) That also, as darin pointed out, gives us the opportunity to put all our content sniffing code in one place. Regressions we should expect if we don't catch all the call sites are: * documents that used to render fine (e.g. from ftp://, imap://, file://, and such like) may now claim they are unknown. * some documents may be sniffed to different content types. Both seem relatively unlikely enough that I think we should go for it. Note that some call sites should bypass the sniffing altogether. For example, if we have the following link: <link rel="stylesheet" href="foo"> ...and "foo" doesn't have an explicit type (e.g. it is a local file on a Unix filesystem) then we should not try to sniff to check if it is CSS, we should just assume that it is.
OK. I've been looking at this, and the suggested approach has one minor problem.... nsFileChannel and nsFileIO both have an mFile member that they use to get the content type. Darin/dougt, will the nsStreamIOChannel wrapped around the nsFileIO always have a file:// uri? If so, we could GetURI, QI to nsIFileURL, get the nsIFile, and be in business....If not, we need some other way of getting the nsIFile from an arbitrary channel that may happen to wrap a file....
Priority: P3 → P1
Target Milestone: --- → mozilla1.5alpha
Attached patch proof-of-concept patch (obsolete) — Splinter Review
If darin is OK with this general approach, I'll go ahead and make similar changes to other channels that need them (eg ftp).
Comment on attachment 126095 [details] [diff] [review] proof-of-concept patch Thoughts?
Attachment #126095 - Flags: superreview?(darin)
Comment on attachment 126095 [details] [diff] [review] proof-of-concept patch >+ * type of data it should be expecting. This function must be called after >+ * the channel is opened. don't you mean "before the channel is opened"??? that's what seems to be happening in your example with the CSS loader. if so, then why can't we just use nsIChannel::SetContentType? what am i missing?
Attachment #126095 - Flags: superreview?(darin) → superreview-
it'd be a semantic change of the meaning of SetContentType when called before AsyncOpen (which currently has undefined behavior).
Hmm... I meant "after", but either one is actually OK, I suspect. The cssloader code was just a proof-of-concept and I should change it. I can't use SetContentType, because that would override the content-type header for HTTP channels (and HTTP will be implementing this interface, with the hint only used for http/0.9 responses).
Right. That's why I wanted to go with "after" -- to parallel SetContentType. We should be doing that; I've moved the cssloader code to after we create the streamloader.
uhm, but my point was... why not accomplish this with SetContentType called before AsyncOpen? we could document that that serves as a hint to the channel... that it may choose to override the hint if it has absolute knowledge of the content type.
Because that's not what SetContentType does now and the nsIChannel interface is frozen, so I don't think making this large a change to it is advisable?
no, today SetContentType may only be called after OnStartRequest fires and then only if the nsIRequest::status is not a failure code. it is currently an error (nsHttpChannel::SetContentType returns NS_ERROR_NOT_AVAILABLE in fact) if SetContentType is called before OnStartRequest has fired. so, i think we are in the clear to redefine the meaning of SetContentType when called before AsyncOpen.
Oh. _Before_ asyncopen. I need to learn to read. Yeah, totally. If you are fine with that interface change, I will go ahead and make it and then fix this damn bug. ;)
yes, let's do that... i think i even have a bug about improving the documentation for nsIChannel::contentType (and related attributes).
Attached patch patch (obsolete) — Splinter Review
This implements Darin's idea. Most of the changes are self-explanatory. The change to the modeline in the viewsourcechannel files just brings the modeline in harmony with reality -- those files are 4-space indented. I fixed some indentation issues that had crept in via the modeline. The changes to the binhex decoder are just cleanup; I was looking at all SetContentType impls and callers and that function caught my eye as needing improvement. Most of our channel impls do not allow SetContentType to do anything at all, and I did not modify any of those.
Attachment #126095 - Attachment is obsolete: true
Comment on attachment 126144 [details] [diff] [review] patch It's not as big and scary as it looks, except the nsIChannel part...
Attachment #126144 - Flags: superreview?(darin)
Attachment #126144 - Flags: review?(dougt)
Comment on attachment 126144 [details] [diff] [review] patch >Index: netwerk/protocol/data/src/nsDataChannel.h > #endif /* nsFTPChannel_h___ */ can you fix up this typo at the bottom of the file while you're at it? thanks! >Index: netwerk/protocol/data/src/nsDataChannel.cpp >Index: netwerk/protocol/ftp/src/nsFTPChannel.cpp >Index: netwerk/protocol/gopher/src/nsGopherChannel.h >Index: netwerk/protocol/gopher/src/nsGopherChannel.cpp >+ // We are being given a content-type hint. >+ // XXX what about charset, if any? Just ignore it for now.... >+ nsCAutoString charsetBuf; >+ NS_ParseContentType(aContentType, mContentTypeHint, charsetBuf); nsIChannel::contentCharset should have the same semantics as nsIChannel::contentType. so, we should make SetContentCharset before AsyncOpen mean "here's what we think the charset will be" >Index: netwerk/protocol/http/src/nsHttpChannel.cpp >+ if (mIsPending && !mResponseHead) > return NS_ERROR_NOT_AVAILABLE; > >+ if (mIsPending) { nit: move |if (!mResponseHead)| check inside this block so you don't have to test mIsPending twice. >Index: netwerk/protocol/viewsource/src/nsViewSourceChannel.cpp >+ if (NS_SUCCEEDED(rv)) { >+ mOpened = PR_TRUE; >+ } inconsistent indentation style. >Index: netwerk/streamconv/converters/nsBinHexDecoder.h > nsresult ProcessNextState(nsIRequest * aRequest, nsISupports * aContext); >- nsresult SetContentType(nsIRequest * aRequest, const char * fileName); >+ nsresult SetContentType(nsIRequest* aRequest, const char * fileName); hmm.. style change is not consistent. >Index: netwerk/streamconv/converters/nsBinHexDecoder.cpp >+ nsCOMPtr<nsIChannel> channel(do_QueryInterface(aRequest)); >+ NS_ENSURE_TRUE(channel, NS_ERROR_UNEXPECTED); better to NS_ENSURE_SUCCESS(rv, rv), that way all return points look like |return rv;|... plus it's better to return the error code corresponding to the QI failure than to create one like this. >+ nsresult rv = NS_OK; > nsCOMPtr<nsIMIMEService> mimeService (do_GetService("@mozilla.org/mime;1", &rv)); no need to initialize |rv|... do_GetService is an inline expansion of several functions from which i think the compiler can determine that |rv| in always initialized. at least, i've never seen a compiler warning about an uninitalized variable. >+ // extract the extension from fileName and look it up. >+ const char * fileExt = PL_strrchr(fileName, '.'); hmm... you've already null checked fileName, so how about just calling strrchr instead ;-)
Attachment #126144 - Flags: superreview?(darin) → superreview-
Attachment #126144 - Flags: review?(dougt)
Attached patch Addresses darin's comments (obsolete) — Splinter Review
Good point on charsets.
Attachment #126144 - Attachment is obsolete: true
Attachment #126294 - Flags: superreview?(darin)
Attachment #126294 - Flags: review?(dougt)
Comment on attachment 126294 [details] [diff] [review] Addresses darin's comments so how about adding similar documentation for nsIChannel::contentCharset? ;-)
Attachment #126294 - Flags: superreview?(darin) → superreview-
Attached patch Sure. (obsolete) — Splinter Review
Attachment #126294 - Attachment is obsolete: true
Attachment #126294 - Attachment is obsolete: false
Attachment #126294 - Flags: review?(dougt)
Attachment #126297 - Flags: superreview?(darin)
Attachment #126297 - Flags: review?(dougt)
Comment on attachment 126297 [details] [diff] [review] Sure. >Index: netwerk/base/src/nsInputStreamChannel.cpp > NS_IMETHODIMP > nsInputStreamChannel::SetContentType(const nsACString &aContentType) > { >+ // If someone gives us a type hint we should just use that type. >+ // So we don't care when this is being called. do you want these kind of comments for SetContentCharset as well? seems like it'd be good. >Index: netwerk/protocol/http/src/nsHttpChannel.cpp > if (mResponseHead && mResponseHead->ContentType().IsEmpty()) { >+ if (!mContentTypeHint.IsEmpty()) { >+ mResponseHead->SetContentType(mContentTypeHint); >+ } else { nit: style for HTTP code is to leave out the braces when not needed. if (!mContentTypeHint.IsEmpty()) mResponseHead->SetContentType(mContentTypeHint); else { sr=darin with nits addressed (i don't need to see a new patch)
Attachment #126297 - Flags: superreview?(darin) → superreview+
Attachment #126297 - Flags: review?(dougt) → review+
Attached patch pick those nitsSplinter Review
Attachment #126294 - Attachment is obsolete: true
Attachment #126297 - Attachment is obsolete: true
Fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: