nsIChannel::GetContentType appears to return the magic string "application/x-unknown-content-type" not only when there was no Content-Type header at all, but when it was unparseable (e.g. "Content/Type: */*", "Content-Type: bogus", or "Content-Type:") or when the server actually provided "application/x-unknown-content-type" as the header value. For security reasons (see bug 524223 -- the instant concern is style sheets) I need to be able to reliably distinguish the total absence of a Content-Type header (which should trigger content sniffing) from a Content-Type header that was present but gobbledygook (which should, at least for style sheets, cause the load to be discarded).

Proposal: We have three internal-use-only MIME types: application/x-unknown-content-type, application/x-vnd.mozilla.guess-from-ext, and application/x-view-source. Move these to a new x-internal/ type group (so x-internal/unknown, x-internal/guess-from-ext, x-internal/view-source -- no need for double x- prefixes). Add another such type, x-internal/parse-error. Make the Content-Type header parser give back x-internal/parse-error whenever the Content-Type header is empty or nonsense. If we see x-internal/anything from the server, map that to x-internal/parse-error as well.

Consumers of this information should normally treat x-internal/parse-error as equivalent to application/octet-stream, but it's possible that debugging extensions or equivalent might want to distinguish them. Consumers should also treat failure of GetContentType() as equivalent to a result of x-internal/parse-error.

This is not technically a problem with the HTTP code -- my best guess at the proper location of the fix is netwerk/base/nsUrlHelper.cpp:net_parseMediaType -- but HTTP is, as far as I can tell, the only protocol that we have that believes Content-Type headers sent by the server, so filing it there.
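To make the proposed mapping concrete, here is a minimal Python sketch of the classification step. It is not the Necko parser and not a patch: the function name classify_content_type, the constant names, and the simplified token grammar are all assumptions made for illustration, and only the x-internal/ type names come from the proposal above. A header value of None stands for "no Content-Type header was sent".

```python
import re
from typing import Optional

# Type names taken from the proposal; everything else here is hypothetical.
UNKNOWN = "x-internal/unknown"          # header absent: caller may sniff
PARSE_ERROR = "x-internal/parse-error"  # header present but unusable

# Simplified HTTP "token" grammar (an assumption, not Necko's real parser).
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_MEDIA_TYPE = re.compile(rf"({_TOKEN})/({_TOKEN})")


def classify_content_type(header_value: Optional[str]) -> str:
    """Map a raw Content-Type header value to a type string.

    None means the header was not sent at all.
    """
    if header_value is None:
        return UNKNOWN

    # Ignore parameters such as "; charset=utf-8" for this sketch.
    essence = header_value.split(";", 1)[0].strip()
    match = _MEDIA_TYPE.fullmatch(essence)
    if match is None:
        # Empty or nonsense value, e.g. "" or "bogus".
        return PARSE_ERROR

    major, minor = match.group(1).lower(), match.group(2).lower()
    if major == "x-internal":
        # A server must not be able to inject an internal-only type.
        return PARSE_ERROR
    return f"{major}/{minor}"
```

With this shape, a server that literally sends "application/x-unknown-content-type" gets that string back as an ordinary (if useless) type, so it can no longer be confused with a missing header.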
Whiteboard: [sg:low] data exfiltration from sites with bad HTTP content labeling
data: presumably has similar behavior, right? Or does it end up falling back on text/plain or bailing out if the type is not parseable? If all we cared about is HTTP you could look at the header value yourself, but I agree that it would be better to not create special-cases like that. The proposal sounds fine to me, except the part about treating the parse error as octet-stream. I'd need some data on what other UAs do for that; it could turn into a web compat issue.
(In reply to comment #1)
> data: presumably has similar behavior, right? Or does it end up falling back
> on text/plain or bailing out if the type is not parseable?

Dunno, will investigate.

> The proposal sounds fine to me, except the part about treating the parse error
> as octet-stream. I'd need some data on what other UAs do for that; it could
> turn into a web compat issue.

There's a test page that serves CSS under a variety of content types at http://crypto.stanford.edu/~collinj/research/css/ and it shouldn't be that hard to extend to other stuff. I get the impression that (post their equivalent of bug 524223 being fixed) other browsers are pickier than we are about malformed Content-Type.
Oh, I don't mean for CSS. For CSS I'm happy to be picky and treat the bogus types as application/octet-stream here. My concern is mostly full-page loads.
http://tools.ietf.org/html/draft-abarth-mime-sniff-04 seems to consider unparseable content-type headers as equivalent to none at all, but it doesn't consider CSS at all, and is quite eager about falling back to text/plain or application/octet-stream.
Yeah, that's basically the algorithm for loads in <iframe>s. For other types of loads, different sniffing rules need to apply.
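The per-context behavior discussed in the last few comments could be sketched roughly as below. This is only an illustration of the distinction being drawn, not a decided policy: the function load_action, the context names, and the action strings are hypothetical. The stylesheet branch follows the original report (sniff when the header is absent, discard when it is garbage), and the iframe branch follows the draft's treatment of an unparseable header as equivalent to none. Other load types are deliberately left unhandled, since the thread does not settle them.

```python
# x-internal/ names come from the proposal; the rest is hypothetical.
UNKNOWN = "x-internal/unknown"
PARSE_ERROR = "x-internal/parse-error"


def load_action(context: str, content_type: str) -> str:
    """Return 'sniff', 'discard', or 'use-declared-type' for a load."""
    if context not in ("stylesheet", "iframe"):
        # The thread leaves other load types open, so refuse to guess.
        raise ValueError(f"no policy sketched for context {context!r}")

    if content_type == UNKNOWN:
        # No Content-Type header at all: sniffing is expected.
        return "sniff"

    if content_type == PARSE_ERROR:
        # Style sheets are picky; iframe loads treat garbage like absence.
        return "discard" if context == "stylesheet" else "sniff"

    return "use-declared-type"
```

Keeping the two sentinel types separate is what makes the stylesheet branch possible at all; with a single "unknown" value the two cases collapse into one.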
Whiteboard: [sg:low] data exfiltration from sites with bad HTTP content labeling → [sg:low] data exfiltration from sites with bad HTTP content labeling[necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5