Bogus content-type headers indistinguishable from absence of content-type header

NEW
Unassigned

Status

Core
Networking: HTTP
P5
normal
8 years ago
3 months ago

People

(Reporter: zwol, Unassigned)

Tracking

(Blocks: 1 bug, {sec-low})

Trunk
x86
Linux
sec-low
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:low] data exfiltration from sites with bad HTTP content labeling[necko-would-take])

(Reporter)

Description

8 years ago
nsIChannel::GetContentType appears to return the magic string "application/x-unknown-content-type" not only when there was no Content-Type header at all, but also when the header was unparseable (e.g. "Content/Type: */*", "Content-Type: bogus", or "Content-Type:") or when the server actually provided "application/x-unknown-content-type" as the header value.

For security reasons (see bug 524223 -- the instant concern is style sheets) I need to be able to reliably distinguish the total absence of a Content-Type header (which should trigger content sniffing) from a Content-Type header that was present but gobbledygook (which should, at least for style sheets, cause the load to be discarded).

Proposal:

We have three internal-use-only MIME types: application/x-unknown-content-type, application/x-vnd.mozilla.guess-from-ext, and application/x-view-source.  Move these to a new x-internal/ type group (so x-internal/unknown, x-internal/guess-from-ext, x-internal/view-source -- no need for double x- prefixes).  Add another such type, x-internal/parse-error.  Make the Content-Type header parser give back x-internal/parse-error whenever the Content-Type header is empty or nonsense.  If we see x-internal/anything from the server, map that to x-internal/parse-error as well.

Consumers of this information should normally treat x-internal/parse-error as equivalent to application/octet-stream, but it's possible that debugging extensions or equivalent might want to distinguish them.

Consumers should also treat failure of GetContentType() as equivalent to a result of x-internal/parse-error.
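
A minimal sketch of the mapping proposed above, written as standalone C++ rather than against the real necko code. Everything here is hypothetical: the function names ClassifyContentType and EffectiveType, the std::optional input (empty meaning "no Content-Type header at all"), and the x-internal/ names, which do not exist in the tree today. The character check is only loosely modelled on the restricted media type name characters and is an assumption, not what net_parseMediaType actually does.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical internal types from the proposal; not existing Gecko constants.
static const std::string kUnknown = "x-internal/unknown";
static const std::string kParseError = "x-internal/parse-error";

// Assumed notion of a well-formed type/subtype name: non-empty, made of
// alphanumerics plus a few punctuation characters. '*' is deliberately not
// allowed, so "*/*" counts as nonsense.
static bool IsName(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isalnum(c) && std::string("!#$&-^_.+").find(c) == std::string::npos)
      return false;
  }
  return true;
}

// header == std::nullopt means the response had no Content-Type header.
std::string ClassifyContentType(const std::optional<std::string>& header) {
  if (!header) return kUnknown;  // truly absent: callers may sniff

  // Keep only the media type; parameters such as "; charset=..." are ignored.
  std::string type = header->substr(0, header->find(';'));

  // Trim surrounding spaces and tabs; an empty value is a parse error.
  const std::size_t first = type.find_first_not_of(" \t");
  if (first == std::string::npos) return kParseError;
  const std::size_t last = type.find_last_not_of(" \t");
  type = type.substr(first, last - first + 1);

  // Media type names are case-insensitive, so normalize to lowercase.
  for (char& c : type) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

  // Require exactly "type/subtype" with well-formed names on both sides.
  const std::size_t slash = type.find('/');
  if (slash == std::string::npos || !IsName(type.substr(0, slash)) ||
      !IsName(type.substr(slash + 1)))
    return kParseError;

  // A server must never be able to hand us one of our own internal types.
  if (type.compare(0, 11, "x-internal/") == 0) return kParseError;

  return type;
}

// What an ordinary consumer would act on: parse errors look like octet-stream.
std::string EffectiveType(const std::string& classified) {
  return classified == kParseError ? "application/octet-stream" : classified;
}
```

With this shape, a missing header and a garbage header come back as different strings, and only consumers that care (style sheet loads, debugging extensions) need to look at the value before EffectiveType is applied.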

This is not technically a problem with the HTTP code -- my best guess at the proper location of the fix is netwerk/base/nsURLHelper.cpp:net_ParseMediaType -- but HTTP is, as far as I can tell, the only protocol we have that believes Content-Type headers sent by the server, so I'm filing it there.
(Reporter)

Updated

8 years ago
Whiteboard: [sg:low] data exfiltration from sites with bad HTTP content labeling
data: presumably has similar behavior, right?  Or does it end up falling back on text/plain or bailing out if the type is not parseable?

If all we cared about is HTTP you could look at the header value yourself, but I agree that it would be better to not create special-cases like that.

The proposal sounds fine to me, except the part about treating the parse error as octet-stream.  I'd need some data on what other UAs do for that; it could turn into a web compat issue.
(Reporter)

Comment 2

8 years ago
(In reply to comment #1)
> data: presumably has similar behavior, right?  Or does it end up falling back
> on text/plain or bailing out if the type is not parseable?

Dunno, will investigate.

> The proposal sounds fine to me, except the part about treating the parse error
> as octet-stream.  I'd need some data on what other UAs do for that; it could
> turn into a web compat issue.

There's a test page that serves CSS under a variety of content types at http://crypto.stanford.edu/~collinj/research/css/ and it shouldn't be that hard to extend to other stuff.  I get the impression that (post their equivalent of bug 524223 being fixed) other browsers are pickier than we are about malformed Content-Type.
Oh, I don't mean for CSS.  For CSS I'm happy to be picky and treat the bogus types as application/octet-stream here.  My concern is mostly full-page loads.
(Reporter)

Comment 4

8 years ago
http://tools.ietf.org/html/draft-abarth-mime-sniff-04 seems to consider unparseable content-type headers as equivalent to none at all, but it doesn't consider CSS at all, and is quite eager about falling back to text/plain or application/octet-stream.
Yeah, that's basically the algorithm for loads in <iframe>s.  For other types of loads, different sniffing rules need to apply.
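
To make the "different rules per load type" point concrete, here is a rough sketch of how a consumer might choose an action once absent and unparseable headers can be told apart. All names (HeaderState, LoadKind, Action, DecideAction) are hypothetical. The Document row follows the draft's treatment of an unparseable header as equivalent to none, which is an assumption pending the web compat data asked for above, and "use the declared type" is a simplification that skips any further checks on a valid type.

```cpp
// Three states the channel would need to expose (hypothetical).
enum class HeaderState { Absent, ParseError, Valid };

// Kinds of load discussed in this bug: style sheets vs. full-page/<iframe> loads.
enum class LoadKind { Stylesheet, Document };

enum class Action { Sniff, Discard, UseDeclaredType };

Action DecideAction(LoadKind kind, HeaderState state) {
  switch (state) {
    case HeaderState::Valid:
      // Simplification: a real consumer would still check the declared type.
      return Action::UseDeclaredType;
    case HeaderState::Absent:
      // No header at all: content sniffing is the expected fallback.
      return Action::Sniff;
    case HeaderState::ParseError:
      // Style sheets are picky and drop the load; documents (assumed here)
      // treat a garbage header like a missing one.
      return kind == LoadKind::Stylesheet ? Action::Discard : Action::Sniff;
  }
  return Action::Discard;  // not reached; keeps compilers quiet
}
```

The only cell where the two load kinds differ is ParseError, which is exactly the case GetContentType cannot currently report.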
Keywords: sec-low
(Reporter)

Updated

5 years ago
Blocks: 808593
Whiteboard: [sg:low] data exfiltration from sites with bad HTTP content labeling → [sg:low] data exfiltration from sites with bad HTTP content labeling[necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5