986924 - binary gzip content-encoded contents sent as text/plain shows as gibberish

Reporter

Description

11 years ago

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release)
Build ID: 20140314220517

Steps to reproduce:
Go to http://popcorn.cdnjd.com/ and click "download".

Actual results:
It opens a new tab and displays the binary file as plain text.

Expected results:
Handle it like other archives: ask whether to open or save.

moumny

Reporter

Comment 1

11 years ago

Reproduced on ubuntu 13.10 64-bits.

Component: Untriaged → File Handling

Josh Matthews [:jdm]

Comment 2

11 years ago

Do any other browsers handle this differently? This looks like a server error, since there's a text/plain Content-Type header being sent.

Assignee: nobody → english-us

Component: File Handling → English US

Product: Firefox → Tech Evangelism

Version: 28 Branch → unspecified

moumny

Reporter

Comment 3

11 years ago

Yes, Chrome downloads the file as you would expect.

moumny

Reporter

Comment 4

11 years ago

Safari works too. Best regards.

Josh Matthews [:jdm]

Comment 5

11 years ago

Huh. Chrome appears to be sending the same request headers and receiving the same response headers as us, but apparently it chooses to interpret the result internally as application/x-gzip. That's interesting.

Assignee: english-us → nobody

Component: English US → File Handling

Product: Tech Evangelism → Core

Josh Matthews [:jdm]

Comment 6

11 years ago

Chromium has code that sniffs out gzip headers: https://code.google.com/p/chromium/codesearch#chromium/src/net/base/mime_sniffer.cc&q=application/x-gzip&sq=package:chromium&dr=C&l=155. We have similar code in nsUnknownDecoder, but don't include such an entry: http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#266
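For illustration, a gzip stream is recognizable from its first bytes: the magic number 1F 8B followed by the compression-method byte 08 (deflate), the same 1f 8b 08 prefix reported in comment 13. The following is a minimal C++ sketch of a magic-number check of the kind a sniffer table entry performs; LooksLikeGzip is a hypothetical name and is not taken from Chromium or from nsUnknownDecoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from Firefox or Chromium): true if the buffer
// begins with the gzip magic number 0x1F 0x8B followed by compression
// method 0x08 (deflate).
bool LooksLikeGzip(const std::uint8_t* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  static const std::uint8_t kGzipMagic[] = {0x1F, 0x8B, 0x08};
  if (len < sizeof(kGzipMagic)) {
    return false;  // Too short to decide.
  }
  for (std::size_t i = 0; i < sizeof(kGzipMagic); ++i) {
    if (data[i] != kGzipMagic[i]) {
      return false;
    }
  }
  return true;
}
```

Applied to the start of the body served in comment 13, this check would return true.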

Component: File Handling → Networking

Daniel Stenberg [:bagder]

Comment 7

11 years ago

Not only that. The sniffing part of nsUnknownDecoder::DetermineContentType() only runs if mContentType is empty, which in the case shown here it isn't. See: http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#301 The server says it is text/plain while in reality it is gzip-compressed content. Which behavior is "correct" here is hard to tell.

Daniel Stenberg [:bagder]

Comment 8

11 years ago

It can be noted that if we followed the simple guidelines in the WHATWG MIME Sniffing Standard on how to detect binary vs. text (http://mimesniff.spec.whatwg.org/#binary-data-byte), we would only need to read the first byte (0x1f) to determine that this is a binary response... My vote is that we start sniffing for binary in text/* responses. I'm sure it'll bring some other "interesting" side effects, though, and I'm not in a position to say what they are or whether they're worth this extra detection.
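The check referred to above is cheap. Below is a minimal C++ sketch of the spec's definition of a binary data byte, plus a scan over a buffer; both function names are illustrative, not Gecko's. As the comment says, the first byte of a gzip stream (0x1F) already qualifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Binary data byte as defined by the MIME Sniffing Standard:
// 0x00-0x08, 0x0B, 0x0E-0x1A, or 0x1C-0x1F. Tab (0x09), LF (0x0A),
// FF (0x0C), CR (0x0D) and ESC (0x1B) are not binary data bytes.
bool IsBinaryDataByte(std::uint8_t b) {
  return b <= 0x08 || b == 0x0B || (b >= 0x0E && b <= 0x1A) ||
         (b >= 0x1C && b <= 0x1F);
}

// Hypothetical helper: true if any byte in the buffer is a binary data byte.
bool ContainsBinaryDataByte(const std::uint8_t* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  for (std::size_t i = 0; i < len; ++i) {
    if (IsBinaryDataByte(data[i])) {
      return true;
    }
  }
  return false;
}
```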

Masatoshi Kimura [:emk]

Comment 9

11 years ago

Please search for "check-for-apache-bug flag" in the mimesniff spec.
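For context: in the spec, the check-for-apache-bug flag is set when an HTTP response's Content-Type is exactly one of a few text/plain values (the name refers to Apache's habit of labelling files as text/plain by default), and a set flag routes the response through the text-or-binary rules instead of trusting the label. The following C++ sketch is a simplified reading of the first steps of the spec's MIME type sniffing algorithm, not Gecko code; SniffRule, ChecksForApacheBug and ChooseSniffRule are hypothetical names, and an empty string stands in for "no supplied type". Contrast it with comment 7: nsUnknownDecoder, as described there, only sniffs when the type is empty, which corresponds to the first branch alone.

```cpp
#include <cassert>
#include <string>

// Which branch of the (simplified) sniffing algorithm applies.
enum class SniffRule {
  kIdentifyUnknown,  // no usable supplied type: full content sniffing
  kTrustSupplied,    // no-sniff flag set: keep the supplied type
  kTextOrBinary,     // check-for-apache-bug flag set
  kOther             // later steps (XML/HTML, images, audio/video, ...)
};

// True for the exact Content-Type values that set the spec's
// check-for-apache-bug flag (byte-for-byte comparison).
bool ChecksForApacheBug(const std::string& contentType) {
  return contentType == "text/plain" ||
         contentType == "text/plain; charset=ISO-8859-1" ||
         contentType == "text/plain; charset=iso-8859-1" ||
         contentType == "text/plain; charset=UTF-8";
}

// Simplified ordering of the first steps. The real algorithm compares the
// parsed MIME type's essence rather than the raw header string.
SniffRule ChooseSniffRule(const std::string& contentType, bool noSniff) {
  if (contentType.empty() || contentType == "unknown/unknown" ||
      contentType == "application/unknown" || contentType == "*/*") {
    return SniffRule::kIdentifyUnknown;
  }
  if (noSniff) {
    return SniffRule::kTrustSupplied;
  }
  if (ChecksForApacheBug(contentType)) {
    return SniffRule::kTextOrBinary;
  }
  return SniffRule::kOther;
}
```

With the headers from comment 13 (Content-Type: text/plain, no nosniff), this sketch selects the text-or-binary branch.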

Masatoshi Kimura [:emk]

Updated

11 years ago

Blocks: mimesniff

vulcain

Comment 10

11 years ago

"Platform: x86 Mac OS X": not only. Same problem on Ubuntu 12.04 (Linux x86_64).

moumny

Reporter

Comment 11

11 years ago

The other problem with opening a binary file as plain text is that it can crash or hang the browser. It happened to me twice (on a machine with 4 GB of RAM). While a power user can easily work around the problem, most users will just be confused or annoyed. So my vote goes for sniffing for binary.

Masatoshi Kimura [:emk]

Updated

11 years ago

OS: Mac OS X → All

Hardware: x86 → All

Gordon P. Hemsley [:GPHemsley]

Comment 12

11 years ago

I can't reproduce this on Aurora 29 on Ubuntu 10.13; I get prompted to download a file of "unknown" type. However, if the resource is reported as 'text/plain', then the fact that it has binary data bytes should cause it to be sniffed as 'application/octet-stream', per the following algorithms:
http://mimesniff.spec.whatwg.org/#supplied-mime-type-detection-algorithm
http://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm
http://mimesniff.spec.whatwg.org/#rules-for-text-or-binary
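The last of those algorithms is short enough to sketch. Below is a C++ reading of the spec's rules for distinguishing whether a resource is text or binary, applied to the first bytes of the body (the spec's "resource header"). TextOrBinary is a hypothetical name; this is a sketch of the spec text as we understand it, not Gecko's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Binary data byte per the MIME Sniffing Standard.
static bool IsBinaryDataByte(std::uint8_t b) {
  return b <= 0x08 || b == 0x0B || (b >= 0x0E && b <= 0x1A) ||
         (b >= 0x1C && b <= 0x1F);
}

// Sketch of the rules for distinguishing if a resource is text or binary.
std::string TextOrBinary(const std::vector<std::uint8_t>& header) {
  const std::size_t n = header.size();
  // A UTF-16BE or UTF-16LE byte order mark means text, even though the
  // UTF-16 code units that follow may contain 0x00 bytes.
  if (n >= 2 && ((header[0] == 0xFE && header[1] == 0xFF) ||
                 (header[0] == 0xFF && header[1] == 0xFE))) {
    return "text/plain";
  }
  // A UTF-8 byte order mark also means text.
  if (n >= 3 && header[0] == 0xEF && header[1] == 0xBB &&
      header[2] == 0xBF) {
    return "text/plain";
  }
  // Otherwise any binary data byte makes the resource binary.
  for (std::uint8_t b : header) {
    if (IsBinaryDataByte(b)) {
      return "application/octet-stream";
    }
  }
  return "text/plain";
}
```

A body starting with the gzip prefix 1f 8b 08 comes out as application/octet-stream, which is the outcome this comment says the spec calls for.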

Daniel Stenberg [:bagder]

Comment 13

11 years ago

The original URL doesn't work anymore, but I can shed some further light on this behavior. I've set up a test URL (http://daniel.haxx.se/dump2.cgi) that serves only the beginning of the file the original URL provided: ~2K out of the original 49MB. I hope it mimics the original problem closely enough.

The response headers it sends are these:

HTTP/1.1 200 OK
Date: Thu, 03 Apr 2014 09:57:50 GMT
Server: Apache/2.4.6 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain

The beginning of the response body contains the three magic bytes 1f 8b 08 that identify it as gzip data, but note that nothing says it is. Using the Firefox network tools to inspect the response headers, it claims there's a "Content-Encoding: gzip" header (but that wasn't actually present over the wire). I figure that has been sniffed (and added) somewhere earlier in the function call chain. I'm afraid I don't yet know exactly where that's done.

Then, in nsBinaryDetector::DetermineContentType(), the code tries to determine whether the content is truly text/plain or possibly binary. That check is *aborted* if Content-Encoding is set! If I just edit out that check (http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#626), the function successfully detects the contents as binary and pops up a dialogue offering to download it...

Masatoshi Kimura [:emk]

Comment 14

11 years ago

(In reply to Daniel Stenberg [:bagder] from comment #13)
> Vary: Accept-Encoding

> Using Firefox network tools to inspect the response headers, it claims
> there's a "Content-Encoding: gzip" header (but that wasn't actually present
> over the wire).

How did you inspect the response header on the wire? "Vary: Accept-Encoding" means that the response will vary depending on the Accept-Encoding header. Did you send "Accept-Encoding: gzip" on the request?

Daniel Stenberg [:bagder]

Comment 15

11 years ago

Argh. Sorry, I messed up. The headers on the wire do indeed include "Content-Encoding: gzip"; I looked at the wrong request. This is the request:

GET /dump2.cgi HTTP/1.1
Host: daniel.haxx.se
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

which actually gets this response:

HTTP/1.1 200 OK
Date: Thu, 03 Apr 2014 11:07:45 GMT
Server: Apache/2.4.6 (Debian)
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 2787
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/plain

The CGI on my server just does this, so I'm a bit surprised it delivers a Content-Length:

#!/bin/sh
echo "Content-Type: text/plain"
echo ""
cat file

So yes, the server does indeed say this is Content-Encoding gzip and text/plain.

Daniel Stenberg [:bagder]

Updated

11 years ago

Assignee: nobody → daniel

Daniel Stenberg [:bagder]

Comment 16

11 years ago

What's missing here is that the main sniffing is done _before_ the decompressing of the content. The decompressed content is delivered without sniffing (except for trying to figure out charset) and the content-type is trusted.
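To illustrate why the order matters: the raw body of any gzip-encoded response starts with 1F 8B, so sniffing it before decompression would flag every compressed text/plain response as binary (presumably why the check mentioned in comment 13 bails out when Content-Encoding is set). A binary check only makes sense on the decoded bytes. The C++ sketch below assumes a caller-supplied decode function standing in for the real decompressor; ShouldTreatAsBinary and the decoder stand-ins in the usage are hypothetical, and only the binary-data-byte part of the spec's text-or-binary rule is applied.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Binary data byte per the MIME Sniffing Standard.
static bool IsBinaryDataByte(std::uint8_t b) {
  return b <= 0x08 || b == 0x0B || (b >= 0x0E && b <= 0x1A) ||
         (b >= 0x1C && b <= 0x1F);
}

// Hypothetical pipeline step: decide whether a body labelled text/plain is
// really binary by sniffing it *after* the Content-Encoding is removed.
// `decode` stands in for the real decompressor; pass an identity function
// when there is no Content-Encoding. BOM handling is omitted for brevity.
bool ShouldTreatAsBinary(const Bytes& wireBody,
                         const std::function<Bytes(const Bytes&)>& decode) {
  const Bytes decoded = decode(wireBody);
  for (std::uint8_t b : decoded) {
    if (IsBinaryDataByte(b)) {
      return true;
    }
  }
  return false;
}
```

In the reported case the decoded body would be a tar archive, whose headers are padded with 0x00 bytes, so sniffing after decompression should flag it as binary, while a genuinely textual gzip-encoded response would still pass as text.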

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Daniel Stenberg [:bagder]

Updated

11 years ago

Summary: Firefox considers .tgz archive as plain text → binary gzip content-encoded contents sent as text/plain shows as gibberish

Daniel Stenberg [:bagder]

Comment 17

11 years ago

Continued: the stream parser decompresses the content and calls nsHTTPCompressConv::do_OnDataAvailable() for each decompressed chunk. This is used for all sorts of data and not just HTML or text to render, so we can sniff there. This function then calls mListener->OnDataAvailable() to deliver the data. In our problematic case, that function is nsHtml5StreamParser::DoDataAvailable(), which calls nsHtml5StreamParser::SniffStreamBytes(), where I've played with adding detection logic for binary contents as below.

1 - I think the detection isn't good enough for UTF-16 contents
2 - I don't know what to do if we truly detect that the contents are binary and not text at all!

--- a/parser/html/nsHtml5StreamParser.cpp
+++ b/parser/html/nsHtml5StreamParser.cpp
@@ -737,10 +737,21 @@ nsHtml5StreamParser::SniffStreamBytes(const uint8_t* aFromSegment,
         mTreeBuilder->SetDocumentCharset(mCharset, mCharsetSource);
         return SetupDecodingAndWriteSniffingBufferAndCurrentSegment(
           aFromSegment, aCount, aWriteCount);
       }
     }
+    else if (mMode == PLAIN_TEXT) {
+      uint32_t i;
+      for(i=0; i<countToSniffingLimit; i++) {
+        if(!aFromSegment[i]) {
+          fprintf(stderr, "***************** found zero at index %u\n", i);
+          break;
+        }
+      }
+    }
   if (mCharsetSource == kCharsetFromParentForced ||
       mCharsetSource == kCharsetFromUserForced) {
     // meta not found, honor override
     return SetupDecodingAndWriteSniffingBufferAndCurrentSegment(
       aFromSegment, aCount, aWriteCount);
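The experimental loop in that patch amounts to the standalone check below. This is a hedged C++17 restatement for discussion, not code from the tree; FindFirstNul and its parameters are hypothetical. It also makes point 1 concrete: ASCII text encoded as UTF-16LE has a 0x00 byte in every other position, so a bare NUL scan calls it binary, and the spec's text-or-binary rule avoids that only when a byte order mark is present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Return the index of the first 0x00 byte within the first `limit` bytes
// of the buffer, or std::nullopt if there is none. Clamping the scan to
// `len` keeps it inside the buffer.
std::optional<std::size_t> FindFirstNul(const std::uint8_t* data,
                                        std::size_t len, std::size_t limit) {
  assert(data != nullptr || len == 0);
  const std::size_t end = limit < len ? limit : len;
  for (std::size_t i = 0; i < end; ++i) {
    if (data[i] == 0) {
      return i;
    }
  }
  return std::nullopt;
}
```

For example, the two characters "AB" in UTF-16LE (41 00 42 00) report a NUL at index 1, exactly like genuinely binary data would.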

Daniel Stenberg [:bagder]

Comment 18

11 years ago

This bug is set to be blocking bug 808593, but possibly it is the other way around and this bug would not even exist if bug 808593 was implemented...

Masatoshi Kimura [:emk]

Comment 19

11 years ago

Please implement the algorithm from the MIME Sniffing Standard [1] instead of inventing yet another algorithm of our own.

[1] http://mimesniff.spec.whatwg.org/#sniffing-a-mislabeled-binary-resource

Daniel Stenberg [:bagder]

Comment 20

11 years ago

1 - I didn't invent a new algorithm; I was testing where I could detect binary.
2 - See bug 808593.

Patrick McManus [:mcmanus]

Updated

9 years ago

Whiteboard: [necko-backlog]

Firefox Bug Husbandry Bot

Comment 21

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P1

Firefox Bug Husbandry Bot

Comment 22

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: P1 → P3

Anne (:annevk)

Comment 23

7 years ago

Possible duplicate of bug 864851?

Daniel Stenberg [:bagder]

Updated

6 years ago

Assignee: daniel → nobody

Status: ASSIGNED → NEW

BMO Automation

Updated

2 years ago

Severity: normal → S3