Open
Bug 986924
Opened 11 years ago
Updated 2 years ago
binary gzip content-encoded contents sent as text/plain shows as gibberish
Categories: Core :: Networking, defect, P3
Status: NEW
People: Reporter: moumny, Unassigned
References: Blocks 1 open bug
Details: Whiteboard: [necko-backlog]
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release)
Build ID: 20140314220517
Steps to reproduce:
go to
http://popcorn.cdnjd.com/
click download
Actual results:
it opens new tab
displays binary as plain text
Expected results:
handle like other archives: ask/open/save
Reproduced on ubuntu 13.10 64-bits.
Component: Untriaged → File Handling
Comment 2•11 years ago
Do any other browsers handle this differently? This looks like a server error, since there's a text/plain Content-Type header being sent.
Assignee: nobody → english-us
Component: File Handling → English US
Product: Firefox → Tech Evangelism
Version: 28 Branch → unspecified
Comment 5•11 years ago
Huh. Chrome appears to be sending the same request headers and receiving the same response headers as us, but apparently it chooses to interpret the result internally as application/x-gzip. That's interesting.
Assignee: english-us → nobody
Component: English US → File Handling
Product: Tech Evangelism → Core
Comment 6•11 years ago
Chromium has code that sniffs out gzip headers: https://code.google.com/p/chromium/codesearch#chromium/src/net/base/mime_sniffer.cc&q=application/x-gzip&sq=package:chromium&dr=C&l=155. We have similar code in nsUnknownDecoder, but don't include such an entry: http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#266
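As a rough illustration of what "such an entry" could look like, here is a minimal sketch of a magic-number table with a gzip row. The `MagicEntry` struct, `kMagicEntries` table, and `SniffMagic` function are hypothetical names made up for this example; they are not the actual nsUnknownDecoder or Chromium data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical table entry type; not the actual nsUnknownDecoder structure.
struct MagicEntry {
  const char* bytes;     // leading bytes to match
  size_t length;         // number of bytes to compare
  const char* mimeType;  // type to report on a match
};

// The gzip row uses the bytes 1f 8b 08 reported in comment 13.
static const MagicEntry kMagicEntries[] = {
    {"\x1f\x8b\x08", 3, "application/x-gzip"},
};

// Returns the matching MIME type, or nullptr if no entry matches.
const char* SniffMagic(const uint8_t* data, size_t len) {
  for (const MagicEntry& entry : kMagicEntries) {
    if (len >= entry.length &&
        std::memcmp(data, entry.bytes, entry.length) == 0) {
      return entry.mimeType;
    }
  }
  return nullptr;
}
```

Note that a table like this only helps if the sniffer actually runs for the response in question.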
Component: File Handling → Networking
Comment 7•11 years ago
Not only that. The sniffing part of nsUnknownDecoder::DetermineContentType() only runs if mContentType is empty, which it isn't in the case shown here. See: http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#301
The server says it is text/plain while in reality it is gzip compressed content.
Which behavior is "correct" here is hard to tell.
Comment 8•11 years ago
It can be noted that if we followed the simple guidelines in the mime sniffing document on how to detect binary vs text (http://mimesniff.spec.whatwg.org/#binary-data-byte), we would only need to read the first byte (0x1f) to determine that this is a binary response...
My vote is that we start sniffing for binary in text/* responses. I'm sure it'll bring some other "interesting" side-effects though and I'm not in a position to say what they are and if they're worth this extra detection.
Comment 9•11 years ago
Please search for "check-for-apache-bug flag" in the mimesniff spec.
Comment 10•11 years ago
" Platform: x86 Mac OS X "
Not only: same problem on Ubuntu 12.04 (Linux x86_64)
Comment 11•11 years ago (Reporter)
The other problem with opening a binary file as plain text is that it can crash/hang the browser. It happened to me twice (4 GB RAM).
While a power user can easily circumvent the problems, most users will just be confused/annoyed. So I think my vote goes for sniffing for binary.
Updated•11 years ago
OS: Mac OS X → All
Hardware: x86 → All
Comment 12•11 years ago
I can't reproduce this on Aurora 29 on Ubuntu 10.13; I get prompted to download a file of "unknown" type.
However, if the resource is reported as 'text/plain', then the fact that it has binary data bytes should cause it to be sniffed as 'application/octet-stream', per the following algorithms:
http://mimesniff.spec.whatwg.org/#supplied-mime-type-detection-algorithm
http://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm
http://mimesniff.spec.whatwg.org/#rules-for-text-or-binary
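For reference, the text-or-binary rules linked above can be sketched roughly as follows. This is a simplified reading of the standard, not Gecko code: `IsBinaryDataByte` and `SniffTextOrBinary` are hypothetical names, and the caller is assumed to pass only the leading bytes of the resource that the standard calls the resource header.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Binary data bytes as listed in the MIME Sniffing Standard:
// 0x00-0x08, 0x0B, 0x0E-0x1A, 0x1C-0x1F.
bool IsBinaryDataByte(uint8_t b) {
  return b <= 0x08 || b == 0x0B || (b >= 0x0E && b <= 0x1A) ||
         (b >= 0x1C && b <= 0x1F);
}

// Hypothetical helper: classify the leading bytes of a resource that was
// labeled text/plain as either text/plain or application/octet-stream.
std::string SniffTextOrBinary(const uint8_t* data, size_t len) {
  // A UTF-16 byte order mark (big or little endian) means text.
  if (len >= 2 && ((data[0] == 0xFE && data[1] == 0xFF) ||
                   (data[0] == 0xFF && data[1] == 0xFE))) {
    return "text/plain";
  }
  // A UTF-8 byte order mark means text.
  if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
    return "text/plain";
  }
  // Any binary data byte means the resource is treated as binary.
  for (size_t i = 0; i < len; ++i) {
    if (IsBinaryDataByte(data[i])) {
      return "application/octet-stream";
    }
  }
  return "text/plain";
}
```

With a gzip body, the very first byte (0x1f) is already a binary data byte, so the loop returns on its first iteration.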
Comment 13•11 years ago
The original URL doesn't work anymore, but I have some further details to add about this behavior. I've set up a test URL (http://daniel.haxx.se/dump2.cgi) that only serves the beginning of the file the original URL provided: ~2K out of the original 49MB. I hope that I mimic the original problem closely enough here.
The response headers it sends are these:
HTTP/1.1 200 OK
Date: Thu, 03 Apr 2014 09:57:50 GMT
Server: Apache/2.4.6 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain
The beginning of the response body contains the three magic bytes 1f 8b 08 that can identify it as gzip data, but note that there's nothing that says it is.
Using Firefox network tools to inspect the response headers, it claims there's a "Content-Encoding: gzip" header (but that wasn't actually present over the wire). I figure that has been sniffed (and added) somewhere earlier in the function call chain. I'm afraid I don't know yet exactly where that's done.
Then, in nsBinaryDetector::DetermineContentType() the code tries to determine if the content is truly text/plain or possibly binary. That check is *aborted* if Content-Encoding is set! If I just edit out that check (http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#626) the function will successfully detect the contents as binary and pop up a dialogue with an offer to download it...
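The effect of that bail-out can be shown with a small sketch. This is not the Gecko code: `ResponseInfo`, `ShouldSniffForBinary`, and the `bailOnContentEncoding` switch are hypothetical names, and the condition is reduced to the two headers discussed in this bug.

```cpp
#include <string>

// Hypothetical, simplified view of the response headers involved.
struct ResponseInfo {
  std::string contentType;      // e.g. "text/plain"
  std::string contentEncoding;  // empty when the header is absent
};

// Decides whether the text-vs-binary check runs at all.
// bailOnContentEncoding = true models the behavior described above;
// false models the experiment of editing the check out.
bool ShouldSniffForBinary(const ResponseInfo& response,
                          bool bailOnContentEncoding) {
  if (response.contentType != "text/plain") {
    return false;  // assumption: only text/plain responses are re-examined
  }
  if (bailOnContentEncoding && !response.contentEncoding.empty()) {
    return false;  // the check is aborted for content-encoded responses
  }
  return true;
}
```

For a response with "Content-Type: text/plain" and "Content-Encoding: gzip", the first variant never inspects the body, while the second does.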
Comment 14•11 years ago
(In reply to Daniel Stenberg [:bagder] from comment #13)
> Vary: Accept-Encoding
> Using Firefox network tools to inspect the response headers, it claims
> there's a "Content-Encoding: gzip" header (but that wasn't actually present
> over the wire).
How did you inspect the response header on the wire? "Vary: Accept-Encoding" means that the response will vary depending on the Accept-Encoding header. Did you send "Accept-Encoding: gzip" on the request?
Comment 15•11 years ago
Argh. Sorry, I messed up. The headers on the wire do indeed include "Content-Encoding: gzip"; I looked at the wrong request. This is the request:
GET /dump2.cgi HTTP/1.1
Host: daniel.haxx.se
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
which actually gets this response:
HTTP/1.1 200 OK
Date: Thu, 03 Apr 2014 11:07:45 GMT
Server: Apache/2.4.6 (Debian)
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 2787
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/plain
The cgi on my server just does this, so I'm a bit surprised it delivers a Content-Length:
#!/bin/sh
echo "Content-Type: text/plain"
echo ""
cat file
So, yeah the server does indeed say this is Content-Encoding gzip and text/plain.
Updated•11 years ago
Assignee: nobody → daniel
Comment 16•11 years ago
What's missing here is that the main sniffing is done _before_ the decompressing of the content. The decompressed content is delivered without sniffing (except for trying to figure out charset) and the content-type is trusted.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Updated•11 years ago
Summary: Firefox considers .tgz archive as plain text → binary gzip content-encoded contents sent as text/plain shows as gibberish
Comment 17•11 years ago
Continued: so the stream parser decompresses the content and calls nsHTTPCompressConv::do_OnDataAvailable() for each decompressed chunk. This is used for all sorts of data and not just HTML or text to render, so we can sniff there. This function then calls mListener->OnDataAvailable() to deliver the data.
In our problematic case, that function is nsHtml5StreamParser::DoDataAvailable(), which calls nsHtml5StreamParser::SniffStreamBytes(), in which I've played with adding detection logic for binary contents like below.
1 - I think the detection isn't good enough for UTF-16 contents
2 - I don't know what to do if we truly detect that the content is binary and not text at all!
--- a/parser/html/nsHtml5StreamParser.cpp
+++ b/parser/html/nsHtml5StreamParser.cpp
@@ -737,10 +737,21 @@ nsHtml5StreamParser::SniffStreamBytes(const uint8_t* aFromSegment,
         mTreeBuilder->SetDocumentCharset(mCharset, mCharsetSource);
         return SetupDecodingAndWriteSniffingBufferAndCurrentSegment(
             aFromSegment, aCount, aWriteCount);
       }
     }
+    else if (mMode == PLAIN_TEXT) {
+      uint32_t i;
+      for(i=0; i<countToSniffingLimit; i++) {
+        if(!aFromSegment[i]) {
+          fprintf(stderr, "***************** found zero at index %u\n", i);
+          break;
+        }
+      }
+    }
     if (mCharsetSource == kCharsetFromParentForced ||
         mCharsetSource == kCharsetFromUserForced) {
       // meta not found, honor override
       return SetupDecodingAndWriteSniffingBufferAndCurrentSegment(
           aFromSegment, aCount, aWriteCount);
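To illustrate the UTF-16 concern in point 1: a plain scan for zero bytes flags ordinary UTF-16 text, because ASCII characters encoded as UTF-16 carry a 0x00 byte each. A minimal sketch follows, with hypothetical helper names that are not part of the patch above; it assumes that checking for a byte order mark first is an acceptable way to exempt UTF-16.

```cpp
#include <cstddef>
#include <cstdint>

// Same idea as the experimental loop in the patch: look for a zero byte.
bool HasNulByte(const uint8_t* data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    if (data[i] == 0) {
      return true;
    }
  }
  return false;
}

// True if the buffer starts with a UTF-16 byte order mark (BE or LE).
bool HasUtf16Bom(const uint8_t* data, size_t len) {
  return len >= 2 && ((data[0] == 0xFE && data[1] == 0xFF) ||
                      (data[0] == 0xFF && data[1] == 0xFE));
}

// Hypothetical refinement: only treat a zero byte as a sign of binary
// content when the buffer does not announce itself as UTF-16.
bool LooksBinary(const uint8_t* data, size_t len) {
  return !HasUtf16Bom(data, len) && HasNulByte(data, len);
}
```

UTF-16 text without a byte order mark would still be misclassified by this sketch, so it narrows the problem rather than solving it.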
Comment 18•11 years ago
This bug is set to be blocking bug 808593, but possibly it is the other way around and this bug would not even exist if bug 808593 was implemented...
Comment 19•11 years ago
Please implement the algorithm from the MIME Sniffing Standard [1] instead of inventing yet another algorithm of our own.
[1] http://mimesniff.spec.whatwg.org/#sniffing-a-mislabeled-binary-resource
Comment 20•11 years ago
1 - I didn't invent a new algorithm, I was testing where I could detect binary.
2 - See bug 808593
Updated•9 years ago
Whiteboard: [necko-backlog]
Comment 21•7 years ago
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Comment 22•7 years ago
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Comment 23•7 years ago
Possible duplicate of bug 864851?
Updated•6 years ago
Assignee: daniel → nobody
Status: ASSIGNED → NEW
Updated•2 years ago
Severity: normal → S3