Closed Bug 220807 Opened 21 years ago Closed 21 years ago

[FIX]Prompt user about invalid text/plain content - solving most incorrect MIME type issues

Categories

(Core Graveyard :: File Handling, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.7alpha

People

(Reporter: email, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 7 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20030923 Firebird/0.7+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6a) Gecko/20030923 Firebird/0.7+

When Firebird receives content for which the server has set the MIME type "text/plain", but the data contains invalid ASCII control characters (as can be identified from IETF RFC 2046), Firebird should prompt the user as to what to do with the data rather than attempt to display it immediately. This behaviour is intended to avoid the common problem of attempting to download binary data from mis-configured web servers, and solves it *without* breaking Firebird's compliance with MIME type handling.

The content of a "text/plain" document, according to IETF RFC 2046 (Section 4.1.2):

  ... Note that the control characters including DEL (0-31, 127) have no
  defined meaning apart from the combination CRLF (US-ASCII values 13 and 10)
  indicating a new line. Two of the characters have de facto meanings in wide
  use: FF (12) ... and TAB or HT (9) ... Aside from these conventions, any use
  of the control characters or DEL in a body must either occur (1) because a
  subtype of text other than "plain" specifically assigns some additional
  meaning, or (2) within the context of a private agreement between the sender
  and recipient. Such private agreements are discouraged and should be replaced
  by the other capabilities of this document. [RFC 2046]

Thus, the detection can be made when a stream is sent with the "text/plain" MIME type, yet its first few bytes contain one or more ASCII values in the range 0-8 or 14-31 (or, possibly, 127 -- but I'm sure some codepages use it when they shouldn't).

Reproducible: Always

Steps to Reproduce:
1. Find a website hosting, e.g., a .rar file on a server which has no MIME type associated with .rar files.
2. Click on the link.

Actual Results:
Firebird starts displaying the erroneous text/plain stream in the browser -- the user sees complete garbage on their screen.

Expected Results:
(The server doesn't recognise the file type, so it typically claims the file is "text/plain".) This stream could be identified (from the first few bytes) as containing invalid text/plain characters (in accordance with IETF RFC 2046), and the end user could be given this information and prompted as to what to do with the file -- e.g. save it to disk (or display it anyway). I believe the FTP component of Firebird has a mechanism to detect binary data.

Example Problem:
An example where this is a common problem is .RAR archive files stored on a website where the webmaster has no direct control over MIME type association.

Alternatives:
Many veteran users say "Just right-click and Save As" or "Wait until the document loads fully, then save it all". The former is impossible if the download was accessed by filling in a form or as the result of a browser redirect. The latter is impractical to expect a novice user to understand and be satisfied with.

Background:
The problem is not with Firebird, but with mis-configured servers.
However, it is important to bear in mind that:
* most servers appear to default to text/plain for unknown file types
* new binary file types will be unknown
* these unknown files will be sent as text/plain
* in many cases, the webmaster cannot or will not directly control the MIME type association to correct this problem (either the site is hosted and they have no access, or they may feel the majority of their (IE) users have no problem, as other browsers use MIME-type detection)

Precedent:
It's a similar situation to the server sending random bytes while claiming they are "image/gif". You wouldn't expect the browser to display a load of rubbish; you'd expect it to notice that something's wrong and perhaps display a "broken image" icon and the ALT text instead. This proposal isn't all that dissimilar: Firebird should cope with invalid text/plain, and the way it should cope is by offering to save it instead -- this has the enormous benefit of also being a workaround for the many misconfigured servers out there.

Multi-lingual notes:
The default codepage for text/plain is "US-ASCII". Other languages may specify a different codepage or use UTF-8 encoding of Unicode. However, these facts make no difference to the interpretation of the control characters. I don't know whether, for example, Unicode UTF-16 would actually be valid for "text/plain". If it is, it would be easy to check for the FF FE (or FE FF) Unicode byte-order header.

Suggested UI behaviour:
If it appears likely that the stream is not "text/plain", the user should be prompted with a dialog similar to the one that would be displayed if the content were "application/octet-stream":

+------------------------------------
| You have chosen to open
|    filename.rar
| from: http://www.website.com/
|
| which has been sent as plain text, but it appears to be binary data.
|
| What should Mozilla Firebird do with this file?
|   (*) Display as plain text
|   ( ) Open with [UnRAR     |V]
|   ( ) Save to Disk
|
|   [ ] Do this automatically for files like this from now on.
|
|                        [OK]  [CANCEL]
+------------------------------------

The changes are the "Display as plain text" option, and that the "Open with" application could come from any applications on the user's system that are registered for that file extension.

The "Do this automatically" option is undoubtedly one for debate. It could be used to always force this behaviour for that file extension, or perhaps only for future false text/plain detections. The user could then manage their choices as additions in the Tools|Options|Downloads panel -- but, in this case, by managing from the file extension rather than the MIME type. (Aside: why doesn't Tools|Options|Downloads currently display extensions *instead of* MIME types?)

In short:
The problem of misconfigured servers sending binary data as text/plain is unlikely to go away for several years, and many people agree something needs to be done to make Firebird more usable in this respect. Since most binary files sent as text/plain are invalid for that format, Firebird could allow the user to choose what to do with these files. This is the best solution as far as I can see, and it does not violate the specs.
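[Editorial illustration -- not part of any attached patch.] A minimal standalone C++ sketch of the control-character test proposed above: scan the first bytes of the stream and treat the data as "probably not plain text" if it contains octets outside the allowed set. The function name and the idea of scanning only an initial window are assumptions made for this sketch.

  #include <cstddef>

  // Sketch only: returns true if the first `len` octets are consistent with
  // the RFC 2046 reading above -- printable characters and 8-bit bytes, plus
  // TAB, LF, VT, FF and CR.  NUL and the other C0 controls mark the data as
  // binary (DEL is tolerated, matching the reporter's caveat about codepages).
  static bool LooksLikePlainText(const unsigned char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
      const unsigned char ch = data[i];
      const bool allowed = (ch > 31) ||             // printable ASCII, high bytes
                           (ch >= 9 && ch <= 13);   // TAB, LF, VT, FF, CR
      if (!allowed) {
        return false;  // e.g. the NUL bytes near the start of RAR or ZIP data
      }
    }
    return true;
  }

The check that eventually landed lives in nsUnknownDecoder (see the review comments further down) and additionally tolerates ESC (27), so that escape-sequence based encodings are not misclassified.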
I just want to post to ask the developers to seriously consider this instead of just calling it another MIME-type bug and WONTFIX'ing it. This specific implementation does not technically deviate from W3C standards, could be added as something that can be enabled/disabled by user preference, and would give users a way to handle misconfigured web servers (which would make a *LOT* of people happy).
See also bug 57342, Add "View as Text" option for unknown mime content-type.
Also see Camino bug 221877. The only problem I foresee is that the Firebird proposal uses sniffing to tell the user that text/plain is erroneous. This appears to violate the standards, since mozilla.org has said we should trust the server (see bug 175848 and RFC 2068, section 7.2.1). A different approach is to have the dialog state that the file is indeed text/plain, so the server is trusted, but then give the user some options other than rendering in the window. It is a subtle change, but could be important for compliance.
How does the proposed approach handle text/plain content in non-ASCII encodings? Say in UTF-16?
Regarding Comment #3:
WRT calling it "content sniffing", I still believe the analogy of loading an erroneous image is valid; you don't expect a browser to display garbage on your screen if the server claims a file is an image but it doesn't actually contain an image header. I guess the only concern is *when* the start of the "text/plain" stream is checked. If you look at it as checking the first few bytes to see if the content can be displayed before actually displaying it, it really doesn't seem any different to the handling of images.

However, the idea of always prompting about text/plain is a very good alternative to the proposal. There aren't *that* many text/plain files that a user does want to see per day, but to prevent it annoying people, perhaps there can be a list of file extensions that the user always wants to be displayed in the browser (pre-installed with ".txt" etc.).

Regarding Comment #4:
I'm no expert, but I'm not even sure the specs I read are valid for UTF-16. Regardless, it (and UTF-32) are easily identified by the byte-order mark (BOM) at the start of the file (e.g. something like 'FF FE'). If UTF-16 is valid for text/plain, then the "invalid text/plain" check wouldn't be performed on a text/plain stream if it starts with a valid BOM. If UTF-16 isn't valid for text/plain, and the check on the first few bytes determined a stream was invalid text/plain, then the BOM could be checked for and, if present, the file could be displayed anyway.

Small correction to original post:
At the end of "Suggested UI behaviour" I was supposed to be asking, as an aside, "why *DOES* Tools|Options|Downloads currently display (file) extensions instead of MIME types?" This placement almost makes it appear as if those actions will be performed on files with those extensions irrespective of MIME type.
> it (and UTF-32) are easily identified by the byte-order mark (BOM)

that does not need to be there
Could I ask for a developer opinion on this one? There are quite a few votes, and it hasn't been WONTFIXed yet. However, there's also not been any response. Is this normal? (Sorry, I've only tracked a few bugs so far.)
The developers responsible for the MIME dispatch back end raised an issue with the proposed approach that has not been addressed (comment 4). At the same time, this bug is filed as a UI bug on the Firebird browser, not as a core bug, which means that it should only be resolved by a Firebird developer.
Can this solution be implemented only for text/plain documents in the ASCII encoding? Or could the solution still be implemented for all cases, just changing the wording of the dialog to say something like "... which has been sent as plain text, but appears to be either binary data or a non-ASCII encoding"? The user could still choose to handle it whichever way they prefer, and optionally choose to have future occurrences handled the same way.
>"... which has been sent as plain >text, but appears to be either binary data or non-ascii encoding." are you aware that there are very few languages for which ASCII suffices?
Now I'm just a common user, so I don't know all that much about this stuff, but in the situation where text/plain is being specified as the default MIME type because of a misconfigured server, isn't it normal that the encoding is either unspecified or us-ascii? If there is any other encoding or character set, wouldn't it be specifically identified in the header? This feature of detecting possibly mislabeled MIME types should still be possible in the cases of us-ascii or non-specified character sets. While using this only in the case of text/plain + us-ascii may not correct all of the occurrences, I think it would go a long way towards solving the problem.

Even if finding a safe way to determine whether a file might be binary is blocking this feature, I think it should still be implemented as an option. For example, if I did not want my web browser to handle text/plain content, but would rather such files open in my text editor, I should be able to do that either by selecting the default action the first time I navigate to such a file, or by changing the default behavior in the preferences dialog.
> For example, if I did not want my web browser to handle text/plain content

That's a separate issue from this bug...

The problem with comment 11 is the same as with the previous comments. Say I am serving up a Japanese plaintext document. I serve it up as text/plain. I do not send a charset header (usual operating procedure and all). It renders just fine in everyone's browsers because of charset autodetection. How do we avoid breaking this scenario?
According to this source: http://www.faqs.org/rfcs/rfc2046.html

"The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII."

and also:

"messages containing characters coded according to other versions of ISO 646 than US-ASCII and the 1991 IRV, or using code-switching procedures (e.g., those of ISO 2022), as well as 8bit or multiple octet character encodings MUST use an appropriate character set specification to be consistent with MIME."

So in the case of US-ASCII (or nothing specified, which is probably the most common case with a misconfigured server) we check the data to see if it is binary and prompt the user. In all other cases we can trust the character set specified in the header.
> According to this source: http://www.faqs.org/rfcs/rfc2046.html

Which is ignored by as many, if not more, people as the HTTP RFCs....
By the way, I think we're all agreed that the only documents that should get special treatment of any sort are text/plain documents coming in over HTTP with no charset specified. The remaining questions are:

1) What fraction of such documents are actually ASCII plaintext?
2) What fraction of such documents are actually random "binary" data?
3) What fraction of such documents are non-ASCII plaintext?

If the answers to these show that category #3 is not too big, we could try to make some changes to URILoader to handle this ("could" as in "I won't mind if someone does"). We'd have to put some thought into how the whole dialog-posing mess would work... That's sort of tied in to the general channel-suspendability stuff, unless we plan to force the dialog modal and hope for the best.
re comment 13: I seem to recall that HTTP specifies a default charset of ISO-8859-1...

> That's sort of tied in to the general channel-suspendability stuff

that needs to be fixed anyway...
If you serve a Japanese (or other non-ASCII) document as text/plain with no charset header, then you're relying on browser error-correction to properly render it. However, from a strict standards standpoint, charset "sniffing" is just as evil as content-type sniffing, isn't it?

On another tangent, is it proper by the standards to serve a data stream as "text/plain" if it intentionally uses ASCII control characters in their defined (though little used these days) meanings in the original 1960s-era ASCII standards? I remember from my '80s college days common use of "cute tricks" like carriage returns without linefeeds, backspaces, and the bell character (#7) to do gimmicky animations and special effects, for instance in "plan" files displayed from the "finger" command on timesharing computers. Then, once dialup BBSs became commonplace, they used lots of ANSI escape codes, but I guess those wouldn't count as "plain text".

I myself sometimes use the delimiter characters from #28 to #31 (FS, GS, RS, US) to separate fields in formatted data sent as the output of one program for the purpose of being processed by another; what MIME type would be appropriate for such a thing?
> However, from a strict standards standpoint, charset "sniffing" is just as evil

Sure. Which is why I'm willing to break it if the other case is a lot more common.

> what MIME type would be appropriate for such a thing?

Probably a text/x-something mimetype unless you register a subtype of text/.
I think that the problem of, for example, UTF-16 text formats is not that important. Since the problem usually occurs only on a very limited number of file types, the browser could be programmed to scan the text only for files with .rar, .ace and a few of the other problematic file types. The file would only be declared application/octet-stream if the extension AND the file sniffing AND the user interaction all suggest that the MIME type is specified wrongly. So you have:

MIME type                  | Action
===========================|====================================================
text/plain                 | display
octet stream               | save
text/plain with octet      | 'guessed' text -> display
extension (e.g. .rar)      | 'guessed' octet & user wants display -> display
                           | 'guessed' octet & user wants save -> save

So even .rar files containing UTF-16 text would be displayed properly, although that would require user interaction. I doubt there are many UTF-16-rar-plain-text files though; I even doubt there is one such file on the internet. And even that file would be handled correctly by the intelligent user.

In bug 126782, a detection method has been programmed in the fix (http://bugzilla.mozilla.org/attachment.cgi?id=135573&action=view). Could that one be used here?
> I think that the problem of, for example, UTF-16 text formats is not that
> important.

You're also very clearly not East Asian... Please don't impose your ethnocentric biases on the whole world. There are a lot more people to whom that problem is important than live in all of Europe.

I'd still like some numbers in response to my question in comment 15. Sweeping generalizations are not numbers.

> Since the problem usually occurs only on a very limited number of
> file types, the browser could be programmed only to scan the text for files
> with .rar, .ace and a few of the other problematic file types.

No. There will be no hardcoding of extensions in this code.

The bug 126782 mechanism is exactly what would be used here. That is, if we decide the "text/plain" type is "unreliable" we would simply run the unknown type detection code on it instead.
Boris Zbarsky,

With the hardcoding of extensions, UTF-16 would not be a big issue. I am not 'ethnocentrically biased'; that's why I made this comment. If you hardcode extensions, you wouldn't have to program all non-ASCII encodings and would still have a solution working perfectly. As I said, only plain-text .RAR files from Asia would require user interaction. I thought everyone could live with that, as I could even live with automatically declaring .rar files application/octet-stream (but since the developers won't, I won't bother with that).

Besides, if you refuse to hardcode anything, all files, e.g. .html, would have to be scanned in ASCII, UTF-7, UTF-8, UTF-16, etc. until a fitting encoding is found. I can't see why that would be a good and fast solution.
> Besides, if you refuse to hardcode anything

as I understood bz, the text/plain MIME type would be hardcoded. so .html files would not be scanned unless they were sent with that type.

let me add that I would love to mark this bug wontfix.
Dominik, why are you so focused on .rar files? This is a problem with a lot more than those...

The reason I refuse to hardcode extensions is that when a new extension appears (like .dmg, for example), none of the web servers will know about it, so they will send it as text/plain AND Mozilla will treat it as text/plain. Not useful, really. Yeah, we could play whack-a-mole and have older Mozillas not be able to deal with these new extensions, but that's not a good alternative in my mind.
Just to clarify: the "bug" described would work perfectly in correctly identifying invalid "text/plain". The text/plain MIME type is only supposed to be valid for ASCII-compatible octet streams (e.g. any ASCII codepage, or Unicode as UTF-8). The only issue we have is that some servers *incorrectly* serve non-ASCII-compatible text as text/plain (when it's supposed to be text/something-else) and with no "charset" part in the MIME type.

However, many other character encodings are actually upwardly compatible with ASCII, in that they don't use (at least most of) the ASCII control codes to encode characters. For example, if I understand Shift-JIS encoding properly, then just by accepting the escape character (27) as a permitted control character, Shift-JIS would also work with this proposed scheme. I believe many other character sets would too. So the issue is with non-ASCII-compatible character sets that make use of the octet values 0-8, 14-26, or 28-31. Which character sets are actually affected by this?

UTF-16 or UTF-32 *incorrectly* sent as text/plain would indeed use ASCII control codes, but I thought these encodings typically had a byte-order mark at the start of the file. If they don't, then what would the browser do with them anyway -- it should really display them as US-ASCII (and I can't see a manual "UTF-16" encoding option in Firebird anyway).

The real question is: how often is this actually a problem?
- Plain text files are only a small fraction of what most users view
- then take the fraction of those files in a charset that uses the octet values {0-8, 14-26, 28-31} (or has no BOM, if UTF-16/32)
- then the fraction of those files incorrectly sent as "text/plain" and with no "charset" header

If it's quite rare, then would it really be a problem if the user was (correctly) informed that the content is invalid "text/plain" and could select "View as text" anyway? (I assume they'll also have to select an encoding manually, as the server didn't tell them what the file was; perhaps they could have a combobox for the encoding next to "View as text", or even perform any charset auto-detection wizardry if, and only if, it's got this far.)

One other mitigating feature could be the UI option "[ ] Do this automatically for files with this extension from now on." (This could probably integrate nicely into the list of file types under Tools | Options | Downloads.) For example, this could allow the user to build up their own personal list of preferences to:
* always "Save" files with the extension ".rar" that are incorrectly sent as "text/plain"
* always "View as text" files with the extension ".txt" that are incorrectly sent as "text/plain"

(Note: absolutely no "hard-coding" of extensions, just allowing the user to have a per-extension preference for dealing with files incorrectly marked as text/plain.)
> 1) What fraction of such documents are actually ASCII plaintext?

According to the standard, all such documents should be. And for the most part this is true, although I've been watching lately, and I see ISO-8859-1 instead of US-ASCII most of the time.

> 2) What fraction of such documents are actually random "binary" data?

This is a small percentage of misconfigured servers.

> 3) What fraction of such documents are non-ascii plaintext?

None. As is defined by the standard. I've looked and I could not find a single situation like this. If anyone else has ever seen it, please correct me. I don't know of any way to collect scientific data on these kinds of numbers, otherwise I would do it.

The biggest point about this idea, as Daniel is trying to express I believe, is that it prompts the user in cases of ambiguity so that the user can decide how to handle the situation. It shouldn't matter so much how often these cases happen, because this feature would give the user an option of how to handle it in the future. It doesn't break any functionality; the worst it could do is bother the user once with a dialog to choose "handle this way always". The default action could even be to handle it as it is handled now.
> The default action could even be to handle it as it is handled now.

Not any time in the near future... that would depend on a "view in browser" option being available in the dialog.

>> 3) What fraction of such documents are non-ascii plaintext?
> None. As is defined by the standard. I've looked

If I may ask, where did you look? I agree this is hard to solidly quantify -- hence the quandary.
I sometimes browse various Japanese websites and I have never seen such an occurrence. In addition, I did some Google searches on .jp domains for txt, text, etc. and came up with several text/plain documents in Shift-JIS, such as:

http://rawfish.at.infoseek.co.jp/txt/log0000/00000001.txt
http://www.rtpro.yamaha.co.jp/RT/docs/relnote/Rev.05.01/relnote_05_01_14.txt
http://www.iodata.jp/lib/product/c/cdv_104.txt
http://www.iodata.jp/lib/product/c/cdv_102.txt

or EUC:

http://sky-mue.jp/linux/plamo/plamo-HOWTO.txt
http://kwatch.at.infoseek.co.jp/web/http.txt
http://www.suplex.gr.jp/~hourin/signature.txt

and even some English on Japanese servers in ISO-8859-1:

http://geta.ex.nii.ac.jp/getaN2002/release/drep_200202.txt

but I wasn't able to find any as described. Of course, I didn't search for other languages since I can't read them, so I could be overlooking a huge sample of data.
The nice thing about this approach in general (good idea!) is, like bz said: if you have a file with MIME-type text/plain and no text-encoding set, you're out of the standards zone and into the Guess Zone. That means that any and all kinds of sniffing are *permissible* (according to spec). We could at that point special-case things. For example, we very commonly have this problem with .dmg files on the Mac. So I would think that if the file is in the Guess Zone, sniffing for certain features would definitely be OK.

Is there any way to detect a UTF-8 or UTF-16 file? Can we sniff for a DMG or an ASF file (those are commonly unset on servers)?

My point is that this bug defines a Guess Zone. So now we can finally special-case certain highly-complained-about types like DMGs without breaking the specs.
> If you have a file with MIME-type text/plain and no text-encoding set, you're
> out of the standards zone and into the Guess Zone.

Actually, that's not true. The standards clearly say that this means the text is ASCII-encoded.

In any case, per discussion with biesi and darin, the following is the plan. If incoming data meets the following criteria:

1) HTTP channel
2) "text/plain" Content-Type response header (literal, case-sensitive comparison)
3) This is the initial content dispatch

then reset the content type on the channel to "application/x-maybe-text" (or some such). Necko will provide a stream converter (probably just a subclass or superclass of nsUnknownDecoder) that will take that type as input and apply the following heuristics:

A) If the data starts with a BOM, treat it as text/plain.
B) Otherwise, apply the "last ditch sniff" algorithm from nsUnknownDecoder to detect it as either text/plain or application/octet-stream. Note that this will NOT detect the data as any other type (in particular, not as any type we would render inline).

Embedders are of course free to register their own converters for the relevant contractids and override the converter Necko registers.

I may get to this by the 1.7a cycle if no one posts a patch earlier. There are various other bugs in Bugzilla that should probably be made dependent on this one... (e.g. there is a similar Camino bug).
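[Editorial sketch -- not the attached patch, and not real Necko/XPCOM calls.] To make criteria 1-3 and heuristic A above concrete, here is a standalone C++ approximation; the helper names are invented for illustration, and the literal header string is the one named in the plan.

  #include <cstddef>
  #include <string>

  // Criteria 1-3: only an HTTP channel, on its initial dispatch, whose
  // Content-Type response header is exactly "text/plain" gets re-typed
  // (to a placeholder such as "application/x-maybe-text") for sniffing.
  static bool ShouldRetypeAsMaybeText(bool isHttpChannel,
                                      bool isInitialDispatch,
                                      const std::string& contentTypeHeader) {
    return isHttpChannel &&                    // 1) HTTP channel
           isInitialDispatch &&                // 3) initial content dispatch
           contentTypeHeader == "text/plain";  // 2) literal, case-sensitive
  }

  // Heuristic A: a Unicode byte-order mark at the start of the body means the
  // data really is text; the UTF-16 BOMs are shown here, others work the same.
  static bool StartsWithBOM(const unsigned char* buf, std::size_t len) {
    return len >= 2 && ((buf[0] == 0xFE && buf[1] == 0xFF) ||   // UTF-16BE
                        (buf[0] == 0xFF && buf[1] == 0xFE));    // UTF-16LE
  }

Heuristic B (the "last ditch sniff") is the per-octet test sketched after comment 0; the review comments below settle which BOMs the real converter checks and which literal header strings end up being accepted.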
bz: That's great news! Although, may I suggest also allowing char "27" in the "IS_TEXT_CHAR" macro from nsUnknownDecoder as this would then allow Shift-JIS (and probably other non-ASCII, yet ASCII-compatible) encodings to be treated as text, and not binary. (I know they will not be strictly text/plain, but that will be what was intended)
Yeah, we should do that.
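[Editorial note.] For readers who have not seen it, the IS_TEXT_CHAR macro in nsUnknownDecoder classifies individual octets during the last-ditch sniff. With ESC (27) admitted as requested above, its shape is approximately the following (the exact definition is in nsUnknownDecoder.h; the parenthesisation follows the review comment further down):

  // Approximation of the nsUnknownDecoder test: an octet counts as "text" if
  // it is outside the C0 control range, or is TAB, LF, VT, FF, CR or ESC (the
  // last so that escape-sequence based encodings are not flagged as binary).
  #define IS_TEXT_CHAR(ch) \
    (((unsigned char)(ch)) > 31 || ((ch) >= 9 && (ch) <= 13) || (ch) == 27)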
Attached patch Ok, I got bored (obsolete) — Splinter Review
Comment on attachment 136882 [details] [diff] [review] Ok, I got bored biesi? Darin? Thoughts?
Attachment #136882 - Flags: superreview?(darin)
Attachment #136882 - Flags: review?(cbiesinger)
I applied this patch then attempted to load kerz's "Gold Standard" "Idiot on Silverado" wmv test: http://static1.stileproject.com/jump/4e46800c366751035c3a771fb7f50137/help6.wmv It still displays as text.
For the good of the web, we must be able to display this movie. Thanks for your work on this so far...
Yep. That server sends:

Content-Type: text/plain; charset=ISO-8859-1

which fails test #2 in comment 29. _If_ this is the default Apache setup, I would be willing to change test #2 to also check for this literal string....
To make it clear, if that's the default misconfiguration Apache ships with, I'm willing to work around it. If this server has gone out of its way to be more misconfigured than the average server, then that's a different issue entirely.
So, if it meets criteria #2 (sending text/plain) should that not mean your "Last Ditch Sniff" should detect binary data? What does server configuration have to do with that?
> So, if it meets criteria #2 (sending text/plain)

It does NOT meet criterion #2. Please read what I said again.

Server configuration has everything to do with this. I am implementing a workaround for a bug in the default configuration of the most popular web server out there. I am _not_ attempting to solve all possible ways in which someone can fuck up a web server to lie to web browsers.

Now I suspect that the default config of Apache may end up sending such Content-Type headers in some cases. If that's the case, we need to rethink criterion #2 to account for that a bit...
Oh, I didn't realize your test #2 was an explicit "text/plain" (and thus would not catch "text/plain *").
Right. The whole point is that "text/plain; charset=Big5", eg, should not go through the sniffer, since it would be incorrectly detected as binary...
I can't say for sure that it's the *default*, but all testcases I can find do indeed send:

Content-Type: text/plain; charset=ISO-8859-1
I ran http://www.bengoodger.com/video/funny_cats.wmv through http://www.rexswain.com/ 's HTTP viewer, and it reports a Content-Type field of "text/plain", but a build with bz's patch still shows raw data.

I don't know enough about Apache to say if this affects anything, but my bog-standard Apache install on this machine has this line in httpd.conf:

AddDefaultCharset ISO-8859-1
> http://www.bengoodger.com/video/funny_cats.wmv

My patched build is detecting this as application/octet-stream.... you did rebuild in all the relevant places (uriloader/, netwerk/, docshell/build/), right?
ooh, not docshell. thanks.

kerz points out this bug in the Apache database: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421
ah, I see. So that's a server version issue... sounds like we need to check for both of those literal strings.
OK, I've properly tried your patch now and it works like a charm, except for the old OSHelperAppService bugs which I can look at separately (on Windows it's showing the helper for WMV files as "WMVFile" which I think implies it's looking for the pretty app name in the wrong spot in the windows registry).
ben: yes... the oshelperappservice uses HKCR\.<ext>\@ as the helper app description, and HKCR\<thatvalue>\@ as mime type description (where @ means "default value")

do you have a better suggestion?
Comment on attachment 136882 [details] [diff] [review]
Ok, I got bored

+ httpChannel->SetContentType(
+ NS_LITERAL_CSTRING("application/vnd.mozilla.maybe-text"));

could you maybe add a log comment around here, that we're checking if this is really text?

also... is application/vnd.mozilla.maybe-text a good type to use? It's not a registered type per ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/application/ (hardly surprising :) ), I think you should use some x- type.

+ void DetermineContentType(nsIRequest* aRequest);

hm... can you add a "virtual" here, to make clear that this is a virtual function?

+ (((unsigned char)ch) > 31

thank you :)

+ // First, check for a BOM.

you have no check for a UTF-32 BOM... doesn't mozilla support that?

+CreateNewBinaryDetectorFactory(nsISupports *aOuter, REFNSIID aIID, void **aResult)

is there a reason to avoid NS_GENERIC_FACTORY_CONSTRUCTOR?
> could you maybe add a log comment

err, I meant "a LOG statement"
> could you maybe add a log comment around here,

Sure thing.

> is application/vnd.mozilla.maybe-text a good type to use?

Good question. I'll change it to an x- type that should still not cause random namespace collisions.

> can you add a "virtual" here,

Yep.

> you have no check for a UTF-32 BOM

True. I'll add it -- I finally convinced myself this can be done in a safe way.

> is there a reason to avoid NS_GENERIC_FACTORY_CONSTRUCTOR?

Following the style of the file... no other reason.
putting this finally in browser:File handling where it belongs; also -> bz
Assignee: blake → bz-vacation
Component: General → File Handling
Product: Firebird → Browser
Version: unspecified → Trunk
Attachment #136882 - Flags: superreview?(darin)
Attached patch Updated to comments (obsolete) — Splinter Review
Attachment #136882 - Attachment is obsolete: true
Attachment #136945 - Flags: superreview?(darin)
Attachment #136945 - Flags: review?(cbiesinger)
Comment on attachment 136945 [details] [diff] [review]
Updated to comments

+ (buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0 && buf[3] == 0) || // UCS-4LE

this will already match the UTF-16LE check, so this line can be removed

+ (buf[0] == 0xFE && buf[1] == 0xFF && buf[2] == 0 && buf[3] == 0)) { // UCS-4

this will match the UTF-16BE check, so it can also be removed

with the lines removed, r=me
Attachment #136945 - Flags: review?(cbiesinger) → review+
Priority: -- → P2
Summary: Prompt user about invalid text/plain content - solving most incorrect MIME type issues → [FIX]Prompt user about invalid text/plain content - solving most incorrect MIME type issues
Target Milestone: --- → mozilla1.7alpha
> ben: yes... the oshelperappservice uses HKCR\.<ext>\@ as the helper app
> description, and HKCR\<thatvalue>\@ as mime type description (where @ means
> "default value")
>
> do you have a better suggestion?

filetype = HKCR\.<ext>\@
defaultapp = HKCR\<filetype>\shell\@
helperapp = HKCR\<filetype>\shell\<defaultapp>
helperappdescription = HKCU\Software\Microsoft\Windows\ShellNoRoam\<helperapp>

If helperappdescription does not exist, then we should display helperapp. This is the default Windows behavior. Displaying either the application name if available, or the application filename if not, is much more intuitive than displaying the internal filetype name and saying that's the program you're opening it with.
Let's have a separate bug on the Windows thing, ok? ;)
Christian, I'll do some research and file another bug.
ben: scratch that -- filed bug 227671 about the application name question
OK, this new patch works great with the "Idiot on Silverado" video (Content-Type: text/plain; charset=ISO-8859-1). Please don't attempt to load the link from my comment above anymore, it now seems to display undesirable content o_o;;
So... what does this patch do to our results on these tests?
http://www.hixie.ch/tests/adhoc/http/content-type/

What does this faking of the MIME type do to Page Info and so forth? I really hate how we sometimes show our autodetected MIME types as if they were the server type.

Anyway. Interesting solution. Separate from that, though, we do IMHO still need the View | Content Type... submenu (analogous to the View | Encoding submenu for the analogous problem).
Ian: the behaviour on those files does not change with this patch.

For files where behaviour does change, the helper app dialog will be shown and will indicate a type of application/octet-stream (hence, Page Info never comes into play).

hmm... why does mozilla display application/x-view-source files?
> why does mozilla display application/x-view-source files?

Because we have a handler registered for that content type -- that's how we do view-source, y'know. ;)
QA Contact: ian
Attached patch bz's patch plus a little more (obsolete) — Splinter Review
This adds to bz's patch an extension-based lookup in nsExternalHelperAppService::DoContent as a last ditch attempt to locate the One True Type from the OS. bz/biesi - I've included the MIME type application/x-vnd.moz.maybe-binary as a literal string in both cases, please advise if there's a common header or something where it'd be better defined.
Attachment #136945 - Attachment is obsolete: true
Attachment #137227 - Flags: superreview?(bz-vacation)
Attachment #137227 - Flags: review?(cbiesinger)
Ben, did you base that on the patch _before_ I addressed some review comments? The type could go in nsMimeTypes.h, I suppose.
Comment on attachment 137227 [details] [diff] [review]
bz's patch plus a little more

this does not address most parts of comment 49...

also:

+ if (aMimeContentType == "application/x-vnd.mozilla.maybe-binary") {

aMimeContentType is "const char*". this comparison does therefore not do what you want.

+ nsXPIDLCString mimeType;
+ if (mimeInfo)
+   mimeInfo->GetMIMEType(getter_Copies(mimeType));

can you move this into the "if (!fileExtension.IsEmpty())" block? (except the nsXPIDLCString, that should stay outside of course)

I would also like a LOG statement to show what mime type the nsIMIMEInfo has stored.
Attachment #137227 - Flags: review?(cbiesinger) → review-
oh, about the mimetype question... yes, you could put it in nsMimeTypes.h as bz suggested, or just use the literal string... personally, I don't care either way
Huh. I'm pretty sure I reversed the original patch and then applied the newer one. I must have messed something up. *tries again*
Attachment #137227 - Flags: superreview?(bz-vacation) → superreview-
Attached patch newer patch (obsolete) — Splinter Review
This one is really based on bz's second patch, with biesi's comments rolled in.
Attachment #137227 - Attachment is obsolete: true
Comment on attachment 137481 [details] [diff] [review] newer patch oops. one moment. disregard this patch.
Attachment #137481 - Attachment is obsolete: true
Comment on attachment 137485 [details] [diff] [review]
new patch

+ if (!nsCRT::strcmp(aMimeContentType, APPLICATION_MAYBE_BINARY)) {

do we guarantee anywhere that aMimeContentType is lowercase? it doesn't look like it. please use nsCRT::strcasecmp.

+ mContentType = APPLICATION_MAYBE_BINARY;

huh? will that do any good?
Christian - I replaced the place where the content type was set to octet-stream with maybe-binary. I assumed that bz's original patch relied on this code to set it to octet-stream so that the UCT dialog would be presented, and so I switched it to maybe-binary.

bz - did I misinterpret your original patch?
oh. hmmm. hmmmmm. well I fear that at that point, this will cause another streamconverter to get instantiated, so you get into an infinite loop... bz knows this code better than me, though, I'll let him comment. (maybe we need two mimetypes - one to trigger the check for non-ascii characters, and one to tell the helper app dialog to get the type from the extension)
Attachment #136945 - Flags: superreview?(darin)
Comment on attachment 137485 [details] [diff] [review]
new patch

In the future, patches using the -p switch and more context (-u8 at least) are much easier to review.... And there was no need for -w here, either, imo.

>+ (((unsigned char)ch) > 31 || (9 <= ch && ch <= 13) || ch == 27)

I know I wrote this code... but could you put () around "ch" everywhere in that line? So:

(((unsigned char)(ch)) > 31 || (9 <= (ch) && (ch) <= 13) || (ch) == 27)

>+ mContentType = APPLICATION_MAYBE_BINARY;

biesi, this part is correct. We do have two types involved here -- maybe-text and maybe-binary. The former triggers this stream converter, the latter is magic to the external helper app handler. Someone could indeed create a stream converter for maybe-binary, but that's ok if they want to do that....

>+ // the type off the channel.
>  nextLink->mContentType = aOutContentType;
>- }

This is indented correctly, right? Damn -w. ;)

>+ LOG(("OS-Provided mime type '%s' for extension '%s'\n",
>+ fileExtension.get(), mimeType.get()));

That should pass mimeType.get() first, no?

sr=bzbarsky on Ben's additional changes with that addressed. The rest of the patch still needs sr (and moa from darin), so not marking the sr flag myself...
Attachment #137485 - Flags: superreview?(darin)
>We do have two types involved here -- maybe-text
>and maybe-binary.

er, oops - I missed that. :(

but, why is maybe-binary called maybe-binary? don't we know that it is binary (because it failed the "is text" check)?
Better name suggestions? maybe-octet-stream?
application/x-guess-type-from-extension?
There's no guarantee that there will be an extension. Streamed videos or content management systems, for example. I suppose we could do application/x-guess-type-from-extension and fall back to x-maybe-octet-stream if that fails, too.
*** Bug 221877 has been marked as a duplicate of this bug. ***
> There's no guarantee that there will be an extension.

indeed. please read the patch. the question is just how to name that thing.
Attached patch Patch for FB_0_8_BRANCH *ONLY* (obsolete) — Splinter Review
I've rolled biesi's and bz's comments, except the content-type name change, into this patch, which I plan to check in on the 0.8 branch. We can continue to discuss the content type and check the further-modified patch in for 1.7a.
Christian... should that be application/x-vnd.mozilla.guess-type-from-extension ?
I don't think any MIME type needs both a "x-" prefix and a "vnd." one... "x-" means "unregistered", and no further structure is necessary after that. ("vnd." is for registered MIME types in the vendor subtree.)
> don't think any MIME type needs both a "x-" prefix and a "vnd." one... "x-"
> means "unregistered", and no further structure is necessary after that.

Except if you're trying to be a good citizen and ensure that your x- crap won't collide with anyone else's x- crap. I made the type what I made it very purposefully.

I don't really care what we name the type, as long as we clearly document in nsMIMETypes.h what its "magic" effect is and as long as it starts with "x-vnd.mozilla."
are we really sure using magic MIME types is the way to go here? :-) /me starts planning tests that use those MIME types to see what happens
> application/x-vnd.mozilla.guess-type-from-extension

that sounds good.
> are we really sure using magic MIME types is the way to go here? :-)

Yes, we are. It's the simplest way to do it, and they are not very magic (you could set up a plugin or other internal handler for any of the types involved any time you wanted to, eg; you just can't set up a helper for the application/x-vnd.mozilla.guess-type-from-extension type). And the name was carefully chosen such that you better not expect it to work in any particular way.
I've read through the entire discussion and it looks like things are shaping up alright. I would highly recommend taking steps to prevent any active content from loading automatically based on MIME-type sniffing (offering to download is okay, however). Fortunately, the patch here only seems to make files sent as text/plain appear with a download dialog.

Here are some articles from the Freenet-devel mailing list discussing how MIME-type autodetection can affect privacy:

http://article.gmane.org/gmane.network.freenet.devel/4939/match=internet+explorer
http://article.gmane.org/gmane.network.freenet.devel/1580/match=autodetect+mime

It's impossible to use IE in any sort of anonymous way on Freenet (and I assume other anonymous networks) due to "MIME type sniffing". You can easily provide an anonymity-destroying extension file and pass that as an image! Just a couple of points to keep in mind.
Comment 88 has nothing to do with this bug. This bug is about the main content area load. The issues mentioned in comment 88 have to do with subsidiary script/plugin loads to which the code discussed in this bug does not apply. Please don't make irrelevant comments like that; file separate bugs on separate issues. If you're not sure it's a separate issue, file a separate bug and add a dependency.
There appears to be a case sensitivity issue with the charset. http://forums.mozillazine.org/viewtopic.php?t=41893
Yeah, yeah. See what I wrote about what this patch is aiming to do (quoted in that thread). The case-sensitive comparison is very much purposeful.
Yes, I see that you're doing it case-sensitive because you're handling it exactly as Apache 2.x sends it. And my question is: why not make it case insensitive? Forgive me for being dumb, but how do we handle TEXT/plain or text/Plain? I'm not trying to say we should handle every single possibility.
the point is: this is only done to work around the default apache misconfiguration, and not against anything else.
That may well be the point, but if you're going to the lengths to implement a fix like this, surely you might as well add a simple case conversion statement, and then everyone is even happier.
Not everyone. This entire codepath is a standards violation, we absolutely want to keep the violating to the strict minimum required to hit the most common case. If people go out of their way to misconfigure their servers even further, then tough -- it won't work.
Well, if we limit the check to the first 10 characters we can ignore the charset regardless.
The simple reason to NOT make this a case-insensitive check is that I'd rather like to avoid detecting actual valid ISO-8859-1 text content as application/octet-stream. Sorry, but that check is staying case-sensitive.

alanjstr, if we limit the check to the first ten chars, then we will fail on type strings like in comment 41. See also the first 20 or so comments in this bug for more background.

One more comment from someone who clearly hasn't bothered to carefully read the bug, and I'm reassigning this to whoever did the commenting.
Please add the charset "iso-8859-1" (lowercased) as it is the default on my Debian/Apache server as well. The only mention of the charset in httpd.conf is:

# Default charset to iso-8859-1 (http://www.apache.org/info/css-security/).
AddDefaultCharset on

HTTP/1.1 200 OK
Date: Thu, 25 Dec 2003 20:41:32 GMT
Server: Apache/1.3.29 (Debian GNU/Linux)
Last-Modified: Sat, 22 Nov 2003 23:38:46 GMT
ETag: "1ed36-28e0-3fbff386"
Accept-Ranges: bytes
Content-Length: 10464
Connection: close
Content-Type: text/plain; charset=iso-8859-1
Yeah, that also seems to match the bug report mentioned in comment 45. Can do that.
*** Bug 209343 has been marked as a duplicate of this bug. ***
Blocks: 135411
I do not think that this should block the evangelism bugs, since there are probably quite a few of those, and fixing this won't fix those servers, merely hide the symptoms.
Attachment #137485 - Attachment is obsolete: true
Attachment #137485 - Flags: superreview?(darin)
Attachment #137535 - Attachment is obsolete: true
Comment on attachment 138491 [details] [diff] [review] Patch for trunk, updated to all comments biesi, darin, could you review?
Attachment #138491 - Flags: superreview?(darin)
Attachment #138491 - Flags: review?(cbiesinger)
Attachment #138491 - Flags: review?(cbiesinger) → review+
Comment on attachment 138491 [details] [diff] [review]
Patch for trunk, updated to all comments

>Index: netwerk/mime/public/nsMimeTypes.h

>+#define APPLICATION_GUESS_TYPE_FROM_EXTENSION "application/x-vnd.mozilla.guess-type-from-extension"

i think this mime type is excessively verbose. how about one of these instead:

application/x-vnd.mozilla.guess-from-ext
application/x-vnd.mozilla.from-extension
application/x-vnd.mozilla.from-ext

>Index: netwerk/streamconv/converters/nsUnknownDecoder.h

i think this class deserves some description. please explain why you are overriding DetermineContentType, etc.

>+#define NS_BINARYDETECTOR_CID \
>+{ /* a2027ec6-ba0d-4c72-805d-148233f5f33c */ \
>+ 0xa2027ec6, \
>+ 0xba0d, \
>+ 0x4c72, \
>+ {0x80, 0x5d, 0x14, 0x82, 0x33, 0xf5, 0xf3, 0x3c} \
>+}
>+
>+class nsBinaryDetector : public nsUnknownDecoder
>+{
>+public:
>+ nsBinaryDetector()
>+ {}
>+
>+protected:
>+ virtual ~nsBinaryDetector()
>+ {}
>+
>+ virtual void DetermineContentType(nsIRequest* aRequest);
>+};

seems like you should be able to compact the class declaration down to this:

class nsBinaryDetector : public nsUnknownDecoder
{
protected:
  virtual void DetermineContentType(nsIRequest* aRequest);
};

>Index: uriloader/base/nsURILoader.cpp

>+ if (httpChannel && mContentType.IsEmpty()) {
>+ // This is our initial dispatch, and this is an HTTP channel. Check for
>+ // the text/plain mess.
>+ nsCAutoString contentType;
>+ httpChannel->GetResponseHeader(NS_LITERAL_CSTRING("Content-Type"),
>+ contentType);
>+ // Make sure to do a case-sensitive exact match comparison here. Apache
>+ // 1.x just sends text/plain for "unknown", while Apache 2.x sends
>+ // text/plain with a ISO-8859-1 charset. Debian's Apache version, just to
>+ // be different, sends text/plain with iso-8859-1 charset. Don't do
>+ // general case-insensitive comparison, since we really want to apply this
>+ // crap as rarely as we can.
>+ if (contentType.Equals(NS_LITERAL_CSTRING("text/plain")) ||
>+ contentType.Equals(
>+ NS_LITERAL_CSTRING("text/plain; charset=ISO-8859-1")) ||
>+ contentType.Equals(
>+ NS_LITERAL_CSTRING("text/plain; charset=iso-8859-1"))) {

i'm not sure i follow why you wouldn't want to just do a case-insensitive compare against "text/plain; charset=iso-8859-1". the only difference is that you would accept "Text/Plain..." as well. that doesn't seem like a big deal, right?

sr=darin
Attachment #138491 - Flags: superreview?(darin) → superreview+
> application/x-vnd.mozilla.guess-from-ext

Sounds good. Will do that.

> i think this class deserves some description.

Done. Also compacted the class decl as suggested.

> i'm not sure i follow why you wouldn't want to just do a case-insensitive
> compare against "text/plain; charset=iso-8859-1"

Because if the type is _not_ "text/plain", it's not just Apache being dumb -- it's someone purposefully sending that type. In which case we really should follow what the server said.
Attachment #138491 - Attachment is obsolete: true
Checked in that patch.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Blocks: 226024
*** Bug 207154 has been marked as a duplicate of this bug. ***
*** Bug 218372 has been marked as a duplicate of this bug. ***
*** Bug 233465 has been marked as a duplicate of this bug. ***
*** Bug 233678 has been marked as a duplicate of this bug. ***
Is there an option to disable this behaviour? Even a hidden pref... (This is important, for example, if we actually want to see the data.)
There is no such option. I would rather implement a "view in browser" option on the helper app dialog (which we need to do anyway) than waste time on a pref for this.
Blocks: 233047
I just sent myself a mail with an RTF attachment for which my Windows XP system did not have a Content-Type set in the registry. The MUA sent the file with a Content-Type of application/x-vnd.mozilla.guess-from-ext; I would have expected the Content-Type to be application/octet-stream, since that is the Content-Type for "unknown"/unclassified data.

Using Thunderbird Nightly 2004-03-21-00-trunk on Windows XP.

Requesting to reopen bug.
(In reply to comment #114)
> Requesting to reopen bug.

This bug is fixed but has caused a regression, so it would be better to file a new bug.
(In reply to comment #114)
> I just sent myself a mail with an RTF attachment for which my Windows XP system
> did not have a Content-Type set in the registry. The MUA sent the file with a
> Content-Type of application/x-vnd.mozilla.guess-from-ext;

heh. please file a new bug for that, and mention the bug# here.
Okay. I submitted bug 238706.
It is still sometimes opening .rar files in a new window for me. I also cannot right-click and save some .rar files.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

If I go to this site http://www.dustydunes.com/ for example and click on the link for http://ygs.pwp.blueyonder.co.uk/matt/maps/de_seaside2k.rar then Mozilla opens the contents in a new window instead of downloading them.

If I go to http://zveris007.boom.ru/ for example and click the link for http://zveris007.boom.ru/Plan2.part01.rar then Mozilla opens the contents in a new window instead of downloading them. Furthermore, if I right-click on the link and click "Save Link Target As..." a box pops up that says: "The link could not be saved. The web page might have been removed or had its name changed." But obviously it is there, since Mozilla could get it in text format.

The availability of those files is questionable (for some reason?). Sometimes IE6 goes to the website's 404 not found page if I click directly on the link (and sometimes it downloads). IE6 also flags it as unavailable if you right-click and go to "Save Target as...".

*I make no claims to the security of those sites or files, I just searched for .rar files to demonstrate. Yarrrr I'm a n00b at this reporting business!
this is the intended behavior, see the "solving most incorrect MIME type issues". see comment #39.
(In reply to comment #119)
> this is the intended behavior, see the "solving most incorrect MIME type
> issues". see comment #39.

The intended behaviour is "display garbage". Great.
"Display garbage" is better than "Open a gaping security hole". In any case, this bug is resolved. If you want to suggest that we break standards, discuss it on http://www.mozillazine.org or file a new bug.
I think I've found another "safe to assume without breaking standards" case. If the server sends text/plain but the A HREF link contains a TYPE="application/java-archive", then would it not be safe to assume that 1) the linked file is a java archive, or 2) the server and web page disagree, so it's a situation where you should be prompted? The reason I'm asking is that in a hosted environment web page authors may not have control over the server MIME types, and they may wish to use type="xxx" in cases where the server is wrong. This could also be a totally different bug, but I think it's a valid request for improvement. Mozilla 1.7 RC1 currently always takes the server as gospel regardless of the TYPE indicated in the link.
Please file a separate bug on that. That may be worth doing... Note that we already support type in general, so this could be done without too much pain, probably.
Blocks: 266410
Product: Core → Core Graveyard