Closed Bug 126782 Opened 23 years ago Closed 21 years ago

[FIX]Binary file with unknown type displayed as text/plain rather than saved

Categories

(Core Graveyard :: File Handling, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.6beta

People

(Reporter: shwag, Assigned: bzbarsky)

Details

Attachments

(1 file)

Going to this particular URL, mozilla will load the file in the browser
window, so it never makes it to my hard disk, with no workaround.  I finally
pasted the URL into IE to download the file.

All the other files on this page came as expected.
Win98SE, 2002022203, this file loads into the window for me too.  

Reporter:  You can then save the file with File->Save page as...

Observing the output from wget, there doesn't seem to be any indication in the
server headers of what this file actually is (i.e. no mime type).  Given that
the extension ".w02" is hardly well-known, and given that making assumptions
based on the extension is A Bad Thing, what _should_ moz do with it?
Severity: major → normal
I thought the File-->Save As might mess up the contents of the file with
respect to text-to-binary conversion.

If you view the directory on the site that the file is stored in, you will see
that there are files named .W01 .W02 .W03, which are compressed files.  All of
the other files load properly.  There are even many other .W02 files which
download fine!  It is just that one link that loads wrong.  Weird, huh?
> I thought the File-->Save As might mess up the contents of the file, 
> regarding text to binary conversion.  

I've done it many times with several formats (notably .asf, .wmv, both binary
formats) without a hitch.

However, you're correct.  Other files with the same extension elsewhere on the
site immediately pop up a "save file" dialog, but this one loads straight to the
browser window.  The page info dialog shows that moz thinks the file is text/plain.

So, a question for the developers:  how does moz decide what to do with these
files?  And how come it's doing different things with similar files from the
same site?

Assignee: bbaetz → law
Status: UNCONFIRMED → NEW
Component: Networking: FTP → File Handling
Ever confirmed: true
QA Contact: benc → sairuh
Summary: MIME bug maybe ? → Binary file with unknown type displayed as text/plain rather than saved
ftp doesn't have content type, so we guess. bz, is the mime service getting this
wrong here?
This is sort of funny, actually...  The mime service says nothing about this
file (since it has no useful extension), so it gets passed on to the unknown
content decoder.

The way the unknown content decoder tells text/plain apart from
application/octet-stream is by looking for null bytes.  The first null byte in
this file is the 1168th byte.  The unknown content decoder only looks at the
first 1024 bytes of the file (since 99% of the time that's enough to determine
what needs to be determined).  In fact we're considering decreasing that 1024 to
something like 512 or 256 so it won't be so eager to decide things are HTML...
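
(For the curious, a stripped-down sketch of that text-vs-binary step --
illustrative only; the real code in nsUnknownDecoder.cpp also sniffs PDF/PS,
HTML and image headers first, and the function name here is made up:)

    // Simplified sketch: call the data binary as soon as a NUL byte
    // shows up in the sniff window, otherwise guess text/plain.
    static const char* GuessTextOrBinary(const char* aData, PRUint32 aLength)
    {
      PRUint32 max = PR_MIN(aLength, 1024);   // current sniff window
      for (PRUint32 i = 0; i < max; ++i) {
        if (aData[i] == '\0')
          return "application/octet-stream";
      }
      return "text/plain";  // no NULs seen in the first 1024 bytes
    }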

Over to rpotts... I'm not sure what a good solution is here, exactly.  No matter
what we do, unless we sniff the entire file there is no way to tell whether it's
text or binary data (one can always come up with a more pathological case).

Maybe we should special-case FTP somehow or something?
Assignee: law → rpotts
Component: File Handling → Networking
> The unknown content decoder only looks at the first 1024 bytes of the file 
> (since 99% of the time that's enough to determine what needs to be determined)

I'm going to be nitpicky here, and say that unless I've completely forgotten
what I was taught about statistics, looking at the first 1024 bytes is only
going to work about 98% of the time, or 49 times out of every 50.  Which isn't
actually very certain.  Cutting down to the first 256 bytes will cut that to
63%, less than 2 times out of three.  Not good.

I'm assuming a totally random distribution of the individual bytes in the range
0-255 for the purposes of this calculation.  This isn't always going to be the
case of course, and if a file format has a bias AGAINST null characters, things
are going to get worse.
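
(For the record, the arithmetic behind those figures, under the same
uniform-random-bytes assumption:

    P(no NUL in n random bytes) = (255/256)^n
    n = 1024:  (255/256)^1024 ~ e^-4.0 ~ 0.018  ->  ~98% detection
    n =  256:  (255/256)^256  ~ e^-1.0 ~ 0.37   ->  ~63% detection
)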
suggest a better approach, given that we have to make the decision before we
have all the data and the decision is irreversible...
I ain't got one.  I agree, there isn't much that can realistically be done in
these circumstances, since we're essentially blindfolded in a dark room and
someone's stolen our torch batteries.

I think that the bottom line is that ALL the unknown decoder can do is *guess*!!

By the time we get to the unknown decoder, we've exhausted ALL other (more
accurate) options for determining the content-type of the data...  So, all we
have left is a collection of heuristics that we use to 'guess' the
content-type...  Sometimes we guess wrong :-(

Is there some way that we can modify these heuristics to guess better? 
Currently, I believe that our least reliable heuristic is the one for detecting
'text/plain'... 

Initially, I chose to *only* key off of embedded NULLs because various character
set encodings use the 8th bit...  Maybe this isn't an issue??

Since we have NO character encoding information available, we can't deal with
these characters very well anyways (all we can use is the 'default' encoding)...

So, maybe we should modify the code to disallow *anything* with the 8th bit
set...  The argument for doing so is that it would limit false positive
'text/plain' hits.  It may very well reject streams that 'could' be rendered as
text/plain using the default character encoding...

I guess the question is which is more desirable:
1. occasionally rendering binary data in a window...
or
2. occasionally bringing up the 'Save As' dialog box for text files...

Once we decide which is the desired behavior, we can fine tune our heuristics...

-- rick
I _may_ be able to help come up with something better, but I'm gonna need to
clarify a few things first:

1) what groups are we categorising files into?  From the comments above, we're
after at least text/html, text/plain and [everything else] - any more?  If it's
a fairly short list, then we can see about knocking up a list of conditions for
each of them.

2) presumably we have to worry about every language/alphabet under the sun,
which is where the 8-bit stuff comes from.  I can only claim to know anything
about languages that use the latin alphabet, so to be really thorough we'll need
some input from the i18n guys.

> I guess the question is which is more desirable:
> 1. occasionally rendering binary data in a window, or
> 2. occasionally bringing up the 'Save As' dialog box for text files...

Personally, I'd prefer 2.  But then I'm not a typical user.  Can we get anyone
to go out into the world and mercilessly interrogate a couple of thousand
typical users?  :)

Seriously, users without any technical knowledge are just going to run away
screaming when they see "garbage" in the browser window, and many
slightly-technically-savvy users "know" that opening a binary file in a text
viewer, then saving it, is a quick way to break the binary file, and won't
bother trying.  In many cases, they're right.  Moz is unusual here.  At least if
we offer to save to disc, the user can save it with a .txt extension (or
whatever) and open it in their favourite text editor.  

Random thought: at the point where this code gets invoked, _we_ have *absolutely
no idea* what the incoming file is.  How likely is it that the user is as
clueless as we are?


Assuming that there's a short(ish) list of file categories to worry about, a few
ideas:

- text/plain.  What about whitespace?  How many text files are going to have no
whitespace (space/tab/cr/lf) characters _at all_ in the first 256 bytes, let
alone the first 1024?  If it's got less than about one whitespace character per
60 bytes in the first 1024 bytes, it almost certainly isn't plain text (probably
not HTML, either).  That'll stand for just about every latin-alphabet language,
I think.  If it isn't a human language (e.g. base 64 encoded, or whatever), then
the user is probably going to want to save it anyway, since mozilla's not going
to be able to do much useful with it.  Course, if it's an ASCII-art kitchen
sink, we're in trouble :-D.

- text/html.  It's gonna have tags in it, surely?  Can't we go looking for
"<html", "<head", "<body", or even <...>...<...>...<...> patterns?  On the other
hand, how often does this code actually get handed an HTML file?  To get here,
it's got to be coming in without any content headers (which I believe means it's
probably not coming via HTTP[S]?), and it's not got any kind of recognised HTML
file name extension.  It'd be really nice if we could get some kind of data on
what files actually hit this code.  Not likely, I know, but it would be really nice.

> Initially, I chose to *only* key off of embedded NULLs because various 
> character set encoding use the 8th bit...

Why only nulls?  What about the other control characters, ascii 01-31?  OK,
there will be CR, LF, and TAB floating around, but what about some of the
others?  05 (enquiry), 06 (acknowledge), 07 (bell), and several others I don't
even know the purpose of, are going to be frightfully rare in text files, aren't
they?

OK, enough wibbling from me. 
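
(A rough sketch of the whitespace idea above, for concreteness.  The
one-per-60-bytes threshold is the guess from this comment, not a measured
value, and the function name is made up:)

    // Hypothetical whitespace-density check: reject text/plain if the
    // sniff buffer has fewer than one whitespace byte per 60 bytes.
    static PRBool HasPlausibleWhitespace(const char* aData, PRUint32 aLen)
    {
      PRUint32 ws = 0;
      for (PRUint32 i = 0; i < aLen; ++i) {
        char c = aData[i];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
          ++ws;
      }
      return ws * 60 >= aLen;  // at least one whitespace per 60 bytes
    }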
> 1) what groups are we categorising files into?

At the moment we detect:  application/pdf, application/postscript, text/html,
                          all the image types Mozilla supports, text/plain,
                          application/octet-stream

> Can't we go looking for "<html", "<head", "<body", or even
> <...>...<...>...<...> patterns? 

We do.
http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#333
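
(Roughly speaking, that code searches the sniff buffer for known tags.  An
abbreviated illustration, not the actual implementation, assuming NSPR's
PL_strncasestr for the length-limited case-insensitive search:)

    // Abbreviated tag sniff; the linked code checks many more tags.
    static PRBool LooksLikeHTML(const char* aData, PRUint32 aLen)
    {
      return PL_strncasestr(aData, "<html", aLen) != nsnull ||
             PL_strncasestr(aData, "<head", aLen) != nsnull ||
             PL_strncasestr(aData, "<body", aLen) != nsnull;
    }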


> On the other hand, how often does this code actually get handed an HTML file? 

A lot.  90% of the ad servers out there don't send any content-type.  More to
the point, every single ebay URL goes through this code (ebay seems to feel it's
above sending content-type headers).

I think I agree that I'd rather err on the side of letting the user save than on
the side of showing in browser.  Especially if we ever get a "view as text"
option hooked up for the helper app dialog.  :)
QA Contact: sairuh → benc
>> 1) what groups are we categorising files into?
>
> At the moment we detect:  application/pdf, application/postscript, text/html,
> all the image types Mozilla supports, text/plain, application/octet-stream

OK.  Most of those have headers that are being explicitly sniffed, which makes
life easier.  

From personal experience, I'd say it's probably worth adding .asf
(http://www.microsoft.com/windows/windowsmedia/WM7/format/asfspec11300e.asp) and
.wmv (which has the same internal format as .asf, according to
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q284094).

Yes, they're MS-proprietary, but they're out there in substantial numbers, and
they're the formats that give me the most grief.  The spec linked on the above
page appears to be in office 2000 format, so I can't read it, but I'd be
surprised if there wasn't a sniffable header in there.

I know there's a limit to how many types we can reasonably be expected to
sniff, but presumably PDF/PS get in because they're common on the net?  What
about other things that are common?  Can we get data[1] on what file types are
out there?

[1] data that's more meaningful than me going "i wanna .asf and a .wmv and a
.exe and a .zip and a .tar and a ....."


>> Can't we go looking for "<html", "<head", "<body", or even
>> <...>...<...>...<...> patterns? 
>
> We do. [snip]

And a few more I hadn't thought of.  Jolly good.

Looking at the code, it's basically:
1. [PDF or Postscript headers] -> appropriate types
2. [local file] -> go to step 4 for security reasons
3. [html tags?] -> HTML
4. [known image headers] -> appropriate types
5. [No nulls in it?] -> plain text
6. [everything else] -> octet-stream

Apart from quibbles about other explicitly sniffable types, I've little to add
beyond the possible improvements to plain text sniffing I listed above.

> I think I agree that I'd rather err on the side of letting the user save
> than on the side of showing in browser.

Any chance we can ping some usability gurus on this?

> Especially if we ever get a "view as text" option hooked up for the helper app
> dialog.  :)

Yeah, that would help.  The more I think about it, the more I think that if a
file makes it down to step 5, the user probably has a better idea of what it
is[2] than we do, so the best solution might be to just ask them.

[2] not least because we're completely clueless at this point.
mpt, what do you think about comment #9?
*** Bug 129918 has been marked as a duplicate of this bug. ***
From comment 4, above:

> The mime service says nothing about this file (since it has no useful 
> extension)

Hang on a minute.  Does this mean that if the file has a recognised file
extension,  moz should figure out whether the file can be displayed or not?  So
the unknown decoder only kicks in if the extension isn't recognised?  If so, it
looks like .asf and .wmv aren't on that list.  Adding them to that list would
best be filed as a different bug, since this one is rapidly heading in the
direction of "what we should do with files in the unknown content decoder",
which is a different issue from preventing them hitting the decoder in the first
place.

If someone can confirm the above, give me a shout and I'll spin off a separate
bug for that.

[sorry, brain go slow, should have spotted this earlier]
> So the unknown decoder only kicks in if the extension isn't recognised?

Correct.  If nothing else ever uses those extensions then we can just add them
to our "extensions we know" list at
http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#124
>> Adding [.asf, .wmv] to [the list of known extensions] would best be filed as 
>> a different bug.
>
> If nothing else ever uses those extensions then we can just add them to our 
> "extensions we know" list [...]

Well, www.wotsit.org doesn't know any other uses of .asf.  And it hasn't even
heard of .wmv or .wma (.wmv's audio cousin).  Dunno if that's a good sign or a
bad sign :-/

Anyhow, logged as bug 129982.
Ok... so it sounds like tightening up our text/plain detection is desirable. 
Let me summarize what i'm hearing...

1. In addition to NUL, check for other 'low ascii' control characters to reject
text/plain.

2. Add a whitespace heuristic...  Some amount of <SP> and/or <TAB> should be
present (ideally one or more per line :-) )

any other suggestions to sniff out text/plain ??

I suppose we could add explicit detection of base64 encoding to limit the number
of text/plain misses because of this encoding too..

-- rick


That's my best shot for now.  The ASF/WMV thing should be covered by bug 129982.

The only other thing is the suggestion to switch from:
if (known binary) [octet stream]
else [plaintext]

to:
if (known plaintext) [plaintext]
else [octet stream]

So that the "unknown" cases get saved to disc rather than loaded into the
browser window.

That's my preferred behaviour and Boris's, too, I believe.  But, of course,
Boris and I aren't typical users, so PDT and MPT might have different ideas.
According to comment #11 and others, it would be great to add these extensions
to mozilla:

.ace -> Ace archive files (http://www.winace.com/)
.rar -> Rar archive files (http://www.rarsoft.com/)

Is this possible?  Not all the archives we can download are .zip :-)
Files that are .ISO always end up in my window.  
I don't know if the solutions already discussed will fix this too.
Frederic, shwag, those issues are probably best covered by logging separate bugs
for those extensions (similar to my bug 129982 for windows media), since this
bug is covering what happens once moz decides it's got no idea what it's dealing
with.
Blocks: 138000
Would the following work as a fix for this bug? First, add several known binary
file extensions, including ISO, bz2, and others to the mime service. Second, set
the unknown content decoder to look at 0.05% of any file it gets for null
characters. For a one million byte file, it would look at 50,000 bytes.
Removing bogus dependency that was added by a non-driver.
No longer blocks: 138000
In response to comment #23 -- yes, that could be doable....  Rick, what do you
think?  We probably want to do PR_MAX(512, something*datasize) (otherwise for a
small file we'd only look at a few chars)...

I'm assuming you meant 5%, not 0.05%, since 0.05% of 10^6 is 500, not 50,000....

I think 5% is a little big.  That would be on the order of 500000 bytes (that
would need to be allocated in memory!) for downloading Mozilla, and would be on
the order of 20-30 megabytes (that would need to be allocated in memory) for ISO
images....  But the general approach could certainly be tried; I'd like to see
whether that approach has any more success with the various file types listed in
this bug.

Perhaps something more like:

PR_MIN(PR_MAX(512, something*datasize), 20000) 

would be a thought?  That way ridiculously huge files are capped....
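
(In code form, with the 5% scale factor and the variable names as placeholders:)

    // Hypothetical sniff-buffer sizing: scale with the advertised
    // content length, but never below 512 bytes or above 20000.
    PRUint32 sniffSize = PR_MIN(PR_MAX(512, contentLength / 20), 20000);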
That would only be valid for ftp, or the unknown content type.  We have to
trust the server; if it lies, it's a server issue, and not our problem.
That sounds good. Once implemented, we could fine tune it, if necessary.
having a variable length buffer based on the content-length (that is clamped as
boris suggests) sounds fine to me.

However, this is exactly the opposite of what bug #119942 is all about :-)  It
suggests that a *smaller* buffer be used ;-)

let's decide on a strategy... and mark bug #119942 as either a dup of this bug...
or invalid...

-- rick
Buffer size:

Firstly, let's keep things sane for those on slow connections.  In Europe, most
people are still on dial-up.  If they're downloading things from a slow server
on the other side of the world, even 1024 bytes can take a few seconds.

A 20,000 byte buffer could mean clicking the "save this link" option, then
waiting *15-20 seconds* for the filename dialog to come up.  Even from a fast
server, with a fast modem, they're gonna be waiting 4-5 seconds with no sign
that their click did anything.  That's too long.  It'll confuse users, and make
them think moz is glacially slow at downloading.
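
(Back of the envelope: a 56k modem moves about 4-5 KB/s in practice, so 20,000
bytes / 4.5 KB/s ~ 4-5 seconds; from a slow or distant server at ~1 KB/s the
same 20,000 bytes take the quoted 15-20 seconds.)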

Ideally, we could use something like the "getting file information" intermediate
dialog IE6/Win has, but that's probably gonna be loads of work, and best covered
by another bug.

Conversely, the buffer's got to be big enough so that, statistically, it is
going to correctly figure out binary/text _most_ of the time by whatever method
is being used.  Obviously, 100% would be good, but that ain't gonna happen.  The
present method is good for about 98% with a 1024 byte buffer, but a 256 byte
buffer will cut that to under 70%, which is terrible.  If we improve the
detection method, as discussed above, we can probably get better detection, with
a smaller buffer than is currently being used, especially if we can catch some
of the common culprits via other methods (e.g. windows media, bug 129982 )

So, summary of what I think needs doing:

1. Improve plain text detection heuristics as discussed here.
2. consider adding other sniffable headers to those checked
3. amend default to [save] rather than [display] (i.e. if we can't figure it
out, treat it as binary, not as text)
4. reconsider buffer size given improved heuristics.


> A 20,000 byte buffer could mean clicking the "save this link" option

This code is never called for that option.  The _only_ time this code is called
is when you actually load a url (click on a link, type in URL bar, submit form,
etc).  Any "save link", "mail link", etc. options do not use it.
>> A 20,000 byte buffer could mean clicking the "save this link" option
>
> This code is never called for that option.  

Doh!  Of course, at that point, they're ASKING to save it, aren't they?  So much
for that objection.  [mental note to self: WAKE UP!]

If the heuristics are improved, however, would we really _need_ a bigger buffer?

If we remove a couple of the worst-offending filetypes by checking for headers
and/or extensions, add a whitespace check, and add a check for half-a-dozen
different ascii 0-31 characters, we could get our accuracy better than 99.99%,
all with a 1024 character buffer.  We could probably even get better than 99.7%
with only the 256-character buffer proposed in bug 119942 - which is to say, a
quarter of the error rate of the current system with a 1024-byte buffer.

I think that extending the null check to cover other characters may be the best
single improvement, if we can do so.  Even extending it to check for 2 or 3
characters out of the ASCII control range, rather than just the one, will make
a huge difference to our accuracy.

*all my statistics are assuming random distribution of characters, yada, yada.
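
(Under that same model: if k byte values are forbidden, the miss rate is
((256-k)/256)^n.  At n = 1024, going from k = 1 (NUL only) to k = 3 drops the
miss rate from ~1.8% to ~0.0006%, and forbidding all of the non-text control
codes makes a random miss essentially impossible -- which is why even a couple
of extra characters helps so much.)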
I don't know if it is related, but every file with an unknown extension (from
groups.yahoo.com) is saved like a .exe file (in 2002043010 nightly trunk build).

Strange ?!
Totally unrelated bug (bug 120327)
Of ASCII 0-31, which characters are valid in text/plain files?

9  = \t (tab)
10 = \n (linefeed aka newline)
12 = \f (formfeed, is this actually used in text files?)
13 = \r (carriage return)

Did I miss any?  A file should only be considered text if there are no
characters in the 0-31 range other than these.

IIRC 127 isn't printable either, so should also identify a binary file.  So, we
should check for 0-8,11,14-31,127 (adjust as needed) and only if none of those
characters are present, AND there are spaces or \t or \n or \r scattered
appropriately, then it's text, otherwise it's binary.  Right?

Re: Comment #18, rpotts: are you saying base64-encoded files *should* be
displayed as text?  Why?  Seems to me that displaying them as text is useless; I
can't read base64, but if I save the file I can extract it with StuffIt Expander
or whatever.
> 12 = \f (formfeed, is this actually used in text files?)

It sure is.  Newsgroup posts, for example.

You forgot

11 -- Vertical Tab (\v)

Comment 18 meant that we will currently detect base64 as plaintext (since it's
7-bit-clean printable ascii).  We should therefore attempt to detect it as
non-text/plain, for best results.  :) Any idea what the magic numbers that
identify a base64-encoded file are?
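
(Base64 has no magic number as such; a sniffer would have to key off the
restricted alphabet and uniform line lengths instead.  As for the corrected
control-character rule above, it might be sketched like this -- illustrative
code with a made-up name, not a patch:)

    // Hypothetical per-byte text test: allow \t \n \v \f \r, printable
    // ASCII, and 8-bit values (128-255, since some encodings use them);
    // treat the remaining control codes (0-8, 14-31) and DEL (127) as
    // markers of binary data.
    static PRBool IsTextByte(unsigned char c)
    {
      if (c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r')
        return PR_TRUE;
      return c >= 32 && c != 127;
    }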
Let me add a voice for user control.

Specifically:
Let the user specify a preferred handler for an unknown type. (BTW, _is_ there a
MIME type for "unknown"?)
Once loaded (or loading), let the user hand the URL to a specific handler. For
example, bring the URL in as app/octet-stream (my conservative preference) and
in the Save dialogue, offer a "recast to type and handler" option.

Also, can the .ext -> mime/type mapping be exposed and manually extensible?
To be used only in the guess-this-type code of course, since the server's MIME
claims should be respected.

I'd also like to second the vote for Gavin Long's comment #19, to change:

    if (known binary) [octet stream]
    else [plaintext]
to:
    if (known plaintext) [plaintext]
    else [octet stream]

It seems much safer and saner to me.
> BTW, _is_ there a MIME type for "unknown"

application/octet-stream is it.  The definition is "unknown data of some sort".

The rest of what you suggest is already covered in 3 or 4 different RFEs.  The
extension to type mapping is extensible through helper app preferences already.
*** Bug 119942 has been marked as a duplicate of this bug. ***
Proposed relnote: Mozilla will sometimes not detect that an opened file is
binary, and will attempt to display it as a web page. To download such a file,
right-click on the link and select "Save Link Target As."
Keywords: relnote
Can't you also take the filesize into account? I mean, if a file is larger than
1 or 2 MB, I'm pretty sure users want to save that file (or open it with another
application) rather than read it in the browser window. And I doubt there are
that many large textfiles around...
We could, but large logfiles or message archives are actually very common...
Easily multi-megabyte.
Keywords: mozilla1.0
Keywords: mozilla1.1
Plugins also have this problem on Win32, for example:
http://slip.mcom.com/shrir/edittext4.swf

Should we not be looking at the extensions?

Nominating nsbeta1.
Keywords: nsbeta1
-> ftp (may end up in File Handling)
peter: In FTP, yes. For the example you give, what does that extension map to?
Component: Networking → Networking: FTP
My testcase works in FTP mode. It does not work in HTTP.

That extension is only mapped to a mime type in plugin code.  Calling
|nsIPluginHost::IsPluginEnabledForExtension| will check for a mapping.
For HTTP, if the server tells us it's text/plain then we should not be looking 
at extension.
okay ->file handling, if I'm reading this correctly.
Component: Networking: FTP → File Handling
QA Contact: benc → sairuh
*** Bug 152203 has been marked as a duplicate of this bug. ***
*** Bug 156020 has been marked as a duplicate of this bug. ***
this occurs on Linux as well
OS=>All
OS: Windows XP → All
according to bug #156020 this is true on Mac (OS X and 9) as well.
( http )
Now, referring to that bug as well, this happens with the .gz format too, and
that format is well recognized by its .gz file ending and has an easily
identifiable header.

Though this seems to barf quite hard with things like this as well:

spider@Darkmere spider $ wget
http://www.mitzpettel.com/download/IcyJuice0.9d2.dmg.gz
--18:02:37--  http://www.mitzpettel.com/download/IcyJuice0.9d2.dmg.gz
           => `IcyJuice0.9d2.dmg.gz'
Resolving www.mitzpettel.com... done.
Connecting to www.mitzpettel.com[161.58.237.23]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 606,600 [text/plain]



The text/plain would suggest a misconfigured (unconfigured?) http server, but
how come it gets attached as text/plain with mozilla?  Why do we trust the
server in this case?


We trust the server because that's what the HTTP specification says we MUST do.
 Let's keep this bug focused on the issue at hand, please...
I see this in Chimera too, so it hits embedding apps as well. Yet another
testcase: <http://ftp.mozilla.org/pub/chimera/nightly/2002-07-22-05/Chimera.dmg.gz>
Hardware: PC → All
Simon: Mozilla/chimera use the HTTP protocol for this URL and the server sends:
text/plain....
No longer blocks: 150046
This bug hasn't been touched in months?  It's marked mozilla1.0?
Anyone care to make a patch for assuming save-as and providing view-as-text in
the save-as options?
> make a patch for assuming save as

What does that have to do with this bug?

> providig view as text

This part is a large piece of work...  (trust me, I've tried two or three times).

Is there a comment after comment 18 that actually has a useful suggestion other
than the banter about buffer sizes?
>> I think I agree that I'd rather err on the side of letting the user save
>> than on the side of showing in browser.

>Any chance we can ping some usability gurus on this?

I wouldn't claim to be a usability guru, but I'm certainly a user.

Why not simply add an option to force-save the file in raw format, regardless of
the mime type sent, to the "save as type" menu. That way, if mozilla incorrectly
identifies a binary file as text, or the server erroneously sends a text mime
type for a binary file (like with those RAR archives), the user has some control
over how the data is saved -- if they know the file is binary, they have a means
of safely saving it as binary data that doesn't involve pasting the address into IE.

The same could be added in reverse: on the off-chance that mozilla, for whatever
reason, interprets a text file as binary data, the user can force-save as text
if s/he so desires.
>Why not simply add an option to force-save the file in raw format

imho, saving a file should ALWAYS save it in raw format (unless "web page
complete" is chosen, of course)
In case you all missed it, saving in Mozilla _is_ in raw format.  we don't even do 
newline conversion (though we should, imo, in some cases).
QA Contact: sairuh → petersen
Here is another file that does the same ol' thing we've all seen for months.

http://205.122.23.229/peng/linusq-a.ogg

Bad example -- that one the server claims to be text/plain.  Fix the buggy
server, please.
It's not my server to fix, and since there are other servers out there that are
also likely misconfigured, it would be foolish to say that it is not worth
looking at a way to have mozilla detect files by extension.

Workaround: open the URL up in IE.
No, you do not understand.  Doing what you suggest would be a gross and blatant
violation of the spec that _no_ browser other than IE commits (I've tested
Mozilla, Opera, Konqueror, Netscape 4, Mosaic, lynx, links, w3m).

We _can_ detect these files by extension or even data sniffing.  However we will
_not_ be doing it.

Please stop spamming this bug with rehashes of discussions that have happened in
the newsgroups many times over.
Workaround is to save it with File->Save or Ctrl-S in the window.
adt: nsbeta1-
Keywords: nsbeta1nsbeta1-
*** Bug 210973 has been marked as a duplicate of this bug. ***
OK, taking.  We've talked a lot, and lots of good ideas here, and I'm going to
implement the simplest one -- filtering out known-not-text chars.
Assignee: rpotts → bz-vacation
Attached patch: Proposed patch
For the curious, with this patch we detect the file in the URL field as binary
three bytes in.
Comment on attachment 135573 [details] [diff] [review]
Proposed patch

Er, ignore that first hunk; I've not updated this tree to tip in a few days...
;)

IS_TEXT_CHAR treats 127 and 8-bit chars as text for now, because various
codepages may use them (though they probably should not be using 127, I can't
guarantee that they are not).

Thoughts?
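
(For readers without the patch handy, a reconstruction of IS_TEXT_CHAR from the
description above -- not the patch text itself:)

    // Anything >= 32 counts as text -- which deliberately includes DEL
    // (127) and all 8-bit values, since various codepages may use them
    // -- plus the usual whitespace/control characters.
    #define IS_TEXT_CHAR(ch)                                           \
      (((unsigned char)(ch)) >= 32 || (ch) == '\t' || (ch) == '\n' ||  \
       (ch) == '\v' || (ch) == '\f' || (ch) == '\r')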
Attachment #135573 - Flags: superreview?(darin)
Attachment #135573 - Flags: review?(darin)
Priority: -- → P1
Summary: Binary file with unknown type displayed as text/plain rather than saved → [FIX]Binary file with unknown type displayed as text/plain rather than saved
Target Milestone: --- → mozilla1.6beta
Comment on attachment 135573 [details] [diff] [review]
Proposed patch

this is better than nothing.  i agree that matching 127 here might be risky.  i
think this is a good heuristic that should help catch a lot of cases.

r+sr=darin
Attachment #135573 - Flags: superreview?(darin)
Attachment #135573 - Flags: superreview+
Attachment #135573 - Flags: review?(darin)
Attachment #135573 - Flags: review+
Checked in.  The next step is to add sniffers for common formats, per comment 29
(which I think has a good summary of the situation).  Please file bugs on those
and assign them to me?  So far we have base64 on the list, right?
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Keywords: relnote
Product: Core → Core Graveyard