Closed Bug 67940 Opened 23 years ago Closed 20 years ago

For application/octet-stream, set MIME type from extension/data

Categories

(Core Graveyard :: File Handling, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

VERIFIED WONTFIX
Future

People

(Reporter: matt, Assigned: law)

Attachments

(1 file, 1 obsolete file)

Sometimes a webserver gives the MIME type "application/octet-stream" for
a file that has a defined MIME type (for example, I often get octet-stream
for PDF files).  It would be nice if Mozilla would, for octet-stream
MIME types, try to guess the actual MIME type from the file extension.
This would also be useful in mail/news because certain email clients will not
attach the correct mime type to a file.  A particular example is when I receive
png files I often have to save them to disk, and then open them again with
mozilla because mail will not display them in-line if they have the
application/octet-stream mime type.
Blocks: 61688
Target Milestone: --- → Future
This bug is very annoying for me. I use a fax->mail gateway that sends me

Content-Type: application/octet-stream; name="990087634_C3G01F01.TIF"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="990087634_C3G01F01.TIF"

Even if I configure TIFF in Helper Apps, I still get a normal download dialog,
with not even the tiff helper app prefilled.
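
A minimal sketch of the behaviour being asked for, using Python's standard `email` and `mimetypes` modules as stand-ins for Mozilla's MIME parser and helper-app table (an assumption for illustration only; `effective_type` is a hypothetical name). It keeps the declared type unless that type is application/octet-stream and the attachment filename has a recognised extension:

```python
import mimetypes
from email import message_from_string

# The headers quoted above, as a single message part with an empty body.
RAW_PART = (
    'Content-Type: application/octet-stream; name="990087634_C3G01F01.TIF"\n'
    "Content-Transfer-Encoding: base64\n"
    'Content-Disposition: attachment; filename="990087634_C3G01F01.TIF"\n'
    "\n"
)


def effective_type(part):
    """Return the declared type unless it is octet-stream and the filename helps."""
    declared = part.get_content_type()
    if declared != "application/octet-stream":
        return declared
    filename = part.get_filename()
    guessed = mimetypes.guess_type(filename)[0] if filename else None
    return guessed or declared


part = message_from_string(RAW_PART)
print(part.get_content_type(), "->", effective_type(part))
```

With a type resolved this way, the TIFF entry configured in Helper Apps could be matched instead of the generic download dialog.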

I guess they use octet-stream because some other app would otherwise choke;
I don't know.

neeti, any hints on how I could fix this?

Changing SUMMARY: s/guess/set/, because we know the extensions of the mimetypes
- you can set them in the helper app dialog.
Keywords: mozilla1.0
Summary: For application/octet-stream, guess MIME type from extension → For application/octet-stream, set MIME type from extension
Open Networking bugs, qa=tever -> qa to me.
QA Contact: tever → benc
A more general description of this bug is in bug 66677 (e.g., HTML attachments
that arrive as text/plain).
I received mail with the following relevant headers (according to show message
source):

MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----_=_NextPart_000_01C12F4F.C6F96700"
Content-Length: 3705

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01C12F4F.C6F96700
Content-Type: text/plain;
	charset="iso-8859-1"

... text message...

------_=_NextPart_000_01C12F4F.C6F96700
Content-Type: application/octet-stream;
	name="DAVIS-T33549-PASCALET.htm"
Content-Disposition: attachment;
	filename="DAVIS-T33549-PASCALET.htm"

... HTML source...


Quite reasonably, Mozilla doesn't display the HTML page inline, as its
disposition is attachment.

If I select the attachment and choose Open, I am presented with a dialog box
which proclaims

You have chosen to download a file of type "Hyper Text Markup Language"
[text/html] from imap...
What should Mozilla do with this file?
  Open using <no application specified>
  Save this file to Disk

It appears to have interpreted the file type from the extension, but doesn't
offer the option of displaying the attachment internally (or doing this
automatically, since I did say I wanted it opened).

Scott: you're being bitten by bug 78943 and its byblows
Here's a preliminary patch to stimulate discussion.  If people feel this is the
right approach, I'd like to reimplement it a bit more cleanly and fix some
other code that wants access to the Content-Disposition filename to go through
the new interface it creates.

This patch ignores a "Content-Type: application/octet-stream" sent by the
server on the grounds that such a header is as useless as sending no
"Content-Type" header whatsoever.  When faced with such unknown content (i.e.,
a missing "Content-Type" header or a "stupid" content type like "*/*" or
"application/octet-stream"), it *first* tries a filename given in a
"Content-Disposition" header.  If that gives an extension that maps to a useful
MIME type, it uses that.  Otherwise, it falls back to trying to derive a MIME
type from a file extension in the URI.
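
For readers who want the order spelled out, here is a rough sketch of it. It is Python, not the C++ in the attached patch, and everything named in it is an assumption made for illustration: `resolve_mime_type` and `USELESS_TYPES` are hypothetical, and Python's `mimetypes` table stands in for Mozilla's extension-to-type mapping.

```python
import mimetypes
import posixpath
from email.message import Message
from urllib.parse import urlsplit

# Types treated as carrying no information (assumption, based on the text above).
USELESS_TYPES = {"", "*/*", "application/octet-stream"}


def _type_from_name(name):
    """Map a filename to a MIME type via its extension, or return None."""
    if not name:
        return None
    guessed, _encoding = mimetypes.guess_type(name)
    return guessed


def resolve_mime_type(content_type, content_disposition, url):
    """Hypothetical helper following the three-step order described above."""
    declared = (content_type or "").split(";")[0].strip().lower()
    if declared not in USELESS_TYPES:
        return declared  # 1. a meaningful Content-Type wins

    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        guessed = _type_from_name(msg.get_filename())
        if guessed:
            return guessed  # 2. extension of the Content-Disposition filename

    path = urlsplit(url).path
    guessed = _type_from_name(posixpath.basename(path))
    if guessed:
        return guessed  # 3. extension found in the URI

    # Nothing helped: report the declared type, or octet-stream if there was none
    # (this last default is an assumption, not something the patch specifies).
    return declared or "application/octet-stream"
```

For example, `resolve_mime_type("application/octet-stream", 'attachment; filename="a.pdf"', "http://example.com/download")` resolves through step 2, because the URI itself carries no usable extension.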

This is under-tested, but with this patch Mozilla now works as expected when
accessing attachments from the Microsoft Outlook Web Access.  In this case, the
attachment is sent by the server with "Content-Type: application/octet-stream"
but with an appropriate filename in a "Content-Disposition" header.  With this
patch, if there's a matching extension in the MIME types for that content
disposition filename, Mozilla will get it right.  (Hooray!)

Comments?

Also, let me note that bug 164996 is really a duplicate of this, but they're
both assigned to different people, so I'm hesitant to change anything.
hmm... i do like the idea of adding Content-Disposition to the list of HTTP
atoms, since that'll avoid growing the atom table when that header is
encountered, but as for the rest, can it move into uriloader/exthandler along
with the rest of the content-disposition code?
Oh, you probably want the patch that *doesn't* crash.  Here's a second try.  I
wouldn't be entirely surprised if there were more bugs lurking in there
though.
Attachment #99252 - Attachment is obsolete: true
Darin:

It's not clear to me how to move it into uriloader.  The problem is that, for
HTTP channels, the MIME type should be decided by the channel's idea of the
content type (from a Content-Type header) unless it's a "stupid" value like
"application/octet-stream", then by the Content-Disposition filename extension,
and finally by an extension calculated via GetTypeFromURI or similar.  The first
and the last case are already handled by nsHTTPChannel::GetContentType.  We'd
have to split them up if we wanted to check the content disposition in the
middle there.

My thought was just the opposite---the content disposition parsing code could be
moved *out* of exthandler and into nsHttpResponseHead.  There would be
nsIHttpChannel and nsIMultiPartChannel methods that exthandler could use to
fetch the relevant *parsed* Content-Disposition information it needed.

However, I don't really understand the role of exthandler very well.  Is it
involved in every bit of content or *only* when the content has been recognized
as something to be handled externally?
Kevin, the code Darin refers to lives in
http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#248
(nsExternalHelperAppService::DoContent).  It does the type lookup, then does the
extension lookup if there's nothing registered for the type.
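
Roughly, that lookup order looks like the sketch below. It is Python rather than the real C++, and the two tables and the name `find_handler` are hypothetical; the actual service consults user preferences and the OS rather than hard-coded dictionaries.

```python
# Hypothetical handler tables, for illustration only.
HANDLERS_BY_TYPE = {"application/pdf": "acroread"}
HANDLERS_BY_EXTENSION = {"tif": "tiff-viewer", "pdf": "acroread"}


def find_handler(mime_type, filename):
    """Look up by MIME type first, then by extension if no handler is registered."""
    handler = HANDLERS_BY_TYPE.get(mime_type.lower())
    if handler is not None:
        return handler
    # rpartition yields an empty separator when the name has no dot at all.
    _base, dot, ext = filename.rpartition(".")
    if dot:
        return HANDLERS_BY_EXTENSION.get(ext.lower())
    return None
```

Note that the extension is only consulted when the type lookup finds nothing, so a handler registered for the declared type always wins.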

The reason it's not a good idea to change the http channel as you did is that
sites will often send HTTP content as octet-stream expecting it to be saved
(that's what NS4 and IE do, no?).  So they send the content with a Refresh
header that redirects to a "done downloading" page or something.  Handling such
content inline would just mean the user has no way of getting to it....  This
problem is inherent in any fix to this bug, including one via the uriloader (the
helper app service would need to set the type on the channel and kick the load
back to the originating window, basically.... but that encounters the same issues.)
Okay, I think I understand, but I'll have to ponder this a bit.  It seems like
there should be an extra channel interface method, so we have a pair like
nsHTTPChannel::GetContentType and nsHTTPChannel::GuessExternalContentType.

The idea would be that, at least for the HTTP channel, GetContentType would
always deliver the Content-Type header, even if it was
"application/octet-stream" (indicating "download suggested").  If there was no
Content-Type header, it could fall back on a channel-dependent algorithm to
guess the (inline) content type.  For the HTTP channel, it would probably *only*
check the extension from the URI and try to map that for a MIME type, as it does
now.

Then, GuessExternalContentType would be used from the uriloader module as a
channel-dependent way to make a better guess when the content type is still
unknown or something useless like "application/octet-stream".  We could either
do this in nsExternalAppHandler::OnStartRequest so we bring up a "what do you
want to do with this content?" dialogue that has an appropriate MIME type based
on the Content-Disposition filename.  Or, we could do it in
nsDocumentOpenInfo::DispatchContent just before we're about to pass the request
off to the helperAppService.  I don't think it makes a difference.
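
To make the proposed split concrete, here is a small Python sketch of the two methods. It is not Mozilla's interface: `ChannelSketch` and both method names are hypothetical, Python's `mimetypes` table stands in for the real extension mapping, and returning `None` for "still unknown" is an assumption.

```python
import mimetypes
import posixpath
from email.message import Message
from urllib.parse import urlsplit


class ChannelSketch:
    """Hypothetical stand-in for an HTTP channel; not Mozilla's real interface."""

    def __init__(self, url, headers):
        self.url = url
        # Header names are case-insensitive, so normalise them once.
        self.headers = {k.lower(): v for k, v in headers.items()}

    def get_content_type(self):
        """Report the server's Content-Type as sent; guess only if it is absent."""
        raw = self.headers.get("content-type")
        if raw:
            return raw.split(";")[0].strip().lower()
        name = posixpath.basename(urlsplit(self.url).path)
        return mimetypes.guess_type(name)[0]  # None means "still unknown"

    def guess_external_content_type(self):
        """Better guess for the helper-app dialog, from the suggested filename."""
        disposition = self.headers.get("content-disposition")
        if disposition:
            msg = Message()
            msg["Content-Disposition"] = disposition
            filename = msg.get_filename()
            guessed = mimetypes.guess_type(filename)[0] if filename else None
            if guessed:
                return guessed
        return self.get_content_type()
```

The point of keeping two methods is that inline handling still sees application/octet-stream ("download suggested"), while only the external-handler path uses the Content-Disposition filename.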

Anyway, I'll think about it this week and try testing some things, including the
"redirect to `download complete' page" scenario you mentioned.
kevin: you can QI to nsIHttpChannel and call GetResponseHeader("content-type")
to access the raw server specified MIME type.  GetContentType on the other hand
returns the guessed MIME type.  so, we basically have the functionality that
you're looking for... perhaps the uriloader code just needs to ask for things
differently??
Whoa.  Let's keep knowledge of the content-type header out of the loader ok? 
That's _so_ an implementation detail of http...  Not to mention that other
channels have the same issues as http with application/octet-stream (multipart
channels come to mind).  
well, my point was that it is possible to infer these things from the channel. 
of course, if we can find a protocol-agnostic way to do the same, then that's
always better.
Not sure, if it's relevant or already considered:

Some filesystems store the mimetype in a special field. ext2, XFS and (I think)
Mac OS have this ability. Presumably, it would be a good idea to set it for
downloads as well. However, I am not sure who is responsible for figuring out a
good mimetype if the provider didn't give one (should we guess and set it or
leave it blank, leaving it to OS/apps to figure it out?).

Also, did you consider using |file| on Unix to figure out the correct mimetype?
It often has a much better guess than a filename extension lookup.
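
As a rough illustration of data-based detection, the sketch below checks the leading bytes against a handful of well-known signatures. It is a tiny stand-in, not what |file| actually does (that tool uses a large magic database); `SIGNATURES` and `sniff_mime_type` are hypothetical names.

```python
# Leading byte signatures for a few common formats (well-known magic numbers).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def sniff_mime_type(data):
    """Return a MIME type if the data starts with a known signature, else None."""
    for magic, mime_type in SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return None
```

Unlike an extension lookup, this does not depend on the sender choosing a sensible filename, but it only recognises formats that have a distinctive header.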
Ben, your suggestion would be a great improvement to the file channel... Please
file a bug on that; cc me.
*** Bug 164996 has been marked as a duplicate of this bug. ***
moving neeti's futured bugs for triaging.
Assignee: neeti → new-network-bugs
Hello --

I often receive messages with application/octet-stream for PDF, EXE, TIF, GIF,
EFX (fax).  I need some resolution from Mozilla on determining applications for
each file suffix/extension.   Mail/News is _very_ frustrating to use when
viewing a non-zero # of emails with attachments (of various type).   This may be
a trivial "perceived" functionality -- but difficult in implementation??  I
don't doubt there's some issue with lack of standards support -- but end-user
functionality should remain high on priority list.    I'd be surprised to learn
the Netscape7 release has this same problem.  Thanks -GA
Start being surprised -- it has the same "problem".

One issue is that on the Web authors rely on being able to set
application/octet-stream to force "save as" behavior (that is currently the only
reliable method).

For mail, of course, this is not an issue... The mail channel impls could
certainly be fixed to deal.  I'll look into doing that.
Summary: For application/octet-stream, set MIME type from extension → For application/octet-stream, set MIME type from extension/data
Depends on: 177026
Referring to comment #20 by Garretta:

I've got the same problem. I had a text file (.txt) with the famous MIME type
"application/octet-stream". Connected it to Notepad. Now every downloaded .exe
file (and others) is saved as ".exe.txt".
Now I checked the settings for helper applications: Editing the entry for
"application/octet-stream" there, I noticed, that the MIME type must be specified.

My suggestion:
If I define a helper application for files with a certain extension and the MIME
type "application/octet-stream", ignore the MIME type and just check the
extension. This includes several options: 
a) If there's another MIME type with this extension connected to a helper
application, use this type.
b) Several helper applications can be defined for the type
"application/octet-stream".
c) Alternatively you might introduce a "dummy" MIME type for this or the option
"every type". 

I prefer option a).

Thanks for listening to a generally satisfied Mozilla user.
RK
*** Bug 191730 has been marked as a duplicate of this bug. ***
This bug is fairly important for the average user. For example, my 
mother often gets emails containing images that have the incorrect 
content-type assigned to them. A quote from her:

"Previous to downloading Netscape 7.0, it would have automatically 
opened up the picture. Hope you have some ideas how to fix this."

(She previously used netscape 4.7 as her mail client.) Certainly 
sounds like a 4xp issue to me
Keywords: 4xp
Now that I think about it, every channel that wants this just needs to stick an
nsUnknownDecoder stream converter into the data stream before calling
OnStartRequest (as the http channel does for HTTP/0.9 responses, eg).  That's
it.  No changes needed to the uriloader, channels can make their own decisions
(eg HTTP should _not_ do this, in my opinion.  But for mail, it would make a
good deal of sense to do it).

Thoughts?  (Note that all comments but comment zero talk about this in the
context of mail, and all the issues I've raised are only problems in the context
of HTTP.)
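
To illustrate the idea of putting a sniffing step in front of the real listener, here is a Python sketch. It is not nsUnknownDecoder: the class and method names (`SniffingListener`, `on_start`, `on_data`, `on_stop`) are hypothetical, the sniffer knows only two formats, and the 8-byte threshold is an arbitrary assumption.

```python
def sniff(data):
    """Tiny stand-in for a real content sniffer (assumption: two formats only)."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


class RecordingListener:
    """Downstream consumer used only to demonstrate the wrapper."""

    def __init__(self):
        self.content_type = None
        self.data = bytearray()
        self.stopped = False

    def on_start(self, content_type):
        self.content_type = content_type

    def on_data(self, chunk):
        self.data.extend(chunk)

    def on_stop(self):
        self.stopped = True


class SniffingListener:
    """Delays on_start until enough bytes are buffered to guess a type."""

    def __init__(self, downstream, min_bytes=8):
        self._downstream = downstream
        self._min_bytes = min_bytes
        self._buffer = bytearray()
        self._started = False

    def on_data(self, chunk):
        if self._started:
            self._downstream.on_data(chunk)
            return
        self._buffer.extend(chunk)
        if len(self._buffer) >= self._min_bytes:
            self._start()

    def on_stop(self):
        if not self._started:
            self._start()  # short stream: decide with whatever arrived
        self._downstream.on_stop()

    def _start(self):
        self._started = True
        content_type = sniff(bytes(self._buffer)) or "application/octet-stream"
        self._downstream.on_start(content_type)
        if self._buffer:
            self._downstream.on_data(bytes(self._buffer))
        self._buffer.clear()
```

A mail channel that wanted this behaviour would wrap its real listener in something like `SniffingListener`; a channel that should not guess (HTTP, in my opinion) simply would not.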
bz: HTTP doesn't push a nsUnknownDecoder... the uriloader does so on behalf of
any channel that cannot provide a specific content-type.  the OnStartRequest is
delayed by nsDocumentOpenInfo (or whatever the class name is) until
nsUnknownDecoder does its thing.
Darin, see nsHttpChannel::CallOnStartRequest (you reviewed that patch, man!)
*oh yeah* :-)
*** Bug 195978 has been marked as a duplicate of this bug. ***
->file handling?
From a QA point, maybe.... This is a Necko bug at heart, but this particular
part of necko is classified as "file handling" in the component descriptions.
*** Bug 173236 has been marked as a duplicate of this bug. ***
*** Bug 158050 has been marked as a duplicate of this bug. ***
->file handling
Assignee: new-network-bugs → darin
whoops....
Assignee: darin → law
Component: Networking → File Handling
QA Contact: benc → cpetersen0953
So are we back to Kevin's suggestion in comment #12 ?
Opera has a feature similar to what this is proposing. Their option calls it
"determine file type based on extension for unreliable MIME types" or something,
and I think that's a pretty good description of what this should be. I would
like to see an option to allow filetypes to be defined based on extension
(priority over MIME type) since many servers do not properly define many MIME types.
Re: Comment 25, my take is that this is a user-empowerment issue. I'm a pretty
savvy developer (though not so much in the web arena), and my first attempt to
"do something" about this implementation was to try changing my "helper
applications" preference to handle "application/octet-stream" files with
extensions of "jpg" (which were the ones that were annoying me) to be handled
internally. Hilarity ensued. 

As for the web sites that want to "force" a download... what if I don't want to
be forced?

Where am I going with this, you might ask? Why does the GUI for configuring MIME
type handlers allow only one entry for each MIME type? It seems to me that this
could be handled by allowing users to specify their *own* actions for different
extensions for these kinds of meaningless and widely wrongly used MIME types. 

Then we could have arguments about what the defaults for these should be :-).
Try http://www.testitonline.no/. It tries to download the html pages. Is this
problem within the scope of this "bug"?
comment#39:
The server is broken and they should send the correct MIME type for the document
(text/html), but this is a workaround for this misconfigured server.
But comment#39 is a good example of a web site that is displayed in M$IE even
when you think forced downloading should be triggered (as it is in virtually all
other browsers). But this behavior probably depends on a file type/content
that IE knows how to display internally (i.e. text/html/image).
Why is this an enhancement, and not a bug?

Is the email client really doing anything wrong? What mime-type should it use
for unknown file types? It just gets files to attach, and often knows nothing
about them except their extensions. How should it be able to figure out the
mime-type of all possible file types?

I think fixing this should be a high priority, as I assume many users are
going to experience this problem and no good workaround exists.
>Why is this an enhancement, and not a bug?

interesting that you ask, personally I wonder "why is this an enhancement and
not WONTFIX"

hm... comment 25 suggests to make this mail-only, in which case it may make
sense, but why is this in the browser product then?

>Is the email client really doing anything wrong? What mime-type should it use
>for unknown file types? It just gets files to attach, and often knows nothing
>about them except their extensions. How should it be able to figure out the
>mime-type of all possible file types?

That's indeed a quite good question. Why do you think Mozilla can do a better
job at extension->type mapping than your mail client can?
> hm... comment 25 suggests to make this mail-only, in which case it may make
> sense, but why is this in the browser product then?

I suppose the same problem exists for web also, or is it less because of less
M$-percentage of servers than for email clients?
I also agree that the issue is different for web. When you put something on a
http server, you should know what type it is, that is, you are not displaying
user content.

>> Is the email client really doing anything wrong? What mime-type should it use
>> for unknown file types? It just gets files to attach, and often knows nothing
>> about them except their extensions. How should it be able to figure out the
>> mime-type of all possible file types?

> That's indeed a quite good question. Why do you think Mozilla can do a better
> job at extension->type mapping than your mail client can?

Think of this scenario:
-I have acroread/acrobat creator installed on my PC, create a PDF file, and
place it on a file server.
-B, who has no PDF applications installed on his PC, attaches the file from that
file server to an e-mail and sends it to you.
-You have PDF applications installed, know that .PDF files (usually) are PDFs,
know a signature for PDF contents and the MIME-type for PDF, yet get the
attachment as application/octet-stream.

Here the MIME-type is lost because of the storage on a harddisk, with a file
system which doesn't save the MIME-type.

Do you think B should install a mapping of filename extensions to mime-types, or
be forced to install some PDF applications?

If the sender doesn't know the mime type and the receiver knows, I think the
receiver should guess.

Is there a better mime type for unknown than octet-stream?

What should I tell people that say the exact same thing is working as expected
in other mail clients (i.e. outlook)?
>I suppose the same problem exists for web also, or is it less because of less
>M$-percentage of servers than for email clients?

Well, see comment 11.

>Do you think B should install a mapping of filename extensions to mime-types,
>or be forced to install some PDF applications?

That was a good example. Yeah, doing it for mail would be ok for me. But then
this bug should be in the mail product.

> Is there a better mime type for unknown than octet-stream?

No, afaik octet-stream means "I don't know what this is, but it's binary"

What should I tell people that say the exact same thing is working as expected
in other mail clients (i.e. outlook)?
comment#11: "that's what NS4 and IE do, no?" 
No, IE doesn't use the content-type: http://www.mversen.de/mozilla/octet/ 
The server should send an attachment header if the client should download the
file. (And I saw a few servers send content-types like "application/x-download"
to force a download.)

It would make sense to use the unknown-content decoder (not the extension) for
application/octet-stream because application/octet-stream means "I don't know
what this is". I think that the client should handle this file because the server
doesn't know what this file is.

I don't know if this is true, but I think you can't tell Apache to send an unknown
file without a content-type. The mime-config only allows you to send a general
mime-type for unknown files, and the mime-list in Apache only contains IANA
mime-types (no mime-types for many known files like .rar or .ace).
The xitami web-server does the same (but the default config is a broken "*/*"
mime-type for unknown files).

And this would make us compatible with Opera or Safari
Keywords: mozilla1.0
If this is a mail-only bug, it's a trivial matter of hooking the unknown content
decoder into the streamlistener chain for mailnews channels that have
application/octet-stream for the type.  The http channel does this already for
cases when the server sends no content-type header.

Note that:
1)  Comment 0 is not about mail
2)  We have a separate bug on the mail behavior, filed in the mailnews product, as
    far as I can recall.
Comment#43 asked "Why do you think Mozilla can do a better
job at extension->type mapping than your mail client can?"

The correct question should be "why do you think your Mozilla can do a better
job at extension->type mapping than someone else's mail client or web server can?"

A fix to this bug will allow Mozilla users to have control over how they handle
content received from poorly configured mail senders or web servers.
The mail-bug (or at least one of the relevant bugs) is bug 59631 : M$ Virus
Outbreak (= Outlook) sends vcards as application/octet-stream.
*** Bug 189391 has been marked as a duplicate of this bug. ***
WONTFIX based on comments.

Please file a bug on the mail components if this is to be done only in our mail
clients, assuming such a bug is not already filed.
Status: NEW → RESOLVED
Closed: 20 years ago
QA Contact: chrispetersen → ian
Resolution: --- → WONTFIX
that's bug 59631 as mentioned in Comment 49.

so please do not file duplicates of that bug.

vrfy wont
Status: RESOLVED → VERIFIED
This isn't only for the mail component. Why was this closed WONTFIX "based on
comments"?
Because for non-mail cases, this is actually invalid per the specs, as described
in comments above.
Exactly. The wontfix is for the non-mail part of this bug.  I concur, and so
does darin last I checked.
What enhancement bug then contains a feature similar to the opera option
mentioned in comment #37?
None that I know of.  And I don't think we have any plans to implement something
like that (I would say that the wontfix in this bug applies to that idea too).
Seeing as how this is a huge pain for real users, I think something should be
done. However, it is true that the specs say that application/octet-stream is
for unknown binary files that should be saved to disk.

Given that, isn't it a violation of the specs for Mozilla to even *allow*
setting a handler for this MIME type (that's what gets most people into
trouble)? You can't have it both ways. 

Personally, I suggested something in comment 38 that I think would be an ideal
solution, and that would actually be useful in other situations as well
(allowing multiple extensions/handlers per MIME type). Yes, it's a bit of a
pain, and something that only advanced users would get any benefit from... but
it's not a violation of this spec at least...
> to even *allow* setting a handler for this MIME type

At this point, you can't do that accidentally.  The only way to do it is to
purposefully open up prefs and set the handler for that type, typing in the
type.  As in, the helper app dialog no longer auto-saves the handler when the
type is application/octet-stream.

If I want to set the handler to something like a hex editor, no reason why I
shouldn't.  ;)

Please do file a _separate_ clear bug on your suggestion.  It's something that
we would take if someone implements it (neither biesi nor I are likely to have
time to work on it anytime soon, and we seem to be the only people willing to
touch this code....)
My Workaround for Linux/KDE:
This just passes the mail attachment to KDE/Konqueror to run the default app.
You can probably do something similar on Windows with Explorer.  I don't know
exactly about Gnome.

Add the following line to /etc/mailcap:
application/octet-stream; /usr/local/bin/attachment.sh "%s"

Put the following script in /usr/local/bin/attachment.sh:
#!/bin/bash
kfmclient exec "$1"

Make it executable:
chmod ugo+x /usr/local/bin/attachment.sh

In Edit:Preferences:Navigator:Helper Applications, Add a New Application:
For MIME Type, use 'application/octet-stream'
Add some file extensions:  'wpd doc pdf'
Select "Open it using the default application"
Uncheck "Always ask me before handling files of this type"

As long as the KDE file manager is set up with the right application, this will
work.  With my system (Fedora Core 1) Mozilla copies the attachment to "/tmp"
then the script passes the URL of the attachment to kfmclient which opens it
with the correct application.  It takes a couple of seconds on a slow system,
but it works.
Wouldn't that also open shell scripts and exes (using wine) etc.? If so, that's
extremely dangerous.
> then the script passes the URL of the attachment to kfmclient

surely the filename of the /tmp file, not a url?
> surely the filename of the /tmp file, not a url?

Sorry:  of course, the filename.  I was playing around with passing url's for a
while so I was still in that mindset when I wrote that comment.  I couldn't
figure out whether Konq supports the mailbox:// url and I couldn't really find a
good reference for the % variables, so it just saves the attachment and passes
the filename.

> Wouldn't that also open shell scripts... ?

Afaik, not unless Moz saves the attachment with the execute bit, in which case,
yes, that would be a huge problem.  Although it would be a problem with Moz, not
my script.

You might be able to force the (wrong) behaviour through Konqueror by
associating scripts with their interpreter, but I don't think Konq is set up to
do that automatically.  On my system at least, I don't see any scripts or Wine
executables that are associated.  Even so, I would hope that Konq would respect
the filesystem permissions, but who knows.
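
One way to reduce the risk raised above would be for the wrapper to refuse anything outside an explicit allow-list before handing the file to kfmclient. The sketch below is in Python rather than bash, and it is only an illustration: the allow-list contents and the names `is_safe_to_open` and `open_attachment` are assumptions, and it has not been tried against a real Mozilla/KDE setup.

```python
import os
import subprocess

# Extensions the wrapper is willing to open; everything else is refused (assumption).
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".wpd", ".tif", ".tiff", ".png", ".jpg"}


def is_safe_to_open(path):
    """Accept only allow-listed extensions on files without any execute bit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    mode = os.stat(path).st_mode
    return (mode & 0o111) == 0


def open_attachment(path):
    """Hand the file to KDE only when it passes the checks above."""
    if not is_safe_to_open(path):
        raise ValueError(f"refusing to open {path!r}")
    # Same command the shell wrapper runs: kfmclient exec <file>
    subprocess.run(["kfmclient", "exec", path], check=True)
```

This does not make the approach safe in general (document formats can carry their own risks), but it keeps scripts and executables from being passed along by accident.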
*** Bug 222132 has been marked as a duplicate of this bug. ***
Product: Core → Core Graveyard