Closed Bug 324781 Opened 19 years ago Closed 5 months ago

Firefox should honor Content-Type instead of extension

Categories

(Firefox :: File Handling, defect)

RESOLVED WORKSFORME

People

(Reporter: vincent-moz, Unassigned)

References

(Depends on 1 open bug)

Details

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

According to wget, the headers for the above URL are:

  Content-Type: application/postscript
  Content-Encoding: x-gzip

But when I try to open it with Firefox, it says that it is a Gnu ZIP Archive instead of taking into account the Content-Type header.

Reproducible: Always

Steps to Reproduce:
1. Open the above URL.
Actual Results:  
Firefox says:

You have chosen to open
TI-97-7.ps.gz
which is a: Gnu ZIP Archive
from: http://www.informatik.tu-darmstadt.de

Expected Results:  
It should have said that it was a Postscript file.

I wonder if this can lead to a security problem if a wrong helper application can be used.
The type displayed by Firefox is indeed from the extension, but the *behaviour* of Firefox comes (initially at least) from the Content-Type.
ohh, content-encoding issues in file handling
Assignee: darin → file-handling
Component: Networking: HTTP → File Handling
QA Contact: networking.http → ian
(In reply to comment #1)
> but the *behaviour* of Firefox comes (initially at least) from the Content-Type.

Not completely. Firefox proposes to open the file with Stuffit Expander, which is a dearchiver, not a viewer.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20060110 Firefox/1.5

Is this bug Mac OS X only? At least here on Linux (1.8 branch build) I see the correct file type in the download dialog (postscript file) and it offers to open the file in my selected PS viewer application.
I confirm that it is correct under Linux:

You have chosen to open
  TI-97-7.ps.gz
which is a: PS file
from: ...
So is this an issue with Seamonkey too?  Or just Firefox?  And a problem with just the UI?  Or the core code?

More precisely, what does a NSPR_LOG_MODULES=HelperAppService:5 show in this case?
(In reply to comment #6)
> More precisely, what does a NSPR_LOG_MODULES=HelperAppService:5 show in this
> case?

-1610551960[1b06ce0]: Found extension 'gz' (filename is 'TI-97-7.ps.gz', handling attachment: 0)
-1610551960[1b06ce0]: HelperAppService::DoContent: mime 'application/postscript', extension 'gz'
-1610551960[1b06ce0]: Getting mimeinfo from type 'application/postscript' ext 'gz'
-1610551960[1b06ce0]: Mac: HelperAppService lookup for type 'application/postscript' ext 'gz' (IC: 0x1e24960)
-1610551960[1b06ce0]: OS gave us: By Type: 0x0 By Ext: 0x6d10060 type has default: false
-1610551960[1b06ce0]: OS gave back 0x6d10060 - found: 1
-1610551960[1b06ce0]: Data source: Via type: retval 0x80040111
-1610551960[1b06ce0]: Data source: Via ext: retval 0x00000000
-1610551960[1b06ce0]: Extension 'gz' matches mime info: 1
-1610551960[1b06ce0]: MIME Info Summary: Type 'application/postscript', Primary Ext 'gz'
-1610551960[1b06ce0]: Type/Ext lookup found 0x6d10060
-1610551960[1b06ce0]: Getting mimeinfo from type 'application/postscript' ext '.gz'
-1610551960[1b06ce0]: Mac: HelperAppService lookup for type 'application/postscript' ext '.gz' (IC: 0x1e24960)
-1610551960[1b06ce0]: OS gave us: By Type: 0x0 By Ext: 0x5161870 type has default: false
-1610551960[1b06ce0]: OS gave back 0x5161870 - found: 1
-1610551960[1b06ce0]: Data source: Via type: retval 0x80040111
-1610551960[1b06ce0]: Data source: Via ext: retval 0x80040111
-1610551960[1b06ce0]: Extension '.gz' matches mime info: 0
-1610551960[1b06ce0]: MIME Info Summary: Type 'application/postscript', Primary Ext 'gz'
> -1610551960[1b06ce0]: Getting mimeinfo from type 'application/postscript' ext 'gz'
> -1610551960[1b06ce0]: Mac: HelperAppService lookup for type 'application/postscript' ext 'gz' (IC: 0x1e24960)
> -1610551960[1b06ce0]: OS gave us: By Type: 0x0 By Ext: 0x6d10060 type has default: false

Hmm...  So your OS knows nothing about application/postscript in this case.

biesi, we should probably not do the extension lookup, or modify it, when we're decompressing and the extension is just the content-encoding.  What do you think?
Doing so would probably fix bug 229871 (a similar issue) as well.
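
A minimal sketch of the suggestion above, not Gecko's actual code: when the last extension merely names the Content-Encoding, skip it and use the extension underneath for the helper-application lookup. The table `ENCODING_EXTENSIONS` and the function `extension_for_lookup` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical table mapping a Content-Encoding token to the file
# extensions that merely name that encoding (compared case-insensitively).
ENCODING_EXTENSIONS = {
    "gzip": {".gz"},
    "x-gzip": {".gz"},
    "compress": {".z"},
    "x-compress": {".z"},
}


def extension_for_lookup(filename, content_encoding=None):
    """Return the extension to use for a helper-application lookup.

    If the final extension only reflects the Content-Encoding, use the
    extension underneath it (e.g. 'ps' for 'TI-97-7.ps.gz').
    """
    stem, ext = os.path.splitext(filename)
    encoding = (content_encoding or "").strip().lower()
    if ext.lower() in ENCODING_EXTENSIONS.get(encoding, set()):
        # The last extension is just the encoding; look one level down.
        inner_ext = os.path.splitext(stem)[1]
        if inner_ext:
            return inner_ext.lstrip(".")
    # No encoding, no match, or nothing underneath: keep the last extension.
    return ext.lstrip(".")
```

With the headers from this report, `extension_for_lookup("TI-97-7.ps.gz", "x-gzip")` would yield `ps` rather than the `gz` seen in the log, while a plain `archive.gz` keeps `gz` because there is no inner extension to fall back to.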

And this bug depends on bug 132702, I think.
This could be done independently of bug 132702, I think.

But yeah, bug 229871 is very much related (possibly to the point of being a dup).
Depends on: 229871
Well, until bug 132702 is fixed, we pass the compressed file to the helper application. So currently, offering to open in Stuffit Expander actually sort of makes sense. If this bug is fixed to offer the user to open the document in the Preview application (or any other app capable of viewing PostScript), the helper app will - rightly - complain that the (still compressed) file is not a PostScript file. That's why I think fixing this bug without fixing bug 132702 as well will actually make things worse than they are now.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
I wonder if there could be a security problem by not strictly honoring the Content-Type, in particular if the user chooses alternate helper applications. For instance, the user could have a filter (provided by a local proxy or whatever) based on the Content-Type, but then, since the Content-Type is not honored, the file will be passed to a different application, for which there may be more risks. And the filter corresponding to this application was not executed because the Content-Type corresponds to another helper application.

If this bug isn't fixed yet, the browser should make sure that, for instance in the case of a file ending with .gz, the file is really a gzipped one (and not a shell script or something like that).
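
As an illustration of the check suggested above, here is a minimal sketch that tests whether data starts with the gzip magic number. The function name `looks_like_gzip` is hypothetical, and this is only a cheap sanity check on the first two bytes, not a validation of the whole file.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip member (RFC 1952)


def looks_like_gzip(data):
    """Return True if the data begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Example inputs: a gzip-compressed PostScript header and a shell script.
compressed_sample = gzip.compress(b"%!PS-Adobe-3.0\n")
shell_script = b"#!/bin/sh\necho hello\n"
```

Here `looks_like_gzip(compressed_sample)` is true and `looks_like_gzip(shell_script)` is false, so a shell script merely named `file.gz` would be caught by this kind of test.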
Vincent: This is more of a general issue with helper applications. The browser cannot verify that a given file is actually what it claims to be (using either Content-Type or file extension), and even if it is, there may be risks involved, see the recent issue with WMF files on Windows. So we ultimately have to trust the helper apps here to report bad/invalid files to the user.

Filtering Content-Types at the proxy level fundamentally does not work for archives (think zip or tar), so completely relying on that won't work anyway.

Hm, looks like I accidentally assigned this bug instead of confirming...
Status: ASSIGNED → NEW
(In reply to comment #13)
> Vincent: This is more of a general issue with helper applications. The browser
> cannot verify that a given file is actually what it claims to be (using either
> Content-Type or file extension), and even if it is, there may be risks
> involved, see the recent issue with WMF files on Windows.

The WMF problem was probably yet another security hole in Windows.

> So we ultimately have to trust the helper apps here to report bad/invalid
> files to the user.

I disagree. One problem is that a helper application may be generic, so detecting and reporting bad or invalid files is not necessarily possible or easy for it.

> Filtering Content-Types at the proxy level fundamentally does not work for
> archives (think zip or tar), so completely relying on that won't work anyway.

I don't see any problem with archives.
Here are more details concerning the problem:

The user may want to use a generic application for archives, e.g. zip, tar, and possibly shell scripts (is that application/x-shar?). Any format would be accepted (the user assumes that for local files, e.g. on his intranet, there's no security problem). But as some formats are not trusted from the internet, the user would set up some filtering, e.g. disallowing shell scripts for any archive Content-Type (but not for text/*, as the user still wants to be able to *view* shell scripts). Therefore, with such a filtering, the helper application would be executed on internet files only for known and trusted archive formats.

Now assume that the user downloads a file named file.gz, and this file has the Content-Type text/x-sh. No filtering is done because text/* is safe (e.g. should be viewed in the browser or saved to the disk).

However, Firefox doesn't know how to display a text/x-sh file, and looks at the extension (or the Content-Encoding, which may be fake). From it, Firefox tells the user that the file is a GNU Zip archive and proposes to open it with the helper application for archives (configured by the user). The user thinks that it is a known and trusted archive format (due to his filtering based on the Content-Type -- see above) and executes the helper application. But in reality, it may be a shell script since Firefox has messed up the Content-Type!
where does the security problem come in? the archive manager will refuse to open the file.
(In reply to comment #16)
> where does the security problem come in? the archive manager will refuse to
> open the file.

The user could have changed the configuration to use another archive manager that also handles shell archives.
> The user could have changed the configuration to use another archive manager
> that also handles shell archives.

In which case a malicious server can simply send a shell script with the Content-Type "application/zip" to get around all of your filtering. Any user who does this is just plain _stupid_, period (well, one could also say this about the person writing such an archive manager). The concept of shell archives works only in trusted environments, and that is why they have been dead for close to 10 years now.

Can we now return to the topic of this bug, please?
(In reply to comment #18)
> In which case a malicious server can simply send a shell script with the
> Content-Type "application/zip" to get around all of your filtering.

No. Please read comment #15 carefully: "disallowing shell scripts for any
archive Content-Type" (note the "any").
In that case you need to filter on the actual content, not (only) the content type, which is different from what you said before. And if you do content filtering anyway, you can also exclude shell archives in general (or you can exclude all the text/* content types that Firefox will not render in the browser, see bug 57342). In short: Yes, you can set up your browser in a way that may compromise security (shar archive manager). Don't do that then.
(In reply to comment #20)
> In that case you need to filter on the actual content, not (only) the content
> type, which is different from what you said before.

Yes, sorry if this wasn't clear. Filtering is done on the actual contents, but the filter itself is selected according to the Content-Type.

> And if you do content filtering anyway, you can also exclude shell archives
> in general (or you can exclude all the text/* content types that Firefox
> will not render in the browser, see bug 57342).

I do not wish to exclude shell archives in general. Also remember that this is just an example. There could be similar problems with other types of files (and if other extensions are used).

> In short: Yes, you can set up your browser in a way that may compromise
> security (shar archive manager). Don't do that then.

If the browser honored the Content-Type (always) in some way, then security wouldn't be compromised. The problem is that until I became aware of this bug, I didn't know that such a filtering method could compromise security.

So, if this bug could be fixed soon[*], then OK. Otherwise I think that there should be a temporary warning (or more useful information) in the dialog box if this is easier than fixing the bug completely.

[*] Does this mean that bug 132702 must be fixed first? That one is almost 4 years old, so it could still take months or years before this bug is fixed.
Assignee: file-handling → nobody
QA Contact: ian → file-handling
Product: Core → Firefox
Version: Trunk → unspecified
QA Whiteboard: qa-not-actionable

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --

The severity field is not set for this bug.
:Gijs, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(gijskruitbosch+bugs)

The URL associated with this report now 404s, and the most recent topical comment is 18 years old. bug 132702 was marked wontfix. As far as I know we don't read Content-Encoding when making file handling decisions, so that's a red herring.

In the web of 2023, it's usually more annoying that Windows has a very sparse content type database and that users don't know about or understand mimetypes (which they generally cannot see) and that they can see file extensions and (think they) understand those. It doesn't help that web servers don't have a great track record of using meaningful and correct mimetypes to identify file content (hello, application/octet-stream, binary/octet-stream, and all your variations. Oh and sending text/plain or text/html for files that are not. Oh and the mess that is media file encoding, cf. https://mimesniff.spec.whatwg.org/#audio-or-video-type-pattern-matching-algorithm ).

So ignoring the file extension in favour of the content type seems like an unlikely thing for us to change at this point.

Status: NEW → RESOLVED
Closed: 5 months ago
Flags: needinfo?(gijskruitbosch+bugs)
Resolution: --- → WONTFIX
Summary: Firefox should honor Content-Type instead of Content-Encoding or extension → Firefox should honor Content-Type instead of extension

While I don't disagree concerning "generic" mimetypes, Firefox should correctly open files with an accurate mimetype, in particular when the file extension is correct too. You have another example here:

https://www.vinc17.net/defi14.ps.gz

which is a gzipped postscript file (hence the .ps.gz file extension) served as

Content-Type: application/postscript
Content-Encoding: gzip

So everything is fine on the server side, but Firefox wants to save the file instead of opening it with an application/postscript application.

Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Note: this test has been done with Firefox 119.0 under Linux (so, no longer MacOS).

OS: macOS → All
Hardware: PowerPC → All

(In reply to Vincent Lefevre from comment #25)

> While I don't disagree concerning "generic" mimetypes, Firefox should correctly open files with an accurate mimetype, in particular when the file extension is correct too. You have another example here:
>
> https://www.vinc17.net/defi14.ps.gz
>
> which is a gzipped postscript file (hence the .ps.gz file extension) served as
>
> Content-Type: application/postscript
> Content-Encoding: gzip
>
> So everything is fine on the server side

I mean, no? If this was a postscript file, and your server is gzipping it for transfer, it should be defi14.ps, with the existing Content-Encoding header. If it was a gzipped postscript file (ps.gz) on your server, without transfer encodings, it should be application/gzip or application/x-gzip, without Content-Encoding header. The existing combination is... confusing.

Right now it's nominally a postscript file per the mimetype but gzipped for transfer, but the extension (.gz) is wrong for postscript. See also the mess that is bug 1470011 (and many dupes / related bugs) and bug 35956.

It's also unclear what your new OS knows or doesn't know about application/postscript, and how you've configured Firefox to handle postscript or gzip files, as you said in comment 5 that this was WFM under linux...
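
The combinations discussed in this comment can be sketched as a small classifier. This is an illustrative reading of the comment, not Firefox's logic; the function name `describe_response` and the labels it returns are hypothetical.

```python
def describe_response(filename, content_type, content_encoding=None):
    """Classify a (filename, Content-Type, Content-Encoding) combination."""
    name = filename.lower()
    # Drop any parameters such as "; charset=..." from the media type.
    ctype = content_type.split(";")[0].strip().lower()
    encoding = (content_encoding or "").strip().lower()
    gzip_types = {"application/gzip", "application/x-gzip"}
    gzip_codings = {"gzip", "x-gzip"}

    if encoding in gzip_codings and not name.endswith(".gz"):
        # e.g. defi14.ps sent as application/postscript, gzipped in transit
        return "compressed for transfer"
    if not encoding and ctype in gzip_types and name.endswith(".gz"):
        # e.g. defi14.ps.gz sent as application/gzip, no Content-Encoding
        return "gzip archive"
    if encoding in gzip_codings and name.endswith(".gz"):
        # e.g. defi14.ps.gz sent with Content-Encoding: gzip
        return "ambiguous"
    return "other"
```

The example from comment #25, `describe_response("defi14.ps.gz", "application/postscript", "gzip")`, falls into the `ambiguous` case, since the `.gz` extension and the Content-Encoding both claim the same layer of compression.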

Status: REOPENED → RESOLVED
Closed: 5 months ago → 5 months ago
Resolution: --- → FIXED

Gah, didn't mean to close this.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

(In reply to :Gijs (he/him) from comment #27)

> I mean, no? If this was a postscript file, and your server is gzipping it for transfer, it should be defi14.ps, with the existing Content-Encoding header. If it was a gzipped postscript file (ps.gz) on your server, without transfer encodings, it should be application/gzip or application/x-gzip, without Content-Encoding header. The existing combination is... confusing.

Well, this might be confusing only if the file is saved (the file extension matters only in this case). But even in this case, I suppose that the Firefox behavior is OK as bug 35956 is fixed (in my example, Firefox saves the file under the name "defi14.ps.gz" as gzip compressed, so that this is consistent).

But in the case where the file should be opened with an application, I don't see any confusion. I suppose that Firefox should uncompress the file for the application, so that there is no reason that the application would reject it. But note that even if the file is not decompressed, most PostScript viewers under Linux can handle gzip-compressed files (e.g. gv and atril). So I don't see why Firefox doesn't propose to open the file with a PostScript viewer.

(In reply to Vincent Lefevre from comment #29)

> (In reply to :Gijs (he/him) from comment #27)
>
> > I mean, no? If this was a postscript file, and your server is gzipping it for transfer, it should be defi14.ps, with the existing Content-Encoding header. If it was a gzipped postscript file (ps.gz) on your server, without transfer encodings, it should be application/gzip or application/x-gzip, without Content-Encoding header. The existing combination is... confusing.
>
> Well, this might be confusing only if the file is saved (the file extension matters only in this case).

I don't see why you think the file extension doesn't matter when opening the file - on many OSes it is the only thing that determines what program the OS chooses to hand the file to, once we ask the OS to open the file.

> But in the case where the file should be opened with an application, I don't see any confusion. I suppose that Firefox should uncompress the file for the application,

I don't think this would work at all - how is Firefox supposed to know if the app you or the OS suggests should open the file wants the archive or the thing in the archive? Also, there may be more than one thing in the archive, what then? Should we just be opening everything?

> so that there is no reason that the application would reject it. But note that even if the file is not decompressed, most PostScript viewers under Linux can handle gzip-compressed files (e.g. gv and atril). So I don't see why Firefox doesn't propose to open the file with a PostScript viewer.

By default, Firefox saves all files it can't open itself to disk. If you've configured it to do something else for postscript/gz files, you've not explained exactly what you have done so it's not really possible to respond to this or to get this bug into an actionable state.

Flags: needinfo?(vincent-moz)

(In reply to :Gijs (he/him) from comment #30)

> I don't see why you think the file extension doesn't matter when opening the file - on many OSes it is the only thing that determines what program the OS chooses to hand the file to, once we ask the OS to open the file.

Filename extensions are not standardized. The same filename extension can have different meanings depending on the OS. This means that a web server may serve files with an extension that will appear to be wrong on some OSes. So if an OS blindly selects applications based on file extensions provided by a web server, then it is broken.

Under Linux, Firefox normally lets the user choose the application based on the mimetype.

I think that the issue was due to the fact that in the settings, "PostScript document" was set to "Save file" instead of "Always ask" (even though "Ask whether to open or save files" was selected). I suspect some profile corruption that could have appeared several years ago. So, eventually, Firefox really takes the mimetype into account, and I would say that this bug is now fixed.

BTW, Atril uses the extension to determine the file type (e.g. PostScript). This is bad, and does not work correctly with Firefox, because Firefox renames the extension (as seen with my https://www.vinc17.net/defi14.ps.gz example) when the file already exists! So, opening the link for the first time works, but not when opening it again.

> I don't think this would work at all - how is Firefox supposed to know if the app you or the OS suggests should open the file wants the archive or the thing in the archive? Also, there may be more than one thing in the archive, what then? Should we just be opening everything?

I would say that the Content-Type should give the intent. Note that it is not the OS that decides. It is the web browser that decides whether the file should be uncompressed or not. But this could also be configurable.
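
A minimal sketch of the behaviour proposed here, under the assumption that the browser undoes a gzip Content-Encoding before handing data to a helper application. The function `payload_for_helper` and its `decode` switch (standing in for the suggested configuration option) are hypothetical.

```python
import gzip


def payload_for_helper(body, content_encoding=None, decode=True):
    """Return the bytes to hand to a helper application.

    When `decode` is true and the body was sent with a gzip
    Content-Encoding, undo that encoding so the helper receives data
    matching the declared Content-Type. Otherwise pass the body through.
    """
    encoding = (content_encoding or "").strip().lower()
    if decode and encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


# Example: a PostScript header as it would travel with Content-Encoding: gzip.
postscript = b"%!PS-Adobe-3.0\n"
on_the_wire = gzip.compress(postscript)
```

With `decode=True`, `payload_for_helper(on_the_wire, "gzip")` returns the original PostScript bytes; with `decode=False` the compressed bytes are passed through unchanged. A gzip stream wraps a single byte stream, so the question of several files in one archive concerns container formats such as zip or tar rather than this case.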

Flags: needinfo?(vincent-moz)

(In reply to Vincent Lefevre from comment #31)

> (In reply to :Gijs (he/him) from comment #30)
>
> > I don't see why you think the file extension doesn't matter when opening the file - on many OSes it is the only thing that determines what program the OS chooses to hand the file to, once we ask the OS to open the file.
>
> Filename extensions are not standardized. The same filename extension can have different meanings depending on the OS. This means that a web server may serve files with an extension that will appear to be wrong on some OSes. So if an OS blindly selects applications based on file extensions provided by a web server, then it is broken.

Windows has worked like this for a long time. macOS does something similar. Even if you or I think that's broken, that's just the reality of the world that the browser has to deal with.

> Under Linux, Firefox normally lets the user choose the application based on the mimetype.

That's Firefox, not the OS. And one of the choices Firefox offers is "pick the OS default". This is useful because it doesn't make sense to duplicate all your preferred app settings into the browser. However... how does the OS decide what program to use to open the file? See above...

> I think that the issue was due to the fact that in the settings, "PostScript document" was set to "Save file" instead of "Always ask" (even though "Ask whether to open or save files" was selected). I suspect some profile corruption that could have appeared several years ago. So, eventually, Firefox really takes the mimetype into account, and I would say that this bug is now fixed.

OK, thanks for confirming. I will close this out for now. For other changes you think need to happen in how Firefox treats downloads, please file new bugs.

> BTW, Atril uses the extension to determine the file type (e.g. PostScript). This is bad, and does not work correctly with Firefox, because Firefox renames the extension (as seen with my https://www.vinc17.net/defi14.ps.gz example) when the file already exists! So, opening the link for the first time works, but not when opening it again.

If you expect Atril to use something else (maybe file magic values?) that's something to file with them, I expect...

Status: REOPENED → RESOLVED
Closed: 5 months ago → 5 months ago
Resolution: --- → WORKSFORME

(In reply to :Gijs (he/him) from comment #32)

> (In reply to Vincent Lefevre from comment #31)
>
> > Filename extensions are not standardized. The same filename extension can have different meanings depending on the OS. This means that a web server may serve files with an extension that will appear to be wrong on some OSes. So if an OS blindly selects applications based on file extensions provided by a web server, then it is broken.
>
> Windows has worked like this for a long time. macOS does something similar. Even if you or I think that's broken, that's just the reality of the world that the browser has to deal with.

Linux doesn't work that way.

> > Under Linux, Firefox normally lets the user choose the application based on the mimetype.
>
> That's Firefox, not the OS. And one of the choices Firefox offers is "pick the OS default". This is useful because it doesn't make sense to duplicate all your preferred app settings into the browser.

I agree, and that's why under Linux there is mailcap for that (originally designed for mail clients, but there is no reason not to use it for web browsers too). Firefox supported it in the past, but unfortunately, it seems that it no longer does (there is no "pick the mailcap default"). I'm wondering, though, because the mailcap related bug 120380 is still open (the latest comment was 16 years ago).

> OK, thanks for confirming. I will close this out for now. For other changes you think need to happen in how Firefox treats downloads, please file new bugs.

There's now bug 1864779 (though I could find a workaround).

> If you expect Atril to use something else (maybe file magic values?) that's something to file with them, I expect...

Yes, done that.

But note that if the OS chooses the application based on the filename extension (like Windows and macOS as you said), what Firefox does will break. But since I do not use these OSes, I do not care...
