Open Bug 52282 Opened 20 years ago Updated 3 months ago

(ftp:// or file://) suffixes like (.gz) are Content-Encoding (not Content-Type) [decode and display .gz file]

Categories

(Core :: Networking, defect, P5)

Tracking

Future

People

(Reporter: frb, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted, student-project, Whiteboard: [necko-would-take])

I did see the other bugs about http:// URLs not recognizing zip; file:// is still
broken, however, in a build from source on 2000-09-11.
--> networking, I believe they handle this as well.
Assignee: asa → gagan
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
QA Contact: doronr → tever
WORKSFORME
Platform: PC
OS: Linux 2.2.16
Mozilla Build: 2001011904

Marking as such.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WORKSFORME
I just tried to load a file:/usr/share/doc/libc6/FAQ.gz on a debian system
running 2001011912, and it still opens a dialog saying it doesn't recognize the
mime type instead of seeing it as an encoding type

REOPEN
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
As an added bonus, this problem occurs on ftp:// as well as file://
Hope that helps
I'd have to guess that our extension recognition needs to know that some
extensions correspond to Content-Encoding rather than Content-Type.  (HTTP
servers' extension recognition generally does this -- and then allows the
previous extension to indicate the Content-Type.)
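A minimal sketch of that HTTP-server style of extension recognition, in Python for brevity (this is not Mozilla code). The lookup tables and the name `guess_type_and_encoding` are assumptions made up for illustration; the only point is that an encoding suffix is peeled off first and the previous extension then decides the Content-Type.

```python
import posixpath

# Hypothetical lookup tables, for illustration only.
ENCODING_SUFFIXES = {".gz": "gzip", ".Z": "compress"}
TYPE_SUFFIXES = {".txt": "text/plain", ".htm": "text/html", ".html": "text/html"}


def guess_type_and_encoding(path):
    """Return (content_type, content_encoding); either may be None."""
    base, ext = posixpath.splitext(path)
    encoding = ENCODING_SUFFIXES.get(ext)
    if encoding is not None:
        # The suffix named an encoding, so the previous extension names the type.
        base, ext = posixpath.splitext(base)
    return TYPE_SUFFIXES.get(ext.lower()), encoding
```

With these tables, `readme.txt.gz` is reported as text/plain with a gzip encoding, while a bare `FAQ.gz` gets the gzip encoding and no type, so the type would still have to be sniffed from the decoded data. Python's standard `mimetypes.guess_type` returns the same kind of (type, encoding) pair.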
We added a function specifically on HTTP (though just for download cases) to not
perform the content conversion (attribute applyConversion). I wonder if that is
what is needed in other cases as well. Or maybe we change how aggressively the
conversion is applied.

->dougt
Assignee: gagan → dougt
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
mass move, v2.
qa to me.
QA Contact: tever → benc
what is milestone "mozilla1.0" anyway?  Moving to future.
Target Milestone: mozilla1.0 → Future
*** Bug 78065 has been marked as a duplicate of this bug. ***
*** Bug 87774 has been marked as a duplicate of this bug. ***
tried to clarify summary, might not be an improvement...
Summary: doesn't recognize gzip files as a content instead of a type → recognizing file type (.gz) should use Content-Encoding rather than Content-Type sometimes
*** Bug 87403 has been marked as a duplicate of this bug. ***
API Change
Target Milestone: Future → mozilla1.0
re-summary attempt #2. more searchable and hopefully more sensible.
Summary: recognizing file type (.gz) should use Content-Encoding rather than Content-Type sometimes → (ftp: or file:) suffixes like (.gz) areContent-Encoding (not Content-Type)
*** Bug 92192 has been marked as a duplicate of this bug. ***
*** Bug 111162 has been marked as a duplicate of this bug. ***
Summary: (ftp: or file:) suffixes like (.gz) areContent-Encoding (not Content-Type) → (ftp: or file:) suffixes like (.gz) are Content-Encoding (not Content-Type)
Keywords: helpwanted
Target Milestone: mozilla1.0 → Future
So... how do we want to go about figuring out when to decode and when not to? 
If I'm fetching a file over FTP to save it to disk, I _really_ don't want it
decoded... 

On the other hand, a gzipped readme should be readable directly.
Responding to Boris, I think a simple rubric would cover most cases: if stripping .gz leaves an 
extension of .txt, .htm or .html, then decode.  This doesn't deal with all cases but is simple and as 
much as can be asked for.
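The rubric above is small enough to write down directly. This is a sketch in Python, not a patch; `should_decode` and the constant name are hypothetical, and only the three extensions named in the comment are treated as decodable.

```python
# Extensions taken from the rubric above; everything else is left encoded.
DECODABLE_INNER_EXTENSIONS = (".txt", ".htm", ".html")


def should_decode(filename):
    """Decode only if stripping .gz leaves a .txt, .htm or .html name."""
    name = filename.lower()
    if not name.endswith(".gz"):
        return False
    inner = name[: -len(".gz")]
    return inner.endswith(DECODABLE_INNER_EXTENSIONS)
```

Under this rule `README.txt.gz` is decoded and a file such as `archive.tar.gz` is not. A name like `FAQ.gz` is also left alone, which is one of the cases the rubric admittedly does not cover.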
*** Bug 147515 has been marked as a duplicate of this bug. ***
*** Bug 158559 has been marked as a duplicate of this bug. ***
-> file handling, as I understand it.
Assignee: dougt → law
Component: Networking → File Handling
QA Contact: benc → sairuh
No, it's not.  The encoding should be a property of the channel, IMO, not
something consumers should be guessing at.
Assignee: law → new-network-bugs
Component: File Handling → Networking
QA Contact: sairuh → benc
If you go to the Mozilla homepage and take it up on its offer to download
Chimera-0.4 (first item today,
ftp://ftp.mozilla.org/pub/chimera/releases/chimera-0.4.dmg.gz) you get the
contents of the disk image in the browser window.  That's a bad default behavior. 

I like comment #18, though I might extend that to any type mozilla will handle
itself or a plug-in will take.
> That's a bad default behavior.

That's a very interesting behavior, since until this bug is fixed that file
should be getting detected as application/gzip unless you have overridden that
file association in Internet Config or in preferences...
*** Bug 168121 has been marked as a duplicate of this bug. ***
Summary: (ftp: or file:) suffixes like (.gz) are Content-Encoding (not Content-Type) → (ftp:// or file://) suffixes like (.gz) are Content-Encoding (not Content-Type) [decode and display .gz file]
looks like decoding on HTTP protocol is done by
netwerk/streamconv/converters/nsConvFactories.cpp
<http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsConvFactories.cpp>
which can't be used to handle file:// and ftp:// stuff.

do we really want to decode file downloaded from ftp?

I think this should be a file handling bug
Re handling file:/.../thing.txt.gz or .html.gz:
It's really useful to be able to compress html files on the local disk
when you have thousands or tens of thousands of them.  This is the one feature
that Netscape has over Mozilla, IMHO.

luke
> which can't be used to handle file:// and ftp:// stuff.

Why not?

> do we really want to decode file downloaded from ftp?

Yes.

There are three parts to this bug:

1)  Make ftp and file channels know about the existence of encodings.
2)  Make it possible to detect not only the type of the data the channel is
    getting but also its encoding.
3)  Make the ftp and file channels do this and kick in the decoder as needed.

Part #1 is definitely networking.  These channels have no concept of encoding
and will need one (nsIDecodingChannel?  Some other name for the interface?  We
need a place to put the contentEncodings enumerator anyway, since we want to
move it off nsIHttpChannel).

Part #2 could be considered file handling, since that seems to be a dumping
ground for mime-related bugs...

Part #3 is networking and blocked by the other two parts.

We should consider three separate bugs here...
*** Bug 171345 has been marked as a duplicate of this bug. ***
*** Bug 173390 has been marked as a duplicate of this bug. ***
Why isn't this bug marked 4xp? This worked in Netscape 4.76. It also should be
marked all platforms/OSs.
Keywords: 4xp
OS: Linux → All
Hardware: PC → All
bz, when you get a chance, I'd like to break this out.
I've a #4 to add to the mix:

4)  Figure out how to have stream converters on an encoded channel in the necko
    API. 

Until that happens, everything else is moot.
*** Bug 181982 has been marked as a duplicate of this bug. ***
Possible approach:

1)  Have an interface that has an |attribute applyConversion;| on it.
    (nsIEncodedChannel does fine here).
2)  nsUnknownDecoder detects when it's given such a channel in OnStartRequest
    and immediately sets applyConversion to FALSE.
3)  nsUnknownDecoder does its normal thing.  When the time comes to detect the
    type, if the channel is an nsIEncodedChannel and the type corresponds to a
    content-encoding (gzip, zip), we immediately decompress the buffer we're
    sniffing and resniff for the decompressed type.  We set both the type and
    the encoding on the channel (this needs an API for setting the latter).
4)  nsUnknownDecoder sets applyConversion to TRUE (the default) and calls its
    listener's OnStartRequest.
5)  When this returns, nsUnknownDecoder looks at applyConversion.  If this is
    still TRUE, it puts a stream converter for the appropriate encoding in the  
    stream between itself and its listener.

This gives the listener the ability to disable conversions as needed. If all
stream converters that delay OnStartRequest follow this policy of setting
applyConversion to FALSE right away and then setting it to TRUE before calling
OnStartRequest on the next-in-chain and putting in a decoder if it's still TRUE
after the call, then the final decision on whether to do decoding lies with the
final listener in the chain, where it should.

Thoughts?
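To make the handshake in steps 1-5 concrete, here is a toy model in Python. It is only a sketch of the control flow: the class and method names (`Channel`, `Sniffer`, `GzipDecoder`, `Viewer`, `Downloader`, `pump`) are invented for illustration and are not Necko interfaces, the re-sniffing of the decompressed type in step 3 is left out, and the whole payload arrives in one chunk.

```python
import gzip
import zlib


class Channel:
    """Stand-in for a channel with nsIEncodedChannel-like state."""
    def __init__(self):
        self.apply_conversion = True
        self.content_encoding = None


class GzipDecoder:
    """Stream converter: gunzips data before passing it downstream."""
    def __init__(self, downstream):
        self.downstream = downstream
        self._z = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip framing

    def on_data_available(self, channel, data):
        self.downstream.on_data_available(channel, self._z.decompress(data))


class Sniffer:
    """Plays the role described for nsUnknownDecoder in steps 2-5."""
    def __init__(self, listener):
        self.listener = listener
        self.started = False

    def on_start_request(self, channel):
        channel.apply_conversion = False              # step 2

    def on_data_available(self, channel, data):
        if not self.started:
            self.started = True
            if data[:2] == b"\x1f\x8b":               # step 3: gzip magic bytes
                channel.content_encoding = "gzip"
            channel.apply_conversion = True           # step 4
            self.listener.on_start_request(channel)   # listener may veto here
            if channel.apply_conversion and channel.content_encoding == "gzip":
                self.listener = GzipDecoder(self.listener)  # step 5
        self.listener.on_data_available(channel, data)


class Viewer:
    """Final listener that is happy to receive decoded data."""
    def __init__(self):
        self.received = b""

    def on_start_request(self, channel):
        pass

    def on_data_available(self, channel, data):
        self.received += data


class Downloader(Viewer):
    """Final listener that wants the bytes exactly as stored."""
    def on_start_request(self, channel):
        channel.apply_conversion = False


def pump(listener, payload):
    """Drive one request through the sniffer and return what the listener got."""
    channel = Channel()
    sniffer = Sniffer(listener)
    sniffer.on_start_request(channel)
    sniffer.on_data_available(channel, payload)
    return listener.received


payload = gzip.compress(b"hello")
viewed = pump(Viewer(), payload)      # decoded for display
saved = pump(Downloader(), payload)   # left compressed for saving
```

In this model the viewer ends up with the decoded text and the downloader with the original gzip bytes, which is the "final listener decides" behaviour the proposal is after.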
bz: documentation of nsIEncodedChannel::applyConversion says:
 55      * [...] Calling this during 
 56      * OnDataAvailable is an error. 

That also makes sense - what does it mean if you set applyConversion to true in
ODA? is it even possible to apply the conversion after the start of the data is
received? what happens to the data that the current oda call sent?
> 56      * OnDataAvailable is an error. 

I'm proposing changing that comment....  The interface is not frozen or anything.

> is it even possible to apply the conversion after the start of the data is
> received?

No.  So we specify that setting it during ODA affects the value of the attr but
nothing else.

Alternate proposals for how to get this to work are welcome, of course.
ok, alternate proposal but applicable only to file:
o) read the first few bytes of the file (local i/o should be comparatively fast;
even if not, this can be done immediately before sending onstartrequest, which
might make this applicable for ftp too)
o) Given those bytes, check if this matches a content-encoding (compress/gz, I
don't think we support anything else)
o) If it does, insert a streamconverter that does the decompressing

hmm. why isn't the unknown decoder implemented like this, i.e. the channel
explicitly gives the decoder the first few bytes and asks for a type?
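A rough sketch of this peek-then-wrap idea in Python, assuming a seekable local stream. The helper names `sniff_encoding` and `open_decoded` and the `allow_decode` switch are hypothetical; the magic numbers are the usual ones for gzip (RFC 1952) and Unix compress. Only gzip is actually decoded here, since the Python standard library has no decoder for compress.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"      # gzip member header (RFC 1952)
COMPRESS_MAGIC = b"\x1f\x9d"  # Unix compress (.Z)


def sniff_encoding(head):
    """Map the first bytes of a stream to a content-encoding name, or None."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(COMPRESS_MAGIC):
        return "compress"
    return None


def open_decoded(raw, allow_decode=True):
    """Peek at a seekable binary stream; wrap it in a gunzip reader if it is gzip.

    allow_decode=False models a consumer (e.g. save-to-disk) opting out.
    """
    head = raw.read(2)
    raw.seek(0)
    if allow_decode and sniff_encoding(head) == "gzip":
        return gzip.GzipFile(fileobj=raw, mode="rb")
    return raw


data = gzip.compress(b"faq text")
shown = open_decoded(io.BytesIO(data)).read()
stored = open_decoded(io.BytesIO(data), allow_decode=False).read()
```

The opt-out flag is there because, as the rest of the thread keeps pointing out, a consumer that is saving the file must be able to stop the decoder from being inserted.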
> o) If it does, insert a streamconverter that does the decompressing

Consumers need to keep this from happening, mind you....  But yes, the channel
could delay OnStartRequest till it gets the first data packet...

> why isn't the unknown decoder implemented like this

Because it's meant to be transparent to channels, mostly.  And it is.  Mostly.
>Consumers need to keep this from happening, mind you.... 

ok, so wait with inserting the streamconverter until after onstartrequest is
called, and if consumers want to stop this from happening, they call
setApplyDecoding(false) in onstartrequest
Unfortunately, stream converters are stream listeners... So you can't insert one
into a listener chain once OnStartRequest has been called on its downstream
listener -- that will result in OnStartRequest being called twice on said
downstream listener.
*** Bug 157514 has been marked as a duplicate of this bug. ***
This bug still makes it impossible to open a local svgz file with Firefox.

I think severity should be major and Target Milestone be a 1.9 Build.
I'll get to this sometime, maybe, but this is lower priority for me than some other 1.9 work at this point.  And setting the target milestone would imply that someone is working on this, targeting that milestone.  At the moment, no one is working on it.  If you think this is incorrect allocation of resources, I urge you to fix the bug yourself; I can point you to relevant parts of the code.  
We should try to leverage nsBaseChannel to implement this.  Perhaps it should implement nsIEncodedChannel (which has the SetApplyConversion method).
*** Bug 322550 has been marked as a duplicate of this bug. ***
Duplicate of this bug: 376631
Not sure if fixing 52282 will support the following use case (so I will post this same comment in 52282 and in 157514):

A data:url containing svgz. The data url syntax doesn't have a way to specify that the content is gzip compressed, and there is only one MIME type for svg.
Mozilla throws up an XML parsing error. Opera handles it.

Why inline svgz in a data:url instead of using a compound XML document and compressing the whole thing?
1) Why not.
2) The whole XML / DTD doctype conundrum to get the latter approach working.
3) I want to send the svg back to the server, and prefer to send compressed data.
> and there is only one MIME type for svg.

This is the real problem.  Or lack of content-encoding in data:, I guess.  But plenty of other things don't support content-encoding.

> Opera handles it.

I don't see how without violating one of the relevant specs.

Basically, this bug is about the case when we're sniffing the type.  We're never in that case for data:.

Note that you could probably do compression with a jar:data: if you base64-encode the ZIP file.
What is jar:data:? I can't find any documentation about that anywhere on the world-wide-webs.

Opera may be violating the spec, but in a useful way.


The jar: protocol (which is admittedly not standardized) looks like this:

  jar:uri-to-zip-file!/path-in-zip-file

So if you take a ZIP file, base64-encode it, and do:

  jar:data:application/zip;base64,.....!/foo.svg

you'll get compressed inlined svg.
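For anyone who wants to try this, here is a small Python sketch that builds such a URI from raw SVG bytes. The function name `make_jar_data_uri` is made up for illustration, and whether a given browser build accepts the resulting URI is not verified here.

```python
import base64
import io
import zipfile


def make_jar_data_uri(member_name, payload):
    """Zip `payload` in memory and wrap it as jar:data:...!/member_name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, payload)
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f"jar:data:application/zip;base64,{encoded}!/{member_name}"


uri = make_jar_data_uri("foo.svg", b"<svg xmlns='http://www.w3.org/2000/svg'/>")
```

Note that the ZIP container adds its own headers, so for very small files the result can be larger than the uncompressed SVG.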

> Opera may be violating the spec, but in a useful way.

Frankly, it doesn't matter.  Once you start doing that, interoperability (the whole reason for specs) goes to hell.  Case in point: something that's not SVG gets rendered as SVG by Opera, but other SVG UAs don't have that behavior, thus you lose interoperability.

Seriously, it sounds like you want a MIME type for compressed SVG.  I suggest raising that with the SVG working group.

In any case, all this is completely off-topic for this bug.  If you want to continue this discussion, please use the newsgroups.
I'm not sure exactly what SVG is, but what you want to do with it does sound related to this bug.  You don't need another mime type for a compressed svg file; mozilla just needs to recognize the .gz ending means mime-encoding=gzip, and strip the .gz before trying to guess the mime-type.  

Phillip, Jonathan is using a data: URI.  There is no extension involved.  Just data and a MIME type.
It sounded like that URI refers to a file name that contains the data.  The file has an extension which should be used to guess the mime-type and mime-encoding.  The problem is that right now firefox guesses a mime-type of gzip instead of a mime-encoding of gzip, and whatever else for the mime-type.  
> It sounded like that URI refers to a file name that contains the data. 

It doesn't.  Please go and google "data: uri".
Realising this has drifted a little off topic, but wanting to clear up some misinformation from the last few comments:

Phillip, SVG is Scalable Vector Graphics 
http://www.w3.org/Graphics/SVG/
http://www.w3.org/TR/SVG11/
the data: protocol is defined by RFC 2397 'The "data" URL scheme', L. Masinter, August 1998.

Boris, the fact that compressed SVG and uncompressed SVG have the same Internet media type is by design, and follows the relevant IETF guidelines on separating content type from compression. See
http://www.w3.org/TR/SVGMobile12/conform.html#ConformingSVGServers
for a discussion on Internet media type and compression. It would be incorrect and inappropriate to register all combinations of media type times compression method as new media types.

Jonathan, you are correct that the data: protocol does not provide a means to indicate the Content-Encoding (or indeed, any other http headers). I would therefore not recommend using compressed data in a data: uri since there is no way to indicate that it is compressed or what method has been used to compress it.

All, the general issue is that most protocols (file, ftp, data, etc) do not provide metadata headers - just the actual data. Internet media type, compression, language, and so forth therefore need to be inferred.

For SVG, the .svg extension implies media type image/svg+xml while the .svgz extension implies media type image/svg+xml and Content-Encoding: gzip.
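One way to express that inference is to treat .svgz as shorthand for .svg.gz and then split off the encoding. The Python sketch below does this with hand-written tables; the table contents and the name `classify` are assumptions for illustration, not a description of any browser's actual lookup.

```python
# Hypothetical tables: a compound-suffix alias, then encoding and type lookups.
SUFFIX_ALIASES = {".svgz": ".svg.gz"}
ENCODINGS = {".gz": "gzip"}
TYPES = {".svg": "image/svg+xml"}


def classify(filename):
    """Return (media_type, content_encoding) inferred from the file name."""
    name = filename.lower()
    for alias, expanded in SUFFIX_ALIASES.items():
        if name.endswith(alias):
            name = name[: -len(alias)] + expanded
    encoding = None
    for suffix, enc in ENCODINGS.items():
        if name.endswith(suffix):
            encoding = enc
            name = name[: -len(suffix)]
    for suffix, media_type in TYPES.items():
        if name.endswith(suffix):
            return media_type, encoding
    return None, encoding
```

With these tables `logo.svgz` and `logo.svg.gz` both come out as image/svg+xml with a gzip encoding, and `logo.svg` as image/svg+xml with no encoding.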
Duplicate of this bug: 443403
Duplicate of this bug: 432795
Duplicate of this bug: 494968
Blocks: 319690
Duplicate of this bug: 546431
Why don't you ask the user what he'd like to do?
Just as he could check any other option
"Save"
"Open"
"View in FF"

I'm not able to view locally a file that worked perfectly when it was on the server. That's absurd.
> Why don't you ask the user what he'd like to do?

There's a bug on that.  It's not this bug.  Turns out that it requires some significant code rework to implement "view in FF".  If you want to help, of course, I'll point you to the work-in-progress patch.
Setting to block bug 512501 as the SVG specification states that:

"SVG implementations must correctly support gzip-encoded  [RFC1952] and deflate-encoded  [RFC1951] data streams, for any content type (including SVG, script files, images)." [1]

As Chris also mentioned (in comment 56), this must be done using heuristics due to the lack of meta-information in several protocols.

Although the currently available test suite [2] doesn't explicitly test for this, there's a specific test [3] in SVG 1.1 SE which will break whenever the test suite is being executed locally (the online version works fine due to the metadata inserted at HTTP level).

<rant>
Apart from the conformance issue, this is one of the most longstanding and annoying issues regarding SVG support. The broken feeling when opening ".svgz" files locally probably causes many users not to use a format which is highly desirable due to the XML overhead, for perceived performance and bandwidth reduction reasons. Also, the behavior is quite unintuitive/unexpected for most users/authors, as they are able to access the exact same content online, but not locally... :-|
</rant>


[1] http://www.w3.org/TR/SVG11/conform.html#ConformingSVGViewers
[2] http://www.w3.org/Graphics/SVG/Test/
[3] http://dev.w3.org/SVG/profiles/1.1F2/test/harness/htmlObject/conform-viewers-01-t.html
Blocks: svg11tests
SVG is a very important format & support for *.svgz is vital to reduce bloating.
15 years on & this (what should be a trivial issue) still hasn't been resolved.
If Opera can do it, so can you.
I don't believe it!
If you think this is trivial, please feel free to write a patch.  I'll happily review.
I would if I could, but I can't, but I do contribute to a number of Open Source projects.
(for example: I created the patch for PrimeFaces so it delivers the correct MimeType for svgZ's
 & I instigated Servlet Spec 86 so Tomcat et al can serve the correct Content-Encoding for svgZ's)
Whiteboard: [necko-would-take]
Duplicate of this bug: 1263359
If the browser doesn't support opening the svgz file type from file:// then it doesn't support it - that needs to change.  The theory that a graphics file must be loaded over http:// is misplaced; we as industrial users load a LOT of data into browsers from file://.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P3 → P5
Duplicate of this bug: 1414620
Hi Robert, 1414620 and 157514 are not duplicates of this bug. They are specific to the ".svgz" file extension (i.e., not ".svg.gz").
I tell you what, you submit a patch that fixes either one of them and if it doesn't fix the other I'll deduplicate the bugs.
Flags: needinfo?(chris.james.plant)
Ok, I'll give it a go. I've never tried building firefox before though so it may take me a while...
Flags: needinfo?(chris.james.plant)