Closed Bug 173044 Opened 22 years ago Closed 9 years ago

Support for bzip2 transfer encoding, a la w3m

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: luke, Unassigned)

References

Details

(Keywords: helpwanted)

Attachments

(1 file, 1 obsolete file)

It would be nice for mozilla to have the ability (or at least the option to) use
bzip2 as a content transfer encoding, similar to how gzip is used now.  As bzip2
compression tends to result in smaller files, it would be particularly useful to
people over dial-up.

w3m appears to have support for bzip2 transfer, but I couldn't find anything on
bzip2 and mozilla.
nice suggestion; no time to work on this now -> future
Target Milestone: --- → Future
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: helpwanted
I'm prepared to try working on this, but I've not done any development on Moz
before.  Any decent guides I should be looking at to get me up to speed?  Give
us a clue ;-)
sure, take a look at:

http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsHTTPCompressConv.cpp

you'd want to either add code to this "stream converter" to handle bzip2, or
you'd want to create a similar stream converter object for BZIP2
compression/decompression.  then you'd just have to update the pref in

modules/libpref/src/init/all.js

pref("network.http.accept-encoding", ...);

that defines our Accept-Encoding header.  i believe the HTTP code, when it sees
a Content-Encoding header, will just query the stream converter service for the
appropriate converter.
also, mozilla doesn't currently support Transfer-Encoding: gzip, so adding bzip2
support for the Transfer-Encoding header is a much bigger task.
Sorry my mistake, I wanted bzip support for Content-Encoding, not
Transfer-Encoding.  Watch this space.
Attached patch bzip support patch (obsolete) — Splinter Review
The attached rather inelegent patch works for me.  Two major issues with it:

1. Requires bzip libraries.  Really configure ought to guess this, but I don't
know enough about autoconf (yet).

2. I'm sure I've inadvertently tweaked a few too many things.
yeah, it's going to take someone's time to clean this patch up so it can be
landed in mozilla.  i unfortunately don't have much free time to do that right now.
Apart from being diffed against a slightly oldish tree (which is not a big deal
really), I have a few comments/questions:

1)  Does the bzip lib handle both bzip1 and bzip2?
2)  Do people really send bzip2 files as application/bzip?  What do they do for
    bzip1 files?
3)  +	if (!PL_strncasecmp (fromStr, HTTP_BZIP_TYPE   , strlen
    (HTTP_COMPRESS_TYPE))  -- use HTTP_BZIP_TYPE as the arg to strlen, perhaps?
    and HTTP_BZIP2_TYPE for the next comparison
4)  Don't do the /* LCR */ comment thing.  That's what the "Contributors"
    section at the top of the file is for.  ;)
5)  Perhaps rename mGzipStreamInitialized to something that makes sense for both
    stream types, like "mDecompressStreamInitialized" or something?
6)  I'd sort of prefer to keep HTTP_COMPRESS_IDENTITY as the last element of the
    enum... shouldn't affect any other code.
7)  You also need to make some changes to nsExternalHelperAppService (see the
    ApplyDecodingForType and ApplyDecodingForExtension functions and the arrays
    they use).
8)  There are some whitespace issues in the code (tabs?  If so, please convert
    to spaces)
9)  There are some subtle differences between the z_stream and bz_stream
    structs.  For example, the former uses uInt while the latter uses "unsigned
    int".  I would assume these are the same thing, but for clarity and
    consitency, I'd rather cast to the type the struct expects (too bad, really
    -- there are so many parallels between the two codepaths that I almost wish
    we could have a single codepath for the two decompressions, with some
    conditional dispatch of functions).
10) mGzipStreamEnded could use renaming too, in line with the widened usage.
11) Darin, is it correct to just pass through aSourceOffset unchanged here?
    This applies to the gzip code too.
Ta for the feedback.  As for the old tree, CVS access isn't easy here ;-)

bzip1 appears to have run into trouble with patent issues, which bzip2 doesn't
appear to suffer.  Quite a few people use bzip to mean bzip2, confusingly. 
Probably the best MIME type is application/x-bzip2.

Regarding the use of strlen(HTTP_COMPRESS_TYPE), I did wonder about that, but
the GZIP code does it the same way.

The /* LCR */ comments snuck through along with some printfs.  They were various
markers and debug bits for me, and I didn't intend for them to go into the patch.

The other issues all sound sensible to me.

I'm currently working on a second version of the patch, with a --with-bzip2
switch to configure (via autoconf) to turn support on (and to incorporate some
of the suggestions made).
> I did wonder about that, but the GZIP code does it the same way.

Wanna fix that too?  ;)
Luke, any luck with that updated patch?
Sorry, I've got distracted by work of late, and also because the fall-out of
Java gcc-2.95 vs gcc-3.2 RH8 has meant I haven't upgraded for a while (policy
here is that things must come as RPM!).  

I've just grabbed the 1.4 source and I'll try and edit the patch suitably.  I
plan to wrap the new bits in #ifdef BZIP2 so they can be turned on and off
easily enough and also fix up the other issues raised, but my knowledge doesn't
stretch as far as autoconf - any autoconf experts around? :-)
Hmm.... any good reason to make this disableable?  It's not that much code....

If we decide we want to do that, I can probably try to figure out the autoconf end.
The bzip2 libraries would have to be bundled in if it were always on, and
judging by the issues over libmng it seems code space is at a premium? Does bzp2
compile everywhere Mozilla does?
Ah, ok.  The library issue is a good reason to have a compile-time flag...
Another thought on MIME types.  As regards full types,  I'd have thought
application/x-bzip2 should be right and fits in with what people seem to be
using for it.

However we also need a short name for the Accept/Content-Encoding, and as far as
I can see we send out Accept-Encoding gzip and accept a Content-Encoding of
either gzip or x-gzip, as per RFC2616.  However bzip2 isn't mentioned and not
does it say how new ones should be named, other than "New content-coding value
tokens SHOULD be registered".  Should we send out in Accept-Encoding bzip2 or
x-bzip2?  Should we accept both as synonomous on Content-Encoding?  My personal
preference is for just "bzip2" to be sent out.

I agree with comment 8 point 9 that it would be nice if the bzip2/gzip streams
could be merged as they are so similar, but I couldn't think of a neat way of
doing so.  Damn compile-time member lookup!
Also, I forgot to mention I came across the following during a google:

----
  8.  Using bzip2 with Netscape under XWindows

  tenthumbs@cybernex.net says:

       I also found a way to get Linux Netscape to use bzip2 for Content-
       Encoding just as it uses gzip. Add this to $HOME/.Xdefaults or
       $HOME/.Xresources

       I use the -s option because I would rather trade some decompressing
       speed for RAM usage. You can leave the option out if you want to.

  Netscape*encodingFilters:      \
          x-compress :  : .Z     : uncompress -c  \n\
          compress   :  : .Z     : uncompress -c  \n\
          x-gzip     :  : .z,.gz : gzip -cdq      \n\
          gzip       :  : .z,.gz : gzip -cdq      \n\
          x-bzip2    :  : .bz2   : bzip2 -ds \n
----

Was this supported under Communicator?  And does Mozilla support it?  Not that
it's much of a substitute as it won't work on Windows etc.
mozilla does not read preferences from .Xdefaults like 4x communicator did.

until there is an official "Accept-Encoding: bzip2", we should prefix what we
send with "x-", but we should accept both forms as equivalent.
Hopefully this addresses most of the previous issues raised, and is diffed for
Mozilla 1.4 release source.  By default it won't enable bzip2 support because
of the library issues mentioned previously - for that to happen the updated
files need to be compiled with -DBZIP2, and libbz2.so needs linking in. To test
I changed DEFINES in config/config.mk and OS_LIBS in config/autoconf.mk but I'm
sure there are better places to do this. Also, all.js needs
network.http.accept-encoding needs to be changed to
"x-bzip2,gzip,deflate,compress" if bzip2 support is compiled in (I'm not sure
if configure can do this).
Attachment #102322 - Attachment is obsolete: true
Attachment #127784 - Flags: review?(darin)
Comment on attachment 127784 [details] [diff] [review]
Revised patch for bzip2 support

i have a concern with this patch.  i think that this patch will make it so that
users will need to have the bzip2 library installed on their system.  it seems
like it might be better to explicitly load the bzip2 library (using
PR_LoadLibrary).  or, we could move this into an extension library.  however,
given that the amount of code is small, i prefer the PR_LoadLibrary option.
Attachment #127784 - Flags: review?(darin) → review-
another option might be to just statically link libbz2.a ... i think we'd need
to verify that the license is compatible before we can do that though.

here's the license from bzlib.h from my RedHat 9 box:

/*--
  This file is a part of bzip2 and/or libbzip2, a program and
  library for lossless, block-sorting data compression.
                                                                               
                                          
  Copyright (C) 1996-2002 Julian R Seward.  All rights reserved.
                                                                               
                                          
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions
  are met:
                                                                               
                                          
  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
                                                                               
                                          
  2. The origin of this software must not be misrepresented; you must
     not claim that you wrote the original software.  If you use this
     software in a product, an acknowledgment in the product
     documentation would be appreciated but is not required.
                                                                               
                                          
  3. Altered source versions must be plainly marked as such, and must
     not be misrepresented as being the original software.
                                                                               
                                          
  4. The name of the author may not be used to endorse or promote
     products derived from this software without specific prior written
     permission.
                                                                               
                                          
  THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
  OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
  DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
  GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
                                                                               
                                          
  Julian Seward, Cambridge, UK.
  jseward@acm.org
  bzip2/libbzip2 version 1.0 of 21 March 2000
                                                                               
                                          
  This program is based on (at least) the work of:
     Mike Burrows
     David Wheeler
     Peter Fenwick
     Alistair Moffat
     Radford Neal
     Ian H. Witten
     Robert Sedgewick
     Jon L. Bentley
                                                                               
                                          
  For more information on these sources, see the manual.
--*/
Initially I'd had in mind that configure would check for the presence of -lbz2
and decide whether to build it in compile time, but I hadn't successfully been
able to modify the autoconf setup to do this (I know little about autoconf).
However the PR_LoadLibrary sounds interesting - I'll take a look into it.

bzip2 claims to be BSD-style license, I don't know how that fits in with the MPL.
(In reply to comment #22)
> Initially I'd had in mind that configure would check for the presence of -lbz2
> and decide whether to build it in compile time, but I hadn't successfully been
> able to modify the autoconf setup to do this (I know little about autoconf).
> However the PR_LoadLibrary sounds interesting - I'll take a look into it.

configure checking is not sufficient unless we statically link to libbz2.a
because otherwise the mozilla builds from ftp.mozilla.org would require the user
to have installed libbz2.so.  that would be a new dependency.  sure, most linux
distros include it, but we'd have to decide if we want to add that as a required
dependency.  given that bzip2 isn't in common use as a content-encoding, i think
it'd be a hard sell.


> bzip2 claims to be BSD-style license, I don't know how that fits in with the MPL.

BSD-style license should mean that we're ok statically linking to the code, but
i'm not an expert on such things.  i'd need someone like mitchell or gerv to
review this.
The license does not impose any additional requirements over and above the MPL,
and so we could check it into the main tree just as we have libjpeg and libpng,
if necessary. We could also include it in nightly builds if we wanted.

Does that answer your question?

Gerv
Gervase Markham wrote:
> The license does not impose any additional requirements over and above the 
> MPL, and so we could check it into the main tree just as we have libjpeg and 
> libpng, if necessary.

Erm... correct me if I am wrong: The MIT/X.org license does not impose any
additional requirements over/above MPL either - but requests to include such
code into the main tree was DENIED multiple times in the past. Was the policy
for such checkins changed or am I missing something here ?
Roland: please email me with more details of whatever you are referring to.

Gerv
Firefox now includes bzip2 for partial updates (because bsdiff works well with bzip2), so it might be easier to add bzip2 as a transfer-encoding now.
-> default owner
Assignee: darin → nobody
Status: ASSIGNED → NEW
Component: Networking: HTTP → Networking
QA Contact: networking.http → networking
Target Milestone: Future → ---
See also bug 366559, which asks for LZMA (7-zip) transfer-encoding support.
To All -

It'd be great to revisit this issue at this time.

Are there still licensing issues around bzip2 code inclusion?

bzip2 transfer-encoding would be sweet. Save even more bytes on the wire :-).

Cheers,

- Bill
Due to its better decoding speed, brotli seems to be the right direction. That's being pursued in bug 366559 so I will close this.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: