Closed Bug 173044 Opened 22 years ago Closed 9 years ago

Support for bzip2 transfer encoding, a la w3m

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: luke, Unassigned)

References

Details

(Keywords: helpwanted)

Attachments

(1 file, 1 obsolete file)

It would be nice for mozilla to have the ability (or at least the option to) use bzip2 as a content transfer encoding, similar to how gzip is used now. As bzip2 compression tends to result in smaller files, it would be particularly useful to people over dial-up. w3m appears to have support for bzip2 transfer, but I couldn't find anything on bzip2 and mozilla.
nice suggestion; no time to work on this now -> future
Target Milestone: --- → Future
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: helpwanted
I'm prepared to try working on this, but I've not done any development on Moz before. Any decent guides I should be looking at to get me up to speed? Give us a clue ;-)
sure, take a look at: http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsHTTPCompressConv.cpp you'd want to either add code to this "stream converter" to handle bzip2, or you'd want to create a similar stream converter object for BZIP2 compression/decompression. then you'd just have to update the pref in modules/libpref/src/init/all.js pref("network.http.accept-encoding", ...); that defines our Accept-Encoding header. i believe the HTTP code, when it sees a Content-Encoding header, will just query the stream converter service for the appropriate converter.
also, mozilla doesn't currently support Transfer-Encoding: gzip, so adding bzip2 support for the Transfer-Encoding header is a much bigger task.
Sorry my mistake, I wanted bzip support for Content-Encoding, not Transfer-Encoding. Watch this space.
Attached patch bzip support patch (obsolete) — Splinter Review
The attached rather inelegent patch works for me. Two major issues with it: 1. Requires bzip libraries. Really configure ought to guess this, but I don't know enough about autoconf (yet). 2. I'm sure I've inadvertently tweaked a few too many things.
yeah, it's going to take someone's time to clean this patch up so it can be landed in mozilla. i unfortunately don't have much free time to do that right now.
Apart from being diffed against a slightly oldish tree (which is not a big deal really), I have a few comments/questions: 1) Does the bzip lib handle both bzip1 and bzip2? 2) Do people really send bzip2 files as application/bzip? What do they do for bzip1 files? 3) + if (!PL_strncasecmp (fromStr, HTTP_BZIP_TYPE , strlen (HTTP_COMPRESS_TYPE)) -- use HTTP_BZIP_TYPE as the arg to strlen, perhaps? and HTTP_BZIP2_TYPE for the next comparison 4) Don't do the /* LCR */ comment thing. That's what the "Contributors" section at the top of the file is for. ;) 5) Perhaps rename mGzipStreamInitialized to something that makes sense for both stream types, like "mDecompressStreamInitialized" or something? 6) I'd sort of prefer to keep HTTP_COMPRESS_IDENTITY as the last element of the enum... shouldn't affect any other code. 7) You also need to make some changes to nsExternalHelperAppService (see the ApplyDecodingForType and ApplyDecodingForExtension functions and the arrays they use). 8) There are some whitespace issues in the code (tabs? If so, please convert to spaces) 9) There are some subtle differences between the z_stream and bz_stream structs. For example, the former uses uInt while the latter uses "unsigned int". I would assume these are the same thing, but for clarity and consitency, I'd rather cast to the type the struct expects (too bad, really -- there are so many parallels between the two codepaths that I almost wish we could have a single codepath for the two decompressions, with some conditional dispatch of functions). 10) mGzipStreamEnded could use renaming too, in line with the widened usage. 11) Darin, is it correct to just pass through aSourceOffset unchanged here? This applies to the gzip code too.
Ta for the feedback. As for the old tree, CVS access isn't easy here ;-) bzip1 appears to have run into trouble with patent issues, which bzip2 doesn't appear to suffer. Quite a few people use bzip to mean bzip2, confusingly. Probably the best MIME type is application/x-bzip2. Regarding the use of strlen(HTTP_COMPRESS_TYPE), I did wonder about that, but the GZIP code does it the same way. The /* LCR */ comments snuck through along with some printfs. They were various markers and debug bits for me, and I didn't intend for them to go into the patch. The other issues all sound sensible to me. I'm currently working on a second version of the patch, with a --with-bzip2 switch to configure (via autoconf) to turn support on (and to incorporate some of the suggestions made).
> I did wonder about that, but the GZIP code does it the same way. Wanna fix that too? ;)
Luke, any luck with that updated patch?
Sorry, I've got distracted by work of late, and also because the fall-out of Java gcc-2.95 vs gcc-3.2 RH8 has meant I haven't upgraded for a while (policy here is that things must come as RPM!). I've just grabbed the 1.4 source and I'll try and edit the patch suitably. I plan to wrap the new bits in #ifdef BZIP2 so they can be turned on and off easily enough and also fix up the other issues raised, but my knowledge doesn't stretch as far as autoconf - any autoconf experts around? :-)
Hmm.... any good reason to make this disableable? It's not that much code.... If we decide we want to do that, I can probably try to figure out the autoconf end.
The bzip2 libraries would have to be bundled in if it were always on, and judging by the issues over libmng it seems code space is at a premium? Does bzp2 compile everywhere Mozilla does?
Ah, ok. The library issue is a good reason to have a compile-time flag...
Another thought on MIME types. As regards full types, I'd have thought application/x-bzip2 should be right and fits in with what people seem to be using for it. However we also need a short name for the Accept/Content-Encoding, and as far as I can see we send out Accept-Encoding gzip and accept a Content-Encoding of either gzip or x-gzip, as per RFC2616. However bzip2 isn't mentioned and not does it say how new ones should be named, other than "New content-coding value tokens SHOULD be registered". Should we send out in Accept-Encoding bzip2 or x-bzip2? Should we accept both as synonomous on Content-Encoding? My personal preference is for just "bzip2" to be sent out. I agree with comment 8 point 9 that it would be nice if the bzip2/gzip streams could be merged as they are so similar, but I couldn't think of a neat way of doing so. Damn compile-time member lookup!
Also, I forgot to mention I came across the following during a google: ---- 8. Using bzip2 with Netscape under XWindows tenthumbs@cybernex.net says: I also found a way to get Linux Netscape to use bzip2 for Content- Encoding just as it uses gzip. Add this to $HOME/.Xdefaults or $HOME/.Xresources I use the -s option because I would rather trade some decompressing speed for RAM usage. You can leave the option out if you want to. Netscape*encodingFilters: \ x-compress : : .Z : uncompress -c \n\ compress : : .Z : uncompress -c \n\ x-gzip : : .z,.gz : gzip -cdq \n\ gzip : : .z,.gz : gzip -cdq \n\ x-bzip2 : : .bz2 : bzip2 -ds \n ---- Was this supported under Communicator? And does Mozilla support it? Not that it's much of a substitute as it won't work on Windows etc.
mozilla does not read preferences from .Xdefaults like 4x communicator did. until there is an official "Accept-Encoding: bzip2", we should prefix what we send with "x-", but we should accept both forms as equivalent.
Hopefully this addresses most of the previous issues raised, and is diffed for Mozilla 1.4 release source. By default it won't enable bzip2 support because of the library issues mentioned previously - for that to happen the updated files need to be compiled with -DBZIP2, and libbz2.so needs linking in. To test I changed DEFINES in config/config.mk and OS_LIBS in config/autoconf.mk but I'm sure there are better places to do this. Also, all.js needs network.http.accept-encoding needs to be changed to "x-bzip2,gzip,deflate,compress" if bzip2 support is compiled in (I'm not sure if configure can do this).
Attachment #102322 - Attachment is obsolete: true
Attachment #127784 - Flags: review?(darin)
Comment on attachment 127784 [details] [diff] [review] Revised patch for bzip2 support i have a concern with this patch. i think that this patch will make it so that users will need to have the bzip2 library installed on their system. it seems like it might be better to explicitly load the bzip2 library (using PR_LoadLibrary). or, we could move this into an extension library. however, given that the amount of code is small, i prefer the PR_LoadLibrary option.
Attachment #127784 - Flags: review?(darin) → review-
another option might be to just statically link libbz2.a ... i think we'd need to verify that the license is compatible before we can do that though. here's the license from bzlib.h from my RedHat 9 box: /*-- This file is a part of bzip2 and/or libbzip2, a program and library for lossless, block-sorting data compression. Copyright (C) 1996-2002 Julian R Seward. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 3. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 4. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Julian Seward, Cambridge, UK. jseward@acm.org bzip2/libbzip2 version 1.0 of 21 March 2000 This program is based on (at least) the work of: Mike Burrows David Wheeler Peter Fenwick Alistair Moffat Radford Neal Ian H. Witten Robert Sedgewick Jon L. Bentley For more information on these sources, see the manual. --*/
Initially I'd had in mind that configure would check for the presence of -lbz2 and decide whether to build it in compile time, but I hadn't successfully been able to modify the autoconf setup to do this (I know little about autoconf). However the PR_LoadLibrary sounds interesting - I'll take a look into it. bzip2 claims to be BSD-style license, I don't know how that fits in with the MPL.
(In reply to comment #22) > Initially I'd had in mind that configure would check for the presence of -lbz2 > and decide whether to build it in compile time, but I hadn't successfully been > able to modify the autoconf setup to do this (I know little about autoconf). > However the PR_LoadLibrary sounds interesting - I'll take a look into it. configure checking is not sufficient unless we statically link to libbz2.a because otherwise the mozilla builds from ftp.mozilla.org would require the user to have installed libbz2.so. that would be a new dependency. sure, most linux distros include it, but we'd have to decide if we want to add that as a required dependency. given that bzip2 isn't in common use as a content-encoding, i think it'd be a hard sell. > bzip2 claims to be BSD-style license, I don't know how that fits in with the MPL. BSD-style license should mean that we're ok statically linking to the code, but i'm not an expert on such things. i'd need someone like mitchell or gerv to review this.
The license does not impose any additional requirements over and above the MPL, and so we could check it into the main tree just as we have libjpeg and libpng, if necessary. We could also include it in nightly builds if we wanted. Does that answer your question? Gerv
Gervase Markham wrote: > The license does not impose any additional requirements over and above the > MPL, and so we could check it into the main tree just as we have libjpeg and > libpng, if necessary. Erm... correct me if I am wrong: The MIT/X.org license does not impose any additional requirements over/above MPL either - but requests to include such code into the main tree was DENIED multiple times in the past. Was the policy for such checkins changed or am I missing something here ?
Roland: please email me with more details of whatever you are referring to. Gerv
Firefox now includes bzip2 for partial updates (because bsdiff works well with bzip2), so it might be easier to add bzip2 as a transfer-encoding now.
-> default owner
Assignee: darin → nobody
Status: ASSIGNED → NEW
Component: Networking: HTTP → Networking
QA Contact: networking.http → networking
Target Milestone: Future → ---
See also bug 366559, which asks for LZMA (7-zip) transfer-encoding support.
To All - It'd be great to revisit this issue at this time. Are there still licensing issues around bzip2 code inclusion? bzip2 transfer-encoding would be sweet. Save even more bytes on the wire :-). Cheers, - Bill
Due to its better decoding speed, brotli seems to be the right direction. That's being pursued in bug 366559 so I will close this.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: