Closed Bug 605524 Opened 14 years ago Closed 14 years ago

Can't open omni.jar due to an invalid header

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dot_ulli, Unassigned)

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20101018 Firefox-4.0/4.0b8pre
Build Identifier: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20101019 Firefox/4.0b8pre
Instead of starting with "PK", the file starts with four bytes of 0x00. This prevents opening the file with the GUI-oriented "file roller"; a plain "unzip" reports the error in a more reproducible fashion.
Reproducible: Always
Steps to Reproduce:
1. open a terminal
2. cd to the unpacked directory
3. unzip -x omni.jar > /dev/null
Actual Results:
$ unzip -x omni.jar > /dev/null
warning [omni.jar]: 3221038 extra bytes at beginning or within zipfile (attempting to process anyway)
error [omni.jar]: reported length of central directory is -3221038 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating...
Expected Results:
unzip reports no errors, and the file can be used normally within "file roller".
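The symptom in the report can be checked without any archive tool. The following is a minimal Python sketch, not something from the Mozilla tree: the function name `describe_leading_bytes` is hypothetical, and the only format fact it relies on is that a conventional zip archive begins with the local file header signature `PK\x03\x04`.

```python
# Illustrative helper (hypothetical name): classify the first four bytes of a file.
LOCAL_FILE_HEADER_SIG = b"PK\x03\x04"  # how a conventional zip archive begins


def describe_leading_bytes(data: bytes) -> str:
    """Return a short description of the first four bytes of `data`."""
    head = data[:4]
    if head == LOCAL_FILE_HEADER_SIG:
        return "local file header signature (PK)"
    if head == b"\x00\x00\x00\x00":
        return "four zero bytes"
    # Anything else is shown as plain hex so it can be compared by eye.
    return "other: " + head.hex()
```

To try it on a real file, read the first four bytes in binary mode (for example `open("omni.jar", "rb").read(4)`) and pass them to the function.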
I can confirm what .Ulli described. I get the same error message he mentioned following the steps he described.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: de / German → Build Config
Product: Mozilla Localizations → Core
QA Contact: german.de → build-config
a) There is no requirement that every zip program in the world be able to open omni.jar. omni.jar exists solely to make Firefox start faster.
b) Zip programs that scan the beginning of the zip file for magic signatures and then complain about it are overly draconian, since the pointer to the central directory is traditionally located AT THE END of the file.
c) Programs like Windows Explorer, WinRAR, etc. work fine. Even unzip works fine if you disregard the incorrect warning and error code.
If you really must open an optimized zip file using an anal zip program, it must be deoptimized first. Run
/path/to/moz/source/config/optimizejars.py --deoptimize ./ ./ ./
in the directory that omni.jar is in.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
The assumption that this bug is a duplicate of bug 595743 is wrong. It is a remaining bug in the German build process; other languages haven't been tested. The invalid header has vanished from the nightly and hourly builds; even tracemonkey is valid. Besides, this invalid header makes a side-by-side comparison of en-US and de very nasty, as at this very moment it is impossible to use the same time-saving tools.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Ulli, what makes you think it is different? You have described the exact same symptoms as bug 595473, and we have made an explicit decision not to fix these symptoms.
(In reply to comment #3)
> The assumption this bug is a duplicate of bug 595743 is wrong. It is a
> remaining bug in the german build process, other languages haven't been tested.
The steps you describe do not look like they are part of the build process. Can you describe the exact build steps that are failing?
> The invalid header within the nightly and hourly builds is vanished, even
> tracemonkey is valid.
I'm not sure what you mean. Only Windows PGO builds should have optimized omni.jar. Which specific builds are causing you problems?
> Besides, this invalid header makes the side compare en-US / de very nasty, as
> at this very moment is it impossible to use the same time saving tools.
You can use whiny tools if you deoptimize the jar.
(In reply to comment #4)
> Ulli, what makes you think it is different?
It's the header with the 4 leading bytes of 0x00.
> You have described the exact same symptoms as bug 595473, and we have made an
> explicit decision not to fix these symptoms.
I've been following the discussion at mozillaZine too. There have been invalid headers, though I don't know the date. Some days later the header was valid, even though you didn't fix it.
(In reply to comment #5)
> The steps you describe do not look like they are part of the build process.
> Can you describe the exact build steps that are failing?
No, I have no knowledge of your build process. But there must be differences.
> I'm not sure what you mean. Only Windows PGO builds should have optimized
> omni.jar.
I can't say anything about this point; I'm only using Linux.
> Which specific builds are causing you problems?
Current nightly 21-Oct-2010
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central-l10n/firefox-4.0b8pre.de.linux-x86_64.tar.bz2
has an *invalid* header.
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-4.0b8pre.en-US.linux-x86_64.tar.bz2
has a *valid* header.
That's the reason I assume there's a difference within the l10n build process.
> You can use whiny tools if you deoptimize the jar.
Never thought of this. Perhaps documentation is needed for the future customization and deployment process.
(In reply to comment #6)
> (In reply to comment #4)
> > Ulli, what makes you think it is different?
> It's the header with the 4 leading bytes of 0x00.
unzip correctly ignores that. The problem is the distance between the central directory and the end-of-central-directory marker at the end of the file. I just tried sticking the typical pkzip signature on the front; it doesn't help with any broken extraction tools. It helps the UNIX file utility identify the file, and that's about it.
> > You have described the exact same symptoms as bug 595473, and we have made an
> > explicit decision not to fix these symptoms.
>
> I've been following the discussion at mozillaZine too. There have been invalid
> headers, don't know the date. Some days later the header was valid - even you
> didn't fix it.
If the headers were occasionally "valid", it's a minor bug since it will result in a handful of extra page faults.
> Current nightly 21-Oct-2010
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central-l10n/firefox-4.0b8pre.de.linux-x86_64.tar.bz2
> has an *invalid* header.
>
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-4.0b8pre.en-US.linux-x86_64.tar.bz2
> has a *valid* header
>
> That the reason I assume there's a difference within the l10n build process.
Thanks for the examples. You are right: repacked jars get slightly optimized, but the nightlies don't (since there is no locale repacking there). "Slightly" because the order of the jar members remains the same, but the central directory is still moved to the front, which helps avoid a few seeks and page faults.
To summarize: the error you are seeing is due to unzip being overly strict. The issue is unzip reporting errors where there are none. The problem is that certain zip programs were not tested on unusual zip files. No errors should be reported, given that all of the zip members are found and pass the checksum checks.
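The "distance between central directory and end-of-central-directory marker" mentioned above can be measured directly. This is a rough Python sketch under stated assumptions: it looks for the last end-of-central-directory (EOCD) record within the final 22 + 65535 bytes, reads the directory size and offset from it, and reports how far the record sits from where the directory ends. The names `find_eocd` and `directory_gap` are hypothetical, and whether Info-ZIP derives its "extra bytes" figure in exactly this way is an assumption, not something stated in this bug.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
# signature, two disk numbers, two entry counts, directory size,
# directory offset, comment length (all little-endian)
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
MAX_COMMENT = 0xFFFF


def find_eocd(data: bytes):
    """Return (position, directory_size, directory_offset) of the last EOCD record."""
    window_start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, window_start)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    return pos, fields[5], fields[6]


def directory_gap(data: bytes) -> int:
    """Bytes between the end of the central directory and the EOCD record."""
    pos, cd_size, cd_offset = find_eocd(data)
    return pos - (cd_offset + cd_size)
```

For a conventionally laid-out archive the gap is 0. For a layout with the central directory at the front and an EOCD record at the end, the gap is roughly the size of all the member data in between, which would explain an error figure in the millions of bytes.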
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Just an FYI, with FF 4.0.1 (so not a nightly) I see the following error with Cygwin unzip (Info-Zip UnZip 6.0-10). It seems to unzip it OK, but gives the following error:
warning [omni.jar]: 4035823 extra bytes at beginning or within zipfile (attempting to process anyway)
error [omni.jar]: reported length of central directory is -4035823 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating...
I've reported it to Info-ZIP.
The .ZIP File Format Specification seems pretty clear that a header is required - http://www.pkware.com/documents/casestudies/APPNOTE.TXT. And the JAR specification is pretty clear that it follows the ZIP specification. The fact that some unzip utilities (and it seems a minority of them) can deal with it doesn't make the jar valid. I'll add, "MS handles it" isn't really an indicator of standard conformance either :). Unless you are drawing on an updated standard from somewhere?
Some more feedback from the InfoZip author on the "bug" report I sent them. Is there a really good reason we are missing why this non-standard jar format is being used?
I just did an initial dissection of omni.jar. It seems the structure of this archive is:
Central Record #1
Central Record #2
...
Central Record #1704 (at position 164087)
End of Central Directory Record (at position 164183)
Local Record #1 (at position 164205)
Local Record #2 (at position 164316)
...
with local records apparently to the end of the archive. I looked at the end and found no signature for an End of Central Directory Record (EOCDR), so the above is the only one found. Here are the contents of that EOCDR:
record # 1705 - end of central dir record # 1 (pos 164183)
  number of this disk: 0
  number of the disk with the start of the cen dir: 0
  total number of entries in cen dir on this disk: 1704
  total number of entries in the cen dir: 1704
  size of the central directory: 164179
  offset of cen dir start with respect to start disk: 4
  file comment length: 0
  no .ZIP file comment
There are many reasons this modified format does not make sense.
- First, the EOCDR is supposed to be at the end of the archive as the only definitive pointer into the rest of the .zip file record structure. Hiding this record in the middle of the archive defeats this primary purpose.
- Second, it seems this archive is designed for unzip utilities that can find signatures in the archive, but this is not a fully reliable way to read an archive. The most reliable method is to use the EOCDR to find where the records are. It's possible that the 4-byte signatures can appear elsewhere in binary data in an archive, thus leading to possible misreading or inability to read archive data.
- Third, as the .zip standard (AppNote) specifies a very particular order of structures (local headers and data, central directory records, then the EOCDR within 65K bytes of the end), many assumptions by zip and unzip utilities may be wrong when applied to this archive variant. There are reasons these structures are ordered as they are.
- Fourth, as far as I know, no current widely used zip or unzip utility would be able to recreate this structure. As noted, some utilities (like UnZip and Zip using -FF) should be able to pull information out of the archive, but it seems almost no one would be able to update it (and output the update in this new format). This almost appears to be creating a "proprietary" format with a possible intent to prevent others (except those with special tools or binary editors) from updating this ".jar" file. It's easy enough to do once the archive is pulled apart, but that's not something most probably can or want to do.
- Fifth, the first 10 bytes of this archive are (in hex):
  b8 b6 14 00 50 4b 01 02 17 0b
              P  K  1  2
  The P K 1 2 is the first Central Directory signature. The 4 bytes before that appear to be some index. Using little-endian math, this ends up being 1357496, which happens to be the offset to local record #210, a .png file. Not sure if this is what it is or why this is useful. However, if everyone starts tucking their own bytes into the .zip structure without using the established procedures for doing that, I think it's asking for chaos. The standard allows leading bytes, and usually the EOCDR can point a utility to the beginning of the real archive data, but in this case the EOCDR is buried where no one can find it without scanning the archive.
- Sixth, this format would have problems implementing Zip64 for files and archives larger than 4 GB. In particular, specific information is added around the EOCDR to provide some additional needed information. Since the EOCDR is not where it should be, it seems finding the additional Zip64 information would be harder and less reliable. Adding any of the other more complex new structures would be more complex, if even possible, so this format would likely force a break from the zip standard.
The main value of the zip standard is that it's a common language for archiving data, so this would probably not be good for jar files in the long run. I haven't dug deep enough to verify that the records themselves are all properly formatted, but there are signs of some possible irregularities.
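The arithmetic in the dissection quoted above can be reproduced in a few lines. This is only an illustrative Python sketch; the helper names are hypothetical, and the numbers are copied from the comment rather than read from a real omni.jar. It decodes the four leading bytes as a little-endian 32-bit integer and checks that the reported directory offset plus directory size lands exactly on the reported EOCDR position.

```python
import struct


def decode_leading_index(prefix: bytes) -> int:
    """Interpret the first four bytes as an unsigned little-endian 32-bit integer."""
    (value,) = struct.unpack("<I", prefix[:4])
    return value


def eocdr_follows_directory(cd_offset: int, cd_size: int, eocdr_pos: int) -> bool:
    """True if the EOCDR starts exactly where the central directory ends."""
    return cd_offset + cd_size == eocdr_pos


# Values copied from the dissection: leading bytes b8 b6 14 00,
# directory offset 4, directory size 164179, EOCDR at position 164183.
print(decode_leading_index(bytes.fromhex("b8b61400")))  # 1357496
print(eocdr_follows_directory(4, 164179, 164183))       # True
```

So the reported figures are internally consistent: the central directory starts right after the 4-byte prefix, and the EOCDR follows it directly, ahead of the local records.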
Without going into all the complicated details, Mozilla is supposed to be using standard zip format for jar files. Before omni.jar, all jar files started with the usual PK.. signature, readily understood by many decompression utilities. Those who have reason to open such jar files use these usual utilities. Changing the format to some non-standard variant causes unnecessary problems, particularly for those who don't use a Microsoft version of Mozilla. So this is definitely a bug which should be fixed. The question is, why wouldn't it be quickly corrected? What would it cost Mozilla to fix it? (Essentially nothing.) And what does it cost the community, who might like to contribute, if it isn't fixed? (For many on Linux, not being able to contribute.)
I have just checked the zip specification at http://www.pkware.com/documents/casestudies/APPNOTE.TXT and omni.jar clearly violates this format, starting with the required signature and header format. So Mozilla, if this isn't corrected, abandons open formats. Clearly this makes it a bug to be fixed.
(In reply to comment #13)
> I have just verified the zip specification at
> http://www.pkware.com/documents/casestudies/APPNOTE.TXT
>
> and omni.jar clearly violated this format.
> Starting with the required signature and header format.
The document clearly states that the signature isn't required, and that implementors need to be prepared for the possibility that it isn't there. Please read https://blog.mozilla.com/tglek/2010/09/14/firefox-4-jar-jar-jar/ for answers to all your questions, including why this was done (to minimize disk seeks) and whether it should be fixed (it shouldn't).
Also, before anyone expends any more words on the topic: It does not matter one bit what format Mozilla chooses to ship our internal data files in. We have been using the ZIP format because it was convenient and we had in-tree code to handle it. If we decide that we can get better performance characteristics out of a custom format, we are free to do that, we are the ones shipping and supporting Firefox. Nowhere have we ever promised that third parties would always be able to poke at the data files shipping with Firefox with standard tools, and I don't know why anyone thinks we're beholden to that. Please leave this bug alone, we are not going to change this.
Actually, Mozilla CAN change things. Let me explain how, and why I think they should.
The omni.jar format is not the standard zip format. Many contributors to Mozilla depended on being able to easily open *.jar files to make contributions. Sure, many zip utilities, with certain parameters, can open omni.jar files, either directly or by conversion to a standard zip format, and subsequently extract the contents of omni.jar to make the required changes. However, Mozilla core is unable to read standard zip formats. Considering the goals of Mozilla, this in itself is not the problem. The problem is masquerading omni.jar as a zip format.
Mozilla should designate a different format; let's call it "omni". A file in this format would be called *.omni. Any contributor needing to access an omni file would convert it to *.zip, make the necessary modifications, and reconvert it to *.omni. Since there would be a suffix corresponding to the file type, there would be no need to hide the files in directories, as is done by the optimizejars.py script. By using a new suffix to make the different format clear, the confusion causing the problem disappears. As well, the index at the end of omni files would no longer be needed, which would reduce redundancy in the omni file format.
I've made a bash script which makes it easier to use optimizejars.py, but it would be very nice to forgo the directories. (I call it omnizip.sh, and it uses a single argument, either "omni" or "zip", to determine the direction of conversion.) Since people at Mozilla wrote the optimizejars.py script, they would be able to more readily adjust it to use files directly with appropriate suffixes, if they want to be more friendly to Mozilla contributors.
My 2 cents :)
(In reply to Siddharth Agarwal [:sid0] from comment #14)
> Please read https://blog.mozilla.com/tglek/2010/09/14/firefox-4-jar-jar-jar/
This unusual format hinders contributors. How much benefit or speed does it bring to the end user when Startup Cache is used?
Since there are a lot of incorrect statements in this thread, let's recap for the many folks who keep checking out this bug.
The omni.jar file is NOT in jar format (as confirmed by many experts). This is because it has been "optimized" for performance reasons by Mozilla.
Mozilla states they will not fix this because "they are internal files" (which I don't really accept for an open source project).
Suggested fixes from the community are:
- Change the suffix so it's not pretending to be something it's not
- Change the optimization so it doesn't break JAR format, which seems to be doable while still keeping the performance advantages (for example, leave the index at the end also and hide the new index in the first "file"). Also see discussion here - https://blog.mozilla.com/tglek/2010/09/14/firefox-4-jar-jar-jar/
Mozilla currently plans to do nothing. If you don't agree, please vote and/or add your counterarguments here. Plans change, and if there are enough votes, maybe the next person looking at this code will make it happen.
Quick clarification on my recap, the problem is at the ZIP level, not the JAR level.
(In reply to David Rees from comment #18)
> Suggested fixes from the community are:
> - Change the suffix so its not pretending to be something its not
Which we've done. The file is called omni.ja now.
Thanks, that definitely will avoid initial confusion and make it clear to people they need to dig deeper. Longer term supporting JAR format would be even nicer :).
(In reply to David Rees from comment #21)
> Longer term supporting JAR format would be even nicer :).
Well, we never followed the actual JAR format anyhow, the only parallel was that it's a zip file containing some kind of app code.
In the past I couldn't load modified omni.jar files (now called omni.ja) using the conversion program supplied. However, if the zip file was not modified, and the omni file was just converted in both directions, everything worked fine. Has this been corrected? (I'm still using Seamonkey 2.0.14 to continue using patches to the interface that I need.)
See Also: → 991459
Product: Core → Firefox Build System