bug 578393 added a file with checksums in it -- which is great -- but it'd be nicer if it didn't contain leading paths in the last column. Eg: 5d8dc2206533ecff53ece54c6fd0082de31c5f9ad4473d4008ca649c320743be5b1c9b8c5eaa8a8d5d736523d0fdb65354b0e0143cabf8826c5f18c913107846 sha512 13507124 firefox-4.0b8pre.en-US.linux-i686.complete.mar instead of: 5d8dc2206533ecff53ece54c6fd0082de31c5f9ad4473d4008ca649c320743be5b1c9b8c5eaa8a8d5d736523d0fdb65354b0e0143cabf8826c5f18c913107846 sha512 13507124 update/firefox-4.0b8pre.en-US.linux-i686.complete.mar
Catlee made me think about this a bit more and I realized that the thing I thought was wrong was that the checksums file layout didn't match that of directory. Eg, the directory was completely flat, and the checksums file was partly nested.
Summary: checksums file shouldn't contain leading directories → checksums file filenames should match directory layout
the script that does the hashes has a -s parameter that strips off the majority of the directory before inclusion in the file. It would be easy to strip off all path information, but my concern and reason for not doing this was that files in UPLOAD_EXTRA_FILES could make the entries ambiguous. Take the contrived example: objdir/dist/aaa/file objdir/dist/bbb/file objdir/dist/ccc/file The current implementation would include the files as $ cat checksums eeb9aa4457ecacf281665b59cb516a618d107a1b sha1 8 aaa/file 32e8f757720907532aacd1b99fbe6157054773b6 sha1 8 bbb/file 7b51497705cb2f262cf3484d2b445feba35686b1 sha1 8 ccc/file which would provide each file in the checksum output with a unique entry. If we were to strip all path information naively, we would end up with $ cat checksums eeb9aa4457ecacf281665b59cb516a618d107a1b sha1 8 file 32e8f757720907532aacd1b99fbe6157054773b6 sha1 8 file 7b51497705cb2f262cf3484d2b445feba35686b1 sha1 8 file In this case, we would have to compute the checksum of each file named 'file' to figure out which entry is for aaa/file. If any of the files were corrupt, we wouldn't be able to tell which file as the aaa/file entry. This would still tell us that the file is not valid. It also means that if during generation, aaa/file was accidentally swapped with bbb/file, we would not be able to pick up on the reversal and would continue to treat aaa/file as bbb/file and vice-versa. I had started working on a function that would remove all path information that was not needed to uniquely identify each file. I ended up not including this in the patch because my implementation was not bullet-proof and it should be trivial to strip this information in a consuming file. If we can assert that there are not going to be multiple files with the same name in one upload, I can write a patch to only include the basename of each file. If we cannot make the assertion, I will write a better implementation of the function to remove all path information that is not needed to uniquely identify each file.
(In reply to comment #2) > the script that does the hashes has a -s parameter that strips off the majority > of the directory before inclusion in the file. It would be easy to strip off > all path information, but my concern and reason for not doing this was that > files in UPLOAD_EXTRA_FILES could make the entries ambiguous. > > Take the contrived example: > > objdir/dist/aaa/file > objdir/dist/bbb/file > objdir/dist/ccc/file > > The current implementation would include the files as > > $ cat checksums > eeb9aa4457ecacf281665b59cb516a618d107a1b sha1 8 aaa/file > 32e8f757720907532aacd1b99fbe6157054773b6 sha1 8 bbb/file > 7b51497705cb2f262cf3484d2b445feba35686b1 sha1 8 ccc/file > > which would provide each file in the checksum output with a unique entry. If > we were to strip all path information naively, we would end up with > > $ cat checksums > eeb9aa4457ecacf281665b59cb516a618d107a1b sha1 8 file > 32e8f757720907532aacd1b99fbe6157054773b6 sha1 8 file > 7b51497705cb2f262cf3484d2b445feba35686b1 sha1 8 file What do the files look like when uploaded, in this example? They can't all be in the same directory remotely.
In my testing, they ended up in directories matching their local layout. A concrete example of the upload locations matching the checksum file locations: firefox-4.0b8pre.en-GB.linux-x86_64.tar.bz2 install/firefox-4.0b8pre.en-GB.langpack.xpi update/firefox-4.0b8pre.en-GB.linux-x86_64.complete.mar update/firefox-4.0b8pre.en-GB.linux-x86_64.partial.20101029031004-20101029030658.mar ended up with /home/jhford/mozilla/fakestage/ /home/jhford/mozilla/fakestage/firefox-4.0b8pre.en-GB.linux-x86_64.tar.bz2 /home/jhford/mozilla/fakestage/update /home/jhford/mozilla/fakestage/update/firefox-4.0b8pre.en-GB.linux-x86_64.partial.20101029031004-20101029030658.mar /home/jhford/mozilla/fakestage/update/firefox-4.0b8pre.en-GB.linux-x86_64.complete.mar /home/jhford/mozilla/fakestage/install /home/jhford/mozilla/fakestage/install/firefox-4.0b8pre.en-GB.langpack.xpi /home/jhford/mozilla/fakestage/firefox-4.0b8pre.en-GB.linux-x86_64.checksums Obviously, there aren't going to be naming collisions in this case. As best as I can tell, post upload logic is flattening the directory structure. If that is the case, I would suggest that post upload logic be responsible for changing the checksum file layout. a couple things we could do: -teach post_upload.py (?) how to flatten these files -have a flag to checksums.py that says to use only the basename of files -write consumers in a way that extracts only the needed information from the file
OS: Mac OS X → All
Hardware: x86 → All
I think it's beyond the scope of post_upload.py to alter files in any way. A flag to the checksums script seems pretty sensible, how do you plan to expose that through the makefile targets? If this is turning out to be a rat's nest feel free to punt on it, I don't think it's a high priority item.
(In reply to comment #5) > I think it's beyond the scope of post_upload.py to alter files in any way. > > A flag to the checksums script seems pretty sensible, how do you plan to expose > that through the makefile targets? > > If this is turning out to be a rat's nest feel free to punt on it, I don't > think it's a high priority item. well, either a hard-coded command line option or use a make variable to control whether to strip or not. The implementation of the flag is simple, instead of printing out the filename, we would print out os.path.basename().
You need to log in before you can comment on or make changes to this bug.