Closed
Bug 305131
Opened 19 years ago
Closed 19 years ago
Mac l10n tinderbox builds are failing due to a stale mountpoint (mount-temp) that is used in the packaging process
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: mark)
Details
(Keywords: fixed1.8)
Attachments
(4 files, 1 obsolete file)
1023 bytes, patch
1.52 KB, patch (benjamin: review+; asa: approval1.8b4+)
1.32 KB, patch (benjamin: review+)
2.65 KB, patch (benjamin: review+; chase: approval1.8b4+)
Ref: http://lxr.mozilla.org/mozilla/source/toolkit/mozapps/installer/packager.mk#103

The directory mount-temp is created as a mountpoint for the Mac disk image. However, something seems to be going wrong with the initial mount: even on the first attempt after a reboot, the mountpoint returns I/O errors, and subsequent mount and umount attempts also fail.

Note: after a reboot, I have tried running the same series of hdiutil steps from UNMAKE_PACKAGE in packager.mk by hand, and they succeeded.
Please don't use webtools:tinderbox for issues with specific tinderboxes; that component is for the Tinderbox software itself. If your problem is with the core build process, file a bug there; if your problem is with specific tinderboxes, use mozilla.org:tinderbox.
Assignee: mcafee → nobody
Component: Tinderbox → Build Config
Product: Webtools → Core
QA Contact: timeless → build-config
Comment 2 • 19 years ago
When did this stop working? There doesn't seem to be anything all that suspicious in the packager.mk changes except for the new DMG builder... could that have caused this?
Comment 3 • 19 years ago (Reporter)
Using the fr builds as my guidepost (http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla-l10n-fr), it looks like this repackaging error started appearing on 2005/08/10 19:01:00:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fr/1123725660.2096.gz&fulltext=1

This may or may not be a coincidence, but isn't that almost exactly when we branched for 1.8, or at least when the new Mozilla1.8 tinderbox builds were set up? From what I can see, none of the Mac Mozilla1.8-l10n-* builds have ever worked.
Comment 4 • 19 years ago (Assignee)
This seems to have happened on 08/10:
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla-l10n-fr&hours=24&maxdate=1123750800&legend=0

As you can see, there was trouble on all of the builds. When it cleared up, the Mac recovered for one cycle before losing its mind. Note that all of the green cycles before its last one had disk images on disk1; during the last green cycle it was on disk2. If this happens again, let's start with "diskutil list" and take it from there.

The DiskImages and DiskArbitration frameworks can be really finicky, although I've never seen the system hold such a death grip on mount points that they're not released after rebooting. In my experience, it's better not to specify your own mount points (hdiutil attach -mountpoint) when working with hdiutil. I've seen synchronicity problems doing that, and, as we're seeing here, it's rough to get mount points back when the system decides to take its ball and go home. It's better to let the system mount the disk where it wants, in a fresh mount point in /Volumes. I initially wanted to do private non-/Volumes mounts in the new dmg packager too, but gave up because it's really not workable. Maybe UNMAKE_PACKAGE should follow suit.
Comment 5 • 19 years ago (Reporter)
I've rebooted maya again, and we'll see how this next build cycle turns out. If we're still seeing mountpoint issues, we can try out this patch.
Comment 6 • 19 years ago (Assignee)
The proposed patch will fall down whenever there's a mount point with a space in it. Even if we don't support volume names that are supposed to have spaces in them (I think we should), this will break as soon as the desired mount point is in use and the system decides to start appending numbers (/Volumes/Firefox 1). Since we're trying to avoid trouble when mount points go stale, we should keep this in mind and account for it.

Also, this will most likely stall the build now that we've potentially got EULAs in disk images. hdiutil dumps the license into $PAGER's input pipe (using less if there is no $PAGER), asks on stdout whether you agree, and waits for a yes or no answer on stdin. So we've got to get really cute with our attach invocation:

    echo Y | PAGER=true hdiutil attach -readonly -private -noautoopen $(UNPACKAGE)

When it comes time to extract $(DEV_NAME), we've got to be wary of the prompt that hdiutil may also have put on stdout. We've got to match the /dev/disk pattern instead of awking for the first field. So instead of the sed-awk pipe, something like this:

    sed -e 's/^.*\(\/dev\/disk[^ ]*\).*$/\1/;1q'
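The space problem can be shown in isolation. The sketch below compares a whitespace field split with taking everything from the first "/" after the /dev/disk token. The sample attach output, including its column layout, is an assumption made up for illustration, and the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative attach output; the column layout is assumed, not captured
# from a real hdiutil run.
sample='/dev/disk1          Apple_partition_scheme
/dev/disk1s1        Apple_partition_map
/dev/disk1s2        Apple_HFS                      /Volumes/Firefox 1'

# Keep everything from the first "/" after the /dev/disk token to the end of
# the line, so a mount point containing spaces comes through whole. Lines with
# no "/" after the device (no mount point) produce no output.
extract_mountpoint() {
  sed -n 's|^.*/dev/disk[^ ]*[^/]*\(/.*\)$|\1|p' | head -n 1
}

# Naive whitespace split for comparison: takes the third field only.
naive_mountpoint() {
  awk '/\/dev\/disk/ && NF >= 3 { print $3; exit }'
}

printf '%s\n' "$sample" | extract_mountpoint   # /Volumes/Firefox 1
printf '%s\n' "$sample" | naive_mountpoint     # /Volumes/Firefox
```

The same "rest of the line after the device" idea is what the later perl MOUNTPOINT extraction in this bug does.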
Comment 7 • 19 years ago
I would *really* like to continue using explicit mountpoints unless we can document that they don't work. It seems like using "echo Y | hdiutil" is what we need to do here (yuck... there should be an hdiutil -y flag or something).
Comment 8 • 19 years ago
*** Bug 305470 has been marked as a duplicate of this bug. ***
Comment 9 • 19 years ago (Assignee)
This implements the suggestions I made above, except that it retains explicit mount points. This won't keep them from going stale. I don't have a documented procedure to reproduce the stale mount points, other than that I have noticed synchronicity problems (volumes not unmounted or devices not released until after detach returns). I think that we should take it seriously, since this started happening well before the packaging changes landed.

So, if you don't like mounting in /Volumes, how would you feel about using -mountroot instead of -mountpoint? That treats the argument as a replacement for /Volumes and creates a mount point within it, rather than treating the argument as the direct mount point. It still might not solve the staleness problem, but it might be worth a shot to see if it does. We can use -mountpoint as long as 10.3 minimum is in use.
Assignee: nobody → mark
Status: NEW → ASSIGNED
Comment 10 • 19 years ago
Comment on attachment 193439 (More reliable unpackaging):
Let's try it on trunk.
Attachment #193439 - Flags: review+
Comment 11 • 19 years ago (Assignee)
Checked in to trunk. Marking FIXED, reopen/refile if stale mount points reappear; requesting approval1.8b4 for trunk fun.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 12 • 19 years ago (Assignee)
Comment on attachment 193439 (More reliable unpackaging):
Branch fun, I mean.
Attachment #193439 - Flags: approval1.8b4?
Comment 13 • 19 years ago (Assignee)
Reopening because this caused bustage. When hdiutil attaches disk images that it hasn't verified yet, it verifies them, producing output on stdout. The change to the code to extract DEV_NAME on disk images with licenses assumed that the first line would contain the device name, but this is not the case when dealing with unverified images.
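A small reproduction of this bustage, assuming the attach output begins with verification lines before the device table. The sample lines are invented for illustration and are not real hdiutil messages. The first function is the comment 6 sed, which rewrites and prints only line 1; the second (hypothetical name) scans for the first line that actually names a device, which is what the later perl DEV_NAME extraction does.

```shell
#!/bin/sh
# Illustrative output only: two made-up verification lines precede the
# device table, as happens when hdiutil verifies an image on attach.
sample='Verifying image... (illustrative progress line)
Verification complete (illustrative)
/dev/disk1          Apple_partition_scheme
/dev/disk1s2        Apple_HFS                      /tmp/dmg.abc123'

# Comment 6 approach: substitute, then "1q" prints line 1 and quits, whether
# or not line 1 contained a device name.
first_line_dev() {
  sed -e 's|^.*\(/dev/disk[^ ]*\).*$|\1|;1q'
}

# Scan every line and keep the first one that names a /dev/disk device.
first_matching_dev() {
  sed -n 's|^.*\(/dev/disk[^ ]*\).*$|\1|p' | head -n 1
}

printf '%s\n' "$sample" | first_line_dev       # prints the progress line
printf '%s\n' "$sample" | first_matching_dev   # /dev/disk1
```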
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14 • 19 years ago (Assignee)
The stale mount points will need to be reclaimed on maya once this lands.
Attachment #193532 - Flags: review?(benjamin)
Comment 15 • 19 years ago
Comment on attachment 193532 (l10n bustage fix):
Are the "echo" statements for debugging (will they be removed before checkin)?
Comment 16 • 19 years ago (Assignee)
Yes, the echos are debug-only and will be removed.
Updated • 19 years ago
Attachment #193532 - Flags: review?(benjamin) → review+
Comment 17 • 19 years ago (Assignee)
Fixed on trunk. Someone needs to unmount the temp mounts on maya.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Comment 18 • 19 years ago (Reporter)
I've rebooted maya to fix the hung mountpoints.
Comment 19 • 19 years ago (Assignee)
This needs to be done on the branch in order to get Mac l10n builds. Right now, l10n branch builds are failing because the disk images now include EULAs but don't have a 24-hour operator standing by to click "Accept":

    hdiutil: attach failed - user canceled operation

(because stdin's not hooked up to anything at all.)
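To see why an empty stdin turns into "user canceled operation" and why the comment 6 workaround helps, here is a mock. fake_license_attach is hypothetical and only imitates the behaviour described in this bug (license paged through $PAGER, prompt on stdout, answer read from stdin); it does not call hdiutil.

```shell
#!/bin/sh
# Hypothetical stand-in for an attach that shows a EULA: page the license
# through $PAGER (less if unset), prompt on stdout, read the answer on stdin.
fake_license_attach() {
  printf 'LICENSE TEXT\n' | "${PAGER:-less}"
  printf 'Agree Y/N? '
  if read -r answer && [ "$answer" = "Y" ]; then
    echo "attached"
    return 0
  fi
  echo "attach failed - user canceled operation" >&2
  return 1
}

# Unattended run: stdin is empty, read hits EOF, so the attach is canceled.
PAGER=true fake_license_attach < /dev/null || echo "(canceled)"

# Workaround from comment 6: a no-op pager plus a canned "Y" on stdin.
echo Y | PAGER=true fake_license_attach
```

Setting PAGER=true matters because a real pager would otherwise sit waiting for input too.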
Flags: blocking1.8b4?
Updated • 19 years ago
Severity: normal → major
Flags: blocking1.8b5+
Flags: blocking1.8b4?
Flags: blocking1.8b4+
Target Milestone: --- → mozilla1.8beta4
Updated • 19 years ago
Severity: major → normal
Flags: blocking1.8b5+
Target Milestone: mozilla1.8beta4 → ---
Updated • 19 years ago
Attachment #193439 - Flags: approval1.8b4? → approval1.8b4+
Comment 20 • 19 years ago
Have the mountpoints on Maya been removed yet since the branch checkin?
Comment 21 • 19 years ago (Assignee)
(In reply to comment #20)
> Have the mountpoints on Maya been removed yet since the branch checkin?

maya never got a chance to mount a branch dmg since its most recent boot, so there shouldn't be any stuck mount points.
Comment 22 • 19 years ago (Assignee)
So, this worked well on the trunk, but it's not flying on the branch. I suggest rebooting maya first, since the branch and trunk are now running exactly the same code and it's working on the trunk on the same machine. Maybe those "cancelled" mount attempts during previous branch builds got the system into a weird state?

If that doesn't work, it's time to seriously consider letting the system mount in /Volumes or an out-of-tree mountroot, and (or?) possibly making the detach operation nonfatal, along the lines of:

    detach || (sleep 5 && detach -force)

The failure:

    set -e; unset NEXT_ROOT; export PAGER=true; mkdir mount-temp; echo Y | hdiutil attach -readonly -mountpoint mount-temp -private -noautoopen /builds/tinderbox/Fx-Mozilla1.8-l10n/Darwin_7.9.0_Clobber/firefox.dmg > hdi.output; DEV_NAME=`perl -n -e 'if($_=~/(\/dev\/disk[^ ]*)/) {print $1."\n";exit;}'< hdi.output`; MOUNTPOINT=`perl -n -e 'split(/\/dev\/disk[^ ]*/,$_,2);if($_[1]=~/(\/.*)/) {print $1."\n";exit;}'< hdi.output`; rsync -a ${MOUNTPOINT}/DeerPark.app firefox; hdiutil detach ${DEV_NAME};
    "disk1" failed to unmount (0x0000C001)

That's the error you get when a mount is busy. (Poking it with lsof might be educational.)

Failed on the first attempted locale for Firefox:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n-bg/1124946420.18733.gz&fulltext=1

Thunderbird:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n-ca/1124947140.30685.gz&fulltext=1

Subsequent locales (and subsequent attempts for these builds until reboot) die on the stuck mount:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n-fr/1124947140.31737.gz&fulltext=1
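The "detach || (sleep 5 && detach -force)" idea, written out as a function. DETACH_CMD and DETACH_BACKOFF are hypothetical knobs added only so the logic can be exercised without a disk image; in a real build they would correspond to hdiutil and a few seconds of backoff.

```shell
#!/bin/sh
# Hypothetical knobs: which command performs the detach, and how long to
# back off before the forced retry.
DETACH_CMD=${DETACH_CMD:-hdiutil}
DETACH_BACKOFF=${DETACH_BACKOFF:-5}

# Try a normal detach; if the device is still busy, wait and force it.
# Returns nonzero only if the forced detach also fails.
detach_with_retry() {
  dev=$1
  "$DETACH_CMD" detach "$dev" && return 0
  echo "detach of $dev failed; retrying with -force in ${DETACH_BACKOFF}s" >&2
  sleep "$DETACH_BACKOFF"
  "$DETACH_CMD" detach "$dev" -force
}
```

In the makefile this would usually be inlined rather than defined as a function, but the control flow is the same.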
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23 • 19 years ago
It seemed to work on my tree last night, at least for Thunderbird:
---------------
cd ../../dist/l10n-stage && \
set -e; unset NEXT_ROOT; export PAGER=true; mkdir mount-temp; echo Y | hdiutil attach -readonly -mountpoint mount-temp -private -noautoopen /Volumes/6GB_IBM_TEMP/SRC_ROOT_MZ18/mozilla/thunderbird-objdir/dist/thunderbird-1.0+.en-US.mac.dmg > hdi.output; DEV_NAME=`perl -n -e 'if($_=~/(\/dev\/disk[^ ]*)/) {print $1."\n";exit;}'< hdi.output`; MOUNTPOINT=`perl -n -e 'split(/\/dev\/disk[^ ]*/,$_,2);if($_[1]=~/(\/.*)/) {print $1."\n";exit;}'< hdi.output`; rsync -a ${MOUNTPOINT}/Thunderbird.app thunderbird; hdiutil detach ${DEV_NAME};
"disk3" unmounted.
"disk3" ejected.
--------------------
Strange that it doesn't work on maya.
Comment 24 • 19 years ago
Rebooted maya and restarted scripts.
Comment 25 • 19 years ago (Assignee)
Now getting this on the trunk, although nothing significant has changed since the recent successful cycles:

    hdiutil: attach failed - codec overrun

http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fr/1125001140.8160.gz&fulltext=1

So is it not picking up good dmgs from atlantia? This makes me want to cry.
Comment 26 • 19 years ago (Assignee)
The codec overrun is probably just a bad dmg. It only affected Thunderbird/trunk. Firefox/trunk went red because of bug 305686 - backed out until I've got something better. I still expect the branch to go green.
Comment 27 • 19 years ago (Assignee)
But it didn't. It's still getting "disk1" failed to unmount (0x0000C001), probably for reasons similar to those that kept it broken before the new packager landed. I have to do more work on the unpackager for bug 305686 anyway, so I'll integrate some of my suggestions from comment 22.
Comment 28 • 19 years ago
The error I am seeing on branch Thunderbird is different, though: more like the one described in comment #25, but with an additional warning:
-----------------------------------
make[1]: *** Warning: File `/builds/tinderbox/Tb-Mozilla1.8-l10n/Darwin_7.9.0_Clobber/thunderbird.dmg' has modification time in the future (2005-08-25 13:54:18 > 2005-08-25 13:54:02)
-----------------------------------
Weird...
Comment 29 • 19 years ago (Assignee)
On the -de tinderbox (and other "early" ones), you mean? I noticed that too. It's because the file is downloaded (with wget) from staging and brings the file's original timestamp with it. The clocks aren't in sync. You'll see that the messages are gone after some time has elapsed and it gets to -fr. Odd that the times worked out this way with both branch and trunk, but it seems plausible given the end times of triton's recent builds. So now there are some bogus mounts again because of the rsyncs that failed. set -e is too simple-minded for our purposes here.
Comment 30 • 19 years ago
Trying to unmount the mounted disk array is interesting:

    $ umount /dev/disk1s2
    /builds/tinderbox/Fx-Mozilla1.8-l10n/Darwin_7.9.0_Clobber/mozilla/dist/l10n-stage/mount-t: No such file or directory

ls on .../mount-t shows it doesn't exist (of course); there's .../mount-temp/. Gah, buffers.
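The truncated path above is exactly 89 characters long, while the real in-tree mount point (ending in mount-temp) is 92. A guard like the following would catch an over-long mount path before attaching. MAX_MOUNT_PATH and check_mount_path are hypothetical names, and 89 is only the limit inferred from this truncation (comment 32 calls it an 89-character fixed buffer).

```shell
#!/bin/sh
# Limit inferred from the truncation seen in comment 30; hypothetical name.
MAX_MOUNT_PATH=89

# Return nonzero (with a message) if the proposed mount point is too long.
check_mount_path() {
  len=${#1}
  if [ "$len" -gt "$MAX_MOUNT_PATH" ]; then
    echo "mount point is $len chars (> $MAX_MOUNT_PATH): $1" >&2
    return 1
  fi
}

# The in-tree path from this bug is 92 characters, so it is rejected.
check_mount_path /builds/tinderbox/Fx-Mozilla1.8-l10n/Darwin_7.9.0_Clobber/mozilla/dist/l10n-stage/mount-temp ||
  echo "too long; mount under a short root such as /tmp instead"
```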
Comment 31 • 19 years ago
(In reply to comment #25)
> Now getting this on the trunk, although nothing significant has changed since
> the recent successful cycles:
>
> hdiutil: attach failed - codec overrun
>
> http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fr/1125001140.8160.gz&fulltext=1
>
> So is it not picking up good dmgs from atlantia?

Probably maya was downloading the .dmg file from stage.m.o at the same time triton was uploading a new .dmg file. This seems possible given the available data: triton started building at 13:38 and finished at 13:57; maya's wget started at 13:53:31 and finished at 13:53:34. Since all of the build systems use scp to upload, they stream bytes directly into these files. This is the first time I recall something like this happening in our build farm.

Ideally we should use rsync to upload builds. (rsync writes to '.filename' and moves it to 'filename' after the transfer is complete.) Even more ideally, we should download the nightly builds, not the hourly builds, generally for different reasons but also because they are uploaded to a dated directory on stage.m.o and then copied locally (lessening the window in which a build file could be grabbed).

> This makes me want to cry.

I second your emotion.
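The write-then-rename pattern attributed to rsync above can be sketched with plain cp and mv. publish_atomically is a hypothetical helper, not part of the build scripts. It relies on the temporary file sitting in the same directory as the destination, so mv is a rename within one filesystem and a concurrent reader such as maya's wget sees either the old complete file or the new complete one, never a half-written file.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to a hidden temp name next to DEST, then
# rename it over DEST in one step.
publish_atomically() {
  src=$1 dest=$2
  dir=$(dirname "$dest")
  tmp="$dir/.$(basename "$dest").$$"
  cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
  mv -f "$tmp" "$dest"   # same directory, so this is a rename(2)
}
```

Usage would look like `publish_atomically dist/firefox.dmg /path/on/stage/firefox.dmg`, where both paths are placeholders.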
Comment 32 • 19 years ago (Assignee)
With Chase's help on maya, we learned some interesting things that should help fix this for good. Here's what I'm up to this time:

Mounts are in /tmp now. It's done with -mountroot, so the system will pick a good name that's not in use. This helps in case something gets wedged. Keeping the mounts out of the tree means rm -rf l10n-stage won't croak if the unmount fails. Most importantly, it avoids the asinine 89-character fixed buffer limitation.

This is much better about unmounting, too. It unmounts whenever an interim operation (like an rsync) fails, and then it fails the build. And if an unmount attempt fails, it backs off for a few seconds and then attempts to forcibly unmount.

The other changes are in support of bug 305686. Benjamin, I've made the assumption that dist/branding exists in the l10n tree at unpackaging time and stays around through repackaging. If the assumption is incorrect, we'll need to mkdir -p it in (browser|mail)/locales/Makefile.in; I'll cover that in bug 305686 if necessary. Please advise.

I'm leaving set -x in intentionally this time, because UNMOUNT_PACKAGE is hairy and it'll be nice to be clued in if something else goes wrong, now or ever.

I thought that this would come out much more disgustingly than it did. I've kind of got a soft spot for it now, in that ugly-baby sort of way. I've got to reboot to clean out some hung mount points.
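The control flow described above (private mount root under /tmp, unmount on any failed interim step, back off and force if the unmount fails) can be sketched as follows. attach_image, detach_image and copy_app are hypothetical hooks standing in for the hdiutil and rsync calls, and BACKOFF is a hypothetical knob; only the flow is meant to match the description, not the actual patch.

```shell
#!/bin/sh
# Detach; if that fails, back off and force it. Then remove the private
# mount root and return the caller's status (1 if even the forced detach
# failed).
cleanup() {
  rc=$1
  if ! detach_image "$dev"; then
    sleep "${BACKOFF:-5}"
    detach_image "$dev" -force || rc=1
  fi
  rmdir "$mountroot" 2>/dev/null
  return "$rc"
}

unpack() {
  image=$1 dest=$2
  # Out-of-tree, short mount root; rm -rf of the tree won't trip over it.
  mountroot=$(mktemp -d /tmp/l10n-mount.XXXXXX) || return 1

  # attach_image is assumed to print "<device> <mountpoint>" on success.
  info=$(attach_image "$image" "$mountroot") || { rmdir "$mountroot"; return 1; }
  dev=${info%% *}
  mountpoint=${info#* }

  # A failed interim step unmounts first, then fails the build.
  copy_app "$mountpoint" "$dest" || { cleanup 1; return 1; }
  cleanup 0
}
```

Splitting on the first space keeps mount points such as ".../Firefox 1" intact, assuming device names never contain spaces.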
Attachment #193892 - Flags: review?(benjamin)
Comment 33 • 19 years ago (Assignee)
Comment on attachment 193892 (v3, revamp to stop the tears):
Evil version of the patch, hold on.
Attachment #193892 - Attachment is obsolete: true
Attachment #193892 - Flags: review?(benjamin)
Comment 34 • 19 years ago (Assignee)
Always quote optional arguments in the section that rips the resources out of the disk image to avoid shell vomit.
Attachment #193894 - Flags: review?(benjamin)
Comment 35 • 19 years ago (Assignee)
Comment on attachment 193894 (v4, stop the tears from the other eye):

    + return $$1 || $$?; \

should be

    + return $$1 && $$?; \
Comment 36 • 19 years ago (Assignee)
Comment on attachment 193894 (v4, stop the tears from the other eye):
Also need to quote all uses of ${MOUNTPOINT}:

    + rsync -a "$${MOUNTPOINT}/$(_APPNAME)" $(MOZ_PKG_APPNAME) || cleanup 1; \

The only weird one is this:

    + (rsync -a "$${MOUNTPOINT}/.background/`basename "$(MOZ_PKG_MAC_BACKGROUND)"`" "$(MOZ_PKG_MAC_BACKGROUND)" || cleanup 1); \

because the quote nesting looks wrong, but I've tested it and it's right.
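A quick way to see why every ${MOUNTPOINT} use needs quotes, outside the makefile (where $$ is just make's escape for $). count_args is a hypothetical helper that reports how many arguments a command would receive, and the paths are made up. The nested case uses the $(...) form of command substitution; the quotes inside it are parsed independently of the outer pair.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it was given.
count_args() { echo "$#"; }

MOUNTPOINT="/tmp/dmg.abc123/Firefox 1"
BACKGROUND="dist/branding/my background.png"

count_args ${MOUNTPOINT}/Firefox.app        # 2: split at the space
count_args "${MOUNTPOINT}/Firefox.app"      # 1
# Inner quotes around $BACKGROUND live inside the substitution, so the whole
# path is still a single argument:
count_args "${MOUNTPOINT}/.background/$(basename "$BACKGROUND")"   # 1
```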
Updated • 19 years ago
Attachment #193894 - Flags: review?(benjamin) → review+
Comment 37 • 19 years ago (Assignee)
The latest batch of fixes are on the trunk.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Updated • 19 years ago (Assignee)
Attachment #193894 - Flags: approval1.8b4?
Updated • 19 years ago
Attachment #193894 - Flags: approval1.8b4? → approval1.8b4+
Comment 38 • 19 years ago (Assignee)
"codec overrun" again on trunk Thunderbird - timestamp on staging is 1306, build attempt completed 1309, so it fits again. Not funny.
Comment 39 • 19 years ago (Assignee)
Finally, the first successful branch Mac l10n, complete with EULA repackaging: http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n-fr/1125087480.12036.gz&fulltext=1
Comment 40 • 19 years ago
(In reply to comment #38)
> "codec overrun" again on trunk Thunderbird - timestamp on staging is 1306, build
> attempt completed 1309, so it fits again. Not funny.

Teh suck. I've done emergency surgery on triton's build scripts to use rsync. I wish we hadn't needed to do this under pressure, but it looked like something that had to happen ASAP. Respins will soon be happening on all of triton's trees. Keep your fingers crossed. If it works, the changes can be replicated to all other build systems.
Comment 41 • 19 years ago
(In reply to comment #40)
> Respins soon to be happening on all of triton's trees. Keep your fingers
> crossed. If it works the changes can be replicated to all other build systems.

Looks like it worked, based on the output of the latest trunk build process. I'll commit my changes.
Updated • 6 years ago
Product: Core → Firefox Build System