Firefox 3.1 Beta 3 Korean mac build didn't come up automated

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: Pike, Assigned: bhearsum)

Tracking

({fixed1.9.1})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

9 years ago
As said in bug 475032, the initial automation run for Firefox 3.1 Beta 3 lacked a Mac build for Korean.

We should figure out exactly what happened to not run into that again.

I doubt that all the parallel builds stuff helps if we end up padding around single machine issues manually after the automation.
(Assignee)

Comment 1

9 years ago
I'm looking into this a bit.

The symptom here is that this slave failed to mount the en-US dmg, which prevented it from even attempting to repack the locale. It was happily doing l10n builds previous to this, and even managed to do two locales afterwards.
Assignee: nobody → bhearsum
(Assignee)

Comment 2

9 years ago
From the log:
if test -d ../../dist/l10n-stage; then \
	  rm -f -r -v ../../dist/l10n-stage; \
	fi
/builds/slave/macosx_repack/build/mozilla-1.9.1/config/nsinstall -D ../../dist/l10n-stage
cd ../../dist/l10n-stage && \
	  set -ex; function cleanup() { hdiutil detach ${DEV_NAME} || { sleep 5 && hdiutil detach ${DEV_NAME} -force; }; return $1 && $?; }; unset NEXT_ROOT; export PAGER=true; expect /builds/slave/macosx_repack/build/mozilla-1.9.1/build/package/mac_osx/installdmg.ex /builds/slave/macosx_repack/build/mozilla-1.9.1/firefox.dmg > hdi.output; DEV_NAME=`perl -n -e 'if($_=~/(\/dev\/disk[^ ]*)/) {print $1."\n";exit;}'< hdi.output`; MOUNTPOINT=`perl -n -e 'split(/\/dev\/disk[^ ]*/,$_,2);if($_[1]=~/(\/.[^\r]*)/) {print $1;exit;}'< hdi.output` || cleanup 1; rsync -a "${MOUNTPOINT}/Firefox.app" firefox || cleanup 1; test -n "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/dsstore" && { rsync -a "${MOUNTPOINT}/.DS_Store" "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/dsstore" || cleanup 1; }; test -n "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/background.png" && { rsync -a "${MOUNTPOINT}/.background/`basename "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/background.png"`" "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/background.png" || cleanup 1; }; test -n "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/disk.icns" && { rsync -a "${MOUNTPOINT}/.VolumeIcon.icns" "/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/branding/disk.icns" || cleanup 1; }; cleanup 0; if test -n "" ; then cp /builds/slave/macosx_repack/build/mozilla-1.9.1/firefox.dmg Firefox.tmp.dmg && hdiutil unflatten Firefox.tmp.dmg && { /Developer/Tools/DeRez -skip plst -skip blkx Firefox.tmp.dmg > "" || { rm -f Firefox.tmp.dmg && false; }; } && rm -f Firefox.tmp.dmg; fi; 
+ unset NEXT_ROOT
+ export PAGER=true
+ PAGER=true
+ expect /builds/slave/macosx_repack/build/mozilla-1.9.1/build/package/mac_osx/installdmg.ex /builds/slave/macosx_repack/build/mozilla-1.9.1/firefox.dmg
++ perl -n -e 'if($_=~/(\/dev\/disk[^ ]*)/) {print $1."\n";exit;}'
+ DEV_NAME=
++ perl -n -e 'split(/\/dev\/disk[^ ]*/,$_,2);if($_[1]=~/(\/.[^\r]*)/) {print $1;exit;}'
+ MOUNTPOINT=
+ rsync -a /Firefox.app firefox
rsync: link_stat "/Firefox.app" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at /SourceCache/rsync/rsync-30/rsync/main.c(717)
+ cleanup 1
+ hdiutil detach
hdiutil: detach: no device name specified
Usage:	hdiutil detach [options] <devname>
	hdiutil detach -help
+ sleep 5
+ hdiutil detach -force
hdiutil: detach: no device name specified
Usage:	hdiutil detach [options] <devname>
	hdiutil detach -help
make: *** [/builds/slave/macosx_repack/build/mozilla-1.9.1/dist/l10n-stage/Firefox/Firefox.app/Contents/MacOS] Error 1
(Reporter)

Comment 3

9 years ago
This is MOUNTPOINT not being set, which hints at a problem with hdi.output.

I guess that shouldn't happen, but just to make sure, were there other builds running at that time on this slave?
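
To make the failure mode concrete, here is a minimal Python sketch of what the two perl one-liners in the log appear to do with hdi.output. The function name `parse_hdi_output` is hypothetical, and the regexes are an approximation of the perl ones, not code from the tree. With empty input both values come back empty, which is what turns the rsync source into "/Firefox.app" in the log above.

```python
import re

# Approximations of the two perl patterns from the makefile rule.
DEVICE_RE = re.compile(r"/dev/disk[^ ]*")
MOUNT_RE = re.compile(r"(/.[^\r\n]*)")


def parse_hdi_output(text):
    """Return (dev_name, mountpoint); each is '' when nothing matched."""
    dev_name = ""
    mountpoint = ""
    for line in text.splitlines():
        # First /dev/disk* token seen anywhere wins, like the first perl call.
        match = DEVICE_RE.search(line)
        if match and not dev_name:
            dev_name = match.group(0)
        # The mountpoint is the first path found after a device name.
        if not mountpoint:
            parts = DEVICE_RE.split(line, maxsplit=1)
            if len(parts) == 2:
                mount_match = MOUNT_RE.search(parts[1])
                if mount_match:
                    mountpoint = mount_match.group(1)
    return dev_name, mountpoint


# An empty hdi.output leaves both values empty, so the copy source
# "${MOUNTPOINT}/Firefox.app" collapses to "/Firefox.app".
dev_name, mountpoint = parse_hdi_output("")
rsync_source = mountpoint + "/Firefox.app"
```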
(Assignee)

Comment 4

9 years ago
No, we use max_builds=1 on the slaves, so there's only ever 1 build of any type being run at a time.
(Assignee)

Comment 5

9 years ago
I had at one point suspected that a Firefox dmg was already mounted on this machine. I tried to reproduce this behaviour by manually mounting a dmg before attempting to build the locale, but it worked fine.
(Assignee)

Comment 6

9 years ago
I mounted the Firefox.dmg roughly 500 times on moz2-darwin9-slave07 and was unable to reproduce a situation where hdi.output was empty.
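
A repeated-mount experiment like this could be scripted roughly as below. This is only a sketch: `mount_once` is a hypothetical callable standing in for whatever runs the mount step and returns the captured text, and the default of 500 attempts just mirrors the number mentioned above.

```python
def count_empty_outputs(mount_once, attempts=500):
    """Call mount_once() repeatedly; return the indices of runs with empty output."""
    empty_runs = []
    for attempt in range(attempts):
        output = mount_once()
        # Whitespace-only output is treated the same as an empty file.
        if not output.strip():
            empty_runs.append(attempt)
    return empty_runs
```

An empty result list after all attempts would match what was observed here: no run produced an empty hdi.output.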

Comment 7

(In reply to comment #6)
> I mounted the Firefox.dmg roughly 500 times on moz2-darwin9-slave07 and was
> unable to reproduce a situation where hdi.output was empty.

I believe it was with Nick that, during my internship, I hit this problem on my own machine and a few times on some of the Mac machines. It got fixed by restarting the machine.

CCed Nick to see if he remembers what the reason was.

Comment 8

We'd occasionally find that hdiutil would be hanging onto a mounted dmg, and that screwed up subsequent runs; restarting the machine was the simplest fix. I don't know whether that's because tinderbox-based l10n was more fragile, or whether we're hitting this with buildbot at all. It's mostly in the build system anyway (which sees its own changes), and I'd assumed it was occasional crapness on the part of the OS. We don't have great visibility into failures unless you dive through the Mozilla-l10n tree, manually excluding locales that were busted at that point. I can't think of anything that Ben hasn't checked already, and really the environment for that particular failure is long gone (subsequent builds and so on).

To address Axel's concerns about manual steps in the top comment, it would be really great if we could use buildbot's Rebuild button for a failed build. I have no idea how hard that is, but when I tried to redo the win32/da build, it failed to find an en_revision property to substitute here
 http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l817
Log is at
 http://production-master.build.mozilla.org:8010/builders/win32_repack/builds/128/steps/shell_7/logs/err.text

The source steps could also do with a bit of rationalising, as we update to FIREFOX_3_1b3_RELEASE twice, once with a pull in BaseRepackFactory.__init__() and then later in ReleaseRepackFactory.updateSources() (see buildbotcustom/process/factory.py). And we build in nsprpub and config twice. Full set of logs for a locale build is at 
 http://production-master.build.mozilla.org:8010/builders/win32_repack/builds/127
That's probably drifting far enough from the scope of this bug for now :-).
(Reporter)

Comment 9

9 years ago
Yeah, Rebuild ends up in resubmitBuild(), http://hg.mozilla.org/build/buildbot/file/a2845f74d55d/buildbot/process/builder.py#l720, which doesn't pay attention to build properties. l10n builds need that, though.

Not sure what the right upstream fix would be. In particular, I'd hit an edge case (or a bug in my design) as I'm setting up the build with the revisions set to 'default' and overwrite those props with the got revision. Or I'd hit the build with the tag, and get the tagged source stamp in return. It might be right to actually pick up the revision from the previous build, so it might actually be a feature, and one should copy over all props. WAY out of the scope of this bug.

We should consolidate our build factories in l10n land anyway. On the l10n server, I'm unpacking the stage with a separate build step anyway (as I want to make ident to get the binary revisions), which would show a distinct error in this scenario. Which would make it easier to data mine the regular builds, and see if we have a systematic problem.
(Assignee)

Comment 10

9 years ago
I don't know what else to do here. This is nearly impossible to diagnose without hdi.output. Maybe we should change that makefile to use hdi.$locale.output or something, so we don't lose it next time?
(Reporter)

Comment 11

9 years ago
My prime suspect would be something in hdiutil going funky, and thus the expect script not going fine.

I think we should look at the expect script to see if we can make it throw out debugging information on failure. Or convert that to python :-).
(Assignee)

Comment 12

9 years ago
How about we switch to '| tee hdi.output' so we get it in the log, too?
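
For illustration, the effect of '| tee hdi.output' can be sketched in Python as follows. The helper name `run_and_tee` and its parameters are hypothetical and this is not the patch that landed; it only shows the idea of sending the same output to both the build log and a file. The file name could just as well be per-locale, as suggested in comment 10.

```python
import subprocess
import sys


def run_and_tee(cmd, output_path, log=sys.stdout):
    """Run cmd, copying its output to both output_path and log; return the exit code."""
    with open(output_path, "w") as out:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error text next to normal output
            text=True,
        )
        for line in proc.stdout:
            out.write(line)  # what later steps would parse
            log.write(line)  # what ends up in the build log
        return proc.wait()
```

One thing to keep in mind with a plain shell pipeline is that, without pipefail, its exit status is that of the last command (tee), so a failure of the command feeding it would not stop the rule by itself. The sketch returns the command's own exit code instead.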
(Reporter)

Comment 13

9 years ago
That'd probably be the least-cost solution for now, yeah.

CCing the comm-central crowd, too, as the toolkit line would impact all products.

Not sure if the information we expose about the machine is worrisome.

/dev/disk2          	Apple_partition_scheme         	
/dev/disk2s1        	Apple_partition_map            	
/dev/disk2s2        	Apple_HFS                      	/tmp/Minefield

is sample output.
(Assignee)

Comment 14

9 years ago
Created attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

Personally, I'm not concerned about the device names and paths being exposed. This should help us diagnose this better next time it happens, though.
Attachment #366620 - Flags: review?(l10n)
(Reporter)

Comment 15

9 years ago
Comment on attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

Looking good. Tested locally; that shouldn't confuse anything and should get us better diagnostics if we run into this again.

Should we move this bug over to toolkit build config?
Attachment #366620 - Flags: review?(l10n) → review+
(Assignee)

Comment 16

9 years ago
Comment on attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

This patch will help us diagnose problems with our l10n automation. It does not change existing behaviour in any way.
Attachment #366620 - Flags: approval1.9.1?
(Assignee)

Comment 17

9 years ago
Comment on attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

Landed on mozilla-central: changeset:   26048:6e6f2476aa4e

Comment 18

Comment on attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

a191=beltzner
Attachment #366620 - Flags: approval1.9.1? → approval1.9.1+
(Assignee)

Comment 19

9 years ago
Comment on attachment 366620 [details] [diff] [review]
proposed patch to dump hdiutil output to log & file

mozilla-1.9.1: changeset:   23769:cd18d58ac196
Attachment #366620 - Flags: checked‑in+ checked‑in+
(Assignee)

Comment 20

9 years ago
Do you want to keep this bug open, Axel?

Also, feel free to move to Build Config if you want...doesn't matter to me where it lives.
(Reporter)

Comment 21

9 years ago
That's fine. Let's resolve this bug and keep an eye open.

I'm adding a separate step for unpacking in bug 398954, which should make it easier to actually data mine for failures here in the future.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED

Updated

9 years ago
Keywords: fixed1.9.1
Product: mozilla.org → Release Engineering