Bug 548146 - deploy maemo 5 scratchbox to linux32 build slaves
Opened 14 years ago | Closed 14 years ago
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED FIXED
People
(Reporter: jhford, Assigned: jhford)
References
Details
(Whiteboard: [maemo5][missing documentation])
Attachments
(5 files, 12 obsolete files)
1.17 KB, text/plain | Details
656 bytes, text/plain | Details
3.22 KB, text/plain | Details
2.09 KB, text/plain | bhearsum: review+, bhearsum: checked-in+ | Details
6.70 KB, patch | bhearsum: review+, jhford: checked-in+ | Details | Diff | Splinter Review
We are going to need to put scratchbox onto the 32-bit linux slaves. I am almost certain that the new scratchbox is a self-contained package, and we can create a tarball with the needed files. Unlike the older scratchbox, the new scratchbox doesn't have /etc/init.d/sbox. Instead, it has /<path>/scratchbox/sbin/sbox_ctl, which is used to start the needed daemons. We shouldn't use the provided installation scripts, as they fetch data and contain updates. There might be some negative interaction between the current chinook SDK and scratchbox, as both of them do the same cpu emulation for arm binaries and both have a lot of similar bind mounts. We currently have chkconfig sbox on, but we are likely going to have to have both scratchbox installs disabled at boot, enabling them as part of the build process.

I am investigating whether it is possible to have both a diablo (N810) and a fremantle (N900) SDK in the same scratchbox 5 install, as opposed to having a separate SB4 + Chinook (pre-N810) and SB5 + Fremantle (N900). This has the advantage of requiring only one scratchbox service to be running and a lot less disk space usage.
Updated • 14 years ago
Summary: deploy scratchbox to linux32 build slaves → deploy maemo 5 scratchbox to linux32 build slaves
Comment 1 • 14 years ago (Assignee)
To be clear, the current scratchbox is not capable of doing N900 (maemo5) builds. We need to either have an updated Diablo+Fremantle scratchbox or keep the chinook one and add a fremantle one.
Summary: deploy maemo 5 scratchbox to linux32 build slaves → deploy scratchbox to linux32 build slaves
Updated • 14 years ago (Assignee)
Summary: deploy scratchbox to linux32 build slaves → deploy maemo 5 scratchbox to linux32 build slaves
Updated • 14 years ago (Assignee)
Assignee: nobody → jhford
Updated • 14 years ago
OS: Mac OS X → Linux
Hardware: x86 → ARM
Whiteboard: [maemo5]
Updated • 14 years ago (Assignee)
Hardware: ARM → x86
Comment 2 • 14 years ago (Assignee)
takes output of two different runs of:

find / -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list
find /builds -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list
Comment 3 • 14 years ago (Assignee)
it looks like we are safe to deploy this as a tarball of /builds/scratchbox

python fs-compare.py before-upg-sorted after-upg-sorted
new file - /before-sb-upgrade
updated file - /file-list
updated file - /home/cltbld/.bash_history
new file - /home/cltbld/maemo-scratchbox-install_5.0.sh
new file - /home/cltbld/maemo-sdk-install_5.0.sh
updated file - /root/.bash_history
new file - /root/maemo-scratchbox-install_5.0.sh
new file - /tmp/scratchbox-core-1.0.16-i386.tar.gz
new file - /tmp/scratchbox-devkit-apt-https-1.0.10-i386.tar.gz
new file - /tmp/scratchbox-devkit-debian-1.0.10-i386.tar.gz
new file - /tmp/scratchbox-devkit-doctools-1.0.13-i386.tar.gz
new file - /tmp/scratchbox-devkit-git-1.0.1-i386.tar.gz
new file - /tmp/scratchbox-devkit-perl-1.0.4-i386.tar.gz
new file - /tmp/scratchbox-devkit-qemu-0.10.0-0sb10-i386.tar.gz
new file - /tmp/scratchbox-devkit-svn-1.0-i386.tar.gz
new file - /tmp/scratchbox-libs-1.0.16-i386.tar.gz
new file - /tmp/scratchbox-toolchain-cs2007q3-glibc2.5-arm7-1.0.14-2-i386.tar.gz
new file - /tmp/scratchbox-toolchain-cs2007q3-glibc2.5-i486-1.0.14-1-i386.tar.gz
new file - /tmp/scratchbox-toolchain-host-gcc-1.0.16-i386.tar.gz
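The fs-compare.py script itself is not attached in this bug, but the comparison it performs can be sketched in shell. This is a hedged equivalent, not the actual script: the manifest format (`MD5(/path)= hash`, as produced by the `openssl md5` runs in comment 2) is taken from the output above, while the function name and the space-separated field handling are assumptions.

```shell
#!/bin/bash
# Sketch of the new/updated file comparison (not the actual fs-compare.py).
# Input manifests contain lines like: MD5(/path/to/file)= d41d8cd98f00b204...
fs_compare() {
    before="$1"; after="$2"
    # Normalize "MD5(/path)= hash" into "path hash" and sort by path.
    norm() { sed 's/^MD5(\(.*\))= */\1 /' "$1" | sort; }
    # Paths that appear only in the after-upgrade manifest are new files.
    comm -13 <(norm "$before" | cut -d' ' -f1) \
             <(norm "$after"  | cut -d' ' -f1) | sed 's/^/new file - /'
    # Paths present in both manifests whose hashes differ were updated.
    join <(norm "$before") <(norm "$after") |
        awk '$2 != $3 { print "updated file - " $1 }'
}
```

Usage would be `fs_compare before-upg-sorted after-upg-sorted`. Paths containing spaces would break the field splitting, which is tolerable for build-slave paths.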
Comment 4 • 14 years ago
Beyond the md5 comparison, have you tested that installing through a tarball doesn't break any part of the build process? This sounds like something that should be run through staging a little bit before rolling out.
Comment 5 • 14 years ago (Assignee)
I am not currently able to get valid builds on my first machine so I can't test that it will work on another machine.
Comment 6 • 14 years ago (Assignee)
I have a second VM that I can use to test the pave-over tarball deployment option.
Comment 7 • 14 years ago (Assignee)
currently testing a build on the deployed-to VM. I used the following process:

=Create Tarball= On source machine

su -
/scratchbox/sbin/sbox_ctl stop
/scratchbox/sbin/sbox_umount_all
cd /builds
tar cvfps scratchbox-YYYY-MM-DD-HHMM.tar
scp scratchbox-YYYY-MM-DD-HHMM.tar <wherever>

=Deploy= On target machine

su -
/scratchbox/sbin/sbox_ctl stop
/etc/init.d/sbox stop # above should do it, but be certain
/scratchbox/sbin/sbox_umount_all
mkdir -p /builds/slave /scratchbox/users/cltbld/builds/slave
mount -o bind /builds/slave /scratchbox/users/cltbld/builds/slave # set up bind mount in /etc/fstab, must be there every boot or things break
cd /builds
rm -rf scratchbox
tar xvfps scratchbox-YYYY-MM-DD-HHMM.tar
Comment 8 • 14 years ago (Assignee)
currently testing a build on the deployed-to VM. I used the following process:

=Create Tarball= On source machine

su -
/scratchbox/sbin/sbox_ctl stop
/scratchbox/sbin/sbox_umount_all
umount /builds/slave
cd /builds
tar cvfpsW scratchbox-YYYY-MM-DD-HHMM.tar
scp scratchbox-YYYY-MM-DD-HHMM.tar <wherever>
Status: NEW → ASSIGNED
Comment 9 • 14 years ago (Assignee)
Going to run this script once I have deployed to the two slaves that phong created for me.
Comment 10 • 14 years ago (Assignee)
testing deployment (again) with:

=Deploy= On target machine

su -
/scratchbox/sbin/sbox_ctl stop
/etc/init.d/sbox stop # above should do it, but be certain
/scratchbox/sbin/sbox_umount_all
cd /builds
rm -rf scratchbox
tar xvfps scratchbox-YYYY-MM-DD-HHMM.tar
mkdir -p /builds/slave /scratchbox/users/cltbld/builds/slave
mount -o bind /builds/slave /scratchbox/users/cltbld/builds/slave # set up bind mount in /etc/fstab, must be there every boot or things break
Comment 11 • 14 years ago (Assignee)
forgot to actually use scratchbox. this script is correct. these are the commands for one loop iteration:

/scratchbox/moz_scratchbox -p -d /builds/slave sb-conf select FREMANTLE_ARMEL
/scratchbox/moz_scratchbox -p -d /builds/slave make -f client.mk build
/scratchbox/moz_scratchbox -p -d /builds/slave/obj-2-fre make package
/scratchbox/moz_scratchbox -p -d /builds/slave/obj-2-fre make package-tests
Attachment #434129 -
Attachment is obsolete: true
Updated • 14 years ago (Assignee)
Attachment #434263 -
Attachment is patch: false
Comment 12 • 14 years ago
John, I was thinking about the size of the tarball you mentioned and deploying it through puppet. Have you considered:
1) remove scratchbox
2) recreate scratchbox from scratch to the point that you wanted
rather than:
1) remove scratchbox
2) download the tarball (a few gigabytes)
3) untar that tarball (which is in the state that you wanted)
What do you think? Do we have the steps needed to get us from scratch to the desired scratchbox state?
Comment 13 • 14 years ago (Assignee)
(In reply to comment #12)
> Have you considered:
> 1) remove scratchbox
> 2) recreate scratchbox from scratch to the point that you wanted
> rather than:
> 1) remove scratchbox
> 2) download tar ball (few gigabytes)
> 3) untar that tar ball (which is in the state that you wanted)

Yes. I thought about doing that but did not pursue it for three reasons. First, scratchbox and the SDK use apt to install most of the libraries and apps. This means that we could easily have mismatched versions of various packages. To guard against this we could recursively hash the filesystem on a known-good scratchbox and compare it to the one on the deployed slave, but the logs needed to do that are fairly large and the comparison takes a long time. Second, it takes a *very* long time to install scratchbox: we get a max of 20kbps to the scratchbox tarballs and would have to transfer 3GB at that speed. We could use a proxy, but the same number of bits would still go to the slave; it would lower the load on NFS, though. The last concern I have is that we would have to do an in-place upgrade of the scratchbox, and there is a high probability that not all scratchboxes are identical. If we do a pave-over install, we know that all of our scratchbox installs are identical.

That being said, we could probably put that huge tarball on an internal HTTP/FTP server and have one of the deployment steps be to wget it. Do you think that would work? We could do versioned tarballs so that we don't have issues with figuring out which package is where.

(written on iPhone, sorry for poor auto-correction)
Comment 14 • 14 years ago
How big is the resulting tarball? A few gigs is pretty large to pull down and then have sitting on every slave.
Comment 15 • 14 years ago (Assignee)
(In reply to comment #14)
> How big is the resulting tarball? A few gigs is pretty large to pull down and
> then have sitting on every slave.

I have not compressed it yet, but uncompressed it is 3.0GB. I just realised that I never put the deletion of the tarball in the posted steps above. The only reason I could think of to keep the tarball is if we wanted to do binary diffs for future deployments, but that is probably overengineering the problem.
Comment 16 • 14 years ago (Assignee)
looking at the updated SDK notes (http://www.forum.nokia.com/info/sw.nokia.com/id/c05693a1-265c-4c7f-a389-fc227db4c465/Maemo_5_SDK.html), it looks like they are deploying SDK updates as debian packages.

This means that if we wish to keep our scratchboxes in sync with each other, we should not do any package deployment using apt/aptitude on the scratchbox installations unless we are able to do the same apt operation on all slaves plus the ref image before the package gets updated on the repository, and we never add new linux32 slaves to the pool.

One thing we could do is use rsync to keep /scratchbox up to date on each boot from a master copy of the scratchbox, but that seems like a more fragile solution.
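For concreteness, the rsync-on-boot idea mentioned above would amount to something like this sketch. The master host name is purely illustrative: `-aH` preserves permissions, ownership, and hard links, and `--delete` makes the slave's copy converge to the master's.

```shell
#!/bin/bash
# Sketch of the rsync-on-boot idea (the master host path is an assumption).
# Mirrors a master copy of the scratchbox tree onto this slave; --delete
# removes anything locally that no longer exists on the master.
sync_scratchbox() {
    src="$1"   # e.g. sb-master.build.mozilla.org:/scratchbox/
    dst="$2"   # e.g. /scratchbox/
    rsync -aH --delete "$src" "$dst"
}
```

As the comment says, this is the fragile option: a half-finished rsync at boot leaves a broken scratchbox, which is part of why the tarball pave-over won out.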
Comment 17 • 14 years ago (Assignee)
Comment 19 • 14 years ago (Assignee)
Sorry for the bugspam, but this is the only way I can get the script onto the test slaves; scp isn't working for me right now.
Attachment #434478 -
Attachment is obsolete: true
Comment 20 • 14 years ago (Assignee)
Attachment #434482 -
Attachment is obsolete: true
Comment 21 • 14 years ago
(In reply to comment #13)
> we could use a proxy but this is still going to be the same amount of bits
> going to the slave. this would lower the load on nfs though.

We can pretty easily deal with the load-on-NFS problem by staging the rollout to 10 slaves or so at a time. It would take longer overall, but it would eliminate load concerns there.

> that being said, we could probably put that huge tarball on an internal
> http/FTP server and have one of the deployment steps be to wget it. do you
> think that it'd work? we could do versioned tarballs so that we don't have
> issues with figuring out which package is where.

Puppet has a built-in HTTP server that we should use for this. It's already got an Apache install in front of it, but I don't know if caching would help, since they're hosted on the same machine.
Comment 22 • 14 years ago
(In reply to comment #16)
> This means that if we wish to keep our scratchboxes in sync with each other, we
> should not do any package deployment using apt/aptitude on the scratchbox
> installations unless we are able to do the same apt operation on all slaves +
> the ref image before the package gets updated on the repository and we don't
> add new linux32 slaves to the pool ever.

Totally agree that keeping things in sync with apt is a bit tricky. We _could_ store all the .debs that get installed locally, and then use puppet to install those with dpkg.

I'm more concerned with the on-disk size of the whole environment. Is there any way to get the size down below 3 GB? After deleting the old scratchbox, what's the impact?
Comment 23 • 14 years ago
(In reply to comment #22)
> Totally agree that keeping things in sync with apt is a bit tricky. We _could_
> store all the .deb's that get installed locally, and then use puppet to install
> those with dpkg.

The only downside to this is that we can't use Puppet's "package" types to do it, which means we've got to deal with a lot of fiddly exec {}s -- working with a single tarball would be easier from that perspective.
Comment 24 • 14 years ago (Assignee)
(In reply to comment #22)
> Totally agree that keeping things in sync with apt is a bit tricky. We _could_
> store all the .deb's that get installed locally, and then use puppet to install
> those with dpkg.

Yes, we could do that. Approximately the same amount of data has to be pushed to the slave, though, and there is a risk (however small) that the scratchbox install scripts don't assemble a working scratchbox. We also have to build python and openssl from source in a weird way. I would personally feel more confident in the final install if we did it as a snapshot of a known-to-work scratchbox installation.

> I'm more concerned with the on-disk size of the whole environment. Is there
> any way to get the size down below 3 GB? After deleting the old scratchbox,
> what's the impact?

This is a pave-over of the old scratchbox. Scratchbox is basically a chroot manager with a bunch of neat tricks for fooling programs into thinking they are running on ARM, complete with ARM binary execution. The end result is an entire debian system for each target environment you have in your scratchbox. The old scratchbox had two targets for arm (one that we didn't use) and one for x86 (that we didn't use). The new one has two arm targets that we do use and no x86 ones (see https://bugzilla.mozilla.org/show_bug.cgi?id=528431#c42 for details on space usage). I deleted the unused targets, which means that this deployment will actually cut our net disk usage by 1.2GB.

My main concern is that I am going to kill puppet with this deployment, not that the deployment method is flawed. For future changes, if it is a matter of installing a couple of deb files, I would be in favour of downloading the dependent debs (so long as there aren't too many) and doing the exec{} thing in Ben's comment, but for adding a new target or upgrading the SDK version, this seems like the best way to have a working scratchbox that is identical on all slaves without over-engineering the solution.

I am waiting on 3 builds to finish up, and once they do I will have a tarball that is ready to deploy (at least to staging). Is anyone around who can help me with this? Oh, and the bind mount that needs to be set up -- does puppet understand how to manage fstab and bind mounting?
Comment 25 • 14 years ago (Assignee)
=Deploy= On target machine

su -
/scratchbox/sbin/sbox_ctl stop
/etc/init.d/sbox stop # above should do it, but be certain
/scratchbox/sbin/sbox_umount_all
cd /builds
rm -rf scratchbox
tar xvfps scratchbox-YYYY-MM-DD-HHMM.tar
rm scratchbox-YYYY-MM-DD-HHMM.tar
mkdir -p /builds/slave /scratchbox/users/cltbld/builds/slave
mount -o bind /builds/slave /scratchbox/users/cltbld/builds/slave # set up bind mount in /etc/fstab, must be there every boot or things break
Comment 26 • 14 years ago (Assignee)
well, it turns out that bzip2'ing the tarball yields a major improvement in archive size (1.2GB vs 3.0GB). The steps should be updated with:

-tar xvfps scratchbox-YYYY-MM-DD-HHMM.tar
-rm scratchbox-YYYY-MM-DD-HHMM.tar
+tar jxvfps scratchbox-YYYY-MM-DD-HHMM.tar.bz2
+rm scratchbox-YYYY-MM-DD-HHMM.tar.bz2
Comment 27 • 14 years ago (Assignee)
this is my guess at what needs to happen.
Attachment #434618 -
Flags: feedback?(bhearsum)
Attachment #434618 -
Flags: feedback?(armenzg)
Comment 28 • 14 years ago
Comment on attachment 434618 [details] [diff] [review]
taking a stab at it

Is there any reason you can't put all of this in a bash script and just execute that? It would make the Puppet side much simpler, like:
* copy tarball
* execute script
* delete tarball
...and you'd probably want to keep the symlink stuff in puppet, too.

One thing to watch out for is the multiple bind mounting of .ssh. It'd be easy to unmount in a loop in a bash script :-).

A few more comments are inline. I don't think puppet will accept variables outside of classes; move it inside.

>+ "umount /builds/slave || /bin/true":
>+     cwd => "/",
>+     alias => "umount-builds-slave-bind",
>+     subscribe => Exec['umount-all-scratchbox-binds'];

This mount doesn't exist in our current scratchbox install, so you don't need to do this.
Attachment #434618 -
Flags: feedback?(bhearsum) → feedback-
Comment 29 • 14 years ago
Comment on attachment 434618 [details] [diff] [review]
taking a stab at it

+1 to the script idea

>+$sbtarball = "scratchbox-2010-03-22-1916.tar.bz2"
>+
> class scratchbox {

I think it should go inside the class. I could be wrong.

>+ source => "${fileroot}/${sbtarball}";

source => "${fileroot}/centos/dist/${sbtarball}"; or something like that. We are not going to use httproot, but have a look at the structure in here for some ideas: http://production-puppet.build.mozilla.org/darwin9/

For the next attachment, could you paste the scratchbox.pp file as a whole rather than the diff? We can work on the diff when you ask for review instead of feedback.
Attachment #434618 -
Flags: feedback?(armenzg)
Comment 30 • 14 years ago (Assignee)
(In reply to comment #28)
> Is there any reason you can't put all of this in a bash script and just execute
> that? It would make the Puppet side much simpler

That is what I would prefer to do. I suggested it and was told that it would be better to do all of it in puppet.

> One thing to watch out for is the multiple bind mounting of .ssh. It'd be easy
> to unmount in a loop in a bash script :-).

Why don't I do something in my script that greps for the bind mount in fstab and adds it only if it isn't already there?

When it comes time to test, I have absolutely zero clue how to test this. Is anyone able to help me test it? I am also not averse to rolling this out manually, as time is of the essence in this deployment. If we do a manual rollout I would have to take slaves down as they go idle, run the script, then put them back in the pool.

(In reply to comment #29)
> We are not going to use httproot but have a look at the structure in here for
> some ideas: http://production-puppet.build.mozilla.org/darwin9/

I will look there.

> For the next attachment could you paste the scratchbox.pp file as a whole
> rather than the diff?

Yeah, will do.
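The "grep fstab and only add if missing" idea can be sketched like this. The fstab path is a parameter so the logic can be exercised against a copy (the real target is /etc/fstab), and the exact mount options are an assumption based on the bind mount used in the deploy steps.

```shell
#!/bin/bash
# Sketch: append the bind-mount line to fstab only if it is not already
# there, so running the deployment script twice never duplicates the entry.
ensure_bind_mount_entry() {
    fstab="$1"   # the real target would be /etc/fstab
    entry="/builds/slave /scratchbox/users/cltbld/builds/slave none bind 0 0"
    grep -qF "$entry" "$fstab" || echo "$entry" >> "$fstab"
}
```

Because the append is guarded by the grep, the function is idempotent and safe to call on every puppet run.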
Comment 31 • 14 years ago (Assignee)
it is looking like the automation work that depends on this is moving a lot quicker than anticipated.
Severity: normal → major
Priority: -- → P2
Comment 32 • 14 years ago (Assignee)
this is the script to deploy scratchbox that would go in /N/centos5/scratchbox/
Attachment #435099 -
Flags: feedback?(bhearsum)
Attachment #435099 -
Flags: feedback?(armenzg)
Comment 33 • 14 years ago (Assignee)
this is the whole file as requested
Attachment #434618 -
Attachment is obsolete: true
Attachment #435100 -
Flags: feedback?(bhearsum)
Attachment #435100 -
Flags: feedback?(armenzg)
Comment 34 • 14 years ago (Assignee)
Comment on attachment 435100 [details]
scratchbox.pp

> ${VERSION}=2010-03-25-2138

I am not sure if I need to surround this in quotes. I do not want the quotes in the file string.

> exec {
>     "VERSION=${VERSION} /N/centos5/scratchbox/deploy-sb.sh":
>         creates => "/scratchbox/deployed-${VERSION}",
>         cwd => "/builds/",
>         alias => "install-scratchbox",
>         subscribe => File['/builds/scratchbox-${VERSION}.tar.bz2'];
> }

1. I don't know if I can do the env var thing. If this is passed to an exec*()-like function instead of a system()-like one, I could use /usr/bin/env to insert the variable I want.
2. Does the deploy-sb.sh need to be in the files section? It is only being executed, not installed, and will be in the NFS mount.

> "/builds/scratchbox-${VERSION}.tar.bz2":
>     source => "${fileroot}/centos5/scratchbox/scratchbox-${VERSION}.tar.bz2";

Not really sure if this is correct or not.
Comment 35 • 14 years ago (Assignee)
Comment on attachment 435099 [details]
deploy-sb.sh

> su - cltbld -c 'buildbot stop /builds/slave'
> ./scratchbox/sbin/sbox_ctl stop || error "stopping sbox" # turn off
> ./scratchbox/sbin/sbox_umount_all || error "umounting sbox" # turn extra off
....
> touch ./scratchbox/deployed-${VERSION}
> ./scratchbox/sbin/sbox_ctl start
> su - cltbld -c 'buildbot start /builds/slave'

Not sure if I should be doing the buildbot slave control, but I'd err on the side of caution. If the slave is guaranteed to not be running, we could lose the buildbot commands.
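The interplay between puppet's `creates =>` guard (comment 34) and the script's final `touch` of the marker is the idempotence mechanism here. A stripped-down sketch, with the actual deploy steps elided and the function name assumed:

```shell
#!/bin/bash
# Sketch of the deploy-sb.sh idempotence guard: puppet's "creates" check and
# this marker test key off the same file, so a completed deployment for a
# given VERSION is never repeated, while a failed one is retried.
deploy_scratchbox() {
    root="${1:-/builds}"
    marker="${root}/scratchbox/deployed-${VERSION:?VERSION must be set}"
    if [ -e "$marker" ]; then
        echo "scratchbox ${VERSION} already deployed"
        return 0
    fi
    # ... stop sbox, unpack tarball, restore bind mounts (elided) ...
    touch "$marker"   # written last, so an interrupted run leaves no marker
}
```

Touching the marker only after every other step succeeds is what makes "creates" a safe guard: an interrupted deployment reruns from the top.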
Comment 36 • 14 years ago
Comment on attachment 435099 [details]
deploy-sb.sh

>#!/bin/bash
># This is a script to deploy a scratchbox. At the end of deployment, this
># script will touch ${WORKDIR}/scratchbox/scratchbox-deployed and
># ${WORKDIR}/scratchbox/deployed-${VERSION}
>#
># This script must be run as root and will stop a buildbot slave at /builds/slave
># if it is there using the cltbld user account
>if [[ x"$VERSION" == "x" ]] ; then
>    VERSION=2
>fi
>WORKDIR='/builds/'
>
>error () {
>    echo "ERROR: $1" 1>&2
>    mv /builds/slave/buildbot.tac /builds/slave/buildbot.tac.off
>    cat >> /builds/slave/buildbot.tac.off <<EOF
>##########################################################
>#  SLAVE IS OFF BECAUSE OF SCRATCHBOX DEPLOYMENT FAILURE #
>#  DO NOT ENABLE SLAVE UNTIL SCRATCHBOX IS FIXED         #
>##########################################################
>EOF
>    exit 1
>}

Please don't add anything to buildbot.tac.

>mount | egrep "^/builds/slave" &> /dev/null
>if [[ $? != 0 ]] ; then
>    #We need to umount /builds/slave or die
>    umount /builds/slave || error "could not umount /builds/slave"
>fi

I am confused. We have to check for /builds/slave? I didn't know we mount that.

I like this script. I bet bhearsum will have more comments than me. John, how long does this process take? I am trying to picture what problems having our slaves go down for deployment will cause. Might be worth closing the trees for a couple of hours just in case.
Comment 37 • 14 years ago
Comment on attachment 435100 [details]
scratchbox.pp

># scratchbox.pp
># installs the scratchbox files
># In this manifest we make use of some .expect scripts to automate
># the steps necessary. Sometimes these need to be done as the cltbld
># user so we do that by calling the .expect via su
>
>class scratchbox {
>    ${VERSION}=2010-03-25-2138
>    exec {
>        "VERSION=${VERSION} /N/centos5/scratchbox/deploy-sb.sh":

Use ${fileroot} instead of /N. Use $os instead of "centos5".

>            creates => "/scratchbox/deployed-${VERSION}",
>            cwd => "/builds/",
>            alias => "install-scratchbox",
>            subscribe => File['/builds/scratchbox-${VERSION}.tar.bz2'];
>    }
>
>    file {
>        "/scratchbox":
>            ensure => '/builds/scratchbox',
>            mode => 755,
>            owner => root,
>            group => root;
>
>        "/builds/scratchbox-${VERSION}.tar.bz2":
>            source => "${fileroot}/centos5/scratchbox/scratchbox-${VERSION}.tar.bz2";

Use $os.

>        "/builds/scratchbox/moz_scratchbox":
>            source => "/N/centos5/scratchbox/moz_scratchbox",
>            mode => 755;
>
>        "/scratchbox/etc/resolv.conf":
>            source => "/N/centos5/scratchbox/etc/resolv.conf",
>            require => Exec["install-scratchbox"];

Use ${fileroot} instead of /N. Use $os instead of "centos5".
Attachment #435100 -
Flags: feedback?(armenzg) → feedback+
Comment 38 • 14 years ago
Comment on attachment 435099 [details]
deploy-sb.sh
You can't go stopping the buildbot slave here. We cannot predict when this is going to run, or whether the buildbot slave is going to be busy.
I agree with Armen about not modifying the buildbot.tac file. If we manage to screw it up on every machine that's a _ton_ of manual cleanup to do.
You need to unmount the .ssh bind mount in a loop here, I'm pretty sure.
Why do you need to test for old-scratchbox in a loop? rm -rf isn't asynchronous. If for some reason you do, set a maximum amount of time, and die after it to avoid the possibility of an infinite loop.
Rolling this out is definitely going to be tricky. Given the size and scope of this change it's best to do it in a downtime, I think. I'm also wondering if we should switch the Linux build machines to do what we do on Talos: only run Puppet once, at boot, and only launch Buildbot after it finishes. That would eliminate the possibility of a build running while we're trying to deploy.
Let's chat about this today.
Attachment #435099 -
Flags: feedback?(bhearsum) → feedback-
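Comment 38's two asks — unmount by mount point in a loop, with a bound so a stuck unmount cannot spin forever — could be combined into a sketch like this. In the real script, `mount` and `umount` are the system tools; the function name and retry limit are assumptions.

```shell
#!/bin/bash
# Sketch of a bounded "unmount by mount point" loop for paths that may be
# bind-mounted several times (like cltbld's .ssh). Gives up after $max
# attempts rather than looping forever if an unmount keeps failing.
umount_all_of() {
    mountpoint="$1"; max="${2:-10}"; tries=0
    # mount(8) reports each active mount as "<dev> on <mountpoint> type ..."
    while mount | grep -qF " on ${mountpoint} "; do
        tries=$((tries + 1))
        if [ "$tries" -gt "$max" ]; then
            echo "giving up on ${mountpoint} after ${max} tries" >&2
            return 1
        fi
        umount "${mountpoint}" || return 1
    done
}
```

Unmounting by mount point (rather than by device) is what makes repeated bind mounts of the same path peel off one layer per iteration.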
Comment 39 • 14 years ago
Comment on attachment 435100 [details]
scratchbox.pp
Why not just pass $VERSION as an arg?
This seems pretty ok overall, but you need to delete the tarball afterwards. You can do this with an additional file {} check using 'ensure => absent, force => true'.
I'd like to see this in action before I review it, too.
Comment 40 • 14 years ago (Assignee)
(In reply to comment #36)
> Please don't add anything to buildbot.tac.

Sure, but if this deployment fails, the slave needs to be taken out of the pool. Is moving buildbot.tac ok?

> I am confused. We have to check for /builds/slave? I didn't know we mount that.

We don't currently, but that is something we are going to be doing in the future. This check is to see if /builds/slave is mounted at all, and only if it is, unmount it.

> John, how long does this process take?

It takes 15+ minutes per slave. There would be issues if there is a maemo build in progress, and there would also be issues if the slave rebooted during deployment.

(In reply to comment #37)
> Use ${fileroot} instead of /N. Use $os instead of "centos5".

Will do.

(In reply to comment #38)
> You can't go stopping the buildbot slave here. We cannot predict when this is
> going to run, or if the buildbot slave is going to be busy.

Unless we can guarantee that there is no maemo build happening and the slave won't reboot midstream, this is not a safe deployment.

> I agree with Armen about not modifying the buildbot.tac file.

Ok, I won't do that.

> You need to unmount the .ssh bind mount in a loop here, I'm pretty sure.

I forgot that we mounted .ssh; will do this.

> Why do you need to test for old-scratchbox in a loop? rm -rf isn't
> asynchronous. If for some reason you do, set a maximum amount of time, and die
> after it to avoid the possibility of an infinite loop.

I am running the rm -rf in the background to speed up deployment. This check is just to make sure that the script does not exit before the old scratchbox finishes being deleted, but I can set a maximum time.

> Rolling this out is definitely going to be tricky. Given the size and scope of
> this change it's best to do it in a downtime, I think.

In general, it seems to me that changing anything on the filesystem that is related to the actual build while builds are happening isn't a good thing. Is there a way we can say that certain jobs are safe to deploy at any time and others are only safe before slaves start? I would agree that this is best done in a downtime. Alternatively, if my automation patches are r+'d, I would not be opposed to taking slaves out of the pool manually and deploying this to our pm02 slaves.

> Let's chat about this today.

Sure.
Comment 41 • 14 years ago (Assignee)
The patches that depend on this have been r+'d. I would like to be able to land them before they bitrot, and we would also like to get this into production by the end of the quarter, as this is a goal.
Severity: major → critical
Comment 42 • 14 years ago (Assignee)
I am also including the libcurl3-dev package from bug 554999 in the CHINOOK-ARMEL-2007 tarball.
Blocks: 554999
Comment 43 • 14 years ago (Assignee)
Attachment #435100 -
Attachment is obsolete: true
Attachment #435100 -
Flags: feedback?(bhearsum)
Comment 44 • 14 years ago (Assignee)
patch coming soon, but I got this output from running puppetd --test --server staging-puppet.build.mozilla.org

<snip>
notice: //Node[moz2-linux-slave03.build.mozilla.org]/staging-buildslave/scratchbox/Exec[install-scratchbox]/returns: executed successfully
<snip>
notice: Finished catalog run in 1123.42 seconds
Comment 45 • 14 years ago (Assignee)
Here is the puppet manifest for deployment. As discussed, it requires that buildbot not be running during the deployment. I have put the required tarballs and script in the /N/puppet-files/centos5/scratchbox directory of staging-puppet, which as I understand it means it is also in that directory for production-puppet. I have not checked them in. I will attach the deployment script to this bug.
Attachment #435959 -
Attachment is obsolete: true
Attachment #436055 -
Flags: review?(bhearsum)
Attachment #436055 -
Flags: checked-in?
Assignee
Comment 46•14 years ago
Updated. Using cwd => /builds/slave was not valid after changing from ${fileroot} to NFS, so I have to store the working directory before changing to /builds/slave in order to know where the tarball is located.
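The fix described here can be sketched as follows; the paths and tarball name are hypothetical stand-ins for the real /N/centos5/scratchbox and /builds/slave locations.

```shell
#!/bin/sh
# Sketch of "remember where we started": record the directory the script
# was launched from (where the tarball lives) before cd'ing into the work
# area. All paths below are stand-ins, not the production locations.
set -e

TARBALL_DIR="$(pwd)"                 # save this before any cd
WORKDIR="/tmp/sb-demo-workdir"

mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Later steps can still reach the tarball via the saved absolute path:
echo "would extract ${TARBALL_DIR}/scratchbox-demo.tar.bz2 into $(pwd)"
```

This matters because puppet's exec no longer supplies a useful cwd once the script lives on NFS: relative references to the tarball break as soon as the script changes directory.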
Attachment #435099 -
Attachment is obsolete: true
Attachment #436056 -
Flags: review?(bhearsum)
Attachment #436056 -
Flags: checked-in?
Attachment #435099 -
Flags: feedback?(armenzg)
Assignee
Updated•14 years ago
Attachment #436056 -
Attachment mime type: application/x-sh → text/plain
Assignee
Comment 47•14 years ago
Comment on attachment 436055 [details] [diff] [review]
puppet manifest

>+ timeout => 1*60*60,
should be:
>+ timeout => 7200,
Assignee
Comment 48•14 years ago
I have had to update the scratchbox due to bug 554999. I am tarring and bzip2'ing the new scratchbox and putting it on /N/. I am going to test that this deployment works as well. This does mean that the $sb_version variable in the puppet manifest change we land should be 2010-03-30-1129.
Assignee
Comment 49•14 years ago
- don't do rm asynchronously
- save build dir from existing scratchbox before deployment
Attachment #436056 -
Attachment is obsolete: true
Attachment #436208 -
Flags: review?(bhearsum)
Attachment #436056 -
Flags: review?(bhearsum)
Attachment #436056 -
Flags: checked-in?
Comment 50•14 years ago
Comment on attachment 436056 [details]
deploy-sb.sh
As per our IRC discussion, please do everything here synchronously. Doing the rm in the background is error prone, and in the worst case, ties up a slave needlessly for 5 hours.
If there's a problem when removing the old scratchbox, just error out.
umount'ing like you are doesn't work for things that are mounted multiple times, like .ssh can be. It's probably ok since we're running at boot, but for correctness can you please change the umount's to unmount by mount point?
umount doesn't return until it has finished unmounting, so the sleeps are unnecessary - remove them.
Assignee
Comment 51•14 years ago
(In reply to comment #50)
> (From update of attachment 436056 [details])
> As per our IRC discussion, please do everything here synchronously. Doing the
> rm in the background is error prone, and in the worst case, ties up a slave
> needlessly for 5 hours.
>
> If there's a problem when removing the old scratchbox, just error out.

New script attached.

> umount'ing like you are doesn't work for things that are mounted multiple
> times, like .ssh can be. It's probably ok since we're running at boot, but for
> correctness can you please change the umount's to unmount by mount point?

The output redirection was breaking my loops; I have tested the fix below.

> unmount doesn't return until it's finished unmount, the sleep's are unnecessary
> - remove them.

Actually, in my testing that was not the case. They returned before the umount had actually unmounted the file system. I am happy to remove them, though.

[root@maemo5-test01 ~]# mount
/dev/sda1 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda2 on /var type ext3 (rw)
/dev/sdb1 on /builds type ext3 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.2.71.136:/export/buildlogs/puppet-files on /N type nfs (ro,addr=10.2.71.136)
/builds/scratchbox/users/cltbld/host_usr on /host_usr type none (rw,bind)
/builds/scratchbox/users/cltbld/host_usr on /host_usr type none (rw,bind,/scratchbox/users/cltbld/host_usr)
[root@maemo5-test01 ~]# mount -o bind /builds/slave/ lala
(the bind mount command above was run nine times in total)
[root@maemo5-test01 ~]# mount
(same list as above, now also showing nine copies of:)
/builds/slave on /root/lala type none (rw,bind)
[root@maemo5-test01 ~]# while [[ `mount | grep "^/builds/slave"` ]] ; do umount /builds/slave ; done
[root@maemo5-test01 ~]# mount
(same list as the first; all nine /builds/slave bind mounts are gone)
Assignee
Comment 52•14 years ago
- dealing with files that were mounted multiple times is fixed
Attachment #436208 -
Attachment is obsolete: true
Attachment #436213 -
Flags: review?(bhearsum)
Attachment #436208 -
Flags: review?(bhearsum)
Updated•14 years ago
Attachment #436213 -
Attachment mime type: application/x-sh → text/plain
Assignee
Comment 53•14 years ago
Attachment #436055 -
Attachment is obsolete: true
Attachment #436231 -
Flags: review?(bhearsum)
Attachment #436055 -
Flags: review?(bhearsum)
Attachment #436055 -
Flags: checked-in?
Assignee
Updated•14 years ago
Attachment #436208 -
Attachment mime type: application/x-sh → text/plain
Comment 54•14 years ago
Comment on attachment 436213 [details]
deploy-sb.sh
This seems fine.
Attachment #436213 -
Flags: review?(bhearsum) → review+
Comment 55•14 years ago
Comment on attachment 436231 [details] [diff] [review]
puppet manifest - should apply cleanly

>+ install-scratchbox:
>+ command => "/N/centos5/scratchbox/deploy-sb.sh $sb_version",
>+ creates => "/builds/scratchbox/deployed-$sb_version",
>+ timeout => 1*60*60,

This is still wrong.

>+ mount {
>+ "/builds/scratchbox/users/cltbld/builds/slave":
>+ device => "/builds/slave",
>+ fstype => "auto",
>+ options => "bind",
>+ ensure => "mounted",
>+ require => Exec['install-scratchbox'];
> }
> }

This is fine, but you need to remove the one in centos.pp
Attachment #436231 -
Flags: review?(bhearsum) → review-
Comment 56•14 years ago
(In reply to comment #55)
> (From update of attachment 436231 [details] [diff] [review])
> >+ install-scratchbox:
> >+ command => "/N/centos5/scratchbox/deploy-sb.sh $sb_version",
> >+ creates => "/builds/scratchbox/deployed-$sb_version",
> >+ timeout => 1*60*60,
>
> This is still wrong.
>
> >+ mount {
> >+ "/builds/scratchbox/users/cltbld/builds/slave":
> >+ device => "/builds/slave",
> >+ fstype => "auto",
> >+ options => "bind",
> >+ ensure => "mounted",
> >+ require => Exec['install-scratchbox'];
> > }
> > }
>
> This is fine, but you need to remove the one in centos.pp

...of course, the one in centos.pp is .ssh, not the slave dir, so never mind this.
Assignee
Comment 57•14 years ago
Timeout fixed. Set at 3 hours because of the slow link between MPT and MV. As discussed on IRC, the /builds/slave bind mount is not done in centos.pp, so it is safe to do here.
Attachment #436231 -
Attachment is obsolete: true
Attachment #436300 -
Flags: review?(bhearsum)
Updated•14 years ago
Attachment #436300 -
Flags: review?(bhearsum) → review+
Assignee
Comment 58•14 years ago
Comment on attachment 436300 [details] [diff] [review]
puppet manifest - for rollout

http://hg.mozilla.org/build/puppet-manifests/rev/77929941126c
Assignee
Updated•14 years ago
Attachment #436300 -
Flags: checked-in?
Comment 59•14 years ago
Comment on attachment 436213 [details]
deploy-sb.sh
Checking in deploy-sb.sh;
/mofo/puppet-files/centos5/scratchbox/deploy-sb.sh,v <-- deploy-sb.sh
initial revision: 1.1
done
Haven't been able to land the tarball yet, because CVS sucks:
cvs [commit aborted]: out of memory; can not allocate 1277911658 bytes
Attachment #436213 -
Flags: checked-in+
Comment 60•14 years ago
Had to land a bustage fix because we forgot to add a case {} to stop 64-bit slaves from installing this. Landed it as http://hg.mozilla.org/build/puppet-manifests/rev/74d5fac79c34
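The bustage fix was a puppet case {} keyed on the node's architecture; the same idea expressed as a shell guard might look like this. This is a sketch only: the actual puppet fact and class names used in the fix aren't shown in this bug.

```shell
#!/bin/sh
# Sketch: only 32-bit x86 slaves should get the scratchbox install.
# The production fix did this with a puppet case {} on the node's
# architecture; here the same guard is shown against uname -m.
arch="$(uname -m)"
case "$arch" in
    i?86)
        echo "$arch: would install scratchbox"
        ;;
    *)
        echo "$arch: skipping scratchbox install (not a 32-bit slave)"
        ;;
esac
```

Without such a guard, the exec runs on every Linux slave the manifest applies to, which is exactly the bustage described above.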
Assignee
Comment 61•14 years ago
This has landed. I haven't seen any slaves fail a build due to missing scratchbox. If there are any slaves that do not have the updated scratchbox, please file a new bug.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 62•14 years ago
John, I just remembered that the ref platform doc was never updated. Can you please document how you upgraded Scratchbox here: https://bugzilla.mozilla.org/show_bug.cgi?id=548146? (To be clear, we're looking for the actual upgrade instructions, not how to upgrade with the tarball.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 63•14 years ago
Yes, I was about to do that. I am going to put these steps into https://wiki.mozilla.org/index.php?title=ReferencePlatforms/Linux-scratchbox, as we basically treat that as a platform within a platform. I will close this bug when I have finished.
Assignee
Comment 64•14 years ago
Done
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Comment 65•14 years ago
There is still no reference to the upgraded scratchbox on the ref platform page. You pointed me to https://wiki.mozilla.org/ReferencePlatforms/Linux-scratchbox, but that either needs to be migrated to, or linked from, the ref platform page -- otherwise it's going to be missing for anyone who tries to create our ref platform.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•14 years ago
Whiteboard: [maemo5] → [maemo5][missing documentation]
Comment 66•14 years ago
Do we have a linux32 ref machine at the moment? Did that get the update too?
Comment 67•14 years ago
(In reply to comment #66)
> Do we have a linux32 ref machine at the moment? Did that get the update too?

We've got a VM and IX ref for this platform. Not sure if either got updated.
Comment 68•14 years ago
jhford, what say you?
Assignee
Comment 69•14 years ago
I don't know what the status of the ref machine/image is.
Comment 70•14 years ago
I booted up the 32-bit Linux ref platform today and it successfully synced the new Scratchbox. The IX ref machine synced the new Scratchbox, along with the slaves, on April 1st. I also copied the Scratchbox upgrade instructions to the ref platform page, https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0#Upgrade_Scratchbox, and deleted the Linux-scratchbox page, because it now entirely duplicates content found on the CentOS page. I think we're all done here.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Comment 71•14 years ago
Thanks Ben, that's awesome.
Assignee
Updated•14 years ago
Attachment #436300 -
Flags: checked-in? → checked-in+
Updated•11 years ago
Product: mozilla.org → Release Engineering