Bug 355309 - (end2end-bld) Tracking bug for end-to-end release run
Status: RESOLVED FIXED
Keywords: meta
Product: Release Engineering
Classification: Other
Component: Other
Version: other
Platform: All
OS: All
Importance: P2 critical
Target Milestone: ---
Assigned To: John O'Duinn [:joduinn] (please use "needinfo?" flag)
QA Contact: build
Mentors:
URL: http://wiki.mozilla.org/Build:Release...
Duplicates: 375788
Depends on: 378526 299909 352230 355606 356185 361297 363237 364535 366850 367438 368579 369004 369538 370228 370459 370853 371305 371325 372744 372746 372755 372757 372759 372762 372764 372765 373080 373116 373401 373995 374555 375006 375587 375784 375785 375786 375787 375788 375789 376959 378529 379278 383297 385783 387426 387970 388366 388373 388524 389206 390493 390497 390519 391496 391498 391786 391787 391968 392969 394034 394494 394498 394500 394507 394962 394963 396253 396290 396430 396438 397842 398223 398494 399628 399900 400103 401150 401202 401290 401459 401579 401595 401596 401628 401936 402582 404062 406602 407351 407783 408157 408324 408453 408811 408868 409394 409395 409430 409434 409449 409477 409479 409493 410861 411928 412000 412006 413044 417703
Blocks:
Reported: 2006-10-03 17:17 PDT by Robert Helmer [:rhelmer]
Modified: 2013-08-12 21:54 PDT
CC: 24 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
buildbot master.cfg as used on production build (14.63 KB, text/plain)
2007-08-27 16:27 PDT, John O'Duinn [:joduinn] (please use "needinfo?" flag)
rhelmer: review+
buildbot master.cfg as used on staging build system (15.67 KB, text/plain)
2007-08-27 16:28 PDT, John O'Duinn [:joduinn] (please use "needinfo?" flag)
rhelmer: review+
buildbot master.cfg as used on staging build system [checked in] (15.04 KB, text/plain)
2007-09-07 18:11 PDT, John O'Duinn [:joduinn] (please use "needinfo?" flag)
john+bugzilla: review+
rhelmer: review+
buildbot master.cfg as used on production build [checked in] (14.00 KB, text/plain)
2007-09-07 18:14 PDT, John O'Duinn [:joduinn] (please use "needinfo?" flag)
john+bugzilla: review+
rhelmer: review+
patch as landed (30.22 KB, patch)
2007-09-13 23:31 PDT, Robert Helmer [:rhelmer]
no flags

Description Robert Helmer [:rhelmer] 2006-10-03 17:17:47 PDT
Tracking bug for automated release harness work (Bootstrap).
The code lives in mozilla/tools/release.
Comment 1 J. Paul Reed [:preed] 2007-02-22 00:00:43 PST
Talked with rhelmer; gonna co-opt this bug for tracking the end-to-end release work we're doing for Q2; the dependent bugs are all things that we should really try to fix as part of this effort anyway.
Comment 2 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-06-13 20:07:13 PDT
needed for "release automation", hence marking as critical.
Comment 3 J. Paul Reed [:preed] 2007-07-03 13:05:23 PDT
Down to P3 for this triage round.
Comment 4 J. Paul Reed [:preed] 2007-07-16 17:46:41 PDT
Over to John.
Comment 5 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-22 11:50:21 PDT
Lots of activity this week, here's a summary:

To see what we really have working, we're setting up a staging environment, and will then duplicate that to create an equivalent production environment. 

A staging buildbot master is now up and running at http://staging-build-console.build.mozilla.org:8810, and is communicating with http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTest. This VM image is a clean, new clone of the current 1.8 production linux build environment.

A staging linux buildbot slave is up and running on staging-prometheus-vm, and connecting to the buildbot master for jobs. This VM image is a clean, new clone of the current 1.8 production linux build environment.

We've found a few minor gotchas, which are either fixed on the spot or tracked in separate bugs. At this point, the tag, source, linuxbuild, update and stage steps seem to be running just fine. 

Comment 6 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-24 00:27:44 PDT
A staging mac buildbot slave is up and running on bm-xserve14.build.mozilla.org, and connecting to the buildbot master for jobs. This physical machine is a clean, new machine running the current production mac build environment. Tracking bug#388373 closed.

We now have tag, source, linuxbuild, macosx-build, update and stage running fine.
Comment 7 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-24 00:39:24 PDT
A staging win32 buildbot slave is up and running on staging-pacifica-vm.build.mozilla.org, and connecting to the buildbot master for jobs. This VM image is a clean, new clone of the current 1.8 production win32 build environment. Tracking bug#388366 closed.

We now have tag, source, linuxbuild, win32-build, macosx-build, update and stage running fine.
Comment 8 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-26 01:43:55 PDT
The l10n/repack step is now working for linux and win32, and can be seen on the staging-build-console (see URL above) along with the other working steps. 

It looks like l10n/repack is now also working for mac, but we want to triple-check in the morning with fresh eyes before we declare success on mac.
Comment 9 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-30 11:47:01 PDT
l10n/repack is confirmed working for mac too; forgot to update earlier.
Comment 10 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-30 12:00:12 PDT
"l10n verification" is now set up, but failing: we found a script hardcoded to use stage.mozilla.org, which causes the step to fail. Fixing... 

"update verification" is now set up and working fully for linux and win32. On macosx, the slave is set up but incorrectly tried to do linux update verification on the mac! Debugging...

Comment 11 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-30 13:51:54 PDT
rhelmer just fixed l10n verification.
Comment 12 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-07-30 15:58:08 PDT
macosx update verification now working. All we did was stop and restart the mac slave, and now this works. We'll have to watch this and see if it happens again.
Comment 13 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-07 00:58:39 PDT
After a few false-starts over the last week, we had our first human-free build today. It took from 10:12am to 7:23pm to complete from tag->stage, with no human intervention, and no triggering of subsequent steps. 

(The signing step was just stubbed out, doing a quick symlink to "fake out" that there were signed bits present, so the end-to-end time may slightly increase.)

Comment 14 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-09 22:24:28 PDT
Now that the staging systems are up and running, we started setting up the equivalent production systems. 

The production build master is at http://build-console.build.mozilla.org:8810/

Comment 15 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-09 22:32:16 PDT
A setback.

While configuring the production mac slave, I discovered that the production mac slave (bm-xserve12) and the staging mac slave (bm-xserve14) were both Intel-based machines. 

However, the current production mac builds are being done on a powerpc mac (bm-xserve02). This means that staging setup is not as complete as we had thought. We now need to find two powerpc macs (bug#391496 for a staging slave, bug#391498 for a production slave) and install buildbot on them both. 

For now, while I work on this, we are putting back the previously working Intel-based mac slave (bm-xserve14), so that work on the overall end-to-end automation can continue in the meantime. 
Comment 16 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-16 00:37:29 PDT
Production linux slave (on production-prometheus-vm) and production win32 slave (on production-pacifica-vm) are now up and running. They were tested against the staging-build-console, worked fine and are now connected to build-console.

We're trying to find/set up some Intel xserve hardware for the mac production slave. 
Comment 17 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-21 19:01:06 PDT
We were resigned to moving to Intel xserve hardware, because we couldn't find any more PPC xserves. We only had bm-xserve02.build.mozilla.org, and it was being used for FF2.0.0.x production. If this machine failed we had no replacement.

Last week, we found 2 PPC xserves: 
- during an inventory check, we found and reimaged a PPC xserve called fireball.build.mozilla.org. It is now called bm-xserve03.build.mozilla.org, and was reimaged from the current production bm-xserve02. This is being used for automation staging. For details, see bug#391496.
- IT repaired a long broken bm-xserve05.build.mozilla.org. This was also reimaged from the current production bm-xserve02. This is being used for automation production. For details, see bug#391498.
- during an inventory check, we found bm-xserve01. It was being used by dmills, but he was happy to move to a different machine if needed.

This means we can roll out automation without having to worry about CPU architecture changes / cross-compiling. And we also have spare machines, just in case...
Comment 18 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-21 19:05:56 PDT
We've left bm-xserve01 untouched, but it's good to know it exists! 

The other two machines (bm-xserve03, bm-xserve05) both now have buildbot slaves installed, which connect to the build-console and staging-build-console correctly. So far each test run makes it through some steps OK, but then hits tinderbox setup problems, which we fix, only to hit a later problem. Still debugging!
Comment 19 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-21 19:11:44 PDT
We also reimaged two Intel xserves (bm-xserve12, bm-xserve14) for use as mac slaves on TRUNK. These were able to connect and do builds from build-console / staging-build-console. 

For now, these have been disconnected from the buildmasters, and are both sitting idle. Once we have automation running on the FF1.8 branch, we can come back and do this for TRUNK. Keeping bug#388373 and bug#390519 open for tracking.
Comment 20 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-22 18:47:45 PDT
Last night, we got the staging system to run through end-to-end just fine, including using the new PPC mac slaves. 

This morning and afternoon, rhelmer and I went through the list of "just for testing" hacks that we had on production. See details at: http://wiki.mozilla.org/Build:Release_Automation#Just_for_testing

We also reconfirmed the cfg files are in sync between staging and production, manually cleaned up previous test runs, etc. 

Now that all those workarounds are removed, we've started our first end-to-end production run. This is attempting to produce "FF2.0.0.7 RC1" from the live cvs repository, using a cutoff time from last night's nightly run:

mac (bm-xserve02)
 timestamp: 1187698800
 build log:  
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8/1187698800.1400.gz&fulltext=1
 push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-05-mozilla1.8/

win32
 timestamp: 1187694420
 build log: 
http://tinderbox.mozilla.org/showlog.cgi?tree=Mozilla1.8&errorparser=windows&logfile=1187694420.13259.gz&buildtime=1187694420&buildname=WINNT%205.2%20pacifica-vm%20Depend%20Fx-Nightly&fulltext=1
 push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-04-mozilla1.8/

linux
 timestamp: 1187690820
 build log: 
http://tinderbox.mozilla.org/showlog.cgi?tree=Mozilla1.8&errorparser=unix&logfile=1187690820.25345.gz&buildtime=1187690820&buildname=Linux%20prometheus-vm%20Depend%20Fx-Nightly&fulltext=1
push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-03-mozilla1.8/

Comment 21 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-24 11:12:02 PDT
Delayed yesterday at the manual signing step, because of errors in the docs. Fixed doc mid-afternoon. Once signed, the automation detected the logfile as planned and continued.

Finished 2.0.0.7 RC1 builds were handed off to QA for testing. So far so good.
Comment 22 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-27 08:30:10 PDT
QA have run the following tests:

smoketests (linux, win, mac, vista)
BFT (linux, win, mac)
FFT (win)
l10n tests for 12 p1 locales (linux, win, mac)
addon tests
...and additional manual update testing

All pass!!
Comment 23 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-27 16:27:03 PDT
Created attachment 278478 [details]
buildbot master.cfg as used on production build
Comment 24 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-27 16:28:18 PDT
Created attachment 278479 [details]
buildbot master.cfg as used on staging build system
Comment 25 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-27 16:35:21 PDT
These files are currently being used on staging-build-console and build-console. Note: for both of these attached master.cfg files

1) The password fields for all buildbot slaves have been intentionally blanked out. However, the running staging and production systems do use passwords.

2) The staging buildmaster uses a differently named (but otherwise identical) set of slaves from the production buildmaster.

3) There are a couple of differences between the staging buildmaster and production buildmaster, around tagging and signing.

Diff-ing the two master.cfg files will show these differences.
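The sanitization described in point 1 could be done with a few lines of Python before attaching the files; this is only an illustrative sketch, and the slave names and passwords below are made up, not taken from the attached configs.

```python
# Hypothetical sketch: blanking slave passwords from a master.cfg-style
# bot list before posting it publicly. Names/passwords are invented.
bots = [
    ("linux-1.8-slave1", "real-password"),
    ("win32-1.8-slave1", "real-password"),
    ("macosx-1.8-slave1", "real-password"),
]

# Replace every password with an empty string; the running masters
# keep the real credentials.
sanitized = [(name, "") for name, _ in bots]
```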
Comment 26 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-08-27 17:07:16 PDT
*** Bug 375788 has been marked as a duplicate of this bug. ***
Comment 27 Robert Helmer [:rhelmer] 2007-09-06 16:48:15 PDT
Comment on attachment 278478 [details]
buildbot master.cfg as used on production build

The two steps below should not be doing "clean tinder-config area" or "TinderConfig"; can you remove those before checkin, please?


>l10nverifyFactory.addStep(ShellCommand, description='clean tinder-config area', workdir='build',
>                     command=['rm', '-rfv', '/builds/config'])
>l10nverifyFactory.addStep(ShellCommand, description='TinderConfig', workdir='build',
>                     command=['perl', './release', '-o', 'TinderConfig'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})

>updateverifyFactory.addStep(ShellCommand, description='clean tinder-config area', workdir='build',
>                      command=['rm', '-rfv', '/builds/config'])
>updateverifyFactory.addStep(ShellCommand, description='TinderConfig', workdir='build',
>                     command=['perl', './release', '-o', 'TinderConfig'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})
>updateverifyFactory.addStep(ShellCommand, description='update verificaton', workdir='build',
>                     command=['perl', './release', '-v', '-o', 'Updates'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})


r=rhelmer with that change.

There's a bunch of stuff we already know we need to do; I'd like to get this checked in first, but just to enumerate:

1) switch from Buildbot's CVS class to ShellCommand cvs, so we can always use the branch provided in master.cfg (switch back if we can get the branch support working right with what we're doing)

2) set up more schedulers so we can resume the process after a failed build

3) general refactoring and cleanup (consider creating a Bootstrap subclass of Shell, etc).
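The refactoring suggested in point 3 might look something like the sketch below. `ShellCommand` here is a minimal stand-in, not buildbot's real class, and `BootstrapCommand` is a hypothetical name; the defaults are the ones repeated in the steps quoted above (timeout=36000, haltOnFailure=1, CVS_RSH=ssh).

```python
class ShellCommand:
    """Minimal stand-in for buildbot's ShellCommand, just to sketch the idea."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class BootstrapCommand(ShellCommand):
    """Hypothetical subclass applying the defaults every Bootstrap step
    shares, so each addStep call only states what differs."""
    def __init__(self, **kwargs):
        kwargs.setdefault("timeout", 36000)
        kwargs.setdefault("haltOnFailure", 1)
        kwargs.setdefault("env", {"CVS_RSH": "ssh"})
        super().__init__(**kwargs)

# The repeated keyword arguments collapse into the subclass:
step = BootstrapCommand(description="update verification", workdir="build",
                        command=["perl", "./release", "-v", "-o", "Updates"])
```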
Comment 28 Robert Helmer [:rhelmer] 2007-09-06 16:51:31 PDT
Comment on attachment 278479 [details]
buildbot master.cfg as used on staging build system

Same as comment #27; remove the two TinderConfig related steps from l10nverify and updateverify.

Only build should need to run TinderConfig.
Comment 29 Robert Helmer [:rhelmer] 2007-09-07 12:09:36 PDT
Buildbot configs have been going here:
mozilla/tools/buildbot-configs/

Probably makes sense to have something like:

mozilla/tools/buildbot-configs/automation

And inside there have "staging" and "production" subdirectories.
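The proposed layout could be created with something like the following; a sketch only, with paths relative to the repository root as suggested above.

```python
import os

# Hypothetical sketch of the proposed buildbot-configs layout:
# mozilla/tools/buildbot-configs/automation/{staging,production}
for env in ("staging", "production"):
    os.makedirs(os.path.join("mozilla/tools/buildbot-configs/automation", env),
                exist_ok=True)
```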
Comment 30 Robert Helmer [:rhelmer] 2007-09-07 17:53:20 PDT
(In reply to comment #27)
> >updateverifyFactory.addStep(ShellCommand, description='update verificaton', workdir='build',
> >                     command=['perl', './release', '-v', '-o', 'Updates'],
> >                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})
> 
> 
> r=rhelmer with that change.

Sorry, you need this last line of course :) Overzealous cut and paste on my part.
Comment 31 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-09-07 18:11:54 PDT
Created attachment 280136 [details]
buildbot master.cfg as used on staging build system [checked in]

1) renames remaining slaves, so now all slaves have naming format of: <os>-<branch>-slave<number>

2) remove extra lines, per rhelmer's review.
Comment 32 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-09-07 18:14:55 PDT
Created attachment 280138 [details]
buildbot master.cfg as used on production build [checked in]

1) renames remaining slaves, so now all slaves have naming format of:
<os>-<branch>-slave<number>

2) remove extra lines, per rhelmer's review.
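The naming scheme described above can be sketched as a small helper; the function is mine, for illustration only, and is not part of the checked-in configs.

```python
def slave_name(os_name, branch, number):
    """Build a slave name in the <os>-<branch>-slave<number> format
    described above (hypothetical helper, for illustration only)."""
    return "%s-%s-slave%d" % (os_name, branch, number)
```

For example, `slave_name("linux", "1.8", 1)` gives "linux-1.8-slave1", matching the slave names discussed later in this bug.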
Comment 33 Robert Helmer [:rhelmer] 2007-09-07 18:27:32 PDT
Comment on attachment 280138 [details]
buildbot master.cfg as used on production build [checked in]

>####### PROJECT IDENTITY
>c['projectName'] = "Release Automation Test"

This should probably not have "Test" in the name; looks fine besides!
Comment 34 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-09-07 18:33:31 PDT
Agreed... good catch. "Test" has been in the name of both "staging-build-console" and "build-console" forever. Will fix that later, with the next set of changes. 
Comment 35 J. Paul Reed [:preed] 2007-09-11 13:07:47 PDT
Comment on attachment 280138 [details]
buildbot master.cfg as used on production build [checked in]

Let me preface my review comments with the disclaimer I haven't been working on this project for quite some time now (not for lack of wanting to), and so given that this is a bit of a "big bang" landing, don't know how relevant these review comments are; I'm missing a lot of the context for why things were done in certain ways, and I'm not sure if there's a place I can find that in Bugzilla or in the dependent bugs. If there is, please point me at it (or any relevant documentation on the wiki).

Having said that:

-- There are 1.8 and trunk slaves; are we using this build automation for the trunk now? If so, in what capacity? Was bug 379278 fixed, and I missed it?

-- There are a bunch of slaves that are linux-1.8-slave1, etc. It looks like there's one set of slaves for staging ("1") and one set for production ("2"). If that's the case, it'd probably be clearer to name these linux-1.8-console-staging, unless the assumption is that you can mix and match these two sets in the future for redundancy?

-- I'm concerned that release builds are coming from machines that are different than where the nightlies come from, which is a fundamental process shift which I didn't see discussed anywhere. Is there a plan to address that, hopefully before 2.0.0.7?

-- Is there a reason make test is run before each step (especially in production)? Isn't it testing the same code on the slaves? (Upon further inspection, I see you have each step checking out the code, which I don't know if I understand, but ok. In that case, shouldn't it be checking out a stable tag in the production config? If a floating tag is used, a "cvs stat" after each checkout might be useful.)

-- In general, the log management seems a bit heavy handed; there are a lot of calls to "make clean_logs"; I remember talking about log management a bit, and don't know if there was any conclusion. I'm a little worried about losing logs entirely, and I think the response was "there are copies on the master," but how do we keep track of those longterm? Do we care? (I certainly do, but... maybe I'm alone here).

In general, the approach seems OK. I'm more familiar with the Bootstrap code, so it's harder for me to comment directly on the Buildbot approach. The dependent scheduling was what I had planned on using before these bugs were reassigned, so I think that's good/useful.

Can you point me at where the deliverables are popping out these days? I think that, combined with looking at the generated AUS snippets and such will be helpful.
Comment 36 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-09-13 23:26:51 PDT
(In reply to comment #29)
> Buildbot configs have been going here:
> mozilla/tools/buildbot-configs/
> 
> Probably makes sense to have something like:
> mozilla/tools/buildbot-configs/automation
> 
> And inside there have "staging" and "production" subdirectories.


Yeah, that seems good to me. Let's at least get what we used for 2.0.0.7 RC1 landed before we go any further... 
Comment 37 Robert Helmer [:rhelmer] 2007-09-13 23:31:01 PDT
Created attachment 280854 [details] [diff] [review]
patch as landed

RCS file: /cvsroot/mozilla/tools/buildbot-configs/automation/production/master.cfg,v
done
Checking in automation/production/master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/automation/production/master.cfg,v  <--  master.cfg
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/automation/staging/master.cfg,v
done
Checking in automation/staging/master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/automation/staging/master.cfg,v  <--  master.cfg
initial revision: 1.1
done
Comment 38 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-09-14 02:11:03 PDT
(In reply to comment #35)
> (From update of attachment 280138 [details])
> Let me preface my review comments with the disclaimer I haven't been working on
> this project for quite some time now (not for lack of wanting to), and so given
> that this is a bit of a "big bang" landing, don't know how relevant these
> review comments are; I'm missing a lot of the context for why things were done
> in certain ways, and I'm not sure if there's a place I can find that in
> Bugzilla or in the dependent bugs. If there is, please point me at it (or any
> relevant documentation on the wiki).

Sorry, the design context was covered in the special build team meeting you missed on 05sep2007. Let's talk offline to schedule another time we can redo this for you. Meanwhile, as you've taken yourself off the review list for this bug, we've gone ahead with landing these configs now, as they worked fine for 2.0.0.7 RC1, are worth preserving, and are not yet formally checked in anywhere.



> Having said that:
> -- There are 1.8 and trunk slaves; are we using this build automation for the
> trunk now? If so, in what capacity? Was bug 379278 fixed, and I missed it?

Automation is not yet being used for trunk, and bug#379278 has not been touched, afaik. The few trunk slaves that have been set up are listed in the "bot" section. This is intentional, as one buildmaster should be usable for both trunk and 1.8 slaves, hence putting both 1.8 slaves and trunk slaves in master.cfg.



> -- There are a bunch of slaves that are linux-1.8-slave1, etc. It looks like
> there's one set of slaves for staging ("1") and one set for production ("2").
> If that's the case, it'd probably be clearer to name these
> linux-1.8-console-staging, unless the assumption is that you can mix and match
> these two sets in the future for redundancy?

Correct, the idea is to enable more slaves for redundancy, so expect to soon see linux-1.8-slave3, 4, 5, etc. And yes, with trivial changes (for example, the ssh keys), we can switch a slave between staging & production. Therefore, I would rather not encode that in the slave name, to avoid confusion. 


> -- I'm concerned that release builds are coming from machines that are
> different than where the nightlies come from, which is a fundamental process
> shift which I didn't see discussed anywhere. Is there a plan to address that,
> hopefully before 2.0.0.7?

The traditional build machines ran both production release builds and nightly builds, as separate processes on the same machine. The new automation machines are intentionally separate from the traditional build machines, so we could in no way disrupt our live system while setting up the new automation system. 

These new automation build machines were cloned from our traditional build machines, so hardware, exact OS patch levels, compilers, linkers, etc. are identical. We then added the minimum set of extra tools needed for automation (buildbot, python, twisted, etc). For details, see: http://wiki.mozilla.org/ReferencePlatforms/BuildBot. Yes, this does technically mean that the new automation machines are no longer exactly the same bits as the traditional build machines, though the difference is as small as possible. This was part of the reason QA gave 2.0.0.7 RC1 a most thorough testing.

Moving nightlies to these automation machines is in the plans, but is not done yet, as there were other differences to reconcile also.



> -- Is there a reason make test is run before each step (especially in
> production)? Isn't it testing the same code on the slaves? (Upon further
> inspection, I see you have each step checking out the code, which I don't know
> if I understand, but ok. In that case, shouldn't it be checking out a stable
> tag in the production config? If a floating tag is used, a "cvs stat" after
> each checkout might be useful.)
> 
> -- In general, the log management seems a bit heavy handed; there are a lot of
> calls to "make clean_logs"; I remember talking about log management a bit, and
> don't know if there was any conclusion. I'm a little worried about losing logs
> entirely, and I think the response was "there are copies on the master," but
> how do we keep track of those longterm? Do we care? (I certainly do, but...
> maybe I'm alone here).
> 
> In general, the approach seems OK. I'm more familiar with the Bootstrap code,
> so it's harder for me to comment directly on the Buildbot approach. The
> dependent scheduling was what I had planned on using before these bugs were
> reassigned, so I think that's good/useful.
> 
> Can you point me at where the deliverables are popping out these days? I think
> that, combined with looking at the generated AUS snippets and such will be
> helpful.

Which deliverables are you talking about here? Updates/downloadable-full-install/etc.? Each buildbot step sends out this type of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a staging run posted full win32 installable bits at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/ 

Let me know if you are looking for something not already covered in those emails.
Comment 39 Robert Helmer [:rhelmer] 2007-09-14 09:28:33 PDT
(In reply to comment #38)
> (In reply to comment #35)
> Which deliverables you are talking about here?
> Updates/downloadable-full-install/etc? Each buildbot step sends out this type
> of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a
> staging run posted full win32 installable bits on
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/ 

Tinderbox is hardcoded to report that it pushed to "ftp.mozilla.org", but it doesn't really; on staging, everything pushes to staging-build-console, and for production, everything pushes to build-console. The candidate and staging areas are then synced to the ftp server.

The idea is to push to stage and FTP in the exact same locations as before, but we don't want to run a buildbot slave on the ftp server. So you can find all RC1 bits in the usual staging directories, just like any previous release.
Comment 40 J. Paul Reed [:preed] 2007-09-14 13:43:02 PDT
(In reply to comment #38)

> Sorry, the design context was covered in the special build team meeting you
> missed on 05sep2007. Lets talk offline to schedule another time we can redo
> this for you.

Actually, I think it would be better to write up this information (or annotate http://wiki.mozilla.org/Build:Release_Automation with any relevant notes from this meeting), and post it publicly, for comment and review.

People outside of MoCo who are interested in the development of the release automation aren't able to attend special build team meetings.

As this is a community project we're working on (as much as Firefox is), it's important that that information be accessible to others outside of our walls, so they can comment and contribute. (We've asked the community for assistance at MoFo Project meetings before, and it makes it extremely difficult for anyone to contribute if there are special meetings to discuss development.)

Additionally, these meetings should also be announced in public and held in public.

> Automation is not yet being used for trunk, and bug#379278 has not been
> touched, afaik. The few trunk slaves that have been setup are listed in the
> "bot" section. This is intentional ae one buildmaster should be usable for both
> trunk and 1.8 slaves, hence putting 1.8 slaves and trunk slaves in master.cfg.

If the automation isn't being used for trunk, can those sections be commented out until we are using it? I'm mostly worried that a slave will execute bootstrap in a trunk context, and since the bug you pointed at has not been touched, that would be a Bad Thing (tm) (thinking specifically of things like unexpected tagging behavior against the trunk, etc.).

> Correct, the idea is to enable more slaves for redundancy, so expect to soon
> see linux-1.8-slave3, 4, 5, etc. And yes, with trivial changes (for example,
> the ssh keys), we can switch a slave between staging & production. Therefore, I
> would rather not encode that in the slave name, to avoid confusion. 

My suggestion was about naming, not about redundancy. 

"1" and "2" are not as clear as "staging" and "production," which aren't as clear as "staging" and "staging-backup."

It's difficult to remember that "1" = "staging" and "2" = "production," so if that's the case, why not call them that, so the mapping doesn't have to be remembered?

(Ignore the redundancy issue; in total agreement there; I'm talking about what the slaves are called.)

> The traditional build machines did run both production release builds, and also
> nightly builds, as separate different processes on the same machine. The new
> automation machines are intentionally separate machines from the traditional
> build machines, so we could in no way disrupt our live system while setting up
> the new automation system.

[snip.]

> Moving nightlies to these automation machines is in the plans, but is not done
> yet, as there were other differences to reconcile also.

Is there a bug to track this? Is there a place to discuss this change? What other differences are there to reconcile?

I understand the reasoning behind doing this while it was in development, but it seems like we're now using this for production as well (i.e. 2.0.0.7), and I don't see where the discussion to have this (rather large, I might add) release process change took place.

Maybe I missed it, and it's lurking in a bug/newsgroup somewhere?

> Which deliverables you are talking about here?
> Updates/downloadable-full-install/etc? Each buildbot step sends out this type
> of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a
> staging run posted full win32 installable bits on
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/ 

Rhelmer answered; I wanted to look at the actual deliverables from automation; if the staging builds were getting on ftp.m.o, I'd be a) very surprised, and b) very scared.

> Let me know if you are looking for something not already covered in those
> emails.

There were the questions about logs and cvs stat usage; I believe, though, that you mentioned you just forgot to answer them, so I'd be interested in the answer when you have a chance. :-)
Comment 41 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-11-02 15:45:58 PDT
When setting up build automation machines on trunk, the win32 slave consistently hangs at the end of "cvs co...". I was able to reproduce the problem manually on the machine using:

 $ cvs -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32

The files are correctly checked out from cvs, and present on the slave local disk, but the cvs command never returns to the msys/bash prompt. Eventually, tinderbox times out, kills it, and flags the "cvs co" step as failed. 

Found this post: http://osdir.com/ml/gnu.mingw.msys/2003-05/msg00029.html, which claims this is a symptom of a known "ssh not disconnecting under msys" problem from 2003, and suggests using "-z3" or "-z5" as a workaround. 

Adding "-z3" to the cvs command on the win32 trunk staging slave solved the problem. I was able to run this 10 times in a row without any problems.

 $ cvs -z3 -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32

Making a note of this here, as it seems others are hitting similar problems, and I wonder if the same workaround will help.
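In buildbot terms the workaround amounts to inserting "-z3" right after "cvs" in the step's argv list; the snippet below is a sketch of that command as a Python list, not the actual checked-in config.

```python
# Hypothetical sketch: the cvs checkout from above as a buildbot-style
# argv list, with "-z3" inserted to work around the msys/ssh
# disconnect hang described in this comment.
checkout = [
    "cvs", "-z3",
    "-d", "staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot",
    "co", "-d", "release", "-r", "release",
    "mozilla/tools/tinderbox-configs/firefox/win32",
]
```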
Comment 42 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2007-11-02 18:01:43 PDT
Did a follow-on experiment on fx-win32-tbox, the current TRUNK production win32 machine, and found: 

1) Running the same command:
 $ cvs -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32
...hangs on fx-win32-tbox also, just like it did on the win32 staging slave. Changing the command to be "cvs -z3 -d staging...", like we did above, worked perfectly, just like it did on the win32 staging slave machine. 

2) Changing the same command to use a different CVS repo did *not* hang, even without the "-z3" workaround:
 $ cvs -d cvs.mozilla.org:/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32


It looks like there is something different about the connection to staging-trunk-automation.build.mozilla.org versus cvs.mozilla.org.
Comment 43 Robert Helmer [:rhelmer] 2008-01-22 12:53:10 PST
This bug is huge and kind of nebulous. By some standards, we can do "end to end" runs now, maybe we should go over all the deps on this bug, close what we can, and then file new tracking bugs with more specific mandates?
Comment 44 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-02-28 12:42:13 PST
Lots of discussions here, but the remaining work items seem to be covered in the dependent bugs, so closing.
