Closed Bug 355309 (end2end-bld) Opened 18 years ago Closed 16 years ago

Tracking bug for end-to-end release run

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: joduinn)

References

(Depends on 1 open bug)

Details

(Keywords: meta)

Attachments

(3 files, 2 obsolete files)

Tracking bug for automated release harness work (Bootstrap).
The code lives in mozilla/tools/release.
Depends on: 355606
Depends on: 356185
Depends on: 361297
Depends on: 363237
Blocks: 366850
No longer blocks: 366850
Depends on: 366850
Depends on: 367438
Depends on: 368579
Depends on: 369004
Depends on: 369538
Depends on: 370228
Depends on: 370459
Depends on: 370853
Talked with rhelmer; gonna co-opt this bug for tracking the end-to-end release work we're doing for Q2; the dependent bugs are all things that we should really try to fix as part of this effort anyway.
Alias: end2end-bld
Assignee: rhelmer → preed
Summary: Tracking bug for automated release harness (Bootstrap) → Tracking bug for end-to-end release run
Depends on: 371305
Depends on: 371325
Depends on: 372744
Depends on: 372746
Depends on: 372755
Depends on: 372757
Depends on: 372759
Depends on: 372762
Depends on: 372764
Depends on: 372765
Depends on: 373080
Depends on: 373401
Depends on: 373995
Depends on: 373116
Depends on: 374555
Depends on: 375006
Depends on: 375587
Depends on: 375714
No longer depends on: 372746
No longer depends on: 375714
Depends on: 375784
Depends on: 375785
Depends on: 375786
Depends on: 375787
Depends on: 375788
Depends on: 375789
Depends on: 376959
Depends on: 378529
Keywords: meta
Needed for "release automation", hence marking as critical.
Severity: major → critical
Priority: -- → P1
Priority: P1 → P2
Down to P3 for this triage round.
Priority: P2 → P3
Depends on: 387426
Depends on: 387970
Over to John.
Assignee: preed → joduinn
Priority: P3 → P2
Lots of activity this week, here's a summary:

To see what we really have working, we're setting up a staging environment, and will then duplicate that to create an equivalent production environment. 

A staging buildbot master is now up and running at http://staging-build-console.build.mozilla.org:8810 and also communicating with http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTest. This VM image is a clean, new clone of the current 1.8 production linux build environment.

A staging linux buildbot slave is up and running on staging-prometheus-vm, and connecting to buildbot master for jobs. This VM image is a clean, new clone of the current 1.8 production linux build environment.

We've found a few minor gotchas, which were either fixed on the spot or are being tracked in separate bugs. At this point, the tag, source, linuxbuild, update and stage steps seem to be running just fine.

Status: NEW → ASSIGNED
Depends on: 389206
A staging mac buildbot slave is up and running on bm-xserve14.build.mozilla.org, and connecting to buildbot master for jobs. This physical machine is a clean, new machine, running the current production mac build environment. Tracking bug#388373 closed.

We now have tag, source, linuxbuild, macosx-build, update and stage running fine.
Depends on: 388373
A staging win32 buildbot slave is up and running on staging-pacifica-vm.build.mozilla.org, and connecting to buildbot master for jobs. This VM image is a clean, new clone of the current 1.8 production win32
build environment. Tracking bug#388366 closed.

We now have tag, source, linuxbuild, win32-build, macosx-build, update and stage running fine.
Depends on: 388366
The l10n/repack step is now working for linux and win32, and can be seen on the staging-build-console (see URL above) along with the other working steps.

It looks like l10n/repack is now also working for mac, but we want to triple-check in the morning with fresh eyes before we declare success on mac.
l10n/repack confirmed working for mac as well; forgot to update earlier.
"l10n verification" now setup, but failing because we found a script hardcoded to use stage.mozilla.org, causing the step to fail. Fixing... 

"update verification" now setup and working fully for linux and win32. On macosx, slave setup but incorrectly tried to do linux update verification on the mac! Debugging...

rhelmer just fixed l10n verification.
macosx update verification now working. All we did was stop and restart the mac slave, and now this works. We'll have to watch this and see if it happens again.
After a few false-starts over the last week, we had our first human-free build today. It took from 10:12am to 7:23pm to complete from tag->stage, with no human intervention, and no triggering of subsequent steps. 

(The signing step was just stubbed out, doing a quick symlink to "fake out" that there were signed bits present, so the end-to-end time may slightly increase.)

Now that the staging systems are up and running, we started setting up the equivalent production systems.

The production build master is at http://build-console.build.mozilla.org:8810/

A setback.

While configuring the production mac slave, I discovered that the production mac slave (bm-xserve12), and the staging mac slave (bm-xserve14) were both Intel based machines. 

However, the current production mac builds are being done on a powerpc mac (bm-xserve02). This means that the staging setup is not as complete as we had thought. We now need to find two powerpc macs (bug#391496 for a staging slave, bug#391498 for a production slave) and install buildbot on them both.

For now, while I work on this, we are putting back the previously working Intel-based mac slave (bm-xserve14), so that at least work on the overall end-to-end automation can continue in the meantime.
Production linux slave (on production-prometheus-vm) and production win32 slave (on production-pacifica-vm) are now up and running. They were tested against the staging-build-console, worked fine and are now connected to build-console.

We're trying to find/setup some intel xserve hardware, for mac production slave. 
We were resigned to moving to Intel xserve hardware, because we couldn't find any more PPC xserves. We only had bm-xserve02.build.mozilla.org, and it was already being used for FF2.0.0.x production. If this machine failed, we had no replacement.

Last week, we found 2 PPC xserves: 
- during an inventory check, we found and reimaged a PPC xserve called fireball.build.mozilla.org. This is now called bm-xserve03.build.mozilla.org, and was reimaged from the current production bm-xserve02. This is being used for automation staging. For details, see bug#391496.
- IT repaired a long-broken bm-xserve05.build.mozilla.org. This was also reimaged from the current production bm-xserve02. This is being used for automation production. For details, see bug#391498.
- during an inventory check, we found bm-xserve01. This was being used by dmills, but he was happy to move to a different machine if needed.

This means we could roll out automation without having to worry about CPU architecture changes / cross compiling. And we also have spare machines, just in case...
We've left bm-xserve01 untouched, but it's good to know it exists!

The other two machines (bm-xserve03, bm-xserve05) both now have buildbot slaves installed, which connect to the build-console and staging-build-console correctly. So far each test run makes it through some steps ok, but then hits tinderbox setup problems, which we then fix, only to hit a later problem. Still debugging!
We also reimaged two Intel xserves (bm-xserve12, bm-xserve14) for use as mac slaves on TRUNK. These were able to connect and do builds from build-console / staging-build-console. 

For now, these have been disconnected from the buildmasters, and are both sitting idle. Once we have automation running on the FF1.8 branch, we can come back to do this for TRUNK. Keeping bug#388373 and bug#390519 open for tracking.
Last night, got the staging system to run through end-to-end just fine, including using the new PPC mac slaves. 

This morning+afternoon, rhelmer and I went through the list of "just for testing" hacks that we had on production. See details at: http://wiki.mozilla.org/Build:Release_Automation#Just_for_testing

We also reconfirmed that the cfg files are in sync between staging and production, manually cleaned up previous test runs, etc.

Now that all those workarounds are removed, we've started our first end-to-end production run. This is attempting to produce "FF2.0.0.7 RC1" from the live cvs repository, using a cutoff time from last night's nightly run:

mac (bm-xserve02)
 timestamp: 1187698800
 build log:  
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8/1187698800.1400.gz&fulltext=1
 push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-05-mozilla1.8/

win32
 timestamp: 1187694420
 build log: 
http://tinderbox.mozilla.org/showlog.cgi?tree=Mozilla1.8&errorparser=windows&logfile=1187694420.13259.gz&buildtime=1187694420&buildname=WINNT%205.2%20pacifica-vm%20Depend%20Fx-Nightly&fulltext=1
 push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-04-mozilla1.8/

linux
 timestamp: 1187690820
 build log: 
http://tinderbox.mozilla.org/showlog.cgi?tree=Mozilla1.8&errorparser=unix&logfile=1187690820.25345.gz&buildtime=1187690820&buildname=Linux%20prometheus-vm%20Depend%20Fx-Nightly&fulltext=1
push dir: 
http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/2007-08-21-03-mozilla1.8/

Delayed yesterday at the manual signing step, because of errors in the docs. Fixed doc mid-afternoon. Once signed, the automation detected the logfile as planned and continued.

Finished 2007-rc1 builds were handed off to QA for testing. So far so good.
QA have run the following tests:

smoketests (linux, win, mac, vista)
BFT (linux, win, mac)
FFT (win)
l10n tests for 12 p1 locales (linux, win, mac)
addon tests
...and additional manual update testing

All pass!!
Attachment #278478 - Attachment mime type: application/octet-stream → text/plain
These files are currently being used on staging-build-console and build-console. Note: for both of these attached master.cfg files

1) The password fields for all buildbot slaves have been intentionally blanked out. However, the running staging and production systems do use passwords.

2) The staging buildmaster uses a differently named (but otherwise identical) set of slaves to the production buildmaster.

3) There are a couple of differences between the staging buildmaster and production buildmaster, around tagging and signing.

Diff-ing the two master.cfg files will show these differences.
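
To make the blanked-password point concrete, here is a minimal sketch of how the slave ("bot") entries look in a buildbot master.cfg of this era. The slave names, port number and blank passwords below are illustrative assumptions, not copied from the attached files:

# Hypothetical excerpt from a master.cfg of this vintage (buildbot 0.7.x style).
c = BuildmasterConfig = {}

# One (name, password) tuple per slave; the staging master uses a
# differently named but otherwise identical set of slaves.
c['bots'] = [
    ('linux-1.8-slave1',  ''),   # passwords intentionally blanked out
    ('win32-1.8-slave1',  ''),
    ('macosx-1.8-slave1', ''),
]

c['slavePortnum'] = 9989         # port the slaves connect back on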
Depends on: 394034
Attachment #278478 - Flags: review?(rhelmer)
Attachment #278478 - Flags: review?(joduinn)
Attachment #278479 - Flags: review?(joduinn)
Attachment #278479 - Flags: review?(rhelmer)
Comment on attachment 278478 [details]
buildbot master.cfg as used on production build

The two steps below should not be doing "clean tinder-config area" or "TinderConfig"; can you remove those before checkin please? :


>l10nverifyFactory.addStep(ShellCommand, description='clean tinder-config area', workdir='build',
>                     command=['rm', '-rfv', '/builds/config'])
>l10nverifyFactory.addStep(ShellCommand, description='TinderConfig', workdir='build',
>                     command=['perl', './release', '-o', 'TinderConfig'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})

>updateverifyFactory.addStep(ShellCommand, description='clean tinder-config area', workdir='build',
>                      command=['rm', '-rfv', '/builds/config'])
>updateverifyFactory.addStep(ShellCommand, description='TinderConfig', workdir='build',
>                     command=['perl', './release', '-o', 'TinderConfig'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})
>updateverifyFactory.addStep(ShellCommand, description='update verificaton', workdir='build',
>                     command=['perl', './release', '-v', '-o', 'Updates'],
>                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})


r=rhelmer with that change.

There's a bunch of stuff we know we need to do already, I'd like to get this checked in first but just to enumerate:

1) switch from Buildbot's CVS class to ShellCommand cvs, so we can always use branch provided in master.cfg (switch back if we can get the branch support working right with what we're doing)

2) set up more schedulers so we can resume the process after a failed build

3) general refactoring and cleanup (consider creating a Bootstrap subclass of Shell, etc).
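
As a rough illustration of point 3, a Bootstrap subclass of ShellCommand could fold the repeated "perl ./release -o <Step>" boilerplate into one place. This is only a sketch under assumed buildbot 0.7.x APIs, not the eventual implementation; depending on the buildbot version, subclassed steps may also need addFactoryArguments():

# Hypothetical Bootstrap step; import path assumed for buildbot 0.7.x.
from buildbot.steps.shell import ShellCommand

class Bootstrap(ShellCommand):
    """Run one Bootstrap stage (Tag, Source, Build, Updates, ...)."""

    def __init__(self, bootstrapStep, verbose=False, **kwargs):
        command = ['perl', './release', '-o', bootstrapStep]
        if verbose:
            command.insert(2, '-v')
        # Defaults mirror what each per-step ShellCommand currently repeats.
        kwargs.setdefault('workdir', 'build')
        kwargs.setdefault('description', bootstrapStep)
        kwargs.setdefault('timeout', 36000)
        kwargs.setdefault('haltOnFailure', 1)
        kwargs.setdefault('env', {'CVS_RSH': 'ssh'})
        ShellCommand.__init__(self, command=command, **kwargs)

# Usage sketch:
# updateverifyFactory.addStep(Bootstrap, bootstrapStep='Updates', verbose=True)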
Attachment #278478 - Flags: review?(rhelmer) → review+
Comment on attachment 278479 [details]
buildbot master.cfg as used on staging build system

Same as comment #27; remove the two TinderConfig related steps from l10nverify and updateverify.

Only the build step should need to run TinderConfig.
Attachment #278479 - Flags: review?(rhelmer) → review+
Buildbot configs have been going here:
mozilla/tools/buildbot-configs/

Probably makes sense to have something like:

mozilla/tools/buildbot-configs/automation

And inside there have "staging" and "production" subdirectories.
(In reply to comment #27)
> >updateverifyFactory.addStep(ShellCommand, description='update verificaton', workdir='build',
> >                     command=['perl', './release', '-v', '-o', 'Updates'],
> >                     timeout=36000, haltOnFailure=1, env={'CVS_RSH': 'ssh'})
> 
> 
> r=rhelmer with that change.

Sorry, you need this last line of course :) Overzealous cut and paste on my part.
1) renames remaining slaves, so now all slaves have naming format of: <os>-<branch>-slave<number>

2) remove extra lines, per rhelmer's review.
Attachment #278479 - Attachment is obsolete: true
Attachment #280136 - Flags: review?(rhelmer)
Attachment #280136 - Flags: review?(preed)
Attachment #280136 - Flags: review?(joduinn)
Attachment #278479 - Flags: review?(preed)
Attachment #278479 - Flags: review?(joduinn)
1) renames remaining slaves, so now all slaves have naming format of:
<os>-<branch>-slave<number>

2) remove extra lines, per rhelmer's review.
Attachment #278478 - Attachment is obsolete: true
Attachment #280138 - Flags: review?(rhelmer)
Attachment #280138 - Flags: review?(preed)
Attachment #280138 - Flags: review?(joduinn)
Attachment #278478 - Flags: review?(preed)
Attachment #278478 - Flags: review?(joduinn)
Attachment #280136 - Flags: review?(rhelmer) → review+
Comment on attachment 280138 [details]
buildbot master.cfg as used on production build [checked in]

>####### PROJECT IDENTITY
>c['projectName'] = "Release Automation Test"

This should probably not have "Test" in the name; looks fine besides!
Attachment #280138 - Flags: review?(rhelmer) → review+
Attachment #280136 - Flags: review?(joduinn) → review+
Attachment #280138 - Flags: review?(joduinn) → review+
Agreed... good catch. "Test" has been in the name of both "staging-build-console" and "build-console" forever. Will fix that later, with the next set of changes.
Comment on attachment 280138 [details]
buildbot master.cfg as used on production build [checked in]

Let me preface my review comments with the disclaimer I haven't been working on this project for quite some time now (not for lack of wanting to), and so given that this is a bit of a "big bang" landing, don't know how relevant these review comments are; I'm missing a lot of the context for why things were done in certain ways, and I'm not sure if there's a place I can find that in Bugzilla or in the dependent bugs. If there is, please point me at it (or any relevant documentation on the wiki).

Having said that:

-- There are 1.8 and trunk slaves; are we using this build automation for the trunk now? If so, in what capacity? Was bug 379278 fixed, and I missed it?

-- There are a bunch of slaves that are linux-1.8-slave1, etc. It looks like there's one set of slaves for staging ("1") and one set for production ("2"). If that's the case, it'd probably be clearer to name these linux-1.8-console-staging, unless the assumption is that you can mix and match these two sets in the future for redundancy?

-- I'm concerned that release builds are coming from machines that are different than where the nightlies come from, which is a fundamental process shift which I didn't see discussed anywhere. Is there a plan to address that, hopefully before 2.0.0.7?

-- Is there a reason make test is run before each step (especially in production)? Isn't it testing the same code on the slaves? (Upon further inspection, I see you have each step checking out the code, which I don't know if I understand, but ok. In that case, shouldn't it be checking out a stable tag in the production config? If a floating tag is used, a "cvs stat" after each checkout might be useful.)

-- In general, the log management seems a bit heavy handed; there are a lot of calls to "make clean_logs"; I remember talking about log management a bit, and don't know if there was any conclusion. I'm a little worried about losing logs entirely, and I think the response was "there are copies on the master," but how do we keep track of those longterm? Do we care? (I certainly do, but... maybe I'm alone here).

In general, the approach seems OK. I'm more familiar with the Bootstrap code, so it's harder for me to comment directly on the Buildbot approach. The dependent scheduling was what I had planned on using before these bugs were reassigned, so I think that's good/useful.

Can you point me at where the deliverables are popping out these days? I think that, combined with looking at the generated AUS snippets and such will be helpful.
Attachment #280138 - Flags: review?(preed)
Attachment #280136 - Flags: review?(preed)
(In reply to comment #29)
> Buildbot configs have been going here:
> mozilla/tools/buildbot-configs/
> 
> Probably makes sense to have something like:
> mozilla/tools/buildbot-configs/automation
> 
> And inside there have "staging" and "production" subdirectories.


Yeah, that seems good to me. Let's at least get what we used in 2007rc1 landed before we go any further...
Attached patch: patch as landed
RCS file: /cvsroot/mozilla/tools/buildbot-configs/automation/production/master.cfg,v
done
Checking in automation/production/master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/automation/production/master.cfg,v  <--  master.cfg
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/tools/buildbot-configs/automation/staging/master.cfg,v
done
Checking in automation/staging/master.cfg;
/cvsroot/mozilla/tools/buildbot-configs/automation/staging/master.cfg,v  <--  master.cfg
initial revision: 1.1
done
Attachment #280136 - Attachment description: buildbot master.cfg as used on staging build system → buildbot master.cfg as used on staging build system [checked in]
Attachment #280138 - Attachment description: buildbot master.cfg as used on production build → buildbot master.cfg as used on production build [checked in]
(In reply to comment #35)
> (From update of attachment 280138 [details])
> Let me preface my review comments with the disclaimer I haven't been working on
> this project for quite some time now (not for lack of wanting to), and so given
> that this is a bit of a "big bang" landing, don't know how relevant these
> review comments are; I'm missing a lot of the context for why things were done
> in certain ways, and I'm not sure if there's a place I can find that in
> Bugzilla or in the dependent bugs. If there is, please point me at it (or any
> relevant documentation on the wiki).

Sorry, the design context was covered in the special build team meeting you missed on 05sep2007. Let's talk offline to schedule another time we can redo this for you. Meanwhile, as you've taken yourself off the review list for this bug, we've gone ahead with landing these configs now, as they worked fine for 2007rc1, are worth preserving, and were not yet formally checked in anywhere.



> Having said that:
> -- There are 1.8 and trunk slaves; are we using this build automation for the
> trunk now? If so, in what capacity? Was bug 379278 fixed, and I missed it?

Automation is not yet being used for trunk, and bug#379278 has not been touched, afaik. The few trunk slaves that have been set up are listed in the "bot" section. This is intentional, as one buildmaster should be usable for both trunk and 1.8 slaves, hence putting both 1.8 slaves and trunk slaves in master.cfg.



> -- There are a bunch of slaves that are linux-1.8-slave1, etc. It looks like
> there's one set of slaves for staging ("1") and one set for production ("2").
> If that's the case, it'd probably be clearer to name these
> linux-1.8-console-staging, unless the assumption is that you can mix and match
> these two sets in the future for redundancy?

Correct, the idea is to enable more slaves for redundancy, so expect to soon see linux-1.8-slave3, 4, 5, etc. And yes, with trivial changes (for example, the ssh keys), we can switch a slave between staging & production. Therefore, I would rather not encode that in the slave name, to avoid confusion. 


> -- I'm concerned that release builds are coming from machines that are
> different than where the nightlies come from, which is a fundamental process
> shift which I didn't see discussed anywhere. Is there a plan to address that,
> hopefully before 2.0.0.7?

The traditional build machines did run both production release builds and nightly builds, as separate processes on the same machine. The new automation machines are intentionally separate machines from the traditional build machines, so we could in no way disrupt our live system while setting up the new automation system.

These new automation build machines were cloned from our traditional build machines, so hardware, exact OS patch levels, compilers, linkers, etc. are identical. We then added the minimum set of extra tools needed for automation (buildbot, python, twisted, etc). For details, see: http://wiki.mozilla.org/ReferencePlatforms/BuildBot. Yes, this does technically mean that the new automation machines are no longer bit-for-bit identical to the traditional build machines, though we kept the differences as small as possible. This was part of the reason QA gave 2007rc1 a most thorough testing.

Moving nightlies to these automation machines is in the plans, but is not done yet, as there were other differences to reconcile also.



> -- Is there a reason make test is run before each step (especially in
> production)? Isn't it testing the same code on the slaves? (Upon further
> inspection, I see you have each step checking out the code, which I don't know
> if I understand, but ok. In that case, shouldn't it be checking out a stable
> tag in the production config? If a floating tag is used, a "cvs stat" after
> each checkout might be useful.)
> 
> -- In general, the log management seems a bit heavy handed; there are a lot of
> calls to "make clean_logs"; I remember talking about log management a bit, and
> don't know if there was any conclusion. I'm a little worried about losing logs
> entirely, and I think the response was "there are copies on the master," but
> how do we keep track of those longterm? Do we care? (I certainly do, but...
> maybe I'm alone here).
> 
> In general, the approach seems OK. I'm more familiar with the Bootstrap code,
> so it's harder for me to comment directly on the Buildbot approach. The
> dependent scheduling was what I had planned on using before these bugs were
> reassigned, so I think that's good/useful.
> 
> Can you point me at where the deliverables are popping out these days? I think
> that, combined with looking at the generated AUS snippets and such will be
> helpful.

Which deliverables are you talking about here? Updates/downloadable-full-install/etc? Each buildbot step sends out this type of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a staging run posted full win32 installable bits on http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/

Let me know if you are looking for something not already covered in those emails.
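
For reference, the kind of master.cfg status block that generates those emails looks roughly like this in buildbot of this era; the sender address and options below are assumptions, not copied from the live config:

# Hypothetical sketch: mail every build's results to the build list.
from buildbot.status.mail import MailNotifier   # buildbot 0.7.x-era import path

c['status'].append(MailNotifier(
    fromaddr='buildbot@build.mozilla.org',    # assumed sender address
    extraRecipients=['build@mozilla.org'],    # list that receives the results
    sendToInterestedUsers=False,              # mail the list, not the committers
    mode='all'))                              # report both passing and failing builds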
(In reply to comment #38)
> (In reply to comment #35)
> Which deliverables you are talking about here?
> Updates/downloadable-full-install/etc? Each buildbot step sends out this type
> of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a
> staging run posted full win32 installable bits on
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/ 

Tinderbox is hardcoded to report that it pushed to "ftp.mozilla.org" but it doesn't really; on staging, everything pushes to staging-build-console. For production, everything pushes to build-console. The candidate and staging areas are then sync'd to the ftpserver.

The idea is to push to stage and FTP in the same exact locations as before, but we don't want to run a buildbot slave on the ftpserver. So, you can find all RC1 bits in the usual staging directories, just like any previous release.
(In reply to comment #38)

> Sorry, the design context was covered in the special build team meeting you
> missed on 05sep2007. Lets talk offline to schedule another time we can redo
> this for you.

Actually, I think it would be better to write up this information (or annotate http://wiki.mozilla.org/Build:Release_Automation with any relevant notes from this meeting), and post it publicly, for comment and review.

People outside of MoCo who are interested in the development of the release automation aren't able to attend special build team meetings.

As this is a community project we're working on (as much as Firefox is), it's important that that information is accessible to others outside of our walls, so they can comment and contribute. (We've asked the community for assistance before at MoFo project meetings, and special meetings to discuss development make it extremely difficult for anyone to contribute.)

Additionally, these meetings should also be announced in public and held in public.

> Automation is not yet being used for trunk, and bug#379278 has not been
> touched, afaik. The few trunk slaves that have been setup are listed in the
> "bot" section. This is intentional ae one buildmaster should be usable for both
> trunk and 1.8 slaves, hence putting 1.8 slaves and trunk slaves in master.cfg.

If the automation isn't being used for trunk, can those sections be commented out until we are using it? (I'm mostly worried that a slave will execute bootstrap in a trunk context; since the bug you pointed at has not been touched, that would be a Bad Thing (tm), thinking specifically of things like unexpected tagging behavior against the trunk, etc.)

> Correct, the idea is to enable more slaves for redundancy, so expect to soon
> see linux-1.8-slave3, 4, 5, etc. And yes, with trivial changes (for example,
> the ssh keys), we can switch a slave between staging & production. Therefore, I
> would rather not encode that in the slave name, to avoid confusion. 

My suggestion was about naming, not about redundancy. 

"1" and "2" are not as clear as "staging" and "production," which aren't as clear as "staging" and "staging-backup."

It's difficult to remember that "1" = "staging" and "2" = "production," so if that's the case, why not call them that, so the mapping doesn't have to be remembered?

(Ignore the redundancy issue; in total agreement there; I'm talking about what the slaves are called.)

> The traditional build machines did run both production release builds, and also
> nightly builds, as separate different processes on the same machine. The new
> automation machines are intentionally separate machines from the traditional
> build machines, so we could in no way disrupt our live system while setting up
> the new automation system.

[snip.]

> Moving nightlies to these automation machines is in the plans, but is not done
> yet, as there were other differences to reconcile also.

Is there a bug to track this? Is there a place to discuss this change? What other differences are there to reconcile?

I understand the reasoning behind doing this while it was in development, but it seems like we're now using this for production as well (i.e. 2007), and I don't see where the discussion to have this (rather large, I might add) release process change took place.

Maybe I missed it, and it's lurking in a bug/newsgroup somewhere?

> Which deliverables you are talking about here?
> Updates/downloadable-full-install/etc? Each buildbot step sends out this type
> of information in emails to Build@mozilla.org. For example, at 20:53 tonight, a
> staging run posted full win32 installable bits on
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-09-13-20-firefox2.0.0.4/ 

Rhelmer answered; I wanted to look at the actual deliverables from automation; if the staging builds were getting on ftp.m.o, I'd be a) very surprised, and b) very scared.

> Let me know if you are looking for something not already covered in those
> emails.

There were the questions about logs and cvs stat usage; I believe, though, that you mentioned you just forgot to answer them, so I'd be interested in the answer when you have a chance. :-)
Depends on: 397554
Depends on: 397842
Depends on: 398223
Depends on: 399628
Depends on: 399900
Depends on: 400103
QA Contact: mozpreed → build
Depends on: 401150
Depends on: 401290
Depends on: 401459
Depends on: 401579
Depends on: 401596
Depends on: 401628
When setting up build automation machines on trunk, the win32 slave consistently hangs at the end of "cvs co...". I was able to reproduce the problem manually on the machine using:

 $ cvs -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32

The files are correctly checked out from cvs, and present on the slave local disk, but the cvs command never returns to the msys/bash prompt. Eventually, tinderbox times out, kills it, and flags the "cvs co" step as failed. 

Found this post: http://osdir.com/ml/gnu.mingw.msys/2003-05/msg00029.html, which claimed this is a symptom of a known "ssh not disconnecting under msys" problem from 2003, and suggested using "-z3" or "-z5" as a workaround.

Adding "-z3" to the cvs command on the win32 trunk staging slave solved the problem. I was able to run this 10 times in a row, without any problems.

 $ cvs -z3 -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32

Making a note of this here, as it seems others are hitting similar problems, and I wonder if the same workaround will help.
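
In buildbot terms, the workaround just means adding -z3 to the checkout step's command. A sketch of what that step might look like follows; the factory name and surrounding arguments are assumptions patterned on the config excerpts earlier in this bug, not the actual trunk config:

# Hypothetical checkout step with the -z3 msys/ssh-hang workaround applied.
win32BuildFactory.addStep(ShellCommand,
    description='cvs co tinderbox-configs (with -z3 workaround)',
    workdir='build',
    command=['cvs', '-z3',
             '-d', 'staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot',
             'co', '-d', 'release', '-r', 'release',
             'mozilla/tools/tinderbox-configs/firefox/win32'],
    timeout=3600, haltOnFailure=1, env={'CVS_RSH': 'ssh'})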
Did a follow-on experiment on fx-win32-tbox, the current TRUNK production win32 machine, and found:

1) Running the same command:
 $ cvs -d staging-trunk-automation.build.mozilla.org:/builds/cvsmirror/cvsroot
co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32
...hangs on fx-win32-tbox also, just like it did on the win32 staging slave. Changing the command to be "cvs -z3 -d staging...", like we did above, worked perfectly, just like it did on the win32 staging slave machine. 

2) Changing the same command to use a different CVS repo did *not* hang, even without the workaround -z3 parameter:
 $ cvs -d cvs.mozilla.org:/cvsroot
co -d release -r release mozilla/tools/tinderbox-configs/firefox/win32
...did not hang 


Looks like there is something different about the connections to staging-trunk-automation.build.mozilla.org and to cvs.mozilla.org???
Depends on: 404062
Depends on: 383297
Depends on: 408811
Depends on: 409393
No longer depends on: 409393
Depends on: 409394
Depends on: 409395
Depends on: 410861
No longer depends on: 397554
Depends on: 411928
This bug is huge and kind of nebulous. By some standards we can do "end to end" runs now; maybe we should go over all the deps on this bug, close what we can, and then file new tracking bugs with more specific mandates?
Depends on: 417703
Lots of discussions here, but the remaining work items seem to be covered in the dependent bugs, so closing.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering