Closed Bug 465868 • Opened 16 years ago • Closed 16 years ago

[Tracking bug] have one Buildbot master instance and pool of slaves produce all builds and unittests for moz2.

Categories: Release Engineering :: General, defect, P2
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: joduinn, Assigned: catlee

Attachments (7 files, 3 obsolete files)
- 20.55 KB, patch; bhearsum: review+, lsblakk: review+, bhearsum: checked-in+
- 1.79 KB, patch; bhearsum: review+, lsblakk: review+, bhearsum: checked-in+
- 9.82 KB, patch; catlee: review+, bhearsum: checked-in+
- 3.94 KB, patch; catlee: review+, bhearsum: checked-in+
- 8.41 KB, patch; catlee: review+, catlee: checked-in+
- 2.65 KB, patch; catlee: review+, catlee: checked-in+
- 3.50 KB, patch; bhearsum: review+, catlee: checked-in+
Before:
- buildbot master instance for builds; one pool of slaves for builds, all on Build network
- buildbot master instance for unittests; a separate pool of slaves for unittests, all on QA network

After:
- one buildbot master instance for both builds *and* unittests; one pool of slaves for both builds *and* unittests, all on the Build network

Note: I specify "moz2" here because the different project branches there all use the same tool chain, so work on mozilla-central, tracemonkey, mozilla-1.9.1 and places can be shared across the one pool. Other active code lines which use *different* tool chains will require their own pool of slaves.

(It's been talked about since early summer 2008, prep work is covered in lots of bugs, and it's even a Q4 goal, but I can't find a specific bug on it?! Hence filing this - if you know of another preexisting bug, please close this as DUP.)
Comment 1 • 16 years ago (Reporter)
Lots of work already done in Q3 to change machines to use same accounts, be on same network, etc. We already have one consolidated pool of slaves in staging m-c. Question is: what's still to do before we enable this for production m-c? In the Toronto gathering last month, we pulled together these remaining steps, and tweaked the list a little more today:

- [+] create 4 new linux slaves (catlee)
- [+] windows slave: accept pskill license (lblakk)
- [+] talk with tracemonkey developers about running all unittests, not just "make check" (joduinn). They are all ok with that. Per test run last week, all is ok now.
- [+] update win32 refplatforms - moz_no_reset_path, set screen resolution (joduinn) bug#460535
- [+] stop including slavename in tinderbox column in staging (bhearsum)
- [+] consolidate master.cfg (bhearsum)
- [+] update existing win32 build slaves - screen resolution (lblakk)
- [+] change tracemonkey unittest HW (bm-win2k3-unittest-02-hw) to be VM (lblakk)
- [+] create 4 new win32 slaves, bug#460729 (joduinn)
- [+] put new slaves into staging, then production (catlee, joduinn)
- [ ] mac slave: audit /tools, xcode (lblakk)
- [ ] update linux refplatform - scratchbox, xvnc (joduinn)
- [ ] update existing linux build slaves - xvnc (lblakk)
- [ ] delete and recreate old unittest slaves (4 linux, 4 win32) (lblakk)
- [ ] move bm-xserve21 -> mozilla-central (lblakk)
- [ ] change tracemonkey unittest HW (bm-win2k3-unittest-02-hw) to be VM (lsblakk)
- [ ] upgrade bm-unittest01 to 10.5 (lblakk)
- [ ] stop including slavename in tinderbox column in production (bhearsum)
- [ ] bug#464079 - fix exception when doing reconfig in master (catlee)
- [ ] reimage the staging xserve, use in unittest production as interim step (lsblakk)
- [ ] moz2-win32-slave15->18 need utils added (lsblakk)
- [ ] verify new linux slaves have firefox profile initialised (lsblakk)
- [ ] consolidate slave setup doc for build/unittest slaves. Update ref images if needed. (lsblakk, joduinn)
- [ ] write production consolidation patches (lsblakk)
- [ ] delete staging buildbot master for unittests

Does this look right, or did I miss anything?
Comment 2 • 16 years ago
The macs are all new images and are up and running successfully
> - [+] mac slave: audit /tools, xcode (lblakk)

The existing linux slaves have all been updated by having firefox open once, to create a default profile, and by having the slaves started with DISPLAY=:2
> - [+] update existing linux build slaves - xvnc (lblakk)
> - [ ] delete and recreate old unittest slaves (4 linux, 4 win32) (lblakk)
> - [ ] move bm-xserve21 -> mozilla-central (lblakk)

The tracemonkey HW box is closed off and the win32 tracemonkey tests are now being run on a VM
> - [+] change tracemonkey unittest HW (bm-win2k3-unittest-02-hw) to be VM (lsblakk)

There is a bug filed to re-image bm-xserve-unittest01 (and to rename it to bm-xserve22)
> - [X] upgrade bm-unittest01 to 10.5 (lblakk)
-- Let's rename this:
- [ ] re-image bm-xserve-unittest01 and rename to bm-xserve22 so it can move to production
^^ is bug 465766, so that means we can get rid of this:
> - [ ] reimage the staging xserve, use in unittest production as interim step (lsblakk)

These are done:
> - [+] moz2-win32-slave15->18 need utils added (lsblakk)
> - [+] verify new linux slaves have firefox profile initialised (lsblakk)
Comment 3 • 16 years ago (Reporter)
Revised list after talking it through with lukas on irc.

ToDo:
[ ] fix linux mochichrome and mochitest failures on build/unittest slaves. Mac green, win32 green.
[ ] update linux refplatform - scratchbox, xvnc (joduinn)
[ ] stop including slavename in tinderbox column in production (bhearsum)
[ ] bug#464079 - fix exception when doing reconfig in master (catlee)
[ ] consolidate slave setup doc for build/unittest slaves. Update ref images if needed. (lsblakk, joduinn)
[ ] update support doc to show how to start slave with xvfb and DISPLAY settings
[ ] write production consolidation patches (lsblakk)
[ ] delete staging buildbot master for unittests
[ ] after consolidation is in production, delete and recreate old unittest slaves (4 linux, 4 win32) (lblakk)
[ ] after consolidation, investigate and possibly move bm-xserve21 -> mozilla-central (lblakk)

Done:
[+] create 4 new linux slaves (catlee)
[+] windows slave: accept pskill license (lblakk)
[+] update existing linux build slaves - xvfb (lblakk)
[+] talk with tracemonkey developers about running all unittests, not just "make check" (joduinn). They are all ok with that. Per test run last week, all is ok now.
[+] update win32 refplatforms - moz_no_reset_path, set screen resolution (joduinn) bug#460535
[+] stop including slavename in tinderbox column in staging (bhearsum)
[+] consolidate master.cfg (bhearsum)
[+] update existing win32 build slaves - screen resolution (lblakk)
[+] change tracemonkey unittest HW (bm-win2k3-unittest-02-hw) to be VM (lblakk)
[+] create 4 new win32 slaves, bug#460729 (joduinn)
[+] put new slaves into staging, then production (catlee, joduinn)
[+] change tracemonkey unittest HW (bm-win2k3-unittest-02-hw) to be VM (lsblakk)
[+] upgrade bm-xserve-unittest01 to 10.5 (lblakk)
[+] re-image bm-xserve-unittest01, rename to bm-xserve22 and move it to production mozilla-1.9.1. Details in bug 465766
[+] moz2-win32-slave15->18 need utils added (lsblakk)
[+] verify new linux slaves have firefox profile initialised (lsblakk)

Dropped:
[+] mac slave: audit /tools, xcode (lblakk). Instead replaced with new machines.
[+] reimage the staging xserve, use in unittest production as interim step (lsblakk)
Updated • 16 years ago (Assignee)
Assignee: nobody → catlee
Comment 4 • 16 years ago (Assignee)
looks like mochitest is timing out on staging
Comment 5 • 16 years ago (Assignee)
(In reply to comment #3)
> ToDo
> [ ] fix linux mochichrome and mochitest failures on build/unittest slaves. Mac green, win32 green.

On Linux this is caused by a combination of buildbot not being started with DISPLAY=:2, metacity not running, or Xvfb not running. See bug 468823.

> [ ] update linux refplatform - scratchbox,xvnc (joduinn)

ref platform and existing slaves should be updated with cronjobs from bug 468823.

> [ ] update support doc to show how to start slave with xfvb and DISPLAY settings

this won't be necessary once the above is done.
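The three failure conditions named above (no DISPLAY=:2, no metacity, no Xvfb) can be spot-checked in a few lines. This is a hedged illustration only: the function names are mine and the process names are assumptions taken from this comment, not from any actual RelEng script.

```python
# Quick health check for the headless-display setup described in this
# comment: buildbot must be started with DISPLAY=:2, and both Xvfb and
# metacity must already be running on that display.
import os
import subprocess

def process_running(name: str) -> bool:
    """Return True if `ps -C name` lists at least one matching process."""
    out = subprocess.run(["ps", "-C", name], capture_output=True, text=True)
    return name in out.stdout

def check_display(expected: str = ":2") -> list[str]:
    """Return a list of problems; empty list means the slave looks healthy."""
    problems = []
    if os.environ.get("DISPLAY") != expected:
        problems.append(
            f"DISPLAY is {os.environ.get('DISPLAY')!r}, want {expected!r}")
    for proc in ("Xvfb", "metacity"):
        if not process_running(proc):
            problems.append(f"{proc} is not running")
    return problems
```

Run on a slave, an empty return means none of the three known mochitest-breaking conditions is present.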
Comment 6 • 16 years ago (Assignee)
So our current ToDos for getting this moved into production are:
[ ] set DISPLAY=:2 on linux (from bug 468823)
[ ] update linux slaves with Xvfb, metacity cronjobs from bug 468823
[ ] write production consolidation patches
[ ] stop including slavename in tinderbox column in production (bhearsum) ?

These can happen at any time:
[ ] update linux refplatform - Xvfb, metacity
[ ] consolidate slave setup doc for build/unittest slaves. Update ref images if needed. (lsblakk, joduinn)
[ ] delete staging buildbot master for unittests
[ ] after consolidation is in production, delete and recreate old unittest slaves (4 linux, 4 win32) (lblakk)
[ ] after consolidation, investigate and possibly move bm-xserve21 -> mozilla-central (lblakk)
Comment 7 • 16 years ago
(In reply to comment #6)
> So our current ToDos for getting this moved into production is:
> [ ] set DISPLAY=:2 on linux (from bug 468823)
> [ ] update linux slaves with Xvfb, metacity cronjobs from bug 468823
> [ ] write production consolidation patches
> [ ] stop including slavename in tinderbox column in production (bhearsum) ?
>
> These can happen at any time:
> [ ] update linux refplatform - Xvfb,metacity
> [ ] consolidate slave setup doc for build/unittest slaves. Update ref images if needed. (lsblakk, joduinn)
> [X] delete staging buildbot master for unittests

We deleted the m-c staging buildbot already when we put 1.9.1 and m-c standalone production unittest into being.

> [ ] after consolidation is in production, delete and recreate old unittest slaves (4 linux, 4 win32) (lblakk)

I don't remember why we might want to move bm-xserve21, it's doing production 1.9.0 right now. Do we not want to have 2 Mac builds on that production waterfall?

> [ ] after consolidation, investigate and possibly move bm-xserve21 -> mozilla-central (lblakk)
Comment 8 • 16 years ago (Assignee)
We were still having some problems because modules were being reloaded multiple times per 'buildbot reconfig', resulting in problems when constructing unittest steps. By reloading modules in one place, we can make sure they're only reloaded once per 'buildbot reconfig'.
Attachment #353107 - Flags: review?(bhearsum)

Comment 9 • 16 years ago (Assignee)
Attachment #353109 - Flags: review?(bhearsum)

Updated • 16 years ago (Assignee)
Attachment #353107 - Flags: review?(lukasblakk)

Comment 10 • 16 years ago (Assignee)
(In reply to comment #9)
> Created an attachment (id=353109) [details]
> Add repoPath to UnittestBuildFactory, and fix module reloading

Ignore the changes to env.py in here, those are already covered in another bug.
Updated • 16 years ago
Attachment #353107 - Flags: review?(lukasblakk) → review+

Comment 11 • 16 years ago
Comment on attachment 353107 [details] [diff] [review]
Bring unittest factory logic into master.cfg, and fix module reloading

I'm not a big fan of moving the factory.py reloads over here. Is there any way to keep them confined to their buildbotcustom modules?
Comment 12 • 16 years ago (Assignee)
Attachment #353107 - Attachment is obsolete: true
Attachment #353224 - Flags: review?(bhearsum)
Attachment #353107 - Flags: review?(bhearsum)

Comment 13 • 16 years ago (Assignee)
Attachment #353109 - Attachment is obsolete: true
Attachment #353225 - Flags: review?(bhearsum)
Attachment #353109 - Flags: review?(bhearsum)

Updated • 16 years ago (Assignee)
Attachment #353224 - Flags: review?(lukasblakk)

Updated • 16 years ago (Assignee)
Attachment #353225 - Flags: review?(lukasblakk)

Updated • 16 years ago
Attachment #353225 - Flags: review?(lukasblakk) → review+

Updated • 16 years ago
Attachment #353224 - Flags: review?(lukasblakk) → review+

Comment 14 • 16 years ago
So here's the patch for production implementation - it's the same as the staging one, so if anything needs tweaking from our staging reloads and whathaveyou this will need to change too.
Attachment #353281 - Flags: review?(catlee)

Updated • 16 years ago (Assignee)
Attachment #353281 - Flags: review?(catlee) → review+

Comment 15 • 16 years ago (Assignee)
[x] update linux slaves with Xvfb, metacity cronjobs from bug 468823

linux slaves on production-master and staging-master now have this in cltbld's crontab:

# Make sure Xvfb is running on :2
@reboot ps -C Xvfb | grep -q Xvfb || exec Xvfb :2 -screen 0 1280x1024x24 &
*/5 * * * * ps -C Xvfb | grep -q Xvfb || exec Xvfb :2 -screen 0 1280x1024x24 &
# Make sure metacity is running on :2
@reboot ps -C metacity -f | grep -q :2 || exec metacity --display :2 --replace &
*/5 * * * * ps -C metacity -f | grep -q :2 || exec metacity --display :2 --replace &
Comment 16 • 16 years ago (Reporter)
(In reply to comment #7)
> (In reply to comment #6)
> > [ ] stop including slavename in tinderbox column in production (bhearsum) ?

Done by catlee.
Updated • 16 years ago (Reporter)
Priority: -- → P2

Comment 17 • 16 years ago
Comment on attachment 353224 [details] [diff] [review]
Bring unittest factory logic into master.cfg, and fix module reloading

Looks fine to me.
Attachment #353224 - Flags: review?(bhearsum) → review+

Updated • 16 years ago
Attachment #353225 - Flags: review?(bhearsum) → review+

Comment 18 • 16 years ago
Comment on attachment 353225 [details] [diff] [review]
Add repoPath to UnittestBuildFactory

Checking in process/factory.py;
/cvsroot/mozilla/tools/buildbotcustom/process/factory.py,v  <--  factory.py
new revision: 1.61; previous revision: 1.60
done
Attachment #353225 - Flags: checked-in+

Comment 19 • 16 years ago
Comment on attachment 353224 [details] [diff] [review]
Bring unittest factory logic into master.cfg, and fix module reloading

changeset: 601:b641aa91d1bf
Attachment #353224 - Flags: checked-in+

Updated • 16 years ago
Attachment #353281 - Flags: checked-in+

Comment 20 • 16 years ago
Comment on attachment 353281 [details] [diff] [review]
Production consolidation patch

changeset: 601:b641aa91d1bf
Comment 21 • 16 years ago (Reporter)
This rolled out live in production yesterday (18dec2008), so we are now producing builds and unittests from the same pool-of-identical slaves. We're still running dedicated old unittest systems as usual, so we can watch the two sets of unittests results in parallel for a while.
Comment 22 • 16 years ago
Comment on attachment 353224 [details] [diff] [review]
Bring unittest factory logic into master.cfg, and fix module reloading

>--- a/mozilla2-staging/unittest_master.py	Mon Dec 15 15:48:08 2008 +0100
>-        errorparser="unittest"

Losing that means that the brief log is now a useless spew of every passed (or known-fail) test which happens to include the string "error" in the message. Can we have the errorparser that understands unit test error messages back, pretty please?
Comment 23 • 16 years ago
Attachment #354188 - Flags: review?(catlee)

Updated • 16 years ago (Assignee)
Attachment #354188 - Flags: review?(catlee) → review+

Comment 24 • 16 years ago (Assignee)
(In reply to comment #22)
> (From update of attachment 353224 [details] [diff] [review])
> >--- a/mozilla2-staging/unittest_master.py Mon Dec 15 15:48:08 2008 +0100
> >-        errorparser="unittest"
>
> Losing that means that the brief log is now a useless spew of every passed (or known-fail) test which happens to include the string "error" in the message. Can we have the errorparser that understands unit test error messages back, pretty please?

Yup, this is bug 470757. Should be all good now.
Comment 25 • 16 years ago
Attachment #354206 - Flags: review?

Updated • 16 years ago
Attachment #354206 - Flags: review? → review?(catlee)

Comment 26 • 16 years ago
The following slaves have been clobbered and re-issued to staging-master (consolidated):
moz2-linux-slave07
moz2-linux-slave08
moz2-linux-slave10
moz2-linux-slave13
moz2-win32-slave07
moz2-win32-slave08
moz2-win32-slave09
moz2-win32-slave10
moz2-darwin9-slave05
bm-xserve22

When bug 470788 is resolved then 2 more mac slaves will be added to staging. Once all these slaves have been successfully running on staging, they can be switched over to production (consolidated). The patch is ready for that - see comment 25.
Comment 27 • 16 years ago
I have turned off the standalone production unittest (moz2-unittest in /builds/buildbot) and updated production-master nagios
Comment 28 • 16 years ago
Comment on attachment 354188 [details] [diff] [review]
Adds new mac slaves to staging

changeset: 620:64e879af7c94
Attachment #354188 - Flags: checked-in+

Comment 29 • 16 years ago
2 mac slaves added to staging-pool:
moz2-darwin9-slave06
moz2-darwin9-slave07
Comment 30 • 16 years ago
Please update the inventory & IT support docs before resolving this bug fixed.
Comment 31 • 16 years ago (Assignee)
Still left:
[ ] update linux refplatform - Xvfb, metacity
[ ] consolidate slave setup doc for build/unittest slaves. Update ref images if needed. (lsblakk, joduinn)
[ ] update inventory
Comment 32 • 16 years ago
You should add [ ] update support docs to that list too, as Nick pointed out.
Updated • 16 years ago (Assignee)
Attachment #354206 - Flags: review?(catlee) → review+

Comment 33 • 16 years ago (Assignee)
Comment on attachment 354206 [details] [diff] [review]
Add 4 new slaves of each platform to Production

changeset: 653:6797175d6f78
Attachment #354206 - Flags: checked-in+

Comment 34 • 16 years ago (Assignee)
crontab was adjusted on:
moz2-linux-slave07
moz2-linux-slave08
moz2-linux-slave10
moz2-linux-slave13
Comment 35 • 16 years ago (Assignee)
The following have been moved into production:
moz2-linux-slave07
moz2-linux-slave08
moz2-linux-slave10
moz2-linux-slave13
moz2-win32-slave09
moz2-win32-slave10
moz2-darwin9-slave05
moz2-darwin9-slave07
bm-xserve22
Comment 36 • 16 years ago
Needed to make these clobbers:
moz2-linux-slave08 - mozilla-central-linux/build/obj-firefox
moz2-linux-slave10 - mozilla-central-linux-unittest/build/objdir
as the first build (on production anyway) died.
Comment 37 • 16 years ago (Assignee)
The following have been moved into production:
moz2-darwin9-slave06
moz2-win32-slave07
Comment 38 • 16 years ago (Assignee)
(In reply to comment #37)
> The following have been moved into production:
> moz2-darwin9-slave06
> moz2-win32-slave07

and moz2-win32-slave08 too.
Comment 39 • 16 years ago (Reporter)
this slave passed all tests on staging-master, now need to enable on production-master.
Attachment #356276 - Flags: review?(catlee)

Updated • 16 years ago (Assignee)
Attachment #356276 - Flags: review?(catlee) → review+

Comment 40 • 16 years ago (Assignee)
Comment on attachment 356276 [details] [diff] [review]
add moz2-win32-slave14 to production-master

changeset: 668:b42dbf33aca4
Attachment #356276 - Flags: checked-in+

Comment 41 • 16 years ago (Assignee)
[x] update linux refplatform - Xvfb, metacity (nthomas)
[x] consolidate slave setup doc for build/unittest slaves. Update ref images if needed (catlee)
[x] update inventory (catlee)
[x] update support docs (catlee)

Are we done here?
Comment 42 • 16 years ago
In bug 472779 I noticed that moz2-linux-slave07/08/09/10 are not up to date for scratchbox, so I'm guessing they are an older clone of the reference platform. That's from scratchbox not being in /builds/ and symlinked back to / (there's an update too). We need to reclone, or see if we can do the steps on the ref doc to bring them up to speed. slave14 seems to be fine, based on the existence of /builds/scratchbox.
Comment 43 • 16 years ago
I had to modify the umask in buildbot.tac to 022 on bm-xserve22, moz2-darwin9-slave06/07. They were spitting out snippets which the nightly update system didn't have perms to read.
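Since buildbot.tac is an ordinary Python file, a umask fix of the kind described above can be made near the top of it. The snippet below is an illustrative sketch under that assumption, not the actual file from bm-xserve22 or the darwin9 slaves.

```python
# Force the slave process umask to 022 so files it writes (such as nightly
# update snippets) come out world-readable (0644) and directories 0755,
# letting the nightly update system read them.
import os

os.umask(0o022)  # new files: 0666 & ~022 = 0644; new dirs: 0777 & ~022 = 0755
```

Buildbot's generated buildbot.tac also exposes a `umask` variable that is passed to the slave service, which is the cleaner place to set this when available.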
Comment 44 • 16 years ago (Assignee)
Attachment #357400 - Flags: review?(bhearsum)

Updated • 16 years ago
Attachment #357400 - Flags: review?(bhearsum) → review-

Comment 45 • 16 years ago
Comment on attachment 357400 [details] [diff] [review]
add moz2-linux-slave09 to production-master

Please add slave09 to mobile_master.py, too, and re-enable whichever other ones are fixed there.
Comment 46 • 16 years ago (Assignee)
Attachment #357400 - Attachment is obsolete: true
Attachment #357433 - Flags: review?(bhearsum)

Updated • 16 years ago
Attachment #357433 - Flags: review?(bhearsum) → review+

Updated • 16 years ago (Assignee)
Attachment #357433 - Flags: checked-in+

Comment 47 • 16 years ago (Assignee)
Comment on attachment 357433 [details] [diff] [review]
add moz2-linux-slave09 to production-master, and re-enable slave7,8,10 for mobile builds

changeset: 683:8907fea7ecd2
Comment 48 • 16 years ago (Assignee)
moz2-linux-slave07,08,09,10 were moved onto production yesterday and have been running fine.
Comment 49 • 16 years ago (Assignee)
Putting this one to rest.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Release Engineering