Closed Bug 619210 Opened 14 years ago Closed 14 years ago

[tracking bug] Graceful Shutdown of all masters with short build dir names and PYTHONPATH set, also remove symlinks in non-scl masters

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lsblakk, Assigned: bhearsum)

Details

(Whiteboard: [buildduty])

Keeping this un-prioritized since it's a buildduty bug.

The following masters need to have symlinks for util in their buildbotcustom removed prior to a restart:

pm01
pm02
talos-master02
test-master01
test-master02

The above masters (along with pm03) are already running the short dir patch, Rail's patch for bug 394498, and aki's patch for bug 613411, and are running fine. They are the lowest priority for graceful shutdown, and the symlink must be removed before each one is restarted.
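For whoever picks up the symlink removal, a minimal sketch of a guarded removal follows. The function name `remove_symlink` is hypothetical and the path you pass in is up to you; nothing here is taken from the actual masters. It only deletes the path if it really is a symlink, so a real `util` directory is never touched by mistake.

```shell
# Hypothetical helper: remove the given path only if it is a symlink.
# Returns 0 if the link was removed or was already absent,
# and 1 if something that is not a symlink sits at that path.
remove_symlink() {
    if [ -L "$1" ]; then
        rm -- "$1"
    elif [ -e "$1" ]; then
        echo "not a symlink, leaving in place: $1" >&2
        return 1
    fi
}
```

It would be called with the full path to the util link inside the master's buildbotcustom checkout, before the graceful restart.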

All the scl masters (test and build) need tools/lib/python added to PYTHONPATH in their Makefiles, a checkout of tools in the virtualenv, and a graceful restart. I will start on this now, get as many done today as I can, and then post a list of what is left to do.
pm01-bm        cc6500a7c5cf 2c3f9830112b
pm01-sm        cc6500a7c5cf 2c3f9830112b
pm02-sm        cc6500a7c5cf 2c3f9830112b
pm02-try       cc6500a7c5cf 2c3f9830112b
pm03-bm        cc6500a7c5cf 2c3f9830112b
test-master01  cc6500a7c5cf 2c3f9830112b
test-master02  cc6500a7c5cf 2c3f9830112b
talos-master02 cc6500a7c5cf 2c3f9830112b
bm3            cc6500a7c5cf 2c3f9830112b
bm4            cc6500a7c5cf 2c3f9830112b
tm3            cc6500a7c5cf 2c3f9830112b
tm4            cc6500a7c5cf 2c3f9830112b
tm5            cc6500a7c5cf 2c3f9830112b
tm6            cc6500a7c5cf 2c3f9830112b
These masters now have a tools dir in the master dir, the Makefile has a PYTHONPATH set, and they pass checkconfig and are ready for graceful shutdown.

bm3            
bm4            
tm3           
tm4           
tm5            
tm6
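The two file-level conditions above (a tools dir and a PYTHONPATH entry in the Makefile) can be spot-checked with something like the sketch below. `master_ready` is a hypothetical name, and the check only looks for the string PYTHONPATH in the Makefile; it does not run checkconfig or verify the value.

```shell
# Hypothetical helper: report whether a master directory looks ready.
# Checks for a tools directory and a Makefile that mentions PYTHONPATH.
# Runs in a subshell so its variables do not leak into the caller.
master_ready() (
    dir=$1
    status=0
    if [ ! -d "$dir/tools" ]; then
        echo "$dir: no tools checkout" >&2
        status=1
    fi
    if ! grep -q 'PYTHONPATH' "$dir/Makefile" 2>/dev/null; then
        echo "$dir: Makefile does not mention PYTHONPATH" >&2
        status=1
    fi
    exit "$status"
)
```

A non-zero return means at least one of the two items is missing; the messages on stderr say which.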
Also note that the try master and the tests-scheduler on pm02, and the scheduler-master on pm01, will need a graceful shutdown as well.
Graceful restart on tm3: 2010-12-14 16:11
Graceful restart on bm3: 2010-12-14 16:41
Graceful restart on bm3: 2010-12-14 16:43
Here is what will need doing tomorrow AM:

pm01-sm        remove symlink in /tools/buildbotcustom, graceful restart
pm02-sm        remove symlink in /tools/buildbotcustom, graceful restart
pm02-try       remove symlink in /tools/buildbotcustom, graceful restart
test-master01  remove symlink in /tools/buildbotcustom, graceful restart
test-master02  remove symlink in /tools/buildbotcustom, graceful restart
talos-master02 remove symlink in /tools/buildbotcustom, graceful restart

bm3            check for restart success
bm4            check for restart success
tm3            check for restart success
tm5            graceful restart at 2010-12-14 16:51, check for success

tm4            needs graceful_restart
tm6            needs graceful_restart
tm3 shut down but didn't restart. I restarted it manually with:
  cd /builds/buildbot/test_master3
  make checkconfig
  make start
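If this needs repeating on other masters, the same steps can be wrapped so that the start only happens when checkconfig passes. This is a sketch, not what was actually run: `start_master` and the `MAKE` override are hypothetical, and the only commands assumed are the `make checkconfig` and `make start` targets shown above.

```shell
# Hypothetical wrapper around the manual steps: change into the master
# directory, run "make checkconfig", and run "make start" only on success.
# MAKE may be overridden (e.g. for a dry run); it defaults to "make".
# Runs in a subshell so the caller's working directory is unchanged.
start_master() (
    cd "$1" || exit 1
    "${MAKE:-make}" checkconfig || exit 1
    "${MAKE:-make}" start
)
```

For the case above this would be called as `start_master /builds/buildbot/test_master3`.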
Current status:
buildbot-master1:
* build_master3 - DONE
* tests_master3 - DONE
* tests_master4 - DONE
buildbot-master2:
* build_master4 - DONE
* tests_master5 - DONE
* tests_master6 - DONE

test-master01:
* tests-master - DONE

test-master02:
* tests-master - DONE

talos-master02:
* tests-master - DONE

production-master03:
* builder_master - IN PROGRESS

I could not do any of the instances on production-master01 (builder_master1, scheduler_master) or production-master02 (try-trunk-master, tests-scheduler) because the former is tied up doing 4.0b8 and the latter is too busy to consider a graceful shutdown. I hope to get to them either late tonight, early tomorrow, or Friday during the downtime.

Additionally, I found that tools checkouts had been wrongly placed inside the "master" directory for all masters on buildbot-master1 and buildbot-master2. I fixed this by moving them to /builds/buildbot/$master and adding a symlink at the previous location. We can remove that symlink any time after their next restart.
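For reference, the move-and-symlink fix can be sketched roughly as below. This assumes the layout is $base/master/tools moving up to $base/tools, which is my reading of the description rather than something confirmed here, and `relocate_tools` is a hypothetical name.

```shell
# Hypothetical helper: move BASE/master/tools up to BASE/tools and leave a
# relative symlink at the old location so the old path keeps resolving
# until the master is next restarted.
# Refuses to run if the old location is already a symlink or the new
# location already exists.
relocate_tools() (
    base=$1
    if [ ! -d "$base/master/tools" ] || [ -L "$base/master/tools" ]; then
        echo "$base: no real tools directory under master" >&2
        exit 1
    fi
    if [ -e "$base/tools" ]; then
        echo "$base: tools already exists at the new location" >&2
        exit 1
    fi
    mv "$base/master/tools" "$base/tools" &&
        ln -s ../tools "$base/master/tools"
)
```

The link is relative (`../tools`), so it stays valid if the whole master directory is later moved or renamed.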
Assignee: nobody → bhearsum
All masters are fixed up now.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering