Closed
Bug 417887
Opened 16 years ago
Closed 15 years ago
linux buildbot masters/slaves should reboot ready for use
Categories: Release Engineering :: General (defect, P2)
Tracking: Not tracked
Status: RESOLVED FIXED
People
(Reporter: rhelmer, Assigned: bhearsum)
References
Details
Attachments
(4 files, 3 obsolete files)
740 bytes, text/plain | Details
3.98 KB, text/plain | Details
3.66 KB, patch | catlee: review+ | bhearsum: checked-in+ | Details | Diff | Splinter Review
13.66 KB, patch | bhearsum: checked-in+ | Details | Diff | Splinter Review
All of the buildbot masters and slaves should not require any extra prodding after a reboot. Masters should always start up everything needed on boot. For slaves, buildbot should be able to launch apps that talk to the GUI. This is done differently depending on the OS.
Updated•16 years ago
Priority: -- → P3
Comment 1•16 years ago
During 3.0beta3, and again now in 3.0beta4, we failed AliveTest because the X server was refusing connections. Fixed by running "xhost +", but this should be done automatically on reboot.

----------- Output from Profile Creation -------------
Xlib: connection to ":0.0" refused by server
...
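A hypothetical sketch of how the "xhost +" workaround could be automated at boot — e.g. from the build user's X session startup or an @reboot cron entry. The DISPLAY value, the retry loop, and the use of xdpyinfo as a readiness probe are all assumptions, not details from this bug:

```shell
#!/bin/sh
# Sketch only: open X access control once the X server is up, so the
# test harness can connect after an unattended reboot.
export DISPLAY=:0.0
for i in 1 2 3 4 5; do
    if xdpyinfo >/dev/null 2>&1; then
        # Disable access control entirely ("xhost +"); fine for an
        # isolated build VM, unsafe on a shared machine.
        xhost +
        break
    fi
    sleep 5   # X server not accepting connections yet; retry
done
```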
Comment 2•16 years ago
note: for 3.0beta4, this happened on both Mac and Linux, so it needs to be fixed on both.
Updated•16 years ago
Component: Build & Release → Release Engineering: Projects
QA Contact: build → release
Comment 3•16 years ago
Splitting out win32 specific details in bug#428123. Splitting out mac specific details in bug#428124.
Summary: all buildbot masters/slaves should reboot ready for use → linux buildbot masters/slaves should reboot ready for use
Comment 4•15 years ago
We chatted a bunch about this today and decided that part of this will be doing scheduled, periodic reboots of staging machines both to iron out kinks in the rebooting and to look for potential performance gains. I plan to start work on this next week.
Status: NEW → ASSIGNED
Component: Release Engineering: Future → Release Engineering
Updated•15 years ago
Assignee: nobody → bhearsum
Comment 5•15 years ago
I did some initial work on this today. Thanks to catlee already setting up a cronjob to make sure Xvnc and metacity are running, it seems the only thing left to do here is start Buildbot on boot. This can be done by dropping the two files I'm about to attach into /etc/default/buildbot and /etc/init.d/buildbot respectively, and then doing the following:
1. Ensure /builds/slave is the buildslave directory, or is symlinked to it.
2. Run 'chkconfig --add buildbot' as root.
3. Reboot to test it.
Before going ahead and deploying it, I want to get periodic reboots of the staging Linux slaves going to make sure everything comes up okay.
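The attachments themselves aren't reproduced here, but a minimal sketch of what a SysV init script for this might look like follows. The /builds/slave path and cltbld user come from this bug; the chkconfig header values and the `buildbot start`/`stop` invocations are assumptions, not the actual attached script:

```shell
#!/bin/sh
# chkconfig: 345 98 02
# description: start the Buildbot slave in /builds/slave on boot
# Sketch only -- not the attachment from this bug.

BASEDIR=/builds/slave
BUILDUSER=cltbld

case "$1" in
  start)
    # Run as the build user so the slave owns its own processes/files.
    su - "$BUILDUSER" -c "buildbot start $BASEDIR"
    ;;
  stop)
    su - "$BUILDUSER" -c "buildbot stop $BASEDIR"
    ;;
  restart)
    $0 stop
    $0 start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac
```

After installing it, `chkconfig --add buildbot` reads the `# chkconfig:` header to wire up the runlevel symlinks.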
Comment 6•15 years ago
Comment 7•15 years ago
Comment 8•15 years ago
As per the meeting yesterday, I'm going to investigate how difficult and time-consuming it would be to run fsck on every reboot, to help avoid cases where we burn builds after an ESX/storage problem.
Updated•15 years ago
Priority: P3 → P2
Comment 9•15 years ago
(In reply to comment #8)
> As per the meeting yesterday I'm going to investigate the difficulty and time
> consumption of running fsck on every reboot to help avoid cases where we burn
> builds after an ESX/storage problem.

As it turns out, it's pretty easy to do this, but it doesn't solve the problem we're trying to solve. When an ESX host or storage array goes down while a build is linking, we end up with a corrupt object file. The hope was that fscking would fix this, but it does not: fsck repairs filesystem metadata, not the contents of a half-written file. Given that, there's no point in forcing fscks on every boot. We can, however, solve the burning-after-problems issue by using the clobberer webpage to force clobbers before we start slaves back up.

So, two things left to do here:
1) Set up periodic reboots of Linux machines in staging to help ensure they always come up clean.
2) Deploy to production / the rest of staging.
Priority: P2 → P3
Updated•15 years ago
Priority: P3 → P2
Comment 10•15 years ago
Copy and paste instructions:

cd /etc/default
wget -Obuildbot https://bugzilla.mozilla.org/attachment.cgi?id=359078
cd /etc/init.d
wget -Obuildbot https://bugzilla.mozilla.org/attachment.cgi?id=359080
chmod +x buildbot
chkconfig --add buildbot
/etc/init.d/buildbot start
Comment 11•15 years ago
I just tested a periodic reboot patch on staging-master. I had to add the following line to sudoers to make it work, but after that it worked great:

cltbld ALL=NOPASSWD: /usr/bin/reboot

Patches to come.
Comment 12•15 years ago
This is basically a copy of the Talos code. The Build/Unittest factories end up calling the addPeriodicRebootSteps function since it has to be run at the end of a build.
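The Talos-style idea — count builds per slave and reboot only after every N of them, from a step that always runs at the end of a build — can be modeled roughly as below. This is a pure-Python sketch of the buildsBeforeReboot logic, not the actual addPeriodicRebootSteps code; the counter-file scheme is an assumption:

```python
import os

def should_reboot(counter_file, builds_before_reboot):
    """Increment a per-slave build counter on disk and report whether the
    slave has done enough builds to warrant a periodic reboot. Sketch of
    the buildsBeforeReboot idea, not the real implementation."""
    count = 0
    if os.path.exists(counter_file):
        with open(counter_file) as f:
            count = int(f.read().strip() or 0)
    count += 1
    if count >= builds_before_reboot:
        # Reset the counter; the real step would now run "sudo reboot".
        with open(counter_file, "w") as f:
            f.write("0")
        return True
    with open(counter_file, "w") as f:
        f.write(str(count))
    return False
```

In the master-side code the equivalent check lives in a build step marked to always run, so the counter advances even when earlier steps fail.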
Attachment #360107 - Flags: review?(catlee)
Comment 13•15 years ago
This patch enables reboots on Linux staging slaves. I've set up moz2-linux-slave03, 04, and 17 with the proper sudoers file and tested them all - they should all come up OK. I also wanted to get the code in for production so it's just a flip-of-the-switch if we ever care to do it. To be clear: I'm not planning to enable periodic reboots in production as part of this bug.
Attachment #360108 - Flags: review?(catlee)
Comment 14•15 years ago
Attachment #360107 - Attachment is obsolete: true
Attachment #360132 - Flags: review?(catlee)
Attachment #360107 - Flags: review?(catlee)
Comment 15•15 years ago
Attachment #360108 - Attachment is obsolete: true
Attachment #360134 - Flags: review?(catlee)
Attachment #360108 - Flags: review?(catlee)
Comment 16•15 years ago
Here are complete instructions for the rollout of this, when we do it:

cd /etc/default
wget -Obuildbot https://bugzilla.mozilla.org/attachment.cgi?id=359078
cd /etc/init.d
wget -Obuildbot https://bugzilla.mozilla.org/attachment.cgi?id=359080
chmod +x buildbot
chkconfig --add buildbot
/etc/init.d/buildbot start
sudo -e /etc/sudoers
# Add 'cltbld ALL=NOPASSWD: /usr/bin/reboot' as the last line, save and exit

After that's all done, it's a good idea to make sure cltbld can run 'reboot', and to reboot the slave to make sure it comes up clean. Take care to watch for it to connect to the Buildbot master:

su - cltbld
sudo reboot
Updated•15 years ago
Attachment #360132 - Flags: review?(catlee) → review+
Comment 17•15 years ago
Comment on attachment 360134 [details] [diff] [review] master-side patch

Looks good. Did you intend to enable periodic rebooting for linux on production?
Attachment #360134 - Flags: review?(catlee) → review+
Comment 18•15 years ago
(In reply to comment #17)
> (From update of attachment 360134 [details] [diff] [review])
> Looks good. Did you intend to enable periodic rebooting for linux on
> production?

Whoops, no, not at all.
Comment 19•15 years ago
Attachment #360134 - Attachment is obsolete: true
Comment 20•15 years ago
We're going to deploy this on Monday.
Comment 21•15 years ago
Comment on attachment 360319 [details] [diff] [review] master side patch, disable reboots in production

changeset: 933:37cbc1167a03
Attachment #360319 - Flags: checked-in+
Comment 22•15 years ago
Comment on attachment 360132 [details] [diff] [review] periodic reboots, buildsBeforeReboot/doPeriodicReboots combined; use alwaysRun

changeset: 189:f716fe07a806
Attachment #360132 - Flags: checked-in+
Comment 23•15 years ago
Alright, this got landed this morning. Only thing left to do here is to deploy these changes to the ref platform.
Comment 24•15 years ago
Ref platform updated. So, this has been deployed on the following slaves:

try-linux-slave01 - 06
moz2-linux-slave01 - 19
CentOS-5.0-ref-tools-vm
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 25•15 years ago
I'm re-opening this because I neglected to deploy this on our 1.8 and 1.9 machines - which is not as critical, but still worthwhile to do.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 26•15 years ago
Alright, the following VMs' Buildbot slaves/masters should start on boot now:

fx-linux-1.9-slave1, 2, 03, 04, 07, 08, 09
production-crazyhorse
production-prometheus-vm
production-prometheus-vm02
staging-1.9-master
staging-prometheus-vm
production-1.8-master
production-1.9-master
production-master
staging-master
sm-try-master
sm-staging-try-master

The following VMs were off, and are almost never used anymore:

staging-crozyhorse
staging-prometheus-vm02

I did not start them up to add the init script.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering