Closed
Bug 966070
Opened 11 years ago
Closed 11 years ago
s/m1.large/m3.medium/
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: taras.mozilla, Assigned: massimo)
References
()
Details
Attachments
(22 files, 4 obsolete files)
Amazon's newer nodes are usually more cost-efficient.
This gets us about $100 in savings :)
More seriously, it's a big step up in ECUs and disk->ssd. We can also try m3.mediums as a stretch goal for about $3K in potential savings, but that's more risky.
Comment 1•11 years ago
|
||
Random note: by throwing more m3.medium instances at the problem, it might work and still save money. Depends how well adding masters would work out, though.
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #1)
> Random note: by throwing more m3.medium instances at the problem, it might
> work and still save money. Depends how well adding masters would work out,
> though.
I'm guessing this is a fairly fixed number of machines, e.g. it doesn't grow as rapidly as the number of compilation hours does. It makes sense to do something easy here: buy some reserved instances and call it done.
Reporter | ||
Comment 4•11 years ago
|
||
So according to graphite, these nodes are pretty damn idle on average. m3.medium
https://graphite.mozilla.org/render?from=-24hours&until=now&width=1800&height=900&target=alias(scale(sum(sum(groupByNode(exclude(hosts.buildbot-master*_us*.cpu.*.cpu.*, "(steal)"), 1, "sum")),sum(groupByNode(scale(hosts.buildbot-master*_us*.cpu.*.cpu.steal,0.5),1,"sum"))),0.01), "number of slaves")&target=alias(stacked(avg(groupByNode(exclude(hosts.buildbot-master*_us*.cpu.*.cpu.*, "(steal|idle|wait)"), 1, "sum"))), "cpu busy")&target=alias(stacked(avg(groupByNode(hosts.buildbot-master*_us*.cpu.*.cpu.idle, 1, "sum"))), "idle")&target=alias(stacked(avg(groupByNode(scale(hosts.buildbot-master*_us*.cpu.*.cpu.steal,0.5), 1, "sum"))), "ec2 steal")&target=alias(stacked(avg(groupByNode(hosts.buildbot-master*_us*.cpu.*.cpu.wait, 1, "sum"))), "IO wait")&title=m1.large(buildbot) tester efficiency 24hours
Reporter | ||
Comment 5•11 years ago
|
||
So according to graphite, these nodes are pretty damn idle on average. m3.medium should do fine
Assignee | ||
Updated•11 years ago
|
Assignee: rail → mgervasini
Updated•11 years ago
|
Comment 6•11 years ago
|
||
err, wrong bug
Assignee | ||
Comment 7•11 years ago
|
||
Quick update:
here's the list of our m1.large instances in the east region:
buildbot-master51
buildbot-master52
buildbot-master55
buildbot-master57
buildbot-master59
buildbot-master61
buildbot-master62
buildbot-master63
buildbot-master67
buildbot-master69
buildbot-master70
buildbot-master71
buildbot-master75
buildbot-master76
buildbot-master77
All the instances above are using less than 10 GB of disk space, between 3 and 8 GB of RAM, and their CPUs are not under much stress (the exception is buildbot-master51, a linux test master, which uses ~25 GB of disk)
Tomorrow, I'll find the machine in this list that uses the fewest resources, convert it to an m3.medium instance, and let it run over the weekend to see how it behaves.
Assignee | ||
Comment 8•11 years ago
|
||
I've found a good candidate in buildbot-master52. Unfortunately, I discovered after the graceful shutdown that bm52 cannot be recreated as m3.medium or m3.large, because these instance types are not available in the us-east-1b availability zone; the error message says that m3.medium is available in us-east-1a, us-east-1c and us-east-1d.
I am restarting buildbot-master52 as m1.large (as it was before) and then I'll find another candidate in a different availability zone.
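As a side note, the constraint above lends itself to a simple filter when picking the next candidate. This is only an illustrative sketch: the zone availability data is hard-coded from the error message above, not fetched from the EC2 API.

```python
# Zones where m3.medium is offered, per the error message above (illustrative).
M3_MEDIUM_ZONES = frozenset({"us-east-1a", "us-east-1c", "us-east-1d"})

def eligible_candidates(masters, allowed_zones=M3_MEDIUM_ZONES):
    """Return the masters whose availability zone offers m3.medium."""
    return sorted(name for name, zone in masters.items() if zone in allowed_zones)

masters = {
    "buildbot-master52": "us-east-1b",  # excluded: no m3.medium in us-east-1b
    "buildbot-master59": "us-east-1d",
    "buildbot-master61": "us-east-1c",
}
print(eligible_candidates(masters))
# ['buildbot-master59', 'buildbot-master61']
```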
Assignee | ||
Comment 9•11 years ago
|
||
Update:
more data on our m1.large masters in the east region:
Master   Zone      Memory in   Disk
name               use [MB]    usage
------------------------------------
bm51 east-1a 5353 25G
bm52 east-1b 2050 6.3G
bm59 east-1d 3221 3.3G
bm61 east-1c 4132 4.3G
bm62 east-1b 4522 4.3G
bm63 east-1b 4530 4.0G
bm67 east-1c 4477 7.1G
bm69 east-1a 2928 8.0G
bm70 east-1c 2937 7.2G
bm71 east-1c 2920 7.1G
bm75 east-1b 2199 4.3G
bm76 east-1a 2382 5.7G
bm77 east-1c 1984 6.8G
* Instances in east-1b cannot be converted to m3.medium or m3.large
* Memory in use is total memory minus free memory (free + buffers + cached):
+ m3.medium has 3.75 GB RAM (3840 MB)
+ m3.large has 7.5 GB RAM
+ m1.large has 7.5 GB RAM
* Disk usage:
on build masters, 3.7 GB is used for /dev/shm (which defaults to half the RAM size)
m1.large and m3.large are just "too big" for the build master role; m3.medium will probably just work, but since there is not a wide margin (in terms of RAM and disk), I prefer not to do the switch during the weekend and will wait until next Monday.
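The sizing reasoning above can be written down as a quick fit check. A hedged sketch: the 20% headroom and the 15 GB root disk are assumptions for illustration, not RelEng policy.

```python
M3_MEDIUM_RAM_MB = 3840   # 3.75 GB, per the notes above
M3_MEDIUM_DISK_GB = 15.0  # assumed root volume size

def fits_m3_medium(mem_in_use_mb, disk_used_gb, headroom=0.2):
    """True if a master fits on m3.medium with some headroom to spare."""
    return (mem_in_use_mb <= M3_MEDIUM_RAM_MB * (1 - headroom)
            and disk_used_gb <= M3_MEDIUM_DISK_GB * (1 - headroom))

print(fits_m3_medium(1984, 6.8))   # bm77, smallest footprint in the table
print(fits_m3_medium(5353, 25.0))  # bm51 clearly does not fit
```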
Assignee | ||
Comment 10•11 years ago
|
||
Update:
bm77 is now an m3.medium instance. I'll post some data about memory/disk/cpu usage as soon as this master gets some jobs done.
Reporter | ||
Updated•11 years ago
|
Summary: s/m1.large/m3.large/ → s/m1.large/m3.medium/
Comment 11•11 years ago
|
||
It would be good to add some swap to cope with long-running master instances, which gradually accumulate memory, e.g. this one, up since Jan 14:
[cltbld@buildbot-master51.srv.releng.use1.mozilla.com ~]$ free -m
total used free shared buffers cached
Mem: 7467 7159 308 0 495 2379
-/+ buffers/cache: 4284 3183
Swap: 0 0 0
That's more than an m3.medium has.
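The arithmetic behind that claim, using the numbers from the free -m output above:

```python
M3_MEDIUM_RAM_MB = 3840  # 3.75 GB

# Values from the free -m output above (MB)
used, buffers, cached = 7159, 495, 2379

# The "-/+ buffers/cache" row: memory actually held by applications,
# since buffers and page cache can be reclaimed by the kernel.
app_used = used - (buffers + cached)
print(app_used)                     # 4285, matching the 4284 shown (rounding)
print(app_used > M3_MEDIUM_RAM_MB)  # True: bm51 would not fit without swap
```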
Assignee | ||
Comment 12•11 years ago
|
||
some data from the last 4 days:
memory: http://mzl.la/1jGbX6z
load: http://mzl.la/1fyeUCm
there are some spikes in the cpu load.
I have created a swap file, using the add_swap script from bug 964880, to see how this affects the performance of bm77
Reporter | ||
Comment 13•11 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #12)
> some data from last 4 days:
>
> memory: http://mzl.la/1jGbX6z
> load: http:/mzl.la/1fyeUCm
>
> there are some spikes in the cpu load.
>
> I have created a swap file, using the add_swap script from bug 964880, to
> see how this affects the performances of bm77
note cpu.idle and cpu.wait are the useful metrics to track
Assignee | ||
Comment 14•11 years ago
|
||
Here are the graphs for cpu idle and wait
cpu.idle: http://mzl.la/1dQvGQz
cpu.wait: http://mzl.la/1fBxOrM
There's no data for the last two days. Is there anything wrong with the data collection on bm77 or am I doing something wrong with graphite?
Here is some data from the buildbot database: I extracted the time required to execute the get_basedir command on the slave, from the initial request to the end of the job. get_basedir runs the "pwd" command on the slave and executes almost immediately, so this is a good measurement of the time the get_basedir command spends in the master's queue.
buildbot master 77, get_basedir command, execution time in seconds:
instance type   period           avg      max   90th percentile
m1.large        1-13 Feb         0.6119    42   1
m3.medium       15 Feb - today   0.6642   119   1
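For reference, the avg/max/90th-percentile figures above can be derived from raw samples along these lines (the sample list here is made up for illustration; the real data comes from the buildbot database):

```python
import statistics

def summarize(durations_s):
    """Return (avg, max, 90th percentile) of command execution times."""
    ordered = sorted(durations_s)
    p90 = ordered[int(0.9 * (len(ordered) - 1))]  # nearest-rank style
    return statistics.mean(ordered), max(ordered), p90

# Illustrative get_basedir timings: mostly sub-second, one pathological outlier.
samples = [0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 1.0, 42.0]
avg, worst, p90 = summarize(samples)
print(round(avg, 2), worst, p90)  # 4.78 42.0 1.0
```

Note how a single stuck command dominates the max while barely moving the 90th percentile, which matches the table: the percentile is the more useful health metric here.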
Reporter | ||
Comment 15•11 years ago
|
||
Assignee | ||
Comment 16•11 years ago
|
||
I have created a new m3.medium instance from scratch (bm70); there are still a couple of things to fix:
* create a swap file
* add local instance storage
Depends on: 964880
Assignee | ||
Comment 17•11 years ago
|
||
Hi rail,
this patch enables m3.medium masters with the following specs:
* a 15 GB disk mounted under /
* the instance storage (4 GB, SSD), mounted under /mnt/instance_storage, which will be used for swap (bug 964880).
It also sets the moz-type for these instances to "buildbot-master"
Attachment #8382917 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #8382917 -
Flags: review?(rail) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8382917 -
Flags: checked-in+
Assignee | ||
Comment 18•11 years ago
|
||
puppet patch to enable buildbot-master70 as a build master
Attachment #8383129 -
Flags: review?(rail)
Assignee | ||
Comment 19•11 years ago
|
||
this patch adds buildbot-master70 in production-masters.json
Attachment #8383132 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #8383129 -
Flags: review?(rail) → review+
Comment 20•11 years ago
|
||
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch
Review of attachment 8383132 [details] [diff] [review]:
-----------------------------------------------------------------
::: buildfarm/maintenance/production-masters.json
@@ +631,5 @@
> + "buildbot_branch": "production-0.8",
> + "buildbot_python": "/builds/buildbot/build1/bin/python",
> + "buildbot_setup": "/builds/buildbot/build1/buildbot/master/setup.py",
> + "buildbot_version": "0.8.2",
> + "datacentre": "scl3",
A nit, use aws-us-east-1 instead.
Attachment #8383132 -
Flags: review?(rail) → review+
Comment 21•11 years ago
|
||
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch
Review of attachment 8383132 [details] [diff] [review]:
-----------------------------------------------------------------
::: buildfarm/maintenance/production-masters.json
@@ +653,5 @@
> + "basedir": "/builds/buildbot/try1",
> + "bbconfigs_branch": "production",
> + "bbconfigs_dir": "/builds/buildbot/try1/buildbot-configs",
> + "bbcustom_branch": "production-0.8",
> + {
[drive-by-lucky-notice]
ugh -- this will break the whole json.
{
..
stuff
...
{
--- basically it looks like your copy/paste included more things than you intended.
r- due to that, but if you fix it no objection
Attachment #8383132 -
Flags: review-
Assignee | ||
Updated•11 years ago
|
Attachment #8383129 -
Flags: checked-in+
Assignee | ||
Comment 22•11 years ago
|
||
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch
Thanks rail, thanks Callek!
Attachment #8383132 -
Flags: checked-in+
Assignee | ||
Comment 23•11 years ago
|
||
we have an m3.medium instance, buildbot-master70, in production. This master has two slaves locked to it: bld-linux64-ec2-003 and bld-linux64-ec2-038. I'm going to leave it running over the weekend to see whether the new instance type behaves properly.
I will check on it on Saturday as well.
Assignee | ||
Comment 24•11 years ago
|
||
bm70 is not able to sign because of a "token generation failed, error message: 403 Forbidden" exception.
This patch adds bm70's IP address to signing_new_token_allowed_ips.
Attachment #8384637 -
Flags: review?(bhearsum)
Updated•11 years ago
|
Attachment #8384637 -
Flags: review?(bhearsum) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8384637 -
Flags: checked-in+
Assignee | ||
Comment 25•11 years ago
|
||
Changes for buildbot-master70 have landed in production and all the builds are green.
I am enabling bm70 in slavealloc and unlocking its slaves (bld-linux64-ec2-003 and bld-linux64-ec2-038)
Assignee | ||
Comment 26•11 years ago
|
||
bm70 is working fine; I am starting to replace the existing masters in east-1 with m3.medium instances.
On east-1 we have:
* 1 try master (bm59)
* 4 test masters (bm51, bm52, bm67, bm69)
* 5 build masters (bm57, bm61, bm62, bm63, bm70)
I am going to create 4 new masters to replace bm57, bm61, bm62 and bm63
Assignee | ||
Comment 27•11 years ago
|
||
adding buildbot-master 71..74 to production-masters.json
Attachment #8386041 -
Flags: review?(sbruno)
Assignee | ||
Comment 28•11 years ago
|
||
This patch recreates buildbot-master 71..74 as build masters and adds their IP addresses in signing_new_token_allowed_ips list.
Here's the list of IP addresses:
buildbot-master71: 10.134.49.111
buildbot-master72: 10.132.50.54
buildbot-master73: 10.132.49.181
buildbot-master74: 10.132.49.94
buildbot-master 72, 73 and 74 are in aws west-2 zone
Attachment #8386046 -
Flags: review?(sbruno)
Assignee | ||
Comment 29•11 years ago
|
||
fixed wrong datacentre and hostnames for nodes in west-2 region (bm72, bm73 and 74)
Attachment #8386041 -
Attachment is obsolete: true
Attachment #8386041 -
Flags: review?(sbruno)
Attachment #8386058 -
Flags: review?(sbruno)
Comment 30•11 years ago
|
||
Comment on attachment 8386058 [details] [diff] [review]
[tools] - add bm{71..74} to production-master.json.patch
Hey mgerva,
The name key for each of the new entries is the same: "bm70-build1", I guess this needs to be fixed.
Attachment #8386058 -
Flags: review?(sbruno) → review-
Assignee | ||
Comment 31•11 years ago
|
||
Thanks Simone,
here is the patch with the "name" keys updated
Attachment #8386058 -
Attachment is obsolete: true
Attachment #8386064 -
Flags: review?(sbruno)
Updated•11 years ago
|
Attachment #8386046 -
Flags: review?(sbruno) → review+
Comment 32•11 years ago
|
||
Comment on attachment 8386064 [details] [diff] [review]
[tools] - add bm{71..74} to production-master.json.patch
Now it looks ok to me.
To review, I compared this patch with the patches for buildbot-master70 (now in prod), and all relevant code parts seem to have been updated correctly.
Attachment #8386064 -
Flags: review?(sbruno) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8386064 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Attachment #8386046 -
Flags: checked-in+
Comment 33•11 years ago
|
||
Comment on attachment 8386046 [details] [diff] [review]
[puppet] recreate bm{71..74} as build masters and add them in signing_new_token_allowed_ips.patch
merged to production
Assignee | ||
Comment 34•11 years ago
|
||
I am rolling out the first batch of m3.medium build masters in production.
bm70, bm71, bm72, bm73 and bm74 in west-2 are now online and already receiving new jobs. Each of the new instances will replace an m1.large instance as follows:
* bm70 -> bm63
* bm71 -> bm61
* bm72 -> bm58
* bm73 -> bm64
* bm74 -> bm65
After a clean shutdown, the m1.large instances will be stopped in aws so they can be re-enabled in case of emergency.
Assignee | ||
Comment 35•11 years ago
|
||
Second batch of masters, 1 build master and 4 try masters, as follows:
try:
bm75, bm76 in east-1
bm78, bm79 in west-2
build:
bm77 in east-1
this patch sets enabled => false for all the masters above and for bm58, bm61, bm63, bm64 and bm65, as part of the migration of the first batch
Attachment #8387036 -
Flags: review?(rail)
Assignee | ||
Comment 36•11 years ago
|
||
Enable 4 new try masters:
* bm75, bm76 east-1
* bm78, bm79 west-2
enable a build master in east-1: bm77 (so we have 3 build masters per region)
this patch also adds bm77's IP address (10.134.49.77) to signing_new_token_allowed_ips
Attachment #8387039 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #8387036 -
Flags: review?(rail) → review+
Assignee | ||
Comment 37•11 years ago
|
||
added new try masters IPs to signing_new_token_allowed_ips
Attachment #8387039 -
Attachment is obsolete: true
Attachment #8387039 -
Flags: review?(rail)
Attachment #8387057 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #8387057 -
Flags: review?(rail) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8387036 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Attachment #8387057 -
Flags: checked-in+
Comment 38•11 years ago
|
||
Something here broke Firefox 28.0b9.
bm71-74 are enabled in slavealloc and accept new jobs, but they are not enabled in production-masters.json, which prevents fabric reconfigs on them. release-runner uses this file to reconfig only the enabled masters, so it skipped the ones listed above.
As a result we've seen a failed bouncer submitter job trying to submit jobs for 28.0b8. I also cancelled 2 xulrunner builds to be sure.
I disabled the masters in slavealloc and started a graceful shutdown
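The mismatch described above comes down to two sources of truth disagreeing. A minimal sketch of the release-runner side (the "name"/"enabled" fields follow production-masters.json, but this is illustrative, not the actual release-runner code):

```python
def masters_to_reconfig(production_masters):
    """release-runner only reconfigs masters marked enabled in the JSON."""
    return [m["name"] for m in production_masters if m.get("enabled")]

production_masters = [
    {"name": "bm70", "enabled": True},
    {"name": "bm71", "enabled": False},  # accepting jobs via slavealloc, yet skipped here
]
print(masters_to_reconfig(production_masters))  # ['bm70']
```

A master that is live in slavealloc but disabled (or missing) here silently keeps running stale config, which is exactly the failure mode seen with 28.0b9.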
Comment 39•11 years ago
|
||
I also want to call out, if not already done, that when they are brought back up we need to go into nagios and re-enable notifications for these hosts; otherwise they could go down and no one would be the wiser.
Assignee | ||
Comment 40•11 years ago
|
||
bm75's IP address is in the east-1b subnet, but m3.medium instances are not available in this subnet. Deleting this entry from puppet.
Attachment #8387689 -
Flags: review?(rail)
Assignee | ||
Comment 41•11 years ago
|
||
Deleting bm75 from the production-masters.json file - it cannot be puppetized as m3.medium, so we should remove the entry created in attachment 8387036 [details] [diff] [review].
Attachment #8387691 -
Flags: review?(rail)
Assignee | ||
Comment 42•11 years ago
|
||
bm75's IP address is in the east-1b subnet, but m3.medium instances are not available in this subnet. Deleting this entry from puppet... and its IP address from signing_new_token_allowed_ips
Attachment #8387689 -
Attachment is obsolete: true
Attachment #8387689 -
Flags: review?(rail)
Attachment #8387695 -
Flags: review?(rail)
Comment 43•11 years ago
|
||
Comment on attachment 8387691 [details] [diff] [review]
[tools] remove buildbot-master75 from production-masters.json.patch
I'd rather change the DNS entry for bm75 in the inventory instead. Just find a free IP in another subnet and change the A and PTR entries in the inventory: https://inventory.mozilla.org/en-US/core/search/#q=10.134.48.158
Attachment #8387691 -
Flags: review?(rail) → review-
Updated•11 years ago
|
Attachment #8387695 -
Flags: review?(rail) → review-
Assignee | ||
Comment 44•11 years ago
|
||
A patch to enable bm71..74 and bm76..79 has just landed in tools (changeset: 4427:13eedba9148f)
The new masters have been enabled in slavealloc too.
Assignee | ||
Comment 45•11 years ago
|
||
All the new masters are on-line and working.
I have started the shutdown procedure for the old masters.
Comment 46•11 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #45)
> All the new masters are on-line and working.
>
> I have started the shutdown procedure for the old masters.
These are starting to alert in nagios. Please downtime or otherwise silence the checks in the future. I've ack'ed the one that just went off.
Assignee | ||
Comment 47•11 years ago
|
||
Disable buildbot masters 56, 57, 59 and 60 in production-masters.json before stopping the old master instances.
Attachment #8389816 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8389816 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8389816 -
Flags: checked-in+
Assignee | ||
Comment 48•11 years ago
|
||
The following masters have been stopped:
* bm56
* bm57
* bm58
* bm59
* bm60
* bm63
* bm64
two old master instances are still running because they host some other services:
* bm61 - self serve agent
* bm65 - slave rebooter
Assignee | ||
Comment 49•11 years ago
|
||
This patch enables selfserve_agent on bm70, bm71, bm72 and bm73 and creates two new build masters: bm91 and bm94
Attachment #8393003 -
Flags: review?(catlee)
Assignee | ||
Comment 50•11 years ago
|
||
Create two new build masters: bm91 and bm94
Attachment #8393004 -
Flags: review?(catlee)
Comment 51•11 years ago
|
||
Comment on attachment 8393004 [details] [diff] [review]
[tools] create two new buildmasters: bm91 and bm94.patch
Review of attachment 8393004 [details] [diff] [review]:
-----------------------------------------------------------------
::: buildfarm/maintenance/production-masters.json
@@ +1176,5 @@
> + "buildbot_branch": "production-0.8",
> + "buildbot_python": "/builds/buildbot/build1/bin/python",
> + "buildbot_setup": "/builds/buildbot/build1/buildbot/master/setup.py",
> + "buildbot_version": "0.8.2",
> + "datacentre": "aws-us-east-1",
should be aws-us-west-2 I think?
Attachment #8393004 -
Flags: review?(catlee) → review+
Updated•11 years ago
|
Attachment #8393003 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8393003 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Attachment #8393004 -
Flags: checked-in+
Assignee | ||
Comment 52•11 years ago
|
||
Removing the bm{56..65} entries from production-masters.json - these instances are already disabled in slavealloc.
Attachment #8393078 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8393078 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 53•11 years ago
|
||
remove the buildbot master entries for bm{56..65} from nodes and remove their IPs from signing_new_token_allowed_ips.
Attachment #8393088 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8393088 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 54•11 years ago
|
||
Current status:
m1.large masters
* bm{56..65} instances have been stopped, except for bm61, bm62 and bm65, which are still running extra services that have not been migrated yet (self-serve agent, slave rebooter; bug 985088)
* bm55 and bm66 still in production
m3.medium masters
* bm{70..74}, bm{76..79} are on-line and taking jobs
* bm75 is on hold because it has a new IP address and this requires new flows (bug 985055)
* bm91 and bm94 are running test jobs - if tests pass, they will replace bm55 and bm66
Assignee | ||
Comment 55•11 years ago
|
||
New masters have been running for several hours and all the builds are green. This patch enables them in production-masters.json.
When this patch lands, I'll start a graceful shutdown of bm55 and bm66
Attachment #8393509 -
Flags: review?(catlee)
Assignee | ||
Updated•11 years ago
|
Attachment #8393078 -
Flags: checked-in+
Updated•11 years ago
|
Attachment #8393509 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 56•11 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #55)
> When this patch lands, I'll start a graceful shutdown of bm55 and bm66
err, bm55 is a try master; I'm not stopping it.
Assignee | ||
Updated•11 years ago
|
Attachment #8393509 -
Flags: checked-in+
Assignee | ||
Comment 57•11 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #56)
> (In reply to Massimo Gervasini [:mgerva] from comment #55)
>
> > When this patch lands, I'll start a graceful shutdown of bm55 and bm66
>
> err bm55 is a try master, I'm not stopping it.
Please ignore the above comment, I am stopping bm55.
Assignee | ||
Updated•11 years ago
|
Attachment #8393088 -
Flags: checked-in+
Assignee | ||
Comment 58•11 years ago
|
||
Disable buildbot master 55 and 66; removing their IP addresses from signing_new_token_allowed_ips
Attachment #8393583 -
Flags: review?(catlee)
Assignee | ||
Comment 59•11 years ago
|
||
Remove buildbot master 55 and 66 from production-masters.json
Buildbot master 66 is still running some jobs, so this patch (and the puppet one) will land once the master has stopped properly.
Attachment #8393584 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8393584 -
Flags: review?(catlee) → review+
Updated•11 years ago
|
Attachment #8393583 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8393583 -
Flags: checked-in+
Assignee | ||
Updated•11 years ago
|
Attachment #8393584 -
Flags: checked-in+
Assignee | ||
Comment 60•11 years ago
|
||
Status update:
New m3.medium instances, bm{70..79}, bm91 and bm94:
* all the new m3.medium masters are active (except for buildbot-master75 - bug 985055)
Old m1.large instances, bm{55..66}
* the following instances have been terminated:
bm56
bm57
bm58
bm59
bm60
bm63
bm64
* leaving the following instance stopped for a while (it has been stopped for a few hours):
bm55
* the following instances are still up and running because they host other services (the buildbot process is stopped):
bm61
bm62
bm65
bm66
Assignee | ||
Comment 61•11 years ago
|
||
Two green builds on bm75, waiting for the current job to finish to enable master in production-masters.json
Attachment #8394862 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #8394862 -
Flags: review?(rail) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8394862 -
Flags: checked-in+
Assignee | ||
Comment 62•11 years ago
|
||
Buildbot-master75 is in production. All the try/build buildbot masters on m1.large have been migrated to m3.medium instances.
Some old m1.large machines require further steps before we can terminate them, because they still have running services:
bm61: self serve
bm62: self serve
bm65: self serve, slave rebooter
bm66: self serve, gaia_bumper, b2g_bumper (Bug 985767 and Bug 986474)
* self serve is already running on the m3.medium machines (bm70, 71, 72 and 73), but it needs to be tested.
* slave rebooter requires another puppet patch that needs to be coordinated with buildduty
* gaia_bumper and b2g_bumper will never fit on an m3.medium instance because of their disk and cpu usage, so bm66 will stay up for a while.
Assignee | ||
Comment 63•11 years ago
|
||
This patch removes selfserve_agent from bm61, bm62, bm63 and bm65.
Waiting for nagios notifications to be enabled on the new self serve agents before landing this patch (bug 987173)
Attachment #8395713 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8395713 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8395713 -
Flags: checked-in+
Assignee | ||
Comment 64•11 years ago
|
||
Preparing to migrate slaverebooter from bm65 to bm74. Before landing this patch we need to wait for nagios monitoring of slaverebooter on bm74 (bug 987793).
Attachment #8396477 -
Flags: review?(catlee)
Assignee | ||
Comment 65•11 years ago
|
||
bm55 was still active in the puppet master definitions. This patch removes it, along with its IP address from signing_new_token_allowed_ips
Attachment #8396536 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8396477 -
Flags: review?(catlee) → review+
Updated•11 years ago
|
Attachment #8396536 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8396536 -
Flags: checked-in+
Comment 66•11 years ago
|
||
I migrated bm69 to an m3.medium today.
Comment 67•11 years ago
|
||
buildbot-master-servo-01 has been migrated to an m3.medium, too.
Assignee | ||
Updated•11 years ago
|
Attachment #8396477 -
Flags: checked-in+
Assignee | ||
Comment 68•11 years ago
|
||
Slave rebooter moved from bm65 to bm74.
Assignee | ||
Comment 69•11 years ago
|
||
Hi Ben,
Are there any other buildbot-masters to migrate to m3.medium?
Flags: needinfo?(bhearsum)
Comment 70•11 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #69)
> Hi Ben,
>
> Are there any other buidbot-masters to migrate to m3.medium?
I don't think so. The only ones left are:
51, 52, 53, 54, 67, 68 -- which are all linux64 test masters, which need more RAM than the m3.medium has.
65 - which is already stopped
66 - which I think is having its services moved elsewhere?
Flags: needinfo?(bhearsum)
Assignee | ||
Comment 71•11 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #70)
> (In reply to Massimo Gervasini [:mgerva] from comment #69)
> > Hi Ben,
> >
> > Are there any other buidbot-masters to migrate to m3.medium?
>
> I don't think so. The only ones left are:
> 51, 52, 53, 54, 67, 68 -- which are all linux64 test masters, which need
> more RAM than the m3.medium has.
>
> 65 - which is already stopped
> 66 - which I think is having its services moved elsewhere?
* bm65 is stopped, waiting for all the netflows/nagios notifications to be closed (bug 990775)
* bm66 is tracked here: Bug 990172
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED