Status

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: taras.mozilla, Assigned: massimo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(22 attachments, 4 obsolete attachments)

2.15 KB, patch (rail: review+, massimo: checked-in+)
538 bytes, patch (rail: review+, massimo: checked-in+)
1.89 KB, patch (rail: review+, Callek: review-, massimo: checked-in+)
370 bytes, patch (bhearsum: review+, massimo: checked-in+)
2.18 KB, patch (sbruno: review+, massimo: checked-in+)
5.40 KB, patch (sbruno: review+, massimo: checked-in+)
7.45 KB, patch (rail: review+, massimo: checked-in+)
2.48 KB, patch (rail: review+, massimo: checked-in+)
1.51 KB, patch (rail: review-)
932 bytes, patch (rail: review-)
1.59 KB, patch (catlee: review+, massimo: checked-in+)
2.60 KB, patch (catlee: review+, massimo: checked-in+)
2.91 KB, patch (catlee: review+, massimo: checked-in+)
12.05 KB, patch (catlee: review+, massimo: checked-in+)
4.15 KB, patch (catlee: review+, massimo: checked-in+)
925 bytes, patch (catlee: review+, massimo: checked-in+)
1.41 KB, patch (catlee: review+, massimo: checked-in+)
2.68 KB, patch (catlee: review+, massimo: checked-in+)
564 bytes, patch (rail: review+, massimo: checked-in+)
1.08 KB, patch (catlee: review+, massimo: checked-in+)
641 bytes, patch (catlee: review+, massimo: checked-in+)
852 bytes, patch (catlee: review+, massimo: checked-in+)
(Reporter)

Description

5 years ago
Amazon's newer nodes are usually more cost-efficient.

This gets us about $100 in savings :)

More seriously, it's a big step up in ECUs and disk->ssd. We can also try m3.mediums as a stretch goal for about $3K in potential savings, but that's more risky.
Random note: by throwing more m3.medium instances at the problem, it might work and still save money. Depends how well adding masters would work out, though.
I'll take a look at this.
Assignee: nobody → rail
(Reporter)

Comment 3

5 years ago
(In reply to Mike Hommey [:glandium] from comment #1)
> Random note: by throwing more m3.medium instances at the problem, it might
> work and still save money. Depends how well adding masters would work out,
> though.

I'm guessing this is a fairly fixed number of machines. Eg it doesn't grow as rapidly as number of compilation hours does. It makes sense to do something easy here, buy some reserves and call it done.
(Reporter)

Comment 4

5 years ago
So according to graphite, these nodes are pretty damn idle on average. m3.medium


https://graphite.mozilla.org/render?from=-24hours&until=now&width=1800&height=900&target=alias(scale(sum(sum(groupByNode(exclude(hosts.buildbot-master*_us*.cpu.*.cpu.*, "(steal)"), 1, "sum")),sum(groupByNode(scale(hosts.buildbot-master*_us*.cpu.*.cpu.steal,0.5),1,"sum"))),0.01), "number of slaves")&target=alias(stacked(avg(groupByNode(exclude(hosts.buildbot-master*_us*.cpu.*.cpu.*, "(steal|idle|wait)"), 1, "sum"))), "cpu busy")&target=alias(stacked(avg(groupByNode(hosts.buildbot-master*_us*.cpu.*.cpu.idle, 1, "sum"))), "idle")&target=alias(stacked(avg(groupByNode(scale(hosts.buildbot-master*_us*.cpu.*.cpu.steal,0.5), 1, "sum"))), "ec2 steal")&target=alias(stacked(avg(groupByNode(hosts.buildbot-master*_us*.cpu.*.cpu.wait, 1, "sum"))), "IO wait")&title=m1.large(buildbot) tester efficiency 24hours
(Reporter)

Comment 5

5 years ago
So according to graphite, these nodes are pretty damn idle on average. m3.medium should do fine

Updated

5 years ago
Depends on: 969590
(Assignee)

Updated

5 years ago
Assignee: rail → mgervasini
err, wrong bug
No longer depends on: 777574, 926264, 965534, 966772, 966796, 966806, 967223, 967816, 969590, 970239
(Assignee)

Comment 7

5 years ago
Quick update:

here's the list of our m1.large instances in the east region:

buildbot-master51
buildbot-master52
buildbot-master55
buildbot-master57
buildbot-master59
buildbot-master61
buildbot-master62
buildbot-master63
buildbot-master67
buildbot-master69
buildbot-master70
buildbot-master71
buildbot-master75
buildbot-master76
buildbot-master77

All the instances above use less than 10 GB of disk space and between 3 and 8 GB of RAM, and their CPUs are not heavily loaded. The one exception is buildbot-master51 (linux test), which uses ~25 GB of disk.

Tomorrow, I'll find the machine in this list that uses the fewest resources, make it an m3.medium instance, and let it run over the weekend to see how it behaves.
(Assignee)

Comment 8

5 years ago
I've found a good candidate in buildbot-master52. Unfortunately I discovered, after the graceful shutdown, that bm52 cannot be recreated as m3.medium or m3.large because these instance types are not available in the us-east-1b availability zone; the error message says that m3.medium is available in us-east-1a, us-east-1c and us-east-1d.

I am restarting buildbot-master52 as m1.large (as it was before) and then I'll find another candidate in a different availability zone.
(Assignee)

Comment 9

5 years ago
Update:

more data on our m1.large masters in the east region

Master   Zone      Memory in use [MB]   Disk usage
---------------------------------------------------
bm51     east-1a   5353                 25G
bm52     east-1b   2050                 6.3G
bm59     east-1d   3221                 3.3G
bm61     east-1c   4132                 4.3G
bm62     east-1b   4522                 4.3G
bm63     east-1b   4530                 4.0G
bm67     east-1c   4477                 7.1G
bm69     east-1a   2928                 8.0G
bm70     east-1c   2937                 7.2G
bm71     east-1c   2920                 7.1G
bm75     east-1b   2199                 4.3G
bm76     east-1a   2382                 5.7G
bm77     east-1c   1984                 6.8G


* Instances in east-1b cannot be converted to m3.medium or m3.large

* Memory in use is total memory - free memory (free + buffer + cached):
  + m3.medium has 3.75 Gb ram (3840 Mb)  
  + m3.large  has 7.5  Gb ram
  + m1.large  has 7.5  Gb ram

* Disk usage: 
  on build-masters 3.7 Gb are used for /dev/shm (defaults to half the RAM size)

m1.large and m3.large are just "too big" for the role of build master. m3.medium will probably just work, but since there is not a wide margin (in terms of RAM and disk), I prefer not to do the switch during the weekend and to wait until next Monday.
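The sizing check described in this comment can be sketched in Python. This is only an illustration, not the tooling actually used: the figures are copied from the table above, the 3840 MB limit is the m3.medium RAM quoted above, and the names `MASTERS` and `fits_m3_medium` are made up for the example. It looks at memory and zone only and ignores disk usage.

```python
# Zone and memory in use (MB) per master, copied from the table above.
MASTERS = {
    "bm51": ("east-1a", 5353),
    "bm52": ("east-1b", 2050),
    "bm59": ("east-1d", 3221),
    "bm61": ("east-1c", 4132),
    "bm62": ("east-1b", 4522),
    "bm63": ("east-1b", 4530),
    "bm67": ("east-1c", 4477),
    "bm69": ("east-1a", 2928),
    "bm70": ("east-1c", 2937),
    "bm71": ("east-1c", 2920),
    "bm75": ("east-1b", 2199),
    "bm76": ("east-1a", 2382),
    "bm77": ("east-1c", 1984),
}

M3_MEDIUM_RAM_MB = 3840          # 3.75 GB, as quoted above
UNSUPPORTED_ZONES = {"east-1b"}  # m3 instances cannot be created there


def fits_m3_medium(zone, mem_in_use_mb):
    """True if the zone offers m3.medium and current memory use fits in its RAM."""
    return zone not in UNSUPPORTED_ZONES and mem_in_use_mb < M3_MEDIUM_RAM_MB


candidates = sorted(
    name for name, (zone, mem) in MASTERS.items() if fits_m3_medium(zone, mem)
)
print(candidates)
```

A master that passes this check may still have little headroom, which is the margin concern raised above.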
(Assignee)

Comment 10

5 years ago
Update:

bm77 is now an m3.medium instance. I'll post some data about the memory/disk/cpu usage as soon as this master gets some jobs done.
(Reporter)

Updated

5 years ago
Summary: s/m1.large/m3.large/ → s/m1.large/m3.medium/
It would be good to add some swap to handle long-running master instances, which gradually accumulate memory, e.g. this one, up since Jan 14:

[cltbld@buildbot-master51.srv.releng.use1.mozilla.com ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7467       7159        308          0        495       2379
-/+ buffers/cache:       4284       3183
Swap:            0          0          0

Using more than a m3.medium has.
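For reference, the comparison made here can be reproduced with a small Python sketch. It assumes the `free -m` output format pasted above (with a "-/+ buffers/cache" line) and takes the 3840 MB m3.medium figure from comment 9; the function name is hypothetical.

```python
# Output pasted from the comment above.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          7467       7159        308          0        495       2379
-/+ buffers/cache:       4284       3183
Swap:            0          0          0
"""

M3_MEDIUM_RAM_MB = 3840  # m3.medium RAM, from comment 9


def used_without_cache_mb(free_output):
    """Return the 'used' value from the '-/+ buffers/cache' line, in MB."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            # Fields after the label are: used, free.
            used, _free = line.split(":", 1)[1].split()
            return int(used)
    raise ValueError("no '-/+ buffers/cache' line found")


used = used_without_cache_mb(FREE_OUTPUT)
print(used, used > M3_MEDIUM_RAM_MB)
```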
(Assignee)

Comment 12

5 years ago
some data from last 4 days:

memory: http://mzl.la/1jGbX6z
load: http://mzl.la/1fyeUCm

there are some spikes in the cpu load. 

I have created a swap file, using the add_swap script from bug 964880, to see how this affects the performance of bm77
(Reporter)

Comment 13

5 years ago
(In reply to Massimo Gervasini [:mgerva] from comment #12)
> some data from last 4 days:
> 
> memory: http://mzl.la/1jGbX6z
> load: http://mzl.la/1fyeUCm
> 
> there are some spikes in the cpu load. 
> 
> I have created a swap file, using the add_swap script from bug 964880, to
> see how this affects the performances of bm77

note cpu.idle and cpu.wait are the useful metrics to track
(Assignee)

Comment 14

5 years ago
Here are the graphs for cpu idle and wait 

cpu.idle: http://mzl.la/1dQvGQz 
cpu.wait: http://mzl.la/1fBxOrM 

There's no data for the last two days. Is there anything wrong with the data collection on bm77 or am I doing something wrong with graphite?

Here is some data from the buildbot database: I have extracted the time required to execute the get_basedir command on the slave, from the initial request to the end of the job. get_basedir runs the "pwd" command on the slave and its execution is almost immediate, so this is a good measurement of the time the get_basedir command spends in the master's queue.


buildbot master 77, get_basedir command, execution time in seconds: 
instance type  period           avg      max   90th percentile
m1.large       1-13 Feb         0.6119    42   1
m3.medium      15 Feb - today   0.6642   119   1
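As an illustration of how the three columns relate, here is a minimal Python sketch of the summary statistics. It is not the query that was run against the buildbot database: the sample durations are invented, and the 90th percentile uses the nearest-rank method, which is an assumption about how the figure above was computed.

```python
def summarize(durations):
    """Return average, maximum and nearest-rank 90th percentile of durations (seconds)."""
    ordered = sorted(durations)
    n = len(ordered)
    # Nearest-rank percentile: 1-based rank ceil(0.9 * n), in integer arithmetic.
    rank = (9 * n + 9) // 10
    return {
        "avg": sum(ordered) / n,
        "max": ordered[-1],
        "p90": ordered[rank - 1],
    }


# Hypothetical sample: mostly immediate, with one long wait in the queue.
sample = [0, 0, 0, 1, 0, 1, 0, 0, 42, 1]
stats = summarize(sample)
print(stats)
```

A single long wait raises the maximum and the average but leaves the 90th percentile untouched, which is why the table reports all three.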
(Assignee)

Comment 16

5 years ago
I have created a new m3.medium instance from scratch (bm70), there are still a couple of things to fix:

* creating a swap file 
* adding local instance storage
Depends on: 964880
(Assignee)

Updated

5 years ago
Depends on: 976548
(Assignee)

Comment 17

5 years ago
Hi rail,

this patch enables m3.medium masters with the following specs:
* a 15 GB disk mounted under /
* the instance storage (4GB - SSD), mounted under /mnt/instance_storage, will be used for swapping (bug 964880). 

It also sets the moz-type for these instances to "buildbot-master"
Attachment #8382917 - Flags: review?(rail)

Updated

5 years ago
Attachment #8382917 - Flags: review?(rail) → review+
(Assignee)

Updated

5 years ago
Attachment #8382917 - Flags: checked-in+
(Assignee)

Updated

5 years ago
No longer depends on: 976548
(Assignee)

Comment 18

5 years ago
puppet patch to enable buildbot-master70 as a build master
Attachment #8383129 - Flags: review?(rail)
(Assignee)

Comment 19

5 years ago
this patch adds buildbot-master70 in production-masters.json
Attachment #8383132 - Flags: review?(rail)

Updated

5 years ago
Attachment #8383129 - Flags: review?(rail) → review+
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch

Review of attachment 8383132 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/production-masters.json
@@ +631,5 @@
> +    "buildbot_branch": "production-0.8",
> +    "buildbot_python": "/builds/buildbot/build1/bin/python",
> +    "buildbot_setup": "/builds/buildbot/build1/buildbot/master/setup.py",
> +    "buildbot_version": "0.8.2",
> +    "datacentre": "scl3",

A nit, use aws-us-east-1 instead.
Attachment #8383132 - Flags: review?(rail) → review+
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch

Review of attachment 8383132 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/production-masters.json
@@ +653,5 @@
> +    "basedir": "/builds/buildbot/try1",
> +    "bbconfigs_branch": "production",
> +    "bbconfigs_dir": "/builds/buildbot/try1/buildbot-configs",
> +    "bbcustom_branch": "production-0.8",
> +  {

[drive-by-lucky-notice]

ugh -- this will break the whole json.

{
  ..
  stuff
  ...
{

--- basically it looks like your copy/paste included more things than you intended.

r- due to that, but if you fix it no objection
Attachment #8383132 - Flags: review-
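A breakage like the one flagged in this review can be caught before landing by parsing the file. The following is a minimal sketch using Python's standard json module; the function name and both sample strings are made up for illustration, and the second sample only mimics the stray "{" from the review.

```python
import json


def check_masters_json(text):
    """Return None if text parses as JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Illustrative fragments, not the real file contents.
good = '[{"name": "bm70-build1", "datacentre": "aws-us-east-1"}]'
broken = '[{"name": "bm70-build1", "bbcustom_branch": "production-0.8", {]'

print(check_masters_json(good))
print(check_masters_json(broken))
```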
(Assignee)

Updated

5 years ago
Attachment #8383129 - Flags: checked-in+
(Assignee)

Comment 22

5 years ago
Comment on attachment 8383132 [details] [diff] [review]
[tools] enable bm70 as buildbot master.patch

Thanks rail, thanks Callek!
Attachment #8383132 - Flags: checked-in+
(Assignee)

Comment 23

5 years ago
We have an m3.medium instance, buildbot-master70, in production. This master has two slaves locked to it: bld-linux64-ec2-003 and bld-linux64-ec2-038. I'm going to leave it running over the weekend to see if the new instance type behaves properly.

I will check on it on Saturday as well.
(Assignee)

Comment 24

5 years ago
bm70 is not able to sign because of: "token generation failed, error message: 403 Forbidden" exception. 

This patch adds bm70's IP address to signing_new_token_allowed_ips.
Attachment #8384637 - Flags: review?(bhearsum)
Attachment #8384637 - Flags: review?(bhearsum) → review+
(Assignee)

Updated

5 years ago
Attachment #8384637 - Flags: checked-in+
(Assignee)

Comment 25

5 years ago
Changes for buildbot-master70 have landed in production and all the builds are green. 
I am enabling bm70 in slavealloc and unlocking its slaves (bld-linux64-ec2-003 and bld-linux64-ec2-038)
(Assignee)

Comment 26

5 years ago
bm70 is working fine. I am starting to replace the existing masters in east-1 with m3.medium instances.

On east-1 we have:

* 1 try master (bm59)
* 4 test masters (bm51, bm52, bm67, bm69)
* 5 build masters (bm57, bm61, bm62, bm63, bm70)

I am going to create 4 new masters to replace bm57, bm61, bm62 and bm63
(Assignee)

Comment 27

5 years ago
adding buildbot-master 71..74 to production-masters.json
Attachment #8386041 - Flags: review?(sbruno)
(Assignee)

Comment 28

5 years ago
This patch recreates buildbot-master 71..74 as build masters and adds their IP addresses in signing_new_token_allowed_ips list.

Here's the list of IP addresses:

buildbot-master71: 10.134.49.111
buildbot-master72: 10.132.50.54
buildbot-master73: 10.132.49.181
buildbot-master74: 10.132.49.94

buildbot-master 72, 73 and 74 are in aws west-2 zone
Attachment #8386046 - Flags: review?(sbruno)
(Assignee)

Comment 29

5 years ago
fixed wrong datacentre and hostnames for nodes in west-2 region (bm72, bm73 and bm74)
Attachment #8386041 - Attachment is obsolete: true
Attachment #8386041 - Flags: review?(sbruno)
Attachment #8386058 - Flags: review?(sbruno)
Comment on attachment 8386058 [details] [diff] [review]
[tools] - add bm{71..74} to production-master.json.patch

Hey mgerva,

The name key for each of the new entries is the same: "bm70-build1", I guess this needs to be fixed.
Attachment #8386058 - Flags: review?(sbruno) → review-
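The duplicated "name" key noted in this review is easy to check for mechanically. Here is a minimal Python sketch; only the "name" key and the value "bm70-build1" come from the review, while the "hostname" key, its values, and the function name are hypothetical.

```python
from collections import Counter


def duplicate_names(masters):
    """Return the sorted list of "name" values that appear more than once."""
    counts = Counter(master["name"] for master in masters)
    return sorted(name for name, count in counts.items() if count > 1)


# Illustrative entries: a copy/paste that kept the original name.
masters = [
    {"name": "bm70-build1", "hostname": "host-a"},  # hostnames are hypothetical
    {"name": "bm70-build1", "hostname": "host-b"},
    {"name": "bm70-build1", "hostname": "host-c"},
]

print(duplicate_names(masters))
```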
(Assignee)

Comment 31

5 years ago
Thanks simone,

here is the patch with updated "name" keys
Attachment #8386058 - Attachment is obsolete: true
Attachment #8386064 - Flags: review?(sbruno)
Attachment #8386046 - Flags: review?(sbruno) → review+
Comment on attachment 8386064 [details] [diff] [review]
[tools] - add bm{71..74} to production-master.json.patch

Now it looks ok to me.

In order to review, I compared this patch with the patches relative to buildbot-master70 (now in prod), and all relevant code parts seem to have been updated correctly.
Attachment #8386064 - Flags: review?(sbruno) → review+
(Assignee)

Updated

5 years ago
Attachment #8386064 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Attachment #8386046 - Flags: checked-in+
Comment on attachment 8386046 [details] [diff] [review]
[puppet] recreate bm{71..74} as build masters and add them in signing_new_token_allowed_ips.patch

merged to production
(Assignee)

Comment 34

5 years ago
I am rolling out the first batch of m3.medium build masters in production.

bm70, bm71, bm72, bm73 and bm74 in west-2 are now online and are already receiving new jobs. Each of the new instances will replace an m1.large instance as follows:
* bm70 -> bm63
* bm71 -> bm61
* bm72 -> bm58
* bm73 -> bm64
* bm74 -> bm65

After a clean shutdown, m1.large instances will be stopped in aws so they can be re-enabled in case of emergency
(Assignee)

Comment 35

5 years ago
Second batch of masters, 1 build master and 4 try masters, as follows:

try:
bm75, bm76 in east-1
bm78, bm79 in west-2

build:
bm77 in east-1

this patch sets enabled => false, for all the masters above and for bm58, bm61, bm63, bm64, bm65 as part of the migration from the first batch
Attachment #8387036 - Flags: review?(rail)
(Assignee)

Comment 36

5 years ago
Enable 4 new try masters: 

* bm75, bm76 east-1
* bm78, bm79 west-2

enable a build master in east-1: bm77 (so we have 3 build masters per region)

this patch also adds bm77 IP address (10.134.49.77) to signing_new_token_allowed_ips
Attachment #8387039 - Flags: review?(rail)

Updated

5 years ago
Attachment #8387036 - Flags: review?(rail) → review+
(Assignee)

Comment 37

5 years ago
added new try masters IPs to signing_new_token_allowed_ips
Attachment #8387039 - Attachment is obsolete: true
Attachment #8387039 - Flags: review?(rail)
Attachment #8387057 - Flags: review?(rail)

Updated

5 years ago
Attachment #8387057 - Flags: review?(rail) → review+
(Assignee)

Updated

5 years ago
Attachment #8387036 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Attachment #8387057 - Flags: checked-in+
Something here broke Firefox 28.0b9.

bm71-74 are enabled in slavealloc and accept new jobs, but they are not enabled in production-masters.json, which prevents fabric reconfigs on them. release-runner uses this file to reconfig only enabled masters, so it skipped the ones listed above.

As a result we've seen a failed bouncer submitter job trying to submit jobs for 28.0b8. I also cancelled 2 xulrunner builds to be sure.

I disabled the masters in slavealloc and started a graceful shutdown
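The mismatch described above (enabled in slavealloc, not enabled in production-masters.json) could be detected with a check along these lines. This is a sketch under assumptions: the "enabled" key, the master names, and the idea that the slavealloc state is available as a plain list of names are all hypothetical, not confirmed by this bug.

```python
def missing_from_json(slavealloc_enabled, masters_json):
    """Return masters enabled in slavealloc but not enabled in the JSON entries."""
    json_enabled = {m["name"] for m in masters_json if m.get("enabled", False)}
    return sorted(set(slavealloc_enabled) - json_enabled)


# Hypothetical data: two masters take jobs but would be skipped by reconfigs.
masters_json = [
    {"name": "bm70-build1", "enabled": True},
    {"name": "bm71-build1", "enabled": False},
    {"name": "bm72-build1", "enabled": False},
]
slavealloc_enabled = ["bm70-build1", "bm71-build1", "bm72-build1"]

skipped = missing_from_json(slavealloc_enabled, masters_json)
print(skipped)
```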
I also want to call out, if not already done, that when they are brought back up we need to go into nagios and re-enable notifications for these hosts, otherwise they could go down and no-one would be the wiser
(Assignee)

Comment 40

5 years ago
bm75 IP address is in east-1-b subnet but m3.medium instances are not available in this subnet. Deleting this entry from puppet.
Attachment #8387689 - Flags: review?(rail)
(Assignee)

Comment 41

5 years ago
Deleting bm75 from the production-masters.json file - it cannot be puppetized as m3.medium, so we should remove this entry created in attachment 8387036 [details] [diff] [review].
Attachment #8387691 - Flags: review?(rail)
(Assignee)

Comment 42

5 years ago
bm75 IP address is in east-1-b subnet but m3.medium instances are not available in this subnet. Deleting this entry from puppet... and its IP address from signing_new_token_allowed_ips
Attachment #8387689 - Attachment is obsolete: true
Attachment #8387689 - Flags: review?(rail)
Attachment #8387695 - Flags: review?(rail)
Comment on attachment 8387691 [details] [diff] [review]
[tools] remove buildbot-master75 from production-masters.json.patch

I'd rather change the DNS entry for bm75 instead in the inventory. Just find a free IP in other subnets, change the A and PTR entries in the inventory: https://inventory.mozilla.org/en-US/core/search/#q=10.134.48.158
Attachment #8387691 - Flags: review?(rail) → review-

Updated

5 years ago
Attachment #8387695 - Flags: review?(rail) → review-
(Assignee)

Comment 44

5 years ago
A patch to enable bm71..74 and bm76..79 has just landed in tools (changeset: 4427:13eedba9148f)
The new masters have been enabled in slavealloc too.
(Assignee)

Comment 45

5 years ago
All the new masters are on-line and working. 

I have started the shutdown procedure for the old masters.
(In reply to Massimo Gervasini [:mgerva] from comment #45)
> All the new masters are on-line and working. 
> 
> I have started the shutdown procedure for the old masters.

These are starting to alert in nagios. Please downtime or otherwise silence the checks in the future. I've ack'ed the one that just went off.
(Assignee)

Comment 47

5 years ago
Disable buildbot master 56, 57, 59 and 60 from production-masters.json before stopping the old master instances.
Attachment #8389816 - Flags: review?(catlee)
Attachment #8389816 - Flags: review?(catlee) → review+
(Assignee)

Updated

5 years ago
Attachment #8389816 - Flags: checked-in+
(Assignee)

Comment 48

5 years ago
The following masters have been stopped:

* bm56
* bm57
* bm58
* bm59
* bm60
* bm63
* bm64

two old masters instances are still running because they have some other services

* bm61 - self serve agent
* bm65 - slave rebooter
(Assignee)

Comment 49

5 years ago
This patch enables selfserve_agent on bm70, bm71, bm72 and bm73 and creates two new buildmasters: bm91 and 94
Attachment #8393003 - Flags: review?(catlee)
(Assignee)

Comment 50

5 years ago
Create two new build masters: bm91 and bm94
Attachment #8393004 - Flags: review?(catlee)
Comment on attachment 8393004 [details] [diff] [review]
[tools] create two new buildmasters: bm91 and bm94.patch

Review of attachment 8393004 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/production-masters.json
@@ +1176,5 @@
> +    "buildbot_branch": "production-0.8",
> +    "buildbot_python": "/builds/buildbot/build1/bin/python",
> +    "buildbot_setup": "/builds/buildbot/build1/buildbot/master/setup.py",
> +    "buildbot_version": "0.8.2",
> +    "datacentre": "aws-us-east-1",

should be aws-us-west-2 I think?
Attachment #8393004 - Flags: review?(catlee) → review+
Attachment #8393003 - Flags: review?(catlee) → review+
Depends on: 985055
(Assignee)

Updated

5 years ago
Depends on: 985088
(Assignee)

Updated

5 years ago
Attachment #8393003 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Attachment #8393004 - Flags: checked-in+
(Assignee)

Comment 52

5 years ago
Removing bm{56..65} entries from production-masters.json - these instances are already disabled on slavealloc.
Attachment #8393078 - Flags: review?(catlee)
Attachment #8393078 - Flags: review?(catlee) → review+
(Assignee)

Comment 53

5 years ago
remove buildbot master entries for bm{56..65} from nodes. Removed their IPs from signing_new_token_allowed_ips.
Attachment #8393088 - Flags: review?(catlee)
Attachment #8393088 - Flags: review?(catlee) → review+
(Assignee)

Comment 54

5 years ago
Current status:

m1.large masters
* bm{56..65} instances have been stopped, except for bm61, bm62 and bm65, which are running extra services that are not migrated yet (self-serve agent, slaverebooter; bug 985088)
* bm55 and bm66 still in production

m3.medium masters
* bm{70..74}, bm{76..79} are on-line and taking jobs 
* bm75 is on hold because it has a new IP address and this requires new flows (bug 985055)
* bm91 and bm94 are running test jobs - if tests pass, they will replace bm55 and bm66
(Assignee)

Comment 55

5 years ago
New masters have been running for several hours and all the builds are green. This patch enables them in production-masters.json.

When this patch lands, I'll start a graceful shutdown of bm55 and bm66
Attachment #8393509 - Flags: review?(catlee)
(Assignee)

Updated

5 years ago
Attachment #8393078 - Flags: checked-in+
Attachment #8393509 - Flags: review?(catlee) → review+
(Assignee)

Comment 56

5 years ago
(In reply to Massimo Gervasini [:mgerva] from comment #55)

> When this patch lands, I'll start a graceful shutdown of bm55 and bm66

err bm55 is a try master, I'm not stopping it.
(Assignee)

Updated

5 years ago
Attachment #8393509 - Flags: checked-in+
(Assignee)

Comment 57

5 years ago
(In reply to Massimo Gervasini [:mgerva] from comment #56)
> (In reply to Massimo Gervasini [:mgerva] from comment #55)
> 
> > When this patch lands, I'll start a graceful shutdown of bm55 and bm66
> 
> err bm55 is a try master, I'm not stopping it.

Please ignore the above comment, I am stopping bm55.
(Assignee)

Updated

5 years ago
Attachment #8393088 - Flags: checked-in+
(Assignee)

Comment 58

5 years ago
Disable buildbot master 55 and 66; removing their IP addresses from signing_new_token_allowed_ips
Attachment #8393583 - Flags: review?(catlee)
(Assignee)

Comment 59

5 years ago
Remove buildbot master 55 and 66 from production-masters.json

Buildbot master 66 is still running some jobs so this patch (and the puppet one) will land when the master has stopped properly.
Attachment #8393584 - Flags: review?(catlee)
Attachment #8393584 - Flags: review?(catlee) → review+
Attachment #8393583 - Flags: review?(catlee) → review+
(Assignee)

Updated

5 years ago
Attachment #8393583 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Attachment #8393584 - Flags: checked-in+
(Assignee)

Comment 60

5 years ago
Status update:

New m3.medium instances, bm{70..79}, bm91 and bm94: 
* all the new m3.medium masters are active (except for buildbot-master75 - bug 985055)

Old m1.large instances, bm{55..66} 
* the following instances have been terminated:  
 bm56
 bm57
 bm58
 bm59
 bm60
 bm63
 bm64
 
* leaving the following instance stopped for a while (it has been stopped for a few hours)
  bm55 

* the following instances are still up and running because they are still running other services (the buildbot process is stopped)
 bm61
 bm62
 bm65
 bm66
No longer depends on: 985088
Depends on: 985088
(Assignee)

Comment 61

5 years ago
Two green builds on bm75, waiting for the current job to finish to enable master in production-masters.json
Attachment #8394862 - Flags: review?(rail)

Updated

5 years ago
Attachment #8394862 - Flags: review?(rail) → review+
(Assignee)

Updated

5 years ago
Attachment #8394862 - Flags: checked-in+
(Assignee)

Comment 62

5 years ago
Buildbot-master75 is in production. All the try/build buildbot masters on m1.large have been migrated to m3.medium instances.

Some old m1.large machines require further steps before we can terminate them because they have some running services:

bm61: self serve 
bm62: self serve
bm65: self serve, slave rebooter
bm66: self serve, gaia_bumper, b2g_bumper (Bug 985767 and Bug 986474)

* self serve is already running on m3.medium machines (bm70,71,72 and 73) but it needs to be tested.
* slave rebooter requires another puppet patch that needs to be coordinated with buildduty
* gaia_bumper and b2g_bumper will never fit on an m3.medium instance because of the disk and cpu usage, so bm66 will stay up for a while.
(Assignee)

Updated

5 years ago
Depends on: 987173
(Assignee)

Comment 63

5 years ago
This patch removes selfserve_agent from bm61, bm62, bm63 and bm65

Waiting for nagios notifications to be enabled on new self serve agents before landing this patch (bug 987173)
Attachment #8395713 - Flags: review?(catlee)
(Assignee)

Updated

5 years ago
Depends on: 987245
Attachment #8395713 - Flags: review?(catlee) → review+
(Assignee)

Updated

5 years ago
Attachment #8395713 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Depends on: 987793
(Assignee)

Comment 64

5 years ago
Preparing to migrate slaverebooter from bm65 to bm74. Before landing this patch we need to wait for the nagios monitoring for slaverebooter on bm74 (bug  987793).
Attachment #8396477 - Flags: review?(catlee)
(Assignee)

Comment 65

5 years ago
bm55 was still active in puppet master definition. This patch removes it and its IP address from signing_new_token_allowed_ips
Attachment #8396536 - Flags: review?(catlee)
Attachment #8396477 - Flags: review?(catlee) → review+
Attachment #8396536 - Flags: review?(catlee) → review+
(Assignee)

Updated

5 years ago
Attachment #8396536 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Blocks: 990775
I migrated bm69 to an m3.medium today.
buildbot-master-servo-01 has been migrated to an m3.medium, too.
(Assignee)

Updated

5 years ago
Attachment #8396477 - Flags: checked-in+
(Assignee)

Comment 68

5 years ago
Slave rebooter moved from bm65 to bm74.
(Assignee)

Comment 69

5 years ago
Hi Ben,

Are there any other buildbot-masters to migrate to m3.medium?
Flags: needinfo?(bhearsum)
(In reply to Massimo Gervasini [:mgerva] from comment #69)
> Hi Ben,
> 
> Are there any other buildbot-masters to migrate to m3.medium?

I don't think so. The only ones left are:
51, 52, 53, 54, 67, 68 -- which are all linux64 test masters, which need more RAM than the m3.medium has.

65 - which is already stopped
66 - which I think is having its services moved elsewhere?
Flags: needinfo?(bhearsum)
(Assignee)

Comment 71

5 years ago
> 65 - which is already stopped
> 66 - which I think is having its services moved elsewhere?

(In reply to Ben Hearsum [:bhearsum] from comment #70)
> (In reply to Massimo Gervasini [:mgerva] from comment #69)
> > Hi Ben,
> > 
> > Are there any other buildbot-masters to migrate to m3.medium?
> 
> I don't think so. The only ones left are:
> 51, 52, 53, 54, 67, 68 -- which are all linux64 test masters, which need
> more RAM than the m3.medium has.
> 
> 65 - which is already stopped
> 66 - which I think is having its services moved elsewhere?

* bm65 is stopped, waiting for all the netflows/nagios notifications to be closed (bug 990775)
* bm66 is tracked here: Bug 990172
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED