Closed Bug 1292656 Opened 8 years ago Closed 8 years ago

Reduce 10.6 test pool to size appropriate for testing only mozilla-esr45 and comm-esr45

Categories: Infrastructure & Operations Graveyard :: CIDuty, task
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: coop
Assignee: aselagea
Attachments: 3 files

Only our ESR branches (Firefox and Thunderbird) are running on our 10.6 machines now. We have 50 machines configured right now, but can probably get by with around 10.

We'll need to remove the extra machines from our configs, etc.
(In reply to Chris Cooper [:coop] from comment #0)
> Only our ESR branches (Firefox and Thunderbird) are running on our 10.6
> machines now. We have 50 machines configured right now, but can probably get
> by with around 10.

Amy: how many minis per rack shelf? If reclaiming rack space is an issue, we could make it a multiple of that number.
Flags: needinfo?(arich)
It's 2 per shelf, so just make it an even number. The rack view in inventory will show you which are on the same shelf so you can pick ones that are together. I think we have some intermingled with builders or new testers, but if we could free up a whole rack, that would be awesome. It means we could get rid of some infrastructure (switching gear, PDUs, racks) too.
Flags: needinfo?(arich)
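Amy's constraint (two minis per shelf, so disable hosts in pairs that share a shelf) can be sketched as below. This is a minimal illustration only. It assumes the rack view can be exported as a list of (shelf, hostname) pairs; that input format, the shelf labels and the helper name `pick_whole_shelves` are all hypothetical and do not come from inventory.

```python
from collections import defaultdict

# Per Amy's reply above: minis are racked two per shelf.
MINIS_PER_SHELF = 2


def pick_whole_shelves(hosts, count):
    """Return `count` hostnames to disable, taking only whole shelves.

    `hosts` is an iterable of (shelf, hostname) tuples. This input format
    is hypothetical, not inventory's real export.
    """
    if count % MINIS_PER_SHELF != 0:
        raise ValueError("count must be a multiple of %d" % MINIS_PER_SHELF)
    by_shelf = defaultdict(list)
    for shelf, name in hosts:
        by_shelf[shelf].append(name)
    picked = []
    for shelf in sorted(by_shelf):
        names = by_shelf[shelf]
        # Skip shelves that don't hold exactly two pool hosts: the other
        # slot may belong to a builder or new tester, so disabling one
        # host would not free the shelf.
        if len(names) != MINIS_PER_SHELF:
            continue
        if len(picked) >= count:
            break
        picked.extend(sorted(names))
    if len(picked) != count:
        raise ValueError("not enough fully populated shelves")
    return picked
```

Skipping half-shared shelves trades a smaller candidate set for the guarantee that every disabled pair actually empties a shelf.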
Whether or not it's supposed to, mozilla-release is still running 10.6 tests, so you're missing either a dependency on a bug about shutting that off, or a dependency on the *next* mergeday.
At the moment, we still run OS X 10.6 tests on ESR branches + mozilla-release and try (if explicitly specified).
See Also: → 1269543
Out of 50 enabled slaves, 48 are in the "201-9 - scl3" rack and 2 are in "201-8 - scl3", so I don't think we could free up a whole rack. Moreover, in bug 1279394 we disabled 69 minis and removed them from the configs, but they are still present in inventory with status=production (I suppose they weren't removed from the rack).

Coop: any ideas on how many minis we still want to keep enabled?
Flags: needinfo?(coop)
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #4)
> At the moment, we still run OS X 10.6 tests on ESR branches +
> mozilla-release and try (if explicitly specified).

Hrmmm, you're right. We need to support 10.6 on release for at least the next 6 weeks in case of a chemspill in the Firefox 48 cycle.

In that case, we should step down to 20 machines for now, and then 10 machines once Firefox 49 is out.
Flags: needinfo?(coop)
Assignee: nobody → aselagea
Disabled the two minis from "201-8 - scl3" (t-snow-r4-0127 and t-snow-r4-0128) and another 28 from "201-9 - scl3".

mysql> select count(*) from slaves where (name like 't-snow-r4%' and notes='Disabled in bug 1292656' and enabled=0);
+----------+
| count(*) |
+----------+
|       30 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from slaves where name like 't-snow-r4%' and enabled=1;
+----------+
| count(*) |
+----------+
|       20 |
+----------+
1 row in set (0.00 sec)
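The two verification queries above can be reproduced end to end against a throwaway SQLite database. This is a sketch under stated assumptions:

- the `slaves` table is reduced to the three columns the queries use (name, enabled, notes);
- the hostnames are hypothetical;
- the UPDATE that disabled the machines is an assumption, because the statement actually run is not shown in this bug.

```python
import sqlite3

# Hypothetical stand-in for the `slaves` table: only the columns the
# queries in this bug touch (name, enabled, notes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slaves (name TEXT PRIMARY KEY, enabled INTEGER, notes TEXT)")

# Hypothetical pool of 50 enabled 10.6 minis.
pool = ["t-snow-r4-%04d" % i for i in range(1, 51)]
conn.executemany("INSERT INTO slaves (name, enabled, notes) VALUES (?, 1, '')",
                 [(n,) for n in pool])

# Assumed disable step: which 30 hosts are chosen here is illustrative only.
conn.executemany(
    "UPDATE slaves SET enabled = 0, notes = 'Disabled in bug 1292656' WHERE name = ?",
    [(n,) for n in pool[:30]])
conn.commit()

# Same two checks as the mysql session above.
disabled = conn.execute(
    "SELECT count(*) FROM slaves WHERE name LIKE 't-snow-r4%' "
    "AND notes = 'Disabled in bug 1292656' AND enabled = 0").fetchone()[0]
enabled = conn.execute(
    "SELECT count(*) FROM slaves WHERE name LIKE 't-snow-r4%' AND enabled = 1").fetchone()[0]
print(disabled, enabled)  # 30 20
```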
Patch to remove 30 minis from bb-configs.
Attachment #8780056 - Flags: review?(coop)
From what I can see, most of the disabled minis are still present in inventory. Also, the Nagios checks are still up. 
@Amy: do you know if there are any plans for these?
Flags: needinfo?(arich)
Attachment #8780056 - Flags: review?(coop) → review+
:allin: We'll need to file a decom bug with dcops for the entire list of 10.6 machines that have been shut down (both batches).
Flags: needinfo?(arich)
Attachment #8780056 - Flags: checked-in+
Depends on: 1294655
<nagios-releng> Tue 15:10:26 PDT [4112] nagios1.private.releng.scl3.mozilla.com:t-snow-r4-machines-ping cluster is WARNING: CLUSTER WARNING: t-snow-r4-machines-ping cluster: 72 ok, 0 warning, 0 unknown, 51 critical

Downtimed for 24 hours. Could someone fix up the thresholds?
The hosts that are being decommissioned should just be removed from nagios, and the alert will then clear up on its own.
Nobody reads, so this won't actually reduce our "startup, crash" try load to nothing, but at least it will feel useful.
Attachment #8788012 - Flags: review?(aselagea)
Comment on attachment 8788012
Warn off trychooser users

Thanks!
Attachment #8788012 - Flags: review?(aselagea) → review+
Attachment #8788012 - Flags: checked-in+
Updated TryChooser to include the change above.
Minis have been unracked and removed from nagios; I also did some DB cleanup. I noticed that the inventory entries have not been updated or removed yet.
@Amy: should that be done too?
Flags: needinfo?(arich)
Van should be using invtool to mark them as decommed in inventory once he removes them. I've NIed him for that.
Flags: needinfo?(arich) → needinfo?(vle)
Flags: needinfo?(vle)
All the retired r4s have been set to "decommissioned", with their location set to "removed", as we are donating these minis.
Great, thanks!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Per #c6, we should decommission another 10 t-snow-r4 machines since Firefox 49 is out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Preferred to keep the last 10 machines of the current pool so that we don't include t-snow-r4-085 (which has RAM issues).
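The selection rule in the comment above can be sketched as follows:

1. Keep the last 10 hosts of the current pool.
2. Disable the rest.
3. Refuse any split that would keep a known-bad host.

This is an illustration, not the patch that landed. It assumes "last" means last in name order and that hostnames are zero-padded to a common width, so plain string sorting matches numeric order. The helper `split_pool` and the hostnames used with it are hypothetical.

```python
def split_pool(pool, keep, avoid):
    """Return (kept, to_disable): keep the last `keep` hosts in name order.

    Assumes zero-padded hostnames of equal width so string order equals
    numeric order. Raises ValueError if a host in `avoid` would be kept.
    """
    ordered = sorted(pool)
    if not 0 < keep <= len(ordered):
        raise ValueError("keep must be between 1 and the pool size")
    kept, to_disable = ordered[-keep:], ordered[:-keep]
    bad = set(kept) & set(avoid)
    if bad:
        raise ValueError("would keep problem hosts: %s" % sorted(bad))
    return kept, to_disable
```

Checking the kept set explicitly means a later change to the ordering rule fails loudly instead of silently keeping a host with known hardware problems.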
Attachment #8797440 - Flags: review?(coop)
Attachment #8797440 - Flags: review?(coop) → review+
Attachment #8797440 - Flags: checked-in+
Depends on: 1308406
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard