Closed Bug 584527 Opened 14 years ago Closed 14 years ago

Move some production build slaves to try pool

Categories: Release Engineering :: General (defect, P2)
Platform: x86, macOS
Tracking: not tracked
Status: RESOLVED FIXED
People: Reporter armenzg, Assigned armenzg
Attachments: 2 files, 1 obsolete file

We currently have very bad wait times on the try server, while wait times on the production builders are excellent.
If we move some machines from the production pool to the try pool, we can improve the try wait times.
We have to be careful not to affect the wait times on production.
I believe we don't need to change the hostnames; we just need to clobber the builds and replace the production keys with try keys.

TODO: research why the number of IX slaves is lower than usual.
Once I figure out the lower number of IX machines, I will update the patch.

I did a quick scan and there is room for movement:

CURRENT DISTRIBUTION
--- production/mobile/try:
* linux-VMs 44/13/30
* linux-IXs 9/8/0
* win32-VMs 55/4/36
* win32-IXs 18/0/0

NEW DISTRIBUTION
--- production/mobile/try/change:
* linux-VMs 40/13/34 ( -4,  0,  +4)
* linux-IXs  7/7/2   ( -2, -1,  +3)
* win32-VMs 45/4/46  (-10,  0, +10)
* win32-IXs 14/0/4   ( -4,  0,  +4)

NOTE: There are currently 4 Linux IX machines and 4 Windows IX machines on pm waiting for beta3
NOTE2: The mobile pool is currently separated from the production pools
NOTE3: There might be fewer slaves listed than there should be, since some slaves could have been rebooting while I was counting and others might have been loaned out or gone missing
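The change column is new minus current for each of the three pools. A minimal Python sketch, with the two dictionaries simply transcribing the tables above (the names `CURRENT`, `NEW`, `change` and `is_balanced` are illustrative, not taken from buildbot-configs), can recompute it and check whether a row only moves slaves between pools:

```python
# Pool sizes as (production, mobile, try), transcribed from the
# CURRENT DISTRIBUTION and NEW DISTRIBUTION tables.
CURRENT = {
    "linux-VMs": (44, 13, 30),
    "linux-IXs": (9, 8, 0),
    "win32-VMs": (55, 4, 36),
    "win32-IXs": (18, 0, 0),
}
NEW = {
    "linux-VMs": (40, 13, 34),
    "linux-IXs": (7, 7, 2),
    "win32-VMs": (45, 4, 46),
    "win32-IXs": (14, 0, 4),
}


def change(current, new):
    """Per-pool change (new minus current) for one slave type."""
    return tuple(n - c for c, n in zip(current, new))


def is_balanced(current, new):
    """True if slaves only moved between pools (no net gain or loss)."""
    return sum(change(current, new)) == 0


if __name__ == "__main__":
    for kind in CURRENT:
        delta = change(CURRENT[kind], NEW[kind])
        print(f"{kind:10} {delta} balanced={is_balanced(CURRENT[kind], NEW[kind])}")
```

A row whose changes do not sum to zero gains or loses machines instead of just moving them, which is worth double-checking before the patch is updated.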

DATA:
#####
pm01: linux-slaves-[01-13,29-34,48-49] - 21 VMs
pm01: linux-ix-[03,12,14,19] - 4 IXs
pm03: linux-slave{14,27} - 13 VMs
pm03: linux-ix-[4,6-8,21] - 5 IXs
mobile: linux-slaves-[28,35-40,43-47,50] - 13 VMs
mobile: linux-ix-slaves-[2,9-10,13,15-18] - 8 IXs

pm01: win32-slave-[12-43,54,56-59] - 37 VMs
pm01: win32-IX-slave-[10,12,14,16,20-21,24-25] - 8 IXs
pm03: win32-slave-[01-11,44,47-51,55] - 18 VMs
pm03: win32-IX-slave-[3-4,6-9,15,17-18,22] - 10 IXs
mobile: win32-slave-[45-46,52-53] - 4 VMs
mobile: win32-IX-slave - 0 IXs
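The DATA section uses a bracket shorthand for slave ranges. As a sketch, assuming each bracketed item is either a single number or an inclusive `start-end` range (the `expand` helper is hypothetical, not part of any release tooling), the per-master counts can be recomputed like this:

```python
import re


def expand(pattern):
    """Expand shorthand like 'linux-ix-[03,12,14,19]' into individual names.

    Assumption: each comma-separated item inside the brackets is a single
    number or an inclusive 'start-end' range; the zero padding of 'start'
    is preserved. A pattern without brackets is returned as-is.
    """
    m = re.fullmatch(r"(.*)\[([0-9,\-]+)\]", pattern)
    if m is None:
        return [pattern]
    prefix, body = m.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-")
            width = len(start)
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
        else:
            hosts.append(prefix + item)
    return hosts


if __name__ == "__main__":
    print(len(expand("win32-slave-[12-43,54,56-59]")))
```

For example, the pm01 and pm03 win32 VM counts (37 + 18) add up to the 55 production win32 VMs in the CURRENT DISTRIBUTION table.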
Attachment #462959 - Flags: feedback?(ccooper)
Comment on attachment 462959
Move slaves from production pool to try pool

These changes seem to match up with your proposal, which I support.
Attachment #462959 - Flags: feedback?(ccooper) → feedback+
(as landed)
After a backout and an awful deployment I got this landed:
http://hg.mozilla.org/build/buildbot-configs/rev/ad571f121b2e

This is the list of slaves that have been moved:
moz2-linux-slave47
moz2-linux-slave48
moz2-linux-slave49
moz2-linux-slave50
mv-moz2-linux-ix-slave22
mv-moz2-linux-ix-slave23

NOTE: To replace the keys on Linux I had to log in as root to unmount the .ssh keys directory under scratchbox (it was mounted twice).
Attachment #462959 - Attachment is obsolete: true
Attachment #464153 - Flags: checked-in+
These have been moved as well:
win32-slave50
win32-slave51
win32-slave52
win32-slave53
win32-slave54
win32-slave55
win32-slave56
win32-slave57
win32-slave58
win32-slave59
mw32-ix-slave22
mw32-ix-slave23
mw32-ix-slave24
mw32-ix-slave25
and this completes the transition of slaves.

In summary, we have moved the following to the try pool:
*  4 linux VMs
*  2 linux IXs
* 10 win32 VMs
*  4 win32 IXs
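The summary can be cross-checked against the two hostname lists. This sketch assumes a naming convention inferred from those lists (an `-ix-` segment marks IX hardware, `linux` marks Linux, and anything else is Win32); the `classify` and `summarize` helpers are hypothetical:

```python
from collections import Counter

# Hostnames transcribed from the two "moved" lists in this bug.
MOVED = (
    [f"moz2-linux-slave{n}" for n in range(47, 51)]
    + [f"mv-moz2-linux-ix-slave{n}" for n in (22, 23)]
    + [f"win32-slave{n}" for n in range(50, 60)]
    + [f"mw32-ix-slave{n}" for n in range(22, 26)]
)


def classify(hostname):
    """Guess (platform, kind) from a slave hostname.

    Assumption: 'linux' in the name means Linux and every other name is
    Win32 (the only other platform in these lists); an '-ix-' segment marks
    IX hardware and everything else is treated as a VM.
    """
    platform = "linux" if "linux" in hostname else "win32"
    kind = "IX" if "-ix-" in hostname else "VM"
    return platform, kind


def summarize(hostnames):
    """Count moved slaves per (platform, kind)."""
    return Counter(classify(h) for h in hostnames)


if __name__ == "__main__":
    for (platform, kind), count in sorted(summarize(MOVED).items()):
        print(f"{count:2} {platform} {kind}s")
```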
This patch removes mv-moz2-linux-ix-slave24 from all calculations since it had been repurposed as the win64 ref machine.

This patch moves the following to the try pool:
* 5 IX linux slaves (guess who is the new worst offender on the try server)
* 1 IX win32 slave

I have checked the wait times from yesterday and linux nailed 100%. 3 of the IX machines will come from the mobile master.

Win32 did not nail it yesterday. I moved 3 IX machines that had been abandoned on staging. Out of these 3 machines, I want one in the try pool.

NOTE: I will go ahead with this plan if tomorrow's wait times are similar.

After this 2nd move:
============= pm{01,03}/mobile/try
* linux IXs        8   /  7   / 6    (-2 , -3, +5)
* win32 IXs       19   /  0   / 5    (+2*,  0, +1) (NOTE the asterisk indicates the 2 IXs that were on staging)
Attachment #464527 - Flags: review?(ccooper)
Comment on attachment 464527
move few more slaves

Contrary to my comment, my patch moves 2 Win32 IX machines.
If wait times are good that is what I will do.

nthomas: coop is not around could you please review this?
Attachment #464527 - Flags: review?(nrthomas)
FTR, when I copied .ssh/ for the Win32 slaves, I copied it from a Linux try slave, since I could not scp from a Windows try slave.

This caused uploadsymbols to fail, since build.m.o and the slaves had never talked before.

I connected to all Win32 slaves that had been moved and typed this:
 ssh -i .ssh/trybld_dsa trybld@build.mozilla.org
and accepted the prompt.

We should pay attention to this if we move more slaves.
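To avoid the interactive prompt on each moved slave, the host key could be recorded as part of the first connection. This is a sketch, not what was run here: it assumes the slave's OpenSSH supports `StrictHostKeyChecking=accept-new` (added in OpenSSH 7.6, so older clients would still need the manual prompt), and it runs `true` remotely only so the session ends immediately. The `first_contact_command` helper is hypothetical.

```python
import shlex

# Key file and account taken from the ssh command used above.
KEY = ".ssh/trybld_dsa"
TARGET = "trybld@build.mozilla.org"


def first_contact_command(key=KEY, target=TARGET):
    """Build an ssh command that stores build.m.o's host key without a prompt.

    Assumption: OpenSSH 7.6+ on the slave. With accept-new, an unknown host
    key is added to known_hosts automatically, while a changed key is still
    rejected. 'true' is a placeholder remote command.
    """
    return ["ssh", "-i", key, "-o", "StrictHostKeyChecking=accept-new", target, "true"]


if __name__ == "__main__":
    # Print the command so it can be run on each moved slave.
    print(shlex.join(first_contact_command()))
```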
(In reply to comment #6)

Is there a ~/.ssh/config file? If so, can you make sure any paths in it are correct? Generally they aren't if you copy the config from Linux to Windows.

In the future, a safer way to do this is:
- scp the Windows keys from an existing slave to somewhere
- scp those keys onto the new slaves
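The two-step copy can be written out as a list of scp commands. In this sketch the source hostname, the `win32-keys` staging directory and the `key_copy_plan` helper are hypothetical, and only `trybld_dsa` is copied by default; any other key files the slaves need would be added to `keys`:

```python
def key_copy_plan(source, targets, keys=("trybld_dsa",), staging="win32-keys"):
    """Return scp commands: pull keys from one Windows slave, push to others.

    Step 1 copies each key from an existing Windows try slave into a local
    staging directory (assumed to exist already); step 2 copies the staged
    keys into ~/.ssh on every new slave. Names here are placeholders.
    """
    plan = []
    for key in keys:
        plan.append(["scp", f"{source}:.ssh/{key}", f"{staging}/{key}"])
    for host in targets:
        for key in keys:
            plan.append(["scp", f"{staging}/{key}", f"{host}:.ssh/{key}"])
    return plan


if __name__ == "__main__":
    for cmd in key_copy_plan("existing-win32-try-slave", ["win32-slave50", "win32-slave51"]):
        print(" ".join(cmd))
```

Staging the keys locally means every new slave receives the Windows copy of the keys, rather than a copy taken from a Linux slave.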
Comment on attachment 464527
move few more slaves

I don't think we should deprive the moz2 pool of any more IX machines. Yesterday clobbers were set and we kept getting builds on VMs, hence very slow results for win32.
Attachment #464527 - Flags: review?(nrthomas) → review-
It works for me.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering