
prepare the new seamicro machines for production

RESOLVED FIXED

Status

Release Engineering
Platform Support
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: massimo, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

3 years ago
Tracking what needs to be done on the releng side to bring the 64 new seamicro machines to production.
(Reporter)

Comment 1

3 years ago
Created attachment 8413885 [details] [diff] [review]
[buildbot-configs] add the new seamicro machines in buildbot-configs.patch

Add the new seamicro machines in buildbot-configs
Attachment #8413885 - Flags: review?(catlee)

Updated

3 years ago
Attachment #8413885 - Flags: review?(catlee) → review+
(Reporter)

Comment 2

3 years ago
Created attachment 8413887 [details]
new-sea-micro-64.csv

slavealloc import file for the new seamicro boxes (b-2008-sm-0001..003 are already in slavealloc).
Attachment #8413887 - Flags: review?(catlee)
(Reporter)

Updated

3 years ago
Depends on: 982261

Updated

3 years ago
Attachment #8413887 - Flags: review?(catlee) → review+
(Reporter)

Updated

3 years ago
Attachment #8413885 - Flags: checked-in+

Comment 3

3 years ago
I added bm-2008-sm to slave_health to rescue it from exceptions:
  http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8

Comment 4

3 years ago
(In reply to Nick Thomas [:nthomas] (PTO Apr 17-27 PST) from comment #3)
> I added bm-2008-sm to slave_health to rescue it from exceptions:
>   http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8

It doesn't affect Nick's patch, but I'm confused here based on nomenclature: are these machines running Linux or Windows? 

'bm-2008' leads me to think they're Windows, but they've been added to the buildbot-configs as LINUX64_EC2: https://hg.mozilla.org/build/buildbot-configs/rev/44027fc7bced#l1.14
(Reporter)

Comment 5

3 years ago
Created attachment 8414850 [details] [diff] [review]
[buildbot-configs] add seamicro machines to production.patch

Hi coop, you're right! The seamicro machines are WIN64_REV2 slaves.
Attachment #8413885 - Attachment is obsolete: true
Attachment #8414850 - Flags: review?(coop)

Updated

3 years ago
Attachment #8414850 - Flags: review?(coop) → review+
(Reporter)

Updated

3 years ago
Attachment #8414850 - Flags: checked-in+

Comment 6

3 years ago
Per :taras in person, I was asked to disable a whole chunk of win64 production builders except for ~10-20.

Since b-2008-ix are 17 machines and we have 3 of the b-2008-sm nodes, with :hwine's help we disabled all ~98 w64-ix-*'s, and will re-enable all 94 previously-enabled ones in a few hours.

CC'd all involved + sheriffs
Blocks: 1005426
Blocks: 1005427
Blocks: 1005428

Comment 7

3 years ago
Two burned jobs and one accidental kill, pretty nice for a Friday night deploy.
(Reporter)

Comment 8

3 years ago
b-2008-sm-000{1..3} are now enabled in slavealloc.

They have executed some jobs, all green except for http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/Thunderbird%20comm-central%20win32%20l10n%20nightly/builds/2187, which looks like a general build failure not related to this machine.
(Reporter)

Comment 9

3 years ago
Some results: w64-ix-slave129 vs. b-2008-sm-0002

* Builder WINNT 5.2 fx-team leak test build: 39 mins, 42 secs (w64-ix-slave129) [1]
* Builder WINNT 5.2 fx-team leak test build: 59 mins, 43 secs (seamicro ssd b-2008-sm-0002) [2]


                    w64-ix-slave129  |    b-2008-sm-0002
hg_update                   51 secs  |  16 mins, 36 secs                
compile             7 mins,  8 secs  |  24 mins, 30 secs 
make_buildsymbols  12 mins, 58 secs  |   6 mins, 31 secs
make_pkg_tests      7 mins, 44 secs  |   1 mins, 56 secs
make_pkg 'python    2 mins,  7 secs  |   2 mins, 51 secs


[1] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/544
[2] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/542
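
To make the per-step gap easier to read, here is a rough sketch that converts the durations from the table above into seconds and prints the difference per step. The helper names (`to_seconds`, `step_deltas`) are made up for illustration and are not part of any releng tooling; the truncated `make_pkg 'python` row is left out.

```python
import re


def to_seconds(text):
    """Convert a duration such as '16 mins, 36 secs' to whole seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hrs?|mins?|secs?)", text):
        if unit.startswith("hr"):
            total += int(value) * 3600
        elif unit.startswith("min"):
            total += int(value) * 60
        else:
            total += int(value)
    return total


# Step timings copied from the table: (w64-ix-slave129, b-2008-sm-0002).
STEPS = {
    "hg_update": ("51 secs", "16 mins, 36 secs"),
    "compile": ("7 mins, 8 secs", "24 mins, 30 secs"),
    "make_buildsymbols": ("12 mins, 58 secs", "6 mins, 31 secs"),
    "make_pkg_tests": ("7 mins, 44 secs", "1 mins, 56 secs"),
}


def step_deltas(steps):
    """Return {step: seamicro - ix} in seconds; positive means the seamicro was slower."""
    return {name: to_seconds(sm) - to_seconds(ix) for name, (ix, sm) in steps.items()}


if __name__ == "__main__":
    for name, delta in step_deltas(STEPS).items():
        print(f"{name:18s} {delta:+6d} s")
```

This is a single pair of builds, so the deltas only show where the time went in this one run (mostly hg_update and compile), not a general trend.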
(Reporter)

Comment 10

3 years ago
more data:

The average compile time for the last 70 fx-team-win32-debug builds is 1654s; our seamicro spent 1471s compiling. 71% of those builds were slower than it.

As soon as we have more seamicro builds, I'll extract more data.
(Reporter)

Comment 11

3 years ago
more data:

WINNT 5.2 mozilla-inbound leak test build compile step: b-2008-sm-000{2,3} are faster than 85% of the last 70 builds. Average time: 1801s, seamicro build time: 1493s.

Comment 12

3 years ago
I think the data would be more conclusive if they were building try instead of every other branch.

Comment 13

3 years ago
(In reply to Mike Hommey [:glandium] from comment #12)
> I think the data would be more conclusive if they were building try instead
> of every other branch.

Massimo, can you switch these machines over? try has much more consistent perf than having a machine cycle between branches.
(Reporter)

Comment 14

3 years ago
Created attachment 8419257 [details] [diff] [review]
[buildbot-configs] Bug 1002634 - temporary move b-2008-sm machines from build to try.patch

moving b-2008-sm-* from WIN64_REV2 to TRY_WIN64_REV2
Attachment #8419257 - Flags: review?(catlee)

Updated

3 years ago
Attachment #8419257 - Flags: review?(catlee) → review+

Comment 15

3 years ago
We'll also need to update:
- slavealloc
- ssh keys
- vlan (maybe? :arr, any changes required to the network on these machines to do try builds?)
Flags: needinfo?(arich)
(Reporter)

Updated

3 years ago
Attachment #8419257 - Flags: checked-in+

Comment 16

3 years ago
These are not designed to be try machines, they're designed to be non-try builders.  If you want them to be try instead, we'll need to change their FQDNs and move them to a different vlan.
Flags: needinfo?(arich)

Comment 17

3 years ago
In case that wasn't clear, renaming implies that they will be reinstalled, as well.

Comment 18

3 years ago
Let's do it.

Comment 19

3 years ago
Merged and deployed to production.

Comment 20

3 years ago
(In reply to Chris AtLee [:catlee] from comment #18)

When you say "do it," do you mean rename and reimage?  It wasn't clear, since the next comment was that the existing stuff was deployed to production.

Comment 21

3 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> (In reply to Chris AtLee [:catlee] from comment #18)
> 
> When you say "do it," do you mean rename and reimage?  It wasn't clear,
> since the next comment was that the existing stuff was deployed to
> production.

Let's reimage the box.
Depends on: 1008170
(Reporter)

Comment 22

3 years ago
b-2008-sm-0001 is now enabled in try/prod (https://bugzilla.mozilla.org/show_bug.cgi?id=1008170#c6); it is just waiting for a job.
(Reporter)

Comment 23

3 years ago
re-enabled b-2008-sm-0002 in slavealloc (try/prod)
(Reporter)

Comment 24

3 years ago
At the moment, only b-2008-sm-0001 (non ssd) has completed some try jobs. Here are some results:

WINNT 5.2 try leak test build, last 100 successful builds (in seconds):
average time: 3543 
b-2008-sm-0001 average time: 2813 
b-2008-sm-0001 is faster than 95% of the last 100 builds

WINNT 5.2 try leak test build, last 100 successful 'compile' steps (in seconds):
average time: 1681
b-2008-sm-0001 average time: 1269
b-2008-sm-0001 is faster than 96% of last 100 'compile' steps


seamicro machines are faster than other machines on try/build. As soon as b-2008-sm-000{2,3} (ssd) get some jobs, we can extract some data on ssd vs. non ssd.

Comment 25

3 years ago
massimo, don't ever use averages for perf comparison. Long tails mess those up. Our median try build time is ~3000s
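
A quick illustration of the long-tail point, using made-up build times (these are not real try data): a couple of stuck builds drag the mean far above what a typical build takes, while the median barely moves.

```python
from statistics import mean, median

# Illustrative build times in seconds (made up): most builds cluster near
# 3000 s, and two stuck builds form a long tail.
times = [2900, 2950, 3000, 3000, 3050, 3100, 9000, 12000]


def summarize(samples):
    """Return (mean, median) for a list of build times."""
    return mean(samples), median(samples)


if __name__ == "__main__":
    avg, mid = summarize(times)
    print("mean:  ", avg)
    print("median:", mid)
```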

Comment 26

3 years ago
Massimo or anyone else, do we have more datapoints for comparison yet?

eg seamicro vs world and ssd vs hd? Gotta get this prototype wrapped up.
(Reporter)

Comment 27

3 years ago
Hi taras,

more data:

try-win32-debug, end to end time in seconds, last 100 jobs: 
median: 3291
seamicro median: 2660
seamicro percentile: 91st

try-win32, end to end time in seconds, last 100 jobs:
median: 3011
seamicro median: 2506
seamicro percentile: 95th

try-win32-debug, hg_update, time in seconds, last 200 jobs:
hg_update median: 229
seamicro median: 171
percentile: 84th

try-win32-debug, compile, time in seconds, last 200 jobs:
compile median: 1661
seamicro median: 1240
percentile: 95th
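
The medians above can be turned into a rough "time saved" percentage per measurement. A minimal sketch follows; the numbers are copied from this comment, the function name is made up, and it assumes the overall median is a fair baseline even though it may include seamicro jobs.

```python
# (overall median, seamicro median) in seconds, copied from the figures above.
MEDIANS = {
    "try-win32-debug end to end": (3291, 2660),
    "try-win32 end to end": (3011, 2506),
    "try-win32-debug hg_update": (229, 171),
    "try-win32-debug compile": (1661, 1240),
}


def percent_saved(baseline, candidate):
    """Percentage of the baseline time saved by the candidate."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__":
    for label, (overall, seamicro) in MEDIANS.items():
        print(f"{label:28s} {percent_saved(overall, seamicro):5.1f}% less time")
```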


b-2008-sm-0002 (ssd) has completed only 5 jobs; b-2008-sm-0001 has completed 30 jobs.

Comment 28

3 years ago
we're trying to put more load through b-2008-sm-0002 so we can get more data there. there haven't been that many requests for windows builds on try recently!

in any case, it seems pretty clear that these machines are faster than the existing machines regardless of if we use SSDs.

shall we proceed with moving the rest of the seamicros into production right away, or wait to get better data on the SSDs?

I'd prefer to get more of these in production ASAP, and to swap out the drives once we know if they're worth it, and actually have them in hand. Taras, Amy, thoughts?
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(arich)

Comment 29

3 years ago
Amy is out for 10 days or so, Laura is filling in.

I'm ok with ordering ssds.
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(laura)
Flags: needinfo?(arich)

Comment 30

3 years ago
Laura, can you arrange for a shipment of ssds?

Comment 31

3 years ago
Chris, I'd still like to get some hard drive comparisons if possible.
(Reporter)

Comment 32

3 years ago
More data on ssd vs non ssd. 

Seamicro machines successfully completed 50 try jobs

b-2008-sm-0001, non ssd, 40 samples, time in seconds
compile median: 1253 
hg_update median: 180


b-2008-sm-0002, ssd, 10 samples, time in seconds
compile (ssd) median: 1142 
hg_update (ssd) median: 144

Comment 33

3 years ago
Q, did the seamicro machines get non-removable + write cache settings set? eg bug 1004508
Flags: needinfo?(q)

Comment 34

3 years ago
They got the same config as all the builders. I am logging in to confirm now.
Flags: needinfo?(q)

Updated

3 years ago
Depends on: 1014700

Updated

3 years ago
Depends on: 1014703

Comment 35

3 years ago
I have a quote from the vendor, but have asked if they can do better. They are going to talk to Samsung then will get back to me.
Flags: needinfo?(laura)

Comment 36

3 years ago
(In reply to Laura Thomson :laura from comment #35)
> I have a quote from the vendor, but have asked if they can do better. They
> are going to talk to Samsung then will get back to me.

Looks like we are ready to order 16 (64/4) 1TB ssds, to be split up into 256GB volumes.
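
The drive count is just the 64/4 division above. As a sketch, assuming each machine gets exactly one 256GB volume and each 1TB drive yields 4 volumes (the function name is made up):

```python
import math


def drives_needed(machines, volumes_per_drive):
    """Number of drives required if each machine gets exactly one volume."""
    return math.ceil(machines / volumes_per_drive)


if __name__ == "__main__":
    print(drives_needed(64, 4))
```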

Derek, can you post shipping details in bugzilla (or email me if confidential) so you can receive these? Thanks.
Flags: needinfo?(dmoore)

Comment 37

3 years ago
(In reply to Taras Glek (:taras) from comment #36)

> Derek, can you post shipping details in bugzilla(or email me if
> confidential) so you can receive these? Thanks.


Ideally, direct them to Van Le using the shipping address here:

https://mana.mozilla.org/wiki/display/DC/SCL3#SCL3-Shipping

If you have them, please email tracking numbers to dcops@ so we can follow the shipment.
Flags: needinfo?(dmoore)

Comment 38

3 years ago
Had to break up order due to availability.
This week:
4 ssds 1Z602AW24216628636
1 ssd 1Z602AW24216628636
1 ssd 94055036993003046639
Next week:
10 ssds 1Z74E33W4247908876
Depends on: 1017126
(Reporter)

Comment 39

3 years ago
Created attachment 8437789 [details] [diff] [review]
[buildbot-configs] add seamicro machines to production.patch

moving b-2008-sm-00{01..32} in try and b-2008-sm-00{33..64} in build
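
For anyone who wants the two ranges spelled out, a small sketch that expands them into hostnames. The function name and the `try`/`build` dictionary keys are labels for illustration only, not actual buildbot-configs identifiers.

```python
def seamicro_hosts(first, last, prefix="b-2008-sm-"):
    """Expand an inclusive numeric range into zero-padded hostnames."""
    return [f"{prefix}{n:04d}" for n in range(first, last + 1)]


# Pool names are plain labels for illustration, not buildbot-configs keys.
POOLS = {
    "try": seamicro_hosts(1, 32),
    "build": seamicro_hosts(33, 64),
}

if __name__ == "__main__":
    for pool, hosts in POOLS.items():
        print(pool, hosts[0], "..", hosts[-1], f"({len(hosts)} hosts)")
```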
Attachment #8414850 - Attachment is obsolete: true
Attachment #8419257 - Attachment is obsolete: true
Attachment #8437789 - Flags: review?(rail)

Comment 40

3 years ago
Comment on attachment 8437789 [details] [diff] [review]
[buildbot-configs] add seamicro machines to production.patch

stamp
Attachment #8437789 - Flags: review?(rail) → review+
(Reporter)

Updated

3 years ago
Attachment #8437789 - Flags: checked-in+

Comment 41

3 years ago
Live with reconfig on 2014-06-11 08:47 PT
(Reporter)

Comment 42

3 years ago
Created attachment 8438604 [details]
seamicro.csv

moved 0004->0032 to try. Removed 0033 because it's already in slavealloc (https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c18)
Attachment #8413887 - Attachment is obsolete: true
Attachment #8438604 - Flags: review?(catlee)

Updated

3 years ago
Attachment #8438604 - Flags: review?(catlee) → review+
(Reporter)

Comment 43

3 years ago
The ssh keys have been uploaded and verified for all the seamicro machines (except for 0004 and 0031, https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c15)

All the seamicro hosts are now in slavealloc but only the following hosts are enabled: 

(try)
b-2008-sm-0001
b-2008-sm-0002
b-2008-sm-0003
b-2008-sm-0005
b-2008-sm-0006
b-2008-sm-0007
b-2008-sm-0008

(build)
b-2008-sm-0033
b-2008-sm-0034
b-2008-sm-0035
b-2008-sm-0036
b-2008-sm-0037

Waiting for a few green jobs before enabling them all.
(Reporter)

Comment 44

3 years ago
All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are in production.

Updated

3 years ago
Depends on: 1027983

Comment 45

3 years ago
(In reply to Massimo Gervasini [:mgerva] from comment #44)
> All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are
> in production.

Due to this, dropping -0001..0003 from the list of blockers
No longer blocks: 1005426, 1005427, 1005428

Updated

3 years ago
No longer depends on: 1027983

Comment 46

3 years ago
These were put into production in June. Moving the dependency to bug 1047621 where the linking/CPU issues are being investigated.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED