Closed
Bug 1002634
Opened 11 years ago
Closed 10 years ago
prepare the new seamicro machines for production
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: massimo, Unassigned)
References
Details
Attachments
(2 files, 4 obsolete files)
1.65 KB, patch | rail: review+, massimo: checked-in+
4.74 KB, text/csv | catlee: review+
Tracking what needs to be done on releng side to bring the new 64 seamicro machines to production.
Reporter | ||
Comment 1•11 years ago
|
||
Add the new seamicro machines in buildbot-configs
Attachment #8413885 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8413885 -
Flags: review?(catlee) → review+
Reporter | ||
Comment 2•11 years ago
|
||
slavealloc import file for the new seamicro boxes (b-2008-sm-0001..003 are already in slavealloc).
Attachment #8413887 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8413887 -
Flags: review?(catlee) → review+
Reporter | ||
Updated•11 years ago
|
Attachment #8413885 -
Flags: checked-in+
Comment 3•11 years ago
|
||
I added bm-2008-sm to slave_health to rescue it from exceptions:
http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8
Comment 4•11 years ago
|
||
(In reply to Nick Thomas [:nthomas] (PTO Apr 17-27 PST) from comment #3)
> I added bm-2008-sm to slave_health to rescue it from exceptions:
> http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8
It doesn't affect Nick's patch, but I'm confused here based on nomenclature: are these machines running Linux or Windows?
'bm-2008' leads me to think they're Windows, but they've been added to the buildbot-configs as LINUX64_EC2: https://hg.mozilla.org/build/buildbot-configs/rev/44027fc7bced#l1.14
Reporter | ||
Comment 5•11 years ago
|
||
Hi coop, you're right! The seamicro machines are WIN64_REV2 slaves.
Attachment #8413885 -
Attachment is obsolete: true
Attachment #8414850 -
Flags: review?(coop)
Updated•11 years ago
|
Attachment #8414850 -
Flags: review?(coop) → review+
Reporter | ||
Updated•11 years ago
|
Attachment #8414850 -
Flags: checked-in+
Comment 6•11 years ago
|
||
Per :taras, in person, I was asked to disable most of the win64 production builders, leaving only ~10-20 enabled.
The b-2008-ix pool has 17 machines, and we have 3 of the b-2008-sm nodes. So, with :hwine's help, we disabled all ~98 w64-ix-* machines, and will re-enable the 94 that were previously enabled in a few hours.
CC'd all involved + sheriffs
Updated•11 years ago
|
Blocks: b-2008-sm-0001
Updated•11 years ago
|
Blocks: b-2008-sm-0002
Updated•11 years ago
|
Blocks: b-2008-sm-0003
Comment 7•11 years ago
|
||
Two burned jobs and one accidental kill, pretty nice for a Friday night deploy.
Reporter | ||
Comment 8•11 years ago
|
||
b-2008-sm-000{1..3} are now enabled in slavealloc.
They have executed some jobs, all green except for http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/Thunderbird%20comm-central%20win32%20l10n%20nightly/builds/2187, which looks like a general build failure not related to this machine.
Reporter | ||
Comment 9•11 years ago
|
||
Some results: w64-ix-slave129 vs. b-2008-sm-0002
* Builder WINNT 5.2 fx-team leak test build: 39 mins, 42 secs (w64-ix-slave129) [1]
* Builder WINNT 5.2 fx-team leak test build: 59 mins, 43 secs (seamicro ssd b-2008-sm-0002) [2]
step              | w64-ix-slave129  | b-2008-sm-0002
hg_update         | 51 secs          | 16 mins, 36 secs
compile           | 7 mins, 8 secs   | 24 mins, 30 secs
make_buildsymbols | 12 mins, 58 secs | 6 mins, 31 secs
make_pkg_tests    | 7 mins, 44 secs  | 1 mins, 56 secs
make_pkg 'python  | 2 mins, 7 secs   | 2 mins, 51 secs
[1] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/544
[2] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/542
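The per-step timings in this comment are easier to compare once they are converted to seconds. The following is only a sketch: `to_seconds` and `compare` are hypothetical helper names, the `hrs` unit is an assumption (it does not appear in the data above), and the `make_pkg` row is left out because its step label is truncated in the source.

```python
import re

def to_seconds(text):
    """Convert a duration such as '16 mins, 36 secs' to whole seconds."""
    total = 0
    # 'hrs' is assumed for completeness; only mins/secs appear in the bug.
    for value, unit in re.findall(r"(\d+)\s*(hrs?|mins?|secs?)", text):
        if unit.startswith("hr"):
            total += int(value) * 3600
        elif unit.startswith("min"):
            total += int(value) * 60
        else:
            total += int(value)
    return total

# Step timings copied from the comment: (w64-ix-slave129, b-2008-sm-0002)
STEPS = {
    "hg_update": ("51 secs", "16 mins, 36 secs"),
    "compile": ("7 mins, 8 secs", "24 mins, 30 secs"),
    "make_buildsymbols": ("12 mins, 58 secs", "6 mins, 31 secs"),
    "make_pkg_tests": ("7 mins, 44 secs", "1 mins, 56 secs"),
}

def compare(steps):
    """Return {step: (ix_seconds, sm_seconds, sm_minus_ix)}."""
    result = {}
    for name, (ix, sm) in steps.items():
        ix_s, sm_s = to_seconds(ix), to_seconds(sm)
        result[name] = (ix_s, sm_s, sm_s - ix_s)
    return result

if __name__ == "__main__":
    for name, (ix_s, sm_s, delta) in compare(STEPS).items():
        print(f"{name:18s} ix={ix_s:5d}s  sm={sm_s:5d}s  delta={delta:+d}s")
```

A positive delta means the seamicro node was slower on that step in this single pair of builds; one build per machine is too small a sample to draw conclusions from.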
Reporter | ||
Comment 10•11 years ago
|
||
more data:
The average compile time for the last 70 fx-team-win32-debug builds is 1654s; our seamicro spent 1471s compiling. 71% of builds are slower than it.
As soon as we have more seamicro builds, I'll extract more data.
Reporter | ||
Comment 11•11 years ago
|
||
more data:
WINNT 5.2 mozilla-inbound leak test build, compile step: b-2008-sm-000{2,3} are faster than 85% of the last 70 builds. Average time: 1801s; seamicro build time: 1493s.
Comment 12•11 years ago
|
||
I think the data would be more conclusive if they were building try instead of every other branch.
Comment 13•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #12)
> I think the data would be more conclusive if they were building try instead
> of every other branch.
Massimo, can you switch these machines over? try has much more consistent perf than having a machine cycle between branches.
Reporter | ||
Comment 14•11 years ago
|
||
moving b-2008-sm-* from WIN64_REV2 to TRY_WIN64_REV2
Attachment #8419257 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8419257 -
Flags: review?(catlee) → review+
Comment 15•11 years ago
|
||
We'll also need to update:
- slavealloc
- ssh keys
- vlan (maybe? :arr, any changes required to the network on these machines to do try builds?)
Flags: needinfo?(arich)
Reporter | ||
Updated•11 years ago
|
Attachment #8419257 -
Flags: checked-in+
Comment 16•11 years ago
|
||
These are not designed to be try machines, they're designed to be non-try builders. If you want them to be try instead, we'll need to change their FQDNs and move them to a different vlan.
Flags: needinfo?(arich)
Comment 17•11 years ago
|
||
In case that wasn't clear, renaming implies that they will be reinstalled, as well.
Comment 18•11 years ago
|
||
Let's do it.
Comment 19•11 years ago
|
||
Merged and deployed to production.
Comment 20•11 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #18)
When you say "do it," do you mean rename and reimage? It wasn't clear, since the next comment was that the existing stuff was deployed to production.
Comment 21•11 years ago
|
||
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> (In reply to Chris AtLee [:catlee] from comment #18)
>
> When you say "do it," do you mean rename and reimage? It wasn't clear,
> since the next comment was that the existing stuff was deployed to
> production.
Let's reimage the box.
Reporter | ||
Comment 22•11 years ago
|
||
b-2008-sm-0001 is now enabled in try/prod (https://bugzilla.mozilla.org/show_bug.cgi?id=1008170#c6); it is just waiting for a job.
Reporter | ||
Comment 23•11 years ago
|
||
re-enabled b-2008-sm-0002 in slavealloc (try/prod)
Reporter | ||
Comment 24•11 years ago
|
||
At the moment, only b-2008-sm-0001 (non ssd) has completed some try jobs. Here are some results:
WINNT 5.2 try leak test build, last 100 successful builds (in seconds):
average time: 3543
b-2008-sm-0001 average time: 2813
b-2008-sm-0001 is faster than 95% of the last 100 builds
WINNT 5.2 try leak test build, last 100 successful 'compile' steps (in seconds):
average time: 1681
b-2008-sm-0001 average time: 1269
b-2008-sm-0001 is faster than 96% of last 100 'compile' steps
seamicro machines are faster than other machines on try/build. As soon as b-2008-sm-000{2,3} (ssd) get some jobs, we can extract some data on ssd vs. non ssd.
Comment 25•10 years ago
|
||
massimo, don't ever use averages for perf comparison. Long tails mess those up. Our median try build time is ~3000s
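A small illustration of why a long tail distorts the average but not the median. The build times below are hypothetical, chosen only to show the effect; they are not taken from this bug.

```python
from statistics import mean, median

def summarize(times):
    """Return (mean, median) for a list of build times in seconds."""
    return mean(times), median(times)

# Hypothetical build times in seconds: most builds cluster around 3000s,
# and two stuck builds form a long tail.
build_times = [2900, 2950, 3000, 3000, 3050, 3100, 3100, 9000, 12000]

if __name__ == "__main__":
    avg, med = summarize(build_times)
    print(f"mean:   {avg:.0f}s")
    print(f"median: {med}s")
```

With these made-up numbers the two slow builds pull the mean well above every typical build, while the median stays inside the cluster, which is why the median is the safer figure for comparing machines.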
Comment 26•10 years ago
|
||
Massimo or anyone else, do we have more datapoints for comparison yet?
eg seamicro vs world and ssd vs hd? Gotta get this prototype wrapped up.
Reporter | ||
Comment 27•10 years ago
|
||
Hi taras,
more data:
try-win32-debug, end to end time in seconds, last 100 jobs:
median: 3291
seamicro median: 2660
seamicro percentile: 91st
try-win32, end to end time in seconds, last 100 jobs:
median: 3011
seamicro median: 2506
seamicro percentile: 95th
try-win32-debug, hg_update, time in seconds, last 200 jobs:
hg_update median: 229
seamicro median: 171
percentile: 84th
try-win32-debug, compile, time in seconds, last 200 jobs:
compile median: 1661
seamicro median: 1240
percentile: 95th
b-2008-sm-0002 (ssd) has completed only 5 jobs, b-2008-sm-0001 has completed 30 jobs
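The comment does not say how the percentile figures were computed. One plausible reading, sketched here as an assumption, is "the share of recent jobs that took longer than the seamicro median". `percent_slower` is a hypothetical helper and the sample times are invented for illustration.

```python
from statistics import median

def percent_slower(all_times, reference):
    """Percentage of entries in all_times that took strictly longer than reference."""
    if not all_times:
        raise ValueError("all_times must not be empty")
    slower = sum(1 for t in all_times if t > reference)
    return 100.0 * slower / len(all_times)

# Hypothetical end-to-end times in seconds (not from the bug).
pool_times = [2500, 2800, 3000, 3200, 3300, 3400, 3600, 3800, 4000, 5200]
seamicro_times = [2600, 2660, 2700]

if __name__ == "__main__":
    reference = median(seamicro_times)
    share = percent_slower(pool_times, reference)
    print(f"seamicro median {reference}s is faster than {share:.0f}% of the pool")
```

Whether ties count as slower, and whether the seamicro jobs themselves are included in the pool, changes the result slightly; the sketch excludes ties and keeps the two samples separate.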
Comment 28•10 years ago
|
||
we're trying to put more load through b-2008-sm-0002 so we can get more data there. there haven't been that many requests for windows builds on try recently!
in any case, it seems pretty clear that these machines are faster than the existing machines regardless of whether we use SSDs.
shall we proceed with moving the rest of the seamicros into production right away, or wait to get better data on the SSDs?
I'd prefer to get more of these in production ASAP, and to swap out the drives once we know if they're worth it, and actually have them in hand. Taras, Amy, thoughts?
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(arich)
Comment 29•10 years ago
|
||
Amy is out for 10 days or so, Laura is filling in.
I'm ok with ordering ssds.
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(laura)
Flags: needinfo?(arich)
Comment 30•10 years ago
|
||
Laura, can you arrange for a shipment of ssds?
Comment 31•10 years ago
|
||
Chris, I'd still like to get some hard drive comparisons if possible.
Reporter | ||
Comment 32•10 years ago
|
||
More data on ssd vs non ssd.
Seamicro machines successfully completed 50 try jobs
b-2008-sm-0001, non ssd, 40 samples, time in seconds
compile median: 1253
hg_update median: 180
b-2008-sm-0002, ssd, 10 samples, time in seconds
compile (ssd) median: 1142
hg_update (ssd) median: 144
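The medians above can be turned into a relative difference. This is only a sketch of the arithmetic: `percent_faster` is a hypothetical helper, and with 10 ssd samples against 40 non-ssd samples the result should be read as indicative at best.

```python
def percent_faster(baseline, candidate):
    """How much shorter candidate is than baseline, as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline

# Medians in seconds, copied from the comment: (non ssd, ssd)
MEDIANS = {
    "compile": (1253, 1142),
    "hg_update": (180, 144),
}

if __name__ == "__main__":
    for step, (hdd, ssd) in MEDIANS.items():
        print(f"{step}: ssd median is {percent_faster(hdd, ssd):.1f}% shorter")
```

On these figures the ssd node's median is roughly 9% shorter for compile and 20% shorter for hg_update.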
Comment 33•10 years ago
|
||
Q, did the seamicro machines get non-removable + write cache settings set? eg bug 1004508
Flags: needinfo?(q)
Comment 34•10 years ago
|
||
They got the same config as all the builders. I am logging in to confirm now.
Flags: needinfo?(q)
Comment 35•10 years ago
|
||
I have a quote from the vendor, but have asked if they can do better. They are going to talk to Samsung then will get back to me.
Flags: needinfo?(laura)
Comment 36•10 years ago
|
||
(In reply to Laura Thomson :laura from comment #35)
> I have a quote from the vendor, but have asked if they can do better. They
> are going to talk to Samsung then will get back to me.
Looks like we are ready to order 16 (64/4) 1TB SSDs, to be split into 256GB volumes.
Derek, can you post shipping details in bugzilla (or email me if confidential) so you can receive these? Thanks.
Flags: needinfo?(dmoore)
Comment 37•10 years ago
|
||
(In reply to Taras Glek (:taras) from comment #36)
> Derek, can you post shipping details in bugzilla(or email me if
> confidential) so you can receive these? Thanks.
Ideally, direct them to Van Le using the shipping address here:
https://mana.mozilla.org/wiki/display/DC/SCL3#SCL3-Shipping
If you have them, please email tracking numbers to dcops@ so we can follow the shipment.
Flags: needinfo?(dmoore)
Comment 38•10 years ago
|
||
Had to break up order due to availability.
This week:
4 ssds 1Z602AW24216628636
1 ssd 1Z602AW24216628636
1 ssd 94055036993003046639
Next week:
10 ssds 1Z74E33W4247908876
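The order size from comment 36 and the split shipments listed here can be cross-checked with a few lines of arithmetic. This is a sketch; `ssds_needed` is a hypothetical helper, and it assumes one 256GB volume per machine and four such volumes per 1TB drive, as the "64/4" in comment 36 implies.

```python
import math

def ssds_needed(machines, volumes_per_ssd):
    """Number of drives required if each machine gets one volume."""
    return math.ceil(machines / volumes_per_ssd)

MACHINES = 64              # seamicro nodes, from the bug description
VOLUMES_PER_SSD = 4        # 1TB split into 256GB volumes (comment 36)
SHIPMENTS = [4, 1, 1, 10]  # drive counts per shipment, from this comment

if __name__ == "__main__":
    needed = ssds_needed(MACHINES, VOLUMES_PER_SSD)
    shipped = sum(SHIPMENTS)
    print(f"needed: {needed}, shipped: {shipped}, missing: {needed - shipped}")
```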
Reporter | ||
Comment 39•10 years ago
|
||
Moving b-2008-sm-00{01..32} to try and b-2008-sm-00{33..64} to build.
Attachment #8414850 -
Attachment is obsolete: true
Attachment #8419257 -
Attachment is obsolete: true
Attachment #8437789 -
Flags: review?(rail)
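For reference, the brace ranges in comment 39 expand to the host names below. This sketch only generates the names; `seamicro_hosts` is a hypothetical helper and does not reproduce the actual layout of the buildbot-configs patch.

```python
def seamicro_hosts(first, last, prefix="b-2008-sm-"):
    """Expand an inclusive numeric range into zero-padded host names."""
    return [f"{prefix}{n:04d}" for n in range(first, last + 1)]

# Pool split as described in comment 39.
try_hosts = seamicro_hosts(1, 32)     # b-2008-sm-0001 .. b-2008-sm-0032
build_hosts = seamicro_hosts(33, 64)  # b-2008-sm-0033 .. b-2008-sm-0064

if __name__ == "__main__":
    print("try:  ", try_hosts[0], "..", try_hosts[-1], f"({len(try_hosts)} hosts)")
    print("build:", build_hosts[0], "..", build_hosts[-1], f"({len(build_hosts)} hosts)")
```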
Comment 40•10 years ago
|
||
Comment on attachment 8437789 [details] [diff] [review]
[buildbot-configs] add seamicro machines to production.patch
stamp
Attachment #8437789 -
Flags: review?(rail) → review+
Reporter | ||
Updated•10 years ago
|
Attachment #8437789 -
Flags: checked-in+
Comment 41•10 years ago
|
||
Live with reconfig on 2014-06-11 08:47 PT
Reporter | ||
Comment 42•10 years ago
|
||
moved 0004->0032 to try. Removed 0033 because it's already in slavealloc (https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c18)
Attachment #8413887 -
Attachment is obsolete: true
Attachment #8438604 -
Flags: review?(catlee)
Updated•10 years ago
|
Attachment #8438604 -
Flags: review?(catlee) → review+
Reporter | ||
Comment 43•10 years ago
|
||
The ssh keys have been uploaded and verified for all the seamicro machines (except for 0004 and 0031, https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c15).
All the seamicro hosts are now in slavealloc but only the following hosts are enabled:
(try)
b-2008-sm-0001
b-2008-sm-0002
b-2008-sm-0003
b-2008-sm-0005
b-2008-sm-0006
b-2008-sm-0007
b-2008-sm-0008
(build)
b-2008-sm-0033
b-2008-sm-0034
b-2008-sm-0035
b-2008-sm-0036
b-2008-sm-0037
Waiting for a few green jobs before enabling them all.
Reporter | ||
Comment 44•10 years ago
|
||
All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are in production.
Comment 45•10 years ago
|
||
(In reply to Massimo Gervasini [:mgerva] from comment #44)
> All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are
> in production.
Due to this, dropping -0001..0003 from the list of blockers
Comment 46•10 years ago
|
||
These were put into production in June. Moving the dependency to bug 1047621 where the linking/CPU issues are being investigated.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•7 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard