Closed Bug 1002634 Opened 11 years ago Closed 10 years ago

prepare the new seamicro machines for production

Categories: Infrastructure & Operations Graveyard :: CIDuty, task
Platform: x86, macOS
Type: task; Priority: Not set; Severity: normal

Tracking: not tracked

Status: RESOLVED FIXED

People: Reporter: massimo; Assignee: Unassigned

Attachments: 2 files, 4 obsolete files

Tracking what needs to be done on the releng side to bring the 64 new seamicro machines to production.
Add the new seamicro machines to buildbot-configs
Attachment #8413885 - Flags: review?(catlee)
Attachment #8413885 - Flags: review?(catlee) → review+
Attached file new-sea-micro-64.csv (obsolete)
slavealloc import file for the new seamicro boxes (b-2008-sm-0001..0003 are already in slavealloc).
Attachment #8413887 - Flags: review?(catlee)
Depends on: 982261
Attachment #8413887 - Flags: review?(catlee) → review+
Attachment #8413885 - Flags: checked-in+
I added bm-2008-sm to slave_health to rescue it from exceptions: http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8
(In reply to Nick Thomas [:nthomas] (PTO Apr 17-27 PST) from comment #3)
> I added bm-2008-sm to slave_health to rescue it from exceptions:
> http://hg.mozilla.org/build/slave_health/rev/6fe815c5eeb8

It doesn't affect Nick's patch, but I'm confused by the nomenclature here: are these machines running Linux or Windows? 'bm-2008' leads me to think they're Windows, but they've been added to buildbot-configs as LINUX64_EC2: https://hg.mozilla.org/build/buildbot-configs/rev/44027fc7bced#l1.14
Hi coop, you're right! The seamicros are WIN64_REV2 slaves.
Attachment #8413885 - Attachment is obsolete: true
Attachment #8414850 - Flags: review?(coop)
Attachment #8414850 - Flags: review?(coop) → review+
Attachment #8414850 - Flags: checked-in+
Per :taras, in person, I was asked to disable a whole chunk of the win64 production builders, leaving only ~10-20 running. Since there are 17 b-2008-ix machines and we have 3 of the b-2008-sm nodes, with :hwine's help we disabled all ~98 w64-ix-* machines, and will re-enable the 94 that were previously enabled in a few hours. CC'd all involved + sheriffs.
Two burned jobs and one accidental kill, pretty nice for a Friday night deploy.
b-2008-sm-000{1..3} are now enabled in slavealloc. They have executed some jobs, all green except for http://buildbot-master86.srv.releng.scl3.mozilla.com:8001/builders/Thunderbird%20comm-central%20win32%20l10n%20nightly/builds/2187, which looks like a general build failure unrelated to this machine.
Some results, w64-ix-slave129 vs. b-2008-sm-0002:
* Builder WINNT 5.2 fx-team leak test build: 39 mins, 42 secs (w64-ix-slave129) [1]
* Builder WINNT 5.2 fx-team leak test build: 59 mins, 43 secs (seamicro SSD, b-2008-sm-0002) [2]

step | w64-ix-slave129 | b-2008-sm-0002
hg_update | 51 secs | 16 mins, 36 secs
compile | 7 mins, 8 secs | 24 mins, 30 secs
make_buildsymbols | 12 mins, 58 secs | 6 mins, 31 secs
make_pkg_tests | 7 mins, 44 secs | 1 mins, 56 secs
make_pkg 'python | 2 mins, 7 secs | 2 mins, 51 secs

[1] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/544
[2] http://buildbot-master85.srv.releng.scl3.mozilla.com:8001/builders/WINNT%205.2%20fx-team%20leak%20test%20build/builds/542
More data: the average compile time for the last 70 fx-team-win32-debug builds is 1654s; our seamicro spent 1471s compiling. 71% of builds are slower than it. As soon as we have more seamicro builds, I'll extract more data.
More data: WINNT 5.2 mozilla-inbound leak test build, compile step: b-2008-sm-000{2,3} are faster than 85% of the last 70 builds. Average time: 1801s; seamicro build time: 1493s.
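Figures such as "71% of builds are slower than it" are percentile ranks: the share of reference runs that took longer than the candidate. A minimal Python sketch of that calculation, assuming a plain list of step durations in seconds; the helper `percent_slower` and the sample durations are hypothetical, not the actual buildbot data:

```python
def percent_slower(durations, candidate):
    """Return the percentage of reference durations strictly greater than candidate.

    A higher value means the candidate run beat more of the reference runs.
    """
    if not durations:
        raise ValueError("need at least one reference duration")
    slower = sum(1 for d in durations if d > candidate)
    return 100.0 * slower / len(durations)


# Hypothetical compile-step durations in seconds (illustrative only).
reference = [1300, 1450, 1520, 1600, 1650, 1700, 1750, 1800, 1900, 2100]
print(percent_slower(reference, 1493))  # 8 of the 10 runs took longer -> 80.0
```

Counting only strictly slower runs is one convention; ties could be split or counted differently.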
I think the data would be more conclusive if they were building try instead of every other branch.
(In reply to Mike Hommey [:glandium] from comment #12)
> I think the data would be more conclusive if they were building try instead
> of every other branch.

Massimo, can you switch these machines over? Try has much more consistent perf than having a machine cycle between branches.
moving b-2008-sm-* from WIN64_REV2 to TRY_WIN64_REV2
Attachment #8419257 - Flags: review?(catlee)
Attachment #8419257 - Flags: review?(catlee) → review+
We'll also need to update:
- slavealloc
- ssh keys
- vlan (maybe? :arr, any changes required to the network on these machines to do try builds?)
Flags: needinfo?(arich)
Attachment #8419257 - Flags: checked-in+
These are not designed to be try machines, they're designed to be non-try builders. If you want them to be try instead, we'll need to change their FQDNs and move them to a different vlan.
Flags: needinfo?(arich)
In case that wasn't clear, renaming implies that they will be reinstalled, as well.
Let's do it.
Merged and deployed to production.
(In reply to Chris AtLee [:catlee] from comment #18)

When you say "do it," do you mean rename and reimage? It wasn't clear, since the next comment was that the existing stuff was deployed to production.
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> (In reply to Chris AtLee [:catlee] from comment #18)
>
> When you say "do it," do you mean rename and reimage? It wasn't clear,
> since the next comment was that the existing stuff was deployed to
> production.

Let's reimage the box.
Depends on: 1008170
b-2008-sm-0001 is now enabled in try/prod (https://bugzilla.mozilla.org/show_bug.cgi?id=1008170#c6) and is just waiting for a job.
re-enabled b-2008-sm-0002 in slavealloc (try/prod)
At the moment only b-2008-sm-0001 (non-SSD) has completed some try jobs. Here are some results:

WINNT 5.2 try leak test build, last 100 successful builds (in seconds):
average time: 3543
b-2008-sm-0001 average time: 2813
b-2008-sm-0001 is faster than 95% of the last 100 builds

WINNT 5.2 try leak test build, last 100 successful 'compile' steps (in seconds):
average time: 1681
b-2008-sm-0001 average time: 1269
b-2008-sm-0001 is faster than 96% of the last 100 'compile' steps

The seamicro machines are faster than the other machines on try/build. As soon as b-2008-sm-000{2,3} (SSD) get some jobs, we can extract some data on SSD vs. non-SSD.
massimo, don't ever use averages for perf comparison. Long tails mess those up. Our median try build time is ~3000s
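To illustrate why long tails distort averages, here is a minimal Python sketch using made-up build times (not real try data): two slow outliers pull the mean far above the typical build, while the median stays with the bulk of the runs.

```python
from statistics import mean, median

# Hypothetical try build times in seconds: most builds cluster near
# 3000s, but two stragglers form a long tail.
build_times = [2900, 2950, 3000, 3000, 3050, 3100, 7500, 9000]

print(f"mean:   {mean(build_times):.1f}s")    # dragged upward by the two outliers
print(f"median: {median(build_times):.1f}s")  # stays with the typical build
```

The later comparisons in this bug report medians and percentiles for the same reason.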
Massimo or anyone else, do we have more datapoints for comparison yet? eg seamicro vs world and ssd vs hd? Gotta get this prototype wrapped up.
Hi taras, more data:

try-win32-debug, end-to-end time in seconds, last 100 jobs:
median: 3291
seamicro median: 2660
seamicro percentile: 91st

try-win32, end-to-end time in seconds, last 100 jobs:
median: 3011
seamicro median: 2506
seamicro percentile: 95th

try-win32-debug, hg_update, time in seconds, last 200 jobs:
hg_update median: 229
seamicro median: 171
percentile: 84th

try-win32-debug, compile, time in seconds, last 200 jobs:
compile median: 1661
seamicro median: 1240
percentile: 95th

b-2008-sm-0002 (SSD) has completed only 5 jobs; b-2008-sm-0001 has completed 30 jobs.
We're trying to put more load through b-2008-sm-0002 so we can get more data there. There haven't been that many requests for Windows builds on try recently! In any case, it seems pretty clear that these machines are faster than the existing machines regardless of whether we use SSDs. Shall we proceed with moving the rest of the seamicros into production right away, or wait to get better data on the SSDs? I'd prefer to get more of these in production ASAP, and to swap out the drives once we know whether they're worth it and actually have them in hand. Taras, Amy, thoughts?
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(arich)
Amy is out for 10 days or so, Laura is filling in. I'm ok with ordering ssds.
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(laura)
Flags: needinfo?(arich)
Laura, can you arrange for a shipment of ssds?
Chris, I'd still like to get some hard drive comparisons if possible.
More data on SSD vs. non-SSD. The seamicro machines have successfully completed 50 try jobs.

b-2008-sm-0001, non-SSD, 40 samples, time in seconds:
compile median: 1253
hg_update median: 180

b-2008-sm-0002, SSD, 10 samples, time in seconds:
compile (SSD) median: 1142
hg_update (SSD) median: 144
Q, did the seamicro machines get the non-removable + write cache settings? E.g. bug 1004508
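The SSD question comes down to how much shorter the SSD medians are than the non-SSD ones. A short Python sketch that computes the relative reduction from the medians reported above; the helper `percent_reduction` is hypothetical and only does the arithmetic:

```python
def percent_reduction(baseline, candidate):
    """Percentage by which candidate is shorter than baseline."""
    return 100.0 * (baseline - candidate) / baseline


# (non-SSD median, SSD median) in seconds, as reported for
# b-2008-sm-0001 (40 samples) and b-2008-sm-0002 (10 samples).
medians = {
    "compile": (1253, 1142),
    "hg_update": (180, 144),
}

for step, (hdd, ssd) in medians.items():
    # compile: ~8.9% lower, hg_update: 20.0% lower
    print(f"{step}: SSD median is {percent_reduction(hdd, ssd):.1f}% lower")
```

With only 10 SSD samples, these percentages are indicative rather than conclusive.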
Flags: needinfo?(q)
They got the same config as all the builders. I am logging in to confirm now.
Flags: needinfo?(q)
Depends on: 1014700
Depends on: 1014703
I have a quote from the vendor, but have asked if they can do better. They are going to talk to Samsung then will get back to me.
Flags: needinfo?(laura)
(In reply to Laura Thomson :laura from comment #35)
> I have a quote from the vendor, but have asked if they can do better. They
> are going to talk to Samsung then will get back to me.

Looks like we are ready to order 16 (64/4) 1TB SSDs, to be split into 256GB volumes. Derek, can you post shipping details in Bugzilla (or email me if confidential) so you can receive these? Thanks.
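A quick check of the drive count above, assuming (as "64/4" and the 256GB volumes suggest) that each 1TB drive is split four ways and each seamicro machine gets one volume:

```latex
\frac{64\ \text{machines}}{4\ \text{volumes per drive}} = 16\ \text{drives},
\qquad 4 \times 256\,\text{GB} = 1024\,\text{GB} \approx 1\,\text{TB per drive}
```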
Flags: needinfo?(dmoore)
(In reply to Taras Glek (:taras) from comment #36)
> Derek, can you post shipping details in bugzilla(or email me if
> confidential) so you can receive these? Thanks.

Ideally, direct them to Van Le using the shipping address here: https://mana.mozilla.org/wiki/display/DC/SCL3#SCL3-Shipping If you have them, please email tracking numbers to dcops@ so we can follow the shipment.
Flags: needinfo?(dmoore)
Had to break up the order due to availability.
This week:
4 SSDs: 1Z602AW24216628636
1 SSD: 1Z602AW24216628636
1 SSD: 94055036993003046639
Next week:
10 SSDs: 1Z74E33W4247908876
Depends on: 1017126
Moving b-2008-sm-00{01..32} into try and b-2008-sm-00{33..64} into build.
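The brace-style ranges above (b-2008-sm-00{01..32} for try, b-2008-sm-00{33..64} for build) expand to zero-padded hostnames. A small Python sketch of that split, using a hypothetical helper `seamicro_hosts` (not part of slavealloc or buildbot-configs), just to make the pool boundaries explicit:

```python
def seamicro_hosts(first, last, prefix="b-2008-sm-"):
    """Expand an inclusive numeric range into four-digit, zero-padded hostnames."""
    return [f"{prefix}{n:04d}" for n in range(first, last + 1)]


try_pool = seamicro_hosts(1, 32)     # b-2008-sm-0001 .. b-2008-sm-0032
build_pool = seamicro_hosts(33, 64)  # b-2008-sm-0033 .. b-2008-sm-0064

print(try_pool[0], try_pool[-1], build_pool[0], build_pool[-1])
```

The two pools are disjoint and together cover all 64 machines.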
Attachment #8414850 - Attachment is obsolete: true
Attachment #8419257 - Attachment is obsolete: true
Attachment #8437789 - Flags: review?(rail)
Comment on attachment 8437789
[buildbot-configs] add seamicro machines to production.patch

stamp
Attachment #8437789 - Flags: review?(rail) → review+
Attachment #8437789 - Flags: checked-in+
Live with reconfig on 2014-06-11 08:47 PT
Attached file seamicro.csv
Moved 0004->0032 to try. Removed 0033 because it's already in slavealloc (https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c18).
Attachment #8413887 - Attachment is obsolete: true
Attachment #8438604 - Flags: review?(catlee)
Attachment #8438604 - Flags: review?(catlee) → review+
The ssh keys have been uploaded and verified for all the seamicro machines (except for 0004 and 0031, https://bugzilla.mozilla.org/show_bug.cgi?id=1014703#c15).
All the seamicro hosts are now in slavealloc, but only the following hosts are enabled:
(try) b-2008-sm-0001, b-2008-sm-0002, b-2008-sm-0003, b-2008-sm-0005, b-2008-sm-0006, b-2008-sm-0007, b-2008-sm-0008
(build) b-2008-sm-0033, b-2008-sm-0034, b-2008-sm-0035, b-2008-sm-0036, b-2008-sm-0037
Waiting for a few green jobs before enabling them all.
All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are in production.
Depends on: 1027983
(In reply to Massimo Gervasini [:mgerva] from comment #44)
> All the seamicro machines, except for b-2008-sm-0004 and b-2008-sm-0031, are
> in production.

Due to this, dropping -0001..0003 from the list of blockers.
No longer depends on: 1027983
These were put into production in June. Moving the dependency to bug 1047621 where the linking/CPU issues are being investigated.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard