Re-re-balance the Win64 try/non-try pools

Opened 5 years ago
Updated 3 months ago

(Reporter: philor, Unassigned)



(Whiteboard: [buildduty][capacity][buildslaves])



5 years ago
+++ This bug was initially created as a clone of Bug #917923 +++

As it turned out, bug 917923 picked a rather bad time to move a bunch of slaves to the try pool. A couple of days later, the day after the wait-time mails reported 0% wait on try and 1% wait on non-try, we stopped sending those mails.

Then everyone stopped working for the summit.

Then bug 925426 took a bunch of slaves away and reserved them for only non-inbound/central trees.

Today was probably the first genuine weekday we've had since the summit and _rev2, and by the time I closed inbound for unrelated reasons, Windows builds for the tip-1 push had taken 45 minutes to start.

Making people wait for their try Windows builds to start sucks, yes, but for most try pushes on a weekday, the long pole will be the 1 to 12 hour wait for Mac or Tegra tests to start, not the 15-75 minute waits for a Windows build to start we were seeing before the bug 917923 rebalancing. Making inbound wait 45 minutes for Windows builds to start will absolutely, without question, no possibility of debate, cause multi-hour tree closures. Probably tomorrow, if no bad code lands to cause an unrelated multi-hour closure like the one that saved us from having one blamed on infra today.

Comment 1

5 years ago
jhopkins, where are we with our transition, and what are the upcoming switchovers?

philor: I want to see the wait times for Tuesday. The ones from Monday did not show anything significant. Let's hope our next switchover makes things better.
Flags: needinfo?(jhopkins)

Comment 2

I've been mailing status to the dev-platform list. I'm working on a firm timetable for inbound, try, and central, and will announce it to that list when it's ready.
Flags: needinfo?(jhopkins) → cleared

Comment 3

5 years ago
8 builds, 3.7%, were in the 45-59 minute bucket. I can't remember how, or whether, coalescing shows up in these numbers. Is that what it would look like if, on 8 separate occasions, we coalesced 45-59 minutes' worth of inbound pushes into a single build? Or would coalescing 8 pushes that had each been pending for 45 minutes into one build be counted as 8?
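The two readings of the coalescing question differ only in what gets counted: one entry per build, or one entry per coalesced request. A minimal sketch of both, assuming 15-minute buckets (consistent with the "45-59 minute bucket" label) and that a coalesced build is bucketed by its oldest request; the helper names and the sample waits are hypothetical, not taken from the real wait-time report code.

```python
from collections import Counter

BUCKET_MINUTES = 15  # assumed bucket width, matching the "45-59 minute bucket" label

def bucket(wait_minutes):
    """Return the lower bound of the 15-minute bucket a wait falls into."""
    return (wait_minutes // BUCKET_MINUTES) * BUCKET_MINUTES

def per_build_buckets(builds):
    """Count each build once, bucketed by the wait of its oldest coalesced request."""
    return Counter(bucket(max(waits)) for waits in builds)

def per_request_buckets(builds):
    """Count every coalesced request separately, each by its own wait."""
    return Counter(bucket(w) for waits in builds for w in waits)

# Hypothetical example: one build that coalesced 8 pushes, each pending 45-58 minutes.
builds = [[45, 46, 47, 48, 50, 52, 55, 58]]
print(per_build_buckets(builds))    # Counter({45: 1})
print(per_request_buckets(builds))  # Counter({45: 8})
```

Under the per-build reading, the 8 entries in the 45-59 bucket would be 8 distinct builds; under the per-request reading, a single heavily coalesced build could account for all of them.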

Comment 4

I moved 5 win64-rev1 build slaves from try to the production pool today. See bug 930671.

Comment 5

5 years ago
Using slaveapi, I rebooted all of the w64 machines that had not taken a job in more than a day, in the hope that they will come back up.
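Picking the reboot candidates comes down to a simple idle-time filter. A minimal sketch, assuming we already have each machine's last job end time; the machine names, the input mapping, and the reference time are hypothetical, and the actual reboot was done through slaveapi, whose API is not shown or assumed here.

```python
from datetime import datetime, timedelta

IDLE_THRESHOLD = timedelta(days=1)  # "had not taken a job for more than a day"

def idle_machines(last_job_end, now):
    """Return names of machines idle longer than IDLE_THRESHOLD.

    last_job_end maps a machine name to the datetime its last job ended,
    or None if it has never taken a job.
    """
    return sorted(
        name for name, ended in last_job_end.items()
        if ended is None or now - ended > IDLE_THRESHOLD
    )

# Hypothetical data; the reference time is arbitrary.
now = datetime(2020, 1, 1, 12, 0)
last_job_end = {
    "w64-a": now - timedelta(hours=3),  # recently busy, left alone
    "w64-b": now - timedelta(days=2),   # idle for two days, reboot candidate
    "w64-c": None,                      # never took a job, reboot candidate
}
print(idle_machines(last_job_end, now))  # ['w64-b', 'w64-c']
```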

Normal buildduty activities should keep the wait times in better shape.

I think we are good with re-balancing.

# trypool #
platform   total jobs   waited 0-14 min   percent
win2k3     236          210               88.98%
win64      14           11                78.57%
# buildpool #
platform   total jobs   waited 0-14 min   percent
win2k3     284          267               94.01%
win64      16           16                100.00%
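Each percentage in the report above is simply the count in the 0 bucket divided by the total number of jobs for that platform. A minimal sketch that recomputes them from the report's own figures; the dictionary layout and function name are hypothetical, chosen only for illustration.

```python
# Figures copied from the report above: (total jobs, jobs in the 0 bucket).
REPORT = {
    "trypool": {"win2k3": (236, 210), "win64": (14, 11)},
    "buildpool": {"win2k3": (284, 267), "win64": (16, 16)},
}

def percent_started_promptly(total, in_first_bucket):
    """Share of jobs that landed in the first wait-time bucket, as a percentage."""
    return round(100.0 * in_first_bucket / total, 2)

for pool, platforms in REPORT.items():
    for platform, (total, fast) in platforms.items():
        print(f"{pool} {platform}: {percent_started_promptly(total, fast):.2f}%")
```

Note how small the win64 samples are: in the try pool, 3 slow jobs out of 14 are enough to pull the figure down to 78.57%.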
Last Resolved: 5 years ago
Resolution: --- → FIXED


3 months ago
Product: Release Engineering → Infrastructure & Operations