Closed
Bug 740853
Opened 12 years ago
Closed 12 years ago
Shuffle the tegras attached to foopies to reduce load
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: Callek)
Details
(Whiteboard: [capacity][foopy])
Attachments
(3 files, 1 obsolete file)
696 bytes, patch | bear: review+
10.18 KB, patch | bear: review+
2.21 KB, patch | armenzg: review+
We saw a few buildbot "connection lost" messages for tegra-190. These kill jobs, cause current and future jobs on that tegra to fail, and are a pain overall. The likely cause is an overloaded foopy; this foopy has quite a lot of tegras attached, so let's try to balance them out a bit. Bear just stopped tegra-190 to try to reclaim some sanity in the short term.
Updated•12 years ago
Priority: -- → P2
Whiteboard: [buildduty][capacity][foopy]
Comment 1•12 years ago
On bug 740860 I show that we have more than 10 tegras per foopy (except foopy07). I will reduce the number of tegras per foopy to 10 and see how we do on Monday/Tuesday. In bug 737415 I suspected that this is indeed why we lose connections. On another note, are *all* foopies the same rev?
Assignee: nobody → armenzg
Comment 2•12 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #1)
> On bug 740860 I show that we have more than 10 tegras per foopy (except
> foopy07).
>
> I will reduce the number of tegras per foopy to 10 and see how we do on
> Monday/Tuesday.
>
> In bug 737415 I suspected that this is indeed why we lose connections.
We can't reduce the tegra count to that level; it would remove too many tegras from the pool. I'm not saying this doesn't need doing, as something has increased the end-to-end time for tegras and is also causing them to drop faster.
> On another note, are *all* foopies the same rev?
No, we have 2 generations of foopies. All of the newer ones got assigned more tegras for that reason.
Comment 3•12 years ago
I will be fixing this next week, but I at least want to lay down the plan.
FTR, we can determine which foopies are losing connections by looking at twistd.log on the buildbot masters. Look for error.ConnectionLost or RemoteCommand.interrupt (perhaps we have to ignore the reboot.py ones, since it seems we always see those). The IP of the foopy is the first one appearing on the line.
I am choosing which tegras to disconnect by taking them from the bottom of the list for each foopy.
I don't know which ones are the newer foopies, but here are their versions:
foopy[07-11] - 10.7.0 (386)
foopy[12-17] - 10.4.1 (386)
foopy[18-20,21-24] - 11.2.0 (64-bit)
I will bring foopies 7-11 to 11 tegras and foopies 12-24 to 13.
foopy07 contains 10 tegras: find one tegra to move to it
foopy08 contains 13 tegras: tegra-201,tegra-202
foopy09 contains 13 tegras: tegra-064,tegra-065
foopy10 contains 13 tegras: tegra-078,tegra-203
foopy11 contains 13 tegras: tegra-090,tegra-091
foopy12 contains 13 tegras: no changes
foopy13 contains 13 tegras: no changes
foopy14 contains 17 tegras: tegra-156,tegra-157,tegra-194,tegra-195
foopy15 contains 14 tegras: tegra-146
foopy16 contains 15 tegras: tegra-171,tegra-172
foopy17 contains 16 tegras: tegra-198,tegra-204,tegra-205
foopy18 contains 14 tegras: tegra-219
foopy19 contains 14 tegras: tegra-287
foopy20 contains 14 tegras: tegra-286
foopy22 contains 14 tegras: tegra-261
foopy23 contains 14 tegras: tegra-275
foopy24 contains 12 tegras: find one tegra to move to it
This means that we are reducing the number by 21 tegras out of a total of 232, which is less than 10%, and we get a known, standard distribution.
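For anyone wanting to repeat the twistd.log check described in this comment, here is a minimal sketch. It only assumes what the comment states: the lines of interest contain error.ConnectionLost or RemoteCommand.interrupt, and the foopy IP is the first IPv4 address on the line. The function name `count_lost_connections`, the `ignore` parameter, and the exact log line layout are hypothetical, not taken from buildbot.

```python
import re
from collections import Counter

# Markers named in the comment; everything else about the line layout is assumed.
MARKERS = ("error.ConnectionLost", "RemoteCommand.interrupt")
# Loose IPv4 pattern; the first match on a line is treated as the foopy IP.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def count_lost_connections(lines, ignore=("reboot.py",)):
    """Count connection-lost events per foopy IP (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue  # not a connection-lost line
        if any(skip in line for skip in ignore):
            continue  # reboot.py interruptions are expected, so skip them
        match = IPV4.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def report(path):
    """Print IPs sorted by event count, busiest first."""
    with open(path, errors="replace") as log:
        for ip, n in count_lost_connections(log).most_common():
            print(f"{ip}\t{n}")
```

Calling `report("twistd.log")` on a master would list the foopy IPs with the most dropped connections first; mapping an IP back to a foopy name is left out because that mapping is not given here.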
Assignee
Comment 4•12 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #3)
> foopy13 contains 13 tegras: no changes
Suggest dropping tegra-110 and bringing a different one over here [110 is staging].
> foopy23 contains 14 tegras: tegra-275
Suggest tegra-268 instead [it is also staging].
Updated•12 years ago
Priority: P2 → P3
Updated•12 years ago
Assignee: armenzg → bear
Whiteboard: [buildduty][capacity][foopy] → [capacity][foopy]
Updated•12 years ago
Summary: Shuffle the tegras attached to foopy17 to reduce load → Shuffle the tegras attached to foopies to reduce load
Comment 5•12 years ago
Attachment #620874 - Flags: review?(bugspam.Callek)
Assignee
Comment 6•12 years ago
Comment on attachment 620874: update tegras.json and foopies.sh
Some minor mistakes here; I'm going to fix them up and aim it back at you, though.
Attachment #620874 - Flags: review?(bugspam.Callek) → review-
Assignee
Comment 7•12 years ago
This fixes the dashboard as discussed [needs testing]
Updated•12 years ago
Attachment #621056 - Flags: review?(bear) → review+
Assignee
Comment 8•12 years ago
After my latest patch here (about to attach):
$ python tegras_per_foopy.py
None contains 20 tegras
foopy07 contains 11 tegras
foopy08 contains 11 tegras
foopy09 contains 11 tegras
foopy10 contains 11 tegras
foopy11 contains 11 tegras
foopy12 contains 13 tegras
foopy13 contains 13 tegras
foopy14 contains 13 tegras
foopy15 contains 14 tegras
foopy16 contains 13 tegras
foopy17 contains 13 tegras
foopy18 contains 13 tegras
foopy19 contains 13 tegras
foopy20 contains 13 tegras
foopy22 contains 13 tegras
foopy23 contains 13 tegras
foopy24 contains 13 tegras
We have 232 tegras in 18 foopies which means a ratio of 12 tegras per foopy
Looks like we'll need to (a) update this script, and (b) pull one of those 14 from foopy15 as well ;-) (not sure how we missed this earlier)
Assignee
Comment 9•12 years ago
This updates both of those files (fixing some minor misses in your first version) and also adds a "_comment" key to tegras.json, only in the places that warrant it. I plan to use that in my upcoming patch for tegras_per_foopy, but I'm OK if you would rather have _no_ comments here. (We can't use traditional JS comments in JSON, of course.)
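To illustrate the point about comments in JSON, here is a small sketch using Python's standard json module. The sample entries and their layout (a tegra name mapping to an object with a "foopy" key, with null for unassigned devices) are assumptions for illustration only; the real tegras.json may be structured differently. Only the "_comment" key name comes from this comment.

```python
import json

# Hypothetical excerpt; the real tegras.json layout may differ.
SAMPLE = """
{
  "tegra-001": {"foopy": "foopy07"},
  "tegra-002": {"foopy": null,
                "_comment": "Bug 749637: Assigned to Sec-Team"}
}
"""


def parses(text):
    """Return True if text is valid JSON according to the json module."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def comment_for(entry):
    """Return the "_comment" note for a tegra entry, or None if absent."""
    return entry.get("_comment")


tegras = json.loads(SAMPLE)
```

A "_comment" key is ordinary data, so it survives parsing and can be read by other scripts, whereas a JS-style `/* ... */` or `//` comment makes the whole file fail to parse.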
Attachment #620874 - Attachment is obsolete: true
Attachment #621059 - Flags: review?(bear)
Assignee
Comment 10•12 years ago
This updates tegras-per-foopy to account for "None" and expands its usefulness. The output of this script with the current patches is:
Justin@ORION /d/sources/build-tools/buildfarm/mobile
$ python tegras_per_foopy.py
PRODUCTION:
foopy07 contains 11 tegras
foopy08 contains 11 tegras
foopy09 contains 11 tegras
foopy10 contains 11 tegras
foopy11 contains 11 tegras
foopy12 contains 13 tegras
foopy13 contains 13 tegras
foopy14 contains 13 tegras
foopy15 contains 14 tegras
foopy16 contains 13 tegras
foopy17 contains 13 tegras
foopy18 contains 13 tegras
foopy19 contains 13 tegras
foopy20 contains 13 tegras
foopy22 contains 13 tegras
foopy23 contains 13 tegras
foopy24 contains 13 tegras
We have 212 tegras in 17 foopies which means a ratio of 12 tegras per foopy
STAGING:
foopy05 contains 7 tegras
foopy06 contains 16 tegras
We have 23 tegras in 2 foopies which means a ratio of 11 tegras per foopy
UNASSIGNED
5 With Comment: Bug 749637: Assigned to Sec-Team
15 (With no Comment)
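The grouping shown in that output can be sketched roughly as follows. This is not the attached patch: the entry layout ("foopy" key, null for unassigned, optional "_comment"), the `STAGING_FOOPIES` set, and the function names are assumptions chosen to match the output above. The ratio uses integer division, which is consistent with the printed figures (212 // 17 is 12, 23 // 2 is 11).

```python
from collections import Counter

# Assumption: staging foopies are the two listed under STAGING in the output.
STAGING_FOOPIES = {"foopy05", "foopy06"}


def summarize(tegras):
    """Split tegra entries into production, staging, and unassigned counts."""
    production, staging, unassigned = Counter(), Counter(), Counter()
    for entry in tegras.values():
        foopy = entry.get("foopy")
        if foopy is None:
            # Unassigned devices are grouped by their "_comment" (None if absent).
            unassigned[entry.get("_comment")] += 1
        elif foopy in STAGING_FOOPIES:
            staging[foopy] += 1
        else:
            production[foopy] += 1
    return production, staging, unassigned


def format_pool(title, pool):
    """Render one pool in the same shape as the script output above."""
    lines = [f"{title}:"]
    for foopy in sorted(pool):
        lines.append(f"{foopy} contains {pool[foopy]} tegras")
    if pool:
        total = sum(pool.values())
        ratio = total // len(pool)  # integer division, as in the printed ratios
        lines.append(
            f"We have {total} tegras in {len(pool)} foopies "
            f"which means a ratio of {ratio} tegras per foopy"
        )
    return lines
```

Keeping unassigned devices out of the per-foopy pools avoids the "None contains 20 tegras" row from the earlier run being counted as an extra foopy.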
Attachment #621071 - Flags: review?(armenzg)
Comment 11•12 years ago
Comment on attachment 621071: Update tegras-per-foopy
I am not sure how comfortable I feel about having to duplicate slavealloc's notes into tegras.json, but AFAIK there is no alternative. This is great. Thanks, Callek.
Attachment #621071 - Flags: review?(armenzg) → review+
Comment 12•12 years ago
Comment on attachment 621059: Update tegras.json and foopies.sh
r+ with the tweak of updating the sec team loaner list.
Attachment #621059 - Flags: review?(bear) → review+
Assignee
Comment 13•12 years ago
We finished this shuffling on Saturday
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee
Comment 14•12 years ago
http://hg.mozilla.org/build/tools/rev/62af80b83592
http://hg.mozilla.org/build/tools/rev/f8456d3d98ef
http://hg.mozilla.org/build/tools/rev/835037a1f4b9
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard