Closed Bug 999474 Opened 8 years ago Closed 8 years ago

make recommendation on "quickest & easiest" way to avoid multiple b2g device builds starting at same time

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: catlee)

Attachments

(2 files, 1 obsolete file)

Looking for a quick bandaid to stop the tree impacts while a long term solution is devised.

We're getting bitten when multiple branches start their device builds at the same time. We stopped the major meltdown by adjusting the start time of the nightly builds (bug 998035).

We're now getting hit when changes (merges) are made within the same "periodic scheduler" window.

On the surface, there are two "simple" options to prevent this from happening:
  a) convert the "periodic scheduler" usage to "nightly schedule" and map out the times to ensure no multi-start times

  b) tweak the "periodic scheduler" values to avoid simultaneous starts (there are alternate configuration parameters to that scheduler)

Both appear to have varying degrees of side effects and coding changes required to implement & deploy.

If one of these, or something similar, is low-hanging fruit, it would be very helpful in buying time for the long-term fix.
#releng convo leading to bug:

07:10 < hwine-ooo> Tomcat: correct -- device builds are not dep (build on
                   commit), but periodic
07:10 < bhearsum> hwine-ooo: i'm pretty sure periodic builds only happen if
                  there's been a push since the last periodic job
07:11 < hwine-ooo> bhearsum: perhaps -- we had a push this time -- what's
                   killing us is that periodics always start in lockstep, since
                   those schedulers were all added at the same time (last
                   buildbot startup)
07:12 < bhearsum> i don't think it has to do with when they were added - that's
                  controlled by
https://github.com/mozilla/build-buildbotcustom/blob/master/misc.py#L1412
07:12 < hwine-ooo> bhearsum: that's why I want to open a bug for someone with
                   more caffeine than I to consider if we win converting the
                   periodics to use the nightly scheduler -- that way we could
                   ensure only one branch at a time starts
07:13 < bhearsum> in any case, i was just trying to point out why they'd happen
                  some days but not others
07:13 < hwine-ooo> at the moment, looking for a better bandaid, not a fix
07:13 < hwine-ooo> bhearsum: ah right -- thanks! see comment about no caffeine
                   yet
07:13 < bhearsum> we could probably add a periodic_offset to adjust those times
                  per branch
07:14 < bhearsum> as you can see, right now you can only change the interval
                  offset from 00:00h
07:14 < bhearsum> er, s/offset //
07:15 < hwine-ooo> right, but that's a code change -- wouldn't just converting
                   to nightly in the configs be easier, and allow more control?
07:15 < bhearsum> only do them nightly instead of multiple times throughout the
                  day
07:15 < bhearsum> ?
07:16 < hwine-ooo> No -- we can have multiple start hours for nightly builders
                   (we use that already), so just schedule to start at [1, 7,
                   13, 19,] instead of periodic 6
07:16 < bhearsum> ah
07:16 < bhearsum> perhaps
07:16 < bhearsum> that's a code change too - whether or not something is a
                  nightly is misc.py code
07:16 < hwine-ooo> with nightly, we can also adjust start minute
07:16 < bhearsum> but we may lose the don't-trigger-if-no-changes part as it
                  stands right now
07:16 < bhearsum> can't recall
07:17 < bhearsum> actually
07:17 < hwine-ooo> yeah - hence a bug to check. Likely not as simple as it
                   looks, or we would have done it already
07:17 < bhearsum> i think periodic and nightly already use the same scheduler
07:17 < bhearsum> we might just be able to pass hour=[] and minute=[] instead
                  of hour=range()
07:17 < bhearsum> https://github.com/mozilla/build-buildbotcustom/blob/master/misc.py#L1390 nightly schedulers
07:18 < bhearsum> https://github.com/mozilla/build-buildbotcustom/blob/master/misc.py#L1404 periodic
07:18 < bhearsum> both use SpecificNightly
07:18 < bhearsum> that's probably the easiest way to do it
07:18 < bhearsum> get rid of periodic_interval and replace it with
                  periodic_{hour,minute}
07:19  * hwine-ooo trusts bhearsum :)
07:19 < bhearsum> trust, but verify
07:20 <@catlee> we could randomize them a bit...
07:20 <@catlee> not sure if that's better or worse
07:20 <@catlee> helps with load
07:20 <@catlee> would be confusing
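The difference discussed in the conversation above, between a fixed interval counted from 00:00 and an explicit list of start hours, can be sketched roughly as follows. This is only an illustration: `interval_hours` and `staggered_hours` are hypothetical helpers, not functions from buildbotcustom.

```python
def interval_hours(interval):
    """Hours produced by a fixed interval counted from 00:00."""
    return list(range(0, 24, interval))


def staggered_hours(interval, offset):
    """Same cadence, shifted by `offset` hours and wrapped into 0-23."""
    return sorted((hour + offset) % 24 for hour in range(0, 24, interval))


print(interval_hours(6))      # [0, 6, 12, 18]
print(staggered_hours(6, 1))  # [1, 7, 13, 19]
```

With an interval alone, every branch using the same value lands on the same hours; an offset (or an explicit hour list plus a start minute) is what lets branches be spread apart.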
Attachment #8411046 - Flags: review?(bhearsum)
Hal, which branches do we want starting on what minutes?
Flags: needinfo?(hwine)
Assignee: nobody → catlee
Comment on attachment 8411046 [details] [diff] [review]
support periodic_start_minute/hour for periodic schedulers

Review of attachment 8411046 [details] [diff] [review]:
-----------------------------------------------------------------

::: misc.py
@@ +1404,5 @@
> +        if 'periodic_start_hours' in config:
> +            hour = config['periodic_start_hours']
> +        else:
> +            hour = range(0, 24, config['periodic_interval'])
> +        minute = config.get('periodic_start_minute', 0)

Seems like this would be cleaner if you dropped support for periodic_interval. Maybe the default for periodic_start_hours can be range(0,25,4)? Not a hard blocker though.
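For readers following the review, the quoted hunk amounts to the following fallback logic. This is a standalone sketch, not the actual misc.py code: the function name `resolve_periodic_schedule` is hypothetical, and only the three config keys come from the diff.

```python
def resolve_periodic_schedule(config):
    """Return (hours, minute) for a periodic scheduler.

    Explicit start hours win; otherwise fall back to a fixed
    interval counted from 00:00. The minute defaults to 0.
    """
    if 'periodic_start_hours' in config:
        hours = list(config['periodic_start_hours'])
    else:
        hours = list(range(0, 24, config['periodic_interval']))
    minute = config.get('periodic_start_minute', 0)
    return hours, minute


print(resolve_periodic_schedule({'periodic_interval': 6}))
print(resolve_periodic_schedule({'periodic_start_hours': [1, 7, 13, 19],
                                 'periodic_start_minute': 30}))
```

One detail worth checking in any default: a stop value of 24 keeps the generated hours within 0-23, whereas `range(0, 25, 4)` would also yield 24.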
Attachment #8411046 - Flags: review?(bhearsum) → review+
Catlee -- we don't care what happens when -- just that they don't happen all at once. :) Maximum spread is desired, but anything will help.

If there's an order that helps b2g or QA, that's fine.
Flags: needinfo?(hwine)
I was thinking the project branch loop in b2g_config.py could have a wraparound counter for the offset.
First branch in gets 0, 2nd gets 1, 3rd gets 2... 6th gets 0, 7th gets 1, ...
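A minimal sketch of that wraparound counter, assuming a modulus of 5 (inferred from "6th gets 0" above). The helper name `assign_offsets` is hypothetical, and the comment does not say whether the offset is applied to hours or minutes, so the sketch only computes the counter value. The branch names are borrowed from this bug purely as sample input.

```python
def assign_offsets(branches, modulus=5):
    """Give each branch a counter value that wraps back to 0 after `modulus` branches."""
    return {branch: index % modulus for index, branch in enumerate(branches)}


branches = ['ash', 'birch', 'cedar', 'cypress', 'date', 'elm', 'fig']
for branch, offset in assign_offsets(branches).items():
    print(branch, offset)
```

Here the 1st branch (`ash`) gets 0, the 6th (`elm`) wraps back to 0, and the 7th (`fig`) gets 1.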
yeah, removing support for periodic_interval is cleaner
Attachment #8411046 - Attachment is obsolete: true
Attachment #8411149 - Flags: review?(bhearsum)
results in these periodic/nightly schedules: (hour, minute) [list of schedulers]

(0, 2) ['b2g_mozilla-aurora nightly', 'b2g_mozilla-b2g30_v1_4 nightly']
(0, 5) ['mozilla-esr24 nightly']
(0, 30) ['ash periodic', 'b2g_ash periodic', 'b2g_build-system periodic', 'b2g_cedar periodic', 'b2g_cypress periodic', 'b2g_fig periodic', 'b2g_graphics periodic', 'b2g_gum periodic', 'b2g_jamun periodic', 'b2g_larch periodic', 'b2g_maple periodic', 'b2g_mozilla-aurora periodic', 'b2g_mozilla-b2g30_v1_4 periodic', 'b2g_oak periodic', 'b2g_pine periodic', 'b2g_services-central periodic', 'birch periodic', 'build-system periodic', 'cedar periodic', 'cypress periodic', 'date periodic', 'elm periodic', 'fig periodic', 'graphics periodic', 'gum periodic', 'jamun periodic', 'larch periodic', 'maple periodic', 'oak periodic', 'services-central periodic', 'ux periodic']
(0, 40) ['comm-aurora nightly', 'mozilla-aurora nightly']
(1, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(1, 40) ['b2g_mozilla-b2g28_v1_3t nightly']
(2, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(2, 40) ['b2g_mozilla-b2g28_v1_3 nightly']
(3, 2) ['comm-central nightly', 'comm-esr24 nightly', 'mozilla-central nightly']
(3, 40) ['b2g_mozilla-b2g26_v1_2 nightly']
(3, 45) ['mozilla-b2g28_v1_3 nightly']
(4, 2) ['b2g_mozilla-central nightly', 'oak nightly', 'ux nightly']
(4, 12) ['b2g_mozilla-b2g18 nightly']
(4, 22) ['b2g_mozilla-b2g18_v1_1_0_hd nightly']
(4, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(4, 42) ['b2g_oak nightly']
(5, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(6, 30) ['ash periodic', 'b2g_ash periodic', 'b2g_build-system periodic', 'b2g_cedar periodic', 'b2g_cypress periodic', 'b2g_fig periodic', 'b2g_graphics periodic', 'b2g_gum periodic', 'b2g_jamun periodic', 'b2g_larch periodic', 'b2g_maple periodic', 'b2g_mozilla-aurora periodic', 'b2g_mozilla-b2g30_v1_4 periodic', 'b2g_oak periodic', 'b2g_pine periodic', 'b2g_services-central periodic', 'birch periodic', 'build-system periodic', 'cedar periodic', 'cypress periodic', 'date periodic', 'elm periodic', 'fig periodic', 'graphics periodic', 'gum periodic', 'jamun periodic', 'larch periodic', 'maple periodic', 'oak periodic', 'services-central periodic', 'ux periodic']
(7, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(8, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(10, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(11, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(12, 30) ['ash periodic', 'b2g_ash periodic', 'b2g_build-system periodic', 'b2g_cedar periodic', 'b2g_cypress periodic', 'b2g_fig periodic', 'b2g_graphics periodic', 'b2g_gum periodic', 'b2g_jamun periodic', 'b2g_larch periodic', 'b2g_maple periodic', 'b2g_mozilla-aurora periodic', 'b2g_mozilla-b2g30_v1_4 periodic', 'b2g_oak periodic', 'b2g_pine periodic', 'b2g_services-central periodic', 'birch periodic', 'build-system periodic', 'cedar periodic', 'cypress periodic', 'date periodic', 'elm periodic', 'fig periodic', 'graphics periodic', 'gum periodic', 'jamun periodic', 'larch periodic', 'maple periodic', 'oak periodic', 'services-central periodic', 'ux periodic']
(13, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(14, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(16, 2) ['b2g_mozilla-aurora nightly', 'b2g_mozilla-b2g30_v1_4 nightly', 'b2g_mozilla-central nightly']
(16, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(16, 40) ['b2g_mozilla-b2g28_v1_3t nightly']
(17, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(18, 30) ['ash periodic', 'b2g_ash periodic', 'b2g_build-system periodic', 'b2g_cedar periodic', 'b2g_cypress periodic', 'b2g_fig periodic', 'b2g_graphics periodic', 'b2g_gum periodic', 'b2g_jamun periodic', 'b2g_larch periodic', 'b2g_maple periodic', 'b2g_mozilla-aurora periodic', 'b2g_mozilla-b2g30_v1_4 periodic', 'b2g_oak periodic', 'b2g_pine periodic', 'b2g_services-central periodic', 'birch periodic', 'build-system periodic', 'cedar periodic', 'cypress periodic', 'date periodic', 'elm periodic', 'fig periodic', 'graphics periodic', 'gum periodic', 'jamun periodic', 'larch periodic', 'maple periodic', 'oak periodic', 'services-central periodic', 'ux periodic']
(19, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(20, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
(22, 30) ['b2g_mozilla-central periodic', 'b2g_mozilla-inbound periodic', 'mozilla-central periodic', 'mozilla-inbound periodic']
(23, 30) ['b2g-inbound periodic', 'b2g_b2g-inbound periodic', 'b2g_fx-team periodic', 'fx-team periodic']
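A listing in this shape can be produced by grouping schedulers by start slot. The sketch below is an assumption about how one might do that, not the script actually used for the listing above; `group_by_start` is a hypothetical helper, and the sample data is a small subset of the rows shown.

```python
from collections import defaultdict


def group_by_start(schedulers):
    """Map each (hour, minute) slot to the sorted scheduler names starting then."""
    slots = defaultdict(list)
    for name, (hours, minute) in schedulers.items():
        for hour in hours:
            slots[(hour, minute)].append(name)
    return {slot: sorted(names) for slot, names in sorted(slots.items())}


# Sample data: name -> (start hours, start minute), taken from a few rows above.
sample = {
    'mozilla-central periodic': ([1, 4], 30),
    'b2g_mozilla-central periodic': ([1, 4], 30),
    'fx-team periodic': ([2, 5], 30),
    'b2g_mozilla-b2g28_v1_3t nightly': ([1], 40),
}

for slot, names in group_by_start(sample).items():
    print(slot, names)
```

Any slot whose list holds more than one device-build scheduler is a candidate for further staggering.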
Attachment #8411154 - Flags: review?(bhearsum)
Comment on attachment 8411149 [details] [diff] [review]
support periodic_start_minute/hour for periodic schedulers

Review of attachment 8411149 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good to me.
Attachment #8411149 - Flags: review?(bhearsum) → review+
Attachment #8411154 - Flags: review?(bhearsum) → review+
Attachment #8411149 - Flags: checked-in+
Attachment #8411154 - Flags: checked-in+
buildbot-config patch live in production: http://hg.mozilla.org/build/buildbot-configs/rev/5fce6d67a084 :)
buildbotcustom patch live in production: http://hg.mozilla.org/build/buildbotcustom/rev/b60200f1b78c :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
WOOT!!!! Thanks thanks THANKS!
Component: General Automation → General