849385 - Avoid unnecessary buildbot builds for more parts of the tree

Reporter

Description

•

12 years ago

Bug 787449 added functionality (currently only enabled for mozilla-inbound, but due to be turned on for more trees once bug 803572 is fixed) that means for certain parts of the tree (eg browser/*, mobile/*) we only build the products/platforms that will be affected by the change, rather than all of them. The current directory list is: http://hg.mozilla.org/build/buildbotcustom/file/214dde15ffed/misc.py#l125 125 _product_excludes = { 126 'firefox': [ 127 re.compile('^CLOBBER'), 128 re.compile('^mobile/'), 129 re.compile('^b2g/'), 130 ], 131 'mobile': [ 132 re.compile('^CLOBBER'), 133 re.compile('^browser/'), 134 re.compile('^b2g/'), 135 re.compile('^xulrunner/'), 136 ], 137 'b2g': [ 138 re.compile('^CLOBBER'), 139 re.compile('^browser/'), 140 re.compile('^mobile/'), 141 re.compile('^xulrunner/'), 142 ], 143 'thunderbird': [ 144 re.compile('^CLOBBER'), 145 re.compile("^suite/") 146 ], 147 } This list was only intended as a safe starting point - we should now expand it.

Ed Morley [:emorley]

Reporter

Updated

•

12 years ago

Summary: Expand the buildbotcustom platform-specific directory list, to avoid unnecessary builds for more parts of the tree → Avoid unnecessary buildbot builds for more parts of the tree

Ed Morley [:emorley]

Reporter

Comment 1

•

12 years ago

I've grepped Makefiles, moz.build files & the tree directory listing looking for conditionals/platform specific directory names, processed it out and made a first pass at a shortlist. I need to double check each entry, but pasting here for now so I don't lose the list between now and when I'm back off of PTO next week: firefox-exclude: build/mobile/ dom/cellbroadcast dom/fm dom/icc dom/payment dom/plugins/base/android dom/system/android dom/system/gonk dom/telephony dom/voicemail dom/wifi embedding/android hal/android hal/gonk ipc/dbus ipc/netd ipc/ril media/omx-plugin mozglue/android mozglue/linker netwerk/protocol/device/gonk netwerk/system/android other-licenses/android toolkit/system/androidproxy widget/android widget/gonk mobile-exclude: accessible/public/ia2 accessible/public/msaa accessible/src/mac accessible/src/windows build/macosx build/package/mac_osx build/win32 build/win64 dom/cellbroadcast dom/fm dom/icc dom/payment dom/system/gonk dom/system/windows dom/telephony dom/voicemail dom/wifi hal/cocoa hal/gonk hal/linux hal/windows intl/locale/src/mac intl/locale/src/windows ipc/dbus ipc/netd ipc/ril netwerk/protocol/device/gonk toolkit/system/osxproxy toolkit/system/windowsproxy toolkit/themes/osx webapprt/mac webapprt/win widget/cocoa widget/gonk widget/windows b2g-exclude: accessible/public/ia2 accessible/public/msaa accessible/src/mac accessible/src/windows build/macosx build/mobile/ build/package/mac_osx build/win32 build/win64 dom/plugins/base/android dom/system/android dom/system/windows embedding/android hal/android hal/cocoa hal/linux hal/windows intl/locale/src/mac intl/locale/src/windows mozglue/android netwerk/system/android toolkit/system/androidproxy toolkit/system/osxproxy toolkit/system/windowsproxy toolkit/themes/osx webapprt/mac webapprt/win widget/android widget/cocoa widget/windows

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

Depends on: 832008

Matt Brubeck (:mbrubeck)

Comment 2

•

12 years ago

There's also code that is platform-specific within the "firefox" application, e.g. bug 832008. I'd be wary of excluding things like dom/payment that are currently B2G-only but could conceivably be used on other platforms in the future, unless we have a good way to detect when we need to un-exclude them. Maybe we could automatically generate a list from the moz.build files?

Gregory Szorc [:gps]

Comment 3

•

12 years ago

This bug scares me because it is making guesses about the build system while not living in the build system itself. While I'm sympathetic to what's being done here because the build system is inefficient, I think the code for excluding parts of the build should live in the tree. As currently implemented, when something on buildbot goes bad, we now have to figure out whether its a bug in the build system proper or in a non-visible exclusion list defined in the builder config. This seems fragile.

Mike Hommey [:glandium]

Comment 4

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #3) > This bug scares me because it is making guesses about the build system while > not living in the build system itself. While I'm sympathetic to what's being > done here because the build system is inefficient, I think the code for > excluding parts of the build should live in the tree. As currently > implemented, when something on buildbot goes bad, we now have to figure out > whether its a bug in the build system proper or in a non-visible exclusion > list defined in the builder config. This seems fragile. The thing is, it usually takes more than 10 minutes or build slaves to even get to the point where they can start looking at the mozilla source tree.

Ed Morley [:emorley]

Reporter

Comment 5

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #3) > As currently > implemented, when something on buildbot goes bad, we now have to figure out > whether its a bug in the build system proper or in a non-visible exclusion > list defined in the builder config. This seems fragile. Not really; since "gone bad" will mean no builds visible on TBPL, which is quite distinct from Frankenstein builds due to issues with in-tree build system dependencies. I completely agree that it's frustrating to have to re-invent part of the buildsystem logic in buildbot - but given that at the moment, a build has no way of determining what changesets/files were changed to cause that build, there is no way to achieve this in the build system. Even if there were, we'd still be tying up the build slave to make that decision, rather than avoiding using one for that push at all. Infra load is at bursting point at the moment, this bug (which is only a continuation of bug 787449) seems like one of the easiest quick wins we have. That said, I'm hopeful you might have some ideas for a way to export a list from our moz.build files, which we can import to buildbot on a schedule, and thus ensure we don't have surprises later on.

Mike Hommey [:glandium]

Comment 6

•

12 years ago

It's also not about the build system being inefficient, it's about not doing builds we don't need. At all.

bhearsum@mozilla.com (:bhearsum)

Comment 7

•

12 years ago

To echo what I said on IRC: it's not really possible to make scheduling decisions based on tree information right now. We have some longer term plans that might allow for this sort of thing to be more tightly coupled, though.

Gregory Szorc [:gps]

Comment 8

•

12 years ago

The moz.build files today will tell you which directories are relevant for the current build configuration. I could see us using their traversal metadata to determine whether a changeset applies to the current build configuration. Something to think about for the future.

Ed Morley [:emorley]

Reporter

Comment 9

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #8) > The moz.build files today will tell you which directories are relevant for > the current build configuration. I could see us using their traversal > metadata to determine whether a changeset applies to the current build > configuration. Something to think about for the future. Unfortunately a buildbot job has no sensible way to know what changes occurred in that push (the buildbot scheduler only get the tipmost cset aiui). Even if we did, there's still comment 4 (though we can probably reduce the clownshoes that cause part of the 10+mins spent before configure).

Mike Hommey [:glandium]

Comment 10

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #8) > The moz.build files today will tell you which directories are relevant for > the current build configuration. I could see us using their traversal > metadata to determine whether a changeset applies to the current build > configuration. Something to think about for the future. DIRS info doesn't tell the whole story. There are places, like ipc/, where we build files from subdirectories, or others, like some subdirectories of xpcom/ where we copy files from other subdirectories of xpcom/, etc. Until moz.build have a complete view of the build, it's going to be skewed.

Gregory Szorc [:gps]

Comment 11

•

12 years ago

I filed bug 851270 to eliminate a very large chunk from the 10+ minutes pre-configure.

Mike Hommey [:glandium]

Comment 12

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #11) > I filed bug 851270 to eliminate a very large chunk from the 10+ minutes > pre-configure. And I filed bug 851294 to eliminate another large chunk on linux.

Nobody; OK to take it and work on it

Assignee

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Ed Morley [:emorley]

Reporter

Comment 13

•

11 years ago

I may come back to this - but will need a bit of sifting through the platform conditionals in moz.build (or a dump of them vs directories by parsing), so offloading for now.

Assignee: emorley → nobody

Status: ASSIGNED → NEW

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Depends on: 1055009

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Depends on: 1001516

Matt Brubeck (:mbrubeck)

Updated

•

10 years ago

Depends on: 1132771

:Gijs (he/him)

Comment 14

•

10 years ago

Attached patch improve exclusions so we build less and hopefully mitigate slave usage, — Details — Splinter Review

I am aware this is imperfect. I am aware taskcluster is going away, which means we have to redo this. I am aware that sticking this logic in here is not great from a maintainability perspective. However, I am also aware that we keep running out of Windows slaves, that trypushes take ages to run, and that new slaves are apparently not coming for some months, and that backlogs repeatedly close trees, all of which is painful to me as a developer. This needs to get fixed. The android and gonk plain substring matches are purposefully aggressive: you shouldn't be writing code in a file called 'gonk' that you expect to run on desktop anyway. Coalescing will mean that even if we run too few builds on a rare occasion, it'll likely get fixed by the next push anyway. I'm also happy to land a subset of these changes, but either way, IMO something needs to be done sooner rather than later. Perfect is the enemy of good, etc. So um, as a final note: I can't test this, so I hope my python ain't broken. Please check.

Attachment #8680821 - Flags: review?(emorley)

Ed Morley [:emorley]

Reporter

Comment 15

•

10 years ago

Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, It's been so long since I looked at this that I'm starting from scratch. I'm also not a peer of buildbot code. Perhaps someone from releng / the build system should review? Thank you for writing a patch for this though - would be great to see it fixed :-)

Attachment #8680821 - Flags: review?(emorley)

:Gijs (he/him)

Comment 16

•

10 years ago

Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Redirecting review per feedback in #developers . Please see comment 14 for disclaimers.

Attachment #8680821 - Flags: review?(ted)

(not currently active) Ted Mielczarek

Comment 17

•

10 years ago

Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Sorry, but I've never seen this code before. I'm going to pass this off to catlee, hopefully either he can review this or tag someone else on his team.

Attachment #8680821 - Flags: review?(ted) → review?(catlee)

Chris AtLee [:catlee]

Comment 18

•

10 years ago

Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Review of attachment 8680821 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thanks for the patch! A few minor tweaks perhaps to be made. Syntax looks fine. Agreed that some tests would be great, but won't block on that! ::: misc.py @@ +120,5 @@ > + re.compile('^webapprt/win'), > + re.compile('^webapprt/mac'), > + > + re.compile('^widget/cocoa'), > + re.compile('^widget/gonk'), is this handled by the more general 'gonk' filter? @@ +155,3 @@ > ], > 'thunderbird': [ > re.compile('^CLOBBER'), I think this can go too now that it's in __all?

Attachment #8680821 - Flags: review?(catlee) → review+

Chris AtLee [:catlee]

Comment 19

•

10 years ago

Oh, I lied, there are tests! tests/test_misc_important.py - and they still pass with your patch. In addition, I've looked at the impact of this change using pushlog data from mozilla-inbound from 2015-10-01 until today. number of builds triggered per product with existing product exclusion list: firefox 1791 b2g 1770 mobile 1771 number of builds triggered per product with new product exclusion list: firefox 1770 b2g 1741 mobile 1736 So we do see some improvement in number of builds scheduled.

:Gijs (he/him)

Comment 20

•

10 years ago

(In reply to Chris AtLee [:catlee] from comment #19) > Oh, I lied, there are tests! tests/test_misc_important.py - and they still > pass with your patch. > > In addition, I've looked at the impact of this change using pushlog data > from mozilla-inbound from 2015-10-01 until today. > > number of builds triggered per product with existing product exclusion list: > firefox 1791 > b2g 1770 > mobile 1771 > > number of builds triggered per product with new product exclusion list: > firefox 1770 > b2g 1741 > mobile 1736 > > So we do see some improvement in number of builds scheduled. Hrmpf, 1.6% (85 / 5332) isn't too much though. How easy is it to add per-desktop-platform (linux/windows/osx) build filters as well? Anywho: remote: https://hg.mozilla.org/build/buildbotcustom/rev/72d2e5e1b534

:Gijs (he/him)

Comment 21

•

10 years ago

Oh, though tbf, you checked mozilla-inbound - I somewhat expect slightly bigger wins on fx-team and b2g-inbound (because a larger proportion of checkins there is likely to be b2g/android/desktop-frontend-specific, which means we won't need to build other products)

Chris AtLee [:catlee]

Comment 22

•

10 years ago

b2g-inbound doesn't show much of an improvement either. before: firefox 208 b2g 761 mobile 217 after: firefox 197 b2g 761 mobile 208 fx-team shows a pretty big win though! before: firefox 564 b2g 426 mobile 549 after: firefox 554 b2g 333 mobile 459

:Gijs (he/him)

Comment 23

•

10 years ago

(In reply to Chris AtLee [:catlee] from comment #22) > b2g-inbound doesn't show much of an improvement either. > > before: > firefox 208 > b2g 761 > mobile 217 > > after: > firefox 197 > b2g 761 > mobile 208 Hm. I'm not a b2g expert, but it might be worth looking at that in more detail? I thought most of these were basically gaia github bot thingy updates? Those should match the ^b2g/ exclude for desktop that is already there, and they clearly aren't if I'm looking at treeherder: https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=89e5ec9bc73b https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=68e40aa0dbd0 https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=2e431f9d68be What's with the linux builds and the windows pgo builds and the osx tier-2 builds? Those should not be running.

Chris AtLee [:catlee]

Comment 24

•

10 years ago

We're already excluding most firefox builds on b2g-inbound already. This is just a small improvement over what's there. PGO builds don't respect these product exclusions, since they're not per-push, but rather happen on a timer. The OSX tier-2 builds are using different infrastructure, so don't contribute to capacity problems that the tier-1 builds are using.

:Gijs (he/him)

Comment 25

•

10 years ago

This has hit production according to today's IRC chat with :catlee, so let's resolve. We can file followup bugs if we notice other things that could be optimized further in this space.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Assignee

Updated

•

7 years ago

Component: General Automation → General