Closed Bug 849385 Opened 12 years ago Closed 10 years ago

Avoid unnecessary buildbot builds for more parts of the tree

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

Details

(Keywords: sheriffing-P1, Whiteboard: [capacity])

Attachments

(1 file)

Bug 787449 added functionality (currently only enabled for mozilla-inbound, but due to be turned on for more trees once bug 803572 is fixed) that means for certain parts of the tree (eg browser/*, mobile/*) we only build the products/platforms that will be affected by the change, rather than all of them. The current directory list is: http://hg.mozilla.org/build/buildbotcustom/file/214dde15ffed/misc.py#l125 125 _product_excludes = { 126 'firefox': [ 127 re.compile('^CLOBBER'), 128 re.compile('^mobile/'), 129 re.compile('^b2g/'), 130 ], 131 'mobile': [ 132 re.compile('^CLOBBER'), 133 re.compile('^browser/'), 134 re.compile('^b2g/'), 135 re.compile('^xulrunner/'), 136 ], 137 'b2g': [ 138 re.compile('^CLOBBER'), 139 re.compile('^browser/'), 140 re.compile('^mobile/'), 141 re.compile('^xulrunner/'), 142 ], 143 'thunderbird': [ 144 re.compile('^CLOBBER'), 145 re.compile("^suite/") 146 ], 147 } This list was only intended as a safe starting point - we should now expand it.
Summary: Expand the buildbotcustom platform-specific directory list, to avoid unnecessary builds for more parts of the tree → Avoid unnecessary buildbot builds for more parts of the tree
I've grepped Makefiles, moz.build files & the tree directory listing looking for conditionals/platform specific directory names, processed it out and made a first pass at a shortlist. I need to double check each entry, but pasting here for now so I don't lose the list between now and when I'm back off of PTO next week: firefox-exclude: build/mobile/ dom/cellbroadcast dom/fm dom/icc dom/payment dom/plugins/base/android dom/system/android dom/system/gonk dom/telephony dom/voicemail dom/wifi embedding/android hal/android hal/gonk ipc/dbus ipc/netd ipc/ril media/omx-plugin mozglue/android mozglue/linker netwerk/protocol/device/gonk netwerk/system/android other-licenses/android toolkit/system/androidproxy widget/android widget/gonk mobile-exclude: accessible/public/ia2 accessible/public/msaa accessible/src/mac accessible/src/windows build/macosx build/package/mac_osx build/win32 build/win64 dom/cellbroadcast dom/fm dom/icc dom/payment dom/system/gonk dom/system/windows dom/telephony dom/voicemail dom/wifi hal/cocoa hal/gonk hal/linux hal/windows intl/locale/src/mac intl/locale/src/windows ipc/dbus ipc/netd ipc/ril netwerk/protocol/device/gonk toolkit/system/osxproxy toolkit/system/windowsproxy toolkit/themes/osx webapprt/mac webapprt/win widget/cocoa widget/gonk widget/windows b2g-exclude: accessible/public/ia2 accessible/public/msaa accessible/src/mac accessible/src/windows build/macosx build/mobile/ build/package/mac_osx build/win32 build/win64 dom/plugins/base/android dom/system/android dom/system/windows embedding/android hal/android hal/cocoa hal/linux hal/windows intl/locale/src/mac intl/locale/src/windows mozglue/android netwerk/system/android toolkit/system/androidproxy toolkit/system/osxproxy toolkit/system/windowsproxy toolkit/themes/osx webapprt/mac webapprt/win widget/android widget/cocoa widget/windows
Depends on: 832008
There's also code that is platform-specific within the "firefox" application, e.g. bug 832008. I'd be wary of excluding things like dom/payment that are currently B2G-only but could conceivably be used on other platforms in the future, unless we have a good way to detect when we need to un-exclude them. Maybe we could automatically generate a list from the moz.build files?
This bug scares me because it is making guesses about the build system while not living in the build system itself. While I'm sympathetic to what's being done here because the build system is inefficient, I think the code for excluding parts of the build should live in the tree. As currently implemented, when something on buildbot goes bad, we now have to figure out whether its a bug in the build system proper or in a non-visible exclusion list defined in the builder config. This seems fragile.
(In reply to Gregory Szorc [:gps] from comment #3) > This bug scares me because it is making guesses about the build system while > not living in the build system itself. While I'm sympathetic to what's being > done here because the build system is inefficient, I think the code for > excluding parts of the build should live in the tree. As currently > implemented, when something on buildbot goes bad, we now have to figure out > whether its a bug in the build system proper or in a non-visible exclusion > list defined in the builder config. This seems fragile. The thing is, it usually takes more than 10 minutes or build slaves to even get to the point where they can start looking at the mozilla source tree.
(In reply to Gregory Szorc [:gps] from comment #3) > As currently > implemented, when something on buildbot goes bad, we now have to figure out > whether its a bug in the build system proper or in a non-visible exclusion > list defined in the builder config. This seems fragile. Not really; since "gone bad" will mean no builds visible on TBPL, which is quite distinct from Frankenstein builds due to issues with in-tree build system dependencies. I completely agree that it's frustrating to have to re-invent part of the buildsystem logic in buildbot - but given that at the moment, a build has no way of determining what changesets/files were changed to cause that build, there is no way to achieve this in the build system. Even if there were, we'd still be tying up the build slave to make that decision, rather than avoiding using one for that push at all. Infra load is at bursting point at the moment, this bug (which is only a continuation of bug 787449) seems like one of the easiest quick wins we have. That said, I'm hopeful you might have some ideas for a way to export a list from our moz.build files, which we can import to buildbot on a schedule, and thus ensure we don't have surprises later on.
It's also not about the build system being inefficient, it's about not doing builds we don't need. At all.
To echo what I said on IRC: it's not really possible to make scheduling decisions based on tree information right now. We have some longer term plans that might allow for this sort of thing to be more tightly coupled, though.
The moz.build files today will tell you which directories are relevant for the current build configuration. I could see us using their traversal metadata to determine whether a changeset applies to the current build configuration. Something to think about for the future.
(In reply to Gregory Szorc [:gps] from comment #8) > The moz.build files today will tell you which directories are relevant for > the current build configuration. I could see us using their traversal > metadata to determine whether a changeset applies to the current build > configuration. Something to think about for the future. Unfortunately a buildbot job has no sensible way to know what changes occurred in that push (the buildbot scheduler only get the tipmost cset aiui). Even if we did, there's still comment 4 (though we can probably reduce the clownshoes that cause part of the 10+mins spent before configure).
(In reply to Gregory Szorc [:gps] from comment #8) > The moz.build files today will tell you which directories are relevant for > the current build configuration. I could see us using their traversal > metadata to determine whether a changeset applies to the current build > configuration. Something to think about for the future. DIRS info doesn't tell the whole story. There are places, like ipc/, where we build files from subdirectories, or others, like some subdirectories of xpcom/ where we copy files from other subdirectories of xpcom/, etc. Until moz.build have a complete view of the build, it's going to be skewed.
I filed bug 851270 to eliminate a very large chunk from the 10+ minutes pre-configure.
(In reply to Gregory Szorc [:gps] from comment #11) > I filed bug 851270 to eliminate a very large chunk from the 10+ minutes > pre-configure. And I filed bug 851294 to eliminate another large chunk on linux.
Product: mozilla.org → Release Engineering
I may come back to this - but will need a bit of sifting through the platform conditionals in moz.build (or a dump of them vs directories by parsing), so offloading for now.
Assignee: emorley → nobody
Status: ASSIGNED → NEW
Depends on: 1055009
Depends on: 1001516
Depends on: 1132771
I am aware this is imperfect. I am aware taskcluster is going away, which means we have to redo this. I am aware that sticking this logic in here is not great from a maintainability perspective. However, I am also aware that we keep running out of Windows slaves, that trypushes take ages to run, and that new slaves are apparently not coming for some months, and that backlogs repeatedly close trees, all of which is painful to me as a developer. This needs to get fixed. The android and gonk plain substring matches are purposefully aggressive: you shouldn't be writing code in a file called 'gonk' that you expect to run on desktop anyway. Coalescing will mean that even if we run too few builds on a rare occasion, it'll likely get fixed by the next push anyway. I'm also happy to land a subset of these changes, but either way, IMO something needs to be done sooner rather than later. Perfect is the enemy of good, etc. So um, as a final note: I can't test this, so I hope my python ain't broken. Please check.
Attachment #8680821 - Flags: review?(emorley)
Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, It's been so long since I looked at this that I'm starting from scratch. I'm also not a peer of buildbot code. Perhaps someone from releng / the build system should review? Thank you for writing a patch for this though - would be great to see it fixed :-)
Attachment #8680821 - Flags: review?(emorley)
Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Redirecting review per feedback in #developers . Please see comment 14 for disclaimers.
Attachment #8680821 - Flags: review?(ted)
Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Sorry, but I've never seen this code before. I'm going to pass this off to catlee, hopefully either he can review this or tag someone else on his team.
Attachment #8680821 - Flags: review?(ted) → review?(catlee)
Comment on attachment 8680821 [details] [diff] [review] improve exclusions so we build less and hopefully mitigate slave usage, Review of attachment 8680821 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thanks for the patch! A few minor tweaks perhaps to be made. Syntax looks fine. Agreed that some tests would be great, but won't block on that! ::: misc.py @@ +120,5 @@ > + re.compile('^webapprt/win'), > + re.compile('^webapprt/mac'), > + > + re.compile('^widget/cocoa'), > + re.compile('^widget/gonk'), is this handled by the more general 'gonk' filter? @@ +155,3 @@ > ], > 'thunderbird': [ > re.compile('^CLOBBER'), I think this can go too now that it's in __all?
Attachment #8680821 - Flags: review?(catlee) → review+
Oh, I lied, there are tests! tests/test_misc_important.py - and they still pass with your patch. In addition, I've looked at the impact of this change using pushlog data from mozilla-inbound from 2015-10-01 until today. number of builds triggered per product with existing product exclusion list: firefox 1791 b2g 1770 mobile 1771 number of builds triggered per product with new product exclusion list: firefox 1770 b2g 1741 mobile 1736 So we do see some improvement in number of builds scheduled.
(In reply to Chris AtLee [:catlee] from comment #19) > Oh, I lied, there are tests! tests/test_misc_important.py - and they still > pass with your patch. > > In addition, I've looked at the impact of this change using pushlog data > from mozilla-inbound from 2015-10-01 until today. > > number of builds triggered per product with existing product exclusion list: > firefox 1791 > b2g 1770 > mobile 1771 > > number of builds triggered per product with new product exclusion list: > firefox 1770 > b2g 1741 > mobile 1736 > > So we do see some improvement in number of builds scheduled. Hrmpf, 1.6% (85 / 5332) isn't too much though. How easy is it to add per-desktop-platform (linux/windows/osx) build filters as well? Anywho: remote: https://hg.mozilla.org/build/buildbotcustom/rev/72d2e5e1b534
Oh, though tbf, you checked mozilla-inbound - I somewhat expect slightly bigger wins on fx-team and b2g-inbound (because a larger proportion of checkins there is likely to be b2g/android/desktop-frontend-specific, which means we won't need to build other products)
b2g-inbound doesn't show much of an improvement either. before: firefox 208 b2g 761 mobile 217 after: firefox 197 b2g 761 mobile 208 fx-team shows a pretty big win though! before: firefox 564 b2g 426 mobile 549 after: firefox 554 b2g 333 mobile 459
(In reply to Chris AtLee [:catlee] from comment #22) > b2g-inbound doesn't show much of an improvement either. > > before: > firefox 208 > b2g 761 > mobile 217 > > after: > firefox 197 > b2g 761 > mobile 208 Hm. I'm not a b2g expert, but it might be worth looking at that in more detail? I thought most of these were basically gaia github bot thingy updates? Those should match the ^b2g/ exclude for desktop that is already there, and they clearly aren't if I'm looking at treeherder: https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=89e5ec9bc73b https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=68e40aa0dbd0 https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=2e431f9d68be What's with the linux builds and the windows pgo builds and the osx tier-2 builds? Those should not be running.
We're already excluding most firefox builds on b2g-inbound already. This is just a small improvement over what's there. PGO builds don't respect these product exclusions, since they're not per-push, but rather happen on a timer. The OSX tier-2 builds are using different infrastructure, so don't contribute to capacity problems that the tier-1 builds are using.
This has hit production according to today's IRC chat with :catlee, so let's resolve. We can file followup bugs if we notice other things that could be optimized further in this space.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: