Avoid unnecessary buildbot builds for more parts of the tree

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
5 years ago
2 years ago

People

(Reporter: emorley, Unassigned)

Tracking

({sheriffing-P1})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [capacity])

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Bug 787449 added functionality (currently only enabled for mozilla-inbound, but due to be turned on for more trees once bug 803572 is fixed) that means for certain parts of the tree (eg browser/*, mobile/*) we only build the products/platforms that will be affected by the change, rather than all of them.

The current directory list is:

http://hg.mozilla.org/build/buildbotcustom/file/214dde15ffed/misc.py#l125

   125 _product_excludes = {
   126     'firefox': [
   127         re.compile('^CLOBBER'),
   128         re.compile('^mobile/'),
   129         re.compile('^b2g/'),
   130     ],
   131     'mobile': [
   132         re.compile('^CLOBBER'),
   133         re.compile('^browser/'),
   134         re.compile('^b2g/'),
   135         re.compile('^xulrunner/'),
   136     ],
   137     'b2g': [
   138         re.compile('^CLOBBER'),
   139         re.compile('^browser/'),
   140         re.compile('^mobile/'),
   141         re.compile('^xulrunner/'),
   142     ],
   143     'thunderbird': [
   144         re.compile('^CLOBBER'),
   145         re.compile("^suite/")
   146     ],
   147 }

This list was only intended as a safe starting point - we should now expand it.
(Reporter)

Updated

5 years ago
Summary: Expand the buildbotcustom platform-specific directory list, to avoid unnecessary builds for more parts of the tree → Avoid unnecessary buildbot builds for more parts of the tree
(Reporter)

Comment 1

5 years ago
I've grepped Makefiles, moz.build files & the tree directory listing looking for conditionals/platform specific directory names, processed it out and made a first pass at a shortlist.

I need to double check each entry, but pasting here for now so I don't lose the list between now and when I'm back off of PTO next week:

firefox-exclude:
  build/mobile/
  dom/cellbroadcast
  dom/fm
  dom/icc
  dom/payment
  dom/plugins/base/android
  dom/system/android
  dom/system/gonk
  dom/telephony
  dom/voicemail
  dom/wifi
  embedding/android
  hal/android
  hal/gonk
  ipc/dbus
  ipc/netd
  ipc/ril
  media/omx-plugin
  mozglue/android
  mozglue/linker
  netwerk/protocol/device/gonk
  netwerk/system/android
  other-licenses/android
  toolkit/system/androidproxy
  widget/android
  widget/gonk

mobile-exclude:
  accessible/public/ia2
  accessible/public/msaa
  accessible/src/mac
  accessible/src/windows
  build/macosx
  build/package/mac_osx
  build/win32
  build/win64
  dom/cellbroadcast
  dom/fm
  dom/icc
  dom/payment
  dom/system/gonk
  dom/system/windows
  dom/telephony
  dom/voicemail
  dom/wifi
  hal/cocoa
  hal/gonk
  hal/linux
  hal/windows
  intl/locale/src/mac
  intl/locale/src/windows
  ipc/dbus
  ipc/netd
  ipc/ril
  netwerk/protocol/device/gonk
  toolkit/system/osxproxy
  toolkit/system/windowsproxy
  toolkit/themes/osx
  webapprt/mac
  webapprt/win
  widget/cocoa
  widget/gonk
  widget/windows

b2g-exclude:
  accessible/public/ia2
  accessible/public/msaa
  accessible/src/mac
  accessible/src/windows
  build/macosx
  build/mobile/
  build/package/mac_osx
  build/win32
  build/win64
  dom/plugins/base/android
  dom/system/android
  dom/system/windows
  embedding/android
  hal/android
  hal/cocoa
  hal/linux
  hal/windows
  intl/locale/src/mac
  intl/locale/src/windows
  mozglue/android
  netwerk/system/android
  toolkit/system/androidproxy
  toolkit/system/osxproxy
  toolkit/system/windowsproxy
  toolkit/themes/osx
  webapprt/mac
  webapprt/win
  widget/android
  widget/cocoa
  widget/windows
Depends on: 832008
There's also code that is platform-specific within the "firefox" application, e.g. bug 832008.

I'd be wary of excluding things like dom/payment that are currently B2G-only but could conceivably be used on other platforms in the future, unless we have a good way to detect when we need to un-exclude them.  Maybe we could automatically generate a list from the moz.build files?

Comment 3

5 years ago
This bug scares me because it is making guesses about the build system while not living in the build system itself. While I'm sympathetic to what's being done here because the build system is inefficient, I think the code for excluding parts of the build should live in the tree. As currently implemented, when something on buildbot goes bad, we now have to figure out whether its a bug in the build system proper or in a non-visible exclusion list defined in the builder config. This seems fragile.
(In reply to Gregory Szorc [:gps] from comment #3)
> This bug scares me because it is making guesses about the build system while
> not living in the build system itself. While I'm sympathetic to what's being
> done here because the build system is inefficient, I think the code for
> excluding parts of the build should live in the tree. As currently
> implemented, when something on buildbot goes bad, we now have to figure out
> whether its a bug in the build system proper or in a non-visible exclusion
> list defined in the builder config. This seems fragile.

The thing is, it usually takes more than 10 minutes or build slaves to even get to the point where they can start looking at the mozilla source tree.
(Reporter)

Comment 5

5 years ago
(In reply to Gregory Szorc [:gps] from comment #3)
> As currently
> implemented, when something on buildbot goes bad, we now have to figure out
> whether its a bug in the build system proper or in a non-visible exclusion
> list defined in the builder config. This seems fragile.

Not really; since "gone bad" will mean no builds visible on TBPL, which is quite distinct from Frankenstein builds due to issues with in-tree build system dependencies.

I completely agree that it's frustrating to have to re-invent part of the buildsystem logic in buildbot - but given that at the moment, a build has no way of determining what changesets/files were changed to cause that build, there is no way to achieve this in the build system. Even if there were, we'd still be tying up the build slave to make that decision, rather than avoiding using one for that push at all.

Infra load is at bursting point at the moment, this bug (which is only a continuation of bug 787449) seems like one of the easiest quick wins we have.

That said, I'm hopeful you might have some ideas for a way to export a list from our moz.build files, which we can import to buildbot on a schedule, and thus ensure we don't have surprises later on.
It's also not about the build system being inefficient, it's about not doing builds we don't need. At all.
To echo what I said on IRC: it's not really possible to make scheduling decisions based on tree information right now. We have some longer term plans that might allow for this sort of thing to be more tightly coupled, though.

Comment 8

5 years ago
The moz.build files today will tell you which directories are relevant for the current build configuration. I could see us using their traversal metadata to determine whether a changeset applies to the current build configuration. Something to think about for the future.
(Reporter)

Comment 9

5 years ago
(In reply to Gregory Szorc [:gps] from comment #8)
> The moz.build files today will tell you which directories are relevant for
> the current build configuration. I could see us using their traversal
> metadata to determine whether a changeset applies to the current build
> configuration. Something to think about for the future.

Unfortunately a buildbot job has no sensible way to know what changes occurred in that push (the buildbot scheduler only get the tipmost cset aiui).

Even if we did, there's still comment 4 (though we can probably reduce the clownshoes that cause part of the 10+mins spent before configure).
(In reply to Gregory Szorc [:gps] from comment #8)
> The moz.build files today will tell you which directories are relevant for
> the current build configuration. I could see us using their traversal
> metadata to determine whether a changeset applies to the current build
> configuration. Something to think about for the future.

DIRS info doesn't tell the whole story. There are places, like ipc/, where we build files from subdirectories, or others, like some subdirectories of xpcom/ where we copy files from other subdirectories of xpcom/, etc.
Until moz.build have a complete view of the build, it's going to be skewed.
I filed bug 851270 to eliminate a very large chunk from the 10+ minutes pre-configure.
(In reply to Gregory Szorc [:gps] from comment #11)
> I filed bug 851270 to eliminate a very large chunk from the 10+ minutes
> pre-configure.

And I filed bug 851294 to eliminate another large chunk on linux.
(Assignee)

Updated

4 years ago
Product: mozilla.org → Release Engineering
(Reporter)

Comment 13

4 years ago
I may come back to this - but will need a bit of sifting through the platform conditionals in moz.build (or a dump of them vs directories by parsing), so offloading for now.
Assignee: emorley → nobody
Status: ASSIGNED → NEW
(Reporter)

Updated

3 years ago
Depends on: 1055009
(Reporter)

Updated

3 years ago
Depends on: 1001516
Depends on: 1132771
Created attachment 8680821 [details] [diff] [review]
improve exclusions so we build less and hopefully mitigate slave usage,

I am aware this is imperfect. I am aware taskcluster is going away, which means we have to redo this. I am aware that sticking this logic in here is not great from a maintainability perspective.

However, I am also aware that we keep running out of Windows slaves, that trypushes take ages to run, and that new slaves are apparently not coming for some months, and that backlogs repeatedly close trees, all of which is painful to me as a developer. This needs to get fixed. The android and gonk plain substring matches are purposefully aggressive: you shouldn't be writing code in a file called 'gonk' that you expect to run on desktop anyway. Coalescing will mean that even if we run too few builds on a rare occasion, it'll likely get fixed by the next push anyway. I'm also happy to land a subset of these changes, but either way, IMO something needs to be done sooner rather than later. Perfect is the enemy of good, etc.

So um, as a final note: I can't test this, so I hope my python ain't broken. Please check.
Attachment #8680821 - Flags: review?(emorley)
(Reporter)

Comment 15

2 years ago
Comment on attachment 8680821 [details] [diff] [review]
improve exclusions so we build less and hopefully mitigate slave usage,

It's been so long since I looked at this that I'm starting from scratch. I'm also not a peer of buildbot code.

Perhaps someone from releng / the build system should review?

Thank you for writing a patch for this though - would be great to see it fixed :-)
Attachment #8680821 - Flags: review?(emorley)
Comment on attachment 8680821 [details] [diff] [review]
improve exclusions so we build less and hopefully mitigate slave usage,

Redirecting review per feedback in #developers .

Please see comment 14 for disclaimers.
Attachment #8680821 - Flags: review?(ted)
Comment on attachment 8680821 [details] [diff] [review]
improve exclusions so we build less and hopefully mitigate slave usage,

Sorry, but I've never seen this code before. I'm going to pass this off to catlee, hopefully either he can review this or tag someone else on his team.
Attachment #8680821 - Flags: review?(ted) → review?(catlee)
Comment on attachment 8680821 [details] [diff] [review]
improve exclusions so we build less and hopefully mitigate slave usage,

Review of attachment 8680821 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks for the patch! A few minor tweaks perhaps to be made.

Syntax looks fine. Agreed that some tests would be great, but won't block on that!

::: misc.py
@@ +120,5 @@
> +        re.compile('^webapprt/win'),
> +        re.compile('^webapprt/mac'),
> +
> +        re.compile('^widget/cocoa'),
> +        re.compile('^widget/gonk'),

is this handled by the more general 'gonk' filter?

@@ +155,3 @@
>      ],
>      'thunderbird': [
>          re.compile('^CLOBBER'),

I think this can go too now that it's in __all?
Attachment #8680821 - Flags: review?(catlee) → review+
Oh, I lied, there are tests! tests/test_misc_important.py - and they still pass with your patch.

In addition, I've looked at the impact of this change using pushlog data from mozilla-inbound from 2015-10-01 until today.

number of builds triggered per product with existing product exclusion list:
firefox 1791
b2g 1770
mobile 1771

number of builds triggered per product with new product exclusion list:
firefox 1770
b2g 1741
mobile 1736

So we do see some improvement in number of builds scheduled.
(In reply to Chris AtLee [:catlee] from comment #19)
> Oh, I lied, there are tests! tests/test_misc_important.py - and they still
> pass with your patch.
> 
> In addition, I've looked at the impact of this change using pushlog data
> from mozilla-inbound from 2015-10-01 until today.
> 
> number of builds triggered per product with existing product exclusion list:
> firefox 1791
> b2g 1770
> mobile 1771
> 
> number of builds triggered per product with new product exclusion list:
> firefox 1770
> b2g 1741
> mobile 1736
> 
> So we do see some improvement in number of builds scheduled.

Hrmpf, 1.6% (85 / 5332) isn't too much though.

How easy is it to add per-desktop-platform (linux/windows/osx) build filters as well?

Anywho:

remote:   https://hg.mozilla.org/build/buildbotcustom/rev/72d2e5e1b534
Oh, though tbf, you checked mozilla-inbound - I somewhat expect slightly bigger wins on fx-team and b2g-inbound (because a larger proportion of checkins there is likely to be b2g/android/desktop-frontend-specific, which means we won't need to build other products)
b2g-inbound doesn't show much of an improvement either.

before:
firefox 208
b2g 761
mobile 217

after:
firefox 197
b2g 761
mobile 208

fx-team shows a pretty big win though!
before:
firefox 564
b2g 426
mobile 549

after:
firefox 554
b2g 333
mobile 459
(In reply to Chris AtLee [:catlee] from comment #22)
> b2g-inbound doesn't show much of an improvement either.
> 
> before:
> firefox 208
> b2g 761
> mobile 217
> 
> after:
> firefox 197
> b2g 761
> mobile 208

Hm. I'm not a b2g expert, but it might be worth looking at that in more detail? I thought most of these were basically gaia github bot thingy updates? Those should match the ^b2g/ exclude for desktop that is already there, and they clearly aren't if I'm looking at treeherder:

https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=89e5ec9bc73b
https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=68e40aa0dbd0
https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=2e431f9d68be

What's with the linux builds and the windows pgo builds and the osx tier-2 builds? Those should not be running.
We're already excluding most firefox builds on b2g-inbound already. This is just a small improvement over what's there.

PGO builds don't respect these product exclusions, since they're not per-push, but rather happen on a timer.

The OSX tier-2 builds are using different infrastructure, so don't contribute to capacity problems that the tier-1 builds are using.
This has hit production according to today's IRC chat with :catlee, so let's resolve. We can file followup bugs if we notice other things that could be optimized further in this space.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.