Stop running builds and tests on pushes which only include changes to the b2g or mobile or browser directories

RESOLVED FIXED

Status

Release Engineering
General Automation
P3
normal
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: Ehsan, Assigned: catlee)

Tracking

({sheriffing-P1})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [capacity])

Attachments

(6 attachments)

See the discussion on dev.platform.
(Assignee)

Comment 1

5 years ago
this depends on the work done in bug 653041. We decided in the end to not use maxChanges, but only to poll the tip changeset. We need to switch to use maxChanges, and then include logic to say that if maxChanges is exceeded, to do all builds since we don't have the full set of files that have changed.
Blocks: 752836
(Assignee)

Updated

5 years ago
Assignee: nobody → catlee
Priority: -- → P3
Summary: Stop running builds and tests on pushes which only include changes to the b2g or mobile directories → Stop running builds and tests on pushes which only include changes to the b2g or mobile or browser directories
This seems quite a big win and I'm very happy that it got pin pointed :)
Whiteboard: [capacity]
(Assignee)

Comment 3

5 years ago
So I've got a patch running in staging which operates with the following logic:

(1) If a file changes that is specifically watched by a product, then build that product
(2) If no files have are watched by any product, then build all products

A consequence of (2) is that pushes with no changes trigger builds on all products, which is desirable from a tree-management standpoint.

The current watchlist is:
'firefox': ['^browser/']
'b2g':     ['^b2g/']
'mobile':  ['^mobile/']

Can we add more to that?

Can we add excludes?

Is there any benefit to adding push comment support, like ONLYB2G, NOMOBILE, etc.?
(In reply to Chris AtLee [:catlee] from comment #3)
> 'firefox': ['^browser/']
> 'b2g':     ['^b2g/']
> 'mobile':  ['^mobile/']
> 
> Can we add more to that?

I can't think of anything offhand.

> Is there any benefit to adding push comment support, like ONLYB2G, NOMOBILE,
> etc.?

This seems too likely to be abused.
xulrunner: ['^xulrunner/']
mobile: ['^mobile/', '^embedding/android/'] ; actually probably any directory that contains android.
(Assignee)

Comment 6

5 years ago
We don't do xulrunner builds per push, so this would have no effect at the moment. However, it could allow us to effectively to xulrunner builds per push, only when ^xulrunner/ changes with a bit of refactoring.

Adding '/android/' to mobile would mean that any time a directory that contains android changes, we only build mobile. Is that the right thing to do? Do the b2g builds depend on any android stuff in m-c?
(In reply to Chris AtLee [:catlee] from comment #6)
> We don't do xulrunner builds per push, so this would have no effect at the
> moment. However, it could allow us to effectively to xulrunner builds per
> push, only when ^xulrunner/ changes with a bit of refactoring.

The contrary would be more interesting: don't build mobile, desktop and b2g when only changes to xulrunner/ occurred.


> Adding '/android/' to mobile would mean that any time a directory that
> contains android changes, we only build mobile. Is that the right thing to
> do? Do the b2g builds depend on any android stuff in m-c?

I'm not sure. It doesn't depend on anything in embedding/android, that's for sure.
(Assignee)

Comment 8

5 years ago
(In reply to Mike Hommey [:glandium] from comment #7)
> (In reply to Chris AtLee [:catlee] from comment #6)
> > We don't do xulrunner builds per push, so this would have no effect at the
> > moment. However, it could allow us to effectively to xulrunner builds per
> > push, only when ^xulrunner/ changes with a bit of refactoring.
> 
> The contrary would be more interesting: don't build mobile, desktop and b2g
> when only changes to xulrunner/ occurred.

Ah, that's very true. It could be added as you suggested to a 'xulrunner' product which would cause the other products to ignore it, or it could be added to each product's exclude list.

I'm still not sure if a mix of includes and excludes is workable, or if we should have only exclude lists per product....
This is probably a stupid question, but comment 3 and the second part of comment 6 seem ambiguous (missing an "only"): if someone pushes a changeset that touches both layout/foo.h and browser/foo.js, we'll still rebuild everything, right?
(Assignee)

Comment 10

5 years ago
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #9)
> This is probably a stupid question, but comment 3 and the second part of
> comment 6 seem ambiguous (missing an "only"): if someone pushes a changeset
> that touches both layout/foo.h and browser/foo.js, we'll still rebuild
> everything, right?

Yeah, maybe better to express like this?

For each file that's changed in a push,
(1) If the file is specifically watched by a product, then build that product
(2) If the file is not watched by any product, then build all products
(Assignee)

Comment 11

5 years ago
This makes me think that despite the repetition, having only an exclude list per product would be simpler. Then the logic would go something like:

For a given product, if there is at least one file changed in a push that doesn't match any of the product's exclude list, then build the product.
(Assignee)

Comment 12

5 years ago
Created attachment 670447 [details] [diff] [review]
product specific fileIsImportant support

There are two main parts to this:

1) Change how hg polling works. This patch adds a new flag to the hg poller, "mergePushChanges" which defaults to True. The effect of this is to combine all changes in a push into a single buildbot change. The tip-most change's author and comments will be used in the buildbot change, and all files changed in the push will be set as the buildbot change's files attribute.

If maxChanges is set and is hit, one additional file will be added to the change called 'overflow'. This is used to indicate that too many changes were hit.

We then turn off tipsOnly polling for non-try repositories so we can see all the changes (and their files) in each push.

2) fileIsImportant function per product. I decided on using just exclude lists for each product instead of a combination of include/exclude lists. The primary reason for this is that the logic for product inclusion lists meant that changing the list of one product could have unintended consequences on other products. Using exclusions lists means the list of files to ignore is explicit per product, even though the exclusion lists are a bit more verbose.
Attachment #670447 - Flags: review?(nthomas)
Attachment #670447 - Flags: review?(bhearsum)
Comment on attachment 670447 [details] [diff] [review]
product specific fileIsImportant support

Review of attachment 670447 [details] [diff] [review]:
-----------------------------------------------------------------

::: misc.py
@@ +132,5 @@
> +    ],
> +    'b2g': [
> +        re.compile('^browser/'),
> +        re.compile('^mobile/'),
> +    ],

You can definitely add xulrunner/ to mobile/b2g. Maybe to firefox too, though that might make us miss xulrunner-specific changes in nightlies on occasion.

::: test/test_misc_important.py
@@ +16,5 @@
> +            self.revlink = revlink
> +        if comments:
> +            self.comments = comments
> +        if revision:
> +            self.revision = revision

This could be simplified by setting the defaults in the __init__ signature rather than defining them at class level. Doesn't really matter either way.
Attachment #670447 - Flags: review?(bhearsum) → review+
Comment on attachment 670447 [details] [diff] [review]
product specific fileIsImportant support

>diff --git a/changes/hgpoller.py b/changes/hgpoller.py
>+        # We want to add at most self.maxChanges changes per push
>+        # Go through the list of pushes backwards, since we want to keep the

FWIW, if mergePushChanges is True the code will collapse maxChanges pushes to as many changes, then stop.

>+            for change in reversed(push['changesets']):
>+                if self.maxChanges is not None and (len(change_list) >= self.maxChanges or
>+                                                    i >= self.maxChanges):
>+                    too_many = True
>+                    log.msg("got too many changes")

Please yank the logging if it was just for testing, or identify the repo so there's clarity when the master log has multiple polls in close proximity.

>+            if too_many and self.mergePushChanges:
>+                # Add a dummy change to indicate we had too many changes
>+                c['files'].extend(['overflow'])
>+
>+            if self.mergePushChanges and c['node'] is not None:
>+                change_list.append(c)
>+
>+        if too_many and not self.mergePushChanges:
>+            # We add this at the end, and the list gets reversed below. That
>+            # means this dummy change ends up being the 'first' change of the
>+            # set

The overflow change or file are a marker for later data mining or debugging ?

>diff --git a/misc.py b/misc.py
>+_product_excludes = {
>+    'firefox': [
>+        re.compile('^mobile/'),
>+        re.compile('^b2g/'),
>+    ],

Ben's comment made me wonder about nightly builds and the way they look up a good revision. Are all the products looking up their own good revision so that it doesn't matter that the last build might be on different revisions ?

>+    'thunderbird': [re.compile("^suite/")],

Can Thunderbird exclude browser/, b2g/ and mobile/ or is this just comm-central dirs ? It might overbuild.

Any plans to move the machinery we use for spidermonkey/ionmonkey/js over to this method ?

>     if config.get('enable_try', False):
...
>     else:
>-        tipsOnly = True
>+        tipsOnly = False

I wondered if this might be hard on the hg server around merge time, but pushlog appears to query all changes even if it only needs to return tips, so this only changes how much we transfer on the first poll post-merge.

>+    for product, product_builders in buildersByProduct.items():
>+        if config.get('enable_try'):
>+            fileIsImportant = lambda c: isHgPollerTriggered(c, config['hgurl'])
>+        else:
>+            fileIsImportant = makeImportantFunc(config['hgurl'], product)
>+
>+        branchObjects['schedulers'].append(scheduler_class(

mobile does a funny thing where it sets stage_product to 'firefox', which might cause issues with the scheduling here (I'd look at try chooser but frankly it scares me to death). IIRC that's something to do with disk space reasons on ftp.m.o which may no longer apply, and there's a symlink from mobile/try-builds to ../firefox/try-builds, so it might be OK. Did you try dumping a scheduler master ?

r+ assuming the issues I've raised can be explained away.
Attachment #670447 - Flags: review?(nthomas) → review+
(Assignee)

Comment 15

5 years ago
(In reply to Nick Thomas [:nthomas] from comment #14)
> Comment on attachment 670447 [details] [diff] [review]
> product specific fileIsImportant support
> 
> >diff --git a/changes/hgpoller.py b/changes/hgpoller.py
> >+        # We want to add at most self.maxChanges changes per push
> >+        # Go through the list of pushes backwards, since we want to keep the
> 
> FWIW, if mergePushChanges is True the code will collapse maxChanges pushes
> to as many changes, then stop.

Is this not ok?

> >+            for change in reversed(push['changesets']):
> >+                if self.maxChanges is not None and (len(change_list) >= self.maxChanges or
> >+                                                    i >= self.maxChanges):
> >+                    too_many = True
> >+                    log.msg("got too many changes")
> 
> Please yank the logging if it was just for testing, or identify the repo so
> there's clarity when the master log has multiple polls in close proximity.

I changed this to
log.msg("%s: got too many changes" % self.baseURL)

> >+            if too_many and self.mergePushChanges:
> >+                # Add a dummy change to indicate we had too many changes
> >+                c['files'].extend(['overflow'])
> >+
> >+            if self.mergePushChanges and c['node'] is not None:
> >+                change_list.append(c)
> >+
> >+        if too_many and not self.mergePushChanges:
> >+            # We add this at the end, and the list gets reversed below. That
> >+            # means this dummy change ends up being the 'first' change of the
> >+            # set
> 
> The overflow change or file are a marker for later data mining or debugging ?

We need some kind of indication that we've discarded data so that the fileIsImportant functions can decide to build everything if the pollers have overflowed. It's better to build too much in the case of overflow than to build too little.

> >diff --git a/misc.py b/misc.py
> >+_product_excludes = {
> >+    'firefox': [
> >+        re.compile('^mobile/'),
> >+        re.compile('^b2g/'),
> >+    ],
> 
> Ben's comment made me wonder about nightly builds and the way they look up a
> good revision. Are all the products looking up their own good revision so
> that it doesn't matter that the last build might be on different revisions ?

B2G nightly schedulers are on their own. Firefox and mobile are tied together...so I think that if the most recent push on a branch only affects one firefox, mobile won't get built, and so that revision won't be considered for a nightly build.

I'm not sure what the right thing to do here is. How important is it for firefox/fennec to be built on the same revision? We've had complaints that the b2g nightlies aren't using the same revision as desktop/mobile nightlies. On the other hand, if we want to tie these all together, how do we decide which revision to use in the face of only products affected by a change?

> >+    'thunderbird': [re.compile("^suite/")],
> 
> Can Thunderbird exclude browser/, b2g/ and mobile/ or is this just
> comm-central dirs ? It might overbuild.

Thunderbird only looks at comm-central right now.

> Any plans to move the machinery we use for spidermonkey/ionmonkey/js over to
> this method ?

spidermonkey already uses something like this. What do you mean about ionmonkey/js?

> >     if config.get('enable_try', False):
> ...
> >     else:
> >-        tipsOnly = True
> >+        tipsOnly = False
> 
> I wondered if this might be hard on the hg server around merge time, but
> pushlog appears to query all changes even if it only needs to return tips,
> so this only changes how much we transfer on the first poll post-merge.
> 
> >+    for product, product_builders in buildersByProduct.items():
> >+        if config.get('enable_try'):
> >+            fileIsImportant = lambda c: isHgPollerTriggered(c, config['hgurl'])
> >+        else:
> >+            fileIsImportant = makeImportantFunc(config['hgurl'], product)
> >+
> >+        branchObjects['schedulers'].append(scheduler_class(
> 
> mobile does a funny thing where it sets stage_product to 'firefox', which
> might cause issues with the scheduling here (I'd look at try chooser but
> frankly it scares me to death). IIRC that's something to do with disk space
> reasons on ftp.m.o which may no longer apply, and there's a symlink from
> mobile/try-builds to ../firefox/try-builds, so it might be OK. Did you try
> dumping a scheduler master ?

AFAICT for mobile, 'stage_product' (which gets used as the key in buildersByProduct) is set to 'mobile'. e.g. http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.py#l688

> r+ assuming the issues I've raised can be explained away.

I think the nightly scheduling issue needs to be addressed before landing.
(In reply to Chris AtLee [:catlee] from comment #15)
> (In reply to Nick Thomas [:nthomas] from comment #14)
> > Ben's comment made me wonder about nightly builds and the way they look up a
> > good revision. Are all the products looking up their own good revision so
> > that it doesn't matter that the last build might be on different revisions ?
> 
> B2G nightly schedulers are on their own. Firefox and mobile are tied
> together...so I think that if the most recent push on a branch only affects
> one firefox, mobile won't get built, and so that revision won't be
> considered for a nightly build.
> 
> I'm not sure what the right thing to do here is. How important is it for
> firefox/fennec to be built on the same revision? We've had complaints that
> the b2g nightlies aren't using the same revision as desktop/mobile
> nightlies. On the other hand, if we want to tie these all together, how do
> we decide which revision to use in the face of only products affected by a
> change?

Warning: this is probably a slight tangent, but relevant:
I think these sort of differences are things people need to get used to. We're going to be doing more and more builds of more and more different things. We can't tie everything together. Firefox+Fennec make some sense to tie together, because combined they're a unit that we ship (though I also think there's a good argument for separating them). Thunderbird less so, and b2g is entirely separate IMO. We're way past the point where we can do things like "build all of the things every night" -- we should be looking to modularize more.

I think tools like TBPL could be improved to deal with these things easier for developers/sheriffs to cope with.
(In reply to Chris AtLee [:catlee] from comment #15)
> (In reply to Nick Thomas [:nthomas] from comment #14)
> > FWIW, if mergePushChanges is True the code will collapse maxChanges pushes
> > to as many changes, then stop.
> Is this not ok?

Just need to expand the comment.
 
> > The overflow change or file are a marker for later data mining or debugging ?
> 
> We need some kind of indication that we've discarded data so that the
> fileIsImportant functions can decide to build everything if the pollers have
> overflowed. It's better to build too much in the case of overflow than to
> build too little.

OK. I guess I was thrown by a lack of comment next to isImportantForProduct saying any change with an 'overflow' file will be important.

> I'm not sure what the right thing to do here is. How important is it for
> firefox/fennec to be built on the same revision? We've had complaints that
> the b2g nightlies aren't using the same revision as desktop/mobile
> nightlies. On the other hand, if we want to tie these all together, how do
> we decide which revision to use in the face of only products affected by a
> change?

Tricky tricky! If we split Fennec out, and (somehow!) we don't get a landing in mobile/ for several days we wouldn't get a nightly. I suspect people will prefer to have the gecko changes in the meantime, otherwise bisection is hard if it comes in large chunks. Similar argument for b2g. 

So a periodic build of everything, either every day and early enough to be considered for a nightly, or maybe at most N hours since last build.

> > Any plans to move the machinery we use for spidermonkey/ionmonkey/js over to
> > this method ?
> 
> spidermonkey already uses something like this. What do you mean about
> ionmonkey/js?

Apart from having only one machinery to do this, I was probably just conflating things in my head.

> AFAICT for mobile, 'stage_product' (which gets used as the key in
> buildersByProduct) is set to 'mobile'. e.g.
> http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.
> py#l688

This is the bit of code I meant:
http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.py#l1489
(Assignee)

Comment 18

5 years ago
Talked to Ehsan, and we're going to try and turn this on m-i only for now.

Ideally we want b2g, fennec, firefox built from the same nightly revision.

Also, for the "last good" algorithm, a revision for which we've skipped builds should be considered green for platforms for which builds on the previous revision were green.
(Assignee)

Comment 19

5 years ago
Created attachment 673267 [details] [diff] [review]
minor fixes

Adds the baseURL to the "too many changes" message.

Makes the per-product builds toggleable per branch. We'll switch it on inbound for now with a buildbot-configs patch.
Attachment #673267 - Flags: review?(nthomas)
(Assignee)

Comment 20

5 years ago
(In reply to Nick Thomas [:nthomas] from comment #17)
> (In reply to Chris AtLee [:catlee] from comment #15)
> > (In reply to Nick Thomas [:nthomas] from comment #14)
> > > FWIW, if mergePushChanges is True the code will collapse maxChanges pushes
> > > to as many changes, then stop.
> > Is this not ok?
> 
> Just need to expand the comment.

Done in the latest attachment.

> > > The overflow change or file are a marker for later data mining or debugging ?
> > 
> > We need some kind of indication that we've discarded data so that the
> > fileIsImportant functions can decide to build everything if the pollers have
> > overflowed. It's better to build too much in the case of overflow than to
> > build too little.
> 
> OK. I guess I was thrown by a lack of comment next to isImportantForProduct
> saying any change with an 'overflow' file will be important.

'overflow' isn't special except for it's not specifically excluded by the isImportantForProduct function. I added a comment about it in that function. Do you think it's better to add support specifically for it?

> > I'm not sure what the right thing to do here is. How important is it for
> > firefox/fennec to be built on the same revision? We've had complaints that
> > the b2g nightlies aren't using the same revision as desktop/mobile
> > nightlies. On the other hand, if we want to tie these all together, how do
> > we decide which revision to use in the face of only products affected by a
> > change?
> 
> Tricky tricky! If we split Fennec out, and (somehow!) we don't get a landing
> in mobile/ for several days we wouldn't get a nightly. I suspect people will
> prefer to have the gecko changes in the meantime, otherwise bisection is
> hard if it comes in large chunks. Similar argument for b2g. 
> 
> So a periodic build of everything, either every day and early enough to be
> considered for a nightly, or maybe at most N hours since last build.

We're going to punt on this problem and enable this on inbound only for now.

> > > Any plans to move the machinery we use for spidermonkey/ionmonkey/js over to
> > > this method ?
> > 
> > spidermonkey already uses something like this. What do you mean about
> > ionmonkey/js?
> 
> Apart from having only one machinery to do this, I was probably just
> conflating things in my head.
> 
> > AFAICT for mobile, 'stage_product' (which gets used as the key in
> > buildersByProduct) is set to 'mobile'. e.g.
> > http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.
> > py#l688
> 
> This is the bit of code I meant:
> http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/config.
> py#l1489

Happily that's only for the try branch, and this logic is completely disabled on try.
(Assignee)

Updated

5 years ago
Blocks: 803572
Attachment #673267 - Flags: review?(nthomas) → review+
(Assignee)

Updated

5 years ago
Attachment #670447 - Flags: checked-in+
(Assignee)

Comment 21

5 years ago
Comment on attachment 673267 [details] [diff] [review]
minor fixes

let's let this soak for a few days without switching mozilla-inbound to make sure we haven't broken anything
Attachment #673267 - Flags: checked-in+
In production
(Assignee)

Comment 23

5 years ago
Created attachment 674502 [details] [diff] [review]
enable per-product builds on mozilla-inbound
Attachment #674502 - Flags: review?(aki)

Updated

5 years ago
Attachment #674502 - Flags: review?(aki) → review+
(Assignee)

Updated

5 years ago
Attachment #674502 - Flags: checked-in+

Comment 24

5 years ago
This is live.
Looking good so far, eg:
B2G-only: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=f35ce23ddfc8
Android-only: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=1f74dee93e26
Desktop-only: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=9379c0915c6a
Created attachment 693986 [details] [diff] [review]
[custom] Exclude 'CLOBBER' from triggering all-product builds

Since the creation of a global CLOBBER file to indicate "needs a clobber" we now have a file that would trigger builds on other platforms as well.

And broken builds for some - not-as-quick-to-the-draw - users to boot.

This does not eliminate the need for said other products to *also* clobber due to the CLOBBER file change, but does keep our extra-builds for unneeded products down.
Attachment #693986 - Flags: review?(catlee)
(In reply to Justin Wood (:Callek) from comment #26)
> Created attachment 693986 [details] [diff] [review]
> [custom] Exclude 'CLOBBER' from triggering all-product builds
> 
> Since the creation of a global CLOBBER file
In bug 717372 fwiw
(Assignee)

Updated

4 years ago
Attachment #693986 - Flags: review?(catlee) → review+
Comment on attachment 693986 [details] [diff] [review]
[custom] Exclude 'CLOBBER' from triggering all-product builds

http://hg.mozilla.org/build/buildbotcustom/rev/214dde15ffed

Updated

4 years ago
Attachment #693986 - Flags: checked-in+
(In reply to Chris AtLee [:catlee] from comment #21)
> let's let this soak for a few days without switching mozilla-inbound to make
> sure we haven't broken anything

Can this be rolled out further now? Its been more than a few days...
(Assignee)

Comment 30

4 years ago
(In reply to Mark Banner (:standard8) from comment #29)
> (In reply to Chris AtLee [:catlee] from comment #21)
> > let's let this soak for a few days without switching mozilla-inbound to make
> > sure we haven't broken anything
> 
> Can this be rolled out further now? Its been more than a few days...

Not yet, due to bug 803572.

Updated

4 years ago
Keywords: sheriffing-P1
Blocks: 832008

Updated

4 years ago
No longer blocks: 803572
Depends on: 803572

Updated

4 years ago
Blocks: 849385
Created attachment 725041 [details] [diff] [review]
enable per-product builds on mozilla-beta

We don't generate nightlies on mozilla-beta, so bug 803572 shouldn't have an impact there IIUC. Aki, I see you reviewed the inbound version of this patch, so mind doing the honors?
Attachment #725041 - Flags: review?(aki)

Updated

4 years ago
Attachment #725041 - Flags: review?(aki) → review+
Comment on attachment 725041 [details] [diff] [review]
enable per-product builds on mozilla-beta

https://hg.mozilla.org/build/buildbot-configs/rev/601700a7af4e
Attachment #725041 - Flags: checked-in+
Product: mozilla.org → Release Engineering
Found in triage, moving to correct component.

:catlee, :aki: I see some patches landed... whats left to do here?
Component: Other → General Automation
Flags: needinfo?(catlee)
Flags: needinfo?(aki)
(In reply to John O'Duinn [:joduinn] from comment #33)
> :catlee, :aki: I see some patches landed... whats left to do here?

This is only enabled on ~{inbound, b2g-inbound, fx-team, cypress, mozilla-beta, mozilla-b2g18} since until bug 803572 is fixed, it doesn't play nicely with the "pick last green changeset for the Nightly" script, so can only be enabled for branches where we don't build nightlies.

So:
a) It would be good to get bug 803572 fixed.
b) In the meantime, we're still not using this on many trees that aren't building nightlies - is there any reason why we don't just make this enabled as long as 'enable_nightly' isn't set (ie not use the 'enable_perproduct_builds' pref at all, or at least make it enabled by default?).
(Assignee)

Comment 35

4 years ago
b) sounds like a good idea. I think we should keep the pref, but perhaps have it default to !enable_nightly.

The code to change is here: http://hg.mozilla.org/build/buildbotcustom/file/default/misc.py#l894
Flags: needinfo?(catlee)
(Assignee)

Comment 36

4 years ago
John - Ed and Ryan and I just had a chat about this in #releng, and we're wondering if the "use the last good revision for the nightly build" algorithm is worth keeping.

It certainly has proved useful in the past, ensuring that we got good nightly builds of firefox on all platforms every night.

Today however, I think it's causing us more headaches, and the benefits are mitigated by pushing most development onto integration branches and having full time sherrifs. mozilla-central is in good shape most of the time, and in cases where nightlies happen on a bad revision, sheriffs or QA or developers can trigger a new set of nightlies on a different revision with self-serve.

Current problems with the "use last good revision" algorithm:
- It prevents us from using information about which files have changed to control which builds happen on branches where we have nightlies. At least until bug 803572 is fixed, but...

- The algorithm to figure this out is expensive to run, and will get even more complicated to write and test if we do bug 803572.

- Long standing failures of some platforms prevent nightlies from happening at all, which is worse than having only some nightlies. Recent examples of this are leo bustage, hamachi_eng bustage, and win64 bustage. Some of these have external dependencies which aren't easily resolved.
Flags: needinfo?(joduinn)

Comment 37

4 years ago
I think this question has been answered by people with more context.
Flags: needinfo?(aki)
(summary of chat catlee+ myself had last week)

Background:
1) We used to build nightly using tip, and we were being burnt by bad landings that caused RelEng to generate broken nightly builds. Also, some developers would avoid landing scary patches closer to start-of-nightly-build because they wouldnt have time to backout in case of bustage. Changing "build using tip" to  "build using last known good changeset" was done by RelEng as a safety check to make nightly builds more reliable, enable developers in different timezones to land when they are awake to watch for fallout, and also to avoid burning nightly users with a broken nightly build.


2) We used to have all checkins land on tip. Now that we have integration branches, fewer changes land directly into mozilla-central. Having changes land and be green on an integration branch first, before landing on m-c, is much safer then direct landing onto m-c. Even including checkins from integration branches, now only 2-3% of checkins land on mozilla-central, and even less land on mozilla-central, then aurora, then beta then release, so the code within the checkins that are landing are less likely to break nightly users.


3) RelEng used to have to do manual steps to regenerate nightlies, with bugs being filed by devs to track the work. Now, this process is greatly simplified, and sheriffs can trigger a new nightly using self-serve.
Given how much we now use integration branches, I would be fine with turning off this  "build nightly using last known good changeset" safety feature, and going back to "build nightly using tip" if we: 

3a) restrict mozilla-central to sheriff only landings. From  previous emails with sheriffs, having mozilla-central be restricted to  sheriff-only landings would also simplify their day-to-day sheriff work merging integration branches into m-c. Of course, for exceptions like chemspills, it makes sense to allow developers land hotfixes on mozilla-central if that helps expedite things, but that would be by explicit exception. 

3b) sheriffs avoid merging integration branches into mozilla-central and aurora in the few hours before nightlies are triggered. This would allow any possible bustages to be detected, and backed out, before the nightlies are triggered. For most sheriffs, they are asleep at this time, but some CET-based sheriffs will need to keep aware of this. The idea is to only land on mozilla-central when sheriff has enough time to see results, and if needed, successfully backout before the nightly builds start.

Worst case, sheriffs can trigger new nightlies themselves using self-serve, but this should be used only by exception if both (a) and (b) fail for some reason.


Ed: Sound ok to you? Any concerns before we go ahead with this?
Flags: needinfo?(joduinn) → needinfo?(emorley)
Restricting m-c to sheriff only check-ins may make some people sad.  We have some developers who always land on central (such as Ms2ger) and some others who occasionally do so.  For example, looking at central today, mcmanus and jandem have also landed on central.  Some people do that when they have stuff which absolutely needs to make it to the next nightly.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #39)
> For example, looking at central today, mcmanus and
> jandem have also landed on central.  Some people do that when they have
> stuff which absolutely needs to make it to the next nightly.

mcmanus was accidental (I asked) and jandem's was a situation that almst-certainly would have gotten a=us if requested.
We are definitely not wanting to restrict mozilla-central to sheriffs-only. 

At some point in Q1 we (the sheriffs) will be raising a discussion (or more, polishing off a proposal that's currently WIP, and getting feedback on it) about discouraging non-{merge,chemspill,respin,...,insert-common-sense-reason} commits on mozilla-central, regardless of who makes the push.
Flags: needinfo?(emorley)
(In reply to comment #41)
> We are definitely not wanting to restrict mozilla-central to sheriffs-only. 
> 
> At some point in Q1 we (the sheriffs) will be raising a discussion (or more,
> polishing off a proposal that's currently WIP, and getting feedback on it)
> about discouraging non-{merge,chemspill,respin,...,insert-common-sense-reason}
> commits on mozilla-central, regardless of who makes the push.

That's very sane, especially since that it basically documents 99% of the status quo.  Thanks for doing that.
Simplified proposal, based on the fact that bug 955447 says that if we ever did build on a good m-c rev, rather than the tip rev, it was before 2011, and we've actually been doing tip for years now:

1) Just do it.

Updated

3 years ago
Depends on: 983731
(Assignee)

Comment 44

3 years ago
I think bug 983731 addresses the rest of the issues here.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Doesn't mozilla-central need enable_perproduct_builds setting?
Flags: needinfo?(catlee)
(Assignee)

Comment 46

3 years ago
Yes, I think you're right.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 47

3 years ago
Created attachment 8395935 [details] [diff] [review]
enable per-product builds on mozilla-central
Attachment #8395935 - Flags: review?(nthomas)
Flags: needinfo?(catlee)
Comment on attachment 8395935 [details] [diff] [review]
enable per-product builds on mozilla-central

Which just leaves the question of whether this should be the default or not.
Attachment #8395935 - Flags: review?(nthomas) → review+
(Assignee)

Updated

3 years ago
Attachment #8395935 - Flags: checked-in+
something is in production
(Assignee)

Updated

3 years ago
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED

Updated

3 years ago
Depends on: 1056790

Updated

3 years ago
Blocks: 1056790
No longer depends on: 1056790

Updated

3 years ago
Blocks: 1056792
You need to log in before you can comment on or make changes to this bug.