Automatic backfilling should deal better with perma failures

RESOLVED FIXED

Status

Testing
General
2 years ago
2 years ago

People

(Reporter: Armen - back on June 11th, Assigned: Armen - back on June 11th)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

2 years ago
RyanVM: you had some thoughts on this. Would you mind elaborating and helping us understand better?
ni? myself to keep this on my radar
Flags: needinfo?(ryanvm)
I'm not sure what more there is to add here beyond what's covered by the various deps of bug 1180732. I think the main point is we don't want unbounded backfilling on new jobs that fail (and won't have a previously-green run since they're new) and we don't want to bother with backfilling/retrying if a cause has already been identified and backed out.
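A minimal sketch of the "no unbounded backfilling" idea, not the mozci implementation. It assumes the caller already has the job's status per push, newest first. The function name, the tuple layout, and the choice to exclude the green push from the result are all hypothetical.

```python
def bounded_backfill_revlist(pushes, max_revisions):
    """Return the revisions worth backfilling, or [] if there is no baseline.

    pushes: list of (revision, status) tuples, newest first. status is
    'success', 'failure', or None when the job never ran on that push.
    """
    candidates = []
    for revision, status in pushes[:max_revisions]:
        if status == "success":
            # Known-good baseline found: only the pushes after it matter.
            return candidates
        candidates.append(revision)
    # No green run inside the window (new or permafailing job): do nothing
    # rather than backfill without bound.
    return []
```

With `max_revisions=7`, a job that has never been green yields an empty list, so nothing is scheduled for it.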

One other interesting observation I made, though, is that automatic backfilling/retriggering doesn't distinguish between visible and hidden jobs. So when a new permafailing job gets turned on (happened recently) or in general if you have a hidden permafailing job, you end up with a huge pile of backfilled jobs that are also failing in addition to auto-retries on them when they fail! I suspect that's a contributing factor to why we had some serious backlog issues last week.
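The two skip conditions raised in this comment (the cause was already backed out, or the job is hidden) could be checked before any backfill or retry is scheduled. This is only a sketch: `FailedJob`, its fields, and `should_auto_backfill` are hypothetical names, and where that information would come from is left open.

```python
from dataclasses import dataclass


@dataclass
class FailedJob:
    buildername: str
    hidden: bool = False            # hypothetical: job is hidden from the default view
    cause_backed_out: bool = False  # hypothetical: the culprit push was already backed out


def should_auto_backfill(job):
    """Decide whether a failed job deserves automatic backfills/retriggers."""
    if job.hidden:
        # Hidden (often permafailing) jobs would only pile up more failures.
        return False
    if job.cause_backed_out:
        # The cause is known and already dealt with, so there is nothing to bisect.
        return False
    return True
```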
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #2)
> One other interesting observation I made, though, is that automatic
> backfilling/retriggering doesn't distinguish between visible and hidden
> jobs. So when a new permafailing job gets turned on (happened recently) or
> in general if you have a hidden permafailing job, you end up with a huge
> pile of backfilled jobs that are also failing in addition to auto-retries on
> them when they fail! I suspect that's a contributing factor to why we had
> some serious backlog issues last week.

Looks like Armen just filed bug 1197223 for this.
(Assignee)

Comment 4

2 years ago
That's right. Thanks for the info!
(Assignee)

Comment 5

2 years ago
I'm going to start looking into this.
Assignee: nobody → armenzg
(Assignee)

Comment 6

2 years ago
https://github.com/mozilla/mozilla_ci_tools/commit/aeabd6f50e3e0ca97ab7de1ec7d2a81beb700c95

from mozci.mozci import find_backfill_revlist

# gtest: nothing to backfill.
revlist = find_backfill_revlist(
    repo_url='http://hg.mozilla.org/integration/mozilla-inbound',
    buildername='Ubuntu VM 12.04 mozilla-inbound opt test gtest',
    revision='0eee1ce8d43c',
    max_revisions=7,
)
print(revlist)
# Output: []

# plain-reftest-5: three revisions to backfill.
revlist = find_backfill_revlist(
    repo_url='http://hg.mozilla.org/integration/mozilla-inbound',
    buildername='Android 4.3 armv7 API 11+ mozilla-inbound opt test plain-reftest-5',
    revision='0eee1ce8d43c',
    max_revisions=7,
)
print(revlist)
# Output: [u'0eee1ce8d43c', u'aa291bcfb0e8', u'afd0786c65f5']
(Assignee)

Comment 7

2 years ago
Being tested under dry_run:
https://github.com/mozilla/pulse_actions/commit/37d818381c9c63d31607c9a280624061dfbe467f
(Assignee)

Comment 8

2 years ago
Deployed a bunch of changes.

It could affect:
* trigger missing jobs
* trigger talos jobs
* manual backfill

sheriffs, jmaher: Please let me know if you notice anything wonky.

Automatic backfilling is still running in dry-run mode only.
(Assignee)

Comment 9

2 years ago
No issues so far.

I've enabled automated backfilling.

Here's the first backfill:
> Oct 28 07:20:13 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:58d4fc52_Windows 7 32-bit mozilla-inbound opt test mochitest-2 will backfill [u'58d4fc528b3b', u'2730cc97c6ec', u'9a67e1d55e0d', u'b7dd8bf95c82', u'80f9778bb787']. 

Here are a few jobs that were *not* backfilled:
> Oct 28 07:17:08 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:b5acf46a_Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-jetpack will not backfill. 
> Oct 28 07:19:03 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:53952bbf_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:19:11 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:1e9c356a_b2g_emulator_vm mozilla-inbound opt test marionette-webapi will not backfill. 
> Oct 28 07:20:28 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:c537a7eb_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:20:33 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:1949b1c7_Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound opt test gtest will not backfill. 
> Oct 28 07:20:45 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:58d4fc52_Windows 7 32-bit mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:23:40 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:8d655f2a_Ubuntu VM 12.04 mozilla-inbound debug test mochitest-jetpack will not backfill. 
> Oct 28 07:24:30 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:2730cc97_Windows 7 32-bit mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:25:24 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:c537a7eb_Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test gtest will not backfill. 

Both hidden jobs:
* Ubuntu debug jetpack: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=Ubuntu%20debug%20jetpack&exclusion_profile=false&fromchange=39af5c53fad6
* gtest: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=gtest&fromchange=39af5c53fad6&exclusion_profile=false

To my surprise, jetpack is *sometimes* green.
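Log lines like the ones quoted above can be tallied per builder to see which jobs are being backfilled and which are being skipped. The sketch below relies only on the `BACKFILL-END:<revision>_<buildername> will [not] backfill` shape visible in those excerpts. The regex and the function name are illustrative and not part of mozci or pulse_actions.

```python
import re
from collections import Counter

# Matches e.g. "BACKFILL-END:53952bbf_Ubuntu VM ... gtest will not backfill."
BACKFILL_END = re.compile(
    r"BACKFILL-END:(?P<rev>[0-9a-f]+)_(?P<builder>.+?) will (?P<skipped>not )?backfill"
)


def summarize_backfill_log(lines):
    """Count backfill decisions per buildername.

    Returns a (backfilled, skipped) pair of Counters keyed by buildername.
    Lines without a BACKFILL-END marker are ignored.
    """
    backfilled, skipped = Counter(), Counter()
    for line in lines:
        match = BACKFILL_END.search(line)
        if match is None:
            continue
        target = skipped if match.group("skipped") else backfilled
        target[match.group("builder")] += 1
    return backfilled, skipped
```

A builder that shows up repeatedly in the skipped counter is a good candidate for a closer look, for example to check whether it is hidden or permafailing.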
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED