Bug 1195824 - Automatic backfilling should deal better with perma failures
Status: RESOLVED FIXED
Product: Testing
Classification: Components
Component: General
Version: unspecified
Platform: Unspecified Unspecified
Importance: -- normal
Target Milestone: ---
Assigned To: Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
Mentors:
Depends on:
Blocks: 1180732
Reported: 2015-08-18 08:59 PDT by Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
Modified: 2015-10-28 07:31 PDT
5 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---

Description Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-08-18 08:59:42 PDT
RyanVM: you had some thoughts on this. Would you mind elaborating and helping us understand better?
Comment 1 Ryan VanderMeulen [:RyanVM] 2015-08-18 13:03:58 PDT
ni? myself to keep this on my radar
Comment 2 Ryan VanderMeulen [:RyanVM] 2015-08-21 06:56:51 PDT
I'm not sure what more there is to add here beyond what's covered by the various dependencies of bug 1180732. I think the main points are that we don't want unbounded backfilling on new jobs that fail (they won't have a previously green run, since they're new), and we don't want to bother with backfilling or retrying if the cause has already been identified and backed out.

One other interesting observation I made, though, is that automatic backfilling/retriggering doesn't distinguish between visible and hidden jobs. So when a new permafailing job gets turned on (happened recently) or in general if you have a hidden permafailing job, you end up with a huge pile of backfilled jobs that are also failing in addition to auto-retries on them when they fail! I suspect that's a contributing factor to why we had some serious backlog issues last week.
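The point about hidden jobs can be illustrated with a minimal sketch of a guard that runs before any automatic backfill or retrigger. The `Job` record, its `visible` flag and the `jobs_eligible_for_auto_action` helper are hypothetical stand-ins for whatever job metadata the tooling actually receives. This is not mozci or pulse_actions code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    # Hypothetical job record; the real tooling's job metadata will differ.
    buildername: str
    revision: str
    visible: bool  # False for jobs hidden from the default view


def jobs_eligible_for_auto_action(failed_jobs: List[Job]) -> List[Job]:
    """Keep only visible failures, so hidden (for example newly enabled,
    perma-failing) jobs cannot spawn backfills or auto-retries."""
    return [job for job in failed_jobs if job.visible]


failed = [
    Job("Ubuntu VM 12.04 mozilla-inbound debug test gtest", "53952bbf", visible=False),
    Job("Windows 7 32-bit mozilla-inbound opt test mochitest-2", "58d4fc52", visible=True),
]
for job in jobs_eligible_for_auto_action(failed):
    print(job.buildername)  # only the visible mochitest-2 job is printed
```

Filtering before any triggering means a hidden perma-failure costs one failed job instead of a backfill range plus retries for every job in it.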
Comment 3 Wes Kocher (:KWierso) 2015-08-21 07:56:47 PDT
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #2)
> One other interesting observation I made, though, is that automatic
> backfilling/retriggering doesn't distinguish between visible and hidden
> jobs. So when a new permafailing job gets turned on (happened recently) or
> in general if you have a hidden permafailing job, you end up with a huge
> pile of backfilled jobs that are also failing in addition to auto-retries on
> them when they fail! I suspect that's a contributing factor to why we had
> some serious backlog issues last week.

Looks like Armen just filed bug 1197223 for this.
Comment 4 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-08-21 08:11:26 PDT
That's right. Thanks for the info!
Comment 5 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-21 10:44:53 PDT
I'm going to start looking into this.
Comment 6 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-21 13:39:58 PDT
https://github.com/mozilla/mozilla_ci_tools/commit/aeabd6f50e3e0ca97ab7de1ec7d2a81beb700c95

from mozci.mozci import find_backfill_revlist

# Example 1: Ubuntu gtest job; an empty list comes back, so nothing is backfilled.
revlist = find_backfill_revlist(
    repo_url='http://hg.mozilla.org/integration/mozilla-inbound',
    buildername='Ubuntu VM 12.04 mozilla-inbound opt test gtest',
    revision='0eee1ce8d43c',
    max_revisions=7,
)
print(revlist)
# Output: []

# Example 2: Android reftest job; three revisions, starting at 0eee1ce8d43c, come back.
revlist = find_backfill_revlist(
    repo_url='http://hg.mozilla.org/integration/mozilla-inbound',
    buildername='Android 4.3 armv7 API 11+ mozilla-inbound opt test plain-reftest-5',
    revision='0eee1ce8d43c',
    max_revisions=7,
)
print(revlist)
# Output: [u'0eee1ce8d43c', u'aa291bcfb0e8', u'afd0786c65f5']
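The two calls above differ in whether a backfill range turns up within `max_revisions`. The sketch below shows one way such a bounded look-back could work. It is an assumption about the behaviour, not mozci's actual implementation. It walks back from the failing revision through at most `max_revisions` pushes. If a successful run of the same job turns up, it returns the revisions after that run. Otherwise it returns an empty list and treats the failure as permanent, or the job as brand new. The `results` mapping, the status strings and `bounded_backfill_revlist` itself are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical result string; real job results come from the CI's own APIs.
SUCCESS = "success"


def bounded_backfill_revlist(
    revisions: Sequence[str],
    results: Dict[str, str],
    max_revisions: int,
) -> List[str]:
    """Return the revisions to backfill for one job, newest first.

    `revisions` lists pushes newest first, starting with the failing one.
    `results` maps a revision to this job's result there (absent if the job
    did not run). Only the first `max_revisions` pushes are examined; if no
    successful run is found, return [] so nothing gets triggered.
    """
    to_backfill = []
    for rev in revisions[:max_revisions]:
        if results.get(rev) == SUCCESS:
            return to_backfill  # found the last good run: fill the gap above it
        to_backfill.append(rev)
    return []  # no green run within the bound: likely perma-failure or new job


pushes = ["rev5", "rev4", "rev3", "rev2", "rev1"]
# Job that passed on rev2: backfill the pushes between rev2 and the failure.
print(bounded_backfill_revlist(pushes, {"rev5": "testfailed", "rev2": "success"}, max_revisions=4))
# ['rev5', 'rev4', 'rev3']
# Job that never passed within the window: do not backfill at all.
print(bounded_backfill_revlist(pushes, {"rev5": "testfailed", "rev4": "testfailed"}, max_revisions=4))
# []
```

Returning an empty list rather than the whole window is what keeps a perma-failing or newly added job from producing an unbounded pile of backfills.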
Comment 7 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-23 08:52:07 PDT
Being tested under dry_run:
https://github.com/mozilla/pulse_actions/commit/37d818381c9c63d31607c9a280624061dfbe467f
Comment 8 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-27 11:05:54 PDT
Deployed a bunch of changes.

It could affect:
* trigger missing jobs
* trigger talos jobs
* manual backfill

sheriffs, jmaher: Please let me know if you notice anything wonky.

Automatic backfilling is still running in dry-run mode only.
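A dry-run switch like the one mentioned here is often implemented by routing every trigger through a single function that only logs when the flag is set. The following is a minimal sketch under that assumption. The `backfill` function, the `trigger_job` callable and the `dry_run` parameter are hypothetical, not pulse_actions' actual code.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
LOG = logging.getLogger("backfill")


def backfill(
    buildername: str,
    revlist: List[str],
    trigger_job: Callable[[str, str], None],
    dry_run: bool = True,
) -> List[str]:
    """Trigger `buildername` on each revision, or only log it in dry-run mode.

    Returns the revisions that were actually triggered (empty when dry_run).
    """
    if not revlist:
        LOG.info("Nothing to backfill for %s", buildername)
        return []
    triggered = []
    for rev in revlist:
        if dry_run:
            LOG.info("DRY RUN: would trigger %s on %s", buildername, rev)
        else:
            trigger_job(buildername, rev)
            triggered.append(rev)
    return triggered


calls = []


def fake_trigger(buildername: str, rev: str) -> None:
    # Stand-in for a real scheduling call; just records what would be triggered.
    calls.append((buildername, rev))


backfill("Windows 7 32-bit mozilla-inbound opt test mochitest-2",
         ["58d4fc528b3b", "2730cc97c6ec"], fake_trigger, dry_run=True)
print(calls)  # [] because dry-run mode only logs
```

Defaulting `dry_run` to `True` means a new deployment has to opt in explicitly before it starts scheduling real jobs.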
Comment 9 Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2015-10-28 07:31:06 PDT
No issues so far.

I've enabled automated backfilling.

Here's the first backfill:
> Oct 28 07:20:13 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:58d4fc52_Windows 7 32-bit mozilla-inbound opt test mochitest-2 will backfill [u'58d4fc528b3b', u'2730cc97c6ec', u'9a67e1d55e0d', u'b7dd8bf95c82', u'80f9778bb787']. 

Here are a few jobs that were *not* backfilled:
> Oct 28 07:17:08 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:b5acf46a_Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-jetpack will not backfill. 
> Oct 28 07:19:03 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:53952bbf_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:19:11 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:1e9c356a_b2g_emulator_vm mozilla-inbound opt test marionette-webapi will not backfill. 
> Oct 28 07:20:28 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:c537a7eb_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:20:33 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:1949b1c7_Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound opt test gtest will not backfill. 
> Oct 28 07:20:45 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:58d4fc52_Windows 7 32-bit mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:23:40 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:8d655f2a_Ubuntu VM 12.04 mozilla-inbound debug test mochitest-jetpack will not backfill. 
> Oct 28 07:24:30 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:2730cc97_Windows 7 32-bit mozilla-inbound debug test gtest will not backfill. 
> Oct 28 07:25:24 pulse-actions app/worker1.1: mozci	 INFO:	 BACKFILL-END:c537a7eb_Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test gtest will not backfill. 
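To review the backfill decisions in bulk, the `BACKFILL-END` lines above can be tallied with a small script. The regular expression is inferred only from the lines quoted here: an 8-character revision prefix, an underscore, the buildername, then `will backfill [...]` or `will not backfill`. Other log lines may use a different format, and `tally_backfill_decisions` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Pattern inferred from the quoted pulse_actions log lines; adjust if the format differs.
BACKFILL_END = re.compile(
    r"BACKFILL-END:(?P<rev>[0-9a-f]{8})_(?P<builder>.+?) will (?P<decision>not backfill|backfill)"
)


def tally_backfill_decisions(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count backfill vs. no-backfill decisions, plus no-backfill counts per buildername."""
    decisions = Counter()
    skipped_builders = Counter()
    for line in lines:
        match = BACKFILL_END.search(line)
        if not match:
            continue  # not a BACKFILL-END line
        decision = match.group("decision")
        decisions[decision] += 1
        if decision == "not backfill":
            skipped_builders[match.group("builder")] += 1
    return decisions, skipped_builders


log = [
    "Oct 28 07:20:13 pulse-actions app/worker1.1: mozci\t INFO:\t BACKFILL-END:58d4fc52_Windows 7 32-bit mozilla-inbound opt test mochitest-2 will backfill [u'58d4fc528b3b', u'2730cc97c6ec', u'9a67e1d55e0d', u'b7dd8bf95c82', u'80f9778bb787'].",
    "Oct 28 07:19:03 pulse-actions app/worker1.1: mozci\t INFO:\t BACKFILL-END:53952bbf_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill.",
    "Oct 28 07:20:28 pulse-actions app/worker1.1: mozci\t INFO:\t BACKFILL-END:c537a7eb_Ubuntu VM 12.04 mozilla-inbound debug test gtest will not backfill.",
]
decisions, skipped = tally_backfill_decisions(log)
print(decisions["backfill"], decisions["not backfill"])  # 1 2
print(skipped.most_common(1))  # [('Ubuntu VM 12.04 mozilla-inbound debug test gtest', 2)]
```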

Both of these jobs are hidden:
* Ubuntu debug jetpack: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=Ubuntu%20debug%20jetpack&exclusion_profile=false&fromchange=39af5c53fad6
* gtest: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=gtest&fromchange=39af5c53fad6&exclusion_profile=false

To my surprise, jetpack is *sometimes* green.
