881950 - Update test harness to return a retry return code when activation fails

Reporter

Description

•

11 years ago

Spun off from bug 847442 - > WRT to C, jimm can you please write a patch that has a return code of 4 if the > metrotestharness.exe fails to activate? It will turn the job purple and run it > on another win8 machine.

Jim Mathies [:jimm]

Reporter

Updated

•

11 years ago

Blocks: 880693

Jim Mathies [:jimm]

Reporter

Updated

•

11 years ago

Assignee: nobody → jmathies

Jim Mathies [:jimm]

Reporter

Comment 1

•

11 years ago

Attached patch patch — Details — Splinter Review

Jim Mathies [:jimm]

Reporter

Comment 2

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&showall=1&rev=1b8d13e04ca0

Jim Mathies [:jimm]

Reporter

Comment 3

•

11 years ago

Comment on attachment 761491 [details] [diff] [review] patch I haven't gotten this to trigger yet on try, but assuming I do and this works, seeking review.

Attachment #761491 - Flags: review?(netzen)

Brian R. Bondy [:bbondy]

Comment 4

•

11 years ago

Comment on attachment 761491 [details] [diff] [review] patch Review of attachment 761491 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/metro/shell/testing/metrotestharness.cpp @@ +193,5 @@ > +// From buildbotcustom logic: > +#define SUCCESS 0 > +#define WARNINGS 1 > +#define FAILURE 2 > +#define EXCEPTION 3 Is it expected that EXCEPTION is not used? @@ +194,5 @@ > +#define SUCCESS 0 > +#define WARNINGS 1 > +#define FAILURE 2 > +#define EXCEPTION 3 > +#define RETRY 4 /* will retry endlessly on new slaves, be careful with this! */ Did you mean to add any RETRY logic? You return it but I don't see it used anywhere.

Jim Mathies [:jimm]

Reporter

Comment 5

•

11 years ago

These are just return codes that buildbot picks up and acts on. I added defines for each but we only currently return three out of the four. According to armen, if I return RETRY the harness will pick that up, mark it as a blue exception on tbpl, and retrigger the run automatically.

Jim Mathies [:jimm]

Reporter

Comment 6

•

11 years ago

(In reply to Jim Mathies [:jimm] from comment #5) > These are just return codes that buildbot picks up and acts on. I added > defines for each but we only currently return three out of the four. > According to armen, if I return RETRY the harness will pick that up, mark it > as a blue exception on tbpl, and retrigger the run automatically. * three out of the five

Brian R. Bondy [:bbondy]

Comment 7

•

11 years ago

Comment on attachment 761491 [details] [diff] [review] patch Review of attachment 761491 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for explaining, sounds good.

Attachment #761491 - Flags: review?(netzen) → review+

Jim Mathies [:jimm]

Reporter

Comment 8

•

11 years ago

I triggered about twenty runs and didn't get it, so I'm just going to land this and we'll see what happens on inbound. There we're seeing about five per day.

Jim Mathies [:jimm]

Reporter

Comment 9

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/459c06768df6

Phil Ringnalda (:philor)

Comment 10

•

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=24070010&tree=Mozilla-Inbound - I sort of wondered whether it was really your return code being 4 that would do it, or it was mozharness returning 4 to buildbot. Looks like it's the latter, and you have to do something to tell mozharness that it should be giving buildbot the 4.

Jim Mathies [:jimm]

Reporter

Updated

•

11 years ago

Whiteboard: [leave-open]

Armen [:armenzg]

Comment 11

•

11 years ago

I will look into it. Thanks philor.

Assignee: jmathies → armenzg

Ed Morley [:emorley]

Comment 12

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/459c06768df6

Jim Mathies [:jimm]

Reporter

Comment 13

•

11 years ago

The return codes are coming back, so I guess we just need some sort of automation harness change to pick these up - 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254 Bug 880693 - Intermittent metrotestharness.exe | ActivateApplication result 80270254 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run 06:57:43 ERROR - Return code: 4

Armen [:armenzg]

Comment 14

•

11 years ago

(In reply to Jim Mathies [:jimm] from comment #13) > The return codes are coming back, so I guess we just need some sort of > automation harness change to pick these up - > > 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | > ActivateApplication result 80270254 > Bug 880693 - Intermittent metrotestharness.exe | ActivateApplication result > 80270254 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited > with code 4 during test run > 06:57:43 ERROR - Return code: 4 Yeah. I'm looking into mozharness' code to find out how to make it pipe correctly.

Armen [:armenzg]

Updated

•

11 years ago

Component: Tests → Release Engineering: Automation (General)

Product: Firefox for Metro → mozilla.org

QA Contact: catlee

Version: Trunk → other

Armen [:armenzg]

Comment 15

•

11 years ago

Besides mozharness not respecting return codes because all other suites do not necessarily respect standards, I'm pretty sure that this two lines are what makes the parser turn the whole job as just a warning. 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run I'm pretty sure it is "TEST-UNEXPECTED-FAIL" that makes us end with TBPL WARNING status. I'm not sure though. This is so messy. This happens somewhere in parse_line_by_line: http://hg.mozilla.org/build/mozharness/file/default/mozharness/mozilla/testing/unittest.py#l74 which grabs its matching info from here: http://hg.mozilla.org/build/mozharness/file/59a2f5792c69/mozharness/mozilla/testing/errors.py#l22 I think if we added another entry under "harness_error": http://hg.mozilla.org/build/mozharness/file/59a2f5792c69/mozharness/mozilla/testing/errors.py#l46 and we printed it from inside the harness (not the mozharness) we could then turn it into a TBPL_RETRY. Here's some of the code path: * run the command [1] * we parse the log and the return code [2] * the parser runs [3] * we set the buildbot status to warning [4] > WARNING - - # TBPL WARNING # * self.worst_buildbot_status is what is 1 and makes us just warn [5] > program finished with exit code 1 I'm going to experiment with something on staging. [1] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l349 return_code = \ self.run_command(cmd, cwd=dirs['abs_work_dir'], output_parser=parser, env=env) [2] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l361 tbpl_status, log_level = parser.evaluate_parser(return_code) [3] http://hg.mozilla.org/build/mozharness/file/default/mozharness/mozilla/testing/unittest.py#l116 [4] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l364 [5] http://hg.mozilla.org/build/mozharness/file/tip/mozharness/mozilla/buildbot.py#l89 self.buildbot_status(tbpl_status, level=log_level)

Armen [:armenzg]

Comment 16

•

11 years ago

Attached patch mozharness retry example — Details — Splinter Review

We will have to replace the "RETRY_ME_PLEASE" with some other string that becomes a consensus. Does everything else look good? I will send an email to get a consensus meanwhile.

Attachment #762282 - Flags: feedback?(aki)

Jim Mathies [:jimm]

Reporter

Comment 17

•

11 years ago

(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #15) > Besides mozharness not respecting return codes because all other suites do > not necessarily respect standards, I'm pretty sure that this two lines are > what makes the parser turn the whole job as just a warning. > 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | > ActivateApplication result 80270254 > 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code > 4 during test run > > I'm pretty sure it is "TEST-UNEXPECTED-FAIL" that makes us end with TBPL > WARNING status. I'm not sure though. This is so messy. I can strip that text out of the internal harness output - http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#59 http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#319 if you think it will help.

Aki Sasaki (not active)

Comment 18

•

11 years ago

Comment on attachment 762282 [details] [diff] [review] mozharness retry example I'm not sure if CRITICAL is the right log level, but if it is, that's fine. I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this to another unique string. This looks good in terms of how it's written; thanks for figuring this out!

Attachment #762282 - Flags: feedback?(aki) → feedback+

Armen [:armenzg]

Comment 19

•

11 years ago

Assigning to coop to find someone to drive this to completion. Volunteers welcomed! This is what we need: 1) get a consensus on what string to fail on (I'm sending an email now) 2) jimm to get the metro job to print that magic retry string 3) review and land this patch

Assignee: armenzg → coop

Jim Mathies [:jimm]

Reporter

Comment 20

•

11 years ago

(In reply to Aki Sasaki [:aki] from comment #18) > Comment on attachment 762282 [details] [diff] [review] > mozharness retry example > > I'm not sure if CRITICAL is the right log level, but if it is, that's fine. > > I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this > to another unique string. > > This looks good in terms of how it's written; thanks for figuring this out! How about 'FAIL_SHOULD_RETRY'? The test framework should make it's own decision about how many retries it grants a single build (say 3 times tops, then it should fail the run). I think armen filed a bug on doing this, currently I guess we will retry indefinitely, potentially exhausting the slave pool.

Jim Mathies [:jimm]

Reporter

Comment 21

•

11 years ago

(In reply to Jim Mathies [:jimm] from comment #20) > (In reply to Aki Sasaki [:aki] from comment #18) > > Comment on attachment 762282 [details] [diff] [review] > > mozharness retry example > > > > I'm not sure if CRITICAL is the right log level, but if it is, that's fine. > > > > I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this > > to another unique string. > > > > This looks good in terms of how it's written; thanks for figuring this out! > > How about 'FAIL_SHOULD_RETRY'? The test framework should make it's own > decision about how many retries it grants a single build (say 3 times tops, > then it should fail the run). I think armen filed a bug on doing this, > currently I guess we will retry indefinitely, potentially exhausting the > slave pool. Actually to match other test output, that should be 'FAIL-SHOULD-RETRY'.

Jim Mathies [:jimm]

Reporter

Comment 22

•

11 years ago

Attached patch harness update — Details — Splinter Review

Add retry message support.

Attachment #762742 - Flags: review?(netzen)

Brian R. Bondy [:bbondy]

Updated

•

11 years ago

Attachment #762742 - Flags: review?(netzen) → review+

Jim Mathies [:jimm]

Reporter

Comment 23

•

11 years ago

I'll check this in once we have sign off from releng.

Jim Mathies [:jimm]

Reporter

Updated

•

11 years ago

Blocks: 884247

Chris Cooper [:coop] (he/him)

Assignee

Comment 24

•

11 years ago

(In reply to Jim Mathies [:jimm] from comment #23) > I'll check this in once we have sign off from releng. What sign off do you need, Jim? I'm fine with 'FAIL-SHOULD-RETRY' if that's what you're looking for.

Jim Mathies [:jimm]

Reporter

Comment 25

•

11 years ago

(In reply to Chris Cooper [:coop] from comment #24) > (In reply to Jim Mathies [:jimm] from comment #23) > > I'll check this in once we have sign off from releng. > > What sign off do you need, Jim? > > I'm fine with 'FAIL-SHOULD-RETRY' if that's what you're looking for. Sounds good, I'll check it in. Can someone from releng pick up step 3 in comment 19?

Jim Mathies [:jimm]

Reporter

Comment 26

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/c506a6940d07

Aki Sasaki (not active)

Comment 27

•

11 years ago

Comment on attachment 762282 [details] [diff] [review] mozharness retry example Checked in, with new retry string. https://hg.mozilla.org/build/mozharness/rev/5c195cb35e14 Not live yet. I can transplant to production if we need this sooner, or wait til :edmorley is comfy with http://hg.mozilla.org/build/mozharness/rev/7a39cb9045f9 and merge everything at once.

Attachment #762282 - Flags: checked-in+

Phil Ringnalda (:philor)

Comment 28

•

11 years ago

Made them unstarrable, since tbpl doesn't know that FAIL-SHOULD-RETRY has any bearing on results, so at least you quieted me down about them :)

Ryan VanderMeulen [:RyanVM]

Comment 29

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/c506a6940d07

Aki Sasaki (not active)

Comment 30

•

11 years ago

The mozharness portion is in production.

Jim Mathies [:jimm]

Reporter

Comment 31

•

11 years ago

This doesn't seem to be working. From a recent build on inbound: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt%206.2&rev=5b3196ad66f4 15:52:42 CRITICAL - FAIL-SHOULD-RETRY | metrotestharness.exe | ActivateApplication result 80270254 15:52:42 INFO - INFO | metrotestharness.exe | Deleting C:\slave\test\build\application\firefox\tests.ini 15:52:42 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run 15:52:42 INFO - INFO | automation.py | Application ran for: 0:00:00.385000 15:52:42 INFO - INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmprgt87ipidlog 15:52:43 INFO - SUCCESS: The process with PID 3236 has been terminated. 15:52:43 INFO - ERROR: The process with PID 1572 could not be terminated. 15:52:43 INFO - Reason: There is no running instance of the task. 15:52:43 INFO - SUCCESS: The process with PID 3956 has been terminated. 15:52:43 INFO - WARNING | leakcheck | refcount logging is off, so leaks can't be detected! 15:52:43 INFO - INFO | runtests.py | Running tests: end. 15:52:43 ERROR - Return code: 4 15:52:43 INFO - TinderboxPrint: mochitest-mochitest-metro-chrome<br/><em class="testfail">T-FAIL</em> 15:52:43 CRITICAL - # TBPL WARNING # 15:52:43 CRITICAL - The mochitest suite: mochitest-metro-chrome ran with return status: WARNING 15:52:43 INFO - Copying logs to upload dir... 15:52:43 INFO - mkdir: C:\slave\test\build\upload\logs The FAIL-SHOULD-RETRY seems to get picked up as a CRITICAL error, but the run doesn't restart.

Aki Sasaki (not active)

Comment 32

•

11 years ago

Attached patch not sure how this ever worked — Details — Splinter Review

We used to either a) not use worst_level, or b) use worst_level against the keys of an unordered dict. Now we're using worst_level in all self.tbpl_status setting locations, and using an ordered tuple.

Attachment #765138 - Flags: review?(rail)

Rail Aliiev [:rail]

Updated

•

11 years ago

Attachment #765138 - Flags: review?(rail) → review+

Aki Sasaki (not active)

Comment 33

•

11 years ago

Comment on attachment 765138 [details] [diff] [review] not sure how this ever worked http://hg.mozilla.org/build/mozharness/rev/c03f32f78920

Attachment #765138 - Flags: checked-in+

Aki Sasaki (not active)

Comment 34

•

11 years ago

This should be fixed.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Jim Mathies [:jimm]

Reporter

Updated

•

11 years ago

Whiteboard: [leave-open]

Jim Mathies [:jimm]

Reporter

Comment 35

•

11 years ago

woot! https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=WINNT%206.2%20mozilla-inbound%20opt&rev=e8d3827f914c

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Nobody; OK to take it and work on it

Updated

•

10 years ago

OS: Windows 8 Metro → Windows 8.1

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: General Automation → General

patch 11 years ago Jim Mathies [:jimm] 5.67 KB, patch	bbondy : review+	Details \| Diff \| Splinter Review
mozharness retry example 11 years ago Armen [:armenzg] 4.00 KB, patch	mozilla : feedback+ mozilla : checked-in+	Details \| Diff \| Splinter Review
harness update 11 years ago Jim Mathies [:jimm] 7.18 KB, patch	bbondy : review+	Details \| Diff \| Splinter Review
not sure how this ever worked 11 years ago Aki Sasaki (not active) 4.50 KB, patch	rail : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review