Closed
Bug 881950
Opened 11 years ago
Closed 11 years ago
Update test harness to return a retry return code when activation fails
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jimm, Assigned: coop)
References
Details
Attachments
(4 files)
5.67 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
4.00 KB,
patch
|
mozilla
:
feedback+
mozilla
:
checked-in+
|
Details | Diff | Splinter Review |
7.18 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
4.50 KB,
patch
|
rail
:
review+
mozilla
:
checked-in+
|
Details | Diff | Splinter Review |
Spun off from bug 847442 -
> WRT to C, jimm can you please write a patch that has a return code of 4 if the > metrotestharness.exe fails to activate? It will turn the job purple and run it > on another win8 machine.
Reporter | ||
Updated•11 years ago
|
Assignee: nobody → jmathies
Reporter | ||
Comment 1•11 years ago
|
||
Reporter | ||
Comment 2•11 years ago
|
||
Reporter | ||
Comment 3•11 years ago
|
||
Comment on attachment 761491 [details] [diff] [review]
patch
I haven't gotten this to trigger yet on try, but assuming I do and this works, seeking review.
Attachment #761491 -
Flags: review?(netzen)
Comment 4•11 years ago
|
||
Comment on attachment 761491 [details] [diff] [review]
patch
Review of attachment 761491 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/metro/shell/testing/metrotestharness.cpp
@@ +193,5 @@
> +// From buildbotcustom logic:
> +#define SUCCESS 0
> +#define WARNINGS 1
> +#define FAILURE 2
> +#define EXCEPTION 3
Is it expected that EXCEPTION is not used?
@@ +194,5 @@
> +#define SUCCESS 0
> +#define WARNINGS 1
> +#define FAILURE 2
> +#define EXCEPTION 3
> +#define RETRY 4 /* will retry endlessly on new slaves, be careful with this! */
Did you mean to add any RETRY logic? You return it but I don't see it used anywhere.
Reporter | ||
Comment 5•11 years ago
|
||
These are just return codes that buildbot picks up and acts on. I added defines for each but we only currently return three out of the four. According to armen, if I return RETRY the harness will pick that up, mark it as a blue exception on tbpl, and retrigger the run automatically.
Reporter | ||
Comment 6•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #5)
> These are just return codes that buildbot picks up and acts on. I added
> defines for each but we only currently return three out of the four.
> According to armen, if I return RETRY the harness will pick that up, mark it
> as a blue exception on tbpl, and retrigger the run automatically.
* three out of the five
Comment 7•11 years ago
|
||
Comment on attachment 761491 [details] [diff] [review]
patch
Review of attachment 761491 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for explaining, sounds good.
Attachment #761491 -
Flags: review?(netzen) → review+
Reporter | ||
Comment 8•11 years ago
|
||
I triggered about twenty runs and didn't get it, so I'm just going to land this and we'll see what happens on inbound. There we're seeing about five per day.
Reporter | ||
Comment 9•11 years ago
|
||
Comment 10•11 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=24070010&tree=Mozilla-Inbound - I sort of wondered whether it was really your return code being 4 that would do it, or it was mozharness returning 4 to buildbot. Looks like it's the latter, and you have to do something to tell mozharness that it should be giving buildbot the 4.
Reporter | ||
Updated•11 years ago
|
Whiteboard: [leave-open]
Comment 12•11 years ago
|
||
Reporter | ||
Comment 13•11 years ago
|
||
The return codes are coming back, so I guess we just need some sort of automation harness change to pick these up -
06:57:41 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254
Bug 880693 - Intermittent metrotestharness.exe | ActivateApplication result 80270254 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run
06:57:43 ERROR - Return code: 4
Comment 14•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #13)
> The return codes are coming back, so I guess we just need some sort of
> automation harness change to pick these up -
>
> 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe |
> ActivateApplication result 80270254
> Bug 880693 - Intermittent metrotestharness.exe | ActivateApplication result
> 80270254 06:57:41 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited
> with code 4 during test run
> 06:57:43 ERROR - Return code: 4
Yeah. I'm looking into mozharness' code to find out how to make it pipe correctly.
Updated•11 years ago
|
Component: Tests → Release Engineering: Automation (General)
Product: Firefox for Metro → mozilla.org
QA Contact: catlee
Version: Trunk → other
Comment 15•11 years ago
|
||
Besides mozharness not respecting return codes because all other suites do not necessarily respect standards, I'm pretty sure that this two lines are what makes the parser turn the whole job as just a warning.
12:58:43 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254
12:58:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run
I'm pretty sure it is "TEST-UNEXPECTED-FAIL" that makes us end with TBPL WARNING status. I'm not sure though. This is so messy.
This happens somewhere in parse_line_by_line:
http://hg.mozilla.org/build/mozharness/file/default/mozharness/mozilla/testing/unittest.py#l74
which grabs its matching info from here:
http://hg.mozilla.org/build/mozharness/file/59a2f5792c69/mozharness/mozilla/testing/errors.py#l22
I think if we added another entry under "harness_error":
http://hg.mozilla.org/build/mozharness/file/59a2f5792c69/mozharness/mozilla/testing/errors.py#l46
and we printed it from inside the harness (not the mozharness) we could then turn it into a TBPL_RETRY.
Here's some of the code path:
* run the command [1]
* we parse the log and the return code [2]
* the parser runs [3]
* we set the buildbot status to warning [4]
> WARNING - - # TBPL WARNING #
* self.worst_buildbot_status is what is 1 and makes us just warn [5]
> program finished with exit code 1
I'm going to experiment with something on staging.
[1] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l349
return_code = \
self.run_command(cmd, cwd=dirs['abs_work_dir'], output_parser=parser, env=env)
[2] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l361
tbpl_status, log_level = parser.evaluate_parser(return_code)
[3] http://hg.mozilla.org/build/mozharness/file/default/mozharness/mozilla/testing/unittest.py#l116
[4] http://hg.mozilla.org/build/mozharness/file/default/scripts/desktop_unittest.py#l364
[5] http://hg.mozilla.org/build/mozharness/file/tip/mozharness/mozilla/buildbot.py#l89
self.buildbot_status(tbpl_status, level=log_level)
Comment 16•11 years ago
|
||
We will have to replace the "RETRY_ME_PLEASE" with some other string that becomes a consensus.
Does everything else look good?
I will send an email to get a consensus meanwhile.
Attachment #762282 -
Flags: feedback?(aki)
Reporter | ||
Comment 17•11 years ago
|
||
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #15)
> Besides mozharness not respecting return codes because all other suites do
> not necessarily respect standards, I'm pretty sure that this two lines are
> what makes the parser turn the whole job as just a warning.
> 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | metrotestharness.exe |
> ActivateApplication result 80270254
> 12:58:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code
> 4 during test run
>
> I'm pretty sure it is "TEST-UNEXPECTED-FAIL" that makes us end with TBPL
> WARNING status. I'm not sure though. This is so messy.
I can strip that text out of the internal harness output -
http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#59
http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#319
if you think it will help.
Comment 18•11 years ago
|
||
Comment on attachment 762282 [details] [diff] [review]
mozharness retry example
I'm not sure if CRITICAL is the right log level, but if it is, that's fine.
I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this to another unique string.
This looks good in terms of how it's written; thanks for figuring this out!
Attachment #762282 -
Flags: feedback?(aki) → feedback+
Comment 19•11 years ago
|
||
Assigning to coop to find someone to drive this to completion.
Volunteers welcomed!
This is what we need:
1) get a consensus on what string to fail on (I'm sending an email now)
2) jimm to get the metro job to print that magic retry string
3) review and land this patch
Assignee: armenzg → coop
Reporter | ||
Comment 20•11 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #18)
> Comment on attachment 762282 [details] [diff] [review]
> mozharness retry example
>
> I'm not sure if CRITICAL is the right log level, but if it is, that's fine.
>
> I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this
> to another unique string.
>
> This looks good in terms of how it's written; thanks for figuring this out!
How about 'FAIL_SHOULD_RETRY'? The test framework should make it's own decision about how many retries it grants a single build (say 3 times tops, then it should fail the run). I think armen filed a bug on doing this, currently I guess we will retry indefinitely, potentially exhausting the slave pool.
Reporter | ||
Comment 21•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #20)
> (In reply to Aki Sasaki [:aki] from comment #18)
> > Comment on attachment 762282 [details] [diff] [review]
> > mozharness retry example
> >
> > I'm not sure if CRITICAL is the right log level, but if it is, that's fine.
> >
> > I'm actually fine with "RETRY_ME_PLEASE" but am also ok with changing this
> > to another unique string.
> >
> > This looks good in terms of how it's written; thanks for figuring this out!
>
> How about 'FAIL_SHOULD_RETRY'? The test framework should make it's own
> decision about how many retries it grants a single build (say 3 times tops,
> then it should fail the run). I think armen filed a bug on doing this,
> currently I guess we will retry indefinitely, potentially exhausting the
> slave pool.
Actually to match other test output, that should be 'FAIL-SHOULD-RETRY'.
Reporter | ||
Comment 22•11 years ago
|
||
Add retry message support.
Attachment #762742 -
Flags: review?(netzen)
Updated•11 years ago
|
Attachment #762742 -
Flags: review?(netzen) → review+
Reporter | ||
Comment 23•11 years ago
|
||
I'll check this in once we have sign off from releng.
Assignee | ||
Comment 24•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #23)
> I'll check this in once we have sign off from releng.
What sign off do you need, Jim?
I'm fine with 'FAIL-SHOULD-RETRY' if that's what you're looking for.
Reporter | ||
Comment 25•11 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #24)
> (In reply to Jim Mathies [:jimm] from comment #23)
> > I'll check this in once we have sign off from releng.
>
> What sign off do you need, Jim?
>
> I'm fine with 'FAIL-SHOULD-RETRY' if that's what you're looking for.
Sounds good, I'll check it in. Can someone from releng pick up step 3 in comment 19?
Reporter | ||
Comment 26•11 years ago
|
||
Comment 27•11 years ago
|
||
Comment on attachment 762282 [details] [diff] [review]
mozharness retry example
Checked in, with new retry string.
https://hg.mozilla.org/build/mozharness/rev/5c195cb35e14
Not live yet. I can transplant to production if we need this sooner, or wait til :edmorley is comfy with http://hg.mozilla.org/build/mozharness/rev/7a39cb9045f9 and merge everything at once.
Attachment #762282 -
Flags: checked-in+
Comment 28•11 years ago
|
||
Made them unstarrable, since tbpl doesn't know that FAIL-SHOULD-RETRY has any bearing on results, so at least you quieted me down about them :)
Comment 29•11 years ago
|
||
Comment 30•11 years ago
|
||
The mozharness portion is in production.
Reporter | ||
Comment 31•11 years ago
|
||
This doesn't seem to be working. From a recent build on inbound:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt%206.2&rev=5b3196ad66f4
15:52:42 CRITICAL - FAIL-SHOULD-RETRY | metrotestharness.exe | ActivateApplication result 80270254
15:52:42 INFO - INFO | metrotestharness.exe | Deleting C:\slave\test\build\application\firefox\tests.ini
15:52:42 WARNING - TEST-UNEXPECTED-FAIL | automation.py | Exited with code 4 during test run
15:52:42 INFO - INFO | automation.py | Application ran for: 0:00:00.385000
15:52:42 INFO - INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmprgt87ipidlog
15:52:43 INFO - SUCCESS: The process with PID 3236 has been terminated.
15:52:43 INFO - ERROR: The process with PID 1572 could not be terminated.
15:52:43 INFO - Reason: There is no running instance of the task.
15:52:43 INFO - SUCCESS: The process with PID 3956 has been terminated.
15:52:43 INFO - WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
15:52:43 INFO - INFO | runtests.py | Running tests: end.
15:52:43 ERROR - Return code: 4
15:52:43 INFO - TinderboxPrint: mochitest-mochitest-metro-chrome<br/><em class="testfail">T-FAIL</em>
15:52:43 CRITICAL - # TBPL WARNING #
15:52:43 CRITICAL - The mochitest suite: mochitest-metro-chrome ran with return status: WARNING
15:52:43 INFO - Copying logs to upload dir...
15:52:43 INFO - mkdir: C:\slave\test\build\upload\logs
The FAIL-SHOULD-RETRY seems to get picked up as a CRITICAL error, but the run doesn't restart.
Comment 32•11 years ago
|
||
We used to either a) not use worst_level, or b) use worst_level against the keys of an unordered dict.
Now we're using worst_level in all self.tbpl_status setting locations, and using an ordered tuple.
Attachment #765138 -
Flags: review?(rail)
Updated•11 years ago
|
Attachment #765138 -
Flags: review?(rail) → review+
Comment 33•11 years ago
|
||
Comment on attachment 765138 [details] [diff] [review]
not sure how this ever worked
http://hg.mozilla.org/build/mozharness/rev/c03f32f78920
Attachment #765138 -
Flags: checked-in+
Comment 34•11 years ago
|
||
This should be fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•11 years ago
|
Whiteboard: [leave-open]
Reporter | ||
Comment 35•11 years ago
|
||
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•10 years ago
|
OS: Windows 8 Metro → Windows 8.1
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•