Add a timeout parameter to mozharness' run_command

RESOLVED FIXED

Status

Release Engineering
Mozharness
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: jgriffin, Assigned: jgriffin)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Assignee)

Description

4 years ago
We have a lot of B2G test runs getting killed during the test by the 1200s timeout, and this prevents the mozharness script from being able to dump logcat, which leaves us with very little information to go on.

I think we need to add a 'timeout' parameter to the run_command method so we can kill the command when it has exceeded a certain threshold without output, and then continue with the mozharness script.

Alternately, we could implement some new mozprocess-based command that could be used selectively in scripts that have installed mozprocess.

Comment 1

4 years ago
Essentially a dup of bug 840305, though this bug has the better summary for non-TBPL folk.
(Assignee)

Comment 2

4 years ago
Created attachment 774374 [details] [diff] [review]
WIP: add mozprocess_run_command to mozharness
(Assignee)

Updated

4 years ago
Assignee: nobody → jgriffin
(Assignee)

Comment 3

4 years ago
This isn't exactly what I want for a final version, but I will run this on ash to see if anything breaks horribly.
Is 203 test failures in a push horrible enough? ;)

Just an import error with mozprocess:
https://tbpl.mozilla.org/php/getParsedLog.php?id=25200546&tree=Ash
(Assignee)

Comment 5

4 years ago
Yep that's bad, I'm trying to figure out what went wrong. :)
(Assignee)

Comment 6

4 years ago
So, I made some fixes and it looks like desktop tests are running green.  Android tests are busted, though.  Will do some more investigation.
(Assignee)

Comment 7

4 years ago
Looks like I fixed the Android bustage.  Now I just need to add some code to make the error handling more intelligent.
(Assignee)

Comment 8

4 years ago
Created attachment 774996 [details] [diff] [review]
WIP 0.2: add mozprocess_run_command to mozharness
(Assignee)

Updated

4 years ago
Attachment #774374 - Attachment is obsolete: true
(Assignee)

Comment 9

4 years ago
Created attachment 775030 [details] [diff] [review]
WIP 0.3: add mozprocess_run_command to mozharness

Added better error handling; will trigger another run on ash.
(Assignee)

Updated

4 years ago
Attachment #774996 - Attachment is obsolete: true
(Assignee)

Comment 10

4 years ago
Everything was green on the last run, except the B2G emu tests, which were red due to gps' resource monitoring patches.  I'm going to merge mozharness to ash-mozharness, and trigger that set of tests again.
(Assignee)

Comment 11

4 years ago
Created attachment 775962 [details] [diff] [review]
WIP 0.4: add mozprocess_run_command to mozharness
(Assignee)

Updated

4 years ago
Attachment #775030 - Attachment is obsolete: true
(Assignee)

Comment 12

4 years ago
Comment on attachment 775962 [details] [diff] [review]
WIP 0.4: add mozprocess_run_command to mozharness

I'm going to do another full run on ash with a couple of tweaks I added here, but I think this version is suitable for feedback.

Questions:  are we ok defining a mozprocess_run_command command, like this patch does?  Or would we rather just add a mozprocess arg to run_command, or something else?

Should we have a not-None default for outputTimeout?

If we do that, should we make run_command smart enough so that it detects if mozprocess is available in site-packages, and if so, use the mozprocess method by default?
Attachment #775962 - Flags: feedback?(aki)

Comment 13

4 years ago
Comment on attachment 775962 [details] [diff] [review]
WIP 0.4: add mozprocess_run_command to mozharness

a) Thanks for taking this on.  I've taken some stabs at this but haven't had a fix that has stuck, yet.  It seems to be a large source of tbpl comments, and this will help me make my hg-git stuff more stable, too.

b) We have some similar constructs in resource monitoring:
http://hg.mozilla.org/build/mozharness/file/f7fba2c98999/mozharness/base/python.py#l360
http://hg.mozilla.org/build/mozharness/file/f7fba2c98999/mozharness/base/python.py#l383
and mock:
http://hg.mozilla.org/build/mozharness/file/f7fba2c98999/mozharness/mozilla/mock.py#l92

c) I think my only real comment is, could you rename outputTimeout to output_timeout to match the rest of the variable names?

(In reply to Jonathan Griffin (:jgriffin) from comment #12)
> Questions:  are we ok defining a mozprocess_run_command command, like this
> patch does?  Or would we rather just add a mozprocess arg to run_command, or
> something else?
>
> Should we have a not-None default for outputTimeout?
> 
> If we do that, should we make run_command smart enough so that it detects if
> mozprocess is available in site-packages, and if so, use the mozprocess
> method by default?

If you wanted to rework this, allowing for run_command to use mozprocess if a) output_timeout is not None, and b) mozprocess is installed, would be a good workflow.  This workflow of mozprocess_run_command is still better than not having an output timeout at all, though.
Attachment #775962 - Flags: feedback?(aki) → feedback+

Updated

4 years ago
Blocks: 840305
(Assignee)

Comment 14

4 years ago
Created attachment 777207 [details] [diff] [review]
WIP 0.5: add output_timeout to run_command

I've refactored per your last suggestion, and changed outputTimeout to output_timeout.  I'm running another set of tests on ash.
(Assignee)

Updated

4 years ago
Attachment #775962 - Attachment is obsolete: true
(Assignee)

Comment 15

4 years ago
Comment on attachment 777207 [details] [diff] [review]
WIP 0.5: add output_timeout to run_command

The run on ash is looking very green.

We could file a follow-up bug to enable output_timeout support on posix systems without mozprocess, if desired.

Would you like a test for this?  I could potentially add one that is hardcoded to fetch mozprocess from puppetagain, but since none of the other tests operate like this, I'm not sure if you want to add that.
Attachment #777207 - Flags: review?(aki)

Comment 16

4 years ago
Comment on attachment 777207 [details] [diff] [review]
WIP 0.5: add output_timeout to run_command

Thanks!  I imagine we'll iterate on this, but I'd love to see how this affects the intermittent orange bugs dependent on this bug.

I'm not entirely sure how to test this; I don't see an easy way to use mock, for example.  We could potentially add mozprocess to the list of unittest dependencies in requirements.txt, to allow for testing.
Attachment #777207 - Flags: review?(aki) → review+
(Assignee)

Comment 17

4 years ago
Comment on attachment 777207 [details] [diff] [review]
WIP 0.5: add output_timeout to run_command

Landed on default so I can watch it on cedar:

https://hg.mozilla.org/build/mozharness/rev/879cf481df30

Aki, if you want me to write a test by adding mozprocess to requirements.txt, let me know and I'll do that in a follow-up patch.
Attachment #777207 - Flags: checked-in+
Merged to production.
(Assignee)

Updated

4 years ago
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED

Updated

4 years ago
Blocks: 847727
Product: mozilla.org → Release Engineering
(Assignee)

Updated

4 years ago
Depends on: 904839

Updated

3 years ago
Component: General Automation → Mozharness
You need to log in before you can comment on or make changes to this bug.