Allow sut execsu with arbitrary timeout

RESOLVED FIXED in mozilla28

Status

defect
RESOLVED FIXED
6 years ago
a year ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

unspecified
mozilla28
x86
macOS
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Assignee

Description

6 years ago
In some circumstances we want to collect logcat traces for the duration of a test, and we want to collect that data as it becomes available (in case the connection to the sutagent is lost during the test). There is some work necessary in the test harness (maybe devicemanager?) to make this work -- we probably need to be collecting the logcat output in a separate thread.

On the sutagent side, there is another problem. If we simply "execsu logcat", logcat is executed and returns data more-or-less as it becomes available, but after 300 seconds, the redirection is stopped and (from the sut client's perspective) the command completes. (logcat keeps running, but the remaining output is lost.)

https://hg.mozilla.org/mozilla-central/file/ad2a5a4f53ec/build/mobile/sutagent/android/DoCommand.java#l3648

Extending the exec/execsu capability to take a timeout parameter would help to solve the logcat problem and make our exec command that much more flexible.


I'm hoping this can be implemented fairly easily and I can get this into sutagent 1.20.
Assignee

Comment 1

6 years ago
sutagent already has commands 'exec', 'execsu', 'execcwd', 'execcwdsu', etc. These are convenient to execute and used in automation -- I don't want to change the behavior of these commands.

I also don't want to add exectimeout, execsutimeout, execcwdtimeout, etc! 

Instead, this patch adds 'execext' (extended exec), slightly harder to use,  extensible to N optional parameters, and currently supporting parameters:
  su - execute via su
  cwd=<dir> - execute with current working directory as specified
  t=<timeout-in-seconds> - wait for program to finish in specified timeout, instead of default timeout of 300 seconds

Note that cwd does not always work -- an existing problem I found in bug 934668.

The timeout does not apply to launches of Java programs -- again, an existing condition.

I changed the waits from a while loop, waiting in 30 second increments, to a single wait (join) for up to the maximum timeout. I believe these to be functionally equivalent.

I have mostly tested this with:

  execext su t=30 logcat -v time 
  execext su t=3000 logcat -v time 
  execsu logcat -v time
Attachment #827011 - Flags: review?(jmaher)
Comment on attachment 827011 [details] [diff] [review]
add execext command

Review of attachment 827011 [details] [diff] [review]:
-----------------------------------------------------------------

overall, this should work well.  A few ideas/questions:
1) Have a set_timeout function that adjusts the default timeout, then you can use the existing functions, not ideal, but another solution
2) do we have a lot of use of execcwdsu and execcwd?  maybe we could deprecate those in a future version
3) we have negatus, will this get ported there?
4) have you tested this in a harness while collecting logcat and running tests? (we could get rid of our file logging since everything goes to logcat!)

::: build/mobile/sutagent/android/DoCommand.java
@@ +815,5 @@
> +                if (Argc >= 2)
> +                    {
> +                    boolean su = false;
> +                    String cwd = null;
> +                    int timeout = 0;

shouldn't we default this to DEFAULT_STARTPRG_TIMEOUT_SECONDS?

@@ +3701,5 @@
>                  }
>              RedirOutputThread outThrd = new RedirOutputThread(pProc, out);
>              outThrd.start();
> +            try {
> +                outThrd.join(timeoutSeconds * 1000);

what if timeoutSeconds == 0, <1, MAXINT?  we should have some reasonable parameters here, i.e. >1, <24*3600

@@ +3705,5 @@
> +                outThrd.join(timeoutSeconds * 1000);
> +                int nRetCode = pProc.exitValue();
> +                sRet = "return code [" + nRetCode + "]";
> +                }
> +            catch (IllegalThreadStateException itse) {

why was this in a for loop before?  did it catch a different error condition that would help us timeout faster?
Attachment #827011 - Flags: review?(jmaher) → review+
Assignee

Comment 3

6 years ago
(In reply to Joel Maher (:jmaher) from comment #2)
> Comment on attachment 827011 [details] [diff] [review]
> add execext command
> 
> Review of attachment 827011 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> overall, this should work well.  A few ideas/questions:
> 1) Have a set_timeout function that adjusts the default timeout, then you
> can use the existing functions, not ideal, but another solution

I'd be afraid of the timeout getting confused over time.

> 2) do we have a lot of use of execcwdsu and execcwd?  maybe we could
> deprecate those in a future version

I don't know of anywhere that we use cwd, but I haven't determined that it is not used either.

> 3) we have negatus, will this get ported there?

I'll check with mcote. We might handle it the same as execcwd -- http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozdevice/mozdevice/devicemanagerSUT.py#300.

> 4) have you tested this in a harness while collecting logcat and running
> tests? (we could get rid of our file logging since everything goes to
> logcat!)

I have not tried that yet -- but am eager to!
 
> ::: build/mobile/sutagent/android/DoCommand.java
> @@ +815,5 @@
> > +                if (Argc >= 2)
> > +                    {
> > +                    boolean su = false;
> > +                    String cwd = null;
> > +                    int timeout = 0;
> 
> shouldn't we default this to DEFAULT_STARTPRG_TIMEOUT_SECONDS?

Yes! Thanks.

> @@ +3701,5 @@
> >                  }
> >              RedirOutputThread outThrd = new RedirOutputThread(pProc, out);
> >              outThrd.start();
> > +            try {
> > +                outThrd.join(timeoutSeconds * 1000);
> 
> what if timeoutSeconds == 0, <1, MAXINT?  we should have some reasonable
> parameters here, i.e. >1, <24*3600

I added some limits: warn if limits exceeded and use default instead.

> @@ +3705,5 @@
> > +                outThrd.join(timeoutSeconds * 1000);
> > +                int nRetCode = pProc.exitValue();
> > +                sRet = "return code [" + nRetCode + "]";
> > +                }
> > +            catch (IllegalThreadStateException itse) {
> 
> why was this in a for loop before?  did it catch a different error condition
> that would help us timeout faster?

I cannot think of why the loop was used. I see no advantage.
https://hg.mozilla.org/mozilla-central/rev/58d7c1291d4c
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28

Updated

a year ago
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.