Closed Bug 1457662 Opened Last year Closed Last year

ADBDevice.shell()'s output callback is unreliable

Categories

(Testing :: Mozbase, defect, P1)

Version 3
defect

Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file, 1 obsolete file)

When the callback parameter is used with ADBDevice.shell(), the output stream seen by the callback is not a reliable and complete representation of the output created by the process: There are occasional inconsistencies.

For instance, in

https://treeherder.mozilla.org/logviewer.html#?job_id=176060679&repo=try&lineNumber=3068-3079

[task 2018-04-27T23:35:37.178Z] 23:35:37     INFO -  org.mozilla.geckoview.test | 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.jaINSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2018-04-27T23:35:37.178Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=90
[task 2018-04-27T23:35:37.179Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.ProgressDelegateTest
[task 2018-04-27T23:35:37.179Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2018-04-27T23:35:37.180Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=111
[task 2018-04-27T23:35:37.180Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=loadUnknownHost
[task 2018-04-27T23:35:37.180Z] 23:35:37     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: 1
[task 2018-04-27T23:35:37.181Z] 23:35:37  WARNING -  TEST-UNEXPECTED-FAIL | org.mozilla.geckoview.test.ProgressDelegateTest.multipleLoads | missing test completion status
[task 2018-04-27T23:35:37.181Z] 23:35:37     INFO -  TEST-INFO took 2816ms
[task 2018-04-27T23:35:37.182Z] 23:35:37     INFO -  TEST-START | org.mozilla.geckoview.test.ProgressDelegateTest.loadUnknownHost
[task 2018-04-27T23:35:37.182Z] 23:35:37     INFO -  org.mozilla.geckoview.test | koSessionTestRule$3.evaluate(GeckoSessionTestRule.java:900)
[task 2018-04-27T23:35:37.182Z] 23:35:37     INFO -  org.mozilla.geckoview.test | 	at android.support.test.internal.statement.UiThreadStatement$1.run(UiThreadStatement.java:44)

Notice for instance "FrameworkMethod.jaINSTRUMENTATION_STATUS:" as though two streams were interleaved.

This leads to intermittent "missing test completion status" errors in geckoview-junit.
I originally misunderstood how dup() works: I thought it produced a second file handle fully independent of the original. In fact, it seems that the two file handles share position and some other information. That's why there was a need to regularly tell() and seek(). 

Now I'm not surprised this is intermittently failing.
I have a solution -- should be ready on Monday.
Priority: -- → P1
Before this patch, you can see random failures with "missing test completion status":

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f120f5fa454446bd3fc9bfbdbcdfb762d6df673d

Once this patch is applied, only specific test failures (status -2) remain:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3fd78675463298379bc3df26901a57c8eda41927

and the logs look consistent.
Attachment #8971961 - Flags: review?(bob)
Comment on attachment 8971961 [details] [diff] [review]
make shell output callback reliable

Oops - not quite ready!
Attachment #8971961 - Flags: review?(bob)
Comment on attachment 8971961 [details] [diff] [review]
make shell output callback reliable

Actually, it looks okay.

Please see comment 3.
Attachment #8971961 - Flags: review?(bob)
Comment on attachment 8971961 [details] [diff] [review]
make shell output callback reliable

Review of attachment 8971961 [details] [diff] [review]:
-----------------------------------------------------------------

Overall looks good. I have a couple of suggestions if you don't mind.

::: testing/mozbase/mozdevice/mozdevice/adb.py
@@ +25,4 @@
>          #: command argument argument list.
>          self.args = args
>          #: Temporary file handle to be used for stdout.
> +        self.stdout_file = tempfile.NamedTemporaryFile(mode='w+b', delete=True)

delete=True is the default.

@@ +1109,3 @@
>                      except IOError:
>                          pass
> +                    exitcode = adb_process.proc.poll()

I'm concerned that _timed_read_line strips the newline which might result in us getting an empty string back even though it is not end of file. Can we change _timed_read_line to not strip the line and if you need to do so, you can strip it in stdout_callback(line).

_timed_read_line has another potential problem in that if the read times out, then we return line even though it hasn't been initialized. Can we add an except: in _timed_read_line and set line = '' so that we always have a valid return value even if the read times out?

There is a race here where it is possible for the process to terminate while we still have output which has not been processed. We need to add another loop after the while to make sure we have gotten all of the output.

line = stdout2.readline()
while line:
    stdout_callback(line.rstrip())
    line = stdout2.readline()

@@ +1111,5 @@
> +                    exitcode = adb_process.proc.poll()
> +        else:
> +            while ((time.time() - start_time) <= timeout) and exitcode is None:
> +                time.sleep(self._polling_interval)
> +                exitcode = adb_process.proc.poll()

This is a style comment. I can't remember the original source but I believe it was Kernighan and Plauger who suggested keeping the first branch of an if...else short rather than having a short tail after the else.

instead of

if stdout_callback:
   more stuff
else:
   less stuff

I prefer

if not stdout_callback:
   less stuff
else:
   more stuff
Attachment #8971961 - Flags: review?(bob)
(In reply to Bob Clary [:bc:] from comment #6)

> 
> I'm concerned that _timed_read_line strips the newline which might result in
> us getting an empty string back even though it is not end of file. Can we
> change _timed_read_line to not strip the line and if you need to do so, you
> can strip it in stdout_callback(line).
> 
> _timed_read_line has another potential problem in that if the read times
> out, then we return line even though it hasn't been initialized. Can we add
> an except: in _timed_read_line and set line = '' so that we always have a
> valid return value even if the read times out?
> 

On second thought, I think these comments are bogus so long as we don't use _timed_read_line outside of its original loop. I was thinking of the follow up loop to read the remainder but so long as you don't use _timed_read_line there, I don't think the stripping and the exception don't matter.
Comment on attachment 8972116 [details] [diff] [review]
make shell output callback reliable

Review of attachment 8972116 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, thanks! r+.
Attachment #8972116 - Flags: review?(bob) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bfab79dc9a7
[mozdevice] Use separate file object to read shell output for callback; r=bc
https://hg.mozilla.org/mozilla-central/rev/1bfab79dc9a7
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.