1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status

Assignee

Description

•

7 years ago

As noticed on bug 1433905 comment 8 the DeviceRunner doesn't override the `poll()` and `wait()` methods, and as such calling both methods from a consumer of mozrunner will trigger the implementations in BaseRunner. Those do not check for the remote process, but instead check for the process status of `adb`, which in all the cases will return `0` - `adb` always forks itself as daemon. Only `is_running` of the DeviceRunner class has the correct behavior: https://dxr.mozilla.org/mozilla-central/rev/27fd083ed7ee5782e52a5eaf0286c5ffa8b35a9e/testing/mozbase/mozrunner/mozrunner/base/device.py#124 It queries the device manager and checks if the process exists. In case of FennecRunner this is the DeviceManagerADB class which is based on DeviceManager: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozdevice/mozdevice/devicemanager.py `processExist()` itself fetches the process info and returns True or False. I would say we can use the same for `poll()` and `wait()` to at least return `None` or `0`. I'm not sure if there is a way to retrieve the exit code of an already gone process via adb. Goeff do you know?

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Comment 1

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #0) > I'm not sure if there is a way to retrieve the exit code of > an already gone process via adb. Goeff do you know? I think it is not possible to retrieve the exit code of a remote process which has already completed. A technique that we have used with some success is to echo the exit status of the remote process: https://dxr.mozilla.org/mozilla-central/rev/9b69cc60e5848f2f8802c911fd00771b50eed41f/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py#140 then wait for the process to end and collect and parse its output for the exit code: https://dxr.mozilla.org/mozilla-central/rev/9b69cc60e5848f2f8802c911fd00771b50eed41f/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py#171-185 That ought to work if the remote process is a shell script or a native executable like 'ps' or 'ls'. Of course, if you are launching an application (firefox), that's probably going to be done through 'am start' or similar and this technique won't be of much use.

Flags: needinfo?(gbrown)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 2

•

7 years ago

I got it successfully running as part of my patch for bug 1433873. Here a try build which utilizes a new wait() method on the DeviceRunner class: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9408c41de5868adae6d675d2ff021f7e6d7af07 I will extract that patch, and upload it here for a separate release of mozrunner which is overdue anyway.

Comment hidden (mozreview-request)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

7 years ago

Attachment #8953232 - Flags: review?(gbrown)

Attachment #8953233 - Flags: review?(gbrown)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

7 years ago

Attachment #8953233 - Attachment is obsolete: true

Attachment #8953233 - Flags: review?(gbrown)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 5

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #2) > I will extract that patch, and upload it here for a separate release of > mozrunner which is overdue anyway. Well, we shouldn't release yet given that more work has to be done for both mozrunner and mozprocess, so I removed the second commit.

Geoff Brown [:gbrown]

Comment 6

•

7 years ago

mozreview-review

Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review228768 ::: testing/mozbase/mozrunner/mozrunner/base/device.py:127 (Diff revision 1) > self.app_ctx.remote_process)) > > def is_running(self): > return self.app_ctx.dm.processExist(self.app_ctx.remote_process) > > + def poll(self): I don't see poll() defined in the base class. What am I missing? ::: testing/mozbase/mozrunner/mozrunner/base/device.py:134 (Diff revision 1) > + > + def wait(self, timeout=None): > + count = 0 > + > + while True: > + if not self.is_running(): The is_running() call is ultimately an adb command out to the remote device -- that might take some time. I expect this implementation is usually fine, but on a slow/malfunctioning device, it might wait too long. Instead of assuming one loop == 1 second, I'd prefer to see something like: end = datetime.now() + datetime.timedelta(timeout) ... if datetime.now() > end: You can't guarantee a perfect wait time, but this approach will limit the damage.

Attachment #8953232 - Flags: review?(gbrown) → review-

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 7

•

7 years ago

mozreview-review-reply

Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review228768 > I don't see poll() defined in the base class. What am I missing? Oh, I was working that much with the process classes that I didn't notice that mozrunner has such an awkward API. So instead of `poll()` we have the property `returncode` only, or have to use `wait(timeout=0)`. I will remove that method. > The is_running() call is ultimately an adb command out to the remote device -- that might take some time. > > I expect this implementation is usually fine, but on a slow/malfunctioning device, it might wait too long. Instead of assuming one loop == 1 second, I'd prefer to see something like: > > end = datetime.now() + datetime.timedelta(timeout) > ... > if datetime.now() > end: > > You can't guarantee a perfect wait time, but this approach will limit the damage. I checked again and we actually have the same 1s sleep logic in `start()` and `stop()`. I have never seen that such a call can take that long, but I agree that we should make that more accurate. I will update all affected methods in this module, which also gives us better coverage for `wait()`. A new try build is up at https://treeherder.mozilla.org/#/jobs?repo=try&revision=700aaa319090b0ccdaca703d0b344e08d1550259

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

7 years ago

Summary: [mozrunner] DeviceRunner has to override poll() and wait() methods to check for remote process status → [mozrunner] DeviceRunner has to override wait() to check for remote process status

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 8

•

7 years ago

While digging deeper I noticed that we inappropriately override `is_running`, it's just identical to the BaseRunner. Instead we completely miss the `returncode` property, which has to be independent from the BaseRunner. Maybe that is what I originally tried to refer to but accidentally replaced with `poll()`. Further I will patch `processExist()` which claims to return a boolean, but actually returns the process id.

Summary: [mozrunner] DeviceRunner has to override wait() to check for remote process status → [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 9

•

7 years ago

try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c18dc895ab1a0d2a4de43d44fd1e9ad1450e55

Comment hidden (mozreview-request)

Geoff Brown [:gbrown]

Comment 12

•

7 years ago

mozreview-review

Comment on attachment 8953970 [details] Bug 1438679 - [mozdevice] processExist() has to return a boolean and not remote process id. https://reviewboard.mozilla.org/r/223128/#review229168

Attachment #8953970 - Flags: review?(gbrown) → review+

Geoff Brown [:gbrown]

Comment 13

•

7 years ago

mozreview-review

Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review229232 This looks great now. Thanks.

Attachment #8953232 - Flags: review?(gbrown) → review+

Pulsebot

Comment 14

•

7 years ago

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5685421d86eb [mozdevice] processExist() has to return a boolean and not remote process id. r=gbrown https://hg.mozilla.org/integration/autoland/rev/0f79e397a320 [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. r=gbrown

Natalia Csoregi [:nataliaCs]

Comment 15

•

7 years ago

Backed out for xpcshell failures on Android. backout: https://hg.mozilla.org/integration/autoland/rev/d04639eccb1871310307a9515f3b310536cb652e push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=0f79e397a3202e6579fb8a6b642e828ab2dff793 failure logs: https://treeherder.mozilla.org/logviewer.html#?job_id=164398083&repo=autoland&lineNumber=26093 [task 2018-02-26T19:17:46.257Z] 19:17:46 INFO - (xpcshell/head.js) | test MAIN run_test finished (1) [task 2018-02-26T19:17:46.257Z] 19:17:46 INFO - exiting test [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - intl/uconv/tests/unit/test_encode_CP1258.js | [7298, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399 [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - intl/uconv/tests/unit/test_encode_CP1258.js | [7298, Main Thread] WARNING: OOPDeinit() without successful OOPInit(): file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 3371 [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - <<<<<<< [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - INFO | Result summary: [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - INFO | Passed: 152 [task 2018-02-26T19:17:46.258Z] 19:17:46 WARNING - INFO | Failed: 152 [task 2018-02-26T19:17:46.259Z] 19:17:46 WARNING - One or more unittests failed. [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - INFO | Todo: 0 [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - INFO | Retried: 0 [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - SUITE-END | took 1792s [task 2018-02-26T19:17:46.273Z] 19:17:46 ERROR - Return code: 1 [task 2018-02-26T19:17:46.273Z] 19:17:46 INFO - TinderboxPrint: xpcshell<br/>152/<em class="testfail">152</em>/0 [task 2018-02-26T19:17:46.273Z] 19:17:46 INFO - ##### xpcshell log ends

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 16

•

7 years ago

It's totally not clear to me what's failing here. I cannot find any failing test in that log even it is listed as WARNING at the end. Geoff, do you have any idea? If not I will have to dig into that tomorrow.

Flags: needinfo?(hskupin) → needinfo?(gbrown)

Geoff Brown [:gbrown]

Comment 17

•

7 years ago

I see a lot of "Process still running after test!", which seems to come from: https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/testing/xpcshell/runxpcshelltests.py#292-300 Note that poll() is implemented as: https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/testing/xpcshell/remotexpcshelltests.py#194 ...which seems related to the processExist() change.

Flags: needinfo?(gbrown)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 18

•

7 years ago

Interestingly I haven't gotten an email from Bugzilla about your reply, so I figured out the same meanwhile as you pointed out in the last comment. The usage of `processExist()` is really wrong here, and would explain it.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 19

•

7 years ago

I see a lot of usage for `processExist()` in tree, and I think it is a better idea to actually move this change out to a different bug. https://dxr.mozilla.org/mozilla-central/search?q=processExist&=mozilla-central I will revert this change.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 20

•

7 years ago

I filed bug 1441457 so we can take care of that later. Try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=19917287913758a534b1a3f5bcace29f1867d7c6

Comment hidden (mozreview-request)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

7 years ago

Attachment #8953970 - Attachment is obsolete: true

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 22

•

7 years ago

I made a little mistake, so here the all passing try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=78f579e836f501062096ad461ecd1c3ef67702b8

Pulsebot

Comment 23

•

7 years ago

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1b30f674e78e [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. r=gbrown

Andreea Pavel [:apavel]

Comment 24

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1b30f674e78e

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. 7 years ago Henrik Skupin [:whimboo][⌚️UTC+2] 59 bytes, text/x-review-board-request	gbrown : review+	Details
Bug 1438679 - [mozrunner] Bump version to 6.15. 7 years ago Henrik Skupin [:whimboo][⌚️UTC+2] 59 bytes, text/x-review-board-request		Details
Bug 1438679 - [mozdevice] processExist() has to return a boolean and not remote process id. 7 years ago Henrik Skupin [:whimboo][⌚️UTC+2] 59 bytes, text/x-review-board-request	gbrown : review+	Details