Closed
Bug 1438679
Opened 6 years ago
Closed 6 years ago
[mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status
Categories
(Testing :: Mozbase, enhancement, P1)
Tracking
(firefox60 fixed)
RESOLVED
FIXED
mozilla60
Tracking | Status | |
---|---|---|
firefox60 | --- | fixed |
People
(Reporter: whimboo, Assigned: whimboo)
References
Details
Attachments
(1 file, 2 obsolete files)
As noticed on bug 1433905 comment 8 the DeviceRunner doesn't override the `poll()` and `wait()` methods, and as such calling both methods from a consumer of mozrunner will trigger the implementations in BaseRunner. Those do not check for the remote process, but instead check for the process status of `adb`, which in all the cases will return `0` - `adb` always forks itself as daemon. Only `is_running` of the DeviceRunner class has the correct behavior: https://dxr.mozilla.org/mozilla-central/rev/27fd083ed7ee5782e52a5eaf0286c5ffa8b35a9e/testing/mozbase/mozrunner/mozrunner/base/device.py#124 It queries the device manager and checks if the process exists. In case of FennecRunner this is the DeviceManagerADB class which is based on DeviceManager: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozdevice/mozdevice/devicemanager.py `processExist()` itself fetches the process info and returns True or False. I would say we can use the same for `poll()` and `wait()` to at least return `None` or `0`. I'm not sure if there is a way to retrieve the exit code of an already gone process via adb. Goeff do you know?
Flags: needinfo?(gbrown)
Comment 1•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #0) > I'm not sure if there is a way to retrieve the exit code of > an already gone process via adb. Goeff do you know? I think it is not possible to retrieve the exit code of a remote process which has already completed. A technique that we have used with some success is to echo the exit status of the remote process: https://dxr.mozilla.org/mozilla-central/rev/9b69cc60e5848f2f8802c911fd00771b50eed41f/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py#140 then wait for the process to end and collect and parse its output for the exit code: https://dxr.mozilla.org/mozilla-central/rev/9b69cc60e5848f2f8802c911fd00771b50eed41f/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py#171-185 That ought to work if the remote process is a shell script or a native executable like 'ps' or 'ls'. Of course, if you are launching an application (firefox), that's probably going to be done through 'am start' or similar and this technique won't be of much use.
Flags: needinfo?(gbrown)
Assignee | ||
Comment 2•6 years ago
|
||
I got it successfully running as part of my patch for bug 1433873. Here a try build which utilizes a new wait() method on the DeviceRunner class: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9408c41de5868adae6d675d2ff021f7e6d7af07 I will extract that patch, and upload it here for a separate release of mozrunner which is overdue anyway.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•6 years ago
|
Attachment #8953232 -
Flags: review?(gbrown)
Attachment #8953233 -
Flags: review?(gbrown)
Assignee | ||
Updated•6 years ago
|
Attachment #8953233 -
Attachment is obsolete: true
Attachment #8953233 -
Flags: review?(gbrown)
Assignee | ||
Comment 5•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #2) > I will extract that patch, and upload it here for a separate release of > mozrunner which is overdue anyway. Well, we shouldn't release yet given that more work has to be done for both mozrunner and mozprocess, so I removed the second commit.
Comment 6•6 years ago
|
||
mozreview-review |
Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review228768 ::: testing/mozbase/mozrunner/mozrunner/base/device.py:127 (Diff revision 1) > self.app_ctx.remote_process)) > > def is_running(self): > return self.app_ctx.dm.processExist(self.app_ctx.remote_process) > > + def poll(self): I don't see poll() defined in the base class. What am I missing? ::: testing/mozbase/mozrunner/mozrunner/base/device.py:134 (Diff revision 1) > + > + def wait(self, timeout=None): > + count = 0 > + > + while True: > + if not self.is_running(): The is_running() call is ultimately an adb command out to the remote device -- that might take some time. I expect this implementation is usually fine, but on a slow/malfunctioning device, it might wait too long. Instead of assuming one loop == 1 second, I'd prefer to see something like: end = datetime.now() + datetime.timedelta(timeout) ... if datetime.now() > end: You can't guarantee a perfect wait time, but this approach will limit the damage.
Attachment #8953232 -
Flags: review?(gbrown) → review-
Assignee | ||
Comment 7•6 years ago
|
||
mozreview-review-reply |
Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review228768 > I don't see poll() defined in the base class. What am I missing? Oh, I was working that much with the process classes that I didn't notice that mozrunner has such an awkward API. So instead of `poll()` we have the property `returncode` only, or have to use `wait(timeout=0)`. I will remove that method. > The is_running() call is ultimately an adb command out to the remote device -- that might take some time. > > I expect this implementation is usually fine, but on a slow/malfunctioning device, it might wait too long. Instead of assuming one loop == 1 second, I'd prefer to see something like: > > end = datetime.now() + datetime.timedelta(timeout) > ... > if datetime.now() > end: > > You can't guarantee a perfect wait time, but this approach will limit the damage. I checked again and we actually have the same 1s sleep logic in `start()` and `stop()`. I have never seen that such a call can take that long, but I agree that we should make that more accurate. I will update all affected methods in this module, which also gives us better coverage for `wait()`. A new try build is up at https://treeherder.mozilla.org/#/jobs?repo=try&revision=700aaa319090b0ccdaca703d0b344e08d1550259
Assignee | ||
Updated•6 years ago
|
Summary: [mozrunner] DeviceRunner has to override poll() and wait() methods to check for remote process status → [mozrunner] DeviceRunner has to override wait() to check for remote process status
Assignee | ||
Comment 8•6 years ago
|
||
While digging deeper I noticed that we inappropriately override `is_running`, it's just identical to the BaseRunner. Instead we completely miss the `returncode` property, which has to be independent from the BaseRunner. Maybe that is what I originally tried to refer to but accidentally replaced with `poll()`. Further I will patch `processExist()` which claims to return a boolean, but actually returns the process id.
Summary: [mozrunner] DeviceRunner has to override wait() to check for remote process status → [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status
Assignee | ||
Comment 9•6 years ago
|
||
try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c18dc895ab1a0d2a4de43d44fd1e9ad1450e55
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 12•6 years ago
|
||
mozreview-review |
Comment on attachment 8953970 [details] Bug 1438679 - [mozdevice] processExist() has to return a boolean and not remote process id. https://reviewboard.mozilla.org/r/223128/#review229168
Attachment #8953970 -
Flags: review?(gbrown) → review+
Comment 13•6 years ago
|
||
mozreview-review |
Comment on attachment 8953232 [details] Bug 1438679 - [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. https://reviewboard.mozilla.org/r/222514/#review229232 This looks great now. Thanks.
Attachment #8953232 -
Flags: review?(gbrown) → review+
Comment 14•6 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5685421d86eb [mozdevice] processExist() has to return a boolean and not remote process id. r=gbrown https://hg.mozilla.org/integration/autoland/rev/0f79e397a320 [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. r=gbrown
Comment 15•6 years ago
|
||
Backed out for xpcshell failures on Android. backout: https://hg.mozilla.org/integration/autoland/rev/d04639eccb1871310307a9515f3b310536cb652e push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=0f79e397a3202e6579fb8a6b642e828ab2dff793 failure logs: https://treeherder.mozilla.org/logviewer.html#?job_id=164398083&repo=autoland&lineNumber=26093 [task 2018-02-26T19:17:46.257Z] 19:17:46 INFO - (xpcshell/head.js) | test MAIN run_test finished (1) [task 2018-02-26T19:17:46.257Z] 19:17:46 INFO - exiting test [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - intl/uconv/tests/unit/test_encode_CP1258.js | [7298, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399 [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - intl/uconv/tests/unit/test_encode_CP1258.js | [7298, Main Thread] WARNING: OOPDeinit() without successful OOPInit(): file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 3371 [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - <<<<<<< [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - INFO | Result summary: [task 2018-02-26T19:17:46.258Z] 19:17:46 INFO - INFO | Passed: 152 [task 2018-02-26T19:17:46.258Z] 19:17:46 WARNING - INFO | Failed: 152 [task 2018-02-26T19:17:46.259Z] 19:17:46 WARNING - One or more unittests failed. [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - INFO | Todo: 0 [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - INFO | Retried: 0 [task 2018-02-26T19:17:46.259Z] 19:17:46 INFO - SUITE-END | took 1792s [task 2018-02-26T19:17:46.273Z] 19:17:46 ERROR - Return code: 1 [task 2018-02-26T19:17:46.273Z] 19:17:46 INFO - TinderboxPrint: xpcshell<br/>152/<em class="testfail">152</em>/0 [task 2018-02-26T19:17:46.273Z] 19:17:46 INFO - ##### xpcshell log ends
Flags: needinfo?(hskupin)
Assignee | ||
Comment 16•6 years ago
|
||
It's totally not clear to me what's failing here. I cannot find any failing test in that log even it is listed as WARNING at the end. Geoff, do you have any idea? If not I will have to dig into that tomorrow.
Flags: needinfo?(hskupin) → needinfo?(gbrown)
Comment 17•6 years ago
|
||
I see a lot of "Process still running after test!", which seems to come from: https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/testing/xpcshell/runxpcshelltests.py#292-300 Note that poll() is implemented as: https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/testing/xpcshell/remotexpcshelltests.py#194 ...which seems related to the processExist() change.
Flags: needinfo?(gbrown)
Assignee | ||
Comment 18•6 years ago
|
||
Interestingly I haven't gotten an email from Bugzilla about your reply, so I figured out the same meanwhile as you pointed out in the last comment. The usage of `processExist()` is really wrong here, and would explain it.
Assignee | ||
Comment 19•6 years ago
|
||
I see a lot of usage for `processExist()` in tree, and I think it is a better idea to actually move this change out to a different bug. https://dxr.mozilla.org/mozilla-central/search?q=processExist&=mozilla-central I will revert this change.
Assignee | ||
Comment 20•6 years ago
|
||
I filed bug 1441457 so we can take care of that later. Try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=19917287913758a534b1a3f5bcace29f1867d7c6
Comment hidden (mozreview-request) |
Assignee | ||
Updated•6 years ago
|
Attachment #8953970 -
Attachment is obsolete: true
Assignee | ||
Comment 22•6 years ago
|
||
I made a little mistake, so here the all passing try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=78f579e836f501062096ad461ecd1c3ef67702b8
Comment 23•6 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1b30f674e78e [mozrunner] DeviceRunner has to override returncode and wait() to check for remote process status. r=gbrown
Comment 24•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1b30f674e78e
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in
before you can comment on or make changes to this bug.
Description
•