Closed Bug 940915 Opened 11 years ago Closed 10 years ago

[Marionette] marionette's detecting of hung device

Categories

(Remote Protocol :: Marionette, defect)

Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wachen, Unassigned)

I am wondering whether the current marionette has a way to detect if the device has hung. If not, how can we achieve that? Or could we write something to do it?
We don't have a way to do that currently.

To achieve that, we have to define what problem we're going to detect. Either a single app or the main b2g process can become unresponsive. If we're currently in the context of a hung application, then your command to marionette would time out. In this case, we'll need to figure out whether marionette is not responding because a) marionette hit a bug, b) the b2g process crashed, or c) it is hung. For a), finding a programmatic way to determine this is a bit hard. We'd have to check the logcat to see if there was an error near the last marionette output, and this check might produce false positives, since you might have a known 'harmless' error in the log output. For b), we can check whether the b2g process is still running on the phone by looking at the output of 'adb shell ps' or something similar. For c), I'm not sure how we can verify this. Perhaps we can get some information about the thread state using adb shell commands?
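For case b), the 'adb shell ps' check could look roughly like the minimal sketch below. It assumes `adb` is on the PATH with a single device attached, that `ps` prints the old Android toolbox layout (header row first, PID in the second column, process name last), and that the main b2g process name has the basename `b2g` (e.g. `/system/b2g/b2g`). The helper names `parse_b2g_pid` and `get_b2g_pid` are hypothetical and not part of Marionette.

```python
import subprocess
from typing import Optional


def parse_b2g_pid(ps_output: str) -> Optional[int]:
    """Return the PID of the main b2g process from `ps` output, or None.

    Assumption: old Android toolbox `ps` layout, with a header row first,
    the PID in the second column and the process name in the last column.
    """
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        # Compare the basename so both "b2g" and "/system/b2g/b2g" match,
        # while e.g. "/system/b2g/plugin-container" does not.
        if fields[-1].rsplit("/", 1)[-1] == "b2g":
            try:
                return int(fields[1])
            except ValueError:
                continue
    return None


def get_b2g_pid() -> Optional[int]:
    """Run `adb shell ps` on the attached device and look up the b2g PID.

    Raises FileNotFoundError if adb is not installed, and
    subprocess.TimeoutExpired if the device does not answer in time.
    """
    result = subprocess.run(
        ["adb", "shell", "ps"], capture_output=True, text=True, timeout=30
    )
    return parse_b2g_pid(result.stdout)
```

A `None` result from `get_b2g_pid()` suggests b2g is no longer running; comparing the PID before and after a test would additionally reveal a crash followed by a restart. Keeping the parsing separate from the adb call makes it testable without a device.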
I will try to investigate this. Perhaps I can start by running adb shell commands or other Linux commands during MTBF trial runs. The tricky part, however, is deciding when we should run those commands. For example, if the marionette client has already reported a broken pipe error, is there a way to run a check command at that very moment?
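One way to approach the "when" question is to run the checks at the moment a client call fails. The minimal sketch below wraps a single test step and, if it raises a socket error or a timeout, calls a diagnostic hook immediately before re-raising. `run_step_with_check` and `on_failure` are hypothetical names, not Marionette APIs. In Python 3, `socket.error` is an alias of `OSError` (so a broken pipe, `BrokenPipeError`, is caught) and `socket.timeout` is a subclass of it.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")


def run_step_with_check(step: Callable[[], T],
                        on_failure: Callable[[BaseException], None]) -> T:
    """Run one test step; on a socket error or timeout, run the diagnostic
    hook right away (e.g. an adb-based b2g check), then re-raise the error
    so the test still fails as before."""
    try:
        return step()
    except (socket.error, socket.timeout) as exc:  # includes BrokenPipeError
        on_failure(exc)
        raise
```

Re-raising keeps the original failure visible to the test runner; the hook only adds diagnostics. Because `socket.error` is `OSError`, this also catches unrelated OS errors raised inside `step`, so a real harness might narrow the exception types.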
As an update, we now have some detection for this thanks to Bug 959520 landing. If a Marionette test result shows that it fails due to a socket error or a timeout during a test, then we check whether the b2g process is still running, and if it isn't, we let the user know by printing 'b2g process has died'. If b2g is still running, we look at the logcat and get the 5 most recent errors/exceptions. We then check whether b2g's process ID has changed. If it has, b2g crashed and was restarted, so we print that out as well.
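The decision logic described above can be sketched as pure functions, which keeps it testable without a device. This is not the code that landed in Bug 959520; it is a minimal sketch assuming the caller has already collected the b2g PID before the test (`old_pid`), the PID after the failure (`new_pid`, `None` if b2g is not running) and the text of `adb logcat -d`. Treating lines that start with `E/` (logcat's "brief" format) or mention `Exception` as errors is an assumption, as is the wording of every message except 'b2g process has died'. The names `recent_errors` and `diagnose` are hypothetical.

```python
from typing import List, Optional


def recent_errors(logcat_text: str, limit: int = 5) -> List[str]:
    """Return the `limit` most recent error/exception lines, oldest first.

    Assumption: logcat "brief" format, where error-priority lines start
    with "E/"; any line mentioning "Exception" is also included.
    """
    errors = [line for line in logcat_text.splitlines()
              if line.startswith("E/") or "Exception" in line]
    return errors[-limit:] if limit > 0 else []


def diagnose(old_pid: Optional[int], new_pid: Optional[int],
             logcat_text: str) -> List[str]:
    """Build the messages to print after a socket error or timeout."""
    if new_pid is None:
        # b2g is not running at all.
        return ["b2g process has died"]
    # b2g is running: report the most recent errors from the log.
    messages = recent_errors(logcat_text)
    if old_pid is not None and new_pid != old_pid:
        # A different PID means b2g crashed and was restarted.
        messages.append(
            f"b2g process id changed from {old_pid} to {new_pid}: "
            "b2g crashed and was restarted"
        )
    return messages
```

Usage: `for msg in diagnose(old_pid, new_pid, logcat_text): print(msg)`, where the two PIDs could come from an adb `ps` lookup taken before the test and after the failure.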

This is currently in mozilla-central but has not been uplifted to 1.3. Is this for testing against 1.3?
Hi mdas, if it isn't too hard (although I know it's always tough), could you uplift it to 1.3?

Also, I'm not sure we can run a new marionette against an older b2g version, since I have run into mismatches between the marionette client and b2g multiple times...

I am planning to use v1.3 for a while to stabilize the framework before moving on.
Sure, and sorry we've neglected the uplifting work lately. I'll start uplifting now.
No problem at all and thanks so much for helping on this.
I'm going through all the patches and have run into one issue. A few of the changes required downstream changes to gaia-ui-tests, which were applied only on master. If I uplift these patches to v1.3 in marionette, I'll also have to uplift the related gaia-ui-tests changes to their v1.3 branch, which will take quite a bit of time for little benefit. They're all minor changes to the structure of the data we send back (for example: https://bugzilla.mozilla.org/show_bug.cgi?id=964367)

Due to this, I'll apply only the patches that don't affect gaia-ui-tests, and uplift those to v1.3. This will let you have the useful changes as soon as possible and won't depend on any changes to gaia's v1.3 branch. If you feel the need to have any of the patches I'm not applying, I can uplift them to both gecko and gaia, but it will take much longer to do.
As an update, I'm still working on this. I'll check whether my latest patch set works on try and then land it in v1.3.
Blocks: MTBF-meta
Bug 986063 has been resolved, so you should now have access to the updated marionette client code in the v1.3 branch.
(In reply to Walter Chen[:ypwalter][:wachen] from comment #4)
> Hi, mdas, if it is not hard (although I know it's always tough to do so),
> can you uplift it to 1.3? 
> 
> And, I don't know if we can run new marionette for old version b2g since I
> failed on mismatching of marionette client and b2g multiple times....
> 
> I am planning to use v1.3 for awhile to stablize the framework before moving
> on.

By the way, are you using the client that's in the tree, or do you pull down the client from PyPI? If you pull it from PyPI, I'll have to release a special version for this to PyPI.
Has the new version of marionette helped with this problem?
Flags: needinfo?(wachen)
Btw, as part of Bug 991685, a new marionette-client was released, in case you don't use the in-tree version.
socket.error is now caught, and the client reconnects fine.
Great, marking as resolved then, thanks!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Clearing ni? for Walter
Flags: needinfo?(wachen)
Blocks: MTBF-Marionette
No longer blocks: MTBF-meta
Product: Testing → Remote Protocol