Closed Bug 790595 Opened 7 years ago Closed 7 years ago

devicemanagerSUT.py error messages should be TBPL-parser-friendly (eg "unable to connect socket ... reconnecting socket")

Categories

(Testing :: Mozbase, defect)

ARM
Android
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sheriff-want])

Attachments

(3 files, 1 obsolete file)

For bug 681861 failures, we just get things like :
{
========= Started Install App on Device failed (results: 2, elapsed: 33 mins, 33 secs) (at 2012-09-12 03:57:36.160428) =========
python /builds/sut_tools/installApp.py 10.250.51.178 build/fennec-18.0a1.en-US.android-arm.apk org.mozilla.fennec
 in dir /builds/tegra-338/test/. (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', '/builds/sut_tools/installApp.py', '10.250.51.178', u'build/fennec-18.0a1.en-US.android-arm.apk', 'org.mozilla.fennec']
 environment:
  HOME=/home/cltbld
  PATH=/tools/buildbot-0.8.4-pre-moz2/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin
  PWD=/builds/tegra-338/test
  SUT_IP=10.250.51.178
  SUT_NAME=tegra-338
 using PTY: False
copying build/fennec/application.ini to build/talos/remoteapp.ini
connecting to: 10.250.51.178
reconnecting socket
10.250.48.24 50338
results: {'process': [['10007', '1195', 'com.android.inputmethod.latin'], ['1001', '1205', 'com.android.phone'], ['1000', '1020', 'system'], ['10029', '1346', 'com.android.deskclock'], ['10010', '1377', 'com.android.providers.calendar'], ['10031', '1470', 'com.mozilla.SUTAgentAndroid'], ['10034', '1493', 'org.mozilla.ffxcp'], ['10033', '1484', 'org.mozilla.fencp'], ['10018', '1217', 'com.android.launcher'], ['10013', '1443', 'com.cooliris.media'], ['10009', '1422', 'com.android.quicksearchbox'], ['10002', '1435', 'com.android.music'], ['10032', '1412', 'com.mozilla.watcher'], ['10004', '1309', 'android.process.media'], ['10006', '1393', 'com.android.mms'], ['10014', '1363', 'com.android.email'], ['10017', '1355', 'com.android.bluetooth'], ['1000', '1223', 'com.android.settings'], ['10015', '1257', 'android.process.acore']]}
results: {'memory': ['PA:830611456, FREE: 756637696']}
results: {'uptime': ['0 days 0 hours 11 minutes 9 seconds 516 ms']}
results: {'screen': ['X:1600 Y:1200']}
results: {'os': ['harmony-eng 2.2 FRF91 20110202.102810 test-keys']}
results: {'screen': ['X:1600 Y:1200']}
INFO: we have a current resolution of 1600, 1200
INFO: adjusting screen resolution to 1024, 768 and rebooting
calling reboot
DEBUG: gCallbackData is:  on port: 50338
Creating server with 10.250.48.24:50338
Calling disconnect on callback server
. . . . . . . . . . Shutting down server now
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
...
<snip>
...
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
No handlers could be found for logger "root"
program finished with exit code 1
elapsedTime=2013.077052
========= Finished Install App on Device failed (results: 2, elapsed: 33 mins, 33 secs) (at 2012-09-12 04:31:09.255198) =========
}

More useful would be:
{
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
Automation Error: "Exceeded devicemanagerSUT.py timeout"
program finished with exit code 1
}

...or any other string we can add to:
http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/b3f9daa143ce/php/inc/GeneralErrorFilter.php#l24

Also, is this known?
> No handlers could be found for logger "root"
Assignee: nobody → bmo
Status: NEW → ASSIGNED
BTW, the m-c version of devicemanagerSUT at testing/mozbase/mozdevice/mozdevice/devicemanagerSUT.py has some additional diagnostics. In particular, I know it prints the socket error on a connect failure:

reconnecting socket
unable to connect socket: [Errno 111] Connection refused
reconnecting socket
unable to connect socket: [Errno 111] Connection refused
reconnecting socket
unable to connect socket: [Errno 111] Connection refused
reconnecting socket
unable to connect socket: [Errno 111] Connection refused
reconnecting socket
unable to connect socket: [Errno 111] Connection refused
Traceback (most recent call last):
  File "_tests/testing/mochitest/runtestsremote.py", line 494, in <module>
    main()
  File "_tests/testing/mochitest/runtestsremote.py", line 386, in main
    dm = devicemanagerSUT.DeviceManagerSUT(options.deviceIP, options.devicePort)
  File "/home/mozdev/src/objdir-native-droid/_tests/testing/mochitest/devicemanagerSUT.py", line 56, in __init__
    raise BaseException("Failed to connect to SUT Agent and retrieve the device root.")
BaseException: Failed to connect to SUT Agent and retrieve the device root.
make: *** [mochitest-robotium] Error 1
Oh, so the deployed version isn't what is on m-c?
we use an older version of devicemanager in the sut_tools (what we verify, cleanup, install, etc...).  So there is a really good chance most of this pain will be solved when we can get the sut_tools using the latest devicemanager (in bug 781341).
Depends on: 781341
Blocks: 681861
No longer depends on: 681861
Bug 793678 will mitigate this for sut_tools, but I'd like to handle this properly in devicemanagerSUT.py too; so leaving this bug open.
Attached patch Patch v1 (obsolete) — Splinter Review
Adds a number of "Remote Device Error:" / "Automation Error:" prefixes, some were present previously but removed inadvertently by bug 792262.

(Apologies if the patch is formatted weirdly, not really used to git diff).
Attachment #664065 - Flags: review?(jmaher)
Comment on attachment 664065 [details] [diff] [review]
Patch v1

thanks!
Attachment #664065 - Flags: review?(jmaher) → review+
https://github.com/mozilla/mozbase/commit/52a1bb1141ca095bb6c1bd610698f1b2e6ae51f8

(First commit managed to change all the line endings to native; yey for github's broken windows client).

Callek, do you know when the next deployment might be? :-)
(In reply to Ed Morley [:edmorley UTC+1] from comment #7)
> Callek, do you know when the next deployment might be? :-)

By Callek I meant jhammel and deployment I meant merge to m-c. (Confusing myself with bug 793678).

/me goes to get some more caffeine.
Attached patch Patch v1Splinter Review
(Attaching patch that doesn't break bugzilla's diff; for posterity)
Attachment #664065 - Attachment is obsolete: true
Summary: devicemanagerSUT.py should output a more useful final error than repeated "unable to connect socket \n reconnecting socket" and then "program finished with exit code 1" → devicemanagerSUT.py error messages should be TBPL-parser-friendly (eg "unable to connect socket ... reconnecting socket")
Attached patch FollowupSplinter Review
Followup to remove prefix for an expected-and-not-really-error case.

Reworded existing string to make not-error state clearer.

(Prevents retries due a la https://tbpl.mozilla.org/php/getParsedLog.php?id=15483618&tree=Try#error0)
Attachment #664164 - Flags: review?(jmaher)
Comment on attachment 664164 [details] [diff] [review]
Followup

Review of attachment 664164 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #664164 - Flags: review?(jmaher) → review+
Blocks: 793746
Won't be on m-c until the next mozbase -> m-c merge & won't be in sut_tools until bug 781341 is complete, but nothing more to do here, so closing this out.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 793739
Hmmm...
https://tbpl.mozilla.org/php/getParsedLog.php?id=15491368&tree=Try

Another couple that were not really errors but were phrased as such and so got prefixed.

"Automation Error: failed to validate file when downloading /mnt/sdcard/tests/logs/mochitest.log!"

and

"Remote Device Error: error returned from pull: Automation Error: could not get all file data"

Not sure what strings you would prefer?

"Automation Info: failed to validate file when downloading /mnt/sdcard/tests/logs/mochitest.log!"

and 

"Automation Info: error returned from pull: could not get all file data"

respectively, perhaps?
Addresses a few more "are really warnings rather than errors" cases; s/error/unsuccessful/, removes exclamation mark to downplay implied severity etc
Attachment #664398 - Flags: review?(jmaher)
Comment on attachment 664398 [details] [diff] [review]
Followup for comment 14

Review of attachment 664398 [details] [diff] [review]:
-----------------------------------------------------------------

Now I am asking to cleanup existing error messages :)

::: mozdevice/mozdevice/devicemanagerSUT.py
@@ +711,5 @@
>                      data = self._sock.recv(to_recv)
>                      timer = 0
>                  timer += select_timeout
>                  if timer > timeout:
> +                    err('timeout in uread while retrieving file')

can we make this have a 'DeviceManager' in the string?

@@ +769,5 @@
>              return None
>          try:
>              filesize = int(filesizestr)
>          except ValueError:
> +            err('invalid file size in returned metadata')

these two messages look scary, we might want to keep these as Automation Error, at the very least they should have DeviceManager.

@@ +790,4 @@
>          if buf == None:
>              return None
>          if buf[-len(prompt):] != prompt:
> +            err('no prompt found after file data--DeviceManager may be out of sync with agent')

this might be a valid msg to kill a job, but not retry.  Can we make these have DeviceManager in the front of them.
Attachment #664398 - Flags: review?(jmaher) → review-
Comment on attachment 664398 [details] [diff] [review]
Followup for comment 14

my bad, my previous comments were overlooking the fact that we were calling an internal err() function which added DeviceManager to the string.
Attachment #664398 - Flags: review- → review+
Component: General → Mozbase
No longer depends on: 781341
Blocks: 798799
You need to log in before you can comment on or make changes to this bug.