Closed Bug 790595 Opened 12 years ago Closed 12 years ago

devicemanagerSUT.py error messages should be TBPL-parser-friendly (eg "unable to connect socket ... reconnecting socket")

Categories

(Testing :: Mozbase, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sheriff-want])

Attachments

(3 files, 1 obsolete file)

For bug 681861 failures, we just get things like : { ========= Started Install App on Device failed (results: 2, elapsed: 33 mins, 33 secs) (at 2012-09-12 03:57:36.160428) ========= python /builds/sut_tools/installApp.py 10.250.51.178 build/fennec-18.0a1.en-US.android-arm.apk org.mozilla.fennec in dir /builds/tegra-338/test/. (timeout 1200 secs) watching logfiles {} argv: ['python', '/builds/sut_tools/installApp.py', '10.250.51.178', u'build/fennec-18.0a1.en-US.android-arm.apk', 'org.mozilla.fennec'] environment: HOME=/home/cltbld PATH=/tools/buildbot-0.8.4-pre-moz2/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin PWD=/builds/tegra-338/test SUT_IP=10.250.51.178 SUT_NAME=tegra-338 using PTY: False copying build/fennec/application.ini to build/talos/remoteapp.ini connecting to: 10.250.51.178 reconnecting socket 10.250.48.24 50338 results: {'process': [['10007', '1195', 'com.android.inputmethod.latin'], ['1001', '1205', 'com.android.phone'], ['1000', '1020', 'system'], ['10029', '1346', 'com.android.deskclock'], ['10010', '1377', 'com.android.providers.calendar'], ['10031', '1470', 'com.mozilla.SUTAgentAndroid'], ['10034', '1493', 'org.mozilla.ffxcp'], ['10033', '1484', 'org.mozilla.fencp'], ['10018', '1217', 'com.android.launcher'], ['10013', '1443', 'com.cooliris.media'], ['10009', '1422', 'com.android.quicksearchbox'], ['10002', '1435', 'com.android.music'], ['10032', '1412', 'com.mozilla.watcher'], ['10004', '1309', 'android.process.media'], ['10006', '1393', 'com.android.mms'], ['10014', '1363', 'com.android.email'], ['10017', '1355', 'com.android.bluetooth'], ['1000', '1223', 'com.android.settings'], ['10015', '1257', 'android.process.acore']]} results: {'memory': ['PA:830611456, FREE: 756637696']} results: {'uptime': ['0 days 0 hours 11 minutes 9 seconds 516 ms']} results: {'screen': ['X:1600 Y:1200']} results: {'os': ['harmony-eng 2.2 FRF91 20110202.102810 test-keys']} results: {'screen': ['X:1600 Y:1200']} INFO: we have a current resolution of 1600, 1200 INFO: adjusting screen resolution to 1024, 768 and rebooting calling reboot DEBUG: gCallbackData is: on port: 50338 Creating server with 10.250.48.24:50338 Calling disconnect on callback server . . . . . . . . . . Shutting down server now reconnecting socket unable to connect socket reconnecting socket unable to connect socket ... <snip> ... reconnecting socket unable to connect socket reconnecting socket unable to connect socket reconnecting socket unable to connect socket No handlers could be found for logger "root" program finished with exit code 1 elapsedTime=2013.077052 ========= Finished Install App on Device failed (results: 2, elapsed: 33 mins, 33 secs) (at 2012-09-12 04:31:09.255198) ========= } More useful would be: { reconnecting socket unable to connect socket reconnecting socket unable to connect socket reconnecting socket unable to connect socket Automation Error: "Exceeded devicemanagerSUT.py timeout" program finished with exit code 1 } ...or any other string we can add to: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/b3f9daa143ce/php/inc/GeneralErrorFilter.php#l24 Also, is this known? > No handlers could be found for logger "root"
Assignee: nobody → bmo
Status: NEW → ASSIGNED
BTW, the m-c version of devicemanagerSUT at testing/mozbase/mozdevice/mozdevice/devicemanagerSUT.py has some additional diagnostics. In particular, I know it prints the socket error on a connect failure: reconnecting socket unable to connect socket: [Errno 111] Connection refused reconnecting socket unable to connect socket: [Errno 111] Connection refused reconnecting socket unable to connect socket: [Errno 111] Connection refused reconnecting socket unable to connect socket: [Errno 111] Connection refused reconnecting socket unable to connect socket: [Errno 111] Connection refused Traceback (most recent call last): File "_tests/testing/mochitest/runtestsremote.py", line 494, in <module> main() File "_tests/testing/mochitest/runtestsremote.py", line 386, in main dm = devicemanagerSUT.DeviceManagerSUT(options.deviceIP, options.devicePort) File "/home/mozdev/src/objdir-native-droid/_tests/testing/mochitest/devicemanagerSUT.py", line 56, in __init__ raise BaseException("Failed to connect to SUT Agent and retrieve the device root.") BaseException: Failed to connect to SUT Agent and retrieve the device root. make: *** [mochitest-robotium] Error 1
Oh, so the deployed version isn't what is on m-c?
we use an older version of devicemanager in the sut_tools (what we verify, cleanup, install, etc...). So there is a really good chance most of this pain will be solved when we can get the sut_tools using the latest devicemanager (in bug 781341).
Depends on: 781341
Blocks: 681861
No longer depends on: 681861
Bug 793678 will mitigate this for sut_tools, but I'd like to handle this properly in devicemanagerSUT.py too; so leaving this bug open.
Attached patch Patch v1 (obsolete) — Splinter Review
Adds a number of "Remote Device Error:" / "Automation Error:" prefixes, some were present previously but removed inadvertently by bug 792262. (Apologies if the patch is formatted weirdly, not really used to git diff).
Attachment #664065 - Flags: review?(jmaher)
Comment on attachment 664065 [details] [diff] [review] Patch v1 thanks!
Attachment #664065 - Flags: review?(jmaher) → review+
https://github.com/mozilla/mozbase/commit/52a1bb1141ca095bb6c1bd610698f1b2e6ae51f8 (First commit managed to change all the line endings to native; yey for github's broken windows client). Callek, do you know when the next deployment might be? :-)
(In reply to Ed Morley [:edmorley UTC+1] from comment #7) > Callek, do you know when the next deployment might be? :-) By Callek I meant jhammel and deployment I meant merge to m-c. (Confusing myself with bug 793678). /me goes to get some more caffeine.
Attached patch Patch v1Splinter Review
(Attaching patch that doesn't break bugzilla's diff; for posterity)
Attachment #664065 - Attachment is obsolete: true
Summary: devicemanagerSUT.py should output a more useful final error than repeated "unable to connect socket \n reconnecting socket" and then "program finished with exit code 1" → devicemanagerSUT.py error messages should be TBPL-parser-friendly (eg "unable to connect socket ... reconnecting socket")
Attached patch FollowupSplinter Review
Followup to remove prefix for an expected-and-not-really-error case. Reworded existing string to make not-error state clearer. (Prevents retries due a la https://tbpl.mozilla.org/php/getParsedLog.php?id=15483618&tree=Try#error0)
Attachment #664164 - Flags: review?(jmaher)
Comment on attachment 664164 [details] [diff] [review] Followup Review of attachment 664164 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #664164 - Flags: review?(jmaher) → review+
Blocks: 793746
Won't be on m-c until the next mozbase -> m-c merge & won't be in sut_tools until bug 781341 is complete, but nothing more to do here, so closing this out.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 793739
Hmmm... https://tbpl.mozilla.org/php/getParsedLog.php?id=15491368&tree=Try Another couple that were not really errors but were phrased as such and so got prefixed. "Automation Error: failed to validate file when downloading /mnt/sdcard/tests/logs/mochitest.log!" and "Remote Device Error: error returned from pull: Automation Error: could not get all file data" Not sure what strings you would prefer? "Automation Info: failed to validate file when downloading /mnt/sdcard/tests/logs/mochitest.log!" and "Automation Info: error returned from pull: could not get all file data" respectively, perhaps?
Addresses a few more "are really warnings rather than errors" cases; s/error/unsuccessful/, removes exclamation mark to downplay implied severity etc
Attachment #664398 - Flags: review?(jmaher)
Comment on attachment 664398 [details] [diff] [review] Followup for comment 14 Review of attachment 664398 [details] [diff] [review]: ----------------------------------------------------------------- Now I am asking to cleanup existing error messages :) ::: mozdevice/mozdevice/devicemanagerSUT.py @@ +711,5 @@ > data = self._sock.recv(to_recv) > timer = 0 > timer += select_timeout > if timer > timeout: > + err('timeout in uread while retrieving file') can we make this have a 'DeviceManager' in the string? @@ +769,5 @@ > return None > try: > filesize = int(filesizestr) > except ValueError: > + err('invalid file size in returned metadata') these two messages look scary, we might want to keep these as Automation Error, at the very least they should have DeviceManager. @@ +790,4 @@ > if buf == None: > return None > if buf[-len(prompt):] != prompt: > + err('no prompt found after file data--DeviceManager may be out of sync with agent') this might be a valid msg to kill a job, but not retry. Can we make these have DeviceManager in the front of them.
Attachment #664398 - Flags: review?(jmaher) → review-
Comment on attachment 664398 [details] [diff] [review] Followup for comment 14 my bad, my previous comments were overlooking the fact that we were calling an internal err() function which added DeviceManager to the string.
Attachment #664398 - Flags: review- → review+
Component: General → Mozbase
No longer depends on: 781341
Blocks: 798799
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: