Closed Bug 799334 Opened 12 years ago Closed 10 years ago

Intermittent Android DMError: DeviceManager: pull unsuccessful: could not get all file data

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Assigned: wlach)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=15882488&tree=Mozilla-Inbound
Android no-ionmonkey Tegra 250 mozilla-inbound opt test mochitest-8 on 2012-10-06 11:55:18 PDT for push 7fc54528af64
slave: tegra-187
INFO | runtests.py | Running tests: end.
DeviceManager: pull unsuccessful: could not get all file data
TEST-UNEXPECTED-FAIL | DeviceManager: pull unsuccessful: could not get all file data | Exception caught while running tests.
program finished with exit code 1

https://tbpl.mozilla.org/php/getParsedLog.php?id=15923953&tree=Firefox
Android no-ionmonkey Tegra 250 mozilla-central opt test mochitest-2 on 2012-10-08 07:35:37 PDT for push e7f2e2c944b7
slave: tegra-313
INFO | runtests.py | Running tests: end.
DeviceManager: pull unsuccessful: could not get all file data
TEST-UNEXPECTED-FAIL | DeviceManager: pull unsuccessful: could not get all file data | Exception caught while running tests.
program finished with exit code 1
This is a side effect of the exceptions in mozdevice. It is hard to tell whether we would have hit other errors in this situation before the exceptions were added to mozdevice; adding them most likely helped surface the real issue. My gut tells me we should find a better method for pulling files back, one that accounts for errors and lets us go forward if we receive enough of the data.
As Joel hinted, the problem is that right now we don't retry properly if pulling the file fails. So if the connection gets reset while the test is running (this is known to happen), we'll just fail: http://mxr.mozilla.org/mozilla-central/source/build/mobile/remoteautomation.py#175 The proper fix for this is in devicemanagerSUT. My preferred solution would be to remove the special case we have for pulling files and use (roughly) the same error-handling logic that we have for other commands.
wlach, I like your proposed solution, except we might hit a corner case on the file pull, as we could lose data somewhere along the line (agent doesn't read it all, sdcard issues, etc.). If we acknowledge the failure here, we could retry up to 5 times (including the reconnection, if that is the root cause).
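The retry idea discussed above (acknowledge the failed pull, reconnect, and try again up to 5 times) could look roughly like the sketch below. This is not the mozdevice code: `PullError`, `pull_with_retries`, `pull_once`, and `reconnect` are hypothetical names invented for illustration, and the real DeviceManagerSUT error handling may be structured differently.

```python
import time


class PullError(Exception):
    """Hypothetical stand-in for the DMError raised on an incomplete pull."""


def pull_with_retries(pull_once, reconnect, max_attempts=5, delay=0.0):
    """Call pull_once() until it succeeds or max_attempts is used up.

    pull_once and reconnect are hypothetical callables supplied by the
    caller: pull_once() returns the file data or raises PullError, and
    reconnect() re-establishes the connection to the device agent.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return pull_once()
        except PullError as err:
            last_error = err
            # Only reconnect if another attempt will follow.
            if attempt < max_attempts:
                reconnect()
                time.sleep(delay)
    raise PullError(
        "pull failed after %d attempts: %s" % (max_attempts, last_error)
    )
```

A caller would wrap the existing single-shot pull in `pull_once`, so a connection reset during a test run costs one reconnect instead of failing the whole job.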
Summary: Intermittent TEST-UNEXPECTED-FAIL | DeviceManager: pull unsuccessful: could not get all file data | Exception caught while running tests. → Intermittent TEST-UNEXPECTED-FAIL | DeviceManager: pull unsuccessful: could not get all file data | Automation Error: Exception caught while running tests.
Whiteboard: [orange]
(Ready for bug 816581)
Summary: Intermittent TEST-UNEXPECTED-FAIL | DeviceManager: pull unsuccessful: could not get all file data | Automation Error: Exception caught while running tests. → Intermittent Android DMError: DeviceManager: pull unsuccessful: could not get all file data | Automation Error: Exception caught while running tests.
I noticed that this was happening quite regularly with eideticker, and think I have isolated the problem. Gonna post a patch.
Assignee: nobody → wlachance
Attached patch: Make pullFile more resilient (obsolete)
I'm not 100% sure about the test -- it basically detects the brokenness that existed before, but I'm not sure how useful that is. I'm pretty sure the monkeypatching I'm doing is pretty resilient against timeout issues though, so there's that.
Attachment #783392 - Flags: review?(mcote)
I am getting lots of these errors after running the tests with this patch:

ERROR: test_userserial (droidsut_launch.LaunchTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mcote/projects/mozbase/src/mozbase/mozdevice/tests/droidsut_launch.py", line 29, in test_userserial
    "OK\nreturn code [0]")])
  File "/Users/mcote/projects/mozbase/src/mozbase/mozdevice/tests/sut.py", line 29, in __init__
    self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 190, in __init__
    setattr(self, method, getattr(_sock, method))
AttributeError: '_socketobject' object attribute 'recv' is read-only
As discussed on irc, let's not do the test as it's fragile and doesn't even work with many versions of python. It would also help if I actually included the part of the patch that actually fixed the problem...
Attachment #783988 - Flags: review?(mcote)
Attachment #783392 - Attachment is obsolete: true
Attachment #783392 - Flags: review?(mcote)
Comment on attachment 783988
Make pullFile more resilient take 2

Review of attachment 783988:
-----------------------------------------------------------------

Looks good! Just one comment. Also I imagine you will put this on try just to be sure?

::: mozdevice/mozdevice/devicemanagerSUT.py
@@ +570,2 @@
> data = self._sock.recv(to_recv)
> + if data == "":

I believe this should be "if not data" because recv() will return None if the connection was closed.
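For reference on the check under review: Python's `socket.recv()` returns an empty string (empty `bytes` in Python 3), not None, once the peer has closed the connection, so `if not data` is the safer test because it catches either value. Below is a minimal Python 3 sketch of a receive loop that treats an empty read as a dropped connection. `recv_exactly` is a hypothetical helper, not the function in devicemanagerSUT.py.

```python
import socket


def recv_exactly(sock, nbytes, bufsize=4096):
    """Read exactly nbytes from sock, or raise if the peer closes early.

    Hypothetical helper for illustration only.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        data = sock.recv(min(remaining, bufsize))
        # recv() returns an empty bytes object when the connection has
        # been closed; "not data" is true for b"" and would also be true
        # for None.
        if not data:
            raise ConnectionError(
                "connection closed with %d of %d bytes missing"
                % (remaining, nbytes)
            )
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

Without the empty-read check, a loop like this would spin forever on a closed socket, since every subsequent `recv()` call returns immediately with no data.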
Attachment #783988 - Flags: review?(mcote) → review+
Good point re: try. I'll do one now: https://tbpl.mozilla.org/?tree=Try&rev=e5eccc7394f6 (also includes some other changes which were made earlier, as I plan to do a new release of mozdevice and mirror to the tree once this lands)
Try run looks good except for the usual noise. Going to push to mozbase, then I'll sync it across to m-c.
Depends on: 900629
So my proposed fix was merged in with bug 900629; unfortunately, it looks like it did not solve this problem. :( I think it's back to the original idea: put in retry logic for pullFile.
Tweaking summary to avoid false positives.
Summary: Intermittent Android DMError: DeviceManager: pull unsuccessful: could not get all file data | Automation Error: Exception caught while running tests. → Intermittent Android DMError: DeviceManager: pull unsuccessful: could not get all file data
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME