Closed Bug 750982 Opened 13 years ago Closed 12 years ago

Intermittent "Cleanup Device failed" with nothing but a sudden "program finished with exit code 1"

Categories

(Release Engineering :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=11365038&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=11373963&tree=Fx-Team recv'ing... response: /data/data/com.mozilla.SUTAgentAndroid/files/tests $> program finished with exit code 1 elapsedTime=91.100019
I have the theory that we have to add stdout.flush() in certain spots before a point where we can fail. bear mentioned something about that it was some sort of interaction between python and buildbot.
Callek, can you please look into making this failure mode a little more informative? I almost feel that the right approach is to add a print statement before any setflag() which we set before returning a code and exiting. I was also hoping to turn this "red" (RETCODE_ERROR==1) job into "purple" (4) or "blue" (5) (whatever it actually re-triggers a job). https://tbpl.mozilla.org/?rev=83ff77ce8d6c&tree=Mozilla-Inbound&jobname=Android%20XUL%20Tegra%20250%20mozilla-inbound%20opt%20test%20mochitest-7 (In reply to Phil Ringnalda (:philor) from comment #0) > https://tbpl.mozilla.org/php/getParsedLog.php?id=11365038&tree=Mozilla- > Inbound > https://tbpl.mozilla.org/php/getParsedLog.php?id=11373963&tree=Fx-Team > > recv'ing... > response: /data/data/com.mozilla.SUTAgentAndroid/files/tests > $> > program finished with exit code 1 > elapsedTime=91.100019 These first two finish exactly the same way. After checking that the processes have uninstalled it goes into trying to delete the devRoot but it fails silently around there. Perhaps all we need is to print what flags we are going to set before returning any codes. http://hg.mozilla.org/build/tools/file/default/sut_tools/cleanup.py#l54 (In reply to Phil Ringnalda (:philor) from comment #1) > https://tbpl.mozilla.org/php/getParsedLog.php?id=11342315&tree=Mozilla-Aurora On this one, we start deleting devRoot but suddenly we can't delete anymore and fail Deleting file(s) from /mnt/sdcard/tests/profile/extensions/staged/workerbootstrap-test@mozilla.org Deleted worker.js Deleted install.rdf Deleted bootstrap.js Unable to delete directory /mnt/sdcard/tests/profile/extensions/staged/workerbootstrap-test@mozilla.org Unable recv'ing... response: to delete directory /mnt/sdcard/tests/profile/extensions/staged Unable to delete directory /mnt/sdcard/tests/profile/extensions Unable to delete userChrome.css Unable to delete user.js Unable to delete tests.manifest Unable to delete tests.jar Unable to delete permissions.sqlite Unable to delete directory /mnt/sdcard/tests/profile Deleting file(s) from /mnt/sdcard/tests/logs Unable to delete mochitest.log Unable to delete directory /mnt/sdcard/tests/logs Unable to delete robocop.apk Unable to delete fennec-15.0a1.en-US.android-arm.apk Unable to delete directory /mnt/sdcard/tests $> removeDir() returned and fail because we check to see if it is gone: http://hg.mozilla.org/build/tools/file/default/sut_tools/cleanup.py#l66 I assume that you pointed out this log because it ends dramatically with signal 15. 05-01 20:22:32.829 W/Zygote ( 938): Preloaded drawable resource #0x1080process killed by signal 15 program finished with exit code -1 elapsedTime=192.726851 (In reply to Phil Ringnalda (:philor) from comment #2) > https://tbpl.mozilla.org/php/getParsedLog.php?id=11377001&tree=Firefox This last log is the same thing as the previous one.
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #4) > Callek, can you please look into making this failure mode a little more > informative? I almost feel that the right approach is to add a print > statement before any setflag() which we set before returning a code and > exiting. Yea that is the overall right approach -- the problem actually stems from another issue though. setFlag should be outputting for us more details here: http://hg.mozilla.org/build/tools/file/fd310f3edd12/sut_tools/sut_lib.py#l376 log.info unfortunately is not actually doing output. I have bug 749863 (awaiting review) that should fix this, by making the log.info (and log.debug) calls actually output when called. Marking dep so that if for some reason this doesn't get fixed by that we can notice! > I was also hoping to turn this "red" (RETCODE_ERROR==1) job into "purple" > (4) or "blue" (5) (whatever it actually re-triggers a job). In the cases where cleanup happens AFTER the regular job portions this would be a bad idea, as it would re-run the job for failing to do post-cleanup. If it retries on early-cleanup failures then it is a good thing. Also part of the issue I forsee though is that clientproxy is racy with error.flg and will kill buildbot once it notices it, which might be before we get any indication to actually RETRY, this is followup fodder though, once we see how much of a problem it is when I get my current patch-queue deployed.
Depends on: 749863
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.