Closed
Bug 750982
Opened 13 years ago
Closed 12 years ago
Intermittent "Cleanup Device failed" with nothing but a sudden "program finished with exit code 1"
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: philor, Unassigned)
References
Details
(Keywords: intermittent-failure)
https://tbpl.mozilla.org/php/getParsedLog.php?id=11365038&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=11373963&tree=Fx-Team
recv'ing...
response: /data/data/com.mozilla.SUTAgentAndroid/files/tests
$>
program finished with exit code 1
elapsedTime=91.100019
Comment 1•13 years ago (Reporter)
Comment 2•13 years ago (Reporter)
Comment 3•13 years ago
My theory is that we have to add stdout.flush() in certain spots, before each point where we can fail.
bear mentioned that it was some sort of interaction between Python and buildbot.
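A minimal sketch of that theory, assuming the missing output is buffered text that never reaches buildbot before the process goes away. `fail_with_message` is a hypothetical helper for illustration and is not a function in sut_tools:

```python
import sys


def fail_with_message(message, exit_code=1):
    """Print a failure reason, flush it, and return the exit code to use.

    Hypothetical helper for illustration only. When stdout is a pipe
    (as it is under buildbot), Python block-buffers it, so text written
    shortly before the process is killed can be lost unless it is flushed.
    """
    sys.stdout.write("FAILURE: %s\n" % message)
    sys.stdout.flush()
    return exit_code
```

A script would end a failing path with something like `sys.exit(fail_with_message("devRoot still exists"))`, so the reason is already in the log by the time the exit code is reported.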
Comment 4•13 years ago
Callek, can you please look into making this failure mode a little more informative? I almost feel that the right approach is to add a print statement before any setflag() call that we make before returning a code and exiting.
I was also hoping to turn this "red" (RETCODE_ERROR==1) job into "purple" (4) or "blue" (5), whichever one actually re-triggers the job.
https://tbpl.mozilla.org/?rev=83ff77ce8d6c&tree=Mozilla-Inbound&jobname=Android%20XUL%20Tegra%20250%20mozilla-inbound%20opt%20test%20mochitest-7
(In reply to Phil Ringnalda (:philor) from comment #0)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11365038&tree=Mozilla-Inbound
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11373963&tree=Fx-Team
>
> recv'ing...
> response: /data/data/com.mozilla.SUTAgentAndroid/files/tests
> $>
> program finished with exit code 1
> elapsedTime=91.100019
These first two finish in exactly the same way.
After checking that the processes have been uninstalled, cleanup moves on to deleting the devRoot, but it fails silently somewhere around there.
Perhaps all we need is to print which flags we are going to set before returning any codes.
http://hg.mozilla.org/build/tools/file/default/sut_tools/cleanup.py#l54
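One way to sketch that idea is shown here. `set_flag_verbose`, its `write_flag` parameter and the flag path used in the usage note are hypothetical stand-ins, not code from cleanup.py or sut_lib:

```python
import sys

RETCODE_ERROR = 1  # the "red" exit code mentioned in this bug


def set_flag_verbose(flag_path, reason, write_flag):
    """Announce which flag is about to be set, then set it.

    `write_flag` is whatever callable actually writes the flag file;
    it is passed in so this sketch does not depend on sut_lib.
    """
    sys.stdout.write("Setting flag %s: %s\n" % (flag_path, reason))
    sys.stdout.flush()
    write_flag(flag_path, reason)
    return RETCODE_ERROR
```

A caller could then write `sys.exit(set_flag_verbose("error.flg", "Unable to delete devRoot", some_flag_writer))`, which leaves a line in the log explaining the exit code even if the flag writer itself prints nothing.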
(In reply to Phil Ringnalda (:philor) from comment #1)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11342315&tree=Mozilla-Aurora
On this one, we start deleting devRoot, but suddenly we can't delete anything more and fail:
Deleting file(s) from /mnt/sdcard/tests/profile/extensions/staged/workerbootstrap-test@mozilla.org
Deleted worker.js
Deleted install.rdf
Deleted bootstrap.js
Unable to delete directory /mnt/sdcard/tests/profile/extensions/staged/workerbootstrap-test@mozilla.org
Unable
recv'ing...
response: to delete directory /mnt/sdcard/tests/profile/extensions/staged
Unable to delete directory /mnt/sdcard/tests/profile/extensions
Unable to delete userChrome.css
Unable to delete user.js
Unable to delete tests.manifest
Unable to delete tests.jar
Unable to delete permissions.sqlite
Unable to delete directory /mnt/sdcard/tests/profile
Deleting file(s) from /mnt/sdcard/tests/logs
Unable to delete mochitest.log
Unable to delete directory /mnt/sdcard/tests/logs
Unable to delete robocop.apk
Unable to delete fennec-15.0a1.en-US.android-arm.apk
Unable to delete directory /mnt/sdcard/tests
$>
removeDir() returned
and we fail because we check to see whether it is gone:
http://hg.mozilla.org/build/tools/file/default/sut_tools/cleanup.py#l66
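The remove-then-verify pattern described here can be sketched roughly as follows. This is an illustration under assumptions, not the code at the link: `remove_and_verify` and `FakeDevice` are made up, and the `removeDir`/`dirExists` method names are assumed for the device manager interface:

```python
def remove_and_verify(dm, dev_root):
    """Remove dev_root and report whether it is really gone.

    `dm` is any object with removeDir(path) and dirExists(path) methods;
    both names are assumptions about the device manager interface.
    Returns (ok, message) so the caller can print the reason before exiting.
    """
    status = dm.removeDir(dev_root)
    print("removeDir() returned [%s]" % status)
    if dm.dirExists(dev_root):
        return False, "devRoot %s still exists after removeDir()" % dev_root
    return True, "devRoot %s removed" % dev_root


class FakeDevice(object):
    """Stand-in device whose removeDir() can fail without raising."""

    def __init__(self, deletable):
        self.deletable = deletable
        self.present = True

    def removeDir(self, path):
        if self.deletable:
            self.present = False
        return None

    def dirExists(self, path):
        return self.present
```

Returning a message alongside the boolean means the failing branch always has something to print, which is the piece missing from the logs in this bug.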
I assume that you pointed out this log because it ends dramatically with signal 15.
05-01 20:22:32.829 W/Zygote ( 938): Preloaded drawable resource #0x1080process killed by signal 15
program finished with exit code -1
elapsedTime=192.726851
(In reply to Phil Ringnalda (:philor) from comment #2)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11377001&tree=Firefox
This last log is the same thing as the previous one.
Comment 5•13 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #4)
> Callek, can you please look into making this failure mode a little more
> informative? I almost feel that the right approach is to add a print
> statement before any setflag() which we set before returning a code and
> exiting.
Yea, that is the right overall approach, but the problem actually stems from another issue.
setFlag should already be outputting more details for us here:
http://hg.mozilla.org/build/tools/file/fd310f3edd12/sut_tools/sut_lib.py#l376
Unfortunately, log.info is not actually producing any output.
I have bug 749863 (awaiting review), which should fix this by making the log.info (and log.debug) calls actually produce output when called. Marking it as a dependency so that we notice if, for some reason, that doesn't fix this!
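For context, one common reason `log.info` calls print nothing in Python is that the logger's effective level defaults to WARNING and nothing has attached a handler. The sketch below shows that general behavior with the standard `logging` module. It is not the patch from bug 749863, and the logger name and `configure_logging` are made up for illustration:

```python
import io
import logging


def configure_logging(logger, stream, level=logging.INFO):
    """Attach a stream handler so info/debug records are actually emitted."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


log = logging.getLogger("sut_example")  # hypothetical logger name
out = io.StringIO()  # stands in for sys.stdout in this sketch

log.info("dropped: effective level is still WARNING")  # produces no output
configure_logging(log, out)
log.info("setting error flag")  # now reaches the stream
```

In a real script the stream would be `sys.stdout`, so the messages land in the buildbot log next to the rest of the output.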
> I was also hoping to turn this "red" (RETCODE_ERROR==1) job into "purple"
> (4) or "blue" (5) (whatever it actually re-triggers a job).
In cases where cleanup happens AFTER the regular job steps, this would be a bad idea, because it would re-run the job merely for failing post-cleanup. Retrying on early-cleanup failures, on the other hand, is a good thing.
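That distinction could be sketched as below. The function, the `phase` argument and the name `RETCODE_RETRY` are hypothetical; the numeric values are the ones quoted in this bug, and whether 5 actually re-triggers a job is not confirmed here:

```python
RETCODE_ERROR = 1  # "red", per this bug
RETCODE_RETRY = 5  # "blue" as quoted above; assumed to request a retry


def cleanup_exit_code(cleanup_ok, phase):
    """Pick an exit code for a cleanup step.

    `phase` is "pre" when cleanup runs before the tests and "post" when
    it runs after them. Only a pre-test failure asks for a retry, so a
    failed post-cleanup never re-runs a job that already produced results.
    """
    if phase not in ("pre", "post"):
        raise ValueError("unknown phase: %r" % (phase,))
    if cleanup_ok:
        return 0
    return RETCODE_RETRY if phase == "pre" else RETCODE_ERROR
```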
Part of the issue I foresee, though, is that clientproxy is racy with error.flg and will kill buildbot once it notices the flag, which might happen before we get any indication to actually RETRY. That is follow-up fodder, though, once we see how much of a problem it is after my current patch queue is deployed.
Depends on: 749863
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Updated•12 years ago (Assignee)
Keywords: intermittent-failure
Updated•12 years ago (Assignee)
Whiteboard: [orange]
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering