Closed
Bug 701755
Opened 13 years ago
Closed 12 years ago
be over aggressive in killing off servers during the setup scripts for a test run
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: bear)
References
Details
(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][android_tier_1])
Attachments
(1 file, 2 obsolete files)
3.77 KB,
patch
|
mozilla
:
review+
bear
:
checked-in+
|
Details | Diff | Splinter Review |
Last quarter we landed code to ensure our test harnesses will cleanup properly on errors. This works great, but there are scenarios where the device loses connectivity and we kill the entire job from buildbot/twisted error handling. The end result is we still have leftover webservers sitting around. We need to be super aggressive before each test run while we are installing the .apk and look for anything running on the ports that we care about and kill -9 those processes.
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → bear
Assignee | ||
Comment 1•12 years ago
|
||
Attachment #587118 -
Flags: review?(aki)
Assignee | ||
Updated•12 years ago
|
Attachment #587118 -
Flags: feedback?(hwine)
Comment 2•12 years ago
|
||
Comment on attachment 587118 [details] [diff] [review] try and remove *all* related child processes for a tegra during cleanup Looks as if the return value from checkStalled is: False - no processes were found to be killed True - some processes were found to be killed from 'ps' OR some "well known" processes were found and could not be killed (i.e. it's not related to whether there are still processes running after function completes) That seems an odd enough contract to be worth a comment.
Attachment #587118 -
Flags: feedback?(hwine) → feedback+
Assignee | ||
Comment 3•12 years ago
|
||
(In reply to Hal Wine [:hwine] from comment #2) > Comment on attachment 587118 [details] [diff] [review] > try and remove *all* related child processes for a tegra during cleanup > > Looks as if the return value from checkStalled is: > False - no processes were found to be killed > True - some processes were found to be killed from 'ps' OR > some "well known" processes were found and could not be killed > (i.e. it's not related to whether there are still processes running after > function completes) > > That seems an odd enough contract to be worth a comment. yea, I'm torn on that also - do I just assume that now with the more aggressive bits that it will always clean up and not return anything or make the True just be informational in the cleanup.py calling step
Comment 4•12 years ago
|
||
(In reply to Mike Taylor [:bear] from comment #3) > (In reply to Hal Wine [:hwine] from comment #2) > > Comment on attachment 587118 [details] [diff] [review] > > try and remove *all* related child processes for a tegra during cleanup > > > > Looks as if the return value from checkStalled is: > > False - no processes were found to be killed > > True - some processes were found to be killed from 'ps' OR > > some "well known" processes were found and could not be killed > > (i.e. it's not related to whether there are still processes running after > > function completes) > > > > That seems an odd enough contract to be worth a comment. > > yea, I'm torn on that also - do I just assume that now with the more > aggressive bits that it will always clean up and not return anything or make > the True just be informational in the cleanup.py calling step Maybe something like CLEAN = 1 DIRTY_BUT_KILLED = 2 DIRTY_NOT_KILLED = 3 and return one of those?
Comment 5•12 years ago
|
||
Comment on attachment 587118 [details] [diff] [review] try and remove *all* related child processes for a tegra during cleanup Hm, should you be checking the return value of the killPID call in checkStalled ? I can see wanting to have more info about what's happening, either in output or in the return value or logs or w/e.
Attachment #587118 -
Flags: review?(aki) → review+
Assignee | ||
Comment 6•12 years ago
|
||
changed result from checkStalled to reflect if stalled pids were found and handled or found and not handled
Attachment #587118 -
Attachment is obsolete: true
Attachment #596680 -
Flags: review?(aki)
Comment 7•12 years ago
|
||
Comment on attachment 596680 [details] [diff] [review] try and remove *all* related child processes for a tegra during cleanup >+if checkStalled(os.env['SUT_NAME']) > 1: >+ setFlag(errorFile, "Remote Device Error: process from previous test run present") >+ sys.exit(2) I like how you put this in a method. Why are we setting an error flag if we found old cruft but cleaned it up? I would guess we'd want a warning at 2 and an error at 3. Possibly an "all clear" message at 1 depending how verbose we want to be. >+ for pid in pids: >+ if not killPID(pid, sig=signal.SIGKILL, includeChildren=True): >+ result := 3 I thought this was some sort of special python command, but on Python 2.6.7 this appears to be a typo: >>> a = 2 >>> a := 3 File "<stdin>", line 1 a := 3 ^ SyntaxError: invalid syntax We're close though.
Attachment #596680 -
Flags: review?(aki) → review-
Assignee | ||
Comment 8•12 years ago
|
||
changed response to checkStalled to show a warning if the result was > 1 and removed errant Delphi-style assignment that snuck in
Attachment #596680 -
Attachment is obsolete: true
Attachment #596711 -
Flags: review?(aki)
Comment 9•12 years ago
|
||
Comment on attachment 596711 [details] [diff] [review] try and remove *all* related child processes for a tegra during cleanup I almost want a final else: to catch any weirdness when errcode > 1 but is not 2 or 3. But this will work without it. r=me.
Attachment #596711 -
Flags: review?(aki) → review+
Assignee | ||
Comment 10•12 years ago
|
||
Comment on attachment 596711 [details] [diff] [review] try and remove *all* related child processes for a tegra during cleanup committed changeset 2226:4ef60f593587
Attachment #596711 -
Flags: checked-in+
Assignee | ||
Comment 11•12 years ago
|
||
overnight cronjob runs of this new code has removed all but 1 hung CP tegra \o/ marking as fixed
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange][mobile_unittests][android_tier_1] → [mobile_unittests][android_tier_1]
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•