Closed
Bug 690232
Opened 13 years ago
Closed 13 years ago
Windows slaves: SIGKILL failed to kill process
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 666019
People
(Reporter: philor, Unassigned)
References
Details
(Whiteboard: [windows][builldbot])
I thought this was WinXP-only, because that was the only place I'd seen it, for either weeks or months now, not sure which. But https://tbpl.mozilla.org/php/getParsedLog.php?id=6592286&tree=Mozilla-Inbound is a Win7 Talos run which hung (because msys blows, and it got a permission denied error trying to clear the cache; that part's uninteresting), and then "SIGKILL failed to kill process". And https://tbpl.mozilla.org/php/getParsedLog.php?id=6590093&tree=Mozilla-Inbound is a Win64 nightly which timed out in hg (again uninteresting and no surprise), and then "SIGKILL failed to kill process".

For completeness, a random chunk of WinXP ones:
https://tbpl.mozilla.org/php/getParsedLog.php?id=6602739&tree=Mozilla-Inbound was a shutdown timeout
https://tbpl.mozilla.org/php/getParsedLog.php?id=6589136&tree=Mozilla-Inbound was a test timeout
https://tbpl.mozilla.org/php/getParsedLog.php?id=6597228&tree=Firefox was a shutdown timeout
Reporter
Comment 1•13 years ago
Forgot to mention the severity-enhancer that made me actually file: https://tbpl.mozilla.org/php/getParsedLog.php?id=6602739&tree=Mozilla-Inbound was some mochitest-chrome shutdown timeout, which then ate our mochitest-browser-chrome, mochitest-a11y and mochitest-ipcplugins
Reporter
Comment 2•13 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6603813&tree=Mozilla-Inbound
Reporter
Comment 3•13 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6603202&tree=Mozilla-Inbound
Comment 4•13 years ago
Oh sigh. I thought these were not an issue anymore.

I bet this [a] got lost with the newer version of buildbot/twisted 10.1 [b]

[a] https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Twisted_patch_to_allow_buildbot_to_kill_jobs
[b] https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Install_Buildbot

This is awful.

For a little more context:
* w7 tester [1] - OSError: [Errno 13] Permission denied: 'c:\\users\\cltbld\\appdata\\local\\temp\\tmpqo0enl\\profile\\Cache\\_CACHE_001_'
* w64 builders [2] - SIGKILL failed to kill process (after a time out)
* xp tester [3] - SIGKILL failed to kill process (after a time out)

[1]
Running test tp5: Started Wed, 28 Sep 2011 02:19:05
Screen width/height:1024/768
colorDepth:24
Browser inner width/height: 1006/586
NOISE: Cycle 1: loaded http://localhost/page_load_test/tp5/thesartorialist.blogspot.com/thesartorialist.blogspot.com/index.html (next: http://localhost/page_load_test/tp5/cakewrecks.blogspot.com/cakewrecks.blogspot.com/index.html)
Traceback (most recent call last):
  File "run_tests.py", line 540, in ?
    test_file(arg, screen, amo)
  File "run_tests.py", line 485, in test_file
    browser_dump, counter_dump, print_format = mytest.runTest(browser_config, test)
  File "c:\talos-slave\talos-data\talos\ttest.py", line 397, in runTest
    self.cleanupProfile(temp_dir)
  File "c:\talos-slave\talos-data\talos\ttest.py", line 149, in cleanupProfile
    self._hostproc.removeDirectory(dir)
  File "c:\talos-slave\talos-data\talos\ffprocess_win32.py", line 203, in removeDirectory
    shutil.rmtree(dir)
  File "C:\Python24\lib\shutil.py", line 163, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Python24\lib\shutil.py", line 163, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Python24\lib\shutil.py", line 168, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python24\lib\shutil.py", line 166, in rmtree
    os.remove(fullname)
OSError: [Errno 13] Permission denied: 'c:\\users\\cltbld\\appdata\\local\\temp\\tmpqo0enl\\profile\\Cache\\_CACHE_001_'

[2]
Error pulling changes into e:\builds\moz2_slave\m-in-w64-ntly\build from http://hg.mozilla.org/integration/mozilla-inbound; clobbering
command: START
command: hg clone -r 95bbaf6cb2a6c9a4d3375da8381cb8db909ec4a0 http://hg.mozilla.org/integration/mozilla-inbound e:\\\\builds\\\\moz2_slave\\\\m-in-w64-ntly\\\\build
command: cwd: e:\builds\moz2_slave\m-in-w64-ntly
command: output:
command timed out: 3600 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1
remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
]

[3]
WARNING: 1 sort operation has occurred for the SQL statement '0x16315df8'. See https://developer.mozilla.org/En/Storage/Warnings details.: file e:/builds/moz2_slave/m-in-w32-dbg/build/storage/src/mozStoragePrivateHelpers.cpp, line 144
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
command timed out: 1200 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1
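Logs [2] and [3] follow the same sequence: no output for N seconds, an attempt to kill, then "SIGKILL failed to kill process" and a fake rc=-1. A minimal sketch of that kill-then-confirm pattern, written from scratch for illustration only (it is not buildbot's actual code; `kill_and_confirm` and its `grace`/`poll` parameters are hypothetical). The point is that a kill request is not trusted until the process is observed to be gone:

```python
import subprocess
import sys
import time

KILL_FAILED = "SIGKILL failed to kill process"


def kill_and_confirm(kill, is_alive, grace=5.0, poll=0.05):
    """Issue a kill, then poll until the process is gone or `grace` seconds pass.

    `kill` and `is_alive` are callables, so the same logic works for any
    process handle. Raises RuntimeError(KILL_FAILED) if the process survives,
    which is the situation the logs above report.
    """
    kill()
    deadline = time.monotonic() + grace
    while True:
        if not is_alive():
            return
        if time.monotonic() >= deadline:
            raise RuntimeError(KILL_FAILED)
        time.sleep(poll)


if __name__ == "__main__":
    # Demo: a child that would otherwise sleep for 30 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    kill_and_confirm(child.kill, lambda: child.poll() is None)
    print("child exited with", child.returncode)
```

Taking `kill` and `is_alive` as callables keeps the sketch independent of how the process is killed (a single handle or a whole tree), which is exactly the part in question in this bug.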
Comment 5•13 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #4)
> Oh sigh. I thought these were not an issue anymore.
>
> I bet this [a] got lost with the newer version of buildbot/twisted 10.1 [b]
>
> [a]
> https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Twisted_patch_to_allow_buildbot_to_kill_jobs

I believe Dustin was concerned about this particular patch when he was rolling out the new buildbot version, but couldn't find anyone at the time who could give him details.

Armen: can you verify that the affected slaves in the logs that philor linked are, in fact, missing this twisted patch?

Note: I'm not asking you to take the bug, just verify the cause.
OS: Windows 7 → All
Priority: -- → P3
Whiteboard: [windows][builldbot]
Comment 6•13 years ago
I believe this is the issue:

C:\Users\cltbld>C:\mozilla-build\wget\wget.exe http://hg.mozilla.org/build/opsi-package-sources/raw-file/520de951bbb0/twisted_dumbwin32proc/CLIENT_DATA/_dumbwin32proc.py

C:\Users\cltbld>C:\mozilla-build\msys\bin\diff.exe _dumbwin32proc.py C:\mozilla-build\buildbotve\Lib\site-packages\twisted\internet\_dumbwin32proc.py
241c241,242
< os.popen('taskkill /T /F /PID %s' % self.pid)
---
> win32process.TerminateProcess(self.hProcess, 1)
>

We should deploy that version to all Windows build and test slaves.
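The two lines in that diff differ in scope: `win32process.TerminateProcess(self.hProcess, 1)` ends only the one process behind the handle (Windows does not terminate its child processes along with it), while `taskkill /T /F /PID` force-kills the process and every child it started. A small sketch of the tree-kill call, assuming Python 3 on a Windows host; `taskkill_argv` and `kill_process_tree` are hypothetical helper names, and passing an argument list to `subprocess.call` avoids the shell that `os.popen` goes through:

```python
import subprocess
import sys


def taskkill_argv(pid):
    """Build the taskkill command line used in the patched _dumbwin32proc.py."""
    # /T kills the process tree, /F forces termination, /PID selects the target.
    return ["taskkill", "/T", "/F", "/PID", str(pid)]


def kill_process_tree(pid):
    """Force-kill `pid` and its descendants; returns taskkill's exit code."""
    if sys.platform != "win32":
        raise OSError("taskkill is only available on Windows")
    return subprocess.call(taskkill_argv(pid))
```

For example, `taskkill_argv(1234)` yields `['taskkill', '/T', '/F', '/PID', '1234']`.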
Comment 7•13 years ago
That diff is monkeypatched in; see bug 666019.

So it's possible that the monkeypatch isn't working correctly, or that there's some other killing-processes-on-windows patch that used to be in place, but which nobody could remember well enough to point me to. I would recommend starting your diagnostics there, rather than patching over the problem by hacking _dumbwin32proc.py.
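Applying that change as a monkeypatch, rather than editing the installed file, means replacing the method on the class at import time. A minimal sketch of the idea, assuming Twisted's `twisted.internet._dumbwin32proc.Process` has a `signalProcess(signalID)` method and a `pid` attribute. `ProcessStandIn`, `install_tree_kill`, `tree_kill_installed` and the `_tree_kill_patched` marker are all hypothetical, and `ProcessLookupError` stands in for Twisted's own "already exited" error. The marker gives a cheap runtime check for the kind of diagnostics suggested here: if it is missing on a slave, the monkeypatch never ran there.

```python
import subprocess


class ProcessStandIn:
    """Hypothetical stand-in for twisted.internet._dumbwin32proc.Process."""

    def __init__(self, pid):
        self.pid = pid

    def signalProcess(self, signalID):
        # The unpatched code calls TerminateProcess on this one process only.
        raise NotImplementedError("win32process.TerminateProcess(...) would run here")


def install_tree_kill(cls, run=subprocess.call):
    """Replace cls.signalProcess with a version that kills the whole process tree."""

    def signalProcess(self, signalID):
        if self.pid is None:
            # Stand-in for Twisted's "process already exited" error.
            raise ProcessLookupError("process already exited")
        if signalID in ("INT", "TERM", "KILL"):
            run(["taskkill", "/T", "/F", "/PID", str(self.pid)])

    cls.signalProcess = signalProcess
    # Marker so a slave can be checked at runtime for whether the patch took effect.
    cls._tree_kill_patched = True


def tree_kill_installed(cls):
    """Return True only if install_tree_kill() has been applied to cls."""
    return getattr(cls, "_tree_kill_patched", False)
```

In a test, passing `run=calls.append` (for some list `calls`) records the taskkill command instead of running it.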
Reporter
Comment 8•13 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6623585&tree=Firefox
Comment 9•13 years ago
I just hit this on WinXP Debug TryServer. I was expecting an orange result (from a crashtest that's expected to hang), but got purple on WinXP Debug instead, since we fail to kill the hanging process.
https://tbpl.mozilla.org/?tree=Try&rev=0762a4443dc1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6842811&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=6844930&tree=Try
Reporter
Comment 12•13 years ago
I wasn't too worried about this, because I look at every single failed result no matter what the color. But it turns out that in general people just totally ignore purple, and also believe that all purple is the same. So if they push a Windows crash to try, they just assume try is broken when they get purple, and go ahead and push it for real.
Severity: normal → blocker
Summary: (Some?) Windows slaves: SIGKILL failed to kill process → Windows slaves: SIGKILL failed to kill process
Reporter
Comment 13•13 years ago
I guess this isn't actually blocking development, just making it miserable.
Severity: blocker → critical
Comment 14•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> That diff is monkeypatched in - see bug 666019
>
> So it's possible that monkeypatch isn't working correctly, or that there's
> some other killing-processes-on-windows patch that used to be in place, but
> which nobody could remember well enough to point me to. I would recommend
> starting your diagnostics there, rather than patching over the problem by
> hacking _dumbwin32proc.py.

Actually, I just looked because the *newly* rebuilt SeaMonkey slaves hit this. It looks like the patch from Bug 666019, despite being described as deployed to the slaves branch, was actually deployed to default, then merged to production-0.8, and never hit the slaves branch.

I suggest we either manually apply this patch to our slaves or deploy a buildbot 0.8.4-pre-moz3.
Comment 15•13 years ago
(In reply to Justin Wood (:Callek) from comment #14)
> (In reply to Dustin J. Mitchell [:dustin] from comment #7)
> And it looks like the patch from Bug 666019 despite mentioning it was
> deployed to the slaves branch, was actually deployed to default, then merged
> to production-0.8 and never hit the slaves branch.

Correction: it was never deployed to hg at all (I looked at the wrong monkeypatch).
Reporter
Comment 16•13 years ago
Is there any chance this will ever be fixed, or should I patch tbpl to lie about the status of jobs, and show all purple as orange?
Comment 17•13 years ago
(In reply to Phil Ringnalda (:philor) from comment #16)
> Is there any chance this will ever be fixed, or should I patch tbpl to lie
> about the status of jobs, and show all purple as orange?

Just rediscovered today in triage, so...possibly? Let's dupe to bug 666019 and get that deployed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering