Closed Bug 666019 Opened 14 years ago Closed 11 years ago

monkey-patch a stock Twisted-11.0.0 to manage processes correctly

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

References

Details

(Whiteboard: [windows][builldbot])

Attachments

(1 file, 2 obsolete files)

From what I understand of bug 600736 and https://wiki.mozilla.org/ReferencePlatforms/Win64#Buildbot, the Twisted we want on slaves is Twisted-9.0.0 with both http://hg.mozilla.org/build/twisted/raw-rev/0b441f0f68b4 https://bug543626.bugzilla.mozilla.org/attachment.cgi?id=451277 where the first is just a simple patch to use taskkill instead of win32process.TerminateProcess(self.hProcess, 1). The second is a forward-port of a patch catlee submitted to Twisted: Date: Mon Jan 25 16:34:20 2010 +0000 Merge faster-b1282int-2310 Author: catlee Reviewer: TimAllen, exarkun Fixes: #2310 Switch the banana function for decoding base-128 encoded integers from an expensive exponentiation-based approach to a much cheaper bit-shifting approach. Also add a benchmark which demonstrates how much faster decoding these integers now is. And which has been accepted upstream in 10.0.0. So I need to patch our local copy of Buildbot to apply the _dumbwin32proc.py patch; the other isn't required.
Blocks: 637347
Attached image screenshot.jpg (obsolete) —
So this failed - and badly - with the attached error message. I'll try to transcribe the important stuff for future searching: File "D:\mozilla-build\python24\lib\site-packages\twisted\internet\_dumbwin32proc.py", line 205, in __init__ raise OSError(pwte) exceptions.OSError: (5, 'AssignProcessToJobObject', 'Access is Denied.') Although now that I look at it, that's not the version of Twisted I installed (mine was in c:\tmp\bbve). I'll try again.
OK, that was a false alarm - not sure how. This is running now on win32-slave60. I'm aware that the tests that need to be killed are not running on this system, but it's a start.
Attached patch m666019-buildbot-p1-r1.patch (obsolete) — Splinter Review
the patch in question
Attachment #540938 - Flags: review?(armenzg)
Attachment #540938 - Flags: feedback?(jmaher)
Comment on attachment 540938 [details] [diff] [review] m666019-buildbot-p1-r1.patch The feedback I'd like is: is this patch, *applied to a stock Twisted-11.0.0 install* (rather than to the Twisted that's on the refimage), likely to be sufficient? Set aside the Twisted version upgrade.
Attachment #540938 - Flags: feedback?(anodelman)
Comment on attachment 540938 [details] [diff] [review] m666019-buildbot-p1-r1.patch Review of attachment 540938 [details] [diff] [review]: ----------------------------------------------------------------- I am not a twisted expert, but this patch seems pretty straightforward. ::: slave/buildslave/dumbw32monkey.py @@ +21,5 @@ > + return > + > + from twisted.internet import error > + > + print "applying _dumbwin32proc monkeypatch" I am not familiar with the print statements normally used with the buildslave and twisted, but should this print statement be a bit more descriptive so it can be found easier? @@ +26,5 @@ > + def new_signalProcess(self, signalID): > + if self.pid is None: > + raise error.ProcessExitedAlready() > + if signalID in ("INT", "TERM", "KILL"): > + os.popen('taskkill /T /F /PID %s' % self.pid) what if signalID isn't in the list? Should we print a warning, do we even care?
Attachment #540938 - Flags: feedback?(jmaher) → feedback+
(In reply to comment #5) > I am not familiar with the print statements normally used with the > buildslave and twisted, but should this print statement be a bit more > descriptive so it can be found easier? It will appear on the twistd.log. It's just useful as a way of verifying that the patch is in place. > what if signalID isn't in the list? Should we print a warning, do we even > care? I don't think we care - the buildslave never sends any other signals. The original doesn't handle such signals either, so I think that's fine. I'm not too concerned with the correctness of the patch, since I'm copying something that's already in production -- I'm more concerned to know that this patch is *enough*.
Comment on attachment 540938 [details] [diff] [review] m666019-buildbot-p1-r1.patch I like the approach.
Attachment #540938 - Flags: review?(armenzg) → review+
OK, landed on the slaves branch, although not deployed anywhere yet.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Attachment #540938 - Flags: feedback?(anodelman)
(In reply to Dustin J. Mitchell [:dustin] from comment #8) > OK, landed on the slaves branch, although not deployed anywhere yet. Actually doesn't look like this was checked in at all, (c.f. Bug 690232)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whoops, this definitely isn't me anymore.
Assignee: dustin → nobody
Attachment #540931 - Attachment is obsolete: true
So what needs to happen here? Where should this patch be landing, and where does it need to be deployed? All Windows platforms I'm assuming?
Priority: -- → P3
Whiteboard: [windows][builldbot]
(In reply to Chris Cooper [:coop] from comment #11) > So what needs to happen here? We need to deploy this patch to windows machines somehow. > Where should this patch be landing, http://hg.mozilla.org/build/buildbot on the slaves branch. > and where does it need to be deployed? > All Windows platforms I'm assuming? It needs to be deployed on all windows machines, if its better to hard-patch this on windows via opsi/etc. or if its better to cut a moz3 package of buildbot 0.8.4 I'm not sure, but either should be aproximately the same amount of work, since we have buildbot installed as a virtualenv now on windows.
FYI I just deployed this on my win slaves that were hitting this (due to the pain I'm experiencing on SeaMonkey due to unrelated test issues causing hangs because we can't kill stuff right) with the following: * Graceful shutdown buildbot on the slave * Login as Administrator * Open a mozbuild prompt * Execute the following commands cd /d wget -O patch https://bugzilla.mozilla.org/attachment.cgi?id=540938 cd mozilla-build/buildbotve/Lib/site-packages/buildslave patch -p3 < /d/patch rm bot.pyc /d/patch * Reboot slave If I encounter weird errors/problems with the patch I'll report here, but I don't expect any.
FYI after applying the patch I still get failures here: TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_importScripts.html | application timed out after 330 seconds with no output If you fixed bug 589668, you'd get a screenshot here command timed out: 1200 seconds without output, attempting to kill SIGKILL failed to kill process using fake rc=-1 program finished with exit code -1 I also checked twistd.log to make sure that its being run, it is. I also applied a slight change to print to the log when running the patched function, which is never run, which to me means the monkeypatch isn't actually getting applied internally correctly
Attached patch v2Splinter Review
Dustin suggested this change on IRC after I had issues with the patch in my environ. This is the correct patch. (very trivial change, but necessary). That said, in the few instances that have run since I got this landed, I'm no longer getting SIGKILL Failed... messages and subsequent complete failures. I am instead getting things like: |ERROR: The process "1744" not found.| Though logging into the machine seems to indicate its not a real error. And its not triggering any odd stuff with buildbot. (my theory is that the process goes away before buildbot can then kill it...) Either way, this brings us back to where we were before the buildbot upgrade.
Attachment #540938 - Attachment is obsolete: true
Attachment #580725 - Flags: review+
Assigning to bear as part of his "make-buildduty-suck-less" initiative.
Assignee: nobody → bear
Is there *anything* that we can quickly and reasonably do here? We are repeatedly missing genuine test failures, because we believe that purple means purple, rather than remembering that purple on Windows is absolutely devoid of any meaning whatsoever.
Attachment #580725 - Attachment is patch: true
This bit us again last night. There is a reviewed patch in-bug, can it land?
(In reply to Ed Morley (27th Aug is UK public holiday, back 28th) [:edmorley] from comment #20) > This bit us again last night. > > There is a reviewed patch in-bug, can it land? Unfortunately, simply checking it in isn't going to do anything. It needs to get deployed, which means we need a RelEng person assigned. Bear is no longer with the group, so I'm unassigning him and removing the priority to get this retriaged.
Assignee: bear → nobody
Priority: P3 → --
found in triage.
Component: Release Engineering → Release Engineering: Platform Support
QA Contact: coop
Is this still needed on Windows after the switch to Win64 builders and GPO?
I think it still is required as part of the imaging/GPO process. If it's already covered by that, then no need for this bug.
Product: mozilla.org → Release Engineering
(In reply to Chris AtLee [:catlee] from comment #24) > I think it still is required as part of the imaging/GPO process. If it's > already covered by that, then no need for this bug. Any clue about this, Q?
Flags: needinfo?(q)
We patch the twisted files during the machine install. And we have GPO frameworks for updating files in mozillabuild if need be.
Flags: needinfo?(q)
Resolving based on comment #26.
Status: REOPENED → RESOLVED
Closed: 14 years ago11 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: