Closed Bug 459527 Opened 16 years ago Closed 16 years ago

problems after: "WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test" needs process killed

Categories

(Release Engineering :: General, defect, P1)

All
Windows Server 2003
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: joduinn)

Details

"WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test" tinderbox is red because it needs a process (probably either firefox.exe or xpcshell.exe) killed.  (The process is holding open files that the build process tries, and fails, to remove.)
None of the passwords we have on file for cltbld, buildbot, mozqa, or root/administrator for any of build or qa seem to work for logging into this machine.
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering: Maintenance
QA Contact: mrz → release
paged joduinn
Assignee: nobody → joduinn
Priority: -- → P1
responding to page, and taking bug.
1) The machine password was temporarily reset on Thursday so dietrich and sdwilsh could do some debugging on that machine. Once they have finished debugging, we will reset it to the usual password on file with IT. Sorry for the snafu; we should have let IT-oncall know of this experiment-in-progress.

2) I couldn't find any extra processes running on this machine, but rebooted it anyway, just in case.

3) I've now triggered a new unittest run on moz2-win32-slave08, to confirm all is ok. Please reopen this bug if there is still a problem.
OS: Other → Windows Server 2003
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This box subsequently had a bunch of hg exceptions, which strangely didn't show up as bustage on the tinderbox waterfall, and occurred only on builds where there were changes (the periodic scheduler). Since we most likely have an older mercurial on there, it probably isn't coping with the two heads that exist since the 3.1b1 release branch was added. I've clobbered the box, because that's the easiest way to start again.
er ... (the periodic scheduler was fine).
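A quick way to check the "two heads" guess above is to count the heads in the slave's clone. This is a minimal sketch, not something that was run on the box: it assumes `hg` is on the PATH, and the helper names `parse_heads` and `list_heads` are made up for illustration.

```python
import subprocess


def parse_heads(output):
    """Turn the output of `hg heads --template "{node}\\n"` into a list of node hashes."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_heads(repo_dir, hg="hg"):
    """Return the head changeset hashes of the repository at repo_dir.

    Assumption: `hg` resolves to a Mercurial executable (on the slave in this
    bug it would be d:\\mercurial\\hg.EXE).
    """
    result = subprocess.run(
        [hg, "heads", "--template", "{node}\n"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_heads(result.stdout)
```

If `len(list_heads(path))` is greater than 1, the clone has more than one head, and a plain `hg update` may behave differently depending on the Mercurial version.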
Um weird, the clobber didn't fix it. I've verified that
* the problem started immediately after the reboot of moz2-win32-slave08 on 2008-10-11
* the buildbot slave got started the same way as moz2-win32-slave07, which doesn't have this problem
* the problem is that a no-change build can update fine, but any build with a change reports an error to the buildbot waterfall and reports nothing to the tinderbox waterfall, leaving us with a perma-yellow build box.

We run
  C:\WINDOWS\system32\cmd.exe /c d:\mercurial\hg.EXE pull --verbose
and get this stdio output
  pulling from http://hg.mozilla.org/mozilla-central
  searching for changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 5 changes to 5 files
  (run 'hg update' to get a working copy)
If you open the stdio log for this it never finishes loading, which is very odd given it has purple "exception" status.

The Python call stack for the error is:
Traceback (most recent call last):
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/spread/pb.py", line 884, in _recvMessage
    netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/spread/flavors.py", line 119, in remoteMessageReceived
    state = method(*args, **kw)
  File "/tools/buildbot-077/lib/python2.5/site-packages/buildbot/process/buildstep.py", line 165, in remote_update
    self._finished(Failure())
  File "/tools/buildbot-077/lib/python2.5/site-packages/buildbot/process/buildstep.py", line 198, in _finished
    d = defer.maybeDeferred(self.remoteComplete, failure)
--- <exception caught here> ---
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 107, in maybeDeferred
    result = f(*args, **kw)
  File "/tools/buildbot-077/lib/python2.5/site-packages/buildbot/process/buildstep.py", line 357, in remoteComplete
    loog.finish()
  File "/tools/buildbot-077/lib/python2.5/site-packages/buildbot/status/builder.py", line 402, in finish
    self.merge()
  File "/tools/buildbot-077/lib/python2.5/site-packages/buildbot/status/builder.py", line 365, in merge
    text = "".join([c[1] for c in self.runEntries])
<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0x92 in position 153: ordinal not in range(128)
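The last frame fails while joining the log chunks: under Python 2, joining unicode and byte strings makes Python decode the byte strings as ASCII, and byte 0x92 is outside that range. The sketch below reproduces the decode failure in Python 3 and shows one tolerant way to decode such chunks. It is an illustration only, not the buildbot fix. The sample bytes are invented, and treating 0x92 as Windows-1252 (where it is the right single quote, U+2019) is an assumption about where the byte came from.

```python
# Hypothetical log chunk containing the offending byte; the real log text is unknown.
chunk = b"can\x92t remove file"


def to_text(raw, encoding="cp1252"):
    """Decode one log chunk, substituting U+FFFD for bytes the codec cannot map."""
    return raw.decode(encoding, errors="replace")


# Strict ASCII decoding fails on 0x92, as in the traceback above.
try:
    chunk.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    failing_byte = chunk[exc.start]  # the byte the codec rejected

# Decoding each chunk explicitly before joining avoids the implicit ASCII step.
merged = "".join(to_text(c) for c in [b"pulling\n", chunk])
```

Decoding at the boundary where bytes arrive, rather than at join time, keeps a single odd byte in command output from aborting the whole log merge.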

This may be more informative:
 http://production-master.build.mozilla.org:2010/builders/WINNT%205.2%20mozilla-central%20moz2-win32-slave08%20dep%20unit%20test/builds/877/steps/hg/logs/err.html

Any ideas, Ben? Complete guess: deferred connections still hanging around on the master?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: "WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test" needs process killed → problems after: "WINNT 5.2 mozilla-central moz2-win32-slave08 dep unit test" needs process killed
Did a restart and clobber, it's green now.  Should we leave this bug open in case the issue comes up again?
I suggest we wait a few hours before closing.
Fine now.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Component: Release Engineering: Maintenance → Release Engineering
Product: mozilla.org → Release Engineering