Closed Bug 419328 Opened 16 years ago Closed 16 years ago

"WINNT 5.2 qm-win2k3-01 dep unit test" stopped cycling (last cycle had weird error)

Categories

(Release Engineering :: General, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: joduinn)

Details

"WINNT 5.2 qm-win2k3-01 dep unit test" stopped cycling.  I'd hoped that a checkin would cause a new cycle that would work, but it didn't.

The last cycle's log, in its entirety (from http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1203873881.1203873884.20062.gz&fulltext=1 ) was:

tinderbox: tree: Firefox
tinderbox: builddate: 1203873881
tinderbox: status: busted
tinderbox: build: WINNT 5.2 qm-win2k3-01 dep unit test
tinderbox: errorparser: unittest
tinderbox: binaryurl: 
tinderbox: logcompression: bzip2
tinderbox: logencoding: base64
tinderbox: END
Unable to kill process sh.exe:
Process does not exist.

PsKill v1.12 - Terminates processes on local or remote systems
Copyright (C) 1999-2005  Mark Russinovich
Sysinternals - www.sysinternals.com

Unable to kill process make.exe:
Process does not exist.

PsKill v1.12 - Terminates processes on local or remote systems
Copyright (C) 1999-2005  Mark Russinovich
Sysinternals - www.sysinternals.com

[Failure instance: Traceback (failure with no frames): twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
]

No More Errors
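
For anyone reading the log above: the header is a run of `tinderbox: key: value` lines that ends at `tinderbox: END`, with the build output following. A minimal Python sketch of pulling those fields out is below. The function name `parse_tinderbox_header` is hypothetical, and this is only an illustration of the format shown in this bug, not the Tinderbox server's actual parser.

```python
def parse_tinderbox_header(log_text):
    """Collect 'tinderbox: key: value' lines up to 'tinderbox: END'."""
    header = {}
    for line in log_text.splitlines():
        if not line.startswith("tinderbox:"):
            continue
        rest = line[len("tinderbox:"):].strip()
        if rest == "END":
            break  # everything after this marker is build output
        key, _, value = rest.partition(":")
        header[key.strip()] = value.strip()
    return header


# Abbreviated copy of the header quoted in this bug.
sample = (
    "tinderbox: tree: Firefox\n"
    "tinderbox: status: busted\n"
    "tinderbox: binaryurl: \n"
    "tinderbox: END\n"
    "Unable to kill process sh.exe:\n"
)
header = parse_tinderbox_header(sample)
print(header["status"])
```

An empty field such as `binaryurl` comes back as an empty string, which is how the log above reports that no binary was uploaded.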
Assignee: server-ops → aravind
The box was showing messages about XPCshell crashing, Firefox crashing, and so on, and it wouldn't let me log in via rdesktop. I power-cycled the box and am cleaning things up and restarting everything.

Kicked off the buildbot processes.  Please re-open if necessary.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
I think there's a focus problem on the tinderbox that needs to be resolved.  Can you make sure the mouse is outside of the area where mochitests will run, please?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The cmd window I started buildbot in is minimized, and I killed the rdesktop session, so I'm not sure what mouse you are talking about. It looks like the box is just sitting there doing nothing.

Not much more I can do here.
Left John a voicemail about the box.
Just talked with Aravind on the phone. This is one of the Talos machines (running buildbot). Not sure what's going on, but will start investigating.
(In reply to comment #7)
> Just talked with Aravind on the phone. This is one of the Talos machines
> (running buildbot). Not sure what's going on, but will start investigating.

This is a unit test machine, not a Talos machine.
Even after killing buildbot and confirming that no errant sh.exe or make.exe
processes were running, the machine was still very, very slow and kept dropping
RDC connections. (FWIW, I did notice 4 different instances of rotatelogs.exe
running on this machine.)

I've rebooted the machine, confirmed the slave working directory is cleaned out, and restarted the buildbot slave.
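
The check for leftover processes described above could be scripted along these lines. This is a sketch under assumptions: it expects the text produced by the Windows command `tasklist /FO CSV /NH` (one quoted CSV row per process, image name first), the helper name `count_processes` is hypothetical, and the sample rows are made up for illustration rather than captured from qm-win2k3-01.

```python
import csv
import io
from collections import Counter


def count_processes(tasklist_csv, names):
    """Count running instances of the given image names (case-insensitive)."""
    wanted = {name.lower() for name in names}
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # The first CSV column is the image name, e.g. "make.exe".
        if row and row[0].lower() in wanted:
            counts[row[0].lower()] += 1
    return counts


# Made-up sample rows in the tasklist CSV layout.
sample = (
    '"rotatelogs.exe","1204","Console","0","2,100 K"\n'
    '"rotatelogs.exe","1388","Console","0","2,096 K"\n'
    '"python.exe","2040","Console","0","18,440 K"\n'
)
counts = count_processes(sample, ["sh.exe", "make.exe", "rotatelogs.exe"])
print(dict(counts))
```

Names that are not running are simply absent from the result, so `counts["sh.exe"]` reads as 0.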
On these builds, it's good to force a build after you think you've fixed the problem, because then it's clear whether the problem is fixed and we can reopen the tree.

That said, I just reopened the tree anyway.  The orange is sort of "known", and we'll just live with it if it stays around.
(In reply to comment #10)
> On these builds, it's good to force a build after you think you've fixed the
> problem, because then it's clear whether the problem is fixed and we can reopen
> the tree.
I have been trying to do that, but the buildbot master won't allocate jobs to the newly rebooted slave. It's a known problem we sometimes get with buildbot, and I'm (still) trying to fix that.


> That said, I just reopened the tree anyway.  The orange is sort of "known", and
> we'll just live with it if it stays around.
Err... I guess that's OK. I don't know what's best in these situations, so I'm happy to follow your lead. From IRC it seems there were buildbot/unittest problems and also code problems at play this morning, which makes me feel better about the effort required to figure out what was going on.
Over to build per John's request.
Assignee: aravind → nobody
Status: REOPENED → NEW
Component: Server Operations: Tinderbox Maintenance → Build & Release
QA Contact: justin → build
(In reply to comment #11)
> (In reply to comment #10)
> > On these builds, it's good to force a build after you think you've fixed the
> > problem, because then it's clear whether the problem is fixed and we can reopen
> > the tree.
> I have been trying to do that, but the buildbot master won't allocate jobs to
> the newly rebooted slave. It's a known problem we sometimes get with buildbot,
> and I'm (still) trying to fix that.
The instructions on http://wiki.mozilla.org/Buildbot/IT_Unittest_Support_Document don't match up with what I find on buildbot master qm-rhel02, and the only log files I find on qm-rhel02 are many months old. I'm not sure where the master really lives, nor what to do now, so I will contact robcee for help.
Priority: -- → P1
Talked with robcee; the doc was fine, I was misreading it! :-( I have now restarted the master, and it's allocating jobs to all slaves correctly, including qm-win2k3-01. Let's see how it goes now.
Assignee: nobody → joduinn
The mochitest failures went away, but now all the PNG and JPEG tests are failing, which probably means the color depth changed.
I added a comment to http://wiki.mozilla.org/Buildbot/IT_Unittest_Support_Document about the color depth issue. [I sent an email to IT/QA about this issue last time, but this is the first time I've seen that there's a tinderbox troubleshooting document.]
Sending this back to IT since joduinn is gone and this machine is still having problems (see latest comments).
Assignee: joduinn → server-ops
Component: Build & Release → Server Operations: Tinderbox Maintenance
QA Contact: build → justin
(In reply to comment #17)
> Sending this back to IT since joduinn is gone and this machine is still having
> problems (see latest comments).
I'm still here; reclaiming back.

The original reported problem is fixed. Trying to figure out if roc's theory about the new problem is correct - investigating... 
Assignee: server-ops → joduinn
Component: Server Operations: Tinderbox Maintenance → Build & Release
QA Contact: justin → build
After reading https://bugzilla.mozilla.org/show_bug.cgi?id=414720#c0

... I confirmed that on qm-win2k3-01, the steps listed for GPEDIT.MSC are already valid and in place. No change needed - already set to 'Enabled' and '24 bit'.  

However, when I try to confirm that the Windows Control Panel display setting is 24-bit, I instead find that "Default Monitor" is set to 8-bit, with no way to choose 24-bit mode. It is interesting to note that in ControlPanel->Settings, the "Advanced" button is disabled and the screen resolution is maxed out at 1280x1024.
(In reply to comment #19)
> After reading https://bugzilla.mozilla.org/show_bug.cgi?id=414720#c0

Keep reading. :-) The current problem, described further down in the bug, is that the desktop picks up the color depth of the last client to connect to it. Change the depth of your client, reconnect, and it will probably start working again.

The change to the stupid Windows registry setting was required to *allow* RDP clients to use 24-bit mode. There doesn't appear to be a way to *require* 24-bit mode, as far as I can tell. [If there was, that would eliminate this recurring issue.]
(In reply to comment #20)
> (In reply to comment #19)
> > After reading https://bugzilla.mozilla.org/show_bug.cgi?id=414720#c0
> 
> Keep reading. :-) The current problem, described further down in the bug, is
> that the desktop picks up the color depth of the last client to connect to it.
> Change the depth of your client, reconnect, and it will probably start working
> again.
Actually, I had read it all. My Microsoft RDC client (v1.0.3) for Mac does not give me a choice about color depth, as far as I can find out. I was going to look for a Windows RDP connection tomorrow morning and see if that allowed me to change color depth. What RDP software do you use?
Microsoft® Remote Desktop Connection Client for Mac, Version 2.0.0 Beta2 (071001)

Preferences -> Display -> Colors, set to Millions

Available from the download link on http://www.microsoft.com/mac/products/remote-desktop/default.mspx, although it's not terribly obvious.
Killed open windows and verified the desktop was set to 24-bit depth.

John: when connecting to these machines using Windows Remote Desktop Connection, make sure you're set to "millions" of colors before connecting. Otherwise, it will downgrade the screen depth. I usually save preferences for each machine we connect to and use that icon for connecting to the remote machine.
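
One way to make the saved-preferences habit above harder to get wrong is to check the saved connection file itself. The sketch below assumes a Windows Remote Desktop Connection `.rdp` file, where color depth is stored as a `session bpp:i:<bits>` line; the Mac client may store its preferences differently. The helper name `ensure_session_bpp` is hypothetical, and the function works on already-decoded text, so reading and writing the file (including its encoding) is left to the caller.

```python
def ensure_session_bpp(rdp_text, depth=24):
    """Return .rdp text with 'session bpp' forced to the given bit depth."""
    prefix = "session bpp:i:"
    wanted = f"{prefix}{depth}"
    lines = rdp_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.lower().startswith(prefix):
            lines[i] = wanted  # overwrite whatever depth was saved
            found = True
    if not found:
        lines.append(wanted)  # setting missing: add it
    return "\n".join(lines) + "\n"


# A saved connection that would downgrade the desktop to 8-bit color.
saved = "full address:s:qm-win2k3-01\nsession bpp:i:8\n"
print(ensure_session_bpp(saved))
```

This only fixes the client side. As noted earlier in this bug, the server appears to adopt the depth of whichever client connected last, so every saved connection for the machine needs the same setting.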
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Actually, the Microsoft RDC 1.0.3 client *does* allow me to set color depth; nrthomas just showed me how. Thanks, Nick. It's on the first connection dialog box, under the little clickable triangle on the left-hand side, beside the word "options" (duh!).

John learns something new, and also gives thanks to robcee for stepping in while OOO, fixing this color depth, and closing this bug. I confirm that mochitests are now passing.
Product: mozilla.org → Release Engineering