Closed Bug 695267 Opened 13 years ago Closed 12 years ago

resolution problems with talos-r4-snow slaves

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: jhford)

References

Details

(Whiteboard: [buildslaves][talos])

There are issues with talos-r4-snow-051.  It is causing a whole bunch of failures that look like they are related to the machine's screen resolution.  I have disabled this slave in slavealloc.

When you get a chance, please let me know if there is something strange with this machine and/or its dongle.
fwiw, this machine is showing a lot more available resolutions than other slaves.  It is also showing duplicate entries.  VNC looks normal, and screenresolution get shows 1600x1200x32.

talos-r4-snow-051:~ cltbld$ screenresolution get
Display 0: 1600x1200x32
talos-r4-snow-051:~ cltbld$ screenresolution list                    
Available Modes on Display 0
  640x480x8 	640x480x16 	640x480x32 	640x480x8 
  640x480x16 	640x480x32 	800x600x8 	800x600x16 
  800x600x32 	800x600x8 	800x600x16 	800x600x32 
  1024x768x8 	1024x768x16 	1024x768x32 	1024x768x8 
  1024x768x16 	1024x768x32 	1280x960x8 	1280x960x16 
  1280x960x32 	1280x960x8 	1280x960x16 	1280x960x32 
  1280x960x8 	1280x960x16 	1280x960x32 	1280x768x8 
  1280x768x16 	1280x768x32 	1280x768x8 	1280x768x16 
  1280x768x32 	1280x768x8 	1280x768x16 	1280x768x32 
  1280x768x8 	1280x768x16 	1280x768x32 	1280x768x8 
  1280x768x16 	1280x768x32 	1280x768x8 	1280x768x16 
  1280x768x32 	1280x768x8 	1280x768x16 	1280x768x32 
  1280x1024x8 	1280x1024x16 	1280x1024x32 	1344x1008x8 
  1344x1008x16 	1344x1008x32 	1344x1008x8 	1344x1008x16 
  1344x1008x32 	1344x1008x8 	1344x1008x16 	1344x1008x32 
  1344x1008x8 	1344x1008x16 	1344x1008x32 	1400x1050x8 
  1400x1050x16 	1400x1050x32 	1400x1050x8 	1400x1050x16 
  1400x1050x32 	1400x1050x8 	1400x1050x16 	1400x1050x32 
  640x480x8 	640x480x16 	640x480x32 	640x480x8 
  640x480x16 	640x480x32 	640x480x8 	640x480x16 
  640x480x32 	800x600x8 	800x600x16 	800x600x32 
  800x600x8 	800x600x16 	800x600x32 	800x600x8 
  800x600x16 	800x600x32 	800x600x8 	800x600x16 
  800x600x32 	832x624x8 	832x624x16 	832x624x32 
  1024x768x8 	1024x768x16 	1024x768x32 	1024x768x8 
  1024x768x16 	1024x768x32 	1024x768x8 	1024x768x16 
  1024x768x32 	1024x768x8 	1024x768x16 	1024x768x32 
  1152x870x8 	1152x870x16 	1152x870x32 	1280x1024x8 
  1280x1024x16 	1280x1024x32 	1280x1024x8 	1280x1024x16 
  1280x1024x32 	1280x1024x8 	1280x1024x16 	1280x1024x32 
  1600x1200x8 	1600x1200x16 	1600x1200x32 	1600x1200x8 
  1600x1200x16 	1600x1200x32 	640x480x8 	640x480x16 
  640x480x32 	800x600x8 	800x600x16 	800x600x32 
  800x600x8 	800x600x16 	800x600x32 	1024x768x8 
  1024x768x16 	1024x768x32 	1024x768x8 	1024x768x16 
  1024x768x32 	1280x960x8 	1280x960x16 	1280x960x32 
  1280x960x8 	1280x960x16 	1280x960x32 	1280x960x8 
  1280x960x16 	1280x960x32 	1600x1024x8 	1600x1024x16 
  1600x1024x32 	1920x1080x8 	1920x1080x16 	1920x1080x32
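A quick way to confirm the "duplicate entries" observation is to count how often each mode appears in the listing. The following is a minimal sketch, assuming the output of `screenresolution list` has been captured as text in the format shown above; the helper names (`count_modes`, `duplicated_modes`) are made up for illustration and are not part of the screenresolution tool.

```python
import re
from collections import Counter

# Matches entries such as "1280x1024x32" (width x height x bit depth).
MODE_RE = re.compile(r"\b(\d+)x(\d+)x(\d+)\b")


def count_modes(list_output: str) -> Counter:
    """Count how often each (width, height, depth) mode appears in the text."""
    return Counter(
        (int(w), int(h), int(d)) for w, h, d in MODE_RE.findall(list_output)
    )


def duplicated_modes(list_output: str) -> dict:
    """Return only the modes that are listed more than once, with their counts."""
    return {mode: n for mode, n in count_modes(list_output).items() if n > 1}


if __name__ == "__main__":
    # First two rows of the listing above, used as sample input.
    sample = (
        "Available Modes on Display 0\n"
        "  640x480x8 \t640x480x16 \t640x480x32 \t640x480x8\n"
        "  640x480x16 \t640x480x32 \t800x600x8 \t800x600x16\n"
    )
    for mode, n in sorted(duplicated_modes(sample).items()):
        print("%dx%dx%d listed %d times" % (mode + (n,)))
```

Running the same count on a known-good slave and comparing the two results would show whether the extra entries are new modes or only repeats.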
colo-trip: --- → scl1
According to https://tbpl.mozilla.org/php/getParsedLog.php?id=6904079&full=1&branch=mozilla-inbound, talos-r4-snow-073 is in the same state (the key failure being "test_popup_attribute.xul | popup tests are likely to fail for screen heights less than 768 pixels" which says out of all those choices, it apparently picked either 800x600 or 640x480 to try to run tests under).
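The 768-pixel threshold in that failure message suggests a simple pre-flight check a slave could run before starting tests. This is only a sketch, assuming the output of `screenresolution get` has been captured as text in the `Display 0: 1600x1200x32` format shown earlier; `parse_current` and `too_short` are hypothetical helper names, and the check only sees what the tool reports, which may not match what the tests actually observe.

```python
import re

# Threshold quoted in the test_popup_attribute.xul failure message.
MIN_HEIGHT = 768

# Matches lines such as "Display 0: 1600x1200x32".
GET_RE = re.compile(r"Display\s+(\d+):\s*(\d+)x(\d+)x(\d+)")


def parse_current(get_output: str) -> dict:
    """Map display index to (width, height, depth) from captured `get` output."""
    return {
        int(i): (int(w), int(h), int(d))
        for i, w, h, d in GET_RE.findall(get_output)
    }


def too_short(get_output: str, min_height: int = MIN_HEIGHT) -> list:
    """Return the display indices whose reported height is below min_height."""
    current = parse_current(get_output)
    return [i for i in sorted(current) if current[i][1] < min_height]


if __name__ == "__main__":
    print(too_short("Display 0: 1600x1200x32\n"))  # reported height 1200
    print(too_short("Display 0: 800x600x32\n"))    # reported height 600
```

A non-empty result would be a reason to keep the slave out of the pool until its resolution is sorted out.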
colo-trip: scl1 → ---
Summary: please help diagnose talos-r4-snow-051 → please help diagnose talos-r4-snow-051 and 073
don't know how i managed to remove the colo-trip flag
colo-trip: --- → scl1
talos-r4-snow-018 is also showing this problem.  It is showing the list of resolutions expected when the dongle is not installed.
Summary: please help diagnose talos-r4-snow-051 and 073 → please help diagnose talos-r4-snow-051 and 073 and 018
talos-r4-snow-052 (based on https://tbpl.mozilla.org/php/getParsedLog.php?id=6939072&tree=Try&full=1)
talos-r4-snow-074 (based on https://tbpl.mozilla.org/php/getParsedLog.php?id=6939478&tree=Try)
talos-r4-snow-034 (based on https://tbpl.mozilla.org/php/getParsedLog.php?id=6939069&tree=Try&full=1)

Too many slaves are getting into this state to leave them harassing people who have no one to explain what they are, so I've rehidden the rev4 jobs on Try.
So there are two problems here - the huge list of screen resolutions, and the dongle-free list of screen resolutions.

The most obvious thing to try would be to verify that this state survives a reboot, and then try swapping dongles to determine whether it's a bad dongle or is associated with the mini.
Severity: normal → major
Depends on: 695926
We'll do the relops part in bug 695926
Assignee: server-ops-releng → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Summary: please help diagnose talos-r4-snow-051 and 073 and 018 → [tracker] screen resolution problems on r4 minis
This bug tracks the symptoms.  Bugs that it depends on track possible solutions.
Summary: [tracker] screen resolution problems on r4 minis → resolution problems with talos-r4-snow slaves
Depends on: 695930
So, I have no clue why the list in comment 2 is the way it is.  The list of resolutions is the same on all slaves, other than slaves from bug 695926 which are missing dongles.

The slaves we have seen exhibit this issue are:

018
034
051
052
073
074

Seeing two pairs of machines that are consecutively numbered makes me wonder if it has something to do with placement in the rack.  I also wonder if they are shorting out to each other?  Maybe they are independently shorting out to the rack?

Should we try covering all of the dongle's exposed metal bits with electrical tape?  I don't have any more ideas as to what could be going wrong here.
Missed a couple: 
018
023
034
051
052
069
073
074

It's worth noting that there is a different graphics chipset (GeForce 9400M vs 320M) in these machines vs the previous generation.
The whole list is

talos-r4-snow-011     *
talos-r4-snow-018     *
talos-r4-snow-023
talos-r4-snow-034
talos-r4-snow-051
talos-r4-snow-052     *
talos-r4-snow-061     *
talos-r4-snow-069     *
talos-r4-snow-073
talos-r4-snow-074     *

Indeed, 73/74 are next to each other, as are 51/52, but the rest aren't.  The stars indicate hosts that *currently* have short resolution lists and an incorrect resolution (bug 695926).  The other four hosts are reported in this bug, but seem correct from everything 'screenresolution' can show me.

We can probably fix the starred hosts.  John, can you do some more investigation of the remaining four?  Are they misreported (sorry philor)?  Are they only failing on some boots?  Without further characterizing those failures, there's not much we can do to fix them.
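The adjacency observation can be checked mechanically against the full list. A small sketch, assuming (as the discussion does) that consecutive slave numbers correspond to neighbouring positions in the rack; `consecutive_pairs` is a name made up for this example.

```python
def consecutive_pairs(numbers):
    """Return (a, b) pairs from the input where b == a + 1."""
    ordered = sorted(set(numbers))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a == 1]


if __name__ == "__main__":
    # Numeric suffixes of the talos-r4-snow hosts in the full list above.
    affected = [11, 18, 23, 34, 51, 52, 61, 69, 73, 74]
    print(consecutive_pairs(affected))  # [(51, 52), (73, 74)]
```

Only two of the ten hosts pair up this way, which matches the point that most of the affected machines are not neighbours.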
They can most certainly be misdiagnosed on my part: there's only one test in one suite which actually tells me "this won't go well with that low a resolution." Resolution became the hammer once focus stopped being the hammer post-wifi and post-software update.

However, 023 is in your list as being fine for resolution, and the log link I pasted includes https://tbpl.mozilla.org/php/getParsedLog.php?id=6941407&tree=Mozilla-Inbound&full=1#error36 which is that test, "test_popup_attribute.xul | popup tests are likely to fail for screen heights less than 768 pixels."  Either we have two resolution issues (one which persists across reboots and admits to being bad through screenresolution, and another which does not do one or both of those things), or that message actually means something other than the second number in the resolution (like window height rather than screen height).  Either way, at least some of the unstarred slaves do still have to do with resolution.
I'd bet on some yet-another focus problem with talos-r4-snow-043, based on https://tbpl.mozilla.org/php/getParsedLog.php?id=6941772&tree=Mozilla-Inbound&full=1, but by now I don't even know where to place that bet.
Wound up putting my 043 bet in bug 695976.
Component: Release Engineering → Server Operations: RelEng
colo-trip: scl1 → ---
Component: Server Operations: RelEng → Release Engineering
john, can you prioritize and tag this bug
Assignee: nobody → jhford
Whiteboard: [buildslaves][talos]
talos-r4-snow-064, claiming 1600x1200x32, nevertheless failed "test_popup_attribute.xul | popup tests are likely to fail for screen heights less than 768 pixels" and the other 228 things in that set of failures, in https://tbpl.mozilla.org/php/getParsedLog.php?id=7842168&full=1&branch=mozilla-beta
Depends on: 709436
https://tbpl.mozilla.org/php/getParsedLog.php?id=8078142&tree=Mozilla-Release&full=1 is that charmer talos-r4-snow-069 doing the push before tagging 9.0.1, choosing 800x600 on a mochitest-other to maximize the number of failures (270).
DVI doctors are installed everywhere now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering