Closed Bug 786914. Opened 13 years ago, closed 13 years ago.

Many test slaves not taking jobs

Categories

(Release Engineering :: General, defect, P1)

x86_64
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: nthomas)

Details

(Whiteboard: [buildduty][capacity])

Currently, around 1000 of the 3000 pending jobs are 10.8 tests, going back about five hours. Only seven jobs are running, so there must be, what, 73 or 74 slaves not taking jobs? One possibility, though I don't know how to judge how likely it is, is that those 73 think their basedir is C:\talos-slave.
As of the 2012-08-30 02:00:04 copy of http://build.mozilla.org/builds/last-job-per-slave.html#talos:

Done work in last hour:
talos-mtnlion-r5-006 talos-mtnlion-r5-014 talos-mtnlion-r5-015 talos-mtnlion-r5-017 talos-mtnlion-r5-018 talos-mtnlion-r5-053 talos-mtnlion-r5-082

Last job completed about 12.5 hours ago:
talos-mtnlion-r5-004 talos-mtnlion-r5-005 talos-mtnlion-r5-007 talos-mtnlion-r5-008 talos-mtnlion-r5-009 talos-mtnlion-r5-011 talos-mtnlion-r5-012 talos-mtnlion-r5-013 talos-mtnlion-r5-016 talos-mtnlion-r5-019 talos-mtnlion-r5-021 talos-mtnlion-r5-023 talos-mtnlion-r5-024 talos-mtnlion-r5-025 talos-mtnlion-r5-026 talos-mtnlion-r5-027 talos-mtnlion-r5-028 talos-mtnlion-r5-029 talos-mtnlion-r5-037 talos-mtnlion-r5-041 talos-mtnlion-r5-042 talos-mtnlion-r5-043 talos-mtnlion-r5-044 talos-mtnlion-r5-045 talos-mtnlion-r5-046 talos-mtnlion-r5-047 talos-mtnlion-r5-048 talos-mtnlion-r5-049 talos-mtnlion-r5-050 talos-mtnlion-r5-051 talos-mtnlion-r5-052 talos-mtnlion-r5-054 talos-mtnlion-r5-055 talos-mtnlion-r5-056 talos-mtnlion-r5-057 talos-mtnlion-r5-058 talos-mtnlion-r5-059 talos-mtnlion-r5-060 talos-mtnlion-r5-076 talos-mtnlion-r5-081 talos-mtnlion-r5-083 talos-mtnlion-r5-084 talos-mtnlion-r5-085 talos-mtnlion-r5-086 talos-mtnlion-r5-088 talos-mtnlion-r5-089

Never done a job:
talos-mtnlion-r5-020 talos-mtnlion-r5-030 talos-mtnlion-r5-031 talos-mtnlion-r5-032 talos-mtnlion-r5-033 talos-mtnlion-r5-034 talos-mtnlion-r5-035 talos-mtnlion-r5-036 talos-mtnlion-r5-038 talos-mtnlion-r5-039 talos-mtnlion-r5-040 talos-mtnlion-r5-061 talos-mtnlion-r5-062 talos-mtnlion-r5-063 talos-mtnlion-r5-064 talos-mtnlion-r5-065 talos-mtnlion-r5-066 talos-mtnlion-r5-067 talos-mtnlion-r5-068 talos-mtnlion-r5-069 talos-mtnlion-r5-070 talos-mtnlion-r5-071 talos-mtnlion-r5-072 talos-mtnlion-r5-073 talos-mtnlion-r5-074 talos-mtnlion-r5-075 talos-mtnlion-r5-077 talos-mtnlion-r5-078 talos-mtnlion-r5-079 talos-mtnlion-r5-080 talos-mtnlion-r5-087 talos-mtnlion-r5-090
(In reply to Nick Thomas [:nthomas] from comment #1)
> Last job completed about 12.5 hours ago:

There are many slaves of other OSes in this state too: talos-r4-snow, talos-r4-leopard, talos-r3-fed*. This will be fallout from bug 786807.

> Never done a job:

These seem to have uptimes of 6 days and look like they never got rebooted after 10.8 was enabled in production. Only the staging slaves and 022 are disabled in slavealloc.
talos-r4-snow-* and talos-mtnlion-r5-* (where uptime > 1 hour) rebooted
talos-r3-fed-* and talos-r3-fed64-* done.
Assignee: nobody → nthomas
Priority: -- → P1
Summary: Nearly every 10.8 slave is not taking jobs → Many test slaves not taking jobs
talos-r4-lion-* done too. It turns out talos-mtnlion-r5-080 wasn't ready for production (bug 786993) and burned ~250 builds when hg wasn't working. talos-mtnlion-r5-087 (bug 786994) also had issues; there might be more.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Today I went through the last-build-per-slave list for all the mtnlion machines and opened bugs for, or reimaged/rebooted, the problematic ones. I also updated https://wiki.mozilla.org/ReferencePlatforms/HowToSetupNewPlatform to say that this list is something you should watch after a new platform is put into production, to catch any wonky slaves.
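The check described above (reviewing each slave's last job after a new platform goes live) can be sketched roughly as follows. This is a minimal illustration, not the code behind last-job-per-slave.html: it assumes a hypothetical input, a dict mapping each slave name to the end time of its last completed job (None if it never took one), and the bucket names and one-hour threshold are illustrative choices mirroring the three groups in the report above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def bucket_slaves(
    last_job: Dict[str, Optional[datetime]],
    now: datetime,
    recent: timedelta = timedelta(hours=1),
) -> Dict[str, List[str]]:
    """Split slaves into 'recent', 'idle' and 'never' groups.

    last_job is a hypothetical input format: slave name -> end time of
    its last completed job, or None if it has never done a job.
    """
    buckets: Dict[str, List[str]] = {"recent": [], "idle": [], "never": []}
    for name in sorted(last_job):
        finished = last_job[name]
        if finished is None:
            # Never took a job, e.g. not rebooted since being enabled.
            buckets["never"].append(name)
        elif now - finished <= recent:
            # Did work within the threshold window.
            buckets["recent"].append(name)
        else:
            # Has worked before but has gone quiet: a candidate to check.
            buckets["idle"].append(name)
    return buckets


if __name__ == "__main__":
    now = datetime(2012, 8, 30, 2, 0, 4)
    report = bucket_slaves(
        {
            "talos-mtnlion-r5-006": now - timedelta(minutes=20),
            "talos-mtnlion-r5-004": now - timedelta(hours=12, minutes=30),
            "talos-mtnlion-r5-020": None,
        },
        now,
    )
    print(report)
```

Anything in the "idle" or "never" groups would then be a slave worth rebooting, reimaging, or filing a bug for.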
Product: mozilla.org → Release Engineering
Component: General Automation → General