Closed Bug 789822 Opened 13 years ago Closed 12 years ago

Intermittent xperf hangs running tp5n (command timed out: 3600 seconds without output, attempting to kill)

Categories

(Testing :: Talos, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(firefox19 affected)

RESOLVED WORKSFORME
Tracking Status
firefox19 --- affected

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [purple])

I can't actually say for sure that it's because of the tp5n switch, since that happened at the same time as the switch to run it on inbound, and until then it didn't run very often and ran on a tree I rarely look at, so it could have been hanging and just getting retriggered and starred as infra (since a hang on Windows is the wrong color because we can't be bothered to fix trying and failing to crash the hung process to get a stack).

https://tbpl.mozilla.org/php/getParsedLog.php?id=15065985&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound talos xperf on 2012-09-07 18:31:13 PDT for push e0c765e8be0f
slave: talos-r3-w7-045

Running test tp5n: Started Fri, 07 Sep 2012 18:38:20
command timed out: 3600 seconds without output, attempting to kill
SIGKILL failed to kill process

https://tbpl.mozilla.org/php/getParsedLog.php?id=15079628&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound talos xperf on 2012-09-08 14:22:21 PDT for push 8d2e26084434
slave: talos-r3-w7-009

Running test tp5n: Started Sat, 08 Sep 2012 14:24:29
command timed out: 3600 seconds without output, attempting to kill
SIGKILL failed to kill process
https://tbpl.mozilla.org/php/getParsedLog.php?id=15059369&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound pgo talos xperf on 2012-09-07 14:41:34 PDT for push 58bebcfa82af
slave: talos-r3-w7-045
Ah, what fun! Another bug because we're not on mozprocess yet. :jmaher, in the interim, we should probably make the error message better so that we can at least see what it is failing to kill.
Whiteboard: [orange][purple] → [purple]
Summary: Intermittent xperf hangs running tp5n → Intermittent xperf hangs running tp5n (command timed out: 3600 seconds without output, attempting to kill)
Jeff, please can you take a look / else find an owner :-)
Flags: needinfo?(jhammel)
ctalbert, do you know who should own this?
Flags: needinfo?(jhammel) → needinfo?(ctalbert)
looking at this in orange factor there are 4 windows machines which cause this: talos-r3-w7-009 talos-r3-w7-027 talos-r3-w7-033 talos-r3-w7-045
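As a rough illustration of how failures can be grouped by machine, here is a small sketch that tallies the `slave:` field from log summary lines. The helper name `count_failures_by_slave` is hypothetical, and the sample input is only the three log summaries quoted earlier in this bug, not the full OrangeFactor data set that identified all four machines.

```python
import re
from collections import Counter

# Matches the "slave: <hostname>" field as it appears in the TBPL log summaries.
SLAVE_RE = re.compile(r"slave:\s*(\S+)")


def count_failures_by_slave(log_summaries):
    """Return a Counter mapping slave hostname to number of starred failures."""
    return Counter(
        match.group(1)
        for text in log_summaries
        for match in SLAVE_RE.finditer(text)
    )


summaries = [
    "Rev3 WINNT 6.1 mozilla-inbound talos xperf on 2012-09-07 18:31:13 PDT "
    "for push e0c765e8be0f slave: talos-r3-w7-045",
    "Rev3 WINNT 6.1 mozilla-inbound talos xperf on 2012-09-08 14:22:21 PDT "
    "for push 8d2e26084434 slave: talos-r3-w7-009",
    "Rev3 WINNT 6.1 mozilla-inbound pgo talos xperf on 2012-09-07 14:41:34 PDT "
    "for push 58bebcfa82af slave: talos-r3-w7-045",
]

for slave, count in count_failures_by_slave(summaries).most_common():
    print(slave, count)
```

A failure that clusters on a handful of hostnames, as it does here, points at the machines rather than at the test itself.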
Flags: needinfo?(ctalbert)
Depends on: 848658
(In reply to Joel Maher (:jmaher) from comment #239)
> looking at this in orange factor there are 4 windows machines which cause
> this:
> talos-r3-w7-009
> talos-r3-w7-027
> talos-r3-w7-033
> talos-r3-w7-045

Good spot, thank you :-)
Have disabled those machines in slavealloc (see bug 848658 comment 2).
We haven't seen this reported in the last 5 days; 5 days ago we pulled these 4 problematic machines from production.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to TinderboxPushlog Robot from comment #245)
> slave: t-w732-ix-110

Seems to have an otherwise OK record.
Note xperf has not been run since the switchover to iX machines (bug 878858).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
While this is happening it is because we are on new machines without the xperf toolchain. So this really has nothing to do with xperf problems. With that said, we should check if xperf toolchain is available and error out with a nice fancy message!
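A minimal sketch of such a pre-flight check, assuming the `xperf` executable is expected to be discoverable on `PATH` (the real harness may locate it through a configured path instead). `ToolchainMissing` and `require_tool` are hypothetical names, not Talos APIs.

```python
import shutil


class ToolchainMissing(Exception):
    """Raised when a required external tool cannot be found (hypothetical)."""


def require_tool(name="xperf"):
    """Return the full path to the tool, or fail fast with a readable message."""
    path = shutil.which(name)
    if path is None:
        raise ToolchainMissing(
            "%s was not found on PATH; the xperf toolchain does not appear to be "
            "installed on this machine, so the test cannot run" % name
        )
    return path
```

Calling `require_tool()` before starting the test would turn an hour-long silent hang into an immediate, clearly labelled failure.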
Depends on: 880174
(In reply to Joel Maher (:jmaher) from comment #252)
> While this is happening it is because we are on new machines without the
> xperf toolchain. So this really has nothing to do with xperf problems.
> With that said, we should check if xperf toolchain is available and error
> out with a nice fancy message!

Yeah agreed - filed bug 880174 :-)
Depends on: 879879
(I'll continue starring into this bug for now since TBPL won't match against the others)
we need to hide xperf, it is only on m-c and try, so it should be easy. Once we have the tooling in place, then we can unhide and star like a solar system.
(In reply to Joel Maher (:jmaher) from comment #255)
> we need to hide xperf, it is only on m-c and try, so it should be easy.
> Once we have the tooling in place, then we can unhide and star like a solar
> system.

Done :-) Bug 880192 for unhiding once fixed.
Makes you wonder where 002 was, when the ability to run xperf was being handed out, doesn't it?
We haven't seen this for 3 months; I had updated the toolchain in early July, but all of these are t-w732-ix-002. I am fine closing this and reopening it if we get this timeout.
Depends on: t-w732-ix-002
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → WORKSFORME