[MDC1] t-yosemite-r7-357 problem running jobs
Categories
(Infrastructure & Operations :: RelOps: Hardware, task)
Tracking
(Not tracked)
People
(Reporter: apop, Unassigned)
The machine is currently quarantined. Its last jobs finished with an exception status: https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-357
I found the following lines in Papertrail:
Mar 18 14:17:42 t-yosemite-r7-357 kernel: firefox (map: 0xffffff8033e8eb40) triggered DYLD shared region unnest for map: 0xffffff8033e8eb40, region 0x7fff92c00000->0x7fff92e00000. While not abnormal for debuggers, this increases system memory footprint until the target exits.
Mar 18 14:17:46 t-yosemite-r7-357 kernel: process Python[1492] caught causing excessive wakeups. Observed wakeups rate (per sec): 155; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45007
Mar 18 14:17:46 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.ReportCrash[2039]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.ReportCrash
Mar 18 14:17:46 t-yosemite-r7-357 kernel: CODE SIGNING: cs_invalid_page(0x10bbd5000): p=2040[spindump] final status 0x2000000, allowing (remove VALID) page
Mar 18 14:17:47 t-yosemite-r7-357 kernel: firefox (map: 0xffffff8033e8eb40) triggered DYLD shared region unnest for map: 0xffffff8033e8eb40, region 0x7fff92c00000->0x7fff92e00000. While not abnormal for debuggers, this increases system memory footprint until the target exits.
The final entries:
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Clearing AirPlay peer 00:00:00:00:00:00
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Clearing AirPlay peer 00:00:00:00:00:00
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Can't clear peer traffic registration, ret 82
Mar 18 20:00:24 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.imfoundation.IMRemoteURLConnectionAgent): The _DirtyJetsamMemoryLimit key is not available on this platform.
Mar 18 20:00:24 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.imfoundation.IMRemoteURLConnectionAgent): The _DirtyJetsamMemoryLimit key is not available on this platform.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.DataDetectorsDynamicData): The JoinExistingSession key is only available to Application services.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.FileSyncAgent.PHD.isRunning): The HideUntilCheckIn property is an architectural performance issue. Please transition away from it.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.mbloginhelper.user): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.mbpluginhost.user): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.secd): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.secd): The ServiceIPC key is no longer respected. Please remove it.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.speech.speechsynthesisd): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.TrustEvaluationAgent): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.appkit.xpc.sandboxedServiceRunner): The JoinExistingSession key is only available to Application services.
Dave, can you please take a look?
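For anyone triaging similar workers, here is a minimal sketch of how the kernel "excessive wakeups" line quoted above could be pulled out of a Papertrail export. The function name `find_excessive_wakeups` and the regular expression are illustrative assumptions based only on the single log line in this bug; they are not part of any existing tooling, and other macOS versions may word the message differently.

```python
import re

# Assumed pattern, derived from the one kernel log line quoted in this bug.
WAKEUPS_RE = re.compile(
    r"process (?P<name>\S+)\[(?P<pid>\d+)\] caught causing excessive wakeups\. "
    r"Observed wakeups rate \(per sec\): (?P<observed>\d+); "
    r"Maximum permitted wakeups rate \(per sec\): (?P<limit>\d+)"
)


def find_excessive_wakeups(lines):
    """Return (process, pid, observed_rate, permitted_rate) per matching line."""
    hits = []
    for line in lines:
        match = WAKEUPS_RE.search(line)
        if match:
            hits.append(
                (
                    match["name"],
                    int(match["pid"]),
                    int(match["observed"]),
                    int(match["limit"]),
                )
            )
    return hits


# The log line from this bug, used as sample input.
sample = (
    "Mar 18 14:17:46 t-yosemite-r7-357 kernel: process Python[1492] caught "
    "causing excessive wakeups. Observed wakeups rate (per sec): 155; "
    "Maximum permitted wakeups rate (per sec): 150; Observation period: "
    "300 seconds; Task lifetime number of wakeups: 45007"
)

for name, pid, observed, limit in find_excessive_wakeups([sample]):
    print(f"{name}[{pid}]: {observed}/s observed, {limit}/s permitted")
```

This only extracts the numbers; whether the wakeups warning is related to the job exceptions is not established in this bug.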
Updated•6 years ago
Comment 1•6 years ago
Reimaged; however, it is running multiple jobs.
I cycled the power for this machine and it did not come up (no response to ping/ssh and no logs appear in papertrail).
I've powered it off for a few days and will power it back up after that to see if it can complete tasks.
Comment 3•6 years ago
The machine is still down, as the following ping shows:
Pinging t-yosemite-r7-357.test.releng.mdc1.mozilla.com [10.49.56.141] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 10.49.56.141:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
Dave, can you please power it on? Then let's check whether the issue persists.
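As a rough sketch, the "still down" check above could be automated by parsing the statistics line of the Windows-style ping output pasted in this comment. The helper names `parse_ping_stats` and `host_is_down` are hypothetical, and the pattern assumes the English "Sent = …, Received = …, Lost = …" wording shown here; other ping implementations and locales format this line differently.

```python
import re

# Assumed format: the English Windows ping summary line quoted above.
STATS_RE = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_stats(output):
    """Return sent/received/lost counts, or None if no summary line is found."""
    match = STATS_RE.search(output)
    if match is None:
        return None
    sent, received, lost = (int(value) for value in match.groups())
    return {"sent": sent, "received": received, "lost": lost}


def host_is_down(output):
    """Treat the host as down only if packets were sent and none came back."""
    stats = parse_ping_stats(output)
    return stats is not None and stats["sent"] > 0 and stats["received"] == 0


# The summary from this bug, used as sample input.
ping_output = """Ping statistics for 10.49.56.141:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),"""

print(parse_ping_stats(ping_output))
print(host_is_down(ping_output))
```

A missing summary line returns `None` rather than "down", so a truncated paste is not mistaken for an unreachable host.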
Updated•6 years ago
It also looks like it has been crashing recently:
[dhouse@t-mojave-r7-357.test.releng.mdc1.mozilla.com ~]$ last -20
dhouse ttys000 10.49.48.101 Tue Nov 19 14:42 still logged in
task_1574166055 console Tue Nov 19 14:32 still logged in
reboot ~ Tue Nov 19 14:32
task_1574166055 console Tue Nov 19 12:21 - crash (02:11)
reboot ~ Tue Nov 19 12:21
shutdown ~ Tue Nov 19 12:20
reboot ~ Tue Nov 19 12:20
cltbld console Tue Nov 19 06:11 - crash (06:09)
reboot ~ Tue Nov 19 06:11
shutdown ~ Tue Nov 19 06:11
dhouse ttys000 10.49.48.101 Tue Nov 19 06:08 - shutdown (00:02)
cltbld console Tue Nov 19 05:41 - shutdown (00:30)
reboot ~ Tue Nov 19 05:41
shutdown ~ Tue Nov 19 05:41
dhouse ttys000 10.49.48.101 Tue Nov 19 05:01 - shutdown (00:39)
cltbld console Tue Nov 19 04:53 - shutdown (00:47)
reboot ~ Tue Nov 19 04:53
cltbld console Wed Nov 13 06:25 - crash (5+22:28)
reboot ~ Wed Nov 13 06:25
I've kicked off a reimage, and I'll test if it keeps crashing.
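To compare before and after the reimage, a small sketch like the one below could tally the `last` output shown above. The function name `summarize_last` is hypothetical, and the parsing assumes the column layout seen in this paste, where a session that ended in a crash has the literal word `crash` as its own field and reboot records start with `reboot`.

```python
def summarize_last(output):
    """Count reboot records and sessions that ended in a crash in `last` output."""
    reboots = 0
    crashes = 0
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "reboot":
            reboots += 1
        elif "crash" in fields:
            crashes += 1
    return {"reboots": reboots, "crashes": crashes}


# A few lines copied from the output in this bug, used as sample input.
last_output = """task_1574166055 console Tue Nov 19 12:21 - crash (02:11)
reboot ~ Tue Nov 19 12:21
shutdown ~ Tue Nov 19 12:20
cltbld console Tue Nov 19 06:11 - crash (06:09)"""

print(summarize_last(last_output))
```

Running this on the worker before and after the reimage would give a quick count to compare, though it says nothing about why the sessions crashed.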
https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1014/workers/mdc1/t-mojave-r7-357
It crashed on a few tasks today.
I'll move this to staging for now. If it fails further, we can decommission it.