Closed Bug 1536400 Opened 6 years ago Closed 3 years ago

[MDC1] t-yosemite-r7-357 problem running jobs

Categories

(Infrastructure & Operations :: RelOps: Hardware, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: apop, Unassigned)

Attachments

(1 file)

The machine is currently quarantined. Its last jobs finished as exceptions: https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-357

I found the following lines in Papertrail:

Mar 18 14:17:42 t-yosemite-r7-357 kernel: firefox (map: 0xffffff8033e8eb40) triggered DYLD shared region unnest for map: 0xffffff8033e8eb40, region 0x7fff92c00000->0x7fff92e00000. While not abnormal for debuggers, this increases system memory footprint until the target exits.
Mar 18 14:17:46 t-yosemite-r7-357 kernel: process Python[1492] caught causing excessive wakeups. Observed wakeups rate (per sec): 155; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45007
Mar 18 14:17:46 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.ReportCrash[2039]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.ReportCrash
Mar 18 14:17:46 t-yosemite-r7-357 kernel: CODE SIGNING: cs_invalid_page(0x10bbd5000): p=2040[spindump] final status 0x2000000, allowing (remove VALID) page
Mar 18 14:17:47 t-yosemite-r7-357 kernel: firefox (map: 0xffffff8033e8eb40) triggered DYLD shared region unnest for map: 0xffffff8033e8eb40, region 0x7fff92c00000->0x7fff92e00000. While not abnormal for debuggers, this increases system memory footprint until the target exits.

The final entries:
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Clearing AirPlay peer 00:00:00:00:00:00
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Clearing AirPlay peer 00:00:00:00:00:00
Mar 18 19:59:54 t-yosemite-r7-357 kernel: IO80211AWDLPeerManager::setAwdlPeerTrafficRegistration Can't clear peer traffic registration, ret 82
Mar 18 20:00:24 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.imfoundation.IMRemoteURLConnectionAgent): The _DirtyJetsamMemoryLimit key is not available on this platform.
Mar 18 20:00:24 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.imfoundation.IMRemoteURLConnectionAgent): The _DirtyJetsamMemoryLimit key is not available on this platform.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.DataDetectorsDynamicData): The JoinExistingSession key is only available to Application services.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.FileSyncAgent.PHD.isRunning): The HideUntilCheckIn property is an architectural performance issue. Please transition away from it.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.mbloginhelper.user): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.mbpluginhost.user): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.secd): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.secd): The ServiceIPC key is no longer respected. Please remove it.
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.speech.speechsynthesisd): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.TrustEvaluationAgent): This key does not do anything: OnDemand
Mar 18 20:00:32 t-yosemite-r7-357 com.apple.xpc.launchd: (com.apple.appkit.xpc.sandboxedServiceRunner): The JoinExistingSession key is only available to Application services.

Dave, can you please take a look?

Flags: needinfo?(dhouse)
Attached image 357.PNG

The machine was reimaged; however, it is running multiple jobs.

I cycled the power for this machine and it did not come up (no response to ping/ssh and no logs appear in papertrail).

I've powered it off for a few days and will power it back up after that to see if it can complete tasks.

The machine is still down, as the following ping output shows:

Pinging t-yosemite-r7-357.test.releng.mdc1.mozilla.com [10.49.56.141] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.49.56.141:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
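
For repeated checks like this one, the loss figure can be pulled out of the ping summary automatically. The following is a minimal sketch in Python, assuming the Windows-style summary line shown above; `parse_ping_loss` is a hypothetical helper and is not part of any tooling mentioned in this bug.

```python
import re

# Hypothetical helper (illustration only): extracts the packet counts from a
# Windows-style ping summary line such as the one pasted above.
def parse_ping_loss(ping_output):
    """Return a dict of packet counts, or None if no summary line is found."""
    match = re.search(
        r"Sent = (\d+), Received = (\d+), Lost = (\d+) \((\d+)% loss\)",
        ping_output,
    )
    if match is None:
        return None
    sent, received, lost, loss_percent = (int(value) for value in match.groups())
    return {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_percent": loss_percent,
    }

# Summary line copied from the ping output in this comment.
SAMPLE = "Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),"

if __name__ == "__main__":
    stats = parse_ping_loss(SAMPLE)
    # A host that answers no requests at all reports 100% loss.
    print("host unreachable" if stats and stats["loss_percent"] == 100 else stats)
```

A result of `None` means the text contained no recognizable summary line, which is kept distinct from a host that reports 100% loss.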

Dave, can you please power it on? Then let's check whether the issue persists.

Assignee: relops → nobody
Component: RelOps: General → RelOps: Hardware
QA Contact: klibby

It looks like it has also been crashing recently:

[dhouse@t-mojave-r7-357.test.releng.mdc1.mozilla.com ~]$ last -20
dhouse    ttys000  10.49.48.101     Tue Nov 19 14:42   still logged in
task_1574166055  console                   Tue Nov 19 14:32   still logged in
reboot    ~                         Tue Nov 19 14:32
task_1574166055  console                   Tue Nov 19 12:21 - crash  (02:11)
reboot    ~                         Tue Nov 19 12:21
shutdown  ~                         Tue Nov 19 12:20
reboot    ~                         Tue Nov 19 12:20
cltbld    console                   Tue Nov 19 06:11 - crash  (06:09)
reboot    ~                         Tue Nov 19 06:11
shutdown  ~                         Tue Nov 19 06:11
dhouse    ttys000  10.49.48.101     Tue Nov 19 06:08 - shutdown  (00:02)
cltbld    console                   Tue Nov 19 05:41 - shutdown  (00:30)
reboot    ~                         Tue Nov 19 05:41
shutdown  ~                         Tue Nov 19 05:41
dhouse    ttys000  10.49.48.101     Tue Nov 19 05:01 - shutdown  (00:39)
cltbld    console                   Tue Nov 19 04:53 - shutdown  (00:47)
reboot    ~                         Tue Nov 19 04:53
cltbld    console                   Wed Nov 13 06:25 - crash (5+22:28)
reboot    ~                         Wed Nov 13 06:25

I've kicked off a reimage, and I'll test if it keeps crashing.
https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1014/workers/mdc1/t-mojave-r7-357

It crashed on a few tasks today.
I'll move this to staging for now. If it fails further, we can decommission it.

Flags: needinfo?(dhouse)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED