Closed Bug 1725300 Opened 4 years ago Closed 4 years ago

Intermittent localized repacks Finished repackage step (success) after setting permissions

Categories

(Taskcluster :: Workers, defect)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: malexandru, Unassigned)

Details

(Keywords: intermittent-failure)

Failure log: https://treeherder.mozilla.org/logviewer?job_id=348164626&repo=mozilla-central&lineNumber=999

[task 2021-08-12T00:20:04.464Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/es.lproj/InfoPlist.strings
[task 2021-08-12T00:20:04.464Z] 00:20:04     INFO -  folder: /Firefox Nightly.app/Contents/Resources/defaults
[task 2021-08-12T00:20:04.465Z] 00:20:04     INFO -  Setting permissions to 040755 for /Firefox Nightly.app/Contents/Resources/defaults/
[task 2021-08-12T00:20:04.465Z] 00:20:04     INFO -  folder: /Firefox Nightly.app/Contents/Resources/defaults/pref
[task 2021-08-12T00:20:04.466Z] 00:20:04     INFO -  Setting permissions to 040755 for /Firefox Nightly.app/Contents/Resources/defaults/pref/
[task 2021-08-12T00:20:04.466Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/defaults/pref/channel-prefs.js
[task 2021-08-12T00:20:04.468Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/defaults/pref/channel-prefs.js
[task 2021-08-12T00:20:04.468Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/updater.ini
[task 2021-08-12T00:20:04.468Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/updater.ini
[task 2021-08-12T00:20:04.468Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/dependentlibs.list
[task 2021-08-12T00:20:04.469Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/dependentlibs.list
[task 2021-08-12T00:20:04.469Z] 00:20:04     INFO -  folder: /Firefox Nightly.app/Contents/Resources/META-INF
[task 2021-08-12T00:20:04.470Z] 00:20:04     INFO -  Setting permissions to 040755 for /Firefox Nightly.app/Contents/Resources/META-INF/
[task 2021-08-12T00:20:04.470Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/META-INF/mozilla.rsa
[task 2021-08-12T00:20:04.471Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/META-INF/mozilla.rsa
[task 2021-08-12T00:20:04.471Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/META-INF/cose.sig
[task 2021-08-12T00:20:04.472Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/META-INF/cose.sig
[task 2021-08-12T00:20:04.472Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/META-INF/mozilla.sf
[task 2021-08-12T00:20:04.474Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/META-INF/mozilla.sf
[task 2021-08-12T00:20:04.474Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/META-INF/manifest.mf
[task 2021-08-12T00:20:04.475Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/META-INF/manifest.mf
[task 2021-08-12T00:20:04.475Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/META-INF/cose.manifest
[task 2021-08-12T00:20:04.477Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/META-INF/cose.manifest
[task 2021-08-12T00:20:04.477Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/precomplete
[task 2021-08-12T00:20:04.478Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/Resources/precomplete
[task 2021-08-12T00:20:04.478Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/Resources/fix_stacks.py
[task 2021-08-12T00:20:04.479Z] 00:20:04     INFO -  Setting permissions to 100755 for /Firefox Nightly.app/Contents/Resources/fix_stacks.py
[task 2021-08-12T00:20:04.479Z] 00:20:04     INFO -  file: /Firefox Nightly.app/Contents/PkgInfo
[task 2021-08-12T00:20:04.479Z] 00:20:04     INFO -  Setting permissions to 100644 for /Firefox Nightly.app/Contents/PkgInfo
[task 2021-08-12T00:20:04.479Z] 00:20:04     INFO -  folder: /.background
[task 2021-08-12T00:20:04.480Z] 00:20:04     INFO -  Setting permissions to 040755 for /.background/
[task 2021-08-12T00:20:04.480Z] 00:20:04     INFO -  file: /.background/background.png
[task 2021-08-12T00:20:04.481Z] 00:20:04     INFO -  Setting permissions to 100644 for /.background/background.png
[task 2021-08-12T00:20:04.481Z] 00:20:04     INFO -  file: /.VolumeIcon.icns
[task 2021-08-12T00:20:04.484Z] 00:20:04     INFO -  Setting permissions to 100644 for /.VolumeIcon.icns
[task 2021-08-12T00:20:04.484Z] 00:20:04     INFO -  file: /.DS_Store
[task 2021-08-12T00:20:04.485Z] 00:20:04     INFO -  Setting permissions to 100644 for /.DS_Store
[task 2021-08-12T00:20:51.491Z] 00:20:51     INFO - Return code: 0
[task 2021-08-12T00:20:51.491Z] 00:20:51     INFO - [mozharness: 2021-08-12 00:20:51.491677Z] Finished repackage step (success)
[task 2021-08-12T00:20:51.542Z] cleanup
[task 2021-08-12T00:20:51.542Z] + cleanup
[task 2021-08-12T00:20:51.542Z] + local rv=0
[task 2021-08-12T00:20:51.542Z] + cleanup_xvfb
[task 2021-08-12T00:20:51.542Z] ++ pidof Xvfb
[task 2021-08-12T00:20:51.543Z] + local xvfb_pid=
[task 2021-08-12T00:20:51.543Z] + local vnc=false
[task 2021-08-12T00:20:51.543Z] + local interactive=false
[task 2021-08-12T00:20:51.543Z] + '[' -n '' ']'
[task 2021-08-12T00:20:51.544Z] + exit 0
[taskcluster 2021-08-12 00:20:51.871Z] === Task Finished ===
[taskcluster 2021-08-12 00:20:52.126Z] Artifact "public/build" not found at "/builds/worker/artifacts/"
[taskcluster 2021-08-12 00:20:58.586Z] Successful task run with exit code: 0 completed in 227.374 seconds

As far as what's running in the task is concerned, everything was fine. Up to taskcluster itself saying the task was successful on the last line.
And yet the task was marked failed. This has to be something wrong going on the taskcluster side.

Component: General → Workers
Product: Firefox Build System → Taskcluster

The Bugbug bot thinks this bug should belong to the 'Core::Permission Manager' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Workers → Permission Manager
Product: Taskcluster → Core
Component: Permission Manager → Workers
Product: Core → Taskcluster

I'm not sure what is going on, just capturing data while the logs are fresh.

The message Artifact "public/build" not found at "/builds/worker/artifacts/" is from Taskcluster. The code is attempting to log an error without failing the task.

https://github.com/taskcluster/taskcluster/blob/edfc885d73181034978ab765e432353b59343a65/workers/docker-worker/src/features/artifacts.js#L51

It failed on run 0, and completed on run 1. Here's Taskcluster's page for the task:

https://firefox-ci-tc.services.mozilla.com/tasks/AlmBSv_XQJakPg5IqvGaPg

The successful run 1 also has the error message:

[taskcluster 2021-08-12 00:44:47.534Z] Artifact "public/build" not found at "/builds/worker/artifacts/"
[taskcluster 2021-08-12 00:44:51.526Z] Successful task run with exit code: 0 completed in 236.595 seconds

Because both have this message, I suspect this error message is a dead end.

Skimming the logs, the only difference appears in the permissions section. These are in run 0 but not in run 1:

[task 2021-08-12T00:20:04.481Z] 00:20:04     INFO -  file: /.VolumeIcon.icns
[task 2021-08-12T00:20:04.484Z] 00:20:04     INFO -  Setting permissions to 100644 for /.VolumeIcon.icns
[task 2021-08-12T00:20:04.484Z] 00:20:04     INFO -  file: /.DS_Store
[task 2021-08-12T00:20:04.485Z] 00:20:04     INFO -  Setting permissions to 100644 for /.DS_Store

There are timestamps on each line that make comparison difficult, so there may be other differences.

Server logs from run 0 (Aug 12 00:20:53Z):

{
  "type": "Uploading public/build/es-MX/target.dmg",
  "source": "uploadToS3",
  "taskId": "AlmBSv_XQJakPg5IqvGaPg",
  "runId": 0,
  "artifactName": "public/build/es-MX/target.dmg",
  "putUrl": "https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/0/public/build/es-MX/target.dmg?AWSAccessKeyId=TheAccessKey&Content-Type=application%2Foctet-stream&Expires=1628730363&Signature=ASignature"
}
{
  "type": "task resolved",
  "source": "top",
  "provisionerId": "gecko-3",
  "workerId": "i-04df9a74bdef6376a",
  "workerGroup": "us-west-2",
  "workerType": "b-linux",
  "workerNodeType": "m5n.4xlarge",
  "taskId": "AlmBSv_XQJakPg5IqvGaPg",
  "runId": 0,
  "taskState": "failed"
}

From run 2 (Aug 12 00:44:47Z):

{
  "type": "Uploading public/build/es-MX/target.dmg",
  "source": "uploadToS3",
  "taskId": "AlmBSv_XQJakPg5IqvGaPg",
  "runId": 1,
  "artifactName": "public/build/es-MX/target.dmg",
  "putUrl": "https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/1/public/build/es-MX/target.dmg?AWSAccessKeyId=TheAccessKey&Content-Type=application%2Foctet-stream&Expires=1628731797&Signature=ADifferentSig"
}
{
  "type": "task resolved",
  "source": "top",
  "provisionerId": "gecko-3",
  "workerId": "i-0185ecc39307ae1d6",
  "workerGroup": "us-west-1",
  "workerType": "b-linux",
  "workerNodeType": "m5.4xlarge",
  "taskId": "AlmBSv_XQJakPg5IqvGaPg",
  "runId": 1,
  "taskState": "completed"
}

Looking at the workers:

  • run 0, i-04df9a74bdef6376a, us-west-2, created 2021-08-11T23:38:26.389Z, stopped 2021-08-12T01:19:49.965Z
  • run 1, i-0185ecc39307ae1d6, us-west-1, created 2021-08-12T00:24:19.792Z, stopped 2021-08-12T01:19:48.339Z

The lines about /.VolumeIcon.icns and /.DS_Store are in the successful run 1 as well, just further up the logs:

[task 2021-08-12T00:43:56.407Z] 00:43:56     INFO -  file: /.DS_Store
[task 2021-08-12T00:43:56.407Z] 00:43:56     INFO -  Setting permissions to 100644 for /.DS_Store
[task 2021-08-12T00:43:56.407Z] 00:43:56     INFO -  file: /.VolumeIcon.icns
[task 2021-08-12T00:43:56.411Z] 00:43:56     INFO -  Setting permissions to 100644 for /.VolumeIcon.icns

Whatever the failure is, I don't think it is in the task output.

Server logs around task finishing are more useful. For run 0:

Aug 12 00:20:58Z ip-10-144-62-24 docker-worker 2021/08/12 00:20:58 {"type":"Uploading public/logs/certified.log","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":0,"artifactName":"public/logs/certified.log","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/0/public/logs/certified.log?AWSAccessKeyId=[accessKey]&Content-Type=text%2Fplain&Expires=1628730368&Signature=[sig]"}
Aug 12 00:20:58Z ip-10-144-62-24 kernel [ 2541.806839] veth640c217: renamed from eth0
Aug 12 00:20:58Z ip-10-144-62-24 kernel [ 2541.898465] docker0: port 1(vethf65a2bf) entered disabled state
Aug 12 00:20:58Z ip-10-144-62-24 kernel [ 2541.929154] docker0: port 1(vethf65a2bf) entered disabled state
Aug 12 00:20:58Z ip-10-144-62-24 kernel [ 2541.930474] device vethf65a2bf left promiscuous mode
Aug 12 00:20:58Z ip-10-144-62-24 kernel [ 2541.930476] docker0: port 1(vethf65a2bf) entered disabled state
Aug 12 00:20:59Z ip-10-144-62-24 docker-worker
2021/08/12 00:20:59 {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":0,"artifactName":"public/logs/live_backing.log","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/0/public/logs/live_backing.log?AWSAccessKeyId=[accesskey]&Content-Type=text%2Fplain%3B%20charset%3Dutf-8&Expires=1628730369&Signature=[sig]"}
Aug 12 00:20:59Z ip-10-144-62-24 docker-worker 2021/08/12 00:20:59 {"type":"error killing states","source":"top","provisionerId":"gecko-3","workerId":"i-04df9a74bdef6376a","workerGroup":"us-west-2","workerType":"b-linux","workerNodeType":"m5n.4xlarge","error":"Could not kill states properly. Error: Error: Error calling 'killed' for chainOfTrust : Error [ERR_STREAM_WRITE_AFTER_END] [ERR_STREAM_WRITE_AFTER_END]: write after end\n    at writeAfterEnd (_stream_writable.js:253:14)\n    at PassThrough.Writable.write (_stream_writable.js:302:5)\n    at ChainOfTrust.killed (/home/ubuntu/docker-worker/src/lib/features/chain_of_trust.js:117:19)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at /home/ubuntu/docker-worker/src/lib/states.js:58:15\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at Task.run (/home/ubuntu/docker-worker/src/lib/task.js:992:7)\n    at Task.start (/home/ubuntu/docker-worker/src/lib/task.js:700:17)\n    at Tas
Aug 12 00:20:59Z ip-10-144-62-24 docker-worker kListener.runTaskset (/home/ubuntu/docker-worker/src/lib/task_listener.js:524:9)\n    at async Promise.all (index 0)"}
Aug 12 00:20:59Z ip-10-144-62-24 docker-worker
2021/08/12 00:20:59 {"type":"task resolved","source":"top","provisionerId":"gecko-3","workerId":"i-04df9a74bdef6376a","workerGroup":"us-west-2","workerType":"b-linux","workerNodeType":"m5n.4xlarge","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":0,"taskState":"failed"}

For run 1, the logs and chain of trust files are successfully uploaded:

Aug 12 00:44:51Z ip-10-143-44-11 docker-worker
2021/08/12 00:44:51 {"type":"Uploading public/logs/certified.log","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":1,"artifactName":"public/logs/certified.log","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/1/public/logs/certified.log?AWSAccessKeyId=[accesskey]&Content-Type=text%2Fplain&Expires=1628731801&Signature=[sig]"}
Aug 12 00:44:51Z ip-10-143-44-11 kernel [ 1220.719838] docker0: port 2(vethf0cc7c7) entered disabled state
Aug 12 00:44:51Z ip-10-143-44-11 kernel [ 1220.719879] vethce61901: renamed from eth0
Aug 12 00:44:51Z ip-10-143-44-11 kernel [ 1220.807207] docker0: port 2(vethf0cc7c7) entered disabled state
Aug 12 00:44:51Z ip-10-143-44-11 kernel [ 1220.808564] device vethf0cc7c7 left promiscuous mode
Aug 12 00:44:51Z ip-10-143-44-11 kernel [ 1220.808566] docker0: port 2(vethf0cc7c7) entered disabled state
Aug 12 00:44:51Z ip-10-143-44-11 docker-worker
2021/08/12 00:44:51 {"type":"Uploading public/chain-of-trust.json","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":1,"artifactName":"public/chain-of-trust.json","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/1/public/chain-of-trust.json?AWSAccessKeyId=[accesskey]&Content-Type=text%2Fplain&Expires=1628731801&Signature=[sig]"}
Aug 12 00:44:51Z ip-10-143-44-11 docker-worker
2021/08/12 00:44:51 {"type":"Uploading public/logs/live_backing.log","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":1,"artifactName":"public/logs/live_backing.log","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/1/public/logs/live_backing.log?AWSAccessKeyId=[accesskey]&Content-Type=text%2Fplain%3B%20charset%3Dutf-8&Expires=1628731801&Signature=[sig]"}
Aug 12 00:44:52Z ip-10-143-44-11 docker-worker
2021/08/12 00:44:52 {"type":"Uploading public/chain-of-trust.json.sig","source":"uploadToS3","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":1,"artifactName":"public/chain-of-trust.json.sig","putUrl":"https://firefoxcitc-public-artifacts.s3.us-west-2.amazonaws.com/AlmBSv_XQJakPg5IqvGaPg/1/public/chain-of-trust.json.sig?AWSAccessKeyId=[accesskey]&Content-Type=application%2Foctet-stream&Expires=1628731801&Signature=[sig]"}
Aug 12 00:44:52Z ip-10-143-44-11 docker-worker
2021/08/12 00:44:52 {"type":"task resolved","source":"top","provisionerId":"gecko-3","workerId":"i-0185ecc39307ae1d6","workerGroup":"us-west-1","workerType":"b-linux","workerNodeType":"m5.4xlarge","taskId":"AlmBSv_XQJakPg5IqvGaPg","runId":1,"taskState":"completed"}

A formatted version of the error (which appears to print over two lines):

Could not kill states properly. Error: Error: Error calling 'killed' for chainOfTrust : Error [ERR_STREAM_WRITE_AFTER_END] [ERR_STREAM_WRITE_AFTER_END]: write after end
    at writeAfterEnd (_stream_writable.js:253:14)
    at PassThrough.Writable.write (_stream_writable.js:302:5)
    at ChainOfTrust.killed (/home/ubuntu/docker-worker/src/lib/features/chain_of_trust.js:117:19)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /home/ubuntu/docker-worker/src/lib/states.js:58:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at Task.run (/home/ubuntu/docker-worker/src/lib/task.js:992:7)
    at Task.start (/home/ubuntu/docker-worker/src/lib/task.js:700:17)
    at TaskListener.runTaskset (/home/ubuntu/docker-worker/src/lib/task_listener.js:524:9)
    at async Promise.all (index 0)"}

This code also implies it is printing a handled error:

https://github.com/taskcluster/taskcluster/blob/66941613e4242d69dd2aff3fb560359eb633d59c/workers/docker-worker/src/task.js#L549-L555

This is part of a function called abortRun, called from several places with an optional error.

This feels like a more promising avenue for determining the cause of this intermittent. At the same time, the taskcluster team's focus is on the generic worker, not the docker worker, so my investigation will probably stop here for now.

Closing as per https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup

Feel free to re-open should this occur again.

Severity: -- → S4
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.