Open Bug 1650396 Opened 5 years ago Updated 5 years ago

gecko-t-bitbar-gw-unit-p2 tasks fail as exception with exit code 4 found in task payload.onExitStatus list

Categories

(Testing :: General, defect, P2)

Tracking

(Not tracked)

Future

People

(Reporter: CosminS, Unassigned)

Details

(Whiteboard: [stockwell:infra])

We're seeing these exceptions on autoland; all retriggers have failed: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=exception&revision=83710b53eef098a0b2a2184ab02e8bcd76a00ab9&selectedTaskRun=cZpSqg_CR4aTffHEK3VkIg.5

Looking at the gecko-t-bitbar-gw-unit-p2 workers, all tasks in the last 20 minutes have failed as exceptions: https://firefox-ci-tc.services.mozilla.com/provisioners/proj-autophone/worker-types/gecko-t-bitbar-gw-unit-p2?sortBy=Task%20Resolved&sortDirection=desc

log: https://firefox-ci-tc.services.mozilla.com/tasks/Z_xOcZh3TnOWQkzz2mSMJw/runs/5/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FZ_xOcZh3TnOWQkzz2mSMJw%2Fruns%2F5%2Fartifacts%2Fpublic%2Flogs%2Flive.log

[task 2020-07-03T12:49:51.363Z] tcp        0      0 localhost:5037          localhost:39109         TIME_WAIT   -                    timewait (28.79/0/0)
[task 2020-07-03T12:49:51.363Z] tcp6       0      0 [::]:60098              [::]:*                  LISTEN      57/livelog           off (0.00/0/0)
[task 2020-07-03T12:49:51.363Z] tcp6       0      0 [::]:60099              [::]:*                  LISTEN      57/livelog           off (0.00/0/0)
[task 2020-07-03T12:49:51.363Z] tcp6       0      0 localhost:60098         localhost:46180         ESTABLISHED 57/livelog           keepalive (7.56/0/0)
[task 2020-07-03T12:49:51.363Z] udp        0      0 127.0.0.11:46271        0.0.0.0:*                           -                    off (0.00/0/0)
[task 2020-07-03T12:49:51.363Z] Active UNIX domain sockets (servers and established)
[task 2020-07-03T12:49:51.363Z] Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
[task 2020-07-03T12:49:51.363Z] 
[task 2020-07-03T12:49:51.363Z] 
[task 2020-07-03T12:49:51.447Z] 
[task 2020-07-03T12:49:51.447Z] 
[task 2020-07-03T12:49:51.447Z] df -h
[task 2020-07-03T12:49:51.447Z] Filesystem      Size  Used Avail Use% Mounted on
[task 2020-07-03T12:49:51.447Z] overlay         458G   92G  343G  22% /
[task 2020-07-03T12:49:51.447Z] tmpfs            64M     0   64M   0% /dev
[task 2020-07-03T12:49:51.447Z] tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
[task 2020-07-03T12:49:51.447Z] shm              64M     0   64M   0% /dev/shm
[task 2020-07-03T12:49:51.447Z] /dev/sda1       458G   92G  343G  22% /test
[task 2020-07-03T12:49:51.447Z] tmpfs           7.8G     0  7.8G   0% /proc/acpi
[task 2020-07-03T12:49:51.447Z] tmpfs           7.8G     0  7.8G   0% /proc/scsi
[task 2020-07-03T12:49:51.447Z] tmpfs           7.8G     0  7.8G   0% /sys/firmware
[task 2020-07-03T12:49:51.447Z] 
[task 2020-07-03T12:49:51.447Z] 
[task 2020-07-03T12:49:51.447Z] 
[task 2020-07-03T12:49:51.447Z] script.py: exiting with exitcode 4.
[fetches 2020-07-03T12:49:51.497Z] removing /builds/task_1593780497/fetches
[fetches 2020-07-03T12:49:51.498Z] finished
[taskcluster 2020-07-03T12:49:51.505Z]    Exit Code: 4
[taskcluster 2020-07-03T12:49:51.505Z]    User Time: 31.316009s
[taskcluster 2020-07-03T12:49:51.505Z]  Kernel Time: 7.12997s
[taskcluster 2020-07-03T12:49:51.505Z]    Wall Time: 1m29.602255076s
[taskcluster 2020-07-03T12:49:51.505Z]       Result: FAILED
[taskcluster 2020-07-03T12:49:51.505Z] === Task Finished ===
[taskcluster 2020-07-03T12:49:51.505Z] Task Duration: 1m29.60456019s
[taskcluster 2020-07-03T12:49:51.558Z] Uploading artifact public/logs/localconfig.json from file workspace/logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2021-07-03T09:58:11.529Z
[taskcluster 2020-07-03T12:49:51.733Z] Uploading artifact public/test_info/android-performance.log from file workspace/build/blobber_upload_dir/android-performance.log with content encoding "gzip", mime type "text/plain" and expiry 2021-07-03T09:58:11.529Z
[taskcluster 2020-07-03T12:49:51.825Z] Uploading artifact public/test_info/logcat-FA83W1A02560.log from file workspace/build/blobber_upload_dir/logcat-FA83W1A02560.log with content encoding "gzip", mime type "text/plain" and expiry 2021-07-03T09:58:11.529Z
[taskcluster 2020-07-03T12:49:52.023Z] Uploading artifact public/test_info/resource-usage.json from file workspace/build/blobber_upload_dir/resource-usage.json with content encoding "gzip", mime type "application/json" and expiry 2021-07-03T09:58:11.529Z
[taskcluster 2020-07-03T12:49:52.155Z] Uploading redirect artifact public/logs/live.log to URL https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Z_xOcZh3TnOWQkzz2mSMJw/runs/5/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2021-07-03T09:58:11.529Z
[taskcluster:error] Task appears to have failed intermittently - exit code 4 found in task payload.onExitStatus list
Flags: needinfo?(aerickson)
Component: Workers → RelOps: Hardware
Product: Taskcluster → Infrastructure & Operations
Whiteboard: [stockwell:infra]

From the included log it seems the APK was bad.

[task 2020-07-03T12:49:46.077Z] 12:49:46 INFO - Failed to install /builds/task_1593780497/workspace/build/geckoview-androidTest.apk on pixel2-24: ADBProcessError args: adb wait-for-device install /builds/task_1593780497/workspace/build/geckoview-androidTest.apk, exitcode: 1, stdout: adb: failed to install /builds/task_1593780497/workspace/build/geckoview-androidTest.apk: Failure [INSTALL_PARSE_FAILED_NO_CERTIFICATES: Failed to collect certificates from /data/app/vmdl1579882320.tmp/base.apk using APK Signature Scheme v2: SHA-256 digest of contents did not verify]

These aren't really intermittent machine issues. I guess we should parse the error message before returning an exit code 4. BC, thoughts?
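A rough sketch of what parsing the error message could look like, assuming the adb output is available as a string at the point where the exit code is chosen. The function name, the constants, and the set of "permanent" failure codes are all hypothetical and not taken from mozharness. Only the value 4 (the retry code seen in the log) and the `INSTALL_PARSE_FAILED_NO_CERTIFICATES` string come from this bug.

```python
import re

# Exit code 4 is the retry code reported in the task log above.
RETRY_EXIT_CODE = 4
# Assumption: placeholder value for a plain, non-retried failure.
FAILURE_EXIT_CODE = 2

# adb reports install failures as "Failure [INSTALL_...: details]".
_INSTALL_FAILURE_RE = re.compile(r"Failure \[(INSTALL_[A-Z0-9_]+)")

# Assumption: failure codes that point at a bad APK rather than a bad device.
# Only the first entry is confirmed by the log in this bug.
_PERMANENT_INSTALL_FAILURES = frozenset({
    "INSTALL_PARSE_FAILED_NO_CERTIFICATES",
})


def exit_code_for_install_error(adb_output: str) -> int:
    """Pick an exit code for a failed `adb install` based on its output.

    Returns FAILURE_EXIT_CODE when the output names a failure that a retry
    on another device cannot fix, and RETRY_EXIT_CODE otherwise.
    """
    match = _INSTALL_FAILURE_RE.search(adb_output)
    if match and match.group(1) in _PERMANENT_INSTALL_FAILURES:
        return FAILURE_EXIT_CODE
    # Unknown or device-side problems keep the existing retry behaviour.
    return RETRY_EXIT_CODE
```

Defaulting to the retry code keeps today's behaviour for anything unrecognised, so only failures explicitly listed as permanent would stop being retried.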

Flags: needinfo?(aerickson) → needinfo?(bob)

I would handle it in https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/android.py#344-352

We should move this to Testing... and handle it there.

Flags: needinfo?(bob)

OK. Moved to Testing::General. Let me know if there's a better subcategory.

Component: RelOps: Hardware → General
Product: Infrastructure & Operations → Testing
Target Milestone: --- → Future
Version: unspecified → Trunk