Bug 1593543 (Closed) Opened 5 years ago, Closed 5 years ago

GCP docker-worker images don't have lz4

Categories: Taskcluster :: Workers (defect; priority not set; severity normal)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: dustin; Assignee: miles


Nov 03 15:19:12 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker {"type":"ensure image","source":"top","provisionerId":"proj-servo","workerId":"714612925948711628","workerGroup":"community-tc-workers-google","workerType":"docker","workerNodeType":"projects/757942385826/machineTypes/n1-standard-4","image":{"type":"task-image","path":"public/image.tar.lz4","taskId":"D8pWPKUgSb-4XI_GGunu5g"}}
Nov 03 15:19:14 proj-servo-docker-oy5aes4orzebkdskvi9ayw systemd-networkd veth8c32a93: Gained IPv6LL
Nov 03 15:19:22 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker {"type":"aliveness check","source":"top","provisionerId":"proj-servo","workerId":"714612925948711628","workerGroup":"community-tc-workers-google","workerType":"docker","workerNodeType":"projects/757942385826/machineTypes/n1-standard-4","alive":true,"uptime":976,"interval":30000}
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker Uncaught Exception! Attempting to report to Sentry and crash.
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker Error: spawn lz4 ENOENT
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at onErrorNT (internal/child_process.js:362:16)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at _combinedTickCallback (internal/process/next_tick.js:139:11)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker reportError - level: fatal, tags: {}
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker  { Error: spawn lz4 ENOENT
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at onErrorNT (internal/child_process.js:362:16)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at _combinedTickCallback (internal/process/next_tick.js:139:11)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker     at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker   errno: 'ENOENT',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker   code: 'ENOENT',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker   syscall: 'spawn lz4',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker   path: 'lz4',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker   spawnargs:
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker    [ '-d',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker      '/mnt/tmp-docker-images/TjBUZ1WHReW41i9BBVpcWQ/image.tar.lz4',
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker      '/mnt/tmp-docker-images/TjBUZ1WHReW41i9BBVpcWQ/image.tar' ] }
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker Succesfully reported error to Sentry.
Nov 03 15:19:30 proj-servo-docker-oy5aes4orzebkdskvi9ayw docker-worker 2019/11/03 15:19:30 exit status 1
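
For context on the crash above: docker-worker shells out to the lz4 binary to decompress the downloaded task image. A minimal sketch of that step, reconstructed from the spawnargs in the log (hypothetical code, not docker-worker's actual source):

    const { spawn } = require('child_process');

    // Decompress the downloaded task image (paths taken from the log above).
    const proc = spawn('lz4', [
      '-d',
      '/mnt/tmp-docker-images/TjBUZ1WHReW41i9BBVpcWQ/image.tar.lz4',
      '/mnt/tmp-docker-images/TjBUZ1WHReW41i9BBVpcWQ/image.tar',
    ]);
    // If the lz4 binary is not on PATH, Node emits an 'error' event with
    // code ENOENT on the child process; with no 'error' listener attached,
    // that becomes the "Uncaught Exception!" seen above.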

The image is docker-worker-gcp-googlecompute-2019-10-08t02-31-36z. I think that was built with monopacker?

https://github.com/taskcluster/docker-worker/pull/524 addresses the "Uncaught Exception!" part, and should hopefully convert this from a wedged worker + claim-expired into a "regular" task failure.
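
I haven't dug into the exact diff in that PR, but the usual shape of such a fix is to surface the spawn failure as a rejected promise so the worker can resolve the task as failed instead of crashing. A hedged sketch (hypothetical helper, not the actual PR code):

    // Hypothetical: wrap the decompression so failures become rejections.
    function decompress(src, dest) {
      return new Promise((resolve, reject) => {
        const proc = spawn('lz4', ['-d', src, dest]);
        proc.on('error', reject); // e.g. ENOENT when lz4 is missing
        proc.on('exit', (code) =>
          code === 0 ? resolve() : reject(new Error(`lz4 exited ${code}`)));
      });
    }

    // In the task-image handler (inside an async function), the rejection
    // can then be reported as a task failure rather than a worker crash:
    //   try { await decompress(src, dest); }
    //   catch (err) { /* log err, resolve the task as failed */ }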

It remains to build an image containing lz4.

==> Builds finished. The artifacts of successful builds are:
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"0", "builder-id", "packer.googlecompute"}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"0", "id", "docker-worker-gcp-googlecompute-2019-11-03t21-30-52z"}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"0", "string", "A disk image was created: docker-worker-gcp-googlecompute-2019-11-03t21-30-52z"}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"0", "files-count", "0"}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"0", "end"}
--> docker_worker_gcp: A disk image was created: docker-worker-gcp-googlecompute-2019-11-03t21-30-52z
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"1", "builder-id", ""}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"1", "id", ""}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"1", "string", ""}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"1", "files-count", "0"}
2019/11/03 21:40:00 machine readable: docker_worker_gcp,artifact []string{"1", "end"}
2019/11/03 21:40:00 [INFO] (telemetry) Finalizing.
--> docker_worker_gcp:
2019/11/03 21:40:00 waiting for all plugin processes to complete...
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 [ERR] Error decoding response stream 62: EOF
    [... 47 more "Error decoding response stream N: EOF" lines (streams 37-84) omitted ...]
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited
2019/11/03 21:40:00 /usr/local/bin/packer: plugin process exited

So, I guess that's... success?

I guess not... instances created with that image just never do anything, not even logging. Halp?

Flags: needinfo?(wcosta)
Flags: needinfo?(miles)

For now, as a workaround, I’ve made Servo CI upload uncompressed images. The build image is 1.5 GB uncompressed and takes ~25 seconds to download from GCP. With lz4 it’s half that size and takes ~5 seconds to download from AWS (on the taskcluster.net deployment).
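
Concretely, the workaround just changes which artifact the task payload points at. A fragment with the structure from the "ensure image" log line above (same taskId as shown there):

    // docker-worker task payload fragment:
    const payload = {
      image: {
        type: 'task-image',
        path: 'public/image.tar',         // workaround: uncompressed artifact
        // path: 'public/image.tar.lz4',  // normal: lz4-compressed artifact
        taskId: 'D8pWPKUgSb-4XI_GGunu5g', // task that built the image
      },
    };

Back-of-the-envelope: 1.5 GB in ~25 s is about 60 MB/s from GCP, while ~0.75 GB in ~5 s is about 150 MB/s from AWS, so the compressed path is worth restoring.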

Miles was able to build an image and is testing it.

Assignee: nobody → miles
Flags: needinfo?(wcosta)
Flags: needinfo?(miles)

Miles, this is now deployed everywhere, right?

Flags: needinfo?(miles)

As of this writing, https://community-tc.services.mozilla.com/tasks/ciL0gzGYSQSLT3RDZJFM6Q/runs/0 has been "running" for 20 minutes and does not have a log file. This is the same symptom as this bug, and I made that task specifically to try an lz4-compressed Docker image.

The worker uses GCP image projects/taskcluster-imaging/global/images/docker-worker-gcp-googlecompute-2019-11-04t22-31-35z, copied from https://community-tc.services.mozilla.com/worker-manager/proj-servo%2Fdocker which comes from https://github.com/mozilla/community-tc-config/blob/9d6dce8d2463f7d61f22d389921a2c18f23b54f3/generate/workers.py#L67-L70

It looks like the fix is not deployed to that image at least.

Yes, the docker-worker-gcp-googlecompute-2019-11-04t22-31-35z image should have lz4, or at least the liblz4-tool package installed, per https://github.com/taskcluster/monopacker/pull/27.

I'm not sure what the issue is; once we have logs, I imagine we'll be able to understand it better.
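
One quick way to confirm from a shell on a booted instance would be something like `which lz4` or `lz4 --version`; the equivalent check from Node (a sketch, not anything docker-worker actually ships):

    const { execFileSync } = require('child_process');

    // Succeeds and prints the version if the binary is installed;
    // throws with err.code === 'ENOENT' if it is missing from PATH.
    try {
      console.log(execFileSync('lz4', ['--version']).toString());
    } catch (err) {
      console.error('lz4 not available:', err.code);
    }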

Flags: needinfo?(miles)

Oh sorry, it looks like I misdiagnosed. https://community-tc.services.mozilla.com/tasks/ciL0gzGYSQSLT3RDZJFM6Q/runs/0 completed successfully with a compressed Docker image. So I think this bug is fixed. But the log was inaccessible until the task ended, even though bug 1593520 is fixed and other docker-worker tasks do have live logs. I’ll comment there about that.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED