open /var/lib/docker/tmp/docker-import-.../repo/.../json: no such file or directory
Categories: Infrastructure & Operations :: RelOps: General, defect
Tracking: Not tracked
People: Reporter: mhentges, Unassigned
My job on mobile-1-images is failing while squashing the docker image.
2019-10-09 20:58:12,547 root DEBUG Cleaning up /tmp/docker-squash-lkai3x72 temporary directory
2019-10-09 20:58:14,719 root ERROR 404 Client Error: Not Found ("b'open /var/lib/docker/tmp/docker-import-786951861/repo/73f371facc8aecf7b130ec6faa9b1539969ea2eb6d5f2fbb070ca8d545ed08e0/json: no such file or directory'")
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 222, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localunixsocket/v1.18/images/load
FWIW, I tried building and squashing the image locally using the following commands, and both finished without an error.
$ taskgraph build-image linux -t mitchhentges:v1
$ docker-squash -v -t "mitchhentges:v1-squashed" "mitchhentges:v1"
Comment 1 • 5 years ago
Is this intermittent?
Comment 2 (Reporter) • 5 years ago
It seems consistent: I've seen more than 5 failures and no successes so far.
Comment 3 • 5 years ago
Does docker squash work in other tasks?
Comment 4 (Reporter) • 5 years ago
Comment 5 • 5 years ago
What's different about that task and this one?
Comment 6 (Reporter) • 5 years ago
Quite a bit: the project, the image that's being built, the environment.
I'm going to experiment with the Dockerfile and see if the issue is related to its contents. If so, I'll bisect to determine what specifically is causing the squash issue.
Comment 7 (Reporter) • 5 years ago
Sometimes, I'm getting a different error:
2019-10-11 21:48:49,340 urllib3.connectionpool DEBUG http://localhost:None "POST /v1.18/images/load HTTP/1.1" 500 4
2019-10-11 21:48:49,340 root DEBUG Cleaning up /tmp/docker-squash-10vn23ow temporary directory
2019-10-11 21:48:51,259 root ERROR 500 Server Error: Internal Server Error ("b'EOF'")
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 222, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localunixsocket/v1.18/images/load
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/docker_squash/cli.py", line 87, in run
from_layer=args.from_layer, tag=args.tag, output_path=args.output_path, tmp_dir=args.tmp_dir, development=args.development, cleanup=args.cleanup).run()
File "/usr/local/lib/python3.6/dist-packages/docker_squash/squash.py", line 59, in run
return self.squash(image)
File "/usr/local/lib/python3.6/dist-packages/docker_squash/squash.py", line 92, in squash
image.load_squashed_image()
File "/usr/local/lib/python3.6/dist-packages/docker_squash/image.py", line 254, in load_squashed_image
self._load_image(self.new_image_dir)
File "/usr/local/lib/python3.6/dist-packages/docker_squash/image.py", line 299, in _load_image
self.docker.load_image(f)
File "/usr/local/lib/python3.6/dist-packages/docker/api/image.py", line 298, in load_image
self._raise_for_status(res)
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 224, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.6/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("b'EOF'")
I'm unsure if this is intermittent or not. I've been slowly removing commands from my Dockerfile and committing the changes one-by-one, and some of the builds fail due to .../images/load, and others fail due to /var/lib/docker/tmp/docker-import-.../repo/.../json. It doesn't seem to be strictly related to the docker image size.
Comment 8 (Reporter) • 5 years ago
This task failed because it couldn't upload the artifact:
[taskcluster:error] Error uploading "public/image.tar.zst" artifact. Could not upload artifact. Status Code: 400
Comment 9 (Reporter) • 5 years ago
I've reduced the problem Dockerfile as much as possible while still seeing the same error. The smallest I've got it is visible here.
I'm going to create a minimal standalone repository that can cause the issue so that I can confirm that other application-services config isn't affecting this problem.
Comment 10 (Reporter) • 5 years ago
After investigating further, this turns out to be entirely a docker image size issue. Above a certain size, one of two possible errors fails the image build. The error that occurs isn't consistent: re-running the same docker build can produce a different error.
I created a standalone repository that just builds docker images when commits are pushed or a release happens. After creating the first commit, I re-ran the build until I saw both errors:
404 Client Error: Not Found ("b'open /var/lib/docker/tmp/docker-import-500949428/repo/d136a4118509e25a1760292024c9ba8602b93d90e6ea951a6bd902b7f4cbc228/json: no such file or directory'")
404 Client Error: Not Found for url: http+docker://localunixsocket/v1.18/images/load
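Since the failure mode changes between re-runs of the same build, tallying which signature each run hit can make the pattern easier to see. Below is a minimal sketch, assuming the task logs are already available as strings; the `classify` and `tally` helpers and the category names are hypothetical, and the patterns are derived only from the log excerpts quoted in this bug.

```python
import re
from collections import Counter

# Patterns derived from the error lines quoted in this bug.
# The category names are made up for illustration.
SIGNATURES = {
    "missing-layer-json": re.compile(
        r"docker-import-\d+/repo/[0-9a-f]+/json: no such file or directory"
    ),
    "load-500": re.compile(r"500 Server Error: Internal Server Error"),
}


def classify(log_text):
    """Return the name of the first known signature found in a task log."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(log_text):
            return name
    return "other"


def tally(logs):
    """Count how many logs fall into each failure category."""
    return Counter(classify(text) for text in logs)
```

Running `tally` over the logs of several re-runs of the same commit would show whether both signatures really alternate at a fixed image size.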
The most convenient workaround I can think of right now is to move some dependency installations from the Dockerfile to instead happen each time the Dockerfile is used.
Comment 11 • 5 years ago
Is this a disk-space issue?
Comment 12 (Reporter) • 5 years ago
I'm not sure how mobile-1-images works internally; these are the error messages I see based on the different Dockerfiles I'm providing.
Since this depends on docker image size, it sounds like it could be related to disk or memory space, but that's an assumption.
Comment 13 (Reporter) • 5 years ago
To test if this is a disk space issue, I doubled the volumeSize and diskspaceThreshold of an images worker and tried again. I received the 500 Server Error: Internal Server Error for url: http+docker://localunixsocket/v1.18/images/load error when I built the image.
Comment 14 • 5 years ago
Interesting, so this is a docker bug. I think we are currently blocked from upgrading docker, since that requires an ubuntu upgrade and kernel versions and packet and mumble mumble mumble. It'd be interesting to know if this occurs with newer dockers, or if it's correlated with some aspect of how docker-worker uses docker.
But for the moment, I don't think any of that can change, so: do you have an adequate workaround?
Comment 15 (Reporter) • 5 years ago
I can work around this by removing installations from the docker image and adding them as a "pre-run" list of commands for each task using the docker image.
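As a rough illustration of that workaround, here is a minimal sketch of prepending a "pre-run" list to a task's main command. It assumes the task command ends up as a single `bash -c` invocation; the `with_pre_run` helper and the example commands are hypothetical and are not taken from the actual task configuration.

```python
from typing import List


def with_pre_run(pre_run: List[str], command: str) -> List[str]:
    """Build an argv that runs the pre-run steps, then the main command.

    Steps are chained with `&&`, so the main command only runs if every
    pre-run step succeeded.
    """
    script = " && ".join([*pre_run, command])
    return ["bash", "-c", script]


# Hypothetical example: installs that used to live in the Dockerfile.
PRE_RUN = ["apt-get update", "apt-get install -y some-dependency"]
task_command = with_pre_run(PRE_RUN, "./run-task.sh")
```

The trade-off is that every task pays the installation cost at startup, in exchange for keeping the image below the size at which the squash fails.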
Comment 16 • 5 years ago
So, I'm guessing that this will work better with a newer docker version, which requires a newer Ubuntu version on the host. I expect relops will be working on that at some point, so I'm moving this over to that component for triage.