Closed
Bug 1451224
Opened 6 years ago
Closed 6 years ago
Long running jobs fail
Categories
(Taskcluster :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: franziskus, Unassigned)
We have image-building tasks for NSS that take some time. They used to fail intermittently, then worked for a while, and now seem to always fail. Here are some:
* https://tools.taskcluster.net/groups/GrKGAkFES2CaDxPFnepzgw/tasks/GrKGAkFES2CaDxPFnepzgw/runs/0/logs/public%2Flogs%2Flive.log
* https://tools.taskcluster.net/groups/MW7FUQEgRtaUV_cFN_zh_w/tasks/MW7FUQEgRtaUV_cFN_zh_w/runs/0/logs/public%2Flogs%2Flive.log
* https://tools.taskcluster.net/groups/QJenIm12RFC9YwFCnZLn5Q/tasks/QJenIm12RFC9YwFCnZLn5Q/runs/0/logs/public%2Flogs%2Flive.log
Judging from what they run, there is no reason for these jobs to fail. The docker images run fine locally, and executing the jobs outside of the docker image build runs fine as well.
Comment 1•6 years ago
Hey Wander, Could this be due to a recent hg-worker workerType upgrade?
Flags: needinfo?(wcosta)
Comment 2•6 years ago
I am not sure, going to investigate it.
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Flags: needinfo?(wcosta)
Comment 3•6 years ago
This seems to be a problem with the task itself, more precisely, in the Hacl build. Running locally in a vagrant machine with Linux Bionic, I get the same error:
worker/hacl-star/code/bignum Hacl.Spe.Poly1305_32.fst
./Hacl.Bignum.Parameters.fst(201,14-201,17): (Error) assertion failed (see also /home/worker/hacl-star/dependencies/FStar/ulib/FStar.UInt64.fst(65,12-65,32))
Verified module: Hacl.Bignum.Parameters (528197 milliseconds)
1 error was reported (see above)
../../Makefile.include:159: recipe for target 'Hacl.Bignum.Parameters.fst-verify' failed
make[1]: *** [Hacl.Bignum.Parameters.fst-verify] Error 1
make[1]: *** Waiting for unfinished jobs....
Verified module: Hacl.Spe.Poly1305_32 (39651 milliseconds)
All verification conditions discharged successfully
make[1]: Leaving directory '/home/worker/hacl-star/code/poly1305_32'
make: *** [verify-nss] Error 2
Makefile:67: recipe for target 'verify-nss' failed
make: Leaving directory '/home/worker/hacl-star'
The command '/bin/sh -c bash /tmp/setup-user.sh' returned a non-zero code: 2
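For anyone comparing several of these logs, a minimal Python sketch like the one below can pull out the per-module verification times and the make targets that failed. It assumes only the two line shapes visible in the log above ("Verified module: NAME (N milliseconds)" and "*** [TARGET] Error N"); the function name `summarize_verify_log` is made up for illustration and is not part of hacl-star or Taskcluster.

```python
import re

# Assumed line shape: "Verified module: Hacl.Bignum.Parameters (528197 milliseconds)"
VERIFIED = re.compile(r"Verified module: (\S+) \((\d+) milliseconds\)")
# Assumed line shape: "make[1]: *** [Hacl.Bignum.Parameters.fst-verify] Error 1"
FAILED = re.compile(r"\*\*\* \[([^\]]+)\] Error \d+")


def summarize_verify_log(log: str):
    """Return (timings_ms, failed_targets) extracted from a build log.

    timings_ms maps module name to verification time in milliseconds;
    failed_targets lists make targets in the order they were reported.
    """
    timings_ms = {m.group(1): int(m.group(2)) for m in VERIFIED.finditer(log)}
    failed_targets = [m.group(1) for m in FAILED.finditer(log)]
    return timings_ms, failed_targets
```

Run on the log above, this would show Hacl.Bignum.Parameters taking 528197 ms against 39651 ms for Hacl.Spe.Poly1305_32, which makes the slow module easy to spot.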
Assignee: wcosta → nobody
Status: ASSIGNED → NEW
Comment 4•6 years ago (Reporter)
This task works totally fine for me locally. It's not an issue with the job itself. As I said it needs quite some resources, which can fail it, but otherwise it works fine.
Flags: needinfo?(wcosta)
Comment 5•6 years ago
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #4)
> This task works totally fine for me locally. It's not an issue with the job
> itself. As I said it needs quite some resources, which can fail it, but
> otherwise it works fine.
Hrm, so maybe it is just a matter of using a larger instance?
Flags: needinfo?(wcosta) → needinfo?(franziskuskiefer)
Comment 6•6 years ago (Reporter)
> Hrm, so maybe it is just a matter of using a larger instance?
Looking at the retriggered jobs, that didn't help :/
Flags: needinfo?(franziskuskiefer)
Comment 7•6 years ago (Reporter)
I moved the job that fails here to a different task where it succeeds. That doesn't solve the issue but produces higher load on each NSS push. So it would be great if this issue could be resolved.
Comment 8•6 years ago
It sounds like this was an issue of the task causing OOM on the machine, which typically kills dockerd because Linux's OOM killer always does the worst possible thing. So the fix was probably the right one.
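One way to check the OOM theory on a worker is to look for OOM-killer entries in the kernel log. The Python sketch below is only an illustration: it assumes kernel messages containing "Killed process PID (NAME)", whose exact wording varies between kernel versions, and the helper name `find_oom_kills` is hypothetical.

```python
import re

# Assumed kernel log shape, e.g.
#   "Killed process 4321 (dockerd) total-vm:...kB, anon-rss:...kB"
# or "Out of memory: Killed process 4321 (dockerd) ...".
# The wording differs across kernel versions, so treat this as a heuristic.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str):
    """Return a list of (pid, process_name) for each OOM kill found."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(kernel_log)]
```

Feeding it the output of `dmesg` or `journalctl -k` from the affected machine would show whether dockerd, rather than the build process itself, was the victim.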
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME