Increase terminationGracePeriodSeconds for bitrisescript
Categories: Release Engineering :: Release Automation, defect
Tracking: Not tracked
People: Reporter: ahal, Assigned: ahal
Attachments: 4 files
[mozilla-releng/scriptworker-scripts] Bug 1891815 - Increase global task timeout to two hours (#975)
I recently enabled some bitrisescript tasks in Firefox-iOS, but noticed that many of them were failing with WORKER_SHUTDOWN. Basically, Kubernetes was terminating the workers before the tasks could complete. Johan pointed me toward bug 1791366, which had a similar symptom.
The issue is that Kubernetes sees that we don't have any work left to claim and signals some of the replicas to shut down. Kubernetes has a config option called terminationGracePeriodSeconds, which is the amount of time a replica has to finish whatever it is doing before it is forcefully killed. This value was configured for 30 minutes, which means any task that took longer than that was at risk of being forcefully terminated before it could finish. For more info, see:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
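To make the failure mode concrete, here is a minimal sketch of the timing involved. It assumes the standard Kubernetes behaviour (SIGTERM first, SIGKILL once the grace period expires); the function name and the task durations are hypothetical and only for illustration.

```python
OLD_GRACE_PERIOD_SECONDS = 30 * 60  # the 30 minute value described above


def finishes_before_kill(remaining_task_seconds: int, grace_period_seconds: int) -> bool:
    """Return True if a task still running when the pod receives SIGTERM
    can complete before Kubernetes sends SIGKILL."""
    return remaining_task_seconds <= grace_period_seconds


# Hypothetical remaining task times, for illustration only.
print(finishes_before_kill(20 * 60, OLD_GRACE_PERIOD_SECONDS))  # 20 min left: True
print(finishes_before_kill(45 * 60, OLD_GRACE_PERIOD_SECONDS))  # 45 min left: False
```

Any task with more work left than the grace period allows is killed mid-run, which would match the WORKER_SHUTDOWN failures described above.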
Bitrisescript is a little different in that the length of the tasks can be arbitrary. So we should set the limit to something quite high as this will act as the ceiling for all workflows we might want to implement in Bitrise. I'm thinking we should do two hours for starters.
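As a rough sketch of what the change amounts to, the snippet below builds the relevant fragment of a Deployment manifest as a Python dict. Only the field name terminationGracePeriodSeconds and its location in the pod spec come from Kubernetes; the variable names are hypothetical, and the actual bitrisescript deployment config may be structured differently.

```python
import json

TWO_HOURS_SECONDS = 2 * 60 * 60  # proposed ceiling: two hours, in seconds

# Hypothetical Deployment fragment. In a Deployment, the pod-level setting
# lives under spec.template.spec.
deployment_patch = {
    "spec": {
        "template": {
            "spec": {
                "terminationGracePeriodSeconds": TWO_HOURS_SECONDS,
            }
        }
    }
}

print(json.dumps(deployment_patch, indent=2))
```

The grace period is an upper bound, not a fixed wait: a pod whose processes exit sooner after SIGTERM is removed sooner, so a high ceiling mainly costs something when a worker is genuinely stuck.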
Comment 6•10 months ago
Yep, this is confirmed fixed. Thanks!