Intermittent Automation Error: python exited with signal -9
Categories
(Release Engineering :: Release Automation: Uploading, defect, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Filed by: rgurzau [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=265829335&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/VvPGJxgIQ-Swln21fshQ8A/runs/0/artifacts/public/logs/live_backing.log
2019-09-10 00:15:29,281 - asyncio - DEBUG - Using selector: EpollSelector
2019-09-10 00:15:29,285 - beetmoverscript.task - INFO - Action types: ['push-to-nightly']
2019-09-10 00:15:29,286 - scriptworker.client - DEBUG - Task is validated against this schema: {'title': 'Taskcluster beetmover task minimal schema', 'type': 'object', 'properties': {'dependencies': {'type': 'array', 'minItems': 1, 'uniqueItems': True, 'items': {'type': 'string'}}, 'payload': {'type': 'object', 'properties': {'upload_date': {'type': 'number'}, 'build_number': {'type': 'number'}, 'locale': {'type': 'string'}, 'maxRunTime': {'type': 'number'}, 'version': {'type': 'string'}, 'next_version': {'type': 'string'}, 'appVersion': {'type': 'string'}, 'releaseProperties': {'type': 'object', 'properties': {'appName': {'type': 'string'}, 'buildid': {'type': 'string'}, 'appVersion': {'type': 'string'}, 'hashType': {'type': 'string'}, 'platform': {'type': 'string'}, 'branch': {'type': 'string'}}, 'required': ['appName', 'buildid', 'appVersion', 'hashType', 'platform', 'branch']}, 'upstreamArtifacts': {'type': 'array', 'items': {'type': 'object', 'properties': {'taskType': {'type': 'string'}, 'locale': {'type': 'string'}, 'taskId': {'type': 'string'}, 'paths': {'type': 'array', 'minItems': 1, 'uniqueItems': True, 'items': {'type': 'string'}}, 'zipExtract': {'type': 'boolean'}}, 'required': ['taskId', 'taskType', 'paths', 'locale']}, 'minItems': 1, 'uniqueItems': True}}, 'required': ['upload_date', 'upstreamArtifacts', 'releaseProperties'], 'optional': ['build_number', 'version', 'locale', 'maxRunTime', 'appVersion', 'next_version']}}, 'required': ['payload', 'dependencies']}
2019-09-10 00:15:29,289 - beetmoverscript.task - INFO - Buckets: ['nightly']
2019-09-10 00:15:29,289 - beetmoverscript.task - INFO - Action types: ['push-to-nightly']
2019-09-10 00:15:29,290 - beetmoverscript.task - DEBUG - Loading release_props from task's payload: {'appName': 'Firefox', 'appVersion': '71.0a1', 'branch': 'mozilla-central', 'buildid': '20190909214621', 'hashType': 'sha512', 'platform': 'linux'}
Automation Error: python exited with signal -9
Comment 1•5 years ago
This sounds like OOM on the new workers to me. I asked CloudOps to bump the memory.
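For context on why "signal -9" points toward OOM: Python's `subprocess` module reports a child that was killed by a signal as the negated signal number, and signal 9 on Linux is SIGKILL, which is what the kernel OOM killer sends. SIGKILL can come from other sources too, so this is only a hint. A minimal sketch, assuming POSIX signal numbering; the helper name `describe_exit` is hypothetical and not part of scriptworker or beetmoverscript:

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # A negative return code means the child was terminated by a signal;
        # the absolute value is the signal number.
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"


# -9 is the value reported in the "Automation Error" line above.
print(describe_exit(-9))
```

A SIGKILL with no traceback and no shutdown logging is consistent with the process being killed from outside, which is why memory on the workers was the first suspect.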
Comment 2•5 years ago
I believe this is fixed now. Please reopen if you see the error again.
Comment 4•5 years ago
New occurrence:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270156413&repo=mozilla-central&lineNumber=22
2019-10-07 22:40:34,357 - signingscript.sign - DEBUG - Found firefox/gmp-clearkey/0.1/libclearkey.so to sign widevine
2019-10-07 22:40:34,358 - signingscript.sign - DEBUG - Found firefox/plugin-container to sign widevine_blessed
2019-10-07 22:40:34,359 - signingscript.sign - DEBUG - Widevine files to sign: {'firefox': 'widevine', 'firefox/libxul.so': 'widevine', 'firefox/firefox': 'widevine', 'firefox/firefox-bin': 'widevine', 'firefox/gmp-clearkey/0.1/libclearkey.so': 'widevine', 'firefox/plugin-container': 'widevine_blessed'}
2019-10-07 22:40:34,360 - signingscript.utils - INFO - mkdir /app/workdir/wvtarsegsfdxd
2019-10-07 22:42:29,201 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/libxul.so.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/firefox.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/firefox-bin.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/gmp-clearkey/0.1/libclearkey.so.sig to the sigfile paths...
2019-10-07 22:42:29,207 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/plugin-container.sig to the sigfile paths...
Automation Error: python exited with signal -9
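When triaging occurrences like this one, it can help to pull out the last step logged before the kill. The following is a rough sketch, assuming the `timestamp - logger - LEVEL - message` line layout seen in the excerpts in this bug; the function name `last_step_before_kill` is hypothetical and not part of any releng tooling:

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019-10-07 22:42:29,207 - signingscript.sign - DEBUG - Adding ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) - (\S+) - (\w+) - (.*)$"
)


def last_step_before_kill(log_text):
    """Return (timestamp, logger, message) for the last parsed log line
    before the 'Automation Error' marker, or None if there is no marker."""
    last = None
    for line in log_text.splitlines():
        if line.startswith("Automation Error:"):
            return last
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            # The log prints milliseconds after the comma.
            stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
            last = (stamp, match.group(3), match.group(5))
    return None


sample = """\
2019-10-07 22:40:34,360 - signingscript.utils - INFO - mkdir /app/workdir/wvtarsegsfdxd
2019-10-07 22:42:29,207 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/plugin-container.sig to the sigfile paths...
Automation Error: python exited with signal -9
"""
result = last_step_before_kill(sample)
print(result[1], result[0].isoformat())
```

Applied to the two occurrences in this bug, a helper like this would report `beetmoverscript.task` for the first and `signingscript.sign` for the second.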
Comment 5•5 years ago
OOM in the signing pool?
Comment 7•5 years ago
(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #5)
> OOM in the signing pool?
Somewhat expected, given the hardware differences between AWS and GCP.
@catlee pushed a fix that moves the signing pool to beefier instances, with more CPU and memory.
All jobs from PM nightlies are now green - https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=nightly&revision=4651f71eeb5476a6dc9002a47a45c3a5b17aba6c
Will keep an eye on the AM nightlies to see how this behaves, before closing the bug.
Comment 8•5 years ago
Turns out I was wrong: after @catlee deployed his fix earlier this morning, we didn't push anything to the GCP signing-production environment to apply the recent changes. We have now rerun that job to deploy them. GCP will gracefully remove the old instances and ramp up new ones once that's complete.
Comment 9•5 years ago
Follow-up problem: earlier today, in bug 1585603, we landed GCP support for addonworkers. We missed some configuration and things blew up. TIL that if this GCP app stops working, scaling breaks for all of the workers.
I pushed https://github.com/mozilla-releng/k8s-autoscale/pull/47/ to fix this. Waiting for the k8s-autoscale pods to come back green and for auto-scaling to start again.
Comment 10•5 years ago
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #9)
> Follow-up problem, earlier today, in bug 1585603 we landed GCP support for addonworkers. Missed some configurations and things blew up. TIL that if this GCP app no longer works, scaling is broken for all of the workers.
> I pushed https://github.com/mozilla-releng/k8s-autoscale/pull/47/ to fix this. Waiting for the k8s-autoscale pods to come back green and then auto-scaling to start again.
Signing jobs are green and auto-scaling is now fixed. We should no longer see issues of this sort.
Comment 11•5 years ago
FTR: all subsequent balrog job failures are due to an unrelated balrog issue, tracked separately in bug 1587078.
Comment 12•5 years ago
Jobs are green and auto-scaling is working, so I think we can close this for now.
Feel free to re-open should you see another signing problem.