Closed Bug 1580054 Opened 5 years ago Closed 5 years ago

Intermittent Automation Error: python exited with signal -9

Categories

(Release Engineering :: Release Automation: Uploading, defect, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Filed by: rgurzau [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=265829335&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/VvPGJxgIQ-Swln21fshQ8A/runs/0/artifacts/public/logs/live_backing.log


2019-09-10 00:15:29,281 - asyncio - DEBUG - Using selector: EpollSelector
2019-09-10 00:15:29,285 - beetmoverscript.task - INFO - Action types: ['push-to-nightly']
2019-09-10 00:15:29,286 - scriptworker.client - DEBUG - Task is validated against this schema: {'title': 'Taskcluster beetmover task minimal schema', 'type': 'object', 'properties': {'dependencies': {'type': 'array', 'minItems': 1, 'uniqueItems': True, 'items': {'type': 'string'}}, 'payload': {'type': 'object', 'properties': {'upload_date': {'type': 'number'}, 'build_number': {'type': 'number'}, 'locale': {'type': 'string'}, 'maxRunTime': {'type': 'number'}, 'version': {'type': 'string'}, 'next_version': {'type': 'string'}, 'appVersion': {'type': 'string'}, 'releaseProperties': {'type': 'object', 'properties': {'appName': {'type': 'string'}, 'buildid': {'type': 'string'}, 'appVersion': {'type': 'string'}, 'hashType': {'type': 'string'}, 'platform': {'type': 'string'}, 'branch': {'type': 'string'}}, 'required': ['appName', 'buildid', 'appVersion', 'hashType', 'platform', 'branch']}, 'upstreamArtifacts': {'type': 'array', 'items': {'type': 'object', 'properties': {'taskType': {'type': 'string'}, 'locale': {'type': 'string'}, 'taskId': {'type': 'string'}, 'paths': {'type': 'array', 'minItems': 1, 'uniqueItems': True, 'items': {'type': 'string'}}, 'zipExtract': {'type': 'boolean'}}, 'required': ['taskId', 'taskType', 'paths', 'locale']}, 'minItems': 1, 'uniqueItems': True}}, 'required': ['upload_date', 'upstreamArtifacts', 'releaseProperties'], 'optional': ['build_number', 'version', 'locale', 'maxRunTime', 'appVersion', 'next_version']}}, 'required': ['payload', 'dependencies']}
2019-09-10 00:15:29,289 - beetmoverscript.task - INFO - Buckets: ['nightly']
2019-09-10 00:15:29,289 - beetmoverscript.task - INFO - Action types: ['push-to-nightly']
2019-09-10 00:15:29,290 - beetmoverscript.task - DEBUG - Loading release_props from task's payload: {'appName': 'Firefox', 'appVersion': '71.0a1', 'branch': 'mozilla-central', 'buildid': '20190909214621', 'hashType': 'sha512', 'platform': 'linux'}
Automation Error: python exited with signal -9

This sounds like OOM on the new workers to me. I asked CloudOps to bump the memory.
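
For context on the "signal -9" in the log: on POSIX, Python's subprocess module reports a child killed by a signal as the negated signal number, and signal 9 is SIGKILL. SIGKILL cannot be caught, so the process gets no chance to log anything further, which matches the log stopping abruptly. The kernel OOM killer sends SIGKILL, but so can other things, so a -9 alone is consistent with OOM rather than proof of it. A minimal sketch, assuming a Linux host; the helper name describe_exit and the self-killing child are hypothetical stand-ins and are not taken from scriptworker or beetmoverscript:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a human-readable description of a subprocess return code."""
    if returncode < 0:
        # POSIX convention in subprocess: -N means "terminated by signal N".
        signum = -returncode
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    return f"exited with status {returncode}"


# Hypothetical child that sends itself SIGKILL, standing in for a worker
# process that was killed from outside (for example by the OOM killer).
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
print(describe_exit(child.returncode))
```

To confirm an actual OOM kill, the kernel log on the affected instance would still need to be checked; the return code only says which signal ended the process.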

See Also: → 1579841
Component: General → Release Automation: Uploading
QA Contact: catlee → mtabara

I believe this is fixed now. Please reopen if you see the error again.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

New occurrence:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270156413&repo=mozilla-central&lineNumber=22

2019-10-07 22:40:34,357 - signingscript.sign - DEBUG - Found firefox/gmp-clearkey/0.1/libclearkey.so to sign widevine
2019-10-07 22:40:34,358 - signingscript.sign - DEBUG - Found firefox/plugin-container to sign widevine_blessed
2019-10-07 22:40:34,359 - signingscript.sign - DEBUG - Widevine files to sign: {'firefox': 'widevine', 'firefox/libxul.so': 'widevine', 'firefox/firefox': 'widevine', 'firefox/firefox-bin': 'widevine', 'firefox/gmp-clearkey/0.1/libclearkey.so': 'widevine', 'firefox/plugin-container': 'widevine_blessed'}
2019-10-07 22:40:34,360 - signingscript.utils - INFO - mkdir /app/workdir/wvtarsegsfdxd
2019-10-07 22:42:29,201 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/libxul.so.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/firefox.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/firefox-bin.sig to the sigfile paths...
2019-10-07 22:42:29,206 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/gmp-clearkey/0.1/libclearkey.so.sig to the sigfile paths...
2019-10-07 22:42:29,207 - signingscript.sign - DEBUG - Adding /app/workdir/wvtarsegsfdxd/firefox/plugin-container.sig to the sigfile paths...
Automation Error: python exited with signal -9

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

OOM in the signing pool?

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #5)
> OOM in the signing pool?

Somewhat expected, given the hardware difference that we had between AWS and GCP.
@catlee pushed a fix for beefier instances there in terms of CPU and memory.
All jobs from PM nightlies are now green - https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=nightly&revision=4651f71eeb5476a6dc9002a47a45c3a5b17aba6c

Will keep an eye on the AM nightlies to see how this behaves, before closing the bug.

Turns out I was wrong: after @catlee deployed his fix earlier this morning, we didn't push anything to the GCP signing-production environment to apply the recent changes. We reran that job to deploy them. GCP will gracefully remove the old instances and ramp up new ones once that's complete.

Regressed by: 1542819

Follow-up problem, earlier today, in bug 1585603 we landed GCP support for addonworkers. Missed some configurations and things blew up. TIL that if this GCP app no longer works, scaling is broken for all of the workers.

I pushed https://github.com/mozilla-releng/k8s-autoscale/pull/47/ to fix this. Waiting for the k8s-autoscale pods to come back green and then auto-scaling to start again.

See Also: → 1585603

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #9)
> Follow-up problem, earlier today, in bug 1585603 we landed GCP support for addonworkers. Missed some configurations and things blew up. TIL that if this GCP app no longer works, scaling is broken for all of the workers.
>
> I pushed https://github.com/mozilla-releng/k8s-autoscale/pull/47/ to fix this. Waiting for the k8s-autoscale pods to come back green and then auto-scaling to start again.

Green signing jobs + auto-scaling is now fixed. We should no longer see issues of this sort for now.

FTR: all subsequent balrog jobs failing are due to an unrelated balrog issue, tracked separately in bug 1587078.

See Also: → 1587078

Green jobs + auto-scaling so I think we can close this for now.
Feel free to re-open should you see another signing problem.

Status: REOPENED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → FIXED
Keywords: regression
See Also: → 1863769