Closed Bug 1587078 Opened 6 years ago Closed 6 years ago

mono-repo balrogscript deployment to production breaks gecko-3-balrog GCP workers

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: apavel, Assigned: mtabara)

References

Details

(Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

[scriptworker-scripts] Bug 1587078 - temporarily add decoding for ED25519 key (6 years ago, Mihai Tabara [:mtabara]⌚️GMT, 62 bytes, text/x-github-pull-request)

Andreea Pavel [:apavel]

Reporter

Description

•

6 years ago

Treeherder link: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=270281947&resultStatus=testfailed%2Cbusted%2Cexception&revision=035f52aed4427b22facfa883067e298f10ef9e97

Failure log: https://taskcluster-artifacts.net/Vr8TN67WQEGr3bEY5E4fzg/0/public/logs/chain_of_trust.log

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Updated

•

6 years ago

Assignee: nobody → mtabara

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Updated

•

6 years ago

Component: Release Automation: Signing → Release Automation: Updates

QA Contact: aki → mtabara

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Updated

•

6 years ago

Summary: Perma Balrog complete updates Signing exceptions → mono-repo balrogscript deployment to production breaks gecko-3-balrog GCP workers

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Comment 1

•

6 years ago

We had a green balrog job running through the AM nightly at 11:20 UTC https://tools.taskcluster.net/groups/efiPWf9LSMq9J19mjfp3Tw/tasks/Ao4C9XcPQUe0fPAhJrJoyA/runs/0

At 11:50 UTC I deployed balrogscript from its new home, the mono-repo, for the first time. The change was https://github.com/mozilla-releng/scriptworker-scripts/pull/24, which I pushed to the production branch.
See:

Something is wrong along the way, digging now.

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Updated

•

6 years ago

Comment 2

•

6 years ago

Found the culprit, it's related to bug 1587068.

We adjusted for signingscript only, but grafted the changes to mono-repo. Then we switched over balrogscript, which still has its ed25519 key double-encoded, but uses https://github.com/mozilla-releng/scriptworker-scripts/blob/master/docker.d/init.sh#L105 in deployment.

The balrog jobs are actually working as expected, the worker is just killing the task afterwards with internal-error.

We can either:
a) patch up mono-repo for now as balrogworkers are the only ones switched to production
b) ask CloudOps to assist us directly on the secrets side.
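
For anyone reading along, here is a minimal sketch of what "double-encoded" means in practice, assuming the encoding in question is base64. The key below is a made-up 32-byte placeholder and `decode_layers` is a hypothetical helper; this is not the actual logic in init.sh.

```python
import base64

# Hypothetical 32-byte stand-in for an ed25519 key; not a real secret.
raw_key = bytes(range(32))

# The secret as it would be stored if it had been base64-encoded twice.
once = base64.b64encode(raw_key)
twice = base64.b64encode(once)


def decode_layers(secret: bytes, layers: int) -> bytes:
    """Strip the given number of base64 layers from a secret."""
    for _ in range(layers):
        secret = base64.b64decode(secret, validate=True)
    return secret


# Decoding a double-encoded secret only once leaves base64 text,
# not the raw key bytes the consumer expects.
after_one = decode_layers(twice, 1)
# A second decode recovers the raw key.
after_two = decode_layers(twice, 2)
```

In this sketch `after_one` is still the single-encoded text, so anything expecting raw key bytes would reject it; only the second decode gives back the original 32 bytes.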

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Comment 3

•

6 years ago

Attached file: [scriptworker-scripts] Bug 1587078 - temporarily add decoding for ED25519 key

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Comment 4

•

6 years ago

(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)

Found the culprit, it's related to bug 1587068.

We adjusted for signingscript only, but grafted the changes to mono-repo. Then we switched over balrogscript, which still has its ed25519 key double-encoded, but uses https://github.com/mozilla-releng/scriptworker-scripts/blob/master/docker.d/init.sh#L105 in deployment.

The balrog jobs are actually working as expected, the worker is just killing the task afterwards with internal-error.

We can either:
a) patch up mono-repo for now as balrogworkers are the only ones switched to production

Just did this for now to unblock. I then hit another issue with the mono-repo deployment (potentially a side effect of https://github.com/mozilla-releng/scriptworker-scripts/pull/26). I'm debugging it and prepping a fix. If this takes too long, I'll push to production the version that I know for sure works ...

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Comment 5

•

6 years ago

Will push in the morning. Evening nightlies will likely fail with Exception. Feel free to ignore, fix is coming in the morning and we'll rerun the jobs.

Comment hidden (Intermittent Failures Robot)

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Comment 7

•

6 years ago

It turns out we were pushing the wrong Docker tag to our Docker Hub registry, so Jenkins didn't pick it up for deployment to GCP. We were including the project name (e.g. balrogscript) in the tag, which was incorrect. We pushed https://github.com/mozilla-releng/scriptworker-scripts/commit/c212afe7d381fedac6719b07ceebcc527c109e7d to fix this and re-deployed.

I've rerun all jobs from the most recent nightly (AM UTC on 9 October, in https://tools.taskcluster.net/groups/S3tAxSvvTr2WdORCBDF0mw) and they are turning green again.

We're done here, sorry again for the two days of bustage.
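
A rough sketch of the tag mismatch, for illustration only. The watched tag value, the separator, and both function names are assumptions made up for this example; the real tag scheme is in the commit linked above.

```python
from typing import Optional

# Hypothetical tag the deployment job watches for; the real value is not stated in this bug.
EXPECTED_TAG = "production"


def build_tag(branch: str, project: Optional[str] = None) -> str:
    """Build a docker tag; passing a project name mimics the faulty push."""
    return f"{project}-{branch}" if project else branch


def deploy_would_trigger(tag: str) -> bool:
    """Return True only when the pushed tag is exactly the watched tag."""
    return tag == EXPECTED_TAG


# Tag with the project name baked in: the watcher never sees a match.
faulty = build_tag("production", project="balrogscript")
# Tag without the project name: the watcher matches and deploys.
fixed = build_tag("production")
```

Under these assumptions the image is pushed successfully either way, but only the tag without the project name is picked up for deployment.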

Mihai Tabara [:mtabara]⌚️GMT

Assignee

Updated

•

6 years ago

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Comment hidden (Intermittent Failures Robot)

BMO Automation

Updated

•

9 months ago

Component: Release Automation: Updates → Release Automation


Categories

(Release Engineering :: Release Automation, defect)
