Closed Bug 1552485 Opened 5 years ago Closed 5 years ago

[tracking] Reduce release Graph End-to-End times

Categories

(Release Engineering :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Assigned: mtabara)

References

(Depends on 2 open bugs)

Details

While doing some cleanup for bug 1530728, I was glancing over our release artifacts and noticed there are no consumers at this point for the individual checksums' detached signatures.

To recap:

  • beetmover submits files X, Y to S3 and generates a target.checksums at the end
  • checksums-signing consumes that target.checksums and signs it, producing the target.checksums plus a detached signature, target.checksums.asc
  • beetmover-checksums-signing consumes the two artifacts above and transfers them under the beetmover-checksums folder at the root of candidates (e.g. for 66.0.5), slightly pretty-named:
    target.checksums -> firefox-{version}.beet
    target.checksums.asc -> firefox-{version}.checksums.asc

As a follow-up, the release-generate-checksums job iterates over the public S3 folder, reads the contents of the .beet files, and concatenates them together into a big-fat SHA{256,512}SUMS. That later gets signed and beetmoved into the root of the ~candidates directory.
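For illustration, a minimal sketch of that concatenation step (not the actual mozharness code), assuming each .beet line carries digest, hash type, size and filename, which is how target.checksums entries typically look; adjust if the real artifacts differ:

import glob

def parse_beet(path):
    """Yield (hash_type, digest, filename) tuples from one .beet file."""
    with open(path) as fh:
        for line in fh:
            digest, hash_type, _size, filename = line.split(None, 3)
            yield hash_type, digest, filename.strip()

def write_sums(beet_dir="beetmover-checksums"):
    # Collect per-file digests and emit the combined SHA256SUMS / SHA512SUMS.
    sums = {"sha256": [], "sha512": []}
    for beet in sorted(glob.glob(f"{beet_dir}/*.beet")):
        for hash_type, digest, filename in parse_beet(beet):
            if hash_type in sums:
                sums[hash_type].append(f"{digest}  {filename}\n")
    for hash_type, lines in sums.items():
        with open(f"{hash_type.upper()}SUMS", "w") as out:
            out.writelines(lines)

if __name__ == "__main__":
    write_sums()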

But at this point there is no regular use of the .asc files from S3.
There are three options:

  1. We drop the .asc files as they are currently unused.
  2. We add verification for those in release-generate-checksums to ensure the files have not been tampered with in S3 (a small verification sketch follows after this list).
  3. We rewrite the existing mozharness script as a scriptworker that performs CoT verification and downloads the files from upstream tasks instead. However, that might result in a task with a huge payload, since there are currently 992 individual checksums files for 66.0.5, for example.
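Option 2 could look roughly like the sketch below: before concatenating, check each .beet file against its detached .asc signature. This assumes gpg is on PATH, the release signing public key is already imported into the keyring, and the file naming from the recap above; it is illustrative, not the actual release-generate-checksums code.

import subprocess
from pathlib import Path

def verify_detached(signed_file: Path, signature: Path) -> bool:
    """Return True if `signature` is a valid detached signature over `signed_file`."""
    result = subprocess.run(
        ["gpg", "--verify", str(signature), str(signed_file)],
        capture_output=True,
    )
    return result.returncode == 0

def verify_all(beet_dir="beetmover-checksums"):
    for beet in sorted(Path(beet_dir).glob("*.beet")):
        # firefox-{version}.beet pairs with firefox-{version}.checksums.asc (per the recap).
        asc = beet.parent / (beet.stem + ".checksums.asc")
        if not verify_detached(beet, asc):
            raise RuntimeError(f"signature verification failed for {beet}")

if __name__ == "__main__":
    verify_all()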
Blocks: 1530728
See Also: 1530728
Type: defect → task

Re-using this bug as this conversation happened at the all-hands. This is to track all the work that's to be done, including but not limited to:

  • clear plan with measurable small wins
  • beetmover and checksums improvements work
  • balrog improvements work

More bugs are to be filed against this, but for now, dropping some of the ideas:

  • Stop producing target.checksums files from BM jobs (maybe? or we could use these to grab both sha512/sha256 hashes for the checksum scriptworker below)

  • Remove checksums-signing and beetmover-checksums-signing jobs

  • Have beetmover and balrog chunking match l10n chunking (see the chunking sketch after this list)
    -- we discovered that balrog jobs take over 5 hours in cumulative run time to complete!

  • create a new checksum scriptworker type to generate the signed SHASUMS files by inspecting all the CoT artifacts for the release (or target.checksums)
    -- alternatively, create the SHASUMS file in a generic worker that knows how to do CoT verification, and do regular signing / beetmoving
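For the chunking bullet above, the idea is just to apply the same deterministic split everywhere, so each beetmover/balrog chunk depends on exactly one l10n chunk instead of the whole set. A toy sketch follows; the chunkify helper here is hypothetical, not the one taskgraph ships.

def chunkify(items, this_chunk, total_chunks):
    """Return the slice of `items` belonging to chunk `this_chunk` (1-based)."""
    per_chunk, remainder = divmod(len(items), total_chunks)
    start = per_chunk * (this_chunk - 1) + min(this_chunk - 1, remainder)
    end = start + per_chunk + (1 if this_chunk <= remainder else 0)
    return items[start:end]

# Made-up locale list; the same assignment would drive the l10n, beetmover and balrog task labels.
locales = ["ach", "af", "an", "ar", "ast", "az", "be", "bg", "bn", "br"]
for chunk in range(1, 4):
    print(chunk, chunkify(locales, chunk, 3))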

Component: Release Automation: Other → General
QA Contact: sfraser → catlee
Summary: investigate the usage of individual checksums signing jobs → [tracking] Reduce release Graph End-to-End times

To recap, since I grabbed this bug earlier this week: the ideas sketched out in Whistler and so far in this bug largely relate to:

  1. checksums
    a) save time

  2. chunkification
    a) save time

Expanding on 1), this includes removing the so-far useless signing of individual checksums to save computational time, but also enhancing security by ensuring the big-fat SHASUMS files are CoT protected.

Expanding on 2), we want to chunkify to reduce the number of jobs, and hence the dependency edges in the graph. But this might not necessarily translate into time saved, as some of these jobs might still block each other.

re: 2), I think we also wanted to have beetmover and balrog chunking match the l10n chunking

So far the ideas that were floating around were mainly related to checksums and chunkification. Since we might discover other potential improvements along the way, I suggest we keep this bug as a meta bug and file individual ones for each chapter we want to optimize.

Depends on: 1567429
Depends on: 1567431

Two ideas to help move this investigation forward:

  1. we need to create the perfect benchmark - the end-to-end time that we know we can't beat (not taking into consideration hardware upgrades or more parallelization). So basically do a breadth-first pass deep in the graph, offset each task to its latest-finishing dependency - as if there was no waiting time - and compute the end-to-end boundary. We know for sure that's a limit we can't beat. Once we have that, we can better measure the extent to which we can optimize the graph (a sketch of this computation follows after this list)

  2. we need to look again at the layout of tasks (similarly to what we did in Whistler around the checksums conversation) - the graph layout - to see if there's room to reorganize, reoptimize and redo some of the nodes/edges so that we can get better timings. A good example is checksums: if we redo the logic, we may end up with a similar cost but a smaller graph. There could be similar cases.
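To make idea 1 concrete, here's a minimal sketch (not our actual tooling) of that lower-bound computation: given per-task durations and the dependency edges, assume every task starts the moment its latest dependency finishes and take the maximum finish time. The task names and durations below are made up for illustration.

from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical durations (minutes) and dependency edges, for illustration only.
durations = {"build": 60, "signing": 10, "beetmover": 15, "balrog": 5, "final": 2}
depends_on = {
    "signing": {"build"},
    "beetmover": {"signing"},
    "balrog": {"beetmover"},
    "final": {"balrog", "signing"},
}

def lower_bound(durations, depends_on):
    finish = {}
    # Walk tasks in dependency order; each starts when its slowest dependency ends.
    for task in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on.get(task, ())), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())

print(lower_bound(durations, depends_on))  # 92 minutes for this toy graph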

Depends on: 1572102
Group: mozilla-employee-confidential
Depends on: 1579067
Assignee: nobody → mtabara
Depends on: 1533337
Blocks: 1590054

Conclusions - https://mihaitabara.github.io/2019/11/21/release-end-to-end-reduced-by-40-percent.html
Follow-ups will be addressed in dedicated bugs.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED