Open Bug 1779449 Opened 3 years ago Updated 1 year ago

upload-symbols jobs are slow

Categories

(Tecken :: Upload, defect)

Tracking

(Not tracked)

People

(Reporter: mstange, Unassigned)

Details

https://searchfox.org/mozilla-central/source/toolkit/crashreporter/tools/upload_symbols.py

"Sym" jobs for macOS builds on treeherder take about 12min at the moment. Example

These jobs seem to be mostly CPU-bound, decompressing and recompressing various files.

For example, we do the following for each dSYM:

  • We decompress the .dSYM.tar out of the incoming .tar.zstd archive.
  • Then we recompress it to .dSYM.tar.bz2
  • Then we store it in a .zip file
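
As a rough illustration of the last two steps above, here is a minimal Python sketch. It is not the code in upload_symbols.py: `repack_dsym` and `make_sample_tar` are hypothetical names, the zstd decompression step is left out because it needs a third-party package, and storing the entry uncompressed (`ZIP_STORED`) is an assumption about how one might avoid compressing the data twice.

```python
import bz2
import io
import tarfile
import zipfile


def make_sample_tar() -> bytes:
    """Build a small in-memory tar standing in for a decompressed .dSYM.tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"debug info " * 1000
        info = tarfile.TarInfo("XUL.dSYM/Contents/Resources/DWARF/XUL")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def repack_dsym(tar_bytes: bytes, name: str, zip_file: zipfile.ZipFile) -> None:
    """Recompress an already-decompressed .dSYM.tar with bz2 and add it to a zip."""
    # This bz2 pass is the CPU-heavy part of the sketch.
    compressed = bz2.compress(tar_bytes)
    # Assumption: store the entry as-is, since it is already compressed.
    zip_file.writestr(name + ".dSYM.tar.bz2", compressed,
                      compress_type=zipfile.ZIP_STORED)
```

Every byte of the dSYM goes through one decompression and one recompression before upload, which is why the job ends up CPU-bound.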

Could we make the build job produce an archive of the right shape to begin with?

I can think of a few reasons why we might be doing what we're doing at the moment:

  • On Windows, we need to cab-compress PDB files and binaries. Maybe we didn't want to do this cab compression inside the build job for some reason.
  • There is a file size limit on the build job artifacts. That's probably why we use zstd for the build job artifact. But we use zip during the upload. Could we make the server support zstd uploads?
Flags: needinfo?
Flags: needinfo? → needinfo?(cdenizet)

Mike, what do you think?

Flags: needinfo?(cdenizet) → needinfo?(mh+mozilla)

See bug 1654994 for why this was done. Back then I thought bug 1635150 or some follow-up would move the work to the symbols server, but that never happened.

Flags: needinfo?(mh+mozilla)

Hmm, if the compression work needs to happen somewhere anyway, then I guess it doesn't really matter where it happens - either way it will delay the time at which symbols become available on the server.

Though, actually, maybe we can move dSYM compression out of the critical path of getting .sym files to the server. We could do the upload in two stages:

  1. First we make a zip with the .sym files, and upload that zip.
  2. Then we compress the dSYMs / .so.dbgs / .pdbs.
  3. Then we stuff those into a new zip file, and upload that.

Then at least the profiler symbolication API will return symbols sooner, because it only needs the .sym files and not the dSYMs.
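
A minimal sketch of how the two-stage idea could look, assuming files are held in memory as a name-to-bytes mapping. `build_zip`, `two_stage_upload` and the `upload` callback are hypothetical names for illustration, not part of upload_symbols.py or any Tecken API.

```python
import bz2
import io
import zipfile
from typing import Callable, Dict


def build_zip(entries: Dict[str, bytes], compression: int) -> bytes:
    """Pack a name -> bytes mapping into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


def two_stage_upload(files: Dict[str, bytes],
                     upload: Callable[[bytes], None]) -> None:
    """Upload .sym files first, then the slower-to-compress debug files."""
    sym = {n: d for n, d in files.items() if n.endswith(".sym")}
    debug = {n: d for n, d in files.items() if not n.endswith(".sym")}

    # Stage 1: the .sym files are cheap to zip, so they reach the server early.
    upload(build_zip(sym, zipfile.ZIP_DEFLATED))

    # Stage 2: the expensive bz2 pass happens only after stage 1 has been sent.
    compressed = {n + ".bz2": bz2.compress(d) for n, d in debug.items()}
    upload(build_zip(compressed, zipfile.ZIP_STORED))
```

The total work is unchanged; only the order changes, so the .sym files no longer wait behind the dSYM compression.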

This isn't a Socorro bug. Socorro bugs don't cover the upload_symbols scripts/tasks. Can someone put it in the right product/component?

It is a Socorro bug in the sense that things won't improve significantly until Socorro accepts the .tar.zst archives.

Mike, where are you proposing the dSYM.gz compression would happen? And how do you feel about the two-stage idea from comment 3?

Flags: needinfo?(mh+mozilla)

Ugh.

Component: Symbols → Upload
Product: Socorro → Tecken

In December, bendk adjusted the code so it uploaded smaller zip files which should decrease the likelihood that the upload times out and has to be restarted.
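
For readers unfamiliar with the approach, one way to split an upload into smaller zip files is to batch files greedily under a size limit. This is only a sketch of the general technique, not bendk's actual change; `batch_by_size` and the `max_bytes` limit are hypothetical.

```python
from typing import Dict, List


def batch_by_size(sizes: Dict[str, int], max_bytes: int) -> List[List[str]]:
    """Group file names into batches whose total size stays under max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        # Start a new batch when adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch would then become its own zip, so a timeout only forces a retry of that one batch rather than the whole upload.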

Did that improve timings for the upload-symbols job? Are the timings captured and graphed anywhere?

Flags: needinfo?(mstange.moz)

It looks like the Mac upload jobs still take around 10min each: https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=sym%2Cshippable (click a Sym / Symx job, look for "Duration: " in the bottom left corner)
I'm not aware of any graphs of these numbers.

Last I checked, the bulk of the time is not spent uploading, but compressing.

Flags: needinfo?(mstange.moz)

(In reply to Markus Stange [:mstange] from comment #9)
> It looks like the Mac upload jobs still take around 10min each: https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=sym%2Cshippable (click a Sym / Symx job, look for "Duration: " in the bottom left corner)
> I'm not aware of any graphs of these numbers.
>
> Last I checked, the bulk of the time is not spent uploading, but compressing.

Since bug 1807204, compression is parallelized, but compressing XUL with bz2 is still what takes the most time during the compression phase. That said, the compression phase only takes 5 minutes in the upload task I looked at. The remainder? 1 minute of Mercurial and... 4 minutes of upload.
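
To illustrate why one large file still dominates after parallelization, here is a minimal sketch of compressing several files concurrently. It is not the code from bug 1807204; `compress_all` and the `workers` parameter are hypothetical, and the thread pool is an assumption chosen to keep the example self-contained.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def compress_all(files: Dict[str, bytes], workers: int = 4) -> Dict[str, bytes]:
    """bz2-compress each file in a worker pool and return name -> compressed bytes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the keys.
        compressed = pool.map(bz2.compress, files.values())
        return dict(zip(files.keys(), compressed))
```

Each file is still compressed by a single worker, so the phase cannot finish sooner than the largest file (here, XUL) takes on its own.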

Flags: needinfo?(mh+mozilla)