Closed Bug 1423881 Opened 8 years ago Closed 8 years ago

Upload symbols by sending URL to symbol artifact instead of downloading and re-uploading it

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: ted, Assigned: ted)

Attachments

(1 file)

We see intermittent failures in the upload-symbols task where it runs over its allotted 10-minute runtime: bug 1392349. Fundamentally the task just downloads the symbols.zip artifact from the build and then uploads it back to the Socorro symbol upload API. Once we switch symbol uploads to use Tecken instead of Socorro (bug 1422735), we'll be able to change the script so that it doesn't download the symbols.zip locally. peterbe added a feature to Tecken where the API accepts a URL to a symbols.zip, fetches it on the server side, and uploads the files to the symbol store. This should save several minutes in each upload-symbols task because the symbols-full.zip is >1GB on many platforms.
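The change described above can be sketched as follows. This is a minimal illustration rather than the landed patch: `build_upload_kwargs` is a hypothetical helper, and the form field name `url` is an assumption about what the upload API expects. The `Auth-Token` header, `allow_redirects=False`, and the `symbols.zip` file field are taken from the diff under review later in this bug.

```python
def build_upload_kwargs(zip_arg, auth_token):
    """Return keyword arguments for an HTTP POST to the symbol upload API.

    zip_arg is either a URL to a symbols zip or a path to a local file.
    """
    kwargs = {
        "headers": {"Auth-Token": auth_token},
        "allow_redirects": False,
    }
    if zip_arg.startswith("http"):
        # Upload by URL: send only the URL and let the server fetch the zip.
        # The field name "url" is an assumption, not confirmed by this bug.
        kwargs["data"] = {"url": zip_arg}
    else:
        # Old behaviour: send the local file as multipart form data.
        kwargs["files"] = {"symbols.zip": open(zip_arg, "rb")}
    return kwargs
```

The result would be passed to something like `requests.post(upload_url, **build_upload_kwargs(args.zip, auth_token))`, so the only per-mode difference is whether the body carries the zip itself or a pointer to it.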
The upload-symbols task worked (and only took 5 minutes):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a408c80658de0e89e19ab795902e4a2ca803305&selectedJob=150914627
...but looking at the log shows:
[task 2017-12-08T22:18:23.620Z] INFO:upload-symbols:Uploading symbol file "https://queue.taskcluster.net/v1/task/FehHTo3DRc6SngDZ1SAQKQ/artifacts/public/build/target.crashreporter-symbols-full.zip" to "https://symbols.stage.mozaws.net/upload/"
[task 2017-12-08T22:18:23.620Z] INFO:upload-symbols:Attempt 1 of 5...
[task 2017-12-08T22:20:23.770Z] ERROR:upload-symbols:Error: HTTPSConnectionPool(host='symbols.stage.mozaws.net', port=443): Read timed out. (read timeout=120)
[task 2017-12-08T22:20:23.770Z] INFO:upload-symbols:Retrying...
[task 2017-12-08T22:20:32.779Z] INFO:upload-symbols:Attempt 2 of 5...
[task 2017-12-08T22:21:11.276Z] INFO:upload-symbols:Uploaded successfully!
So we probably need to bump the HTTP timeout so this doesn't need to retry. Presumably it needs to wait for the server to download the entire zip, so it might take a little while.
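The attempt/retry pattern visible in that log, combined with a longer read timeout, might look roughly like this. It is a sketch under stated assumptions: `do_upload` is a hypothetical callable that performs one HTTP request with the given timeout and raises on failure, `wait` is a hypothetical hook for the pause between attempts, and the 300 second value is an assumed replacement for the 120 seconds that timed out.

```python
import logging

log = logging.getLogger("upload-symbols")

MAX_RETRIES = 5     # matches "Attempt 1 of 5" in the log
READ_TIMEOUT = 300  # seconds; assumed value, up from the 120 s that timed out


def upload_with_retries(do_upload, max_retries=MAX_RETRIES, wait=lambda: None):
    """Call do_upload(timeout) until it succeeds or attempts run out.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        log.info("Attempt %d of %d...", attempt, max_retries)
        try:
            do_upload(READ_TIMEOUT)
            log.info("Uploaded successfully!")
            return True
        except Exception as e:
            log.error("Error: %s", e)
            if attempt < max_retries:
                log.info("Retrying...")
                wait()  # e.g. sleep before the next attempt
    return False
```

With a read timeout long enough to cover the server-side download, the first attempt should normally succeed and the retry path becomes a fallback rather than the common case.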
In Datadog, there is a graph called "Upload by Download URL times (msecs)" here: https://app.datadoghq.com/dash/339351/tecken-performance?live=true&page=0&is_auto=false&from_ts=1512871130054&to_ts=1512957530054&tile_size=l&fullscreen=247871689 which shows how long it takes Tecken to do that download. At the time of writing there is no data in that graph for $prod, so we can't gauge how long that normally takes. But yes, you're right: it's yet another thing the Tecken upload process has to do, and it adds to the total processing time, which is currently capped at 300 seconds by the Elastic Load Balancer.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba6609fb72426b704507ac7e997a17665e241db5
Can you show me what I should look for in that? Is this a failed symbol upload?
I just rebased the patch and changed the timeouts; it hasn't actually uploaded symbols yet. :) It looks like the symbol upload tasks are running, so they should complete in a few minutes.
They both completed successfully without having to retry, and each took ~4 minutes.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #7)
> They both completed successfully without having to retry, and each took ~4
> minutes.
The timeout is set to 300s on both Stage and prod of symbols.mozilla.org, so ~4 minutes makes me nervous.
In other news, I checked Datadog for $stage and found that there were 2 "Upload by Download"s. Both your files were 1.1GB; the first one took 30 seconds and the second one 48 seconds just to download to disk. Each took 17 seconds to extract from .zip to a tree of files. Those are numbers we can't do much about (except that we might extract the zip file to disk faster by first calculating which files we can skip).
I get slightly nervous that it took about 4 minutes. Let's keep an eye on things. When we use symbols.mozilla.org (instead of Stage) we're going to benefit more from caching around S3 lookups and so on.
Oh, the 4 minutes is end-to-end time, so there's a lot of overhead for other bits. From your numbers it sounds like we only spend ~60s in the actual Tecken request, so that should be well within the margin of safety.
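As a rough check of that margin, the figures quoted from Datadog can be added up against the 300 second load balancer cap. The sketch below only counts the download and extract phases reported above; any time spent on later steps inside the same request is not in the thread and is not included. `headroom` is a hypothetical helper name.

```python
LOAD_BALANCER_TIMEOUT_S = 300  # cap quoted in this bug


def headroom(download_s, extract_s, limit_s=LOAD_BALANCER_TIMEOUT_S):
    """Seconds left under the load balancer cap after download + extract."""
    return limit_s - (download_s + extract_s)


# (download seconds, extract seconds) for the two uploads seen on stage
for download_s, extract_s in [(30, 17), (48, 17)]:
    spent = download_s + extract_s
    print(f"download+extract: {spent}s, headroom: {headroom(download_s, extract_s)}s")
```

That gives 47 s and 65 s of measured work, leaving 253 s and 235 s respectively before the cap is reached.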
Comment on attachment 8937944 [details]
bug 1423881 - Upload symbols by sending URL to symbol artifact.

https://reviewboard.mozilla.org/r/208646/#review214430

Two nits bounced back to you to consider.

::: toolkit/crashreporter/tools/upload_symbols.py:74
(Diff revision 1)
>          description='Upload symbols in ZIP using token from Taskcluster secrets service.')
>      parser.add_argument('zip',
> -                        help='Symbols zip file')
> +                        help='Symbols zip file - URL or path to local file')
>      args = parser.parse_args()
>
> -    if not os.path.isfile(args.zip):
> +    if not args.zip.startswith('http') and not os.path.isfile(args.zip):

The equivalent of this test, when it's a URL, would be to do a requests.head() on the URL.
However, perhaps that's not "safe" since it could be a local URL and thus a useless test.

Would it be helpful to do a quick `requests.head(args.zip).status_code < 400`?

::: toolkit/crashreporter/tools/upload_symbols.py:119
(Diff revision 1)
> -                files={'symbols.zip': open(args.zip, 'rb')},
>                  headers={'Auth-Token': auth_token},
>                  allow_redirects=False,
> -                timeout=120)
> +                # Allow a longer read timeout because uploading by URL means the server
> +                # has to fetch the entire zip file, which can take a while. The load balancer
> +                # in front of Tecken has a 300 second timeout, so we'll use that.

I would prefer 'symbols.mozilla.org' instead of 'Tecken'. Tecken is just a code name.
Another name I've been using is "Mozilla Symbol Server" but it's very verbose.
Attachment #8937944 - Flags: review?(peterbe) → review+
Comment on attachment 8937944 [details]
bug 1423881 - Upload symbols by sending URL to symbol artifact.

https://reviewboard.mozilla.org/r/208646/#review214430

> The equivalent of this test, when it's a URL, would be to do a requests.head() on the URL.
> However, perhaps that's not "safe" since it could be a local URL and thus a useless test.
>
> Would it be helpful to do a quick `requests.head(args.zip).status_code < 400`?

I'm not sure that's actually worth the network round trip. Presumably Tecken will error if the URL errors anyway, so it doesn't buy us much. The file check is useful mostly because otherwise we'd get an exception in `open` below.

> I would prefer 'symbols.mozilla.org' instead of 'Tecken'. Tecken is just a code name.
> Another name I've been using is "Mozilla Symbol Server" but it's very verbose.

Good point. I'll fix that.
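For reference, the argument check discussed in this review can be written as a small standalone function. This is a sketch, not the landed code: `is_url` and `zip_arg_is_usable` are hypothetical names, and parsing the scheme with `urllib.parse` is a slightly stricter variant of the `startswith('http')` test in the diff. No network request is made, in line with the decision above to skip the `requests.head()` probe.

```python
import os
from urllib.parse import urlparse


def is_url(arg):
    """True if arg looks like an http(s) URL rather than a local path."""
    return urlparse(arg).scheme in ("http", "https")


def zip_arg_is_usable(arg):
    """URLs pass through unchecked; local paths must be existing files.

    The local check matters because a missing file would otherwise only
    surface later as an exception from open().
    """
    return is_url(arg) or os.path.isfile(arg)
```

Unlike a plain prefix test, this rejects a local file that merely happens to be named something like `httpfoo.zip` and does not exist, while still leaving URL validity for the server to report.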
Hi, can anyone please explain how you found the code for this bug and how you sent it to MozReview?
(In reply to Harris Jillani from comment #14)
> Hi Can anyone please explain me did you found the code for this bug and how
> did you send to mozreview
Hi there,
There are good docs for getting started with mozreview here:
https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview-user.html
If you get stuck, there's a #introduction channel on irc.mozilla.org where there are usually people available to help.
Pushed by tmielczarek@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8df82f289066
Upload symbols by sending URL to symbol artifact. r=peterbe
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Product: Core → Firefox Build System