Open Bug 1863417 Opened 1 years ago Updated 1 year ago

Intermittent Site not up-to-date reason: "/builds/worker/.mozbuild/srcdirs/gecko-8a5b87fe5d69/_virtualenvs/common" does not exist

Categories

(Firefox Build System :: Task Configuration, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: noemi_erli, Unassigned)

References

Details

First appearance here.
Log: https://treeherder.mozilla.org/logviewer?job_id=435207111&repo=mozilla-beta

[task 2023-11-06T19:47:43.115Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:47:43.115Z] WARNING:upload-symbols:Maximum retries hit, giving up!
[task 2023-11-06T19:47:43.361Z] Creating default state directory: /builds/worker/.mozbuild
[task 2023-11-06T19:47:43.361Z] Running "pip check" to verify compatibility between the system Python and the "mach" site.
[task 2023-11-06T19:47:43.361Z] Creating local state directory: /builds/worker/.mozbuild/srcdirs/gecko-8a5b87fe5d69
[task 2023-11-06T19:47:43.361Z] Running "pip check" to verify compatibility between the system Python and the "common" site.
[task 2023-11-06T19:47:43.361Z] Site not up-to-date reason: "/builds/worker/.mozbuild/srcdirs/gecko-8a5b87fe5d69/_virtualenvs/common" does not exist
[taskcluster 2023-11-06 19:47:43.882Z] === Task Finished ===
[taskcluster 2023-11-06 19:47:43.883Z] Unsuccessful task run with exit code: 1 completed in 547.335 seconds
See Also: → 1854239
Component: Messaging System → Task Configuration
Product: Firefox → Firefox Build System

It failed because of this upload failure to the symbols server for all 7 attempts. An upload directly before succeeded.

[task 2023-11-06T19:41:14.571Z] INFO:upload-symbols:Using symbol upload token from the secrets service: "http://taskcluster/secrets/v1/secret/project/releng/gecko/build/level-3/gecko-symbol-upload"
[task 2023-11-06T19:41:14.676Z] INFO:upload-symbols:Uploading symbol file "/tmp/tmpalgaqo4s/symbols.zip" to "https://symbols.mozilla.org/upload/"
[task 2023-11-06T19:41:14.676Z] INFO:upload-symbols:Attempt 1 of 7...
[task 2023-11-06T19:42:50.142Z] INFO:upload-symbols:Uploaded successfully!
[task 2023-11-06T19:42:50.162Z] INFO:upload-symbols:Using symbol upload token from the secrets service: "http://taskcluster/secrets/v1/secret/project/releng/gecko/build/level-3/gecko-symbol-upload"
[task 2023-11-06T19:42:50.248Z] INFO:upload-symbols:Uploading symbol file "/tmp/tmpalgaqo4s/symbols2.zip" to "https://symbols.mozilla.org/upload/"
[task 2023-11-06T19:42:50.248Z] INFO:upload-symbols:Attempt 1 of 7...
[task 2023-11-06T19:42:59.649Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:42:59.649Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:42:59.649Z] ====================
[task 2023-11-06T19:42:59.649Z] 
[task 2023-11-06T19:42:59.649Z] ====================
[task 2023-11-06T19:42:59.649Z] 
[task 2023-11-06T19:42:59.649Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:43:09.535Z] INFO:upload-symbols:Attempt 2 of 7...

Looks like redo.retrier defaults to wait 10 seconds between retry attempts:

https://github.com/mozilla-releng/redo/blob/50cfe8e3656f253f9e51df3a998530351d2d9a8c/src/redo/__init__.py#L16

That's probably not enough time for the rate-limit to expire. I think you should change that to 30 seconds.

redo.retrier defaults to 10 seconds between the first and second attempts, but then uses exponential backoff, so after 3 attempts it's >30s, and there's 80s between attempts 6 and 7. (Also we seem to get 429 on the first attempt, when we shouldn't have a rate-limit?)

Flags: needinfo?(willkg)

Rate limiting isn't by ip address or by account, it's for all concurrent upload attempts. So when a burst of builds all kick off and start pushing symbols, that'll probably trigger rate limiting for some of those builds.

The logs have this:

[task 2023-11-06T19:42:50.248Z] INFO:upload-symbols:Attempt 1 of 7...
[task 2023-11-06T19:42:59.649Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:42:59.649Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:42:59.649Z] ====================
[task 2023-11-06T19:42:59.649Z] 
[task 2023-11-06T19:42:59.649Z] ====================
[task 2023-11-06T19:42:59.649Z] 
[task 2023-11-06T19:42:59.649Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:43:09.535Z] INFO:upload-symbols:Attempt 2 of 7...
[task 2023-11-06T19:43:29.862Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:43:29.862Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:43:29.862Z] ====================
[task 2023-11-06T19:43:29.863Z] 
[task 2023-11-06T19:43:29.863Z] ====================
[task 2023-11-06T19:43:29.863Z] 
[task 2023-11-06T19:43:29.863Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:43:45.945Z] INFO:upload-symbols:Attempt 3 of 7...
[task 2023-11-06T19:43:54.981Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:43:54.981Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:43:54.981Z] ====================
[task 2023-11-06T19:43:54.981Z] 
[task 2023-11-06T19:43:54.981Z] ====================
[task 2023-11-06T19:43:54.981Z] 
[task 2023-11-06T19:43:54.981Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:44:17.819Z] INFO:upload-symbols:Attempt 4 of 7...
[task 2023-11-06T19:44:27.572Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:44:27.572Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:44:27.572Z] ====================
[task 2023-11-06T19:44:27.572Z] 
[task 2023-11-06T19:44:27.572Z] ====================
[task 2023-11-06T19:44:27.572Z] 
[task 2023-11-06T19:44:27.572Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:44:58.523Z] INFO:upload-symbols:Attempt 5 of 7...
[task 2023-11-06T19:45:24.126Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:45:24.126Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:45:24.126Z] ====================
[task 2023-11-06T19:45:24.126Z] 
[task 2023-11-06T19:45:24.126Z] ====================
[task 2023-11-06T19:45:24.126Z] 
[task 2023-11-06T19:45:24.126Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:46:12.327Z] INFO:upload-symbols:Attempt 6 of 7...
[task 2023-11-06T19:46:22.334Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:46:22.334Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:46:22.334Z] ====================
[task 2023-11-06T19:46:22.334Z] 
[task 2023-11-06T19:46:22.334Z] ====================
[task 2023-11-06T19:46:22.334Z] 
[task 2023-11-06T19:46:22.334Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:47:33.970Z] INFO:upload-symbols:Attempt 7 of 7...
[task 2023-11-06T19:47:43.115Z] ERROR:upload-symbols:Error: got HTTP response 429: 
[task 2023-11-06T19:47:43.115Z] ERROR:upload-symbols:Response body:
[task 2023-11-06T19:47:43.115Z] ====================
[task 2023-11-06T19:47:43.115Z] 
[task 2023-11-06T19:47:43.115Z] ====================
[task 2023-11-06T19:47:43.115Z] 
[task 2023-11-06T19:47:43.115Z] INFO:upload-symbols:Retrying...
[task 2023-11-06T19:47:43.115Z] WARNING:upload-symbols:Maximum retries hit, giving up!

If we look at just the "Attempt" lines we get this:

[task 2023-11-06T19:42:50.248Z] INFO:upload-symbols:Attempt 1 of 7...  19:42:50
[task 2023-11-06T19:43:09.535Z] INFO:upload-symbols:Attempt 2 of 7...  19:43:09
[task 2023-11-06T19:43:45.945Z] INFO:upload-symbols:Attempt 3 of 7...  19:43:45
[task 2023-11-06T19:44:17.819Z] INFO:upload-symbols:Attempt 4 of 7...  19:44:17
[task 2023-11-06T19:44:58.523Z] INFO:upload-symbols:Attempt 5 of 7...  19:44:58
[task 2023-11-06T19:46:12.327Z] INFO:upload-symbols:Attempt 6 of 7...  19:46:12
[task 2023-11-06T19:47:33.970Z] INFO:upload-symbols:Attempt 7 of 7...  19:47:33

Seems like it did 7 attempts over 5 minutes. That's probably fine. I'd still change the first one to 30 seconds.

How many other builds were running at the same time? Is there any other way to reduce the number of builds trying to upload symbols at the same time?

If not, that's fine. We'll have to wait until Tecken is migrated to GCP before we can make any changes on the Tecken side of things.

Flags: needinfo?(willkg)

The severity field is not set for this bug.
:ahal, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(ahal)
Severity: -- → S3
Flags: needinfo?(ahal)
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.