Closed Bug 1682996 Opened 5 years ago Closed 5 years ago

Frequent update requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://aus4-admin.mozilla.org/api/v2/releases/Firefox-mozilla-central-nightly-20201216214834

Categories

(Release Engineering :: Release Automation, defect, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: oremj)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=324762052&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/FfmTZluBSKKW9NGBJBKvzw/runs/2/artifacts/public/logs/live_backing.log


2020-12-16 23:36:08,563 - balrogclient.api - DEBUG - REQUEST STATS: {"timestamp": 1608161768.5636039, "method": "GET", "url": "https://aus4-admin.mozilla.org/api/v2/releases/Firefox-mozilla-central-nightly-20201216214834", "status_code": 403, "elapsed_secs": 0.146679}
2020-12-16 23:36:08,563 - redo - DEBUG - retry: Caught exception: 
Traceback (most recent call last):
  File "/app/lib/python3.8/site-packages/redo/__init__.py", line 170, in retry
    return action(*args, **kwargs)
  File "/app/lib/python3.8/site-packages/balrogscript/script.py", line 94, in <lambda>
    retry(lambda: submitter.run(**release), jitter=5, sleeptime=10, max_sleeptime=30, attempts=10)
  File "/app/lib/python3.8/site-packages/balrogscript/submitter/cli.py", line 404, in run
    return NightlySubmitterBase.run(self, *args, schemaVersion=4, **kwargs)
  File "/app/lib/python3.8/site-packages/balrogscript/submitter/cli.py", line 214, in run
    return self.run_backend2(
  File "/app/lib/python3.8/site-packages/balrogscript/submitter/cli.py", line 346, in run_backend2
    existing_release = balrog_request(session, "get", url)
  File "/app/lib/python3.8/site-packages/balrogclient/api.py", line 107, in balrog_request
    resp.raise_for_status()
  File "/app/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://aus4-admin.mozilla.org/api/v2/releases/Firefox-mozilla-central-nightly-20201216214834
2020-12-16 23:36:08,568 - redo - DEBUG - sleeping for 11.57s (attempt 1/10)
2020-12-16 23:36:20,139 - redo - DEBUG - attempt 2/10
Summary: Frequent requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://aus4-admin.mozilla.org/api/v2/releases/Firefox-mozilla-central-nightly-20201216214834 → Frequent update requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://aus4-admin.mozilla.org/api/v2/releases/Firefox-mozilla-central-nightly-20201216214834

Looking at https://firefox-ci-tc.services.mozilla.com/provisioners/scriptworker-k8s/worker-types/gecko-3-balrog , it seems like there are good workers and bad workers (the failures seem to be isolated to a handful of workers).

Could we have bad credentials? Or could we be creating workers outside of an IP allowlist?

Flags: needinfo?(bhearsum)

jmaher
aki: how do you tell the workers, it seems to always be unique ID on that link
aki
jmaher: if you click on a link under worker id, then you'll see the history for that worker. they're spot instances so they don't last forever, but they last 1+ tasks. i don't see any workers with both green and red tasks; i've only seen all green or all red
jmaher
oh, I see
aki
if i sort by task started, then i see there is a batch of green between batches of red, which tells me it might not be a server hiccup
could be a server hiccup, but i'm currently guessing bad workers
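
The "all green or all red" check described above can be sketched as a small script. This is only an illustration: the `(worker_id, succeeded)` pairs are a hypothetical input format, not something exported by the Taskcluster UI, and `classify_workers` is a made-up helper name.

```python
from collections import defaultdict


def classify_workers(tasks):
    """Group task outcomes by worker and label each worker.

    `tasks` is an iterable of (worker_id, succeeded) pairs; this input
    shape is an assumption made for the sketch.
    """
    outcomes = defaultdict(set)
    for worker_id, succeeded in tasks:
        outcomes[worker_id].add(bool(succeeded))

    labels = {}
    for worker_id, seen in outcomes.items():
        if seen == {True}:
            labels[worker_id] = "all-green"
        elif seen == {False}:
            labels[worker_id] = "all-red"
        else:
            # Both successes and failures on the same worker.
            labels[worker_id] = "mixed"
    return labels


# Hypothetical sample: worker ids and results are invented.
sample = [
    ("worker-a", True),
    ("worker-a", True),
    ("worker-b", False),
    ("worker-b", False),
    ("worker-c", True),
]
labels = classify_workers(sample)
```

If no worker comes out as "mixed", the failures follow the worker rather than the time of the request, which points at something worker-specific (credentials, source IP) more than at a server-side hiccup.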

I wonder if https://bugzilla.mozilla.org/show_bug.cgi?id=1681129 is related - it was just fixed in the last 24h, and made changes to the whitelists.

Flags: needinfo?(bhearsum) → needinfo?(oremj)
Blocks: 1683096

:bhearsum was right: this was a side effect of bug 1681129 being applied. We believe the additional scriptworker IPs were added and applied from a temporary branch that was never merged to master, so when bug 1681129 was applied it overwrote those "temporary" changes and started causing 403s for approximately 4/5 of the requests from the scriptworker pool. I created and applied a PR to add the other scriptworker IPs, which has resolved this bug, but there may be other missing IPs that we still need to add.
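
To illustrate why only part of the pool was rejected, here is a minimal sketch of an IP allowlist check using Python's standard `ipaddress` module. The addresses are documentation-range placeholders (192.0.2.0/24), not the real scriptworker or Balrog IPs, and `is_allowed` / `rejected_fraction` are hypothetical helper names; the actual allowlist is enforced in infrastructure config, not in code like this.

```python
import ipaddress


def is_allowed(ip, allowlist):
    """Return True if `ip` falls inside any network in `allowlist`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in allowlist)


def rejected_fraction(worker_ips, allowlist):
    """Fraction of worker egress IPs that the allowlist would reject."""
    rejected = [ip for ip in worker_ips if not is_allowed(ip, allowlist)]
    return len(rejected) / len(worker_ips)


# Placeholder data: five egress IPs, only one of which survived the overwrite.
allowlist = ["192.0.2.10/32"]
worker_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]

fraction = rejected_fraction(worker_ips, allowlist)
```

With one of five egress IPs left on the list, four out of five workers would get a 403 on every request, which would match the "all green or all red" pattern per worker seen earlier in this bug.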

Flags: needinfo?(oremj)
Blocks: 1683215

Thank you!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
No longer blocks: 1683096
Regressed by: 1681129

jbuck or oremj - do we need a similar change for balrog (and ship it?) stage? I just hit this with balrogscript stage scriptworkers trying to talk to balrog stage admin: https://firefoxci.taskcluster-artifacts.net/JCAUz6pRTpCGBhBgKRCZpQ/0/public/logs/live_backing.log

Flags: needinfo?(oremj)
Flags: needinfo?(jbuckley)
Assignee: nobody → oremj
Flags: needinfo?(oremj)

Fixed for balrog stage admin.

Also updated shipit api dev.

Flags: needinfo?(jbuckley)

(In reply to Jeremy Orem [:oremj] from comment #11)
> Fixed for balrog stage admin.

Working well again, thanks!

Component: Release Automation: Updates → Release Automation