Closed
Bug 1286824
Opened 9 years ago
Closed 9 years ago
balrogVPNProxy is busted after balrog -> cloudops migration
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: intermittent-bug-filer, Assigned: garndt)
References
Details
(Keywords: intermittent-failure)
Comment 1•9 years ago
garndt, pete: can someone take a look at this? It's very frequent in the update tests :)
Flags: needinfo?(pmoore)
Flags: needinfo?(garndt)
Updated•9 years ago
Comment 2•9 years ago
I'm going to see if I can catch a smoking gun...
Comment 3•9 years ago
15:03 <bhearsum> pmoore: i think https://bugzilla.mozilla.org/show_bug.cgi?id=1286824 is caused by the balrog -> cloudops migration
15:03 <bhearsum> aus4-admin.mozilla.org changed IPs
Comment 4•9 years ago
Updating summary now that we've diagnosed this a bit. It seems there's a hack in docker-worker that hardcodes the aus4-admin.mozilla.org IP address, so when it changed yesterday the proxy stopped working. I've updated that in https://github.com/taskcluster/docker-worker/pull/234, and Taskcluster folks are currently redeploying docker-worker. Hopefully it will work after that's done.
Blocks: 1248741
Summary: Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for balrogVPNProxy : Internal Server Error → balrogVPNProxy is busted after balrog -> cloudops migration
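For illustration only: the failure mode in comment 4 (a hardcoded IP that goes stale when the host moves) can be avoided by resolving the hostname at the moment it is needed. This is a minimal sketch, not the change made in the pull request above, and docker-worker itself is not written in Python. The function names `resolve_ipv4` and `hosts_entry` are hypothetical, and the addresses in the usage note are reserved documentation addresses, not real ones.

```python
import socket
from typing import Callable

# Hypothetical helpers, not part of docker-worker.
# The resolver is injectable so the lookup can be swapped out in tests.


def resolve_ipv4(
    hostname: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Look up the IPv4 address for hostname at call time."""
    return resolver(hostname)


def hosts_entry(
    hostname: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Build an /etc/hosts-style line from a fresh lookup, not a constant."""
    return f"{resolve_ipv4(hostname, resolver)}\t{hostname}"
```

With a stand-in resolver that maps the admin host to 192.0.2.10, `hosts_entry` yields a line for that address; if the mapping later changes to 198.51.100.7, the next call picks up the new address without a code change or redeploy. The trade-off is that each call depends on DNS being reachable and correct at that moment.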
Updated•9 years ago
Severity: normal → blocker
Comment 5•9 years ago
Code is merged in docker-worker - working on making a new AMI now. Spoke to garndt (woke him up) to get help with this.
Comment 6•9 years ago
Even after the AMI was rebuilt we were having issues with the VPN proxy connecting to the new production interface. After a ton of debugging (thanks to garndt and mostlygeek for that), we finally realized that the taskcluster-balrog LDAP account was missing an ACL. We looped jabba in, and he got it added to the vpn_balrog group, which got things fixed up.
All of the funsize balrog jobs that have started since the ACL was added have succeeded. Capacity is still low (we stopped new instances from starting after the issue was discovered), but that will come back up over time.
Flags: needinfo?(garndt)
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
Assignee: nobody → garndt
Component: General → Docker-Worker
Updated•6 years ago
Component: Docker-Worker → Workers