Lando services down - Could not Communicate with Lando API
Categories
(Conduit :: Lando, defect, P1)
Tracking
(Not tracked)
People
(Reporter: glob, Assigned: ckolos)
Details
Lando UI is reporting "Could not Communicate with Lando API" for all revisions.
Lando API is returning HTTP 502 for all requests.
Comment 1 (Assignee)•6 years ago
GKE failed to auto-repair a failed node in the NAT node-pool required for API communication with transplant. Attempting to re-spin the node pool.
Comment 2 (Assignee)•6 years ago
Blocked on persistent GKE failures when launching nat-pool. Escalating as soon as I can get access to support role.
Comment 3 (Assignee)•6 years ago
Final update:
A legacy NAT configuration was causing new node pools to fail. Deleting all NAT configs (pool, routes, firewall rules, etc.) and reapplying them recreated the cluster. The code already in production was then redeployed; the API endpoint is no longer failing and the Lando UI reports healthy on its heartbeat.
My apologies for the delay on this.
Comment 4•6 years ago
Attempts to reland fail with:
The requested landing_path is valid, but transplant failed. Please retry your request at a later time.
From Standard8: "It gave me an ISE after submitting the landing on the first time I ran it. Then it comes up with that error"
Comment 5•6 years ago
It's back to "Could not Communicate with Lando API" - not sure if that is due to ongoing work.
<ckolos> I've been working on the issue on/off since 2 AM my time.
There's something wrong on the GCP side
I can get it to recover, but it falls over soon after.
I've engaged with support @ GCP and will update when I know more. Meantime, if you're behind SSO, you can follow https://firefoxoperations.statuspage.io
<habib> we have escalated this with Google
Comment 7•6 years ago
Support ticket with Google:
https://enterprise.google.com/supportcenter/managecases#Case/001000000040sBR/U-20699520
Comment 8•6 years ago
autoland reopened when this bug and bug 1582383 got resolved at 17:58 UTC.
Thank you for fixing it.