Lando crashing before a landing job is "submitted" leaves job in an invalid state
Categories
(Conduit :: Lando, defect)
Tracking
(Not tracked)
People
(Reporter: zeid, Unassigned)
Details
- Stack: https://lando.services.mozilla.com/D220409/
- Landing job ID: 93992
- Sentry issue: https://mozilla.sentry.io/issues/5841282752
In order to allow the page to load, I cancelled the job manually.
EDIT:
Also worth noting that diff IDs were not recorded correctly either, possibly related to the same issue:
Revisions: D221433 diff None ← D221434 diff None ← D220412 diff None ← D220409 diff None
Comment 1•1 month ago
This probably happened because we create the landing job without specifying a status, then add the revisions to the job just below (which also commits to the DB), and only shortly afterwards set the status. So the pod failed after the job was created with revisions, but before the status and landed revision diffs were set and committed.
Perhaps we should set the status to some default (submitted?) when we initialize the landing job, or we could create a function which builds the job and adds all the associated revisions in one transaction.
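The second option can be sketched as follows. This is a minimal illustration using the standard-library sqlite3 module, not Lando's actual models: the table names, the columns, the "SUBMITTED" value, and the create_landing_job helper are all hypothetical, and the revision and diff IDs are placeholders.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE landing_job (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE job_revision (
    job_id INTEGER NOT NULL REFERENCES landing_job(id),
    revision_id INTEGER NOT NULL,
    diff_id INTEGER NOT NULL
);
"""


def create_landing_job(conn, revisions, fail_midway=False):
    """Insert a job and its (revision_id, diff_id) pairs in one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a half-built job is never visible to readers.
    with conn:
        cur = conn.execute(
            "INSERT INTO landing_job (status) VALUES (?)", ("SUBMITTED",)
        )
        job_id = cur.lastrowid
        for revision_id, diff_id in revisions:
            if fail_midway:
                raise RuntimeError("simulated failure during job creation")
            conn.execute(
                "INSERT INTO job_revision (job_id, revision_id, diff_id) "
                "VALUES (?, ?, ?)",
                (job_id, revision_id, diff_id),
            )
    return job_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

With this shape, a failure partway through leaves no job row at all, rather than a job with revisions but no status. A pod being killed is not an exception, but the effect on the database is the same: the uncommitted transaction is discarded.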
Reporter
Comment 2•1 month ago
I think Lando crashed midway, before all the revisions/patches were correctly associated with the landing job. Guarding against that kind of failure is the main reason the landing job is not marked as submitted until all the revisions are correctly associated, and why we don't want the status set to submitted at the moment the job is created. So in a way this is the intended behaviour by design: a job that is not fully created should fail.
I think one possible solution here is to fix the serializer to show "no status", and then add a check or periodic job that detects landing jobs which have had no status for a long time and marks them as invalid/cancelled. This is an edge case which I strongly suspect is related to a memory or CPU issue.
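A rough sketch of such a check, in plain Python. The Job dataclass, the "CANCELLED" value, the one-hour threshold, and the cancel_stale_jobs name are assumptions made for illustration; they are not taken from the Lando codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical stand-in for a landing job record."""
    id: int
    status: Optional[str]
    created_at: datetime


def cancel_stale_jobs(
    jobs: List[Job], now: datetime, max_age: timedelta = timedelta(hours=1)
) -> List[int]:
    """Cancel jobs that still have no status after max_age; return their IDs."""
    cancelled = []
    for job in jobs:
        # Only jobs that never received a status are candidates; recent ones
        # may simply still be in the middle of being created.
        if job.status is None and now - job.created_at > max_age:
            job.status = "CANCELLED"
            cancelled.append(job.id)
    return cancelled
```

The age threshold matters: a job with no status that was created a few seconds ago is probably still being built, so only jobs that have stayed in that state well beyond normal creation time should be swept.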
EDIT:
> we could create a function which builds the job and adds all the associated revisions in one transaction.
I think this is also a potentially good solution. Quite a few things happen during job creation (including fetching and storing diffs), and we'd have to refactor a fair amount of that to get this to work. In any case, I think this would be easier in the new Lando than in the current one. Regardless, changing the way status is serialized would fix the page load issue, so that a job without a status at least no longer causes a server error.
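As a sketch of the serializer change, assuming the server error comes from reading an attribute of a status that is None (not confirmed here). The Status enum, the serialize_job function, and the output shape are hypothetical.

```python
import enum
from typing import Any, Dict, Optional


class Status(enum.Enum):
    """Hypothetical subset of landing job statuses."""
    SUBMITTED = "SUBMITTED"
    CANCELLED = "CANCELLED"


NO_STATUS = "no status"


def serialize_job(job_id: int, status: Optional[Status]) -> Dict[str, Any]:
    """Serialize a job, tolerating a status that was never set."""
    # Accessing `status.value` directly would raise AttributeError on None.
    label = NO_STATUS if status is None else status.value
    return {"id": job_id, "status": label}
```

For example, serialize_job(93992, None) returns a dictionary whose status is "no status" instead of raising, so the page for a half-created job can still render.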
Reporter | ||
Updated•1 month ago