Open Bug 1925039 Opened 5 months ago Updated 5 months ago

Lando times out on large sized stacks during landing

Categories

(Conduit :: Lando, defect)

Tracking

(Not tracked)

People

(Reporter: pehrsons, Unassigned)

I'm trying to land D224712. It's the 16th patch in a stack of 39 patches. The stack consists of several bugs that are linked because they depend on each other. Just loading the page takes ~45 seconds. Trying to land it times out after 60 seconds.

If I remove the link to its child commit D224713 such that the stack only contains 16 patches, loading the page takes 20 seconds.

Let's do the math: 45/39 = 1.15s per patch. 20/16 = 1.25s per patch. Performance seems to scale almost linearly with the number of patches in the stack, regardless of whether they are parents or children of the one lando might need to land.
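The per-patch arithmetic above can be reproduced with a short sketch. The two (patch count, load time) pairs are the measurements reported in this comment; the helper name `seconds_per_patch` is illustrative only and assumes the load time is spread evenly over the stack.

```python
# Measurements from this comment: (patches in stack, page load time in seconds).
measurements = [(39, 45.0), (16, 20.0)]


def seconds_per_patch(patches: int, seconds: float) -> float:
    """Average load time per patch, assuming cost is spread evenly over the stack."""
    return seconds / patches


per_patch = [round(seconds_per_patch(n, t), 2) for n, t in measurements]
print(per_patch)  # [1.15, 1.25]
```

The two ratios differ by less than 10%, which is what suggests roughly linear scaling with stack size.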

In Phabricator, I used the Conduit API pages to test the queries Lando makes to Phabricator when loading D224712. They were fast.

Whatever else lando is doing has terrible performance characteristics and is to at the very least some extent entirely unnecessary. When I open D224712 in lando I am entirely uninterested in landing, or checking the status of, its children. They are for later.

Landing D224712 without children in the stack now takes ~40 seconds and leads to

Landing Failed
The warnings present when the request was constructed have changed. Please acknowledge the new warnings and try again.

Acknowledging the warnings and landing again renders the same timing and result.

I have put the stack back together.

> Whatever else lando is doing has terrible performance characteristics and is to at the very least some extent entirely unnecessary

It is not accurate to state there are "entirely unnecessary" actions being taken.

We've spent a lot of time on reducing work Lando needs to do in order to get all the required information to display the required information; clearly this is a lot more than just a single call for each revision. We have some work in progress that should improve this situation further.

Clearing dependencies as a workaround exists that you've already employed.

Summary: Lando times out on moderately sized stacks → Lando times out on large sized stacks

(In reply to :glob ✱ from comment #1)

> > Whatever else lando is doing has terrible performance characteristics and is to at the very least some extent entirely unnecessary

> It is not accurate to state there are "entirely unnecessary" actions being taken.

By "to at the very least some extent" I meant the time spent loading and checking the children of the revision I want to land. Since Lando can land significantly faster, and with the same end result, when those children are not linked, I do consider those actions unnecessary.

> We've spent a lot of time on reducing work Lando needs to do in order to get all the required information to display the required information; clearly this is a lot more than just a single call for each revision. We have some work in progress that should improve this situation further.

Would you mind linking the work you are referring to to this bug?

> Clearing dependencies as a workaround exists that you've already employed.

I haven't been able to land with the no-children workaround, mind you. I'll file another bug as a blocker then as the failure mode is different.

With 16 patches taking 40s to land (though I haven't seen it land yet, as it bailed), 24 patches are needed to reach 60s if it scales linearly. A stack of 25 patches should then hit the timeout. Is that a large stack? I would disagree. But it's beside the point; it's the times I need to land a stack of 25 patches or more that I need Lando to work, because it's the one tool at my disposal.
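The extrapolation above can be written out as a small sketch. It assumes, as the comment does, that landing time scales linearly with stack size and has no fixed overhead; the constant names are illustrative, and the values are the ones reported in this bug.

```python
import math

LANDING_SECONDS = 40.0  # observed landing attempt with the 16-patch stack
STACK_SIZE = 16
TIMEOUT_SECONDS = 60.0  # timeout reported in comment 0

# Linear model with no fixed overhead: time = per_patch * patches.
per_patch = LANDING_SECONDS / STACK_SIZE          # 2.5 seconds per patch
patches_at_timeout = TIMEOUT_SECONDS / per_patch  # 24.0 patches reach 60 s
first_size_over = math.floor(patches_at_timeout) + 1  # smallest stack that exceeds 60 s

print(per_patch, patches_at_timeout, first_size_over)  # 2.5 24.0 25
```

If there is a fixed per-request overhead in addition to the per-patch cost, the real threshold would be lower than 25 patches.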

See Also: → 1925062

(In reply to Andreas Pehrson [:pehrsons] from comment #0)

> Landing D224712 without children in the stack now takes ~40 seconds and leads to

Without the warnings (see bug 1925062 comment 1), first load took 24s, after clicking the button to land the POST took 21s to return 302, and the subsequent load took 19s.

Does the failed landing increase the POST answer time by 50-100%? So it must have known it failed but has so many more things to do that you might never find out (if there's a timeout).

Increasing the timeout seems like a good short-term fix. Are there good reasons for not doing that?

Flags: needinfo?(glob)

> Increasing the timeout seems like a good short-term fix. Are there good reasons for not doing that?

Like most things, this is more complicated than it might seem; however, I'll see what can happen here, if anything.

Flags: needinfo?(glob)
Summary: Lando times out on large sized stacks → Lando times out on large sized stacks during landing