Closed Bug 665123 Opened 9 years ago Closed 6 years ago

change timeouts on hg pulse plugin

Categories

(Webtools :: Pulse, defect)

Hardware: x86
OS: macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dustin, Unassigned)


As per discussion in #ops, the pulse hook currently has a longer timeout than the hg-lock-file timeout, so while the pulse hook is timing out, other users' pushes fail waiting on the lock.

As a short-term solution, these timeouts should be ordered such that a pulse send will fail and the push complete in less time than it takes another user to time out waiting on the hg lock.
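The ordering described above can be sketched as follows. This is only an illustration, not the actual pulse hook code: `publish_with_deadline`, `timeouts_are_ordered`, and the 30-second publish timeout are hypothetical, and the 600-second figure assumes Mercurial's default `ui.timeout` lock wait is in effect on the server. The idea is that the hook gives up on the publish long before anyone waiting on the hg lock would time out.

```python
import threading

# Assumption: Mercurial's default lock wait (ui.timeout), in seconds.
HG_LOCK_TIMEOUT = 600.0
# Assumption: illustrative value only; it just needs to be well below the lock timeout.
PUBLISH_TIMEOUT = 30.0


def timeouts_are_ordered(publish_timeout, lock_timeout):
    """Return True if a hung publish gives up before a lock waiter would."""
    return 0 < publish_timeout < lock_timeout


def publish_with_deadline(send, message, timeout):
    """Call send(message), but stop waiting after `timeout` seconds.

    Returns True only if the send finished without raising in time.
    The worker is a daemon thread, so a hung send cannot keep the
    process (and therefore the push) alive once we stop waiting.
    """
    outcome = {"ok": False}

    def worker():
        try:
            send(message)
            outcome["ok"] = True
        except Exception:
            # A failed publish must never fail the push itself.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return outcome["ok"] and not thread.is_alive()
```

A hook would call `publish_with_deadline(send, message, PUBLISH_TIMEOUT)` and let the push complete regardless of the result, so the lock is released well inside `HG_LOCK_TIMEOUT`.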
Christian, realizing you're really busy, is there any way we can take care of this today (Friday)?  I'd hate to have this melt down again over the weekend.

If it's a tweak that someone with root on the hg systems can make, can you describe that here?
Yep, looking into it in the next hour
Hmmm, looked into this a bit and I think it may be due to flow control kicking in... so the connection is there, it just isn't accepting published messages. I need to investigate more. I'll watch this over the weekend to make sure it doesn't bring hg down.
If it is flow control, it may help to update Erlang:

http://www.lshift.net/blog/2009/12/01/garbage-collection-in-erlang

Still investigating...
(In reply to comment #1)
> Christian, realizing you're really busy, is there any way we can take care
> of this today (Friday)?  I'd hate to have this melt down again over the
> weekend.
...

(In reply to comment #4)
...
> 
> Still investigating...



legneato: While you investigate, can we disable this on production hg ASAP in the meantime? I'm totally fine with re-enabling this hook once the investigation and fix are done, but right now I'm concerned that this could cause another tree closure with no warning.
I'm not super concerned. I'd prefer it to stay on.
I think the purpose of this bug was just to correctly order the timeout of the hook script with respect to the overall hg lock timeout, so that even if the script *does* hang again, it doesn't cause the cascading failure.  It seems the investigation is into deeper issues that would be better tracked in bug 665118?
Blocks: 1022701
Assignee: christian → nobody
This is going to be fixed by completely changing the way the publisher works (decoupling it via a "maildir"-style queue), so WONTFIXing this, as we're going to ditch all the old hg pulse-shim code.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
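For readers unfamiliar with the "maildir"-style decoupling mentioned in the closing comment, here is a rough sketch of the general technique. It is not the replacement publisher's actual code: the `tmp`/`new` directory layout, the JSON payload, and the function names `init_queue`, `enqueue`, and `drain` are all assumptions made for illustration. The hook only writes a file and renames it into place, which is fast and local; a separate process does the slow network publish later.

```python
import json
import os
import time
import uuid


def init_queue(root):
    """Create the tmp/ and new/ subdirectories of a maildir-style queue."""
    for sub in ("tmp", "new"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def enqueue(root, payload):
    """Called from the hook: write to tmp/, then rename into new/.

    The rename is atomic on the same filesystem, so a consumer never
    sees a half-written message, and the hook never touches the network.
    """
    name = "%d.%s.json" % (time.time_ns(), uuid.uuid4().hex)
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.rename(tmp_path, new_path)
    return new_path


def drain(root, send):
    """Called from a separate process: publish queued messages.

    Stops at the first failure and leaves the remaining files in new/
    so they can be retried on the next run. Returns the number sent.
    """
    new_dir = os.path.join(root, "new")
    sent = 0
    for name in sorted(os.listdir(new_dir)):
        path = os.path.join(new_dir, name)
        with open(path, encoding="utf-8") as fh:
            payload = json.load(fh)
        try:
            send(payload)
        except Exception:
            break
        os.remove(path)
        sent += 1
    return sent
```

With this shape, a hung or unreachable broker only delays `drain`; pushes keep completing because `enqueue` does nothing but local file operations.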