Closed
Bug 1036244
Opened 10 years ago
Closed 10 years ago
Occasional mirror process hangs "corrupt" a web head - monitor and detect
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: hwine, Assigned: bkero)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] )
We've seen a couple of "hung" processes in the new hg webhead sync process. It would be great to monitor for that so we can find and fix them before they turn into intermittent failures for devs and CI.
Context from #releng today:
15:50 < glandium> bkero: i think i always get the same webhead
15:50 < bkero> glandium: That should be pretty easy to confirm using the aforementioned method.
15:51 < glandium> bkero: https://pastebin.mozilla.org/5535061
15:51 < glandium> bkero: and that's a changeset from yesterday
15:53 < glandium> bkero: https://pastebin.mozilla.org/5535062 that's the most recent error i got
15:54 < bkero> glandium: found hgweb6 to be missing that changeset, all the others are 200
15:54 < bkero> glandium: Ah, a 'pull' operation got locked
15:57 < bkero> glandium: should be updated now, see if any changes
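The check bkero describes (requesting the same changeset from every webhead and looking for one that does not answer 200) could be automated roughly as follows. This is only a sketch under assumptions: the hostnames, the plain-HTTP `/<repo>/rev/<changeset>` URL layout on individual webheads, and the function names are illustrative and not taken from the actual hg.mozilla.org setup.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a GET on url, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what we want to report.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def find_stale_webheads(
    webheads: Iterable[str],
    repo: str,
    changeset: str,
    fetch_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the webheads that do not serve `changeset` with a 200.

    The URL layout below is an assumption about how a single webhead
    could be queried directly, bypassing the load balancer.
    """
    stale = []
    for host in webheads:
        url = f"http://{host}/{repo}/rev/{changeset}"
        if fetch_status(url) != 200:
            stale.append(host)
    return stale
```

Passing a custom `fetch_status` keeps the comparison logic testable without network access; a monitoring job would call `find_stale_webheads` with a recent changeset and alert when the returned list is non-empty.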
Comment 1•10 years ago
Per IRC, this was theoretically part of an old SSH configuration issue that has since been fixed, but this webhead's daemon was likely never restarted after the fix. This should never happen again.
That said, I still agree that monitoring to verify this is good practice.
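One way such monitoring could look on a single webhead is to flag any sync process that has been running for longer than a threshold. The sketch below is an assumption-laden illustration, not the monitoring that was deployed: the `hg pull` match string and the 900-second limit are hypothetical, and it relies on the `etimes` output column of Linux procps `ps`.

```python
import subprocess
from typing import List, Tuple


def parse_long_running(
    ps_output: str, needle: str, max_age_s: int
) -> List[Tuple[int, int, str]]:
    """Return (pid, age_seconds, command) for matching processes older than max_age_s.

    Expects lines of the form "<pid> <elapsed seconds> <command line>".
    """
    hung = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        pid, age, cmd = int(parts[0]), int(parts[1]), parts[2]
        if needle in cmd and age > max_age_s:
            hung.append((pid, age, cmd))
    return hung


def list_long_running(
    needle: str = "hg pull", max_age_s: int = 900
) -> List[Tuple[int, int, str]]:
    """Run ps and report matching processes that exceed the age limit.

    Both defaults are illustrative assumptions; tune them to the real sync job.
    """
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,args="],  # '=' suppresses the header row
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_long_running(out, needle, max_age_s)
```

A cron job or monitoring plugin could call `list_long_running()` and raise an alert whenever it returns a non-empty list, which would surface a locked pull before it shows up as a missing changeset.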
Updated•10 years ago
Component: Server Operations: Developer Services → Mercurial: hg.mozilla.org
Product: mozilla.org → Developer Services
Updated•10 years ago
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/66]
Updated•10 years ago
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/66] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] [kanban:engops:https://kanbanize.com/ctrl_board/6/66]
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] [kanban:engops:https://kanbanize.com/ctrl_board/6/66] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846]
Comment 2•10 years ago (Assignee)
Have we had any of these issues occur recently? IIRC this was fixed by disabling the SSH ControlSockets.
Assignee: server-ops-devservices → bkero
Comment 3•10 years ago (Assignee)
AIUI there haven't been any failures attributed to this for quite some time. As such, monitoring for it is probably not an effective use of engineering resources unless such failures recur. This work will likely be done when the replication infrastructure is rewritten to be more robust.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME