Closed Bug 1036244 Opened 11 years ago Closed 11 years ago

Occasional mirror process hangs "corrupt" a web head - monitor and detect

Tracking

(Not tracked)

Status:

RESOLVED WORKSFORME

People

(Reporter: hwine, Assigned: bkero)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] )

hwine

Reporter

Description

•

11 years ago

We've seen a couple of "hung" processes in the new hg webhead sync process. It would be great to monitor for that so we can find and fix them before they turn into intermittent failures for devs and CI.

Context from #releng today:

15:50 < glandium> bkero: i think i always get the the same webhead
15:50 < bkero> glandium: That should be pretty easy to confirm using the aforementioned method.
15:51 < glandium> bkero: https://pastebin.mozilla.org/5535061
15:51 < glandium> bkero: and that's a changeset from yesterday
15:53 < glandium> bkero: https://pastebin.mozilla.org/5535062 that's the most recent error i got
15:54 < bkero> glandium: found hgweb6 to be missing that changeset, all the others are 200
15:54 < bkero> glandium: Ah, a 'pull' operation got locked
15:57 < bkero> glandium: should be updated now, see if any changes
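For illustration only, the manual check described in the IRC log (asking each webhead for a changeset and seeing which ones do not return 200) could be sketched roughly as below. The host names, the repository path, and the idea of reaching each webhead directly over plain HTTP are assumptions, not details taken from this bug; the `/rev/<changeset>` path follows the usual hgweb URL layout.

```python
from typing import Callable, Dict, List
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def webheads_missing_changeset(
    webheads: List[str],
    repo_path: str,
    changeset: str,
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Return {webhead: status} for every webhead that did not answer 200."""
    missing: Dict[str, int] = {}
    for host in webheads:
        # Assumed URL layout: each webhead is queried directly, bypassing
        # whatever load balancer normally sits in front of it.
        url = f"http://{host}/{repo_path}/rev/{changeset}"
        status = fetch_status(url)
        if status != 200:
            missing[host] = status
    return missing
```

A monitoring check could call `webheads_missing_changeset` with a recently pushed changeset and alert when the returned dictionary is non-empty. The `fetch_status` parameter exists only so the HTTP call can be swapped out when testing.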

Justin Wood (:Callek)

Comment 1

•

11 years ago

Per IRC, this was theoretically caused by an old ssh configuration issue that has since been fixed, but this webhead's daemon was likely never restarted. This should never happen again. That said, I still agree that monitoring to verify that is good practice.
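As a rough sketch of what such monitoring might look like, the snippet below flags sync processes that have been running longer than a threshold. It assumes a Linux host where `ps -eo pid=,etimes=,args=` is available (procps-ng), and the `"hg pull"` match string and the one-hour limit are illustrative guesses rather than values from this bug.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcInfo:
    pid: int
    elapsed_seconds: int
    command: str


def parse_ps_output(text: str) -> List[ProcInfo]:
    """Parse the output of `ps -eo pid=,etimes=,args=` into ProcInfo records."""
    procs: List[ProcInfo] = []
    for line in text.splitlines():
        # Split into pid, elapsed seconds, and the full command line.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, etimes, command = parts
        procs.append(ProcInfo(int(pid), int(etimes), command))
    return procs


def find_hung(procs: List[ProcInfo], pattern: str, max_age_seconds: int) -> List[ProcInfo]:
    """Return processes whose command contains pattern and that exceed max_age_seconds."""
    return [
        p for p in procs
        if pattern in p.command and p.elapsed_seconds > max_age_seconds
    ]
```

On a webhead, the input text could come from `subprocess.run(["ps", "-eo", "pid=,etimes=,args="], capture_output=True, text=True).stdout`, with any non-empty result from `find_hung` raising an alert.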

Kendall Libby [:fubar] (he/him)

Updated

•

11 years ago

Component: Server Operations: Developer Services → Mercurial: hg.mozilla.org

Product: mozilla.org → Developer Services

:kanban-engops

Updated

•

11 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/66]

:kanban-engops

Updated

•

11 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/66] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] [kanban:engops:https://kanbanize.com/ctrl_board/6/66]

Nobody; OK to take it and work on it

Updated

•

11 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846] [kanban:engops:https://kanbanize.com/ctrl_board/6/66] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/846]

Ben Kero [:bkero]

Assignee

Comment 2

•

11 years ago

Have we had any of these issues occur recently? IIRC this was fixed by disabling the SSH ControlSockets.

Assignee: server-ops-devservices → bkero

Ben Kero [:bkero]

Assignee

Comment 3

•

11 years ago

AIUI there haven't been any failures attributed to this for quite some time. As such, monitoring for it is probably not an effective use of engineering resources until failures recur. This work will likely be done when the replication infrastructure is rewritten to be more robust.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WORKSFORME


Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)
