Closed Bug 1281757 Opened 8 years ago Closed 8 years ago

hg vcsreplicator consumer is CRITICAL

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: fubar)

I saw a number of "hg vcsreplicator consumer is CRITICAL" alerts when I checked IRC this morning. I took a look at hgweb14 first.

1) The documentation at https://mana.mozilla.org/wiki/display/NAGIOS/procs+-+hg+vcsreplicator+consumer appears to be incorrect or outdated, since supervisorctl isn't installed.

I checked https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/ops.html#monitoring-and-alerts which instead lists systemctl as the tool to use. We need to update the Nagios docs.

2) It looks like vcsreplicator@7.service is the problem. I tried restarting it, but it failed again. Checking the journal shows:

journalctl -f --unit vcsreplicator@7.service
-- Logs begin at Mon 2016-04-18 21:08:13 UTC. --
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com vcsreplicator[12033]: with open(p, 'wb') as fh:
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com vcsreplicator[12033]: IOError: [Errno 2] No such file or directory: u'/repo/hg/mozilla/users/mantaroh_gmail.com/central/.hg/hgrc'
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@7.service: main process exited, code=exited, status=1/FAILURE
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@7.service entered failed state.
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@7.service failed.
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@7.service holdoff time over, scheduling restart.
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: start request repeated too quickly for vcsreplicator@7.service
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: Failed to start Mirror Mercurial changes.
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@7.service entered failed state.
Jun 23 10:48:08 hgweb14.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@7.service failed.

Indeed, that directory does not exist. /repo/hg/mozilla/users/mantaroh_gmail.com/m-c/ is the only thing under /repo/hg/mozilla/users/mantaroh_gmail.com and it's empty.
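
For anyone reading the traceback later: Errno 2 on an open(p, 'wb') call generally means a parent directory of the target is missing, since 'wb' mode would otherwise create the file itself. Below is a minimal sketch of that behaviour, assuming Python 3 (where IOError is an alias of OSError) and a temporary directory standing in for the real repo path. try_write_hgrc is a hypothetical helper written for illustration, not vcsreplicator code.

```python
import errno
import os
import tempfile


def try_write_hgrc(repo_root):
    """Try to write <repo_root>/.hg/hgrc.

    Returns None on success, or the errno of the failure.
    Hypothetical helper; it only mimics the open(p, 'wb') call in the log.
    """
    p = os.path.join(repo_root, ".hg", "hgrc")
    try:
        with open(p, "wb") as fh:
            fh.write(b"[paths]\n")
    except OSError as exc:  # IOError is the same class on Python 3
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an empty repo directory (assumption, not the real path).
    repo = os.path.join(tmp, "central")
    os.mkdir(repo)

    # .hg is absent, so the open() fails with ENOENT.
    print(try_write_hgrc(repo) == errno.ENOENT)

    # Once .hg exists, the same call succeeds.
    os.mkdir(os.path.join(repo, ".hg"))
    print(try_write_hgrc(repo) is None)
```

This only illustrates why the write fails; it says nothing about why the directory was missing on the mirror in the first place.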

I'm not sure how to remediate from here.
I've updated the documentation and escalated to :fubar
I managed to fix the m-c repo on the web heads, only to find that there was another broken one: central. So far, attempts to fix it haven't worked.
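
Since fixing one repo only exposed the next broken one, it may help to look for half-created repos in one pass. The sketch below is a rough idea, assuming the users/<user>/<repo>/.hg layout seen in the paths above. find_incomplete_repos is a hypothetical helper, not part of version-control-tools. It only catches directories that exist without a .hg (like the empty m-c); it would not notice a repo directory that is missing entirely.

```python
import os
import tempfile


def find_incomplete_repos(users_root):
    """Return sorted paths of <users_root>/<user>/<repo> dirs lacking .hg.

    Hypothetical helper; assumes repos sit exactly two levels below
    users_root, which may not hold for every user directory.
    """
    broken = []
    for user in sorted(os.listdir(users_root)):
        user_dir = os.path.join(users_root, user)
        if not os.path.isdir(user_dir):
            continue
        for repo in sorted(os.listdir(user_dir)):
            repo_dir = os.path.join(user_dir, repo)
            has_hg = os.path.isdir(os.path.join(repo_dir, ".hg"))
            if os.path.isdir(repo_dir) and not has_hg:
                broken.append(repo_dir)
    return broken


# Demo against a throwaway tree; the names are illustrative only.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "someone_example.com", "good", ".hg"))
    os.makedirs(os.path.join(root, "someone_example.com", "m-c"))
    for path in find_incomplete_repos(root):
        print(os.path.relpath(path, root))
```

On a real host the argument would be the users directory, e.g. /repo/hg/mozilla/users, and the output would still need to be checked by hand before moving anything aside.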

Mantaroh, would it be possible for you to recreate your 'central' repo? I've moved the original aside, just in case.
Flags: needinfo?(mantaroh)
Hi Kendall,

I cloned m-c to my repository in accordance with 'Read the Docs'. [1] However, the ssh connection was dropped while I was cloning the m-c repository.

[1] https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/managing-repos.html#when-to-use-a-user-repository

(In reply to Kendall Libby [:fubar] from comment #2)
> Mantaroh, would it be possible for you to recreate your 'central' repo? I've
> moved the original aside, just in case.
So should I recreate the 'central' repository by cloning it from the m-c repository?
Flags: needinfo?(mantaroh)
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #3)
> Hi Kendall,
> 
> I cloned m-c to my repository in accordance with 'Read the Docs'. [1]
> However, the ssh connection was dropped while I was cloning the m-c repository.

aha! good to know.

> (In reply to Kendall Libby [:fubar] from comment #2)
> So should I recreate the 'central' repository by cloning it from the m-c repository?

yes, please recreate your repo. 

if we run into another issue, and you are only making a fresh clone of m-c, I can do it manually on the ssh node, but I'd like to avoid that if possible.
See Also: → 1279367
(In reply to Kendall Libby [:fubar] from comment #4)
> yes, please recreate your repo. 
> 
> if we run into another issue, and you are only making a fresh clone of m-c,
> I can do it manually on the ssh node, but I'd like to avoid that if possible.
I tried to clone m-c to my user repository 'central', but the server never responded (execution time > 30 min).
The cloned repository is https://hg.mozilla.org/users/mantaroh_gmail.com/central/.

And the console log is https://pastebin.mozilla.org/8879121 .
I was afraid that would be the case. I've manually cloned m-c to mantaroh_gmail.com/central/. Please let me know if you run into any further problems with it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED