Closed Bug 1288605 Opened 9 years ago Closed 9 years ago

vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync

Categories

(Infrastructure & Operations :: MOC: Problems, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdevney, Unassigned)

Details

23:43 <@nagios-scl3> Thu 20:43:48 PDT [5534] hgweb12.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 < jedi> That's new.
23:44 <@nagios-scl3> Thu 20:44:08 PDT [5537] hgweb11.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 <@nagios-scl3> Thu 20:44:17 PDT [5540] hgweb13.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 <@nagios-scl3> Thu 20:44:18 PDT [5543] hgweb14.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:45 <@nagios-scl3> Thu 20:45:18 PDT [5546] hgweb12.dmz.scl3.mozilla.com:hg vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync (http://m.mozilla.org/hg+vcsreplicator+lag)

● vcsreplicator@6.service - Mirror Mercurial changes
   Loaded: loaded (/etc/systemd/system/vcsreplicator@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2016-07-22 01:42:17 UTC; 2h 2min ago
  Process: 28279 ExecStart=/var/hg/venv_replication/bin/vcsreplicator-consumer /etc/mercurial/vcsreplicator.ini --partition %i (code=exited, status=1/FAILURE)
 Main PID: 28279 (code=exited, status=1/FAILURE)

Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service holdoff time over, scheduling restart.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: start request repeated too quickly for vcsreplicator@6.service
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Failed to start Mirror Mercurial changes.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.

[root@hgweb12.dmz.scl3 ~]# journalctl -f --unit vcsreplicator@6.service
-- Logs begin at Thu 2016-07-21 08:12:44 UTC. --
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com vcsreplicator[16189]: with open(p, 'wb') as fh:
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com vcsreplicator[16189]: IOError: [Errno 2] No such file or directory: u'/repo/hg/mozilla/users/mikokm_gmail.com/mozilla-central/.hg/hgrc'
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service: main process exited, code=exited, status=1/FAILURE
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service holdoff time over, scheduling restart.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: start request repeated too quickly for vcsreplicator@6.service
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Failed to start Mirror Mercurial changes.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.

Restarting the service doesn't seem to clear the error. Escalating to developer services (gps).
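The journal output above ends in `open(p, 'wb')` failing with errno 2 on a path under `.hg/`. For a file opened for writing, that error points to a missing parent directory rather than a missing file. The Python 3 sketch below reproduces only that failure mode. `write_hgrc` is a hypothetical stand-in and not the vcsreplicator code, and Python 3 reports the error as FileNotFoundError (an OSError subclass) where the consumer's Python 2 reports IOError.

```python
import errno
import os
import tempfile


def write_hgrc(repo_path, content=b""):
    """Hypothetical stand-in: write <repo_path>/.hg/hgrc, return an errno on failure."""
    p = os.path.join(repo_path, ".hg", "hgrc")
    try:
        # Mode 'wb' creates the file if needed, but never its parent directories.
        with open(p, "wb") as fh:
            fh.write(content)
    except OSError as e:
        return e.errno
    return None


with tempfile.TemporaryDirectory() as root:
    # This repository directory was never created, so open() fails with ENOENT.
    missing = os.path.join(root, "users", "someone", "mozilla-central")
    missing_result = write_hgrc(missing)

    # With .hg/ in place, the same call succeeds.
    present = os.path.join(root, "present")
    os.makedirs(os.path.join(present, ".hg"))
    present_result = write_hgrc(present)
```

Because the same write fails the same way on every attempt, each restart exits with status 1 until the repository directory exists again or the offending message is removed from the queue.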
00:21 <~fubar> sal: jedi: looking
00:21 < jedi> :)
00:21 < sal> ty!
00:21 < jedi> https://bugzilla.mozilla.org/show_bug.cgi?id=1288605
00:21 < firebot> Bug 1288605 — NEW, nobody@mozilla.org — vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync
00:24 <~fubar> jedi: did the right thing, someone's user repo is messed up so I have to fix/resync
00:24 < jedi> oh yay
00:31 <~fubar> first I have to remember how to remove things from the replication queue...
00:32 < jedi> I was gonna ask, is this something that can be taught to MOC so you don't have to get paged?
00:32 < jedi> But, if you have to figure it out... ;)
00:33 < gps> fubar: /var/hg/venv_replication/bin/vcsreplicator-consumer /etc/mercurial/vcsreplicator.ini --skip --partition <N>
00:33 < gps> https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/ops.html#remediation-to-consumer-lag
00:34 < gps> which user repo is it?
00:34 <~fubar> gps: users/mikokm_gmail.com/mozilla-central
00:34 < gps> that's weird
00:35 <~fubar> looks like they tried to delete that and their m-c repo, from hgssh4:/var/log/messages
00:35 < gps> oh, hmmm
00:36 < gps> we don't handle deletes in the replication log. but as long as the delete finishes from pash by the time the repo is created again, it should just work
00:36 < gps> if deletes from pash aren't working, that could cause problems
00:36 < gps> i wouldn't be surprised if they were broken
00:37 < nagios-scl3> Thu 21:37:43 PDT [5572] hgweb11.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
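The remediation gps describes can be sketched as a small Python helper that builds the command lines without running them. The consumer path, config path, and `--skip --partition <N>` flags are taken from gps's message above. The `systemctl reset-failed` and `systemctl start` steps are an assumption based on the `start-limit` result in the bug description; the log does not confirm that they were run. The function names are hypothetical.

```python
CONSUMER = "/var/hg/venv_replication/bin/vcsreplicator-consumer"
CONFIG = "/etc/mercurial/vcsreplicator.ini"


def skip_command(partition):
    """Command gps gave for removing an entry from one partition's replication queue."""
    if partition < 0:
        raise ValueError("partition must be non-negative: %r" % partition)
    return [CONSUMER, CONFIG, "--skip", "--partition", str(partition)]


def restart_commands(partition):
    """Assumed follow-up (not in the log): clear the start-limit state, then start the unit."""
    unit = "vcsreplicator@%d.service" % partition
    return [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "start", unit],
    ]


# Partition 6 matches the failing unit vcsreplicator@6.service in the description.
for cmd in [skip_command(6)] + restart_commands(6):
    print(" ".join(cmd))
```

The unit's ExecStart passes the instance name as `--partition %i`, so the number after `@` in the unit name is the partition to pass to `--skip`.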
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: MOC: Incidents → MOC: Problems