Closed
Bug 1288605
Opened 9 years ago
Closed 9 years ago
vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync
Categories
(Infrastructure & Operations :: MOC: Problems, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mdevney, Unassigned)
Details
23:43 <@nagios-scl3> Thu 20:43:48 PDT [5534] hgweb12.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args
vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 < jedi> That's new.
23:44 <@nagios-scl3> Thu 20:44:08 PDT [5537] hgweb11.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args
vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 <@nagios-scl3> Thu 20:44:17 PDT [5540] hgweb13.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args
vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:44 <@nagios-scl3> Thu 20:44:18 PDT [5543] hgweb14.dmz.scl3.mozilla.com:procs - hg vcsreplicator consumer is CRITICAL: PROCS CRITICAL: 7 processes with regex args
vcsreplicator-consumer (http://m.mozilla.org/procs+-+hg+vcsreplicator+consumer)
23:45 <@nagios-scl3> Thu 20:45:18 PDT [5546] hgweb12.dmz.scl3.mozilla.com:hg vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync
(http://m.mozilla.org/hg+vcsreplicator+lag)
● vcsreplicator@6.service - Mirror Mercurial changes
Loaded: loaded (/etc/systemd/system/vcsreplicator@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2016-07-22 01:42:17 UTC; 2h 2min ago
Process: 28279 ExecStart=/var/hg/venv_replication/bin/vcsreplicator-consumer /etc/mercurial/vcsreplicator.ini --partition %i (code=exited, status=1/FAILURE)
Main PID: 28279 (code=exited, status=1/FAILURE)
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service holdoff time over, scheduling restart.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: start request repeated too quickly for vcsreplicator@6.service
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Failed to start Mirror Mercurial changes.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 01:42:17 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
[root@hgweb12.dmz.scl3 ~]# journalctl -f --unit vcsreplicator@6.service
-- Logs begin at Thu 2016-07-21 08:12:44 UTC. --
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com vcsreplicator[16189]: with open(p, 'wb') as fh:
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com vcsreplicator[16189]: IOError: [Errno 2] No such file or directory: u'/repo/hg/mozilla/users/mikokm_gmail.com/mozilla-central/.hg/hgrc'
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service: main process exited, code=exited, status=1/FAILURE
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service holdoff time over, scheduling restart.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: start request repeated too quickly for vcsreplicator@6.service
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Failed to start Mirror Mercurial changes.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: Unit vcsreplicator@6.service entered failed state.
Jul 22 03:47:15 hgweb12.dmz.scl3.mozilla.com systemd[1]: vcsreplicator@6.service failed.
Restarting the service doesn't seem to clear the error. Escalating to developer services (gps).
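For reference, the traceback above can be reproduced in isolation. This is a minimal sketch, not the vcsreplicator code: it assumes only what the journal shows, namely that the consumer calls open(p, 'wb') on a .hg/hgrc path whose parent directory no longer exists. The helper names and the temporary paths are hypothetical, and the sketch targets Python 3, whereas the u'' prefix in the log suggests the consumer ran on Python 2.

```python
import errno
import os
import tempfile


def write_hgrc(repo_path, content=b""):
    """Write .hg/hgrc under repo_path, like the open(p, 'wb') call in the log."""
    p = os.path.join(repo_path, ".hg", "hgrc")
    with open(p, "wb") as fh:
        fh.write(content)


def try_write_hgrc(repo_path):
    """Return None on success, or the errno of the failure."""
    try:
        write_hgrc(repo_path)
    except (IOError, OSError) as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical repo path whose .hg directory was never created (or was deleted).
    missing_repo = os.path.join(tmp, "users", "example", "mozilla-central")
    result_missing = try_write_hgrc(missing_repo)

    # Control case: the .hg directory exists, so the write succeeds.
    existing_repo = os.path.join(tmp, "repo")
    os.makedirs(os.path.join(existing_repo, ".hg"))
    result_existing = try_write_hgrc(existing_repo)
```

If that assumption holds, the failure is deterministic for this message, which would be consistent with every restart failing the same way until the message is skipped or the repo is restored.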
Comment 1 • 9 years ago (Reporter)
00:21 <~fubar> sal: jedi: looking
00:21 < jedi> :)
00:21 < sal> ty!
00:21 < jedi> https://bugzilla.mozilla.org/show_bug.cgi?id=1288605
00:21 < firebot> Bug 1288605 — NEW, nobody@mozilla.org — vcsreplicator lag is CRITICAL: CRITICAL - 2/8 partitions out of sync
00:24 <~fubar> jedi: did the right thing, someone's user repo is messed up so I have to fix/resync
00:24 < jedi> oh yay
00:31 <~fubar> first I have to remember how to remove things from the replication queue...
00:32 < jedi> I was gonna ask, is this something that can be taught to MOC so you don't have to get paged?
00:32 < jedi> But, if you have to figure it out... ;)
00:33 < gps> fubar: /var/hg/venv_replication/bin/vcsreplicator-consumer /etc/mercurial/vcsreplicator.ini --skip --partition <N>
00:33 < gps> https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/ops.html#remediation-to-consumer-lag
00:34 < gps> which user repo is it?
00:34 <~fubar> gps: users/mikokm_gmail.com/mozilla-central
00:34 < gps> that's weird
00:35 <~fubar> looks like they tried to delete that and their m-c repo, from hgssh4:/var/log/messages
00:35 < gps> oh, hmmm
00:36 < gps> we don't handle deletes in the replication log. but as long as the delete finishes from pash by the time the repo is created again, it should just work
00:36 < gps> if deletes from pash aren't working, that could cause problems
00:36 < gps> i wouldn't be surprised if they were broken
00:37 < nagios-scl3> Thu 21:37:43 PDT [5572] hgweb11.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
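The skip command gps quoted above takes the partition number as its last argument. As a minimal sketch for anyone scripting it, the helper below only builds the argument list for a given partition and does not run anything. The binary path, config path, and flags are copied from gps's message; the function name is hypothetical, and whether the systemd unit has to be stopped before skipping and started afterwards is not stated in this bug, so check the linked ops documentation first.

```python
# Paths and flags as quoted by gps in the log above.
CONSUMER = "/var/hg/venv_replication/bin/vcsreplicator-consumer"
CONFIG = "/etc/mercurial/vcsreplicator.ini"


def build_skip_command(partition):
    """Return the argv list for the skip command on one partition.

    Hypothetical helper: it only assembles the command and does not execute it.
    """
    if not isinstance(partition, int) or partition < 0:
        raise ValueError("partition must be a non-negative integer")
    return [CONSUMER, CONFIG, "--skip", "--partition", str(partition)]


# Partition 6 is the one shown failing in the systemd status output.
command = build_skip_command(6)
```

The resulting list can be passed to subprocess.run by an operator on the affected hgweb host, once per lagging partition.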
Updated • 9 years ago (Reporter)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 8 years ago (Assignee)
Component: MOC: Incidents → MOC: Problems