Open
Bug 1241406
Opened 8 years ago
Updated 7 years ago
No push user information recorded and push not showing on TreeHerder
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Developer Services
Mercurial: hg.mozilla.org
Tracking
(Not tracked)
NEW
People
(Reporter: MattN, Unassigned)
Details
I pushed to try and got this at the end of the message (after most, if not all, of the usual success info):

> error: changegroup.vcsreplicator hook raised an exception: ProduceResponse(topic='pushdata', partition=4, error=6, offset=241385)

I didn't get a Try email and the push didn't show on TH, so I pushed again[1]. This time the error message didn't appear, but I still didn't get a Try email, nor did the push show on TH. When looking at [1], the push id, user, and date are all unknown, so I think something is busted.

[1] https://hg.mozilla.org/try/rev/623a2dc067dc
Comment 1•8 years ago (Reporter)
The try push before mine was also missing push information: https://hg.mozilla.org/try/rev/7900096b4a12
Comment 2•8 years ago
I see the same thing happening to tnikkel's push before Matt's as well (https://hg.mozilla.org/try/rev/7900096b4a12). The push two pushes back is the last successful one (https://hg.mozilla.org/try/rev/4ee1e000746f). Something broke between these two commits.
Comment 3•8 years ago
The last successful push was at 2016-01-21 06:48:45 UTC and the failures started from 2016-01-21 06:58:27.
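The width of the failure window implied by these two timestamps can be checked with a few lines of Python (a minimal sketch; the variable names are illustrative only):

```python
from datetime import datetime, timezone

# Timestamps quoted in comment 3, both in UTC.
last_success = datetime(2016, 1, 21, 6, 48, 45, tzinfo=timezone.utc)
first_failure = datetime(2016, 1, 21, 6, 58, 27, tzinfo=timezone.utc)

# Interval during which replication started failing.
gap = first_failure - last_success
print(gap)  # 0:09:42
```

So whatever broke replication did so within a window just under ten minutes wide.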
Comment 4•8 years ago
Note that mozilla-inbound seems to have sync issues too; gps is on it.
Comment 5•8 years ago
I resynced try and mozilla-inbound manually, then played it safe and scheduled a resync for all repos that have been pushed to in the past few days. That should finish up within minutes. No data was lost AFAICT; the replication just didn't occur.

What I find strange is that my IRC connection from people was killed (the irssi process died) around the same time this was reported. I suspect a larger network event occurred. glandium reported this from the IRC backlog (I assume the times are from Japan):

15:53 <nagios-scl3> Wed 22:53:09 PST [5067] hgweb9.dmz.scl3.mozilla.com:Zookeeper - hg is WARNING: ENSEMBLE WARNING - node (hgssh1.dmz.scl3.mozilla.com) is alive but not available (http://m.mozilla.org/Zookeeper+-+hg)
15:53 <nagios-scl3> Wed 22:53:09 PST [5068] hgssh1.dmz.scl3.mozilla.com:Zookeeper - hg is WARNING: NODE CRITICAL - not in read/write mode: null (http://m.mozilla.org/Zookeeper+-+hg)
15:53 <nagios-scl3> Wed 22:53:19 PST [5069] hgweb6.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets: OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
15:53 <nagios-scl3> Wed 22:53:19 PST [5070] hgweb9.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets: OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
15:53 <nagios-scl3> Wed 22:53:19 PST [5071] hgweb10.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets: OffsetResponse(topic=pushdata, partition=0, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
15:53 <nagios-scl3> Wed 22:53:19 PST [5072] hgweb1.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets: OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
15:53 <nagios-scl3> Wed 22:53:19 PST [5073] hgweb7.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets: OffsetResponse(topic=pushdata, partition=0, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
15:54 <nagios-scl3> Wed 22:54:09 PST [5074] hgssh1.dmz.scl3.mozilla.com:Zookeeper - hg is OK: zookeeper node and ensemble OK (http://m.mozilla.org/Zookeeper+-+hg)
15:54 <nagios-scl3> Wed 22:54:09 PST [5075] hgweb9.dmz.scl3.mozilla.com:Zookeeper - hg is OK: zookeeper node and ensemble OK (http://m.mozilla.org/Zookeeper+-+hg)
15:54 <nagios-scl3> Wed 22:54:09 PST [5076] hgweb1.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync (http://m.mozilla.org/hg+vcsreplicator+lag)
15:54 <nagios-scl3> Wed 22:54:09 PST [5077] hgweb10.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync (http://m.mozilla.org/hg+vcsreplicator+lag)
15:54 <nagios-scl3> Wed 22:54:09 PST [5078] hgweb9.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync (http://m.mozilla.org/hg+vcsreplicator+lag)
15:54 <nagios-scl3> Wed 22:54:09 PST [5079] hgweb6.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync (http://m.mozilla.org/hg+vcsreplicator+lag)
15:54 <nagios-scl3> Wed 22:54:09 PST [5080] hgweb7.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync (http://m.mozilla.org/hg+vcsreplicator+lag)

Error 6 matches what MattN posted.
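To check the "times are from Japan" assumption and line the nagios alerts up with the failure window from comment 3, the sketch below converts the first alert timestamp from PST to UTC and JST. It assumes the log's "Wed" is 2016-01-20 (the bug was filed in January 2016) and uses fixed UTC offsets, which is safe here because neither zone is on daylight saving time in January.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; no DST applies to either zone in January.
PST = timezone(timedelta(hours=-8), "PST")
JST = timezone(timedelta(hours=9), "JST")

# Failure window from comment 3 (UTC).
last_success = datetime(2016, 1, 21, 6, 48, 45, tzinfo=timezone.utc)
first_failure = datetime(2016, 1, 21, 6, 58, 27, tzinfo=timezone.utc)

# Assumption: "Wed 22:53:09 PST" in the log refers to Wednesday 2016-01-20.
alert_pst = datetime(2016, 1, 20, 22, 53, 9, tzinfo=PST)
alert_utc = alert_pst.astimezone(timezone.utc)

# The IRC client's "15:53" prefix, interpreted as Japan time.
irc_jst = alert_pst.astimezone(JST)

print(alert_pst.strftime("%a"))                  # Wed
print(alert_utc.isoformat())                     # 2016-01-21T06:53:09+00:00
print(irc_jst.strftime("%H:%M"))                 # 15:53
print(last_success < alert_utc < first_failure)  # True
```

The first Zookeeper and vcsreplicator warnings (06:53:09 UTC) fall between the last successful push and the first failed one, which is consistent with, though not proof of, the Kafka leadership problem being what dropped the push data.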
Comment 6•8 years ago
Error 6 is NOT_LEADER_FOR_PARTITION.
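The error field in both the hook failure and the nagios alerts can be decoded mechanically. The sketch below is hypothetical (decode_response, KAFKA_ERRORS and STALE_LEADER_CODES are illustrative names, not part of vcsreplicator or any Kafka client library): it parses repr strings like the ones quoted in this bug and maps codes 0 to 7 using the Kafka protocol guide's error table. Codes 5 and 6 mean the client's view of partition leadership is out of date, so Kafka clients normally refresh metadata and retry rather than fail outright.

```python
import re

# Kafka protocol error codes 0-7, as listed in the Kafka protocol guide.
KAFKA_ERRORS = {
    0: "NONE",
    1: "OFFSET_OUT_OF_RANGE",
    2: "CORRUPT_MESSAGE",
    3: "UNKNOWN_TOPIC_OR_PARTITION",
    4: "INVALID_FETCH_SIZE",
    5: "LEADER_NOT_AVAILABLE",
    6: "NOT_LEADER_FOR_PARTITION",
    7: "REQUEST_TIMED_OUT",
}

# Hypothetical: codes indicating stale leader metadata (refresh, then retry).
STALE_LEADER_CODES = {5, 6}

# Matches key=value or key='value' pairs inside a response repr.
_FIELD_RE = re.compile(r"(\w+)=('?)([^,')]*)\2")


def decode_response(text):
    """Hypothetical helper: parse e.g. ProduceResponse(topic='pushdata', ..., error=6, ...)."""
    fields = {key: value for key, _quote, value in _FIELD_RE.findall(text)}
    code = int(fields["error"])
    return {
        "topic": fields["topic"],
        "partition": int(fields["partition"]),
        "error": KAFKA_ERRORS.get(code, f"UNKNOWN({code})"),
        "retry_after_metadata_refresh": code in STALE_LEADER_CODES,
    }


if __name__ == "__main__":
    hook_error = "ProduceResponse(topic='pushdata', partition=4, error=6, offset=241385)"
    print(decode_response(hook_error))
    # {'topic': 'pushdata', 'partition': 4, 'error': 'NOT_LEADER_FOR_PARTITION', 'retry_after_metadata_refresh': True}
```

The same function accepts the unquoted OffsetResponse strings from the nagios alerts, so every error=6 in this bug decodes to NOT_LEADER_FOR_PARTITION.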
Comment 7•8 years ago
We also encountered the scenario described in paragraph #2 of https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/replication.html#data-loss for the first time. I guess that's not a theoretical limitation any more. It was bound to happen sometime. *sigh*
Updated•8 years ago
Component: Mercurial: Pushlog → Mercurial: hg.mozilla.org
QA Contact: hwine
Updated•7 years ago
QA Contact: hwine → klibby