1241406 - No push user information recorded and push not showing on TreeHerder

Reporter

Description

8 years ago

I pushed to try and got this at the end of the message (after most, if not all, of the usual success info):

> error: changegroup.vcsreplicator hook raised an exception: ProduceResponse(topic='pushdata', partition=4, error=6, offset=241385)

I didn't get a Try email and the push didn't show on TH, so I pushed again [1]. The error message didn't appear that time, but I still didn't get a Try email, nor did the push show on TH.

When looking at [1], the push id, user, and date are all unknown, so I think something is busted.

[1] https://hg.mozilla.org/try/rev/623a2dc067dc

Matthew N. [:MattN]

Reporter

Comment 1

8 years ago

The try push before mine was also missing push information: https://hg.mozilla.org/try/rev/7900096b4a12

Nigel Babu [:nigelb]

Comment 2

8 years ago

I see the same thing happening to tnikkel's push, the one before Matt's, as well:

https://hg.mozilla.org/try/rev/7900096b4a12

The push two before Matt's is the last successful one:

https://hg.mozilla.org/try/rev/4ee1e000746f

Something broke between these two pushes.

Nigel Babu [:nigelb]

Comment 3

8 years ago

The last successful push was at 2016-01-21 06:48:45 UTC and the failures started from 2016-01-21 06:58:27.

Carsten Book [:Tomcat]

Comment 4

8 years ago

Note that mozilla-inbound seems to have sync issues too; gps is on it.

Gregory Szorc [:gps]

Comment 5

8 years ago

I resynced try and mozilla-inbound manually before playing it safe and scheduling a resync for all repos that have been pushed to in the past few days. That should finish up within minutes.

No data was lost AFAICT. The replication just didn't occur.

What I find strange is my IRC connection from people was killed (the irssi process died) around the same time this was reported. I suspect there was a larger network event that occurred.

glandium reported this from the IRC backlog (I assume the times on the left are Japan time):

    15:53 <nagios-scl3> Wed 22:53:09 PST [5067] hgweb9.dmz.scl3.mozilla.com:Zookeeper - hg is WARNING: ENSEMBLE WARNING - node (hgssh1.dmz.scl3.mozilla.com) is alive but
                        not available (http://m.mozilla.org/Zookeeper+-+hg)
    15:53 <nagios-scl3> Wed 22:53:09 PST [5068] hgssh1.dmz.scl3.mozilla.com:Zookeeper - hg is WARNING: NODE CRITICAL - not in read/write mode: null
                        (http://m.mozilla.org/Zookeeper+-+hg)
    15:53 <nagios-scl3> Wed 22:53:19 PST [5069] hgweb6.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets:
                        OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:53 <nagios-scl3> Wed 22:53:19 PST [5070] hgweb9.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets:
                        OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:53 <nagios-scl3> Wed 22:53:19 PST [5071] hgweb10.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets:
                        OffsetResponse(topic=pushdata, partition=0, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:53 <nagios-scl3> Wed 22:53:19 PST [5072] hgweb1.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets:
                        OffsetResponse(topic=pushdata, partition=1, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:53 <nagios-scl3> Wed 22:53:19 PST [5073] hgweb7.dmz.scl3.mozilla.com:hg vcsreplicator lag is WARNING: WARNING - exception fetching offsets:
                        OffsetResponse(topic=pushdata, partition=0, error=6, offsets=()) (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5074] hgssh1.dmz.scl3.mozilla.com:Zookeeper - hg is OK: zookeeper node and ensemble OK (http://m.mozilla.org/Zookeeper+-+hg)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5075] hgweb9.dmz.scl3.mozilla.com:Zookeeper - hg is OK: zookeeper node and ensemble OK (http://m.mozilla.org/Zookeeper+-+hg)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5076] hgweb1.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
                        (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5077] hgweb10.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
                        (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5078] hgweb9.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
                        (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5079] hgweb6.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
                        (http://m.mozilla.org/hg+vcsreplicator+lag)
    15:54 <nagios-scl3> Wed 22:54:09 PST [5080] hgweb7.dmz.scl3.mozilla.com:hg vcsreplicator lag is OK: OK - 8/8 consumers completely in sync
                        (http://m.mozilla.org/hg+vcsreplicator+lag) 


Error 6 matches what MattN posted.
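
For anyone lining up the Nagios timestamps with the push times in comment 3, here is a minimal sketch of the conversion. It assumes the "Wed 22:53:09 PST" alert is from 2016-01-20 (the log itself gives only the weekday; the date is inferred from the UTC push times in comment 3) and that the left-hand IRC times are JST, as guessed above. The variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones; neither PST nor JST needs DST handling in January.
PST = timezone(timedelta(hours=-8), "PST")
JST = timezone(timedelta(hours=9), "JST")
UTC = timezone.utc

# Assumption: "Wed 22:53:09 PST" is Wednesday 2016-01-20 (date not in the log).
first_alert = datetime(2016, 1, 20, 22, 53, 9, tzinfo=PST)

# Assumption: the "15:53" IRC column is Japan time on 2016-01-21.
irc_stamp = datetime(2016, 1, 21, 15, 53, tzinfo=JST)

# Push times quoted in comment 3 (already UTC).
last_good_push = datetime(2016, 1, 21, 6, 48, 45, tzinfo=UTC)
first_bad_push = datetime(2016, 1, 21, 6, 58, 27, tzinfo=UTC)

first_alert_utc = first_alert.astimezone(UTC)
irc_stamp_utc = irc_stamp.astimezone(UTC)

print("first alert (UTC):", first_alert_utc.isoformat())
print("IRC stamp   (UTC):", irc_stamp_utc.isoformat())
print("alert falls between last good and first bad push:",
      last_good_push < first_alert_utc < first_bad_push)
```

Under those assumptions, the first alert lands between the last good push and the first failed one, which fits the replication breaking at the time of the Zookeeper/Kafka warnings.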

Gregory Szorc [:gps]

Comment 6

8 years ago

Error 6 is NOT_LEADER_FOR_PARTITION.
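
As a rough illustration of how the numeric code in these messages can be decoded, the sketch below pulls the fields out of a `ProduceResponse(...)` / `OffsetResponse(...)` string and looks up the error name. The table covers only Kafka protocol error codes 0 through 7; `describe_response` and the regex are hypothetical helpers written for this example, not part of vcsreplicator or any Kafka client.

```python
import re

# Subset of Kafka protocol error codes (0-7 only).
KAFKA_ERRORS = {
    0: "NONE",
    1: "OFFSET_OUT_OF_RANGE",
    2: "CORRUPT_MESSAGE",
    3: "UNKNOWN_TOPIC_OR_PARTITION",
    4: "INVALID_FETCH_SIZE",
    5: "LEADER_NOT_AVAILABLE",
    6: "NOT_LEADER_FOR_PARTITION",
    7: "REQUEST_TIMED_OUT",
}

# Matches both the hook output (topic='pushdata') and the Nagios
# output (topic=pushdata); the quotes around the topic are optional.
RESPONSE_RE = re.compile(
    r"(?P<kind>\w+Response)\("
    r"topic='?(?P<topic>[\w.-]+)'?, "
    r"partition=(?P<partition>\d+), "
    r"error=(?P<error>-?\d+)"
)


def describe_response(text):
    """Return the parsed fields of the first *Response(...) in text, or None."""
    match = RESPONSE_RE.search(text)
    if match is None:
        return None
    code = int(match.group("error"))
    return {
        "kind": match.group("kind"),
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "error_code": code,
        "error_name": KAFKA_ERRORS.get(code, "UNKNOWN_CODE_%d" % code),
    }


hook_line = (
    "error: changegroup.vcsreplicator hook raised an exception: "
    "ProduceResponse(topic='pushdata', partition=4, error=6, offset=241385)"
)
print(describe_response(hook_line))
```

The same helper accepts the `OffsetResponse(topic=pushdata, partition=1, error=6, offsets=())` lines from the Nagios alerts, so both sources can be checked against the same table.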

Gregory Szorc [:gps]

Comment 7

8 years ago

We also encountered paragraph #2 of https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/replication.html#data-loss for the first time. I guess that's not a theoretical limitation any more. It was bound to happen sometime. *sigh*

Gregory Szorc [:gps]

Updated

8 years ago

Component: Mercurial: Pushlog → Mercurial: hg.mozilla.org

QA Contact: hwine

Hal Wine [:hwine] use NI!

Updated

7 years ago

QA Contact: hwine → klibby

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Tracking

(Not tracked)

People

(Reporter: MattN, Unassigned)

Security

(public)
