Closed Bug 1567946 Opened 5 years ago Closed 5 years ago

all tree closed, changelogs broken or not updating, no ingestion possible by Treeherder

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Assigned: sheehan)

Details

https://hg.mozilla.org/try/ shows an empty changelog. Consequence: https://treeherder.mozilla.org/#/jobs?repo=try doesn't show the more recent pushes, even though the server still accepts them.

Flags: needinfo?(sheehan)
Summary: changelog of Try repository broken, no ingestion possible by Treeherder → changelog of central and Try repositories broken, no ingestion possible by Treeherder
Flags: needinfo?(larsberg)
Summary: changelog of central and Try repositories broken, no ingestion possible by Treeherder → all tree closed, changelogs broken or not updating, no ingestion possible by Treeherder

From IRC #vcs (timezone ET):

9:36 AM <hg-deploy-bot> Started deploy of revision 25c9bed28326 to hg.mozilla.org; previous 842df4a82218
10:32 AM <pulsebot> Check-in: https://hg.mozilla.org/hgcustom/version-control-tools/rev/83cdb0bd4dcf - Connor Sheehan - ansible/hg-ssh-server: change scm_allow_direct_push gid to 692 (Bug 1515119) r=glob
11:19 AM <hg-deploy-bot> Finished deploy of hooks and extensions to hg.mozilla.org

I'm looking now. This morning's deploy shouldn't have caused this issue but I doubt it's a coincidence.

Also, wrong Lars. :)

Assignee: nobody → sheehan
Flags: needinfo?(sheehan)
Flags: needinfo?(larsberg)

This should be fixed now. The problem here was related to some stale config in the recent deploy to hgmo. The replication system has a process that monitors all relevant Kafka consumer groups and hides changesets from public view until all mirrors have received them. The purpose of this is to avoid having one mirror display a changeset (during an hg pull, for example) while another mirror has not pulled it down yet. We use a file to track the relevant Kafka consumer groups. The name of each Kafka consumer group is derived from the local hostname of the given host. This is done during runs of Ansible playbooks (Ansible pulls the hostname into a "facts" object, which we use to specify a hostname).
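To illustrate the consistency check described above, here is a minimal sketch in Python. It is not the actual v-c-t code: the function names, the offset bookkeeping, and the group names are all assumptions made for illustration. The idea is simply that a changeset becomes visible only once every tracked consumer group has acknowledged the message that carries it.

```python
def fully_replicated_offset(acked_offsets, tracked_groups):
    """Return the highest offset acknowledged by every tracked group.

    A tracked group with no recorded acknowledgement counts as -1,
    so nothing is exposed until that group reports in.
    (Hypothetical helper, not part of version-control-tools.)
    """
    if not tracked_groups:
        return -1
    return min(acked_offsets.get(group, -1) for group in tracked_groups)


def visible_changesets(pushed, acked_offsets, tracked_groups):
    """Filter (offset, node) pairs down to the nodes every mirror has."""
    limit = fully_replicated_offset(acked_offsets, tracked_groups)
    return [node for offset, node in pushed if offset <= limit]


# Illustrative data: three pushes, two mirrors, one mirror lagging.
pushed = [(0, "a"), (1, "b"), (2, "c")]
acked = {"mirror1": 2, "mirror2": 1}
visible = visible_changesets(pushed, acked, ["mirror1", "mirror2"])
```

In this sketch the slowest group sets the limit, so a tracked group that never acknowledges anything keeps every newer changeset hidden.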

I updated the hostnames locally on the hgweb mirrors, since the current hostname was simply the private IP address converted to a dash-separated string, and I wanted something more verbose. However, I did not update the groups file, so the Ansible deployment this morning caused the Kafka consumer group name to be overwritten. The replication consistency process was then waiting for the now-dead consumer group to acknowledge messages, which it would never do.
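The failure mode can be sketched roughly as follows, again in Python and again with hypothetical names and values (the dash-separated names, the "hgweb1"/"hgweb2" names, and the offsets are made up, and the assumption that the repaired file lists the new names is mine). Consumers acknowledge under their new group names, while the tracking file still lists the old ones, so the replicated offset stops advancing.

```python
def replicated_offset(acked, tracked_groups):
    """Lowest acknowledged offset across tracked groups (-1 if none)."""
    return min((acked.get(g, -1) for g in tracked_groups), default=-1)


def stale_groups(tracked_groups, live_groups):
    """Tracked groups that no running consumer uses any more."""
    live = set(live_groups)
    return [g for g in tracked_groups if g not in live]


# Groups file still holds the old IP-derived names (illustrative values).
tracked = ["10-0-0-1", "10-0-0-2"]
# Last acknowledgements recorded under the old names.
acked = {"10-0-0-1": 41, "10-0-0-2": 41}
# After the deploy, consumers acknowledge under the renamed groups.
live = ["hgweb1", "hgweb2"]
acked.update({"hgweb1": 57, "hgweb2": 57})

stalled_at = replicated_offset(acked, tracked)   # stuck at the old value

# Repair: drop the dead names and track the live ones instead.
dead = stale_groups(tracked, live)
repaired = [g for g in tracked if g not in dead] + live
recovered_at = replicated_offset(acked, repaired)
```

Comparing the tracked names against the names of running consumers, as `stale_groups` does here, is one way such a mismatch could be detected before it blocks replication.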

I've removed the bad Kafka group names from the file manually, and I'll be pushing the updated file to v-c-t shortly.

Sorry for the inconvenience!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED