10:42 < nagios-sjc1> [81] hg2.build.scl1:Mercurial mirror sync is CRITICAL: SYNC PROBLEMS - 1 of 6 repositories out of sync

[root@hg2 mozilla-central]# pwd

[root@hg2 mozilla-central]# hg log|head
changeset:   76885:da2f5b63ba1e

/mozilla-central: {mirror_tip: da2f5b63ba1e, upstream_tip: c9479e3f6c54}
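For reference, the comparison behind that status line can be reproduced by hand. The following is only a rough sketch, not the actual Nagios check: the function names and the output format are made up for illustration, and the exit codes simply follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical helper (not the real check): compare two short changeset
# hashes and report in a Nagios-like style.
compare_tips() {
    mirror_tip=$1
    upstream_tip=$2
    if [ "$mirror_tip" = "$upstream_tip" ]; then
        echo "OK: in sync at $mirror_tip"
        return 0
    fi
    echo "CRITICAL: mirror_tip=$mirror_tip upstream_tip=$upstream_tip"
    return 2
}

# Hypothetical wrapper: look up both tips with hg, then compare them.
# $1 is the local mirror path, $2 is the upstream URL (both assumed).
check_repo() {
    m=$(hg -R "$1" log -r tip --template '{node|short}') || return 3
    u=$(hg identify --id -r tip "$2") || return 3
    compare_tips "$m" "$u"
}
```

Calling compare_tips with the two hashes from the status line above should report the mismatch; check_repo needs hg installed and a reachable upstream.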

I've tried doing a manual push per https://mana.mozilla.org/wiki/display/SYSADMIN/Hg+Build+Network+Mirrors, but this doesn't seem to help the situation.

Comment 1

7 years ago
The directory looks empty because it seems only the metadata is getting copied:

[root@hg2 mozilla-central]# ls -ahl
total 12K
drwxr-sr-x 3 hg   hg 4.0K Jul 19 16:32 .
drwxrwsr-x 7 root hg 4.0K Sep 13 11:08 ..
drwxr-sr-x 3 hg   hg 4.0K Sep 13 06:34 .hg

The servers aren't even using enough space to have a single shallow checkout.

[root@hg2 mozilla-central]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p2      24G  6.0G   17G  27% /
/dev/cciss/c0d0p4     627G  198M  595G   1% /data
/dev/cciss/c0d0p1     244M   19M  213M   8% /boot
tmpfs                 3.0G  4.0K  3.0G   1% /dev/shm
I changed max_check_attempts to 6 in the hope that this will take care of the problem. It does not solve the root cause, but it should dampen the alerts until the problem grows in severity or is alleviated altogether.
Ben, can you take some time to look at the hg mirrors and see why they are not pulling actual data?

re https://mana.mozilla.org/wiki/display/SYSADMIN/Hg+Build+Network+Mirrors
Confirmed with the build guys, this is expected.  The "-U" option is passed when cloning so that no working-copy is cloned, only the internal hg representation.
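To tell an intentional "-U" clone apart from a genuinely broken one, something like the sketch below might help. The helper name is made up, and the clone source and destination in the comments are placeholders rather than the real mirror paths.

```shell
#!/bin/sh
# A clone made without a working copy looks like this:
#   hg clone -U <source> <dest>
# and can be populated later, if ever needed, with:
#   hg -R <dest> update

# Hypothetical helper: succeed when DIR holds a .hg directory and nothing
# else, which is what "ls -ahl" shows for a clone made with -U.
has_no_working_copy() {
    dir=$1
    [ -d "$dir/.hg" ] || return 1
    [ "$(ls -A "$dir")" = ".hg" ]
}
```

For example, has_no_working_copy /path/to/mirror/mozilla-central (path assumed) would succeed for the listing shown in comment 1, so disk usage and directory contents alone do not say much about whether the mirror is in sync.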

The repo continually being out of sync is still a problem, though.
It looks like a commit to the hghooks repo earlier today broke the repo. It's still out of sync, and I think the repo is corrupted.

[root@hg1 hghooks]# hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
 mozhghooks/treeclosure_comm_central.py@121: a04b613baae3 in manifests not found
34 files, 126 changesets, 163 total revisions
1 integrity errors encountered!
(first damaged changeset appears to be 121)
[root@hg1 hghooks]# 
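If the other repositories need the same check, the summary line of hg verify can be picked out with a small filter. This is a sketch only: the function name is made up, and it assumes the summary always has the "N integrity errors encountered!" form shown above.

```shell
#!/bin/sh
# Hypothetical filter: read "hg verify" output on stdin and print the
# number of integrity errors it reports, or 0 if no such line is present.
count_integrity_errors() {
    n=$(sed -n 's/^\([0-9][0-9]*\) integrity errors encountered!.*$/\1/p')
    echo "${n:-0}"
}

# Possible use against one repository (path is a placeholder):
#   hg -R /path/to/repo verify 2>&1 | count_integrity_errors
```

hg verify itself also exits non-zero when it finds damage, so the exit status alone is enough for a simple pass/fail loop over repositories.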

See bug #684460, where a commit was made earlier today.

I talked with gozer about this in #build, and he said it was an issue he might have seen before, related to filesystem caching and serving Mercurial over NFS.  I'm wondering if this is a problem on the sjc1 side (nfs+dm-svn02) that is only appearing now that we have to do non-trivial cloning elsewhere.

It seems this is a problem with NFS serving hg, and since we never tried keeping a mirror up to date, we never encountered this issue before.  We can keep this information in mind when rebuilding the system in PHX.
