Several times a week we hit an error with hg such as:
abort: data/talos-r3/master1.cfg.i@6f63c247e067: no match found!
This happens when pulling or updating from hg.mozilla.org, and seems to coincide with changes landing in the repo. The slave then has to be cleaned up manually.
Is it always with the same repo?
Assignee: server-ops → aravind
No, I've seen it on build/buildbot-configs, build/tools and mozilla-central at least.
The Google hits I get suggest checking the integrity of the repos in question. Since a subsequent clone works fine, I am assuming the server repo is fine. Next time you hit this issue, can you run "hg verify" on the repo? Are these problems limited to any particular OS?
I think I hit an incarnation of this bug last night on the l10n dashboard. Here's what my code does: it asks json-pushes for the changesets it has, runs an hg pull, and then asks the local repo for further details. Last night the local clone was missing a specific revision after the pull. Running the code again made things go fine, which leads me to suspect that the repo behind the pushes hook had newer data than the one I pulled from. Speculation on IRC yesterday was that if you happen to switch from one server to another during a pull, you could end up with file manifests for different versions. Sorry, I didn't find this bug early enough to catch aravind's request to run an hg verify; I will do so on the next occasion.
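The pull-then-check flow described in this comment could be made tolerant of a lagging server with a bounded retry. The following is only a sketch under stated assumptions: `pull_until_present`, `hg_pull`, and `hg_has_revision` are hypothetical names, not the dashboard's actual code, and the subprocess wiring assumes `hg` is on the PATH and that `hg log -r NODE` exits non-zero for an unknown revision.

```python
import subprocess
from typing import Callable, Iterable, List


def pull_until_present(
    wanted: Iterable[str],
    pull: Callable[[], None],
    has_revision: Callable[[str], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Pull repeatedly until every wanted changeset exists locally.

    Returns the changesets still missing after max_attempts pulls
    (an empty list when everything arrived).
    """
    missing = list(wanted)
    for _ in range(max_attempts):
        pull()
        # Keep only the changesets the local repo still does not know about.
        missing = [node for node in missing if not has_revision(node)]
        if not missing:
            break
    return missing


# Hypothetical wiring to the hg command line (assumption, not from the bug).
def hg_pull(repo: str) -> None:
    subprocess.run(["hg", "pull", "-R", repo], check=True)


def hg_has_revision(repo: str, node: str) -> bool:
    result = subprocess.run(
        ["hg", "log", "-R", repo, "-r", node, "--template", "{node}\n"],
        capture_output=True,
    )
    return result.returncode == 0
```

A caller would pass the changesets reported by json-pushes as `wanted` and treat a non-empty return value as "the server I pulled from is behind", rather than as local corruption.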
talos-r3-snow-016:tools cltbld$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
 buildfarm/utils/generate-tpcomponent.py@654: d8515fbec4b4 in manifests not found
542 files, 655 changesets, 1521 total revisions
1 integrity errors encountered!
(first damaged changeset appears to be 654)
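For anyone who wants to collect this automatically from a failing slave, here is a minimal sketch that extracts the error count and the first damaged changeset from `hg verify` output. The two patterns are taken from the output quoted in this comment; other Mercurial versions may word these lines differently, and `parse_verify_output` and `VerifyResult` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class VerifyResult(NamedTuple):
    integrity_errors: int
    first_damaged_changeset: Optional[int]


def parse_verify_output(text: str) -> VerifyResult:
    """Extract the summary figures from the text printed by `hg verify`."""
    errors = re.search(r"(\d+) integrity errors encountered", text)
    damaged = re.search(r"first damaged changeset appears to be (\d+)", text)
    return VerifyResult(
        integrity_errors=int(errors.group(1)) if errors else 0,
        first_damaged_changeset=int(damaged.group(1)) if damaged else None,
    )
```

Output with no "integrity errors" line is reported as zero errors, so a clean repo and unrecognised output look the same; a stricter version would also check the exit status of `hg verify`.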
The clone started at 06:38:26 and failed at 06:39:15. A change was pushed at 06:38:40. The revision 'd8515fbec4b4' doesn't exist in the repo AFAICT.
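To make the timing argument explicit, a small sketch that checks whether a push falls inside a clone window and how far into the clone it landed. The helper name `seconds_into_clone` and the placeholder date are assumptions; only the three timestamps come from the comment above.

```python
from datetime import date, datetime, time
from typing import Optional


def seconds_into_clone(clone_start: time, clone_end: time, push: time) -> Optional[float]:
    """Return how many seconds after clone_start the push landed,
    or None if the push was outside the clone window."""
    day = date(2000, 1, 1)  # placeholder; all three times are assumed to be on the same day
    start, end, pushed = (datetime.combine(day, t) for t in (clone_start, clone_end, push))
    if start <= pushed <= end:
        return (pushed - start).total_seconds()
    return None


offset = seconds_into_clone(time(6, 38, 26), time(6, 39, 15), time(6, 38, 40))
print(offset)  # 14.0
```

With the times quoted here, the push landed 14 seconds into a 49-second clone, which is consistent with the clone reading the repo while it was being updated.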
Any chance to get a tarball of that failed clone attached to the bug so that we can debug locally? PS: Likely the push of the 3.6.6 release config update.
@djc: any ideas here? I am kind of lost. catlee says it happens when the repo had a recent push. We do see some corruption on the local repo when this happens (see comment 6).
Every single one of the builds on my most recent push is burning, like:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1279029231.1279029324.9414.gz
abort: data/thunderbird/l10n-thunderbird-changesets-3.1.i@add4593f45ee: no match found!
(cloning buildbot-configs)
@djc: What can I provide for hg folks to troubleshoot this? We are having to turn off caching to cope with the problem, and even that doesn't make it go away. We still see this problem, only now it's ephemeral.
17:47 < mpm> It smells like rsync or hook weirdness.
17:50 < mpm> Can we find out what versions they're running and how their servers interact?
17:52 < mpm> There are 3 ways this can happen:
17:52 < mpm> a) client has corruption and pushes it to server
17:53 < mpm> b) rsync or similar updates manifest before files
17:53 < mpm> c) rollback (ie due to failing hook) during pull with older hg
17:54 < mpm> Recent hg has a config option to check for (a) too.
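Case (b) can be illustrated with a toy model: if a mirror receives the manifest before the file data, a reader in between sees a manifest entry pointing at a file revision that is not there yet. This is a deliberately simplified sketch, not how Mercurial or rsync actually store or copy data; the dictionaries, the function names, and the all-zero placeholder node are assumptions, and the path and hash are borrowed from the error message at the top of this bug.

```python
import copy


def dangling_entries(manifest, filelogs):
    """Return (path, node) pairs the manifest references but the filelogs lack."""
    return [
        (path, node)
        for path, node in sorted(manifest.items())
        if node not in filelogs.get(path, set())
    ]


def sync_in_order(source, mirror, order):
    """Copy each named part from source to mirror in the given order and
    yield what a reader of the mirror would find broken after each step."""
    for part in order:
        mirror[part] = copy.deepcopy(source[part])
        yield part, dangling_entries(mirror["manifest"], mirror["filelogs"])


path = "talos-r3/master1.cfg"
source = {
    "manifest": {path: "6f63c247e067"},
    "filelogs": {path: {"000000000000", "6f63c247e067"}},
}
stale_mirror = {
    "manifest": {path: "000000000000"},
    "filelogs": {path: {"000000000000"}},
}

# Manifest first: the intermediate state has a dangling reference.
for step, problems in sync_in_order(source, copy.deepcopy(stale_mirror), ["manifest", "filelogs"]):
    print(step, problems)
```

Copying in the opposite order (`["filelogs", "manifest"]`) never exposes a dangling entry in this model, which is the ordering property the write path would need to preserve.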
The repo on the server isn't corrupted, 'hg verify' shows that, so I think that rules out (a). Our server is running hg 1.5.4 (bug 551015), so I don't think it's (c) (I know we've seen that in the past). I'm not sure how data gets from the backend servers to the webheads, so (b) seems plausible. Aravind?
mpm (a Mercurial dev) suggested adding sync,noac to the NFS mount options. We did that and re-enabled caching on the buildbot-configs repos. Please comment here if you notice these problems again.
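A quick way to confirm a mount actually carries the suggested options is to check the comma-separated options field (as it appears in fstab or in the mount listing). This is a minimal sketch; `missing_nfs_options` is a hypothetical helper and the option strings in the example are made up, not taken from the real servers.

```python
def missing_nfs_options(options_field, required=frozenset({"sync", "noac"})):
    """Return the required mount options absent from a comma-separated options field."""
    present = {opt.strip() for opt in options_field.split(",") if opt.strip()}
    return set(required) - present


# Options are compared as whole tokens, so "async" does not count as "sync".
print(missing_nfs_options("rw,async,noac"))  # {'sync'}
```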
We're hosting hg repos on NFS? That just sounds like fail waiting to happen...
Aravind discussed this with mpm, and he said NFS should work fine, modulo some write ordering issues. He suggested adding sync to the NFS options and had confidence that it would fix the problem.
Please re-open if this continues to happen.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
This has reoccurred:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1280421897.1280422060.4525.gz
updating working directory
abort: data/thunderbird/l10nbuilds.ini.i@5d6f213899b2: no match found!
(on buildbot-configs, of course)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
On a fresh pull of buildbot-configs on my local machine:
mark-banners-macbook:~ mark$ hg clone http://hg.mozilla.org/build/buildbot-configs
destination directory: buildbot-configs
requesting all changes
adding changesets
adding manifests
adding file changes
added 2770 changesets with 6447 changes to 1357 files
updating to branch default
abort: data/thunderbird/l10nbuilds.ini.i@5d6f213899b2: no match found!
Something needs poking at the hg end I believe.
Severity: minor → blocker
Had a corrupt cache; clearing it fixed this.
Status: REOPENED → RESOLVED
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard