Closed
Bug 574348
Opened 14 years ago
Closed 14 years ago
Occasional hg corruption (abort: foo.i@XXXXXXXXXX: no match found)
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: aravind)
References
Details
Several times a week we hit an error with hg such as: abort: data/talos-r3/master1.cfg.i@6f63c247e067: no match found! This happens when pulling or updating from hg.mozilla.org, and seems to coincide with changes landing in the repo. The slave then has to be cleaned up manually.
Reporter
Comment 2•14 years ago
No, I've seen it on build/buildbot-configs, build/tools and mozilla-central at least.
Assignee
Comment 3•14 years ago
The Google hits I get suggest checking the integrity of the repos in question. Since a subsequent clone works fine, I am assuming the server repo is fine. Next time you hit this issue, can you run "hg verify" on the repo? Are these problems limited to any particular OS?
Comment 4•14 years ago
I think I hit an incarnation of this bug last night on the l10n dashboard. Here's what my code does: it asks json-pushes for the changesets it has, runs an hg pull, and then asks the local repo for further details. Last night the local clone was missing a specific revision. Running the code again worked fine, which leads me to suspect that the repo behind the pushes hook had newer data than the one I pulled from. Speculation on IRC yesterday was that if you happen to switch from one server to the other during a pull, you could end up with file manifests for different versions. Sorry, I didn't find this bug early enough to catch aravind's request to run an hg verify; I'll do that on the next occasion.
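The pull-then-check flow described in this comment can be sketched roughly as follows. This is a minimal illustration, not the l10n dashboard's actual code: `pull` and `has_revision` are hypothetical callables the caller would supply (for example, thin wrappers around `hg pull` and a local changeset lookup), and the list of wanted changesets is assumed to come from json-pushes.

```python
def pull_until_present(wanted, pull, has_revision, attempts=3):
    """Pull repeatedly until every changeset in `wanted` exists locally.

    `pull` and `has_revision` are hypothetical caller-supplied callables,
    not part of any Mercurial API. Returns the changesets still missing
    after the last attempt (an empty list means everything arrived).
    """
    missing = list(wanted)
    for _ in range(attempts):
        pull()
        missing = [rev for rev in missing if not has_revision(rev)]
        if not missing:
            break
    return missing


# Toy stand-ins: the mirror only serves "rev-b" from the second pull on,
# mimicking a server that lags behind the push log.
served = [{"rev-a"}, {"rev-a", "rev-b"}]
local = set()
calls = {"count": 0}


def fake_pull():
    # Copy whatever the mirror serves at this point into the local clone.
    local.update(served[min(calls["count"], len(served) - 1)])
    calls["count"] += 1


still_missing = pull_until_present(["rev-a", "rev-b"], fake_pull, local.__contains__)
print(still_missing, calls["count"])
```

Retrying only hides the symptom on the client side; it does not explain why the two sources disagreed in the first place.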
Comment 5•14 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1277473051.1277473367.32583.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1277473079.1277473367.32584.gz
Reporter
Comment 6•14 years ago
talos-r3-snow-016:tools cltbld$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
 buildfarm/utils/generate-tpcomponent.py@654: d8515fbec4b4 in manifests not found
542 files, 655 changesets, 1521 total revisions
1 integrity errors encountered!
(first damaged changeset appears to be 654)
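For anyone who wants to collect these results automatically from many slaves, here is a small sketch that pulls the two key facts out of `hg verify` output. It assumes the message wording shown in this comment; other hg versions may phrase these lines differently, and the function name is made up for illustration.

```python
import re


def summarize_verify(output):
    """Extract the error count and first damaged changeset from verify output.

    The patterns match the wording quoted in this bug; treat them as an
    assumption rather than a stable interface.
    """
    errors = re.search(r"(\d+) integrity errors encountered", output)
    first = re.search(r"first damaged changeset appears to be (\d+)", output)
    return {
        "errors": int(errors.group(1)) if errors else 0,
        "first_damaged": int(first.group(1)) if first else None,
    }


sample = """checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
 buildfarm/utils/generate-tpcomponent.py@654: d8515fbec4b4 in manifests not found
542 files, 655 changesets, 1521 total revisions
1 integrity errors encountered!
(first damaged changeset appears to be 654)
"""

print(summarize_verify(sample))
```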
Reporter
Comment 7•14 years ago
The clone started at 06:38:26 and failed at 06:39:15. A change was pushed at 06:38:40. The revision 'd8515fbec4b4' doesn't exist in the repo AFAICT.
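The timing argument here is simply that the push falls inside the clone window. A minimal sketch of that check, using the times from this comment and assuming all three timestamps are on the same day and clock (the function name is hypothetical):

```python
from datetime import datetime


def push_overlaps_clone(clone_start, clone_end, push_time, fmt="%H:%M:%S"):
    """Return True if the push happened while the clone was running."""
    start = datetime.strptime(clone_start, fmt)
    end = datetime.strptime(clone_end, fmt)
    push = datetime.strptime(push_time, fmt)
    return start <= push <= end


print(push_overlaps_clone("06:38:26", "06:39:15", "06:38:40"))
```

Running the same check over other failed clones and the push log would show whether the correlation holds in general.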
Comment 8•14 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1277529170.1277529227.24180.gz
Comment 9•14 years ago
Any chance to get a tarball of that failed clone attached to the bug so that we can debug locally? PS: Likely the push of the 3.6.6 release config update.
Assignee
Comment 10•14 years ago
@djc: any ideas here? I am kind of lost. catlee says it happens when the repo had a recent push. We do see some corruption on the local repo when this happens (see comment 6).
Comment 11•14 years ago
Every single one of the builds on my most recent push is burning, like:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1279029231.1279029324.9414.gz
abort: data/thunderbird/l10n-thunderbird-changesets-3.1.i@add4593f45ee: no match found!
(cloning buildbot-configs)
Assignee
Comment 13•14 years ago
@djc: What can I provide for hg folks to troubleshoot this? We are having to turn off caching to cope with the problem, and even that doesn't make it go away. We still see this problem, only now it's ephemeral.
Comment 14•14 years ago
17:47 < mpm> It smells like rsync or hook weirdness.
17:50 < mpm> Can we find out what versions they're running and how their servers interact?
17:52 < mpm> There are 3 ways this can happen:
17:52 < mpm> a) client has corruption and pushes it to server
17:53 < mpm> b) rsync or similar updates manifest before files
17:53 < mpm> c) rollback (ie due to failing hook) during pull with older hg
17:54 < mpm> Recent hg has a config option to check for (a) too.
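To make scenario (b) concrete, here is a toy model of a mirror whose manifest was updated before the file data. It is only an illustration of the ordering problem: the dictionaries, paths, and node names are invented and do not reflect Mercurial's on-disk format.

```python
def dangling_references(manifest, filelogs):
    """Return (path, node) pairs named by the manifest but absent from filelogs."""
    return [
        (path, node)
        for path, node in sorted(manifest.items())
        if node not in filelogs.get(path, set())
    ]


# Hypothetical backend state after a push that adds "node-new".
backend_manifest = {"configs/master.cfg": "node-new"}
backend_filelogs = {"configs/master.cfg": {"node-old", "node-new"}}

# A mirror caught mid-sync: the new manifest has arrived, the file data has not.
mirror_manifest = dict(backend_manifest)
mirror_filelogs = {"configs/master.cfg": {"node-old"}}

print(dangling_references(backend_manifest, backend_filelogs))
print(dangling_references(mirror_manifest, mirror_filelogs))
```

A client reading from the mirror in that window would see a manifest entry with no matching file revision, which is the shape of the "no match found" and "in manifests not found" errors reported above.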
Comment 15•14 years ago
The repo on the server isn't corrupted, 'hg verify' shows that, so I think that rules out (a). Our server is running hg 1.5.4 (bug 551015), so I don't think it's (c) (I know we've seen that in the past). I'm not sure how data gets from the backend servers to the webheads, so (b) seems plausible. Aravind?
Assignee
Comment 16•14 years ago
mpm (Mercurial dev) suggested adding the sync,noac options to the NFS mount options. We did that and re-enabled caching on the buildbot-configs repos. Please comment here if you notice these problems again.
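A quick way to confirm the change on each webhead is to check the mount entry for both options. The sketch below parses an fstab-style line; the server name, export path, and mount point are hypothetical, and the helper names are made up for illustration.

```python
def mount_options(fstab_line):
    """Return the option set from an fstab-style line (fourth whitespace field)."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields: %r" % fstab_line)
    return set(fields[3].split(","))


def missing_options(fstab_line, required=("sync", "noac")):
    """List the required options that the mount entry does not carry."""
    present = mount_options(fstab_line)
    return [opt for opt in required if opt not in present]


# Hypothetical export and mount point, for illustration only.
line = "nfs-backend:/export/hg  /repo/hg  nfs  rw,sync,noac  0 0"
print(missing_options(line))
```

An empty list means both options are present; anything else names what still needs to be added.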
Reporter
Comment 17•14 years ago
We're hosting hg repos on NFS? That just sounds like fail waiting to happen...
Comment 18•14 years ago
Aravind discussed this with mpm, and he said NFS should work fine, modulo some write ordering issues. He suggested adding sync to the NFS options and had confidence that it would fix the problem.
Assignee
Comment 19•14 years ago
Please re-open if this continues to happen.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 20•14 years ago
This has reoccurred:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1280421897.1280422060.4525.gz
updating working directory
abort: data/thunderbird/l10nbuilds.ini.i@5d6f213899b2: no match found!
(on buildbot-configs, of course)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 21•14 years ago
On a fresh pull of buildbot-configs on my local machine:
mark-banners-macbook:~ mark$ hg clone http://hg.mozilla.org/build/buildbot-configs
destination directory: buildbot-configs
requesting all changes
adding changesets
adding manifests
adding file changes
added 2770 changesets with 6447 changes to 1357 files
updating to branch default
abort: data/thunderbird/l10nbuilds.ini.i@5d6f213899b2: no match found!
Something needs poking at the hg end I believe.
Severity: minor → blocker
Assignee
Comment 22•14 years ago
Had a corrupt cache; clearing that fixed it.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard