Closed Bug 1705658 Opened 3 years ago Closed 3 years ago

internal hgmirror in us-east-1 doesn't handle recent revisions which causes busted tasks: abort: unknown revision 'e51aaee17ca61796f9aa82ba502241f5ec165a39'!

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Assigned: sheehan)

Details

Code sheriffs reported tasks failing on several machines in the us-east-1 data center, e.g. the one in the log below; see also the AC(rt) ones for the push.

Log: https://treeherder.mozilla.org/logviewer?job_id=336758727&repo=try&lineNumber=27

[setup 2021-04-16T07:20:02.316Z] running as worker:worker
[vcs 2021-04-16T07:20:02.316Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2021-04-16T07:20:02.411Z] hgmointernal rate hit; cloning from private hgweb mirror
[vcs 2021-04-16T07:20:02.411Z] fetching hg.mozilla.org fingerprint from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgfingerprint
[vcs 2021-04-16T07:20:02.483Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--config', 'hostsecurity.hg.mozilla.org:fingerprints=sha256:FF:E7:8D:93:E9:56:3C:C0:19:FC:00:4C:18:B9:86:E5:08:E5:10:F5:E2:EA:48:E8:22:D3:A3:3A:CA:99:C3:4C,sha256:17:38:aa:92:0b:84:3e:aa:8e:52:52:e9:4c:2f:98:a9:0e:bf:6c:3e:e9:15:ff:0a:29:80:f7:06:02:5b:e8:48', '--upstream', 'https://us-east-1.hgmointernal.net/mozilla-unified', '--revision', 'e51aaee17ca61796f9aa82ba502241f5ec165a39', 'https://us-east-1.hgmointernal.net/try', '/builds/worker/checkouts/gecko']
[vcs 2021-04-16T07:20:02.536Z] (using Mercurial 4.8.2)
[vcs 2021-04-16T07:20:02.536Z] ensuring https://us-east-1.hgmointernal.net/try@e51aaee17ca61796f9aa82ba502241f5ec165a39 is available at /builds/worker/checkouts/gecko
[vcs 2021-04-16T07:20:02.536Z] (existing repository shared store: /builds/worker/checkouts/hg-store/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29/.hg)
[vcs 2021-04-16T07:20:03.204Z] (pulling to obtain e51aaee17ca61796f9aa82ba502241f5ec165a39)
[vcs 2021-04-16T07:20:06.289Z] PERFHERDER_DATA: {"framework": {"name": "vcs"}, "suites": [{"extraOptions": ["c5n.4xlarge"], "hgVersion": "4.8.2", "lowerIsBetter": true, "name": "overall", "serverUrl": "us-east-1.hgmointernal.net", "shouldAlert": false, "subtests": [], "value": 3.751431941986084}, {"extraOptions": ["c5n.4xlarge"], "hgVersion": "4.8.2", "lowerIsBetter": true, "name": "overall_nopull", "serverUrl": "us-east-1.hgmointernal.net", "shouldAlert": false, "subtests": [], "value": 3.751431941986084}, {"extraOptions": ["c5n.4xlarge"], "hgVersion": "4.8.2", "lowerIsBetter": true, "name": "overall_nopull_fullcheckout", "serverUrl": "us-east-1.hgmointernal.net", "shouldAlert": false, "subtests": [], "value": 3.751431941986084}, {"extraOptions": ["c5n.4xlarge"], "hgVersion": "4.8.2", "lowerIsBetter": true, "name": "overall_nopull_populatedwdir", "serverUrl": "us-east-1.hgmointernal.net", "shouldAlert": false, "subtests": [], "value": 3.751431941986084}]}
[vcs 2021-04-16T07:20:06.289Z] abort: unknown revision 'e51aaee17ca61796f9aa82ba502241f5ec165a39'!
[taskcluster 2021-04-16 07:20:07.543Z] === Task Finished ===

This can affect any task that clones from the repository. The Try tree is currently closed, so developers cannot test their changes. Production trees could also be hit at any time.
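For triage, the failure signature in the log above can be picked out mechanically. The following is only a sketch: the function name and the regular expressions are illustrative assumptions based on the two log lines quoted in this report ("ensuring <url>@<rev> is available at ..." and "abort: unknown revision '<rev>'!"), not part of run-task or robustcheckout.

```python
import re

# Assumed patterns, derived only from the log lines quoted in this bug.
# "ensuring <repo-url>@<revision> is available at <dest>"
ENSURING = re.compile(r"ensuring (\S+)@([0-9a-f]{12,40}) is available")
# "abort: unknown revision '<revision>'!"
UNKNOWN_REV = re.compile(r"abort: unknown revision '([0-9a-f]{12,40})'")


def find_unknown_revision(log_text):
    """Return (repo_url, revision) if the log shows the abort, else None.

    repo_url is taken from the most recent "ensuring ..." line and may be
    None if that line is absent from the log.
    """
    repo_url = None
    for line in log_text.splitlines():
        match = ENSURING.search(line)
        if match:
            repo_url = match.group(1)
            continue
        match = UNKNOWN_REV.search(line)
        if match:
            return repo_url, match.group(1)
    return None
```

Run over the log in this report, it would return the us-east-1 mirror URL together with the revision the mirror could not resolve, which is enough to group failures by mirror host.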

Flags: needinfo?(sheehan)

Try had to be closed because of these frequent task failures.

I've taken the hosts in us-east-1 out of service for now while I investigate. Trees should be okay to re-open now.

It appears that the try data on one of the hosts is corrupted and isn't recovering via autorecover.
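One way to confirm from the outside that a mirror is missing a revision the upstream has is to ask each server to resolve it with `hg identify`. This is a minimal sketch, not the procedure used in this investigation: the helper names are hypothetical, it assumes `hg` is on PATH and that both URLs are reachable from where it runs (the internal mirror may only be reachable from inside the data center), and the `run` parameter exists only so the check can be exercised without network access.

```python
import subprocess


def has_revision(repo_url, revision, run=subprocess.run):
    """Return True if `hg identify` can resolve `revision` on `repo_url`.

    hg exits non-zero (with "abort: unknown revision ...") when the
    server does not know the revision.
    """
    result = run(
        ["hg", "identify", "--id", "--rev", revision, repo_url],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def mirror_is_missing(upstream_url, mirror_url, revision, run=subprocess.run):
    """True when upstream knows the revision but the mirror does not."""
    return has_revision(upstream_url, revision, run) and not has_revision(
        mirror_url, revision, run
    )
```

For this bug, the call would look like `mirror_is_missing("https://hg.mozilla.org/try", "https://us-east-1.hgmointernal.net/try", "e51aaee17ca61796f9aa82ba502241f5ec165a39")`. A True result only shows that the mirror lacks the revision; it does not distinguish replication lag from the data corruption described above.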

Assignee: nobody → sheehan
Flags: needinfo?(sheehan) → needinfo?(aryx.bugmail)

Thank you, Try has been reopened and I will keep an eye on it for the next 30 minutes.

Flags: needinfo?(aryx.bugmail)
Summary: TREES CLOSED: internal hgmirror in us-east-1 doesn't handle recent revisions which causes busted tasks: abort: unknown revision 'e51aaee17ca61796f9aa82ba502241f5ec165a39'! → internal hgmirror in us-east-1 doesn't handle recent revisions which causes busted tasks: abort: unknown revision 'e51aaee17ca61796f9aa82ba502241f5ec165a39'!
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED