cloning hg.m.o/users/prepr-ffxbld/mozilla-2.0 taking over 4 hours, still not finished

RESOLVED DUPLICATE of bug 661828

Status

RESOLVED DUPLICATE of bug 661828
Type: task
Priority: --
Severity: critical
Opened: 8 years ago
Last updated: 4 years ago

People

(Reporter: aki, Assigned: nmeyerhans)

Reporter

Description

8 years ago
I'm trying to run a preproduction release to make sure bug 557260 doesn't break Firefox releases when it lands Tuesday morning.

The first part of this is cloning a bunch of user repos in prepr-ffxbld.
This has been timing out (>3600 seconds) several times since Friday afternoon.

Today I decided to stop using the buildbot automation and do it manually:

[cltbld@moz2-linux-slave51 build]$ ssh -l prepr-ffxbld -oIdentityFile=~cltbld/.ssh/ffxbld_dsa hg.mozilla.org clone mozilla-2.0 releases/mozilla-2.0
Please wait.  Cloning /releases/mozilla-2.0 to /users/prepr-ffxbld/mozilla-2.0

This has been running for 4-5 hours and still hasn't completed.
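
A sketch of re-running that clone step with timing and an explicit limit, in case it helps narrow things down (the 3600-second cap mirrors the buildbot timeout mentioned above; using GNU timeout on the slave is an assumption, not what the automation actually does):

# Time the clone and abort it if it exceeds the automation's 3600s limit.
time timeout 3600 \
  ssh -l prepr-ffxbld -oIdentityFile=~cltbld/.ssh/ffxbld_dsa \
  hg.mozilla.org clone mozilla-2.0 releases/mozilla-2.0 \
  || echo "clone did not finish (or failed) within 3600 seconds"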
Reporter

Comment 1

8 years ago
Determining and fixing the root cause would be ideal.
A short-term workaround would be cloning that user repo for me; once that's done we can lower the priority on this bug.
Reporter

Updated

8 years ago
Blocks: 557260
Reporter

Comment 2

8 years ago
The clone finished overnight.

I went afk around 3:30am PDT and I believe it wasn't done by that point.
I'd love to know why the clone took over 9 hours.
Severity: major → normal

Updated

8 years ago
Assignee: server-ops → nmeyerhans
Reporter

Comment 3

8 years ago
Raising priority, as this (and bug 661828, which is probably a dup) is killing our ability to quickly port+test mobile releases, which we're trying to do by Friday.
Severity: normal → critical
Reporter

Comment 4

8 years ago
Rail says this first started 3-4 weeks ago, and it doesn't seem to have resolved itself over that time.  Hoping that regression window is helpful.
Assignee

Comment 5

8 years ago
We've found a likely culprit: disk contention due to filesystem backups.  It seems that backups weren't being made roughly 4-5 weeks ago due to a hardware failure affecting the backup host. The hardware was repaired roughly 3 to 4 weeks ago.  Backups are apparently scheduled to start at 1 AM Pacific time and have recently been taking >7 hours to complete.

We've cancelled the currently running backup job, which should free up IO capacity to let any current hg operations complete in a reasonable amount of time.  We've got to revisit how we back up these filesystems, though.  I suspect that if we can find a time window when releng isn't making heavy use of hg, we can at least reschedule the backups to run during that time.
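
For illustration, rescheduling could be as simple as moving the cron entry on the backup host to a quieter window; the script name and times below are assumptions for the sketch, not the actual configuration:

# Hypothetical crontab entry on the backup host: start the filesystem
# backup at 10:00 instead of 01:00 Pacific, during a low-load hg window.
# m  h  dom mon dow  command
  0 10   *   *   *   /usr/local/bin/backup-hg-filesystems.sh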
Reporter

Comment 6

8 years ago
Duping forward since all the action is on bug 661828.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 661828
Product: mozilla.org → mozilla.org Graveyard