Bug 1112262
Opened 10 years ago
Closed 10 years ago
https://github.com/mozilla/gecko-projects is out-of-sync with hg project repos due to no free inodes on disk of vcssync2.srv.releng.usw2.mozilla.com
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kgrandon, Unassigned)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4235] )
Attachments
(1 file)
126.55 KB,
image/png
We're currently using cypress for gecko feature work, and I'd like to be able to access it from git. I could not find a branch under gecko-dev, so I'm wondering whether I missed it or whether it's tracked somewhere else.
I've found this, but it looks old, and I'm not sure if it's the correct repo to track: https://github.com/mozilla/gecko-projects/tree/cypress
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4235]
Comment 1•10 years ago
Correct - "disposable branches" (aka "twigs" & "project branches") are converted and pushed to the gecko-projects repository on github.
It does look as if there are problems with that mirror -- cypress was only recently returned as a normal twig, and may have been overlooked.
Moving to the correct product & component for the git mirroring.
Component: Mercurial: hg.mozilla.org → Tools
OS: Mac OS X → All
Product: Developer Services → Release Engineering
Hardware: x86 → All
Comment 2•10 years ago
Pete: cypress is configured, but not converting - it seems worth debugging before following the reset procedure (<https://wiki.mozilla.org/ReleaseEngineering/VCSSync/HowTo#How_to_deal_with_project_branch_reset>)
Flags: needinfo?(pmoore)
Comment 3•10 years ago
So this is quite bizarre for several reasons.
1) the project branches are not syncing due to the following error, and have not synced since 08:39 PT on 8 December 2014:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com vcs2vcs]$ cat /opt/vcs2vcs/projects.log
2014-12-17 05:43:01 pid-27007 Acquiring lock
2014-12-17 05:43:01 pid-27007 Updating mozharness
pulling from http://hg.mozilla.org/build/mozharness
abort: could not lock repository /opt/vcs2vcs/mozharness: No space left on device
However, the disk seems fine:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com vcs2vcs]$ df -h /opt/vcs2vcs/mozharness/
Filesystem Size Used Avail Use% Mounted on
/dev/xvdj 99G 57G 37G 61% /opt
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com vcs2vcs]$
The run script is:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ cat /opt/vcs2vcs/run_projects.sh
#!/bin/bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# This file is managed by puppet
set -e
cd /opt/vcs2vcs
exec > projects.log 2>&1
function log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') pid-$$ $*"
}
log "Acquiring lock"
lockfile -s60 -r5 projects.lock
trap "rm -f $PWD/projects.lock" EXIT
# Get mozharness updated / checked out and working
log "Updating mozharness"
(timeout 20 hg --cwd mozharness pull -u)
log "Running hg_git.py"
python mozharness/scripts/vcs-sync/vcs_sync.py -c mozharness/configs/vcs_sync/project-branches.py
# Touch our timestamp file so nagios can check if we're fresh
touch projects.stamp
log "Done"
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$
So the curious parts are:
1) Why does the hg pull think there is "No space left on device"?
2) Why are we not getting nagios alerts?
3) If no projects are syncing for a week, how come we haven't heard about this?
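Questions 2 and 3 point at a monitoring gap. Because run_projects.sh runs under `set -e`, the failing `hg pull` aborts the script before it reaches `touch projects.stamp`, so the stamp keeps the time of the last successful run (Dec 8 08:30 in the directory listing further down). A check on the stamp's age would therefore have caught the stall. The sketch below is one way to write such a check; it is not the nagios check actually configured for this host, the function name and threshold are hypothetical, and it assumes GNU coreutils (`stat -c %Y`).

```shell
#!/usr/bin/env bash
# Hypothetical Nagios-style freshness check for a timestamp file.
# Return codes follow the Nagios plugin convention:
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
# Assumes GNU coreutils (`stat -c %Y` prints mtime in epoch seconds).
check_stamp_fresh() {
    local stamp="$1" max_age="$2"
    if [ ! -e "$stamp" ]; then
        echo "UNKNOWN: $stamp does not exist"
        return 3
    fi
    local now mtime age
    now=$(date +%s)
    mtime=$(stat -c %Y "$stamp")
    age=$(( now - mtime ))
    if [ "$age" -gt "$max_age" ]; then
        # The job has not touched the stamp recently: treat as a stalled sync.
        echo "CRITICAL: $stamp is ${age}s old (limit ${max_age}s)"
        return 2
    fi
    echo "OK: $stamp is ${age}s old"
    return 0
}
```

For example, `check_stamp_fresh /opt/vcs2vcs/projects.stamp 7200` would have returned CRITICAL from roughly two hours after the last successful run onward; the right threshold depends on how often the cron job runs.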
I can probably fix this by blowing away the mozharness checkout and recloning, but it is not clear what caused this problem. I have also checked that permissions are OK (i.e. mozharness is owned by vcs2vcs):
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ ls -ltrA /opt/vcs2vcs/
total 3750100
-rw-r----- 1 asasaki asasaki 3840072439 Oct 7 2013 initial3.tar.bz2
drwxr-xr-x 9 vcs2vcs vcs2vcs 4096 Oct 11 2013 git-mozharness
drwxr-x--- 6 vcs2vcs vcs2vcs 4096 Oct 11 2013 build
-rwxr-x--- 1 vcs2vcs vcs2vcs 789 Jan 7 2014 run_projects.sh
drwxrwxr-x 13 vcs2vcs vcs2vcs 4096 Dec 5 18:48 mozharness
-rw-r--r-- 1 vcs2vcs vcs2vcs 0 Dec 8 08:30 projects.stamp
drwxrwxr-x 2 vcs2vcs vcs2vcs 4096 Dec 8 08:31 logs
-rw-r--r-- 1 vcs2vcs vcs2vcs 229 Dec 17 05:48 projects.log
A manual update attempt resulted in the following:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ hg -R /opt/vcs2vcs/mozharness pull -u
pulling from http://hg.mozilla.org/build/mozharness
searching for changes
abort: No space left on device: /opt/vcs2vcs/mozharness/.hg/journal.dirstate
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$
Flags: needinfo?(pmoore)
Comment 4•10 years ago
A fresh clone exhibits the same problem:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ hg clone -r production http://hg.mozilla.org/build/mozharness /opt/vcs2vcs/mozharness
abort: No space left on device: /opt/vcs2vcs/mozharness/.hg
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ df -h /opt/vcs2vcs
Filesystem Size Used Avail Use% Mounted on
/dev/xvdj 99G 57G 37G 61% /opt
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$
:/
Comment 5•10 years ago
Thanks bhearsum for the tip! Out of inodes...
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ df -i /opt/vcs2vcs
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvdj 6553600 6553599 1 100% /opt
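This explains the contradiction in Comment 3: `df -h` reports only block usage, but ext4 also returns "No space left on device" (ENOSPC) when it runs out of inodes, even with 37G of blocks free. A disk-space check therefore needs to look at both figures. A minimal sketch, assuming GNU df; the function names and the threshold are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: flag a mount point when either block usage or inode usage is
# too high, since running out of either gives "No space left on device".
# Assumes GNU df: -P keeps each filesystem on one line, -i reports inodes.

# Print column 5 (Use% for `df -P`, IUse% for `df -iP`) of the first data
# line of df output read from stdin, without the trailing "%".
pct_field() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage: check_mount MOUNTPOINT THRESHOLD_PCT
# Returns 2 if block or inode usage is at or above the threshold, else 0.
check_mount() {
    local mnt="$1" limit="$2" status=0 kind pct
    for kind in blocks inodes; do
        if [ "$kind" = blocks ]; then
            pct=$(df -P "$mnt" | pct_field)
        else
            pct=$(df -iP "$mnt" | pct_field)
        fi
        # Some filesystems report "-" for inode usage; skip non-numeric values.
        case "$pct" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$pct" -ge "$limit" ]; then
            echo "CRITICAL: $mnt $kind usage at ${pct}%"
            status=2
        fi
    done
    return "$status"
}
```

With the numbers above, `check_mount /opt 90` would flag inode usage at 100% while block usage is only 61%.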
Comment 6•10 years ago
So possible solutions I see at the moment:
1) Rebuild partition with more inodes, and reinstall. It would make sense to do this in bug 927199 where this machine is being puppetised.
2) Blow stuff away that is no longer needed (no obvious candidates I see at the moment).
3) Split gecko-projects vcs sync jobs across other machines.
4) Shrink existing partition and create a new partition.
5) Since this is EC2 hosted, maybe attaching additional storage is possible.
Comment 7•10 years ago
OK, I'm currently running a report to double-check why we have so many small files (i.e. high inode usage), to make sure there is no fundamental problem there. If all looks in order and that many inodes really are needed, I will create a new volume in usw2 with the same disk size (99GB) via:
https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#Volumes:sort=desc:createTime
with double the number of inodes (13107200), rsync the data across, and then swap out the old volume for the new one.
I've temporarily disabled the vcs sync cron job for gecko projects, until this is done. It hasn't run since December 8th due to this problem anyway.
Comment 8•10 years ago
It turns out to be genuine usage, e.g. the report identified directories with over 10,000 files, such as:
[vcs2vcs@vcssync2.srv.releng.usw2.mozilla.com ~]$ ls /opt/vcs2vcs/build/conversion/project-branches/.git/objects/67 | wc -l
10097
So I'll proceed as proposed above by migrating to a new volume with more inodes.
Comment 9•10 years ago
Top inode eaters:
10017 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/71
10021 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/00
10022 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/0c
10022 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/25
10036 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/bc
10060 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/2c
10099 /opt/vcs2vcs/build/conversion/project-branches/.git/objects/67
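The bug does not show how the report was produced. One way to get a similar ranking is to count the entries in each directory, since every file and subdirectory consumes an inode (hard links excepted). A sketch, assuming GNU find (`-xdev`, `-printf`); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N directories under ROOT that directly
# contain the most entries, in ascending order (largest last).
# Assumes GNU find: -xdev stays on ROOT's filesystem, and -printf '%h\n'
# prints the parent directory of every entry found.
top_inode_dirs() {
    local root="$1" n="${2:-10}"
    # One line per entry, naming its parent directory; counting identical
    # lines gives the number of entries per directory.
    find "$root" -xdev -mindepth 1 -printf '%h\n' \
        | sort | uniq -c | sort -n | tail -n "$n"
}
```

Running `top_inode_dirs /opt/vcs2vcs 7` would list the seven fullest directories in the same ascending layout as the list above. Each `.git/objects/XX` directory holds one file per loose object, which is why a git conversion repository can use so many inodes.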
Updated•10 years ago
Summary: Git mirror for cypress branch → https://github.com/mozilla/gecko-projects is out-of-sync with hg project repos due to no free inodes on disk of vcssync2.srv.releng.usw2.mozilla.com
Comment 10•10 years ago
Created 100GB volume vol-3ff24e2e (Magnetic, not encrypted) in us-west-2b and attached it as /dev/sdg to instance i-b0d76287 (vcssync2.srv.releng.usw2.mozilla.com).
Comment 11•10 years ago
Comment 12•10 years ago
bash-4.1# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvde1 202:65 0 10G 0 disk /
xvdj 202:144 0 100G 0 disk /opt
xvdk 202:160 0 100G 0 disk
bash-4.1# mkfs.ext4 -N 13107200 /dev/xvdk
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
13107200 inodes, 26214400 blocks
1310720 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
800 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
bash-4.1#
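The -N value can be cross-checked against this output. The new filesystem has 26214400 blocks of 4096 bytes (100 GiB) split into 800 block groups. Assuming the old /dev/xvdj volume had the same geometry, its 6553600 inodes work out to one inode per 16384 bytes, consistent with mke2fs's usual default ratio, while 13107200 inodes means one per 8192 bytes, and 13107200 / 800 gives the 16384 inodes per group reported above. The arithmetic as a small bash sketch (variable and function names are illustrative):

```shell
#!/usr/bin/env bash
# Geometry of the new ext4 volume, taken from the mkfs.ext4 output above.
block_size=4096     # bytes per block
blocks=26214400     # total blocks (100 GiB)
groups=800          # block groups

# Bytes of disk space per inode for a given total inode count.
bytes_per_inode() {
    echo $(( block_size * blocks / $1 ))
}

# Inodes per block group for a given total inode count.
inodes_per_group() {
    echo $(( $1 / groups ))
}
```

Here `bytes_per_inode 6553600` prints 16384, `bytes_per_inode 13107200` prints 8192, and `inodes_per_group 13107200` prints 16384. Under the same assumption, `mkfs.ext4 -i 8192 /dev/xvdk`, which sets a bytes-per-inode ratio rather than an absolute count, should give the same inode count; it is an alternative to the `-N` form used here, not what was run.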
Comment 13•10 years ago
bash-4.1# mkdir /opt_new
bash-4.1# ls -ltrd /opt*
drwxr-xr-x 4 root root 4096 Oct 11 2013 /opt
drwxr-xr-x 2 root root 4096 Dec 22 05:03 /opt_new
bash-4.1# sudo mount /dev/xvdk /opt_new/
bash-4.1# cat /etc/fstab
LABEL=root_dev / ext4 defaults,noatime 1 1
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
/dev/xvdj /opt ext4 defaults,noatime 1 2
bash-4.1# ls /opt_new/
lost+found
bash-4.1# rsync -gloptruc /opt /opt_new
Comment 14•10 years ago
It looks like this rsync process could take a couple of days...
It is running in a screen session as root.
Comment 15•10 years ago
The rsync has completed, and I remounted:
bash-4.1# mount -l
/dev/xvde1 on / type ext4 (rw,noatime) [root_dev]
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
/dev/xvdj on /opt type ext4 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/xvdk on /opt_new type ext4 (rw)
bash-4.1# umount /dev/xvdj
bash-4.1# umount /dev/xvdk
bash-4.1# cat /etc/fstab
LABEL=root_dev / ext4 defaults,noatime 1 1
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
/dev/xvdj /opt ext4 defaults,noatime 1 2
bash-4.1# sed -i 's/\/dev\/xvdj/\/dev\/xvdk/' /etc/fstab
bash-4.1# mount -a
bash-4.1# mount -l
/dev/xvde1 on / type ext4 (rw,noatime) [root_dev]
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/xvdk on /opt type ext4 (rw,noatime)
bash-4.1# cd /opt
bash-4.1# ls
lost+found opt
bash-4.1# mv opt/vcs2vcs/ .
bash-4.1# rm -rf opt
bash-4.1# ls -ltrA
total 20
drwxr-xr-x 6 vcs2vcs root 4096 Dec 17 05:53 vcs2vcs
drwx------ 2 root root 16384 Dec 22 03:12 lost+found
I then su'd to the vcs2vcs user and re-enabled the crontab.
I'm now watching the job to check that all is OK.
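The fstab change above is a single sed substitution on the device path. A slightly more defensive version, sketched below with a hypothetical function name and assuming GNU sed, uses `|` as the delimiter so the slashes in the paths need no escaping, anchors the match on the first field so that other entries (such as the LABEL=root_dev line) are untouched, and keeps a `.bak` copy of the original file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint the fstab entry whose first field is OLD
# to NEW, keeping FSTAB.bak as a backup. Assumes GNU sed (-i with suffix).
# OLD is used as a regex; that is fine for plain paths like /dev/xvdj.
swap_fstab_device() {
    local fstab="$1" old="$2" new="$3"
    # Match OLD only at the start of a line and only when followed by
    # whitespace, so e.g. /dev/xvdj1 or LABEL= entries are left alone.
    sed -i.bak "s|^${old}\([[:space:]]\)|${new}\1|" "$fstab"
}
```

For example, `swap_fstab_device /etc/fstab /dev/xvdj /dev/xvdk` followed by `mount -a`, with both volumes unmounted first as in the transcript.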
Comment 16•10 years ago
I've also deleted /opt_new
Comment 17•10 years ago
All seems to be working.
I've detached and deleted the previous 100GB volume, so now only the new volume is attached, and the old storage has been given back.
I'll close this bug when we've had a successful run, and all the project branches have been brought up-to-date.
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Assignee
Updated•8 years ago
Component: Tools → General