Bug 836014
Opened 11 years ago
Closed 11 years ago
save space on puppetagain /data
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
We have a bunch of files in the various repo mirrors that are duplicates of one another. For example:

> dmitchell@releng-puppet1 /tmp $ find /data -name zd1211-firmware-1.4-5.fc15.noarch.rpm | xargs md5sum
> d69bbecaca5191b089f1f30efe5fd8ab /data/repos/yum/mirrors/fedora/16/2012-03-07/releases/Everything/x86_64/os/Packages/zd1211-firmware-1.4-5.fc15.noarch.rpm
> d69bbecaca5191b089f1f30efe5fd8ab /data/repos/yum/mirrors/fedora/16/2012-03-07/releases/Everything/i386/os/Packages/zd1211-firmware-1.4-5.fc15.noarch.rpm

There are 28,545 such files. At a few MB apiece, that can save us ~50G!

http://premium.caribe.net/~adrian2/fdupes.html seems to be the app to use for this, but it's horrendously slow. So I'd like to find a way to run it only after actually mirroring a repo, and then have rsync preserve the hard links when syncing. Rsync's -H flag does so.
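For illustration, here is a minimal Python sketch of the kind of duplicate scan described above. It is not how fdupes is implemented; it just groups files by size and then by MD5 (chosen only to match the md5sum output above, not for security). The function names are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    # Cheap first pass: only files of equal size can be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        # Expensive second pass: hash only the same-size candidates.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates("/data/repos/yum")` would return one list per set of identical files. Paths that are already hard-linked to each other are reported too, since the sketch does not compare inodes.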
Comment 1 • 11 years ago (Reporter)
fdupes seems ridiculously slow, and only *finds* dupes - it doesn't symlink them (except in version 1.50, which isn't in RPM form). 'hardlink' does - http://pkgs.fedoraproject.org/cgit/hardlink.git/ - and it's in RPM form and already mirrored. It ain't especially fast, either, but what can you do. I'm running it on releng-puppet1.srv.releng.scl3.mozilla.com now, using 'time'. I'll post the results tomorrow.
Summary: save space on puppetagain /data with fdupes → save space on puppetagain /data
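As a rough picture of what a tool like 'hardlink' does per pair of files, here is a small Python sketch. It is an assumption-laden illustration, not the tool's actual logic: the function name and the temporary file suffix are hypothetical, and a real tool performs more checks (ownership, permissions, and so on) before linking.

```python
import filecmp
import os


def link_duplicate(keep, dup):
    """Replace dup with a hard link to keep; return bytes freed (0 if skipped)."""
    keep_st, dup_st = os.stat(keep), os.stat(dup)
    if keep_st.st_dev != dup_st.st_dev:
        return 0  # hard links cannot cross filesystems
    if keep_st.st_ino == dup_st.st_ino:
        return 0  # already the same inode, nothing to do
    if not filecmp.cmp(keep, dup, shallow=False):
        return 0  # contents differ, leave both files alone

    # Link under a temporary name first, then swap it into place, so that
    # dup is never missing if something fails halfway through.
    tmp = dup + ".hardlink-tmp"  # hypothetical temporary name
    os.link(keep, tmp)
    os.replace(tmp, dup)

    # Space is only freed if dup was the last name for its old inode.
    return dup_st.st_size if dup_st.st_nlink == 1 else 0
```

After a successful call both paths share one inode, which is what rsync's -H flag then has to preserve when the tree is synced elsewhere.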
Comment 2 • 11 years ago (Reporter)
Well, that wasn't long, but it wasn't much savings, either:

[root@releng-puppet1.srv.releng.scl3 data]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  342G  253G   72G  78% /
tmpfs                           1004M     0 1004M   0% /dev/shm
/dev/sda1                         97M   73M   19M  80% /boot

[root@releng-puppet1.srv.releng.scl3 data]# time hardlink -v /data/repos/yum
Directories 179
Objects 109395
IFREG 109210
Mmaps 6187
Comparisons 6186
Linked 6186
saved 4604907520

real 9m9.004s
user 0m1.342s
sys 0m7.526s

[root@releng-puppet1.srv.releng.scl3 data]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  342G  249G   77G  77% /
tmpfs                           1004M     0 1004M   0% /dev/shm
/dev/sda1                         97M   73M   19M  80% /boot

So about 5G in savings. The Ubuntu repos are arranged by filename, so I don't expect a lot of savings there - but I'll run it to see.
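To put the numbers above in perspective, here is a quick back-of-the-envelope check in Python. It only uses figures from the hardlink output, and it assumes "saved" is a byte count equal to the total size of the files that were linked.

```python
SAVED_BYTES = 4604907520  # "saved" line from hardlink -v
LINKED = 6186             # "Linked" line from hardlink -v

saved_gib = SAVED_BYTES / 2**30   # binary units, as df -h reports them
saved_gb = SAVED_BYTES / 10**9    # decimal units
avg_kib = SAVED_BYTES / LINKED / 1024

print(f"saved: {saved_gib:.2f} GiB ({saved_gb:.2f} GB)")
print(f"average linked file: {avg_kib:.0f} KiB")
```

That works out to roughly 4.3 GiB, which lines up with df going from 253G to 249G used. The average linked file comes to well under 1 MB, smaller than the "few MB apiece" guess in the description.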
Comment 3 • 11 years ago (Reporter)
That saved another 45M in 10 minutes. Not worth it. I've adjusted the rsync invocations to use -H and updated the docs, but this doesn't save an appreciable amount of space, unfortunately.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
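For reference, a sketch of what an rsync invocation with -H might look like when built from a script. The actual commands changed for this bug are not shown here, so the use of -a and the example paths are assumptions. Note that -a on its own does not imply -H, which is why the flag has to be given explicitly.

```python
import shlex


def rsync_command(src, dest):
    """Build an rsync argv that copies src into dest and preserves hard links."""
    # -a: archive mode (assumed here; it does not include -H)
    # -H: preserve hard links between files within the transfer
    # The trailing slash on src means "copy the contents of src".
    return ["rsync", "-a", "-H", src.rstrip("/") + "/", dest]


# Hypothetical source and destination, for illustration only.
cmd = rsync_command("/data/repos/yum", "otherhost:/data/repos/yum")
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` on a host where rsync is installed.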
Updated • 11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations