Bug 716441
Opened 13 years ago
Closed 13 years ago
Disk usage report problem on 10.253.0.10:/vol/ftp_stage (mpt-netapp-a)
Categories: mozilla.org Graveyard :: Server Operations, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: nthomas, Assigned: dparsons
Attachments: 1 file (8.04 KB, image/png)
If I create a 1G file and then delete it, the free space does not recover. For example, on surf:
$ cd /mnt/netapp/stage
$ df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
10.253.0.10:/vol/ftp_stage
6291457 5714983 576474 91% /mnt/netapp/stage
$ dd if=/dev/zero of=testfile bs=1K count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 20.5196 seconds, 52.3 MB/s
$ df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
10.253.0.10:/vol/ftp_stage
6291457 5716018 575439 91% /mnt/netapp/stage
$ rm testfile
rm: remove regular file `testfile'? y
$ df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
10.253.0.10:/vol/ftp_stage
6291457 5715988 575469 91% /mnt/netapp/stage
The 30M improvement in free space is likely due to other changes happening on this partition, but I was expecting 1G's worth.
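The same before/after comparison can be scripted so it is easy to repeat on other mounts. The following is a minimal sketch, not part of the original report: the helper names `free_mb` and `measure_delete_recovery` are hypothetical, and it assumes a Unix host where `os.statvfs` is available. It reads the space available to unprivileged users (the figure df shows as "Available"), writes a zero-filled file, deletes it, and returns the three readings in MB.

```python
import os
import tempfile


def free_mb(path):
    """Return the space available to unprivileged users on path's filesystem, in MB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // (1024 * 1024)


def measure_delete_recovery(directory, size_bytes, chunk=1024 * 1024):
    """Write a zero-filled file of size_bytes, delete it, and report free space.

    Returns (before, after_write, after_delete), each in MB.
    """
    before = free_mb(directory)
    fd, name = tempfile.mkstemp(dir=directory, prefix="dftest-")
    try:
        with os.fdopen(fd, "wb") as f:
            block = b"\0" * chunk
            remaining = size_bytes
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(block[:n])
                remaining -= n
            # Push the data to the server before sampling free space.
            f.flush()
            os.fsync(f.fileno())
        after_write = free_mb(directory)
    finally:
        # Always clean up the test file, even if the write failed.
        os.remove(name)
    after_delete = free_mb(directory)
    return before, after_write, after_delete
```

Calling `measure_delete_recovery("/mnt/netapp/stage", 1024 ** 3)` would mirror the 1G test above. On a busy share the readings also include whatever other clients wrote or deleted in the meantime, so a single run is only indicative.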
I've noticed the same problem when deleting many files in bug 715840: a steady deletion of more than 300G of files didn't result in any change in the free space reported by df. However, some hours or days later the free space did jump up by about the right amount. Nagios has had quite a bit to say about CPU usage on this netapp too. Are there known issues with it right now?
Comment 1 (Reporter) • 13 years ago
FWIW, bug 715706 recently added a bind mount from another NFS mount into that share, and bug 715026 tracks potential corruption of the / partition on surf.
Comment 2 (Reporter) • 13 years ago
Trending is at http://people.mozilla.com/~nthomas/trend-recent.png, where "Everything else" is /mnt/netapp/stage. There seems to be an upward spike every Pacific midnight. Is this from data deduplication, or is the partition too big for that?
Also, I can't reproduce the issue on mpt-netapp-b using dd.
Comment 3 (Assignee) • 13 years ago
Dedupe starts at 12AM Pacific time and is usually finished before 1AM.
Comment 4 • 13 years ago
FWIW, I've experienced this type of thing in the past as well. It's not specific to any particular share or netapp unit, or even to netapp in general.
I think what's happening is that the delete operation "succeeds" on the client far more quickly than the operation actually completes on the server, hence the slow recovery of the used space. "df" has no way of knowing about this, so it reports whatever happens to be free at that moment.
I don't understand why this would happen with a single 1GB file, as your example illustrates, since a single file generally deletes very quickly. I think it does explain why deleting many files could take hours or days to fully regain all of the space.
We do have some known load issues on certain NetApp filers. I don't know offhand whether this is one of the affected ones or how severely it is affected, but I suspect there are some issues.
Almost all NetApp-related performance issues (at least in SJC1, where surf is) should be resolved in Q1 or early Q2, as we are moving to a new datacenter which has newer, much more powerful NetApp units, with far more CPU horsepower, IOPS capacity, and network bandwidth.
If that's not soon enough, we'll need to dig further into this for a cause and treatment options.
Updated • 13 years ago
Assignee: server-ops → dparsons
Comment 5 (Reporter) • 13 years ago
We're on track to fill the partition, and from here it seems that either something is wrong with the netapp, or there are things going on behind the curtain which make it very difficult for me to keep nagios happy.
Backstory: for the last couple of days nagios has been WARNING or CRITICAL on surf:disk - /mnt/netapp/stage, which is mpt-netapp-a:/vol/ftp_stage. Recently we had > 550G free on this partition after granting it more space and moving some bits off it.
On Jan 14 the free space drops suddenly:
Jan 14 12:15:01 2012 561438
Jan 14 12:45:01 2012 591861
Jan 14 13:15:01 2012 590922
Jan 14 13:45:01 2012 589807
Jan 14 14:15:01 2012 400940
Jan 14 14:45:01 2012 587656
Jan 14 15:15:01 2012 586672
Jan 14 15:45:02 2012 586319
Jan 14 16:15:01 2012 587198
Jan 14 16:45:01 2012 588884
Jan 14 17:15:01 2012 588500
Jan 14 17:45:01 2012 115746
Jan 14 18:15:01 2012 115261
Jan 14 18:45:01 2012 125088
Jan 14 19:15:01 2012 273832
Jan 14 19:45:01 2012 321371
Jan 14 20:15:01 2012 319750
Jan 14 20:45:02 2012 318705
Jan 14 21:15:01 2012 317690
(Pacific times, free space in MB from df). The increases at 19:15 and 19:45 correspond to a 200G snapshot getting deleted, IIRC.
Currently we're down to 115G free, and we don't see any improvement when deleting files until the next dedupe runs at midnight. I am looking for recent additions of big files, but I'm pretty sure something is happening on the netapp too.
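To make the jumps in the samples above easier to spot, the log can be scanned for large changes between consecutive readings. This is a rough sketch only: the function names and the 40,000 MB threshold are illustrative assumptions, and the month-name parsing assumes an English/C locale.

```python
from datetime import datetime

# Free-space samples copied from the comment above (Pacific time, MB from df).
SAMPLES = """
Jan 14 12:15:01 2012 561438
Jan 14 12:45:01 2012 591861
Jan 14 13:15:01 2012 590922
Jan 14 13:45:01 2012 589807
Jan 14 14:15:01 2012 400940
Jan 14 14:45:01 2012 587656
Jan 14 15:15:01 2012 586672
Jan 14 15:45:02 2012 586319
Jan 14 16:15:01 2012 587198
Jan 14 16:45:01 2012 588884
Jan 14 17:15:01 2012 588500
Jan 14 17:45:01 2012 115746
Jan 14 18:15:01 2012 115261
Jan 14 18:45:01 2012 125088
Jan 14 19:15:01 2012 273832
Jan 14 19:45:01 2012 321371
Jan 14 20:15:01 2012 319750
Jan 14 20:45:02 2012 318705
Jan 14 21:15:01 2012 317690
"""


def parse_samples(text):
    """Turn 'Mon DD HH:MM:SS YYYY free_mb' lines into (datetime, int) pairs."""
    rows = []
    for line in text.strip().splitlines():
        *stamp, free = line.split()
        when = datetime.strptime(" ".join(stamp), "%b %d %H:%M:%S %Y")
        rows.append((when, int(free)))
    return rows


def large_changes(rows, threshold_mb=40_000):
    """Return (time, delta_mb) for each sample that differs from the
    previous one by at least threshold_mb in either direction."""
    changes = []
    for (_, prev_free), (when, free) in zip(rows, rows[1:]):
        delta = free - prev_free
        if abs(delta) >= threshold_mb:
            changes.append((when, delta))
    return changes


if __name__ == "__main__":
    for when, delta in large_changes(parse_samples(SAMPLES)):
        print(f"{when:%H:%M}  {delta:+d} MB")
```

With this threshold the scan flags the brief dip and recovery around 14:15, the drop of about 470G at 17:45, and the two increases at 19:15 and 19:45, which together come to roughly 196G.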
Comment 6 (Reporter) • 13 years ago
From IRC on the 19th, lerxst says the space was taken up by 313GB of snapshots, which were automatically created by the netapp to allow files to be restored to earlier states. Those snapshots have now been removed, and new ones disabled. Thanks!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard