Closed
Bug 939600
Opened 12 years ago
Closed 11 years ago
developeradm.private.scl3.mozilla.com using too much swap
Categories
(Infrastructure & Operations :: Virtualization, task, P4)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ericz, Assigned: nmaul)
Details
(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/201] )
developeradm.private.scl3.mozilla.com has alerted twice today for using more than half of swap. The first time I restarted the Zamboni dashboard, which cleared it up, but only for a few hours. Other services don't appear to be using much memory, and there is plenty of free memory, but something is still using swap.
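For what it's worth, here's a quick way to see which processes actually hold the swap (a sketch, assuming the kernel exposes VmSwap in /proc/<pid>/status):
# sum per-process swap from /proc (VmSwap is reported in kB)
for s in /proc/[0-9]*/status; do
  awk '/^Name:/{n=$2} /^VmSwap:/{if ($2 > 0) print $2 " kB", n}' "$s"
done 2>/dev/null | sort -rn | head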
Reporter
Comment 1•12 years ago
developeradm has been flapping all day on swap. There is a suspiciously long-running wget:
29085 apache 20 0 783M 441M 1288 S 0.7 14.6 26:41.49 wget -q -m -p -k -E -T 5 -t 3 -R mov,ogv,mp4,gz,bz2,zip,exe,download,flag*,login*,*\$history,*\$json -D developer.mozilla.org -X */profiles
which is the top memory user. Should it be running that long?
Comment 2•12 years ago
No, that wget looks like a problem - I've killed it. It's part of a small job that downloads a bunch of assets and then stuffs them into a tarball on the netapp. Not sure what the deal is, but it's clearly errant behaviour, so I've killed off the process chain. As a follow-up, I cleared the memory cache and re-initialised swap.
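For the record, the cache clear and swap re-init were presumably something along these lines (a sketch, not the exact commands used):
sync                               # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches  # drop pagecache, dentries and inodes
swapoff -a && swapon -a            # push swapped pages back into RAM, then re-enable swap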
Assignee: server-ops-webops → dmaher
Status: NEW → RESOLVED
Closed: 12 years ago
Component: WebOps: IT-Managed Tools → WebOps: Community Platform
Priority: -- → P4
Resolution: --- → FIXED
Comment 3•12 years ago
This has been flapping on swap again for days through the holiday and is clearly not resolved.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 4•12 years ago
apache 3478 2.5 22.4 1330244 692164 ? S Dec29 42:41 wget -q -m -p -k -E -T 5 -t 3 -R mov,ogv,mp4,gz,bz2,zip,exe,download,flag*,login*,*\$history,*\$json -D developer.mozilla.org -X */profiles -np https://developer.mozilla.org/en-US/
wget should not be running that long. It needs a timeout, and the job that launches it probably needs fixing as well.
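One low-effort fix, assuming the coreutils timeout command is available on that box (a sketch; the real change belongs in whatever cron/script launches the mirror job):
# kill the mirror run if it exceeds 2 hours (the limit is a guess; tune to the site size)
timeout -s TERM 2h wget -q -m -p -k -E -T 5 -t 3 \
    -R 'mov,ogv,mp4,gz,bz2,zip,exe,download,flag*,login*,*$history,*$json' \
    -D developer.mozilla.org -X '*/profiles' -np https://developer.mozilla.org/en-US/ \
    || echo "MDN mirror wget timed out or failed" >&2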
Comment 5•12 years ago
developeradm.private.scl3 paged for swap again at least twice more this week. No long-running wget processes visible, just python and nodejs stuff.
Comment 6•12 years ago
Paged again.
<nagios-scl3:#sysadmins> Fri 08:38:41 PST [5877]
developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 38%
free (2300 MB out of 6143 MB) (http://m.allizom.org/Swap)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2166 apache 20 0 3012m 1.1g 924 D 0.3 37.2 0:32.76 python2.6
2161 apache 20 0 2752m 984m 952 D 0.3 32.6 0:32.05 python2.6
Comment 7•12 years ago
<nagios-scl3:#sysadmins> Fri 09:40:03 PST [5888]
developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 37%
free (2229 MB out of 6143 MB) (http://m.allizom.org/Swap)
Comment 8•11 years ago
As per bug 952877, RAM has been doubled on that node, which should solve the problem. Re-open if swapping issues persist.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 9•11 years ago
<nagios-scl3:#sysadmins> Sun 05:38:17 PST [5438]
developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 50%
free (1006 MB out of 2047 MB) (http://m.mozilla.org/Swap)
Not sure what was eating all the swap; it was gone by the time I looked at it.
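To catch the culprit next time, one option is a root cron entry that snapshots the biggest processes every few minutes (hypothetical path and interval, just a sketch):
# /etc/cron.d/mem-snapshot (hypothetical): keep a rolling record of the fattest processes
*/5 * * * * root ( date; ps aux --sort=-rss | head -15 ) >> /var/log/mem-snapshot.log 2>&1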
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
Assignee: dmaher → server-ops-webops
Updated•11 years ago
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/201]
Assignee
Comment 10•11 years ago
To summarize:
Yes, there's a long-running wget. The MDN project wanted a downloadable tarball of the site... sort of "offline documentation". You can see from the comments here that it already excludes a number of things, but it could possibly exclude more. Or possibly there's a better way to do this altogether.
There are also cron jobs for dev, stage, and prod that happen to coincide, time-wise. I've shuffled them around some (see the sketch below), but I suspect this won't make a huge difference.
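By "shuffled" I mean staggering the start times so the three environments don't all peak at once; hypothetical entries, not the actual crontabs:
# hypothetical staggering for the dev/stage/prod tarball jobs (paths invented for illustration)
10 4 * * * /data/bin/mdn-tarball-dev.sh
40 4 * * * /data/bin/mdn-tarball-stage.sh
10 5 * * * /data/bin/mdn-tarball-prod.sh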
In the short term, can we just throw some more RAM at this? It looks like it has 6GB now; can we bump it up to 8GB? CC'ing the storage/virtualization folks. I don't know how much capacity we have in SCL3 VMware.
Component: WebOps: Community Platform → Server Operations: Virtualization
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → cshields
Comment 11•11 years ago
Looking at sar for the last few days, I only see memory usage above 60% consistently between 0400 and 0600 (that's PDT). I'm a little concerned that we're just throwing more RAM at it *again*. However, it is occasionally memory-tight, and the settings on there appear OK...
Checked /proc/sys/vm/swappiness - it's set to the ultra-low '10', so the box really shouldn't use swap until memory gets VERY tight. I think another 2GB would be an acceptable move.
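For reference, checking and persisting that setting looks roughly like this (nothing needed changing here, since it was already 10):
cat /proc/sys/vm/swappiness                    # current value (10 on this host)
sysctl -w vm.swappiness=10                     # set at runtime
echo 'vm.swappiness = 10' >> /etc/sysctl.conf  # persist across reboots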
A brief reboot will be needed - who can I coordinate with, and when? I'll be around by 7AM Eastern tomorrow.
Assignee
Comment 12•11 years ago
All done, and looks good. Thanks for the help! :)
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Infrastructure & Operations