Closed Bug 939600 Opened 12 years ago Closed 11 years ago

developeradm.private.scl3.mozilla.com using too much swap

Categories

(Infrastructure & Operations :: Virtualization, task, P4)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Assigned: nmaul)

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/201] )

developeradm.private.scl3.mozilla.com has alerted twice today for using more than half of its swap. The first time, I restarted the Zamboni dashboard, which cleared it up, but only for a few hours. The other services don't appear to be using much memory, and there is plenty of free memory, yet something is still consuming swap.
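For anyone digging into this later, a minimal sketch for finding which processes actually hold swap, assuming the kernel exposes VmSwap in /proc/<pid>/status (not something that was run here, just the usual approach):

# List per-process swap usage; VmSwap is reported in kB.
# Processes may exit mid-loop, hence the 2>/dev/null.
for status in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, name}' "$status" 2>/dev/null
done | sort -rn | head -10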
developeradm has been flapping on swap all day. There is a suspiciously long-running wget, which is the top memory user:

29085 apache 20 0 783M 441M 1288 S 0.7 14.6 26:41.49 wget -q -m -p -k -E -T 5 -t 3 -R mov,ogv,mp4,gz,bz2,zip,exe,download,flag*,login*,*\$history,*\$json -D developer.mozilla.org -X */profiles

Should it be running that long?
No, that wget looks like a problem - I've killed it. It's part of a small job that downloads a bunch of assets and then stuffs them into a tarball on the netapp. Not sure what the deal is, but it's clearly errant behaviour, so I've killed off the process chain. As a follow-up, I cleared the memory cache and re-initialised swap.
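For reference, a minimal sketch of what clearing the page cache and re-initialising swap typically looks like (the exact commands used here aren't recorded in this bug):

# Flush dirty pages, then drop the page cache, dentries and inodes.
sync && echo 3 > /proc/sys/vm/drop_caches
# Re-initialise swap; swapoff pulls swapped-out pages back into RAM,
# so only do this when there is enough free memory to hold them.
swapoff -a && swapon -a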
Assignee: server-ops-webops → dmaher
Status: NEW → RESOLVED
Closed: 12 years ago
Component: WebOps: IT-Managed Tools → WebOps: Community Platform
Priority: -- → P4
Resolution: --- → FIXED
This has been flapping on swap again for days through the holiday and is clearly not resolved.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
apache 3478 2.5 22.4 1330244 692164 ? S Dec29 42:41 wget -q -m -p -k -E -T 5 -t 3 -R mov,ogv,mp4,gz,bz2,zip,exe,download,flag*,login*,*\$history,*\$json -D developer.mozilla.org -X */profiles -np https://developer.mozilla.org/en-US/

wget should not be running that long. It needs a timeout, or some other fix.
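One possible way to bound the job - an assumption on my part, not what ended up being deployed - is to wrap the mirror in GNU coreutils timeout and cap the total download with wget's quota option:

# Kill the mirror after 2 hours, escalating to SIGKILL 5 minutes later
# if it ignores the TERM. The 2h limit and 2048m quota are illustrative
# values, not what the cron job actually used.
timeout --kill-after=5m 2h \
    wget -q -m -p -k -E -T 5 -t 3 -Q 2048m \
         -R mov,ogv,mp4,gz,bz2,zip,exe,download,'flag*','login*','*$history','*$json' \
         -D developer.mozilla.org -X '*/profiles' -np \
         https://developer.mozilla.org/en-US/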
developeradm.private.scl3 paged for swap again at least twice more this week. No long-running wget processes were visible, just python and nodejs stuff.
Paged again.

<nagios-scl3:#sysadmins> Fri 08:38:41 PST [5877] developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 38% free (2300 MB out of 6143 MB) (http://m.allizom.org/Swap)

PID  USER   PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
2166 apache 20 0  3012m 1.1g 924 D 0.3  37.2 0:32.76 python2.6
2161 apache 20 0  2752m 984m 952 D 0.3  32.6 0:32.05 python2.6
<nagios-scl3:#sysadmins> Fri 09:40:03 PST [5888] developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 37% free (2229 MB out of 6143 MB) (http://m.allizom.org/Swap)
As per bug 952877, RAM has been doubled on that node, which should solve the problem. Re-open if swapping issues persist.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
<nagios-scl3:#sysadmins> Sun 05:38:17 PST [5438] developeradm.private.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 50% free (1006 MB out of 2047 MB) (http://m.mozilla.org/Swap)

Not sure what was eating all the swap; it was gone by the time I looked at it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: dmaher → server-ops-webops
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/201]
Assignee: server-ops-webops → nmaul
To summarize: yes, there's a long-running wget. The MDN project wanted a downloadable tarball of the site... sort of "offline documentation". You can see from the comments here that it already excludes a number of things, but it could possibly exclude more, or there may be a better way to do this altogether.

There are also crons for dev, stage, and prod that happen to coincide, time-wise. I've shuffled them around some (see the sketch below), but I suspect this won't make a huge difference.

In the short term, can we just throw some more RAM at this? It looks like it has 6GB now; can we bump it up to 8GB? CC'ing the storage/virtualization folks - I don't know how much capacity we have in SCL3 VMware.
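Purely for illustration, a sketch of staggered cron entries; the real schedule and script paths aren't shown in this bug, and /data/bin/mdn-tarball.sh plus the times below are made up:

# /etc/cron.d/mdn-tarball - stagger the tarball jobs so dev, stage and
# prod no longer start at the same time. All names and times are illustrative.
0  2 * * * apache /data/bin/mdn-tarball.sh dev
30 3 * * * apache /data/bin/mdn-tarball.sh stage
0  5 * * * apache /data/bin/mdn-tarball.sh prod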
Component: WebOps: Community Platform → Server Operations: Virtualization
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → cshields
Looking at sar for the last few days, the only time I consistently see memory usage above 60% is between 0400 and 0600 (that's PDT). I'm a little concerned that we're just throwing more RAM at it *again*. However, it is occasionally memory-tight, and the settings on there appear OK... I checked /proc/sys/vm/swappiness - it's set to the ultra-low '10', so it really shouldn't use swap until things get VERY tight. I think adding 2GB would be an acceptable move. A brief reboot is needed - who can I coordinate with, and when? I'll be around by 7AM Eastern tomorrow.
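For reference, a minimal sketch of how swappiness is checked and pinned; the value of 10 was already in place here, and persisting it via sysctl.conf is just the usual approach, not something recorded in this bug:

# Check the current value (0-100; lower makes the kernel avoid swap longer).
cat /proc/sys/vm/swappiness
# Set it at runtime, and persist it across the reboot mentioned above.
sysctl -w vm.swappiness=10
echo 'vm.swappiness = 10' >> /etc/sysctl.conf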
All done, and looks good. Thanks for the help! :)
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations