Closed
Bug 799727
Opened 12 years ago
Closed 6 years ago
High memory usage on syncstorage gunicorn processes
Categories
(Cloud Services Graveyard :: Server: Sync, defect, P4)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: gene, Assigned: rfkelly)
References
Details
(Whiteboard: [qa+])
In production, we see high levels of memory utilization by gunicorn processes on the syncstorage systems. We currently spin up processes with the assumption that each one will use 1GB of memory. What is causing this and is it possible to reduce the memory requirements of the app? I'll add more data as I gather it. Here is some raw info about current memory utilization to give a flavor of what's going on:

Gunicorn RSS as of 20:50 on 20121008
Host,Mem,,Start Time,CPU Time DD-HH:MM:SS,pid,proc,RSS,RSS in MB,% of total,# cores,# procs,% cores used

sync1.web.scl2.svc.mozilla.com: 19:05:38 up 35 days
sync1.web.scl2.svc.mozilla.com: Mem: 16078,16078,,,,,,,,,8,,
sync1.web.scl2.svc.mozilla.com,,,Sep17,00:02:41,7750,gunicorn,11684,11.41,0.07%,,5,62.50%
sync1.web.scl2.svc.mozilla.com,,,Sep17,00:13:29,7753,gunicorn,346736,338.61,2.11%,,,
sync1.web.scl2.svc.mozilla.com,,,Sep17,00:17:13,7754,gunicorn,346992,338.86,2.11%,,,
sync1.web.scl2.svc.mozilla.com,,,Sep17,00:22:31,7752,gunicorn,346808,338.68,2.11%,,,
sync1.web.scl2.svc.mozilla.com,,,Sep17,00:26:31,7751,gunicorn,346744,338.62,2.11%,,,
Total,,,,,,,,1366.18,8.50%,,,

sync2.web.scl2.svc.mozilla.com: 19:05:38 up 35 days
sync2.web.scl2.svc.mozilla.com: Mem: 16079,16079,,,,,,,,,8,,
sync2.web.scl2.svc.mozilla.com,,,Oct02,2-17:10:23,12615,gunicorn,2067872,2019.41,12.56%,,5,62.50%
sync2.web.scl2.svc.mozilla.com,,,Oct05,1-12:06:29,18586,gunicorn,1319716,1288.79,8.02%,,,
sync2.web.scl2.svc.mozilla.com,,,Sep17,00:01:05,31544,gunicorn,11024,10.77,0.07%,,,
sync2.web.scl2.svc.mozilla.com,,,Sep30,3-10:30:44,18949,gunicorn,2475680,2417.66,15.04%,,,
sync2.web.scl2.svc.mozilla.com,,,Sep30,3-13:35:16,12940,gunicorn,2442920,2385.66,14.84%,,,
Total,,,,,,,,8122.28,50.51%,,,

sync3.web.scl2.svc.mozilla.com: 19:05:38 up 134 days
sync3.web.scl2.svc.mozilla.com: Mem: 16079,16079,,,,,,,,,8,,
sync3.web.scl2.svc.mozilla.com,,,07:04,00:19:00,7629,gunicorn,328608,320.91,2.00%,,5,62.50%
sync3.web.scl2.svc.mozilla.com,,,Oct01,2-21:25:12,11639,gunicorn,2042892,1995.01,12.41%,,,
sync3.web.scl2.svc.mozilla.com,,,Oct04,1-17:39:05,27165,gunicorn,1171692,1144.23,7.12%,,,
sync3.web.scl2.svc.mozilla.com,,,Sep13,00:00:49,3251,gunicorn,10368,10.13,0.06%,,,
sync3.web.scl2.svc.mozilla.com,,,Sep30,3-06:04:44,29149,gunicorn,2115116,2065.54,12.85%,,,
Total,,,,,,,,5535.82,34.43%,,,

sync4.web.scl2.svc.mozilla.com: 19:05:38 up 133 days
sync4.web.scl2.svc.mozilla.com: Mem: 16079,16079,,,,,,,,,8,,
sync4.web.scl2.svc.mozilla.com,,,Oct01,3-04:59:44,26151,gunicorn,2161816,2111.15,13.13%,,5,62.50%
sync4.web.scl2.svc.mozilla.com,,,Oct04,1-16:55:12,7458,gunicorn,1400264,1367.45,8.50%,,,
sync4.web.scl2.svc.mozilla.com,,,Oct06,22:54:27,1948,gunicorn,1338804,1307.43,8.13%,,,
sync4.web.scl2.svc.mozilla.com,,,Sep13,00:01:12,9471,gunicorn,10348,10.11,0.06%,,,
sync4.web.scl2.svc.mozilla.com,,,Sep30,3-08:02:37,22188,gunicorn,2427460,2370.57,14.74%,,,
Total,,,,,,,,7166.69,44.57%,,,

sync5.web.scl2.svc.mozilla.com: 19:05:38 up 7 days
sync5.web.scl2.svc.mozilla.com: Mem: 32238

sync6.web.scl2.svc.mozilla.com: 19:05:38 up 574 days
sync6.web.scl2.svc.mozilla.com: Mem: 24159,24159,,,,,,,,,8,,
sync6.web.scl2.svc.mozilla.com,,,Oct01,3-04:17:00,1301,gunicorn,1409080,1376.05,5.70%,,5,62.50%
sync6.web.scl2.svc.mozilla.com,,,Oct01,3-05:09:07,4302,gunicorn,1820152,1777.49,7.36%,,,
sync6.web.scl2.svc.mozilla.com,,,Oct08,07:29:54,8208,gunicorn,591708,577.84,2.39%,,,
sync6.web.scl2.svc.mozilla.com,,,Sep13,00:01:13,26340,gunicorn,11460,11.19,0.05%,,,
sync6.web.scl2.svc.mozilla.com,,,Sep30,3-10:50:41,22355,gunicorn,2463820,2406.07,9.96%,,,
Total,,,,,,,,6148.65,25.45%,,,

sync7.web.scl2.svc.mozilla.com: 19:05:38 up 574 days
sync7.web.scl2.svc.mozilla.com: Mem: 32239,32239,,,,,,,,,8,,
sync7.web.scl2.svc.mozilla.com,,,Oct01,3-04:53:46,10909,gunicorn,2411760,2355.23,7.31%,,5,62.50%
sync7.web.scl2.svc.mozilla.com,,,Oct04,1-22:36:36,30214,gunicorn,2207924,2156.18,6.69%,,,
sync7.web.scl2.svc.mozilla.com,,,Oct04,1-23:32:36,30474,gunicorn,1434128,1400.52,4.34%,,,
sync7.web.scl2.svc.mozilla.com,,,Sep13,00:01:16,13037,gunicorn,11452,11.18,0.03%,,,
sync7.web.scl2.svc.mozilla.com,,,Sep30,3-09:50:30,19459,gunicorn,2416676,2360.04,7.32%,,,
Total,,,,,,,,8283.14,25.69%,,,

sync8.web.scl2.svc.mozilla.com: 19:05:38 up 574 days
sync8.web.scl2.svc.mozilla.com: Mem: 32239,32239,,,,,,,,,8,,
sync8.web.scl2.svc.mozilla.com,,,Oct04,1-18:46:15,32316,gunicorn,1360568,1328.68,4.12%,,5,62.50%
sync8.web.scl2.svc.mozilla.com,,,Oct04,1-21:38:06,29375,gunicorn,1396188,1363.46,4.23%,,,
sync8.web.scl2.svc.mozilla.com,,,Oct06,1-03:34:27,8662,gunicorn,1303572,1273.02,3.95%,,,
sync8.web.scl2.svc.mozilla.com,,,Oct07,18:37:01,6402,gunicorn,737060,719.79,2.23%,,,
sync8.web.scl2.svc.mozilla.com,,,Sep13,00:01:15,8298,gunicorn,11476,11.21,0.03%,,,
Total,,,,,,,,4696.16,14.57%,,,

sync1.web.phx1.svc.mozilla.com: 19:00:34 up 116 days
sync1.web.phx1.svc.mozilla.com: Mem: 24022,24022,,,,,,,,,8,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:05:06,21118,gunicorn,11728,11.45,0.05%,,9,112.50%
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:10:05,21134,gunicorn,91364,89.22,0.37%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:10:29,21128,gunicorn,91916,89.76,0.37%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:11:31,21133,gunicorn,92460,90.29,0.38%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:12:53,21129,gunicorn,91792,89.64,0.37%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:14:31,21131,gunicorn,92784,90.61,0.38%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:16:16,21132,gunicorn,92108,89.95,0.37%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:18:59,21127,gunicorn,93076,90.89,0.38%,,,
sync1.web.phx1.svc.mozilla.com,,,Sep13,00:21:53,21130,gunicorn,92432,90.27,0.38%,,,
Total,,,,,,,,732.09,3.05%,,,

sync2.web.phx1.svc.mozilla.com: 19:00:34 up 116 days
sync2.web.phx1.svc.mozilla.com: Mem: 24022,24022,,,,,,,,,8,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:05:01,16356,gunicorn,11732,11.46,0.05%,,9,112.50%
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:09:50,16359,gunicorn,92008,89.85,0.37%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:10:17,16362,gunicorn,92244,90.08,0.37%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:11:15,16358,gunicorn,92072,89.91,0.37%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:12:32,16363,gunicorn,92316,90.15,0.38%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:14:04,16357,gunicorn,92204,90.04,0.37%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:15:40,16360,gunicorn,92168,90.01,0.37%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:18:04,16364,gunicorn,92644,90.47,0.38%,,,
sync2.web.phx1.svc.mozilla.com,,,Sep13,00:20:58,16361,gunicorn,92420,90.25,0.38%,,,
Total,,,,,,,,732.23,3.05%,,,

sync3.web.phx1.svc.mozilla.com: 19:00:34 up 116 days
sync3.web.phx1.svc.mozilla.com: Mem: 24022,24022,,,,,,,,,8,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:03:31,13242,gunicorn,11736,11.46,0.05%,,9,112.50%
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:08:53,13246,gunicorn,92020,89.86,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:09:20,13244,gunicorn,92240,90.08,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:10:18,13247,gunicorn,92096,89.94,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:11:37,13248,gunicorn,92060,89.90,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:13:22,13249,gunicorn,92144,89.98,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:15:22,13245,gunicorn,92516,90.35,0.38%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:17:08,13243,gunicorn,92212,90.05,0.37%,,,
sync3.web.phx1.svc.mozilla.com,,,Sep13,00:20:14,13250,gunicorn,92332,90.17,0.38%,,,
Total,,,,,,,,731.79,3.05%,,,

sync4.web.phx1.svc.mozilla.com: 19:00:34 up 116 days
sync4.web.phx1.svc.mozilla.com: Mem: 24022,24022,,,,,,,,,8,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:04:01,30955,gunicorn,11724,11.45,0.05%,,9,112.50%
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:09:16,30961,gunicorn,92268,90.11,0.38%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:09:38,30956,gunicorn,92136,89.98,0.37%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:10:32,30960,gunicorn,92300,90.14,0.38%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:11:52,30963,gunicorn,92056,89.90,0.37%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:13:47,30962,gunicorn,92772,90.60,0.38%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:15:35,30957,gunicorn,92196,90.04,0.37%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:17:28,30958,gunicorn,92308,90.14,0.38%,,,
sync4.web.phx1.svc.mozilla.com,,,Sep13,00:20:31,30959,gunicorn,92224,90.06,0.37%,,,
Total,,,,,,,,732.41,3.05%,,,

sync5.web.phx1.svc.mozilla.com: 19:00:34 up 161 days
sync5.web.phx1.svc.mozilla.com: Mem: 48267,48267,,,,,,,,,24,,
sync5.web.phx1.svc.mozilla.com,,,02:35,03:35:32,12180,gunicorn,268668,262.37,0.54%,,9,37.50%
sync5.web.phx1.svc.mozilla.com,,,Oct05,2-07:58:45,24168,gunicorn,1101596,1075.78,2.23%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct05,2-09:59:54,30835,gunicorn,1438048,1404.34,2.91%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct05,2-10:27:49,26549,gunicorn,1036668,1012.37,2.10%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct05,2-11:23:49,13509,gunicorn,1740004,1699.22,3.52%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct06,1-16:06:58,17125,gunicorn,820376,801.15,1.66%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct08,16:50:24,18735,gunicorn,1071680,1046.56,2.17%,,,
sync5.web.phx1.svc.mozilla.com,,,Oct08,16:56:48,22295,gunicorn,794312,775.70,1.61%,,,
sync5.web.phx1.svc.mozilla.com,,,Sep13,00:07:37,2532,gunicorn,11732,11.46,0.02%,,,
Total,,,,,,,,8088.95,16.76%,,,

sync6.web.phx1.svc.mozilla.com: 19:00:34 up 161 days
sync6.web.phx1.svc.mozilla.com: Mem: 48267,48267,,,,,,,,,24,,
sync6.web.phx1.svc.mozilla.com,,,Oct03,3-16:11:37,13788,gunicorn,1439028,1405.30,2.91%,,9,37.50%
sync6.web.phx1.svc.mozilla.com,,,Oct04,3-04:55:06,21673,gunicorn,1647932,1609.31,3.33%,,,
sync6.web.phx1.svc.mozilla.com,,,Oct06,2-01:48:28,489,gunicorn,1102988,1077.14,2.23%,,,
sync6.web.phx1.svc.mozilla.com,,,Oct08,13:05:41,31848,gunicorn,758660,740.88,1.53%,,,
sync6.web.phx1.svc.mozilla.com,,,Oct08,15:57:05,21336,gunicorn,1078356,1053.08,2.18%,,,
sync6.web.phx1.svc.mozilla.com,,,Oct08,20:14:04,8956,gunicorn,955124,932.74,1.93%,,,
sync6.web.phx1.svc.mozilla.com,,,Sep13,00:07:58,22643,gunicorn,11720,11.45,0.02%,,,
sync6.web.phx1.svc.mozilla.com,,,Sep21,10-10:16:56,31149,gunicorn,1548868,1512.57,3.13%,,,
sync6.web.phx1.svc.mozilla.com,,,Sep25,8-16:09:41,2004,gunicorn,1812724,1770.24,3.67%,,,
Total,,,,,,,,10112.70,20.95%,,,

sync7.web.phx1.svc.mozilla.com: 19:00:34 up 161 days
sync7.web.phx1.svc.mozilla.com: Mem: 48267,48267,,,,,,,,,24,,
sync7.web.phx1.svc.mozilla.com,,,06:21,00:57:12,2892,gunicorn,237860,232.29,0.48%,,9,37.50%
sync7.web.phx1.svc.mozilla.com,,,Oct01,5-02:47:00,28049,gunicorn,1481744,1447.02,3.00%,,,
sync7.web.phx1.svc.mozilla.com,,,Oct05,2-10:51:13,5682,gunicorn,1124140,1097.79,2.27%,,,
sync7.web.phx1.svc.mozilla.com,,,Oct05,2-11:06:13,31141,gunicorn,1072000,1046.88,2.17%,,,
sync7.web.phx1.svc.mozilla.com,,,Oct05,2-12:05:19,3680,gunicorn,1449248,1415.28,2.93%,,,
sync7.web.phx1.svc.mozilla.com,,,Oct05,2-12:14:41,12691,gunicorn,1499232,1464.09,3.03%,,,
sync7.web.phx1.svc.mozilla.com,,,Oct07,22:53:37,29524,gunicorn,1093056,1067.44,2.21%,,,
sync7.web.phx1.svc.mozilla.com,,,Sep13,00:07:33,18047,gunicorn,11720,11.45,0.02%,,,
sync7.web.phx1.svc.mozilla.com,,,Sep20,11-08:09:04,11072,gunicorn,1487940,1453.07,3.01%,,,
Total,,,,,,,,9235.29,19.13%,,,
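A snapshot like the one above can be gathered with a short script. Here is a minimal sketch, assuming a Linux /proc filesystem (the actual collection tooling used for the table isn't shown in this bug):

import os

def gunicorn_rss_kb():
    # Yield (pid, rss_kb) for each process whose cmdline mentions gunicorn.
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/cmdline' % pid, 'rb') as f:
                if b'gunicorn' not in f.read():
                    continue
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        yield int(pid), int(line.split()[1])  # value is in kB
        except (IOError, OSError):
            continue  # process exited while we were inspecting it

if __name__ == '__main__':
    total_mb = 0.0
    for pid, rss_kb in gunicorn_rss_kb():
        total_mb += rss_kb / 1024.0
        print('%6d gunicorn %8.2f MB' % (pid, rss_kb / 1024.0))
    print('Total: %.2f MB' % total_mb)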
Reporter
Updated•12 years ago
Assignee: nobody → rfkelly
Updated•12 years ago
Component: Firefox Sync: Backend → Server: Sync
Assignee
Comment 1•12 years ago
I'm surprised to see some of these machines demonstrating high memory usage and some not. In particular, these machines seem fine, with low memory usage and processes all having stayed up since they were last kicked:

sync1.web.scl2.svc.mozilla.com
sync1.web.phx1.svc.mozilla.com
sync2.web.phx1.svc.mozilla.com
sync3.web.phx1.svc.mozilla.com
sync4.web.phx1.svc.mozilla.com

Do these machines differ enough from the others to provide any clues? I vaguely recall :atoll mentioning memory problems on one RHEL platform but not another.
Assignee
Comment 2•12 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #1)
> sync1.web.scl2.svc.mozilla.com

Oh, pencil suggests that this machine is getting 0 qps, which would explain why it's not showing the same memory use pattern as the others :-) The others I don't know about.
Reporter
Comment 3•12 years ago
rfkelly: correct, sync1-4 in PHX1 are configured to only come into play if the cluster is dying; otherwise they get no traffic. sync1 in SCL2 has been dead for some time, which explains its data (I believe).
Reporter
Comment 4•12 years ago
rfkelly: here's a summary of what we're seeing. It looks like the time taken to reach high memory utilization is highly variable (some processes take as little as 16 minutes to reach 1GB).

(03:37:26 PM) atoll: so when sync1..4.phx1 were having swap trouble, many weeks ago but this year since couchbase in may, i found that long-running processes had the 1GB plus ram usage
(03:37:46 PM) atoll: i deferred poking at it further until we deployed the new Sync code that bobm pushed a couple weeks ago
(03:38:03 PM) atoll: since analyzing memory issues in ancient stale code is not a very good use of time vs. analyzing it on new code
(03:38:34 PM) atoll: since ckolos reports we're still seeing issues, i *suspect* it's still "growth then plateau around 1.1GB", since it sounds like the new code appears not to have changed that profile
(03:40:19 PM) ckolos: so sync3.web.scl2
(03:40:23 PM) ckolos: pid 29149
(03:40:50 PM) ckolos: Virt is 2223m, rss is 2.0g, shared is 3116, stack (data) is 2.0gb
(03:41:50 PM) ckolos: other than sync1/5 all scl2 sync web heads have at least 1 gunicorn process taking more than 2gb of memory
(03:42:06 PM) ckolos: oop, damn you syn8
(03:42:28 PM) ckolos: okay sync8 doesn't have one over 2gb, but does have 3 over 1.2gb
(03:44:19 PM) ckolos: so... go fish.
(03:45:31 PM) atoll: any correlation between process age?
(03:46:00 PM) ckolos: likely some, but not definitively
(03:46:40 PM) ckolos: there are procs with 1 day of CPU time taking 1.2gb, while others with 3+ days, taking "only" 2.3
(03:47:01 PM) ckolos: so if so, it's not direct linear growth
(03:47:32 PM) ckolos: comparing phx and scl2 is even more frustratin
(03:47:51 PM) ckolos: where a proc with 2+ days of cpu time is only using 1.075 gb
(03:48:05 PM) ckolos: and another with 16 mins is using 1.046
(03:47:43 PM) atoll: yeah, i don't know why they're so variant yet :(
(03:47:54 PM) atoll: comparing sync5..7.phx1 to sync1..8.scl2 may help
(03:48:04 PM) atoll: and just ignore 1..4.phx1 since they're not in use most times
(03:48:14 PM) ckolos: this is on sync5.phx
(03:48:31 PM) atoll: maybe the initial memory burden for a worker is stable at 1GB after startup and a request or two
(03:48:55 PM) ckolos: possibly, but then that means that sync1-4 aren't used at *all*
(03:48:58 PM) atoll: correct
(03:49:03 PM) ckolos: b/c they're all around 90mb per proc
(03:49:26 PM) atoll: sync1..4.phx1 are set as "last resort" servers in the zeus pool, since if they're in active use they cause couchbase to swap out
(03:49:45 PM) atoll: once we have a couchbase hardware solution for scl2, it must also go to phx1
(03:49:54 PM) ckolos: really though, there's not enough running to come up with anything other than slightly-better-than-guesses
(03:50:12 PM) atoll: do the sync load tests show the same worker memory usage?
(03:50:22 PM) ckolos: unknown.
(03:50:26 PM) ckolos: where would that data be?
(03:50:36 PM) atoll: sync*.web.scl2.stage graphs and collection, if any
(03:50:55 PM) atoll: rfkelly is online and may be of further use here, in case he's ever observed memory usage previously
(03:56:54 PM) ckolos: none of the stage syncweb servers have gunicorn processes running that hot.
(03:57:13 PM) ckolos: highest use in stage is 94mb
(03:57:25 PM) ckolos: so I'm guessing no loadtests have been done in a while.
(03:57:37 PM) ckolos: most procs are dated aug 29
Assignee
Comment 5•12 years ago
My first suspect here is the per-node-name connection pool and related data structures. How many individual [host:nodename] sections do we have configured in the prod settings file?
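For intuition, here is a toy sketch of that suspicion (a hypothetical structure, not the actual syncstorage code; the 1320 figure is the SCL2 host-section count reported later in comment 8): a lazily-populated, never-evicted pool-per-nodename map makes baseline memory scale with the number of sections rather than with traffic.

class FakePool(object):
    # Stand-in for a per-node connection pool and its bookkeeping; the
    # 256 kB per-connection figure is purely illustrative.
    def __init__(self, size, conn_bytes=256 * 1024):
        self.conns = [bytearray(conn_bytes) for _ in range(size)]

pools = {}  # node name -> pool, created lazily and never evicted

def get_pool(node, size=2):
    if node not in pools:
        pools[node] = FakePool(size)
    return pools[node]

# Touch each of 1320 node names once, as a long-running worker eventually would:
for i in range(1, 1321):
    get_pool('scl2-sync%d.services.mozilla.com' % i)
buf = sum(len(c) for p in pools.values() for c in p.conns)
print('%d pools, %.0f MB of simulated connection buffers' % (len(pools), buf / 2.0 ** 20))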
Assignee
Comment 6•12 years ago
I tried about an hour of light load against stage this afternoon, and monitored the memory usage of two gunicorn processes - one which was freshly restarted, and one that had been alive since 29 August. RSS snapshots (in kB) at 15-minute intervals:

        New Proc   Old Proc
t=0     41232      75856
t=15m   53580      75924
t=30m   54244      75912
t=45m   54836      75920
t=60m   55604      75908

So the memory usage does seem to slowly climb to a peak value as requests come in, then stay relatively steady at that level.
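For reference, a minimal sketch of this kind of polling, assuming Linux /proc and a known worker pid (not necessarily the exact commands used here):

import time

def vm_rss_kb(pid):
    # VmRSS of the given pid in kB, as reported by /proc/<pid>/status.
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    raise RuntimeError('no VmRSS line for pid %d' % pid)

def watch(pid, minutes=15, samples=5):
    # Same cadence as the table above: one reading every 15 minutes.
    for i in range(samples):
        print('t=%dm  %d' % (i * minutes, vm_rss_kb(pid)))
        if i + 1 < samples:
            time.sleep(minutes * 60)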
Reporter
Comment 7•12 years ago
rfkelly, I think this is what you're asking for. At PHX1 and SCL2 the production.ini file for syncstorage contains this:

[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 5000
use_threadpool = True
threadpool_workers = 60

[app:main]
use = egg:SyncStorage
configuration = file:/etc/sync/sync.conf
Reporter
Comment 8•12 years ago
Sorry, I see what you mean now. In the sync.conf file we have:

In PHX1, 610 lines like this: [host:phx-sync609.services.mozilla.com]
In SCL2, 1320 lines like this: [host:scl2-sync1320.services.mozilla.com]
Assignee
Comment 9•12 years ago
As a first step, I'd like to make a new release and push it to stage with the following changes:

* memory-usage-dumping support from Bug 799874
* update all our dependencies to the latest versions

In particular I want to update SQLAlchemy, which is a whole minor version behind the current release (0.6.6 vs 0.7.9); the newer release has some known memory-usage improvements. We can then throw some load at it and take periodic memory-usage dumps from one of the gunicorn worker processes. I can then analyse these dumps offline to get an idea of where the memory is being spent. Will we have the Ops bandwidth for a push to stage sometime in the next few days? If not, I can run my own tests, but I think memory-usage data from stage under full load will be significantly more useful than what I can simulate locally.
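The dump mechanism from Bug 799874 isn't shown in this bug; for flavor, one common approach for Python 2 services of that era was meliae, roughly as in this sketch (the dump path and pid are hypothetical):

import os
from meliae import scanner, loader

# In the running worker (e.g. from a signal handler): record every live
# object's type, size and references as JSON lines.
scanner.dump_all_objects('/tmp/worker-%d.json' % os.getpid())

# Offline: load a dump and summarize memory usage by object type.
om = loader.load('/tmp/worker-12345.json')  # hypothetical dump filename
print(om.summarize())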
Comment 10•12 years ago
Submit Stage deploy ticket for Sync as per usual, and if Gene is blocked I can push it out.
Assignee
Comment 11•12 years ago
Filed Bug 800254 for deploying gunicorn changes into stage.
Depends on: 800254
Assignee
Comment 12•12 years ago
Gene, in the config file you grepped in Comment 8 there should be a [storage] section. Can you please post (or email me if sensitive) the contents of that section, minus any passwords etc? I want to check for anything that might explain why memory usage on stage seems to be much better controlled than in production. Stage has 160 [host:blah] sections vs 1320 in production, but the difference in memory usage between the two doesn't seem to scale with that number. Perhaps they have slightly different configurations in e.g. number of connections per pool.
Reporter
Comment 13•12 years ago
Sure, here's that section. I've compared and it's the same at scl2 and phx1:

[storage]
backend = syncstorage.storage.memcachedsql.MemcachedSQLStorage
sqluri = pymysql://USERNAMEGOESHERE:PASSWORDGOESHERE@sync1.db.scl2.svc.mozilla.com/weave0
standard_collections = true
use_quota = false
quota_size = 25600
pool_size = 2
pool_recycle = 1200
reset_on_return = true
batch_size = 100
cache_servers = localhost:11222
create_tables = false
display_config = false
hosts = scl2-sync1.services.mozilla.com
        scl2-sync2.services.mozilla.com
        scl2-sync3.services.mozilla.com
        .
        .
        .
        scl2-sync1318.services.mozilla.com
        scl2-sync1319.services.mozilla.com
        scl2-sync1320.services.mozilla.com
shard = true
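Reading those numbers together with the host-section count from comment 8 gives a rough upper bound. The one-engine-per-[host:...]-section assumption below is exactly that, an assumption, not something confirmed in this bug:

host_sections = 1320   # SCL2 count from comment 8
pool_size = 2          # from the [storage] section above
workers = 5            # gunicorn workers per SCL2 web head, per the description

conns_per_worker = host_sections * pool_size
print('max pooled connections per worker: %d' % conns_per_worker)   # 2640
print('max per web head: %d' % (conns_per_worker * workers))        # 13200

# At an illustrative ~100 kB of buffers/bookkeeping per pooled connection,
# pools alone would pin roughly a quarter of a GB per worker:
print('rough pool overhead per worker: %.0f MB' % (conns_per_worker * 100 / 1024.0))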
Assignee
Comment 14•12 years ago
Bug 802486 identifies a cache-clearing issue that likely contributes to the high memory usage. This issue would result in an empty dict being kept in memory for each unique userid ever encountered by the server. That's only on the order of ~300 bytes of memory per user, but we do serve a lot of users... Probably not the whole story, but it's a solid start.
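To put that figure in context, a quick sanity check; sizes are CPython-specific (an empty dict is ~280 bytes on 64-bit CPython 2.x, less on modern 3.x), and the user count is illustrative, not taken from this bug:

import sys

per_user = sys.getsizeof({})
print('one empty dict: %d bytes' % per_user)

users = 5 * 10 ** 6
leaked_mb = users * per_user / 2.0 ** 20
print('%d users -> ~%.0f MB of never-freed cache entries' % (users, leaked_mb))

At that rate a busy worker could plausibly accumulate memory on the GB scale seen above.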
Depends on: 802486
Assignee
Comment 15•12 years ago
I'm prepping a deployment to get the above-mentioned fix out into production - Bug 803389. It will be interesting to see how much of a difference the tweaks so far have made.
Depends on: 803389
Assignee
Comment 16•10 years ago
Bob, can you confirm whether this is still an issue for current sync? If so then we should put it on our radar for sync+fxa deployment planning.
Flags: needinfo?(bobm)
Updated•10 years ago
Whiteboard: [qa+]
Comment 17•10 years ago
Most gunicorn workers are at or under 1GB; however, there are a couple of outliers.

sync1.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
21851,654,951968,946532,1113304
21851,3228,1086948,1081628,1248400
21851,11828,634456,629020,795792
28912,21851,11756,7964,107360
21851,21852,1915644,1910268,2077040
21851,21853,940500,935168,1101940
21851,21857,1173020,1167580,1334352
21851,21858,1166896,1161464,1328236
21851,21859,945688,940260,1107032

sync2.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
8948,8863,1167276,1163088,1329732
2188,8948,11536,8076,109440
8948,8949,1252668,1248504,1415148
8948,8951,982400,978372,1145016
8948,8953,1080108,1075988,1242632
8948,8954,1125696,1121600,1288244
8948,8955,1421692,1417824,1584468
8948,19640,744772,740564,907208
8948,20869,919524,915292,1081936

sync3.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
2216,6760,11620,8076,109440
6760,6789,963724,964544,1131188
6760,6790,1057516,1052368,1219012
6760,6794,965408,960260,1126904
6760,6796,1085048,1079872,1246516
6760,6988,1113764,1108416,1275060
6760,17266,1104832,1099612,1266256
6760,21081,784004,778660,945304
6760,32528,1141740,1136392,1303036

sync4.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
5047,709,1037620,1033476,1200120
5047,3629,916064,912004,1078648
5047,4464,781464,779504,946148
5047,4879,1036016,1033880,1200524
2214,5047,11364,8080,109444
5047,5068,1582712,1611836,1782652
5047,11283,1591280,1587504,1754148
5047,22663,953084,949224,1115868
5047,30280,1410736,1406876,1573520

sync5.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
26597,8994,917196,911980,1078616
26597,9896,704416,699440,866076
26597,12619,924384,919592,1086228
26597,17183,920084,915012,1081648
26597,26389,958344,955308,1121944
14334,26597,11588,8080,109436
26597,26600,949552,946316,1112952
26597,26602,961044,955832,1122468
26597,32443,716948,711668,878304

sync6.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
22769,9956,918788,913600,1080236
22769,10205,946784,943588,1110224
22769,12584,949612,944340,1110976
22769,14819,647132,641972,808608
22769,15258,713376,712212,878848
22769,22177,701860,696588,863224
13874,22769,11568,8076,109432
22769,22778,948196,943052,1109688
22769,26315,702680,701496,868132

sync7.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
14816,11169,632620,627328,793964
14816,11370,901852,896564,1063200
14816,14094,704260,699096,865732
14816,14452,629164,623872,790508
13847,14816,11600,8080,109436
14816,16834,1057040,1051872,1218508
14816,22077,934376,929088,1095724
14816,31648,731460,726352,892988
14816,32555,612476,607184,773820

sync8.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
7219,473,13788,12008,132132
473,3442,631280,626672,804308
473,16617,817324,814148,991784
473,19755,644720,656248,833884
473,29223,356380,637560,815196

sync9.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
1632,1649,10152,7656,107052
1649,1974,518120,513300,680032
1649,2557,380420,375724,542456
1649,10650,494068,489600,656332
1649,30300,449664,445356,612088

sync10.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
9616,1379,506736,501860,668592
9423,9616,10844,7660,107056
9616,15202,504008,498784,665516
9616,17566,527428,522808,689540
9616,17675,553944,548804,715536

sync11.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
32495,2941,10796,7652,107048
2941,13538,531512,527492,694224
2941,13553,506328,503116,669848
2941,24705,536032,533092,699824
2941,28657,437740,433640,600372

sync12.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
1656,496,388496,383324,550056
1638,1656,10188,7656,107052
1656,2099,383816,379728,546460
1656,21088,494964,491292,658024
1656,24204,543384,548544,715276

sync13.web.phx1.svc.mozilla.com
ppid,pid,rss,size,vsize
17235,9771,524904,532088,698820
32656,17235,10904,7664,107060
17235,22606,484984,479836,646568
17235,26641,483060,477952,644684
17235,29777,480468,475364,642096
Flags: needinfo?(bobm)
Comment 18•10 years ago
From the outlier on sync1.web:

Address           Kbytes   RSS      Dirty    Mode  Mapping
0000000000400000  4        4        0        r-x-- python
0000000000600000  8        8        4        rw--- python
0000000001b62000  4276     4264     4264     rw--- [ anon ]
0000000001f8f000  1902352  1902272  1902272  rw--- [ anon ]
...
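That large dirty rw anonymous mapping is almost certainly the Python heap. Here is a sketch for totalling a worker's private dirty anonymous memory (assumes a kernel with /proc/<pid>/smaps; the pid is the large outlier worker from comment 17):

def anon_private_dirty_kb(pid):
    # Sum Private_Dirty over anonymous mappings in /proc/<pid>/smaps --
    # roughly the interpreter-owned heap seen as [ anon ] in the pmap above.
    total = 0
    anonymous = False
    with open('/proc/%d/smaps' % pid) as f:
        for line in f:
            fields = line.split()
            if '-' in fields[0]:  # mapping header, e.g. "01f8f000-... rw-p ..."
                # No pathname, or a pseudo-path like [heap], means anonymous.
                anonymous = len(fields) < 6 or fields[-1] in ('[heap]', '[stack]')
            elif anonymous and fields[0] == 'Private_Dirty:':
                total += int(fields[1])  # value is in kB
    return total

print('%.1f MB anonymous private dirty' % (anon_private_dirty_kb(21852) / 1024.0))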
Comment 19•10 years ago
All dependent bugs have been Resolved. What is our status here?
Priority: -- → P1
Assignee
Comment 20•10 years ago
We're currently watching out for this issue on the sync1.5 storage nodes, but I'm hopeful it won't be a problem in the one-box-per-node setup we're currently using. So let's keep it open, but not as a blocker.
Updated•9 years ago
Priority: P1 → P4
Assignee
Comment 21•6 years ago
(From comment 20, 4 years ago)
> We're currently watching out for this issue on the sync1.5 storage nodes, but I'm hopeful it won't be a problem
> in the one-box-per-node setup we're currently using. So let's keep it open, but not as a blocker.

4 years later, I haven't heard any complaints about this, so I'm going to go ahead and close it out. :bobm please feel free to open a new bug if there are similar concerns on the sync1.5 server boxes.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Updated•1 year ago
Product: Cloud Services → Cloud Services Graveyard