Closed
Bug 1304171
Opened 8 years ago
Closed 6 years ago
Direct rest and bzapi requests to the new 6th webhead for a day or two and measure performance
Categories
(bugzilla.mozilla.org :: Infrastructure, task)
RESOLVED
INVALID
People
(Reporter: dylan, Assigned: fubar)
Per: https://wiki.mozilla.org/BMO/Meetings/2016-09-20
We'll try sending REST and BZAPI calls to one webhead for a few hours or a day, in order to verify that UI performance improves and API performance does not significantly degrade.
Assignee
Comment 1 • 8 years ago
Created a new zeus pool called 'bugzilla.mozilla.org-api' containing only web6, with the bugzilla.mozilla.org-https pool as failover. Created a new rule, 'bugzilla-api', that checks the URL path for /bzapi/ or /rest/ and sends matching requests to the new pool.
Currently enabled on prod, and traffic is flowing happily to web6.
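For readers unfamiliar with the setup, the routing decision described above can be sketched roughly as follows. This is an illustration in Python, not the actual zeus rule: the prefix match on the path and the use of the -https pool as the default for everything else are assumptions, and `choose_pool` is a hypothetical name.

```python
# Pool names are taken from the comment above; treating the -https pool as
# the default for non-API traffic is an assumption.
API_POOL = "bugzilla.mozilla.org-api"
DEFAULT_POOL = "bugzilla.mozilla.org-https"

# Assumption: the rule matches on the start of the URL path.
API_PREFIXES = ("/bzapi/", "/rest/")


def choose_pool(path: str) -> str:
    """Return the name of the pool a request path would be sent to."""
    if path.startswith(API_PREFIXES):
        return API_POOL
    return DEFAULT_POOL
```

With this sketch, `choose_pool("/rest/bug/1304171")` selects the API pool, while a UI request such as `/show_bug.cgi` stays on the default pool.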
Assignee: nobody → klibby
Assignee
Comment 2 • 8 years ago
10:23:49 <@nagios-scl3> Thu 07:23:49 PDT [5215] web6.bugs.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 28% free (559 MB out of 2046 MB)
Assignee
Comment 3 • 8 years ago
There weren't a huge number of clients connected (i.e., more idle workers than not), but lots of httpd processes were swapped out. Load was fairly spiky, too.
If this is going to be a longer term model, we might want to tune things better for handling API calls, or add a second node.
Reporter
Comment 4 • 8 years ago
Cool, I'm going to take a look at the Apache size limit stuff.
Assignee
Comment 5 • 8 years ago
graphite shows a marked difference in memory usage between the new node and the others, suggesting a memory leak in bzapi/rest requests. dylan is investigating, but there will likely be ongoing swap alerts to the MOC.
Assignee
Comment 6 • 8 years ago
I've returned API traffic back to the entire pool; it's the end of the day and I don't want the MOC to get paged over and over again as swap fills up (and it just ate everything on web6!).
I think there's low enough "user" traffic that we could split the cluster 4/2 and be pretty happy. More so once dylan tracks down why SizeLimit isn't firing on API traffic (kill_pigs is culling them instead).
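As background on what "SizeLimit firing" means, a per-request size check of that general kind can be sketched as below. This is only an illustration in Python under stated assumptions, not the mod_perl module's code or BMO's configuration: the function names are hypothetical, and the check is reduced to comparing resident memory against a single threshold.

```python
import re


def parse_vm_rss_kb(status_text: str) -> int:
    """Extract the resident set size in kB from /proc/<pid>/status text (Linux)."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def should_exit_after_request(rss_kb: int, limit_kb: int) -> bool:
    """Hypothetical check: the worker should exit once it exceeds the limit."""
    return rss_kb > limit_kb
```

If a check like this never runs for API requests, workers keep growing until something external, such as kill_pigs, removes them.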
Comment 7 • 8 years ago
:dylan: did you get enough info when we ran the test, or do we need another couple of days of redirecting to get more info?
Flags: needinfo?(dylan)
Reporter
Comment 8 • 8 years ago
Another 24-hour period would be useful, especially if I can add more instrumentation to try to track down the remaining memory leaks.
Flags: needinfo?(dylan)
Assignee
Comment 9 • 8 years ago
I've re-enabled this with a small change - web5 and web6 are now serving API requests while web1-4 are handling all other traffic. Hopefully having two nodes handling API traffic will prevent nagios from throwing swap alerts while the test is running.
Comment 10 • 8 years ago
Apache on web5 stopped responding entirely and was graceful'd.
Comment 11 • 8 years ago
web5 paged for swap today, so I restarted Apache. My apologies if I interrupted any testing :(
19:22:45 <@nagios-scl3> Mon 19:22:45 PST [5605] web5.bugs.scl3.mozilla.com:Swap is CRITICAL: SWAP CRITICAL - 22% free (450 MB out of 2047 MB) (http://m.mozilla.org/Swap)
Comment 12 • 8 years ago
<@nagios-scl3:#sysadmins> (IRC) Tue 03:43:26 PST [5340] web6.bugs.scl3.mozilla.com:Out of memory - killed process is WARNING: WARNING: Log errors found: Jan 10 11:42:46 web6.bugs.scl3.mozilla.com apache[8172]: Out of memory! (http://m.mozilla.org/Out+of+memory+-+killed+process)
<@nagios-scl3:#sysadmins> Tue 03:49:56 PST [5346] web6.bugs.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 30% free (597 MB out of 2046 MB) (http://m.mozilla.org/Swap)
Apache Server Status for localhost
Server Version: Apache/2.2.15 (Unix) mod_perl/2.0.4 Perl/v5.10.1
Server Built: Nov 3 2016 10:35:25
--------------------------------------------------------------------------
Current Time: Tuesday, 10-Jan-2017 11:54:39 UTC
Restart Time: Tuesday, 10-Jan-2017 03:12:19 UTC
Parent Server Generation: 0
Server uptime: 8 hours 42 minutes 20 seconds
Total accesses: 70782 - Total Traffic: 7.7 MB
CPU Usage: u11562.5 s318.54 cu0 cs0 - 37.9% CPU load
2.26 requests/sec - 258 B/second - 114 B/request
4 requests currently being processed, 44 idle workers
______W_____._____.............._......................__...._._
_.........K........__......_............._........._._........._
.K...K..._._...__._...._................_...._.__..._....__.....
....__..........................................................
....
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
So it appears that one process, the killed one, was taking up the remainder of virtual memory on the host. The large number of httpd processes accounts for the rest; there are no other individual memory hogs.
Mem: 16334188k total, 15002416k used, 1331772k free, 19900k buffers
Swap: 2096124k total, 1318792k used, 777332k free, 295232k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5606 apache 30 10 801m 481m 3152 S 0.0 3.0 2:03.25 httpd
8348 apache 30 10 777m 475m 3276 S 0.0 3.0 3:51.39 httpd
8406 apache 30 10 776m 424m 3144 S 0.0 2.7 3:32.19 httpd
8344 apache 30 10 714m 415m 3220 S 0.0 2.6 3:49.42 httpd
5633 apache 30 10 707m 407m 3172 S 0.0 2.6 2:00.41 httpd
8260 apache 30 10 688m 382m 3248 S 0.0 2.4 3:59.79 httpd
5604 apache 30 10 696m 366m 3680 S 0.0 2.3 2:02.61 httpd
8342 apache 30 10 682m 362m 3200 S 0.0 2.3 3:52.23 httpd
8315 apache 30 10 641m 339m 3240 S 0.0 2.1 3:47.86 httpd
8318 apache 30 10 635m 335m 3184 S 0.0 2.1 4:08.25 httpd
8280 apache 30 10 631m 329m 3328 S 0.0 2.1 4:06.92 httpd
8385 apache 30 10 646m 324m 3380 S 0.0 2.0 3:47.81 httpd
8382 apache 30 10 622m 320m 3304 S 0.0 2.0 3:32.26 httpd
5567 apache 30 10 616m 318m 3212 S 0.0 2.0 1:51.18 httpd
8357 apache 30 10 629m 317m 3192 S 0.0 2.0 3:48.72 httpd
8257 apache 30 10 620m 314m 3156 S 0.0 2.0 3:35.99 httpd
8384 apache 30 10 619m 309m 3320 S 0.0 1.9 3:38.04 httpd
8193 apache 30 10 611m 308m 3276 S 0.0 1.9 3:54.77 httpd
8329 apache 30 10 640m 308m 3208 S 0.0 1.9 4:32.50 httpd
8250 apache 30 10 632m 303m 3200 S 0.0 1.9 4:02.22 httpd
8352 apache 30 10 604m 298m 3396 S 0.0 1.9 3:55.86 httpd
5607 apache 30 10 611m 296m 3328 S 8.3 1.9 2:11.14 httpd
5634 apache 30 10 605m 295m 3232 S 0.0 1.9 1:49.98 httpd
8405 apache 30 10 635m 295m 3212 S 0.0 1.9 3:42.68 httpd
8304 apache 30 10 618m 294m 3164 S 0.0 1.8 3:10.71 httpd
8252 apache 30 10 623m 291m 3164 S 0.0 1.8 3:34.02 httpd
8349 apache 30 10 645m 285m 3208 S 0.0 1.8 4:30.29 httpd
5569 apache 30 10 609m 285m 3416 S 0.0 1.8 1:44.09 httpd
5628 apache 30 10 585m 285m 3176 S 0.0 1.8 2:06.90 httpd
8389 apache 30 10 603m 282m 3144 S 0.0 1.8 3:44.04 httpd
8331 apache 30 10 597m 281m 3436 S 0.0 1.8 4:19.56 httpd
8259 apache 30 10 635m 281m 3184 S 0.0 1.8 4:16.38 httpd
8187 apache 30 10 602m 279m 3212 S 7.6 1.8 3:47.85 httpd
8377 apache 30 10 575m 278m 3212 S 0.3 1.7 3:06.77 httpd
8289 apache 30 10 606m 277m 3364 S 0.0 1.7 3:42.44 httpd
8281 apache 30 10 577m 276m 3192 S 0.0 1.7 3:36.99 httpd
5605 apache 30 10 600m 275m 3256 S 0.0 1.7 1:47.27 httpd
5566 apache 30 10 591m 274m 3308 S 0.0 1.7 2:08.32 httpd
8394 apache 30 10 597m 272m 3244 S 0.0 1.7 3:51.43 httpd
8271 apache 30 10 616m 269m 3212 S 0.0 1.7 4:00.11 httpd
8336 apache 30 10 593m 265m 3336 S 0.0 1.7 3:21.47 httpd
5631 apache 30 10 583m 261m 3172 S 0.0 1.6 1:55.20 httpd
8225 apache 30 10 590m 256m 3164 S 0.0 1.6 3:46.66 httpd
5632 apache 30 10 581m 251m 3252 S 0.0 1.6 1:51.46 httpd
8395 apache 30 10 604m 236m 3520 S 0.0 1.5 4:20.09 httpd
8203 apache 30 10 568m 233m 3256 S 0.0 1.5 3:22.72 httpd
5568 apache 30 10 538m 230m 3188 S 0.0 1.4 1:56.39 httpd
5629 apache 30 10 524m 222m 3204 S 0.0 1.4 1:38.02 httpd
7953 root 30 10 311m 17m 2588 S 0.0 0.1 0:02.99 httpd
Bounced httpd to recover memory/swap as everyone else has.
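To put a number on how much memory the httpd processes hold in total, the RES column of the top output above can be summed. The sketch below is one way to do that in Python; it assumes the default top column order shown in the header line and that an unsuffixed value is in KiB, and the function names are hypothetical.

```python
def parse_size_mb(field: str) -> float:
    """Convert a top memory field such as '481m', '1.2g' or '3152' to MiB.

    Assumption: a value without a suffix is in KiB.
    """
    units = {"m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024.0


def total_res_mb(top_lines, command="httpd"):
    """Sum the RES column (MiB) over all rows whose COMMAND matches."""
    total = 0.0
    for line in top_lines:
        cols = line.split()
        # Expected columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) >= 12 and cols[11] == command:
            total += parse_size_mb(cols[5])
    return total
```

Feeding it the process rows from the listing above gives the combined resident size of all httpd workers, which can then be compared against the 16 GB of RAM on the host.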
Comment 13 • 8 years ago
Thu 06:23:36 PST [5684] web5.bugs.scl3.mozilla.com:httpd max clients is CRITICAL: (Service Check Timed Out) (http://m.mozilla.org/httpd+max+clients)
I restarted Apache.
Reporter
Comment 14 • 8 years ago
That looks like too many processes. My math says each webhead should have about 25, but that's clearly almost 50. Why?
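The "almost 50" figure agrees with the status page in comment 12, which reports 4 requests being processed and 44 idle workers, i.e. 48 worker processes. A minimal sketch for counting workers from a scoreboard string is below. It is a simplification: every state other than "_" and "." is counted as busy, which may not match exactly how Apache itself tallies busy workers, and `summarize_scoreboard` is a hypothetical name.

```python
from collections import Counter


def summarize_scoreboard(scoreboard: str) -> dict:
    """Count busy, idle and open slots in an Apache scoreboard string.

    "_" is an idle worker and "." an open slot with no process; any other
    character is treated as a busy worker (a simplification).
    """
    counts = Counter(ch for ch in scoreboard if not ch.isspace())
    open_slots = counts.get(".", 0)
    idle = counts.get("_", 0)
    busy = sum(counts.values()) - open_slots - idle
    return {
        "busy": busy,
        "idle": idle,
        "open": open_slots,
        "workers": busy + idle,
    }
```

Running this over a saved status page from each webhead would give a quick way to compare the actual worker count against the expected figure of about 25.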
Assignee
Comment 15 • 8 years ago
Reverting back to standard zeus config so that we're not pestering the MOC all (long) weekend; sadly, two nodes still isn't quite enough.
Updated • 6 years ago
Type: defect → task
Reporter
Updated • 6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID