Closed Bug 1304171 · Opened 8 years ago · Closed 6 years ago

Direct REST and BzAPI requests to the new 6th webhead for a day or two and measure performance

Categories
(bugzilla.mozilla.org :: Infrastructure, task)
Version: Production
Type: task
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED INVALID

People
(Reporter: dylan, Assigned: fubar)

Per https://wiki.mozilla.org/BMO/Meetings/2016-09-20: we'll direct REST and BzAPI calls to one webhead for a few hours to a day, in order to verify that UI performance improves and that API performance does not significantly degrade.
Created a new Zeus pool called 'bugzilla.mozilla.org-api' with just web6 in it, and the bugzilla.mozilla.org-https pool as failover. Created a new rule, 'bugzilla-api', that checks the URL path for /bzapi/ or /rest/ and sends those requests to the new pool. Currently enabled on prod, and traffic is flowing happily to web6.
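For reference, the routing decision the 'bugzilla-api' rule makes is just a path check. A minimal sketch of that logic in Python (illustrative only; the real rule lives in the Zeus config, and treating the check as a prefix match is an assumption based on the description above):

# Illustrative sketch of the 'bugzilla-api' routing decision, not the actual Zeus rule.
API_POOL = "bugzilla.mozilla.org-api"        # web6 only
DEFAULT_POOL = "bugzilla.mozilla.org-https"  # full webhead pool, also used as failover

def choose_pool(path):
    # REST and BzAPI calls go to the dedicated API pool; everything else stays on the default pool.
    if path.startswith("/rest/") or path.startswith("/bzapi/"):
        return API_POOL
    return DEFAULT_POOL

assert choose_pool("/rest/bug/1304171") == API_POOL
assert choose_pool("/show_bug.cgi?id=1304171") == DEFAULT_POOL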
Assignee: nobody → klibby
10:23:49 <@nagios-scl3> Thu 07:23:49 PDT [5215] web6.bugs.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 28% free (559 MB out of 2046 MB)
There weren't a huge number of clients connected (i.e. more idle workers than not), but lots of httpd processes were swapped out. Load was fairly spiky, too. If this is going to be a longer-term model, we might want to tune things better for handling API calls, or add a second node.
Cool, I'm going to take a look at the Apache size-limit settings.
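For context: the idea behind a per-worker size limit is that each httpd worker checks its own memory footprint after serving a request and exits once it passes a cap, so the parent forks a fresh process in its place. A rough Python sketch of that check follows; the 700 MB cap is made up and Linux /proc is assumed, the real cap lives in the mod_perl configuration.

# Rough sketch of a per-worker size limit; the cap below is hypothetical, for illustration only.
import sys

MAX_RSS_KB = 700 * 1024  # assumed cap

def rss_kb():
    # Resident set size of the current process, in KB (Linux /proc assumed).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def after_request():
    # Run once per request: a bloated worker exits cleanly so the parent can spawn a fresh one.
    if rss_kb() > MAX_RSS_KB:
        sys.exit(0)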
Graphite shows a marked difference in memory usage between the new node and the others, suggesting a memory leak in bzapi/rest request handling. dylan is investigating, but there will likely be ongoing swap alerts to the MOC.
I've returned API traffic to the entire pool; it's the end of the day and I don't want the MOC to get paged over and over again as swap fills up (and it just ate everything on web6!). I think there's low enough "user" traffic that we could split the cluster 4/2 and be pretty happy, more so once dylan tracks down why SizeLimit isn't firing on API traffic (kill_pigs is culling those workers instead).
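For contrast with the in-process size limit sketched above: an external culler like kill_pigs presumably scans for oversized workers from outside and signals them. A generic Python sketch of that kind of pass; the threshold, process matching, and signal choice are assumptions for illustration, not what the real script does.

# Generic sketch of an external culling pass over oversized httpd workers (not kill_pigs itself).
import os, signal

RSS_LIMIT_KB = 700 * 1024  # hypothetical threshold

def oversized_httpd_pids():
    # Walk /proc, yield PIDs of httpd processes whose resident set size exceeds the limit.
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() != "httpd":
                    continue
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    if line.startswith("VmRSS:") and int(line.split()[1]) > RSS_LIMIT_KB:
                        yield int(pid)
                        break
        except (IOError, OSError):
            pass  # process exited between listing and reading

for pid in oversized_httpd_pids():
    os.kill(pid, signal.SIGTERM)  # ask the worker to finish; the real script may behave differently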
:dylan: did you get enough info when we ran the test, or do we need another couple of days of redirecting to get more info?
Flags: needinfo?(dylan)
Another 24-hr period would be useful, especially if I can add more instrumentation to try to track down the remaining memory leaks.
Flags: needinfo?(dylan)
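A concrete form that extra instrumentation could take, purely as a sketch: log how much RSS each request adds, tagged with the request path, so leaking endpoints stand out in aggregate. The wrapper name and threshold below are made up; real instrumentation would hook into Bugzilla's mod_perl handlers.

# Hypothetical per-request RSS logging to help localize the leak (names and threshold made up).
import logging

def current_rss_kb():
    # Resident set size of this worker, in KB (Linux /proc assumed).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def instrument(handler):
    # Wrap a request handler and log any request that grows the worker by more than ~1 MB.
    def wrapped(path, *args, **kwargs):
        before = current_rss_kb()
        result = handler(path, *args, **kwargs)
        grew = current_rss_kb() - before
        if grew > 1024:
            logging.warning("RSS grew %d KB serving %s", grew, path)
        return result
    return wrapped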
I've re-enabled this with a small change: web5 and web6 are now serving API requests while web1-4 handle all other traffic. Hopefully having two nodes handling API traffic will keep nagios from throwing swap alerts while the test is running.
Apache on web5 stopped responding entirely and was graceful'd.
Had web5 page for swap today, and restarted Apache. My apologies if I interrupted any testing :(
19:22:45 <@nagios-scl3> Mon 19:22:45 PST [5605] web5.bugs.scl3.mozilla.com:Swap is CRITICAL: SWAP CRITICAL - 22% free (450 MB out of 2047 MB) (http://m.mozilla.org/Swap)
<@nagios-scl3:#sysadmins> (IRC) Tue 03:43:26 PST [5340] web6.bugs.scl3.mozilla.com:Out of memory - killed process is WARNING: WARNING: Log errors found: Jan 10 11:42:46 web6.bugs.scl3.mozilla.com apache[8172]: Out of memory! (http://m.mozilla.org/Out+of+memory+-+killed+process)
<@nagios-scl3:#sysadmins> Tue 03:49:56 PST [5346] web6.bugs.scl3.mozilla.com:Swap is WARNING: SWAP WARNING - 30% free (597 MB out of 2046 MB) (http://m.mozilla.org/Swap)

Apache Server Status for localhost
Server Version: Apache/2.2.15 (Unix) mod_perl/2.0.4 Perl/v5.10.1
Server Built: Nov 3 2016 10:35:25
--------------------------------------------------------------------------
Current Time: Tuesday, 10-Jan-2017 11:54:39 UTC
Restart Time: Tuesday, 10-Jan-2017 03:12:19 UTC
Parent Server Generation: 0
Server uptime: 8 hours 42 minutes 20 seconds
Total accesses: 70782 - Total Traffic: 7.7 MB
CPU Usage: u11562.5 s318.54 cu0 cs0 - 37.9% CPU load
2.26 requests/sec - 258 B/second - 114 B/request
4 requests currently being processed, 44 idle workers

______W_____._____.............._......................__...._._
_.........K........__......_............._........._._........._
.K...K..._._...__._...._................_...._.__..._....__.....
....__..........................................................
....

Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process

So it appears that one process, the killed one, was taking the remainder of virtual memory on the host. The large number of processes are taking up the rest; no other individual memory hogs.

Mem: 16334188k total, 15002416k used, 1331772k free, 19900k buffers
Swap: 2096124k total, 1318792k used, 777332k free, 295232k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5606 apache 30 10 801m 481m 3152 S 0.0 3.0 2:03.25 httpd
8348 apache 30 10 777m 475m 3276 S 0.0 3.0 3:51.39 httpd
8406 apache 30 10 776m 424m 3144 S 0.0 2.7 3:32.19 httpd
8344 apache 30 10 714m 415m 3220 S 0.0 2.6 3:49.42 httpd
5633 apache 30 10 707m 407m 3172 S 0.0 2.6 2:00.41 httpd
8260 apache 30 10 688m 382m 3248 S 0.0 2.4 3:59.79 httpd
5604 apache 30 10 696m 366m 3680 S 0.0 2.3 2:02.61 httpd
8342 apache 30 10 682m 362m 3200 S 0.0 2.3 3:52.23 httpd
8315 apache 30 10 641m 339m 3240 S 0.0 2.1 3:47.86 httpd
8318 apache 30 10 635m 335m 3184 S 0.0 2.1 4:08.25 httpd
8280 apache 30 10 631m 329m 3328 S 0.0 2.1 4:06.92 httpd
8385 apache 30 10 646m 324m 3380 S 0.0 2.0 3:47.81 httpd
8382 apache 30 10 622m 320m 3304 S 0.0 2.0 3:32.26 httpd
5567 apache 30 10 616m 318m 3212 S 0.0 2.0 1:51.18 httpd
8357 apache 30 10 629m 317m 3192 S 0.0 2.0 3:48.72 httpd
8257 apache 30 10 620m 314m 3156 S 0.0 2.0 3:35.99 httpd
8384 apache 30 10 619m 309m 3320 S 0.0 1.9 3:38.04 httpd
8193 apache 30 10 611m 308m 3276 S 0.0 1.9 3:54.77 httpd
8329 apache 30 10 640m 308m 3208 S 0.0 1.9 4:32.50 httpd
8250 apache 30 10 632m 303m 3200 S 0.0 1.9 4:02.22 httpd
8352 apache 30 10 604m 298m 3396 S 0.0 1.9 3:55.86 httpd
5607 apache 30 10 611m 296m 3328 S 8.3 1.9 2:11.14 httpd
5634 apache 30 10 605m 295m 3232 S 0.0 1.9 1:49.98 httpd
8405 apache 30 10 635m 295m 3212 S 0.0 1.9 3:42.68 httpd
8304 apache 30 10 618m 294m 3164 S 0.0 1.8 3:10.71 httpd
8252 apache 30 10 623m 291m 3164 S 0.0 1.8 3:34.02 httpd
8349 apache 30 10 645m 285m 3208 S 0.0 1.8 4:30.29 httpd
5569 apache 30 10 609m 285m 3416 S 0.0 1.8 1:44.09 httpd
5628 apache 30 10 585m 285m 3176 S 0.0 1.8 2:06.90 httpd
8389 apache 30 10 603m 282m 3144 S 0.0 1.8 3:44.04 httpd
8331 apache 30 10 597m 281m 3436 S 0.0 1.8 4:19.56 httpd
8259 apache 30 10 635m 281m 3184 S 0.0 1.8 4:16.38 httpd
8187 apache 30 10 602m 279m 3212 S 7.6 1.8 3:47.85 httpd
8377 apache 30 10 575m 278m 3212 S 0.3 1.7 3:06.77 httpd
8289 apache 30 10 606m 277m 3364 S 0.0 1.7 3:42.44 httpd
8281 apache 30 10 577m 276m 3192 S 0.0 1.7 3:36.99 httpd
5605 apache 30 10 600m 275m 3256 S 0.0 1.7 1:47.27 httpd
5566 apache 30 10 591m 274m 3308 S 0.0 1.7 2:08.32 httpd
8394 apache 30 10 597m 272m 3244 S 0.0 1.7 3:51.43 httpd
8271 apache 30 10 616m 269m 3212 S 0.0 1.7 4:00.11 httpd
8336 apache 30 10 593m 265m 3336 S 0.0 1.7 3:21.47 httpd
5631 apache 30 10 583m 261m 3172 S 0.0 1.6 1:55.20 httpd
8225 apache 30 10 590m 256m 3164 S 0.0 1.6 3:46.66 httpd
5632 apache 30 10 581m 251m 3252 S 0.0 1.6 1:51.46 httpd
8395 apache 30 10 604m 236m 3520 S 0.0 1.5 4:20.09 httpd
8203 apache 30 10 568m 233m 3256 S 0.0 1.5 3:22.72 httpd
5568 apache 30 10 538m 230m 3188 S 0.0 1.4 1:56.39 httpd
5629 apache 30 10 524m 222m 3204 S 0.0 1.4 1:38.02 httpd
7953 root 30 10 311m 17m 2588 S 0.0 0.1 0:02.99 httpd

Bounced httpd to recover memory/swap as everyone else has.
Thu 06:23:36 PST [5684] web5.bugs.scl3.mozilla.com:httpd max clients is CRITICAL: (Service Check Timed Out) (http://m.mozilla.org/httpd+max+clients)
I restarted Apache.
That looks like too many processes. My math says each webhead should have about 25, but that's clearly almost 50. Why?
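For what it's worth, a back-of-the-envelope version of that sizing, using the figures from the top output pasted above; the 8 GB reserve for the OS and everything non-Apache is an assumption, not a measured value.

# Rough MaxClients sizing from the numbers in the pasted top output; the reserve is assumed.
total_ram_mb  = 16 * 1024  # ~16 GB on the webhead
reserved_mb   = 8 * 1024   # assumed headroom for the OS, caches, and non-Apache processes
avg_worker_mb = 300        # typical httpd RSS seen above (roughly 220-480 MB per worker)

max_clients = (total_ram_mb - reserved_mb) // avg_worker_mb
print(max_clients)  # 27, in line with "about 25"; close to 50 workers at this size spills into swap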
Reverting to the standard Zeus config so that we're not pestering the MOC all (long) weekend; sadly, two nodes still isn't quite enough.
See Also: → 1359570
Type: defect → task
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID