Seeing lots of high load alerts on git this morning:

nagios-scl3 Wed 02:49:06 PST  git1.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 132.56, 162.98, 177.21
Created attachment 8552360 [details]
git1-load-201401210301PT.png

/me got paged by failures to push to git from the vcs-sync system as load spiked to 300.
Looks to have started around 0930 UTC.
During load, seeing quite a few requests from an older OS X git client via "tail -f access_log" -- looks like that just started "recently":

[email@example.com httpd]# egrep -c "\(Apple Git-33\)\"$" access_log*
access_log:1641
access_log-20141228:0
access_log-20150104:0
access_log-20150111:0
access_log-20150118:0
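The per-file count above only answers the question for one client string. A rough sketch of a broader tally follows, assuming the user agent is the last double-quoted field on each line (which the `\"$` anchor in the egrep above suggests). The function name `top_agents` is made up for illustration.

```shell
# Sketch: count requests per user agent across one or more access logs.
# Assumes the user agent is the last double-quoted field on each line.
top_agents() {
    # Splitting on '"' leaves the user agent in the next-to-last field,
    # because a line ending in a quote yields an empty final field.
    awk -F'"' 'NF > 2 { count[$(NF-1)]++ }
               END { for (ua in count) print count[ua] "\t" ua }' "$@" |
        sort -rn
}
```

Something like `top_agents access_log | head` would then list the busiest clients first, which makes it easier to tell whether one client really dominates during a load spike.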
/me notes box is configured with only 2GB swap; may want to try increasing it for peak loads like this.

Also, "khugepaged" makes an appearance in top -- issues with it reported on the web seem to match what we're seeing:
https://bugzilla.redhat.com/show_bug.cgi?id=879801

Trying https://bugzilla.redhat.com/show_bug.cgi?id=879801#c17
Applied:

[firstname.lastname@example.org httpd]# cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
[always] madvise never
[email@example.com httpd]# echo never > !$
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
[firstname.lastname@example.org httpd]# !cat
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
always madvise [never]
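For anyone checking this later: the kernel marks the active mode with square brackets in that file. A minimal sketch for pulling out just the active value, defaulting to the path used above; the function name `thp_defrag_mode` is hypothetical.

```shell
# Sketch: print the active transparent hugepage defrag mode.
# The active mode is the bracketed word, e.g. "always madvise [never]".
thp_defrag_mode() {
    local f="${1:-/sys/kernel/mm/redhat_transparent_hugepage/defrag}"
    # Keep only the text inside the square brackets.
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' "$f"
}
```

A value written with echo lasts only until the next reboot, so if the change is kept it would have to be reapplied at boot.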
Created attachment 8552419 [details]
git1-load-20150121T0419PT.png

Okay, I'm happy with that result :)
Hmm, less sure comment 4 has anything to do with it. The end of the last event had a similar drop-off, and it doesn't appear we took any action. See bug 1087640 attachment 8510038 [details]

ni :bkero & :gps to render an opinion on leaving the change in comment 4 applied, which was based on https://bugzilla.redhat.com/show_bug.cgi?id=879801#c17
Assignee: nobody → hwine
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
See Also: → bug 1087640
I don't have an opinion on the kernel change because I'm not familiar with the subject matter. I reckon this is Git doing repacks somewhere. Do we have a CRON job doing periodic repacks? This would help prevent random repacks on client-initiated server-side operations and would put us in more control of server behavior.
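A rough sketch of what such a periodic repack could look like follows. It assumes bare repositories live in directories named `*.git` under a single root, which is a guess about the layout and not something confirmed for git1. The function name `repack_all` and the `DRY_RUN` switch are made up for illustration.

```shell
# Sketch: repack every bare repository found under a root directory.
# The "*.git" naming convention and the DRY_RUN switch are assumptions.
repack_all() {
    local root="$1" repo
    # -prune stops find from descending into a repository once it matches.
    find "$root" -type d -name '*.git' -prune -print |
    while IFS= read -r repo; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "would repack $repo"
        else
            # -a -d: collect everything into a single pack, drop redundant packs.
            # nice keeps the job from competing with client requests.
            (cd "$repo" && nice -n 19 git repack -a -d -q) ||
                echo "repack failed: $repo" >&2
        fi
    done
}
```

Running it from cron at a quiet hour, with `DRY_RUN=1` first to check which repositories it would touch, would keep repacks off the request path.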
(In reply to Gregory Szorc [:gps] from comment #7)
> I reckon this is Git doing repacks somewhere.
>
> Do we have a CRON job doing periodic repacks? This would help prevent random
> repacks on client-initiated server-side operations and would put us in more
> control of server behavior.

No - opened bug 1124754 for this work.
See Also: → bug 1124754
I too don't know enough about the effects of hugepage defragging on the performance of loaded systems to advise on whether to keep it on. If it is still in this state now, it likely doesn't make much difference to performance.
Socket timeout errors:

8:42 AM <@nagios-scl3> Tue 08:42:48 PDT  git1.dmz.scl3.mozilla.com:http - gitweb Port 80 is CRITICAL: CRITICAL - Socket timeout after 60 seconds (http://m.mozilla.org/http+-+gitweb+Port+80)

&

Host: git-zlb.vips.scl3.mozilla.com
Service: HTTP - Port 80
Service State: CRITICAL
no longer meaningful in light of bug 1277297
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX