In order to prevent future DOS outages like Bug 1228806, work with :gps to enable the hg.mozilla.org web servers to emit their mercurial logs into MozDef and then establish MozDef filters/alerts to banhammer IPs.
:jeff brings up the point that this might best be done in NSM (Bro) instead of MozDef. :michal` what do you think?
Thanks for filing the bug, Gene!

In addition to the ZLB and httpd logs, we have a Mercurial extension that captures application-level details and writes to /var/log/hg.log (via syslog). This log is very low level. We have a script and crons running that merge related events to produce a slightly higher-level log (as well as aggregate daily values). Those are written to /var/log/hg/parsed.YYYY-MM-DD. That file looks like:

2015-11-29T23:59:54 build/tools 18.104.22.168 getbundle 130 0.01 0.01

That's:

* Time of request
* repository
* source IP
* Mercurial action
* HTTP response body size in bytes
* Wall time of request in seconds
* CPU time of request in seconds

The high-level log is probably most useful for MozDef. However, for simple IP abuse, we could probably look at just the "BEGIN_REQUEST" entries in the lower-level /var/log/hg.log / syslog stream.
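For illustration, here is a minimal Python sketch of how the parsed log line above could be split into its seven fields and tallied per source IP. It assumes the fields are whitespace-separated and that no field contains a space; the names HgRequest, parse_line and requests_per_ip are hypothetical and are not part of the existing script or crons.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class HgRequest(NamedTuple):
    """One line of /var/log/hg/parsed.YYYY-MM-DD (field order as listed above)."""
    time: str
    repo: str
    ip: str
    action: str
    response_bytes: int
    wall_seconds: float
    cpu_seconds: float


def parse_line(line: str) -> HgRequest:
    # Assumption: exactly seven whitespace-separated fields per line.
    time, repo, ip, action, size, wall, cpu = line.split()
    return HgRequest(time, repo, ip, action, int(size), float(wall), float(cpu))


def requests_per_ip(lines: Iterable[str]) -> Counter:
    # Count requests per source IP, skipping blank lines.
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[parse_line(line).ip] += 1
    return counts


sample = "2015-11-29T23:59:54 build/tools 18.104.22.168 getbundle 130 0.01 0.01"
record = parse_line(sample)
```

Something like requests_per_ip(open(path)).most_common(10) would then list the busiest IPs in one day's file, which is roughly the signal a MozDef alert would need.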
Guys, I was working on bug 1227876 to collect data in the event of a max_clients alert. Is this a useful idea? If so, what commands would you like the script to run if the alert trips?
We probably want to see what requests are outstanding and the IPs they are associated with. However, httpd's server-status will report the ZLB IP. We do record the proper IP in /var/log/hg.log. But we currently have no way to parse that log and look for active requests. I could probably whip up a program that tailed the log and printed the list of active requests every few seconds or something.
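A rough Python sketch of what such a program might look like follows. The layout of /var/log/hg.log is not shown in this bug, so the line format used here ("<request id> BEGIN_REQUEST <repo> <ip> ..." and a matching "<request id> END_REQUEST ...") is purely an assumption for illustration; only the BEGIN_REQUEST event name comes from the comments above. The helper names are hypothetical.

```python
import os
import time
from collections import Counter


def follow(path, poll_seconds=1.0):
    """Yield lines appended to a file, similar to `tail -f`."""
    with open(path) as handle:
        handle.seek(0, os.SEEK_END)
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def update_active(active, line):
    """Track in-flight requests in `active` (request id -> (repo, ip)).

    Assumed (hypothetical) line layout:
        <request id> BEGIN_REQUEST <repo> <ip> ...
        <request id> END_REQUEST ...
    """
    parts = line.split()
    if len(parts) < 2:
        return
    request_id, event = parts[0], parts[1]
    if event == "BEGIN_REQUEST" and len(parts) >= 4:
        active[request_id] = (parts[2], parts[3])
    elif event == "END_REQUEST":
        active.pop(request_id, None)


def active_by_ip(active):
    """Count currently outstanding requests per client IP."""
    return Counter(ip for _repo, ip in active.values())
```

Feeding each line from follow("/var/log/hg.log") into update_active and printing active_by_ip(active) every few seconds would give the list of outstanding requests with their real client IPs. The parsing would need to be adjusted to the real log layout.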
(In reply to Gene Wood [:gene] from comment #1)
> :jeff brings up the point that this might best be done in NSM (Bro) instead
> of MozDef. :michal` what do you think?

Sure, we can do that. I noticed Bro can see the real IP of the client (along with some other HTTP-layer data).

How urgent is this? We have a pending MozDef ES cluster expansion that needs to happen first.
:michal, regarding urgency, we currently don't have a defense against denial of service attacks on hg. They have occurred in the recent past and affected hg availability. I'm unsure off-hand of the impact of an hg outage due to a DOS. To me it seems like something we should address sooner rather than later. Jeff, any thoughts on the priority of getting DOS protection on hg.mozilla.org in relation to the timeline for the MozDef ES cluster expansion?
Let's prioritize this ahead of the ES cluster expansion, given that there are a bunch of code changes necessary in MozDef to work its way up to ES v2+.
Assigning to Michal as he can likely turn this out faster than I can.
hg.mozilla.org traffic now flows to Bro. A log sample will follow tomorrow.
And the first catch is that nagios logs in to hg as ADMIN :(
* Over plaintext - HTTP
Hm, it's not nagios, it's some python script

1453767013.695166 C7STG41oBH4jMaXtge 10.22.74.208 20861 10.22.74.132 80 1 GET reviewboard-hg.mozilla.org /gecko/raw-file/b2efea6d9316441d33a52e4fac73c9d585e078e1/dom/base/nsGlobalWindow.h - Python-urllib/2.6 0 76457 200 Script output follows - - - AuthBruteforcing::HTTP_AUTH_SUCCESS admin - - - - F5oOR54I8DKaumNUJb text/plain 22.214.171.124 reviewboard-hg1.dmz.scl3.mozilla.com

The source IP is 126.96.36.199 - a generic NAT IP in the SCL3.
HTTP logs for Mercurial are collected by Bro, and the following detection scripts are monitoring them:

- HTTP Errors - counts requests that cause an HTTP reply code >=400 or >=500 and sends us a notification if there are more than 1000 errors over 15 minutes, per IP (each IP has a separate threshold)
- AuthBruteforcing - sends us an alert on more than 10 (I think) failed authentication attempts

The first script has an excellent DoS detection history. The second would work for reviewboard, I guess.

I cannot parse HG protocol logs, if there is something like that involved. But we log all TCP and UDP sessions for HG.
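To make the per-IP thresholding concrete, here is a small Python sketch of the idea behind the HTTP Errors check: count replies with status >=400 per client IP over a sliding 15-minute window and flag an IP once it exceeds 1000 errors. This is not the Bro script itself, only an illustration; the class name ErrorRateDetector is hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class ErrorRateDetector:
    """Flag IPs with too many HTTP error replies in a sliding time window."""

    def __init__(self, threshold=1000, window_seconds=15 * 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Per-IP queue of timestamps of recent error replies.
        self._errors = defaultdict(deque)

    def observe(self, timestamp, ip, status):
        """Record one reply; return True if `ip` is now over the threshold."""
        if status < 400:
            return False
        events = self._errors[ip]
        events.append(timestamp)
        # Drop errors that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) > self.threshold
```

Each IP gets its own queue, so one noisy client does not push other clients over the threshold, which matches the "per IP" behaviour described above.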