Enable hg.mozilla.org to emit mercurial logs to MozDef



Enterprise Information Security
2 years ago
2 years ago


(Reporter: gene, Assigned: michal)



(Whiteboard: [nsm])



2 years ago
In order to prevent future DOS outages like Bug 1228806, work with :gps to enable the hg.mozilla.org web servers to emit their mercurial logs into MozDef and then establish MozDef filters/alerts to banhammer IPs.


2 years ago
Assignee: nobody → asmith

Comment 1

2 years ago
:jeff brings up the point that this might best be done in NSM (Bro) instead of MozDef. :michal, what do you think?
Flags: needinfo?(mpurzynski)

Comment 2

2 years ago
Thanks for filing the bug, Gene!

In addition to the ZLB and httpd logs, we have a Mercurial extension that captures application-level details and writes to /var/log/hg.log (via syslog). This log is very low level. We have a script and crons running that merge related events to produce a slightly higher-level log (as well as aggregate daily values). Those are written to /var/log/hg/parsed.YYYY-MM-DD. That file looks like:

  2015-11-29T23:59:54 build/tools getbundle 130 0.01 0.01


Each line contains these fields, in order:

* Time of request
* Repository
* Source IP
* Mercurial action
* HTTP response body size in bytes
* Wall time of request in seconds
* CPU time of request in seconds
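
For anyone wiring this into MozDef, here is a minimal parsing sketch in Python. It assumes the seven fields listed above are whitespace-separated and appear in that order; the ParsedRequest type and parse_line function are hypothetical names, not part of the existing scripts. The sample line shown above carries only six values (no source IP appears in it), so this sketch rejects it rather than guessing which field is missing. The address 192.0.2.1 in the usage note is a documentation placeholder.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ParsedRequest:
    """One line of /var/log/hg/parsed.YYYY-MM-DD (field order assumed)."""
    time: datetime
    repository: str
    source_ip: str
    action: str
    response_bytes: int
    wall_seconds: float
    cpu_seconds: float


def parse_line(line: str) -> ParsedRequest:
    """Parse a line, assuming seven whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 7:
        # Refuse to guess when a field is missing or extra.
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    when, repo, ip, action, size, wall, cpu = fields
    return ParsedRequest(
        time=datetime.strptime(when, "%Y-%m-%dT%H:%M:%S"),
        repository=repo,
        source_ip=ip,
        action=action,
        response_bytes=int(size),
        wall_seconds=float(wall),
        cpu_seconds=float(cpu),
    )
```

Usage would look like parse_line("2015-11-29T23:59:54 build/tools 192.0.2.1 getbundle 130 0.01 0.01"), which yields a record whose source_ip can then be counted or forwarded.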

The high-level log is probably most useful for MozDef. However, for simple IP abuse, we could probably look at just the "BEGIN_REQUEST" entries in the lower-level /var/log/hg.log / syslog stream.

Comment 3

2 years ago

I was working on bug 1227876 to collect data in the event of a max_clients alert. Is this a useful idea? If so, what commands would you like the script to run if the alert trips?

Comment 4

2 years ago
We probably want to see what requests are outstanding and the IPs they are associated with. However, httpd's server-status will report the ZLB IP. We do record the proper IP in /var/log/hg.log. But we currently have no way to parse that log and look for active requests. I could probably whip up a program that tailed the log and printed the list of active requests every few seconds or something.
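
A rough sketch of the bookkeeping such a program would need, in Python. The format of /var/log/hg.log is not shown in this bug, so the sketch works on already-parsed events of the hypothetical shape (request_id, kind, source_ip). "BEGIN_REQUEST" is the entry name mentioned earlier; the "END_REQUEST" marker and the request_id field are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (request_id, kind, source_ip).
Event = Tuple[str, str, str]


def active_requests(events: Iterable[Event]) -> Dict[str, str]:
    """Return {request_id: source_ip} for requests that began but did not end."""
    active: Dict[str, str] = {}
    for request_id, kind, ip in events:
        if kind == "BEGIN_REQUEST":
            active[request_id] = ip
        elif kind == "END_REQUEST":  # assumed end-of-request marker
            active.pop(request_id, None)
    return active


def count_by_ip(active: Dict[str, str]) -> List[Tuple[str, int]]:
    """Count outstanding requests per source IP, busiest first."""
    return Counter(active.values()).most_common()
```

A tailing loop could feed new events into active_requests state every few seconds and print count_by_ip, which would show the client IPs holding the most outstanding requests instead of the ZLB IP.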

(In reply to Gene Wood [:gene] from comment #1)
> :jeff brings up the point that this might best be done in NSM (Bro) instead
> of MozDef. :michal, what do you think?

Sure, we can do that. I noticed Bro can see the real IP of the client (along with some other HTTP-layer data). How urgent is this? We have a pending MozDef ES cluster expansion that needs to happen first.
Flags: needinfo?(mpurzynski)

Comment 6

2 years ago
:michal, regarding urgency, we currently don't have a defense against denial of service attacks on hg. They have occurred in the recent past and affected hg availability. I'm unsure off-hand of the impact of an hg outage due to a DOS.

To me it seems like something we should address sooner rather than later.

Jeff, any thoughts on the priority of getting DOS protection on hg.mozilla.org in relation to the timeline for the MozDef ES cluster expansion?
Flags: needinfo?(jbryner)
Let's prioritize this ahead of the ES cluster expansion, given that there are a bunch of code changes necessary in MozDef to work its way up to ES v2+.
Flags: needinfo?(jbryner)
Assigning to Michal as he can likely turn this out faster than I can.
Assignee: asmith → mpurzynski
hg.mozilla.org traffic now flows to Bro. A log sample will follow tomorrow.
And the first catch is that nagios logs in to hg as ADMIN :(
* Over plaintext - HTTP
Whiteboard: [nsm]
Hm, it's not Nagios, it's some Python script

1453767013.695166       C7STG41oBH4jMaXtge    20861    80      1       GET     reviewboard-hg.mozilla.org      /gecko/raw-file/b2efea6d9316441d33a52e4fac73c9d585e078e1/dom/base/nsGlobalWindow.h      -       Python-urllib/2.6       0       76457   200     Script output follows   -       -       -       AuthBruteforcing::HTTP_AUTH_SUCCESS     admin   -       -       -       -       F5oOR54I8DKaumNUJb      text/plain  reviewboard-hg1.dmz.scl3.mozilla.com

The source IP is - a generic NAT IP in the SCL3.
HTTP logs for Mercurial are collected by Bro, and the following detection scripts monitor them:

- HTTP Errors - counts requests that cause an HTTP reply code >=400 or >=500 and sends us a notification if there are more than 1000 errors over 15 minutes, per IP (each IP has a separate threshold)
- AuthBruteforcing - sends us an alert on more than 10 (I think) failed authentication attempts
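
To illustrate the per-IP threshold idea behind the HTTP Errors script, here is a small Python sketch. It is not the Bro script itself: it uses a sliding window over timestamps, which may differ from how Bro measures the interval, and the ErrorRateDetector class and its observe method are hypothetical names. The defaults mirror the numbers above (more than 1000 errors in 15 minutes), and timestamps are assumed to arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ErrorRateDetector:
    """Flag an IP once it exceeds `threshold` HTTP errors within the window."""

    def __init__(self, threshold: int = 1000, window_seconds: float = 15 * 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # One queue of error timestamps per source IP.
        self._errors: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, ip: str, status: int, timestamp: float) -> bool:
        """Record one response; return True if this IP is over the threshold."""
        if status < 400:
            return False
        queue = self._errors[ip]
        queue.append(timestamp)
        # Drop errors that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while queue and queue[0] <= cutoff:
            queue.popleft()
        return len(queue) > self.threshold
```

Because each IP has its own queue, one noisy client cannot trip the alert for another, which matches the "each IP has a separate threshold" behaviour described above.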

The first script has an excellent DoS detection history. The second would work for a reviewboard, I guess.

I cannot parse HG protocol logs, if there is something like that involved. But we log all TCP and UDP sessions for HG.
Last Resolved: 2 years ago
Resolution: --- → FIXED