Closed Bug 880358 Opened 11 years ago Closed 10 years ago

balrog should notify when a release is added (or modified?) by a human

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [balrog])

Attachments

(2 files, 3 obsolete files)

At the Balrog security we talked about the idea of having notifications for certain types of changes that happen. For most, this doesn't make sense because of the rapid rate of change. But some events are rare enough that this could make sense. The one we really thought of here was addition of a release by a human (eg, not ffxbld). We could think of no regular case where this would be done, so notifications would only happen in the rare case that we need to add one by hand, or if someone were trying to add a malicious release.

The mechanism that we'd notify by isn't determined yet. CEF logging, maybe? Simple e-mail notification? TBD

I thought about this a bit more and there's a few other events we may want to notify for too:
* Human changes a release
* ACL changes
* Rule changes on a non-test channel (eg, "release", "beta", "nightly", etc.)
Nick and I just talked about this and came up with a few possible ideas:
* Cronjob that runs at least every 5min that looks for changes to the history tables, and alerts in some way if some undesired change is made
* Mysql triggers that run on non-history table changes and do similar to the above
* Some sort of audit method on the History class that checks for bad things, and alerts if some way if they are found.
(In reply to Ben Hearsum [:bhearsum] from comment #1)
> Nick and I just talked about this and came up with a few possible ideas:
> * Cronjob that runs at least every 5min that looks for changes to the
> history tables, and alerts in some way if some undesired change is made
> * Mysql triggers that run on non-history table changes and do similar to the
> above

Did a bit of research on this. MySQL doesn't support sending mail directly through triggers (http://forums.mysql.com/read.php?99,33635,33651#msg-33651), and custom triggers require writing C code against an interface that hopefully won't change. Looks like triggers aren't a great solution here.
Initial patch for this is https://github.com/bhearsum/balrog/compare/master...change-notification

That patch only logs some messages to the application log, though. I need to talk to app? sec to see if using CEF logging would be better here, and whether or not whatever systems they have that process CEF events could e-mail whenever such an event comes up.
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> Initial patch for this is
> https://github.com/bhearsum/balrog/compare/master...change-notification
> 
> That patch only logs some messages to the application log, though. I need to
> talk to app? sec to see if using CEF logging would be better here, and
> whether or not whatever systems they have that process CEF events could
> e-mail whenever such an event comes up.

I pinged security about this, looks like it's something in opsec's domain. Still waiting to hear back.
Attached patch wip (obsolete) — Splinter Review
Still waiting to hear back from opsec about whether or not their cef logging systems can do notification to us. If they can't, will need some additional and a bit of modified.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Attachment #765772 - Flags: feedback?(nthomas)
Comment on attachment 765772 [details] [diff] [review]
wip

>+       @param onUpdate: Like onInsert, but the callable must support an
>+                        addition parameter:

Nit, addition -> additional.

>+def getHumanModificationMonitors(systemAccounts):
>+    # Long lines from "what" get truncated to avoid printing out massive
>+    # release blobs ty the logs.

Nit, s/ty/in/

>+    def onInsert(table, who, what):
>+        if who not in systemAccounts:
>+            truncated = str(what)[:100]

Would this condition be relaxed for CEF ? Seems like it would be useful information to have all of, somewhere outside of the db.

>+log_format = "%(asctime)s - %(levelname)s - PID: %(process)s - Request: %(requestid)s - %(name)s.%(funcName)s#%(lineno)s: %(message)s"

Is there any more info we can put in this ? I'm finding it too easy to parse the log lines :-) 

>diff --git a/requirements/dev.txt b/requirements/dev.txt
>+requests==0.10.8
>diff --git a/auslib/admin/__init__.py b/vendor/lib/python/mozilla_buildtools/__init__.py
>similarity index 100%
>copy from auslib/admin/__init__.py
>copy to vendor/lib/python/mozilla_buildtools/__init__.py
>diff --git a/vendor/lib/python/mozilla_buildtools/retry.py b/vendor/lib/python/mozilla_buildtools/retry.py
>new file mode 100644

Is this from another patch or leftover from earlier work ? Can't see where this is getting used.
Attachment #765772 - Flags: feedback?(nthomas) → feedback+
(In reply to Nick Thomas [:nthomas] from comment #6)
> >+    def onInsert(table, who, what):
> >+        if who not in systemAccounts:
> >+            truncated = str(what)[:100]
> 
> Would this condition be relaxed for CEF ? Seems like it would be useful
> information to have all of, somewhere outside of the db.

I hadn't thought of that, but we could certainly send the full data to the CEF log.

> >+log_format = "%(asctime)s - %(levelname)s - PID: %(process)s - Request: %(requestid)s - %(name)s.%(funcName)s#%(lineno)s: %(message)s"
> 
> Is there any more info we can put in this ? I'm finding it too easy to parse
> the log lines :-) 

Haha...I'm happy to change this if you'd like. I've erred far on the side of verbosity. Might be able to take out %(asctime)s because we have %(process)s and %(requestid)s. Maybe %(levelname)s is pointless, too. And I could see an argument for taking %(name)s and %(lineno)s....

> >diff --git a/requirements/dev.txt b/requirements/dev.txt
> >+requests==0.10.8
> >diff --git a/auslib/admin/__init__.py b/vendor/lib/python/mozilla_buildtools/__init__.py
> >similarity index 100%
> >copy from auslib/admin/__init__.py
> >copy to vendor/lib/python/mozilla_buildtools/__init__.py
> >diff --git a/vendor/lib/python/mozilla_buildtools/retry.py b/vendor/lib/python/mozilla_buildtools/retry.py
> >new file mode 100644
> 
> Is this from another patch or leftover from earlier work ? Can't see where
> this is getting used.

Ah, this is me re-adding retry.py because the snippet comparison script requires it. Unrelated to this patch.
Updated patch that uses CEF for notification + logs more things.

Simon, I took the approach that CEF_WARN was a good enough level for normal 400 errors. Anything that is obviously bad (eg, user trying to do something they're not authorized, forbidden domain found) are CEF_ALERT. I'm asking for feedback from you mainly to make sure that that is good enough.

Nick, the bulk of this patch is just adding cef_event calls wherever we're returning 400, plus for the human modification callbacks. The only tricky thing here is that whatever is setting up an application (wsgi files, standalone servers, tests) need to set auslib.app correctly, because cef_event needs app.config. I couldn't find another way of doing this that didn't require separate cef_event methods for each application.
Attachment #765772 - Attachment is obsolete: true
Attachment #768357 - Flags: review?(nthomas)
Attachment #768357 - Flags: feedback?(sbennetts)
The CEF logs look good, see also [1] for future ref.
CEF logs go into ArcSight automatically. Opsec will have to write custom rules for it. Is there a bug for the OpSec part of the sec review ?


[1] https://mana.mozilla.org/wiki/display/SECURITY/CEF+Guidelines+for+Application+Development+at+Mozilla
If ulfr is happy with the log levels, then so am I :)
(In reply to Julien Vehent [:ulfr] from comment #9)
> The CEF logs look good, see also [1] for future ref.
> CEF logs go into ArcSight automatically. Opsec will have to write custom
> rules for it. Is there a bug for the OpSec part of the sec review ?

I've been in contact with them, but don't have anything on file yet.
Comment on attachment 768357 [details] [diff] [review]
change notification w/ cef integration

Sample message is like:
CEF:0.5|Mozilla|Balrog|N/A|Unauthorized user foo tried to DELETE to /users/:id/permissions/:permission|Unauthorized user foo tried to DELETE to /users/:id/permissions/:permission|8|cs1Label=requestClientApplication cs1=Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20130626 Firefox/25.0 requestMethod=DELETE request=/users/bob/permissions/admin src=127.0.0.1 dhost=127.0.0.1:9000 suser=foo

I talked to kang on IRC and he said that the section before the severity looked really long. I'm going to try to shorten it up.
Attachment #768357 - Attachment is obsolete: true
Attachment #768357 - Flags: review?(nthomas)
Attachment #768357 - Flags: feedback?(sbennetts)
Attachment #769066 - Flags: review?(nthomas)
Comment on attachment 769066 [details] [diff] [review]
cleaned up cef messages

Review of attachment 769066 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for the delay in looking at this, and then I found a few things to query.

::: admin.ini-dist
@@ +13,5 @@
>  ; encryption key for session cookies. should be a few hundred bits
>  secret_key=
>  
>  [site-specific]
> +system_accounts=ffxbld

Lets leave this blank in repo.

::: auslib/AUS.py
@@ +48,5 @@
>      def containsForbiddenDomain(self, updateData):
>          for patch in updateData['patches']:
>              domain = urlparse(patch['URL'])[1]
>              if domain not in self.db.domainWhitelist:
> +                cef_event('Forbidden domain', CEF_ALERT, domain=domain, updateData=updateData)

Supposing a forbidden domain did get into the db, would we hammer syslog & downstream with a message for every request that mapped to that ? I'm wondering what the desired outcome is for the user facing side, because a lot of events might result in slowness/brokenness in several places.

::: auslib/admin/views/rules.py
@@ +61,5 @@
>          releaseNames = db.releases.getReleaseNames()
>          form.mapping.choices = [(item['name'],item['name']) for item in releaseNames]
>          form.mapping.choices.insert(0, ('', 'NULL' ) )
>          if not form.validate():
> +            cef_event("Bad input", CEF_WARN, errors=form.errors)

Is this sort of error something security wants to see ? It looks like we are notifying on bad API input. There are some other cases like this too.

::: auslib/db.py
@@ +1062,5 @@
>  
> +
> +def getHumanModificationMonitors(systemAccounts):
> +    # Long lines from "what" get truncated to avoid printing out massive
> +    # release blobs ty the logs.

Typo nit - 'ty'

::: auslib/log.py
@@ +3,5 @@
>  from flask import request
>  
> +import cef
> +
> +import auslib

catlee and I talked about other ways of dealing with the needing a reference to app.config, and he raised the idea of doing
 from auslib.web.base import app
here. The only things doing a blanket 'import auslib.log' prior to this patch are the four scripts that run balrog and admin apps (wsgi and not) so they already have app floating around. Hopefully the new things that import log will be OK or fixable too. What do you think ?
Attachment #769066 - Flags: review?(nthomas) → review-
(In reply to Nick Thomas [:nthomas] from comment #14)
> ::: auslib/AUS.py
> @@ +48,5 @@
> >      def containsForbiddenDomain(self, updateData):
> >          for patch in updateData['patches']:
> >              domain = urlparse(patch['URL'])[1]
> >              if domain not in self.db.domainWhitelist:
> > +                cef_event('Forbidden domain', CEF_ALERT, domain=domain, updateData=updateData)
> 
> Supposing a forbidden domain did get into the db, would we hammer syslog &
> downstream with a message for every request that mapped to that ? I'm
> wondering what the desired outcome is for the user facing side, because a
> lot of events might result in slowness/brokenness in several places.

Hrm. That's a really good point. I'm not sure what to do here. I feel like it _is_ useful to notify on this. Eg, we accidentally or on purpose remove a whitelisted domain, and then start serving a bunch of null requests as a result. Perhaps that would be noticed by QA automation or our own automation though...

I'm definitely worried about spamming the CEF log now that you point this out.

> ::: auslib/admin/views/rules.py
> @@ +61,5 @@
> >          releaseNames = db.releases.getReleaseNames()
> >          form.mapping.choices = [(item['name'],item['name']) for item in releaseNames]
> >          form.mapping.choices.insert(0, ('', 'NULL' ) )
> >          if not form.validate():
> > +            cef_event("Bad input", CEF_WARN, errors=form.errors)
> 
> Is this sort of error something security wants to see ? It looks like we are
> notifying on bad API input. There are some other cases like this too.

Yup, they do: https://mana.mozilla.org/wiki/display/SECURITY/CEF+Guidelines+for+Application+Development+at+Mozilla#CEFGuidelinesforApplicationDevelopmentatMozilla-WhenToLog?


> ::: auslib/log.py
> @@ +3,5 @@
> >  from flask import request
> >  
> > +import cef
> > +
> > +import auslib
> 
> catlee and I talked about other ways of dealing with the needing a reference
> to app.config, and he raised the idea of doing
>  from auslib.web.base import app
> here. 

The reason that doesn't work is because this code is used by both applications, and doesn't know which one is calling it. So if we import auslib.web.base.app, we don't get the right thing in the admin application. And it gets worse...because db.py calls cef_event, and also doesn't know which application is running, you can't even have wrappers in auslib.admin/auslib.web. I'm open to suggestions here, I ran out of ideas though!
Hmm, good point. The other idea we had was stashing the cef config on the log module during the 4 app initialisations.
Attached patch improve cef_config location (obsolete) — Splinter Review
(In reply to Nick Thomas [:nthomas] from comment #16)
> Hmm, good point. The other idea we had was stashing the cef config on the
> log module during the 4 app initialisations.

Yeah, that's much better...here's a patch with that! I also added a Makefile target to run pyflakes...because I'm lazy.

Note to self: If this patch passes review, need to get the the server config updated before pushing. Might want to wait for the new nodes to avoid distracting IT with updating the old ones.
Attachment #774813 - Flags: review?(nthomas)
Comment on attachment 774813 [details] [diff] [review]
improve cef_config location

>diff --git a/AUS-server.py b/AUS-server.py
> 
>+    import auslib
>     from auslib.web.base import app, AUS
> 
>+    auslib.log.cef_config = auslib.log.get_cef_config(options.cefLog)
>     AUS.setDb(options.db)
>     AUS.db.setDomainWhitelist(options.whitelistedDomains)
>     try:
>         AUS.db.create()
>     except DatabaseAlreadyControlledError:
>         pass
> 
>     app.config['SECRET_KEY'] = 'abc123'
>     app.config['DEBUG'] = True
>+    auslib.app = app

Hmm, I was hoping this last line would go away in a bunch of places, right patch ?
(In reply to Nick Thomas [:nthomas] from comment #18)
> Comment on attachment 774813 [details] [diff] [review]
> improve cef_config location
> 
> >diff --git a/AUS-server.py b/AUS-server.py
> > 
> >+    import auslib
> >     from auslib.web.base import app, AUS
> > 
> >+    auslib.log.cef_config = auslib.log.get_cef_config(options.cefLog)
> >     AUS.setDb(options.db)
> >     AUS.db.setDomainWhitelist(options.whitelistedDomains)
> >     try:
> >         AUS.db.create()
> >     except DatabaseAlreadyControlledError:
> >         pass
> > 
> >     app.config['SECRET_KEY'] = 'abc123'
> >     app.config['DEBUG'] = True
> >+    auslib.app = app
> 
> Hmm, I was hoping this last line would go away in a bunch of places, right
> patch ?

Whoops, I forgot to remove this. It's unused now. I'll repost.
Also hushed the last pyflakes warning, so that target returns 0.
Attachment #774813 - Attachment is obsolete: true
Attachment #774813 - Flags: review?(nthomas)
Attachment #777950 - Flags: review?(nthomas)
Comment on attachment 777950 [details] [diff] [review]
remove unneeeded references

Review of attachment 777950 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with some changes to further tidy up. If landing will be delayed for the new nodes feel free to update the patch and carry the review over.

Did OpSec signoff on a CEF message for every update request with a bad domain ? I can't think of any alternatives that would give a timely report of badness with our existing systems.

::: AUS-server.py
@@ +33,2 @@
>  
> +    import auslib

I was a bit specific on the last review, and this can go too I think. See also admin.py and admin.wsgi.

::: admin.ini-dist
@@ +6,5 @@
>  [logging]
>  ;Where to put the application log. No rotation is done on this file.
>  logfile=/var/log/aus.log
> +;Where to log CEF messages. Normally set to syslog.
> +cef_config=syslog

Looks like this can be a path to a file instead (eg development) but will cause an exception on the first cef_event() if undefined. Please add a comment about that here and in balrog.ini-dist.

::: auslib/test/admin/views/base.py
@@ +4,5 @@
>  import unittest
>  
>  from flask import Response
>  
> +import auslib

Shouldn't be necessary any more.

@@ +24,5 @@
>          app.config['SECRET_KEY'] = 'abc123'
>          app.config['DEBUG'] = True
>          app.config['CSRF_ENABLED'] = False
> +        auslib.log.cef_config = auslib.log.get_cef_config(self.cef_file)
> +        auslib.app = app

Likewise the assignment of auslib.app here.

::: auslib/test/web/test_client.py
@@ +7,5 @@
>  
>  class ClientTest(unittest.TestCase):
>      def setUp(self):
>          app.config['DEBUG'] = True
> +        auslib.app = app

Remove this and the earlier 'import auslib' here too.

::: balrog.wsgi
@@ +9,5 @@
>  
>  from raven.contrib.flask import Sentry
>  
>  from auslib.config import ClientConfig
> +import auslib

This should be auslib.log, right ?

@@ +27,2 @@
>  
> +import auslib

And remove this one ?
Attachment #777950 - Flags: review?(nthomas) → review+
Please use "needinfo?" if you want to get our feedback in the bug, its the most efficient way/makes sure its being seen if we're not assigned to the bug :) (we're all copied on a zillion bugs..)


Can you send some test messages through syslog? I'll check that we receive them and they're parsing correctly (or if you have logger access / you can request logger access if you want, check for deviceProduct="Balrog")

Additionally:
- we can send out alerts based on events in various ways, although i would suggest having your own alerts if that's for reliability. we do (and want to) handle security related ones.
- arcsight can cope with quite a bit of log spam, how many messages per second are we talking about?
- by default we log all the things sent via cef to our central loggers. for alerting, we need to know exactly what messages we can alert on, and sometimes a little bit of help with the logic to make sure we're alerting on everything that makes sense. if each message is an alert, that might be fine as well, we just need to know so that i can tell arcsight to alert, basically
Flags: needinfo?(bhearsum)
(In reply to Guillaume Destuynder [:kang] from comment #22)
> Please use "needinfo?" if you want to get our feedback in the bug, its the
> most efficient way/makes sure its being seen if we're not assigned to the
> bug :) (we're all copied on a zillion bugs..)
> 
> 
> Can you send some test messages through syslog? I'll check that we receive
> them and they're parsing correctly (or if you have logger access / you can
> request logger access if you want, check for deviceProduct="Balrog")

If you know how to send them by hand, I can send some test ones. Otherwise, I won't be able to until this patch lands.

> - arcsight can cope with quite a bit of log spam, how many messages per
> second are we talking about?

Most of the time, they'll be nothing. If we have an incident where a forbidden domain is in an active blob, we'll get one message per request that ends up at that blob. Could be 100s or 1000s per minute.

> - by default we log all the things sent via cef to our central loggers. for
> alerting, we need to know exactly what messages we can alert on, and
> sometimes a little bit of help with the logic to make sure we're alerting on
> everything that makes sense. if each message is an alert, that might be fine
> as well, we just need to know so that i can tell arcsight to alert, basically

Going to deal with the alerting part after this lands, this is very helpful though, thanks!
Flags: needinfo?(bhearsum)
(In reply to Nick Thomas [:nthomas] from comment #21)
> ::: admin.ini-dist
> @@ +6,5 @@
> >  [logging]
> >  ;Where to put the application log. No rotation is done on this file.
> >  logfile=/var/log/aus.log
> > +;Where to log CEF messages. Normally set to syslog.
> > +cef_config=syslog
> 
> Looks like this can be a path to a file instead (eg development) but will
> cause an exception on the first cef_event() if undefined. Please add a
> comment about that here and in balrog.ini-dist.

How about setting "syslog" or "cef.log" as the default? Might be a little more user friendly.
@comment 23
For sending test messages, you can copy the output and send it from any system (dev/stg included) with a command such as:
$ logger -p local4.info 'insert cef msg here'

For the threshold: if we have rare spikes of 1000 msgs per minute thats no problem (even non-rare). You can always add the correlation logic and send a "1000 attempts for X", which saves some work/resources/bandwidth but its not required.
Trying to figure out how to get this landed. It depends on config changes, but I think we're close to moving to new dev nodes...so maybe we'll wait for that. I'll update this bug once I figure out what we're doing.
Product: mozilla.org → Release Engineering
mass component change
Component: General Automation → Balrog: Backend
Blocks: balrog-beta
Depends on: 962706
Comment on attachment 777950 [details] [diff] [review]
remove unneeeded references

I pushed this, but I'm going to wait until at least bug 933161 is ready before asking IT to put it in production, since it's not urgent.
Attachment #777950 - Flags: checked-in+
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/ea743c7b9a83658f9bf4172239dd67f11d7a7b64
bug 880358: balrog should notify when a release is added by a human. r=nthomas
Depends on: 964396
Depends on: 964426
This is landed (finally) and seems to be working. Kang did notice bug 965915 though, which we should fix soon.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Release Engineering → Release Engineering Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: