Add filters and notifiers to heka for statsd data



5 years ago
4 years ago


(Reporter: jason, Assigned: oremj)





(2 attachments)



5 years ago
Create Lua filters that watch the statsd data as it flows through Heka and fire notifiers when their trigger conditions are met.


5 years ago
Assignee: server-ops-amo → oremj

Comment 1

5 years ago
What data do we want to filter/notify on?
Andy: Have specs for the first few stats you'd like to notify on?
Flags: needinfo?(amckay)

Comment 3

5 years ago
I like the standard deviation script that you showed in the secret channel for these values:

Flags: needinfo?(amckay)
Great, I'll start with these. Will try to have something to experiment with by the end of this week.
Aaaaannnd I'm not going to have anything ready by the end of this week, I've been busy w/ other things. I've finally been able to start on it today, though, and will definitely have some stuff to try out next week.
I've made a ton of progress on this and am at a point where I'm ready to start testing my code on the running dev/stage Heka instance. To simplify testing and iteration, I've had the Heka config updated to allow me to dynamically add and remove sandbox filters w/o a restart or any further config changes. My first attempt at uploading my script failed, though, b/c I was depending on recent changes to the Lua environment, so Heka itself needs to be updated first. I've built an RPM from a recent Heka nightly and have given it to Jason for deployment, attached here for posterity.
Okay, well, the RPM was slightly too large to upload, so here's a link:
We have lift-off:


Apologies for the still buggy dashboard UI, but this page might need a refresh before you see everything. Ultimately, however, there should be a list of output links on the right side of the page. Among these should be 4 of type "CBUF", which will take you to graphs for the response codes for each listed host.

For now there are no notifiers going out. Instead, when an anomaly is detected, an annotation is placed on the graph. This is to avoid a flood of false positives, since we have a lot of work to get the anomaly detection dialed in to not fire except when we really want it to. This is especially true when we're on dev and stage, since the traffic is low and intermittent.
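
To make the "annotate instead of notify" approach concrete, here is a minimal sketch of a rolling standard deviation check. It is written in Python for illustration only, not the Lua that actually runs in Heka, and the class name, window size, and threshold are all assumptions rather than values taken from the deployed filter.

```python
from collections import deque
from statistics import mean, pstdev


class StdDevDetector:
    """Hypothetical rolling-window detector; not the deployed Heka filter."""

    def __init__(self, window=60, threshold=3.0):
        # Most recent samples; old ones fall off automatically.
        self.window = deque(maxlen=window)
        # How many standard deviations away counts as an anomaly (assumed).
        self.threshold = threshold
        # Annotations recorded for the graph instead of notifications sent.
        self.annotations = []

    def observe(self, timestamp, value):
        """Record a sample; return True if it was flagged as anomalous."""
        anomalous = False
        # Only evaluate once the window is full, so early samples never fire.
        if len(self.window) == self.window.maxlen:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            # A flat window (sigma == 0) gives no basis for comparison.
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
                self.annotations.append(
                    (timestamp, f"value {value} deviates from mean {mu:.1f}")
                )
        self.window.append(value)
        return anomalous
```

Collecting annotations in a list rather than calling out to a notifier keeps false positives visible on the graph without paging anyone while the thresholds are still being tuned.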

I need to switch gears for a bit to handle some other Heka stuff, but I'll come back to this in a week or so to help push this along further. The code that's generating those graphs was dynamically loaded into Heka, so for now I'm going to attach that code and the related config to this bug so we have it for posterity.
Created attachment 8372533 [details]
Lua source to the addons/marketplace http_monitor filter
Created attachment 8372534 [details]
TOML config for addons/marketplace http_monitor filter


5 years ago
Priority: -- → P4


4 years ago
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: → Mozilla Services
Version: other → unspecified
It's close. The filter you linked to is set up to process the output of Heka's HTTP server log parsing (nginx, apache, haproxy) rather than the same info coming in via statsd.

I actually did deploy a filter to graph and analyse the statsd data. The anomaly detection stuff is only doing graph annotations, not actually sending notifiers, b/c at the time our anomaly detection code was a bit more rough and had too many false positives when the data was sparse. We've since added some algorithms that will handle this better.
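
As an illustration of the sparse-data problem, one simple guard is to skip anomaly evaluation entirely when a window holds too few non-zero samples. This is only a sketch of the idea in Python; it is not one of the algorithms that were added to Heka, and the function name and the `min_nonzero` default are assumptions.

```python
def enough_data(samples, min_nonzero=5):
    """Return True if the window has enough non-zero samples to judge.

    Hypothetical guard: on low-traffic hosts (dev/stage) most intervals
    are zero, so a single request can look like a huge deviation.
    """
    nonzero = sum(1 for s in samples if s > 0)
    return nonzero >= min_nonzero
```

A detector would call this before computing any statistics and simply return "no anomaly" when it is False.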

Also, I've recently added a filter that can handle statsd data in the general case:

which is a better choice than the one-off that I originally wrote for marketplace. I just realized I need to add support for anomaly detection there, though. I'll work on that today or tomorrow and will update this ticket when it's ready.
Flags: needinfo?(rmiller)
Okay, as of the stat_graph filter (for general purpose graphing and monitoring of specific statsd data points) supports anomaly detection and has been merged to dev. So the custom filter code that I deployed can be replaced, either by http_status.lua (if you want to go the 'parsing the log files' route) or by stat_graph.lua (if you want to continue to use statsd as the source of the monitored data).
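
For anyone weighing the two routes: with statsd as the source, each data point arrives as a plain-text line of the form `name:value|type`, optionally followed by `|@rate`. The Python sketch below parses one such line. It is illustrative only, does not reflect how stat_graph.lua reads its input, and the stat name in the usage example is hypothetical.

```python
def parse_statsd_line(line):
    """Split a statsd line into (name, value, metric_type, sample_rate)."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    value = float(fields[0])
    metric_type = fields[1]  # e.g. "c" counter, "ms" timer, "g" gauge
    sample_rate = 1.0
    # An optional "@0.1"-style field carries the client-side sample rate.
    for extra in fields[2:]:
        if extra.startswith("@"):
            sample_rate = float(extra[1:])
    return name, value, metric_type, sample_rate


# Hypothetical stat name, for illustration only.
print(parse_statsd_line("addons.response.500:1|c|@0.1"))
```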


4 years ago
Last Resolved: 4 years ago
Resolution: --- → INCOMPLETE