add metrics generation for eliot
Categories
(Eliot :: General, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Attachments
(5 files)
When writing Eliot, I added some rough metrics and implemented the infrastructure for generating them, but I didn't spend much time thinking about what questions I wanted metrics to answer or making sure the metrics I had would produce the data needed to answer them.
This bug covers that work.
Updated•4 years ago
Comment 2•4 years ago
willkg merged PR #2267: "bug 1674406: reimplement metrics in Eliot and add new ones" in 4ba06a9.
I also redid metrics in Eliot so that they're auto-documented. It extends a proof-of-concept I did a while back. I figure we'll test it out here and if it works, I'll switch Tecken webapp to do the same thing. If it doesn't work, then we'll remove it.
Comment 5•3 years ago
Brian pointed out the gauges should be counters. Then he said it'd be even better if they were histograms since we're probably going to look at max/min/mean over time for them.
Changing them to histograms now.
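To illustrate the difference between the three metric types, here is a minimal in-memory sketch. The `Gauge`, `Counter`, and `Histogram` classes and the sample values are hypothetical stand-ins, not Eliot's metrics client; they only show what information each type keeps.

```python
from statistics import mean


class Gauge:
    """Keeps only the most recently reported value."""

    def __init__(self):
        self.value = None

    def record(self, value):
        self.value = value


class Counter:
    """Accumulates a running total of everything reported."""

    def __init__(self):
        self.total = 0

    def record(self, value=1):
        self.total += value


class Histogram:
    """Keeps every sample so min/max/mean can be derived afterwards."""

    def __init__(self):
        self.samples = []

    def record(self, value):
        self.samples.append(value)

    def summary(self):
        return {
            "min": min(self.samples),
            "max": max(self.samples),
            "mean": mean(self.samples),
        }


# Hypothetical per-request values (assumption, not real Eliot data).
values = [3, 120, 7, 10]

gauge, counter, histogram = Gauge(), Counter(), Histogram()
for value in values:
    gauge.record(value)
    counter.record(value)
    histogram.record(value)

print("gauge:", gauge.value)
print("counter:", counter.total)
print("histogram:", histogram.summary())
```

The gauge only remembers the last value and the counter only remembers the sum, so neither can answer max/min/mean questions on its own. The histogram can, which matches the reasoning above.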
Comment 8•3 years ago
The thing left to do here is build a dashboard.
Comment 9•3 years ago
I started to add a dashboard.
Things we want to see:
1. symbolication v4 vs. v5 usage
2. how long it takes to handle a symbolication request
3. mean and 95th percentile for how long it takes to parse SYM files
4. cache hits vs. cache misses
5. cache churn (adding things and removing things)
I've got graphs for 1, 2, and 3.
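For the SYM parse timing graph, this is a rough sketch of why looking at both the mean and the 95th percentile is useful. The `percentile` helper uses the nearest-rank method, and the `parse_times_ms` samples are made up for illustration; a real dashboard would compute these from the timing metrics in the metrics backend.

```python
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct percent of the
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical SYM parse times in milliseconds, with one slow outlier.
parse_times_ms = [
    12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
    12, 11, 15, 14, 13, 16, 12, 14, 13, 900,
]

print("mean:", mean(parse_times_ms))
print("p50:", percentile(parse_times_ms, 50))
print("p95:", percentile(parse_times_ms, 95))
```

With these samples the single slow parse pulls the mean far above what a typical request sees, while the 95th percentile stays close to the bulk of the data. Graphing both makes that kind of skew visible.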
Cache hits vs. cache misses can't be graphed yet because the current code emits that metric in the diskcache get method, but that's effectively skipped because the symbolicator_resource code checks whether the item is in the cache first. Oops. I'll need to fix that.
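A simplified sketch of that problem, under the assumption that the caller checks membership first and only calls `get` when the key is present. The `DiskCache` class, the two lookup functions, and the hit/miss metric names are hypothetical and are not Eliot's actual code.

```python
from collections import Counter

# Stand-in for a real metrics client; keys are hypothetical metric names.
metrics = Counter()


class DiskCache:
    """Hypothetical cache whose get() is the only place hits/misses are counted."""

    def __init__(self):
        self._data = {}

    def __contains__(self, key):
        return key in self._data

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        if key in self._data:
            metrics["diskcache.get.hit"] += 1
            return self._data[key]
        metrics["diskcache.get.miss"] += 1
        return default


def lookup_check_first(cache, key):
    """Checks membership first, so get() never sees a miss."""
    if key in cache:
        return cache.get(key)
    return None  # the miss is handled here, but no metric is emitted


def lookup_via_get(cache, key):
    """Goes straight to get(), so both outcomes are counted."""
    return cache.get(key)


cache = DiskCache()
cache.set("a", b"data")

lookup_check_first(cache, "a")
lookup_check_first(cache, "missing")
print("check first:", dict(metrics))

metrics.clear()
lookup_via_get(cache, "a")
lookup_via_get(cache, "missing")
print("via get:", dict(metrics))
```

In the first pattern the miss branch inside `get` is never reached, so a hits vs. misses graph would show only hits. Routing every lookup through `get`, or emitting the metric at the call site, is one possible way to make both outcomes visible.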
I thought I had a graph for 5 looking at eliot.diskcache.set vs. eliot.diskcache.evict, but I'm not seeing any evict metrics. Either the cache is so enormous that I haven't hit evictions yet, or the disk cache manager isn't set up to send metrics yet. I need to look into that.
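The first possibility can be sketched with a small bounded cache. This is an illustration only: `BoundedCache` is a hypothetical item-count LRU, which is a simplification and not how Eliot's disk cache manager works. Only the two metric names come from the text above.

```python
from collections import Counter, OrderedDict

# Stand-in for a real metrics client.
metrics = Counter()


class BoundedCache:
    """Hypothetical LRU cache that evicts only when it exceeds max_items."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        metrics["eliot.diskcache.set"] += 1
        # Evict least recently used entries until we are back under the limit.
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)
            metrics["eliot.diskcache.evict"] += 1


def fill(cache, count):
    """Adds `count` distinct entries to the cache."""
    for i in range(count):
        cache.set(f"sym-{i}", b"data")


# Large cache, light load: sets are recorded but nothing is evicted.
fill(BoundedCache(max_items=1000), 10)
print("large cache:", dict(metrics))

# Small cache, same load: evictions start showing up.
metrics.clear()
fill(BoundedCache(max_items=5), 10)
print("small cache:", dict(metrics))
```

If the cache limit is large relative to the load, set metrics appear with no matching evict metrics even though eviction reporting works, so a missing evict series does not by itself show that the metrics are broken.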
Comment 14•3 years ago
I fixed 4 (cache hits vs. misses).
I spent a bunch of time looking at 5 (cache churn). I haven't seen any evictions, but the cache manager is emitting metrics, so it's entirely possible I haven't put enough load on the server to create an eviction.
Regardless, I've got a dashboard now and I think I'm going to call this good. We can file followup bugs for specific needs.
Marking as FIXED.
Comment 15•1 year ago
Moving to Eliot product.