Closed Bug 1057351 Opened 10 years ago Closed 10 years ago

graphite is showing stats.input-prod data with a baseline of 0.1

Categories

(Infrastructure & Operations :: Tools, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: ericz)

Details

Yesterday a bunch of the stats.input-prod things went from a baseline of 0 to a baseline of 0.1.

That's a horrible description, but a better visual is this:

https://graphite-phx1.mozilla.org/render/?width=586&height=308&_salt=1408712797.839&from=00%3A00_20140821&until=23%3A59_20140822&target=stats.input-prod.response.200&target=stats.input-prod.response.404

On the left side of the graph, both lines go all the way down to 0. Then at around 10pm, both lines stop going all the way down to 0--they go down to 0.1. Neither ever gets down to 0.

This happens with all the data in stats.input-prod.

I looked at stats.sumo and the baseline doesn't change from 0 to 0.1 for those stats, but the lines do look different. For lines with less data, they go from a solid line to a dotted line:

https://graphite-phx1.mozilla.org/render/?width=586&height=308&_salt=1408713075.725&from=00%3A00_20140821&until=23%3A59_20140822&target=stats.sumo.response.200&target=stats.sumo.response.400

Something seems really fishy.

Did graphite get updated on Thursday? Am I doing something horribly wrong?
I just checked the libraries for SUMO and Input and noticed Input is pretty far behind on pystatsd. I wrote up bug #1057353 to update that and see whether it fixes the 0 -> 0.1 baseline issue. I'll do that Monday since we don't push on Fridays.
Now that it's Monday morning EDT, I made the changes, got them reviewed and pushed them. Input and SUMO are now using the same versions of the django-statsd and pystatsd libraries. There's no change to the graphs--new data is still baselining at 0.1.

Did graphite get updated on Thursday in some way? Any ideas what might be going on or what else I can check?
(In reply to Will Kahn-Greene [:willkg] from comment #2)
> Given it's monday morning EDT, I made the changes, got them reviewed and
> pushed them. Input and SUMO are now using the same versions of django-statsd
> and pystatsd libraries. There's no change to the graphs--new data is still
> baselining at 0.1.
> 
> Did graphite get updated on Thursday in some way? Any ideas what might be
> going on or what else I can check?

Will -
No changes to graphite, but ericz and I have seen similar behavior in the past 24 hours. We are investigating and will try to give an update shortly.
Assignee: server-ops → eziegenhorn
Component: Server Operations → Tools
Product: mozilla.org → Infrastructure & Operations
The dotted lines came from an attempt to stop statsd from re-sending idle metrics forever; those made up a huge portion of the metrics sent to graphite and were useless. With that change, instead of supplying zeros when it gets no data for a metric, statsd sent nothing at all. The 0.1 baseline that :willkg noticed on some stats was caused by the same effect combined with whatever percentage calculations statsd was doing: with less data, and in particular fewer zeros, the calculated values shifted by about 0.1.

The statsd config change (deleteIdleStats is the relevant option) has been reverted and everything looks back to normal now. I'll open a new bug to find a workaround for this impasse. Thanks :willkg for your assistance figuring this out!
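
To illustrate the general effect described above, here is a minimal Python sketch of how an average shifts when idle intervals are reported as missing rather than as zero. It is not the actual statsd or graphite calculation, which this bug does not pin down; the function name and the sample values are made up for illustration.

```python
from typing import Optional, Sequence


def mean_ignoring_missing(points: Sequence[Optional[float]]) -> Optional[float]:
    """Average the points that are present, skipping missing (None) ones."""
    present = [p for p in points if p is not None]
    if not present:
        return None  # nothing was reported at all
    return sum(present) / len(present)


# Hypothetical per-interval counts for a quiet metric (made-up numbers).
# Old behavior: idle intervals are reported as explicit zeros.
with_zeros = [0, 0, 0, 2, 0, 0, 0, 2]

# With idle stats dropped: the same intervals, but zeros are never sent,
# so the datastore only sees gaps where the zeros used to be.
without_zeros = [None if v == 0 else v for v in with_zeros]

print(mean_ignoring_missing(with_zeros))     # 0.5
print(mean_ignoring_missing(without_zeros))  # 2.0
```

With the zeros present they pull the average down; once they are dropped, only the non-zero samples remain, so any value derived from an average can no longer reach 0.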
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED