Closed Bug 1275010 Opened 9 years ago Closed 9 years ago

MESSAGE_MANAGER_MESSAGE_SIZE histogram should not be aggregated

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: rvitillo)

References

Details

MESSAGE_MANAGER_MESSAGE_SIZE has over 10K unique keys for a single submission date, like: "ublock0:sb:3471" "ublock0:sb:2618" "ublock0:sb:3792" ... This is causing the aggregation service, which supports t.m.o, to take a *very* long time to compute aggregates, and in some cases it renders the service unavailable. The aggregation service was never designed to support histograms with that many keys, so the histogram should be filtered out entirely by the aggregation pipeline. Bill, I am assuming that you are not using the aggregates from t.m.o, am I right?
Flags: needinfo?(wmccloskey)
Apparently some days have over 100K unique keys...
Assignee: nobody → rvitillo
Priority: -- → P1
I'm not sure what you mean by "aggregates". We definitely use the data on telemetry.mozilla.org to look at the MESSAGE_MANAGER_MESSAGE_SIZE histogram (or we did when it first landed; it seems to take forever now). I filed bug 1275032 just now with an idea to prevent these ublock keys from polluting the data. In the short term, could you throw away any keys that contain digits? We don't care about them and it would eliminate all these ublock keys. Doing this retroactively would be fine too. Sorry for the trouble. I wasn't aware ublock was doing this when I landed the change.
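For illustration, a minimal sketch of the short-term filter suggested in this comment (throw away any key that contains a digit). It assumes a keyed histogram is available as a plain dict mapping each key to its bucket counts; that representation, the function name, and the sample key "SessionStore:update" are assumptions made for the example, not the aggregation service's actual code.

```python
import re

# Matches any decimal digit anywhere in a key.
_HAS_DIGIT = re.compile(r"\d")


def drop_keys_with_digits(keyed_histogram):
    """Return a copy of the keyed histogram without keys that contain a digit."""
    return {
        key: counts
        for key, counts in keyed_histogram.items()
        if not _HAS_DIGIT.search(key)
    }


# Hypothetical sample data: two ublock keys from the report plus one made-up key.
sample = {
    "ublock0:sb:3471": [1, 0],
    "ublock0:sb:2618": [0, 2],
    "SessionStore:update": [3, 1],
}
filtered = drop_keys_with_digits(sample)
```

Note that such a filter drops every ublock key regardless of its suffix, because the "ublock0" prefix itself contains a digit.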
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #2)
> I'm not sure what you mean by "aggregates". We definitely use the data on
> telemetry.mozilla.org to look at the MESSAGE_MANAGER_MESSAGE_SIZE histogram
> (or we did when it first landed; it seems to take forever now).

Right, t.m.o isn't useful right now for MESSAGE_MANAGER_MESSAGE_SIZE.

> I filed bug 1275032 just now with an idea to prevent these ublock keys from
> polluting the data. In the short term, could you throw away any keys that
> contain digits? We don't care about them and it would eliminate all these
> ublock keys. Doing this retroactively would be fine too.

The aggregation job is going to ignore the histogram entirely for the time being. I would like to avoid special-casing the aggregation job in order to fix issues with histograms that should be dealt with on the client side. Even if we landed such a change, it would also have to be applied retroactively to the database, because t.m.o will happily continue to fetch aggregates built in the past month, which will render the service unavailable. This is maybe something Chris could look into? For now, you can compute the aggregates by running your own Spark job.
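As a rough sketch of what "ignore the histogram entirely" could look like, assuming the job sees each submission's keyed histograms as a dict from histogram name to its keyed data. The blocklist constant, the function name, and "SOME_OTHER_KEYED_HISTOGRAM" are hypothetical; the thread does not show how the aggregation job implements this.

```python
# Hypothetical blocklist; the thread only confirms that this one histogram is ignored.
IGNORED_HISTOGRAMS = frozenset({"MESSAGE_MANAGER_MESSAGE_SIZE"})


def drop_ignored_histograms(keyed_histograms):
    """Return a copy of the mapping without any blocklisted histogram."""
    return {
        name: keyed_data
        for name, keyed_data in keyed_histograms.items()
        if name not in IGNORED_HISTOGRAMS
    }


# Hypothetical submission containing the spammy histogram and one other.
submission = {
    "MESSAGE_MANAGER_MESSAGE_SIZE": {"ublock0:sb:3471": [1, 0]},
    "SOME_OTHER_KEYED_HISTOGRAM": {"some-key": [2, 5]},
}
kept = drop_ignored_histograms(submission)
```

Dropping by histogram name keeps the job free of per-key special cases, which matches the preference stated above for fixing the keys on the client side.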
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Once I've fixed the issues with the key spam (which I'm going to do in bug 1275707), should I just rename the telemetry from MESSAGE_MANAGER_MESSAGE_SIZE to something else? I'll get bsmedberg to review it. MESSAGE_MANAGER_MESSAGE_SIZE is on beta, and we're pretty late in beta, so I'm not sure I'm going to be able to fix it there before it goes to release.
Flags: needinfo?(rvitillo)
Sorry, I should have fixed this earlier on the client side. I'd noticed all of these keys a little while ago, and that the site was super slow sometimes when looking at them.
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Once I've fixed the issues with the key spam (which I'm going to do in bug
> 1275707), should I just rename the telemetry from
> MESSAGE_MANAGER_MESSAGE_SIZE to something else?

Yes, please rename the probe.
Flags: needinfo?(rvitillo)
The new version has landed in bug 1275707, as MESSAGE_MANAGER_MESSAGE_SIZE2. No ublock data yet, but I'll keep an eye on it to double check that it isn't spammy.
Product: Cloud Services → Cloud Services Graveyard