Too many graphite metrics from Socorro, some including UUIDs

RESOLVED FIXED in 60

Status

Socorro
Infra
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: ericz, Assigned: Gabriela Thumé)

Tracking

unspecified
x86
Linux

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa-])

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
As discussed with :lonnen on IRC, we're seeing Socorro go a bit metrics-crazy in Graphite in PHX1, contributing to disk shortage woes. For example:

stats_counts.socorro-prod.webapp.middleware.GET.*

has 18K-20K metrics on each of our four graphite servers in PHX1.  Many have UUIDs like:

bpapi-crash_data-datatype-processed-uuid-d479d19c-d09e-4427-8452-bd5722130916-

which is probably too fine-grained to have much value for trending, which is what Graphite is meant for. Others have specific dates like:

bpapi-signaturesummary-report_type-uptime-signature-hang2520257C2520ZwFsControlFile-start_date-2013-09-09T00253A00253A00-end_date-2013-09-16T00253A00253A00-

which has pretty much the same problem.  I'm seeing this from socorro prod, stage and dev.
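
One way to keep names like the two above from multiplying is to replace the unique parts with fixed placeholders before the metric is sent. This is only a minimal sketch, not what Socorro actually does: the function name `sanitize_metric_name`, both regular expressions, and the placeholder strings are assumptions made for illustration. The date pattern assumes the encoded form seen above, where the time part looks like `T00253A00253A00`.

```python
import re

# Assumption: UUIDs appear in the canonical 8-4-4-4-12 hex form, as in the
# crash_data example above.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

# Assumption: dates appear as YYYY-MM-DD, optionally followed by an encoded
# time such as T00253A00253A00 (the ':' characters arrive as '253A').
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}253A\d{2}253A\d{2})?")


def sanitize_metric_name(name):
    """Replace per-request UUIDs and dates with fixed placeholders."""
    name = UUID_RE.sub("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", name)
    name = DATE_RE.sub("XXXX-XX-XX", name)
    return name


if __name__ == "__main__":
    print(sanitize_metric_name(
        "bpapi-crash_data-datatype-processed-uuid-"
        "d479d19c-d09e-4427-8452-bd5722130916-"
    ))
```

With this, every crash_data lookup would collapse into one metric name instead of one per UUID. Note that the signature is still part of the signaturesummary names, so those would stay fairly high-cardinality.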
(Reporter)

Updated

5 years ago
Group: mozilla-corporation-confidential

Comment 1

5 years ago
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7ab6b5c8bd17ee2c13cddc57cfad2bd4ee50ae68
fixes Bug 916905 - removing unique uuids and dates

https://github.com/mozilla/socorro/commit/3537765878af77dc24b08940ed7e1339e516bd8b
Merge pull request #1514 from GabiThume/bug916905

Fixes Bug 916905 - removing unique uuids and dates

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Assignee: nobody → gabithume
Target Milestone: --- → 60

Updated

5 years ago
Whiteboard: [qa-]
(Reporter)

Comment 2

5 years ago
I'm seeing about 128,000 metrics in PHX1 for GET URLs. I don't think we should be doing any per-URL metrics. Can you A) fix that again and B) tell me what we can do to prevent this from happening in the future?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 3

5 years ago
@ericz - we shipped a new endpoint. I think you're right and that we should stop using URLs for this. Maybe the answer is to map them to django views, which is a lot less useful because it doesn't contain product info... but oh well. It's that or remove it completely.
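
As a rough illustration of the "map them to views" idea, the metric key could be built from a small, fixed set of view names rather than from the request URL. This is a hypothetical sketch only: the `ROUTES` table, the URL paths, the view names and the `metric_key` helper are all made up here and are not Socorro's or Django's actual code. A real Django app would presumably get the view name from its URL resolver instead of a hand-written table.

```python
import re

# Hypothetical route table: (pattern, view name). The paths are assumptions
# based on the metric names in this bug, not real Socorro URL patterns.
ROUTES = [
    (re.compile(r"^/bpapi/crash_data/"), "crash_data"),
    (re.compile(r"^/bpapi/signaturesummary/"), "signaturesummary"),
]


def metric_key(prefix, method, path, status):
    """Build a metric key from the matched view name, never from the URL."""
    for pattern, view_name in ROUTES:
        if pattern.match(path):
            break
    else:
        # Unmatched URLs all share one bucket, so new endpoints cannot
        # create an unbounded number of metrics.
        view_name = "unknown"
    return f"{prefix}.{method}.{view_name}.{status}"


if __name__ == "__main__":
    print(metric_key(
        "socorro-prod.webapp.middleware",
        "GET",
        "/bpapi/crash_data/datatype/processed/uuid/"
        "d479d19c-d09e-4427-8452-bd5722130916/",
        200,
    ))
```

The number of distinct keys is then bounded by views × methods × status codes, at the cost of losing the product info mentioned above.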
(Reporter)

Comment 4

5 years ago
Created attachment 821384 [details]
socorro_gets.txt

After another cleanup of Socorro metrics in PHX1 last week, attached are the 18,075 GET metrics that Graphite has received from Socorro since then. Of those, 12,347 are from the middleware component. This includes staging and prod. :lonnen, can you review this and make sure it is what you expect to see in Graphite and that the data is at least mostly useful?

After a brief review myself, it looks like some sanitization is done on some of them, such as the dates in

stats.socorro-stage.webapp.middleware.GET.bpapi-signaturesummary-report_type-products-signature-js253A253AGCMarker253A253AprocessMarkStackTop2528js253A253ASliceBudget25262529-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-versions-Firefox253A26-0a2-.200

and whatever the underscores represent in

stats.socorro-prod.webapp.middleware.GET.bpapi-signaturesummary-report_type-uptime-signature-F2102588022______________________________________________________________________________________________________-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-.200

Keep in mind, though, that metric names in Graphite translate directly into file names, and some of the Socorro metrics exceed the maximum filename length. Those files will never be created; Graphite just logs an error every time it tries to create them.
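
To make the filename problem concrete, here is a small sketch of a check that could run before a metric is sent. It assumes Graphite's Whisper storage layout, where each dot-separated part of the metric name becomes a directory and the last part becomes a `.wsp` file, and it assumes a 255-byte limit per path component, which is typical for Linux filesystems but not confirmed for the PHX1 servers. The helper name `oversized_components` is made up for this example.

```python
# Assumption: 255 bytes per path component, the usual limit on Linux
# filesystems. The real limit depends on the filesystem in use.
NAME_MAX = 255
WHISPER_SUFFIX = ".wsp"


def oversized_components(metric_name, name_max=NAME_MAX):
    """Return the parts of a dotted metric name that cannot exist on disk."""
    parts = metric_name.split(".")
    too_long = []
    for index, part in enumerate(parts):
        # Only the last part is stored as a file and gets the suffix;
        # the earlier parts become directory names.
        is_last = index == len(parts) - 1
        on_disk = part + WHISPER_SUFFIX if is_last else part
        if len(on_disk.encode("utf-8")) > name_max:
            too_long.append(part)
    return too_long


if __name__ == "__main__":
    name = "stats.socorro-prod.webapp.middleware.GET." + "x" * 300 + ".200"
    for part in oversized_components(name):
        print("too long for the filesystem:", len(part), "bytes")
```

A sender could drop or truncate any metric for which this returns a non-empty list, rather than letting Graphite log the same creation error over and over.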
Attachment #821384 - Flags: feedback?(chris.lonnen)

Comment 5

5 years ago
The underscores are not sanitization; they are part of the signatures (a side effect of Adobe's encoding of function names in their symbols).

Comment 6

5 years ago
Eric -- I don't expect to see any *.analytics.* stats, period. It's all removed from the code.

lonnen@musashi:~/repos/socorro master:?
[17:14:56] $ grep -r "analytics" webapp-django/

I'm really baffled.

Comment 7

5 years ago
filed: https://bugzilla.mozilla.org/show_bug.cgi?id=930585

we'll stop sending any metrics for a while and see if that helps

Updated

5 years ago
Attachment #821384 - Flags: feedback?(chris.lonnen)

Updated

5 years ago
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED