Closed Bug 916905 Opened 11 years ago Closed 11 years ago

Too many graphite metrics from Socorro, some including UUIDs

Categories

(Socorro :: Infra, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Assigned: gabithume)

Details

(Whiteboard: [qa-])

Attachments

(1 file)

As discussed with :lonnen on IRC, we're seeing socorro go a bit metrics-crazy in graphite in PHX1, contributing to disk shortage woes.  For example:

stats_counts.socorro-prod.webapp.middleware.GET.*

has 18K-20K metrics on each of our four graphite servers in PHX1.  Many have UUIDs like:

bpapi-crash_data-datatype-processed-uuid-d479d19c-d09e-4427-8452-bd5722130916-

which is probably too fine-grained to have much value for trending, which is what Graphite is meant for.  Others have specific dates like:

bpapi-signaturesummary-report_type-uptime-signature-hang2520257C2520ZwFsControlFile-start_date-2013-09-09T00253A00253A00-end_date-2013-09-16T00253A00253A00-

which has pretty much the same problem.  I'm seeing this from socorro prod, stage and dev.
Group: mozilla-corporation-confidential
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7ab6b5c8bd17ee2c13cddc57cfad2bd4ee50ae68
fixes Bug 916905 - removing unique uuids and dates

https://github.com/mozilla/socorro/commit/3537765878af77dc24b08940ed7e1339e516bd8b
Merge pull request #1514 from GabiThume/bug916905

Fixes Bug 916905 - removing unique uuids and dates
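For illustration only, the kind of normalization these commits describe could look like the sketch below. This is not the code from the linked commits. The function name sanitize_metric_key and the UUID placeholder are assumptions; the XXXX-XX-XX date placeholder is borrowed from the sanitized metric names quoted later in this bug.

```python
import re

# Hypothetical patterns; the real Socorro fix may differ.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
# A date, optionally followed by a time part such as T00253A00253A00
# (in these keys the colons show up as the encoded residue "253A").
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}(?:253A\d{2}){0,2})?")


def sanitize_metric_key(key):
    """Replace per-request values so many URLs collapse into one metric."""
    key = UUID_RE.sub("UUID", key)  # UUIDs first, so their digits cannot look like dates
    key = DATE_RE.sub("XXXX-XX-XX", key)
    return key
```

With something like this, every crash_data lookup would land in one metric instead of one metric per crash ID. It does nothing about other unbounded values such as signatures.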
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee: nobody → gabithume
Target Milestone: --- → 60
Whiteboard: [qa-]
I'm seeing about 128000 metrics in PHX1 for GET urls.  I don't think we should be doing any per-url metrics.  Can you A) fix that again and B) tell me what we can do to prevent this from happening in the future?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
@ericz - we shipped a new endpoint. I think you're right and that we should stop using URLs for this. Maybe the answer is to map them to django views, which is a lot less useful because it doesn't contain product info... but oh well. It's that or remove it completely.
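A rough sketch of the "map them to views" idea, assuming a fixed route table. The URL patterns, view names, and the metric_key_for_path helper are all hypothetical; in the real webapp Django's URL resolver would supply the view name, and a plain list of regexes stands in for it here.

```python
import re

# Hypothetical route table: (compiled URL pattern, view name).
ROUTES = [
    (re.compile(r"^/report/index/[^/]+/?$"), "report_index"),
    (re.compile(r"^/topcrasher/products/[^/]+(?:/versions/[^/]+)?/?$"), "topcrasher"),
]


def metric_key_for_path(path, method="GET"):
    """Return a metric key drawn from a small, fixed set of view names."""
    for pattern, view_name in ROUTES:
        if pattern.match(path):
            return "%s.%s" % (method, view_name)
    # Anything unrecognized shares one bucket, so new URLs cannot add metrics.
    return "%s.unmatched" % method
```

The number of metrics is then bounded by the number of routes, at the cost of losing the product and version detail in the key.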
Attached file socorro_gets.txt
After another cleanup of Socorro metrics in PHX1 last week, attached are the 18,075 GET metrics that Graphite has received from Socorro since then.  Of those, 12,347 are from the middleware component.  This includes staging and prod.  :lonnen, can you review this and make sure that this is what you expect to see in Graphite and that it is at least mostly useful data?

After a brief review myself, it looks like some sanitization is done on some of them, such as the dates in

stats.socorro-stage.webapp.middleware.GET.bpapi-signaturesummary-report_type-products-signature-js253A253AGCMarker253A253AprocessMarkStackTop2528js253A253ASliceBudget25262529-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-versions-Firefox253A26-0a2-.200

and whatever the underlines represent in

stats.socorro-prod.webapp.middleware.GET.bpapi-signaturesummary-report_type-uptime-signature-F2102588022______________________________________________________________________________________________________-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-.200

Also keep in mind that metric names in Graphite translate directly into file names, and some of the Socorro metrics exceed the maximum filename length. Those files will never be created; Graphite just logs an error each time it tries to create them.
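A quick way to spot such names before they are sent might look like the sketch below. It assumes the usual Whisper layout, where each dot-separated node becomes a directory and the last node becomes a .wsp file, and a 255-byte limit per path component, which is typical on Linux filesystems but not confirmed for these servers. The helper name overlong_nodes is made up for this example.

```python
# Assumption: 255 bytes per path component, typical of Linux filesystems.
NAME_MAX = 255
WHISPER_SUFFIX = ".wsp"


def overlong_nodes(metric_name, name_max=NAME_MAX):
    """Return the dot-separated nodes too long to exist as a file or directory name."""
    nodes = metric_name.split(".")
    too_long = []
    for index, node in enumerate(nodes):
        is_leaf = index == len(nodes) - 1
        # Only the last node becomes a file, so only it carries the suffix.
        on_disk = node + WHISPER_SUFFIX if is_leaf else node
        if len(on_disk.encode("utf-8")) > name_max:
            too_long.append(node)
    return too_long
```

An empty list means every node fits under the assumed limit; anything returned is a node that could never be created on disk.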
Attachment #821384 - Flags: feedback?(chris.lonnen)
The underscores are not sanitization; they are part of the signatures (a side effect of Adobe's encoding of function names in their symbols).
Eric -- I don't expect to see any *.analytics.* stats, period. It's all removed from the code.

lonnen@musashi:~/repos/socorro master:?
[17:14:56] $ grep -r "analytics" webapp-django/

I'm really baffled.
filed: https://bugzilla.mozilla.org/show_bug.cgi?id=930585

we'll stop sending any metrics for a while and see if that helps
Attachment #821384 - Flags: feedback?(chris.lonnen)
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED