As discussed with :lonnen on IRC, Socorro is going a bit metrics-crazy in Graphite in PHX1, which is contributing to our disk shortage woes. For example, stats_counts.socorro-prod.webapp.middleware.GET.* has 18K-20K metrics on each of our four Graphite servers in PHX1.

Many contain UUIDs, like:
bpapi-crash_data-datatype-processed-uuid-d479d19c-d09e-4427-8452-bd5722130916-
which is probably too fine-grained to be of much value for trending, which is what Graphite is meant for.

Others contain specific dates, like:
bpapi-signaturesummary-report_type-uptime-signature-hang2520257C2520ZwFsControlFile-start_date-2013-09-09T00253A00253A00-end_date-2013-09-16T00253A00253A00-
which has pretty much the same problem.

I'm seeing this from Socorro prod, stage and dev.
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7ab6b5c8bd17ee2c13cddc57cfad2bd4ee50ae68
fixes Bug 916905 - removing unique uuids and dates

https://github.com/mozilla/socorro/commit/3537765878af77dc24b08940ed7e1339e516bd8b
Merge pull request #1514 from GabiThume/bug916905

Fixes Bug 916905 - removing unique uuids and dates
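For illustration, the kind of clean-up these commit messages describe (collapsing unique UUIDs and dates in metric names) could look roughly like the sketch below. This is not the code from the linked commits: the function name `sanitize_metric_name`, the regular expressions, and the UUID placeholder are assumptions. Only the `XXXX-XX-XX` date placeholder mirrors the sanitized metric names quoted elsewhere in this bug.

```python
import re

# Assumption: UUIDs appear in the usual 8-4-4-4-12 hex form.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

# Assumption: dates look like 2013-09-09, optionally followed by a time
# whose colons were mangled into "253A" (as in the examples in this bug).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}(?:253A\d{2}){0,2})?")


def sanitize_metric_name(name):
    """Replace per-request UUIDs and dates with fixed placeholders."""
    # UUIDs first, so the date pattern never sees UUID digit groups.
    name = UUID_RE.sub("XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", name)
    name = DATE_RE.sub("XXXX-XX-XX", name)
    return name


if __name__ == "__main__":
    print(sanitize_metric_name(
        "bpapi-crash_data-datatype-processed-uuid-"
        "d479d19c-d09e-4427-8452-bd5722130916-"
    ))
```

With placeholders like these, every crash_data lookup lands on one metric name instead of one per UUID. Free-form values such as signatures would still produce one metric each.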
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Assignee: nobody → gabithume
Target Milestone: --- → 60
I'm seeing about 128,000 metrics in PHX1 for GET URLs. I don't think we should be doing any per-URL metrics. Can you A) fix that again and B) tell me what we can do to prevent this from happening in the future?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
@ericz - we shipped a new endpoint. I think you're right that we should stop using URLs for this. Maybe the answer is to map them to Django views, which is a lot less useful because the metric no longer contains product info... but oh well. It's that or remove it completely.
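A rough sketch of the "map them to views" idea follows: the metric key is built from a view name rather than from the full URL, so the number of metrics is bounded by the number of views. Everything here is hypothetical (the route table, the URL paths, the view names, and the `metric_key` helper). In a real Django app the view name would more likely come from the URL resolver than from a hand-written table.

```python
import re

# Hypothetical route table: (URL pattern, view name). The paths are
# guesses based on the metric names in this bug, not Socorro's real URLs.
ROUTES = [
    (re.compile(r"^/bpapi/crash_data/"), "crash_data"),
    (re.compile(r"^/bpapi/signaturesummary/"), "signaturesummary"),
]


def metric_key(method, path, status, prefix="webapp.middleware"):
    """Build a metric key from the matched view name, not the raw URL."""
    for pattern, view_name in ROUTES:
        if pattern.match(path):
            break
    else:
        # Unmatched URLs share one bucket instead of creating new metrics.
        view_name = "unknown"
    return "{}.{}.{}.{}".format(prefix, method, view_name, status)


if __name__ == "__main__":
    print(metric_key(
        "GET",
        "/bpapi/crash_data/datatype/processed/uuid/"
        "d479d19c-d09e-4427-8452-bd5722130916/",
        200,
    ))
```

The trade-off is the one noted above: per-product or per-signature detail is lost, but shipping a new endpoint adds at most one new view name rather than thousands of metrics.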
Created attachment 821384
socorro_gets.txt

After another cleanup of Socorro metrics in PHX1 last week, attached are the 18,075 GET metrics Graphite has received from Socorro since then. Of those, 12,347 are from the middleware component. This includes staging and prod.

:lonnen, can you review this and make sure that it is what you expect to see in Graphite, and that it is at least mostly useful data?

After a brief review myself, it looks like some sanitization is done on some of them, such as the dates in
stats.socorro-stage.webapp.middleware.GET.bpapi-signaturesummary-report_type-products-signature-js253A253AGCMarker253A253AprocessMarkStackTop2528js253A253ASliceBudget25262529-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-versions-Firefox253A26-0a2-.200
and whatever the underscores represent in
stats.socorro-prod.webapp.middleware.GET.bpapi-signaturesummary-report_type-uptime-signature-F2102588022______________________________________________________________________________________________________-start_date-XXXX-XX-XX-end_date-XXXX-XX-XX-.200

Keep in mind that metric names in Graphite translate directly into file names, and some of the Socorro metric names exceed the maximum filename length. Those files will never be created; Graphite just logs an error each time it tries to create them.
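To make the filename concern concrete, here is a small sketch that flags metric names which could not be stored on disk. It assumes that each dot-separated component of a metric becomes one path component, that the last one gets a `.wsp` suffix (as with Whisper files), and that the filesystem caps a single name at 255 bytes. The 255-byte figure and the helper name `oversized_components` are assumptions, so check the actual limit on the Graphite hosts.

```python
NAME_MAX = 255       # assumption: per-component limit of the filesystem
WSP_SUFFIX = ".wsp"  # assumption: data files are stored as <name>.wsp


def oversized_components(metric, name_max=NAME_MAX):
    """Return the path components of `metric` that exceed name_max bytes."""
    parts = metric.split(".")
    # The last component becomes the file name, so it carries the suffix.
    parts[-1] += WSP_SUFFIX
    return [p for p in parts if len(p.encode("utf-8")) > name_max]


if __name__ == "__main__":
    long_name = (
        "stats.socorro-prod.webapp.middleware.GET."
        + "bpapi-signaturesummary-" + "_" * 300 + ".200"
    )
    for part in oversized_components(long_name):
        print("too long ({} bytes): {}...".format(len(part), part[:40]))
```

A check like this could run on the sending side to drop or truncate such names before they reach Graphite, rather than having the server log the same error repeatedly.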
The underscores are not sanitization; they are part of the signatures (a side effect of how Adobe encodes function names in its symbols).
Eric -- I don't expect to see any *.analytics.* stats, period. It's all removed from the code.

lonnen@musashi:~/repos/socorro master:? [17:14:56]
$ grep -r "analytics" webapp-django/

I'm really baffled.
Filed: https://bugzilla.mozilla.org/show_bug.cgi?id=930585
We'll stop sending any metrics for a while and see if that helps.
Status: REOPENED → RESOLVED
Resolution: --- → FIXED