Closed Bug 519871 Opened 12 years ago Closed 12 years ago

Get collector traffic (wc -l) over last 3 months

Categories

(Mozilla Metrics :: Data/Backend Reports, defect)

Platform: All
OS: Other
Type: defect
Priority: Not set
Severity: minor
Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: morgamic, Assigned: morgamic)

Details

Attachments

(1 file)

We're trying to profile crash volume and need to run a line count on access logs dating back as far as we can go.  You guys have a script for this -- can we run this on crash-reports.m.... and get a simple line graph of count per day?
Can Metrics help here?
Assignee: server-ops → nobody
Group: infra
Component: Server Operations → Data/Backend Reports
Product: mozilla.org → Mozilla Stats
QA Contact: mrz → data-reports
Target Milestone: --- → 0.1
Version: other → 0.1
That should be fairly simple. Where do the logs sit, and what in the logs should we be looking for? (Or are we looking at everything, wc -l style?)
Ideally we'd find a way to trend these logs over time, and if you have the ability to split by product, version and platform that's a definite bonus.
Tell me where that is and I'll see what I can do.


Full analysis on crash data is a big Q4 objective for us, but if I can quickly come up with something to solve this particular question, I'm happy to help.
In about half an hour, the following file will be finished writing out:
cm-metricsetl02:/etl/processed/crash_report_trend.tsv.gz

It contains three columns,
yyyy-mm-dd-hh    user_agent_string    count_requests


Currently, we're averaging about 150 unique user agent strings.  The Mac crash reporters tend to include the precise machine type (e.g. iMac, MacBook, MacBookPro) plus the kernel version number, which leads to fragmented records.

The file may well be useful to you as is, but if not, Pedro or I can easily take it and run a real ETL process on it to properly tokenize and cluster the UA strings.  Just let us know what type of analysis you want to do.
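As a quick sketch of consuming the file, here is one way to roll the hourly rows up to daily totals, assuming the column layout described above (yyyy-mm-dd-hh, tab, UA string, tab, count). The sample rows are made up for illustration:

```shell
# Roll hourly (yyyy-mm-dd-hh <TAB> ua <TAB> count) rows up to daily totals.
# Sample input rows are invented; real data comes from crash_report_trend.tsv.gz.
daily=$(printf '2009-09-12-22\tBreakpad/1.0 (Linux)\t39\n2009-09-12-22\tBreakpad/1.0 (Windows)\t27611\n2009-09-13-01\tBreakpad/1.0 (Windows)\t100\n' |
  awk -F'\t' '{ total[substr($1, 1, 10)] += $3 }   # key on the yyyy-mm-dd prefix
               END { for (d in total) printf "%s\t%d\n", d, total[d] }' |
  sort)
printf '%s\n' "$daily"
# 2009-09-12	27650
# 2009-09-13	100
```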

For reference, here is the one-liner I used to create the file.  It would probably have been a little easier to read as a true Perl script, but...

zgrep -oP '(?<=" ")[^"]+(?=" ")' *.gz | pv -s $(gzip -l *.gz | tail -1 | awk '{print $2}') | sed -re 's/^access_(.............)[^:]+:(.*)/\1\t\2/' | sort -S 10G | uniq -c | awk '{printf "%s\t%s\t%d\n", $2, $3, $1}' | gzip > /etl/processed/crash_report_trend.tsv.gz

zgrep -oP
      -- only print the matching text, use Perl like syntax
'(?<=" ")[^"]+(?=" ")'
      -- A regex to match just the user agent string without the surrounding quotes (so -o gives me what I want)

pv -s $(gzip -l *.gz | tail -1 | awk '{print $2}')
      -- Pipe Viewer, to show status and throughput of the one-liner.  Since I can't alter the list of filenames zgrep is parsing (otherwise it wouldn't emit the "filename:" prefix), I calculate the total size of the stream manually with gzip -l

sed -re 's/^access_(.............)[^:]+:(.*)/\1\t\2/'
      -- Strip off the "filename:" part returning just the yyyy-mm-dd-hh and the UA

sort -S 10G | uniq -c
      -- Big fat sort buffer, then emit the unique rows with a count

awk '{printf "%s\t%s\t%d\n", $2, $3, $1}'
      -- Reorder the columns and make them tab separated

gzip > /etl/processed/crash_report_trend.tsv.gz
      -- Write it out to a gzip file.
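To make the two trickiest stages concrete, here they are run on made-up sample data. The log line and the access_* filename format are assumptions for illustration; the regexes are the ones from the one-liner above:

```shell
# 1. The UA-extraction regex, applied to a made-up access-log line where
#    the user agent sits between '" "' delimiters (needs GNU grep, whose
#    -P flag enables the Perl-style lookbehind/lookahead).
line='10.0.0.1 - - [12/Sep/2009:22:00:01 -0700] "POST /submit HTTP/1.1" 200 512 "-" "Breakpad/1.0 (Windows)" "-"'
ua=$(printf '%s\n' "$line" | grep -oP '(?<=" ")[^"]+(?=" ")')
printf '%s\n' "$ua"     # Breakpad/1.0 (Windows)

# 2. The sed stage, applied to a made-up "filename:match" line of the
#    kind zgrep emits; the 13 dots capture the yyyy-mm-dd-hh portion of
#    the (hypothetical) access_YYYY-MM-DD-HH... filename.
row=$(printf '%s\n' 'access_2009-09-12-22.log.gz:Breakpad/1.0 (Windows)' |
  sed -re 's/^access_(.............)[^:]+:(.*)/\1\t\2/')
printf '%s\n' "$row"
```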
Sorry, one more thing:  the requests don't carry much information about the product or version.  If it can't be derived from the user agent string, it can't be derived from the access log requests at all.
morgamic:

Could you review the user agent strings captured in this and suggest what kind of breakdowns would be useful to you (or anyone else) on this?

Metrics can work on capturing this data out of the log files on a continual basis, but we need to know the best way to parse this string to allow people to make sense of it.
Attachment #404855 - Flags: review?(morgamic)
p.s. the data starts at 2009-09-12-21 because those were the earliest logs I had available.  It looks like the logs get deleted less than a month after creation.
Assignee: nobody → morgamic
Status: NEW → ASSIGNED
Whiteboard: Waiting review
Shaver - here's what you asked for.  Daniel - I think Shaver should work with the Firefox team to provide smarter URLs, similar to AUS2, since we don't have much useful data in the URL currently (everything comes via POST).  Mike, is that feasible for 3.6, so we can get more data from the logs to cover the 100%-of-crashes information?
Comment on attachment 404855 [details]
Number of crash reports by user agent string by yyyy-mm-dd-hh

This is very useful.  Thanks, Daniel.  We should continue to trend this going forward, and if we get more accurate information from Firefox's crash URL we can adjust, but the UA string seems to do the trick.
Attachment #404855 - Flags: review?(morgamic) → review+
For now, if someone could review the UA strings that we currently receive and give us an idea of some logic we could apply to categorize them, we can easily turn this one-time processing into a continuously run process that we can trend.
Anything with CFNetwork is Mac -- the crash reporter on Mac doesn't properly set the UA string, so the default network handler's UA is used instead.  Unfortunately there's a lot of variance here, and we don't have product info.

Linux and Windows are easily identifiable:
   2009-09-12-22   Breakpad/1.0 (Linux)    39
   2009-09-12-22   Breakpad/1.0 (Windows)  27611

So, I think the best we can do now is crash volume over time per platform -- and you could get that by grouping the CFNetwork entries under the Mac umbrella and using Breakpad/1.0 ($platform) for the other two.
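A hypothetical classifier implementing the grouping above (any CFNetwork UA is grouped as Mac since the Mac crash reporter falls back to the default network handler's UA; Breakpad/1.0 ($platform) identifies the other two; the sample CFNetwork string is invented):

```shell
# Map a user agent string to a platform bucket per the grouping rule above.
classify_ua() {
  case "$1" in
    *CFNetwork*)                echo Mac ;;      # Mac crash reporter fallback UA
    'Breakpad/1.0 (Windows)'*)  echo Windows ;;
    'Breakpad/1.0 (Linux)'*)    echo Linux ;;
    *)                          echo Other ;;    # anything unrecognized
  esac
}

classify_ua 'CFNetwork/330.4 Darwin/9.8.0 (i386) iMac8,1'   # Mac (invented UA)
classify_ua 'Breakpad/1.0 (Windows)'                        # Windows
classify_ua 'Breakpad/1.0 (Linux)'                          # Linux
```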
Filed bug 521349 for the client enhancement. Please add any requirements there.
We delivered what we could for the time being.  Reopen if something is missing, or file a new bug for future enhancements.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: Waiting review
Target Milestone: 2010-07.1 → ---
Version: 0.1 → unspecified