Bug 519871
Opened 15 years ago
Closed 15 years ago
Get collector traffic (wc -l) over last 3 months
Categories
(Mozilla Metrics :: Data/Backend Reports, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Unreviewed
People
(Reporter: morgamic, Assigned: morgamic)
Details
Attachments
(1 file)
We're trying to profile crash volume and need to run a line count on access logs dating back as far as we can go. You guys have a script for this -- can we run this on crash-reports.m.... and get a simple line graph of count per day?
Comment 1•15 years ago
Can Metrics help here?
Assignee: server-ops → nobody
Group: infra
Component: Server Operations → Data/Backend Reports
Product: mozilla.org → Mozilla Stats
QA Contact: mrz → data-reports
Target Milestone: --- → 0.1
Version: other → 0.1
Comment 2•15 years ago
That should be fairly simple. Where do the logs sit, and what in the logs should we be looking for? (Or are we counting everything, wc -l style?)
Assignee | ||
Comment 3•15 years ago
Ideally we'd find a way to trend these logs over time, and if you have the ability to split by product, version and platform that's a definite bonus.
Comment 4•15 years ago
Tell me where that is and I'll see what I can do.
Full analysis of crash data is a big Q4 objective for us, but if I can quickly come up with something to answer this particular question, I'm happy to help.
Comment 5•15 years ago
In about half an hour, the following file will be finished writing out:
cm-metricsetl02:/etl/processed/crash_report_trend.tsv.gz
It contains three columns,
yyyy-mm-dd-hh user_agent_string count_requests
Currently, we're averaging about 150 unique user agent strings. The Mac crash reporters tend to include the precise machine type (e.g. iMac, MacBook, MacBookPro) plus the kernel version number, which leads to fragmented records.
The file may well be useful to you as is; if not, Pedro or I can easily take it and run a real ETL process on it to properly tokenize and cluster the UA strings. Just let us know what type of analysis you want to do.
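As a rough sketch of using the file as is, the hourly rows can be rolled up into one total per day, which is the series a simple count-per-day line graph would need. The function name `daily_totals` is hypothetical; the only thing assumed about the input is the three tab-separated columns described above.

```shell
#!/usr/bin/env bash
# Sum the hourly rows into one total per day.
# Input (stdin):  yyyy-mm-dd-hh <TAB> user_agent_string <TAB> count_requests
# Output:         yyyy-mm-dd <TAB> total_requests, sorted by day
daily_totals() {
    awk -F'\t' '
        # The first 10 characters of column 1 are the yyyy-mm-dd part.
        { day = substr($1, 1, 10); total[day] += $3 }
        END { for (d in total) printf "%s\t%d\n", d, total[d] }
    ' | sort
}
```

It could be fed with something like `zcat /etl/processed/crash_report_trend.tsv.gz | daily_totals`.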
For reference, here is the one-liner I used to create the file. It probably would have been a little easier to read if I had done a true perl script but...
zgrep -oP '(?<=" ")[^"]+(?=" ")' *.gz | pv -s $(gzip -l *.gz | tail -1 | awk '{print $2}') | sed -re 's/^access_(.............)[^:]+:(.*)/\1\t\2/' | sort -S 10G | uniq -c | awk '{printf "%s\t%s\t%d\n", $2, $3, $1}' | gzip > /etl/processed/crash_report_trend.tsv.gz
zgrep -oP
-- only print the matching text, use Perl like syntax
'(?<=" ")[^"]+(?=" ")'
-- A regex to match just the user agent string without the surrounding quotes (so -o gives me what I want)
pv -s $(gzip -l *.gz | tail -1 | awk '{print $2}')
-- pipeviewer to give me status and performance of the oneliner. Since I can't muck with the list of filenames that zgrep is parsing (otherwise, it wouldn't emit the "filename:" prefix), I manually calculate the total size of the stream with gzip -l
sed -re 's/^access_(.............)[^:]+:(.*)/\1\t\2/'
-- Strip off the "filename:" part returning just the yyyy-mm-dd-hh and the UA
sort -S 10G | uniq -c
-- Big fat sort buffer, then emit the unique rows with a count
awk '{printf "%s\t%s\t%d\n", $2, $3, $1}'
-- Reorder the columns and make them tab separated
gzip > /etl/processed/crash_report_trend.tsv.gz
-- Write it out to a gzip file.
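One caveat on the awk step: awk splits fields on any whitespace by default, so with the command as posted `$3` would hold only the first word of a user agent string that contains spaces. The rows quoted in comment 12 show complete strings, so the attached file may have been produced slightly differently. A minimal sketch of a reorder step that keeps the whole line, assuming `uniq -c` output of the form "padding, count, one space, original line" (the function name `reorder_counts` is hypothetical):

```shell
#!/usr/bin/env bash
# Move the count that `uniq -c` prepends to the end of each line,
# leaving the original tab-separated line (including spaces in the UA) intact.
reorder_counts() {
    awk '{
        count = $1                      # first whitespace-delimited field is the count
        line = $0
        sub(/^ *[0-9]+ /, "", line)     # drop the padded count prefix
        printf "%s\t%d\n", line, count
    }'
}
```

In the pipeline it would stand in for the awk step: `... | sort -S 10G | uniq -c | reorder_counts | gzip > ...`.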
Comment 6•15 years ago
Sorry, one more thing: the requests don't carry much information about the product or version. If it can't be derived from the user agent string, it can't be derived from the access log requests at all.
Comment 7•15 years ago
morgamic:
Could you review the user agent strings captured in this and suggest what kind of breakdowns would be useful to you (or anyone else) on this?
Metrics can work on capturing this data out of the log files on a continual basis, but we need to know the best way to parse this string to allow people to make sense of it.
Attachment #404855 -
Flags: review?(morgamic)
Comment 8•15 years ago
p.s. the data starts at 2009-09-12-21 because those were the earliest logs I had available. It looks like the logs get deleted less than a month after creation.
Updated•15 years ago
Assignee: nobody → morgamic
Status: NEW → ASSIGNED
Whiteboard: Waiting review
Assignee | ||
Comment 9•15 years ago
Shaver - here's what you asked for. Daniel - I think Shaver should work with the Firefox team to provide smarter URLs, similar to AUS2, since we currently don't have much useful data in the URL (everything comes via POST). Mike, is that feasible for 3.6, so we can get more data from the logs covering 100% of crashes?
Assignee | ||
Comment 10•15 years ago
Comment on attachment 404855 [details]
Number of crash reports by user agent string by yyyy-mm-dd-hh
This is very useful. Thanks, Daniel. In the future we should continue to trend this, and if we get more accurate information from Firefox's crash URL we can modify it, but the UA string seems to do the trick.
Attachment #404855 -
Flags: review?(morgamic) → review+
Comment 11•15 years ago
For now, if someone could review the UA strings that we currently receive and give us an idea of some logic we could apply to categorize them, we can easily turn this one-time processing into a continuously run process that we can trend.
Assignee | ||
Comment 12•15 years ago
Anything with CFNetwork is Mac -- the crash reporter on Mac doesn't properly set the UA string so the default network handler is using its UA. Unfortunately there's a lot of variance here, but we don't have product info.
Linux and Windows are easily identifiable:
283 2009-09-12-22 Breakpad/1.0 (Linux) 39
284 2009-09-12-22 Breakpad/1.0 (Windows) 27611
So, I think the best we can do now is crash volume over time per platform -- and you could get that by grouping CFNetwork stuff under the Mac umbrella and using Breakpad/1.0 ($platform) for the other two.
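The grouping described above could be sketched roughly as follows, assuming the three-column TSV from comment 5 as input. The function name `platform_totals` and the "Other" bucket for unmatched strings are assumptions, not something specified in this bug.

```shell
#!/usr/bin/env bash
# Sum request counts per hour and platform.
# Input (stdin):  yyyy-mm-dd-hh <TAB> user_agent_string <TAB> count_requests
# Output:         yyyy-mm-dd-hh <TAB> platform <TAB> total_requests, sorted
platform_totals() {
    awk -F'\t' '
        {
            ua = $2
            if (index(ua, "CFNetwork") > 0)                    platform = "Mac"
            else if (index(ua, "Breakpad/1.0 (Linux)") == 1)   platform = "Linux"
            else if (index(ua, "Breakpad/1.0 (Windows)") == 1) platform = "Windows"
            else                                               platform = "Other"   # assumed catch-all
            total[$1 "\t" platform] += $3
        }
        END { for (k in total) printf "%s\t%d\n", k, total[k] }
    ' | sort
}
```

Using plain substring checks with `index` avoids having to escape the parentheses and slash in a regular expression; any string that matches none of the three rules lands in the catch-all bucket so it stays visible rather than being dropped.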
Comment 13•15 years ago
Adding Ted, re comment 9.
Comment 14•15 years ago
Filed bug 521349 for the client enhancement. Please add any requirements there.
Comment 15•15 years ago
We delivered what we could for the time being. Reopen if something is missing or file a new bug for future enhancement.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: Waiting review
Updated•13 years ago
Target Milestone: 2010-07.1 → ---
Updated•13 years ago
Version: 0.1 → unspecified