Bug 411358 - Show top crashes by URL
Status: RESOLVED FIXED
Product: Socorro
Classification: Server Software
Component: General
Version: Trunk
Platform: All All
Importance: P1 normal
Target Milestone: 0.6
Assigned To: Austin King [:ozten]
QA Contact: socorro
Mentors:
http://code.google.com/p/socorro/issu...
Depends on:
Blocks: 470524 470525 470526 470527 470561 470563
Reported: 2008-01-08 15:06 PST by Samuel Sidler (old account; do not CC)
Modified: 2011-12-28 10:40 PST


Attachments
daily cron job for top crashers by domain (25.69 KB, patch)
2008-12-12 11:59 PST, Austin King [:ozten]
top crashers by url DB tables DDL and DML (in progress) (7.43 KB, application/octet-stream)
2008-12-12 12:04 PST, Austin King [:ozten]
php code for viewing grouped by url or domain (29.60 KB, patch)
2008-12-15 10:21 PST, Austin King [:ozten]
Updates to python and php code. combined in one patch. (68.19 KB, patch)
2008-12-17 10:08 PST, Austin King [:ozten]
Updated with Lars feedback (63.75 KB, patch)
2008-12-17 17:46 PST, Austin King [:ozten]
lars: review+
morgamic: review+
This is the SQL for creating new db tables (3.93 KB, text/plain)
2008-12-17 17:48 PST, Austin King [:ozten]
This is the SQL for rolling back this deployment of new tables (287 bytes, text/plain)
2008-12-17 17:49 PST, Austin King [:ozten]

Description Samuel Sidler (old account; do not CC) 2008-01-08 15:06:06 PST
Reported by morgamic, Jul 10, 2007

Need to show crashes by operating system and/or platform on the top crasher
page and possibly the main query page.

--

Comment 5 by morgamic, Aug 03, 2007

OS was done, we need something that shows top crashes by URL now.
Comment 1 Michael Morgan [:morgamic] 2008-02-29 04:51:28 PST
Do we still want to show this report despite concerns about privacy issues?
Comment 2 Samuel Sidler (old account; do not CC) 2008-02-29 20:09:37 PST
We need this report. If it has to be behind a secure part of the site, so be it. But we need this.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2008-04-04 06:25:57 PDT
so here's a query that will get you URLs appearing in >10 crash reports, but strips off query strings:

select split_part(url, '?', 1) as url_part, count(*) as c
from reports
where url is NOT NULL and url != ''
group by url_part
having count(*) > 10;

(needs a date limiter, obviously)

The downside is that you can't actually get to individual crash reports from that.
Comment 4 timeless 2008-04-04 07:06:16 PDT
can you limit them to crashes that happen for more than 3 users? :)
Comment 5 Ted Mielczarek [:ted.mielczarek] 2008-04-04 07:20:32 PDT
select split_part(url, '?', 1) as url_part, count(*) as c, count(distinct user_id) as users
from reports
where url is NOT NULL and url != ''
group by url_part
having count(*) > 10 and count(distinct user_id) > 3;

Seems to work. I don't know how much the database will hate that query though. :)
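
For readers who want to see what the query above computes without a database at hand, here is a minimal Python sketch of the same logic. It is an illustration, not code from any Socorro patch: the function names `strip_query` and `top_urls` and the sample data are hypothetical, and the thresholds simply mirror the `> 10` reports and `> 3` distinct users from the query.

```python
from collections import defaultdict


def strip_query(url):
    """Mirror split_part(url, '?', 1): keep everything before the first '?'."""
    return url.split("?", 1)[0]


def top_urls(reports, min_reports=10, min_users=3):
    """reports: iterable of (url, user_id) pairs.

    Returns {url_part: (report_count, distinct_users)} for URLs that appear in
    more than min_reports reports from more than min_users distinct users.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for url, user_id in reports:
        if not url:  # skip NULL and empty URLs, as the WHERE clause does
            continue
        part = strip_query(url)
        counts[part] += 1
        if user_id is not None:  # count(distinct ...) ignores NULLs
            users[part].add(user_id)
    return {
        part: (n, len(users[part]))
        for part, n in counts.items()
        if n > min_reports and len(users[part]) > min_users
    }


# Hypothetical sample data: 12 reports from 4 users on one page,
# 11 reports from a single user on another, plus NULL/empty URLs.
sample = [("http://example.com/page?id=%d" % i, "user%d" % (i % 4)) for i in range(12)]
sample += [("http://example.org/rare", "user0")] * 11
sample += [(None, "user1"), ("", "user2")]

result = top_urls(sample)
```

In this sample only `http://example.com/page` survives both thresholds; the second URL has enough reports but only one user, which is the case comment 4 asks to exclude.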
Comment 6 Ted Mielczarek [:ted.mielczarek] 2008-04-04 07:20:56 PDT
CCing morgamic as he's my DBA guru.
Comment 7 Michael Morgan [:morgamic] 2008-05-27 09:54:19 PDT
Justin could you take a look at this?  Summary of requirements:
* for a given crash signature, show a list of URLs that occurred 3 or more times
* strip off the query arguments
* start off with the assumption that you'll be doing this the aggregate way
Comment 8 Ted Mielczarek [:ted.mielczarek] 2008-05-27 09:57:24 PDT
Comment 5 contains a query that will generate a "top URLs" list containing URLs included in >10 crash reports from >3 unique users.
Comment 9 Justin Gallardo 2008-05-27 10:22:27 PDT
Sure thing. I will start working on something for this right now.
Comment 10 Henrik Skupin (:whimboo) 2008-11-21 14:55:21 PST
Can we get an update on this feature?
Comment 11 Austin King [:ozten] 2008-11-21 14:57:38 PST
It is on my TODO list and should be shipped by 12/15.
Comment 12 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-12-08 12:09:09 PST
From bug 415027:
----
Comment #10 From  Austin King   2008-12-08 11:26:03 PST

Wireframe for report:
http://people.mozilla.org/~aking/Socorro/TCbyURL/TCByURL-wireframe.jpg

Comment #11 From Samuel Sidler (:ss | :sps) 2008-12-08 11:29:36 PST

Actually, comment 10 should be in bug 411358.
----

As far as I can tell from that mockup, the crash reports displayed under each URL aren't grouped by crash signature, which seems...not very useful.  

It's possible--I'd even say extremely likely--that a page like myspace.com could trigger crashes in WMP/F4M, Flash, even somewhere in layout or the parser, and the current mockup looks like it just tells us "myspace.com crashes a lot" instead of "myspace.com crashes a lot in WMP/F4M, myspace.com crashes a lot in Flash, myspace.com crashes a lot in this particular function in the html parser, myspace.com crashes a lot in function A in layout, and myspace.com crashes a lot in function B in layout".

Grouping by crash signature within the URL report makes these reports much more useful for QA, who'd otherwise have to dig back through all the comments figuring out which crashes appeared, and appeared most, on a page/site.  (It also happens to be how Talkback presents this information, which was one of the parts of Talkback that worked :P )
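
The grouping asked for here (crash signatures broken down within each URL) can be sketched in a few lines of Python. This is only an illustration of the idea under assumed inputs: the function name `signatures_by_url` and the signature strings are hypothetical, not taken from the report code.

```python
from collections import Counter, defaultdict


def signatures_by_url(crashes):
    """crashes: iterable of (url, signature) pairs.

    Returns {url: [(signature, count), ...]} with the most frequent
    signature first; ties are broken by signature name for stable output.
    """
    per_url = defaultdict(Counter)
    for url, signature in crashes:
        per_url[url][signature] += 1
    return {
        url: sorted(sigs.items(), key=lambda item: (-item[1], item[0]))
        for url, sigs in per_url.items()
    }


# Hypothetical signatures, chosen only to echo the myspace.com example above.
crashes = (
    [("myspace.com", "Flash@0x1234")] * 3
    + [("myspace.com", "nsHTMLParser::Parse")] * 2
    + [("myspace.com", "WMP@0x42")]
)

report = signatures_by_url(crashes)
```

With this shape the report can say "myspace.com crashes a lot in Flash" rather than only "myspace.com crashes a lot".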
Comment 13 Austin King [:ozten] 2008-12-08 12:56:37 PST
Thanks Smokey for c12.
Updated with a per-signature breakdown below each URL.

http://people.mozilla.org/~aking/Socorro/TCbyURL/TCByURL-wireframeV2.jpg
Comment 14 Henrik Skupin (:whimboo) 2008-12-08 14:15:45 PST
So you show the first stack frame only? Wouldn't it be nice to also have the first frame with symbols available? I ask because of the example kernel32.dll in your mockup.
Comment 15 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2008-12-08 14:19:18 PST
It won't be the first frame, it will be the crash signature. See bug 411349 about work to make the signatures more meaningful when the top frames aren't very unique.
Comment 16 Henrik Skupin (:whimboo) 2008-12-08 14:30:16 PST
Thanks Benjamin. That sounds good. So forget my last comment...
Comment 17 Austin King [:ozten] 2008-12-08 15:11:15 PST
In V2 I am aggregating by domain, then urls, then crash signatures.

Here is V3 which just starts with urls
http://people.mozilla.org/~aking/Socorro/TCbyURL/TCByURL-wireframeV3.jpg
So the myspace.com domain might show up at #3, #17, and #50 in top crashes by url.

Is V3 more useful than V2?

Background:
This Top Crashers by URL depends on two features we don't have yet, authentication ( to display full urls and link to reports ) and search by domain/url.

One of the original constraints was thinking about the page for logged-in versus non-logged-in users. Aggregating by domains would be the public view. If you were logged in, then you could drill down into URLs.
Comment 18 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-12-08 17:18:16 PST
I think I like v2 better than v3, but that may just be a bias towards expecting a domain to have similar types of content/crashes across all of its pages. My only concern with v2 over v3 is that v2 could "hide" a relatively large crash on, say, not-quite-myspace-popular-but-still-important.com because of the volume of pages on myspace.com and other large sites.

That is, myspace.com in aggregate has 500 crashes, but 50 of those come from myspace.com/crashme and the other 450 are spread across 100 pages (~3 crashes/other page).  facebook.com, flickr.com, yahoo.com, and so forth are also in the same situation.  Then not-quite-myspace-popular-but-still-important.com (mail.google.com, perhaps) has 99 crashes, all on the same page (in this case by virtue of stripping query strings, etc.), but it's far down the domain list because of the number of "large-volume-of-pages" sites.  I don't know how common this case might be, but it is something that came to mind when considering v2.
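
The numbers in this scenario can be worked through directly. The sketch below is a hypothetical reconstruction: the page names are made up, and the 450 "other" myspace.com crashes are split as 50 pages with 5 crashes and 50 pages with 4, since 450 does not divide evenly across 100 pages.

```python
from collections import Counter

# Per-URL crash counts for the scenario (URLs are hypothetical and scheme-less).
url_counts = Counter({"myspace.com/crashme": 50, "mail.google.com/mail": 99})
for i in range(100):
    # 50 pages * 5 crashes + 50 pages * 4 crashes = 450 crashes
    url_counts["myspace.com/page%d" % i] = 5 if i < 50 else 4


def domain_of(url):
    """Domain part of a scheme-less URL such as 'myspace.com/crashme'."""
    return url.split("/", 1)[0]


# Roll the per-URL counts up to per-domain counts (the v2-style view).
domain_counts = Counter()
for url, n in url_counts.items():
    domain_counts[domain_of(url)] += n

by_domain = domain_counts.most_common()  # v2-style ranking
by_url = url_counts.most_common(2)       # v3-style ranking, top two entries
```

Ranked by domain, myspace.com (500) sits far above mail.google.com (99); ranked by URL, the single mail.google.com page (99) outranks myspace.com/crashme (50). With more large multi-page sites in the data, the domain view would push the 99-crash page further down, which is the concern raised here.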
Comment 19 Austin King [:ozten] 2008-12-12 11:59:51 PST
Created attachment 352767
daily cron job for top crashers by domain
Comment 20 Austin King [:ozten] 2008-12-12 12:04:18 PST
Created attachment 352769
top crashers by url DB tables DDL and DML (in progress)

These are the CREATE statements needed for:
Dimension tables:
signaturedims
urldims

Fact Table:
topcrashurlfacts

Config Tables:
tcbyurlconfig

TODO: I don't have all the constraints and indexes in place. This file is my working scratch file; I would clean it up or integrate it with schema.py (???)

References to productdims are from MTBF patches
https://bugzilla.mozilla.org/show_bug.cgi?id=411424
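
To make the dimension/fact split concrete, here is a small self-contained sketch using Python's built-in `sqlite3` module. Only the table names `signaturedims`, `urldims`, and `topcrashurlfacts` come from this comment; every column name below is an assumption made for illustration, the real DDL is in the attachment, and the production database is PostgreSQL rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns: the attachment holds the real definitions.
conn.executescript("""
CREATE TABLE signaturedims (
    id INTEGER PRIMARY KEY,
    signature TEXT UNIQUE NOT NULL
);
CREATE TABLE urldims (
    id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL
);
CREATE TABLE topcrashurlfacts (
    id INTEGER PRIMARY KEY,
    day TEXT NOT NULL,
    urldims_id INTEGER NOT NULL REFERENCES urldims(id),
    signaturedims_id INTEGER NOT NULL REFERENCES signaturedims(id),
    crash_count INTEGER NOT NULL
);
""")

# One dimension row each, and one fact row pointing at them.
conn.execute("INSERT INTO signaturedims (id, signature) VALUES (1, 'Flash@0x1234')")
conn.execute(
    "INSERT INTO urldims (id, domain, url) "
    "VALUES (1, 'example.com', 'http://example.com/a')"
)
conn.execute(
    "INSERT INTO topcrashurlfacts (day, urldims_id, signaturedims_id, crash_count) "
    "VALUES ('2008-08-24', 1, 1, 7)"
)

# The report query joins the fact table back to its dimensions.
rows = conn.execute("""
    SELECT u.url, s.signature, SUM(f.crash_count)
    FROM topcrashurlfacts f
    JOIN urldims u ON u.id = f.urldims_id
    JOIN signaturedims s ON s.id = f.signaturedims_id
    GROUP BY u.url, s.signature
""").fetchall()
```

The point of the layout is that each distinct URL and signature string is stored once in a dimension table, while the fact table holds only small integer keys and a count per day.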
Comment 21 Austin King [:ozten] 2008-12-15 10:21:38 PST
Created attachment 353074
php code for viewing grouped by url or domain

This code can be previewed on my dev instance
http://aking.khan.mozilla.org/reporter/topcrasher/byurl/Firefox/3.0.1

(requires VPN sorry)
Comment 22 Austin King [:ozten] 2008-12-16 22:03:40 PST
Staging notes:
Running against 8/24 which has 85K report rows across all products and
77K for Firefox 3.0.1
49K with non null url + sig
18mb 26mb 35mb 50mb 48mb 
0 cpu 10 cpu 1
4:11 - 4:30

died on bad column name ( comments now user_comments )
rerunning ( will revert staging only code before checkin )

ran 8/23 with tons of logging
50k records non null url + sig
took 19.5 minutes

... Adding 3.0 to the mix Prod id 7 Firefox 3.0 ALL
ran 8/22 
53k records non null url + sig
took 21 minutes

Disabled via config and
ran 8/21
Exited very quickly, no facts created

Enabled configs and 
ran 8/21

Tue Dec 16 19:45:09 PST 2008
8:08
23 minutes...

So for 4 days of data...
topcrashurlfacts        - 17809408 (17 MB)
topcrashurlfactsreports - 155648 (152 KB)
urldims                 - 13877248 (13 MB)
signaturedims           - 1245184 (1.2 MB)

... Changed code to record aggregate info for facts where there are
more than 1 crash ( head of long tail )

memory - 8856 kb ( stayed under 10 MB )
Tue Dec 16 20:49:10 PST 2008
Tue Dec 16 20:50:46 PST 2008

holy crap!


Deleted all facts, urldims, signaturedims...

Ran 8/22 
1 min 15 seconds

Found bug... '\n' comments should be filtered out of
topcrashurlfactsreports

Ran 8/23
1 min 9 seconds

Ran 8/24
1 min 7 seconds

Table sizes for 2 products across 4 days are now:
topcrashurlfacts - 1695744 (1.6 MB)
topcrashurlfactsreports - 24576 (24 KB)
urldims - 1056768 (1 MB)
signaturedims - 1245184 (1.2 MB)
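
The change noted above, recording aggregate facts only where there is more than one crash, is what shrank the tables and the run time. A minimal sketch of that filter follows, assuming facts are keyed by (url, signature); the function name `head_of_long_tail` and the sample data are hypothetical and not taken from the patch.

```python
from collections import Counter


def head_of_long_tail(pairs, min_crashes=2):
    """Count (url, signature) pairs and keep only those seen at least
    min_crashes times, dropping the long tail of one-off crashes."""
    counts = Counter(pairs)
    return {key: n for key, n in counts.items() if n >= min_crashes}


# Hypothetical day of data: one repeated crash plus 1000 one-off URLs.
pairs = [("http://example.com/a", "sigA")] * 3
pairs += [("http://example.com/u%d" % i, "sigA") for i in range(1000)]

candidates = len(Counter(pairs))   # rows if every pair were recorded
facts = head_of_long_tail(pairs)   # rows actually recorded
```

In this sample 1001 candidate rows collapse to a single fact row, the same kind of reduction seen in the table sizes above.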
Comment 23 Austin King [:ozten] 2008-12-16 22:09:52 PST
Comments:
18mb 26mb 35mb 50mb 48mb 
0 cpu 10 cpu 1

and 
memory - 8856 kb ( stayed under 10 MB )

are about python's Res memory and % CPU.
Comment 24 Austin King [:ozten] 2008-12-17 10:08:09 PST
Created attachment 353471
Updates to python and php code. combined in one patch.

Working on an updated SQL script.
The CSS changes are in the MTBF patch Bug 411424
Comment 25 Austin King [:ozten] 2008-12-17 17:46:06 PST
Created attachment 353592
Updated with Lars feedback
Comment 26 Austin King [:ozten] 2008-12-17 17:48:26 PST
Created attachment 353594
This is the SQL for creating new db tables
Comment 27 Austin King [:ozten] 2008-12-17 17:49:04 PST
Created attachment 353595
This is the SQL for rolling back this deployment of new tables
Comment 28 K Lars Lohn [:lars] [:klohn] 2008-12-17 18:38:01 PST
Comment on attachment 353592
Updated with Lars feedback

I've reviewed and approved the Python code based on the idea that it be revisited later for some housecleaning and refactoring.
Comment 29 Austin King [:ozten] 2008-12-18 11:02:19 PST
2 products 8-22
First run on old partitioning scheme took 58 minutes ( instead of 67 seconds,
or 20 minutes for the pre-optimized script )

2 product 8-23 52 minutes
2008-12-17 23:26:30,592 INFO - done.
2008-12-17 22:34:26,38

3 products 8-24 timed out - could be my problem, didn't use nohup...

trying by dropping index, then rebuilding index
3 products 8-24 2 hours 38 minutes

rebuilding indexes takes 500 millis

Will continue testing. Conclusions so far:
This is in our "bad performance, but good enough to ship with" range. MTBF runs
in under a minute.

We will want to be careful with the number of builds we calculate "top
crashers by url" for, specifically major releases, which generate a lot of rows
in reports. For less-used builds it isn't an issue: running against 3.0b3 for
a day's worth of data took only 1 second.
Comment 30 Michael Morgan [:morgamic] 2008-12-18 18:28:24 PST
UX/Polish
- put link in brackets - like [ link ] to space it out from the actual url
- is there a reason why the signatures are not linked?  might be useful
- comment links look good! woot.
- would be cool if the signatures under a domain were indented somehow but that's minor

Code:
- PHP looks good, let's kick it out there and polish it as we get feedback

Austin - sorry I was not able to review this more closely, I ran out of time this week.
Comment 31 Michael Morgan [:morgamic] 2008-12-18 18:29:30 PST
Comment on attachment 353592
Updated with Lars feedback

Let's get it out the door and in front of some eyes.
Comment 32 Austin King [:ozten] 2008-12-19 13:05:15 PST
I will write up some documentation but here is a soft launch for Top Crashers by URL...
http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.0.5
http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.1b3pre
http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.1b2
http://crash-stats.mozilla.com/topcrasher/byurl/Firefox/3.0.6pre - not enough crashes on same url to make the report...

Details explaining why 3.0.6pre is empty are coming to the Socorro code wiki page.
Comment 33 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-12-19 21:44:06 PST
Austin, this looks good!  I've filed some smaller things as follow-ups, bug 470524, bug 470525, and bug 470526 (a couple of them I stole from morgamic in comment 30).
Comment 34 Smokey Ardisson (offline for a while; not following bugs - do not email) 2008-12-19 22:09:04 PST
...and bug 470527 on a random failure to show signatures when expanding URLs in the bydomain report.
Comment 35 Henrik Skupin (:whimboo) 2008-12-20 01:18:38 PST
Let's add all of these follow-up bugs to the dependency list. It looks really great!
