Closed Bug 667033 Opened 9 years ago Closed 9 years ago

"Crashes Per User" doesn't count content crashes

Categories

(Socorro :: General, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Assigned: lonnen)

References

Details

Attachments

(1 file, 4 obsolete files)

When I tried to find in the source what actual data we are using for the "Crashes Per User" graphs, I found that we never count content crashes there, so most Fennec crashes currently are not visible in any graph. We need to find some way to make that available. It might even make sense to display browser+content crashes as the default graph for Fennec.
This is probably the most important bug in 2.1.

Kairo, do you want all graphs to show browser+content crashes?  Or just Fennec?
Assignee: nobody → chris.lonnen
Severity: normal → blocker
Priority: -- → P1
Target Milestone: --- → 2.1
(In reply to comment #1)
> This is probably the most important bug in 2.1.

In terms of content crashes, this is the highest priority, I guess, yes.

> Kairo, do you want all graphs to show browser+content crashes?  Or just
> Fennec?

We can do it everywhere as currently Firefox has 0 content crashes, but they will come along at some date, and they're quite important to be listed.
The ADU estimates were being under-reported before. This patch corrects the numbers returned by the web service.

Kairo, I cannot verify that we are not counting content crashes. Can you point me to where you're seeing that?
I had a hard time deciphering those queries in detail, but to me, it very much looks like adu_codes.CRASH_BROWSER is set in socorro/cron/daily_crash.py to be the number of reports that are crashes (hangid is null) from browser processes (process_type is null). That excludes content crashes. And the numbers in https://crash-stats.mozilla.com/daily?p=Fennec&v[]= fit to my data on Fennec reports of browser processes, excluding content processes (which account for almost double the number as browser processes).
When report_type is specified as 'any' or 'all' the sql query will no longer filter the results on hang_type.
Attachment #546939 - Attachment is obsolete: true
Attachment #547067 - Flags: review?(laura)
removed the debug line I left in the last patch
Attachment #547067 - Attachment is obsolete: true
Attachment #547067 - Flags: review?(laura)
Going to have to dig deeper.
Comment on attachment 546939 [details] [diff] [review]
Corrects the adu estimates for all reports

This fixes this problem, but we need to audit daily_crashes (on both the Python and PHP sides) for correct handling of content crashes too.  

I'll hold off on the r+ until that work is complete.
This includes Lars's patch to daily_crash so that it will support jetpack and content crashes and my modifications to make the web service to respond with all crashes instead of filtering out two types when asked for all. After the corrected cron starts running the charts will have accurate numbers.
Attachment #547070 - Attachment is obsolete: true
Attachment #547205 - Flags: review?(laura)
Comment on attachment 547205 [details] [diff] [review]
add support for jetpack and content crashes in the daily_crash cron

Review of attachment 547205 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.   Consider getting Josh to check the perf impact on the SQL changes.
Attachment #547205 - Flags: review?(laura) → review+
After speaking with KaiRo, we've modified what is returned from the web service to avoid counting the same crashes multiple times.
Attachment #547205 - Attachment is obsolete: true
Comment on attachment 547262 [details] [diff] [review]
changes to exactly what 'all' means

Review of attachment 547262 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #547262 - Flags: review+
r3280 http://code.google.com/p/socorro/source/detail?r=3280
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Robert, is this something you can test, or at least help QA test?  Thanks!
(In reply to comment #14)
> Robert, is this something you can test, or at least help QA test?  Thanks!

Where do I find something that already runs it and has at least enough data so I can verify that the numbers make sense?
(In reply to comment #15)
> (In reply to comment #14)
> > Robert, is this something you can test, or at least help QA test?  Thanks!
> 
> Where do I find something that already runs it and has at least enough data
> so I can verify that the numbers make sense?

Ah, doh; I don't know.  This might take a push, and then a little wait until production crons have run, I guess?
So we need to wait for production to really test?
(In reply to comment #17)
> So we need to wait for production to really test?

I was guessing :-(  It'd be good if we have STR on staging, to verify.
Kairo, is there enough here: https://crash-stats.allizom.org/daily?p=Fennec&v[]= for us to do anything useful with?
(In reply to comment #19)
> Kairo, is there enough here:
> https://crash-stats.allizom.org/daily?p=Fennec&v[]= for us to do anything
> useful with?

Not really. Those numbers are nearer to what I'd expect, but still lower than those I have, not matching any of those I have at all. But I guess that's just because it's not a full sample that is there.
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.