Analyze duplicate data

RESOLVED FIXED in 2.0

Status

Socorro
General
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: laura, Unassigned)

Details

(Reporter)

Description

7 years ago
As a result of bug 629088, we now have data on which crashes look like dupes (according to our first run algorithm, anyway).

I've noticed from looking at the UI on staging that there are some crashes that have a lot of dupes, and some that have none.  Finding the pattern may help to solve the underlying problem.

Let's run some queries against the PostgreSQL database to slice dupe and non-dupe crashes in different ways: by version, by platform, by time since startup, perhaps?  (Ted noticed at least one of the dupe crashes was a startup crash)
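
As a rough illustration of the kind of slicing meant here, the sketch below computes the share of crashes flagged as dupes per group. It works on in-memory rows rather than the real PostgreSQL tables; the field names (`version`, `platform`, `is_dupe`) and the sample rows are assumptions made for the example, not the actual Socorro schema or data.

```python
from collections import defaultdict


def dupe_rate_by(crashes, key):
    """Return {group value: fraction of crashes in that group flagged as dupes}."""
    totals = defaultdict(int)
    dupes = defaultdict(int)
    for crash in crashes:
        group = crash[key]
        totals[group] += 1
        if crash["is_dupe"]:
            dupes[group] += 1
    return {group: dupes[group] / totals[group] for group in totals}


# Hypothetical sample rows, not real crash data.
sample = [
    {"version": "4.0b11", "platform": "win", "is_dupe": True},
    {"version": "4.0b11", "platform": "mac", "is_dupe": False},
    {"version": "3.6.13", "platform": "win", "is_dupe": False},
    {"version": "3.6.13", "platform": "win", "is_dupe": False},
]

print(dupe_rate_by(sample, "version"))
print(dupe_rate_by(sample, "platform"))
```

The same function can be pointed at any column we decide to slice on, so the list of dimensions can grow without changing the code.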

We could also do a more complex clustering analysis using HBase.  

Crashkill team: let us know what you think.

Comment 1

7 years ago
Yeah, some additional analysis would definitely help to figure out the problem.

We notice this a lot on mozilla-central/nightly builds, where the volume is low and the dups are easier to spot. The proposed thesis, that duplication is far more prevalent on some signatures than on others, fits with anecdotal examination of the data.

It would be good to see whether the pattern/volume continues from nightlies into beta and final release, and also during the first few days of the unthrottled 4.0 release data collection.

Josh also mentioned the possibility of looking for OS version dependencies. Win, Mac, and Linux would be interesting, but it would probably be more useful to go down to OS version info, rounding off to the minor level, like:

win  5.1
win  6.0
win  6.1
mac 10.4
mac 10.5
mac 10.6
lin  2.6
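
A minimal sketch of that rounding, assuming the OS name and version arrive as free-form strings. The short-name mapping and the example version strings are assumptions for illustration, not confirmed values from the crash data.

```python
import re

# Hypothetical mapping from reported OS names to the short labels used above.
SHORT_NAMES = {"Windows NT": "win", "Mac OS X": "mac", "Linux": "lin"}

# Leading "major" or "major.minor" digits; anything after that is ignored.
_VERSION_RE = re.compile(r"\s*(\d+)(?:\.(\d+))?")


def os_bucket(os_name, os_version):
    """Collapse an OS name and full version string to 'name major.minor'."""
    short = SHORT_NAMES.get(os_name, os_name.lower())
    match = _VERSION_RE.match(os_version)
    if match is None:
        # No leading version number found; keep the row but mark it.
        return f"{short} unknown"
    major, minor = match.groups()
    version = major if minor is None else f"{major}.{minor}"
    return f"{short} {version}"


print(os_bucket("Windows NT", "5.1.2600 Service Pack 3"))
print(os_bucket("Linux", "2.6.35-25-generic"))
```

Grouping on the returned label instead of the raw version string keeps service packs and kernel patch levels from splitting the counts into tiny buckets.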

Comment 2

7 years ago
Also correlation to startup crashes vs. other crashes; a high percentage of crashes happen within 3 minutes of start.

Comment 3

7 years ago
We'd expect the percentage of startup crashes among the "dupes" to be significantly higher than in the rest of crashes (due to people going into a cycle of trying to start Firefox and crashing, trying to restart, etc.) - but it would be interesting to see actual data on that.
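
One way to check this would be to compare the startup share in the two groups directly. The sketch below assumes each row carries an `is_dupe` flag and an `uptime_seconds` value (both hypothetical field names) and uses the three-minute window from comment 2 as the startup cutoff; the sample rows are made up.

```python
STARTUP_WINDOW_SECONDS = 180  # three minutes, as suggested in comment 2


def startup_share(crashes):
    """Return the fraction of startup crashes among dupes and among the rest."""
    # is_dupe -> [startup crash count, total crash count]
    counts = {True: [0, 0], False: [0, 0]}
    for crash in crashes:
        bucket = counts[bool(crash["is_dupe"])]
        bucket[1] += 1
        if crash["uptime_seconds"] < STARTUP_WINDOW_SECONDS:
            bucket[0] += 1

    def share(bucket):
        startup, total = bucket
        return startup / total if total else None

    return {"dupes": share(counts[True]), "others": share(counts[False])}


# Hypothetical sample rows, not real crash data.
sample = [
    {"is_dupe": True, "uptime_seconds": 20},
    {"is_dupe": True, "uptime_seconds": 45},
    {"is_dupe": True, "uptime_seconds": 900},
    {"is_dupe": False, "uptime_seconds": 10},
    {"is_dupe": False, "uptime_seconds": 3600},
]

print(startup_share(sample))
```

If the expectation holds, the "dupes" value should come out clearly higher than the "others" value on real data.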

Comment 4

7 years ago
> We'd expect the percentage of startup crashes among the "dupes" to be
> significantly higher than in the rest of crashes (due to people going into a
> cycle of trying to start Firefox and crashing, 

We still need to figure out how many of the dups are "user initiated" vs. "non-user initiated", and under what conditions the "non-user initiated" crashes are happening, so there is more to learn here I think, but I agree with your theory.

Comment 5

7 years ago
(In reply to comment #3)
> We'd expect the percentage of startup crashes among the "dupes" to be
> significantly higher than in the rest of crashes (due to people going into a
> cycle of trying to start Firefox and crashing, trying to restart, etc.) - but
> it would be interesting to see actual data on that.

This is what I am seeing in Thunderbird.
Blocks: 579136
Depends on: 629088
Chris, Robert,

Can you give me a list of reports you'd like me to run, including any filtering and grouping levels?  I'm happy to do these, but I need some specifics for Laura to approve before I run them.

Also, is CSV format OK with you, or do you want the data some other way?  CSV, XML and Postgres tables are easy, other formats will require more time.

Thanks!

Comment 7

7 years ago
Maybe the best way to do the reporting on this is to add it to the existing pub-crashdata and url .csv files.

https://crash-analysis.mozilla.com/crash_analysis/20110502/20110502-pub-crashdata.csv.gz

That would allow us to try to correlate reports using lots of other crash metadata.
(Reporter)

Comment 8

7 years ago
What's the action here?
Target Milestone: --- → 2.0

Comment 9

7 years ago
We added install age to the nightly .csv reports, but it would also be useful to flag in those reports the crashes that we are marking as dups in the database.

I suggest that we add a "dup" field to the .csv reports, as mentioned in comment 7. Then we can use that to start correlating dups against other crash metadata.

Comment 10

7 years ago
(In reply to comment #9)
> I suggest that we add a "dup"  field to the .csv reports

That's part of bug 655750.

Comment 11

7 years ago
Then I think we are done here, unless Josh has something he wants to do.

Comment 12

7 years ago
Actually, it's bug 658049 as the other one needed to be split into two parts.

Updated

7 years ago
Depends on: 658049
This isn't my bug, I have nothing I want to do.  If you want me to run queries or do exports of any specific duplicate information, file another bug.
(Reporter)

Updated

7 years ago
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Updated

7 years ago
Component: Socorro → General
Product: Webtools → Socorro