Closed Bug 426386 Opened 13 years ago Closed 13 years ago
Need to decide on DB schema for new reporting aggregate tables
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:126.96.36.199) Gecko/20080311 Firefox/188.8.131.52 Build Identifier: Trunk In order to speed up reporting of crashes, aggregate tables will be populated on a regular basis(cron), as oppose to being done on every report query. This will allow us to efficiently display things like the top 100 crashers and all crashes sorted by rank. Reproducible: Always Steps to Reproduce: 1. 2. 3.
I was thinking, at least for topcrashers, that we could do something like: branch_id, date, topcrash_rank, signature, count branch_id could be an fkey to branches (although it looks like we don't have an id column there), since we'd want a report per-branch. If we key off of branches we may want to add a field to the branches table so we don't run topcrash queries on old versions (like 3.0a4 or whatever we still have in there).
Status: UNCONFIRMED → NEW
Ever confirmed: true
For top crashers, I think it would be good to break it down to a count to OS version as well. Talking with morgamic, this is what he came up with: id signature branch product build total win mac linux rank lastrank date rank and lastrank can fkey off to a ranks table, which would allow us to easily track the movement of a crash over time. After looking at the data that we actually get from the crash reports, I think this breaks it down as much as we would want, and allows us to easily add things in the future(new OS maybe?).
Sounds pretty reasonable. Do rank/lastrank need to be keys, can't they just be integers? (Are we tracking anything but position in the top N crashes?) We should also consider adding an integer primary key to the branches table, then we could just use a foreign key to that instead of having to replicate branch/product in this table.
I agree, and think rank and last_rank should just be integers and not key off. I also added some fields, and came up with the following: -- uptime is the average uptime across all of the reports for that signature -- users is a count of the unique users that have submitted the crash -- trend keeps track of the ranks for the last 7 polls. We should decide on how much this actually stores. CREATE TABLE summary ( id integer NOT NULL PRIMARY KEY DEFAULT nextval('summary_seq'), signature character varying(255), branch integer, product character varying(30), total integer, win integer, mac integer, linux integer, rank integer, last_rank integer, trend character varying(30), uptime integer, users integer, last_updated timestamp with time zone ); As for a fkey off to the branches table, I don't know that this is needed. The summary table isn't normalized in any way, and should be dead simple to grab the data you need from it. Thoughts?
Looks good to me, but I'd name it "topcrash_summary" or something, to be a little more clear.
Hey, crashes or topcrashes is a fine table name. Other than that schema looks good.
Implemented in Pylons with the following model: top_crashers_table = Table('topcrashers', meta, Column('id', Integer, primary_key=True), Column('signature', String(255), nullable=False), Column('version', String(30), nullable=False), Column('product', String(30), nullable=False), Column('build', String(30), nullable=False), Column('total', Integer), Column('win', Integer), Column('mac', Integer), Column('linux', Integer), Column('rank', Integer), Column('last_rank', Integer), Column('trend', String(30)), Column('uptime', Float(2)), Column('users', Integer), Column('last_updated', DateTime()) )
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.