Need to decide on DB schema for new reporting aggregate tables.

RESOLVED FIXED in 0.5

Status

P1
normal
RESOLVED FIXED
11 years ago
7 years ago

People

(Reporter: justin.gallardo, Assigned: justin.gallardo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13
Build Identifier: Trunk

In order to speed up reporting of crashes, aggregate tables will be populated on a regular basis(cron), as oppose to being done on every report query. This will allow us to efficiently display things like the top 100 crashers and all crashes sorted by rank.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
I was thinking, at least for topcrashers, that we could do something like:
branch_id, date, topcrash_rank, signature, count

branch_id could be an fkey to branches (although it looks like we don't have an id column there), since we'd want a report per-branch. If we key off of branches we may want to add a field to the branches table so we don't run topcrash queries on old versions (like 3.0a4 or whatever we still have in there).
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 2

11 years ago
For top crashers, I think it would be good to break it down to a count to OS version as well. Talking with morgamic, this is what he came up with:

   id
   signature
   branch
   product
   build
   total
   win
   mac
   linux
   rank
   lastrank
   date

rank and lastrank can fkey off to a ranks table, which would allow us to easily track the movement of a crash over time.

After looking at the data that we actually get from the crash reports, I think this breaks it down as much as we would want, and allows us to easily add things in the future(new OS maybe?).
Priority: -- → P1
Target Milestone: --- → 0.5
Assignee: nobody → justin.gallardo
Sounds pretty reasonable. Do rank/lastrank need to be keys, can't they just be integers? (Are we tracking anything but position in the top N crashes?)

We should also consider adding an integer primary key to the branches table, then we could just use a foreign key to that instead of having to replicate branch/product in this table.
(Assignee)

Comment 4

11 years ago
I agree, and think rank and last_rank should just be integers and not key off. I also added some fields, and came up with the following:

-- uptime is the average uptime across all of the reports for that signature
-- users is a count of the unique users that have submitted the crash
-- trend keeps track of the ranks for the last 7 polls. We should decide on how much this actually stores.

CREATE TABLE summary (
    id integer NOT NULL PRIMARY KEY DEFAULT nextval('summary_seq'),
    signature character varying(255),
    branch integer,
    product character varying(30),
    total integer,
    win integer,
    mac integer,
    linux integer,
    rank integer,
    last_rank integer,
    trend character varying(30),
    uptime integer,
    users integer,
    last_updated timestamp with time zone
);

As for a fkey off to the branches table, I don't know that this is needed. The summary table isn't normalized in any way, and should be dead simple to grab the data you need from it. 

Thoughts?
Looks good to me, but I'd name it "topcrash_summary" or something, to be a little more clear.
Hey, crashes or topcrashes is a fine table name.  Other than that schema looks good.
(Assignee)

Updated

11 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 7

11 years ago
Implemented in Pylons with the following model:

top_crashers_table = Table('topcrashers', meta,
  Column('id', Integer, primary_key=True),
  Column('signature', String(255), nullable=False),
  Column('version', String(30), nullable=False),
  Column('product', String(30), nullable=False),
  Column('build', String(30), nullable=False),
  Column('total', Integer),
  Column('win', Integer),
  Column('mac', Integer),
  Column('linux', Integer),
  Column('rank', Integer),
  Column('last_rank', Integer),
  Column('trend', String(30)),
  Column('uptime', Float(2)),
  Column('users', Integer),
  Column('last_updated', DateTime())
)
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.