1274024 - getting a csv of bugzilla data for analysis, in anticipation of London

Reporter

Description

8 years ago

I am hoping to get a CSV of Bugzilla data to perform some basic analysis, organized as one bug per row, with each column representing an appropriate field. The complete spec of the CSV is as follows:

Filtered Date Range: January 1st, 2010 onward
Filtered Products: Core, Desktop, Android, iOS, Toolkit
Columns of CSV:
- INT: bug ID
- DATESTRING: Date bug was filed
- STRING: the bug resolution (fixed, wontfix, anything other than those two)
- STRING: product
- STRING: keywords
- STRING: status flag
- STRING: platform
- STRING: component of product
- TRUE / FALSE: user story is present
- TRUE / FALSE: did the bug start in the General component of a product?
- STRING: release version when added by a code sheriff
- STRING: priority if added by staff
- TRUE / FALSE: whether there is an unresolved needinfo
- TRUE / FALSE: has an attachment
- INT: # of comments
- TRUE / FALSE: any comment is marked as abuse / spam / non-pertinent
- STRING: severity
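
A minimal Python sketch of how rows matching the spec above could be filtered and serialized, assuming the bugs have already been extracted into dictionaries. The column names, the `in_scope`/`encode`/`write_bugs` helpers, and the exact product strings are hypothetical; this is not the attached hamilton-report.sql or any Bugzilla API.

```python
import csv
import io
from datetime import date

# Hypothetical column names and types mirroring the spec; the real export may differ.
COLUMNS = [
    ("bug_id", int),
    ("filed", date),
    ("resolution", str),
    ("product", str),
    ("keywords", str),
    ("status_flag", str),
    ("platform", str),
    ("component", str),
    ("has_user_story", bool),
    ("started_in_general", bool),
    ("sheriff_release_version", str),
    ("staff_priority", str),
    ("has_open_needinfo", bool),
    ("has_attachment", bool),
    ("comment_count", int),
    ("has_flagged_comment", bool),
    ("severity", str),
]

START = date(2010, 1, 1)  # "January 1st, 2010 onward"
PRODUCTS = {"Core", "Desktop", "Android", "iOS", "Toolkit"}  # assumed exact product names


def in_scope(bug):
    """Apply the spec's date-range and product filters to one bug dict."""
    return bug["filed"] >= START and bug["product"] in PRODUCTS


def encode(value, kind):
    """Render one cell; missing values stay empty so 'unknown' differs from FALSE."""
    if value is None:
        return ""
    if kind is bool:
        return "TRUE" if value else "FALSE"
    if kind is date:
        return value.isoformat()
    return str(value)


def write_bugs(bugs, out):
    """Write a header plus one row per in-scope bug to a text file-like object."""
    writer = csv.writer(out)
    writer.writerow([name for name, _ in COLUMNS])
    for bug in bugs:
        if in_scope(bug):
            writer.writerow([encode(bug.get(name), kind) for name, kind in COLUMNS])


# Usage with made-up sample data: only the first bug passes the filters.
buf = io.StringIO()
write_bugs(
    [
        {"bug_id": 1, "filed": date(2016, 1, 1), "product": "Core",
         "comment_count": 3, "has_user_story": False},
        {"bug_id": 2, "filed": date(2009, 6, 1), "product": "Core"},
    ],
    buf,
)
print(buf.getvalue())
```

Leaving unknown TRUE / FALSE cells empty, rather than defaulting them to FALSE, keeps "not collected yet" distinguishable from "no" while the harder fields are still being worked out.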

David Lawrence [:dkl]

Updated

8 years ago

Assignee: administration → nobody

Product: Bugzilla → bugzilla.mozilla.org

QA Contact: default-qa

Version: unspecified → Production

Hamilton

Reporter

Comment 1

8 years ago

Emma Humphries has alerted me that "there should already be an extract available for you with all the fields, that's the one mcote can connect you to."

These fields are based (mostly) on her list of "what we think does / doesn't make a good bug":

1. keywords
2. status flags
3. regression range (check comments, or the "has regression range" field; the latter is not in heavy use yet)
4. platform
5. user story (if a feature request)

Fields that we think indicate success as they are added

1. release version when added by a code sheriff
2. priority if added by staff
3. approved review flags (this is a flag on an attachment, and you should query bugs for attachments)
4. patches (older bugs that don't use MozReview will have patches as attachments)

Fields that we think indicate failure

1. too many comments (expect a non-linear relationship)
2. jumping across product boundaries
3. jumping between components in a product
4. starting in the General component of a product
5. needinfos open too long 
6. comments flagged as abuse/spam/non-pertinent

Fields that we think have no bearing 

1. severity 
2. release version if set with bug filing

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 2

8 years ago

P2 because I will not immediately be working on this, but I'll follow up with questions shortly. Briefly, here is how difficult each of these fields will be to collect; anything omitted is trivial.

> STRING: the bug resolution
This is trivial, but do you want only resolved bugs, or open ones too?

> STRING: platform
I assume this means platform *and* OS

> TRUE / FALSE: did the bug start in the General component of a product?
Specifically General, or other triage components? I'll talk to some people who know about this process.

> STRING: release version when added by a code sheriff
> STRING: priority if added by staff

These are the most difficult, and it will take me some time to explain. There are a few options, so don't worry too much about how long this will take; it's more about the trade-offs in accuracy. More on that later!

Dylan Hardison [:dylan] (he/him)

Assignee

Updated

8 years ago

Assignee: nobody → dylan

Hamilton

Reporter

Comment 3

8 years ago

> STRING: the bug resolution

Yes, both resolved & open ones. The analysis centers around the correlates of a bug getting resolved.

>> STRING: platform
> I assume this means platform *and* OS

Yes, I think that'd be a fair assumption - a separate column for OS would be great.

>> TRUE / FALSE: did the bug start in the General component of a product?
> Specifically General or other Triage components? I'll talk to some people that know about this process

I am not sure about this piece.

To reiterate, according to Emma it appears there may already be an extract of this data as per https://bugzilla.mozilla.org/show_bug.cgi?id=1274024#c1. If that's the case, then there might be very little work to get these data, especially the difficult parts.

Mark Côté [:mcote]

Comment 4

8 years ago

(In reply to Hamilton from comment #1)
> Fields that we think indicate failure
> 
> 1. too many comments (expect a non-linear relationship)

One note here: the volume of comments has sometimes dramatically gone up since the introduction of MozReview, because (a) MozReview encourages splitting up work more, so bugs will now sometimes get 10 or 20 attachments instead of 1, and (b) updates to commits cause updates to the corresponding attachment *and* the attachments for all subsequent commits, due to rebasing (e.g. if you have 10 commits and update commit 2, you will get updates to commits 3-10 as well).

Just something to consider.  Unfortunately we aren't yet tagging these comments (we will be in the future), but we could separate out bugs that have MozReview attachments versus those that don't.

Mark Côté [:mcote]

Comment 5

8 years ago

(In reply to Hamilton from comment #3)
> To reiterate, according to Emma it appears there may already be an extract
> of this data as per https://bugzilla.mozilla.org/show_bug.cgi?id=1274024#c1.
> If that's the case, then there might be very little work to get these data,
> especially the difficult parts.

That's actually just a sanitized database dump, similar to (but smaller than) the current database. It would require a similar amount of processing to get into the format you're looking for.

Hamilton

Reporter

Comment 6

8 years ago

(In reply to Mark Côté [:mcote] from comment #5)

> That's actually just a sanitized database dump, similar (but smaller) to the
> current database.  It would require a similar amount of processing to get
> into the format you're looking for.


Got it - sounds like we're on the right track then.


(In reply to Mark Côté [:mcote] from comment #4)
 
> One note here: the volume of comments has sometimes dramatically gone up
> since the introduction of MozReview, because (a) MozReview encourages
> splitting up work more, so bugs will now sometimes get 10 or 20 attachments
> instead of 1, and (b) updates to commits cause updates to the corresponding
> attachment *and* the attachments for all subsequent commits, due to rebasing
> (e.g. if you have 10 commits and update commit 2, you will get updates to
> commits 3-10 as well).

That's a great point. This is the sort of thing I will likely be able to tease out by "interacting" date and # of comments in my statistical model, so that we can control for that.
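
A minimal sketch of what "interacting" filing date with comment count means for a regression design matrix, assuming a linear predictor with an intercept, both main effects, and their product. The `design_row` name and the choice of measuring time in years since the spec's January 1st, 2010 cutoff are illustrative assumptions, not the reporter's actual model.

```python
from datetime import date

REFERENCE = date(2010, 1, 1)  # start of the spec's filtered date range (assumed origin)


def design_row(filed, n_comments):
    """Return [intercept, years since 2010, comment count, date x comments] for one bug."""
    years = (filed - REFERENCE).days / 365.25  # fractional years since the cutoff
    return [1.0, years, float(n_comments), years * n_comments]


# Usage: one row per bug; the last column is the interaction term.
print(design_row(date(2012, 1, 1), 10))
```

The coefficient on the interaction column lets the association between comment count and resolution differ by filing date, which is one way to absorb the post-MozReview rise in comment volume described above.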

Hamilton

Reporter

Comment 7

8 years ago

:dylan I think it would also suffice, if possible, to just get me a version of this dataset with the trivial-to-collect stuff, so I can get started on some of the basic data analysis while the harder-to-work-out parts get implemented. Would this be possible?

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 8

8 years ago

Definitely. I can get you a CSV at the beginning of next week.

Flags: needinfo?(dylan)

Andre Klapper

Comment 9

8 years ago

Could the curious audience in this public issue tracker be told what's going to happen in London? Thanks.

Mark Côté [:mcote]

Comment 10

8 years ago

I don't think anyone knows yet. :)  Hamilton and team want to analyze usage patterns in BMO to determine if there are ways we can streamline and/or enhance the UI and our bug-tracking processes, but I don't think anyone knows what that will look like yet.  We're in an exploratory phase right now, as far as I know.

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 11

8 years ago

Just pinging in, I'll be getting the report together and shared to Hamilton today.

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 12

8 years ago

Confirming a detail with :mcote before sending the data via an out-of-band channel.

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 13

8 years ago

Attached file hamilton-report.sql

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 14

8 years ago

I have given a first round of data to the reporter. It contains only public bugs (from a May 3rd sanitized dump).

Needinfo me for further requests; for fresh data, we'll conduct a security review.

Dylan Hardison [:dylan] (he/him)

Assignee

Updated

8 years ago

Attachment #8756434 - Attachment mime type: text/x-sql → text/plain

Dylan Hardison [:dylan] (he/him)

Assignee

Updated

8 years ago

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Assignee

Comment 15

8 years ago

I'm guessing this can be closed.

Status: NEW → RESOLVED

Closed: 8 years ago

Flags: needinfo?(dylan)

Resolution: --- → FIXED

Hamilton

Reporter

Updated

8 years ago

Blocks: 1294503