Bug 1294503 (Closed): opened 8 years ago, closed 8 years ago

Getting additional data rows / columns for a deeper Bugzilla analysis

Categories

(bugzilla.mozilla.org :: Administration, task, P2)

Production


RESOLVED INCOMPLETE

People

(Reporter: hulmer, Unassigned)

Details

User Story

I am hoping to get a CSV of Bugzilla data to perform some basic analysis, organized as one bug per row, with each column representing an appropriate field.

A complete spec of the CSV would be this:

Filtered Date Range: January 1st, 2010 onward
Filtered Products: Core, Desktop, Android, IOS, Toolkit
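For reference, the two filters above map fairly directly onto a Bugzilla REST search. The sketch below only builds a query URL and does not fetch anything. It assumes the bugzilla.mozilla.org endpoint `/rest/bug` and its `product`, `creation_time` (bugs created at or after this time) and `include_fields` search parameters. The product names are copied verbatim from the spec and may not match the exact Bugzilla product names; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: REST search endpoint on bugzilla.mozilla.org.
BASE_URL = "https://bugzilla.mozilla.org/rest/bug"

# Copied verbatim from the spec; real Bugzilla product names may differ.
PRODUCTS = ["Core", "Desktop", "Android", "IOS", "Toolkit"]


def build_search_url(products, created_since, fields):
    """Return a search URL for bugs in `products` filed on or after `created_since`."""
    params = {
        "product": products,             # expands to product=Core&product=Desktop...
        "creation_time": created_since,  # bugs created at this time or later
        "include_fields": ",".join(fields),
    }
    # doseq=True turns list values into repeated key=value pairs.
    return BASE_URL + "?" + urlencode(params, doseq=True)


url = build_search_url(PRODUCTS, "2010-01-01", ["id", "creation_time", "product", "component"])
print(url)
```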

Original CSV columns as per Bug #1274024 (starred columns are ones there wasn't enough time to pull before London):
- INT: bug ID
- DATESTRING: Date bug was filed
- STRING: the bug resolution (fixed, wontfix, anything other than those two)
- STRING: product
- STRING: keywords
- STRING: status flag
- STRING: platform
- STRING: component of product
- TRUE / FALSE: user story is present
- TRUE / FALSE: did the bug start in the General component of a product?
- STRING: release version when added by a code sheriff
- STRING: priority if added by staff
- TRUE / FALSE: whether there is an unresolved needinfo
- TRUE / FALSE: has an attachment
- INT: # of comments
- **TRUE / FALSE: if any comment is marked as abuse / spam / non-pertinent
- STRING: severity
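A minimal sketch of the original columns as a CSV writer, using Python's standard `csv` module. The snake_case column names and the `write_rows` / `to_cell` helpers are hypothetical, chosen only to mirror the list above (each field listed once); booleans are rendered as the TRUE / FALSE strings the spec asks for, and the sample row is made up.

```python
import csv
import io

# Hypothetical column names; comments give the type from the spec.
COLUMNS = [
    "bug_id",                   # INT
    "creation_date",            # DATESTRING
    "resolution",               # STRING: fixed, wontfix, or other
    "product",                  # STRING
    "keywords",                 # STRING
    "status_flag",              # STRING
    "platform",                 # STRING
    "component",                # STRING
    "has_user_story",           # TRUE / FALSE
    "started_in_general",       # TRUE / FALSE
    "sheriff_release_version",  # STRING
    "staff_priority",           # STRING
    "has_unresolved_needinfo",  # TRUE / FALSE
    "has_attachment",           # TRUE / FALSE
    "comment_count",            # INT
    "has_flagged_comment",      # TRUE / FALSE (starred: not pulled before London)
    "severity",                 # STRING
]


def to_cell(value):
    """Render Python booleans as the TRUE / FALSE strings the spec asks for."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    return value


def write_rows(bugs):
    """Write one bug per row; columns missing from a bug dict become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for bug in bugs:
        writer.writerow({key: to_cell(value) for key, value in bug.items()})
    return out.getvalue()


# Made-up sample row for illustration.
sample = write_rows([{"bug_id": 123, "product": "Core", "has_attachment": False, "comment_count": 4}])
print(sample)
```

`csv.DictWriter` raises `ValueError` if a bug dict contains a key that is not a declared column, which catches misspelled column names early; missing keys are filled with `restval`.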

Some additional columns I am hoping we can pull:
- STRING: regression range (check comments, or the "has regression range" field; the latter is not in heavy use yet)
- STRING: release version when added by a code sheriff
- TRUE / FALSE: does this bug have a patch? (older bugs that don't use MozReview will have patches as attachments)
- TRUE / FALSE: was the bug originally filed in the right product (not just component)?
- TRUE / FALSE: are any comments flagged as abuse / spam / non-pertinent?
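For the regression range column, one heuristic for the "check comments" route is to look for hg.mozilla.org pushlog links with `fromchange` / `tochange` parameters, a common way regression windows get posted. This is an assumption rather than an exhaustive rule: ranges written as plain text or in other formats will be missed, so the regression range field is still worth checking where it is set. The function name `find_regression_range` is hypothetical.

```python
import re

# Heuristic (assumption): regression windows are often posted as pushlog links.
PUSHLOG_RE = re.compile(
    r"https?://hg\.mozilla\.org/\S*?pushloghtml\?"
    r"fromchange=([0-9a-fA-F]+)&tochange=([0-9a-fA-F]+)"
)


def find_regression_range(comments):
    """Return 'from..to' for the first pushlog link found in the comments, else ''."""
    for text in comments:
        match = PUSHLOG_RE.search(text)
        if match:
            return f"{match.group(1)}..{match.group(2)}"
    return ""


# Made-up comments for illustration.
comments = [
    "Steps to reproduce are attached.",
    "Regression window: https://hg.mozilla.org/integration/mozilla-inbound/"
    "pushloghtml?fromchange=abc123&tochange=def456",
]
print(find_regression_range(comments))
```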
+++ This bug was initially created as a clone of Bug #1274024 +++

I am hoping to get a CSV of bugzilla data to perform some basic analysis, organized as one bug per row, each column representing an appropriate field. 

I requested a subset of all this data before London, and am hoping to expand the analysis I started back then to suggest some answers to the question "What makes a successful bug?"

As I recall, the previous cut of the data covered most of the easy-to-query columns just fine. I'm hoping we can also get some of the harder-to-pull ones, so I can see whether or not they contribute to successful bug resolutions.

I should also mention: the first cut of the data set had a column called has_unresolved_needinfo, but I think the logic for generating it may need to be re-checked. In the previous version of the data set, only one row had a 1 for this value.
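One way to re-check the has_unresolved_needinfo logic is to compute it directly from each bug's current flags. The sketch below assumes flags are dicts with `name` and `status` keys, roughly like the flag objects the Bugzilla REST API returns on a bug, and that a pending needinfo appears as `needinfo` with status `?` and is cleared once answered. Those assumptions are worth confirming against the real data before trusting the counts; the sample data is made up.

```python
from typing import Iterable, Mapping


def has_unresolved_needinfo(flags: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any flag is a still-pending needinfo request.

    Assumes each flag has at least "name" and "status" keys.
    """
    return any(
        flag.get("name") == "needinfo" and flag.get("status") == "?"
        for flag in flags
    )


# Made-up sample data for illustration.
pending = [{"name": "needinfo", "status": "?", "requestee": "someone@example.com"}]
answered = []  # assumption: an answered needinfo is cleared, leaving no flag
print(has_unresolved_needinfo(pending), has_unresolved_needinfo(answered))
```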
Setting this as P2; work can be done on it next week. Sorry for the delay, we're catching up on some operational deficiencies.
Priority: -- → P2
No problem, thanks Dylan!
User Story: (updated)
Priority: P2 → --
Priority: -- → P2
:dylan, any idea when we might get moving on this?
The unresolved parts will take more time, and time is a little thin right now. I wonder if another avenue would be to use the research database dumps directly?
What do you have in mind here? Giving me access to the research database dumps?
Asking :mcote to get Hamilton access to the research dumps.
Ah, Hamilton should have access per bug 1258849. I'm going to close this ticket and make sure that Hamilton can access the research dump.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE