Closed Bug 729703 Opened 12 years ago Closed 9 years ago

Enhancement ideas for coding contributor dashboards

Categories

(Mozilla Metrics :: Data/Backend Reports, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE
Backlogged - BZ

People

(Reporter: davidwboswell, Assigned: josephine)

Details

(Whiteboard: Research / Commentary / Dialog)

Attachments

(1 file)

I was just on a call with the Coding Stewards and they had feedback on what would make the coding contributor dashboards more useful for them.  Opening bug to capture their feedback.
One of the ideas would be to create a list of "missing-in-action" contributors: anyone who was a recent contributor (in the last year?) but hasn't submitted a patch in the last month or two. From this list, module owners could check in with the contributor.

Bugzilla queries to do this kind of thing are not straightforward.

This query

https://bugzilla.mozilla.org/buglist.cgi?type0-1-0=changedafter;list_id=2422406;field0-1-0=attachments.description;field0-0-0=attachments.description;value0-1-0=2012-02-01;type0-0-0=changedbefore;value0-0-0=%20Now;classification=Client%20Software;query_format=advanced;product=Firefox

shows a list of bugs for the Firefox frontend where patches have been attached, and lists the owner of the bug, but not the patch provider. We would need to get the list of patch providers for the last month, then diff that against the patch providers for the past year to get the desired list of missing contributors.
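
The diff described above could be sketched as a plain set difference. This is only an illustration: the function name and the sample addresses are made up, and it assumes the two lists of patch providers have already been collected by some other means.

```python
def missing_contributors(year_authors, recent_authors):
    """Return people who supplied patches in the past year but not recently.

    Both arguments are iterables of author identifiers (for example email
    addresses). How they are gathered is outside the scope of this sketch.
    """
    return sorted(set(year_authors) - set(recent_authors))


# Hypothetical sample data, not real contributors.
past_year = ["ann@example.org", "bob@example.org", "cat@example.org"]
past_month = ["bob@example.org"]
missing = missing_contributors(past_year, past_month)
```

Here `missing` holds the two addresses that appear only in the yearly list.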
Comment 1 is the most important to me. Another thing I'd like is the ability to star certain people so that they appear in a starred list. Alternatively, I could enter an email and they would appear in that list.
maybe gerv can help with the bugzilla foo.
Unfortunately, the person who added an attachment is not a field which can be displayed in the search results (the search results are about bugs, not attachments). We'd either need to write a custom report thing built into Bugzilla, or someone would need to do the above search using the Bugzilla API, then request each bug, look through the attachments, find the relevant ones, and make a list of the attachers. Anyone can do that; docs are here: 
https://wiki.mozilla.org/Bugzilla:REST_API

Gerv
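
The "look through the attachments and make a list of the attachers" step might look roughly like the sketch below. It assumes the JSON shape returned by the attachment listing of the current Bugzilla REST API (a "bugs" object keyed by bug id, with "creator", "is_patch" and "creation_time" on each attachment); the API documented at the link above may name things differently. The sample response is invented.

```python
from collections import defaultdict


def patch_attachers(attachment_response):
    """Map each attacher to the creation times of the patches they attached.

    attachment_response is the parsed JSON for one or more bugs, assumed to
    look like {"bugs": {"<bug id>": [ {attachment}, ... ]}}.
    """
    attachers = defaultdict(list)
    for attachments in attachment_response.get("bugs", {}).values():
        for attachment in attachments:
            # Skip screenshots, logs and other non-patch attachments.
            if attachment.get("is_patch"):
                attachers[attachment["creator"]].append(attachment["creation_time"])
    return dict(attachers)


# Invented response for illustration only.
sample_response = {
    "bugs": {
        "1001": [
            {"creator": "ann@example.org", "is_patch": 1,
             "creation_time": "2012-02-03T10:00:00Z"},
            {"creator": "bob@example.org", "is_patch": 0,
             "creation_time": "2012-02-04T10:00:00Z"},
        ]
    }
}
attachers = patch_attachers(sample_response)
```

Fetching the bug list and each bug's attachments is left out on purpose; only the filtering step is shown.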
I just talked with pmartins on IRC about this and he suggested we find some time to talk with him and Paulo about this so his team can put a plan in place for making these changes.

I'll add this to the agenda for the next Coding Contribute Group and we can figure out when we can get all our feedback together and when we'd like to schedule something (the regular call is too late in the day so we can schedule a time earlier in the West Coast for the discussion).
I would like to have API access to the data used to create the dashboard, I think that would be a very useful feature.
I would like to echo David's request. It will be easier to prototype new tools/ideas based on the data contained within the dashboards if there is an API to access it.

With regards to the dashboards themselves, I couldn't find an easy way to make queries for specific time periods such as the past 30 days. If I'm interested in finding out the list of contributors who have stopped contributing recently, wading through lists of people who haven't contributed for the past year is not very helpful.
Also, some way to associate a given contributor with a means to contact them is key - it doesn't help if I know that someone has stopped contributing if I'm unable to reach them to follow up. I don't see anything like an email address in the data provided, and not all contributors have Mozillians accounts.
(In reply to Josh Matthews [:jdm] from comment #7)
> I would like to echo David's request. It will be easier to prototype new
> tools/ideas based on the data contained within the dashboards if there is an
> API to access it.
> 
> With regards to the dashboards themselves, I couldn't find an easy way to
> make queries for specific time periods such as the past 30 days. If I'm
> interested in finding out the list of contributors who have stopped
> contributing recently, wading through lists of people who haven't
> contributed for the past year is not very helpful.

The dashboards always process the data from a specific date back 12 months and break down the number of patches per contributor by month.
It isn't possible to change that time range because the table always shows 12 months of data.
As for the contributors who stopped contributing recently, you still have to go through the pages, but you can sort the table by "Days since last patch" and see them at the top.
From what I understand:

What would be most useful is some kind of contributor REST HTTP API that returns JSON data.
It would give easy access to aggregated data which is gathered from Bugzilla and mozilla-central revision history.
It would be similar to the bugzilla API but would be user centric.

The API would expose resource data that allows the user of the API to:
- Look up information on a particular contributor (name, email, isEmployee, mozilliansAccountURL, bugzillaAccountId, ...)
- Obtain the number of patches contributed by a specific user from a specific date, to a specific date, in a particular module (or all modules)


Here's an example of such an API:
---------------------------------

Retrieving a list of all contributors:
GET /contributors/

Retrieving a specific contributor's information:
GET /contributors/id/

Retrieving a list of modules a particular contributor contributed to:
GET /contributors/id/modules/

Retrieving a summary of the patch information of a particular contributor:
GET /contributors/id/patchInfo/?from=date&to=date&module=X
(Where X could be a particular module or if not specified, all modules would be used)

Retrieving a list of all modules:
GET /modules/

Retrieving a particular module information:
GET /modules/toolkit/

Retrieving a list of all contributors of a particular module
GET /modules/toolkit/contributors/?from=date&to=date&sort=highest&max=100

From these APIs, users of the API could do interesting things like:
- Get a list of contributors that have stopped contributing within the last X days, in a particular module (or all modules)
- Get a list of really active recent contributors so we could send them things like t-shirts.
- Get a list of the highest contributors in each area of code so new users know who to ask questions to
- Find out which contributors would be useful for a particular bug
- Lots of other things...
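
As a rough illustration of how a client might address the endpoints proposed above, here is a small URL-building sketch. The API does not exist; the host name is a placeholder, and the function names are invented for this example. Only the paths and query parameters come from the proposal.

```python
from urllib.parse import quote, urlencode

# Placeholder host; the proposal does not say where the API would live.
BASE_URL = "https://contributors.example.org"


def module_contributors_url(module, start, end, sort="highest", limit=100):
    """Build the URL for the proposed per-module contributor listing."""
    query = urlencode({"from": start, "to": end, "sort": sort, "max": limit})
    return f"{BASE_URL}/modules/{quote(module, safe='')}/contributors/?{query}"


def patch_info_url(contributor_id, start, end, module=None):
    """Build the URL for the proposed patch summary of one contributor.

    Leaving module as None omits the parameter, which the proposal reads
    as "all modules".
    """
    params = {"from": start, "to": end}
    if module is not None:
        params["module"] = module
    contributor = quote(str(contributor_id), safe="")
    return f"{BASE_URL}/contributors/{contributor}/patchInfo/?{urlencode(params)}"
```

A tool looking for lapsed contributors could request the module listing for two date ranges and compare the results.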

Having the modules finely defined would be very useful, i.e. not extremely broad.

In the future we could expand on that to easily expose the users who have unreviewed patches or have been waiting on reviews for a long time.
As discussed in the meeting, we're interested in getting data on individual Bugzilla components if possible, instead of the broader modules that are exposed right now.
Is there any update to the access discussed previously where we could run our own queries/exports against the database?
Are there any objections to making this bug public?
Since we were told to replace the contributor email with the contributor name in order to hide it, and there is now a request for contributor info that includes the contributor's email, I think it's best to keep this bug private until the email issue is settled.

As for the requests, this is what is available now:

- Retrieving a list of all contributors: showing user name and total number of patches, sorted by total number of patches
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=usersQuery&outputType=csv&settingcsvSeparator=,


- Retrieving a specific contributor's information: showing user name, user email, is_employee, first_contribution_date and last_contribution_date
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=userInfoQuery&paramcurrentUserParam=Dao Gottwald&outputType=csv&settingcsvSeparator=,
change "Dao Gottwald" to the user that you want details for


- Retrieving a list of modules a particular contributor contributed to:
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=userModulesQuery&paramcurrentUserParam=Dao Gottwald&outputType=csv&settingcsvSeparator=,
change "Dao Gottwald" to the user that you want


- Retrieving a summary of the patch information of a particular contributor:
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=patchDetailsQuery&paramcurrentUserParam=Dao Gottwald&paramrepositoryParameter=mozilla-central&parammoduleParameter=Firefox&paramstartDateParameter=2012-01-01&paramendDateParameter=2012-01-31&outputType=csv&settingcsvSeparator=,
change "Dao Gottwald" to the user that you want
change "mozilla-central" for the repository that you want or "All"
change "Firefox" for the module that you want or "All"
change "2012-01-01" for date from
change "2012-01-31" for date to


- Retrieving a list of all modules:
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=modulesQuery&paramrepositoryParameter=All&outputType=csv&settingcsvSeparator=,


- Retrieving a particular module information: (showing submodules of a module)
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=subModulesQuery&parammoduleParameter=Firefox&outputType=csv&settingcsvSeparator=,
change "Firefox" for the module that you want


- Retrieving a list of all contributors of a particular module
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=usersDateQuery&parammodulesParameter=Firefox&startDateParameter=2012-01-01&endDateParameter=2012-01-31&outputType=csv&settingcsvSeparator=,
change "Firefox" for the module that you want or "All"
change "2012-01-01" for date from
change "2012-01-31" for date to
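
The links above all follow the same pattern, so they could be assembled programmatically, which also takes care of encoding the space in a name like "Dao Gottwald". The sketch below assumes that every query parameter takes the "param" prefix seen in most of the links; the helper name is invented, and nothing here checks that the server accepts the result.

```python
from urllib.parse import quote, urlencode

CDA_ENDPOINT = "https://metrics.mozilla.com/pentaho/content/cda/doQuery"
CDA_PATH = "community/dashboards/community.cda"


def cda_url(data_access_id, **params):
    """Build a doQuery link for the given dataAccessId with CSV output.

    Keyword arguments are sent as "param<name>", which is an assumption
    based on the pattern in most of the links listed above.
    """
    query = {"path": CDA_PATH, "dataAccessId": data_access_id}
    query.update({f"param{name}": value for name, value in params.items()})
    query.update({"outputType": "csv", "settingcsvSeparator": ","})
    # Keep "/" and "," readable; encode spaces as %20 rather than "+".
    return CDA_ENDPOINT + "?" + urlencode(query, safe="/,", quote_via=quote)


user_info_link = cda_url("userInfoQuery", currentUserParam="Dao Gottwald")
```

The same helper covers the other queries, for example `cda_url("subModulesQuery", moduleParameter="Firefox")`.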


Let me know if these will satisfy your requests of the data in the database.

Just a final note: for last_contribution_date we store the date of the contributor's last contribution regardless of module or repository, so it won't be possible to query when a contributor stopped contributing to a specific module if they keep contributing to another module.
Paulo: Who said that the contributor name and email cannot be revealed? I can't see that in the thread. Would be helpful to know where that decision came from and what guidelines were being referenced.

I confess that I'm confused. The information - names and emails of contributors - is already accessible. Indeed, a few years ago we created a similar dashboard with all this data by querying Bugzilla directly. In addition, Bugzilla warns users very explicitly that their names and emails in Bugzilla will be public.
David,

We were trying to imitate the behavior of Bugzilla.  If someone is not logged in to bugzilla, they see only user names and cannot see the e-mail addresses.

My hope was that once we implement browserid authentication, we could provide e-mail addresses to logged in users and usernames otherwise.

This is open to discussion, but it was certainly a concern that was expressed in the first version of the Mercurial dashboards that people could view a large list of e-mail addresses via the unauthenticated dashboard.
Daniel - thank you, super helpful! Going to ruminate on this. The fact that non-logged-in users can see user names is great. This makes more sense than what I was gathering from the thread.
Copying Martin Best on this to get his input on how much overlap there is with what he's doing with the Bugzilla Anthropology project.

https://wiki.mozilla.org/Bugzilla_Anthropology
I'm not sure what you would like me to comment on, this is a pretty long comment list.  Would you mind being more specific?

I think that the project likely uses similar data but I tend to focus more on the bug life cycle as a whole without looking at contributors specifically for the most part.  I am working on a heat map of who is working on what but that will be down the line.

I personally worry that people are jumping to the secrecy stance too quickly, if there is any question then the tendency seems to be better safe than sorry.  We should be as open as we can possibly be so that the data has the chance to have the maximum impact.
(In reply to Martin Best (:mbest) from comment #19)
> I'm not sure what you would like me to comment on, this is a pretty long
> comment list.  Would you mind being more specific?

I wanted to keep you in the loop so you could see if the conversations here were duplicating anything you're already working on.  I haven't looked at enough of the Bugzilla Anthropology content yet to have specific thoughts about where there might be overlap.

> I think that the project likely uses similar data but I tend to focus more
> on the bug life cycle as a whole without looking at contributors
> specifically for the most part.  I am working on a heat map of who is
> working on what but that will be down the line.

OK, good to know.  Sounds like there probably isn't much overlap for now.
 
> I personally worry that people are jumping to the secrecy stance too
> quickly, if there is any question then the tendency seems to be better safe
> than sorry.  We should be as open as we can possibly be so that the data has
> the chance to have the maximum impact.

Agreed.  Seems like the best way to coordinate for now is on our efforts to make this data from Bugzilla available.
> Re Comment 14:

Thanks Paulo, 

The API calls you listed look great and I think tools can be built around them.
There is no need for a separate API like I listed in Comment 10.

Some questions for you:
- The key of the data seems to be a username, which is fine but I think this is not unique.  Can there also be a unique Id per user?
- Is it possible to link up a user to a bugzilla account or a mozillian account somehow?
- For retrieving a list of modules a particular contributor contributed to, is it possible to get more granular here using any Product and Component value of bugzilla?
- The formatting of the data I think is subject to get broken if someone builds tools around the output though.
  - For the JSON data there seems to be arrays of data, would it be possible to have the output have property name/values instead?
  - For the XML data there seems to be a Row element for each data row, and a Col for each data item.  Would it be possible to use a descriptive field name in XML instead of always using <Col>?
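
Until the output format changes, a consumer could convert the array-style JSON into property name/value records on its own side. The sketch below assumes the JSON carries a "metadata" list with a "colName" per column and a "resultset" list of row arrays; that layout, the column names and the values in the sample are all assumptions for illustration.

```python
def rows_to_records(cda_json):
    """Turn array-style rows into dictionaries keyed by column name."""
    names = [column["colName"] for column in cda_json["metadata"]]
    return [dict(zip(names, row)) for row in cda_json["resultset"]]


# Invented payload; real column names and values may differ.
sample_json = {
    "metadata": [
        {"colIndex": 0, "colName": "user_name"},
        {"colIndex": 1, "colName": "total_patches"},
    ],
    "resultset": [["Sally Programmer", 11], ["Sam Tester", 3]],
}
records = rows_to_records(sample_json)
```

Tools written against the records would then survive a reordering of columns, which is the fragility the question above is getting at.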
(In reply to Josh Matthews [:jdm] (travelling until June 25th) from comment #11)
> As discussed in the meeting, we're interested in getting data on individual
> Bugzilla components if possible, instead of the broader modules that are
> exposed right now.

Josh, we've been looking and there are two possibilities

Module: Bugzilla
Sub Modules: Administration, Attachments, Authentication, Bug creation and modification, Charting system, Databases, Documentation, Email notifications, Exporting and Importing, Extensions and Hooks, Flags and Requests, Installation and Upgrading, Search system and Queries, Security, User Interface

Module: bugzilla.mozilla.org
Sub Modules: Administration, Implementation, Infrastructure, Roadmap, Workflow

Which one is it?
I'm sorry, I didn't actually mean Bugzilla in particular. I just mean that we'd like to see data on the individual product/component pairs in which contributors are active.
Josh, go ahead and on Contributor Map check the patch details of any user, there are 2 new columns for Product and Component.
Is this what you had in mind?
(In reply to Brian R. Bondy [:bbondy] from comment #21)

> - The key of the data seems to be a username, which is fine but I think this is not unique.  Can there also be a unique Id per user?
We guarantee that the username is unique because we make manual corrections when the same person appears with different names or with different emails.

> - Is it possible to link up a user to a bugzilla account or a mozillian account somehow?
Don't think so.

> - For retrieving a list of modules a particular contributor contributed to, is it possible to get more granular here using any Product and Component value of bugzilla?
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=userModulesProductsQuery&paramcurrentUserParam=Dao Gottwald&paramproductParam=Boot2Gecko&paramcomponentParam=General&outputType=csv&settingcsvSeparator=,
change "Dao Gottwald" to the user that you want
change "Boot2Gecko" to the Product that you want
change "General" to the Component that you want

Updated the Patch Details Query to include Product and Component from bugzilla.

Also you could now get the list of Products and Component a user has contributed to:
https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=userProductsQuery&paramcurrentUserParam=Dao Gottwald&outputType=csv&settingcsvSeparator=,
change "Dao Gottwald" to the user that you want


The JSON data is structured this way because that's how CDA and CDE need it in order to build the dashboards. As for the XML data, we could change it, but it's not a priority right now, which is why I gave the links for the csv format.
Whiteboard: Research / Commentary / Dialog
Status: NEW → ASSIGNED
Not in DBoswell's list of Q3 asks.
Target Milestone: Unreviewed → Backlogged - BZ
Paulo, the sample queries for product and component you provided in comment 25 don't work. I get messages about "unknown module" when using combinations such as Firefox and General.
(In reply to Paulo Pires from comment #24)
> Josh, go ahead and on Contributor Map check the patch details of any user,
> there are 2 new columns for Product and Component.
> Is this what you had in mind?

I don't see this change as described. I see lists of modules for each contributor, but the concept of a module seems unclear. For example, changes that are in the Core: DOM product: component pair seem to be marked as an Unknown module in some cases. Could you clarify what a module is?

Looking at the Repository Analysis screen, I'm excited to see specific products and components (with that weird Unknown module and submodule stuff, too), but I'm sad that I can't see who the contributors are; there are only unlinked numbers.
Attached image Patch Details
Ah, I did see that screen. However, it doesn't quite fulfill what I'm looking for. Ideally we'd be able to see the list of contributors for a specific product/component pair (maybe a whole product in aggregate, too).
I've tested your example, for Firefox and General the link would be:

https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/dashboards/community.cda&dataAccessId=userModulesProductsQuery&paramcurrentUserParam=Dao Gottwald&paramproductParam=Firefox&paramcomponentParam=General&outputType=csv&settingcsvSeparator=,

and I get the following result in the csv file:

"module"
"Core"
"Firefox"
"Toolkit"
"Unknown module"

where "module" is the header. Please check if you used the same link as I did.
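
For anyone scripting against this output, the csv result can be read with a standard CSV parser. The sketch below uses the exact text shown above as sample input; the function name is just for illustration.

```python
import csv
import io


def read_single_column(csv_text):
    """Return (header, values) for a one-column CSV result."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames[0]
    return header, [row[header] for row in reader]


sample_csv = '"module"\n"Core"\n"Firefox"\n"Toolkit"\n"Unknown module"\n'
header, modules = read_single_column(sample_csv)
```

With the sample input, `header` is the string "module" and `modules` holds the four values that follow it.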

The columns for Product and Component appear only on the patch details, I've checked and it's there, attached a screenshot with it.

A module is a discrete unit of code or activity.
You can check on https://wiki.mozilla.org/Modules for details about Modules.

As for the requested change to the Repository Analysis screen, that should be filtered by the metrics team.
Ok, I was misunderstanding what I was seeing. Could you comment on what "Unknown module" means?
(In reply to Paulo Pires from comment #31)
> I've tested your example, for Firefox and General the link would be:
> 
> https://metrics.mozilla.com/pentaho/content/cda/doQuery?path=community/
> dashboards/community.
> cda&dataAccessId=userModulesProductsQuery&paramcurrentUserParam=Dao
> Gottwald&paramproductParam=Firefox&paramcomponentParam=General&outputType=csv
> &settingcsvSeparator=,
> 
> and I get the following result in the csv file:
> 
> "module"
> "Core"
> "Firefox"
> "Toolkit"
> "Unknown module"
> 
> where "module" is the header. Please check if you used the same link as I
> did.

Thinking about this more, I have no idea what these results mean. That is a list of products (with the addition of the "unknown module"); what is the query actually giving me? What is the point of the product and component parameters?
That query is for the request on comment 21
- For retrieving a list of modules a particular contributor contributed to, is it possible to get more granular here using any Product and Component value of bugzilla?

So according to Product and Component you see a list of modules a particular contributor contributed to.
I'm sorry, I think the terminology became unclear here, and I realize now the original question was confusing. Let me try to explain what I'm looking for here, and I'll attempt to define terms as well.

For a given contributor Sally Programmer, the coding stewards are interested in finding a list of products and components in which Sally has been active. If Sally has written five patches that were in bugs in Core: General, two patches in Core: XUL, three patches in Firefox: Installer, and one in Boot2Gecko: General, I would expect to be able to query Sally's name (and nothing else) and see a list something like this:

Core, General,
Core, XUL,
Firefox, Installer,
Boot2Gecko, General

I've reread your earlier comments and figured out where the disconnect with regards to modules is happening. As far as I can tell, you are classifying patches based on the paths they touch and correlating that to the directories listed on the Modules pages. That's an interesting system, but it's error-prone enough (given the relative infrequency of wiki updates that reflect reality) that I think it would be more effective to look up the bug id given in a commit and grab the product and component from the originating bug. This is the real information we care about; other tools can lump components together for us and aggregate metrics API queries, but we would like to have the full breakdown available.
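
To make the suggestion concrete, here is a minimal sketch of that approach. It assumes commit messages mention the bug as "Bug NNNN" and that a mapping from bug id to (product, component) has already been fetched from Bugzilla; the bug ids and messages in the example are invented.

```python
import re
from collections import Counter

# Assumes the commit message references its bug as "Bug <number>".
BUG_PATTERN = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)


def bug_id_from_commit(message):
    """Return the first bug number mentioned in a commit message, or None."""
    match = BUG_PATTERN.search(message)
    return int(match.group(1)) if match else None


def component_breakdown(commit_messages, bug_components):
    """Count commits per (product, component) pair.

    bug_components maps bug id -> (product, component); commits whose bug
    is missing from the mapping are ignored.
    """
    counts = Counter()
    for message in commit_messages:
        bug_id = bug_id_from_commit(message)
        if bug_id in bug_components:
            counts[bug_components[bug_id]] += 1
    return counts


# Invented data for Sally Programmer.
sally_commits = ["Bug 1001 - Fix a leak", "Bug 1002 - Update installer", "Bug 1001 - Follow-up"]
components = {1001: ("Core", "General"), 1002: ("Firefox", "Installer")}
breakdown = component_breakdown(sally_commits, components)
```

Aggregating the counter's keys gives the product and component list described above, and the counts come along for free.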
What's the current wishlist for Coding contributor dashboards?

Current dashboards that do exist:
https://dataviz.mozilla.org/views/ContributorsPathway/ContributorsPathway
https://dataviz.mozilla.org/views/BalooTemplate/Summary

https://dataviz.mozilla.org/workbooks/Active_Contributor_Dashboard (click the links for the visualizations)
Flags: needinfo?(netzen)
Flags: needinfo?(josh)
Flags: needinfo?(chofmann)
Group: metrics-private
Flags: needinfo?(mhoye)
the dashboards in comment 36 are generally ok for understanding interesting historical total numbers but I'm not sure they meet the request in comment 1.  

scobbie diver is the perfect example of what we would be looking to surface. He was a very intense contributor for several years on crash report analysis and bug filing, SUMO, and a few other areas. Then he stopped contributing, and it took us several weeks to notice this and investigate the reason for his not being as active.

We need automation to turn up things like that sooner so we can check in to see what's up and whether we can help foster continued contribution.

We need reports that help us identify important new contributors who are rising above the level of others, and that also help us notice when these people run into problems.

Here is the case story. https://www.youtube.com/watch?v=YFOzoN7apnQ
and the specific metrics might look like:

for the last week, month, quarter, year
   which individuals working in which areas are trending to higher and lower contribution
   and for the project overall do we have higher or lower contribution

With those bits we could take action to figure out who are the new stars and who are we losing.

And we could know whether we are generally seeing more or less contribution to the project, and could start to ask why.
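
One way to read the metric above is as a comparison of the latest window against the window just before it. The sketch below takes that reading; the input format (author, date pairs) and the function name are assumptions, and the sample data is invented.

```python
from collections import Counter
from datetime import date, timedelta


def contribution_trends(patches, today, window_days):
    """Compare each author's patch count in the latest window with the one before.

    patches is an iterable of (author, date) pairs. Returns a dict mapping
    author -> (previous_count, current_count).
    """
    window = timedelta(days=window_days)
    current_start = today - window
    previous_start = current_start - window
    current, previous = Counter(), Counter()
    for author, day in patches:
        if current_start < day <= today:
            current[author] += 1
        elif previous_start < day <= current_start:
            previous[author] += 1
    authors = sorted(set(current) | set(previous))
    return {author: (previous[author], current[author]) for author in authors}


# Invented sample: one rising contributor, one who has gone quiet.
sample_patches = [
    ("sally", date(2012, 6, 20)),
    ("sally", date(2012, 6, 25)),
    ("sam", date(2012, 5, 20)),
]
trends = contribution_trends(sample_patches, date(2012, 6, 30), 30)
```

Running it with window sizes of 7, 30, 90 and 365 days would cover the week, month, quarter and year views, and summing the counts gives the project-wide trend.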
Flags: needinfo?(chofmann)
haven't been following lately so will let others chime in, in my place.
Flags: needinfo?(netzen)
Assigning to Josephine, who can fix the graphs once we've figured out what we want.
Assignee: nobody → jtanumijaya
I have no idea what we want any more, and I'm not putting any time into thinking about this in the foreseeable future.
Flags: needinfo?(josh)
I'm closing this resolved-incomplete. The contributor landscape and our organizational posture have changed so much since it was filed, and our requirements are blurry enough, that I don't see how meaningful forward motion is going to take place in this bug.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(mhoye)
Resolution: --- → INCOMPLETE