Closed Bug 933783 Opened 11 years ago Closed 11 years ago

Write tools to monitor code changes per author

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

To report productivity in terms of code contributions, we'll need an automated means of determining how much code each of us has contributed during a fixed period.  Two decent metrics for that are the number of commits and the number of lines changed in those commits.

So, we'll need some scripts that can handle hg, svn, and git repositories and extract per-user numbers for each of those metrics during specific time periods.
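
A minimal sketch of how the two metrics could be tallied for a Git repository, assuming the log text was produced with `git log --numstat --pretty=format:@@%ae`. The `@@` marker and the `tally_numstat` name are assumptions made for this illustration, not part of any existing tool; Hg and SVN would need their own parsers.

```python
from collections import defaultdict

# Hypothetical marker placed in front of each author line by the
# assumed --pretty=format:@@%ae argument.
AUTHOR_PREFIX = "@@"


def tally_numstat(log_text):
    """Return {author: {"commits", "added", "removed"}} from numstat output."""
    stats = defaultdict(lambda: {"commits": 0, "added": 0, "removed": 0})
    author = None
    for line in log_text.splitlines():
        if line.startswith(AUTHOR_PREFIX):
            # A new commit starts here; remember whose it is.
            author = line[len(AUTHOR_PREFIX):].strip()
            stats[author]["commits"] += 1
            continue
        if author is None or not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, removed = parts[0], parts[1]
        if added == "-" or removed == "-":
            # numstat prints "-" for binary files; no line counts available.
            continue
        stats[author]["added"] += int(added)
        stats[author]["removed"] += int(removed)
    return dict(stats)
```

Binary files are skipped here, so they count toward the commit total but not toward lines changed.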
How much is the bonus multiplier for a negative code contribution?
dear god don't let this be tied to bonuses
(In reply to Rob Tucker [:rtucker] from comment #1)
> How much is the bonus multiplier for a negative code contribution?

A lot, I hope :)
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #0)
> To report productivity in terms of code contributions, we'll need an

I disagree that we need to measure productivity in terms of lines of code.  The best programmers can spend hours writing hundreds of lines and days focusing on very few.

And, as atoll pointed out, some can spend lots of time refactoring and -removing- code, which is just as valuable.

Anyway, I can see value in such a tool for identifying where a person's time is spent  (i.e., people like dustin and atoll spend a lot of time in puppet, versus other admins who may touch it little).  I just don't want to see this equate to a feeling of needing to hit code quotas or move a graph of contributions to prove their worth.
(In reply to Corey Shields [:cshields] from comment #4)
> (In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo
> me) from comment #0)
> > To report productivity in terms of code contributions, we'll need an
> 
> I disagree that we need to measure productivity in terms of lines of code. 
> The best programmers can spend hours writing hundreds of lines and days
> focusing on very few.
> 
> And, as atoll pointed out, some can spend lots of time refactoring and
> -removing- code, which is just as valuable.
> 
> Anyway, I can see value in such a tool for identifying where a person's time
> is spent  (i.e., people like dustin and atoll spend a lot of time in puppet,
> versus other admins who may touch it little).  I just don't want to see this
> equate to a feeling of needing to hit code quotas or move a graph of
> contributions to prove their worth.

I apologize, but it is unclear to me whether this is something I should be looking into building.
I'll probably do it when I get a chance, but if you're keen, then by all means do :)
Rob -- I have a pretty stupid simple thing begun at
  https://github.com/djmitche/code-change-monitor

can you have a look and let me know what you think?

It only supports Git right now, so it will need SVN and Hg classes.  And I don't really have a good idea whether the report format is any good.  I'm figuring it can end up in an HTML email eventually.

Would you be willing to hack up SVN and Hg classes while I improve the reports?  Or the other way around?
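
One way the split could look, as a rough sketch only: a common base class that each VCS backend implements, so the reporting code never cares whether the numbers came from Git, SVN, or Hg. The names `Change`, `Repository`, and `FakeRepository` are hypothetical and are not taken from code-change-monitor.

```python
import abc
import datetime
from collections import namedtuple

# Hypothetical record for one commit: who, when, and how many lines moved.
Change = namedtuple("Change", ["author", "date", "added", "removed"])


class Repository(abc.ABC):
    """Assumed common interface for Git, SVN, and Hg backends."""

    @abc.abstractmethod
    def changes(self, since, until):
        """Yield one Change per commit with since <= date < until."""

    def summarize(self, since, until):
        """Aggregate per-author commit and line counts for the window."""
        totals = {}
        for change in self.changes(since, until):
            entry = totals.setdefault(
                change.author, {"commits": 0, "added": 0, "removed": 0}
            )
            entry["commits"] += 1
            entry["added"] += change.added
            entry["removed"] += change.removed
        return totals


class FakeRepository(Repository):
    """In-memory stand-in, useful for testing report code without a VCS."""

    def __init__(self, changes):
        self._changes = list(changes)

    def changes(self, since, until):
        for change in self._changes:
            if since <= change.date < until:
                yield change
```

With this shape, an SVN or Hg class would only need to implement `changes`; the aggregation in `summarize` is shared.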
Yep, happy to take a look and help out. Can you comment on the priority and importance of this project?
I'll leave priority to Corey.  This is important insofar as Upper Management would like metrics on who's doing what, and lots of us (yourself included) mainly "do" code, so this gives UM good visibility of our activities and how those grow or shrink over time.
It would be great to have it by the end of the month so we can use it to collect stats for the next operational report cards.
Comment 9 is in direct contradiction to comment 4. Comment 4 suggested that ICs would not be tracked/scored based on commits and number of lines changed.
I don't intend to track individual ICs, just use this as a tool to show how much effort my team is putting into making code changes from month to month.
I spoke loosely and I shouldn't have.  You shouldn't take my comments as to how this is or is not being used to track, score, evaluate, compensate, punish, reward IC's, resistors, capacitors, employees, volunteers, contractors, or anything like that.  I have no idea.  I just know I code a lot, and if someone is measuring something about me, I'd like those measurements to reflect what I do.  So just ignore the bits where I said anything about what anyone's doing with this, and let's not try to figure that out on this bug.  Talk to your boss.

I was led to believe we should produce this information.  If that's not the case, R/INVALID.  If that is the case, let me know in more detail what it should look like, and I'll keep hacking.

Here's a sample of what I've got: http://people.v.igoro.us/~dustin/ccm-report.html
This tool will require some ability to scan GitHub for employee work (for instance, Puppetagain or the DB team's mysql/postgres modules or a lot of rtucker's work with puppetlabs-firewall).
It can do that - the demo stuff I set up is, in fact.

I'm not sure where we'll end up running this, but that might require a flow.  I'm open to ideas for hosting.
To address the issue of people worrying about being held to some sort of quotas or anything of that sort, let me state that this is absolutely NOT the intention here.  The intention is to be able to show, at a team level, where a great deal of effort is being spent (this is part of the story of "what does your team do?" not keeping track of every commit or every line of code specific people write).  My intention is to give people a sense of how much work actually goes into all of the puppet/python/vbs/etc work that we do as part of an overall picture of all of the things that are on my team's plate.

I'm specifically looking for metrics about repos, because, fundamentally, that's probably 75% of what my team does. Our niche is installation and automation. We don't manage the hardware and we don't manage the applications, and machine uptime or load is not a useful metric when the machines are designed to reboot and only run one job at a time.  

So I hope that clears up any misconceptions and I apologize if people are feeling frustrated or concerned by the exchange in the above comments.  I'll take full responsibility for any ill will since I didn't specify a clear goal or requirements out of the gate.
(In reply to Richard Soderberg [:atoll] from comment #14)
> This tool will require some ability to scan GitHub for employee work (for
> instance, Puppetagain or the DB team's mysql/postgres modules or a lot of
> rtucker's work with puppetlabs-firewall).

I filed bug 941086 to create a link between employee LDAP and GitHub accounts, which is something we've needed for several reasons anyways, but it would benefit you in this case as well.
So, onto what data would be useful to me...

I took a look at http://people.v.igoro.us/~dustin/ccm-report.html and I think that's an awesome start.  I'm mostly interested in raw per-month data, so that I can add up the numbers for the people on my team and slap them into a spreadsheet.  I like the fact that it's broken down per repo so I can pull out specific stats for relops (as a whole, not individual people), compare them from month to month, and see relops' share of the overall contributions to the repo.  One possible example might be: "for the releng puppet repo, there was a total of 8010 lines added, 5295 removed, and relops made 85.5% of those modifications".  That would be great to graph, with a secondary graph showing the change between months once we have more than one month of data.
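
The team-level percentage in that example could be computed along these lines. This is a sketch under assumptions: "modifications" is taken to mean lines added plus lines removed, and the function name and the shape of the per-author data are made up for illustration.

```python
def team_share(per_author, team):
    """Percentage of a repo's changed lines attributable to a team.

    per_author maps author -> {"added": int, "removed": int};
    team is a set of author names. Both shapes are assumptions.
    """
    def changed(stats):
        # Assumption: a "modification" is any added or removed line.
        return stats["added"] + stats["removed"]

    total = sum(changed(s) for s in per_author.values())
    if total == 0:
        return 0.0
    team_total = sum(
        changed(s) for author, s in per_author.items() if author in team
    )
    return 100.0 * team_total / total
```

For instance, with made-up numbers, `team_share({"alice": {"added": 60, "removed": 20}, "bob": {"added": 15, "removed": 5}}, {"alice"})` gives `80.0`.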

I think we want to query:

releng puppet
releng buildbot
releng mozpool
sysadmins puppet
the various projects we have on github (we'd need to enumerate these)
wherever we decide to keep info for the various windows program/config side of things

So as far as connectors, I think that's likely hg, svn, and git (and we need to figure out where we're keeping the windows stuff... maybe git?)

I'm probably interested in having this in csv format on the first of every month for the previous month (email is fine), but if it is more useful to have a sliding window or some other delivery format/method, I'm open to suggestions.
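
A small sketch of the "previous month as CSV" idea, using only the Python standard library. The column names, function names, and the half-open date window are assumptions for illustration, not a description of the actual report format.

```python
import csv
import datetime
import io


def previous_month(today):
    """Return (start, end) dates covering the month before `today`.

    The window is half-open: start is inclusive, end is exclusive.
    """
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - datetime.timedelta(days=1)
    return last_of_previous.replace(day=1), first_of_this_month


def report_csv(rows):
    """Render (repo, author, commits, added, removed) tuples as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Hypothetical column layout for the monthly report.
    writer.writerow(["repo", "author", "commits", "added", "removed"])
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

Run on the first of the month, `previous_month(datetime.date.today())` yields the window to report on, and the resulting text can be attached to or pasted into an email.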
(In reply to Amy Rich [:arich] [:arr] from comment #18)
> I think we want to query:
> 
> releng puppet
> releng buildbot
> releng mozpool
> sysadmins puppet
> the various projects we have on github (we'd need to enumerate these)
> wherever we decide to keep info for the various windows program/config side
> of things

Bugzilla?
(In reply to Amy Rich [:arich] [:arr] from comment #10)
> It would be great to have it by the end of the month so we can use it to
> collect stats for the next operational report cards.

I can guarantee with 100% certainty that this will not be completed by the end of the month.
I'll take that as a challenge!
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #21)
> I'll take that as a challenge!

My statement is due to the fact that I'm slammed with other things, and there is no way I can get the code written for 2 new connectors in 5 working days' time, not to mention the additional requirement of monthly emails as CSV. Just implementing the per-person associations to repositories is a large task.

With the workload of everyone in webops, I'd be surprised if we could get it deployed to production on IT infra in 5 days, even if the code were done today.

Also factor in that this is being used to generate some level of metrics for tracking productivity of employees. Do we really want to make this a rush job with limited eyes on testing and metrics gathering?

If you can get this thing done in 5 days, with enough precision to potentially affect an internal employee's performance perception, you deserve a special place in the clouds.
I didn't realize this wasn't assigned to me.  I understand you don't have time, and I didn't mean to imply you should make time.

This isn't a web app, so webops isn't involved.  By "hosting" I only mean "a box to run this on", and honestly it could run on a server at my place if push came to shove.

Amy's requests are pretty straightforward.  You make a good point about testing, though.
Assignee: infra → dustin
Component: Infrastructure: Tools → RelOps
QA Contact: rtucker → arich
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #23)
> I didn't realize this wasn't assigned to me.  I understand you don't have
> time, and I didn't mean to imply you should make time.
> 
> This isn't a web app, so webops isn't involved.  By "hosting" I only mean "a
> box to run this on", and honestly it could run on a server at my place if
> push came to shove.
> 
> Amy's requests are pretty straightforward.  You make a good point about
> testing, though.

Let me know if you want help with those connectors; I'm happy to pitch in where I can. I don't have a ton of cycles, but I'm sure I could contribute something.
To be clear, it is not necessary that this be done by the end of the quarter.  It would be nice to have, but if we don't have cycles, that's fine.  When we get the data we can start using it, but this shouldn't take precedence over other IT goals.
Well, I was unable to find a place to host this that had both python-2.7 and a new enough version of git to take dates in the --since and --until arguments.  So I tarred the thing up and sent it to Amy to run by hand.
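
For reference, the date-window invocation in question might look like the sketch below. `--since`, `--until`, and `--numstat` are real `git log` options; the helper name and the choice to pass explicit midnight timestamps rather than bare dates are assumptions made for illustration.

```python
import datetime


def git_log_command(since, until):
    """Build a `git log` argument list limited to a date window.

    `since` and `until` are datetime.date objects. Explicit midnight
    timestamps are used so the window boundaries are unambiguous.
    """
    stamp = "%Y-%m-%d 00:00:00"
    return [
        "git", "log", "--numstat",
        "--since=" + since.strftime(stamp),
        "--until=" + until.strftime(stamp),
    ]
```

The list can be handed to `subprocess.check_output` from inside a checkout; whether the dates are accepted depends on the installed git version, as noted above.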
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED