Regression Ranges Calculation Script

NEW
Assigned to

Status

9 years ago
9 years ago

People

(Reporter: murali, Assigned: murali)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

9 years ago
This is a python script that calculates regression ranges for bugs and prints out the following information into a local DB.

Regressor, RegressorFixDate, Regression, RegressionFileDate, RegressionTagdate, Fix2File,File2Tag , day_of_record

You can not run this script because you don't have the required access to Bugzilla back end DB dump.

But, I would like you to review this for syntax and style and please provide any optimization techniques that I might benefit from.

Thanks
Murali
(Assignee)

Comment 1

9 years ago
Created attachment 437158 [details]
regression ranges calculation

Thsi script does regression ranges calculation
Comment on attachment 437158 [details]
regression ranges calculation

>  def remote_get_data(self):  
>    self.cursor = self.conn.cursor ()
>    try:
>      self.cursor.execute ("""
>        SELECT d.*,
>        b.bug_id,
>        b.keywords,
>        DATE_FORMAT(b.creation_ts, '%Y-%m-%d') 
>        FROM bugs_security.dependencies d,
>        bugs_security.bugs b
>        WHERE b.bug_id=d.dependson
>        AND b.keywords LIKE '%regression%'
>        AND DATE_FORMAT(b.creation_ts, '%Y-%m-%d') > '2009-01-01'
>           """)
>    except  MySQLdb.Error, e:
>      print "Error %d: %s" % (e.args[0], e.args[1])
>      sys.exit (1)
>  def data_handler(self):
>    self.today = datetime.date.today()
>    print "today is ", self.today
>    while True:
>      row = self.cursor.fetchone ()
>      if row == None:
>        break

Newlines in between defs

>    self.regressor_details =  self.fetch_all_info(regressor, 0)

'fetch_all_info', there must be some defining characteristic about this info, if its bug info, name the function so.

>    self.regression_details = self.fetch_all_info(regression,1)

Could use a string or some defined constants for the flag here, True/False would be sufficient if...
  
>  def fetch_all_info(self,bugid,flag):

you name the flag based on what it signifies

>    if flag == 0:
>      self.regressor = bugid
>      history=self.get_bug_history(self.regressor)
>      details=self.get_bug_mainbody(self.regressor)
>      hist_array=self.regressor_history_parser(history)
>      detail_array=self.regressor_details_parser(details)
>      return hist_array,detail_array
>    else:
>      self.regression = bugid
>      history=self.get_bug_history(self.regression)
>      details=self.get_bug_mainbody(self.regression)
>      hist_array=self.regression_history_parser(history)
>      detail_array=self.regression_details_parser(details)
>      return hist_array,detail_array
> 

You should take the return out from under the conditional.
 
>  def get_bug_history(self,bugid):
>    url='https://api-dev.bugzilla.mozilla.org/latest/bug/'+str(bugid)+'/history?username='+self.bzusername+'&password='+self.bzpasswd
>    return json.load(urllib.urlopen(url))
>

Could put the base url to the bugzilla api in a constant somewhere

>    
>  def regressor_history_parser(self,buff):

this doesn't return a parser, you could name it by what it does like 'parse_regressor_history'

>      for idy in range(len(buff['history'][idx]['changes'])):
>        if buff['history'][idx]['changes'][idy]['field_name'] == 'resolution' and  buff['history'][idx]['changes'][idy]['added'].upper().startswith('FIXED') != -1:
>          return [buff['history'][idx]['change_time'], buff['history'][idx]['changes'][idy]['field_name'] , buff['history'][idx]['changes'][idy]['added']]

Break up long lines


># **** MAIN BODY OF THE CODE STARTS DOWN BELOW *****
>
># Should I write the number of ARGS check from  the commandline ?
># I need 9 args at commandline.
># Before the program starts, I should have VPNed and SSH Tunneled
># to my local copy of BugZilla DB in a terminal ...
>print sys.argv
>prog,rhost,rport,rdb,ruser,rpassword,luser,lpassword,bzusername,bzpasswd = sys.argv
>dm=DB_Connection()
>dm.remote_connection(rhost,rport,rdb,ruser,rpassword,bzusername,bzpasswd)
>dm.remote_get_data()
>dm.create_local_table(luser,lpassword)
>dm.data_handler()
>

bad Murali! I talked to you about this already (importing this script from other scripts). Also use an options parser (like optparse) and pass the options into the remote_connection function as a 'config' kind of dict, I think there are too many arguments here.

Just a few readability comments from me, haven't looked too deep into it. Good stuff (-:

Comment 3

9 years ago
Comment on attachment 437158 [details]
regression ranges calculation

>#
># The Initial Developer of the original Code is
># Murali Krishna Nandigama
># Portions created by the Initial Developer are Copyright (C) 2010
># the Initial Developer. All Rights Reserved.

This should be 

# The Original Code is mozilla.org code.
#
# The Initial Developer of the Original Code is
# Mozilla Foundation.
# Portions created by the Initial Developer are Copyright (C) 2010
# the Initial Developer. All Rights Reserved.
#
# Contributor(s):
# Murali Krishna Nandigama

The following are mostly personal style related comments.

>import  MySQLdb,sys,os,json,re,datetime,urllib
>

I like spaces following comma in lists and following comma and period in sentences . I won't mention it again though there are more examples below. Helps readability in my opinion.

>class DB_Connection:
>  
>  def remote_connection(self, rhost,rport,rdb,ruser,rpassword,bzusername,bzpasswd):
>    self.rhost=rhost
>    self.rport=int(rport)
>    self.rdb=rdb
>    self.ruser=ruser
>    self.rpasswd=rpassword
> 

Personally I prefer assignments and other operators to be separated by a space on either side. In cases where there are several assignments in a row, I prefer to align them on the assignment operator. Seems more readable to me.

    self.rhost   = rhost
    self.rport   = int(rport)
    self.rdb     = rdb
    self.ruser   = ruser
    self.rpasswd = rpassword
   
>    # While we are at it,store the bugzilla login/passwd also
>    # in the self object.This is as better place as any!!
>
>    self.bzusername=bzusername
>    self.bzpasswd=bzpasswd
>    
>    try:
>         self.conn = MySQLdb.connect (host = self.rhost,
>                                 user = self.ruser,
>                                 passwd = self.rpasswd,
>                                 db = self.rdb,
>                                 port = self.rport)
>    except MySQLdb.Error, e:
>         print "Error %d: %s" % (e.args[0], e.args[1])
>         sys.exit (1)
>         
>  def remote_get_data(self):  
>    self.cursor = self.conn.cursor ()
>    try:
>      self.cursor.execute ("""
>        SELECT d.*,
>        b.bug_id,
>        b.keywords,
>        DATE_FORMAT(b.creation_ts, '%Y-%m-%d') 
>        FROM bugs_security.dependencies d,
>        bugs_security.bugs b
>        WHERE b.bug_id=d.dependson
>        AND b.keywords LIKE '%regression%'
>        AND DATE_FORMAT(b.creation_ts, '%Y-%m-%d') > '2009-01-01'
>           """)
>    except  MySQLdb.Error, e:
>      print "Error %d: %s" % (e.args[0], e.args[1])
>      sys.exit (1)

separate function/method definitions by at least one blank line as you do elsewhere.

>  def data_handler(self):
>    self.today = datetime.date.today()
>    print "today is ", self.today
>    while True:
>      row = self.cursor.fetchone ()
>      if row == None:
>        break
>        
>      #date, author, foo, y = row
>      # Write code here to take the zeroth and 2nd element ,
>      # fetch the details of history and main body from bugzilla
>      # and then calculate the fix2file as well as file2tag deltas.
>      self.run_dates_delta_calculation(row[0],row[2])
>    print "Number of rows returned: %d" % self.cursor.rowcount 
>    self.cursor.close ()
>    self.conn.close ()      

ditto

>  def run_dates_delta_calculation(self,regressor,regression):
>    self.regressor_details =  self.fetch_all_info(regressor, 0)
>    self.regression_details = self.fetch_all_info(regression,1)
>    
>    # Get the regressor fix date, regression file date,regression tag date
>    # etc.,
>    
>    if self.regression_details[0][0].startswith('0000-00-00'):
>      regression_filedate=regression_tagdate=self.regression_details[1][2][:10]
>    else:
>      regression_filedate=self.regression_details[1][2][:10]
>      regression_tagdate=self.regression_details[0][0][:10]
>    regression_product=self.regression_details[1][0]
>    regression_component=self.regression_details[1][1]
>    
>    regressor_fixdate=self.regressor_details[0][0][:10]
>    fix2filedate=self.dates_delta(regression_filedate,regressor_fixdate)
>    file2tagdate=self.dates_delta(regression_tagdate,regression_filedate)
>    today=str(self.today.year)+'-'+str(self.today.month)+'-'+str(self.today.day)
>    try:
>      self.update_local_db(self.regressor,regressor_fixdate,self.regression,regression_filedate,regression_tagdate,fix2filedate,file2tagdate,today)
>    except MySQLdb.Error, e:
>         print "Error %d: %s" % (e.args[0], e.args[1])
>         sys.exit (1)
>      
>  
>  def fetch_all_info(self,bugid,flag):
>    if flag == 0:
>      self.regressor = bugid
>      history=self.get_bug_history(self.regressor)
>      details=self.get_bug_mainbody(self.regressor)
>      hist_array=self.regressor_history_parser(history)
>      detail_array=self.regressor_details_parser(details)
>      return hist_array,detail_array
>    else:
>      self.regression = bugid
>      history=self.get_bug_history(self.regression)
>      details=self.get_bug_mainbody(self.regression)
>      hist_array=self.regression_history_parser(history)
>      detail_array=self.regression_details_parser(details)
>      return hist_array,detail_array
>  
>  def get_bug_history(self,bugid):
>    url='https://api-dev.bugzilla.mozilla.org/latest/bug/'+str(bugid)+'/history?username='+self.bzusername+'&password='+self.bzpasswd

url='https://api-dev.bugzilla.mozilla.org/latest/bug/%s/history?username=%s&password=%s' % (bugid, self.bzusername, self.bzpasswd)

seems more readable.

Make 'https://api-dev.bugzilla.mozilla.org/latest' global as in

bugzilla_api_url = 'https://api-dev.bugzilla.mozilla.org/latest'

then you could do

url='%s/bug/%s/history?username=%s&password=%s' % (bugzilla_api_url, bugid, self.bzusername, self.bzpasswd)

>    return json.load(urllib.urlopen(url))

according to the bugzilla rest api docs this is an object with one property 'history' containing an array of changesets. Maybe go ahead and just dereference the history property here.

i.e. 

     history_changesets = json.load(urllib.urlopen(url))
     return history_changesets['history']

then you don't need all that buff['history'] stuff everywhere.

>
>  
>  def get_bug_mainbody(self,bugid):
>    url='https://api-dev.bugzilla.mozilla.org/latest/bug/'+str(bugid)+'?username='+self.bzusername+'&password='+self.bzpasswd

ditto

>    return json.load(urllib.urlopen(url))
>    
>  def regressor_history_parser(self,buff):
>    #for h in buff['history']:
>    #  for c in h['changes']:
>    #    if c['field_name'] == 'resolution':...
>    for idx in range (len(buff['history'])):
>      for idy in range(len(buff['history'][idx]['changes'])):
>        if buff['history'][idx]['changes'][idy]['field_name'] == 'resolution' and  buff['history'][idx]['changes'][idy]['added'].upper().startswith('FIXED') != -1:
>          return [buff['history'][idx]['change_time'], buff['history'][idx]['changes'][idy]['field_name'] , buff['history'][idx]['changes'][idy]['added']]
>    # if the bug is still not FIXED, like one of those tracker bugs
>    # return a date of FIX from the future.
>    return [u'2020-12-12T05:22:42Z',u'resolution',u'FIXED']

man, this is hard to read. the commented out code is distracting. The loops could benefit from some well named temporary variables.

buff is not a meaningful name. Same for all the other buffs out there. considering the bugzilla api, buff is an object containing a property 'history' which is an array of changeset objects. If you make the change above and change get_bug_history to de-reference the history property before returning you could write this as:


  def regressor_history_parser(self, changeset_list):

    for changes in changeset_list:
        change_list = changes['changes']
        changer     = changeset['changer']
        change_time = changeset['change_time']

        for change in change_list:
            if change['field_name'] == 'resolution' and re.search('^fixed', change['added'], re.IGNORECASE):
              return [change_time, change['field_name'] , change['added']]


that's all for now.
(Assignee)

Comment 4

9 years ago
Implemented most of the comments from Harth and BC. BC, I implemented the last suggestion in my own way and de-cluttered the 
for idx in range (len(buff['history'])):
>      for idy in range(len(buff['history'][idx]['changes'])):
>        if buff['history'][idx]['changes'][idy]['field_name'] == 'resolution' and  buff['history'][idx]['changes'][idy]['added'].upper().startswith('FIXED') != -1:
>          return [buff['history'][idx]['change_time'], buff['history'][idx]['changes'][idy]['field_name'] , buff['history'][idx]['changes'][idy]['added']]

greatly.


Please look at the new revision.
(Assignee)

Comment 5

9 years ago
Created attachment 437226 [details]
Version II of regression ranges calculator

implemented many of BC and Harth's comments. Still pending main method and opts check code bits.
Attachment #437158 - Attachment is obsolete: true
For the command line parsing, don't forget the earlier comment about optparse.  Here is an example of how it is used in other scripts:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#179

Also, please include a main() function and call it as such:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#266

One other bit of feedback is some of your function argument lists have arg,arg2,arg3, and others have arg, arg2, arg3.  Please follow the convention of the second approach and use spaces between args.

One other thing I see is a sys.exit() call in the except blocks.  It would be nice if we could bubble that up to the calling functions and at the very least ensure we close our db cursor and connection.
(Assignee)

Comment 7

9 years ago
Man!  you guys are a tough bunch!!!

Done with all the boiler plate code.   Please check the 3rd version for review and approval.

Thanks
Murali
(Assignee)

Comment 8

9 years ago
Created attachment 437354 [details]
Version III of regression ranges calculator
Attachment #437226 - Attachment is obsolete: true
Please put the parser options and defaults into a function instead of at the top scope.  

Also there is one function 'update_local_db' which has args with no spaces between them.

Could you try to keep this to 80 character width limit if possible?  

Lastly, you have a comment to check for 9 args.  Go ahead and do that.  Also create a usage() function to display the usage and requirements.
(Assignee)

Comment 10

9 years ago
I am using Ctalbert's notation of putting parser options in global space and not in a function call[ like in heartbeat ]. I asked him for clarification about it as well.

I will act on the rest of comments.

Thanks

Comment 11

9 years ago
(In reply to comment #10)
> I am using Ctalbert's notation of putting parser options in global space and
> not in a function call[ like in heartbeat ]. I asked him for clarification
> about it as well.
> 
Jmaher: is there any need for that in things that are not terribly likely to be inherited by other code?
it is good coding practice.  This is a script that could get imported in in the future and would need to be modified if it was.  Having everything in a main() function is a good idea.

for example if we have a series of scripts that do bugzilla related reports, we might refactor some of this code.  It would just be easier to work with in the future if we do it now.

Comment 13

9 years ago
A few notes:

> import  MySQLdb, sys, os, json, re, datetime, urllib

[minor]
I like putting my imports on separate lines for easy diffing and modification

--

> class RegressionRangeCalculation:

[minor] 
New-style classes inherit from object (see, e.g. http://stackoverflow.com/questions/54867/old-style-and-new-style-classes-in-python) but it really doesn't mater for this program. `class RegressionRangeCalculation(object)`

--

>         self.conn  =  MySQLdb.connect (host  =  self.rhost,
>                                        port  =  self.rport,
>                                        db  =  self.rdb,
>                                        user  =  self.ruser,
>                                        passwd  =  self.rpasswd)


[style]
That''s a lot of extra space!  I would do (ignoring indentation)

self.conn = MySQLdb.connect(host=self.rhost,
                            port=self.rport,
                            db=self.rdb,
                            user=self.ruser,
                            passwd=self.rpasswd)

You do this elsewhere to.  The "prefered" style is one space before + after '='s and except in function calls (including ctors) in which you use 0 spaces.

--

>    except MySQLdb.Error, e:
>         print "Error %d: %s" % (e.args[0], e.args[1])
>         sys.exit (1)

[minor]
Its preferable to re-raise the exception, unless you're supressing output (not really sure why you're printing here or the details):

except MySQLdb.Error, e:
  print "Error %d: %s" % (e.args[0], e.args[1])
  raise # re-raise the exception, causing output to stderr and an exit with code 1

Since you do this exception handling several places, I might try to unify this.

--

>      if row  ==  None:

[minor]
None is a singleton, so `if row is None` is prefered

--

As discussed earlier, the handling of main is....odd, IMHO.  The "# MAIN BODY OF THE CODE" comment is unnecessary;  any such documentation should be as documentation of the main function.  The recipe I always use (as do most pythonistas in my experience is:

def main(args=sys.argv[1:]):
    """command line entry point"""

    defaults = {} 
    parser = OptionParser(usage="<how to use this program>")
    # add options to parser
    ...
    parser.set_defaults(**defaults)

    # parse options
    options, args = parser.parse_args()

    # check/sanify input [if needed]

    # perform the regression range calculation
    dm = RegressionRangeCalculation()
    dm.remote_connection(options.rhost, 
                         options.rport,
                         options.rdb,
                         options.ruser,
                         options.rpassword,
                         options.bzusername,
                         options.bzpasswd)
    dm.remote_get_data()
    dm.create_local_table(luser, lpassword)
    dm.data_handler()

if __name__ == '__main__':
  main()

An advantage of doing it this way (though I'd claim its good structure in any case) is that if you ever package the program, you can associate the main function with a script that will be installed.  If desired, you could abstract what is now main to its own function, as long as the intention of that function is clear.

--

Other general critiques:

* this is very hard to read;  I haven't looked at it on a functional level to try to figure out what is actually being done, but there are a lot of conventions going on that 

* docstrings are nice; I would recommend using them

* similar critiques as on 'buff':  variables, functions, etc should be semantic as a clue to others reading the code.  Just to give one example:

>  def run_dates_delta_calculation(self, regressor, regression):
 
What is regressor?  What is regression?  These aren't documented anywhere, so I'm not sure what's going on.  Also, `calculate_dates_delta` is a better function name.


I hope this isn't too nit-picky!

Comment 14

9 years ago
(In reply to comment #12)
> it is good coding practice.  This is a script that could get imported in in the
> future and would need to be modified if it was.  Having everything in a main()
> function is a good idea.
> 
> for example if we have a series of scripts that do bugzilla related reports, we
> might refactor some of this code.  It would just be easier to work with in the
> future if we do it now.

That makes good sense.  So I should refactor the heartbeat code too.  I'll find time for that.  Murali, you should change this per Joel's request.
(Assignee)

Comment 15

9 years ago
ok !! first thing tomorrow !!!
You need to log in before you can comment on or make changes to this bug.