Closed Bug 952101 Opened 11 years ago Closed 10 years ago

Version 2 of Bugzilla-ETL

Categories

(Testing :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ekyle, Assigned: ekyle)

References

()

Details

Attachments

(1 file)

Attached patch Please reviewSplinter Review
See main bug comment
Attachment #8350054 - Flags: review?(mcote)
Depends on: 952108
Blocks: 952108
No longer depends on: 952108
Comment on attachment 8350054 [details] [diff] [review]
Please review

Review of attachment 8350054 [details] [diff] [review]:
-----------------------------------------------------------------

I can't claim to fully understand the tests, but it all looks decent. :)  Little nits:

::: MANIFEST.in
@@ +15,5 @@
> +include README.txt
> +exclude MANIFEST.in
> +
> +
> +

Remove blank lines.

::: bzETL/extract_bugzilla.py
@@ +168,3 @@
>  
>      try:
> +        comments = db.query("""

Why did you make some of these unicode strings but not all?

::: bzETL/util/cnv.py
@@ +36,5 @@
>      @staticmethod
>      def JSON2object(json_string, params=None, flexible=False):
>          try:
>              #REMOVE """COMMENTS""", #COMMENTS, //COMMENTS, AND \n \r
> +            if flexible: json_string = re.sub(r"\"\"\".*?\"\"\"|\s+//.*\n|#.*?\n|\n|\r", r" ",

May as well move the clause down to its own line.

@@ +231,5 @@
> +            return _filter(esfilter, row, rownum, rows)
> +        return output
> +
> +def _filter(esfilter, row, rownum, rows):
> +    esfilter=struct.wrap(esfilter)

Since you've been fixing these...add spaces around the =.

@@ +273,5 @@
> +            field = esfilter.missing
> +        else:
> +            field = esfilter.missing.field
> +
> +        if row[field] == None:

"is None" is the general convention.

@@ +283,5 @@
> +            field = esfilter.missing
> +        else:
> +            field = esfilter.missing.field
> +
> +        if row[field] != None:

"is not None" would be the convention here.

::: bzETL/util/db.py
@@ +210,2 @@
>  
> +            if param: sql = expand_template(sql, self.quote_param(param))

Might as well split this into two lines.

@@ +212,5 @@
>              sql = self.preamble + outdent(sql)
>              if self.debug:
>                  Log.note(u"Execute SQL:\n{{sql}}", {u"sql": indent(sql)})
>  
> +            # if isinstance(sql, unicode):

Generally it's bad to leave commented-out code, in part because it can bitrot and you'll never notice until you finally want to use it.  If you really want to leave this in, put in a comment as to why it's commented out.

@@ +314,5 @@
>              stderr=subprocess.STDOUT,
>              bufsize=-1
>          )
> +        if isinstance(sql, unicode):
> +            sql=sql.encode("utf-8")

Spacing.

::: bzETL/util/files.py
@@ +20,4 @@
>  
>  
>  class File(object):
> +    def __init__(self, filename, buffering=2 ** 14):

I believe the standard for the power operator is no spaces, e.g. 2**14.

::: bzETL/util/jsons.py
@@ +9,1 @@
>  #

Random comment: might be good to explain why this file is preferable to the standard json module.

@@ +148,2 @@
>  ESCAPE_DCT = {
> +    u"\\": u"\\\\",

Any reason you switched all the single quotes to double quotes here?

::: bzETL/util/maths.py
@@ +113,5 @@
>          return output
>  
> +    @staticmethod
> +    def ceiling(value):
> +        return math.ceil(value)

Why bother with this?  Why not use math.ceil() directly where this is being called?  Just to keep all the math-related stuff you use in this file?

::: bzETL/util/queries/Q.py
@@ +544,5 @@
>  
>      def __getitem__(self, key):
>          try:
> +            if isinstance(key, dict):
> +                key=struct.unwrap(key)

Spacing.

@@ +547,5 @@
> +            if isinstance(key, dict):
> +                key=struct.unwrap(key)
> +                key = [key.get(k, None) for k in self._keys]
> +            elif not isinstance(key, list):
> +                key=[key]

Spacing.

@@ +567,5 @@
>  
>  
>      def add(self, val):
> +        if not isinstance(val, dict):
> +            val = {(self._keys[0], val)}

Just noticed this... why the init via tuple and not the standard colon notation?

::: bzETL/util/queries/windows.py
@@ +174,5 @@
>  
>      def end(self):
>          return self.total
> +
> +

Remove extra blank lines.

::: bzETL/util/startup.py
@@ +10,5 @@
>  
>  import argparse
> +import os
> +import tempfile
> +import sys

Sorting.

@@ +95,5 @@
> +
> +    Remember that this works by creating a lock file with a filename based on the full path to the script file.
> +    """
> +    def __init__(self, flavor_id=""):
> +        import sys

Why in the constructor?

::: tests/resources/python/look_for_leaks.py
@@ +10,5 @@
> +
> +    # max_bug_id = private.query({
> +    # 	"from":"private_bugs",
> +    # 	"select":{"value":"bug_id","aggregate":"maximum"}
> +    # })

Remove or add comment (and elsewhere below).

@@ +15,5 @@
> +
> +    max_bug_id = private.search({
> +        "query":{"filtered":{
> +            "query":{"match_all":{}},
> +            "filter":{"and":[{"match_all":{}}]}

Horizontal spacing (and below).

@@ +57,5 @@
> +        })
> +
> +        private_ids=set(Q.select(results.hits.hits, "fields.bug_id"))
> +
> +

Spacing (both vertical and horizontal).

@@ +91,5 @@
> +    #VERIFY NONE IN PUBLIC
> +
> +    #FIND ALL PRIVATE COMMENTS
> +
> +    #VERIFY NONE IN PUBLIC

What are these comments about?

@@ +96,5 @@
> +
> +
> +
> +
> +if __name__=="__main__":

Two blank lines would suffice. :)  Also, horizontal spacing.

@@ +101,5 @@
> +    try:
> +        settings = startup.read_settings()
> +        Log.start(settings.debug)
> +        private = ElasticSearch(settings.private)
> +        public =  ElasticSearch(settings.public)

Spacing.

::: tests/test_etl.py
@@ +9,4 @@
>  #
>  
>  from datetime import datetime
> +import unittest

Sorting.  And technically "from" import should go below direct imports.

@@ +108,5 @@
> +                param.end_time = CNV.datetime2milli(get_current_time(db))
> +                param.start_time = 0
> +                param.start_time_str = extract_bugzilla.milli2string(db, 0)
> +                param.alias_file = self.settings.param.alias_file
> +                #            param.bugs_filter=SQL("bug_id in {{bugs}}", {"bugs":db.quote_value(some_bugs)})

Remove or add comment as to why this is commented out.  Also, weird spacing.

@@ +125,5 @@
> +                except Exception, e:
> +                    Log.warning("Total failure during compare of bugs {{bugs}}", {"bugs": some_bugs}, e)
> +
> +
> +    def test_private_etl(self):

Don't double-space method definitions within a class.

@@ +145,5 @@
> +
> +    def test_public_etl(self):
> +        """
> +
> +        """

Empty docstring...

@@ +475,3 @@
>                  ref_versions = \
>                      Q.sort(
> +                        map(# map-lambda ADDED TO FIC OLD PRODUCTION BUG VERSIONS

I know this is pre-existing code, but why is this not a list comprehension?  If this comment explains it, I don't quite understand. :)

::: tests/util/database.py
@@ +115,5 @@
>          db.insert("bugs_activity", activity)
>  
> +    db.execute("UPDATE bugs SET delta_ts={{now}} WHERE {{where}}", {
> +        "now":now,
> +        "where":db.esfilter2sqlwhere({"term":{"bug_id":old_record.bug_id}})

Spacing.

::: tests/util/elasticsearch.py
@@ +41,4 @@
>  
>  
>      def search(self, query):
> +        f=CNV.esfilter2where(struct.wrap(query).query.filtered.filter)

Spacing.
Attachment #8350054 - Flags: review?(mcote) → review+
Fixes (no code changes, all tests pass):
https://github.com/klahnakoski/Bugzilla-ETL/commit/cccb89f3ab27c4abc1f61baa2199aa1a781e71b0



> "is None" is the general convention.

Yes, but I have the NullStruct floating around my code.  Using x == None does a more complex comparison (https://github.com/klahnakoski/Bugzilla-ETL/blob/master/bzETL/util/struct.py#L228)



> Random comment: might be good to explain why this file is preferable to the standard json module.

Thank you.  Added comment (https://github.com/klahnakoski/Bugzilla-ETL/commit/cccb89f3ab27c4abc1f61baa2199aa1a781e71b0#diff-388b6b027188a75b8fe851ad96a9d01cR11)


> @@ +148,2 @@
> >  ESCAPE_DCT = {
> > +    u"\\": u"\\\\",
> 
> Any reason you switched all the single quotes to double quotes here?

Only for the sake of uniformity with other lines.  I had to double-take on the backslash and whether I got it correct.



> ::: bzETL/util/maths.py
> @@ +113,5 @@
> >          return output
> >  
> > +    @staticmethod
> > +    def ceiling(value):
> > +        return math.ceil(value)
>
> Why bother with this?  Why not use math.ceil() directly where this is being called? 

One of my many weaknesses is a terrible memory for the multitude of short forms.  In this case I keep forgetting that the "ceiling" function is missing "ing".  After having to look this up twice, and even wondering if Python had such a function, I added a full name version here so I do not need to remember this again.  Furthermore, one day, I will add "mod" parameter so I can perform ceiling() modulo some base value.

> @@ +475,3 @@
> >                  ref_versions = \
> >                      Q.sort(
> > +                        map(# map-lambda ADDED TO FIC OLD PRODUCTION BUG VERSIONS
>
> I know this is pre-existing code, but why is this not a list comprehension?  If this comment
> explains it, I don't quite understand. :)

Thank you!  List comprehension should have been used.  I must have railroaded myself into using map() for some reason I long forgot.   Changed is in a future version.

> > +
> > +    Remember that this works by creating a lock file with a filename based on the full path to 
> the script file.
> > +    """
> > +    def __init__(self, flavor_id=""):
> > +        import sys
> 
> Why in the constructor?

Good question!  Will have it fixed in another version.
Assignee: nobody → klahnakoski
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: