Closed
Bug 669961
Opened 14 years ago
Closed 14 years ago
Refactor the search API code
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
2.3
People
(Reporter: adrian, Assigned: adrian)
References
Details
Attachments
(4 files, 6 obsolete files)
70.70 KB,
patch
|
lars
:
review+
|
Details | Diff | Splinter Review |
70.83 KB,
patch
|
lars
:
review+
|
Details | Diff | Splinter Review |
2.75 KB,
patch
|
lonnen
:
review+
|
Details | Diff | Splinter Review |
30.86 KB,
patch
|
lars
:
review+
|
Details | Diff | Splinter Review |
I'm in the process of refactoring the search API code in the middleware, for better consistency, better performances, and hopefully better unit-testing.
This bug is for tracking this process and for code reviews.
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → adrian
Version: 2.2 → 2.1
Assignee | ||
Comment 1•14 years ago
|
||
This is the first and main "wave" of refactoring. ElasticSearchAPI is mainly concerned, postgres only following the changes in searchUtil and few architecture changes.
* searchAPI is renamed searchUtil, and contains functions used by both elasticSearchAPI and postgresAPI ;
* Default values for parameters are centralized ;
* Building the main ES query is now a function ;
* The ES query uses query_string only when necessary, instead it uses filters which are faster ;
* ElasticSearchAPI is better structured and will be easier to unit-test ;
* ElasticSearch returns data about hang number and plugin number for each signature (this was a regression).
Attachment #544663 -
Flags: review?(laura)
Attachment #544663 -
Flags: review?(lars)
Comment 2•14 years ago
|
||
Comment on attachment 544663 [details] [diff] [review]
Refactor search module
I really didn't see trouble in the original object oriented design. As it says in PEP8, design for inheritance. Moving things out to the module level only serve the thwart inheritance. Utilities that are useful outside the class should be at module level. Utilities that are used only within the class should be enclosed within the class.
You can better accomplish "cleaning" by making functions that don't access local data into '@static' functions, breaking functions that are longer than 50 (or so) lines into smaller functions, and moving variable/function names into PEP8 form ("es_result" instead of "esResult" (I really hate saying that as I prefer camelCase)).
Once you've done that sort of refactoring, I think it is more important to focus on unit tests.
Attachment #544663 -
Flags: review?(lars) → review-
Assignee | ||
Comment 3•14 years ago
|
||
Following the advices of Lars, I rolled back to an inheritance architecture.
This is still here from previous patch:
* Default values for parameters are centralized ;* Building the main ES query is now a function ;
* The ES query uses query_string only when necessary, instead it uses filters which are faster ;
* ElasticSearchAPI is better structured and will be easier to unit-test ;
* ElasticSearch returns data about hang number and plugin number for each signature (this was a regression).
This is new:
* Helpers functions are now static methods ;
* Everything follows PEP8, from modules and files names to variables in functions ;
* Integrate fixes for 500 Errors (see bug 669965 )
This is testable on khan: http://agaudebert.khan.mozilla.org/query and has been tested with PowerFuzzer, resulting in no 500 Internal Error.
If r+, we'll have to test it more with Matt Brandt when I come back.
Attachment #544663 -
Attachment is obsolete: true
Attachment #545038 -
Flags: review?(laura)
Attachment #545038 -
Flags: review?(lars)
Attachment #544663 -
Flags: review?(laura)
Comment 4•14 years ago
|
||
Comment on attachment 545038 [details] [diff] [review]
Refactor search module v2
The name of the base class SearchAPI implies that it defines the api for search. However, the primary public API for this class is embodied in the methods 'query' and 'search' that are only defined in the derived classes. I think the base class should define, but not implement these functions.
Had I been designing this in C++, I would have made the base class SearchAPI abstract by defining the 'search' and 'query' functions as pure virtual. Python has a similar system through the use of the 'abc' (abstract base class) module:
>>> import abc
>>> class Alpha(object):
... __metaclass__ = abc.ABCMeta
... @abc.abstractmethod
... def a(self):
... raise NotImplemented
...
>>> class Beta(Alpha):
... pass
...
>>> class Gamma(Alpha):
... def a(self):
... print "Gamma.a"
...
>>> a = Alpha()
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
TypeError: Can't instantiate abstract class Alpha with abstract methods a
>>> b = Beta()
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
TypeError: Can't instantiate abstract class Beta with abstract methods a
>>> g = Gamma()
>>> g.a()
Gamma.a
Now this is all very interesting and I'm obviously feeling academic this morning, but you don't have to go this far. Just define 'query' and 'search' in the base class and have them raise the 'NotImplemented' exception. That'll make me happy...
Attachment #545038 -
Flags: review?(lars) → review-
Assignee | ||
Comment 5•14 years ago
|
||
Actively trying to make Lars happy! :D
* Adding some ABC features for search API implementation ;
* Adding empty methods raising NotImplemented to follow the API.
Code has been tested with PowerFuzzer, no errors. Need to be QAed if r+ed.
Attachment #545038 -
Attachment is obsolete: true
Attachment #546586 -
Flags: review?(lars)
Attachment #545038 -
Flags: review?(laura)
Comment 6•14 years ago
|
||
Comment on attachment 546586 [details] [diff] [review]
Refactor search module v3
Review of attachment 546586 [details] [diff] [review]:
-----------------------------------------------------------------
well, as I said in the last sentence in my previous review, you didn't have to go so far as using the abc classes. However, this is just fine. You've satisfied me.
Attachment #546586 -
Flags: review?(lars) → review+
Comment 7•14 years ago
|
||
adrian, when this lands on dev will you bump this to fixed? I'll finish QAing it for you :)
Assignee | ||
Comment 8•14 years ago
|
||
Just a quick update of the code after running my first unit tests:
* Fixing a bug in parameters order when calling build_wildcard_query() ;
* Correcting some specific behaviors ;
* Adding precision to a comment.
This is just 6 or 7 lines changing comparing to the last patch (I don't mark it as obsolete so you can diff).
Attachment #546895 -
Flags: review?(lars)
Comment 9•14 years ago
|
||
Comment on attachment 546895 [details] [diff] [review]
Refactor search module v4
this is a rubber stamp approval. The two patches are largely the same and you say there are only a few changes between them. I'll trust you.
Attachment #546895 -
Flags: review?(lars) → review+
Assignee | ||
Updated•14 years ago
|
Target Milestone: --- → 2.1
Version: 2.1 → Trunk
Assignee | ||
Comment 10•14 years ago
|
||
Patch v4 landed on trunk with revision r3275. Not marking this as RESOLVED because the work on this is not finished yet - more refactoring to come on ElasticSearchAPI, and PostgresAPI would love to be refactored at some point too.
Assignee | ||
Comment 11•14 years ago
|
||
The httpclient module was forgot in the previous patch, causing 500 Errors on dev when using search with ES. Very simple changes, only switching to PEP8, which impacts the module name and one function name.
Attachment #548295 -
Flags: review?(lars)
Assignee | ||
Comment 12•14 years ago
|
||
Same patch than just before, but I removed the unneeded import of urllib.
Attachment #548295 -
Attachment is obsolete: true
Attachment #548302 -
Flags: review?(chris.lonnen)
Attachment #548295 -
Flags: review?(lars)
Assignee | ||
Comment 13•14 years ago
|
||
Attachment #548305 -
Flags: review?(chris.lonnen)
Comment 14•14 years ago
|
||
Comment on attachment 548302 [details] [diff] [review]
httpclient was to be refactored too .2
Review of attachment 548302 [details] [diff] [review]:
-----------------------------------------------------------------
pep8 nitpickery:
whitespace before ':' on lines 23, 24, and 25.
line 42 is too long.
Fix those and this can go in.
Attachment #548302 -
Flags: review?(chris.lonnen) → review-
Assignee | ||
Comment 15•14 years ago
|
||
Adrian - PEP8, round 3!
Attachment #548302 -
Attachment is obsolete: true
Attachment #548305 -
Attachment is obsolete: true
Attachment #548310 -
Flags: review?(chris.lonnen)
Attachment #548305 -
Flags: review?(chris.lonnen)
Comment 16•14 years ago
|
||
Comment on attachment 548310 [details] [diff] [review]
HttpClient class refactor
Review of attachment 548310 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #548310 -
Flags: review?(chris.lonnen) → review+
Assignee | ||
Comment 17•14 years ago
|
||
Moved to 2.3 as this bug is a big, slowly-moving process.
Target Milestone: 2.1 → 2.3
Assignee | ||
Comment 18•14 years ago
|
||
This patch applies PEP8 rules to httpclient.py, searchapi.py and elasticsearch.py. Files were checked using jbalogh's check.py. Postgres is not here yet because it needs a little more changes than just breaking lines.
Attachment #549981 -
Flags: review?(lars)
Comment 19•14 years ago
|
||
Comment on attachment 549981 [details] [diff] [review]
Did you say PEP8?
> if http_response["error"]["code"] == 404 and \
> data.find("IndexMissingException") >= 0:
The backslash line continuation character is so very FORTRAN. I've found it to be much more stylish to use parenthesis to accomplish the same thing:
> if (http_response["error"]["code"] == 404 and
> data.find("IndexMissingException") >= 0):
> logger.debug("Query the OS by signature: %s" %
> count_by_os_query_json)
In this line, the python interpreter will do the string substitution and pass it to the 'debug' method of the logger. If the logger is set to ignore 'DEBUG' level messages, it will throw away the new string and just return, having wasted the effort of doing the string substitution.
The logging function calls (debug, info, error, critical) accept a variable number of arguments. You could do this instead:
> logger.debug("Query the OS by signature: %s",
> count_by_os_query_json)
Rather than do the string substitution, the Python interpreter will pass the two arguments to the 'debug' method. If the logger is set to ignore 'DEBUG', it will drop the two arguments and return. It doesn't do the unnecessary overhead of the string catenation. However, if the logger is to log 'DEBUG' level messages, it assumes that the first argument is a string and all subsequent arguments are for string substitution. The 'debug' method does the string substitution for you. It is a minor efficiency in cases where the logging level is set higher than the logging call.
Attachment #549981 -
Flags: review?(lars) → review-
Assignee | ||
Comment 20•14 years ago
|
||
Applying changes Lars suggested, completed the doc for function get_parameters().
Attachment #549981 -
Attachment is obsolete: true
Attachment #550149 -
Flags: review?(lars)
Updated•14 years ago
|
Attachment #550149 -
Flags: review?(lars) → review+
Assignee | ||
Comment 21•14 years ago
|
||
Last patch landed on trunk with revision r3325: http://code.google.com/p/socorro/source/detail?r=3325
Assignee | ||
Comment 22•14 years ago
|
||
This is done for SearchAPI and ElasticSearchAPI. Won't fix for PostgresAPI for the moment.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Component: Socorro → General
Product: Webtools → Socorro
You need to log in
before you can comment on or make changes to this bug.
Description
•