Closed Bug 669961 Opened 14 years ago Closed 14 years ago

Refactor the search API code

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adrian, Assigned: adrian)

References

Details

Attachments

(4 files, 6 obsolete files)

I'm in the process of refactoring the search API code in the middleware, for better consistency, better performances, and hopefully better unit-testing. This bug is for tracking this process and for code reviews.
Assignee: nobody → adrian
Version: 2.2 → 2.1
Blocks: 651279
Attached patch Refactor search module (obsolete) — Splinter Review
This is the first and main "wave" of refactoring. ElasticSearchAPI is mainly concerned, postgres only following the changes in searchUtil and few architecture changes. * searchAPI is renamed searchUtil, and contains functions used by both elasticSearchAPI and postgresAPI ; * Default values for parameters are centralized ; * Building the main ES query is now a function ; * The ES query uses query_string only when necessary, instead it uses filters which are faster ; * ElasticSearchAPI is better structured and will be easier to unit-test ; * ElasticSearch returns data about hang number and plugin number for each signature (this was a regression).
Attachment #544663 - Flags: review?(laura)
Attachment #544663 - Flags: review?(lars)
Blocks: 670186
Comment on attachment 544663 [details] [diff] [review] Refactor search module I really didn't see trouble in the original object oriented design. As it says in PEP8, design for inheritance. Moving things out to the module level only serve the thwart inheritance. Utilities that are useful outside the class should be at module level. Utilities that are used only within the class should be enclosed within the class. You can better accomplish "cleaning" by making functions that don't access local data into '@static' functions, breaking functions that are longer than 50 (or so) lines into smaller functions, and moving variable/function names into PEP8 form ("es_result" instead of "esResult" (I really hate saying that as I prefer camelCase)). Once you've done that sort of refactoring, I think it is more important to focus on unit tests.
Attachment #544663 - Flags: review?(lars) → review-
Attached patch Refactor search module v2 (obsolete) — Splinter Review
Following the advices of Lars, I rolled back to an inheritance architecture. This is still here from previous patch: * Default values for parameters are centralized ;* Building the main ES query is now a function ; * The ES query uses query_string only when necessary, instead it uses filters which are faster ; * ElasticSearchAPI is better structured and will be easier to unit-test ; * ElasticSearch returns data about hang number and plugin number for each signature (this was a regression). This is new: * Helpers functions are now static methods ; * Everything follows PEP8, from modules and files names to variables in functions ; * Integrate fixes for 500 Errors (see bug 669965 ) This is testable on khan: http://agaudebert.khan.mozilla.org/query and has been tested with PowerFuzzer, resulting in no 500 Internal Error. If r+, we'll have to test it more with Matt Brandt when I come back.
Attachment #544663 - Attachment is obsolete: true
Attachment #545038 - Flags: review?(laura)
Attachment #545038 - Flags: review?(lars)
Attachment #544663 - Flags: review?(laura)
Comment on attachment 545038 [details] [diff] [review] Refactor search module v2 The name of the base class SearchAPI implies that it defines the api for search. However, the primary public API for this class is embodied in the methods 'query' and 'search' that are only defined in the derived classes. I think the base class should define, but not implement these functions. Had I been designing this in C++, I would have made the base class SearchAPI abstract by defining the 'search' and 'query' functions as pure virtual. Python has a similar system through the use of the 'abc' (abstract base class) module: >>> import abc >>> class Alpha(object): ... __metaclass__ = abc.ABCMeta ... @abc.abstractmethod ... def a(self): ... raise NotImplemented ... >>> class Beta(Alpha): ... pass ... >>> class Gamma(Alpha): ... def a(self): ... print "Gamma.a" ... >>> a = Alpha() Traceback (most recent call last): File "<string>", line 1, in <fragment> TypeError: Can't instantiate abstract class Alpha with abstract methods a >>> b = Beta() Traceback (most recent call last): File "<string>", line 1, in <fragment> TypeError: Can't instantiate abstract class Beta with abstract methods a >>> g = Gamma() >>> g.a() Gamma.a Now this is all very interesting and I'm obviously feeling academic this morning, but you don't have to go this far. Just define 'query' and 'search' in the base class and have them raise the 'NotImplemented' exception. That'll make me happy...
Attachment #545038 - Flags: review?(lars) → review-
Actively trying to make Lars happy! :D * Adding some ABC features for search API implementation ; * Adding empty methods raising NotImplemented to follow the API. Code has been tested with PowerFuzzer, no errors. Need to be QAed if r+ed.
Attachment #545038 - Attachment is obsolete: true
Attachment #546586 - Flags: review?(lars)
Attachment #545038 - Flags: review?(laura)
Comment on attachment 546586 [details] [diff] [review] Refactor search module v3 Review of attachment 546586 [details] [diff] [review]: ----------------------------------------------------------------- well, as I said in the last sentence in my previous review, you didn't have to go so far as using the abc classes. However, this is just fine. You've satisfied me.
Attachment #546586 - Flags: review?(lars) → review+
adrian, when this lands on dev will you bump this to fixed? I'll finish QAing it for you :)
Just a quick update of the code after running my first unit tests: * Fixing a bug in parameters order when calling build_wildcard_query() ; * Correcting some specific behaviors ; * Adding precision to a comment. This is just 6 or 7 lines changing comparing to the last patch (I don't mark it as obsolete so you can diff).
Attachment #546895 - Flags: review?(lars)
Comment on attachment 546895 [details] [diff] [review] Refactor search module v4 this is a rubber stamp approval. The two patches are largely the same and you say there are only a few changes between them. I'll trust you.
Attachment #546895 - Flags: review?(lars) → review+
Target Milestone: --- → 2.1
Version: 2.1 → Trunk
Patch v4 landed on trunk with revision r3275. Not marking this as RESOLVED because the work on this is not finished yet - more refactoring to come on ElasticSearchAPI, and PostgresAPI would love to be refactored at some point too.
The httpclient module was forgot in the previous patch, causing 500 Errors on dev when using search with ES. Very simple changes, only switching to PEP8, which impacts the module name and one function name.
Attachment #548295 - Flags: review?(lars)
Same patch than just before, but I removed the unneeded import of urllib.
Attachment #548295 - Attachment is obsolete: true
Attachment #548302 - Flags: review?(chris.lonnen)
Attachment #548295 - Flags: review?(lars)
Attached patch HttpClient class refactor (obsolete) — Splinter Review
Attachment #548305 - Flags: review?(chris.lonnen)
Comment on attachment 548302 [details] [diff] [review] httpclient was to be refactored too .2 Review of attachment 548302 [details] [diff] [review]: ----------------------------------------------------------------- pep8 nitpickery: whitespace before ':' on lines 23, 24, and 25. line 42 is too long. Fix those and this can go in.
Attachment #548302 - Flags: review?(chris.lonnen) → review-
Adrian - PEP8, round 3!
Attachment #548302 - Attachment is obsolete: true
Attachment #548305 - Attachment is obsolete: true
Attachment #548310 - Flags: review?(chris.lonnen)
Attachment #548305 - Flags: review?(chris.lonnen)
Comment on attachment 548310 [details] [diff] [review] HttpClient class refactor Review of attachment 548310 [details] [diff] [review]: -----------------------------------------------------------------
Attachment #548310 - Flags: review?(chris.lonnen) → review+
Moved to 2.3 as this bug is a big, slowly-moving process.
Target Milestone: 2.1 → 2.3
Attached patch Did you say PEP8? (obsolete) — Splinter Review
This patch applies PEP8 rules to httpclient.py, searchapi.py and elasticsearch.py. Files were checked using jbalogh's check.py. Postgres is not here yet because it needs a little more changes than just breaking lines.
Attachment #549981 - Flags: review?(lars)
Comment on attachment 549981 [details] [diff] [review] Did you say PEP8? > if http_response["error"]["code"] == 404 and \ > data.find("IndexMissingException") >= 0: The backslash line continuation character is so very FORTRAN. I've found it to be much more stylish to use parenthesis to accomplish the same thing: > if (http_response["error"]["code"] == 404 and > data.find("IndexMissingException") >= 0): > logger.debug("Query the OS by signature: %s" % > count_by_os_query_json) In this line, the python interpreter will do the string substitution and pass it to the 'debug' method of the logger. If the logger is set to ignore 'DEBUG' level messages, it will throw away the new string and just return, having wasted the effort of doing the string substitution. The logging function calls (debug, info, error, critical) accept a variable number of arguments. You could do this instead: > logger.debug("Query the OS by signature: %s", > count_by_os_query_json) Rather than do the string substitution, the Python interpreter will pass the two arguments to the 'debug' method. If the logger is set to ignore 'DEBUG', it will drop the two arguments and return. It doesn't do the unnecessary overhead of the string catenation. However, if the logger is to log 'DEBUG' level messages, it assumes that the first argument is a string and all subsequent arguments are for string substitution. The 'debug' method does the string substitution for you. It is a minor efficiency in cases where the logging level is set higher than the logging call.
Attachment #549981 - Flags: review?(lars) → review-
Applying changes Lars suggested, completed the doc for function get_parameters().
Attachment #549981 - Attachment is obsolete: true
Attachment #550149 - Flags: review?(lars)
Attachment #550149 - Flags: review?(lars) → review+
Last patch landed on trunk with revision r3325: http://code.google.com/p/socorro/source/detail?r=3325
This is done for SearchAPI and ElasticSearchAPI. Won't fix for PostgresAPI for the moment.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: