Closed Bug 681112 Opened 13 years ago Closed 13 years ago

Reorganize middleware code

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adrian, Assigned: adrian)

References

Details

Attachments

(3 files, 1 obsolete file)

As middleware is growing it will handle more and more HTTP services. Before the code is too huge, we should reorganize it to facilitate new services and to make it cleaner. 

Here is the plan: 

socorro/
        ...
        middleware/
                search_service.py
                report_service.py
                crash_service.py
                query_service.py
                stats_service.py
                util_service.py
                ...
                elasticsearch/
                        elasticsearch.py
                        search.py
                        report.py
                        ...
                postgresql/
                        postgresql.py
                        search.py
                        report.py
                        ...
        ...

All services are in a middleware module, and all implementations are in a submodule called by the name of the storage system it uses. 

Example:
HTTP GET /search/signatures/param/value/
With serviceImplementationClass = ElasticSearch
Calls:
    1. socorro.middleware.search.py
    2. socorro.middleware.elasticsearch.elasticsearch.py
    3. socorro.middleware.elasticsearch.search.py
Blocks: 680166
This patch does a lot of things:

* reorganize the python code as explained before ;
* make sure everything is PEP 8 (yes, even postgresql) ;
* give better documentation ;
* change a config key (searchImplClass becomes serviceImplementationClass) ;
* change the unit tests accordingly.

I believe code is way better organized now, and adding new services to the middleware will be easier and make more sense.
Attachment #555012 - Flags: review?(lars)
hey adrian, would it be possible to rollback the tree changes (file name changes, directory changes) so that the diff shows only what changed within the files?  I'd like to see and review that first.  Then once that is approved, do the tree changes and have that reviewed separately.  

As it stands right now, the diffs are pretty much useless because they show all the files to be entirely new.
Lars, here is the patch without filesystem changes. It is indeed way easier to read... Sorry I didn't think about that for the first patch!

Btw I'm not marking the previous patch as obsolete as it is the exact same content, so if you r- or r+ this one can you do the same with the first one?
Attachment #555139 - Flags: review?(lars)
Nag at the sideline: Would you be using a system like hg or git already, you could do diffs that transparently show moves and actual changes in moved files. ;-)

(I know a move is planned for some time in the future, just pointing out another reason why that would be nice.)
Comment on attachment 555012 [details] [diff] [review]
Reorganize the middleware

this change needs to be consolidated with long term goals.  I'm suggesting that we should think about consolidation of external services (ES, HBase, PG) into individual packages.  This system starts down that path, but ties it to the middleware.  I think we need discussion and buy in from the whole group before we proceed.
Attachment #555012 - Flags: review?(lars) → review-
Target Milestone: 2.2.2 → 2.3
Target Milestone: 2.3 → 2.4
This issue has been discussed with the team and we agreed that it is a good thing to do, and we would like to see an implementation of it to try and test it before really confirming it. I'll write that in the coming 2 or 3 weeks for search only at first, as a demonstration and if accepted a first step in that process of reorganizing. 

Moved to 2.4, we won't have it done and validated for 2.3.
Here is the current state of my work on this bug: https://github.com/AdrianGaudebert/socorro/compare/master...681112-reorganize-middleware

The code is not changing a lot, most of the changes are code moving from a file to another. The important part is the socorro.middleware.service code as it is where implementations are discovered dynamically by services. 

The way it works changed a bit from what I explained during all-hands: there is no more central class for each "external", it is now a base class. That means services will not call a single known class but will search inside a module if there is an implementation of the service they need. That implementation module is decided by looking first at arguments (is there a module specified in parameters? ), then in configuration (what is the chosen module for the application?), and then into an optional list of modules defined by the service itself. 

Each implementation in external.something inherits the base class (e.g. for ElasticSearch external.elasticsearch.search will inherit external.elasticsearch.elasticsearch), and the base class inherits a "common" class that lies in external. 

The goal of all this is to make it very easy to add new implementations, and to have automatic fall-backs if the favorite implementation doesn't exist. Also the base class doesn't have to have a list of all the service implementations.

This code is tested with PostgreSQL but not yet with ElasticSearch as I don't have a test environment anymore.
Attached image New middleware hierarchy (obsolete) —
I created a new branch to show a basic example of how to create a new service and 2 implementations: [https://github.com/AdrianGaudebert/socorro/compare/master...mware-reorg-skeleton]

Please look at files socorro/middleware/hello_service.py, socorro/external/english/hello.py, socorro/external/french/french.py and socorro/external/french/hello.py. Those are very simple files only displaying "Hello world!" in English or French. 

If you want to test it, try those URLs:
http://socorro-api/bpapi/201105/hello/
http://socorro-api/bpapi/201105/hello/force_api_impl/french/
http://socorro-api/bpapi/201105/hello/force_api_impl/english/

You can also change the config key `serviceImplementationModule` to 'socorro.external.english' or 'socorro.external.french' in webapiconfig.py. 

Also added a diagram of the class hierarchy in that new organization. 

Awaiting feedback from :laura, :lars, :lonnen and anyone interested!
I made my history a lot cleaner so you can make comments easily: 

Main branch (the one that will be pulled): [https://github.com/AdrianGaudebert/socorro/compare/master...681112-reorganize-middleware]
Branch with example service/external: [https://github.com/AdrianGaudebert/socorro/compare/master...mware-reorg-skeleton]
Comment on attachment 555139 [details] [diff] [review]
Reorganize the middleware, without filesystem changes

Review of attachment 555139 [details] [diff] [review]:
-----------------------------------------------------------------

Review from Lars dropped.
Attachment #555139 - Flags: review?(lars)
In general, I approve of the direction that this reorganization is heading.

It is my intent to exploit the external package throughout Socorro.  I'd like to move all external resource dependent code into implementation directories within this hierarchy.  As such, I'd like to impose a module name convention on files within the external sub-sub-packages.  The name should reflect what API it implements and/or who the client is for the API.

For example, I'll move the Processor Postrges code into the 'external.postgres' package.  I'll add a file with the name 'processor_api.py'.  That module name will be introduced in the configuration for dynamic loading when the processor starts.  Theoretically, someone could implement a MySQL version and specify its use in configuration.  I'll also be moving the implementations of the 'storage' classes into this heirachy, too.

In that vein, your module 'external.postgresql.postgres' package should be renamed something like 'external.postgresql.middleware_search_api' or 'external.postgresql.search_api'.  There is no need to repeat the name 'postgres' within the module name, being a member of the 'postgresql' sub-package is enough to identify it.  You should do the same/similar renaming within the other implementation sub-sub-packages.

You've got a 'common.py' within the 'external' package.  I suggest that since it is solely an API for middleware that it be moved to the 'middleware' package.  Alternatively, if you think that someday it may be used outside the middleware, you can leave it in place, but rename it so that it is obviously a base required for the search implementations.

We're on the right track, this is good work.
After discussing on IRC with :lars and :lonnen, we agreed that:

* "_api" at the end of external files is not needed (considering that everything in external _is_ a Python API) ;

* files like socorro.external.common and socorro.external.postgresql.postgresql should be moved into socorro.lib and separated by themes (for example, search_common, elasticsearch_common, postgresql_common, external_common... ). Classes in external will inherit from any needed classes. The package for those would be socorro.lib.external, does that sound right?
nope, sorry, that doesn't sound quite right.  If things are needing names like "search_postgres", they should those things should stay within the external implementation directories.  I've only proposed moving .../external/common.py to .../lib/search_common.py

end result:

.../socorro/lib/search_common.py
.../socorro/external/postgresql/search_base.py     # used to be postgres.py
.../socorro/external/postgresql/search.py        
.../socorro/external/elasticsearch/search_base.py  # used to be elasticsearch.py
.../socorro/external/elasticsearch/search.py
...

you may want to combine the 'search_base' and 'search' files.  This isn't Java, we can define more that one class in a file.
Last updates online:

https://github.com/AdrianGaudebert/socorro/compare/master...681112-reorganize-middleware
https://github.com/AdrianGaudebert/socorro/compare/master...mware-reorg-skeleton

I think we're getting closer! I made the changes lars, lonnen and I discussed recently and added a few unit tests. 

More precisely:
* socorro.external.common.py => socorro.lib.search_common.py
* socorro.external.elasticsearch.elasticsearch.py => socorro.external.elasticsearch.common.py
* same for postgresql
* moved generic code to socorro.lib.{util|datetimeutil}
* pieces of code moved between files to be were they belong
* added unit tests for socorro.middleware.service and reorganized existing unit tests to reflect new organization of the code.
r+ for the changes mentioned in Comment #15
It looks good to me.
Thank you Lars! 

I opened https://github.com/mozilla/socorro/pull/136 

:rhelmer, :lonnen, :laura, r?
Target Milestone: 2.4 → 2.3.3
Adrian and I spoke in IRC about the search methods that accommodate separate branches for a single value or a list of values. We agreed that it would be worthwhile to convert single values to singleton lists earlier (when they are parsed from the url), so that these utility methods could be deleted in favor of python built-ins.
Last changes suggested will take time to be implemented, moved to next release.
Target Milestone: 2.3.3 → 2.3.4
Blocks: 701600
Blocks: 701813
Blocks: 699504
Attached image Middleware hierarchy
This is what the middleware looks like now.
Attachment #569971 - Attachment is obsolete: true
As I'm reading over this new structure, I wonder if we shouldn't split the lib folder up. If someone wants to write a new external, it would be nice if everything was in one folder that could be dropped in.

socorro/external/<some_tech>/
socorro/external/<some_tech>/base.py (optional)
socorro/external/<some_tech>/<service>.py
socorro/external/<some_tech>/utils.py

Thoughts?
The problem is that what is in socorro.lib is related to implementations across resources. For example, socorro.lib.search_common is shared by socorro.external.elasticsearch.search and socorro.external.postgresql.search. Same for external_common which is shared by all implementations. 

It seems to me that utils.py file you want to add will be redundant with base.py. 

Instead, I would suggest that we add a submodule in lib or in external for code shared across resources. Something like:

socorro/lib/external/search_common.py
socorro/lib/external/external_common.py

or

socorro/external/shared/search.py
socorro/external/shared/utils.py

or 

socorro/external/common/search.py
socorro/external/common/utils.py
That makes sense.... to be honest I'm not sure where I thought I saw lib/<some_tech>_utils.py. Probably confusing search with ES. Sorry.

Looking at each services implementation now, though, why doesn't the base class for the service contain the lib code related to that service? Seems to conflict with how we're handling externals.

e.g. why are these seperate:
socorro/lib/search_common.py
socorro/middleware/search_service.py

while we have a bunch of static methods in:
socorro/external/elasticsearch/base.py
We try to keep services and external resources entirely separated. The idea is that the service is only a REST interface, and does nothing but calling a method somewhere that does the job it needs. 

Also, external implementations are independent from services because we want to use them from anywhere in Socorro. Lars will be moving it's storage system to use that new architecture, with everything related to HBase in ``external/hbase`` and the rest calling that code from wherever.

So the best solution was to create other files, outside of external and outside of middleware, and the only place we found was socorro/lib. Maybe it's not ideal, and maybe it won't scale well if we need to do that for numerous other services. The only solution I can think of is the one I wrote in my previous comment, ie a new submodule like socorro/lib/external.
Blocks: 704480
Thought we were using mixins for that. Clearly I've held this up long enough with objections that don't reflect the state of the code. One last thing: https://github.com/mozilla/socorro/pull/136/files#L14R47
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/4dc25ac6b7c226cc8fa607f3c09e32f8d4a0325f
Merge pull request #136 from AdrianGaudebert/681112-reorganize-middleware

Fixes bug 681112 - Reorganize middleware
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: