Closed Bug 1250859 Opened 4 years ago Closed 2 years ago

relengapi microservices

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: garbas, Assigned: garbas)

User Story

# Links

- Code: https://github.com/garbas/mozilla-relengapi (temporarily until we find a new "home")
- Docs: ???
- Develop: ???
- Staging: ???

# Services

- Archiver: #1284474 (Unassigned)
- Auth: #??? (:garbas:)
- BadPenny: #1284479 (:garbas:)
- Clobberer: #1284475 (:garbas:)
- Docs: #??? (:garbas:)
- Mapper: #1284478 (Unassigned)
- SlaveLoan: #1284477 (Unassigned)
- ToolTool: #1284475 (Unassigned)
- TokenAuth: ignored since login.taskcluster.net provides this functionality
- TreeStatus: #1284476 (Unassigned)
This is a meta ticket to share ideas on how to improve ``relengapi``. Once the ideas are discussed it will be used as a status ticket for this work.


# Current situation and problems

Currently all of ``relengapi`` is one application with many API endpoints.

While the code is nicely decoupled at the python package level, it has become hard to roll out new features and new API endpoints, because a small change in one module can unexpectedly break or bring down other endpoints. This makes ``relengapi`` very fragile and 

Current services:
 - Bad Penny (handles scheduling and monitoring of periodic tasks)
 - Clobberer (handles resetting build directories on buildbot slaves to a clean state)
 - Slave loan (allows contributors to request and receive access to continuous integration machines for debugging and development)
 - Tokens (used to allow automated services to authenticate to Releng API without being tied to a user's identity)
 - Tooltool (fetching binary artifacts for builds and tests)
 - Tree Status (See the current status of Mozilla's version-control repositories)


# What do we want / Goals

The initial goal of having everything under one repository was to share common things and make sure we don't reinvent the wheel, which has (provably) shortened the development cycle. We must preserve this while still


It should be possible to release only one ticket


# Proposed actions

- create separated python packages for each service (listed above)

- deploy services individually (under different subdomain) using heroku
   - we could use their - still in closed beta - docker support

- move frontend code (js/css) into separate package (not python package)

- deploy documentation also as a separate "service"

- use grafana (eg. https://github.com/taskcluster/heroku-grafana) to monitor services

- maybe this would be a nice opportunity to also do a rewrite into react+redux as taskcluster team is planning. but this should be done as a separate task (if we agree on this ofcourse)
Flags: needinfo?(dustin)
There are a lot of sentence prefixes there.. early draft?

Also, you omitted the mapper and archiver blueprints and a few others.  The full list is here:
  https://github.com/mozilla/build-relengapi/tree/master/relengapi/blueprints

> - create separated python packages for each service (listed above)

+1 -- with one or more shared library packages to support those
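
To illustrate the split, here is a minimal sketch of one shared helper plus one small WSGI app per service. It uses only the standard library; `make_service`, `call`, the route paths and the returned data are hypothetical names chosen for the example and are not taken from the RelengAPI code.

```python
import json
from wsgiref.util import setup_testing_defaults


def make_service(name, routes):
    """Shared-library helper: build a WSGI app for a single service.

    `routes` maps a URL path to a function returning a JSON-serializable value.
    """
    def app(environ, start_response):
        handler = routes.get(environ.get("PATH_INFO", "/"))
        if handler is None:
            status, payload = "404 Not Found", {"error": "not found", "service": name}
        else:
            status, payload = "200 OK", {"result": handler(), "service": name}
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app


# Each service would live in its own package and be deployed on its own.
# The routes and data below are placeholders for illustration only.
treestatus_app = make_service("treestatus", {"/trees": lambda: ["example-tree"]})
clobberer_app = make_service("clobberer", {"/branches": lambda: []})


def call(app, path):
    """Invoke a WSGI app in-process and return (status, decoded JSON body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Since each app is a plain WSGI callable, it could be served on its own with gunicorn's usual `MODULE:VARIABLE` form (for example `gunicorn treestatus_service:treestatus_app`, where the module name is made up), so a broken change in one service would not take the others down.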

> - deploy services individually (under different subdomain) using heroku

+1

>    - we could use their - still in closed beta - docker support

I don't see why we'd need Docker -- RelengAPI is intentionally very simple to deploy: just some Python packages and gunicorn.

> - move frontend code (js/css) into separate package (not python package)
> - maybe this would be a nice opportunity to also do a rewrite into react+redux as taskcluster team is planning. but this
> should be done as a separate task (if we agree on this ofcourse)

+1, tools.mozilla-releng.net?

> - deploy documentation also as a separate "service"

+1, but you leave unspecified how the documentation will get from the individual services to this one -- will it query a documentation blob "live" from the other services?  TaskCluster uploads API documentation to an S3 bucket which the docs site then queries, although much of the prose documentation is in the taskcluster-docs repository.

> - use grafana (eg. https://github.com/taskcluster/heroku-grafana) to monitor services
+1

Other things to be concerned with:

 - MySQL database backends - these may not be the best choice in Heroku; perhaps use Postgres or Azure tables instead?

 - Celery - archiver, slaveloan, and tooltool make heavy use of Celery to do processing outside the HTTP request.  Ideally, for both MySQL and Celery, we could share some access rather than paying for an add-on per app.

 - Authentication - the current set up uses Apache to do LDAP authentication, meaning RelengAPI and Python code never see the authentication string and thereby reducing the likelihood of a vulnerability exposing LDAP passwords.  Outside of the Mozilla network, access to the LDAP database requires a client SSL certificate, which is not difficult to do in Python but does not work with Apache.  We can use Okta (SAML), but then we need an alternative for non-employees, and we should also allow Mozillians logins (for non-employees without an LDAP account).  It turns out all of this is a lot of work!  I think the least difficult path here would be to use Hawk and piggyback on TaskCluster's authentication system.
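
For readers unfamiliar with Hawk, the sketch below shows the core idea: client and service share a key, and each request carries an HMAC-SHA256 over a normalized description of the request, so no password crosses the wire. This is a simplified illustration and not TaskCluster's implementation: it leaves out timestamp skew checks, nonce replay protection and the optional `app`/`dlg` fields, and the function names and sample values are hypothetical. A real service should use a maintained Hawk library.

```python
import base64
import hashlib
import hmac


def hawk_mac(key, ts, nonce, method, resource, host, port, payload_hash="", ext=""):
    """Compute a Hawk-style request MAC (HMAC-SHA256 over a normalized string)."""
    normalized = "\n".join([
        "hawk.1.header", str(ts), nonce, method.upper(), resource,
        host.lower(), str(port), payload_hash, ext,
    ]) + "\n"
    digest = hmac.new(key.encode("utf-8"), normalized.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(key, received_mac, **request_fields):
    """Service side: recompute the MAC and compare it in constant time."""
    expected = hawk_mac(key, **request_fields)
    return hmac.compare_digest(expected, received_mac)


# Placeholder request values, for illustration only.
request = dict(ts=1353832234, nonce="j4h3g2", method="GET",
               resource="/trees", host="example.com", port=443)
client_mac = hawk_mac("shared-secret", **request)
```

The service only needs the shared key and the request fields to check `client_mac`, which is what makes piggybacking on an existing credential system attractive compared to handling LDAP passwords directly.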

There are a few other AWS services in use -- ElastiCache is one of them -- but those should continue to work without any change.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> There are a lot of sentence prefixes there.. early draft?
> 

yes an early draft, nothing final. i'm not aware of all the details and i'd like to collect them (in written form) before i start working on any of the tasks.


> Also, you omitted the mapper and archiver blueprints and a few others.  The
> full list is here:
>   https://github.com/mozilla/build-relengapi/tree/master/relengapi/blueprints
> 

will add mapper and archiver to the list.


> > - create separated python packages for each service (listed above)
> 
> +1 -- with one or more shared library packages to support those
> 

+1

> >    - we could use their - still in closed beta - docker support
> 
> I don't see why we'd need Docker -- RelengAPI is intentionally very simple
> to deploy: just some Python packages and gunicorn.
> 

i will already be setting up my development environment with nix. from there it is a single command to generate a docker image with the exact binaries that we use during development.
for me it only makes sense to eliminate any potential surprise. also this would allow us to use some other tools which we would otherwise reject for being "hard to deploy".
 

> > - move frontend code (js/css) into separate package (not python package)
> > - maybe this would be a nice opportunity to also do a rewrite into react+redux as taskcluster team is planning. but this
> > should be done as a separate task (if we agree on this ofcourse)
> 
> +1, tools.mozilla-releng.net?
> 

what if we just use the top domain? or is that planned for something else.


> > - deploy documentation also as a separate "service"
> 
> +1, but you leave unspecified how the documentation will get from the
> individual services to this one -- will it query a documentation blob "live"
> from the other services?  TaskCluster uploads API documentation to an S3
> bucket which the docs site then queries, although much of the prose
> documentation is in the taskcluster-docs repository.
> 

we don't have the problem of documentation being spread over 10+ repositories. i would keep documentation in one folder as it is now and make sure import

even taskcluster documentation could have a better build process: a build step can clone/download all the repos into one location and build the docs there. with this solved at build time you don't have to rely on anything other than a web server that serves your html.
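
a rough sketch of that build-time step, assuming the repositories are already cloned or downloaded and each keeps its docs in a `docs` folder (the function names, the folder name and the service names are assumptions made for the example):

```python
import shutil
import tempfile
from pathlib import Path


def collect_docs(checkouts, build_root, docs_dirname="docs"):
    """Copy each checkout's docs folder into build_root/<service name>.

    `checkouts` maps a service name to the path of an already cloned or
    downloaded repository. Returns the sorted names that had docs.
    """
    build_root = Path(build_root)
    build_root.mkdir(parents=True, exist_ok=True)
    collected = []
    for name, checkout in sorted(checkouts.items()):
        source = Path(checkout) / docs_dirname
        if not source.is_dir():
            continue  # this repository ships no docs; skip it
        target = build_root / name
        if target.exists():
            shutil.rmtree(target)  # rebuild from scratch each time
        shutil.copytree(source, target)
        collected.append(name)
    return collected


def demo():
    """Build a throwaway tree with two fake checkouts and collect their docs."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "clobberer" / "docs").mkdir(parents=True)
        (tmp / "clobberer" / "docs" / "index.rst").write_text("Clobberer\n")
        (tmp / "mapper").mkdir()  # no docs folder here
        names = collect_docs(
            {"clobberer": tmp / "clobberer", "mapper": tmp / "mapper"},
            tmp / "build",
        )
        return names, (tmp / "build" / "clobberer" / "index.rst").read_text()
```

the docs tool would then run once over the combined folder, and the output is static html that any web server can serve.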




> Other things to be concerned with:
> 
>  - MySQL database backends - these may not be the best choice in Heroku;
> perhaps use Postgres or Azure tables instead?
>

+1 on postgresql
 
>  - Celery - archiver, slaveloan, and tooltool make heavy use of Celery to do
> processing outside the HTTP request.  Ideally, for both MySQL and Celery, we
> could share some access rather than paying for an add-on per app.
>

+1 

>  - Authentication - the current set up uses Apache to do LDAP
> authentication, meaning RelengAPI and Python code never see the
> authentication string and thereby reducing the likelihood of a vulnerability
> exposing LDAP passwords.  Outside of the Mozilla network, access to the LDAP
> database requires a client SSL certificate, which is not difficult to do in
> Python but does not work with Apache.  We can use Okta (SAML), but then we
> need an alternative for non-employees, and we should also allow Mozillians
> logins (for non-employees without an LDAP account).  It turns out all of
> this is a lot of work!  I think the least difficult path here would be to
> use Hawk and piggyback on TaskCluster's authentication system.
> 

+1 on "the least difficult path here would be to use Hawk and piggyback on TaskCluster's authentication system."


> There are a few other AWS services in use -- ElastiCache is one of them --
> but those should continue to work without any change.

is there a list of those services somewhere?
Flags: needinfo?(dustin)
ah i see now, i didn't finish a few sentences, sorry. thoughts were going faster than my hands. will create a summary in the ``User Story`` box.
User Story: (updated)
I'll wait until I see the docker build process, but if it is as easy as you suggest, ok :)

> what if we just use the top domain? or is that planned for something else.

I like that the name "tools" highlights that this is just an interface to something larger.  But I don't really care.  We could 302 from one to the other.

As for the AWS services, I think some grepping of the source for uses of the AWS SDK may be the best approach.
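
Something like the following could do that grepping. It is a sketch that assumes the AWS SDK appears in the source as `boto`, `boto3` or `botocore` imports at the start of a line; the function names and the sample files in `demo` are hypothetical, and other ways of reaching AWS (shelling out, config files) would not be found.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the AWS SDK shows up as boto, boto3 or botocore imports.
AWS_IMPORT = re.compile(r"^\s*(?:import|from)\s+(boto3?|botocore)\b")


def find_aws_imports(root):
    """Return sorted (relative path, line number, module) for AWS SDK imports."""
    root = Path(root)
    hits = []
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = AWS_IMPORT.match(line)
            if match:
                hits.append((path.relative_to(root).as_posix(), lineno, match.group(1)))
    return sorted(hits)


def demo():
    """Scan a throwaway source tree containing one AWS import."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "pkg").mkdir()
        (tmp / "pkg" / "store.py").write_text("import os\nimport boto3\n")
        (tmp / "pkg" / "web.py").write_text("import bottle\n")
        return find_aws_imports(tmp)
```

Running `find_aws_imports` over a checkout would give a starting list of modules to read through by hand to see which AWS services each one actually uses.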
Flags: needinfo?(dustin)
Assignee: nobody → rok
Status: NEW → ASSIGNED
Depends on: 1284474
Depends on: 1284475
Depends on: 1284476
Depends on: 1284477
Depends on: 1284478
Depends on: 1284479
Depends on: 1284480
User Story: (updated)
Depends on: 1351073
Component: Tools → General
this is an old ticket. all relevant relengapi blueprints are migrated
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED