1562017 - Test out idea to use database replica for get requests

Reporter

Description

•

5 years ago

The concerns about this idea was a lag between insertions and reads (the UI needing to read something it wrote not long after).
According to ckolos we're talking about a lag of miliseconds.

This has the potential to isolate UI slowdowns due to MySql inserts slowing down MySql select operations (see bug 1553199 for details).

Some docs:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html

Some info from #django channel

r1cky [10:33 AM]
@Armen (armenzg) SUMO used to (probably still does) have the DB set up as one master (RW) and multiple slaves (RO). So all the GETs were handled by the slaves


Armen (armenzg) [10:36 AM]
@r1cky have you been involved with the work to switch to that model? or you know someone who has?
I'm interested to know what problems I could encounter when switching over

r1cky [10:37 AM]
I was but that was years ago... looks like they are using https://github.com/jbalogh/django-multidb-router now
jbalogh/django-multidb-router
Round-robin master/slave db router for Django.
Website
http://jbalogh.me/projects/multidb-router
Stars
263
jbalogh/django-multidb-router | Feb 9th, 2010 | Added by GitHub
https://github.com/mozilla/kitsune/blob/master/kitsune/settings.py#L72-L80
kitsune/settings.py:72-80
```if DATABASES['default']['ENGINE'] == 'django.db.backends.mysql':
    DATABASES['default']['CONN_MAX_AGE'] = DB_CONN_MAX_AGE
    DATABASES['default']['OPTIONS'] = {'init_command': 'SET storage_engine=InnoDB'}
    DATABASE_ROUTERS = ('multidb.PinningMasterSlaveRouter',)```

 Show more
mozilla/kitsune | Added by GitHub
The Pinning router is so that a client that just did a WRITE gets "pinned" to the master database for the next few READs/GETs. otherwise they will see stale data if there is a delay in master/slave replication (edited) 

pmac [10:40 AM]
I think we were using that when SUMO was in the data center, but not in AWS, but no reason it wouldn't work there

mat [11:18 AM]
AMO is using that as well ; similar strategy, primary db for writes and reads coming from a request that did some writes recently + some replicas for rest of the reads
unless something is very wrong the replica delay is always < 1 second. only potential issue is when you write something, then trigger a celery task, you need to have the task use the primary db through a decorator or something, even if it's just reading, to ensure the data is there
(on a similar note, not related to replicas but you need to make sure any transaction is committed before triggering a task if you're wrapping views in a transaction - we do that with https://github.com/mozilla/django-post-request-task/)

Sarah Clements [:sclements]

Comment 1

•

5 years ago

I think this is worth trying. I do think we'll want to try what mat suggested in the dialog you posted - to use the "primary db for writes and reads coming from a request that did some writes". A potential area of concern is with Perfherder - because of how the backend logic is structured for the alert data, an updated alert summary is requested if an alert status has changed and any delay in that would affect the UI. So I'd want to test the change thoroughly to see how it affects that part of the app.

Armen [:armenzg]

Reporter

Updated

•

5 years ago

Priority: -- → P3

Armen [:armenzg]

Reporter

Updated

•

5 years ago

Blocks: 1553199

Sarah Clements [:sclements]

Comment 2

•

5 years ago

This might actually be really helpful in alleviating some of the UI slowdown. Perhaps it should be a higher priority?

Armen [:armenzg]

Reporter

Comment 3

•

5 years ago

I was just trying to get it out of the queue of "untriaged" bugs.

It's important. We should try it soon.

Priority: P3 → P1

Armen [:armenzg]

Reporter

Comment 4

•

5 years ago

I discussed this problem with a DB expert and it was clear that we would just be moving the problem to another place.

The replications from the master DB + the reads from the slave DB would be the same amount of IOPS.

In any case, the database IOPS capping is solved.

Armen [:armenzg]

Reporter

Updated

•

5 years ago

No longer blocks: 1067846, 1504990

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX

Armen [:armenzg]

Reporter

Updated

•

5 years ago

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Armen [:armenzg]

Reporter

Updated

•

5 years ago

No longer blocks: 1553199

Cameron Dawson [:camd]

Updated

•

5 years ago

Assignee: nobody → cdawson

Cameron Dawson [:camd]

Comment 5

•

4 years ago

De-prioritizing Treeherder work in new reorg. So removing myself.

Assignee: cdawson → nobody

Bugzilla

Quick Search

Test out idea to use database replica for get requests

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

Tracking

(Not tracked)

People

(Reporter: armenzg, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Comment 4

Updated

Updated

Updated

Updated

Comment 5