Closed Bug 1125464 Opened 9 years ago Closed 9 years ago

Treeherder is only using the DB master and not the read-only slave

Categories

(Tree Management :: Treeherder, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: emorley)

References

Details

(Whiteboard: [data:consultative])

Attachments

(3 files)

It looks like the full-text search is using the th_admin, rather than the read-only slave with th_user.  This would alleviate the traffic on the master db.
Component: Treeherder → Treeherder: Data Ingestion
Priority: -- → P1
Looking at the general logs, the only thing connecting to treeherder2 is monitoring from Nagios. treeherder2 is the slave, and the only machine answering treeherder-ro-vip.metrics.scl3.mozilla.com

So something's not working right.
refreshing the endpoint put some queries on the slave, so it's working for some queries!
So here's the funny thing.

Whilst settings/local.py on prod, has:
* TREEHERDER_RO_DATABASE_HOST set to treeherder-ro-vip.metrics.scl3.mozilla.com
* TREEHERDER_DATABASE_HOST set to treeherder-rw-vip.metrics.scl3.mozilla.com
(and the corresponding admin vs user credentials etc)

We don't actually use TREEHERDER_RO_DATABASE_HOST anywhere.

We should be using it here:
https://github.com/mozilla/treeherder-service/blob/96d5d4255457c2ce7b052432aa591b2e419cad93/treeherder/model/management/commands/init_datasources.py#L27

As a result we've ended up with:

SELECT project, host, read_only_host FROM treeherder.datasource limit 5;

project	        read_only_host
accessibility	treeherder-rw-vip.metrics.scl3.mozilla.com
accessibility	treeherder-rw-vip.metrics.scl3.mozilla.com
addon-sdk	treeherder-rw-vip.metrics.scl3.mozilla.com
addon-sdk	treeherder-rw-vip.metrics.scl3.mozilla.com
alder	        treeherder-rw-vip.metrics.scl3.mozilla.com

We should update init_datasources.py so new additions get the correct host, plus update the existing rows in treeherder.datasource on stage and prod.
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Summary: bugscache should use the readonly slave not master → Use the value of TREEHERDER_RO_DATABASE_HOST for read_only_host rather than the master
Attachment #8554113 - Flags: review?(mdoglio)
I'm not so sure we are using the correct credentials for the read-only operations in fact.

Yes, we're definitely connecting to the master for all operations when we shouldn't, but the read-only user (th_user) should still be being used for the selects, since it's set directly at the location linked in comment 6.

However when I connect to treeherder-rw-vip.metrics.scl3.mozilla.com and list the client connections, the only th_user connections I see are from my own session, and everything else is being performed using th_admin, even the selects:

-- Connection Id: 1843936
-- User: th_admin
-- Host: <removed>
-- DB: treeherder
-- Command: Query
-- Time: 0
-- State: Sending data
SELECT `job_type`.`id`, `job_type`.`job_group_id`, `job_type`.`symbol`, `job_type`.`name`, `job_type`.`description`, `job_type`.`active_status` FROM `job_type`

That said, so long as the th_admin credentials can be used against both the slave and the master, this possible issue needn't stop the pull request in this bug landing - we can fix separately.
Ok so here's another funny thing, well two in fact.

We've carefully specified 'master_host' vs 'read_host' for all of our stored SQL, eg:

        "get_performance_series": {

            "sql":"SELECT `interval_seconds`, `series_signature`, `type`, `last_updated`, `blob`
                   FROM `performance_series`
                   WHERE `interval_seconds` = ? AND `series_signature` = ?",

            "host":"read_host"

        },

Except the property 'host' does not actually do anything. We instead needed to have used 'host_type'.

As such, we just fall back to the Datasource default, which is 'master_host':
https://github.com/mozilla/treeherder-service/blob/master/vendor/datasource/bases/RDBSHub.py#L105
(We should make the Datasource library reject unknown keys or else remove the default host, to make this kind of thing easier to spot).

In addition, for the reference data we only ever set 'master_host' and not 'read_host':
https://github.com/mozilla/treeherder-service/blob/6bf711fb4a50e1bd88751b8a1d87b2ae6f789c16/treeherder/model/derived/refdata.py#L29

Updated PR coming up.
Component: Treeherder: Data Ingestion → Treeherder
Summary: Use the value of TREEHERDER_RO_DATABASE_HOST for read_only_host rather than the master → Treeherder is only using the DB master and not the read-only slave
Blocks: 1125410
Comment on attachment 8554113 [details] [review]
Actually use the read-only DB host

The PR has been updated, I'd recommend looking at the individual commits diffs and commit messages for easier reviewing than the whole PR diff.

It fixes:
1) The read-only database params in base.py (and the sample config) passed the default fallback values as strings to os.environ.get() rather than the variables themselves. base.py was also missing the read-only host param, which would cause later commits to fail the tests.
2) init_datasource was using the master hostname when populating the Datasource table read_only_host field.
3) The reference data Datasource was missing the 'read_host' params.
4) The jobs/objectstore create() & related tests did not set read_only_host.
5) We used the wrong key name ('host' instead of 'host_type') in the SQL proc json files, causing Datasource to use the default of 'master_host'.

We'll also need to update the existing values in the read_only_host field for the datasource table on prod/stage, since the fix for # 2 only applies to new additions.

In the future I also think we should explore using:
https://docs.djangoproject.com/en/1.7/topics/db/multi-db/#automatic-database-routing

...for the Django parts that presumably at the moment use the default (which is master):
https://github.com/mozilla/treeherder-service/blob/b1881b26a1f2802a80d88a4263d506a7547e8bd1/treeherder/settings/base.py#L329

Ideally we'd set default to {} as per:
https://docs.djangoproject.com/en/1.7/topics/db/multi-db/#defining-your-databases

...plus forbid the default in Datasource - that way we'll catch cases where we've forgotten to set a host type. 

We should also vet the existing master_host vs read_host uses & check there are no more we can switch to the RO host.
Attachment #8554113 - Attachment description: Use the correct config param for datasource read_only_host → Actually use the read-only DB host
I've filed an upstream issue against Datasource for making typos like #5 in comment 9 harder to miss - since whilst there were many other things broken, we'd have known to actually look once we'd fixed the typos, since the tests then correctly started failing.

https://github.com/jeads/datasource/issues/20
Attached file sql to fix prod
Sheeri, does this look ok to run against prod? (WHERE clause because it looks like safe update mode is on).
The field is currently unused, until the PR lands.

We'll also need an equiv for stage, but using the value treeherder-stage-ro-vip.db.scl3.mozilla.com
Attachment #8554217 - Flags: review?(scabral)
Depends on: 1125564
Depends on: 1125569
Depends on: 1125585
Depends on: 1125586
No longer depends on: 1125585
(In reply to Ed Morley [:edmorley] from comment #10)
> I've filed an upstream issue against Datasource for making typos like #5 in
> comment 9 harder to miss - since whilst there were many other things broken,
> we'd have known to actually look once we'd fixed the typos, since the tests
> then correctly started failing.
> 
> https://github.com/jeads/datasource/issues/20

I don't think datasource is maintained anymore.
I forked it on my github account when jeads left, but I haven't updated its reference in requirements/pure.txt yet. We can use my fork if we want to change it.
Attachment #8554113 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/fbecbc36dbcf72526a79875c835fad578c1a3aa4
Bug 1125464 - Clean up read-only database config params

Passes the fallback default values correctly, rather than as a string.
Also adds the read-only host to base.py, otherwise later commits would
fail tests when run locally or on Travis.

https://github.com/mozilla/treeherder-service/commit/9f025ae4423f38df8df1d77a589c8ad1198659a3
Bug 1125464 - Populate the Datasource read_only_host field correctly

On initial repo setup, we were previously setting the read_only_host
field in the Datasource table to the host value for the master, not the
read-only slave. Oops.

https://github.com/mozilla/treeherder-service/commit/4ea041690ca80f30e64f47752a4f41f935a93446
Bug 1125464 - Add read-only DB config to the reference data Datasource

Later commits will fix the typos in model/sql/reference.json, resulting
in Datasource expecting us to provide a 'read_host' key for queries that
can be performed on the read-only DB. However prior to this change, the
call to Datasource for reference data didn't actually provide this.

https://github.com/mozilla/treeherder-service/commit/280c687159a3d576c77e731f968446627e70d48c
Bug 1125464 - Always set read_only_host when creating job/OS Datasources

read_only_host was previously only ever set when using the
init_datasources script, and not via any other means.

https://github.com/mozilla/treeherder-service/commit/b04f542b1253c5dcf4062249f7a5ccbe29de75ca
Bug 1125464 - Remove now redundant check for read_only_host

read_only_host is now always defined (to the same value as host, if no
other was configured), so remove the redundant conditional.

https://github.com/mozilla/treeherder-service/commit/c96f3af90494aa35d247c485e84294d8bd0b2073
Bug 1125464 - Fix key name for specifying master vs read-only DB host

The 'host' property is unrecognised (and ignored) by Datasource, causing
it to silently fall back to the default of 'master_host'. The property
we needed to have set is in fact called 'host_type'.

As a result, we never use the read-only host!

An issue has been filed against Datasource for making this less silent:
https://github.com/jeads/datasource/issues/20

https://github.com/mozilla/treeherder-service/commit/c8b1c70f30a5a43d464714d1634b823bd049fdbd
Bug 1125464 - Set host_type for test_bugscache correctly
Attached file sql to fix stage
Equivalent for stage (different table name, different read_only_host fqdn).
Depends on: 1125856
I'd suggest trying it out in stage/dev first. Those infrastructures have RO VIPs too.
Hah! Just read the next comment, you did! It looks good to me, yes.
Whiteboard: [data:consultative]
Depends on: 1126226
Comment on attachment 8554217 [details]
sql to fix prod

The stage attachment worked fine, so this has now been run against prod.

We're just waiting for bug 1126226, then we can continue testing the master vs read tweaks in bug 1125856 before deploying the changes here to prod.
Attachment #8554217 - Flags: review?(scabral)
Comment on attachment 8554217 [details]
sql to fix prod

looks good to me.
Attachment #8554217 - Flags: review+
Comment on attachment 8554517 [details]
sql to fix stage

looks good to me.
Attachment #8554517 - Flags: review+
Just needs deploying on prod now (on stage already).
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 1127940
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: