Open Bug 1814313 Opened 2 years ago Updated 2 years ago

Try sending as many SELECT queries to the replica database as possible

Tracking

(Not tracked)

Status:

NEW

People

(Reporter: wezhou, Unassigned)

Details

:wezhou

Reporter

Description

2 years ago

While investigating the 100% database CPU usage issue today, we found that most of the SELECT queries are sent to the primary database, including some time-consuming slow queries (see https://bugzilla.mozilla.org/show_bug.cgi?id=1814312). During the incident, the primary database reached 100% CPU while the replica used only 2%.

We should send as many READ queries as possible to the replicas so that WRITE queries on the primary can return quickly. Another benefit of sending READ queries to replicas is that replicas can be scaled horizontally, which helps improve the overall throughput of the system.
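
For illustration only, here is a minimal sketch of what read/write splitting could look like, assuming the application uses Django's multi-database routing. The class name `ReplicaRouter` and the database aliases `default` and `replica` are hypothetical and not taken from the actual configuration.

```python
import random


class ReplicaRouter:
    """Send reads to replicas and writes to the primary (aliases are assumed)."""

    primary_alias = "default"      # hypothetical alias of the primary database
    replica_aliases = ["replica"]  # hypothetical alias(es) of the read replicas

    def db_for_read(self, model, **hints):
        # Spread SELECT queries across the configured replicas.
        return random.choice(self.replica_aliases)

    def db_for_write(self, model, **hints):
        # INSERT/UPDATE/DELETE queries always go to the primary.
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Apply schema migrations only on the primary.
        return db == self.primary_alias
```

A router like this would be enabled through the `DATABASE_ROUTERS` setting. Because replicas can lag behind the primary, code that must read its own recent writes would probably still need to pin those queries to the primary, for example with `.using("default")`.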

:wezhou

Reporter

Comment 1

2 years ago

I just found out that the existing replica is meant for use by Redash only and not by the application itself (as hinted in https://github.com/mozilla-services/cloudops-infra/pull/3120).

If we were to go down this route (i.e. sending READ queries to read replicas), we would probably need to set up new replicas. However, it feels like this ticket can wait until bug #1814312 is resolved; if that solves the problem, this one becomes less urgent (considering we doubled the number of vCPUs of the primary today).

Categories

(Tree Management :: Treeherder, enhancement)

Security

(public)
