Closed Bug 997231 Opened 7 years ago Closed 7 years ago

Need alert at 80% of max DB connections for Firefox Accounts/Sync

Categories

(Cloud Services :: Operations: Metrics/Monitoring, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: kthiessen, Assigned: mostlygeek)

Details

(Whiteboard: [qa+])

Per :jrgm, exceeding the number of DB connections available to FxA/Sync is catastrophic, so we'd like to be alerted when we're getting close to the red zone.

CC'ing jrgm, Edwin, and Gene, who are likely to have any details needed to make this happen.
QA Contact: kthiessen
Whiteboard: [qa+]
It's been 60 days.  Any word on this?
Escalating this to the ops people who are likely to be most in the line of fire: :bobm, :whd.  If there are other folks who would know about this, please bring them in via cc or needinfo.

Bob, Wes, what do we have in terms of monitoring?
Flags: needinfo?(whd)
Flags: needinfo?(bobm)
(In reply to Karl Thiessen [:kthiessen] from comment #2)
> Escalating this to the ops people who are likely to be most in the line of
> fire: :bobm, :whd.  If there are other folks who would know about this,
> please bring them in via cc or needinfo.

Is this a monitoring request for max connections to the RDS instance for FxA?  The max connections to the Token DB? Or, the individual Sync servers?

> Bob, Wes, what do we have in terms of monitoring?

On individual Sync servers, the max connections to the DBs should not be hit. Instead, requests get queued/throttled at the application layer, and in the worst case this results in an error that the client should handle gracefully. We can detect when this happens and deal with it appropriately from a capacity perspective.
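For readers unfamiliar with the pattern, here is a minimal sketch of application-layer throttling as described above. It is not the Sync server's actual code: `ConnectionThrottle` and `ThrottleError` are hypothetical names, and the sketch assumes a simple semaphore cap on concurrent DB checkouts. Requests beyond the cap wait (are queued) up to a timeout and then fail with an error the client is expected to back off from and retry.

```python
import threading


class ThrottleError(Exception):
    """Hypothetical error: no DB slot freed up in time; the client should back off and retry."""


class ConnectionThrottle:
    """Hypothetical app-layer cap on concurrent DB connections.

    Requests beyond `max_active` wait for up to `timeout` seconds for a free
    slot instead of opening a new DB connection past the server's limit.
    """

    def __init__(self, max_active: int, timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(max_active)
        self._timeout = timeout

    def __enter__(self) -> "ConnectionThrottle":
        # Block until a slot is free or the timeout expires.
        if not self._slots.acquire(timeout=self._timeout):
            raise ThrottleError("DB connection limit reached; retry later")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Give the slot back so a queued request can proceed.
        self._slots.release()
        return False  # do not swallow exceptions raised in the body


if __name__ == "__main__":
    throttle = ConnectionThrottle(max_active=1, timeout=0.1)
    with throttle:
        # A second request while the only slot is held is rejected after 0.1 s.
        try:
            with throttle:
                pass
        except ThrottleError as err:
            print("rejected:", err)
```

The key property is that the DB itself never sees more than `max_active` connections from this process; overload shows up as a client-visible, retryable error rather than as the database refusing connections.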
Flags: needinfo?(whd)
Flags: needinfo?(kthiessen)
Flags: needinfo?(jrgm)
Flags: needinfo?(bobm)
(In reply to Bob Micheletto [:bobm] from comment #3)
> (In reply to Karl Thiessen [:kthiessen] from comment #2)
> > Escalating this to the ops people who are likely to be most in the line of
> > fire: :bobm, :whd.  If there are other folks who would know about this,
> > please bring them in via cc or needinfo.
> 
> Is this a monitoring request for max connections to the RDS instance for
> FxA?  The max connections to the Token DB? Or, the individual Sync servers?

I'm most concerned about the first two, since the Sync servers have some application-layer backoff, as you say.  :jrgm will likely have more details.
Flags: needinfo?(kthiessen)
Assignee: nobody → bwong
Request for a max-connections monitor for the Token and FxA RDS instances.
Added to Stackdriver.

The max DB connections limit is 1232, obtained with the SQL: show variables like '%conn%'
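The query above returns (Variable_name, Value) rows, and the '%conn%' pattern matches several variables besides max_connections (for example max_user_connections and connect_timeout). A small helper can pick out the limit, assuming the rows have already been fetched with whatever MySQL client is in use. The helper name is hypothetical, and the sample rows are illustrative; only the 1232 value comes from this bug.

```python
from typing import Iterable, Tuple, Union


def max_connections(rows: Iterable[Tuple[str, Union[str, int]]]) -> int:
    """Return max_connections from SHOW VARIABLES result rows.

    Each row is (Variable_name, Value). Depending on the driver, Value may
    come back as a string; int() handles either a str or an int.
    """
    for name, value in rows:
        if name == "max_connections":
            return int(value)
    raise KeyError("max_connections not present in the result set")


# Illustrative result of: show variables like '%conn%'
# (only the 1232 figure is taken from this bug; the other values are made up)
sample_rows = [
    ("character_set_connection", "utf8"),
    ("collation_connection", "utf8_general_ci"),
    ("connect_timeout", "10"),
    ("max_connections", "1232"),
    ("max_user_connections", "0"),
]

if __name__ == "__main__":
    print(max_connections(sample_rows))
```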

I set a Stackdriver alert for when connections go above 900. With 6 servers we're currently at ~250 connections, so we have room to scale up to the maximum number of servers in the Autoscale group (21).
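The headroom reasoning above can be checked with a quick back-of-the-envelope calculation. This sketch assumes connections scale linearly with server count (about 250 / 6 ≈ 42 per server), which is only an approximation. It also shows the 80% target from the bug title next to the 900 threshold that was actually chosen. The function name is hypothetical.

```python
def projected_connections(current: float, servers_now: int, servers_max: int) -> float:
    """Scale observed connections linearly from servers_now to servers_max servers."""
    per_server = current / servers_now
    return per_server * servers_max


MAX_CONNECTIONS = 1232   # DB limit reported in this bug
ALERT_THRESHOLD = 900    # Stackdriver alert level set in this bug
TARGET_FRACTION = 0.80   # "80% of max" from the bug title

projected = projected_connections(current=250, servers_now=6, servers_max=21)

if __name__ == "__main__":
    print(f"projected at 21 servers: {projected:.0f}")                              # 875
    print(f"alert threshold as % of max: {ALERT_THRESHOLD / MAX_CONNECTIONS:.0%}")  # 73%
    print(f"80% of max: {TARGET_FRACTION * MAX_CONNECTIONS:.0f}")                    # 986
    print("fits under alert:", projected < ALERT_THRESHOLD)                         # True
```

Under this linear assumption, a full 21-server group (~875 connections) would sit just under the 900 alert level, which in turn is about 73% of the 1232 limit, somewhat more conservative than the 80% (~986) asked for in the title.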
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Very cool.  Thank you, Ben.
I'll check in with :jrgm next week in Mt View to make sure this covers everything he needed/wanted, after which I will mark this bug VERIFIED.
Seems good to me. Thanks.
Flags: needinfo?(jrgm)
Marking VERIFIED per comment 8.
Status: RESOLVED → VERIFIED