Closed Bug 1294595 Opened 8 years ago Closed 8 years ago

Increase Nginx send_timeout setting for Sync.

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bobm, Assigned: bobm)

References

(Blocks 1 open bug)

Details

An unknown number of Sync clients are failing to sync because they are running into the default Nginx send_timeout limit of 60 seconds when reading large collections.  While this problem should also be addressed on the client side, by handling large reads in a different manner, adjusting the send_timeout setting will provide a more immediate fix.
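For context, the nginx documentation describes send_timeout as a timeout set only between two successive write operations to the client, not for the transmission of the whole response. A slow client reading a large collection therefore fails only when nginx cannot push any further data to it for longer than the limit. The sketch below is a simplified model of that rule, not nginx code. The function name and the input (a list of hypothetical timestamps at which nginx completed a write to the client) are illustrative assumptions.

```python
from typing import Sequence


def would_time_out(write_times: Sequence[float], send_timeout: float = 60.0) -> bool:
    """Simplified model of nginx send_timeout (assumption, not nginx source).

    write_times: hypothetical timestamps in seconds, in order, at which nginx
    managed to write another chunk of the response to the client socket.
    The timer covers only the gap between consecutive writes, so a long
    transfer is fine as long as no single stall exceeds send_timeout.
    """
    return any(
        later - earlier > send_timeout
        for earlier, later in zip(write_times, write_times[1:])
    )
```

Under this model, a ten-minute transfer that makes progress every 30 seconds never times out. A single 90-second stall does time out with the 60-second default but would not with a larger send_timeout.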

I propose we do the following:
1. Increase the error log level in production from error to info.  This will allow us to count the user population in this state and measure how successful changes to send_timeout are.
2. Increase send_timeout gradually by some percentage to reduce the number of users stuck in this state.  Determining the actual upper bound will be difficult, because we can't know how long a client needs until they succeed.  Some investigation of collection size distributions may be in order.
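The two steps above could be sketched roughly as follows: count distinct affected users from the info-level error log, then step send_timeout up by a fixed percentage until a chosen cap. This is only a sketch and rests on several assumptions. The exact wording of nginx's client-timeout message is matched loosely with a hypothetical pattern. The request path is assumed to follow the Sync 1.5 layout `/1.5/<uid>/storage/<collection>`. The function names, the step percentage, and the cap are illustrative. None of these are taken from the production setup.

```python
import re
from typing import Iterable, List, Set

# Hypothetical patterns: verify both against real production error logs.
TIMEOUT_RE = re.compile(r"timed out .*while sending")       # assumed message shape
UID_RE = re.compile(r'request: "GET /1\.5/(\d+)/storage/')  # assumed Sync 1.5 path


def affected_users(log_lines: Iterable[str]) -> Set[str]:
    """Distinct user ids whose storage GETs hit a send timeout."""
    users: Set[str] = set()
    for line in log_lines:
        if TIMEOUT_RE.search(line):
            match = UID_RE.search(line)
            if match:
                users.add(match.group(1))
    return users


def timeout_schedule(start: int, step_pct: float, cap: int) -> List[int]:
    """Raise send_timeout (seconds) by step_pct percent per round, up to cap."""
    schedule = [start]
    while schedule[-1] < cap:
        nxt = int(schedule[-1] * (1 + step_pct / 100))
        # Guarantee progress even when integer truncation swallows a tiny step.
        nxt = max(nxt, schedule[-1] + 1)
        schedule.append(min(nxt, cap))
    return schedule
```

Counting distinct user ids rather than raw log lines avoids overcounting clients that retry the same failing request many times. Between rounds of the schedule, the `affected_users` count serves as the measure of each change's impact.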
(In reply to Bob Micheletto [:bobm] from comment #0)
Enabled on three servers as a logging capacity test: servers 187, 291, and 363.  Thus far logging looks to have increased by at least an order of magnitude.
(In reply to Bob Micheletto [:bobm] from comment #1)
> (In reply to Bob Micheletto [:bobm] from comment #0)
A baseline estimate of the user population hitting this issue has been established.  I propose we change the timeout to five minutes, observe the impact, and make further modifications as necessary.  

:kthiessen :markh thoughts?
Flags: needinfo?(markh)
Flags: needinfo?(kthiessen)
(In reply to Bob Micheletto [:bobm] from comment #2)
> (In reply to Bob Micheletto [:bobm] from comment #1)
> > (In reply to Bob Micheletto [:bobm] from comment #0)
> A baseline estimate of the user population hitting this issue has been
> established.  I propose we change the timeout to five minutes, observe the
> impact, and make further modifications as necessary.  
> 
> :kthiessen :markh thoughts?

SGTM - I believe all downsides of this will be on the server rather than the client, so as far as I'm concerned you can make it as large as the backend can handle :)
Flags: needinfo?(markh)
Yes, Bob's suggestion seems quite reasonable to me.
Flags: needinfo?(kthiessen)
send_timeout has been increased to 600 seconds in production.
Estimated user population running into this issue has dropped by ~75%.  Closing this ticket.  We can revisit if necessary.
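The ~75% figure is a relative reduction against the baseline estimate established earlier in the thread. Note also that 600 seconds is ten minutes, double the five-minute value originally proposed. A minimal helper for that arithmetic follows. The function name is hypothetical, and the example numbers in the note below are illustrative only, not the actual counts.

```python
def percent_drop(baseline: int, after: int) -> float:
    """Relative reduction, in percent, of the affected-user estimate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - after) / baseline
```

For example, a hypothetical drop from 400 affected users to 100 gives `percent_drop(400, 100) == 75.0`.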
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Verified.
Status: RESOLVED → VERIFIED