Adjust TTL settings on snippets service



Infrastructure & Operations Graveyard
WebOps: Engagement
5 years ago
2 years ago


(Reporter: hoosteeno, Assigned: phrawzty)



In bug 895558 comment 2, :jakem suggests we tune the TTL settings of the snippets service to ensure we're getting the most possible performance out of our hardware. Specifically...

"Can we do 3/4/5/10 minutes, instead of 1.5 min? That'll probably help the hit rate a few %... going from 90% to 95% cuts the volume of traffic that goes to the backend servers in half, so a small change here can help a lot."
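
To make the arithmetic in that quote concrete, here is a minimal sketch. The function name is hypothetical, and it assumes every cache miss results in exactly one backend request:

```python
def backend_fraction(hit_rate: float) -> float:
    """Fraction of requests that miss the cache and reach the backend."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate


# Hit rates taken from the quote above: 90% before, 95% after.
before = backend_fraction(0.90)
after = backend_fraction(0.95)
ratio = after / before

print(f"backend load falls to {ratio:.0%} of its previous level")
```

The miss rate drops from 10% to 5%, which is why a five-point gain in hit rate halves the backend load.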

In bug 895558 comment 3, :mkelly explains the setting that would need adjustment to experiment with TTL. Specifically...

"The max-age for the requests is controlled by a setting called SNIPPET_HTTP_MAX_AGE, which defaults to 90 (in seconds). I recommend controlling it via that setting in rather than updating the app code."

It is not clear where an improvement in the hit rate can be directly observed during this experiment. If you know, please add a comment.


5 years ago
Component: Server Operations: Web Operations → WebOps: Engagement
Product: → Infrastructure & Operations

Comment 1

5 years ago

If you would like me to make some adjustments to SNIPPET_HTTP_MAX_AGE, just let me know, and I will.

There are also cache settings in the load balancer which need to be considered - in particular, the "webcache!time" variable, currently set to 21600 seconds (6 hours).  From the documentation :

« The traffic manager caches regular HTTP responses for the period of time set in webcache!time. The resource is loaded into the cache the first time that a remote client requests it; then subsequent requests for the same resource are served directly from the cache until the time expires.  [...]  If the resource is changed on the back-end server, it may take up to webcache!time seconds before the traffic manager notices and the cached copy is updated. »

For reference, at this time, the load balancers are maintaining a cache hit rate anywhere from 70 to 95 % :

Manager   Cache Used        Entries         Hits         Lookups      Rate
pp-zlb08  72.3 M (0.4 %)    15480 (0.3 %)   9710206502   10236668268  95%
pp-zlb09  1842.9 M (9.5 %)  276488 (6.0 %)  15539176781  20097145350  77%
pp-zlb10  272.4 M (1.4 %)   22344 (0.5 %)   17160645574  20486365211  84%
pp-zlb11  441.9 M (2.3 %)   75834 (1.6 %)   11839352221  12834549258  92%
pp-zlb12  172.6 M (0.9 %)   2639 (0.1 %)    4085183511   5862452142   70%
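
The Rate column appears to be hits divided by lookups, rounded to a whole percent. A short sketch that recomputes it from the figures in the table (the variable and function names are illustrative, not part of any load balancer API):

```python
# (hits, lookups) per traffic manager, copied from the table above.
stats = {
    "pp-zlb08": (9710206502, 10236668268),
    "pp-zlb09": (15539176781, 20097145350),
    "pp-zlb10": (17160645574, 20486365211),
    "pp-zlb11": (11839352221, 12834549258),
    "pp-zlb12": (4085183511, 5862452142),
}


def hit_rate_percent(hits: int, lookups: int) -> float:
    """Cache hit rate as a percentage of all lookups."""
    return 100.0 * hits / lookups


rates = {name: round(hit_rate_percent(h, l)) for name, (h, l) in stats.items()}

# Hit rate across all five managers combined.
total_hits = sum(h for h, _ in stats.values())
total_lookups = sum(l for _, l in stats.values())
overall = hit_rate_percent(total_hits, total_lookups)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"overall: {overall:.1f}%")
```
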
Comment 2

Let's try setting it to 3 minutes (180 seconds).

If that has a significant effect on hit rates, then we can go a bit higher and see if it helps more.

Comment 3

5 years ago
To clarify comment #1, the caching times in the load balancer can be overridden by headers set by the web applications.  From the documentation :

« The Cache-Control header in an HTTP response can indicate that an HTTP response should never be placed in the web cache. The header can also use the max-age value to specify how long the cached object can be cached for. This may cause a response to be cached for less than the configured webcache!time parameter. »
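
Read literally, that passage suggests the effective lifetime of a cached response is the smaller of webcache!time and the application's max-age. A rough sketch of that rule follows; the function and its parameters are hypothetical, and the traffic manager's real behaviour may differ in details the documentation excerpt does not cover:

```python
from typing import Optional


def effective_cache_seconds(
    webcache_time: int,
    max_age: Optional[int],
    cacheable: bool = True,
) -> int:
    """Assumed rule: cache for min(webcache!time, max-age), or not at all."""
    if not cacheable:
        # Cache-Control says the response must never be cached.
        return 0
    if max_age is None:
        # No max-age from the application: the load balancer default applies.
        return webcache_time
    return min(webcache_time, max_age)


# webcache!time is 21600 s; the snippets max-age was 90 s and is proposed as 180 s.
print(effective_cache_seconds(21600, 90))
print(effective_cache_seconds(21600, 180))
```

Under this assumption the 6 hour load balancer setting never takes effect for snippets, because the application's max-age is always the shorter of the two.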

Comment 4

5 years ago
(In reply to Michael Kelly [:mkelly] from comment #2)

This adjustment has been made on the admin machine.  Please feel free to trigger a deployment at your leisure.

Comment 5

Prod has been pushed and the max-age looks correct; it's now 180.
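
For anyone wanting to repeat that check, one way is to read the max-age directive out of the Cache-Control response header. A small sketch, assuming the header value has already been fetched; the sample header string below is made up for illustration and is not copied from the snippets service:

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip()
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


# Hypothetical header value, used only to exercise the parser.
sample_header = "public, max-age=180"
print(parse_max_age(sample_header))
```
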

How long until we should check the cache hit rates to see if it's had an effect?

Comment 6

5 years ago
(In reply to Michael Kelly [:mkelly] from comment #5)
> How long until we should check the cache hit rates to see if it's had an
> effect?

A few hours at least, but realistically, just leave it until tomorrow.


5 years ago
Assignee: server-ops-webops → dmaher
Priority: -- → P4

Comment 7

5 years ago
As it turns out, the statistics I listed in comment #1 were global cache stats for the LB cluster - the interface is not at all clear on this point (ugh).

I'm currently attempting to determine a way to extract statistics for just snippets.



5 years ago
See Also: → bug 905609

Comment 8

5 years ago
As predicted, we are now seeing a cache hit rate of ~ 94% (up from ~ 90%) on Snippets prod, which is *excellent*.

Great success !
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard