In bug 895558 comment 2, :jakem suggests we tune the TTL settings of the snippets service to ensure we're getting the most possible performance out of our hardware. Specifically:

"Can we do 3/4/5/10 minutes, instead of 1.5 min? That'll probably help the hit rate a few %... going from 90% to 95% cuts the volume of traffic that goes to the backend servers in half, so a small change here can help a lot."

In bug 895558 comment 3, :mkelly explains the setting that would need adjustment to experiment with the TTL. Specifically:

"The max-age for the requests is controlled by a setting called SNIPPET_HTTP_MAX_AGE, which defaults to 90 (in seconds). I recommend controlling it via that setting in local.py rather than updating the app code."

It is not clear where an improvement in the hit rate can be directly observed during this experiment. If you know, please add a comment.
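For what it's worth, jakem's arithmetic checks out: backend volume scales with the cache *miss* rate, not the hit rate, so a small hit-rate gain has an outsized effect. A quick back-of-envelope sketch (illustrative only, not part of any app code):

```python
# Backend traffic is proportional to the miss rate (1 - hit rate),
# so going from a 90% to a 95% hit rate halves the load on the origin.

def backend_fraction(hit_rate):
    """Fraction of requests that fall through the cache to the backend."""
    return 1.0 - hit_rate

before = backend_fraction(0.90)
after = backend_fraction(0.95)
print(round(after / before, 2))  # 0.5 -- backend volume cut in half
```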
Component: Server Operations: Web Operations → WebOps: Engagement
Product: mozilla.org → Infrastructure & Operations
Hello,

If you would like me to make some adjustments to SNIPPET_HTTP_MAX_AGE, just let me know, and I will. There are also cache settings in the load balancer which need to be considered - in particular, the "webcache!time" variable, currently set to 21600 seconds (6 hours). From the documentation:

« The traffic manager caches regular HTTP responses for the period of time set in webcache!time. The resource is loaded into the cache the first time that a remote client requests it; then subsequent requests for the same resource are served directly from the cache until the time expires. [...] If the resource is changed on the back-end server, it may take up to webcache!time seconds before the traffic manager notices and the cached copy is updated. »

For reference, at this time, the load balancers are maintaining a cache hit rate anywhere from 70 to 95%:

Manager    Cache Used        Entries          Hits          Lookups       Rate
pp-zlb08   72.3 M (0.4 %)    15480 (0.3 %)    9710206502    10236668268   95%
pp-zlb09   1842.9 M (9.5 %)  276488 (6.0 %)   15539176781   20097145350   77%
pp-zlb10   272.4 M (1.4 %)   22344 (0.5 %)    17160645574   20486365211   84%
pp-zlb11   441.9 M (2.3 %)   75834 (1.6 %)    11839352221   12834549258   92%
pp-zlb12   172.6 M (0.9 %)   2639 (0.1 %)     4085183511    5862452142    70%
Let's try setting it to 3 minutes, so:

SNIPPET_HTTP_MAX_AGE = 180

If that has a significant effect on hit rates, then we can go a bit higher and see if it helps more.
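For anyone following along, the mechanics here are just a Cache-Control response header; a minimal sketch of how a max-age setting typically ends up on the wire (the helper name is mine, not the actual snippets app code, which reads SNIPPET_HTTP_MAX_AGE from its Django settings):

```python
# Illustrative only: the setting ultimately becomes a Cache-Control
# header value that both the load balancer and clients honour.

def cache_control_header(max_age_seconds):
    """Build a Cache-Control header value for a cacheable response."""
    return "public, max-age=%d" % max_age_seconds

print(cache_control_header(180))  # public, max-age=180
```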
To clarify comment #1, the caching times in the load balancer can be overridden by headers set by the web applications. From the documentation:

« The Cache-Control header in an HTTP response can indicate that an HTTP response should never be placed in the web cache. The header can also use the max-age value to specify how long the cached object can be cached for. This may cause a response to be cached for less than the configured webcache!time parameter. »
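In other words, the effective TTL at the load balancer is the shorter of the app's max-age and webcache!time. A quick sketch with the values from this bug (the function name is mine, for illustration):

```python
def effective_ttl(app_max_age, webcache_time=21600):
    # The LB caches for webcache!time (6 h here) unless the app's
    # Cache-Control max-age is shorter, in which case that wins.
    return min(app_max_age, webcache_time)

print(effective_ttl(180))    # 180 -- the app's 3-minute max-age wins
print(effective_ttl(86400))  # 21600 -- capped by webcache!time
```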
(In reply to Michael Kelly [:mkelly] from comment #2)
> SNIPPET_HTTP_MAX_AGE = 180

This adjustment has been made to local.py on the admin machine. Please feel free to trigger a deployment at your leisure.
Prod has been pushed, and the max-age looks correct; it's now 180. How long until we should check the cache hit rates to see if it's had an effect?
(In reply to Michael Kelly [:mkelly] from comment #5)
> How long until we should check the cache hit rates to see if it's had an
> effect?

A few hours at least, but realistically, just leave it until tomorrow.
Status: NEW → ASSIGNED
Assignee: server-ops-webops → dmaher
Priority: -- → P4
As it turns out, the statistics I listed in comment #1 were global cache stats for the whole LB cluster - the interface is not at all clear on this point (ugh). I'm currently attempting to determine a way to extract statistics for just snippets. WIP.
As predicted, we are now seeing a cache hit rate of ~94% (up from ~90%) on Snippets prod, which is *excellent*. Great success!
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard