On August 17th, while deploying Firefox Accounts Train 184 to production, I noticed elevated CPU and memory usage on the OAuth database and Redis instances, correlating with a 10% increase in HTTP traffic starting at 2020-08-17T00:00:00Z. After digging into the logs, we found that the /v1/account/attached_devices route was seeing 100x to 150x more traffic than normal. :rfkelly investigated the various experiment platforms we use and found that ASRouter can list the OAuth services and devices connected to the signed-in Firefox Account, to support targeting messages. :k88hudson showed me how to use the ASRouter dev tools, which allowed me to find two snippets (24160 and 24163) that use this feature.
:andreio added some more context around this filter:
andreio - I know it was always problematic; we need the data to be as recent as possible (user activates a service => stop showing messages about that service), but the call is expensive to make. Would increasing the cache time on our side help?
jbuck - on the caching time side - how often do snippets check targeting?
andreio - every time you open a new tab until the user interacts with the snippet or the campaign is taken down
andreio - there's a small 5 min cache (between opening 2 tabs) where we don't look at targeting (we assume nothing changed) and a 2hr cache for that attribute
jbuck - let's say it got bumped up to a 4 hour cache - would it be reasonable to assume that it'd halve the traffic to this route?
So, I think the short-term fix is to increase the cache expiry time to 4 hours to reduce traffic to this endpoint.
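Whether a 4 hour cache halves the traffic depends on how often users open new tabs. The sketch below is a simplified per-user model, not ASRouter's actual code: it assumes targeting is skipped when the last evaluation was less than 5 minutes ago, and that the devices attribute is re-fetched only when the cached value is older than the attribute TTL. The function name `count_fetches` and the tab-open patterns are hypothetical, chosen only for illustration.

```python
def count_fetches(tab_open_times_s, attribute_ttl_s, targeting_cache_s=5 * 60):
    """Count hypothetical /v1/account/attached_devices calls for one user.

    Assumed model (illustrative, not taken from ASRouter):
    - a new tab re-evaluates targeting only if the previous evaluation is
      at least targeting_cache_s old (the ~5 min cache between tabs);
    - an evaluation re-fetches the devices attribute only if the cached
      value is at least attribute_ttl_s old (the 2 hr attribute cache).
    """
    fetches = 0
    last_eval = None
    last_fetch = None
    for t in sorted(tab_open_times_s):
        if last_eval is not None and t - last_eval < targeting_cache_s:
            continue  # inside the short cache: targeting is not re-evaluated
        last_eval = t
        if last_fetch is None or t - last_fetch >= attribute_ttl_s:
            fetches += 1  # attribute cache expired: hit the server
            last_fetch = t
    return fetches


heavy_user = range(0, 8 * 3600, 10 * 60)    # hypothetical: a tab every 10 minutes for 8 hours
light_user = range(0, 24 * 3600, 5 * 3600)  # hypothetical: a tab every 5 hours for a day

for ttl_hours in (2, 4):
    ttl = ttl_hours * 3600
    print(ttl_hours, count_fetches(heavy_user, ttl), count_fetches(light_user, ttl))
# 2 4 5
# 4 2 5
```

Under these assumptions, the heavy user's requests drop from 4 to 2, while the light user, whose tabs are already further apart than either TTL, sees no reduction. So "halve the traffic" is best read as an upper bound that holds for users who open tabs frequently.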
The long-term fix is probably to have the FxA client within the browser cache this information and use push notifications to keep it up to date, so ASRouter doesn't need to worry about these details.
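One possible shape for that long-term fix is a browser-side cache that is filled once and invalidated by push messages rather than by a timer. This is a minimal sketch under stated assumptions, not the FxA client's actual design: `AttachedClientsCache`, `fetch_from_server` (standing in for the attached_devices call) and the push message kinds `client_connected` / `client_disconnected` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class AttachedClientsCache:
    """Hypothetical in-browser cache of the account's attached clients.

    fetch_from_server stands in for a call to /v1/account/attached_devices.
    It runs only when the cache is empty or was invalidated by a push
    message, so consumers such as ASRouter can read it on every new tab.
    """
    fetch_from_server: Callable[[], Set[str]]
    _clients: Optional[Set[str]] = None
    server_calls: int = 0

    def get(self) -> Set[str]:
        if self._clients is None:
            self._clients = set(self.fetch_from_server())
            self.server_calls += 1
        return set(self._clients)  # return a copy so callers can't mutate the cache

    def on_push_message(self, kind: str) -> None:
        # Hypothetical message kinds: any change to the account's clients drops the cache
        if kind in ("client_connected", "client_disconnected"):
            self._clients = None


server_state = {"sync"}
cache = AttachedClientsCache(fetch_from_server=lambda: set(server_state))

for _ in range(1000):  # e.g. 1000 new tabs evaluating targeting
    cache.get()
server_state.add("monitor")  # user connects a new service
cache.on_push_message("client_connected")
print(sorted(cache.get()), cache.server_calls)  # ['monitor', 'sync'] 2
```

In this model the server sees one request per actual change instead of one per cache expiry, and freshness no longer trades off against request volume, which addresses the tension andreio described.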