Bug 1311510 (Open, NEW)
Opened 8 years ago · Updated 2 years ago

chrome.storage.sync: performance test of production stack for chrome.storage.sync

Component: WebExtensions :: Storage (defect, P3)
Tracking: Not tracked
Reporter: glasserc; Assignee: Unassigned
Whiteboard: [storage], triaged
Description
We need to make sure that we have enough capacity in production for when this feature hits beta.
Updated • 8 years ago
Component: WebExtensions: Untriaged → WebExtensions: General
Priority: -- → P3
Whiteboard: [storage]triaged
Comment 2 • 8 years ago
Hi Andy,

We have some load tests for Kinto ready here: https://github.com/mozilla-services/ailoads-kinto. Usually QA runs them, but I don't know who the QA contact is for the service side of the WebExtensions stack. Stuart, do you know if we have a QA person who could run some load tests on the WebExtensions stack?

Flags: needinfo?(rhubscher) → needinfo?(sphilp)
Comment 3 • 8 years ago
Karl can take it; cc'ing him. Do we need this by a certain date?

Flags: needinfo?(sphilp)
Updated • 8 years ago
QA Contact: kthiessen
Comment 4 • 8 years ago
(In reply to Stuart Philp :sphilp from comment #3)
> Karl can take it, cc'ing him. Do we need this for a certain date?

This would block the feature from landing in beta, so sometime before that would be great. Release trains are at https://wiki.mozilla.org/RapidRelease/Calendar
Comment 5 • 8 years ago
I'll note that we're aiming for Firefox 53 here, which means the relevant merge date is currently 2017-03-06. I can agree to that timeframe.
Comment 6 • 8 years ago
Who in Services Ops is going to be in charge of this production deployment? Can we get them cc'ed on this bug, please, or get a pointer to another bug to use for communication with Ops?

Flags: needinfo?(eglassercamp)
Comment 7 • 8 years ago
More questions:

* Do we have defined desired capacities in terms of, for example, the number of queries per second we want the service to stand up under?
* Do we need to coordinate with Ops to determine the optimum size of the production cluster, or have they already made that decision?
* Who is our Ops contact for deployment verification? Is there a stage instance for the AMO-specific cluster, or are we just using the existing https://webextensions-settings.stage.mozaws.net?
* My team is standing up the load testing apparatus today and tomorrow; we should have the first successful tests late this week or early next, and I'm hoping to have a go/no-go call by the end of next week. Does that work with everyone's timetable?
Comment 8 • 8 years ago
(In reply to Karl Thiessen [:kthiessen] from comment #6)
> Who in Services Ops is going to be in charge of this production deployment?
> Can we get them cc'ed on this bug, please, or get a pointer to another bug
> to use for communication with Ops?

I am the primary Ops contact on Kinto/Storage today; bobm is secondary.
Comment 9 • 8 years ago
(In reply to Karl Thiessen [:kthiessen] from comment #7)
> * Do we need to co-ordinate with Ops to determine what the optimum size of
> the production cluster will be, or have they already made that decision?

As of right now, production is up but with minimal resources: 3 c4.large web instances and an m4.large RDS instance [1]. We can adjust as needed based on performance testing and how much traffic we expect to receive. The production endpoint is https://webextensions.settings.services.mozilla.com/v1/

> * Who is our Ops contact for deployment verification? Is there a stage
> instance for the AMO-specific cluster, or are we just using the existing
> https://webextensions-settings.stage.mozaws.net?

I am the Ops contact; reach out to me with any questions. We should use https://webextensions-settings.stage.mozaws.net for testing.

[1] https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kintowe/ansible/envs/prod.yml#L15-L20
Comment 10 • 8 years ago
Brilliant! Thanks, Jason.

The only outstanding question is:

* Do we have defined desired capacities in terms of, for example, the number of queries per second we want the service to stand up under?

Ethan, have you got an answer for that, or can you point us in the direction of someone who does?
Comment 11 • 8 years ago
(In reply to Karl Thiessen [:kthiessen] from comment #10)
> Brilliant! Thanks, Jason.
>
> The only outstanding question is:
>
> * Do we have defined desired capacities in terms of, for example, number of
> queries per second we want the service to stand up under?

AdBlock Plus uses this. If you take the number of users for AdBlock Plus (~20 million) and multiply it by the number of times it queries in a day, you'll get the idea. But AdBlock Plus isn't moving over to this for a few releases. Overall, approx. 15% of all add-ons on the Chrome store use this API [1]. We currently have 89 add-ons using it [2].

We've explicitly stated that this API endpoint has no SLA around usage or performance; developers get what they get and they don't get upset. I really don't want us to end up throwing too many resources at this, and would like to suggest we ramp up performance as the usage increases. I expect very little usage until it hits a peak when something like AdBlock Plus hits release (expected November).

It's worth noting that chrome.storage.sync only works if you are signed in through Firefox Sync. So we can probably say that a simple metric is to take the amount of traffic that syncing through Firefox Sync generates and scale that down. How many queries per second that translates into, I don't know. But I would be interested in the number of GETs, POSTs, and PUTs on Sync right now from other services, and then suggesting that by Firefox 57 in November, the load on this service would be a fraction of that (amount of sync traffic / number of add-ons using it). What numbers do other sync services handle? What numbers can Kinto put up right now?

[1] https://github.com/andymckay/arewewebextensionsyet.com/blob/master/usage.csv#L16
[2] https://gist.github.com/andymckay/10c3a4c64ce8990b589f0ac740f65955#file-firefox-permissions-L131

Flags: needinfo?(eglassercamp)
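The sizing heuristic in comment 11 can be sketched as a back-of-the-envelope calculation. This is only an illustration: `sync_rps` and `total_addons` are placeholder values (the real Sync traffic numbers were deliberately kept out of this public bug); only the 89 add-on count comes from the comment.

```python
# Back-of-the-envelope sketch of the heuristic in comment 11:
# treat storage.sync load as a fraction of existing Firefox Sync
# traffic, scaled by how many add-ons actually use the API.
def storage_sync_rps(sync_rps, addons_using_api, total_addons):
    """Naive upper bound: scale Sync traffic by add-on adoption."""
    return sync_rps * (addons_using_api / total_addons)

# 89 add-ons used the API at the time; the other figures are made up.
estimate = storage_sync_rps(sync_rps=1_000, addons_using_api=89, total_addons=10_000)
print(f"~{estimate:.1f} req/s")
```

With these placeholder inputs the estimate comes out under 10 req/s, which is consistent with the "very little usage until AdBlock Plus moves over" expectation above.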
Comment 12 • 8 years ago
Thank you, Andy! That's very useful information. I'll check with the Sync metrics team and see if I can get some related data.
Comment 13 • 8 years ago
I'm not sure what the policy is for putting traffic numbers in public bugs, but I have the Sync numbers that Andy asked for above, and will bring them to the meeting tomorrow.
Comment 14 • 8 years ago
Do we want load test results in this bug, or somewhere more private (since they're likely to include performance thresholds)?

Flags: needinfo?(jthomas)
Comment 15 • 8 years ago
I think we should keep performance thresholds private. Sharing via Google Docs works for me, but if you want to include Datadog graphs, it might be worth looking at https://app.datadoghq.com/notebook/.

Flags: needinfo?(jthomas)
Comment 16 • 8 years ago
https://app.datadoghq.com/notebook/list is better, and has a notebook created by :miles for another project.
Updated • 8 years ago
QA Contact: kthiessen → chartjes
Comment 17 • 8 years ago
We have a scenario document being used for load testing here: https://docs.google.com/document/d/1na-4DtECFRf0zEgJzaeK4G6MJAINfqY_UO_8rx5p_ME/edit

Please get the scenarios you want tested into that document, so that Chris can do the testing required to make sure this product is ready for release.
Comment 18 • 8 years ago
Hi Ethan, are you the best person to gather the scenarios for load testing? I've added Bob as well, who worked as the WebExtensions liaison.

Flags: needinfo?(eglassercamp)
Flags: needinfo?(bob.silverberg)
Whiteboard: [storage]triaged → [storage]
Comment 19 • 8 years ago
Not sure if the QA plan helps, in case some of those scenarios are good perf test cases.
Comment 20 • 8 years ago
Krupa and I have been chatting with Karl about load testing for this. What are the next steps?

Flags: needinfo?(kthiessen)
Flags: needinfo?(eglassercamp)
Flags: needinfo?(bob.silverberg)
Comment 21 • 8 years ago
Let's lay out a timeline of load testing/perf scenarios: we expect n users by d date, staggered all the way up to November. Then we can schedule a series of load tests on some sort of non-production environment, either stage or something purpose-built in AWS.

We will also need to think about how we are going to model users for the load/perf tests: how often are we going to allow a given add-on to hit its storage container, etc., and how much enforcement of those limits is needed?

I think the important thing here is to start early and test frequently.

Flags: needinfo?(kthiessen)
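The staggered "n users by d date" plan above can be sketched as a small model that converts each stage's assumed user count into a target request rate for the load test. All of the stage names, user counts, syncs-per-day, and requests-per-sync figures here are illustrative assumptions, not agreed targets from this bug.

```python
# Hypothetical sketch of the staggered load model from comment 21:
# for each planned stage, derive the request rate the load test
# should sustain. Every number below is a placeholder assumption.
SECONDS_PER_DAY = 86_400

def target_rps(users, syncs_per_user_per_day, requests_per_sync=3):
    """Requests/second a load-test stage must sustain for a user count."""
    daily_requests = users * syncs_per_user_per_day * requests_per_sync
    return daily_requests / SECONDS_PER_DAY

# Example stages ramping toward November (placeholder counts).
stages = {"launch": 100_000, "mid-cycle": 1_000_000, "November": 4_000_000}
for name, users in stages.items():
    print(f"{name}: {target_rps(users, 2):.1f} req/s")
```

Under these assumptions the November stage lands near 280 req/s, in the same ballpark as the 300 req/s figure discussed later in the bug; the point of the model is just that each scheduled test has a concrete rate to aim for.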
Comment 22 • 8 years ago
(In reply to Karl Thiessen [:kthiessen] from comment #21)
> Let's lay out a timeline of load testing/perf scenarios -- we expect n users
> by d date, staggered all the way up to November.
>
> Then we can schedule a series of load tests on some sort of non-production
> environment -- either stage or something purpose-built in AWS. We will also
> need to think about how we are going to model users for the load/perf tests
> -- how often are we going to allow a given add-on to hit its storage
> container, etc, and how much enforcement of those limits is needed?
>
> I think the important thing here is to start early and test frequently.

Andy, Rémy: thoughts?

Flags: needinfo?(rhubscher)
Updated • 8 years ago
Flags: needinfo?(amckay)
Comment 23 • 8 years ago
I guess settings in an add-on shouldn't be write-heavy, and people sync with their mobile 48% of the time and with another desktop 52%. The Android app doesn't support the storage.sync API just yet, so we will have at most 4 million users that might be using the storage.sync API in one or two add-ons.

In my opinion, if the stack can handle 300 requests per second we should be fine for a while, because it means we can handle 9 million syncs per day. That would mean every user updating both add-ons every day, which is really unlikely.

Flags: needinfo?(rhubscher)
Updated • 8 years ago
Whiteboard: [storage] → [storage], triaged
Comment 24 • 8 years ago
What would be really awesome is a dashboard on Datadog that shows the amount of traffic the production instances get in terms of reads, writes, and time taken. If we can handle 300 requests per second then I'm pretty happy with things, but a graph showing this would let us all see what's happening and react appropriately.

Flags: needinfo?(amckay)
Comment 25 • 7 years ago
Benson, do you know of any graphs or metrics for this service?

Flags: needinfo?(bwong)
Comment 26 • 7 years ago
Jason set up a Datadog dashboard for it [1]. If there are other metrics you'd like, I can add them to the dashboard.

[1] https://app.datadoghq.com/dash/241098/kinto-webextensions-prod?live=true&page=0&is_auto=false&from_ts=1494333525702&to_ts=1494347925702&tile_size=m

Flags: needinfo?(bwong)
Updated • 7 years ago
Component: WebExtensions: General → WebExtensions: Storage
Updated • 6 years ago
Product: Toolkit → WebExtensions
Updated • 2 years ago
Severity: normal → S3