Bugzilla

Updated

8 years ago

Component: WebExtensions: Untriaged → WebExtensions: General

Priority: -- → P3

Whiteboard: [storage]triaged

Updated

8 years ago

Blocks: 1220494
No longer blocks: 1311710

Comment 1

8 years ago

Would this be you, Remy, or someone else?

Flags: needinfo?(rhubscher)

Rémy Hubscher (:natim)

Comment 2

8 years ago

Hi Andy,

We have some load tests for Kinto ready here: https://github.com/mozilla-services/ailoads-kinto
Usually QA runs them, but I don't know who handles QA for the service side of the WebExtensions stack.

Stuart, do you know if we have a QA person who could run some load tests on the WebExtensions stack?

Flags: needinfo?(rhubscher) → needinfo?(sphilp)

Stuart Philp :sphilp

Comment 3

8 years ago

Karl can take it, cc'ing him. Do we need this for a certain date?

Flags: needinfo?(sphilp)

krupa raj [:krupa--use this to needinfo]

Updated

8 years ago

QA Contact: kthiessen

Comment 4

8 years ago

(In reply to Stuart Philp :sphilp from comment #3)
> Karl can take it, cc'ing him. Do we need this for a certain date?

This would block the feature from landing in beta. So, sometime before that would be great. Release trains are at https://wiki.mozilla.org/RapidRelease/Calendar

Comment 5

8 years ago

I'll note that we're aiming for Firefox 53 here, which means the relevant merge date is currently 2017-03-06.

I can agree to that timeframe.

Comment 6

8 years ago

Who in Services Ops is going to be in charge of this production deployment?  Can we get them cc'ed on this bug, please, or get a pointer to another bug to use for communication with Ops?

Flags: needinfo?(eglassercamp)

Comment 7

8 years ago

More questions:

* Do we have defined desired capacities in terms of, for example, number of queries per second we want the service to stand up under?

* Do we need to co-ordinate with Ops to determine what the optimum size of the production cluster will be, or have they already made that decision?

* Who is our Ops contact for deployment verification?  Is there a stage instance for the AMO-specific cluster, or are we just using the existing https://webextensions-settings.stage.mozaws.net?

* My team are standing up the load testing apparatus today and tomorrow; we should have the first successful tests late this week or early next, and I'm hoping to have a go/no-go call by the end of next week.  Does that work with everyone's timetable?

Comment 8

8 years ago

(In reply to Karl Thiessen [:kthiessen] from comment #6)
> Who in Services Ops is going to be in charge of this production deployment? 
> Can we get them cc'ed on this bug, please, or get a pointer to another bug
> to use for communication with Ops?

I am the primary Ops on Kinto/Storage today; bobm is secondary.

Comment 9

8 years ago

(In reply to Karl Thiessen [:kthiessen] from comment #7)
> More questions:
> * Do we need to co-ordinate with Ops to determine what the optimum size of
> the production cluster will be, or have they already made that decision?
> 

As of right now production is up but with minimal resources: 3 c4.large web instances and an m4.large RDS instance [1]. We can adjust as needed based on performance testing and how much traffic we expect to receive. The production endpoint is https://webextensions.settings.services.mozilla.com/v1/


> * Who is our Ops contact for deployment verification?  Is there a stage
> instance for the AMO-specific cluster, or are we just using the existing
> https://webextensions-settings.stage.mozaws.net?

I am the Ops contact; reach out to me with any questions. We should use https://webextensions-settings.stage.mozaws.net for testing.


[1] https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kintowe/ansible/envs/prod.yml#L15-L20

Comment 10

8 years ago

Brilliant!  Thanks, Jason.

The only outstanding question is:

* Do we have defined desired capacities in terms of, for example, number of queries per second we want the service to stand up under?

Ethan, have you got an answer for that, or can you point us in the direction of someone who does?

Comment 11

8 years ago

(In reply to Karl Thiessen [:kthiessen] from comment #10)
> Brilliant!  Thanks, Jason.
> 
> The only outstanding question is:
> 
> * Do we have defined desired capacities in terms of, for example, number of
> queries per second we want the service to stand up under?

AdBlock Plus uses this. If you take the number of AdBlock Plus users (~20 million) and multiply it by the number of times it queries in a day, you'll get the idea. But AdBlock Plus isn't moving over to this for a few releases.
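
As a rough sketch of the estimate described above (not a measured figure): the average queries per second is the number of users times the queries per user per day, divided by the 86,400 seconds in a day. The ~20 million user count comes from this comment; the queries-per-day value is a placeholder assumption, since nothing in this bug states one, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(users: int, queries_per_user_per_day: float) -> float:
    """Average queries per second, assuming load is spread evenly over a day.

    Real traffic is bursty, so peak load would be higher than this average.
    """
    return users * queries_per_user_per_day / SECONDS_PER_DAY


# ~20 million users is the figure quoted in the comment.
ABP_USERS = 20_000_000
# Hypothetical placeholder: one query per user per day. Not a number from this bug.
ASSUMED_QUERIES_PER_USER_PER_DAY = 1

print(round(average_qps(ABP_USERS, ASSUMED_QUERIES_PER_USER_PER_DAY), 1))  # 231.5
```

The result scales linearly with the assumed query rate, so a different queries-per-day figure changes the estimate proportionally.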

Overall approx. 15% of all add-ons on the Chrome store use this API [1]. We currently have 89 add-ons using it [2].

We've explicitly stated that this API endpoint has no SLA around usage or performance: developers get what they get and they don't get upset.

I really don't want us to end up throwing too many resources at this, and would like to suggest we ramp up performance as usage increases. I expect very little usage until it hits a peak when something like AdBlock Plus hits release (expected November).

It's worth noting that chrome.storage.sync only works if you are signed in through Firefox Sync. So we can probably say that a simple metric is to take the amount of traffic that syncing through Firefox Sync does and then dividing that by.

How many queries per second that translates into, I don't know. But I would be interested in the number of GETs, POSTs and PUTs on sync right now from other services, and then suggesting that by November (Firefox 57) the load on this service would be a fraction of that (amount of sync traffic / amount of add-ons using).

What numbers do other sync services handle?

What numbers can Kinto put up right now?

[1] https://github.com/andymckay/arewewebextensionsyet.com/blob/master/usage.csv#L16
[2] https://gist.github.com/andymckay/10c3a4c64ce8990b589f0ac740f65955#file-firefox-permissions-L131

Flags: needinfo?(eglassercamp)

Comment 12

8 years ago

Thank you, Andy!  That's very useful information.  I'll check with the Sync metrics team and see if I can get some related data.

Comment 13

8 years ago

I'm not sure what the policy is for putting traffic numbers in public bugs, but I have the sync numbers that Andy asked for above, and will bring them to the meeting tomorrow.

Comment 14

8 years ago

Do we want load test results in this bug, or somewhere more private (since they're likely to include performance thresholds)?

Flags: needinfo?(jthomas)

Comment 15

8 years ago

I think we should keep performance thresholds private. Sharing via Google Docs works for me, but if you want to include Datadog graphs it might be worth looking at https://app.datadoghq.com/notebook/.

Flags: needinfo?(jthomas)

Comment 16

8 years ago

https://app.datadoghq.com/notebook/list is better and has a notebook created by :miles for another project.

Updated

8 years ago

QA Contact: kthiessen → chartjes

Comment 17

8 years ago

We have a scenario document being used for load testing here:
   https://docs.google.com/document/d/1na-4DtECFRf0zEgJzaeK4G6MJAINfqY_UO_8rx5p_ME/edit

Please get the scenarios you want tested into that document, so that Chris can do the testing required to make sure this product is ready for release.

:shell escalante

Comment 18

8 years ago

Hi Ethan,

Are you the best person to gather the scenarios for load testing? I've added Bob as well, who worked as the WebExtensions liaison.

Flags: needinfo?(eglassercamp)

Flags: needinfo?(bob.silverberg)

Whiteboard: [storage]triaged → [storage]

:shell escalante

Comment 19

8 years ago

Not sure if the QA plan helps, in case some of those scenarios are good perf test cases.

Comment 20

8 years ago

Krupa and I have been chatting to Karl about load testing for this. What are the next steps?

Flags: needinfo?(kthiessen)

Flags: needinfo?(eglassercamp)

Flags: needinfo?(bob.silverberg)

krupa raj [:krupa--use this to needinfo]

Comment 21

8 years ago

Let's lay out a timeline of load testing/perf scenarios -- we expect n users by d date, staggered all the way up to November.

Then we can schedule a series of load tests on some sort of non-production environment -- either stage or something purpose-built in AWS.  We will also need to think about how we are going to model users for the load/perf tests -- how often are we going to allow a given add-on to hit its storage container, etc, and how much enforcement of those limits is needed?

I think the important thing here is to start early and test frequently.

Flags: needinfo?(kthiessen)

Comment 22

8 years ago

(In reply to Karl Thiessen [:kthiessen] from comment #21)
> Let's lay out a timeline of load testing/perf scenarios -- we expect n users
> by d date, staggered all the way up to November.
> 
> Then we can schedule a series of load tests on some sort of non-production
> environment -- either stage or something purpose-built in AWS.  We will also
> need to think about how we are going to model users for the load/perf tests
> -- how often are we going to allow a given add-on to hit its storage
> container, etc, and how much enforcement of those limits is needed?
> 
> I think the important thing here is to start early and test frequently.

Andy, Rémy - thoughts?

Flags: needinfo?(rhubscher)

krupa raj [:krupa--use this to needinfo]

Updated

8 years ago

Flags: needinfo?(amckay)

Rémy Hubscher (:natim)

Comment 23

8 years ago

I guess settings in an add-on shouldn't be write-heavy, and people are syncing with their mobile 48% of the time and with another desktop 52% of the time.

The Android app doesn't support the storage.sync API just yet.

So it means we will have at most 4 million users who might be using the storage.sync API in one or two add-ons.


In my opinion, if the stack can handle 300 requests per second we should be fine for a while, because it means we can handle 9 million syncs per day. That would correspond to every user updating both add-ons every day, which is really unlikely.
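
A quick check of the arithmetic above, as a sketch only: 300 requests per second sustained for a full day is about 25.9 million requests. The comment does not say how many HTTP requests one sync takes; the value of 3 used below is an assumption chosen for illustration, and it gives roughly 8.6 million syncs per day, in the neighborhood of the 9 million quoted. The 4 million users and two add-ons are the figures from the comment; the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(requests_per_second: int) -> int:
    """Total requests served in a day at a constant request rate."""
    return requests_per_second * SECONDS_PER_DAY


def syncs_per_day(requests_per_second: int, requests_per_sync: int) -> float:
    """Number of complete syncs per day, given the requests one sync costs."""
    return requests_per_day(requests_per_second) / requests_per_sync


TARGET_RPS = 300               # throughput target from the comment
ASSUMED_REQUESTS_PER_SYNC = 3  # assumption: not stated anywhere in this bug
MAX_USERS = 4_000_000          # upper bound on users from the comment
ADDONS_PER_USER = 2            # "one or two add-ons"; two is the worst case

capacity = syncs_per_day(TARGET_RPS, ASSUMED_REQUESTS_PER_SYNC)
worst_case_demand = MAX_USERS * ADDONS_PER_USER

print(requests_per_day(TARGET_RPS))   # 25920000
print(capacity)                       # 8640000.0
print(worst_case_demand)              # 8000000
print(capacity >= worst_case_demand)  # True
```

Under these assumptions the daily capacity covers the worst case of every user syncing both add-ons once a day, though only when the load is spread evenly; it says nothing about peak-hour bursts.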

Flags: needinfo?(rhubscher)

:shell escalante

Updated

8 years ago

Whiteboard: [storage] → [storage], triaged

Comment 24

8 years ago

What would be really awesome is a dashboard on Datadog that shows the amount of traffic the production instances get in terms of reads and writes and time taken. If we can handle 300 requests per second then I'm pretty happy with things, but a graph showing this would let us all see what's happening and react appropriately.

Flags: needinfo?(amckay)

Benson Wong [:mostlygeek]

Comment 25

7 years ago

Benson, do you know of any graphs or metrics for this service?

Flags: needinfo?(bwong)

Comment 26

7 years ago

Jason set up a Datadog dashboard for it [1]. If there are other metrics you'd like, I can add them to the dashboard.


[1] https://app.datadoghq.com/dash/241098/kinto-webextensions-prod?live=true&page=0&is_auto=false&from_ts=1494333525702&to_ts=1494347925702&tile_size=m

Flags: needinfo?(bwong)