Deploy release 0.10.0 to loop-server Stage

VERIFIED FIXED

Status

Cloud Services
Operations: Deployment Requests

People

(Reporter: alexis, Unassigned)


Details

(Whiteboard: [qa+])

Description

4 years ago
Code changes: https://github.com/mozilla-services/loop-server/compare/0.9.0...0.10.0

Configuration changed since 0.9.0, see:

https://github.com/mozilla-services/loop-server/compare/0.9.0...0.10.0#diff-9

In particular, the server should now support multiple channels. The configuration file should be updated as follows:

- For nightly/aurora: https://bugzilla.mozilla.org/show_bug.cgi?id=1033574
- For all other channels (default) https://bugzilla.mozilla.org/show_bug.cgi?id=1021955

Credentials should be:

> "credentials": {
>   "default": {
>     "apiKey": "<put your apiKey here>",
>     "apiSecret": "<put your apiSecret here>"
>   },
>   "aurora": {
>     "apiKey": "<put your apiKey here>",
>     "apiSecret": "<put your apiSecret here>"
>   },
>   "nightly": {
>     "apiKey": "<put your apiKey here>",
>     "apiSecret": "<put your apiSecret here>"
>   }
> },
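
To illustrate the per-channel layout above, here is a minimal Python sketch of a lookup that falls back to "default" for any channel without its own entry. This is not loop-server code; the function name `credentials_for` and the placeholder key values are hypothetical, and the fallback behaviour is an assumption based on "all other channels (default)".

```python
import json

# Hypothetical settings fragment mirroring the layout above.
# All key and secret values are placeholders, not real credentials.
SETTINGS = json.loads("""
{
  "credentials": {
    "default": {"apiKey": "default-key", "apiSecret": "default-secret"},
    "aurora":  {"apiKey": "aurora-key",  "apiSecret": "aurora-secret"},
    "nightly": {"apiKey": "nightly-key", "apiSecret": "nightly-secret"}
  }
}
""")


def credentials_for(channel, settings=SETTINGS):
    """Return the credentials for a channel, falling back to "default"."""
    creds = settings["credentials"]
    return creds.get(channel, creds["default"])


print(credentials_for("nightly")["apiKey"])  # nightly-key
print(credentials_for("release")["apiKey"])  # default-key
```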

James, I'll let you create the prod deployment request once this is deployed to Stage and verified.
Status: NEW → ASSIGNED
Whiteboard: [qa+]
Blocks: 1048264
I *think* :mostlygeek just updated Stage to 0.10.0
I see just now a new m3.medium instance: i-6bfb0b46

Code:
loop-server-svcops 0.10.0-1 x86_64 20905864
puppet-config-loop 20140729184904-1 x86_64 13881

This is instance i-6bfb0b46, an m3.medium in us-east-1a
Standard tags are: app => loop_server, type => loop_server, env => stage, stack => loop-server-stage

But, right now, we are getting a 502 if we curl the server...
This should be all good now.
Moving forward with basic verification of loop-server Stage.
Waiting to hear back from the third-party on an available test window to hit their live server...


Also, see: https://github.com/mozilla-services/puppet-config/pull/781
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Verified we have one updated m3.medium instance running the following code:
loop-server-svcops 0.10.0-1 x86_64 20905864
puppet-config-loop 20140811172607-1 x86_64 14277


Noting here some changes to the all-important config file: /data/loop-server/config/settings.json

The following are new/updated entries:
...etc...
    "fxaAudiences": [
        "app://loop.stage.mozaws.net",
        "http://loop.stage.mozaws.net"
    ],
    "fxaTrustedIssuers": [
        "api-accounts.stage.mozaws.net",
        "msisdn-dev.stage.mozaws.net"
    ],
...etc...
    "timers": {
        "connectionDuration": 10
    },
    "tokBox": {
        "apiUrl": "THE USUAL",
        "credentials": {
            "default": {
                "apiKey": "THE USUAL",
                "apiSecret": "THE USUAL",
                "apiUrl": "THE USUAL"
            }
        }
    },


Please confirm the additions, changes, and layout.
(I can supply values for "THE USUAL" if needed)
curl https://loop.stage.mozaws.net
{"name":"mozilla-loop-server","description":"The Mozilla Loop (WebRTC App) server","version":"0.10.0","homepage":"https://github.com/mozilla-services/loop-server/","endpoint":"https://loop.stage.mozaws.net","fakeTokBox":false}

curl -I https://loop.stage.mozaws.net
HTTP/1.1 200 OK
Date: Mon, 11 Aug 2014 18:25:02 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 226
Connection: keep-alive
ETag: W/"e2-2194042355"
Timestamp: 1407781502
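
The manual curl check above could be scripted roughly as follows. This is only a sketch: it parses the response body copied from the curl output rather than making a network call, the helper name `check_deployment` is hypothetical, and treating `fakeTokBox: false` as the expected Stage value is an assumption based on the output shown.

```python
import json

# Response body copied from the curl output above.
BODY = (
    '{"name":"mozilla-loop-server",'
    '"description":"The Mozilla Loop (WebRTC App) server",'
    '"version":"0.10.0",'
    '"homepage":"https://github.com/mozilla-services/loop-server/",'
    '"endpoint":"https://loop.stage.mozaws.net",'
    '"fakeTokBox":false}'
)


def check_deployment(body, expected_version):
    """Return a list of problems found in the root endpoint response."""
    info = json.loads(body)
    problems = []
    if info.get("version") != expected_version:
        problems.append("unexpected version: %r" % info.get("version"))
    # Assumption: Stage should talk to the real provider, not the fake one.
    if info.get("fakeTokBox") is not False:
        problems.append("fakeTokBox is not false")
    return problems


print(check_deployment(BODY, "0.10.0"))  # []
```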
The configuration looks good to me.
Also, the default channel's apiUrl is optional if it is the same as tokBox.apiUrl, and the connectionDuration timer is now set to its default value.
:natim thanks!
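
For reference, the optional per-channel apiUrl mentioned above can be pictured with a short Python sketch. It assumes a channel entry without its own "apiUrl" inherits the top-level tokBox.apiUrl; the function name `api_url_for` and the example.invalid URLs are hypothetical, and the real resolution logic lives in loop-server itself.

```python
def api_url_for(channel, tokbox):
    """Resolve a channel's API URL: its own entry first, then tokBox.apiUrl."""
    creds = tokbox["credentials"]
    entry = creds.get(channel, creds["default"])
    return entry.get("apiUrl", tokbox["apiUrl"])


# Hypothetical tokBox section; URLs and keys are placeholders.
tokbox = {
    "apiUrl": "https://api.example.invalid",
    "credentials": {
        # No apiUrl here, so the top-level value applies.
        "default": {"apiKey": "k", "apiSecret": "s"},
        "nightly": {
            "apiKey": "k2",
            "apiSecret": "s2",
            "apiUrl": "https://nightly.example.invalid",
        },
    },
}

print(api_url_for("default", tokbox))  # https://api.example.invalid
print(api_url_for("nightly", tokbox))  # https://nightly.example.invalid
```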


Did a couple of quick 'make test' runs to verify Stage.
Interesting change to the loadtest Makefile. 'make test' runs a 'make build'.
Not sure why this is needed, given that only QA and Dev use this and know to run 'make build' once before any calls to 'make test'. Is there some requirement to run 'make build' every time we test?

Anyway, waiting to hear back from our third-party...
OK, our test window is Tuesday afternoon, PDT.
The 30min test also looks good:
https://loads.services.mozilla.com/run/c087f859-17d9-421e-82bd-f23a2793ae1f

Going to try a bit more traffic next...
The more stress-based 30min load test failed miserably, probably because I overloaded our single m3.medium instance:
Run 3: 30min
https://loads.services.mozilla.com/run/aa5ba100-df1c-4946-878a-a3c512226a22
users = 40
duration = 1800
agents = 10

Most errors were of this type:
500 Internal Server Error


Going back to an easier 10min to make sure everything is good.
Then I will dial it up a little slower this time...
Blocks: 1052929
The following runs were all successful:
Run 4: 10min
https://loads.services.mozilla.com/run/b543bf16-f526-4631-b90a-879f5e8a1132
users = 20
duration = 600
agents = 5

Run 5: 10min
https://loads.services.mozilla.com/run/87bb391a-eddd-4199-937c-e3aec8b65e27
users = 25
duration = 600
agents = 5

Run 6: 10min
https://loads.services.mozilla.com/run/819cedf6-0d69-44aa-820a-1ba6856e9495
users = 30
duration = 600
agents = 5

Run 7: 45min
https://loads.services.mozilla.com/run/db16450e-b428-4315-8a25-9a02d5450684
users = 30
duration = 2700
agents = 5

Tests over     15005 
Successes     15005
Failures     0
Errors     0
TCP Hits     60620
Opened web sockets     30301
Total web sockets     30301
Bytes/websockets     6156909
Requests / second (RPS)     22
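
As a sanity check on the Run 7 figures, the reported RPS is consistent with TCP hits divided by the configured duration. That this is how the loads dashboard derives RPS is an assumption; the actual elapsed time may differ slightly from the configured 2700 seconds.

```python
# Figures taken from the Run 7 summary above.
tcp_hits = 60620
duration_s = 2700  # configured duration, in seconds

rps = tcp_hits / duration_s
print(int(rps))  # 22
```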
Logs look really good after those early runs with the 500s.
The only other thing I noticed was some bot/spam activity in the /media/ephemeral0/nginx/logs/loop_server.access.log and /media/ephemeral0/circus/loop_server/loop_server.out.log logs (as 404s).
Status: RESOLVED → VERIFIED