Bug 1255776 (Closed): Opened 8 years ago, Closed 8 years ago

Please deploy kinto-dist 0.1.0 release to Kinto (firefox settings) STAGE.

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhubscher, Assigned: dmaher)

References

Details

Assignee: nobody → dmaher
Depends on: 1248898
QA Contact: kthiessen
I worked through a reasonable number of issues on our end related to my (ill-conceived?) plan to label everything correctly (re: kinto vs kinto-dist). Anyway, that's all squared away now.

Kinto-dist (ref: 35fb783f90c11638cbf5761650e75ecca726e72a) has been deployed to stage and DNS has been updated.

Unfortunately, I ran into this error when attempting to migrate:

```
[dmaher@ip-172-31-29-128 kinto-dist]$ . bin/activate
(kinto-dist-svcops-20160311170739)[dmaher@ip-172-31-29-128 kinto-dist]$ bin/kinto migrate
No handlers could be found for logger "cliquet.initialization"
```

It's late on Friday so I'm going to stop there for now.  Check you on Monday.
> No handlers could be found for logger "cliquet.initialization"

This is just a warning AFAIK.

Do you have any other output?
Blocks: 1255034
Further debugging revealed that the Stage database schema wasn't in a good state to begin with, so the migrate couldn't possibly work in the first place.  After flushing the database (read: drop schema; create schema) it was possible to run the migrate successfully.  Yay!
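For reference, a minimal sketch of that flush-and-migrate sequence; the database and user names are placeholders, not the actual Stage values:

```
# DESTRUCTIVE: drops every table in the public schema -- acceptable only on Stage.
# Database name and user are placeholders.
psql -U kinto -d kinto -c 'DROP SCHEMA public CASCADE; CREATE SCHEMA public;'

# Re-run the schema migration from the kinto-dist virtualenv, as above.
. bin/activate
bin/kinto migrate
```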
Status: NEW → ASSIGNED
Turns out that npm libsodium is difficult to build on CentOS 6.x (this is going to be true of more and more things going forward).  The long-term solution is to migrate Loop to CentOS 7; however, this is *well* outside the scope of this deployment, heh.  For now, I've asked :natim to downgrade libsodium in the hopes that we can get this deployment out the door.

```
09:08:00 < phrawzty> Standard8: natim: In the meantime, would it be possible to
                     use an earlier version of libsodium - one that doesn't
                     require autoconf > 2.65 ?
09:08:19 < phrawzty> Standard8: natim: Assuming, of course, that this doesn't
                     introduce security problems, etc.
09:09:23 < natim> phrawzty: I can probably tag a 0.20.1 with the version
                  reverted, would that works for you?
09:09:51 < natim> Standard8: I would do that in the 0.20.x branch so that you
                  can still use master with node 0.12
09:10:14 < phrawzty> natim: Seems like it'd work for me
09:13:31 < natim> phrawzty: here you go:
https://github.com/mozilla-services/loop-server/releases/tag/0.20.1
```


I'm going to go ahead and try that now.
(In reply to Daniel Maher [:phrawzty] from comment #4)
> I'm going to go ahead and try that now.

The package built as expected (pipeline #74), and Jenkins was able to run the LoopServer-STAGE-Deploy job (#71); however, the instance behind the load balancer is out of the pool, and is timing out connections to port 80.  Worse yet, I can't SSH into the failing node, so something is clearly busted.

WIP
OOPS! Comments #4 and #5 were meant to go on a totally different bug.  PLEASE DISREGARD THOSE COMMENTS.

uugghh.
> it was possible to run the migrate successfully

It seems the permission schema migration didn't run correctly.

Can you make sure that the permission backend is correctly configured to use postgresql on both the kinto-writer and the kinto-reader?
(In reply to Rémy Hubscher (:natim) from comment #7)
> It seems the permission schema migration didn't run correctly.
> 
> Can you make sure that the permission backend is correctly configured to use
> postgresql on both the kinto-writer and the kinto-reader?

After teaming up with Rémy and Mathieu (like Voltron), we solved the issue (yay). As it turned out, it was simply a matter of ensuring that the kinto read-only user had permission to use the public schema (it didn't).
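For posterity, a minimal sketch of the kind of grants that fix this; the role and database names (kinto_ro, kinto) are placeholders, not the actual Stage values:

```
# Allow the read-only role to use the public schema and read its tables.
# Role, database, and superuser names are placeholders.
psql -U postgres -d kinto -c 'GRANT USAGE ON SCHEMA public TO kinto_ro;'
psql -U postgres -d kinto -c 'GRANT SELECT ON ALL TABLES IN SCHEMA public TO kinto_ro;'
```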
Whilst pairing with :natim we discovered that the following modifications must be made:

* Add more entries to the kinto.signer.resources key (each entry maps a source bucket/collection to its signed destination, i.e. source;destination):
kinto.signer.resources =
        staging/certificates;blocklists/certificates
        staging/add-ons;blocklists/add-ons
        staging/plugins;blocklists/plugins
        staging/gfx;blocklists/gfx

I will make the config modifications in the appropriate places and re-deploy the Stage stack.
Depends on: 1257882
Stage has been deployed with the settings noted in comment #9, as well as kinto-attachment (from bug 1257882). The public endpoint is 37-kinto.stage.mozaws.net, which is currently returning an Internal Server Error (ISE) on HTTP requests (investigation underway).

See also svcops ansible PR #949[0] and puppet PR #1930[1].

[0] https://github.com/mozilla-services/svcops/pull/949/
[1] https://github.com/mozilla-services/puppet-config/pull/1930
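In the meantime, one quick way to narrow down an ISE like this, assuming the standard cliquet heartbeat endpoint is enabled on this build:

```
# The heartbeat reports the status of each backend (storage, permission, cache).
http GET https://37-kinto.stage.mozaws.net/v1/__heartbeat__

# The root URL returns the server-info JSON if the app itself is up.
http GET https://37-kinto.stage.mozaws.net/v1/
```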
Fixes checked in[0] and Stage re-deployed.

Reader: 38-kinto.stage.mozaws.net
Writer: 16-kinto-writer.stage.mozaws.net

[0] https://github.com/mozilla-services/puppet-config/pull/1931
As noted in comment 11 above, this has been deployed. Please let me know what needs to be done in order to verify this deployment so that we can resolve this bug (and, hopefully, move to Prod).

Thank you.
Flags: needinfo?(tarek)
Flags: needinfo?(rhubscher)
Flags: needinfo?(mathieu)
I am not totally aware of the strategy with regard to loading the initial data.

But since we're on stage, we have a bit of freedom. So, to make sure it works, I would suggest triggering a signature on one of the configured kinto.signer.resources.

Using httpie, that would be:
    SERVER_URL=https://16-kinto-writer.stage.mozaws.net


Create buckets and collections for source and destination:

    http PUT $SERVER_URL/v1/buckets/staging --auth user:pass
    http PUT $SERVER_URL/v1/buckets/staging/collections/gfx --auth user:pass
    http PUT $SERVER_URL/v1/buckets/blocklists --auth user:pass
    http PUT $SERVER_URL/v1/buckets/blocklists/collections/gfx --auth user:pass

Create a record in the source collection:

   echo '{"data": {"foo": "test"}}' | http POST $SERVER_URL/v1/buckets/staging/collections/gfx/records --auth user:pass

Trigger a signature operation:

   echo '{"data": {"status": "to-sign"}}' | http PATCH $SERVER_URL/v1/buckets/staging/collections/gfx --auth user:pass


Check that the destination has the record:

    http GET $SERVER_URL/v1/buckets/blocklists/collections/gfx/records

Check that the destination has the signature:

    http GET $SERVER_URL/v1/buckets/blocklists/collections/gfx


Check that the reader returns the same info and that write operations are rejected (405):

    SERVER_URL=https://38-kinto.stage.mozaws.net

    http GET $SERVER_URL/v1/buckets/blocklists/collections/gfx/records

    http GET $SERVER_URL/v1/buckets/blocklists/collections/gfx

    echo '{"data": {"foo": "test"}}' | http POST $SERVER_URL/v1/buckets/staging/collections/gfx/records --auth user:pass

Don't hesitate to ping me on IRC.
Flags: needinfo?(mathieu)
Let's put aside the signature part for now.

In order to verify the deployment, JP is rolling back the stage postgres data using a copy of the prod data, and we are verifying that kinto stage migrates properly.

Once this is validated we can move to prod.
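For the record, a rough sketch of what that rollback looks like; hosts, users, and database names are placeholders, and the actual procedure may well use RDS snapshots instead:

```
# Dump prod and restore it over stage; every name here is a placeholder.
pg_dump -Fc -h prod-db.example.net -U kinto kinto > kinto-prod.dump
pg_restore -h stage-db.example.net -U kinto -d kinto --clean --no-owner kinto-prod.dump

# Then confirm the migration runs cleanly against prod-shaped data.
bin/kinto migrate
```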
Flags: needinfo?(tarek)
Flags: needinfo?(rhubscher)
Summary: Please deploy kinto-dist 0.1.0 release to OneCRL STAGE → Please deploy kinto-dist 0.1.0 release to Kinto (firefox settings) STAGE.
Blocks: 1261333
We ran into a few issues, which I'll briefly cover here.

1) The settings did not include kinto writer node sizing for prod, so it defaulted to the smallest server. The disk configuration on the smallest servers made it impossible to build a working server.

2) The fennec_s3_bucket_name variable had to be added in a couple of spots for prod to match stage.

I'm fixing those here: https://github.com/mozilla-services/puppet-config/pull/1965
and here: https://github.com/mozilla-services/svcops/pull/983
(In reply to Tarek Ziadé (:tarek) from comment #14)
> Once this is validated we can move to prod.

We've done a fair amount of work on both the infrastructure and the application side of things, and in fact, Prod has been updated already (heh).  How are we looking in terms of *this* bug - are we ready to QA and/or close it?
Let's close it, thanks!
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED