Closed Bug 1035960 Opened 11 years ago Closed 11 years ago

Deploy Release 0.3.1 to msisdn-gateway Stage

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jbonacci, Assigned: mostlygeek)

References

Details

(Whiteboard: [qa+])

Stage stack: https://msisdn.stage.mozaws.net updated to 0.3.1
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
OK, verified the deployment to a single m3.medium instance: ec2-54-198-16-11

Versions:
msisdn-gateway-svcops 0.3.1-1 x86_64 49860025
puppet-config-msisdn 20140703171505-1 x86_64 9919

Process: app node msisdn-gateway/index.js

/data/msisdn-gateway/config/production.json is set up to use the mock server http://omxen.dev.mozaws.net/

https://msisdn.stage.mozaws.net returns:
{"name":"mozilla-msisdn-gateway","description":"The Mozilla MSISDN Gateway","version":"0.3.1","homepage":"https://github.com/mozilla-services/msisdn-gateway/","endpoint":"https://msisdn.stage.mozaws.net"}

https://msisdn.stage.mozaws.net/__heartbeat__ returns:
{"storage":true}

Will set up some load testing soon...
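For reference, the two response bodies above can be checked mechanically. This is a minimal sketch, not the actual QA tooling: the expected values are taken verbatim from the responses shown in this comment, and the `checkDeployment` helper name is illustrative only.

```javascript
// Sketch: validate the JSON bodies returned by the stage stack.
// The literals below are copied from the responses quoted above;
// checkDeployment is a hypothetical helper, not project code.
var versionBody = JSON.parse(
  '{"name":"mozilla-msisdn-gateway","description":"The Mozilla MSISDN Gateway",' +
  '"version":"0.3.1","homepage":"https://github.com/mozilla-services/msisdn-gateway/",' +
  '"endpoint":"https://msisdn.stage.mozaws.net"}'
);
var heartbeatBody = JSON.parse('{"storage":true}');

function checkDeployment(version, heartbeat) {
  // The root endpoint should report the release being verified...
  if (version.version !== "0.3.1") return false;
  // ...and the heartbeat should confirm storage (Redis) connectivity.
  return heartbeat.storage === true;
}

console.log(checkDeployment(versionBody, heartbeatBody)); // true
```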
'make test' and 'make bench' from Mac to Stage were successful. Will attempt a short 'make megabench' load test a bit later this evening. Opened some bugs against Ubuntu and/or RHEL for load test issues...
Two short load tests look good. I will continue this tomorrow (Wednesday).
OK, well, the second short load test ran to completion, but there were errors: https://loads.services.mozilla.com/run/f969bcbb-035e-4c5b-ace8-504c5fd32b7f I will investigate tomorrow...
First 5 runs (all short), in chronological order:

RUN 1: https://loads.services.mozilla.com/run/36437a16-cf44-4d65-8b4c-0dd84bb994cd
NO APPARENT ERRORS
Results: Tests over 10000, Successes 10000, Failures 0, Errors 0

RUN 2: https://loads.services.mozilla.com/run/f969bcbb-035e-4c5b-ace8-504c5fd32b7f
385 POSSIBLE ERRORS
Results: Tests over 100000, Successes 99615, Failures 0, Errors 0, addFailure 385

RUN 3: https://loads.services.mozilla.com/run/dab52058-a7c2-4e38-99d8-d730557ed290
2616 POSSIBLE ERRORS
Results: Tests over 10000, Successes 7384, Failures 0, Errors 0, addFailure 2616

RUN 4: https://loads.services.mozilla.com/run/394f0abd-2096-4b37-9774-018663cb4a5e
NO APPARENT ERRORS
Results: Tests over 49850, Successes 49850, Failures 0, Errors 0

RUN 5: https://loads.services.mozilla.com/run/6038bad3-824e-4810-9b66-a0e39de2adad
2899 POSSIBLE ERRORS
Results: Tests over 10000, Successes 7101, Failures 0, Errors 0, addFailure 2899

I will investigate the results in the logs. After that, it looks like I will need to run a longer load test while monitoring the logs to see what I catch...
OK, well, yuck. This will need to be looked at: /media/ephemeral0/msisdn-gateway/msisdn-gateway_err.log has a traceback in it:

events.js:72
        throw er; // Unhandled 'error' event
              ^
TypeError: Cannot call method 'toString' of undefined
    at SecretBox.self.decrypt (/data/msisdn-gateway/node_modules/sodium/lib/secretbox.js:149:30)
    at Object.decrypt (/data/msisdn-gateway/msisdn-gateway/encrypt.js:40:14)
    at /data/msisdn-gateway/msisdn-gateway/index.js:573:34
    at /data/msisdn-gateway/msisdn-gateway/storage/redis.js:211:13
    at try_callback (/data/msisdn-gateway/node_modules/redis/index.js:573:9)
    at RedisClient.return_reply (/data/msisdn-gateway/node_modules/redis/index.js:661:13)
    at HiredisReplyParser.<anonymous> (/data/msisdn-gateway/node_modules/redis/index.js:309:14)
    at HiredisReplyParser.emit (events.js:95:17)
    at HiredisReplyParser.execute (/data/msisdn-gateway/node_modules/redis/lib/parser/hiredis.js:43:18)
    at RedisClient.on_data (/data/msisdn-gateway/node_modules/redis/index.js:534:27)
    at Socket.<anonymous> (/data/msisdn-gateway/node_modules/redis/index.js:91:14)
    at Socket.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:748:14)
    at Socket.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:410:10)
    at emitReadable (_stream_readable.js:406:5)
    at readableAddChunk (_stream_readable.js:168:9)
    at Socket.Readable.push (_stream_readable.js:130:10)
    at TCP.onread (net.js:528:21)

And, well, this is interesting. As of the moment I checked, the nginx logs had not been updated for 20 hours. So it looks like some logging is not working right in Stage? Are there any other/new logs that I don't know about?

So, that's it for the first round of Stage testing. I can't pass this without some understanding of what's going on with the traceback and the logging (or lack thereof).
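A note on the traceback: the call chain runs from the redis client's reply callback into encrypt.js's decrypt, so the TypeError suggests decrypt() received undefined, plausibly because the Redis lookup returned no value (for example, an expired or missing session key). The guard below is a hypothetical sketch of that interpretation, not the actual msisdn-gateway fix; the `safeDecrypt` name and error message are invented for illustration.

```javascript
// Hypothetical guard (not project code): validate the value fetched from
// Redis before handing it to the decrypt routine, so a missing key yields
// a normal callback error instead of an unhandled 'error' event crash.
function safeDecrypt(decryptFn, ciphertext, callback) {
  if (ciphertext === undefined || ciphertext === null) {
    // Surface a handleable error rather than letting decrypt() call
    // .toString() on undefined and crash the process.
    return callback(new Error("missing ciphertext: session not found or expired"));
  }
  var plaintext;
  try {
    plaintext = decryptFn(ciphertext);
  } catch (err) {
    return callback(err);
  }
  return callback(null, plaintext);
}

// Usage sketch with a stand-in decrypt function and a missing value:
safeDecrypt(function (buf) { return buf.toString(); }, undefined, function (err, result) {
  console.log(err ? err.message : result);
  // → "missing ciphertext: session not found or expired"
});
```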
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re: logging. Yes, there was a log_rotate bug; see https://github.com/mozilla-services/puppet-config/pull/664. I'll need to redeploy the box to fix it. Nothing else has changed.

Re: the stack dump, that looks like something for the devs.
OK, msisdn-stage has been re-deployed, fixing the logging bug.
OK, thanks for the logging fix. So, I am just blocked on the traceback info shown in https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c7
Blocks: 1036736
We need the details of the failures in the loads dashboard, and to put the timestamps back on the circus logs.
:natim, so you want more information than what is in https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c6 ?
Yes, but loads doesn't provide that for you. It is tracked here: https://github.com/mozilla-services/loads/issues/268
I just merged https://github.com/mozilla-services/msisdn-gateway/commit/ef4a34291392a0719a4fc012c7d0b1f4b06a5961 We should probably test to run loadtests with DynamoDB configured. Also :tarek fixed the failure display inside the loads cluster.
(In reply to Rémy Hubscher (:natim) from comment #14)
> I just merged
> https://github.com/mozilla-services/msisdn-gateway/commit/ef4a34291392a0719a4fc012c7d0b1f4b06a5961
>
> We should probably test to run loadtests with DynamoDB configured.
>
> Also :tarek fixed the failure display inside the loads cluster.

0.4.0? I don't think we should mix up the deploy tickets. 0.3.0 is in prod and it is broken, so we should at least get this one out before blocking ourselves with testing a DynamoDB backend.
Assignee: nobody → bwong
Yea, I prefer one change at a time since we have not released to Prod in a while. But I also prefer to wait for a possible fix to the issue I found in Comment 7, reproduced in Comment 15.
Yeah I agree. That said, the dynamo backend won't really get in the way if the config points to the redis one
OK, so for the traceback and 302s, I opened this new bug: bug 1037604. Consider it a blocker for any further deployments and testing in Stage and Production. Separately, let's hold off on the changes for dynamo support unless this is a high-priority change.
Depends on: 1037604
I think you mean 502 rather than 302 right?
That reference was pulled from here: https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c15
But, yea, based on your test results - https://loads.services.mozilla.com/run/b3cbd9f1-b8f2-415b-981b-54fcac0a7f7b - it should be 502, not 302.

Two issues to address:
1. The traceback I found in the logs
2. The 502s you found in the loads dashboard
Not going to morph this bug. Resolving as incomplete so we can move on to 0.4.1.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
No longer blocks: 1036736
Switching over to Verified since we are quickly deploying this to Prod and moving right to 0.4.1.
Status: RESOLVED → VERIFIED
Resolution: INCOMPLETE → FIXED