Closed Bug 1241852 Opened 8 years ago Closed 8 years ago

[careers] Change DNS entries for careers.m.o and careers.a.o (Move to AWS)

Categories

(Infrastructure & Operations Graveyard :: WebOps: Engagement, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: giorgos, Assigned: rwatson)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2497] )

Hi Ops,

We have a new home for Careers website. Please change DNS to

 - careers.mozilla.org CNAME careers-mozilla-org-36000613.us-west-2.elb.amazonaws.com 

 - careers.allizom.org CNAME allizom-org-234627437.us-west-2.elb.amazonaws.com

Please verify by visiting

 https://careers.mozilla.org/static/revision.txt

and

 https://careers.allizom.org/static/revision.txt

both must return da69f52fdef4cec250988efc8dab39bd5fcf6bf0

Thanks!
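The verification step above could be scripted rather than done by hand. This is a minimal sketch, not something used in this bug: the two URLs and the expected hash come from the request, while the function names, the 10 second timeout, and the injectable `fetch` parameter are assumptions made for illustration.

```python
import urllib.request

# Values taken from the request above.
EXPECTED_REVISION = "da69f52fdef4cec250988efc8dab39bd5fcf6bf0"
REVISION_URLS = [
    "https://careers.mozilla.org/static/revision.txt",
    "https://careers.allizom.org/static/revision.txt",
]


def fetch_text(url, timeout=10):
    """Return the body of `url` as stripped text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


def mismatched_revisions(urls, expected, fetch=fetch_text):
    """Map each URL whose body differs from `expected` to what it returned."""
    mismatches = {}
    for url in urls:
        try:
            actual = fetch(url)
        except OSError as exc:
            # URLError and socket timeouts are both OSError subclasses.
            actual = f"error: {exc}"
        if actual != expected:
            mismatches[url] = actual
    return mismatches
```

Calling `mismatched_revisions(REVISION_URLS, EXPECTED_REVISION)` returns an empty dict when both sites serve the expected revision. Passing a different `fetch` callable makes it possible to exercise the comparison without touching the network.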
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2497]
Assignee: server-ops-webops → rwatson
Pushed the change, waiting for propagation.
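While waiting for propagation, a quick check of where a name currently resolves can help. The sketch below uses only the standard library and assumes that `socket.gethostbyname_ex` reports the CNAME target as the canonical name or among the aliases; that behaviour depends on the local resolver, so treat the result as a hint. The function names are hypothetical; the hostnames are the ones from the request.

```python
import socket

# Requested CNAME targets, copied from the original request.
EXPECTED_TARGETS = {
    "careers.mozilla.org": "careers-mozilla-org-36000613.us-west-2.elb.amazonaws.com",
    "careers.allizom.org": "allizom-org-234627437.us-west-2.elb.amazonaws.com",
}


def normalize(name):
    """Lower-case a hostname and drop any trailing dot."""
    return name.rstrip(".").lower()


def resolves_via(name, target, resolve=socket.gethostbyname_ex):
    """Return True if `target` shows up as the canonical name or an alias.

    `resolve` must return (canonical_name, alias_list, address_list),
    which is the shape socket.gethostbyname_ex returns.
    """
    try:
        canonical, aliases, _addresses = resolve(name)
    except OSError:
        # gaierror and herror are OSError subclasses: treat as "not there yet".
        return False
    seen = {normalize(canonical)} | {normalize(alias) for alias in aliases}
    return normalize(target) in seen
```

Looping over `EXPECTED_TARGETS.items()` and calling `resolves_via(name, target)` gives a rough per-host view of whether the local resolver has picked up the change.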
Change Request: --- → routine
Pushed allizom first, the change applied and worked. 
Then pushed prod, but we saw Nagios alerts and the lookup didn't work, so I swiftly changed it back.
:giorgos, can you reach out to me on IRC when you have a second?
Flags: needinfo?(giorgos)
Also, one of our Pingdom checks is failing:
Go to URL http://careers.mozilla.org/admin
HTTP status code should be 401

It's returning 200. Is that check still valid, or should I change it?
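For reference, the status-code comparison that check describes can be reproduced outside Pingdom. This is only a sketch under assumptions: `http_status` and `check_status` are hypothetical names, and `urllib` follows redirects by default, so the status reported is that of the final response, which may differ from what Pingdom sees.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, including 4xx/5xx responses."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is carried on the exception.
        return err.code


def check_status(url, expected, opener=urllib.request.urlopen):
    """Return (matches, actual_status) for a single URL."""
    actual = http_status(url, opener=opener)
    return actual == expected, actual
```

With the values from the check, `check_status("http://careers.mozilla.org/admin", 401)` would return `(False, 200)` for the behaviour reported here.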
For the record: we updated the DNS again and it looks like it took this time.
Nagios is still reporting the host as down. ICMP pings to nodes in AWS are not guaranteed to work, so I disabled the ping check by putting in a dummy ping check for that service. The https-websites check is still valid.

giorgos, I assume you have monitoring in place for careers now?

Sending        phx1.pp
Transmitting file data .
Committed revision 114215.
[ludo@Oulanl mozilla]$ svn diff
Index: services.pp
===================================================================
--- services.pp	(revision 114215)
+++ services.pp	(working copy)
@@ -4800,7 +4800,6 @@
             normal_check_interval => 5,
             hostgroups => $::fqdn ? {
                 'nagios1.private.phx1.mozilla.com' => [
-                    'python-cluster-http-websites',
                 ],
                 default => [
                 ]
@@ -4812,7 +4811,6 @@
             normal_check_interval => 5,
             hostgroups => $::fqdn ? {
                 'nagios1.private.phx1.mozilla.com' => [
-                    'python-cluster-https-websites',
                 ],
                 default => [
                 ]
[ludo@Oulanl mozilla]$ svn commit -m"fixing 1241852"
Sending        services.pp
Transmitting file data .done
Committing transaction...
Committed revision 114216.
[ludo@Oulanl mozilla]$
I appreciate wanting to get this done but in the future no changes like this on a Friday, especially because this is end of day "Giorgos Time".
This came in:

12:41:55 <nagios-phx1> careers.mozilla.org:HTTPS - SSL Cert expiration is CRITICAL: CRITICAL - Socket timeout after 10 seconds (http://m.mozilla.org/HTTPS+-+SSL+Cert+expiration)

Fixed in sysadmins r114261:

13:11:31 <@nagios-scl3> careers.mozilla.org:HTTPS - SSL Cert expiration is OK - OK - Certificate 'careers.mozilla.org' will expire on 2017-11-09 04:00 -0800/PST. Last Checked: 2016-01-23 13:08:39 PST
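A check similar in spirit to the cert expiration alert above can be sketched with the standard library. This is not the Nagios plugin itself: `fetch_not_after` and `time_remaining` are hypothetical names, and the 10 second timeout simply mirrors the socket timeout mentioned in the alert.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443, timeout=10):
    """Connect over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def time_remaining(not_after, now=None):
    """Return a timedelta until `not_after` (negative if already expired).

    `not_after` uses the format getpeercert() reports,
    for example "Nov  9 12:00:00 2017 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires - now
```

A caller could alert when `time_remaining(fetch_not_after("careers.mozilla.org")).days` drops below a chosen threshold. A timeout while connecting raises an `OSError`, which roughly corresponds to the CRITICAL socket timeout state in the first alert.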
(In reply to Ben (:bensternthal) from comment #7)
> I appreciate wanting to get this done but in the future no changes like this
> on a Friday, especially because this is end of day "Giorgos Time".

FTR, w0ts0n and I worked on both stage and prod together and verified that it worked OK during European hours. AFAIK there's still work to be done to update the Nagios alerts to play well with AWS.

That said I'll avoid filing similar bugs in the future on Fridays. Thanks Ben!
Flags: needinfo?(giorgos)
(In reply to Ryan Watson [:w0ts0n] from comment #5)
> giorgos I assume you having monitoring in place for careers now?

I have a New Relic application alert policy configured to HTTP GET both prod and stage, email me, and ping the vectorvictor bot on IRC:

https://rpm.newrelic.com/accounts/263620/application_alert_policies?search[q]=careers.mozilla.org
Per bug 1242454, this still needs some monitoring work, but this bug can be closed.
Status: NEW → RESOLVED
Closed: 8 years ago
Depends on: 1242454
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard