Closed Bug 1281977 Opened 8 years ago Closed 8 years ago

auth checks in pingdom

Categories

(Infrastructure & Operations :: MOC: Projects, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdevney, Assigned: mdevney)

Details

Attachments

(1 file)

The following auth checks are showing down on pingdom, but not alerting in any other way. Is this expected behavior? Bad checks? Actionable alerts?

mxr.mozilla.org/addons
mxr.mozilla.org/amo-stage
passwordreset-dev.allizom.org
passwordrest.allizom.org
phonebook-dev.allizom.org
phonebook.allizom.org
phonebook.mozilla.org
plugins.allizom.org
pto.allizom.org
www-demo1.allizom.org
www-demo3.allizom.org
www-demo4.allizom.org
www-dev.allizom.org

For contrast, many other auth checks are fine, such as reps.mozilla.org, pto.mozilla.org, netops-apps.allizom.org.
Tested phonebook.mozilla.org and intranet.mozilla.org. I can log in to both. I suspect the checks are wrong.
Both these checks are configured to go to https://<URL> and look for a 401 code. In prod (plugins.mozilla.org) that happens:

$ curl --head https://plugins.mozilla.org
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: plugins2.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="Use your ldap username/password"
Vary: User-Agent, Accept-Encoding
Cache-Control: max-age=3600
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=31536000; includeSubDomains
Date: Fri, 24 Jun 2016 03:04:03 GMT
Expires: Fri, 24 Jun 2016 4:04:03 GMT
Transfer-Encoding: chunked
Via: Moz-Cache-zlb1
Connection: Keep-Alive
X-Cache-Info: not cacheable; response code not cacheable

In plugins.allizom.org we get something completely different:

$ curl --head https://plugins.allizom.org
HTTP/1.1 303 See Other
Server: Apache
X-Backend-Server: plugins2.stage.webapp.phx1.mozilla.com
Cache-Control: private, must-revalidate
Content-Type: text/html; charset=iso-8859-1
Date: Fri, 24 Jun 2016 03:04:14 GMT
Location: https://plugins.allizom.org/mellon/login?ReturnTo=https%3A%2F%2Fplugins.allizom.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17zkmwil4G5FGu1d8
Transfer-Encoding: chunked
Connection: Keep-Alive

That backend server plugins2.stage.webapp.phx1.mozilla.com belongs to webops, so maybe ask them?
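For anyone comparing the two responses above by script rather than by eye, here is a minimal sketch in Python. It assumes the only two patterns of interest are the ones shown in the curl output: a 401 with a Basic WWW-Authenticate header, and a redirect whose Location points at /mellon/login. The function name and return labels are hypothetical, not part of any existing tooling.

```python
def classify_auth_response(status, headers):
    """Label one HTTP response as 'http-basic', 'sso-redirect', or 'other'.

    Hypothetical helper: the labels and the /mellon/login heuristic are
    assumptions based only on the two curl outputs in this bug.
    """
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    challenge = lowered.get("www-authenticate", "")
    if status == 401 and challenge.lower().startswith("basic"):
        return "http-basic"

    location = lowered.get("location", "")
    if status in (302, 303) and "/mellon/login" in location:
        return "sso-redirect"

    return "other"


# Headers copied from the prod and stage curl output in this bug.
prod_headers = {
    "WWW-Authenticate": 'Basic realm="Use your ldap username/password"',
}
stage_headers = {
    "Location": (
        "https://plugins.allizom.org/mellon/login"
        "?ReturnTo=https%3A%2F%2Fplugins.allizom.org%2F"
        "&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17zkmwil4G5FGu1d8"
    ),
}

print(classify_auth_response(401, prod_headers))
print(classify_auth_response(303, stage_headers))
```

A check that expects 401 would only pass for the first kind of response, which matches what the failing checks have in common.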
< gcox> mxr is dead, fwiw. see the tail of 1279952. https://bugzilla.mozilla.org/show_bug.cgi?id=1279952 Removing mxr from pingdom.
Assignee: nobody → mdevney
Component: MOC: Problems → MOC: Projects
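Auth for multiple websites was moved from LDAP to 2FA/Okta/Mellon. This would have broken the checks, since those websites now authenticate through a web login flow (a redirect to /mellon/login) rather than HTTP basic auth, so they no longer answer with a 401.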
Thanks Ashish. Should we remove these checks? Or update them for 2fa/etc.?
20:35 < atoll> if anything pingdom alarms about securitywiki, change it from expecting 401 to expect 303.
20:36 < jlaz> atoll: roger dodger
20:45 < jedi> atoll: does that go for any site that's been switched over from ldap auth to 2fa/okta ?
20:45 -!- arr1 [arr@moz-428jqk.01t5.ifmj.0181.2601.IP] has joined #systems
20:45 < atoll> yes
20:46 < jedi> :D
Updated pingdom checks from 401 to 303. Now they SHOULD work. Example:

$ curl --head https://phonebook.mozilla.org
HTTP/1.1 303 See Other
Server: Apache
X-Backend-Server: generic3.webapp.phx1.mozilla.com
Vary: Accept-Encoding
Cache-Control: private, must-revalidate
Content-Type: text/html; charset=iso-8859-1
Date: Wed, 29 Jun 2016 05:14:04 GMT
Location: https://phonebook.mozilla.org/mellon/login?ReturnTo=https%3A%2F%2Fphonebook.mozilla.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17eqkpj9pJg4C11d8
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: X-Mapping-ddfpehom=A0CC3698F33F00F4F289BE6E35D1250A; path=/
X-Cache-Info: not cacheable; response code not cacheable

...but they don't. Opened Pingdom support ticket 169759.
Eric Masser (Pingdom) Jun 29, 13:47 CEST

Hi Matthew,

Thank you for contacting us! I am running a simple GET request from each of our probe servers and they are reporting a SSL/TLS issue. Are you seeing the Transaction Monitors requests in the logs for your server?

Best regards,
Eric Masser
www.pingdom.com

Hi Eric,

What IP(s) or hostname(s) should I grep for? Also, what ssl error do you see? I don't see any errors... Everything checks out with 'openssl s_client -connect host:443'

Please advise.

Hi again,

The TMS uses this user-agent string "(Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.1 Safari/534.34)". The issue is probably not a certificate issue as our probes does not validate certificates. I will have to ask our operations team to take a look as to what this TLS/SSL error might mean tomorrow.

Best regards,
Eric Masser
www.pingdom.com

Hi Eric,

Yes, we do see that user-agent hitting our hosts. See below. Looks like that user-agent is properly getting served a 303 as expected.
access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:27:19 +0000] "GET / HTTP/1.1" 303 352 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:27:19 +0000] "GET /mellon/login?ReturnTo=https%3A%2F%2Fintranet.mozilla.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17jns1dxOPXVPy1d8 HTTP/1.1" 303 1333 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:28:36 +0000] "GET / HTTP/1.1" 303 352 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:28:36 +0000] "GET /mellon/login?ReturnTo=https%3A%2F%2Fintranet.mozilla.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17jns1dxOPXVPy1d8 HTTP/1.1" 303 1337 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-11.gz:64.120.6.122 - - [29/Jun/2016:11:41:49 +0000] "GET / HTTP/1.1" 303 352 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-11.gz:64.120.6.122 - - [29/Jun/2016:11:41:49 +0000] "GET /mellon/login?ReturnTo=https%3A%2F%2Fintranet.mozilla.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17jns1dxOPXVPy1d8 HTTP/1.1" 303 1349 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-11.gz:64.120.6.122 - - [29/Jun/2016:11:46:53 +0000] "GET / HTTP/1.1" 303 352 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
access_2016-06-29-11.gz:64.120.6.122 - - [29/Jun/2016:11:46:53 +0000] "GET /mellon/login?ReturnTo=https%3A%2F%2Fintranet.mozilla.org%2F&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17jns1dxOPXVPy1d8 HTTP/1.1" 303 1345 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"

Please advise what may be the matter. Thank you!
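The grep above could also be done with a small script that tallies which status codes the Pingdom agent was served. This is only a sketch: it assumes Apache combined-format lines prefixed with a file name as in the grep output, and the regex, function name, and the third sample line (a curl request) are hypothetical additions for illustration.

```python
import re
from collections import Counter

# Matches the timestamp, request line, status, size, referer and user agent
# of a combined-format access log line. The "file:ip - -" prefix is ignored.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def pingdom_status_counts(lines):
    """Count response status codes for requests whose agent contains PingdomTMS."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "PingdomTMS" in match.group("agent"):
            counts[int(match.group("status"))] += 1
    return counts


PINGDOM_AGENT = (
    "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) "
    "PingdomTMS/0.8.5 PhantomJS/1.9.0 Safari/534.34"
)

SAMPLE = [
    # First two entries are taken from the log excerpt in this bug.
    'access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:27:19 +0000] '
    '"GET / HTTP/1.1" 303 352 "-" "' + PINGDOM_AGENT + '"',
    'access_2016-06-29-05.gz:64.120.6.122 - - [29/Jun/2016:05:27:19 +0000] '
    '"GET /mellon/login?ReturnTo=https%3A%2F%2Fintranet.mozilla.org%2F'
    '&IdP=http%3A%2F%2Fwww.okta.com%2Fexk17jns1dxOPXVPy1d8 HTTP/1.1" '
    '303 1333 "-" "' + PINGDOM_AGENT + '"',
    # Hypothetical non-Pingdom line, included only to show it is skipped.
    'access_2016-06-29-05.gz:192.0.2.10 - - [29/Jun/2016:05:30:00 +0000] '
    '"GET / HTTP/1.1" 200 512 "-" "curl/7.47.0"',
]

print(dict(pingdom_status_counts(SAMPLE)))
```

If the agent only ever shows 3XX entries on our side, the final 200 is presumably being served by a different host, which would fit with the redirect going off to Okta.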
As far as I can tell from the Pingdom interactive UI for building a check, it does not offer the ability to check for 3XX codes, so you might explicitly confirm that understanding with Pingdom support. I would also point out to them that the check still fails when we update it to look for 200, even though the redirects do eventually land on a status 200 page.

Another interesting thing that might be confusing Pingdom's check methods: there are multiple redirects before we even get to the 200 auth page.

$ curl -L -k -v https://phonebook.mozilla.org 2>&1|grep '< HTTP'
< HTTP/1.1 303 See Other
< HTTP/1.1 303 See Other
< HTTP/1.1 302 Moved Temporarily
< HTTP/1.1 200 OK
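To make the redirect chain above concrete, here is a rough Python sketch of what "follow every Location header and record each status" looks like. It does not touch the network: `fetch` is a caller-supplied function, and the table of fake responses below is hypothetical (the idp.example URLs are placeholders, since the curl output above does not show where the later hops go).

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_chain(url, fetch, max_hops=10):
    """Follow Location headers and return the list of status codes seen.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location),
    where location is None when the response carries no Location header.
    """
    statuses = []
    for _ in range(max_hops):
        status, location = fetch(url)
        statuses.append(status)
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return statuses


# Fake responses that reproduce the 303, 303, 302, 200 sequence from curl.
# Only the first hop's path matches this bug; the rest are placeholders.
FAKE_RESPONSES = {
    "https://phonebook.mozilla.org/": (303, "/mellon/login"),
    "https://phonebook.mozilla.org/mellon/login": (303, "https://idp.example/sso"),
    "https://idp.example/sso": (302, "https://idp.example/login"),
    "https://idp.example/login": (200, None),
}

chain = redirect_chain("https://phonebook.mozilla.org/", FAKE_RESPONSES.__getitem__)
print(chain)
```

A client that follows redirects like this never stops on the first 303, so a check that asserts on a single status code depends on which response in the chain the tool reports.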
Eric Masser (Pingdom) Jun 30, 16:26 CEST

Hi again,

It seems as though the responses from your API contains location headers as well, this means that the Transaction Monitor will follow those redirects, just like a regular browser would.

Best regards,
Eric Masser
www.pingdom.com

Hi Eric,

There are 3 redirects here:

< HTTP/1.1 303 See Other
< HTTP/1.1 303 See Other
< HTTP/1.1 302 Moved Temporarily
< HTTP/1.1 200 OK

If any of those codes resulted in the check coming up green I think that would do it for us. However, see screenshot attached (rca.png).

"HTTP status code should be 303 but is unavailable."

This error seems pretty clear that the agent just isn't seeing, or isn't reporting, the code 303. The received header below seems to confirm this. Seems like it should show any/all HTTP headers that it saw, but it's showing none.

Please advise how to make this check work.

Thanks!
Attached image rca.png
Screenshot of pingdom's "Root Cause Analysis" pane showing no headers where headers should be.
Hi again,

The Transaction Monitor will only test the last response in a redirect chain, which means that you can't test the status code of the redirecting responses. The logs you showed earlier only seem to contain 300 series status codes and not the final 200 OK which is seen if you load the URL in a browser. It could be something advanced going on with the redirect to Okta, this does however work fine:

Go to URL https://intranet.mozilla.org/
Wait for element #user-signin to exist

Best regards,
Eric Masser
Eric's recommended check actually seems to work -- and is probably a more thorough check anyway. I've implemented it on most of the erroring auth checks. Example check:

Go to URL https://intranet.mozilla.org/
Wait for element #user-signin to exist
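For a rough idea of what the "wait for element" step is asserting, here is a small Python sketch that looks for an element with a given id in an HTML string. It is only an approximation: Pingdom's Transaction Monitor evaluates the rendered page in a browser, while this inspects static markup, and the class name, function name, and sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_with_id(html, wanted_id):
    """Return True if the markup contains an element with id == wanted_id."""
    finder = IdFinder(wanted_id)
    finder.feed(html)
    return finder.found


# Hypothetical login page markup; the real sign-in page is not shown in this bug.
login_page = '<html><body><form id="user-signin"><input name="user"></form></body></html>'
error_page = '<html><body><h1 id="error">Service Unavailable</h1></body></html>'

print(has_element_with_id(login_page, "user-signin"))
print(has_element_with_id(error_page, "user-signin"))
```

The point of checking for the element rather than a status code is that it confirms the whole redirect chain ended on a working login form, not just that the first hop answered.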
We're down to just 1 alert still alerting in Pingdom. I think we can close this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
So Pingdom indirectly confirmed you can't monitor any 3XX-level status codes with the Transaction Monitor, boo. Also, they didn't seem to address why we don't see any headers in the "RCA" report, boo.