Closed Bug 781838 Opened 12 years ago Closed 11 years ago

Develop and stage zeus rule changes for browserid

Categories

(Cloud Services :: Operations: Miscellaneous, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gene, Assigned: gene)

Details

(Whiteboard: [qa+])

Develop and stage rule changes on 8/10 for these tickets

https://github.com/mozilla/browserid/issues/2038
https://github.com/mozilla/browserid/issues/2097

Deploy to prod on 8/20
Assignee: nobody → gene
Whiteboard: [qa+]
I did not manage to stage these changes on 8/10. I will try to find a block where I'm free between meetings on Wednesday or Thursday to stage them
Add in this request :

https://github.com/mozilla/browserid/issues/2038
I built a small test harness because all of these requests were getting confusing. Here are my assumptions about the behavior that's desired. The 3 columns are :
1. GET or POST : What method the request is
2. URL : The URL the client is calling
3. Result : Either the location that is returned in the http header for a 301/302 redirect, or if it's not a redirect, then the http status code

Could someone look through these test cases and correct any that I have wrong (since I'm developing the zeus rules against this)?

Testcases : 
GET http://diresworb.org/ https://login.anosrep.org/
GET https://diresworb.org/ https://login.anosrep.org/
GET http://www.diresworb.org/ https://login.anosrep.org/
GET https://www.diresworb.org/ https://login.anosrep.org/
GET http://anosrep.org/ https://login.anosrep.org/
GET https://anosrep.org/ https://login.anosrep.org/
GET http://www.anosrep.org/ https://login.anosrep.org/
GET https://www.anosrep.org/ https://login.anosrep.org/
GET http://static.login.anosrep.org/ https://login.anosrep.org/
GET https://static.login.anosrep.org/ https://login.anosrep.org/
GET https://login.anosrep.org/v/fb5534092a/production/browserid.css 200
GET http://verifier.login.anosrep.org/ https://verifier.login.anosrep.org/
GET https://verifier.login.anosrep.org/ 200
GET https://verifier.login.anosrep.org/verify 404
GET http://login.anosrep.org/ https://login.anosrep.org/
GET https://login.anosrep.org/ 200
POST http://login.anosrep.org/ 400
POST https://login.anosrep.org/ 404
POST http://verifier.login.anosrep.org/verify 400
POST https://verifier.login.anosrep.org/verify 400
Status: NEW → ASSIGNED
Here are the changes I've made to the zeus rules and the resulting before and after impact on the testcases :

Before :
FAILURE : GET : http://diresworb.org/ : https://login.anosrep.org/ : https://diresworb.org/
SUCCESS : GET : https://diresworb.org/ : https://login.anosrep.org/
FAILURE : GET : http://www.diresworb.org/ : https://login.anosrep.org/ : https://www.diresworb.org/
SUCCESS : GET : https://www.diresworb.org/ : https://login.anosrep.org/
FAILURE : GET : http://anosrep.org/ : https://login.anosrep.org/ : https://anosrep.org/
SUCCESS : GET : https://anosrep.org/ : https://login.anosrep.org/
FAILURE : GET : http://www.anosrep.org/ : https://login.anosrep.org/ : https://www.anosrep.org/
FAILURE : GET : https://www.anosrep.org/ : https://login.anosrep.org/ : 200
FAILURE : GET : http://static.login.anosrep.org/ : https://login.anosrep.org/ : https://static.login.anosrep.org/
FAILURE : GET : https://static.login.anosrep.org/ : https://login.anosrep.org/ : 200
SUCCESS : GET : https://login.anosrep.org/v/fb5534092a/production/browserid.css : 200
SUCCESS : GET : http://verifier.login.anosrep.org/ : https://verifier.login.anosrep.org/
SUCCESS : GET : https://verifier.login.anosrep.org/ : 200
SUCCESS : GET : https://verifier.login.anosrep.org/verify : 404
SUCCESS : GET : http://login.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://login.anosrep.org/ : 200
SUCCESS : POST : http://login.anosrep.org/ : 400
SUCCESS : POST : https://login.anosrep.org/ : 404
SUCCESS : POST : http://verifier.login.anosrep.org/verify : 400
SUCCESS : POST : https://verifier.login.anosrep.org/verify : 400



After :
SUCCESS : GET : http://diresworb.org/ : https://login.anosrep.org/
SUCCESS : GET : https://diresworb.org/ : https://login.anosrep.org/
SUCCESS : GET : http://www.diresworb.org/ : https://login.anosrep.org/
SUCCESS : GET : https://www.diresworb.org/ : https://login.anosrep.org/
SUCCESS : GET : http://anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : http://www.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://www.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : http://static.login.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://static.login.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://login.anosrep.org/v/fb5534092a/production/browserid.css : 200
SUCCESS : GET : http://verifier.login.anosrep.org/ : https://verifier.login.anosrep.org/
SUCCESS : GET : https://verifier.login.anosrep.org/ : 200
SUCCESS : GET : https://verifier.login.anosrep.org/verify : 404
SUCCESS : GET : http://login.anosrep.org/ : https://login.anosrep.org/
SUCCESS : GET : https://login.anosrep.org/ : 200
SUCCESS : POST : http://login.anosrep.org/ : 400
SUCCESS : POST : https://login.anosrep.org/ : 404
SUCCESS : POST : http://verifier.login.anosrep.org/verify : 400
SUCCESS : POST : https://verifier.login.anosrep.org/verify : 400



Changes :
--- /home/gene/sandbox/781838/1a
+++ /home/gene/sandbox/781838/1b
@@ -1,16 +1,14 @@
-$host = "login.anosrep.org";
 $path = http.getPath();
-$hostheader = http.getHostHeader();
 
 if ($path != "/__heartbeat__") {
-   if ($hostheader == "anosrep.org" ||
-       $hostheader == "diresworb.org" ||
-       $hostheader == "www.diresworb.org") {
+   $host = http.getHostHeader();
+   $sites = ["diresworb.org", "www.diresworb.org", "anosrep.org", "www.anosrep.org", "static.login.anosrep.org"];
+   if (array.contains($sites, $host)) {
       $qs = http.getQueryString();
       if ($qs == "") {
-         http.redirect("https://" . $host . $path);
+         http.redirect("https://login.anosrep.org" . $path);
       } else {
-         http.redirect("https://" . $host . $path . "?" . $qs);
+         http.redirect("https://login.anosrep.org" . $path . "?" . $qs);
       }
    }
 }



--- /home/gene/sandbox/781838/2a
+++ /home/gene/sandbox/781838/2b
@@ -1,7 +1,12 @@
 $method = http.getMethod();
 if ($method == 'GET' || $method == 'HEAD') {
+   $sites = ["diresworb.org", "www.diresworb.org", "anosrep.org", "www.anosrep.org", "static.login.anosrep.org"];
    $host = http.getHostHeader();
-   http.changeSite(string.append("https://", $host));
-}else {
+   if (array.contains($sites, $host)) {
+     http.changeSite(string.append("https://login.anosrep.org/"));
+   } else {
+     http.changeSite(string.append("https://", $host));
+   }
+} else {
    http.sendResponse('400 Bad Non-SSL Requests Please Use HTTPS', 'application/json', '{"error": "Please use HTTPS rather than HTTP"}', '');
 }
And here's the script to run the test cases

Test Script :
#!/bin/bash

cat testcases.txt | while read line; do
  set -- $line
  if [ "$1" = "POST" ]; then
    willpost="-d foo=bar"
  fi
  page=`curl $willpost -si --progress $2`
  if echo "$page" | egrep "^Location: " >/dev/null; then
    result=`echo "$page" | egrep "^Location: " | sed -e 's/Location: //g' | tr -d '\r'`
  else
    result=`echo "$page" | egrep "^HTTP/1\.1 " | sed -e 's#HTTP/1\.1 \([^ ]*\).*#\1#g' | tr -d '\r'`
  fi

  if [ "$result" = "$3" ]; then
    echo "SUCCESS : $1 : $2 : $3"
  else
    echo "FAILURE : $1 : $2 : $3 : $result"
  fi
done
I will need validation of the test cases I'm using (additions, corrections, etc) before being able to move forward with a production change. I'd originally planned to deploy to production yesterday but am delayed. As soon as I can get someone to look at these test cases, I can schedule a changewindow to change this in production.
I need to ponder this more, but one thing not right is that we redirect any request to static.login.anosrep.org to login.persona.org, which defeats the purpose of the static domain.

Also, nit, and don't know which language this is (tcl?) but is this a no-op?:
  'string.append("https://login.anosrep.org/")'
Gene: I believe there are some errors in these new rules.

To be complete, here they are all:

GET http://diresworb.org/ https://login.anosrep.org/

--> yes, assuming this is for all paths.

GET https://diresworb.org/ https://login.anosrep.org/

--> yes, assuming this is for all paths.


GET http://www.diresworb.org/ https://login.anosrep.org/

--> yes, assuming this is for all paths.

GET https://www.diresworb.org/ https://login.anosrep.org/

--> yes, assuming this is for all paths.

GET http://anosrep.org/ https://login.anosrep.org/

--> yes, I suggest doing this only for the top-level path "/", while other paths should give a 404 (so that folks don't start depending on other anosrep.org URLs.)

GET https://anosrep.org/ https://login.anosrep.org/

--> same as previous comment.

GET http://www.anosrep.org/ https://login.anosrep.org/

--> same as previous, suggest redirecting www.anosrep to anosrep for all paths, and letting other redirects / 404s handle it from there.

GET https://www.anosrep.org/ https://login.anosrep.org/

--> same as previous, suggest redirect www.anosrep to anosrep for all paths, and letting other redirects / 404s handle it from there.

GET http://static.login.anosrep.org/ https://login.anosrep.org/

--> for top-level path only.

GET https://static.login.anosrep.org/ https://login.anosrep.org/

--> for top-level path only.

GET https://login.anosrep.org/v/fb5534092a/production/browserid.css 200

--> great.

GET http://verifier.login.anosrep.org/ https://verifier.login.anosrep.org/

--> sure

GET https://verifier.login.anosrep.org/ 200

--> no, 404.

GET https://verifier.login.anosrep.org/verify 404

--> no, 405, method not allowed (but URL is okay.)

GET http://login.anosrep.org/ https://login.anosrep.org/

--> yes, for all paths.

GET https://login.anosrep.org/ 200

--> yes.

POST http://login.anosrep.org/ 400

--> no, 405.

POST https://login.anosrep.org/ 404

--> no, 405.

POST http://verifier.login.anosrep.org/verify 400

--> yes.

POST https://verifier.login.anosrep.org/verify 400

--> OH NO, that's definitely supposed to work, that's the URL we advertise.


Can you write the corresponding script, or would you like the dev team to work on this?
Ben, great comments!

This all does point out a need to formalize a script to verify all the various requests that should/should't work. I have a gist here: https://gist.github.com/3422644

On the OH NO, the gist has a bit of refinement to get a 200 from /verify that shows it got all the way through. 

But to get back to the original motivation for these changes, that was https://github.com/mozilla/browserid/issues/2097 - 'login.persona.org/verify can be used as a verifier'

Let's target a fix now to close off that URL and nothing else (and not closing off using www.persona.org as the main site GH-2038).
:jrgm > Also, nit, and don't know which language this is (tcl?) but is this a no-op?: 'string.append("https://login.anosrep.org/")'

Yup, good catch, I'll fix that.

:jrgm > Let's target a fix now to close off that URL and nothing else (and not closing off using www.persona.org as the main site GH-2038).

Difficulty is that the way that all this redirect logic was implemented initially is very fragile and it's very difficult to make changes that would only affect the request in issue 2097 and not impact the other redirects. That was the reason for trying to setup test cases. I'll see how hard it would be to do just one piece.

:jrgm, excellent test case script (in the gist). I'll go through and updated/add to it to reflect Ben's comments above, repost it and test/develop against it.
:jrgm
  I've forked and updated your gist of check-persona-url.py to include my understanding of Ben's requests above.

https://gist.github.com/3423771/95ff5cb7a6add5d0dda43e5930332577cb82dfe8
Ben,
  Just wanted to confirm a few of your test cases since they deviate from what I would expect. These questions are in paralell to the changes required for this ticket so this side conversation won't block me moving forward.

You'd said "GET http://static.login.anosrep.org/ https://login.anosrep.org/

--> for top-level path only."

Does this mean you really want to allow serving of static resources over http? For example, you want to serve back a 200 when a request for http://login.anosrep.org/v/fb5534092a/production/browserid.css ?


You'd said "POST http://verifier.login.anosrep.org/verify 400

--> yes."

Do you really want to serve back a 400 Bad Request on an http POST to "http://verifier.login.anosrep.org/verify" or would you like to serve a 405 Method Not Allowed like we do for other POSTs to http?


You'd said this "--> yes, I suggest doing this only for the top-level path "/", while other paths should give a 404 (so that folks don't start depending on other anosrep.org URLs.)" and  "suggest redirecting www.anosrep to anosrep for all paths, and letting other redirects / 404s handle it from there." about these urls :
GET http://www.anosrep.org/ https://login.anosrep.org/
GET https://www.anosrep.org/ https://login.anosrep.org/

Which would mean that :
1. a request for "http://www.anosrep.org/" would redirect to "http://anosrep.org/" which wound then do a second redirect to "https://login.anosrep.org/"
2. a request for "https://www.anosrep.org/" would redirect to "https://anosrep.org/" which wound then do a second redirect to "https://login.anosrep.org/"
3. a request for "http://www.anosrep.org/about" would return a 404
4. a request for "https://www.anosrep.org/about" would return a 404

Is this what you want instead of scenarios 1 and 2 doing a single redirect to "https://login.anosrep.org/"?
I have made the following changes to the staging zeus rules

http://gene.pastebin.mozilla.org/1770252

to better conform to my understanding of the test cases. Some of the test cases do not pass yet. I believe that the *requirements* for this ticket which originate in the 2 github issues mentioned originally are satisfied by these changes.

In order to move forward I'd like to propose this question :

Ben, as product owner, does the current behavior in staging (as illustrated either by hitting urls in staging or running the updated python test case tester) satisfy the requirements closely enough for you to approve moving to production? (yes or no requested)

If "no" we can continue the discussion on these test cases and what is desired. If "no" please indicate what we need to change either by 1. identifying an existing test case that doesn't pass but needs to, or 2. identifying an existing test case that needs to be changed and needs to pass, or 3. identifying a new test case that doesn't yet exist and must pass.
*** If (2) or (3), please manifest this changed or new test case, not in english in this ticket but in a forked/modified test case code base.

If "yes" I will schedule a change window and deploy to production
(In reply to Eugene Wood [:gene] from comment #13)
> 
> You'd said "GET http://static.login.anosrep.org/ https://login.anosrep.org/
> 
> --> for top-level path only."
> 
> Does this mean you really want to allow serving of static resources over
> http? For example, you want to serve back a 200 when a request for
> http://login.anosrep.org/v/fb5534092a/production/browserid.css ?

Ahah, good catch. So for I would suggest that

- top-level static path on HTTP redirect to top-level login.* path on HTTPS.
- every other URL on plain HTTP static should give a 404.

so that we only serve static content from https.

> You'd said "POST http://verifier.login.anosrep.org/verify 400
> 
> --> yes."
> 
> Do you really want to serve back a 400 Bad Request on an http POST to
> "http://verifier.login.anosrep.org/verify" or would you like to serve a 405
> Method Not Allowed like we do for other POSTs to http?

Let me tweak the request based on a conversation with Francois. For all plain HTTP requests to the verifier, we should give a 404 with an error message that says "the verifier is only available over SSL."

> 1. a request for "http://www.anosrep.org/" would redirect to
> "http://anosrep.org/" which wound then do a second redirect to
> "https://login.anosrep.org/"

Yes, I'm okay with this 2-step redirect because in case anyone caches the www --> non-www redirect (the first one), and we start serving diff stuff at anosrep.org, then I'd like that to work.

> 2. a request for "https://www.anosrep.org/" would redirect to
> "https://anosrep.org/" which wound then do a second redirect to
> "https://login.anosrep.org/"

Same reasoning. There's a chance we serve something different at anosrep.org in the future, don't want to write rules that imply that login.* and anosrep.org are the same site.

> 3. a request for "http://www.anosrep.org/about" would return a 404
> 4. a request for "https://www.anosrep.org/about" would return a 404

Yes, I'd like that. I would rather be stricter about the API we expose and then loosen it if needed.
Ben,
  2 things.

1. I still need a response to Comment 14 in order to move forward.
2. I've updated jrgm's script to accommodate your requests in Comment 15. ( https://github.com/jrgm/checkurl/blob/master/check-persona-url.py )

I'll wait on a yes or no response to Comment 14 and either schedule a change window to push to prod or  wait for you to make updates to the test cases.
Gene: is there a reason we need to push out these changes before we fix last details? If so, I'll work to make sure they're all good, but if not, I would prefer we hold off until we've got all the changes ready to go.
Ben, I'm in no hurry I just feel bad that the change was supposed to go out last Wednesday, I said I'd get it out this Monday and it's now Thursday. If everyone else is ok with the delay, I am too.
There are a couple of bugs in the test cases that I want to fix, one of which is quite important, and then I'd like a couple more pairs of eyes on this amount of change.

I'm also going to hand off this bug to Francois Marier to own, and I think I'm talking to him today :)
Assigning to Francois per comment #19.

Also - This adds risk to the release and is behind schedule.

Why is it a beta blocker? This improvement can be made afterwards.

@gene how comfortable are you with this change for Beta?
Assignee: gene → francois
In that I've already staged the changes that causes the current makeup of test cases to succeed and fail as they do, I'm comfortable deploying it to production if that's what we want. If there are further changes needed due to either new test cases or due to identification of existing failing test cases that should be passing, then I'd be less comfortable but still open.
So https://id.etherpad.mozilla.org/crew-devops said

> [Richard+Austin] figure out Zeus/Nginx refactor post beta1

Richard got his part done insanely fast, because he is awesome.

However, I don't want to add risk to beta accidentally.

QA has not done manual testing of these rules yet.

We should be ready to rollback this feature (in stage), if need be.
(In reply to Austin King [:ozten] from comment #22)
> So https://id.etherpad.mozilla.org/crew-devops said
> 
> > [Richard+Austin] figure out Zeus/Nginx refactor post beta1

For the record, this bug is *not* about Zeus/Nginx refactor.
(In reply to Richard Soderberg [:atoll] from comment #23)

Thanks, sorry for the noise.
Gene: I want to take a close look at all of the test cases before this is deployed. Ben thought that one of them would break backwards compatibility with existing sites using the old browserid.org URLs.

I'm planning to do this on Monday.
Gene, the answer to your question in comment 14 is "no" (i.e. the existing
rules are not ready for production) and so I have modified/added a few test
cases in my fork of checkurl:

  https://github.com/fmarier/checkurl

The reasoning for each change are in the commit messages.

Also, there are two things that aren't checked by this script and that I
believe should be done at the same time:

1- providing a matching "Allow" header along with any 405 responses as you
point out in https://github.com/mozilla/browserid/issues/2097#issuecomment-7932294

2- providing a "Strict-Transport-Security" header on all browserid.org
redirects (this is point 1 in https://bugzilla.mozilla.org/show_bug.cgi?id=779710 )

I would also suggest changing this error message from:

  "Wrong URL for the verifier. Please ask  the site administrator to use
  verifier.login.persona.org."

to:

  "Wrong URL for the verifier. Please ask the site administrator to use:
  https://verifier.login.persona.org/verify"

to make it more precise.

Finally, I've opened two new bugs ( https://github.com/mozilla/browserid/issues/2419
and https://github.com/mozilla/browserid/issues/2420 ) which may add/change
a few test cases depending on whether or not we decide to tackle these
related issues at the same time.
Due to https://github.com/mozilla/browserid/issues/2476 and the fact that this isn't a blocker for Beta, we're rolling back this Zeus config and we'll ship post-Beta.
I have updated my checkup fork to include tests for GH-2419 and GH-2420 (assuming our access logs show that we can cut these off now).
Assignee: francois → gene
Found a current flaw in current production rules :

zlb*.pub.scl2 hosts: Rule route-verifier-requests: Line 6: Rule selected an unknown pool 'scl2-persona'
These changes have been completed and deployed
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.