multiple login prompts for intranet.mozilla.org/webtools

RESOLVED FIXED

Status

RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jd, Assigned: brandon)

Tracking

Trunk
x86_64
Linux

(Reporter)

Description

6 years ago
Hi,

I am migrating the https://intranet.mozilla.org/webtools app to a new cluster. In the new location I am having an issue where I get a login prompt over and over again.

There are now multiple web servers and I think my auth token keeps getting invalidated when I get pages from different web servers (the current app is on only one server).  This can be seen by viewing the contents of the 'X-Backend-Server' response header.
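
One way to confirm which backend answers each request is to tally the 'X-Backend-Server' header over repeated requests. The following is a minimal sketch, not part of the application: the function names are hypothetical, and it assumes the header is present on every response, including the 401 returned with a login prompt.

```python
from collections import Counter
import urllib.error
import urllib.request


def tally_backends(fetch_headers, attempts=20):
    """Call fetch_headers() repeatedly and count each X-Backend-Server value seen."""
    seen = Counter()
    for _ in range(attempts):
        headers = fetch_headers()
        seen[headers.get("X-Backend-Server", "<missing>")] += 1
    return seen


def fetch_from_site(url="https://intranet.mozilla.org/webtools/"):
    """Hypothetical fetcher: one GET, returning the response headers.

    A 401 (login prompt) raises HTTPError, which still carries the headers.
    """
    try:
        with urllib.request.urlopen(url) as response:
            return response.headers
    except urllib.error.HTTPError as err:
        return err.headers
```

Calling `tally_backends(fetch_from_site)` from a machine with the hosts entry in place would show whether consecutive requests land on different servers.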

The issue is complicated by the fact that the 'webtools/authenticate/login' page appears to use the 'Referer' request header to decide where to redirect after a successful login.  However, if you come from page X on server1 and then get a login page from server2, which posts back to server1 or server3 (where the token is invalid), you are sent back to the login page, but now with the 'Referer' header set to 'webtools/authenticate/login'.  This places you in an endless loop.

So there are three possible outcomes that I can find:
1) It works (the red herring outcome)
2) You get double auth (misleading as it looks like a simple Apache auth label issue)
3) You fall into the endless auth loop (the hypnotic spiral syndrome)
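
The endless-loop outcome can be illustrated with a small sketch of Referer-based redirect logic. This is not the application's code (which is not shown in this report): the function names, the fallback path, and the example page URL are assumptions made for illustration. A guard like this would only break the loop; it would not fix the token being rejected by other servers.

```python
from urllib.parse import urlsplit

LOGIN_PATH = "/webtools/authenticate/login"  # path taken from the report
DEFAULT_PATH = "/webtools/"                  # assumed fallback landing page


def naive_target(referer):
    """Redirect to whatever path the Referer header names (the behaviour described above)."""
    return urlsplit(referer).path or DEFAULT_PATH


def guarded_target(referer):
    """Same as naive_target, but never send the user back to the login page itself."""
    path = urlsplit(referer).path
    if not path or path.rstrip("/") == LOGIN_PATH:
        return DEFAULT_PATH
    return path
```

With the Referer set to the login page, `naive_target` returns the login path again, which is the loop; `guarded_target` falls back to the default page instead.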

In order to test any of this you will need the following in your hosts file:
63.245.216.89 intranet.mozilla.org

There is a new dev and stage site (both of which have a single web server and work):
https://intranet-dev.allizom.org/webtools/ (dev) (not yet auto-updating)
https://intranet.allizom.org/webtools/ (stage)

Finally I should point out that I placed the application/cache and application/logs folders on NFS to be shared by all the production web servers.  It is possible that I have missed some additional folder in need of sharing.

Please let me know if I can provide any additional information or clarity.
(Reporter)

Updated

6 years ago
Blocks: 774158

Comment 1

6 years ago
Brandon, can you help jd out with this since I'm on vacation next week?
Assignee: laura → bsavage
(Assignee)

Comment 2

6 years ago
Sure, I can help.

Jd, do you have ANY idea where the code is in a Mozilla repo?
(Reporter)

Comment 3

6 years ago
Brandon,

The code for this lives here:

https://github.com/mozilla/webtools-workermgmt.git

and we are at 94ea47d on production.

Please let me know if I can be of further help.
(Reporter)

Comment 4

6 years ago
Brandon,

Hey, we are hoping to switch this over this weekend.  Do you think you will have this working by COB Friday?

tia
(Assignee)

Comment 5

6 years ago
I was off today, but I will finish this up tomorrow with any luck.
(Assignee)

Comment 6

6 years ago
This is a tricky bug. I'll continue working on it, but it's unlikely to be fixed, reviewed and resolved by the weekend.
(Reporter)

Comment 7

6 years ago
Thanks for the update, please continue to keep me informed.

Cheers
(Reporter)

Comment 8

6 years ago
Brandon,

Hey there, just checking in to see if things are looking good for a go this week?

Thanks
(Assignee)

Comment 9

6 years ago
I'm stuck on this bug. I'm going to touch base with Laura tomorrow.
(Assignee)

Comment 10

6 years ago
Any chance I could get a second dev server so I could try and reproduce the problem? Ideally this dev server would be one I could log into and poke around at.
(Reporter)

Comment 11

6 years ago
Brandon,

I am not sure I can do this in a reliable fashion without quite a lot of work (days).  I can probably figure out how to give you some temporary access to the current dev server, but I am not sure that will help as it is a single instance and the bug does not exist there.  It only exists on the production environment as there are multiple web servers there.

If you tell me what information you are looking for I might be able to collect it from the production servers for you.  Would this help or am I way off the mark?
(Assignee)

Comment 12

6 years ago
Without a second server, I won't be able to debug this easily. I've struck out reproducing it locally. If this bug is going to be easy or quick to fix, I really need to be able to poke around, change code, and work on things directly on a dual-server setup.
(Reporter)

Comment 13

6 years ago
Brandon,

I am trying to think about how to set this up without creating a whole new cluster (would take days).  What level of access will you need on the web servers in order to figure out what you need?  I am thinking of creating you an account on the production gear (which is not yet live) but I would not be able to give you root access.  Also I would need to explain to you how to deploy code (via a sudo command on an admin node).

If this is not workable, is there a way you can create some vmware or similar vm locally?

What is the normal procedure that the devs use to test this sort of thing?  Do you use the petri project or anything like that?  Not positive that would help at any rate, I am just curious.
(Assignee)

Comment 14

6 years ago
I think that would be a workable compromise. As long as we can deploy new code and watch the process take place, that should work.
(Reporter)

Comment 15

6 years ago
Brandon,

Okay, 2 more questions.

1) Who is we?  How many accounts are we talking about now?

2) What do you need to "watch" in terms of progress?  This is what I was asking in comment 13 when I asked what level of access you need.  What precisely will you look at on the web servers, or, if it is easier, what commands or tools will you need to execute?  Not trying to be difficult here, but the permissions on the production web servers can be a bit finicky.
(Reporter)

Comment 16

6 years ago
Brandon,

User: bsavage (SSH_KEY== bsavage_gen_1)

Hosts:
intranetadm.private.corp.phx1.mozilla.com (/data/intranet/src/intranet.mozilla.org/webtools/)
intranet1.webapp.corp.phx1.mozilla.com (/data/www/intranet.mozilla.org/webtools/)
intranet1.webapp.corp.phx1.mozilla.com
intranet3.webapp.corp.phx1.mozilla.com

As per the norm you must use either a jumphost or be on a datacenter relevant VPN.
(Reporter)

Comment 17

6 years ago
Brandon,

Just checking in to make sure the new access I set up for you is working as expected?  Is this allowing you to make requisite progress?

Cheers
(Reporter)

Comment 18

6 years ago
Brandon,

Checking in again for an update.  When can I plan to have this issue resolved?

Thanks

Comment 19

6 years ago
This is currently blocking a Q3 goal for IT/Ops. What is the ETA to have this fixed?
(Assignee)

Comment 20

6 years ago
Per email this bug is actively being worked on. Will update with an ETA after diagnosis.
(Assignee)

Comment 21

6 years ago
I tried a few different ways into these machines but wasn't able to get in. Is there a jumphost I should try? I have the MPT VPN but none of the datacenter-specific VPNs.
Depends on: 792023
(Reporter)

Comment 22

6 years ago
Brandon,

You should be able to log in from the MPT VPN now.  Please let me know if you have any issues with this.

Regards
(Assignee)

Comment 23

6 years ago
Thanks, I'm working on this now.
(Assignee)

Comment 24

6 years ago
I'm unable to reach any server other than intranet1.webapp.corp.phx1.mozilla.com. Is there any way to force the load balancer to assign me a server at random?
(Reporter)

Comment 25

6 years ago
Brandon discovered that this was the old php_session issue. I placed the php_value in the LocationMatch block of the Apache vhost file, created the appropriate folder, and so on, and all is well.

Thanks to Brandon for figuring out what the issue was.
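
The thread does not name the exact setting that was changed. As a toy illustration only, the sketch below assumes the problem was session data stored separately on each web server, and shows why requests spread across servers are rejected until all servers read from one shared store. The class, token, and server names are hypothetical.

```python
import itertools


class WebServer:
    """Toy web server that keeps sessions in a dict standing in for its session directory."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, token, user):
        self.sessions[token] = user

    def is_authenticated(self, token):
        return token in self.sessions


def count_rejections(servers, requests=6):
    """Log in on the first server, then round-robin requests and count rejected ones."""
    servers[0].login("token-abc", "jd")
    rotation = itertools.cycle(servers)
    return sum(not next(rotation).is_authenticated("token-abc") for _ in range(requests))


# Each server has its own local store: only the server that handled the login knows the token.
local = [WebServer(f"web{i}", {}) for i in range(1, 4)]

# All servers point at one shared store (for example, a directory on shared storage).
shared_store = {}
shared = [WebServer(f"web{i}", shared_store) for i in range(1, 4)]
```

Here `count_rejections(local)` counts the requests that would trigger another login prompt, while `count_rejections(shared)` counts none.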
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED