Closed Bug 773000 Opened 12 years ago Closed 12 years ago

multiple login prompts for intranet.mozilla.org/webtools

Categories

(Webtools :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jd, Assigned: brandon)

Hi,

I am migrating the https://intranet.mozilla.org/webtools app to a new cluster. In the new location I am having an issue where I get a login prompt over and over again.

There are now multiple web servers and I think my auth token keeps getting invalidated when I get pages from different web servers (the current app is on only one server).  This can be seen by viewing the contents of the 'X-Backend-Server' response header.
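One way to confirm that requests are being spread across several web servers is to repeat a request and tally the 'X-Backend-Server' values that come back. The following is a minimal sketch, not part of the application: the function names `fetch_headers` and `tally_backends` are hypothetical, and it assumes the header is present on every response, including the 401 responses returned before login.

```python
from collections import Counter
import urllib.error
import urllib.request


def fetch_headers(url):
    """Issue a GET request and return the response headers as a dict."""
    try:
        with urllib.request.urlopen(url) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # Error responses (for example 401) still carry response headers.
        return dict(err.headers.items())


def tally_backends(url, attempts=10, fetch=fetch_headers,
                   header="X-Backend-Server"):
    """Count how often each backend name appears across repeated requests."""
    counts = Counter()
    for _ in range(attempts):
        # Header names are case-insensitive, so compare in lower case.
        lowered = {k.lower(): v for k, v in fetch(url).items()}
        counts[lowered.get(header.lower(), "<missing>")] += 1
    return counts
```

Calling `tally_backends("https://intranet.mozilla.org/webtools/")` and seeing more than one key in the result would suggest that consecutive requests are landing on different servers. The `fetch` parameter exists only so the tally can be exercised without network access.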

The issue is complicated by the fact that the 'webtools/authenticate/login' page appears to use the 'Referer' request header to decide where to redirect after a successful login.  However, if you come from page X on server1, then get a login page from server2 that posts back to server[1 or 3] (where the token is invalid), you are sent back to the login page, but now with the 'Referer' header set to 'webtools/authenticate/login'.  As you can see, this places you in an endless loop.

So there are three possible outcomes that I can find:
1) It works (the red herring outcome)
2) You get double auth (misleading as it looks like a simple Apache auth label issue)
3) You fall into the endless auth loop (the hypnotic spiral syndrome)
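
The loop in outcome 3 depends on the login page trusting the Referer header even when that header points back at the login page itself. The sketch below illustrates one possible guard against that and is not the application's actual code: `redirect_target`, the default path, and the example URL '/webtools/reports' are hypothetical, and only the login path comes from the description above.

```python
from urllib.parse import urlparse

# Login path as described in this report.
LOGIN_PATH = "/webtools/authenticate/login"


def redirect_target(referer, default="/webtools/"):
    """Pick a post-login destination from the Referer header.

    Falls back to `default` when the header is missing or when it points
    at the login page itself, which would otherwise restart the loop.
    """
    if not referer:
        return default
    path = urlparse(referer).path or "/"
    if path.rstrip("/") == LOGIN_PATH:
        return default
    return path
```

With this guard, a Referer of 'https://intranet.mozilla.org/webtools/authenticate/login' resolves to the default page instead of the login page. It would only hide the symptom, though: the underlying token invalidation across servers would still need fixing.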

In order to test any of this you will need the following in your hosts file:
63.245.216.89 intranet.mozilla.org

There is a new dev and stage site (both of which have a single web server and work):
https://intranet-dev.allizom.org/webtools/ (dev) (not yet auto-updating)
https://intranet.allizom.org/webtools/ (stage)

Finally I should point out that I placed the application/cache and application/logs folders on NFS to be shared by all the production web servers.  It is possible that I have missed some additional folder in need of sharing.

Please let me know if I can provide any additional information or clarity.
Brandon, can you help jd out with this since I'm on vacation next week?
Assignee: laura → bsavage
Sure, I can help.

Jd, do you have ANY idea where the code is in a Mozilla repo?
Brandon,

The code for this lives here:

https://github.com/mozilla/webtools-workermgmt.git

and we are at 94ea47d on production.

Please let me know if I can be of further help.
Brandon,

Hey, we are hoping to switch this over this weekend.  Do you think you will have this working by COB Friday?

tia
I was off today, but I will finish this up tomorrow with any luck.
This is a tricky bug. I'll continue working on it, but it's unlikely to be fixed, reviewed and resolved by the weekend.
Thanks for the update, please continue to keep me informed.

Cheers
Brandon,

Hey there, just checking in to see if things are looking good for a go this week?

Thanks
I'm stuck on this bug. I'm going to touch base with Laura tomorrow.
Any chance I could get a second dev server so I could try and reproduce the problem? Ideally this dev server would be one I could log into and poke around at.
Brandon,

I am not sure I can do this in a reliable fashion without quite a lot of work (days).  I can probably figure out how to give you some temporary access to the current dev server, but I am not sure that will help as it is a single instance and the bug does not exist there.  It only exists on the production environment as there are multiple web servers there.

If you tell me what information you are looking for I might be able to collect it from the production servers for you.  Would this help or am I way off the mark?
Without a second server, I won't be able to debug this easily. I've struck out reproducing it locally; I really need to be able to poke around, change code, and work on things directly on a dual setup if this bug is going to be easy or quick to fix.
Brandon,

I am trying to think about how to set this up without creating a whole new cluster (would take days).  What level of access will you need on the web servers in order to figure out what you need?  I am thinking of creating you an account on the production gear (which is not yet live) but I would not be able to give you root access.  Also I would need to explain to you how to deploy code (via a sudo command on an admin node).

If this is not workable, is there a way you can create some vmware or similar vm locally?

What is the normal procedure that the devs use to test this sort of thing?  Do you use the petri project or anything like that?  Not positive that would help at any rate, I am just curious.
I think that would be a workable compromise. As long as we can deploy new code and watch the process take place, that should work.
Brandon,

Okay, 2 more questions.

1) Who is we?  How many accounts are we talking about now?

2) What do you need to "watch" in terms of progress?  This is what I am asking in comment 13 when I ask about what level of access you need.  What precisely will you look at on the web servers or, if it is easier, what commands or tools will you need to execute?  Not trying to be difficult here, but the permissions on the production web servers can be a bit finicky.
Brandon,

User: bsavage (SSH_KEY== bsavage_gen_1)

Hosts:
intranetadm.private.corp.phx1.mozilla.com (/data/intranet/src/intranet.mozilla.org/webtools/)
intranet1.webapp.corp.phx1.mozilla.com (/data/www/intranet.mozilla.org/webtools/)
intranet1.webapp.corp.phx1.mozilla.com
intranet3.webapp.corp.phx1.mozilla.com

As per the norm you must use either a jumphost or be on a datacenter relevant VPN.
Brandon,

Just checking in to make sure the new access I set up for you is working as expected?  Is this allowing you to make requisite progress?

Cheers
Brandon,

Checking in again for an update.  When can I plan to have this issue resolved?

Thanks
This is currently blocking a Q3 goal for IT/Ops. What is the ETA to have this fixed?
Per email this bug is actively being worked on. Will update with an ETA after diagnosis.
I tried a few different ways into these machines but I wasn't able to get in. Is there a jumphost I should try? I have the MPT VPN but not any of the specific datacenters.
Brandon,

You should be able to log in from the MPT VPN now.  Please let me know if you have any issues with this.

Regards
Thanks, I'm working on this now.
I'm unable to get a different server besides intranet1.webapp.corp.phx1.mozilla.com. Any method of forcing the load balancer to randomly assign me a server?
Brandon discovered that this was the old php_session issue. I placed the php_value in the location match of the Apache vhost file, created the appropriate folder, etc., and all is well.

Thanks to Brandon for figuring out what the issue was.
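
For readers unfamiliar with this class of problem, here is a toy model of why it produces repeated login prompts. It assumes that the "php_session issue" means each web server kept PHP session data in its own local folder, so a token issued by one server was unknown to the others. The class `Backend`, the function `count_login_prompts`, and the strict round-robin balancing are all hypothetical simplifications and do not reflect the real application or load balancer.

```python
import itertools


class Backend:
    """A web server that accepts only tokens found in its session store."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # set of valid tokens

    def is_authenticated(self, token):
        return token in self.sessions

    def login(self):
        # Issue a new token and record it in this server's store.
        token = f"token-from-{self.name}"
        self.sessions.add(token)
        return token


def count_login_prompts(backends, requests):
    """Send requests round-robin and count how often a login is demanded."""
    rotation = itertools.cycle(backends)
    token = None
    prompts = 0
    for _ in range(requests):
        backend = next(rotation)
        if not backend.is_authenticated(token):
            prompts += 1
            token = backend.login()  # the new token replaces the old one
    return prompts


# Separate stores: every server rejects tokens issued by the others.
local = [Backend(f"server{i}", set()) for i in (1, 2, 3)]

# One shared store, standing in for a session folder all servers can read.
shared_store = set()
shared = [Backend(f"server{i}", shared_store) for i in (1, 2, 3)]
```

In this model, `count_login_prompts(local, 6)` demands a login on every request, while `count_login_prompts(shared, 6)` demands one only on the first. That matches the behaviour reported here: a single-server dev or stage site works, and the multi-server cluster works once all servers read sessions from the same place.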
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED