Closed Bug 659569 Opened 13 years ago Closed 13 years ago

Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in redirect loop.

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
Tracking Status
firefox5 - affected
firefox6 - fixed
firefox7 --- fixed

People

(Reporter: jst, Assigned: bjarne)

References

Details

(Keywords: regression, Whiteboard: [qa-])

Attachments

(6 files)

This is so far something I've only ever seen when trying to log into the admin interface on a Ubiquiti wifi radio and but so far it seems like something that can happen elsewhere as well. The problem is that Firefox gets stuck in a redirect loop and ends up displaying the error for that. Hitting try again gets past that and all the way to the login screen, but logging in is unsuccessful. If I at this point clear the cache, I'm able to log in and things work fine for days until I get kicked out and need to log in again.

If I test with a clean profile and disable both the disk cache and the memory cache before navigating to the radio interface then I don't see the above redirect problem.

This is a regression since 4, so I think we should track this for 5 and 6 as this could be a more widespread problem than what I'm seeing.

Unfortunately I so far can only reproduce this issue with one of these radios locally, but I'm bisecting right now to see what changeset caused this, and maybe that'll shed some light on what might be going on here. I can attach logs etc too, if it's not clear from the changeset itself.
This regressed with revision http://hg.mozilla.org/mozilla-central/rev/6fe1e7832dc8. Cc:ing Bjarne.
IOW, the fix for bug 561276 caused this.
Depends on: 561276
This is a URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5 log from before this broke (changeset b490b2316729, with s/PRUInt32/PRUint32/ in nsHttpTransaction.cpp to make it build).
... and in those logs the failing load is from 192.168.1.1.
This redirection uses a pattern which causes nsHttpChannel::CheckCache() to clear |mRedirectedCachekeys| prematurely, hence we lose the redirect-chain and the solution doesn't work anymore. Actually a quite stupid mistake in the original fix, but that's life once in a while...

I was able to reproduce by modifying test for bug #561276 to use the same pattern. Fix is to *not* clear |mRedirectedCachekeys| at the wrong time - see patch.

Passes local testing. Pushed to tryserver.
Assignee: nobody → bjarne
Status: NEW → ASSIGNED
Attachment #535036 - Flags: review?(bzbarsky)
Blocks: 561276
No longer depends on: 561276
Comment on attachment 535036 [details] [diff] [review]
Proposed solution

r=me
Attachment #535036 - Flags: review?(bzbarsky) → review+
Thanks for the quick fix Bjarne! Seems to work wonderfully for the initial page that ends up stuck in the redirect loop, but the need to clear the cache before being able to log in remains. I'll capture logs for that as well.
This is a log with Bjarne's patch applied of the same working build where the login page loads fine, but the login itself fails. The behavior here is that the login page loads, I enter the credentials and hit enter and I'm presented with the same login page again instead of actually getting logged in etc.

The radio name in this question is ap-bud instead of an IP address (sorry, had to switch radios here).
Er, sorry, I confused myself there... The previous log is of a working login, where the login succeeds. The upcoming log is of a session where the login itself does not work.
Here the login page loads fine (no redirect loop), but logging in brings me back to the login page rather than logging in. If I clear the cache, I can successfully log in.
can you guys get this into central so we can get this tested? will consider for 5 and 6.
Asa, this is only partially fixed by the attached patch. We could probably land that separately (and I'll go ahead and do that), but we need more work here.

Bjarne, if it's not clear from the logs what's going on here I could bring in one of these radios and hook it up to the network in the office so you can test it yourself. Let me know if that would help.
Proposed solution pushed as:

http://hg.mozilla.org/mozilla-central/rev/f584467626df

But there's more to do here, so leaving this bug open.
(In reply to comment #13)
> Bjarne, if it's not clear from the logs what's going on here I could bring
> in one of these radios and hook it up to the network in the office so you
> can test it yourself. Let me know if that would help.

That would be infinitely useful, please...  :)
Ok, will do. I'll send you the address and login info etc once it's set up (hopefully tomorrow).
Thanks for the test-setup. Makes life a lot simpler...  :)

The login-issue can be described as follows: From the login-page we POST the credentials. The response is a 302 pointing to "/" and necko translates this into a "GET /" request (see bug #598304). We have already loaded "/" and the result is cached, and it happens to be a 301 pointing to the login-page. Necko follows the redirect to the login-page and we're back to the beginning. The redirect-chain doesn't help here because the response to the login-page is a 200.

This worked previously because necko always validated cached redirects if a cookie was set in the request, hence it didn't use the cached 301-response. The real problem is that the 301-response is cached and it could easily be resolved on the server by specifying no-cache or must-validate.

(Note btw that changing the request-method on a 302 (or 301 for that matter) is dead wrong (see RFC2616, notes at the end of sections 10.3.2 and 10.3.3) but according to bug #598304 it might be necessary to avoid breaking the web.)

But how do we fix this? The original code used the presence of a cookie in the request to detect the situation, but bug #561276 decided that this was too coarse-grained. IMO this situation may occur only for replacement-channels where the request-method has changed. I.e. we could carry fwd the request-method of an originating channel to the replacement-channel and compare.
So the server should be sending a "Vary: Cookie" in this case and is not, right?
Not really. The cookie is also sent in the earlier, cached request. I assume the server applies a design-pattern which uses the cookie to identify a session but maintains state internally in the session, returning 301's to different pages depending on state. The real, uhh, confusing bit is why the server uses a 301 ("moved permanently") for this. By just replacing 301 with 302 the problem would be resolved since 302s are not cached unless explicitly specified...  :(

See e.g. attachment #535107 [details]: The first request for "/" results in a 301 to "cookiechecker". The request for "cookiechecker" results in a 301 back to "/". Then, the request for "/" is found in the cache but CheckCache() decides to validate it (probably because it's captured without the redirect-chain-patch). The subsequent request for "/" sets the cookie, but now the server responds with a 301 to "login". The request for "login" also sets the cookie and results in a 200 and at this point we have a cached 301-response for "/" with the cookie set. After this we see a number of req/res for resources probably displayed on the login-page, before we get to the POST with credentials. Server responds with a 302 to "/" and observe that CheckCache() decides to validate the cached 301-entry for "/" again. This validation saves us because we get a 200 when requesting "/" with cookie set again. Normally we should have re-used the cached 301 which would send us back to "login". This is what happens in attachment #535110 [details].

The question is whether we should hack around this or not. I have a patch which does as suggested in last para of comment #17, but it essentially breaks RFC2616 (i.e. in this case makes the 301 non-cacheable) in order to allow servers to use a design-pattern and misuse 301 like described above. IMO we should go back to whoever writes this stuff and tell them to use 302 instead in these cases.
Fwiw, I tested IE and Chrome with one of these radios and neither browser has a problem logging into them. So I think we need to do *something* here, but I don't know enough about this to say what.
(In reply to comment #19)
> Not really. The cookie is also sent in the earlier, cached request.

To clarify: The cookie-values in this case are the same so "Vary: Cookie" wouldn't help, unfortunately.

(In reply to comment #20)
> Fwiw, I tested IE and Chrome with one of these radios and neither browser
> has a problem logging into them. So I think we need to do *something* here,
> but I don't know enough about this to say what.

Would be interesting to know if IE/Opera receives the same content we do... Otoh I perfectly agree that we must do something - IMO we have 3 options:

1) apply the hack I'll attach in a little while (deliberately allowing such behaviour from servers)

2) back out the fix for bug #561276 (losing the advantage of caching 3xx's which are used correctly)

3) use the power of our current market-share and communicate with whoever writes this software and make them change it

I've encountered the dilemma 1 vs 3 in a number of other bugs and I'd expect to see it in also in future issues, hence it might be worth to take this discussion on a principal level in order to get clear guidelines (or maybe there are some already that I'm unaware of..?)

I don't know how widespread this software is (or whether it is maintained and upgraded) so I'll leave the decision of what to do to others.
I don't like this solution but it allows me to log into the wireless and navigate it. If we want to follow this path, and if considered necessary, I could probably put together a test.
Attachment #535589 - Flags: review?(bzbarsky)
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

Btw, it passes local testing and has been sent to tryserver.
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

Looks ok to me from a code point of view.  I think Patrick or Jason are more qualified to review the HTTP interaction "should we do this?" end of things.
Attachment #535589 - Flags: review?(mcmanus)
Attachment #535589 - Flags: review?(jduell.mcbugs)
Attachment #535589 - Flags: review?(bzbarsky)
Attachment #535589 - Flags: review+
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

I would not be surprised if other browsers are invalidating the target of the redirect from the POST, since the spec says [1]:

"Some HTTP methods MUST cause a cache to invalidate an entity. This is either the entity referred to by the Request-URI, or by the Location or Content-Location headers (if present). These methods are: PUT, DELETE, POST. In order to prevent denial of service attacks, an invalidation based on the URI in a Location or Content-Location header MUST only be performed if the host part is the same as in the Request-URI."

I think we should do precisely what the spec says: If the host in the Location response header field of a POST (or PUT or DELETE) is the same as the host in the request URI, then doom the entries mentioned in the Location header field (and, similarly for Content-Location) when processing the POST (or PUT or DELETE).

[1] http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache-14#section-2.5
Attachment #535589 - Flags: feedback-
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

I don't have a problem with what the patch does, so I'm happy to r+ it. We more or less want to bring the not-from-cache semantics of the POST through to the rewritten request.

If I understand Brian's suggestion it boils down to invalidating any cached resource (probably from a GET) that has POST/PUT/etc performed on it even for future GETs.. that has a certain logical attraction to it, but I would want to ponder how that interaction impacts the cache for deployed websites and imo we  should study that in a different bug.
Attachment #535589 - Flags: review?(mcmanus) → review+
(In reply to comment #25)
> I would not be surprised if other browsers are invalidating the target of
> the redirect from the POST

A fresh pair of eyes is often fruitful - this would be a far better solution (if it applies, which I believe it will)!

I remember reviewing a patch doing something like this some time ago but I cannot find it in any of my searches... Iirc the patch was ok but should be extended with more details so I believe I cleared the r-request. I'll try to dig it up somehow...
Yeah - this is probably also covered by RFC2616 section 13.10 as stated in bug #618835, comment #5. Wonder why I didn't recall this one - good catch Brian. :)

I'll verify that the approach from #618835 fixes the remaining issue of this bug.
WIP patch from bug #618835 fixes this problem. Since comment #14 pushed the fix for the original problem described in this bug, I resolve the remaining issue as a dup of bug #618835. Feel free to correct me if this is wrong procedure.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Updating title to match what was fixed here. Thanks again Bjarne for digging in here! We'll continue tracking the remaining issue in bug 618835. And I'm going to mark this fixed instead, not that it makes much difference...
Resolution: DUPLICATE → FIXED
Summary: Unable to log into Ubiquiti Network radio equipment w/o clearing the cache. → Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in redirect loop.
Comment on attachment 535036 [details] [diff] [review]
Proposed solution

This needs to land on 5 beta and 6 aurora.
Attachment #535036 - Flags: approval-mozilla-beta+
Attachment #535036 - Flags: approval-mozilla-aurora+
Even though this is a Firefox 5 regression, we are going to remove tracking and ship without it because:

1. We don't think the issue is widespread
2. The workaround is pretty simple (just clicking the try again button)
3. The changes are in http code and we are late in the beta cycle

Clearing version-specific tracking flags.
Attachment #535036 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Bjarne, do you need to request aurora review to land this for Fx6?
This is not really a prerequisite for bug #618835 but it is related. Please observe that attachment #535036 [details] [diff] [review] was already approved for aurora (afaics) so I guess I can just request check-in?
Is bug 666021 referring to the same issue as the one that was fixed here? Thanks.
(In reply to comment #36)
> Is bug 666021 referring to the same issue as the one that was fixed here?

Most likely. Could you try the latest nightly build...?
(In reply to Christian Legnitto [:LegNeato] from comment #32)
> 2. The workaround is pretty simple (just clicking the try again button)

I'm not sure it's worth pointing out at this late date, but the Try Again button is not sufficient to resolve this issue. It gets past the redirect warning, but then one gets stuck on an infinite loop on the device's login page. That is, the login page is presented, user enters a username/password and clicks the submit button (Login), and the login page is again presented, ad-infinitum.

My solution has been to use another browser to manage this device, which is kind of annoying, especially because most of the time when I want to manage it it is because there is a network problem in-progress...
Attachment #535589 - Flags: review?(jduell.mcbugs)
qa-: QA is unable to verify this fix, Johnny can you please verify the fix?
Whiteboard: [qa-]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> qa-: QA is unable to verify this fix, Johnny can you please verify the fix?

Sorry for the delay, I wasn't sure if this was targetted at mortals. Under WinXP, Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device that FF6 could not talk to (running Firmware Version: 	XM.v5.2.1). The Aurora particulars:
 Build identifier: Mozilla/5.0 (Windows NT 5.1; rv:9.0a2) Gecko/20110930 Firefox/9.0a2
 Built from http://hg.mozilla.org/releases/mozilla-aurora/rev/79eab065f0d0

So looks good!!
> Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device
> that FF6 could not talk to 

I suppose I need to amend that. I'm afraid I lost track of what version I saw the failure under. It was probably FF4, which appears to have been upgraded away from this system. It appears that FF7.0.1 also works just fine to connect to this device.
Yup, this problem is now a thing of the past. Verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: