Last Comment Bug 659569 - Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in redirect loop.
: Unable to connect to Ubiquiti Network radio equipment w/o getting stuck in re...
Status: VERIFIED FIXED
[qa-]
: regression
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All All
: -- normal with 2 votes (vote)
: ---
Assigned To: Bjarne (:bjarne)
:
: Patrick McManus [:mcmanus]
Mentors:
: 672394 676392 (view as bug list)
Depends on:
Blocks: 561276
  Show dependency treegraph
 
Reported: 2011-05-24 22:24 PDT by Johnny Stenback (:jst, jst@mozilla.com)
Modified: 2013-12-27 14:30 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
affected
-
fixed
fixed


Attachments
Log from before this broke, login page loads fine. (483.60 KB, text/plain)
2011-05-25 00:16 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags Details
Log from after this broke, login page does not loads, redirect loop. (242.55 KB, text/plain)
2011-05-25 00:22 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags Details
Proposed solution (3.47 KB, patch)
2011-05-25 04:58 PDT, Bjarne (:bjarne)
bzbarsky: review+
jst: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta-
Details | Diff | Splinter Review
Log from before this broke, login succeeds (827.83 KB, text/plain)
2011-05-25 09:57 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags Details
Log from current trunk with Bjarne's patch, login fails. (2.52 MB, text/plain)
2011-05-25 10:05 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags Details
Hack to detect and fix pattern as described in comment #17 (8.14 KB, patch)
2011-05-27 02:33 PDT, Bjarne (:bjarne)
bzbarsky: review+
mcmanus: review+
brian: feedback-
Details | Diff | Splinter Review

Description Johnny Stenback (:jst, jst@mozilla.com) 2011-05-24 22:24:58 PDT
This is so far something I've only ever seen when trying to log into the admin interface on a Ubiquiti wifi radio and but so far it seems like something that can happen elsewhere as well. The problem is that Firefox gets stuck in a redirect loop and ends up displaying the error for that. Hitting try again gets past that and all the way to the login screen, but logging in is unsuccessful. If I at this point clear the cache, I'm able to log in and things work fine for days until I get kicked out and need to log in again.

If I test with a clean profile and disable both the disk cache and the memory cache before navigating to the radio interface then I don't see the above redirect problem.

This is a regression since 4, so I think we should track this for 5 and 6 as this could be a more widespread problem than what I'm seeing.

Unfortunately I so far can only reproduce this issue with one of these radios locally, but I'm bisecting right now to see what changeset caused this, and maybe that'll shed some light on what might be going on here. I can attach logs etc too, if it's not clear from the changeset itself.
Comment 1 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 00:13:00 PDT
This regressed with revision http://hg.mozilla.org/mozilla-central/rev/6fe1e7832dc8. Cc:ing Bjarne.
Comment 2 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 00:13:47 PDT
IOW, the fix for bug 561276 caused this.
Comment 3 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 00:16:33 PDT
Created attachment 534997 [details]
Log from before this broke, login page loads fine.

This is a URILoader:5,LoadGroup:5,DocLoader:5,nsHttp:5,nsSocketTransport:5 log from before this broke (changeset b490b2316729, with s/PRUInt32/PRUint32/ in nsHttpTransaction.cpp to make it build).
Comment 4 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 00:22:44 PDT
Created attachment 534998 [details]
Log from after this broke, login page does not loads, redirect loop.
Comment 5 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 00:26:08 PDT
... and in those logs the failing load is from 192.168.1.1.
Comment 6 Bjarne (:bjarne) 2011-05-25 04:58:31 PDT
Created attachment 535036 [details] [diff] [review]
Proposed solution

This redirection uses a pattern which causes nsHttpChannel::CheckCache() to clear |mRedirectedCachekeys| prematurely, hence we lose the redirect-chain and the solution doesn't work anymore. Actually a quite stupid mistake in the original fix, but that's life once in a while...

I was able to reproduce by modifying test for bug #561276 to use the same pattern. Fix is to *not* clear |mRedirectedCachekeys| at the wrong time - see patch.

Passes local testing. Pushed to tryserver.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2011-05-25 07:25:22 PDT
Comment on attachment 535036 [details] [diff] [review]
Proposed solution

r=me
Comment 8 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 09:38:52 PDT
Thanks for the quick fix Bjarne! Seems to work wonderfully for the initial page that ends up stuck in the redirect loop, but the need to clear the cache before being able to log in remains. I'll capture logs for that as well.
Comment 9 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 09:57:30 PDT
Created attachment 535107 [details]
Log from before this broke, login succeeds

This is a log with Bjarne's patch applied of the same working build where the login page loads fine, but the login itself fails. The behavior here is that the login page loads, I enter the credentials and hit enter and I'm presented with the same login page again instead of actually getting logged in etc.

The radio name in this question is ap-bud instead of an IP address (sorry, had to switch radios here).
Comment 10 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 10:01:05 PDT
Er, sorry, I confused myself there... The previous log is of a working login, where the login succeeds. The upcoming log is of a session where the login itself does not work.
Comment 11 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 10:05:41 PDT
Created attachment 535110 [details]
Log from current trunk with Bjarne's patch, login fails.

Here the login page loads fine (no redirect loop), but logging in brings me back to the login page rather than logging in. If I clear the cache, I can successfully log in.
Comment 12 Asa Dotzler [:asa] 2011-05-25 11:54:41 PDT
can you guys get this into central so we can get this tested? will consider for 5 and 6.
Comment 13 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 15:15:24 PDT
Asa, this is only partially fixed by the attached patch. We could probably land that separately (and I'll go ahead and do that), but we need more work here.

Bjarne, if it's not clear from the logs what's going on here I could bring in one of these radios and hook it up to the network in the office so you can test it yourself. Let me know if that would help.
Comment 14 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 15:24:31 PDT
Proposed solution pushed as:

http://hg.mozilla.org/mozilla-central/rev/f584467626df

But there's more to do here, so leaving this bug open.
Comment 15 Bjarne (:bjarne) 2011-05-25 15:31:41 PDT
(In reply to comment #13)
> Bjarne, if it's not clear from the logs what's going on here I could bring
> in one of these radios and hook it up to the network in the office so you
> can test it yourself. Let me know if that would help.

That would be infinitely useful, please...  :)
Comment 16 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-25 16:20:24 PDT
Ok, will do. I'll send you the address and login info etc once it's set up (hopefully tomorrow).
Comment 17 Bjarne (:bjarne) 2011-05-26 07:22:18 PDT
Thanks for the test-setup. Makes life a lot simpler...  :)

The login-issue can be described as follows: From the login-page we POST the credentials. The response is a 302 pointing to "/" and necko translates this into a "GET /" request (see bug #598304). We have already loaded "/" and the result is cached, and it happens to be a 301 pointing to the login-page. Necko follows the redirect to the login-page and we're back to the beginning. The redirect-chain doesn't help here because the response to the login-page is a 200.

This worked previously because necko always validated cached redirects if a cookie was set in the request, hence it didn't use the cached 301-response. The real problem is that the 301-response is cached and it could easily be resolved on the server by specifying no-cache or must-validate.

(Note btw that changing the request-method on a 302 (or 301 for that matter) is dead wrong (see RFC2616, notes at the end of sections 10.3.2 and 10.3.3) but according to bug #598304 it might be necessary to avoid breaking the web.)

But how do we fix this? The original code used the presence of a cookie in the request to detect the situation, but bug #561276 decided that this was too coarse-grained. IMO this situation may occur only for replacement-channels where the request-method has changed. I.e. we could carry fwd the request-method of an originating channel to the replacement-channel and compare.
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2011-05-26 08:53:28 PDT
So the server should be sending a "Vary: Cookie" in this case and is not, right?
Comment 19 Bjarne (:bjarne) 2011-05-26 13:22:08 PDT
Not really. The cookie is also sent in the earlier, cached request. I assume the server applies a design-pattern which uses the cookie to identify a session but maintains state internally in the session, returning 301's to different pages depending on state. The real, uhh, confusing bit is why the server uses a 301 ("moved permanently") for this. By just replacing 301 with 302 the problem would be resolved since 302s are not cached unless explicitly specified...  :(

See e.g. attachment #535107 [details]: The first request for "/" results in a 301 to "cookiechecker". The request for "cookiechecker" results in a 301 back to "/". Then, the request for "/" is found in the cache but CheckCache() decides to validate it (probably because it's captured without the redirect-chain-patch). The subsequent request for "/" sets the cookie, but now the server responds with a 301 to "login". The request for "login" also sets the cookie and results in a 200 and at this point we have a cached 301-response for "/" with the cookie set. After this we see a number of req/res for resources probably displayed on the login-page, before we get to the POST with credentials. Server responds with a 302 to "/" and observe that CheckCache() decides to validate the cached 301-entry for "/" again. This validation saves us because we get a 200 when requesting "/" with cookie set again. Normally we should have re-used the cached 301 which would send us back to "login". This is what happens in attachment #535110 [details].

The question is whether we should hack around this or not. I have a patch which does as suggested in last para of comment #17, but it essentially breaks RFC2616 (i.e. in this case makes the 301 non-cacheable) in order to allow servers to use a design-pattern and misuse 301 like described above. IMO we should go back to whoever writes this stuff and tell them to use 302 instead in these cases.
Comment 20 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-26 18:07:03 PDT
Fwiw, I tested IE and Chrome with one of these radios and neither browser has a problem logging into them. So I think we need to do *something* here, but I don't know enough about this to say what.
Comment 21 Bjarne (:bjarne) 2011-05-27 01:46:53 PDT
(In reply to comment #19)
> Not really. The cookie is also sent in the earlier, cached request.

To clarify: The cookie-values in this case are the same so "Vary: Cookie" wouldn't help, unfortunately.

(In reply to comment #20)
> Fwiw, I tested IE and Chrome with one of these radios and neither browser
> has a problem logging into them. So I think we need to do *something* here,
> but I don't know enough about this to say what.

Would be interesting to know if IE/Opera receives the same content we do... Otoh I perfectly agree that we must do something - IMO we have 3 options:

1) apply the hack I'll attach in a little while (deliberately allowing such behaviour from servers)

2) back out the fix for bug #561276 (losing the advantage of caching 3xx's which are used correctly)

3) use the power of our current market-share and communicate with whoever writes this software and make them change it

I've encountered the dilemma 1 vs 3 in a number of other bugs and I'd expect to see it in also in future issues, hence it might be worth to take this discussion on a principal level in order to get clear guidelines (or maybe there are some already that I'm unaware of..?)

I don't know how widespread this software is (or whether it is maintained and upgraded) so I'll leave the decision of what to do to others.
Comment 22 Bjarne (:bjarne) 2011-05-27 02:33:57 PDT
Created attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

I don't like this solution but it allows me to log into the wireless and navigate it. If we want to follow this path, and if considered necessary, I could probably put together a test.
Comment 23 Bjarne (:bjarne) 2011-05-27 02:47:05 PDT
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

Btw, it passes local testing and has been sent to tryserver.
Comment 24 Boris Zbarsky [:bz] (still a bit busy) 2011-05-27 23:28:39 PDT
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

Looks ok to me from a code point of view.  I think Patrick or Jason are more qualified to review the HTTP interaction "should we do this?" end of things.
Comment 25 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-28 00:06:22 PDT
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

I would not be surprised if other browsers are invalidating the target of the redirect from the POST, since the spec says [1]:

"Some HTTP methods MUST cause a cache to invalidate an entity. This is either the entity referred to by the Request-URI, or by the Location or Content-Location headers (if present). These methods are: PUT, DELETE, POST. In order to prevent denial of service attacks, an invalidation based on the URI in a Location or Content-Location header MUST only be performed if the host part is the same as in the Request-URI."

I think we should do precisely what the spec says: If the host in the Location response header field of a POST (or PUT or DELETE) is the same as the host in the request URI, then doom the entries mentioned in the Location header field (and, similarly for Content-Location) when processing the POST (or PUT or DELETE).

[1] http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache-14#section-2.5
Comment 26 Patrick McManus [:mcmanus] 2011-05-28 06:21:07 PDT
Comment on attachment 535589 [details] [diff] [review]
Hack to detect and fix pattern as described in comment #17

I don't have a problem with what the patch does, so I'm happy to r+ it. We more or less want to bring the not-from-cache semantics of the POST through to the rewritten request.

If I understand Brian's suggestion it boils down to invalidating any cached resource (probably from a GET) that has POST/PUT/etc performed on it even for future GETs.. that has a certain logical attraction to it, but I would want to ponder how that interaction impacts the cache for deployed websites and imo we  should study that in a different bug.
Comment 27 Bjarne (:bjarne) 2011-05-28 12:49:40 PDT
(In reply to comment #25)
> I would not be surprised if other browsers are invalidating the target of
> the redirect from the POST

A fresh pair of eyes is often fruitful - this would be a far better solution (if it applies, which I believe it will)!

I remember reviewing a patch doing something like this some time ago but I cannot find it in any of my searches... Iirc the patch was ok but should be extended with more details so I believe I cleared the r-request. I'll try to dig it up somehow...
Comment 28 Bjarne (:bjarne) 2011-05-30 10:19:10 PDT
Yeah - this is probably also covered by RFC2616 section 13.10 as stated in bug #618835, comment #5. Wonder why I didn't recall this one - good catch Brian. :)

I'll verify that the approach from #618835 fixes the remaining issue of this bug.
Comment 29 Bjarne (:bjarne) 2011-05-30 13:10:42 PDT
WIP patch from bug #618835 fixes this problem. Since comment #14 pushed the fix for the original problem described in this bug, I resolve the remaining issue as a dup of bug #618835. Feel free to correct me if this is wrong procedure.

*** This bug has been marked as a duplicate of bug 618835 ***
Comment 30 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-30 14:59:39 PDT
Updating title to match what was fixed here. Thanks again Bjarne for digging in here! We'll continue tracking the remaining issue in bug 618835. And I'm going to mark this fixed instead, not that it makes much difference...
Comment 31 Johnny Stenback (:jst, jst@mozilla.com) 2011-05-31 14:46:14 PDT
Comment on attachment 535036 [details] [diff] [review]
Proposed solution

This needs to land on 5 beta and 6 aurora.
Comment 32 christian 2011-06-03 14:20:26 PDT
Even though this is a Firefox 5 regression, we are going to remove tracking and ship without it because:

1. We don't think the issue is widespread
2. The workaround is pretty simple (just clicking the try again button)
3. The changes are in http code and we are late in the beta cycle

Clearing version-specific tracking flags.
Comment 33 Boris Zbarsky [:bz] (still a bit busy) 2011-06-23 13:19:18 PDT
Bjarne, do you need to request aurora review to land this for Fx6?
Comment 34 Bjarne (:bjarne) 2011-06-23 14:57:26 PDT
This is not really a prerequisite for bug #618835 but it is related. Please observe that attachment #535036 [details] [diff] [review] was already approved for aurora (afaics) so I guess I can just request check-in?
Comment 35 Johnny Stenback (:jst, jst@mozilla.com) 2011-06-23 22:01:37 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/4d83f16deb7c
Comment 36 AndreiD[QA] 2011-06-24 05:26:14 PDT
Is bug 666021 referring to the same issue as the one that was fixed here? Thanks.
Comment 37 Bjarne (:bjarne) 2011-06-24 05:54:47 PDT
(In reply to comment #36)
> Is bug 666021 referring to the same issue as the one that was fixed here?

Most likely. Could you try the latest nightly build...?
Comment 38 Thomas Ahlblom 2011-08-04 12:39:33 PDT
*** Bug 676392 has been marked as a duplicate of this bug. ***
Comment 39 John Hawkinson 2011-08-06 13:06:33 PDT
(In reply to Christian Legnitto [:LegNeato] from comment #32)
> 2. The workaround is pretty simple (just clicking the try again button)

I'm not sure it's worth pointing out at this late date, but the Try Again button is not sufficient to resolve this issue. It gets past the redirect warning, but then one gets stuck on an infinite loop on the device's login page. That is, the login page is presented, user enters a username/password and clicks the submit button (Login), and the login page is again presented, ad-infinitum.

My solution has been to use another browser to manage this device, which is kind of annoying, especially because most of the time when I want to manage it it is because there is a network problem in-progress...
Comment 40 Boris Zbarsky [:bz] (still a bit busy) 2011-08-23 00:39:56 PDT
*** Bug 672394 has been marked as a duplicate of this bug. ***
Comment 41 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-22 13:54:08 PDT
qa-: QA is unable to verify this fix, Johnny can you please verify the fix?
Comment 42 John Hawkinson 2011-10-01 01:16:04 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #41)
> qa-: QA is unable to verify this fix, Johnny can you please verify the fix?

Sorry for the delay, I wasn't sure if this was targetted at mortals. Under WinXP, Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device that FF6 could not talk to (running Firmware Version: 	XM.v5.2.1). The Aurora particulars:
 Build identifier: Mozilla/5.0 (Windows NT 5.1; rv:9.0a2) Gecko/20110930 Firefox/9.0a2
 Built from http://hg.mozilla.org/releases/mozilla-aurora/rev/79eab065f0d0

So looks good!!
Comment 43 John Hawkinson 2011-10-01 01:22:51 PDT
> Aurora 9.02a2 seems to work just fine connecting to a Ubiquity device
> that FF6 could not talk to 

I suppose I need to amend that. I'm afraid I lost track of what version I saw the failure under. It was probably FF4, which appears to have been upgraded away from this system. It appears that FF7.0.1 also works just fine to connect to this device.
Comment 44 Johnny Stenback (:jst, jst@mozilla.com) 2011-10-05 09:53:23 PDT
Yup, this problem is now a thing of the past. Verified.

Note You need to log in before you can comment on or make changes to this bug.