Closed
Bug 628561
Opened 14 years ago
Closed 13 years ago
DNS caching forcing one to restart Firefox to eliminate temporary DNS redirection
Categories
(Core :: Networking, defect)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla8
People
(Reporter: xiaqqaix, Assigned: mcmanus)
(Whiteboard: [inbound])
Attachments
(1 file, 2 obsolete files)
patch, 7.50 KB (bzbarsky: review+)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0b10pre) Gecko/20110120 Firefox-4.0/4.0b10pre
Build Identifier: Mozilla/5.0 (X11; Linux i686; rv:2.0b10pre) Gecko/20110120 Firefox-4.0/4.0b10pre
Some public networks require a user name and password for access, and they enforce this by hijacking DNS: any DNS request is redirected to the login web page. After logging in, Internet access is granted and the redirection is no longer necessary. However, Firefox caches the old DNS redirection, and fixing the misbehavior requires restarting Firefox.
Although I'm talking about Firefox, I believe this bug belongs under "Core": the Firefox product has no "Networking: Cache" (or similar) component, while this one does.
Reproducible: Always
Steps to Reproduce:
1. Connect to some public network described above.
2. Navigate Firefox to a certain URL, say, "http://www.google.com"
3. Firefox would be redirected to the login page.
4. Enter the correct user name and password to log in.
5. Navigate Firefox to the same URL as in step 2.
Actual Results:
Firefox would be redirected to the login page again.
Expected Results:
Firefox should go to the real, un-redirected site.
At the very least, a force reload should take Firefox to the un-redirected site.
Only the URL in step 2 gets incorrectly redirected; any other URL works.
A "force reload" doesn't help. I have tried (Ctrl|Shift|Alt) + (Reload button|F5); none works. It seems the force reload doesn't refresh the DNS cache.
This could lead to substantial security problems if the public network is malicious.
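The reported behavior can be illustrated with a minimal sketch (illustrative names only, not Gecko's actual DNS code): a host-keyed cache that keeps returning the first answer it saw until it is flushed, which is effectively what restarting the browser does.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of the reported behavior: a DNS cache that keeps
// handing back the first (hijacked) answer for a host until flushed.
class DnsCache {
public:
    // Resolve via the cache; 'fresh' simulates what the network would
    // return if we actually queried it right now.
    std::string Resolve(const std::string& host, const std::string& fresh) {
        auto it = mEntries.find(host);
        if (it != mEntries.end())
            return it->second;          // stale portal answer survives login
        mEntries[host] = fresh;
        return fresh;
    }
    void Flush() { mEntries.clear(); }  // what a browser restart effectively does
private:
    std::map<std::string, std::string> mEntries;
};
```

The sketch shows why only the URL visited before login stays broken: every other host gets a fresh, post-login lookup.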
Comment 1•14 years ago
"Cache" is the HTTP cache.
How could this lead to security problems that aren't present without DNS caching, exactly?
Component: Networking: Cache → Networking
QA Contact: networking.cache → networking
(In reply to comment #1)
> "Cache" is the HTTP cache.
>
> How could this lead to security problems that aren't present without DNS
> caching, exactly?
Thanks for the prompt reply. After some rethinking, the security concern turned out to be a false perception. As for the category, it would be appreciated if anyone with the permission could move this report to the more appropriate section.
Comment 3•14 years ago
I already did the moving. ;)
Thanks for clarifying about the security concerns!
Group: core-security
Assignee
Comment 4•14 years ago
The force reload does force the DNS subsystem to refresh the DNS cache:
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#3632
Unfortunately, I think what is going on here is that your reload is reusing an idle persistent connection, which is indexed under the host name, so no call to the DNS system is made (and thus no refresh is done). I was able to reproduce that.
The best thing I can think of to do is this: when we get a force reload and are about to reuse an idle persistent connection, check that the DNS result matches the peer address. If it doesn't (which 99.9% of the time will be a legitimate load-balancing change, not the "hijack is repaired" scenario here), update the DNS cache and close out all the existing persistent connections so we use the new server. It will slow things down unnecessarily in some cases, but verifiable correctness is sort of the point of a forced reload.
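The idea in this comment could be sketched roughly as follows. All names here are hypothetical, not the actual necko API: an idle connection remembers its peer address, and a forced reload compares that against a fresh DNS answer before reusing it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical idle persistent connection, indexed by host name.
struct IdleConn {
    std::string host;
    std::string peerAddr;  // address the socket is actually connected to
};

// On a forced reload: reuse an idle connection only if its peer still
// matches a fresh DNS answer; on mismatch, drop every idle connection
// for that host so the caller reconnects to the current address.
bool MaybeReuseIdleConn(std::vector<IdleConn>& pool,
                        const std::string& host,
                        const std::string& freshDnsAddr) {
    for (auto it = pool.begin(); it != pool.end(); ++it) {
        if (it->host != host)
            continue;
        if (it->peerAddr == freshDnsAddr)
            return true;  // DNS still matches the peer: safe to reuse
        // Mismatch: close out all idle connections for this host.
        pool.erase(std::remove_if(pool.begin(), pool.end(),
                                  [&](const IdleConn& c) { return c.host == host; }),
                   pool.end());
        return false;     // caller must open a new connection
    }
    return false;         // nothing idle for this host
}
```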
Assignee
Comment 5•14 years ago
After sleeping on this, I decided to take a more direct approach. Forced reloads (Ctrl+F5, Shift+reload icon, or Ctrl+Shift+R) are very strong signals that we want to resync things from scratch.
So this patch responds to a forced reload by closing out the persistent connection pool for the related host. New connections will be opened and new DNS lookups will happen. There is a performance penalty, but more or less that is what you mean when you do a forced reload.
Normal reloads are unaffected.
This is a long-standing issue; we don't necessarily have to add the fix to Necko 2.0, imo.
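A minimal sketch of what the patch does conceptually, assuming a pool of idle connections keyed by host (names are illustrative, not the actual nsHttpConnectionMgr API): a forced reload unconditionally drops the idle pool for the target host, so the next request must reconnect and re-resolve.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-host idle persistent connection pool.
class ConnPool {
public:
    void AddIdle(const std::string& host, int fd) { mIdle[host].push_back(fd); }

    size_t IdleCount(const std::string& host) const {
        auto it = mIdle.find(host);
        return it == mIdle.end() ? 0 : it->second.size();
    }

    // Called when the channel sees a forced-reload load flag: every idle
    // connection for this host is discarded (the real code would close
    // the sockets), forcing fresh connections and fresh DNS lookups.
    void ClosePersistentConnections(const std::string& host) {
        mIdle.erase(host);
    }

private:
    std::map<std::string, std::vector<int>> mIdle;
};
```

Note that the closing is per-host, so idle connections to unrelated hosts survive a forced reload.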
Assignee: nobody → mcmanus
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #507259 - Flags: review?(bzbarsky)
Comment 6•14 years ago
Comment on attachment 507259 [details] [diff] [review]
patch v1
You don't need the "see bug" bit; the vcs blame will show that. r=me with that line gone.
Attachment #507259 - Flags: review?(bzbarsky) → review+
Assignee
Comment 7•14 years ago
Reflects comment 6.
Will land post Firefox 4.0 barring other instruction.
Attachment #507259 - Attachment is obsolete: true
Attachment #507441 - Flags: review+
Assignee
Comment 8•14 years ago
(In reply to comment #6)
>
> You don't need the "see bug" bit; the vcs blame will show that.
I made the change to the patch, but I wanted to follow up on this.
Over 1100 of the 1233 lines of nsHttpTransaction.cpp are uselessly blamed on change d8dc49d5bd609668b3c4fadd6c1df12d5da20547 ("rearrange makefiles and directory hierarchy"). Anecdotally, that's also true of at least most of the files in netwerk/protocol/http.
Changes to whitespace and minor changes in logic (e.g. adding an ||) also obscure blame history. Blame is fragile.
The source should be self-describing with respect to motivation, but pointers to further reading are a good thing, especially to stable sources like Bugzilla.
But hey, I cringe at the 80-column rule too :) I can cope.
Comment 9•14 years ago
The blame mess with necko is a problem, yes; we have people working on hg to fix that. ;)
And yes, sometimes you have to follow blame a few changes back in time.
But if we added bug pointers for every checkin, we'd right now have several hundred thousand of them in our codebase. That's one every 10 lines of code, on average. That doesn't really scale, unfortunately.
Assignee
Comment 10•14 years ago
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 11•14 years ago
Backed out on suspicion of causing a tp4 regression: http://hg.mozilla.org/mozilla-central/rev/b17599b51fd7
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 12•14 years ago
http://perf.snarkfest.net/compare-talos/breakdown.html?oldTestIds=6864069,6864861,6866327,6867825&newTestIds=6869662,6869775&testName=tp4
This is the source of the tp4 issue. It does clear the cache, but I didn't think any of the tests were using external resources?
Comment 13•14 years ago
Also, tp4 shouldn't be doing force-reloads...
Comment 14•14 years ago
Aren't the post-patch results of this tp4 "regression" to be expected? Some of those pages have multiple persistent connections, and the test purposefully resets each of them 10 times. Now, unlike before, they are actually being completely reset, as one would intuitively expect -- see comment 5. Pre-patch, all those connections would have remained open even though the test was reloading each of them.
I'd vote for fixing the test, either by not having it force-reload between each of the 10 page refreshes, or by just accepting the worse-looking but more accurate results.
Ctrl-F5 is supposed to be a full reset, so it should be somewhat slow; otherwise it's just gaming the system.
Comment 15•14 years ago
> and the test purposefully resets each of them 10 times.
Where do you see that happening, exactly?
> Ctrl-F5 is a supposed to be a full reset
tp4 does NOT do that. It just loads web pages the same as you would by typing in the url bar and hitting enter.
Assignee
Comment 16•14 years ago
(In reply to comment #15)
> > and the test purposefully resets each of them 10 times.
>
> Where do you see that happening, exactly?
>
> > Ctrl-F5 is a supposed to be a full reset
>
> tp4 does NOT do that. It just loads web pages the same as you would by
> typing in the url bar and hitting enter.
First, Kelley: thanks for taking the time to think about this. I haven't forgotten about it, but it is sitting behind some other higher-priority things.
Boris, I did get as far as adding logging to the code that is supposed to match up with the Ctrl+F5 path, and it did get hit when running tp4. I did not get a chance to dig into why that was happening; that will be my next step.
And lastly: when I fixed a bug in syn-retry a couple of weeks ago, a bunch of Talos numbers on the Win7 platform improved quite unexpectedly. It stands to reason that this patch, which can result in having to make a bunch of new connections, would have run into that problem disproportionately on that platform. If so, it's possible that this can just land because the other bug is fixed. But that's just conjecture at this point until I have a chance to run it down.
Comment 17•14 years ago
> and it did get hit when running tp4
Er... why? That seems wrong; what did the stack look like?
Assignee
Comment 18•14 years ago
(In reply to comment #17)
> > and it did get hit when running tp4
>
> Er... why? That seems wrong; what did the stack look like?
As I said, digging into it is in my queue somewhere. I don't think I even got a stack - just a log line which I didn't expect to be in the output :) I only mentioned it now because this is obviously tying up space in Kelley's brain :)
Assignee
Comment 19•13 years ago
(In reply to comment #13)
> Also, tp4 shouldn't be doing force-reloads...
LOAD_BYPASS_CACHE is used by force-reload. It turns out that in the case of tp4 it is also used by an XHR POST of some sort.
I'll update the patch to use a new LOAD_FRESH_CONNECTION flag and have docshell set LOAD_BYPASS_CACHE | LOAD_FRESH_CONNECTION on the force-reload.
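The flag arrangement described here can be sketched as follows. The bit values are illustrative (the real constants live in the necko headers), but the shape of the change is: docshell sets both flags on a forced reload, the XHR POST path keeps setting only LOAD_BYPASS_CACHE, and the connection-closing behavior keys off the new flag alone.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit values, not the real nsIRequest constants.
const uint32_t LOAD_BYPASS_CACHE     = 1u << 9;
const uint32_t LOAD_FRESH_CONNECTION = 1u << 15;  // hypothetical new bit

// What docshell would set on a forced reload (Ctrl+F5).
uint32_t ForcedReloadFlags() { return LOAD_BYPASS_CACHE | LOAD_FRESH_CONNECTION; }

// What the XHR POST path sets today: cache bypass only.
uint32_t XhrPostFlags() { return LOAD_BYPASS_CACHE; }

// The "close the idle persistent connection pool" behavior now keys off
// the new flag alone, so XHR POSTs no longer trigger it.
bool ShouldCloseIdleConns(uint32_t flags) {
    return (flags & LOAD_FRESH_CONNECTION) != 0;
}
```

This separation is what makes the tp4 regression go away: the benchmark's XHR POSTs still bypass the cache but no longer tear down warm connections.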
Assignee
Comment 20•13 years ago
The tp4 regression (still visible on try) clears up with this version. This makes more sense than overloading LOAD_BYPASS_CACHE, imo.
The words "captive portal" should be in this bug for search engine purposes.
Attachment #507441 - Attachment is obsolete: true
Attachment #547506 - Flags: review?(bzbarsky)
Comment 21•13 years ago
Hmm. Would it make more sense for XHR to use BYPASS_LOCAL_CACHE instead of BYPASS_CACHE in the POST case? What it's doing right now is making sure to read from the server but still storing the data in our cache... Jonas?
Comment 22•13 years ago
Comment on attachment 547506 [details] [diff] [review]
patch v3
That said, I think I agree that we may want to have a separate flag for this, because overloading LOAD_BYPASS_CACHE is not great. But should we perhaps leave it out of LOAD_REQUESTMASK? Otherwise each subresource on a page being shift-reloaded will end up closing idle connections, right?
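The LOAD_REQUESTMASK question can be sketched with illustrative bit values: flags covered by the mask are copied from the document load to each subresource (image/script/stylesheet) request, so whether the new bit is inside the mask decides whether every subresource of a shift-reloaded page also closes idle pools.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit values, not the real nsIRequest constants.
const uint32_t LOAD_BYPASS_CACHE     = 1u << 9;
const uint32_t LOAD_FRESH_CONNECTION = 1u << 15;  // hypothetical new bit

// Only flags covered by the request mask propagate from the document
// load to each subresource request.
uint32_t SubresourceFlags(uint32_t documentFlags, uint32_t requestMask) {
    return documentFlags & requestMask;
}
```

With the new bit excluded from the mask, only the top-level document load would close idle connections; with it included, every subresource load would too.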
Comment 23•13 years ago
And I'm also a little bit worried about running out of nsIRequest flag bits, but not sure what to do about it yet.
Comment 24•13 years ago
Sorry, I don't know enough about the issues here to have an immediate opinion. Most of what I know is based on comments in the XHR code.
Comment 25•13 years ago
The basic issue is whether we want XHR POST to bypass the cache only in terms of not reading from it (but still writing to it), or whether it should skip the cache altogether. But that's fodder for a separate bug anyway....
Assignee
Comment 26•13 years ago
(In reply to comment #22)
> Comment on attachment 547506 [details] [diff] [review] [review]
> patch v3
>
> That said, I think I agree that we may want to have a separate flag for
> this, because overloading LOAD_BYPASS_CACHE is not great. But should we
> perhaps leave it out of LOAD_REQUESTMASK? Otherwise each subresource on a
> page being shift-reloaded will end up closing idle connections, right?
The closing of connections is done on a per-host basis, and the subresources don't have to share the same host, so I think we're doing the right thing. It does mean that every transaction in the load group will get its own connection, but for the extreme case of force-reload I don't think that's wrong.
It would be kind of odd to have existing connections to subresources that are out of sync because of a captive portal, but it's not impossible in mobile scenarios.
Assignee
Comment 27•13 years ago
(In reply to comment #23)
> And I'm also a little bit worried about running out of nsIRequest flag bits,
> but not sure what to do about it yet.
perhaps bits 1 through 6 can be reclaimed?
Comment 28•13 years ago
Oh, I missed that 1-6 are unused. OK, good.
Comment 29•13 years ago
Comment on attachment 547506 [details] [diff] [review]
patch v3
r=me
Attachment #547506 - Flags: review?(bzbarsky) → review+
Assignee
Updated•13 years ago
Whiteboard: [inbound]
Comment 30•13 years ago
Target Milestone: --- → mozilla8
Updated•13 years ago
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED