Closed Bug 981447 Opened 7 years ago Closed 7 years ago

dns cache too sticky!

Categories

(Core :: Networking: DNS, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox29 --- fixed
firefox30 --- fixed

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

(Whiteboard: [dns])

Attachments

(2 files)

Firefox Accounts is having problems when IPs/CNAMEs shift on the server side: Firefox is not able to connect when that happens, well past the TTL time.

After investigating, there are a handful of different actions to take:

1] This is primarily caused by bug 807441, which intentionally makes addresses very sticky when there are no known errors, in the name of improving SSL session cache hit rates (session cache misses are a major slowdown for us). Because fxa runs from chrome instead of content, error detection via reload fails. I think that turned out to be too aggressive an approach (it was mine :() and it should be reverted; we'll find another way. That change can ride the normal trains imo (different bug to be filed), though it could be backported if #2 is not sufficient.

2] The easiest short-term fix is just to lower the caching value and grace period to, pending testing, 60 seconds each. We can do that in this bug and backport it as necessary, as it is just a default pref change. I hope that will get things running acceptably.

3] Develop a feedback system that breaks stickiness, replacing the one being reverted in #1, and put the stickiness back in (we want that session cache hit rate!). Jesse suggests any bad connection or SSL handshake could be the input, and that makes sense to me. Different bug, to be filed. Nothing blocks on resolving this bit.

4] Bug 151929 says we don't access the real TTL. That remains true but is somewhat orthogonal.
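
For anyone who wants to try the values from item 2 locally before a patch lands, a minimal sketch as user.js overrides follows. It assumes the grace period is controlled by a pref named network.dnsCacheExpirationGracePeriod (that name is not stated in this report), and the 60 second values are the ones described above as pending testing, not verified recommendations.

```js
// user.js sketch: lower the DNS cache lifetime and grace period to 60s each.
// Cache TTL used in the absence of OS TTLs (pref name appears in the reviewed patch).
user_pref("network.dnsCacheExpiration", 60);
// ASSUMPTION: the grace period pref name is not given in this bug report.
user_pref("network.dnsCacheExpirationGracePeriod", 60);
```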
Whiteboard: [dns]
Blocks: 981513
Attachment #8388366 - Flags: review?(sworkman)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Attachment #8388366 - Flags: review?(sworkman) → review+
Comment on attachment 8388366 [details] [diff] [review]
use more conservative dns caching values

[Approval Request Comment]
Bug caused by (feature/regressing bug #): very old dns ttl management strategy
User impact if declined: firefox accounts fails to find updated web services after IP changes
Testing completed (on m-c, etc.): on m-i, hand tested, verified by reporter
Risk to taking this patch (and alternatives if risky): very low risk of failure. some cases of increased onclick latency will be seen as a necessary side effect of removing an unsafe optimization
String or IDL/UUID changes made by this patch: none
Attachment #8388366 - Flags: approval-mozilla-aurora?
Comment on attachment 8388366 [details] [diff] [review]
use more conservative dns caching values

must be uplifted for fx/a
Attachment #8388366 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8388366 [details] [diff] [review]
use more conservative dns caching values

Review of attachment 8388366 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/src/init/all.js
@@ +1252,5 @@
> +// This is the number of dns cache entries allowed
> +pref("network.dnsCacheEntries", 400);
> +
> +// In the absence of OS TTLs, the DNS cache TTL value
> +pref("network.dnsCacheExpiration", 60);

I am not familiar with the inner DNS logic, so just to double check: when a DNS record has a TTL value larger than 60 seconds, Firefox will rely on the OS cache, and not re-query every 60s?
(In reply to Julien Vehent [:ulfr] from comment #6)
> 
> I am not familiar with the inner DNS logic, so just to double check: when a
> DNS record has a TTL value larger than 60 seconds, Firefox will rely on the
> OS cache, and not re-query every 60s?

The Firefox cache entry will expire at 60s, but the next cache up in line may indeed serve a cache hit in the presence of a big TTL (maybe that's the OS cache, maybe that's the recursive resolver, etc.). In other words, the expiration of the Firefox TTL doesn't force an end-to-end revalidation if an intermediary can supply a cached value.

hth
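
The layering described in the reply above can be sketched with a toy model. This is not Firefox's resolver code: the class names, the "os" and "browser" labels, the example.test host, and the 3600 second upstream TTL are all hypothetical and chosen only for illustration. The sketch shows a 60 second browser cache expiring while a longer-lived cache upstream still answers with the old address.

```javascript
// Toy model of chained DNS caches. All names here are illustrative assumptions,
// not Firefox internals.
class TtlCache {
  constructor(name, ttlSeconds, upstream) {
    this.name = name;
    this.ttlSeconds = ttlSeconds;
    this.upstream = upstream; // next cache in line, or the authoritative source
    this.entries = new Map(); // host -> { address, expiresAt }
  }

  resolve(host, now) {
    const entry = this.entries.get(host);
    if (entry && now < entry.expiresAt) {
      // Fresh entry: answer locally without asking upstream.
      return { address: entry.address, answeredBy: this.name };
    }
    // Expired or missing: ask the next cache up the line and remember the answer.
    const answer = this.upstream.resolve(host, now);
    this.entries.set(host, {
      address: answer.address,
      expiresAt: now + this.ttlSeconds,
    });
    return answer;
  }
}

// Stand-in for the authoritative server: always returns the current record.
class Authoritative {
  constructor(records) {
    this.name = "authoritative";
    this.records = records;
  }

  resolve(host) {
    return { address: this.records[host], answeredBy: this.name };
  }
}

const origin = new Authoritative({ "example.test": "192.0.2.1" });
const osCache = new TtlCache("os", 3600, origin);      // hypothetical large TTL
const browserCache = new TtlCache("browser", 60, osCache);

const first = browserCache.resolve("example.test", 0);     // fills both caches
origin.records["example.test"] = "192.0.2.2";               // server-side IP change
const second = browserCache.resolve("example.test", 30);   // browser cache still fresh
const third = browserCache.resolve("example.test", 90);    // browser expired, upstream cache answers
const fourth = browserCache.resolve("example.test", 3700); // every cache expired

console.log(first, second, third, fourth);
```

In this model the lookup at 90 seconds misses the browser cache but is still served the old address by the upstream cache; only once that upstream entry has also expired does the new address come through.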
It does. Thanks for clarifying.
https://hg.mozilla.org/mozilla-central/rev/c2658ef02336
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Duplicate of this bug: 512969
My IP just changed 5+ hours ago and I still can't reach my own site using Firefox v29.0.1 Linux.  The only indication of a problem is a message telling me I'm in "Offline mode".  I turned on FF Menu Bar and don't see a check mark next to "Work Offline".  I toggled the mode trying both offline and online and it makes no difference.  I fired up Chromium and am able to load my own site pages.

I also tried an Add-on called DNS Cache 1.7, and tried "Enabling" and "Disabling" the cache.  Behavior is the same either way.  I used the same program to "Flush" the Cache, and again, no change in behavior.

I went to about:config typed the filter "cache" and didn't see anything I felt I should try touching.

I fired up my laptop, which doesn't get used much (shouldn't be anything recent in the cache) and FFv29.0 Windows is able to go straight to my local server, then immediately started updating itself to v29.0.1.

At 5+ hours with FFv29.0.1 Linux, I'd say this bug report is still open, unless there is another for this issue.
Hi Craig, you might be interested in Bug 939318 and its related bugs. We're working on patches to reset some networking state info (certain HTTP connections, the DNS cache, proxy auto config etc.) when there are changes in the network link. Bug 939318 is for Windows, with other platforms to be filed as follow-ups.
Thanks Steve, I'll have a look.

The actual URL I am using (my own) is https://arno.com/tng. This caching issue seems to be a large issue on several fronts. In my prior experience with this issue, it will disappear on its own in roughly 24 hours, give or take. Until then, I have to use Chromium to get anything done. I mostly use Linux, but also use Windows heavily enough, and computers in general, to be called a sophisticated user on both platforms. https://arno.com will suggest why.
I experience this consistently on Ubuntu 14.10, FF 33.0. If I close my laptop's lid then reopen it and try to access a site before the connection has fully resumed, I am unable to visit that site until I close and reopen Firefox regardless of the state of the connection. Disabling offline mode and Firefox's DNS cache don't help. Other browsers and command line tools have no trouble accessing the site but FF continues to report "Server not found".
You might be seeing 939318, which was fixed in FF 35 (currently in Aurora / Developer Edition).