Open Bug 956267 Opened 10 years ago Updated 2 years ago

localhost DNS lookup always fails if $HOME empty

Categories

(Core :: Networking, defect, P3)

28 Branch
x86
Linux
defect

Tracking

()

People

(Reporter: jidanni, Unassigned)

Details

(Whiteboard: [necko-backlog])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0 Iceweasel/28.0a2 (Beta/Release)
Build ID: 20131215004001

Steps to reproduce:

Do the following test.
Disconnect your computer from the network and in /etc/hosts put
127.0.0.1 abj.jidanni.org
127.0.0.1 radioscanningtw.jidanni.org
127.0.0.1 transgender-taiwan.org
127.0.0.1 mysql.transgender-taiwan.org
and have apache serve them. Then restart dnsmasq, firefox, etc.

Well chromium and w3m work fine.

But for firefox, transgender-taiwan.org doesn't work!
That's right, all the other virtual sites we created work except that one.

Why? Because firefox finds that there are only two components,
and insists on adding WWW. in front before trying it AT ALL!

(You need to be offline to see this happening.)

Setting browser.fixup.alternate.prefix to "" fixes it.

This is terrible --- not one little beep over HTTP is ever attempted.

Please check for the version of the URL the user has asked for
before stuffing the WWW. on in BOTH online *AND FOR LOCALHOST* situations.
On localhost,
$ firefox http://X.Y/ from the command line is also turned into http://WWW.X.Y/ before EVEN TESTING FOR X.Y !
Q.X.Y and Q.R.X.Y etc. are immune, because they have 3+ components.
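
A minimal sketch of the rule as described above, in POSIX shell. This is not Firefox's code: `alternate_host` is a hypothetical helper, and the only things taken from this report are the "www." prefix, the two-component condition, and the fact that an empty browser.fixup.alternate.prefix turns the rewrite off. It says nothing about when Firefox applies the rule.

```sh
# Hypothetical helper, not Firefox code: models the prefixing rule
# as described in this report.
# Usage: alternate_host HOST PREFIX
alternate_host() {
    host=$1
    prefix=$2

    # Exactly one dot means two components (X.Y).
    # Q.X.Y and longer names are left alone. Single-label names are not
    # covered by the description above and are left unchanged here.
    case $host in
        *.*.*) two_components=no ;;
        *.*)   two_components=yes ;;
        *)     two_components=no ;;
    esac

    # An empty prefix (browser.fixup.alternate.prefix set to "") disables
    # the rewrite.
    if [ "$two_components" = yes ] && [ -n "$prefix" ]; then
        printf '%s%s\n' "$prefix" "$host"
    else
        printf '%s\n' "$host"
    fi
}
```

With this sketch, `alternate_host transgender-taiwan.org www.` prints `www.transgender-taiwan.org`, while `alternate_host abj.jidanni.org www.` and `alternate_host transgender-taiwan.org ""` print the name unchanged.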
Severity: normal → major
Component: Untriaged → Location Bar
Priority: -- → P2
Not sure where this belongs, Document Navigation or Networking?
Component: Location Bar → Document Navigation
Product: Firefox → Core
Document Navigation is correct, since the relevant code is nsDefaultURIFixup::MakeAlternateURI, called by nsDefaultURIFixup::CreateFixupURI, which probably comes from this snippet in nsDocShell::EndPageLoad:

>     // If the page load failed, then deal with the error condition...
>     // Errors are handled as follows:
>     //   1. Check to see if it's a file not found error or bad content
>     //      encoding error.
>     //   2. Send the URI to a keyword server (if enabled)
>     //   3. If the error was DNS failure, then add www and .com to the URI
>     //      (if appropriate).
>     //   4. Throw an error dialog box...
The callsite pointed to in comment 4 passes either 0 or FIXUP_FLAG_ALLOW_KEYWORD_LOOKUP to CreateFixupURI.

But browser.fixup.alternate.prefix is only considered if the FIXUP_FLAGS_MAKE_ALTERNATE_URI flag is set.

That flag is set in the code pointed to in comment 3, but that code would only run if we failed to resolve the hostname transgender-taiwan.org.  Note that this is purely about DNS resolution; I would not expect any HTTP traffic before this failure case if it happens at all.

I see no other places that call createFixupURI with FIXUP_FLAGS_MAKE_ALTERNATE_URI in our tree.

So it sure sounds like we're not finding an IP address for that hostname.

Reporter, could you please create a DNS resolution log for a minimal Firefox session that shows the problem for you using the steps at https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging but with NSPR_LOG_MODULES set to nsHostResolver:5 and then attach that log to this bug?
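
For reference, a minimal sketch of the environment for such a session, in POSIX shell. NSPR_LOG_MODULES and NSPR_LOG_FILE are the variables described on the linked page; the log path /tmp/dns-log.txt is only an assumption and can be any writable file.

```sh
# Log only the host resolver, at the verbosity requested above.
NSPR_LOG_MODULES=nsHostResolver:5
# Assumption: any writable path works; this one is only an example.
NSPR_LOG_FILE=/tmp/dns-log.txt
export NSPR_LOG_MODULES NSPR_LOG_FILE

# Then start Firefox from this same shell so it inherits both variables:
#   firefox http://transgender-taiwan.org/
```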
Flags: needinfo?(jidanni)
And wouldn't you know it, I can't reproduce it at all today, so I'll
close this bug for now while I try different ways to reproduce it.
When it was happening, tcpflow -i lo showed nothing, so you are right,
it was all at the DNS level.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(jidanni)
Resolution: --- → INVALID
Real results!

Detach your computer from the internet.
After boot, put
127.0.0.1 abj.jidanni.org
in /etc/hosts, and do
/etc/init.d/dnsmasq stop or whatever,
to prevent any DNS interference,
and have an apache2 server listening for that virtual host there.

# cat script
export NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
export NSPR_LOG_FILE=/tmp/log.$$.txt
export HOME=/tmp/F
mkdir -p $HOME
firefox http://abj.jidanni.org/
# chmod +x script
# su - nobody -c $PWD/script #super pristine

NOW on the first run with an empty $HOME, firefox consistently CANNOT
find abj.jidanni.org. Now quit firefox with CTRL+Q.

ON second and subsequent runs, it ALWAYS finds abj.jidanni.org !
(Which only holds for our pristine case, not all cases... Anyway, the
above you should be able to reproduce. iceweasel: 28.0~a2+20131215004001-1)

So we see it probably looks IN its cache before looking FOR its cache,
or something!

(So how about the fuss I made about "www." ?
Well in each case after I had saved preferences I had thus already
made the directory tree and thus was not reproducing the above exact
test... or something.)

P.S., On the aforementioned second AND subsequent runs, OTHER sites
that we have listed there in our /etc/hosts still cannot be
found when we type them into the URL bar. If we do
# su - nobody -c "HOME=/tmp/F firefox http://other.site.com/"
to connect to that running firefox, they SOMETIMES can be found...
maybe there is a race condition in that case, here on my 2005 vintage computer.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Summary: even before ever checking for real server, browser.fixup.alternate.prefix is applied, preventing any contact with the real server, when on localhost → localhost DNS lookup always fails if $HOME empty
Comment on attachment 8355845 [details]
log.5965.txt - FAILURE on first run

Here's the difference:

2014-01-05 04:35:46.306378 UTC - -1219873024[b722e480]: Resolving host [abj.jidanni.org].
2014-01-05 04:35:46.306398 UTC - -1219873024[b722e480]:   No usable address in cache for [abj.jidanni.org]
2014-01-05 04:35:46.306406 UTC - -1219873024[b722e480]:   DNS thread counters: total=1 any-live=0 idle=1 pending=1
2014-01-05 04:35:46.306415 UTC - -1219873024[b722e480]:   DNS lookup for host [abj.jidanni.org] blocking pending 'getaddrinfo' query: callback [a5651880]
2014-01-05 04:35:46.306458 UTC - -1501562048[a6e58040]: DNS lookup thread - Calling getaddrinfo for host [abj.jidanni.org].
2014-01-05 04:35:46.306722 UTC - -1501562048[a6e58040]: DNS lookup thread - lookup completed for host [abj.jidanni.org]: failure: unknown host.

2014-01-05 04:35:56.114034 UTC - -1220450560[b722e480]: Resolving host [abj.jidanni.org].
2014-01-05 04:35:56.114051 UTC - -1220450560[b722e480]:   No usable address in cache for [abj.jidanni.org]
2014-01-05 04:35:56.114103 UTC - -1220450560[b722e480]:   DNS thread counters: total=1 any-live=0 idle=0 pending=1
2014-01-05 04:35:56.114114 UTC - -1220450560[b722e480]:   DNS lookup for host [abj.jidanni.org] blocking pending 'getaddrinfo' query: callback [a8afd880]
... later ... 
2014-01-05 04:35:56.114652 UTC - -1308624064[b722f800]: Resolving host [abj.jidanni.org].
2014-01-05 04:35:56.114661 UTC - -1308624064[b722f800]:   Host [abj.jidanni.org] is being resolved. Appending callback [a8afd9d0].
2014-01-05 04:35:56.114670 UTC - -1308624064[b722f800]:   advancing to STATE_RESOLVING
2014-01-05 04:35:56.118032 UTC - -1493181632[a8aa5600]: DNS lookup thread - starting execution.
2014-01-05 04:35:56.124224 UTC - -1493181632[a8aa5600]: DNS lookup thread - Calling getaddrinfo for host [abj.jidanni.org].
2014-01-05 04:35:56.124554 UTC - -1493181632[a8aa5600]: DNS lookup thread - lookup completed for host [abj.jidanni.org]: success.

Not really sure what to make of the flaky DNS results.
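
One way to compare the two attached logs is to pull out just the getaddrinfo outcomes. A minimal sketch in POSIX shell; `lookup_results` is a hypothetical helper name, and the pattern assumes the "lookup completed for host [...]" line format shown in the excerpts above.

```sh
# Hypothetical helper: print "host: outcome" for every completed DNS lookup
# found in the given NSPR log file(s).
lookup_results() {
    sed -n 's/.*lookup completed for host \[\([^]]*\)]: \(.*\)$/\1: \2/p' "$@"
}
```

Running `lookup_results` on each log (for example `lookup_results /tmp/log.5965.txt`) leaves only the per-host results, which makes the first-run failure and the second-run success easy to put side by side.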
Sounds like the system getaddrinfo call is just returning different things in the two cases, no?  As in, the problem is in whatever implements getaddrinfo, not in our code...
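
One way to check that outside Firefox, sketched in POSIX shell under a few assumptions: the system is glibc-based, where `getent ahosts` goes through getaddrinfo; `check_resolver` is a hypothetical helper name; and the check is run in the same environment as the failing browser (same user, same empty $HOME, network unplugged).

```sh
# Hypothetical helper: ask the system resolver for a name via getent and
# report whether it answered. Assumes a glibc system where "getent ahosts"
# is backed by getaddrinfo.
check_resolver() {
    if getent ahosts "$1" >/dev/null 2>&1; then
        printf '%s: resolved\n' "$1"
    else
        printf '%s: not resolved\n' "$1"
    fi
}
```

If `check_resolver abj.jidanni.org` flips between the two answers in the same way as the browser does, that would point at the system resolver rather than at our code.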
Component: Document Navigation → Networking
(In reply to Boris Zbarsky [:bz] from comment #11)
Super easy to reproduce: Unplug network, add line to /etc/hosts, run above script.
...that is I can totally control if the bug will appear or not: { rm -r /tmp/F; run script; run script again;}
repeat as many times as you need. Each first run will have the bug, each second run won't.
Boris, do you know if this is our bug? It seems from comment 11 that you think it might not be.
Severity: major → normal
Flags: needinfo?(bzbarsky)
I strongly doubt this is our bug, but someone needs to investigate to make sure...
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #15)
> I strongly doubt this is our bug, but someone needs to investigate to make
> sure...

Need-info to Lukas to find someone to work on this.
Flags: needinfo?(lsblakk)
Jason - you're on the peer list for Core:Networking - is there anything you can see in this case that would help us clarify where the bug lies (in our code or not)?
Flags: needinfo?(lsblakk) → needinfo?(jduell.mcbugs)
Would also be useful to know how far back this reproduces.
I'm not convinced that we need to be working hard to find a window or an assignee for this; the situation described here is pretty esoteric.
(In reply to Josh Matthews [:jdm] from comment #19)
> I'm not convinced that we need to be working hard to find a window or an
> assignee for this; the situation described here is pretty esoteric.

I agree, it's definitely not something we'd track for release.  Up to Jason to prioritize this in his workflow or pass it to someone else better suited to investigate.
Yes, my guess here is that Chrome uses its own DNS resolver, while we use the OS's, and there's something idiosyncratic about the DNS setup on Dan's box (or distro).  I don't think I want to put hours into this unless/until we get some evidence that this is a widespread issue.  It's very unlikely this is something we can easily fix within Mozilla code (short of shipping our own DNS resolver, which we'll probably do one of these days).
Flags: needinfo?(jduell.mcbugs)
So should I resolve this as WONTFIX until such time that we decide to implement our own DNS resolver?
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #22)
> So should I resolve this as WONTFIX until such time that we decide to
> implement our own DNS resolver?

Don't close it. It could well be a Firefox issue, or maybe not; it's simply not clear. If it were fully investigated, I would definitely take a patch for it. Just because it doesn't rate on the priority list for full-timers (and I agree we've spent too much time talking about it) doesn't mean we wouldn't take a contribution from someone scratching their itch.

It's quite plausible that this is tied up in the online/offline triggering state-resetting work that :bagder and :sworkman are starting this quarter.
Did any of you run my test? It only takes half a minute to confirm the bug.
For now, I'll mark this NEW since it's been replicated, even though we aren't sure it's Firefox's bug or the OS's DNS setup.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P2 → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Severity: normal → S3