blocks for long periods in nsDNSService::Resolve

RESOLVED INVALID

Status

Severity: critical
RESOLVED INVALID
10 years ago
10 years ago

People

(Reporter: brian, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

10 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008111318 Ubuntu/8.10 (intrepid) Firefox/3.0.4
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008111318 Ubuntu/8.10 (intrepid) Firefox/3.0.4

I'm finding that if I give Firefox a URL with a domain name that does not exist, the entire process, all windows and tabs included, becomes unresponsive for a long time (10 minutes?).  A look at Firefox's stack at the time shows:

#0  0xb808c430 in __kernel_vsyscall ()
#1  0xb8042075 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7ce4e39 in PR_WaitCondVar (cvar=0xbb5b000, timeout=4294967295)
    at ptsynch.c:405
#3  0xb7ce4eb7 in PR_Wait (mon=0x1e727120, timeout=4294967295) at ptsynch.c:584
#4  0xb72bd0f3 in nsDNSService::Resolve (this=0xa031dd0, hostname=@0x18cd3b18, 
    flags=<value optimized out>, result=0xbfa851e4) at nsDNSService2.cpp:498
#5  0xb7a16085 in NS_InvokeByIndex_P ()
   from /usr/lib/xulrunner-1.9.0.4/libxul.so
#6  0xb72829a3 in XPCWrappedNative::CallMethod (ccx=@0xbfa8538c, 
    mode=XPCWrappedNative::CALL_METHOD) at xpcwrappednative.cpp:2393
#7  0xb7289fa5 in XPC_WN_CallMethod (cx=0xb2b3e70, obj=0xb38cd80, argc=2, 
    argv=0x1c1b23c0, vp=0xbfa854c8) at xpcwrappednativejsops.cpp:1473
#8  0xb7d5cc65 in js_Invoke (cx=0xb2b3e70, argc=2, vp=0x1c1b23b8, flags=2)
    at jsinterp.c:1297
#9  0xb7d50b2c in js_Interpret (cx=0xb2b3e70) at jsinterp.c:4857
#10 0xb7d5ccb4 in js_Invoke (cx=0xb2b3e70, argc=1, vp=0x1c1b23ac, flags=0)
    at jsinterp.c:1313
#11 0xb7d5cfb4 in js_InternalInvoke (cx=0xb2b3e70, obj=0xb25b120, 
    fval=188271040, flags=0, argc=1, argv=0x1c1b23a0, rval=0xbfa85a18)
    at jsinterp.c:1369
#12 0xb7d2832a in JS_CallFunctionValue (cx=0xb2b3e70, obj=0xb25b120, 
fval=188271Quit

And a look at what started this brokenness:

(gdb) frame 4
#4  0xb72bd0f3 in nsDNSService::Resolve (this=0xa031dd0, hostname=@0x18cd3b18, 
    flags=<value optimized out>, result=0xbfa851e4) at nsDNSService2.cpp:498
498	in nsDNSService2.cpp
(gdb) print hostname
$2 = (
    const nsACString_internal &) @0x18cd3b18: {<nsCSubstring_base> = {<No data fields>}, mData = 0x183bb0f0 "marc.info", mLength = 9, mFlags = 5}

So, yes indeed, marc.info is an NXDOMAIN, but why should all of Firefox hang while it figures that out?  Obviously something has lined all of Firefox up behind some kind of synchronous operation that is, for whatever reason, taking a long time to complete.


Reproducible: Sometimes

Steps to Reproduce:
1.
2.
3.

Comment 1

10 years ago
http://www.xulplanet.com/references/xpcomref/ifaces/nsIDNSService.html

nsIDNSRecord resolve ( AUTF8String hostName , PRUint32 flags )
"Called to synchronously resolve a hostname. warning this method may block the calling thread for a long period of time. it is extremely unwise to call this function on the UI thread of an application."

nsICancelable asyncResolve ( AUTF8String hostName , PRUint32 flags , nsIDNSListener listener , nsIEventTarget listenerTarget )
"Kicks off an asynchronous host lookup."

Yes, you "lined up all of Firefox" behind the operation by telling it to do a synchronous lookup in that thread. DNS resolves might take time, so if you're not in a thread that can take it then you should use asyncResolve().
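To make the asyncResolve() advice concrete, the asynchronous call can be wrapped in a Promise in code that runs on a much newer Gecko. Promise is assumed to be available here, which it was not for Firefox 3.0-era chrome code. This is a minimal sketch, not a Mozilla API: `resolveHostAsync` is a hypothetical helper, `dns` is assumed to be an nsIDNSService, `thread` is the event target that should receive the callback, and the listener follows the four-argument asyncResolve() signature quoted above together with the `onLookupComplete(request, record, status)` shape used by Flagfox.

```javascript
// Hypothetical helper (a sketch, not part of any Mozilla API): wraps the
// four-argument asyncResolve(hostName, flags, listener, listenerTarget) in a
// Promise, so the calling thread never blocks the way resolve() does.
// Assumes `dns` is an nsIDNSService and `thread` is the event target that should
// receive the callback; Promise itself is assumed available (not in Firefox 3.0).
function resolveHostAsync(dns, thread, host, flags = 0) {
  return new Promise((resolve, reject) => {
    const listener = {
      // Invoked later on `thread` with the outcome of the lookup.
      onLookupComplete(request, record, status) {
        // A nonzero nsresult, or a record with no addresses, means failure.
        if (status !== 0 || !record || !record.hasMore()) {
          reject(new Error(`DNS lookup for ${host} failed (status ${status})`));
          return;
        }
        resolve(record.getNextAddrAsString());
      },
    };
    // Returns immediately; the nsICancelable it returns is ignored in this sketch.
    dns.asyncResolve(host, flags, listener, thread);
  });
}
```

Usage would look like `resolveHostAsync(dns, thread, "marc.info").then(onAddress, onFailure)`, where `onAddress` and `onFailure` are hypothetical callbacks; the calling thread keeps processing events while the lookup is pending.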
Status: UNCONFIRMED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → INVALID

Updated

10 years ago
OS: Linux → All
Hardware: PC → All

Comment 2

10 years ago
Simplifying some bits of code in Flagfox...
How to look up a domain name asynchronously:
--------------------------------------------------
var dnsHandler =
{
    dns : null,     // DNS service
    thread : null,  // this thread; onLookupComplete() must be called in this thread

    init : function()
    {
        this.dns = Components.classes["@mozilla.org/network/dns-service;1"]
                             .getService(Components.interfaces.nsIDNSService);

        var EQS = Components.classes["@mozilla.org/event-queue-service;1"];
        if (EQS)  // Firefox 1.5 - 2.0
        {
            EQS = EQS.getService(Components.interfaces.nsIEventQueueService);
            this.thread = EQS.getSpecialEventQueue(EQS.CURRENT_THREAD_EVENT_QUEUE);
        }
        else      // Firefox 3.0+
        {
            this.thread = Components.classes["@mozilla.org/thread-manager;1"]
                                    .getService(Components.interfaces.nsIThreadManager)
                                    .currentThread;
        }
        if (!this.thread)
            throw new Error("Could not fetch current thread for DNS handler");
    },

    resolveHost : function(host)
    {
        this.dns.asyncResolve(host, 0, this, this.thread);
    },

    onLookupComplete : function(nsrequest, nsrecord, status)
    {
        if (status != 0 || !nsrecord || !nsrecord.hasMore())
            return;  // IP not found in DNS

        var ip = nsrecord.getNextAddrAsString();

        // DO SOMETHING
    }
};
--------------------------------------------------

dns.asyncResolve() returns an nsICancelable object that can be used to abort the lookup, so you can also keep track of the pending request(s) and cancel them if you need to. (I do this for one lookup at a time, but I didn't want to clutter up the code above.) The docs describe what else you can do here if you need to.
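The one-lookup-at-a-time bookkeeping can be sketched as follows. This is a minimal sketch assuming the four-argument asyncResolve() and its nsICancelable return value; `createSingleLookup` and `onResult` are hypothetical names, and the generation counter is one possible way to ignore late callbacks, not necessarily how Flagfox does it.

```javascript
// Hypothetical helper (a sketch, not Flagfox's actual code): keeps at most one
// lookup in flight by holding on to the nsICancelable returned by asyncResolve()
// and cancelling it before a new lookup starts.
// NS_ERROR_ABORT's numeric value is inlined so the sketch runs outside Gecko;
// chrome code would normally use Components.results.NS_ERROR_ABORT instead.
const NS_ERROR_ABORT = 0x80004004;

function createSingleLookup(dns, thread, onResult) {
  let pending = null; // nsICancelable for the lookup currently in flight
  let generation = 0; // incremented per lookup so stale callbacks can be recognized

  return function lookup(host) {
    if (pending) {
      pending.cancel(NS_ERROR_ABORT); // abort the superseded lookup
      pending = null;
    }
    const myGeneration = ++generation;
    const listener = {
      onLookupComplete(request, record, status) {
        // Whether a cancelled lookup still calls back is not assumed either way;
        // any callback from a superseded lookup is simply dropped.
        if (myGeneration !== generation) return;
        pending = null;
        if (status !== 0 || !record || !record.hasMore()) {
          onResult(host, null); // lookup failed or returned no address
          return;
        }
        onResult(host, record.getNextAddrAsString());
      },
    };
    pending = dns.asyncResolve(host, 0, listener, thread);
  };
}
```

Calling `lookup("a.example")` and then `lookup("b.example")` before the first finishes cancels the first request, and only the result for `b.example` reaches `onResult`.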

Just thought you might want to know how to call it correctly, since I could easily provide a code sample. If you don't need to maintain Gecko 1.8.x compatibility, you can drop the EQS (event queue service) branch and use only the newer thread-manager method.

Comment 3

10 years ago
(of course nothing fits inside the wordwrap here, but you get the gist)
(Reporter)

Comment 4

10 years ago
davegar,

Why did you change this bug to RESOLVED INVALID?  I'm not sure I understand how it's "resolved".

(In reply to comment #1)
> http://www.xulplanet.com/references/xpcomref/ifaces/nsIDNSService.html
> 
> nsIDNSRecord resolve ( AUTF8String hostName , PRUint32 flags )
> "Called to synchronously resolve a hostname. warning this method may block the
> calling thread for a long period of time. it is extremely unwise to call this
> function on the UI thread of an application."

Yes, I read that too.

> Yes, you
       ^^^
Who is the "you" to which you are referring?  Me, the user?

> "lined up all of Firefox" behind the operation by telling it to do a
> synchronous lookup in that thread.

I can't see how you could be referring to me.  As a user clicking on links, I have no control over whether a click results in a synchronous or asynchronous lookup.

> DNS resolves might take time, so if you're
> not in a thread that can take it then you should use asyncResolve().

Agreed.  But again, the "you" in that instruction can't be me.

Comment 5

10 years ago
You, as in the programmer who used resolve() where you shouldn't have. Seeing as you're giving me a stack here and inquiring specifically about one function's behavior, I'm assuming you're the one using it. If you weren't the one who wrote whatever is stalling, then you'll have to complain to that person. (if it's somewhere in Mozilla code then that would be a bug here)

This bug is INVALID because the initial complaint is that resolve() blocks the thread yet that is the intended behavior. It's doing what it's supposed to so it's not a bug. You asked why it should block in that function and my response is to tell you why and what to do instead, that's all.

If you're saying that this happens in Firefox by itself with no add-ons or anything, then that's another story entirely. If that's what you meant then your initial report and title "blocks for long periods in nsDNSService::Resolve" wasn't clear and I didn't understand you, because "nsDNSService::Resolve" is supposed to "block for long periods". ;)

Comment 6

10 years ago
As to the domain mentioned, I can load up marc.info with no problems. (and any known nonexistent domains without problems either)

(hint: leaving the steps to reproduce blank is a good way to not get across what the bug you're reporting is)

Comment 7

10 years ago
If this was just intended to be a bug complaining about general stalling on browsing to a page that DNS fails on, then just file another bug to that effect. (and sorry for any misunderstanding if there was one) Though, I suggest you first test in the Mozilla build with a new profile because Mozilla obviously doesn't have control over the Ubuntu build. Many reports filed for non-Mozilla builds are often marked INVALID as you're generally supposed to report problems in their build to them first.
(Reporter)

Comment 8

10 years ago
(In reply to comment #5)
> You, as in the programmer who used resolve() where you shouldn't have. Seeing
> as you're giving me a stack here and inquiring specifically about one
> function's behavior, I'm assuming you're the one using it. If you weren't the
> one who wrote whatever is stalling, then you'll have to complain to that
> person. (if it's somewhere in Mozilla code then that would be a bug here)

No.  I am a Firefox user, filing a bug report about a Firefox hang (notice the Product: Firefox in the header) and providing the stack trace as helpful information.

> This bug is INVALID because the initial complaint is that resolve() blocks the
> thread yet that is the intended behavior. It's doing what it's supposed to so
> it's not a bug. You asked why it should block in that function and my response
> is to tell you why and what to do instead, that's all.

Hrm.  I think you are misreading the report.  I think I was pretty clear that as a user I simply provided Firefox with a URL and the lookup of the domain name in that URL caused Firefox to hang, because it was using nsDNSService::Resolve.

> If you're saying that this happens in Firefox by itself with no add-ons

Well, I can't be sure it happens with *no* addons as I do have a few of them (who doesn't).

> or
> anything, then that's another story entirely.

That's (almost -- minus the *no addons*) what I'm saying.

> If that's what you meant then
> your initial report and title "blocks for long periods in
> nsDNSService::Resolve" wasn't clear

What would have been more clear than "[Firefox] blocks for long periods in nsDNSService::Resolve"?  The initial Firefox is of course implied by the "Product: Firefox" in the bug filing.

> and I didn't understand you, because
> "nsDNSService::Resolve" is supposed to "block for long periods". ;)

But I didn't write that did I?

Now can we have the status of this bug rectified to reflect what I actually wrote and that's that Firefox is blocking for long periods of time because it's using "nsDNSService::Resolve"?  This looks to me like a bug and it's not resolved.

(In reply to comment #6)
> As to the domain mentioned, I can load up marc.info with no problems. (and any
> known nonexistent domains without problems either)

Great.  Usually it fails gracefully here too, but lately it hasn't, for reasons I'm not sure of.
 
> (hint: leaving the steps to reproduce blank is a good way to not get across
> what the bug you're reporting is)

If I had a definite set of steps to reproduce I would have.

(In reply to comment #7)
> If this was just intended to be a bug complaining about general stalling on
> browsing to a page that DNS fails on, then just file another bug to that
> effect.

How would it be any different than this bug?  I could file some kind of dumb "firefox is stuck again.  please fix it" bug but how useful would that be?

> Though, I suggest
> you first test in the Mozilla build with a new profile because Mozilla
> obviously doesn't have control over the Ubuntu build. Many reports filed for
> non-Mozilla builds are often marked INVALID as you're generally supposed to
> report problems in their build to them first.

I would, but TBH, my opinion of the bug triaging over at Ubuntu (for Firefox at least) is pretty low.  I get all kinds of "try this, try that" stabs in the dark but no real programmatic analysis of problems.  Sad to say.

If you really think I should still open a new bug here, I will.  But tell me what I should do different than I did in this bug report.  I thought it was a concise description of the problem along with information to help analyze.  Should I provide less helpful information?

Comment 9

> No.  I am a Firefox user, filing a bug report about a Firefox hang (notice the
> Product: Firefox in the header) and providing the stack trace as helpful
> information.

Networking is not in the Firefox product; it's in Core (= Gecko), but the caller of this function could, of course, be outside of Gecko. Most people file all kinds of bugs in Firefox:General, and many of them are caused by add-ons (INVALID) or are in Gecko (Core). All we know when you use "Firefox" as the product is that you are using Firefox and not, for example, Thunderbird or SeaMonkey.

This bug is only useful if you test this without add-ons (Firefox safe mode).
In that case we have to find the caller, but I think this must be a "bug" in an add-on.
Do you get this hang if you enter a domain that really doesn't exist, such as example.inv?
(Reporter)

Comment 10

10 years ago
I've finally figured out what causes this.  IMHO, it shouldn't but it does -- so AFAIAC, this is a bug.

It was the DNS server returning "server error" (SERVFAIL).  To reproduce, configure a bind9 server so that a valid IP range is in its bogon list, and have bind9 blackhole bogons.

Then try to load a URL in that IP range, such that bind9 returns a server error to the client.