if gethostbyname() blocks because a hostname gets stuck resolving, the entire browser becomes unusable (with fix)

VERIFIED FIXED in 4.5

Status

--
major
VERIFIED FIXED
15 years ago
15 years ago

People

(Reporter: enigma2, Assigned: jhpedemonte)

Tracking

({qawanted})

x86
OS/2
qawanted

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

15 years ago
User-Agent:       Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.4.1) Gecko/20030830
Build Identifier: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.4.1) Gecko/20030830

if gethostbyname() blocks because a hostname gets stuck resolving, the entire
browser becomes unusable, (with fix)

Reproducible: Always

Steps to Reproduce:
1.x
2.
3.




Heres some code I've written for one of my projects (OS/2 Based) that is
basically gethostbyname() with a timeout value (default: 10 seconds), its quite
simple, and works perfectly.  Unfortunately I'm not familiar with Mozilla's code
so I can't make a patch or convert the OS/2 api calls to everything else, but
I'm hoping someone can help me with this.  Heres the code:

typedef struct _nbhoststruct {
    const char *hostname;
    int   gotname, dienow;
    struct in_addr inaddr;
} nbhoststruct;

void waitforhostname(nbhoststruct *nbparams) {
    struct hostent *host;

    if ((host = gethostbyname(nbparams->hostname))) {
        memcpy(&nbparams->inaddr, host->h_addr, sizeof(struct in_addr));
        nbparams->gotname=1;
    } else {
        nbparams->gotname=2;
    }
    while (!nbparams->dienow)
        DosSleep(1);
    free((char *)nbparams->hostname);
    free(nbparams);
}

int nbgethostbyname(const char *name) {
    nbhoststruct *nbparams;
    time_t connectionstart;
    struct in_addr inaddr;

    nbparams = malloc(sizeof(nbhoststruct));
    nbparams->inaddr.s_addr = 0;
    nbparams->gotname = 0;
    nbparams->dienow = 0;
    nbparams->hostname = strdup(name);
    if (_beginthread((void *)&waitforhostname,NULL,0x8000,(void*)nbparams) == -1) {
        free((char *)nbparams->hostname);
        free(nbparams);
        return 0;
    }

    connectionstart = time(NULL);

    while (!interrupt_connect && !nbparams->gotname &&
time(NULL)<connectionstart+10)
        DosSleep(1);

    // interrupt_connect is just a global int you can use to cancel all
    // pending resolves..
    inaddr.s_addr = nbparams->inaddr.s_addr;
    nbparams->dienow = 1;
    return inaddr.s_addr;
}

Comment 1

15 years ago
I don't think that you can just interrupt gethostbyname() like that - at least
not in a portable way. But you're right, we would really need that timeout value :-)

Note that we just redesigned the whole DNS lookup service (bug 205726). In
particular, bug 70213. We can now support multiple parrallel lookups, because we
switched form gethostbyname() to getaddrinfo(), when it's available on the
platform. But I don't know if that syscall is available on OS/2 (it exists in
Windows XP).
(Reporter)

Comment 2

15 years ago
OS/2 doesn't have getaddrinfo() but I'm sure my code can be modified to emulate
it  - MKaply you reading this? :)

Comment 3

15 years ago
I'll leave this one to Javier to look at :)
OS: other → OS/2

Comment 4

15 years ago
uhm... have you tried building mozilla from the trunk?  the new DNS
implementation calls gethostbyname on background threads, so provided OS/2
supports re-entrant gethostbyname (which i believe it does) this bug should be a
non-issue, and it would probably be a duplicate of bug 70213.
(Assignee)

Comment 5

15 years ago
The trunk nightlies should be back up soon.  I'll leave this one open until the
reporter has had a chance to test with a trunk build.
(Reporter)

Comment 6

15 years ago
I just tested with mozilla 1.5rc2 and the resolver still hangs, can you please
check to make sure the serialized resolver code has made it into the OS/2 port?
1,5RC2 is a brnach build and I don't think that they fix that in the branch
because it's a risky change
(Assignee)

Comment 8

15 years ago
rasta:  Sorry, I didn't explain myself well enough.  The nightly builds are
considered to be the 1.6 version.  As Matti pointed out, the 1.5 branch does not
include Darin's fix for this issue.  Try the nightly build at
http://ftp.mozilla.org/pub/mozilla/nightly/2003-09-29-08-trunk/mozilla-os2.zip
(requires libc02).  That one works for me.

Resolving as WORKSFORME, using the nightly build.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 9

15 years ago
Not here, load
http://ftp.mozilla.org/pub/mozilla/nightly/2003-09-29-08-trunk/mozilla-os2.zip,
open 2 tabs, in the first one go to www.fucka.com (dont ask), then in the second
tab try to load google.  notice google won't load because its stuck resolving
www.fucka.com 

(Assignee)

Comment 10

15 years ago
Reopening after confirming comment #9.

For some reason, this fails only on OS/2.
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---

Comment 11

15 years ago
>For some reason, this fails only on OS/2.

Javier, this turns out to be caused by the fact that _PR_HAVE_THREADSAFE_GETHOST
is not defined in nsprpub/pr/include/md/_os2.h.  the result is that NSPR forces
gethostbyname calls to be synchronized.  gethostbyname promised to return a
pointer to statically allocated memory.  that typically makes it non-reentrant.
 however, some OS vendors make gethostbyname return a pointer to thread local
storage, thereby making the function thread-safe.  if OS/2 implements
gethostbyname in a thread safe manner, then we can easily fix this bug by adding
that define to NSPR.  if not, then there isn't much we can do.  does OS/2
provide getaddrinfo or gethostbyname_r?  (i doubt it.)
(Reporter)

Comment 12

15 years ago
it should be no problem to define _PR_HAVE_THREADSAFE_GETHOST as several of my
OS/2 apps use gethostbyname() between threads, sometimes concurrently without
any problems.  OS/2 has no getaddrinfo or gethostbyname_r though.
(Assignee)

Updated

15 years ago
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Assignee)

Comment 13

15 years ago
-> me
Assignee: darin → pedemont
Status: ASSIGNED → NEW
(Assignee)

Comment 14

15 years ago
Created attachment 135442 [details] [diff] [review]
patch

As specified by the dev docs: "The sockets and network utility routines are
completely reentrant. Multiple threads of an application can perform any socket
call."	So I define _PR_HAVE_THREADSAFE_GETHOST in configure.in.  Thanks,
Darin.
(Assignee)

Updated

15 years ago
Attachment #135442 - Flags: review?(darin)

Comment 15

15 years ago
Comment on attachment 135442 [details] [diff] [review]
patch

r=darin

the alternative is to define this macro in _os2.h because it does not need to
be dynamically determined (based on OS version or something like that).

however, there seems to be some convention here for AC_DEFINE'ing such macros
in configure.in.  

wtc: what do you think? :)
Attachment #135442 - Flags: superreview?(wchang0222)
Attachment #135442 - Flags: review?(darin)
Attachment #135442 - Flags: review+

Comment 16

15 years ago
changing product to NSPR
Component: Networking → NSPR
Product: Browser → NSPR
Version: Trunk → 4.2
(Assignee)

Comment 17

15 years ago
Created attachment 135498 [details] [diff] [review]
patch v1.1

I actually like your way better, Darin.  I also moved some other defines from
configure.in to _os2.h.
Attachment #135442 - Attachment is obsolete: true
(Assignee)

Updated

15 years ago
Attachment #135498 - Flags: superreview?(wchang0222)
Attachment #135498 - Flags: review?(darin)
(Assignee)

Updated

15 years ago
Attachment #135442 - Flags: superreview?(wchang0222)
Attachment #135442 - Flags: review+

Comment 18

15 years ago
Comment on attachment 135498 [details] [diff] [review]
patch v1.1

r=wtc.	You can define these macros in either configure.in or _os.h.
Attachment #135498 - Flags: superreview?(wchang0222) → superreview+

Comment 19

15 years ago
Comment on attachment 135498 [details] [diff] [review]
patch v1.1

r=darin
Attachment #135498 - Flags: review?(darin) → review+

Comment 20

15 years ago
Created attachment 135618 [details] [diff] [review]
patch v1.2 (checked in)

The only difference from Javier's patch v1.1 is that
I moved the definition of _PR_GLOBAL_THREADS_ONLY and
_PR_HAVE_THREADSAFE_GETHOST to the beginning of _os2.h,
near the definition of other _PR_XXX macros.

Updated

15 years ago
Attachment #135498 - Attachment is obsolete: true

Comment 21

15 years ago
Patch checked in on the NSPR trunk (NSPR 4.5) and
NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.6b).
Status: NEW → RESOLVED
Last Resolved: 15 years ago15 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.5

Comment 22

15 years ago
+qawanted.

If someone w/ OS/2 can confirm this is working, please assign youself QA and
mark it VERIFIED.
Status: RESOLVED → VERIFIED
Keywords: qawanted
(Reporter)

Comment 23

15 years ago
Its definately fixed here
You need to log in before you can comment on or make changes to this bug.