Closed Bug 219913 Opened 22 years ago Closed 22 years ago

if gethostbyname() blocks because a hostname gets stuck resolving, the entire browser becomes unusable (with fix)

Categories

(NSPR :: NSPR, defect)

x86
OS/2
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: enigma2, Assigned: jhpedemonte)

Details

(Keywords: qawanted)

Attachments

(1 file, 2 obsolete files)

User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.4.1) Gecko/20030830 Build Identifier: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.4.1) Gecko/20030830 if gethostbyname() blocks because a hostname gets stuck resolving, the entire browser becomes unusable, (with fix) Reproducible: Always Steps to Reproduce: 1.x 2. 3. Heres some code I've written for one of my projects (OS/2 Based) that is basically gethostbyname() with a timeout value (default: 10 seconds), its quite simple, and works perfectly. Unfortunately I'm not familiar with Mozilla's code so I can't make a patch or convert the OS/2 api calls to everything else, but I'm hoping someone can help me with this. Heres the code: typedef struct _nbhoststruct { const char *hostname; int gotname, dienow; struct in_addr inaddr; } nbhoststruct; void waitforhostname(nbhoststruct *nbparams) { struct hostent *host; if ((host = gethostbyname(nbparams->hostname))) { memcpy(&nbparams->inaddr, host->h_addr, sizeof(struct in_addr)); nbparams->gotname=1; } else { nbparams->gotname=2; } while (!nbparams->dienow) DosSleep(1); free((char *)nbparams->hostname); free(nbparams); } int nbgethostbyname(const char *name) { nbhoststruct *nbparams; time_t connectionstart; struct in_addr inaddr; nbparams = malloc(sizeof(nbhoststruct)); nbparams->inaddr.s_addr = 0; nbparams->gotname = 0; nbparams->dienow = 0; nbparams->hostname = strdup(name); if (_beginthread((void *)&waitforhostname,NULL,0x8000,(void*)nbparams) == -1) { free((char *)nbparams->hostname); free(nbparams); return 0; } connectionstart = time(NULL); while (!interrupt_connect && !nbparams->gotname && time(NULL)<connectionstart+10) DosSleep(1); // interrupt_connect is just a global int you can use to cancel all // pending resolves.. inaddr.s_addr = nbparams->inaddr.s_addr; nbparams->dienow = 1; return inaddr.s_addr; }
I don't think that you can just interrupt gethostbyname() like that - at least not in a portable way. But you're right, we would really need that timeout value :-) Note that we just redesigned the whole DNS lookup service (bug 205726). In particular, bug 70213. We can now support multiple parrallel lookups, because we switched form gethostbyname() to getaddrinfo(), when it's available on the platform. But I don't know if that syscall is available on OS/2 (it exists in Windows XP).
OS/2 doesn't have getaddrinfo() but I'm sure my code can be modified to emulate it - MKaply you reading this? :)
I'll leave this one to Javier to look at :)
OS: other → OS/2
uhm... have you tried building mozilla from the trunk? the new DNS implementation calls gethostbyname on background threads, so provided OS/2 supports re-entrant gethostbyname (which i believe it does) this bug should be a non-issue, and it would probably be a duplicate of bug 70213.
The trunk nightlies should be back up soon. I'll leave this one open until the reporter has had a chance to test with a trunk build.
I just tested with mozilla 1.5rc2 and the resolver still hangs, can you please check to make sure the serialized resolver code has made it into the OS/2 port?
1,5RC2 is a brnach build and I don't think that they fix that in the branch because it's a risky change
rasta: Sorry, I didn't explain myself well enough. The nightly builds are considered to be the 1.6 version. As Matti pointed out, the 1.5 branch does not include Darin's fix for this issue. Try the nightly build at http://ftp.mozilla.org/pub/mozilla/nightly/2003-09-29-08-trunk/mozilla-os2.zip (requires libc02). That one works for me. Resolving as WORKSFORME, using the nightly build.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
Not here, load http://ftp.mozilla.org/pub/mozilla/nightly/2003-09-29-08-trunk/mozilla-os2.zip, open 2 tabs, in the first one go to www.fucka.com (dont ask), then in the second tab try to load google. notice google won't load because its stuck resolving www.fucka.com
Reopening after confirming comment #9. For some reason, this fails only on OS/2.
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
>For some reason, this fails only on OS/2. Javier, this turns out to be caused by the fact that _PR_HAVE_THREADSAFE_GETHOST is not defined in nsprpub/pr/include/md/_os2.h. the result is that NSPR forces gethostbyname calls to be synchronized. gethostbyname promised to return a pointer to statically allocated memory. that typically makes it non-reentrant. however, some OS vendors make gethostbyname return a pointer to thread local storage, thereby making the function thread-safe. if OS/2 implements gethostbyname in a thread safe manner, then we can easily fix this bug by adding that define to NSPR. if not, then there isn't much we can do. does OS/2 provide getaddrinfo or gethostbyname_r? (i doubt it.)
it should be no problem to define _PR_HAVE_THREADSAFE_GETHOST as several of my OS/2 apps use gethostbyname() between threads, sometimes concurrently without any problems. OS/2 has no getaddrinfo or gethostbyname_r though.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
-> me
Assignee: darin → pedemont
Status: ASSIGNED → NEW
Attached patch patch (obsolete) — Splinter Review
As specified by the dev docs: "The sockets and network utility routines are completely reentrant. Multiple threads of an application can perform any socket call." So I define _PR_HAVE_THREADSAFE_GETHOST in configure.in. Thanks, Darin.
Attachment #135442 - Flags: review?(darin)
Comment on attachment 135442 [details] [diff] [review] patch r=darin the alternative is to define this macro in _os2.h because it does not need to be dynamically determined (based on OS version or something like that). however, there seems to be some convention here for AC_DEFINE'ing such macros in configure.in. wtc: what do you think? :)
Attachment #135442 - Flags: superreview?(wchang0222)
Attachment #135442 - Flags: review?(darin)
Attachment #135442 - Flags: review+
changing product to NSPR
Component: Networking → NSPR
Product: Browser → NSPR
Version: Trunk → 4.2
Attached patch patch v1.1 (obsolete) — Splinter Review
I actually like your way better, Darin. I also moved some other defines from configure.in to _os2.h.
Attachment #135442 - Attachment is obsolete: true
Attachment #135498 - Flags: superreview?(wchang0222)
Attachment #135498 - Flags: review?(darin)
Attachment #135442 - Flags: superreview?(wchang0222)
Attachment #135442 - Flags: review+
Comment on attachment 135498 [details] [diff] [review] patch v1.1 r=wtc. You can define these macros in either configure.in or _os.h.
Attachment #135498 - Flags: superreview?(wchang0222) → superreview+
Comment on attachment 135498 [details] [diff] [review] patch v1.1 r=darin
Attachment #135498 - Flags: review?(darin) → review+
The only difference from Javier's patch v1.1 is that I moved the definition of _PR_GLOBAL_THREADS_ONLY and _PR_HAVE_THREADSAFE_GETHOST to the beginning of _os2.h, near the definition of other _PR_XXX macros.
Attachment #135498 - Attachment is obsolete: true
Patch checked in on the NSPR trunk (NSPR 4.5) and NSPRPUB_PRE_4_2_CLIENT_BRANCH (Mozilla 1.6b).
Status: NEW → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.5
+qawanted. If someone w/ OS/2 can confirm this is working, please assign youself QA and mark it VERIFIED.
Status: RESOLVED → VERIFIED
Keywords: qawanted
Its definately fixed here
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: