Open Bug 192797 Opened 22 years ago Updated 2 years ago

PR_Poll times out when it shouldn't (win9x only)

Component: NSPR :: NSPR
Type: defect
Platform: x86, Windows 98
Tracking: not tracked
Reporter: darin.moz; Assignee: unassigned
Attachments: 1 file, 2 obsolete files

From bug 192294: PR_Poll under Win9x sometimes times out (returns 0) when passed PR_INTERVAL_NO_TIMEOUT. From my investigation, the bug appears to be in the OS (_MD_SELECT returns 0 when tvp == NULL). I tried adding a do-while loop around _MD_SELECT, like this: do { ready = _MD_SELECT(..., tvp); } while (ready == 0 && tvp == NULL); but when we loop, the second _MD_SELECT call fails with Winsock error 10023 "Too many open files". I'm a bit suspicious of this error code, since I would have expected the first call to _MD_SELECT to return this error if that were really the problem.
10023 does not seem to be a Winsock error. At least it is not defined in <winsock2.h>. Could you tell me where it is defined? I suspect that the first call to _MD_SELECT modified the fd_sets, so we can't call _MD_SELECT on them again. I will attach a patch for you to try.
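The suspicion about the fd_sets matches how select() behaves: on both POSIX and Winsock, select() overwrites each set it is given so that on return it holds only the descriptors that are ready, and after a timeout every set is empty. The following minimal POSIX sketch (not NSPR or Winsock code; the helper name `still_set_after_select` is hypothetical) shows a descriptor dropping out of its set after a zero-timeout select() on an empty pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: puts fd in a read set, polls once without blocking,
   and reports whether fd is still a member of the set afterwards.
   *ready_out receives select()'s return value. */
static int still_set_after_select(int fd, int *ready_out)
{
    fd_set readfds;
    struct timeval zero = { 0, 0 };   /* poll, do not block */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    *ready_out = select(fd + 1, &readfds, NULL, NULL, &zero);

    /* select() has rewritten readfds: only ready descriptors remain,
       so the set is no longer the input we built above. */
    return FD_ISSET(fd, &readfds) != 0;
}
```

With nothing written to the pipe, select() returns 0 and the read end is no longer in the set; passing that set to a second call would ask select() to watch nothing.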
Attached patch: Experimental patch (obsolete)
This patch calls select() again if select() times out when we ask it not to. Please test this patch with this printf statement after the _MD_SELECT() call: printf("select with timeout %p returned %d, error %d\n", tvp, ready, (ready == SOCKET_ERROR) ? WSAGetLastError() : 0);
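Retrying select() after a spurious 0 only works if the sets are rebuilt before every attempt. Here is a minimal POSIX sketch of that pattern for a single descriptor polled for readability; `wait_readable` is a hypothetical name and this is not the code in the attached patch. A null `tvp` plays the role of PR_INTERVAL_NO_TIMEOUT, as in the comments above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: waits until fd is readable.
   tvp == NULL means "no timeout". Returns select()'s result
   from the final attempt (>0 ready, 0 timed out, -1 error). */
static int wait_readable(int fd, struct timeval *tvp)
{
    fd_set readfds;
    int ready;

    do {
        /* Rebuild the set on every attempt: the previous select()
           overwrote it with its results, so it cannot be reused. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        ready = select(fd + 1, &readfds, NULL, NULL, tvp);
        /* A 0 with no timeout requested is the spurious result this
           bug describes, so try again. A real timeout is returned. */
    } while (ready == 0 && tvp == NULL);

    return ready;
}
```

On a conforming POSIX system select() with a null timeout does not return 0, so the loop body normally runs once; the retry condition only matters on a platform that misbehaves the way Win9x does here.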
Attached patch: Experimental patch v2 (obsolete)
This version is slightly better.
Attachment #114181 - Attachment is obsolete: true
Attached patch: Full patch
This patch addresses all the issues I found reading the Winsock select() documentation:
1. Increase FD_SETSIZE to 1024. (The default value of FD_SETSIZE is only 64.) Make PR_Poll fail with PR_INSUFFICIENT_RESOURCES_ERROR if we try to add more sockets to the fd_set structures than they can hold.
2. Pass a pointer to an fd_set structure to select() only if the fd_set contains at least one socket.
3. If all three fd_set structures are empty, simply sleep for the specified timeout and return 0.
Please test this patch with the printf statement (see comment 2) after the _MD_SELECT() call. Thanks.
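The three points can be sketched as a small poll-over-select routine. This is a POSIX approximation under stated assumptions, not the patch itself: `poll_via_select`, `struct poll_entry`, the `WANT_*` flags, and `MAX_PER_SET` are hypothetical stand-ins for PR_Poll, PRPollDesc, the PR_POLL_* flags, and the enlarged FD_SETSIZE, and ENOMEM stands in for PR_INSUFFICIENT_RESOURCES_ERROR. Points 2 and 3 follow the Winsock select() documentation, which requires at least one non-null set and requires each non-null set to contain at least one socket; POSIX select() is more lenient, so on POSIX those checks are only defensive.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for PR_POLL_READ, PR_POLL_WRITE, PR_POLL_EXCEPT. */
enum { WANT_READ = 1, WANT_WRITE = 2, WANT_EXCEPT = 4 };

/* Hypothetical analogue of NSPR's PRPollDesc. */
struct poll_entry {
    int fd;
    int in_flags;   /* conditions the caller wants to wait for */
    int out_flags;  /* conditions select() reported */
};

/* Hypothetical per-set capacity, counted the way Winsock counts sockets. */
#define MAX_PER_SET 1024

/* Point 1: add fd to a set, failing instead of overflowing it. */
static int add_fd(fd_set *set, int *count, int fd, int *maxfd)
{
    if (fd < 0 || fd >= FD_SETSIZE) {   /* POSIX bound on descriptor values */
        errno = EINVAL;
        return -1;
    }
    if (*count >= MAX_PER_SET) {        /* like PR_INSUFFICIENT_RESOURCES_ERROR */
        errno = ENOMEM;
        return -1;
    }
    FD_SET(fd, set);
    (*count)++;
    if (fd > *maxfd)
        *maxfd = fd;
    return 0;
}

/* Returns the number of ready entries, 0 on timeout, -1 on error.
   A NULL timeout means "wait forever". */
static int poll_via_select(struct poll_entry *pds, int npds,
                           struct timeval *timeout)
{
    fd_set rd, wr, ex;
    int nrd = 0, nwr = 0, nex = 0, maxfd = -1, ready, i;

    FD_ZERO(&rd);
    FD_ZERO(&wr);
    FD_ZERO(&ex);
    for (i = 0; i < npds; i++) {
        int want = pds[i].in_flags;
        pds[i].out_flags = 0;
        if (((want & WANT_READ) && add_fd(&rd, &nrd, pds[i].fd, &maxfd) < 0) ||
            ((want & WANT_WRITE) && add_fd(&wr, &nwr, pds[i].fd, &maxfd) < 0) ||
            ((want & WANT_EXCEPT) && add_fd(&ex, &nex, pds[i].fd, &maxfd) < 0))
            return -1;
    }

    /* Point 3: nothing to watch, so just sleep for the timeout. */
    if (nrd + nwr + nex == 0) {
        struct timespec ts;
        if (timeout == NULL) {          /* sketch choice: never block forever */
            errno = EINVAL;
            return -1;
        }
        ts.tv_sec = timeout->tv_sec;
        ts.tv_nsec = (long)timeout->tv_usec * 1000L;
        nanosleep(&ts, NULL);
        return 0;
    }

    /* Point 2: pass only the sets that contain at least one descriptor. */
    ready = select(maxfd + 1, nrd ? &rd : NULL, nwr ? &wr : NULL,
                   nex ? &ex : NULL, timeout);
    if (ready <= 0)
        return ready;

    /* select() rewrote the sets in place; translate them back per entry. */
    ready = 0;
    for (i = 0; i < npds; i++) {
        int want = pds[i].in_flags;
        if ((want & WANT_READ) && FD_ISSET(pds[i].fd, &rd))
            pds[i].out_flags |= WANT_READ;
        if ((want & WANT_WRITE) && FD_ISSET(pds[i].fd, &wr))
            pds[i].out_flags |= WANT_WRITE;
        if ((want & WANT_EXCEPT) && FD_ISSET(pds[i].fd, &ex))
            pds[i].out_flags |= WANT_EXCEPT;
        if (pds[i].out_flags != 0)
            ready++;
    }
    return ready;
}
```

On Winsock, FD_SETSIZE limits how many sockets a set holds and is raised by defining it before including winsock2.h; on POSIX it bounds descriptor values instead, which is why the sketch keeps its own per-set count and also rejects out-of-range descriptors.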
Attachment #114183 - Attachment is obsolete: true
I should point out that this bug always seems to occur when necko is polling only for exceptions on a large number of sockets (~20 or so). I'll give your patch a try shortly ;-)
Oh, I should have said: although necko is polling a bunch of sockets for PR_POLL_EXCEPT only, the "pollable event" is still being polled for PR_POLL_READ. So my comment above is not quite correct.
I don't know whether it has anything to do with this, but I'll mention it anyway: back when I was programming the W3C wwwlib, I had to 'fix up' a loop when it was running on Windows NT. When I tried to clear a socket from an fd_set with FD_CLR, it sometimes failed to clear! So I wrapped it in a while loop and had one less 'bug' to worry about in the Win32 Winsock library. See http://lists.w3.org/Archives/Public/www-lib/2000AprJun/0124.html
Mrten: Thanks for the pointer to your bug fix. Our code is not using FD_CLR (we use FD_ZERO, FD_SET, and FD_ISSET), so the bug we are running into is not the same one.
So, is this patch good? Can it be approved and committed before 1.3 final? This seems simple enough that it can be put in, yes?
Chris: we've worked around this bug in the networking code above NSPR, so this bug is not very critical for 1.3.
Yeah, I saw that in bug #192294. But I figured this might still affect other code, and if there was a patch ready to go, it might as well land. Thanks.
QA Contact: wtchang → nspr
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody