I am going to use pt_poll_now in ptio.c as an example of this bug, but there are 7 or 8 copies of this code for various platforms and for select and poll, and they all have this bug.

If the original timeout or the local variable 'remaining' is greater than 0 but less than one millisecond, 'msecs' will be 0, and we will poll for 0 ms. If there are no I/O events, poll will time out, and we will add PR_MillisecondsToInterval(msecs), which is 0, to 'now'. So 'now' won't be advanced, and we will get stuck in the loop, polling with a 0 timeout.

The cause of the bug is that 'msecs' is computed from the original timeout or 'remaining' with PR_IntervalToMilliseconds, which rounds the result down. So PR_MillisecondsToInterval(msecs) can be smaller than the original timeout or 'remaining' because of the rounding. In the problematic case described above, msecs is 0, so it appears that the current time 'now' does not advance.

The fix is to avoid the PR_MillisecondsToInterval(msecs) call unless it is absolutely necessary. If the poll timeout is 'remaining', just use 'remaining' instead of recomputing it from 'msecs'.

This bug only affects NSPR clients using *blocking* sockets. The Mozilla client uses non-blocking sockets, so it is not affected by this bug.
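To make the rounding problem concrete, here is a minimal simulation of the loop. It is a sketch, not the NSPR code: interval_t, TICKS_PER_MS (an assumed rate of 100 ticks per millisecond), interval_to_ms, ms_to_interval, buggy_wait and fixed_wait are all hypothetical stand-ins, and poll is simulated as always timing out with no I/O events.

```c
#include <stdint.h>

/* Hypothetical tick rate, for illustration only; the real rate is
 * platform-specific. */
#define TICKS_PER_MS 100u

typedef uint32_t interval_t;   /* stand-in for PRIntervalTime */

/* Stand-in for PR_IntervalToMilliseconds: rounds down. */
static uint32_t interval_to_ms(interval_t ticks)
{
    return ticks / TICKS_PER_MS;
}

/* Stand-in for PR_MillisecondsToInterval. */
static interval_t ms_to_interval(uint32_t ms)
{
    return ms * TICKS_PER_MS;
}

/* Buggy loop: after each simulated poll timeout, time is advanced by
 * ms_to_interval(msecs). Returns the number of polls needed to reach the
 * timeout, or -1 if the loop is still stuck after max_iters polls. */
static int buggy_wait(interval_t timeout, int max_iters)
{
    interval_t elapsed = 0;
    interval_t remaining = timeout;
    int iters;

    for (iters = 1; iters <= max_iters; iters++) {
        uint32_t msecs = interval_to_ms(remaining);
        /* simulated poll(..., msecs) times out with no events */
        elapsed += ms_to_interval(msecs);   /* adds 0 when msecs == 0 */
        if (elapsed >= timeout)
            return iters;
        remaining = timeout - elapsed;
    }
    return -1;
}

/* Fixed loop: the poll timeout was derived from 'remaining', so time is
 * advanced by 'remaining' itself rather than by the rounded value. */
static int fixed_wait(interval_t timeout, int max_iters)
{
    interval_t elapsed = 0;
    interval_t remaining = timeout;
    int iters;

    for (iters = 1; iters <= max_iters; iters++) {
        uint32_t msecs = interval_to_ms(remaining);
        (void)msecs;                        /* simulated poll(..., msecs) */
        elapsed += remaining;               /* no round trip through msecs */
        if (elapsed >= timeout)
            return iters;
        remaining = timeout - elapsed;
    }
    return -1;
}
```

With these assumed values, buggy_wait(50, 1000) returns -1: the timeout is half a millisecond, msecs is 0, and the simulated clock never advances. buggy_wait(250, 1000) also returns -1, because after the first 2 ms poll the leftover 50 ticks hit the same case. fixed_wait returns 1 for both inputs.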
I just wanted to acknowledge that Jeff Stewart of Good Technology reported and tracked down this bug and explained it to me.
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 4.2
The fix has been checked into the tip of NSPR and is in NSPR 4.2.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
This patch, excluding the OS/2 portion (os2sock.c), has also been checked into the NSPR 4.1 branch. OS/2 was excluded because the socket code underwent significant changes in NSPR 4.2, and this patch was made against 4.2.