Closed Bug 264268 Opened 20 years ago Closed 9 years ago

Implement alternative to PR_NewPollableEvent + PR_Poll for the socket transport service on Win32

Categories

(Core :: Networking, enhancement, P5)

x86
Windows XP
enhancement

Tracking

()

RESOLVED WONTFIX

People

(Reporter: darin.moz, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted)

Implement alternative to PR_NewPollableEvent + PR_Poll for the socket transport
service on Win32.

this code is notorious for triggering firewall alerts about mozilla opening
server sockets.  clearly, these are "false positives" since the server socket is
really just a loopback, connected socket pair that cannot be abused by an
attacker.  nonetheless, this warning surely makes users feel uneasy.

moreover, there are better ways to implement this socket + 'pollable event'
approach under Win32.  instead of changing the implementation of
PR_NewPollableEvent and PR_Poll in NSPR, it might be easier to simply fork
nsSocketTransportService2.cpp.  it's less than 600 lines of code to consider. 
we can probably refactor nsSocketTransportService2.cpp to further reduce the
amount of #ifdef'd code.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta
changing NSPR would allow other users of this function to share the
improvements, though
WTC and I spoke about doing this in NSPR, and the problem is that there are
limits on the number of socket events that can be waited on simultaneously, at
least under older versions of Windows.  That means that using
WaitForMultipleEvents may not be an option in general.

Since we already limit the number of sockets created by Necko, Necko seems like
the place where we could implement this with the least amount of pain.

Someone correct me if I'm wrong.
WaitForMultipleObjects can only wait for
MAXIMUM_WAIT_OBJECTS handles at a time.
MAXIMUM_WAIT_OBJECTS is defined to be 64.
This value doesn't seem to depend on the
Windows version.
Would IOCP work better for us here?  We could fall back to the current mechanism
if the system is too old (pre-NT 3.5?); I'm not nearly as worried about the
false alarms on older systems, myself.

IOCP also scales better than WFMO, which would make it a more appropriate
NSPR-level solution, possibly. 
According to MSDN:

> I/O completion ports are the mechanism by which an application uses a pool of 
> threads that was created when the application was started to process 
> asynchronous I/O requests.

So, IOCP implies the existance of one or more background threads.  What we're
looking for here is not the addition of new threads, because we already having a
threading model that we are happy with.

All we really need is a way to wait for any one of the following conditions to
occur:

 (1) a socket is ready for us to read, write, etc.
 (2) a task (PLEvent) is pending to be run on the socket transport thread
 (3) a timeout occured

We implement #2 today by using PR_NewPollableEvent, and then we call PR_Poll on
our sockets plus the event.  When we post a PLEvent to our queue, we tickle the
pollable event thus unblocking the call to PR_Poll.  That wakes up the socket
thread and it goes an processes the event queue.  Based on my testing, this I/O
mechanism is plenty fast.  On fast networks, it is not uncommon for necko to
stall with full buffers waiting for gecko to consume incoming data.  So, I think
we are doing a fine job of pulling data off the wire.

The idea behind WFMO is that it would allow us to replace the pollable event
with a Win32 Kernel Event object and hence avoid the problems with the pollable
event and firewalls.  I would be surprised if WFMO posed a performance problem
for us, given our usage pattern.

My main concern is the call to WSAEventSelect because there is not a clear
mapping between the various PR_POLL flags and the values for the lNetworkEvents
parameter in the 'connect' and 'accept' cases.  We'd need a way to express
FD_CONNECT and FD_ACCEPT, I think.  Hmm...
Blocks: 279707
no chance that i will get to this in time for gecko 1.8.  instead, i plan to
delay initialization of the socket transport service.  that'll probably happen
as part of bug 283049.
Depends on: 283049
Keywords: helpwanted
Target Milestone: mozilla1.8beta1 → Future
It occured to me that we might be able to eliminate the need for
PR_NewPollableEvent by making use of PR_Interrupt.
According to wtc:

  There is no way to unblock select() on Windows.
  So we call select with at most a 5 second timeout, and check the
  interrupt flag in between.  So the latency of PR_Poll's response
  to PR_Interrupt is 2.5 seconds on average.

In other words, using PR_Interrupt is not an option for us in this context.
can you use a pipe
(http://www.mozilla.org/projects/nspr/reference/html/priofnc.html#19605) for
this purpose?

(Maybe that's what the pollable event impl should use?)
> can you use a pipe
> ... for this purpose?

No, you cannot because Win32's select() function only works with sockets.
Following comment #6, and the comment here (about loading Socket and DNS
services lazily):
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsIOService.cpp#190

I implemented a mechanism of such lazy loading in nsIOService. This helped, but
the SocketTransportService was inited anyway a bit further down the road (with
the ZoneAlarm alert, of course):
http://lxr.mozilla.org/seamonkey/source/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp#109
(which was trigered by a call to 'nsNavigator::GetPlatform').

So, it seems that the lazy loading of nsSocketTransportService in nsIOService is
not the way to go. By the way - the ZoneAlarm alerts are especially suspicious
to users, if the user starts a stand alone XULRunner application that by its
functionality should never try and connect to the net. Should we maybe CC
bsmedberg on this?
Alex: then we should also lazily load nsHttpConnectionMgr.
(In reply to comment #12)
> Alex: then we should also lazily load nsHttpConnectionMgr.

I'm a bit concerned that there are several places out there that get the
nsSocketTransportService as a side effect, so fixing one might just bring out
the next one. I guess we can try this approach and see where it takes us.

I was looking at possible solutions of the original PR_NewPollableEvent problem.
I believe that instead of creating a socket and binding it to the loopback, we
can create a dummy, unbound socket. Closing it should cause 'select' to return,
and it can be recreated after that. What do you think?
Perhaps this bug should not be tackled by clever coding,
but by talking to Zone Labs.  We should ask Zone Labs why
they warn about a listening socket bound to the loopback
address.  Because of this bug, I bought and installed the
software firewall products of two other vendors (Symantec
Norton and McAfee), and neither of them warns about the
listening socket created by PR Pollable Event.
(In reply to comment #14)
> Perhaps this bug should not be tackled by clever coding,
> but by talking to Zone Labs.  We should ask Zone Labs why
> they warn about a listening socket bound to the loopback
> address.  
Actually, my Zone Alarm (4.5.538.000) only complains about the local connect
request and not about the listening socket. I gueass we could ask Zone Labs, but
even if they agree, there will still be many users with older versions. Also,
who knows what other firewalls are out there, and what are their policies...


Assignee: darin → nobody
Status: ASSIGNED → NEW
QA Contact: benc → networking
Target Milestone: Future → ---
Just a note that we are seeing this come up often with people having firewalls warn of opening a server socket.   Nom for 1.9 so we keep this on the Radar.
Flags: blocking1.9a2?
Flags: blocking1.9a2? → blocking1.9+
Ere or Robstrong, are you interested in this? Looking for an owner.
The chances of my finding the time to do this are very slim.
I'm interested though I'll need to dig into the sockets stuff to get familiar with them. But how about trying WaitForMultipleObjects in a loop for MAXIMUM_WAIT_OBJECTS objects with 0 timeout?
Ere, yes that is pretty much the right solution, but the problem is that it is very difficult to implement PR_Poll in terms of WaitForMultipleObjects because WFMO only supports waiting on at most 64 handles, whereas PR_Poll has no such restriction.  Two possibilities: 1) make NSPR have a limit on the number of PRPollDesc objects that can be passed to PR_Poll, 2) make nsSocketTransport use Windows directly.  Option #1 is fine for Firefox, but probably not fine in general for NSPR.  Option #2 is complicated by nsISocketProvider (which is how NSS is plugged in for TLS/SSL support).
It seems I asked something else I wanted to ask. What I meant is calling WFMO in a loop for MAXIMUM_WAIT_OBJECTS objects at a time with timeout of 0 and checking the return value. In practice:

1. ret=WFMO(MAXIMUM_WAIT_OBJECTS, <first MAXIMUM_WAIT_OBJECTS handles>, FALSE, 0)
2. break out if ret!=WAIT_TIMEOUT
3. ret=WFMO(MAXIMUM_WAIT_OBJECTS, <next MAXIMUM_WAIT_OBJECTS handles>, FALSE, 0)
4. break out if ret!=WAIT_TIMEOUT
5. repeat..

This would be done inside PR_Poll or such to make it transparent. I don't know of any downsides of doing this but it might be worth a shot.
This is called busy polling.  The downside of busy polling is that
it wastes CPU cycles.  Because of the 0 timeout, the thread that
executes this loop will consume all the cycles of the CPU it runs
on.
Yeah, that might cause trouble. I assume adding a small sleep is also out of the question. I know this might be overkill, but WFMO in multiple threads?
Given Windows' lame limit, I vote for NSPR growing hair to allow an embedding to use WFMO under PR_Poll. Not sure exactly how ugly that hair should be (nose hair? ear hair? yuck! :-P) but it would enable a new, incompatible mode that limits the number of "fd"s to the WFMO limit, and explicitly uses WFMO.

/be
Do the Mozilla clients pass less than 64 file descriptors to
PR_Poll?

We should also make sure we can support full duplex I/O in this new
implementation -- we need to be able to wait for "ready for reading"
"ready for writing" events on a socket simultaneously.
Another shot in the dark: Would it be possible to have a global "something happened" event and then the WFMO loop to check which socket it was?
Reassigning to jmathies per mconnor. Jim, let me know if you want to talk about this.
Assignee: nobody → jmathies
Does this really need to block 1.9? From reading the comments it sounds like this is more of a networking enhancement. I understand the issues with firewalls but considering the level of chatter on this bug I wonder if this is really a major issue? 
This is a consistent pain for users who update Firefox and have a firewall... fixing it would be a major usability win because it reduces the number of prompts that a new user sees who is running Firefox for the first time. Thus the blocking marker.
Priority: -- → P5
Flags: tracking1.9+ → wanted-next+
Assignee: jmathies → nobody
I was about to say, not to interrupt, and then I see the dates.

With the amount of ignorance, skepticism, and paranoia, amongst  'new users' I would figure that such an issue would have been hidden. Warnings, Alerts and other indicators form varied 'anti-X' software would put even those who are aware, on alert, at it's first appearance.

I understand that the loopback is being used on Windows, and is not needed for *nix.

This has me wondering what in Windows requires ..something else? Where 'something else' usually allows for holes of one kind or another. I bet, found or not, there is a way to exploit this (A software version of an inductor - is one such thought).

In short, it puts me off. And I can understand. Imagine the user who does not understand or can't - at all.

*shudder the eXPeriance*
Blocks: 551158
Actually, the fact that TB 3.1.6 (WinXP Pro SP3) often uses port 3689, thus stealing the port for streaming music from iTunes to my Airport Express streaming client, is a pain in the a**. As TB usually is one of the first apps I start up after booting the system and iTunes is only used once in a while, iTunes often does not see Aiport Express, because it cannot get the listening port. I just noticed by chance when I watched port usage in my personal firewall's TCP/UDP monitor.

I have no idea if there is a way for you to fix this in TB, but obviously if TB would not use listening sockets for internal communication, the probability of 3689 being free for iTunes would be considerably higher.
this is effectively not an issue afaict in the support channels anymore.. and the alternatives are not good.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.