Closed Bug 13660 Opened 25 years ago Closed 25 years ago

PL_ProcessPendingEvents() doesn't just processing pending events

Categories

(NSPR :: NSPR, defect, P1)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: troy, Assigned: travis)

References

Details

(Whiteboard: [PDT+])

Attachments

(2 files)

There is a serious problem with PL_ProcessPendingEvents() on Windows. The problem is that it doesn't just process the pending events it keeps processing events until the queue is empyt For a single threaded usage this is okay, but if there are multiple threads and one thread is posting events while the other thread receives the events this can result in PL_ProcessPendingEvents() never yielding Specifically what happens is that the network thread is receiving socket notifications and when it receives data it posts an event to the ui thread queue. The ui thread queue calls PL_ProcessPendingEvents() and starts processing events. It can take quite a while to process an event, because we parse the HTML, create frames, and then layout the frames. Meanwhile, the network thread continues to add new events to the queue, and we continue to process the newly added events That means we never return to the native event queue and so we don't process normal Windows messages. That means mouse/keyboard/paint events are locked out (as well as all other Windows messages).
I have a suggested patch that I used in my tree here at home. What it does is have PL_ProcessPendingEvents() only process the events that are already in the queue. It does that by making a temporary queue, copying the existing events into that queue, and then processing those events. Alternatively we could count the number of existing events and only process that number of events, but my suggested approach has the advantage of only requiring one monitor lock while the events are copied over. We don't need to lock/unlock each time we call PL_GetEvent() like we do today. Counting the number of events in the queue requires locking the monitor to count the number of events, then unlocking the monitor. Plus a lock/unlock by PL_GetEvent() for each events retrieved Maybe if NSPR 3.0 starts using fast locks using the "lock" instruction this won't be an issue, but today on NT the locks are done using EnterCriticalSection() and that's very slow and requires a round trip to ring 0
Attached file suggested patch
The PLEvent module was to be moved from NSPR to Mozilla core quite sometime back, but it hasn't happened so far. It will be good to make the move now. Any takers?
Brendan, we really need this bug fixed. Are you the right one to decide where the PLEvent code moves?
Assignee: srinivas → travis
I own this now. I will be moving it out of NSPR.
Status: NEW → ASSIGNED
Change of plan... After trying to pull PLEvents out, I ran into a problem where I was needing to copy a number of private include files as well. Since we eventually want to re-write PLEvents in an XPCOM fashion, I am going to push off removing PLEvents from NSPR until we get to that step. For now I am going to apply Troy's changes in place to NSPR. Brendan and I talked about this and seemed like the best approach for right now. srinivas, any problem with that?
Ok, you can checkin the modifications to PLEvent.
Priority: P3 → P1
Target Milestone: M11
Why is this a blocker?
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I didn't have the rights to modify the files so srinivas@netscape.com checked the fix in.
everybody, check the side effect that the fix may have produced http://bugzilla.mozilla.org/show_bug.cgi?id=14325 is this connected? who do we get the benefits with out the side effect?
Should this be reopened if the fix was backed out?
The fix wasn't backed out. It was marked #if defined(WIN32)
Blocks: 14469
Status: RESOLVED → REOPENED
I know this new code is only turned on for Windows now, but for a while, before it was Windows only, I was running with it on OpenVMS. Or at least trying to. Its severely broken. In PL_ProcessPendingEvents we make a copy of the event queue WITHOUT a monitor. Then we read bytes from the notification pipe and decrement notifyCount in the copy of the queue, thus leaving the real notifyCount in the real event queue holding a number which no longer represents the number of bytes in the pipe. The result is that the notification mechanism stalls and/or we go compute bound on the select. queue. Hence we get out of whack.
On Windows we don't to lock the monitor when getting events from the queue. So it's not broken on Windows. You're probably right about OpenVMS and reading from the pipe; I have no idea how that code works
Sorry, I should have said, OpenVMS builds pretty much as Unix. If Windows doesn't use the notification pipe, and the new code will never be turned on for anything BUT Windows, then all is well. I was just worried that the new code was going to get turned on for everyone again in M11.
I'm glad you identified the problem. Everyone was wondering why we had a problem with Linux, but no one has had time to sit down and take a good look. Thanks for finding the problem
Resolution: FIXED → ---
Would Mac benefit from a Windows-like fix here? /be
Whiteboard: [PDT+]
Putting on [PDT+] radar.
m12
Isn't this one fixed now? I was looking at the code yesterday and PL_ProcessPendingEvents now counts the number of entries on the queue when it is called, and then only processes this many. This has fixed the problem with events being added to the queue while the original ones are still be processed, and done so in a safe manner (the previous attempt to fix this problem was very unsafe).
Status: REOPENED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
This is so fixed (although indentation style in plevent.c is shot to hell!). /be
Status: RESOLVED → VERIFIED
marking Verified.
Target Milestone: M12 → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: