Closed Bug 13660 Opened 25 years ago Closed 25 years ago

PL_ProcessPendingEvents() doesn't just processing pending events

Categories

(NSPR :: NSPR, defect, P1)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: troy, Assigned: travis)

References

Details

(Whiteboard: [PDT+])

Attachments

(2 files)

There is a serious problem with PL_ProcessPendingEvents() on Windows. The
problem is that it doesn't just process the pending events it keeps processing
events until the queue is empyt

For a single threaded usage this is okay, but if there are multiple threads and
one thread is posting events while the other thread receives the events this
can result in PL_ProcessPendingEvents() never yielding

Specifically what happens is that the network thread is receiving socket
notifications and when it receives data it posts an event to the ui thread
queue. The ui thread queue calls PL_ProcessPendingEvents() and starts processing
events. It can take quite a while to process an event, because we parse the
HTML, create frames, and then layout the frames.

Meanwhile, the network thread continues to add new events to the queue, and we
continue to process the newly added events

That means we never return to the native event queue and so we don't process
normal Windows messages. That means mouse/keyboard/paint events are locked out
(as well as all other Windows messages).
I have a suggested patch that I used in my tree here at home. What it does is
have PL_ProcessPendingEvents() only process the events that are already in the
queue.

It does that by making a temporary queue, copying the existing events into that
queue, and then processing those events.

Alternatively we could count the number of existing events and only process that
number of events, but my suggested approach has the advantage of only requiring
one monitor lock while the events are copied over. We don't need to lock/unlock
each time we call PL_GetEvent() like we do today.

Counting the number of events in the queue requires locking the monitor to count
the number of events, then unlocking the monitor. Plus a lock/unlock by
PL_GetEvent() for each events retrieved

Maybe if NSPR 3.0 starts using fast locks using the "lock" instruction this
won't be an issue, but today on NT the locks are done using
EnterCriticalSection() and that's very slow and requires a round trip to ring 0
Attached file suggested patch
The PLEvent module was to be moved from NSPR to Mozilla core quite sometime
back, but it hasn't happened so far.

It will be good to make the move now. Any takers?
Brendan, we really need this bug fixed. Are you the right one to decide where
the PLEvent code moves?
Assignee: srinivas → travis
I own this now.  I will be moving it out of NSPR.
Status: NEW → ASSIGNED
Change of plan...  After trying to pull PLEvents out, I ran into a problem
where I was needing to copy a number of private include files as well.  Since
we eventually want to re-write PLEvents in an XPCOM fashion, I am going to push
off removing PLEvents from NSPR until we get to that step.  For now I am going
to apply Troy's changes in place to NSPR.  Brendan and I talked about this and
seemed like the best approach for right now.  srinivas, any problem with that?
Ok, you can checkin the modifications to PLEvent.
Priority: P3 → P1
Target Milestone: M11
Why is this a blocker?
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I didn't have the rights to modify the files so srinivas@netscape.com checked
the fix in.
everybody,  check the side effect that the fix may have produced
http://bugzilla.mozilla.org/show_bug.cgi?id=14325

is this connected?  who do we get the benefits with out the side effect?
Should this be reopened if the fix was backed out?
The fix wasn't backed out. It was marked #if defined(WIN32)
Blocks: 14469
Status: RESOLVED → REOPENED
I know this new code is only turned on for Windows now, but for a while, before
it was Windows only, I was running with it on OpenVMS. Or at least trying to.
Its severely broken.

In PL_ProcessPendingEvents we make a copy of the event queue WITHOUT a monitor.
Then we read bytes from the notification pipe and decrement notifyCount in the
copy of the queue, thus leaving the real notifyCount in the real event queue
holding a number which no longer represents the number of bytes in the pipe.

The result is that the notification mechanism stalls and/or we go compute
bound on the select.
queue. Hence we get out of whack.
On Windows we don't to lock the monitor when getting events from the queue. So
it's not broken on Windows.

You're probably right about OpenVMS and reading from the pipe; I have no idea
how that code works
Sorry, I should have said, OpenVMS builds pretty much as Unix.

If Windows doesn't use the notification pipe, and the new code will never
be turned on for anything BUT Windows, then all is well. I was just worried
that the new code was going to get turned on for everyone again in M11.
I'm glad you identified the problem. Everyone was wondering why we had a
problem with Linux, but no one has had time to sit down and take a good look.

Thanks for finding the problem
Resolution: FIXED → ---
Would Mac benefit from a Windows-like fix here?

/be
Whiteboard: [PDT+]
Putting on [PDT+] radar.
m12
Isn't this one fixed now? I was looking at the code yesterday and
PL_ProcessPendingEvents now counts the number of entries on the queue
when it is called, and then only processes this many. This has fixed the
problem with events being added to the queue while the original ones are
still be processed, and done so in a safe manner (the previous attempt to
fix this problem was very unsafe).
Status: REOPENED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
This is so fixed (although indentation style in plevent.c is shot to hell!).

/be
Status: RESOLVED → VERIFIED
marking Verified.
Target Milestone: M12 → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: