Closed Bug 67370 - interleaving events and xlib events
Opened 24 years ago; closed 23 years ago

Component: Core :: XPCOM (defect)
Platform: x86 Linux
Severity: normal
Status: RESOLVED FIXED
Reporter: blizzard
Assignee: blizzard
Attachments: 3 files

This is a tracking bug.  I want to see if interleaving xlib events and plevent
events is a win or not.
Attached patch: patch #1
Have to remember to remove the static nsVoidArray this adds.
Status: NEW → ASSIGNED
Keywords: patch
Adding dougt + danm.  Guys, there's a patch in here that mucks with gtk events
that I want you guys to look at.
A quickish look tells me it probably won't hurt anything. Curious, though -- 
processing nspr events during gtk event processing would be basically increasing 
the priority of nspr events. Whazzup?
It doesn't really increase the priority of nspr events exactly.  It actually
puts them on equal footing, since we only process the events already in the queue
whose id is at or before the serial of the current X event.

The most important thing this fixes is that when X is really busy we aren't
completely starving the plevent queue, and vice versa.  That's what all the 'id'
stuff added to the PLEvent object is for.
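
For illustration, a rough sketch of the idea (hypothetical names and a
simplified queue; the real change lives in plevent.c/plevent.h): each posted
event is stamped with the X serial that was current at post time, and the
processing loop drains only events stamped at or before a given serial,
stopping short at the first later one.

/* Illustrative sketch only -- not the real plevent.c.  Each posted event
 * remembers the X serial current at post time; processing drains only the
 * events stamped at or before the serial of the X event being handled. */
typedef struct SketchEvent {
    unsigned long       id;      /* X serial current when the event was posted */
    void              (*handler)(struct SketchEvent *self);
    struct SketchEvent *next;
} SketchEvent;

typedef struct SketchQueue {
    SketchEvent  *head, *tail;
    unsigned long current_serial;   /* updated as X events are dispatched */
} SketchQueue;

static void sketch_post_event(SketchQueue *q, SketchEvent *ev)
{
    ev->id = q->current_serial;     /* stamp with the latest X serial seen */
    ev->next = NULL;
    if (q->tail) q->tail->next = ev; else q->head = ev;
    q->tail = ev;
}

/* Process queued events up to and including those stamped with 'serial';
 * stop short at the first event that belongs after a later X event. */
static int sketch_process_events_before_id(SketchQueue *q, unsigned long serial)
{
    int count = 0;
    while (q->head && q->head->id <= serial) {
        SketchEvent *ev = q->head;
        q->head = ev->next;
        if (!q->head) q->tail = NULL;
        ev->handler(ev);
        ++count;
    }
    return count;                   /* number of events processed */
}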
Attached patch: patch #2
OK, I need reviews on this patch.  It's done enough and people don't seem to see
any bad effects from it.

I need jrgm [ now cc'ed on this bug ] to apply this patch to a linux build and
do a perf run on it to see if it affects page load times, since I'll be yelled
at if it does.
Okay, I'm pulling a new build now, and I'll have an answer later this evening.
Just to confirm: patch #2, right?
Okay, so I ran this patch on a build with '--disable-debug --enable-optimize'.
I ran 5x40 urls, cacheable content, 1sec delay between pages, sidebar 
collapsed, 600x800 content area, ...

There is no difference in the times with or without the patch. The average 
median time is 3.249 seconds without, and 3.254 seconds with, with the 
median times for every URL staying within 3 percent of each other. (Note
that this was not done in a situation where the X server was particularly 
busy).

So, be yelling at me, if I'm wrong.
Attached image pictures and everything
To John Morrison: be sure that X11 is very busy doing things. Try to run the
Zilla cross-continental (~200km wire or more) and/or via ssh channel to increase
latency...
Hmm, one caveat (to any test) is that if I introduce a variance of 20% in the
measurements, then I have no confidence in results that differ by less than
20% (waving hands but you know what I mean). In other words, I don't know if
we can get a useful measurement in the exceptional case. Is there some other
middle-level case we can look at, and/or is it necessary to test further with
the X server in a more active condition?
Remote Xservers and slow (local) Xservers (m64 in Ultra5/Blade100) are cases
where the optimisation described here would make sense...
A remote Xserver should be easy to test (please no 1000baseT... 100baseT may still
be too fast when 8bit visuals are used... best would be 10baseT with a 24bit visual
to make X11 slow&sluggish) - slow local Xserver performance testing would only be
useful with multi-CPU machines (Pavlov has a shiny E450... :-) or "renice" the
Xserver to run with a _lower_ priority (e.g. it only gets CPU if the Mozilla
process does not need it)...
Thanks, John.  I appreciate your time in testing this.
Oh, and to answer your question I don't think that you need to test exceptional
cases.  This patch will probably help in cases where there is a lot of X traffic
in between mozilla and the server and a lot of layout and thread events
happening as well.  Less starvation.
I need dougt or danm to look at these changes and say r= or no r= since they are
the event queue dudes.
la la la still waiting for people to review this la la la
If a review falls in a forest, does it make a sound?
r=timeless, but it sounds like you really want an r=pavlov.
 #endif /* XP_UNIX */
+
+/* extra functions for unix */
+
+#ifdef XP_UNIX
any reason to have 2 separate #ifdef XP_UNIX blocks?
/*
**
is ** common in /* comments?
I put the extra #ifdef XP_UNIX in there because it seemed like the right thing
to do at the time.  In that file the XP_* flags tend to surround each function
for each platform.  Also, the ** in /* comments is how it is in the rest of that
header file.  Just following local custom.
  Sorry it took me so long to get around to reviewing this. Once again my 
Bugzilla mail and I have lost track of each other.
  So basically I think the patch looks like it'll work as intended (I assume 
you've run with it and it actually helps!). A nit: the PL_ProcessEventsBeforeID 
returns as its value its "count" variable, which is uninitialized. Not important 
because no one uses the return value, but wants fixing.
  However, nits aside, in terms of the overall structure of the patch, I gotta express 
some misgivings. One thing is the duplication of PL_ProcessPendingEvents. Can you 
fold it and the new PL_ProcessEventsBeforeID into one function with some sort of 
illegal ID value? I'm worried that the somewhat fragile method, once duplicated, 
will undergo divergent evolution.
  As an example, I'm curious why ProcessEventsBefore has a dissimilar exit 
clause: the original puts a notification back on the native queue if events 
remain, but the new one doesn't. Accident or intent? This is all twitchy code 
that makes me twitch.
  But more to the point, I'm concerned that there is now code in two places that 
traverses the active event queues, looking for events to process. The plevent.c 
code is a kind of brute underlayer, and there's already a somewhat delicately 
balanced thing sitting on top (in nsEventQueue and nsEventQueueService) that 
groups them together and controls which queues are processed under which 
circumstances.
  You're adding another layer of code that does something very similar, but 
tangentially. I'm afraid they'll short out on each other some day, wires 
crossed as they are.
  I'd really rather you put this stuff in the upper layer, which is already 
playing the balancing act.
Thanks for your review, Dan.

The fact that count isn't initialized is a bug. I've got that change locally but
I don't think that it's worth a whole new patch.

As for your other misgivings, I thought that the two functions were different
enough to warrant a new function as opposed to folding them all together.  I
could probably merge them into a single function but I'm worried about
fragility, too.  If you want me to try and do that I will.

As to your specific question about why I don't put a token back into the queue
when the event count is > 0: I never take the token out.  Remember that this
running of the event queue is not done from the standard unix native event queue
notification method.  That method is to wake up from select() by reading from a
file descriptor.  With this method, if there are still events in the queue there
will still be a native token in the pipe, and if one of those events ends up being
processed outside of the event processing loop we will still wake up from
select() and process the event queue.
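
A minimal sketch of the notification scheme being described (illustrative
only; the real code lives in plevent.c and the appshell): posting an event
writes a token byte into a pipe, the main loop's select() wakes when the read
end is readable, and a token left behind just causes one harmless extra
wakeup.

/* Illustrative sketch of the unix native-notification pattern described
 * above -- not the actual plevent.c code. */
#include <unistd.h>
#include <sys/select.h>

static int notify_pipe[2];           /* [0] read end, [1] write end */

static void sketch_init(void)
{
    pipe(notify_pipe);
}

/* Called when an event is posted: drop a token byte into the pipe. */
static void sketch_notify(void)
{
    char token = 'e';
    write(notify_pipe[1], &token, 1);
}

/* The main loop: select() wakes whenever a token is readable.  If the event
 * was already processed elsewhere, the leftover token just produces one
 * extra wakeup that finds the queue empty. */
static void sketch_wait_and_process(void (*process_pending)(void))
{
    fd_set fds;
    char token;

    FD_ZERO(&fds);
    FD_SET(notify_pipe[0], &fds);
    if (select(notify_pipe[0] + 1, &fds, NULL, NULL, NULL) > 0) {
        read(notify_pipe[0], &token, 1);
        process_pending();
    }
}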

And I don't see a way to put this into a higher level except to wrap it and that
wouldn't buy us much except to complicate the code somewhat.  I still need to
label each event queue and each event with an ID and that needs to be done at
the lowest level.

If you want to get rid of plevent.c and move to
EventQueueUnix.cpp/EventQueueWin.cpp then we're talking about something else I
would buy into.  The current impl is organic and a mess.  I'm just trying to
cause as little pain as possible here. :)
  Alright; thanks for the explanations. The native notification removal/addition 
stuff is twitchy in its cross-platform glory, but this being gtk-only makes it 
easier. My particular objection to the plevent-vs-nsEventQueue thing is this: 
nsEventQueue keeps a two dimensional array of queues, thread vs stack-o'-stalled-
queues. Event processing steps down one of the dimensions (down the stack, within 
the current thread, though I think that's broken on gtk, because of the 
ListenToEventQueue stuff). At the plevent level, queues are independent. All the 
queuegroup interactions are handled in one place, at the nsEventQueue level.
  This is goofy code and delicate. (See bug 54371, though that bug's specific 
complaint is about something different and has been fixed. But it remains open 
because of the gtk problem and gives a window on the seamy underbelly of the 
event code.) Your patch in this bug adds another level of queue grouping 
independent of the nsEventQueue's grouping, bringing the concept of grouping also 
into plevent. That's my main worry.
  Oo. Come to think of it, that is a problem. Suddenly I have a specific worry, 
rather than general qualms. There is a problem in the current (unpatched) code. 
The ListenToEventQueue thing causes events in worker threads to be processed 
outside their thread. Oddly, this mostly doesn't seem to hurt anything. But I 
hear complaints from the networking guys that they're seeing events occasionally 
processed out of order on the Linux builds, and I'm sure this is the reason. It 
wants to be fixed, but I haven't figured out how yet.
  Your patch links all the event queues together and processes them at odd 
moments, all in a bunch, regardless of their owning thread. It'll exacerbate the 
ListenToEventQueue problem of events being processed off thread and out of order.
  Yes there is already a problem here, but you don't want to make it worse. I'm 
beginning to be pretty convinced that code which wants to jump from queue to 
queue, processing events, wants to be in the nsEventQueue level, where we're 
already doing that, where lives the knowledge to stay in the current thread and 
process them in the correct order of stalled queues within the thread.
I only use this on the primary thread on the thread event queue.  I'm not
talking about using this anywhere except there.  Doesn't that absolve your
worries about processing events on the wrong thread?

Also, if you call ListenToEventQueue(PR_FALSE) I don't process events for that
queue anymore, just like normal processing.  So, if you fix that then you fix
this entry point too.

My patch does _not_ process event queues in bunches and at odd times.  In fact
that's how I'd describe the existing PL_ProcessPendingEvents() method.  I no longer queue
up events until we get back up to the idle loop.  This method prevents us from
being very busy and failing to ever pay attention to the event queue which might
have a lot of important events in it.  If you have a lot of timers firing with a
lot of drawing going on it's very likely that you will end up completely
starving the event queue or the window queue.  This is what I'm trying to fix here.

I always process the events in order with regard to window events.  That's what
the addition of the 'id' element on the PLEvent structure is for.  I never
process events for which the corresponding window events haven't been seen yet.
This should keep draws and reflows synchronized as well.

Essentially what I've tried to do is replicate the way that Windows does its
event queue handling.  That is, it intermingles window event messages and
event queue messages.  It's better than what we have on unix for some things, and
I'm trying to make things smoother wrt event delivery and event queue starvation.
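
As a rough illustration of that interleaving (the names are hypothetical and
the real hook goes through the gtk event handling in the appshell, not a raw
Xlib loop like this): after dispatching each X event, drain only the queued
plevents stamped at or before that event's serial.

/* Illustrative Xlib-flavoured sketch of the interleaving, not the real
 * gtk/appshell code.  sketch_process_events_before_id() and SketchQueue are
 * the hypothetical helpers sketched earlier in this bug. */
#include <X11/Xlib.h>

struct SketchQueue;
extern int  sketch_process_events_before_id(struct SketchQueue *q,
                                             unsigned long serial);
extern void sketch_dispatch_x_event(XEvent *ev);

static void sketch_event_loop(Display *dpy, struct SketchQueue *plevent_queue)
{
    XEvent ev;
    for (;;) {
        XNextEvent(dpy, &ev);            /* blocks until an X event arrives */
        sketch_dispatch_x_event(&ev);    /* normal window-event handling    */

        /* Interleave: even if X stays busy, drain the plevents that were
         * posted up to this point, but no further, so window events and
         * plevents keep their relative order. */
        sketch_process_events_before_id(plevent_queue, ev.xany.serial);
    }
}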
  But you're hooking into ListenToEventQueue, so I don't see how you're limiting 
yourself to queues on the main thread. The patch affects every queue that's 
listened to, which at the moment is every queue anyone ever creates, anywhere.
  It's unfair of me to say the queues are processed at odd moments. But they will 
be processed at more moments, and in a great fell swoop that traverses every 
active listening queue, in the order they were created. That's probably the right 
order, at least within a thread, but why second guess all the careful ordering 
already in nsEventQueue?
  Now within a single queue, you're pulling events out in order and potentially 
stopping short. No worries there. It's what happens when you jump to the next 
queue that I'm all bunched up about. To be honest I don't completely follow the 
event ID thing. I gather you're being very careful to stop short in each queue 
when you run into an NSPR event that belongs to a later native notification. 
Alright. That may work. It still seems like there could be problems between 
queues, with event delivery latency and all that.
  To be fair, it does seem like part of the fix for ListenToEventQueue that I've 
alluded to in the comment above would be to stop listening to monitored queues. 
I'm not certain that that change alone would keep you from jumping into queues 
for worker threads. I mean, it's an error to make a native queue off the main 
thread but it works on Windows and I thought we were doing this in a few places.
  My point being that I think we're misusing native queues. With some tweaks and 
careful checking and maybe a paddling or two, then queue jumping at the plevent 
level might not be harmful. But as the situation stands, I think it would be. 
From there back to my qualms, I'd just rather not be traversing multiple queues 
at the plevent level.

  A thought. I suspect you're going to more trouble than you need to. We've been 
trying to cut back on native notifications anyway (Doug put in a patch to that 
effect which doesn't quite work, but it could be made to work...). I believe the 
correlation between NSPR events and native notification events is an uncritical 
one. So I suspect it isn't important to be so careful with the native event 
serial numbering and all, or wouldn't be if you stuck with the queue jumping code 
in nsEventQueue.
  Wouldn't something like this shorter patch do the desirous thing?

+  if (XPending(GDK_DISPLAY()))
+    nsAppShell::ProcessAFewPendingEvents();

where the new function is just ProcessPendingEvents on the current thread, with a 
limit of, I don't know, 3 or 10 events?
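
A sketch of what that alternative might look like (ProcessAFewPendingEvents
and the limit parameter are hypothetical names from the comment above; only
PL_GetEvent/PL_HandleEvent are existing plevent entry points):

/* Illustration of the simpler alternative being proposed -- process at most
 * a handful of pending plevents on the current thread's queue. */
#include "plevent.h"

static void SketchProcessAFewPendingEvents(PLEventQueue *queue, int limit)
{
    int i;
    for (i = 0; i < limit; ++i) {
        PLEvent *ev = PL_GetEvent(queue);   /* NULL when the queue is empty */
        if (!ev)
            break;
        PL_HandleEvent(ev);                 /* dispatch the event to its handler */
    }
}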
As far as I know ListenToEventQueue is only called on native event queues.  That
is, it is only called on event queues that use an FD and are called from the
main thread.  Other thread queues are monitored from other threads.  There have
been bugs in the past where someone called ListenToEventQueue on a non-native
event queue and all of a sudden events were being processed on the wrong thread.
As far as I know that doesn't happen anymore.

I don't understand your concerns about traversing multiple event queues.  I
mean, we do that now, but in a very random fashion.

As for your proposed patch, I don't like it because you still end up with
potential event queue starvation.  I'm trying to get reproducible ordering here,
not just interleaved events.
  I was fooled again by the general code in nsEventQueue, forgetting about the 
filter in nsAppShellService. So yes, ListenToEventQueue is only called on native 
queues. That makes more sense anyway, what can I say.
  So scratch everything I've said above about monitored queues being listened to. 
That helps, but I'm still bequalmed. But not enough to keep bumping heads. r=me.
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED