Closed Bug 67370: interleaving events and xlib events
Opened 24 years ago · Closed 24 years ago
Categories: Core :: XPCOM, defect
Status: RESOLVED FIXED
People: Reporter: blizzard; Assigned: blizzard
Attachments (3 files):
  - patch, 9.81 KB
  - patch, 10.00 KB
  - image/gif, 13.54 KB
Description • 24 years ago

This is a tracking bug. I want to see if interleaving xlib events and plevent
events is a win or not.
Comment 1 • 24 years ago (Assignee)
Comment 2 • 24 years ago (Assignee)
Have to remember to remove the static nsVoidArray this adds.
Updated • 24 years ago (Assignee)
Status: NEW → ASSIGNED
Comment 3 • 24 years ago (Assignee)
Adding dougt + danm. Guys, there's a patch in here that mucks with gtk events
that I want you guys to look at.
Comment 4 • 24 years ago

A quickish look tells me it probably won't hurt anything. Curious, though --
processing nspr events during gtk event processing would be basically increasing
the priority of nspr events. Whazzup?
Comment 5 • 24 years ago (Assignee)
It doesn't really increase the priority of nspr events, exactly. It actually
puts them on equal footing, since we only process events that are already in
the queue, as compared against the serial of the X event.
The most important thing this fixes is that if X is really busy we aren't
completely starving the plevent queue, and vice versa. That's what all the 'id'
stuff in the PLEvent object is for.
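As a minimal illustration of that 'id' scheme (a hedged sketch, not the actual
patch; the type name, fields, and function are all illustrative), each queued
event carries the X serial that was current when it was posted, and the drain
loop stops at the first event stamped later than the X event being handled:

    /* Sketch: interleave PLEvents with X events by serial number. */
    typedef struct PLEventSketch {
        struct PLEventSketch *next;
        unsigned long         id;  /* X serial current when this was posted */
        void (*handler)(struct PLEventSketch *self);
    } PLEventSketch;

    /* While handling an X event with serial maxID, drain only the
     * PLEvents that were already queued by then; later events wait,
     * so neither queue can starve the other. */
    static void ProcessEventsBeforeID(PLEventSketch **head,
                                      unsigned long maxID)
    {
        while (*head && (*head)->id <= maxID) {
            PLEventSketch *ev = *head;
            *head = ev->next;          /* unlink before dispatch */
            ev->handler(ev);
        }
    }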
Comment 6 • 24 years ago (Assignee)
Comment 7 • 24 years ago (Assignee)
OK, I need reviews on this patch. It's done enough, and people don't seem to
see any bad effects from it.
I need jrgm [now cc'ed on this bug] to apply this patch to a linux build and
do a perf run on it to see if it affects page load times, since I'll be yelled
at if it does.
Comment 8 • 24 years ago
Okay, I'm pulling a new build now, and I'll have an answer later this evening.
Just to confirm: patch #2, right?
Comment 9 • 24 years ago
Okay, so I ran this patch on a build with '--disable-debug --enable-optimize'.
I ran 5x40 urls, cacheable content, 1sec delay between pages, sidebar
collapsed, 600x800 content area, ...
There is no difference in the times with or without the patch. The average
median time is 3.249 seconds without the patch and 3.254 seconds with it, with
the median times for every URL staying within 3 percent of each other. (Note
that this was not done in a situation where the X server was particularly
busy.)
So, be yelling at me if I'm wrong.
Comment 10 • 24 years ago
Comment 11 • 24 years ago
To John Morrison: be sure that X11 is very busy doing things. Try to run the
Zilla cross-continental (~200km of wire or more) and/or via an ssh channel to
increase latency...
Comment 12 • 24 years ago
Hmm, one caveat (to any test) is that if I introduce a variance of 20% in the
measurements, then I have no confidence in results that differ by less than
20% (waving hands, but you know what I mean). In other words, I don't know if
we can get a useful measurement in the exceptional case. Is there some other
middle-level case we can look at, and/or is it necessary to test further with
the X server in a more active condition?
Comment 13 • 24 years ago
Remote X servers and slow (local) X servers (the m64 in an Ultra5/Blade100)
are cases where the optimisation described here would make sense...
A remote X server should be easy to test (please no 1000baseT... 100baseT may
still be too fast when 8bit visuals are used... best would be 10baseT with a
24bit visual to make X11 slow & sluggish). Slow local X server performance
testing would only be useful with multi-CPU machines (Pavlov has a shiny
E450... :-), or "renice" the X server to run with a _lower_ priority (e.g. it
only gets CPU if the Mozilla process does not need it)...
Comment 14 • 24 years ago (Assignee)
Thanks, John. I appreciate your time in testing this.
Comment 15 • 24 years ago (Assignee)
Oh, and to answer your question: I don't think you need to test exceptional
cases. This patch will probably help in cases where there is a lot of X
traffic between mozilla and the server and a lot of layout and thread events
happening as well. Less starvation.
Comment 16 • 24 years ago (Assignee)
I need dougt or danm to look at these changes and say r= or no r= since they are
the event queue dudes.
Comment 17 • 24 years ago (Assignee)
la la la still waiting for people to review this la la la
Comment 18 • 24 years ago (Assignee)
If a review falls in a forest, does it make a sound?
Comment 19 • 24 years ago
r=timeless, but it sounds like you really want an r=pavlov.

    #endif /* XP_UNIX */
    +
    +/* extra functions for unix */
    +
    +#ifdef XP_UNIX

Any reason to have 2 separate #ifdef XP_UNIX blocks?

    /*
    **

Is ** common in /* comments?
Comment 20 • 24 years ago (Assignee)
I put the extra #ifdef XP_UNIX in there because it seemed like the right thing
to do at the time. In that file the XP_* flags tend to surround each function
for each platform. Also, the ** in /* comments is how it is in the rest of that
header file. Just following local custom.
Comment 21 • 24 years ago
Sorry it took me so long to get around to reviewing this. Once again my
Bugzilla mail and I have lost track of each other.
So basically I think the patch looks like it'll work as intended (I assume
you've run with it and it actually helps!). A nit: PL_ProcessEventsBeforeID
returns as its value its "count" variable, which is uninitialized. Not
important, because no one uses the return value, but it wants fixing.
However, nits aside, in terms of the overall structure of the patch, I gotta
express some misgivings. One thing is the duplication of
PL_ProcessPendingEvents. Can you fold it and the new PL_ProcessEventsBeforeID
into one function with some sort of illegal ID value? I'm worried that the
somewhat fragile method, once duplicated, will undergo divergent evolution.
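For what it's worth, a sketch of that fold (reusing the illustrative
PLEventSketch type from the sketch above; the sentinel name is made up) might
use an out-of-band ID to mean "no limit", and would also initialize the count
variable the nit is about:

    #define SKETCH_NO_ID_LIMIT ((unsigned long)-1)

    /* One routine serves both callers: pass SKETCH_NO_ID_LIMIT to get
     * the old process-everything behaviour, or a real X serial to stop
     * short.  Returns how many events were dispatched. */
    static int ProcessEventsUpTo(PLEventSketch **head, unsigned long maxID)
    {
        int count = 0;   /* the review nit: initialized before use */
        while (*head &&
               (maxID == SKETCH_NO_ID_LIMIT || (*head)->id <= maxID)) {
            PLEventSketch *ev = *head;
            *head = ev->next;
            ev->handler(ev);
            ++count;
        }
        return count;
    }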
As an example, I'm curious why ProcessEventsBefore has a dissimilar exit
clause: the original puts a notification back on the native queue if events
remain, but the new one doesn't. Accident or intent? This is all twitchy code
that makes me twitch.
But more to the point, I'm concerned that there is now code in two places that
traverses the active event queues, looking for events to process. The plevent.c
code is a kind of brute underlayer, and there's already a somewhat delicately
balanced thing sitting on top (in nsEventQueue and nsEventQueueService) that
groups them together and controls which queues are processed under which
circumstances.
You're adding another layer of code that does something very similar, but
tangentially. I'm afraid they'll short out on each other some day, wires
crossed as they are.
I'd really rather you put this stuff in the upper layer, which is already
playing the balancing act.
Comment 22 • 24 years ago (Assignee)
Thanks for your review, Dan.
The fact that count isn't initialized is a bug. I've got that change locally but
I don't think that it's worth a whole new patch.
As for your other misgivings, I thought the two functions were different
enough to warrant a new function, as opposed to folding them together. I could
probably merge them into a single function, but I'm worried about fragility,
too. If you want me to try to do that, I will.
As to your specific question about why I don't put a token back into the queue
when the event count is > 0: I never take the token out. Remember that this
running of the event queue is not done from the standard unix native event
queue notification method. That method is to wake up from select() by reading
from a file descriptor. With this method, if there are still events in the
queue there will still be a native token in the pipe, and if that event ends
up being processed outside of the event processing loop, select() will still
wake up and process the event queue.
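For readers unfamiliar with that notification method, here is a hedged sketch
of the pipe-and-select() scheme being described (names and the single-byte
token are illustrative, not the real plevent.c code): one token is written per
posted event, so unconsumed tokens keep select() waking up until the queue is
drained.

    #include <unistd.h>
    #include <sys/select.h>

    static int notify_fd[2]; /* [0]: watched by select(); [1]: write end */

    static void InitNotify(void)
    {
        pipe(notify_fd);
    }

    static void NotifyNativeQueue(void)
    {
        char token = 'e';
        write(notify_fd[1], &token, 1);  /* one token per posted event */
    }

    static void WaitForEvents(void)
    {
        fd_set readable;
        char token;

        FD_ZERO(&readable);
        FD_SET(notify_fd[0], &readable);
        if (select(notify_fd[0] + 1, &readable, NULL, NULL, NULL) > 0) {
            read(notify_fd[0], &token, 1);  /* consume one token... */
            /* ...then dispatch one event from the queue here */
        }
    }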
And I don't see a way to put this into a higher level except to wrap it and that
wouldn't buy us much except to complicate the code somewhat. I still need to
label each event queue and each event with an ID and that needs to be done at
the lowest level.
If you want to get rid of plevent.c and move to
EventQueueUnix.cpp/EventQueueWin.cpp then we're talking about something else I
would buy into. The current impl is organic and a mess. I'm just trying to
cause as little pain as possible here. :)
Comment 23 • 24 years ago
Alright; thanks for the explanations. The native notification removal/addition
stuff is twitchy in its cross-platform glory, but this being gtk-only makes it
easier. My particular objection to the plevent-vs-nsEventQueue thing is this:
nsEventQueue keeps a two-dimensional array of queues: thread vs.
stack-o'-stalled-queues. Event processing steps down one of the dimensions
(down the stack, within the current thread, though I think that's broken on
gtk, because of the ListenToEventQueue stuff). At the plevent level, queues
are independent. All the queue-group interactions are handled in one place, at
the nsEventQueue level.
This is goofy code and delicate. (See bug 54371, though that bug's specific
complaint is about something different and has been fixed. But it remains open
because of the gtk problem and gives a window on the seamy underbelly of the
event code.) Your patch in this bug adds another level of queue grouping
independent of the nsEventQueue's grouping, bringing the concept of grouping also
into plevent. That's my main worry.
Oo. Come to think of it, that is a problem. Suddenly I have a specific worry,
rather than general qualms. There is a problem in the current (unpatched) code.
The ListenToEventQueue thing causes events in worker threads to be processed
outside their thread. Oddly, this mostly doesn't seem to hurt anything. But I
hear complaints from the networking guys that they're seeing events occasionally
processed out of order on the Linux builds, and I'm sure this is the reason. It
wants to be fixed, but I haven't figured out how yet.
Your patch links all the event queues together and processes them at odd
moments, all in a bunch, regardless of their owning thread. It'll exacerbate the
ListenToEventQueue problem of events being processed off thread and out of order.
Yes there is already a problem here, but you don't want to make it worse. I'm
beginning to be pretty convinced that code which wants to jump from queue to
queue, processing events, wants to be in the nsEventQueue level, where we're
already doing that, where lives the knowledge to stay in the current thread and
process them in the correct order of stalled queues within the thread.
Comment 24 • 24 years ago (Assignee)
I only use this on the primary thread, on the thread event queue. I'm not
talking about using this anywhere except there. Doesn't that absolve your
worries about processing events on the wrong thread?
Also, if you call ListenToEventQueue(PR_FALSE) I don't process events for that
queue anymore, just like normal processing. So if you fix that, then you fix
this entry point too.
My patch does _not_ process event queues in bunches and at odd times. In fact,
that's how I'd describe the existing PL_ProcessPendingEvents() method. I no
longer queue up events until we get back up to the idle loop. This method
prevents us from being very busy and failing to ever pay attention to the
event queue, which might have a lot of important events in it. If you have a
lot of timers firing with a lot of drawing going on, it's very likely that you
will end up completely starving the event queue or the window queue. This is
what I'm trying to fix here.
I always process the events in order with regard to window events. That's what
the whole addition of the 'id' element on the PLEvent structure is for. I
never process events for which the window events haven't been seen yet. This
should keep draws and reflows synchronized as well.
Essentially what I've tried to do is replicate the way Windows does its event
queue handling: it intermingles window event messages and event queue
messages. It's better than what we have in unix for some things, and I'm
trying to make things smoother wrt event delivery and event queue starvation.
Comment 25 • 24 years ago
But you're hooking into ListenToEventQueue, so I don't see how you're limiting
yourself to queues on the main thread. The patch affects every queue that's
listened to, which at the moment is every queue anyone ever creates, anywhere.
It's unfair of me to say the queues are processed at odd moments. But they will
be processed at more moments, and in a great fell swoop that traverses every
active listening queue, in the order they were created. That's probably the right
order, at least within a thread, but why second guess all the careful ordering
already in nsEventQueue?
Now within a single queue, you're pulling events out in order and potentially
stopping short. No worries there. It's what happens when you jump to the next
queue that I'm all bunched up about. To be honest I don't completely follow the
event ID thing. I gather you're being very careful to stop short in each queue
when you run into an NSPR event that belongs to a later native notification.
Alright. That may work. It still seems like there could be problems between
queues, with event delivery latency and all that.
To be fair, it does seem like part of the fix for ListenToEventQueue that I've
alluded to in the comment above would be to stop listening to monitored queues.
I'm not certain that that change alone would keep you from jumping into queues
for worker threads. I mean, it's an error to make a native queue off the main
thread but it works on Windows and I thought we were doing this in a few places.
My point being that I think we're misusing native queues. With some tweaks and
careful checking and maybe a paddling or two, then queue jumping at the plevent
level might not be harmful. But as the situation stands, I think it would be.
From there back to my qualms, I'd just rather not be traversing multiple queues
at the plevent level.
A thought. I suspect you're going to more trouble than you need to. We've been
trying to cut back on native notifications anyway (Doug put in a patch to that
effect which doesn't quite work, but it could be made to work...). I believe the
correlation between NSPR events and native notification events is an uncritical
one. So I suspect it isn't important to be so careful with the native event
serial numbering and all, or wouldn't be if you stuck with the queue jumping code
in nsEventQueue.
Wouldn't something like this shorter patch do the desired thing?

    + if (XPending(GDK_DISPLAY()))
    +     nsAppShell::ProcessAFewPendingEvents();

where the new function is just ProcessPendingEvents on the current thread,
with a limit of, I don't know, 3 or 10 events?
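A hedged sketch of that counter-proposal (again using the illustrative
PLEventSketch type from above; the body is guesswork, not Mozilla code): drain
a bounded number of events whenever X is busy, with no serial bookkeeping at
all. Comment 26 below rejects this because a fixed limit can still starve a
queue.

    /* Drain at most `limit` pending events, then return to the gtk loop. */
    static void ProcessAFewPendingEvents(PLEventSketch **head, int limit)
    {
        while (*head && limit-- > 0) {
            PLEventSketch *ev = *head;
            *head = ev->next;
            ev->handler(ev);
        }
    }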
Comment 26 • 24 years ago (Assignee)
As far as I know, ListenToEventQueue is only called on native event queues.
That is, it is only called on event queues that use an FD and are called from
the main thread. Other thread queues are monitored from other threads. There
have been bugs in the past where someone called ListenToEventQueue on a
non-native event queue and all of a sudden events were being processed on the
wrong thread. As far as I know that doesn't happen anymore.
I don't understand your concerns about traversing multiple event queues. I
mean, we do that now, but in a very random fashion.
As for your proposed patch, I don't like it because you still end up with
potential event queue starvation. I'm trying to get reproducible ordering
here, not just interleaved events.
Comment 27 • 24 years ago
I was fooled again by the general code in nsEventQueue, forgetting about the
filter in nsAppShellService. So yes, ListenToEventQueue is only called on native
queues. That makes more sense anyway, what can I say.
So scratch everything I've said above about monitored queues being listened to.
That helps, but I'm still bequalmed. But not enough to keep bumping heads. r=me.
Comment 28 • 24 years ago
sr=shaver.
Comment 29 • 24 years ago (Assignee)
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED