Closed Bug 50104 Opened 24 years ago Closed 18 years ago

PLEvent queue needs priority handling

Categories

(Core :: DOM: UI Events & Focus Handling, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: dmosedale, Assigned: dougt)

References

Details

(Keywords: arch, perf)

Attachments

(4 files)

Build with LDAP support (see http://www.mozilla.org/directory/xpcom.html for instructions). Now, via a reasonably fast link (eg the Netscape LAN), visit an LDAP search URL that returns a lot of directory entries, such as ldap://memberdir.netscape.com:389/ou=member_directory,o=netcenter.com??sub?(sn=Smith) The UI will completely stall out on all platforms for quite some time (on my Linux box, it hangs for 45 seconds). If I do this from a slow link (like my ricochet modem, which is ~28.8K), the UI behaves reasonably (ie the throbber and status bar continue to spin, XUL menus continue to work, etc). Alternately, if I sleep a while between callbacks (a la line 423 of nsLDAPConnection.cpp), the problem goes away. Doug, I took your advice and moved 99% of the callback logic to the LDAP connection thread (note that nsLDAPChannel.cpp is now compiled with #define INVOKE_LDAP_CALLBACKS_ON_MAIN_THREAD 0). This doesn't seem to make any perceptible difference. Is it time for event priorities so that events related to the chrome and UI can take precedence?
actually, this is the time that we remove the native notification for every event in favor of an event for the first event that is added to the Q that has not had a native notification. This optimization should help all platforms. Nominate if nsbeta3 otherwise I am going to later.
I'd like to see this get fixed if possible. It would probably help UI performance a good bit since we won't be servicing the event queue as often and that's one of the places where we are really hurting in terms of how we look to end users. Is what you are thinking of still based on the assumption that you process the entire queue any time someone adds an event?
Keywords: arch, perf
plussing.
Whiteboard: [nsbeta3+]
changing qa contact to ckritzer
QA Contact: janc → ckritzer
Priority: P3 → P1
working on this.
Status: NEW → ASSIGNED
I just attached the patch which reduces notification by about 80%. It works great on linux. still needs to verify on mac and windows.
geez... I am a dork. found the windows problem... I am actually surprized that it worked on unix.
It turns out that this patch doesn't noticeably help the problem at all on Linux. I think the reason I thought it did was because the first time I tested did so across a slow link where flooding wasn't going to be a problem. Your patch is just an optimization for the current case, where every event in the monitored queue is processed at once, right? If so, it doesn't address the core problem that if there are a lot of events in the monitored queue (because a lot of data has come back quickly over the network), it's still gonna take a long time (though now slightly less) to get back to the point the main thread even thinks about touching any UI-related events. Does this sound right?
The fix does not do anything about monitor event queue's, just native event notification. Regardless if this fixes LDAP's problem, it is a good fix or performance. Lets get together tomorrow regarding what you are seeing with LDAP.
I could reduce the calls to a notify on the eventQ monitor. This may help...
After talking to Dan for a while about the exact problem, it sounds like we need some kind of event priorities. Here is part of our aim conversation: dougtnscp: I need to know more about how you are doing things in LDAP... dmose2: basically, i start a search dmose2: then, the LDAP SDK listens on the network dmose2: and each time it gets a directory entry, it executes a callback dmose2: which is executed on the connection thread dmose2: and that callback writes a bunch of stuff down an asyncstreamlistener dmose2: that ends up in the layout engine dmose2: and one search can quickly yield ~50 entries or more dmose2: regardless of the number of native notifies used to do the event execution, all the events stacked up in a queue get processed at once, right? dougtnscp: yes, of course. dmose2: so if i have 50 events in the queue, that's a buttload of laying out to do before any UI-related events ever happen dmose2: so i would argue that that's the problem dmose2: the algorithm should stop after each event (or small batch of events) in the queue dmose2: and look for UI-related events dmose2: and give those priority I don't think that we should change the event processing model this late in the development cycle. It is risky and is some work. I am going to remove the nsbeta3+ status from this bug - however, I still will checkin the optimization to plevents that are attached. Dan, for now, see about batching your posts. cc-ing alecf. He has some good idea regarding prioritizing the plevent queues.
Summary: PLEvent queue flooding → PLEvent queue needs priority handling
Whiteboard: [nsbeta3+]
Blocks: 36849
I don't remember what my ideas were... maybe someone can remind me and I'll remember again?
Whiteboard: [nsbeta3-]
Updating QA Contact.
QA Contact: ckritzer → lorca
It turned out that I was calling Write and OnDataAvailable() many many times per directory entry, each one generating an event. I just tweaked a bunch of my code to not do that, so that there is now one call to nsIOutputStream::Write() and ::OnDataAvailable per directory entry callback. Unfortunately, this still doesn't really make any difference: on a reasonably fast network, the event queue is still completely flooded, and layout is still completely stalled. Nominating this bug for mozilla0.9.
Keywords: mozilla0.9
Anyone for nominating this for RTM?
This is an architecture issue. Definitely post-NS6.
Something just triggered my idea that I may have had a while back.. The way I look at this situation is this: - Certain sets of events MUST happen in order - for instance, if I'm making cross-thread proxy calls, I expect that if I make two async calls, that they at least happen in order. I think layout has similar expectations about events that it posts on it's own thread. - Certain sets events can happen in any order - for example two threads each making proxy calls, or a proxy call and another thread posting an event to it's own queue.. So it seems like we need a way to associate a given thread with a set, and make sure that all threads within that set execute in the order they were put on the queue, relative to the other threads in that set. Then, maybe we need to give individual sets priorities, so that a given event set can have priority over another event set. The "event set" concept doesn't have to exist as a datastructure per say, but each event can keep a "set ID" which the PL-event dequeuer would use when deciding which event to run next. thoughts?
It would be easy to guarantee FIFO ordering among events in the same priority class. That is all you want, right?
if "priority class" == my "event set" then yes :) if a priority class means a group of events with the same priority, then no.. I guess I'm conceptually trying to break apart event priorities from event ordering, so that we have more flexibility in picking which events (such as ones related to responsiveness) get a higher priority, without breaking event ordering expectations.. (I wondered if there were well-known terms for concepts like this, I'm just coming up with this stuff off the top of my head)
I see. Systems like this have been studied under the name of "Petri nets". I don't know much about that work though. It would be easy enough to implement "event chaining" where instead of posting a sequence of N events, you put them in a linked list and just post the first event. Every time an event is handled you check to see if it has a successor in the list, and if there is, post it. Is that enough? If we need to be able to specify arbitrary event ordering constraints, we might as well forget about a priority system and just go with a constraint DAG. It would probably be necessary to introduce explicit "event classes" (or "set IDs"), annotate each event with a set of classes, and specify constraints on the classes. This would be a little harder to implement but could still be done very efficiently. Exposing an explicit ordering DAG would actually be better than priorities, because when you set priorities you're always thinking about "we want A to happen before B, but after C, and so ..." --- it'd be easier and more maintainable to just state these requirements explicitly in the code.
I don't think completely arbitrary event ordering is necessary, I'm just thinking something real simple: for proxies, each {originating, destination} thread would need it's own "event set" and the priorities of the events themselves could be set when the proxy object is created. for other events like the gecko events, it would be up to the caller. In the case of gecko specifically, the events would all have to belong to the same event set, but priorities could be different depending on the event. I'll post ideas about the event processing/dequeuing side later..
There needs to be some heuristics to avoid blocking a proxied call. For example, thread b proxies to thread a, but a is always handling layout events and never gets to the proxy events. We need to handle x layout events then check for lower priority events. I don;t think that the logic can be as simple as "handle all high priority events before lower priority events". I am worried that the caller of ProcessPendingEvents, or equivalent, is going to have not know about this logic. Ideally I think that clients want as little to worry about as possible. Where do you propose that this goes? Would it all be under the sheets of xpcom or would embedding clients needs to worry about how they handle which events.
We should be able to put all this in ProcessPendingEvents. Although we might have to extend ProcessPendingEvents a bit so that callers can 1) only process events above a certain priority (so that platform events can be effectively prioritized) 2) only process one event (in case a higher-priority platform event arrives while we are processing the first PLEvent). Avoiding starvation is important, but we have to be careful because arbitrarily violating priorities could break people's expectations, leading to bugs --- and even worse, ugly client-side workarounds. We would need to carefully distinguish between event ordering properties that are "hard" and never violated, and those that are "soft" and will sometimes be violated. If we used explicit event ordering constraints, we could mark constraints as "soft" or "hard"; soft constraints would not apply to events that have been in the queue for longer than a certain period of time.
I think ProcessPendingEvents should simply be smart enough to make sure that certain events don't starve. This is a similar problem as an OS trying to round-robin it's processes. We could probably age the events by raising their priority the longer they are in the queue, and so forth.
I think the "hard" vs. "soft" is a good way of explaining my original proposal :) Events in the same "event set" would be required to be executed in the order that they appear in the queue, relative to each other. This is effectively "hard" prioritization Two events in different event sets have no guaranteed order, no matter what their priority is. This is effectively "soft" prioritization.
Was thinking about places where the lack of event prioritization seems to be a problem in mozilla other than LDAP. Presumably any strategy chosen should also be tested to make sure it helps these cases as well. The file transport channel: Try loading from a very large HTML file (example: install a MySQL RPM on linux, and then view /usr/doc/MySQL*/manual.html -- it's 1.5 megs). The entire UI stalls out as the content is being pushed into layout. and, from an old porkjockeys posting: FTP checkin comment ("FTP bug # 46750: don't starve UI during FTP operations by slowly pushing FTP URLs into content model on a timer") and diff: http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=mozilla/xpfe/components/directory&command=DIFF_FRAMESET&file=nsDirectoryViewer.cpp&rev1=1.53&rev2=1.54&root=/cvsroot
A question about Alec's priority sets: you're suggesting event prioritization even across event queues? Has this been a problem? Currently each thread and event queue is self-contained -- at least across threads --. I wonder why this is being portrayed as insufficient. An extra suggested complication: are we interested in trying to get rid of event queue stacks? I know this would make the embedding people happy. Event queue stacks exist only to force execution priority. If the priority code were a two-part thing: what Alec suggested and additionally a higher-order trump priority, and the event queue processed only events with trump priorities at or above its current, changeable trump level, we should theoretically be able to get rid of event queue stacks.
I didn't mean across event queues... that would be insane :) - the event sets only have meaning within the event queue that is being processed. But I like the idea you describe to avoid the event queue stacks. I wonder if we can combine the event set idea with the trump idea, for sake of simplicity.. by my description above, the event sets are priority free (i.e. one event set does not have 'trump' priority over another one) and the priority is up to the events themselves... One option would be to put this trump priority on the event queue - this would avoid problems such as events that were expected to be in order (since they are in the same event set) but get processed out of order because a later event has a higher trump priority. However, this would actually require us to keep track of the event sets themselves, which we didn't need to do. Of course this hypothetical situation about trump priorities making events within the same event set happen out of order could be completely impossible, and thus an unnecessary thing to watch out for.
Just to clarify, is this what you're suggesting? --- Give each event a priority with two components: the "basic priority", set by the poster of the event, and the "trump priority", set from the "current trump priority" of the event queue, which is itself set by some other code. If that's what you want, then we can simplify things a bit. Just have one priority on each event. Give each event queue a "current base priority", and provide methods on the event queue to get and set it. Add the current base priority to the priority of every event posted to the queue. To emulate "trump priorities", just temporarily increase the current base priority by some large value (e.g. 2^16, assuming 32-bit priorities). The current base priority would be useful for other things too.
Event queues are mostly fetched as needed from the event queue manager across the product. That is, if you need to post an event, you need to ask the manager for the current queue, since that can change. They are occasionally fetched and cached. Generally, you always want to do the former thing, but sometimes it's important to do the latter. Event queue stacks allow events to be processed for, say, loading a modal dialog while deferring processing on events for, say, the underlying network connection that needs a response to the modal dialog before it can continue. A priority system could accomplish the same thing, if it were rich enough. The underlying network connection needs to ask the event queue for its current trump priority and always post events using that priority. (Currently, I believe it caches the event queue which was active at the time. Same thing.) And of course the normal priority can vary as required. While it's happily spewing events at that trump priority, the event queue may be ramped up to a higher trump priority (a thing we're currently accomplishing by pushing a new queue onto the stack.) Further events posted by that network connection need to continue being posted at their original trump priority while more recent connections will post at their current, higher trump. If the event queue leaves lower trump priority events unprocessed -- even if it's completely out of higher trump priority events -- until it returns to its original lower trump priority (currently: pop an event queue off the stack), then it can accomplish the same thing as an event queue stack: temporarily stymie the original connection while allowing connections for processing the modal dialog to go through. The trump priority could be the high bits of a priority word, but it needs to be treated a bit differently from the lower, normal priority bits, as outlined above. I'm not totally sold on this approach yet, but I think it has wortwhile advantages over our current, stack-o'-queues scheme. We could stick with the stacks and add the priority scheme you've been discussing, and have less starvation for UI events. But I think while we're at it we could blend in this similar tweak, which would allow us to rip out event queue stacks.
2
Assignee: dougt → danm
Status: ASSIGNED → NEW
Blocks: 55403
Target Milestone: --- → Future
Status: NEW → ASSIGNED
I'm attaching a patch which builds on Doug's earlier patch in this bug; the one reducing the number of native event notifications. There are still more notifications coming in than are strictly necessary. Today's linux build, for instance, typically queues up 84 concurrent notifications while launching. This may cause some UI starvation (though I haven't noticed any), but it does cause other problems (bug 63646). With this patch applied, I've never seen more than 2 notifications stacked up at any time (which will obviate any workaround for bug 63646). The patch sharply reduces the number of notifications. The risk, of course, is that a notification would go completely missing and events go unprocessed until fresh events are posted. I think this is a good patch, and I've had no problems with it yet, but it's still scary.
REMOVING nsbeta2/3 and replacing with nsbeta1 KW. Also clearing Status Whiteboard of nsbeta2/3 result. Primarily because this would be nice if it really improves performance. On a side issue, its interesting that a Dan reported this, a Dan is working to fix it, and a Dan is working on testing this. All different Dans. Scary?
Keywords: nsbeta3nsbeta1
Whiteboard: [nsbeta3-]
Reassigning QA Contact for all open and unverified bugs previously under Lorca's care to Gerardo as per phone conversation this morning.
QA Contact: lorca → gerardok
I am using the XPCOM LDAP components and asserting entries into an RDF data source much in the same way as the js LDAP data source. I don't see any noticable improvement when i applied the lastest patch and used async proxy RDF observers, the UI is still unresponsive for the responsive LDAP server used, thus i have to resort to sync proxy RDF observers. I am not overly familiar with aspects of mozilla i am commenting on in the below sentances, so i could be talking a load of... so apologies in advance if so. In terms of putting events onto the UI queue, Java Swing may do somthing very similar with the InvokeLater method: http://java.sun.com/products/jdk/1.2/docs/api/javax/swing/SwingUtilities.html#invokeLater(java.lang.Runnable) I don't know anything on how events work in Mozilla but maybe some ideas could be obtained from the swing source to manage UI events more efficiently? Perhaps the problem lies more with UI processing of events than at the XPCOM layer of event propagation (maybe also thread priorities of user input?). Does there need to be a special UI event queue implementation? Also on the point about RDF Observers and data sources. It seems that a data source which notifies observers, on a different thread, should not have to proxy to the observer thread. This implies that the data source is bound to a particular thread for notification, .e.g UI thread. This could result in UI starvation if data sources are also used for non UI purposes. The solution could be for the presumably XUL obeservers to recognise that a notification is not on the UI thread and act accordingly. There may be some scope for optimization if performed this way.
Just compiled mozilla 0.8.1 + my stuff optimized and no debug. With or without the patch the performance (of course!) is alot better but has the occasional pause, the UI updates faster than the rate of LDAP messages, i presume. Netscape 4x is approximately 2x faster.
QA contact updated
QA Contact: gerardok → madhur
Blocks: 104166
any of the patches here still work? I'm guessing that more work is needed on this...
Good guess. Not going to happen before 1.0, as the competition is fierce.
QA Contact: madhur → rakeshmishra
Keywords: patch
QA Contact: rakeshmishra → trix
Assignee: danm.moz → nobody
Status: ASSIGNED → NEW
The bug described in comment #0 and comment #10 (in particular) should have been fixed with the threadmanager landing (bug 326273). We still do not have a way to set the priority of "PLEvents" (now they are actually nsIRunnable objects), but we do now take care to better mix gecko events with UI events so that the underlying bug here should be resolved. If we want to introduce priorities for gecko events, then we can still do so, but I think a new bug should be filed with fresh justification. Marking this bug FIXED by bug 326273.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Assignee: nobody → doug.turner
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: