50104 - PLEvent queue needs priority handling

Reporter

Description

•

24 years ago

Build with LDAP support (see http://www.mozilla.org/directory/xpcom.html for instructions). Now, via a reasonably fast link (eg the Netscape LAN), visit an LDAP search URL that returns a lot of directory entries, such as ldap://memberdir.netscape.com:389/ou=member_directory,o=netcenter.com??sub?(sn=Smith) The UI will completely stall out on all platforms for quite some time (on my Linux box, it hangs for 45 seconds). If I do this from a slow link (like my ricochet modem, which is ~28.8K), the UI behaves reasonably (ie the throbber and status bar continue to spin, XUL menus continue to work, etc). Alternately, if I sleep a while between callbacks (a la line 423 of nsLDAPConnection.cpp), the problem goes away. Doug, I took your advice and moved 99% of the callback logic to the LDAP connection thread (note that nsLDAPChannel.cpp is now compiled with #define INVOKE_LDAP_CALLBACKS_ON_MAIN_THREAD 0). This doesn't seem to make any perceptible difference. Is it time for event priorities so that events related to the chrome and UI can take precedence?

Doug Turner (:dougt)

Assignee

Comment 1

•

24 years ago

actually, this is the time that we remove the native notification for every event in favor of an event for the first event that is added to the Q that has not had a native notification. This optimization should help all platforms. Nominate if nsbeta3 otherwise I am going to later.

Christopher Blizzard (:blizzard)

Comment 2

•

24 years ago

I'd like to see this get fixed if possible. It would probably help UI performance a good bit since we won't be servicing the event queue as often and that's one of the places where we are really hurting in terms of how we look to end users. Is what you are thinking of still based on the assumption that you process the entire queue any time someone adds an event?

Dan Mosedale (:dmosedale, :dmose)

Reporter

Updated

•

24 years ago

Keywords: arch, perf

Christopher Blizzard (:blizzard)

Updated

•

24 years ago

Keywords: nsbeta3

Judson Valeski

Comment 3

•

24 years ago

plussing.

Whiteboard: [nsbeta3+]

Jan Carpenter

Comment 4

•

24 years ago

changing qa contact to ckritzer

QA Contact: janc → ckritzer

Judson Valeski

Updated

•

24 years ago

Priority: P3 → P1

Doug Turner (:dougt)

Assignee

Comment 5

•

24 years ago

working on this.

Status: NEW → ASSIGNED

Doug Turner (:dougt)

Assignee

Comment 6

•

24 years ago

Attached patch PLEvent optimization unified diff — Details — Splinter Review

Doug Turner (:dougt)

Assignee

Comment 7

•

24 years ago

I just attached the patch which reduces notification by about 80%. It works great on linux. still needs to verify on mac and windows.

Doug Turner (:dougt)

Assignee

Comment 8

•

24 years ago

geez... I am a dork. found the windows problem... I am actually surprized that it worked on unix.

Doug Turner (:dougt)

Assignee

Comment 9

•

24 years ago

Attached patch Plevent optimization - Take Two. — Details — Splinter Review

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 10

•

24 years ago

It turns out that this patch doesn't noticeably help the problem at all on Linux. I think the reason I thought it did was because the first time I tested did so across a slow link where flooding wasn't going to be a problem. Your patch is just an optimization for the current case, where every event in the monitored queue is processed at once, right? If so, it doesn't address the core problem that if there are a lot of events in the monitored queue (because a lot of data has come back quickly over the network), it's still gonna take a long time (though now slightly less) to get back to the point the main thread even thinks about touching any UI-related events. Does this sound right?

Doug Turner (:dougt)

Assignee

Comment 11

•

24 years ago

The fix does not do anything about monitor event queue's, just native event notification. Regardless if this fixes LDAP's problem, it is a good fix or performance. Lets get together tomorrow regarding what you are seeing with LDAP.

Doug Turner (:dougt)

Assignee

Comment 12

•

24 years ago

I could reduce the calls to a notify on the eventQ monitor. This may help...

Doug Turner (:dougt)

Assignee

Comment 13

•

24 years ago

Attached patch reduce monitor notifications patch — Details — Splinter Review

Doug Turner (:dougt)

Assignee

Comment 14

•

24 years ago

After talking to Dan for a while about the exact problem, it sounds like we need some kind of event priorities. Here is part of our aim conversation: dougtnscp: I need to know more about how you are doing things in LDAP... dmose2: basically, i start a search dmose2: then, the LDAP SDK listens on the network dmose2: and each time it gets a directory entry, it executes a callback dmose2: which is executed on the connection thread dmose2: and that callback writes a bunch of stuff down an asyncstreamlistener dmose2: that ends up in the layout engine dmose2: and one search can quickly yield ~50 entries or more dmose2: regardless of the number of native notifies used to do the event execution, all the events stacked up in a queue get processed at once, right? dougtnscp: yes, of course. dmose2: so if i have 50 events in the queue, that's a buttload of laying out to do before any UI-related events ever happen dmose2: so i would argue that that's the problem dmose2: the algorithm should stop after each event (or small batch of events) in the queue dmose2: and look for UI-related events dmose2: and give those priority I don't think that we should change the event processing model this late in the development cycle. It is risky and is some work. I am going to remove the nsbeta3+ status from this bug - however, I still will checkin the optimization to plevents that are attached. Dan, for now, see about batching your posts. cc-ing alecf. He has some good idea regarding prioritizing the plevent queues.

Summary: PLEvent queue flooding → PLEvent queue needs priority handling

Whiteboard: [nsbeta3+]

Marc Attinasi

Updated

•

24 years ago

Blocks: 36849

Alec Flett

Comment 15

•

24 years ago

I don't remember what my ideas were... maybe someone can remind me and I'll remember again?

Judson Valeski

Updated

•

24 years ago

Whiteboard: [nsbeta3-]

Loco

Comment 16

•

24 years ago

Updating QA Contact.

QA Contact: ckritzer → lorca

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 17

•

24 years ago

It turned out that I was calling Write and OnDataAvailable() many many times per directory entry, each one generating an event. I just tweaked a bunch of my code to not do that, so that there is now one call to nsIOutputStream::Write() and ::OnDataAvailable per directory entry callback. Unfortunately, this still doesn't really make any difference: on a reasonably fast network, the event queue is still completely flooded, and layout is still completely stalled. Nominating this bug for mozilla0.9.

Keywords: mozilla0.9

Loco

Comment 18

•

24 years ago

Anyone for nominating this for RTM?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

24 years ago

This is an architecture issue. Definitely post-NS6.

Alec Flett

Comment 20

•

24 years ago

Something just triggered my idea that I may have had a while back.. The way I look at this situation is this: - Certain sets of events MUST happen in order - for instance, if I'm making cross-thread proxy calls, I expect that if I make two async calls, that they at least happen in order. I think layout has similar expectations about events that it posts on it's own thread. - Certain sets events can happen in any order - for example two threads each making proxy calls, or a proxy call and another thread posting an event to it's own queue.. So it seems like we need a way to associate a given thread with a set, and make sure that all threads within that set execute in the order they were put on the queue, relative to the other threads in that set. Then, maybe we need to give individual sets priorities, so that a given event set can have priority over another event set. The "event set" concept doesn't have to exist as a datastructure per say, but each event can keep a "set ID" which the PL-event dequeuer would use when deciding which event to run next. thoughts?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 21

•

24 years ago

It would be easy to guarantee FIFO ordering among events in the same priority class. That is all you want, right?

Alec Flett

Comment 22

•

24 years ago

if "priority class" == my "event set" then yes :) if a priority class means a group of events with the same priority, then no.. I guess I'm conceptually trying to break apart event priorities from event ordering, so that we have more flexibility in picking which events (such as ones related to responsiveness) get a higher priority, without breaking event ordering expectations.. (I wondered if there were well-known terms for concepts like this, I'm just coming up with this stuff off the top of my head)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 23

•

24 years ago

I see. Systems like this have been studied under the name of "Petri nets". I don't know much about that work though. It would be easy enough to implement "event chaining" where instead of posting a sequence of N events, you put them in a linked list and just post the first event. Every time an event is handled you check to see if it has a successor in the list, and if there is, post it. Is that enough? If we need to be able to specify arbitrary event ordering constraints, we might as well forget about a priority system and just go with a constraint DAG. It would probably be necessary to introduce explicit "event classes" (or "set IDs"), annotate each event with a set of classes, and specify constraints on the classes. This would be a little harder to implement but could still be done very efficiently. Exposing an explicit ordering DAG would actually be better than priorities, because when you set priorities you're always thinking about "we want A to happen before B, but after C, and so ..." --- it'd be easier and more maintainable to just state these requirements explicitly in the code.

Alec Flett

Comment 24

•

24 years ago

I don't think completely arbitrary event ordering is necessary, I'm just thinking something real simple: for proxies, each {originating, destination} thread would need it's own "event set" and the priorities of the events themselves could be set when the proxy object is created. for other events like the gecko events, it would be up to the caller. In the case of gecko specifically, the events would all have to belong to the same event set, but priorities could be different depending on the event. I'll post ideas about the event processing/dequeuing side later..

Doug Turner (:dougt)

Assignee

Comment 25

•

24 years ago

There needs to be some heuristics to avoid blocking a proxied call. For example, thread b proxies to thread a, but a is always handling layout events and never gets to the proxy events. We need to handle x layout events then check for lower priority events. I don;t think that the logic can be as simple as "handle all high priority events before lower priority events". I am worried that the caller of ProcessPendingEvents, or equivalent, is going to have not know about this logic. Ideally I think that clients want as little to worry about as possible. Where do you propose that this goes? Would it all be under the sheets of xpcom or would embedding clients needs to worry about how they handle which events.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

24 years ago

We should be able to put all this in ProcessPendingEvents. Although we might have to extend ProcessPendingEvents a bit so that callers can 1) only process events above a certain priority (so that platform events can be effectively prioritized) 2) only process one event (in case a higher-priority platform event arrives while we are processing the first PLEvent). Avoiding starvation is important, but we have to be careful because arbitrarily violating priorities could break people's expectations, leading to bugs --- and even worse, ugly client-side workarounds. We would need to carefully distinguish between event ordering properties that are "hard" and never violated, and those that are "soft" and will sometimes be violated. If we used explicit event ordering constraints, we could mark constraints as "soft" or "hard"; soft constraints would not apply to events that have been in the queue for longer than a certain period of time.

Alec Flett

Comment 27

•

24 years ago

I think ProcessPendingEvents should simply be smart enough to make sure that certain events don't starve. This is a similar problem as an OS trying to round-robin it's processes. We could probably age the events by raising their priority the longer they are in the queue, and so forth.

Alec Flett

Comment 28

•

24 years ago

I think the "hard" vs. "soft" is a good way of explaining my original proposal :) Events in the same "event set" would be required to be executed in the order that they appear in the queue, relative to each other. This is effectively "hard" prioritization Two events in different event sets have no guaranteed order, no matter what their priority is. This is effectively "soft" prioritization.

Dan Mosedale (:dmosedale, :dmose)

Reporter

Comment 29

•

24 years ago

Was thinking about places where the lack of event prioritization seems to be a problem in mozilla other than LDAP. Presumably any strategy chosen should also be tested to make sure it helps these cases as well. The file transport channel: Try loading from a very large HTML file (example: install a MySQL RPM on linux, and then view /usr/doc/MySQL*/manual.html -- it's 1.5 megs). The entire UI stalls out as the content is being pushed into layout. and, from an old porkjockeys posting: FTP checkin comment ("FTP bug # 46750: don't starve UI during FTP operations by slowly pushing FTP URLs into content model on a timer") and diff: http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=mozilla/xpfe/components/directory&command=DIFF_FRAMESET&file=nsDirectoryViewer.cpp&rev1=1.53&rev2=1.54&root=/cvsroot

Dan M

Comment 30

•

24 years ago

A question about Alec's priority sets: you're suggesting event prioritization even across event queues? Has this been a problem? Currently each thread and event queue is self-contained -- at least across threads --. I wonder why this is being portrayed as insufficient. An extra suggested complication: are we interested in trying to get rid of event queue stacks? I know this would make the embedding people happy. Event queue stacks exist only to force execution priority. If the priority code were a two-part thing: what Alec suggested and additionally a higher-order trump priority, and the event queue processed only events with trump priorities at or above its current, changeable trump level, we should theoretically be able to get rid of event queue stacks.

Alec Flett

Comment 31

•

24 years ago

I didn't mean across event queues... that would be insane :) - the event sets only have meaning within the event queue that is being processed. But I like the idea you describe to avoid the event queue stacks. I wonder if we can combine the event set idea with the trump idea, for sake of simplicity.. by my description above, the event sets are priority free (i.e. one event set does not have 'trump' priority over another one) and the priority is up to the events themselves... One option would be to put this trump priority on the event queue - this would avoid problems such as events that were expected to be in order (since they are in the same event set) but get processed out of order because a later event has a higher trump priority. However, this would actually require us to keep track of the event sets themselves, which we didn't need to do. Of course this hypothetical situation about trump priorities making events within the same event set happen out of order could be completely impossible, and thus an unnecessary thing to watch out for.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 32

•

24 years ago

Just to clarify, is this what you're suggesting? --- Give each event a priority with two components: the "basic priority", set by the poster of the event, and the "trump priority", set from the "current trump priority" of the event queue, which is itself set by some other code. If that's what you want, then we can simplify things a bit. Just have one priority on each event. Give each event queue a "current base priority", and provide methods on the event queue to get and set it. Add the current base priority to the priority of every event posted to the queue. To emulate "trump priorities", just temporarily increase the current base priority by some large value (e.g. 2^16, assuming 32-bit priorities). The current base priority would be useful for other things too.

Dan M

Comment 33

•

24 years ago

Event queues are mostly fetched as needed from the event queue manager across the product. That is, if you need to post an event, you need to ask the manager for the current queue, since that can change. They are occasionally fetched and cached. Generally, you always want to do the former thing, but sometimes it's important to do the latter. Event queue stacks allow events to be processed for, say, loading a modal dialog while deferring processing on events for, say, the underlying network connection that needs a response to the modal dialog before it can continue. A priority system could accomplish the same thing, if it were rich enough. The underlying network connection needs to ask the event queue for its current trump priority and always post events using that priority. (Currently, I believe it caches the event queue which was active at the time. Same thing.) And of course the normal priority can vary as required. While it's happily spewing events at that trump priority, the event queue may be ramped up to a higher trump priority (a thing we're currently accomplishing by pushing a new queue onto the stack.) Further events posted by that network connection need to continue being posted at their original trump priority while more recent connections will post at their current, higher trump. If the event queue leaves lower trump priority events unprocessed -- even if it's completely out of higher trump priority events -- until it returns to its original lower trump priority (currently: pop an event queue off the stack), then it can accomplish the same thing as an event queue stack: temporarily stymie the original connection while allowing connections for processing the modal dialog to go through. The trump priority could be the high bits of a priority word, but it needs to be treated a bit differently from the lower, normal priority bits, as outlined above. I'm not totally sold on this approach yet, but I think it has wortwhile advantages over our current, stack-o'-queues scheme. We could stick with the stacks and add the priority scheme you've been discussing, and have less starvation for UI events. But I think while we're at it we could blend in this similar tweak, which would allow us to rip out event queue stacks.

Dan M

Comment 34

•

24 years ago

2

Assignee: dougt → danm

Status: ASSIGNED → NEW

Dan M

Updated

•

24 years ago

Blocks: 55403

Dan M

Updated

•

24 years ago

Target Milestone: --- → Future

Dan M

Updated

•

24 years ago

Status: NEW → ASSIGNED

Dan M

Comment 35

•

24 years ago

I'm attaching a patch which builds on Doug's earlier patch in this bug; the one reducing the number of native event notifications. There are still more notifications coming in than are strictly necessary. Today's linux build, for instance, typically queues up 84 concurrent notifications while launching. This may cause some UI starvation (though I haven't noticed any), but it does cause other problems (bug 63646). With this patch applied, I've never seen more than 2 notifications stacked up at any time (which will obviate any workaround for bug 63646). The patch sharply reduces the number of notifications. The risk, of course, is that a notification would go completely missing and events go unprocessed until fresh events are posted. I think this is a good patch, and I've had no problems with it yet, but it's still scary.

Dan M

Comment 36

•

24 years ago

Attached patch reduce count of unnecessary event delivery notifications — Details — Splinter Review

Loco

Comment 37

•

24 years ago

REMOVING nsbeta2/3 and replacing with nsbeta1 KW. Also clearing Status Whiteboard of nsbeta2/3 result. Primarily because this would be nice if it really improves performance. On a side issue, its interesting that a Dan reported this, a Dan is working to fix it, and a Dan is working on testing this. All different Dans. Scary?

Keywords: nsbeta3 → nsbeta1

Whiteboard: [nsbeta3-]

Hixie (not reading bugmail)

Comment 38

•

24 years ago

Reassigning QA Contact for all open and unverified bugs previously under Lorca's care to Gerardo as per phone conversation this morning.

QA Contact: lorca → gerardok

Paul.Sandoz

Comment 39

•

24 years ago

I am using the XPCOM LDAP components and asserting entries into an RDF data source much in the same way as the js LDAP data source. I don't see any noticable improvement when i applied the lastest patch and used async proxy RDF observers, the UI is still unresponsive for the responsive LDAP server used, thus i have to resort to sync proxy RDF observers. I am not overly familiar with aspects of mozilla i am commenting on in the below sentances, so i could be talking a load of... so apologies in advance if so. In terms of putting events onto the UI queue, Java Swing may do somthing very similar with the InvokeLater method: http://java.sun.com/products/jdk/1.2/docs/api/javax/swing/SwingUtilities.html#invokeLater(java.lang.Runnable) I don't know anything on how events work in Mozilla but maybe some ideas could be obtained from the swing source to manage UI events more efficiently? Perhaps the problem lies more with UI processing of events than at the XPCOM layer of event propagation (maybe also thread priorities of user input?). Does there need to be a special UI event queue implementation? Also on the point about RDF Observers and data sources. It seems that a data source which notifies observers, on a different thread, should not have to proxy to the observer thread. This implies that the data source is bound to a particular thread for notification, .e.g UI thread. This could result in UI starvation if data sources are also used for non UI purposes. The solution could be for the presumably XUL obeservers to recognise that a notification is not on the UI thread and act accordingly. There may be some scope for optimization if performed this way.

Paul.Sandoz

Comment 40

•

24 years ago

Just compiled mozilla 0.8.1 + my stuff optimized and no debug. With or without the patch the performance (of course!) is alot better but has the occasional pause, the UI updates faster than the rate of LDAP messages, i presume. Netscape 4x is approximately 2x faster.

Madhur Bhatia

Comment 41

•

23 years ago

QA contact updated

QA Contact: gerardok → madhur

Asa Dotzler [:asa]

Updated

•

23 years ago

Blocks: 104166

basic

Comment 42

•

23 years ago

any of the patches here still work? I'm guessing that more work is needed on this...

Dan M

Comment 43

•

23 years ago

Good guess. Not going to happen before 1.0, as the competition is fierce.

Rakesh Mishra

Updated

•

23 years ago

QA Contact: madhur → rakeshmishra

Markus Hübner

Updated

•

22 years ago

Keywords: patch

Trix Supremo

Updated

•

22 years ago

QA Contact: rakeshmishra → trix

Dan M

Updated

•

20 years ago

Assignee: danm.moz → nobody

Status: ASSIGNED → NEW

Darin Fisher

Comment 44

•

18 years ago

The bug described in comment #0 and comment #10 (in particular) should have been fixed with the threadmanager landing (bug 326273). We still do not have a way to set the priority of "PLEvents" (now they are actually nsIRunnable objects), but we do now take care to better mix gecko events with UI events so that the underlying bug here should be resolved. If we want to introduce priorities for gecko events, then we can still do so, but I think a new bug should be filed with fresh justification. Marking this bug FIXED by bug 326273.

Blocks: nsIThreadManager

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Dão Gottwald [:dao]

Updated

•

13 years ago

Assignee: nobody → doug.turner

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Event Handling → User events and focus handling

PLEvent optimization unified diff 24 years ago Doug Turner (:dougt) 13.41 KB, patch		Details \| Diff \| Splinter Review
Plevent optimization - Take Two. 24 years ago Doug Turner (:dougt) 13.85 KB, patch		Details \| Diff \| Splinter Review
reduce monitor notifications patch 24 years ago Doug Turner (:dougt) 13.78 KB, patch		Details \| Diff \| Splinter Review
reduce count of unnecessary event delivery notifications 24 years ago Dan M 2.45 KB, patch		Details \| Diff \| Splinter Review