Closed Bug 36849 Opened 25 years ago Closed 24 years ago

asynchronous reflow and synchronous paint interact poorly

Categories

(Core :: Layout, defect, P3)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED
mozilla0.8

People

(Reporter: rouyer, Assigned: kmcclusk)

References

()

Details

(Keywords: perf, polish, testcase, Whiteboard: [nsbeta2-][nsbeta3-])

Attachments

(13 files)

820 bytes, text/html
Details
1.32 KB, patch
Details | Diff | Splinter Review
1.17 KB, patch
Details | Diff | Splinter Review
1.08 KB, text/html
Details
671 bytes, text/html
Details
1.75 KB, patch
Details | Diff | Splinter Review
570 bytes, text/html
Details
891 bytes, patch
Details | Diff | Splinter Review
9.55 KB, patch
Details | Diff | Splinter Review
6.17 KB, patch
Details | Diff | Splinter Review
7.24 KB, patch
Details | Diff | Splinter Review
22.93 KB, patch
Details | Diff | Splinter Review
22.93 KB, patch
Details | Diff | Splinter Review
From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m16) Gecko/20000421 BuildID: M14 M15 & M16 2000042108 Netscape4.x and Exploere 4/5x allows the dynamic clipping and positioning of an image at the same time to produce an animation. This occurs without the image flickering and it is an extremely useful technique. Current mozilla builds cause the image to refresh and flicker when clipping and repositioning occur at the same time. Just clipping or just positioning an image doest cause the flicker but together it does as it is shown in http://www.iex.net/rouyer/guru_test Reproducible: Always Steps to Reproduce: 1. flicker 2. flicker 3. flicker Actual Results: flicker Expected Results: The image after is has been dynamically repositioned and clipped at the same time should not be repainted causing a flicker. A very usefull animation technique is to take a single image comprised of multiple frames. You then apply dynamic clipping and repositioning of the image in a style sheet layer to perform the animation. This allows for realtime control of the animation other compared to no control of a gif image. This technique only requires one image not several as in standard image replacement animation. This technique is extremely bandwidth friendly and can be applied to multiple button states as well, but the flickering has to go. previous browsers had no problem with this method.
I see the flickering problem you mention, although when trying to compare it to IE 5.5 or NS 4.72 as you said was possible I encountered two different JavaScript errors. In IE 5.5: 'availableHeight is undefined' (also got a 'previous_position is undefined' error one time) In NS 4.72: document.getElementById is not a function. -and- guru.frame is not a function. I see the problem but have nothing to compare it against so I can see how it 'should' be. ->Javascript?
Assignee: asadotzler → rogerl
Severity: major → normal
Status: UNCONFIRMED → NEW
Component: Browser-General → Javascript Engine
Ever confirmed: true
QA Contact: jelwell → pschwartau
I updated the test page to show the same method working under IE 4/5x and NS 4x. You will see the animations not flicker (refresh) when performing a dynamic clip and position at the same time.
The test page has been updated to account for the new clipping method introduced in the April 24 M16 builds and greater. In fact, the image flashing problem has become worse... Early M16 and M15 builds do not support the new clipping methods, so please use te latest nightly.
Why getting this fixed is important to me and thousand others. Thus the problems I am facing should not be cast to less than "dogfood" bug status. To expand, I am writing a follow-up book to a previous successful book I have written on Dynamic HTML development (Dynamic HTML Web Magic). This one will focus on W3 standards and cross browser development to enable web developers to make pages for the future. I have a large library of methods planed for the new book and an enormous ready audience in both consumer and in higher education markets. If it is votes you want I will rally the charge!
The problem seems to be that setting "style.clip" causes an immediate reflow/repaint. Setting "style.left" does not cause immediate drawing.
For this method to work it requires images.
Adden: I cant see the problem (flicker) in attachment (id=9466). When the page load it simply appears, same with reloads. I created an example with clipping alone and I get no flicker. Another test with just positioning and no flicker. Together they cause problems.
OK, looking deeper, changing the CSS "left" property gives an NS_STYLE_HINT_REFLOW hint, meaning that the document needs to be reflowed. This is done asynchronously; the reflow command is posted and the script continues immediately, and everyone's happy. Changing the CSS "clip" property gives an NS_STYLE_HINT_VISUAL hint, meaning that the document does not need to be reflowed. Sounds like good news, doesn't it? But Mozilla takes the opportunity to update the display immediately and synchronously, before the script can continue. This causes the flashing and other interesting behaviours (such as the display being updated before previous style changes have been taken into account by reflows). I guess the question is, why isn't screen refreshing suppressed during script execution, e.g. using nsIViewManager::BeginUpdateViewBatch? Regarding the test case: just click on the box. After a little while it will start jumping around and flashing. Well, it does on my machine.
Well, after some experimentation it looks like just BeginUpdateViewBatch around the event handler won't do. The problem is that the screen still gets repainted before the reflow happens (and then again after the reflow happens, of course). What we need is a way to delay repainting until the asynchronous reflow command has been processed.
(I don't think this is a JS engine bug, so I'm going to reassign to attinasi and reset the component to layout.) A couple of things: - could we use the asynch reflow mechanisms to schedule asynch repaints, too? - if we've already got a reflow queued, there's no need for a separate paint: reflow requires that each frame repaint, I'm pretty sure.
Assignee: rogerl → attinasi
Component: Javascript Engine → Layout
I added Kevin to the CC list since he knows a lot about how the paint events would be handled if we did post them asynchronously. It seems like a pretty big change to make: DOM-related changes cause synch painting instead of asynch, and the message pump has to handle reflows before paints, and possibly discard paints? Maybe I'm just a scaredy-cat but this seems pretty risky so close to shipping. Contrary opinions? I'd be inclined to push this off until Gecko 2
Status: NEW → ASSIGNED
Assigned a milestone - not necessarily the correct milestone, just a milestone.
Target Milestone: --- → M20
Having double repaints when not needed is bad and should be fixed now.
Potentially causing many bugs to stem from the fix of something not used all that often is also bad, esp. when we're concentrating on stability now. I realize you depend on this to work, but we have to consider the repercussions of doing this, and have to weigh the result with the possible consequences (we want low risk/high reward bugs right now).
We will always consider successful patches from concerned Mozilla community members :-) Seriously, we are trying really hard to prioritize effectively so we can ship the most effective browser possible. We will forever be improving it... Mozilla 1.1 will be close behind 1.0
Believe me, many upon many developers will come across this situation and be bothered. It is not uncommon to apply two dynamic style change effect at the same time. When the full force of web developers get involved they will be bothered for sure. But to remove my assumption of this broad scope I will ask other developers what they think.
I decided to try a couple of easy changes to see if eliminating the synchronous repaints would help. Well, it does and it doesn't; if we do not paint synchronously then no animation is visible. This is because the testcase script uses a tight loop and the paint messages are never serviced until the script ends. If the script was changed to use timers to control the animation then asynch paints would work fine (probably). Intersetingly, IE5 seems to do the asynch paint policy. When I run the testcase in IE5 there is no visible animation, however the UI gets momentarily unresponsive so I'm guessing that it is executing the script. Probably we should make the paint asynch and force authors to use timers to animate. This way UI events could be serviced while the script is running as well, since the timers would allow us to get back and service the message pump. This is a big and scarey change, me thinks. However, we are at risk with this synchronous paint method we currently have; authors may take advantage of it and then if we 'fix' it (make paint asynch) their scripts will appear to be 'broken'... What to do?
IE5's precedent makes me happier -- waterson and I had talked about the case where someone does repeated updates of something like window.status, and don't see the interim values. Teaching EndViewUpdateBatch about ignoring paints that aren't required by a previous reflow is a lot of work: let's not do that. How about we just run event handlers and scripts inside a ViewUpdateBatch, and teach the NS_STYLE_HINT_VISUAL case to check the reflow queue before requesting that the frame paint? That seems relatively low-impact, in terms of interactions with other parts of the code. (Updating the summary to better reflect current thinking WRT the nature of this bug.)
Summary: Image Flicker when Clipping and Positioning → asynchronous reflow and synchronous paint interact poorly
RE: TImers. In my example I am using a setTimeout timer at 50 miliseconds. Are u refering to setInterval or something else?
Posting this pertinent comment I received via email: As a web developer I also rely heavily on clipping animations and like Jeff rouyer plan to release some NS6 based tutorials that rely heavily on this feature. Please it is imperative that this bug gets fixed sooner and not later as this will allow the development of tutorials and NS6 based content and thus expose NS6 to a huge web developer audience. Eddie Traversa ICQ # 8278642 DHTML Tutorials Fireworks & Dreamweaver Extensions and a bunch more! http://nirvana.media3.net/ The Future Now
Are these people using clipping changes in tight loops, or via timers? Jeff has indicated that he uses timers, which is good news, but if people are clamoring for this sort of thing to work, then our current thoughts on asynch repaint are going to need reconsideration: function shrinkWindow(window) { for (i = 0; i < 10; i ++) { window.clip.right -= 10; // no update happens between clips } } (What 4.x did, I think, was composite on a timer, based on the frame-rate settings. I don't think we can do that here, because we have reflow issues to worry about as well, and reflowing from the timer is likely to cause some performance pain. I guess we could try it out, though.)
Timers good! Loop-based animation bad! I don't want to see any animation in my test case. So far, no-one has been able to explain to me why you ever want to do synchronous repainting. Looping animation is just stupid in the event-driven DOM world, and people who use it deserve to be punished. attinasi, can you post your patch so others can try it out?
The general lore around here is that doing the synchronous paints was never intentional. Kevin McClusky, view manager owner, says we should NEVER be doing synchronous paints (possible exception is XUL menus, but even that is questionable). Please try the patch, see what happens. Hopefully nobody is relying on synch. painting so we can remove it, and the remainder of those annoying 'Recursive Paint' assertions will go away too.
If that's the case, why do we have all this infrastructure to support synchronous painting (NS_VMREFRESH_IMMEDIATE, Begin/EndUpdateViewBatch, etc)?
I'm no archaeologist, but from what I was told EndUpdateViewBatch used to not take any flags and always did synchronous updates. It was changed to support the better model of asynch updates, but support for immediate (synch) updates was thought to be required in some cases, hence the flag. That's what I heard anyway, but as to why paints were originally immediate as a result of EndUpdateViewBatch I do no know - maybe the mechanism for getting back to the message queue to get paint messages was borken or slow or maybe somebody just didn't think about it... Does anybody have any less vague explanations?
I tried your patch, but it still causes flashing on the html_guru2 page and a modified version of my test case, which I will attach.
I think we still have the problem where we are posting a paint, and that is being serviced before the reflow, then we reflow and paint again. As shaver suggested, we should try to eliminate the paint that comes before the reflow, but how to do that... Changing the paint to asynch is clearly not enough (or else I am missing a synch paint somewhere... I'll go back and check).
OK, here's what's happening: while handling the mouse click, an asynchronous repaint is requested, and an asynchronous reflow is also requested. Good. Problem is, the repaint message is dispatched by Windows before the PL event message which triggers the reflow. So we paint, reflow, paint again. The repaint event needs to be handled at lowest priority. Windows normally does this itself, but it appears that someone has deliberately overriden this behaviour in the Windows message loop code. Maybe other platforms are working...
The patch reverts to using the standard Windows event priorities. The guru page works properly, and so do my test cases. Everything seems functional, but menus feel slower.
Synchronous updating might make sense for some cases, e.g. tracking/responding to user input. For example, when the user is playing with menus, it's more important to repaint the menu than to repaint other windows. To support this, we should have a method in the PresShell, accessible through Javascript (window.repaintNow?) to force an immediate, synchronous update of a window. This would also let losers write their looping animation scripts. If this is thought to be necessary we should spin off a separate bug.
add hyatt: did you hack any of the painting stuff for menus?
Hi, This bug is absolutely a high priority for web developers as Jeff Rouyer states below. I am concerned about relying on the timer solution as I have found extensive use of timers in Windows creates memory leaks. Mozilla has to be an absolute star at handling dynamic images or it will be laughed off the web and all of your hard efforts will be wasted! It is totally worth it to delay the final release until this issue is resolved well - not half-heartedly and postponing to Gecko 2 as someone suggested below. Regards, Chimneytall
If you want to do animation, you need to use timers. Anything else is just a hack. If timers leak memory, then that's a separate bug and we'll fix it. If you want to register your support for this bug, please vote for it using the Bugzilla voting mechanism. Adding a "please fix" message to the discussion log just makes it harder for us to read what's going on while we work out a fix. Thanks.
Repaints can not be processed at the lowest priority because pages that have animation updates which are faster than the machine can keep up with result in starvation of the paint message servicing. (i.e the page freezes.. spending all or most of its time servicing timers which result in DOM manipulation, which result in invalidates, which result in additional paint messages on the message queue.) This is the reason the repaints and user-events are handled before the timer events.
The default PeekMessage behaviour (which is what my patch selects) services WM_TIMER last, below the priority of WM_PAINT. So I think that should be OK. Is there an existing test case or should I cook one up?
Just to be clear: the test case works fine with my patches.
I already voted prior to posting here. I wasn't sure that anyone was actually watching the vote count. I agree that timers are more intelligent than looping animations. My point is that I wanted the deeper sync/async/reflow issues to be cleaned up.
My problems with timers occurred when I had multiple, simultaneous, timers expiring at different intervals. Your example is terrific and very focused as examples should be. However, it is a very simple case of one timer expiring very quickly.
Thanks for voting. The test case is not meant to reveal memory leaks or anything like that, just to address Kevin McCluskey's concerns about event scheduling, which is what this bug is about. If you have a test case that reveals leaks with complex timers, please submit a new Bugzilla bug.
Michael Lowe did alot of work on the message pump, including the message priorities I believe - I'm cc'ing him to make sure he brings relevant information to the discussion before we change the logic in the pump...
Robert, you mentioned that your change services the timers after any pending paints on Windows - what about on other platforms? Kevin had an additional idea that would help guarantee starvation-prevention and would also solve the flicker problem. I'll relay it here: 1) Apply the two existing patches: mine for asynch paints, and Robert's for better (normal) message queue handling 2) After a reflow is performed, force a repaint (not from the message queue but right from the PresShell where the Reflow is kicked off, in ProcessReflowCommands). This should solve the flicker problem and ensure that the screen is updated frequently enough, even if there are zillions of reflows and the timers are somehow getting serviced before paints. The downside is very small: user interaction will be impossible for a little longer since the paint is happening on the tail-end of the reflow and no UI events will be handled until both are done. Painting generally takes very little time compared to Reflow so it should not be noticable... I'll try to prepare a patch for that part as well so we can see if we have a 'final solution' and something to nominate for nsbeta2.
> Robert, you mentioned that your change services the timers after any pending > paints on Windows - what about on other platforms? My patch only changes Win32-related code, so it only has to work on Win32. Doing the synchronous repaint after reflow sounds safe, but it's not necessary as far as I know.
OK - I tested this with the three patches (06/07/00 13:57,06/07/00 19:35,06/12/00 15:44) and the animation on the URL looks very nice. Can others look over the patches and try this out and report their findings? Also, since so many are interested in having this fixed I'd like to nominate it for nsbeta2 once we have a consensus that it solves the problem *and* that it won't cause any new, more severe, bugs.
I vote for all three patches to be checked in. In the longer term, I think we can and should remove patch #10001. I'm running fine without it. The XP issues are rather murky. Only patch #9790 should really impact non-Windows platforms. (#9840 only affects Windows and #10001 is just a backstop.) However, it can hardly create any new bugs since asynchronous repaint already had to work. I think the worst that can happen XP is that this bug isn't fixed by patch #9790 for platforms have similar event priority issues to the ones I fixed in my Windows patch. FYI, the PeekMessage docs are here: http://msdn.microsoft.com/library/psdk/winui/messques_8085.htm In particular, > Messages are processed in the following order: > Sent messages > Posted messages > Input (hardware) messages and system internal events > Sent messages (again) > WM_PAINT messages > WM_TIMER messages If there are any problems with the default handling, it'll most likely be that reflows (and PLEvents in general) have too high priority. Someone should define and write down exactly what we expect from the platform event queues in terms of event prioritization. I think the Windows priorities are nearly right, except we might want a class of "low priority" PLEvents that are serviced after input events.
I'm in agreement that we should get these patches in, however I cannot make the call on the risk/benefits. Nominating to nsbeta2 so PDT can make the decision. Benefit: eliminates flicker in common animation technique, and eliminates incorrect synchronous update of DOM-issued visual changes. Risks: changing the priority of messages coming off the Windows message queue: low level change, needs lots of testing... Additional note: the last patch I attached (http://bugzilla.mozilla.org/showattachment.cgi?attach_id=10001) is not correct: tracing through it I see that it does not cause the correct view to be painted (I though that the views would be recursed but they are not).
Keywords: nsbeta2
Whiteboard: [fix in hand]
nsbeta2-, nominated for nsbeta3 to give more time that is needed for testing.
Keywords: nsbeta3
Whiteboard: [fix in hand] → [nsbeta2-][fix in hand]
Is there a Win32 build with the best working patch somewhere (nightly or experimental directories) for those who are not compiling the browser?
I can send you a copy of GKHTML.DLL which should work with recent builds. Email me and I'll email it back to you.
Changing to default QA contact for assigned component, Layout.
QA Contact: pschwartau → petersen
If the patches are apparently benign to beneficial so far, they should go into the trunk as soon as a mozilla milestone branch (from which nsbeta2 will branch) is created, so we can maximize testing. /be
I've been using patches #9790 and #9840 for weeks with no apparent problems. My confidence in them is high.
I've been running patch 9793 and 9809 and they are fine. Robert, 9840 is a testcase, could you recheck which patches you are using? I believe patch 10001 is unnecessary (it seems to have no effect).
Yeah, they're the ones I meant :-). 9793 and 9809.
Keywords: patch, perf, polish
I just queried for this bug again to say "now's the time (for checkin)". Anyway, Brendan was faster... :)
Approved for beta3: excellent benefits to performance, animation on par with Nav and IE. Well tested for many weeks by several people.
Whiteboard: [nsbeta2-][fix in hand] → [nsbeta2-][fix in hand][nsbeta3+]
Target Milestone: M20 → M18
Fixes checked in: nsCSSFrameConstructor.cpp - no sync updates now (all no-sync) nsAppShell.cpp and nsWinMaion.cpp (for Robert O'Callahan) - standard event priorities in message-loop Time to Rock & Roll, animators! Let's pull this and make sure we have not introduced any regressions (that would be bad).
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Excellent, thank you all!!!!
Marking verified in the Aug 24th build.
Status: RESOLVED → VERIFIED
Is this fixed just for Windows? What about event priorities on Mac and Linux?
Dunno, but if there are problems they should be filed as separate platform-specific bugs.
The changes to nsAppShell are going to be backed out. This checkin is the cause of no painting *at all* during layout. It negates any attempt at incremental layout. It's taken several days to figure this one out and the general feeling of unresponsiveness during layout has been around since this went in over a month ago. After backing out the changes, the brower is significantly more responsive during layout. I'm going to be a bit blunt here since this has had such a negative effect on performance. Please DO NOT make such radical changes to event handling in the future without much wider discussion of the issue. --Vidur (from jst's account)
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
> Please DO NOT make such radical changes to event handling in the > future without much wider discussion of the issue. I'm sorry that this caused you such trouble. However, many luminaries have been CC'ed on this bug and its brethren (e.g. 44678) for months. I also tried to start a newsgroup discussion on how events should be prioritized (in .xpcom), but there was little interest. What else should have been done?
Anyway, now we're back to square one. We may have better responsiveness, but we're slower overall, because many content changes produce a paint->reflow->paint sequence, forcing us to paint the same area twice. And of course, there's the real reason this bug was filed in the first place: the first paint is frequently bogus because we haven't reflowed yet. Someone needs to lay down exactly what the relative event priorities should be XP, so that people can implement and use them correctly. (This is what I asked for months ago.) Once we have those rules, we can figure out how to fix this bug and other similar bugs.
Robert, it sounds like you did the right thing by trying to drive this discussion here and in newsgroups. I suspect that you might have got more of a response on porkjockeys than xpcom. I guess I'm still reeling from the statement that the change was tested with no perceived ill effects for several weeks before it was checked in. I recognize that layout of the complete document is slightly slower than it was with the change in, but the responsiveness and general usability of the browser is so improved now that it's a worthwhile and necessary tradeoff. Either way, as you say - we're back to square one. Should the discussion move to mozilla-porkjockeys?
I'm not sure it makes sense to classify all PLEvent uses in a single priority pool relative to other events. Seems to me that Robert's earlier suggestion of having more than one priority at which PLEvents are delivered might be a viable solution. For the cases in question, Necko-generated PLEvents might be considered low priority with respect to UI events, reflow-related PLEvents might be considered higher.
Maybe we just didn't look at the right sites, or under the right conditions... If a page loads very fast, then I probably don't notice whether it's displaying incrementally or not. If it loads very slowly, then depending on network conditions and other stuff, the Necko events may stop long enough for a reflow and paint. Let's have a discussion on porkjockeys...
Wow, I'm feeling pretty rotten about this. I really thought we tested this thoroughly, had all of the correct people looking at it, documented / publicized the change appropriatly, only to find out it caused people _days_ of grief trying to track it down. My most sincere apologies. I did post to m.p.l.checkins on Fri, 28 Jul 2000, and the posting did have all of the information about the event priorities. Since nsAppShell was backed out we need to back out /mozilla/webshell/tests/viewer/nsWinMain.cpp too since it has the same change. I'll take care of that (V: was there a bug number for your backout?) Also, clearing FIX IN HAND and nsbeta3+ from status whiteboard...
Whiteboard: [nsbeta2-][fix in hand][nsbeta3+] → [nsbeta2-]
Viewer is backed out now too. It looks like this bug is really dependent on 50104, or at least closely related, so I'm marking as such.
Status: REOPENED → ASSIGNED
Depends on: 50104
Target Milestone: M18 → Future
I tried the attachment from 6/8/00 and I can't tell what it's supposed to show me. Is it broken? Is it working? How can we make sure mac/linux aren't suffering from the same problems?
Synopsis: The current event processing does not ensure that paint events generated by visual changes will be processed at the same time as paint events generated by reflow. This causes flicker and performance problems for animations which manipulate both VISUAL and REFLOW properties within the same timer callback or event handler. Current processing sequence that causes problems: Within timer callback or event handler: 1.a script makes two changes; one results in a VISUAL_CHANGE (performs a invalidate), the other results in scheduling a REFLOW PLEvent. 2. return to the message pump. 3. dispatch PAINT messages for VISUAL_CHANGE invalidate. (See the results of visual change) 4. dispatch REFLOW PLEvent (The reflow invalidates portions of the window) 5. return to the message pump 6. dispatch PAINT messages generated by the REFLOW invalidate calls. (See the results of the reflow change) Proposed solution: Defer the dispatching of the PAINT events using an Invalidate PLEvent. The sequence would change to the following: 1. a script makes two changes; one results in a VISUAL_CHANGE (schedules an invalidate PLEvent), the other results in scheduling a REFLOW PLEvent 2. return to the message pump. 3. dispatch Invalidate PLEvent which performs an invalidate on the window. 4. dispatch REFLOW PLEvent (The reflow invalidates portions of the window) 5. return to the message pump 6. dispatch PAINT messages for the both the reflow and visual changes (See the results of the visual change and reflow change at the same time) The patch to delay invalidates does this.
Is the original testcase (the site in the URL) still available? That URL is not pointing to it any longer and I couldn't find it. rouyer, can you put the test page back up or point us to the new location? Thanks.
Ok, I put the test case back up at http://www.iex.net/rouyer/guru_test To view the same test with Explorer 5.5(Win)to see it displaying correctly.
Is this NS_VIEW_PUBLIC_FLAG_DONT_DOUBLEBUFFER stuff supposed to be checked in? It doesn't seem to be used anywhere. Also, the trunk nsViewManager2 does not support weak references, so the patch won't apply cleanly. After fixing that up, and using my view manager, I'm seeing all sorts of problems. Repaints in many places are happening slowly or not at all. This could be some interaction with my changes though, so other people should test...
Um, you never clear mPendingInvalidateEvent :-).
Oops.. I posted an old patchfile which has some problems as you found. Yes the ProcessPendingUpdates should reset the mPendingInvalidateEvent. The NS_VIEW_PUBLIC_FLAG_DONT_DOUBLEBUFFER is for bug 49743, but it should not be checked in until we confirm that it really helps to solve that bug. I ran all day yesterday with the correct version of the patch and it solved the problem described in this bug, and I didn't notice any major problems.
Can you post the real patch? I'm seeing MAJOR problems even after I put in code to clear mPendingInvalidateEvent.
I;'ve applied the patch and have been testing it. On the URL (http://www.iex.net/rouyer/guru_test/) it does seem to works correctly, at least I don't get flicker when changing the position. However, when the waves are started it sometimes shows the image in the old location before showing it in the new location (flickers). Also, those waves eat 100% of my CPU. I don't know if this is relevant to the bug, but I thought it was interesting. No other problems seen with this patch so far.
A further not on my last comment: the 100% cpu usage of the waves happens with or without the patch.
I paired down the guru test to just the figure animation.
I see two problems. One is that highlighting in menus is now really slow, it lags considerably behind mouse movement. The other problem is that if I quickly mouse over the "pos 0" to "pos 9" links in the test case, then I see the image jump around. As far as I can tell, there is nothing in the code to disable refresh while the moust event handler is running. Do you others see this?
Robert, my SeaMonkey build is not running right now so I have been using Viewer (clobbering and rebuilding now). I can believe that the menu response will have slowed down if it is doing asynch invalidates. For some reason I thought Menus were doing immediate (synchronous) invalidates, but I guess not.
In Viewer, I can still see the image jumping around when mousing over the "pos" links, but I have to move the mouse very fast. In Mozilla it's much more noticeable. Is there anything forcing the paint event to be processed after the reflow event? Not as far as I know.
I'm also seeing that, when typing, nothing gets painted until I stop typing for a bit. It's very bad. Do you see this?
Marking nsbeta3- due to high risk.
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3-]
I am seeing slow menu item updates with a pull from today without the patch. Robert: can you try it without the patch? If you don't want to back out the patch, you can modify nsViewManager2::EnableRefresh(PRUint32 aUpdateFlags) so it always calls ProcessPendingUpdates instead of Posting the invalidate event. I am seeing flicker when I move quickly between the positions with te patch installed.
I, too, am seeing very slow menu updates in a just pulled build. When I open a menu and mouse over an item, it takes a noticeably long time to select the item.
OK, the slow menus are definitely caused by something else. However, we can't really see what effect this has on the menus until the slow menus are fixed. I think the flicker we're still seeing is caused by the following sequence: -- mouse move event (triggers Javascript, posts invalidate and reflow PLEvents) -- PLEvents (posts paint event) -- mouse move event (triggers Javascript, posts invalidate and reflow PLEvents) -- paint event (paints bogus state) So I think that this fix is not quite right and should NOT be checked into the Mozilla trunk. As a 90% solution, it might still be suitable for the NS6 branch if Netscape PDT can be persuaded.
Over to kevein, per our conversation today (more of a rendering issue than a style issue).
Assignee: attinasi → kmcclusk
Status: ASSIGNED → NEW
Upon managerial request, adding the "testcase" keyword to 84 open layout bugs that do not have the "testcase" keyword and yet have an attachement with the word "test" in the description field. Apologies for any mistakes.
Keywords: testcase
Status: NEW → ASSIGNED
*** Bug 60171 has been marked as a duplicate of this bug. ***
*** Bug 60171 has been marked as a duplicate of this bug. ***
*** Bug 62670 has been marked as a duplicate of this bug. ***
Set milestone to mozilla0.8
Target Milestone: Future → mozilla0.8
Analysis after instrumenting (widget, presShell, view) I confirmed the following: There are two separate problems with the flicker. (1) Flicker caused by changes (visual+reflow) changes from a timer event handler (2) Flicker caused by changes (visual+reflow) from a mouse-move event handler (1) is solved by posting a paint invalidate event and processing the invalidate at the same time as the reflow event. (2) is solved by flushing all pending reflows/invalidates at the end of the mouse-movement. It is the result of the inversion of event priorities which favors mouse events over paint events It flickers because of the following sequence: event PL-event(reflow1) Process pending reflows which generates invalidates event MouseMove fires JavaScript event handler reflow change - posts PL-event(reflow2) visual change - changes state of frame which will be visible if the frame is asked to paint event PAINT (generated from PL-event(reflow1) invalidate) >>>>>> This is bad because the PAINT event handler renders the state change of the frame modified in the MouseMove event. (This happens becauses the frame with the state change intersects the invalidated rect generated by processing PL-event(reflow1)). To solve this event sequence problem I added code in nsEventStateManager to flush the pending reflow and invalidate events after all of the mouse callbacks have happened. This changes the sequence to be: event PL-event (reflow1) Process pending reflows which generates invalidates event MouseMove reflow change - posts PL-event(reflow2) visual change - changes state of frame which will be visible if the frame is asked to paint FlushPendingReflow - causes PL-event(reflow2) in reflows to be processed generating invalidates FlushPendingInvalidates - causes the visual change to generate invalidates event PAINT reflow1, reflow2, and visual changes from the mouse move are rendered by the event handler of the paint message. This is needed so the invalidates can be combined with the reflow invalidates. The event state manager already flushes the pending reflows in GenerateDragGesture with every mouse move. I added another call to flush the reflows after the mouse move event handlers have been called. In most cases there will not be any pending reflows or invalidates because they are normally flushed by the GenerateDragGesture.
Added joki@netscape.com to CC list, to comment on the EventStateManager change.
What's the deal with the gViewManagers array being added in the patch? Is that needed here? This patch is, IMHO, acceptable, but it seems to me that it's a bit of a band-aid. The fix should really apply to all interactive events, not just mouse movement --- but applying this patch to all interactive events would be "wrong", because interactive events would need to take priority over these reflows and paints.
"What's the deal with the gViewManagers array being added in the patch? Is that needed here?" I added a sanity check to make sure the Invalidate event had a valid ViewManager instance in it. The mechanism I use to determine it is a valid view manager is to cycle through the existing gViewManagers array to determine if the invalidate event points to valid view manager. The invalidate event should be killed by the mEventQueue->RevokeEvents(this) in the destructor of the viewmanager, but I have seen the assertion indicating a bad viewmanager in the invalidate handle event triggered once after running with an earlier version of the patch. I'd like to see if the current patch ever invokes the bad view manager assertion. "The fix should really apply to all interactive events, not just mouse movement --- but applying this patch to all interactive events would be "wrong", because interactive events would need to take priority over these reflows and paints." I guess the main issue is "do the interactive events require the state of the display to be current?." (i.e all reflowing and invalidates) have been peformed before processing another interactive event. Currently, The key and mouse events have the highest priority, so they are done at the expense of keeping the display current. (The patch forces the reflow/invalidates for each mouse move to keep the display current.) Should this be done for mouse up/mouse and keyevents as well?. My guess is we only want to do it were worried about flicker caused by doing both a visual and reflow change as the result of an event handler. There is a performance tradeoff. No flicker vs. quicker processing of the events. I felt comfortable adding the forcing of reflow/invalidates for the mouse move would not impact current performance, since it already did a flush of all of the reflow commands with every mouse move in the GenerateDragGesture code.
You're right that there's a tradeoff between responsiveness and throughput which is determined by how often we update the screen in response to events. But that is not actually what we're dealing with here. My fault for introducing this red herring. The problem here is that updating has two phases --- reflowing and repainting --- and if we ever repaint while there are pending reflows, we display something that is not a valid rendering of any state of the content. I would argue that whenever this happens, it is a bug. Your patch fixes one instance of this by forcing a (reflow, repaint) cycle after mouse moves, which we have observed can trigger the bug. But other high-priority events can also trigger the bug. It would be easy to modify the testcase to exhibit the bug in response to other input events. But as you and I observe, it would be a mistake to apply your fix to all input events. I think that ultimately we need to tackle the problem head on, by processing any relevant pending reflows every time we repaint. I understand that this might not be easy to do, but any other complete solution will be much nastier.
"But as you and I observe, it would be a mistake to apply your fix to all input events.' I think it really depends on the target of the event. The target ultimately needs to decide if the display needs to synchronized before the target can perform it's processing. For example, if the target of a keyevent is a textfield then it wouldn't want the display to be synchronized. It needs to batch the keyevent's without updating the display. If the target of a keyevent is a javaScript event handler that manipulates the DOM to generate a single frame of animation driven by the keyboard then it would need the display to be synchronized to avoid flicker. "I think that ultimately we need to tackle the problem head on, by processing any relevant pending reflows every time we repaint." I think that would be worth investigating, but at this point I would prefer to go ahead with the current patch and file a new bug. Forcing the flushing of pending reflows with ever repaint would have a far larger impact on the current implementation than the change within this patch.
Tried this patch on Linux, I have not seen any problems so far. I will try mac next.
Hmmm, the latest patch has some failed hunks versus current CVS, even with a large 'fuzz'.
I created bug 65248 to cover the general issue which this bug exposed. Please add your self to the CC list of this bug if your still interested in this issue.
Adam: I just pulled the tip and applied the patch on WIN32 and it worked fine. No failed hunks.
The patch looks good to me as well. However, I do have one architectual concern. nsVoidArray* nsViewManager::GetViewManagerArray() returns a pointer to a private data member of nsViewManager. That means the caller can do all sorts of evil things to the view manager, like add or delete from the array, or even insert an element of an unexpected type. It would be *much* cleaner, and only a little extra work, to have GetViewManagerArray() return an iterator instead.
How about having it return a `const nsVoidArray*'.
that might work as well, though a little less safe since C++ makes "const" easy to cast away.
Making it a const should be adequate. The nsViewManager::GetViewManagerArray() method is not a nsIViewManager interface method and nsViewManager.h is not exported so the method can only be accessed within the view module.
I'm ok with that.
Checked in patch + modification to make nsViewManager::GetViewManagerArray() return a const.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
I just snagged a new build and aaaah, very nice, very smooth, like a clean shave - that animation is flicker free and happy :)
Marking verified in the March 5 build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: