Closed Bug 36849 Opened 20 years ago Closed 19 years ago

asynchronous reflow and synchronous paint interact poorly

Categories

(Core :: Layout, defect, P3)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED
mozilla0.8

People

(Reporter: rouyer, Assigned: kmcclusk)

References

()

Details

(Keywords: perf, polish, testcase, Whiteboard: [nsbeta2-][nsbeta3-])

Attachments

(13 files)

820 bytes, text/html
Details
1.32 KB, patch
Details | Diff | Splinter Review
1.17 KB, patch
Details | Diff | Splinter Review
1.08 KB, text/html
Details
671 bytes, text/html
Details
1.75 KB, patch
Details | Diff | Splinter Review
570 bytes, text/html
Details
891 bytes, patch
Details | Diff | Splinter Review
9.55 KB, patch
Details | Diff | Splinter Review
6.17 KB, patch
Details | Diff | Splinter Review
7.24 KB, patch
Details | Diff | Splinter Review
22.93 KB, patch
Details | Diff | Splinter Review
22.93 KB, patch
Details | Diff | Splinter Review
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m16) Gecko/20000421
BuildID:    M14 M15 & M16 2000042108

Netscape4.x and Exploere 4/5x allows the dynamic clipping and positioning of an
image at the same time to produce an animation. This occurs without the image
flickering and it is an extremely useful technique. Current mozilla builds cause
the image to refresh and flicker when clipping and repositioning occur at the
same time.  Just clipping or just positioning an image doest cause the flicker
but together it does as it is shown in http://www.iex.net/rouyer/guru_test

Reproducible: Always
Steps to Reproduce:
1. flicker
2. flicker
3. flicker

Actual Results:  flicker

Expected Results:  The image after is has been dynamically repositioned and
clipped at the same time should not be repainted causing a flicker.

A very usefull animation technique is to take a single image comprised of
multiple frames. You then apply dynamic clipping and repositioning of the image
in a style sheet layer to perform the animation. This allows for realtime
control of the animation other compared to no control of a gif image. This
technique only requires one image not several as in standard image replacement
animation.  This technique is extremely bandwidth friendly and can be applied to
multiple button states as well, but the flickering has to go.  previous browsers
had no problem with this method.
I see the flickering problem you mention, although when trying to compare it to 
IE 5.5 or NS 4.72 as you said was possible I encountered two different 
JavaScript errors.

In IE 5.5:
'availableHeight is undefined' (also got a 'previous_position is undefined' 
error one time)

In NS 4.72:

document.getElementById is not a function.
-and-
guru.frame is not a function.


I see the problem but have nothing to compare it against so I can see how 
it 'should' be.

->Javascript?
Assignee: asadotzler → rogerl
Severity: major → normal
Status: UNCONFIRMED → NEW
Component: Browser-General → Javascript Engine
Ever confirmed: true
QA Contact: jelwell → pschwartau
I updated the test page to show the same method working under IE 4/5x and NS 4x. 
You will see the animations not flicker (refresh) when performing a dynamic clip 
and position at the same time.
The test page has been updated to account for the new clipping method introduced 
in the April 24 M16 builds and greater. In fact, the image flashing problem has 
become worse... Early M16 and M15 builds do not support the new clipping 
methods, so please use te latest nightly.
Why getting this fixed is important to me and thousand others.  Thus the 
problems I am facing should not be cast to less than "dogfood" bug status.

To expand, I am writing a follow-up book to a previous successful book I have 
written on Dynamic HTML development (Dynamic HTML Web Magic). This one will 
focus on W3 standards and cross browser development to enable web developers to 
make pages for the future. I have a large library of methods planed for the new 
book and an enormous ready audience in both consumer and in higher education 
markets. If it is votes you want I will rally the charge!
The problem seems to be that setting "style.clip" causes an immediate 
reflow/repaint. Setting "style.left" does not cause immediate drawing.
For this method to work it requires images.
Adden: I cant see the problem (flicker) in attachment (id=9466). When the page 
load it simply appears, same with reloads.

I created an example with clipping alone and I get no flicker. Another test with 
just positioning and no flicker. Together they cause problems.
OK, looking deeper, changing the CSS "left" property gives an 
NS_STYLE_HINT_REFLOW hint, meaning that the document needs to be reflowed. This 
is done asynchronously; the reflow command is posted and the script continues 
immediately, and everyone's happy.

Changing the CSS "clip" property gives an NS_STYLE_HINT_VISUAL hint, meaning 
that the document does not need to be reflowed. Sounds like good news, doesn't 
it? But Mozilla takes the opportunity to update the display immediately and 
synchronously, before the script can continue. This causes the flashing and 
other interesting behaviours (such as the display being updated before 
previous style changes have been taken into account by reflows).

I guess the question is, why isn't screen refreshing suppressed during script 
execution, e.g. using nsIViewManager::BeginUpdateViewBatch?

Regarding the test case: just click on the box. After a little while it will 
start jumping around and flashing. Well, it does on my machine.
Well, after some experimentation it looks like just BeginUpdateViewBatch around 
the event handler won't do. The problem is that the screen still gets repainted 
before the reflow happens (and then again after the reflow happens, of course). 
What we need is a way to delay repainting until the asynchronous reflow command 
has been processed.
(I don't think this is a JS engine bug, so I'm going to reassign to attinasi and
reset the component to layout.)

A couple of things:
- could we use the asynch reflow mechanisms to schedule asynch repaints, too?
- if we've already got a reflow queued, there's no need for a separate paint:
reflow requires that each frame repaint, I'm pretty sure.
Assignee: rogerl → attinasi
Component: Javascript Engine → Layout
I added Kevin to the CC list since he knows a lot about how the paint events 
would be handled if we did post them asynchronously.

It seems like a pretty big change to make: DOM-related changes cause synch 
painting instead of asynch, and the message pump has to handle reflows before 
paints, and possibly discard paints? Maybe I'm just a scaredy-cat but this seems 
pretty risky so close to shipping. Contrary opinions? I'd be inclined to push 
this off until Gecko 2
Status: NEW → ASSIGNED
Assigned a milestone - not necessarily the correct milestone, just a milestone.
Target Milestone: --- → M20
Having double repaints when not needed is bad and should be fixed now.
Potentially causing many bugs to stem from the fix of something not used all 
that often is also bad, esp. when we're concentrating on stability now.  I 
realize you depend on this to work, but we have to consider the repercussions 
of doing this, and have to weigh the result with the possible consequences (we 
want low risk/high reward bugs right now).
We will always consider successful patches from concerned Mozilla community 
members :-) Seriously, we are trying really hard to prioritize effectively so we 
can ship the most effective browser possible. We will forever be improving it... 
Mozilla 1.1 will be close behind 1.0
Believe me, many upon many developers will come across this situation and be 
bothered.  It is not uncommon to apply two dynamic style change effect at the 
same time.  When the full force of web developers get involved they will be 
bothered for sure.  But to remove my assumption of this broad scope I will ask 
other developers what they think.
I decided to try a couple of easy changes to see if eliminating the synchronous 
repaints would help. Well, it does and it doesn't; if we do not paint 
synchronously then no animation is visible. This is because the testcase 
script uses a tight loop and the paint messages are never serviced until the 
script ends. If the script was changed to use timers to control the animation 
then asynch paints would work fine (probably). Intersetingly, IE5 seems to do 
the asynch paint policy. When I run the testcase in IE5 there is no visible 
animation, however the UI gets momentarily unresponsive so I'm guessing that it 
is executing the script.

Probably we should make the paint asynch and force authors to use timers to 
animate. This way UI events could be serviced while the script is running as 
well, since the timers would allow us to get back and service the message pump. 
This is a big and scarey change, me thinks. However, we are at risk with this 
synchronous paint method we currently have; authors may take advantage of it and 
then if we 'fix' it (make paint asynch) their scripts will appear to be 
'broken'... What to do?
IE5's precedent makes me happier -- waterson and I had talked about the case
where someone does repeated updates of something like window.status, and don't
see the interim values.

Teaching EndViewUpdateBatch about ignoring paints that aren't required by a
previous reflow is a lot of work: let's not do that.

How about we just run event handlers and scripts inside a ViewUpdateBatch, and
teach the NS_STYLE_HINT_VISUAL case to check the reflow queue before requesting
that the frame paint?  That seems relatively low-impact, in terms of
interactions with other parts of the code.

(Updating the summary to better reflect current thinking WRT the nature of this
bug.)
Summary: Image Flicker when Clipping and Positioning → asynchronous reflow and synchronous paint interact poorly
RE: TImers. In my example I am using a setTimeout timer at 50 miliseconds.  Are 
u refering to setInterval or something else?
Posting this pertinent comment I received via email:

As a web developer I also rely heavily on clipping animations and like Jeff
rouyer plan to release some NS6 based tutorials that rely heavily on this
feature.

Please it is imperative that this bug gets fixed sooner and not later as
this will allow the development of tutorials and NS6 based content and thus
expose NS6 to a huge web developer audience.


Eddie Traversa
ICQ # 8278642

DHTML Tutorials Fireworks & Dreamweaver Extensions and a bunch more!
http://nirvana.media3.net/

The Future Now
Are these people using clipping changes in tight loops, or via timers?  Jeff has
indicated that he uses timers, which is good news, but if people are clamoring
for this sort of thing to work, then our current thoughts on asynch repaint are
going to need reconsideration:

   function shrinkWindow(window) {
       for (i = 0; i < 10; i ++) {
           window.clip.right -= 10; // no update happens between clips
       }
   }

(What 4.x did, I think, was composite on a timer, based on the frame-rate
settings.  I don't think we can do that here, because we have reflow issues to
worry about as well, and reflowing from the timer is likely to cause some
performance pain.  I guess we could try it out, though.)
Timers good! Loop-based animation bad! I don't want to see any animation in my 
test case.

So far, no-one has been able to explain to me why you ever want to do 
synchronous repainting. Looping animation is just stupid in the event-driven 
DOM world, and people who use it deserve to be punished. 

attinasi, can you post your patch so others can try it out?
The general lore around here is that doing the synchronous paints was never 
intentional. Kevin McClusky, view manager owner, says we should NEVER be doing 
synchronous paints (possible exception is XUL menus, but even that is 
questionable). Please try the patch, see what happens. Hopefully nobody is 
relying on synch. painting so we can remove it, and the remainder of those 
annoying 'Recursive Paint' assertions will go away too.
If that's the case, why do we have all this infrastructure to support 
synchronous painting (NS_VMREFRESH_IMMEDIATE, Begin/EndUpdateViewBatch, etc)?
I'm no archaeologist, but from what I was told EndUpdateViewBatch used to not 
take any flags and always did synchronous updates. It was changed to support the 
better model of asynch updates, but support for immediate (synch) updates was 
thought to be required in some cases, hence the flag. That's what I heard 
anyway, but as to why paints were originally immediate as a result of 
EndUpdateViewBatch I do no know - maybe the mechanism for getting back to the 
message queue to get paint messages was borken or slow or maybe somebody just 
didn't think about it... Does anybody have any less vague explanations?
I tried your patch, but it still causes flashing on the html_guru2 page and a 
modified version of my test case, which I will attach.
I think we still have the problem where we are posting a paint, and that is 
being serviced before the reflow, then we reflow and paint again. As shaver 
suggested, we should try to eliminate the paint that comes before the reflow, 
but how to do that... Changing the paint to asynch is clearly not enough (or 
else I am missing a synch paint somewhere... I'll go back and check).
OK, here's what's happening: while handling the mouse click, an asynchronous 
repaint is requested, and an asynchronous reflow is also requested. Good. 
Problem is, the repaint message is dispatched by Windows before the PL event 
message which triggers the reflow. So we paint, reflow, paint again.

The repaint event needs to be handled at lowest priority. Windows normally does 
this itself, but it appears that someone has deliberately overriden this 
behaviour in the Windows message loop code. Maybe other platforms are working...
The patch reverts to using the standard Windows event priorities. The guru page 
works properly, and so do my test cases.

Everything seems functional, but menus feel slower.
Synchronous updating might make sense for some cases, e.g. tracking/responding 
to user input. For example, when the user is playing with menus, it's more 
important to repaint the menu than to repaint other windows. To support this, we 
should have a method in the PresShell, accessible through Javascript 
(window.repaintNow?) to force an immediate, synchronous update of a window. This 
would also let losers write their looping animation scripts. If this is thought 
to be necessary we should spin off a separate bug.
add hyatt: did you hack any of the painting stuff for menus?
Hi,

This bug is absolutely a high priority for web developers as Jeff Rouyer states 
below. I am concerned about relying on the timer solution as I have found 
extensive use of timers in Windows creates memory leaks. 

Mozilla has to be an absolute star at handling dynamic images or it will be 
laughed off the web and all of your hard efforts will be wasted! It is totally 
worth it to delay the final release until this issue is resolved well - not 
half-heartedly and postponing to Gecko 2 as someone suggested below.

Regards,
Chimneytall


If you want to do animation, you need to use timers. Anything else is just a 
hack. If timers leak memory, then that's a separate bug and we'll fix it.

If you want to register your support for this bug, please vote for it using the 
Bugzilla voting mechanism. Adding a "please fix" message to the discussion log 
just makes it harder for us to read what's going on while we work out a fix. 
Thanks.
Repaints can not be processed at the lowest priority because pages that have 
animation updates which are faster than the machine can keep up with result in 
starvation of the paint message servicing. (i.e the page freezes.. spending all 
or most of its time servicing timers which result in DOM manipulation, which 
result in invalidates, which result in additional paint messages on the message 
queue.) This is the reason the repaints and user-events are handled before the 
timer events.
The default PeekMessage behaviour (which is what my patch selects) services 
WM_TIMER last, below the priority of WM_PAINT. So I think that should be OK. Is 
there an existing test case or should I cook one up?
Just to be clear: the test case works fine with my patches.
I already voted prior to posting here. I wasn't sure that anyone was actually 
watching the vote count.

I agree that timers are more intelligent than looping animations. My point is 
that I wanted the deeper sync/async/reflow issues to be cleaned up.
My problems with timers occurred when I had multiple, simultaneous, timers 
expiring at different intervals. Your example is terrific and very focused as 
examples should be. However, it is a very simple case of one timer expiring 
very quickly.
Thanks for voting.

The test case is not meant to reveal memory leaks or anything like that, just to 
address Kevin McCluskey's concerns about event scheduling, which is what this 
bug is about. If you have a test case that reveals leaks with complex timers, 
please submit a new Bugzilla bug.
Michael Lowe did alot of work on the message pump, including the message 
priorities I believe - I'm cc'ing him to make sure he brings relevant 
information to the discussion before we change the logic in the pump...
Robert, you mentioned that your change services the timers after any pending 
paints on Windows - what about on other platforms? Kevin had an additional idea 
that would help guarantee starvation-prevention and would also solve the flicker 
problem. I'll relay it here:
1) Apply the two existing patches: mine for asynch paints, and Robert's for 
better (normal) message queue handling
2) After a reflow is performed, force a repaint (not from the message queue but 
right from the PresShell where the Reflow is kicked off, in 
ProcessReflowCommands).

This should solve the flicker problem and ensure that the screen is updated 
frequently enough, even if there are zillions of reflows and the timers are 
somehow getting serviced before paints. The downside is very small: user 
interaction will be impossible for a little longer since the paint is happening 
on the tail-end of the reflow and no UI events will be handled until both are 
done. Painting generally takes very little time compared to Reflow so it should 
not be noticable...

I'll try to prepare a patch for that part as well so we can see if we have a 
'final solution' and something to nominate for nsbeta2.
> Robert, you mentioned that your change services the timers after any pending 
> paints on Windows - what about on other platforms?

My patch only changes Win32-related code, so it only has to work on Win32.

Doing the synchronous repaint after reflow sounds safe, but it's not necessary 
as far as I know.
OK - I tested this with the three patches (06/07/00 13:57,06/07/00 
19:35,06/12/00 15:44) and the animation on the URL looks very nice. Can others 
look over the patches and try this out and report their findings? Also, since so 
many are interested in having this fixed I'd like to nominate it for nsbeta2 
once we have a consensus that it solves the problem *and* that it won't cause 
any new, more severe, bugs.
I vote for all three patches to be checked in.

In the longer term, I think we can and should remove patch #10001. I'm running 
fine without it.

The XP issues are rather murky. Only patch #9790 should really impact 
non-Windows platforms. (#9840 only affects Windows and #10001 is just a 
backstop.) However, it can hardly create any new bugs since asynchronous repaint 
already had to work. I think the worst that can happen XP is that this bug isn't 
fixed by patch #9790 for platforms have similar event priority issues to the 
ones I fixed in my Windows patch.

FYI, the PeekMessage docs are here:
http://msdn.microsoft.com/library/psdk/winui/messques_8085.htm
In particular,
> Messages are processed in the following order: 
>      Sent messages 
>      Posted messages 
>      Input (hardware) messages and system internal events 
>      Sent messages (again) 
>      WM_PAINT messages 
>      WM_TIMER messages 
If there are any problems with the default handling, it'll most likely be that 
reflows (and PLEvents in general) have too high priority.

Someone should define and write down exactly what we expect from the platform 
event queues in terms of event prioritization. I think the Windows priorities 
are nearly right, except we might want a class of "low priority" PLEvents that 
are serviced after input events.
I'm in agreement that we should get these patches in, however I cannot make the 
call on the risk/benefits. Nominating to nsbeta2 so PDT can make the decision.

Benefit: eliminates flicker in common animation technique, and eliminates 
incorrect synchronous update of DOM-issued visual changes.
Risks: changing the priority of messages coming off the Windows message queue: 
low level change, needs lots of testing...

Additional note: the last patch I attached 
(http://bugzilla.mozilla.org/showattachment.cgi?attach_id=10001) is not correct: 
tracing through it I see that it does not cause the correct view to be painted 
(I though that the views would be recursed but they are not).
Keywords: nsbeta2
Whiteboard: [fix in hand]
nsbeta2-, nominated for nsbeta3 to give more time that is needed for testing.
Keywords: nsbeta3
Whiteboard: [fix in hand] → [nsbeta2-][fix in hand]
Is there a Win32 build with the best working patch somewhere (nightly or 
experimental directories) for those who are not compiling the browser?
I can send you a copy of GKHTML.DLL which should work with recent builds. Email 
me and I'll email it back to you.
Changing to default QA contact for assigned component, Layout.
QA Contact: pschwartau → petersen
If the patches are apparently benign to beneficial so far, they should go into 
the trunk as soon as a mozilla milestone branch (from which nsbeta2 will branch) 
is created, so we can maximize testing.

/be
I've been using patches #9790 and #9840 for weeks with no apparent problems. My 
confidence in them is high.
I've been running patch 9793 and 9809 and they are fine. Robert, 9840 is a 
testcase, could you recheck which patches you are using? I believe patch 10001 
is unnecessary (it seems to have no effect).
Yeah, they're the ones I meant :-). 9793 and 9809.
Keywords: patch, perf, polish
I just queried for this bug again to say "now's the time (for checkin)".
Anyway, Brendan was faster... :)
Approved for beta3: excellent benefits to performance, animation on par with Nav 
and IE. Well tested for many weeks by several people.
Whiteboard: [nsbeta2-][fix in hand] → [nsbeta2-][fix in hand][nsbeta3+]
Target Milestone: M20 → M18
Fixes checked in: 
nsCSSFrameConstructor.cpp 
  - no sync updates now (all no-sync)
nsAppShell.cpp and nsWinMaion.cpp (for Robert O'Callahan) 
  - standard event priorities in message-loop

Time to Rock & Roll, animators! Let's pull this and make sure we have not 
introduced any regressions (that would be bad).
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Excellent, thank you all!!!!
Marking verified in the Aug 24th build.
Status: RESOLVED → VERIFIED
Is this fixed just for Windows? What about event priorities on Mac and Linux?
Dunno, but if there are problems they should be filed as separate 
platform-specific bugs.
The changes to nsAppShell are going to be backed out. This checkin is the cause
of no painting *at all* during layout. It negates any attempt at incremental
layout. It's taken several days to figure this one out and the general feeling
of unresponsiveness during layout has been around since this went in over a
month ago. After backing out the changes, the brower is significantly more
responsive during layout.

I'm going to be a bit blunt here since this has had such a negative effect on
performance. Please DO NOT make such radical changes to event handling in the
future without much wider discussion of the issue. 

--Vidur (from jst's account)
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
> Please DO NOT make such radical changes to event handling in the
> future without much wider discussion of the issue.

I'm sorry that this caused you such trouble.

However, many luminaries have been CC'ed on this bug and its brethren (e.g. 
44678) for months. I also tried to start a newsgroup discussion on how events 
should be prioritized (in .xpcom), but there was little interest. What else 
should have been done?
Anyway, now we're back to square one. We may have better responsiveness, but 
we're slower overall, because many content changes produce a 
paint->reflow->paint sequence, forcing us to paint the same area twice. And of 
course, there's the real reason this bug was filed in the first place: the first 
paint is frequently bogus because we haven't reflowed yet.

Someone needs to lay down exactly what the relative event priorities should be 
XP, so that people can implement and use them correctly. (This is what I asked 
for months ago.) Once we have those rules, we can figure out how to fix this bug 
and other similar bugs.
Robert, it sounds like you did the right thing by trying to drive this 
discussion here and in newsgroups. I suspect that you might have got more of a 
response on porkjockeys than xpcom.

I guess I'm still reeling from the statement that the change was tested with no 
perceived ill effects for several weeks before it was checked in. I recognize 
that layout of the complete document is slightly slower than it was with the 
change in, but the responsiveness and general usability of the browser is so 
improved now that it's a worthwhile and necessary tradeoff.

Either way, as you say - we're back to square one. Should the discussion move to 
mozilla-porkjockeys?
I'm not sure it makes sense to classify all PLEvent uses in a single priority 
pool relative to other events. Seems to me that Robert's earlier suggestion of 
having more than one priority at which PLEvents are delivered might be a viable 
solution. For the cases in question, Necko-generated PLEvents might be 
considered low priority with respect to UI events, reflow-related PLEvents might 
be considered higher.
Maybe we just didn't look at the right sites, or under the right conditions... 
If a page loads very fast, then I probably don't notice whether it's displaying 
incrementally or not. If it loads very slowly, then depending on network 
conditions and other stuff, the Necko events may stop long enough for a reflow 
and paint.

Let's have a discussion on porkjockeys...
Wow, I'm feeling pretty rotten about this. I really thought we tested this 
thoroughly, had all of the correct people looking at it, documented / publicized 
the change appropriatly, only to find out it caused people _days_ of grief 
trying to track it down. My most sincere apologies. I did post to m.p.l.checkins 
on Fri, 28 Jul 2000, and the posting did have all of the information about the 
event priorities.

Since nsAppShell was backed out we need to back out 
/mozilla/webshell/tests/viewer/nsWinMain.cpp too since it has the same change. 
I'll take care of that (V: was there a bug number for your backout?)

Also, clearing FIX IN HAND and nsbeta3+ from status whiteboard...
Whiteboard: [nsbeta2-][fix in hand][nsbeta3+] → [nsbeta2-]
Viewer is backed out now too. It looks like this bug is really dependent on 
50104, or at least closely related, so I'm marking as such.
Status: REOPENED → ASSIGNED
Depends on: 50104
Target Milestone: M18 → Future
I tried the attachment from 6/8/00 and I can't tell what it's supposed to show
me. Is it broken? Is it working? How can we make sure mac/linux aren't suffering
from the same problems?
Synopsis: 
The current event processing does not ensure that paint events generated by 
visual changes will be processed at the same time as paint events generated by 
reflow.  This causes flicker and performance problems for animations which 
manipulate both VISUAL and REFLOW properties within the same timer callback or 
event handler. 

Current processing sequence that causes problems: 

Within timer callback or event handler: 
1.a script  makes two changes; one results in a VISUAL_CHANGE (performs a 
invalidate), the other results in scheduling
a REFLOW PLEvent. 
2. return to the message pump. 
3. dispatch PAINT messages for VISUAL_CHANGE invalidate.  (See the results of  
visual change) 
4. dispatch REFLOW PLEvent  (The reflow invalidates portions of the window) 
5. return to the message pump 
6. dispatch PAINT messages generated by the REFLOW invalidate calls. (See the 
results of the reflow change) 

Proposed solution: 

Defer the dispatching of the PAINT events using an Invalidate PLEvent. 

The sequence would change to the following: 

1. a script makes two changes; one results in a VISUAL_CHANGE (schedules an 
invalidate PLEvent), the other results
in scheduling a REFLOW PLEvent 
2. return to the message pump. 
3. dispatch Invalidate PLEvent which performs an invalidate on the window. 
4. dispatch REFLOW PLEvent  (The reflow invalidates portions of the window) 
5. return to the message pump 
6. dispatch PAINT messages for the both the reflow and visual changes (See the 
results of the visual change and reflow
change at the same time) 

The patch to delay invalidates does this.
Is the original testcase (the site in the URL) still available? That URL is not
pointing to it any longer and I couldn't find it. rouyer, can you put the test
page back up or point us to the new location? Thanks.
Ok, I put the test case back up at http://www.iex.net/rouyer/guru_test

To view the same test with Explorer 5.5(Win)to see it displaying correctly.
Is this NS_VIEW_PUBLIC_FLAG_DONT_DOUBLEBUFFER stuff supposed to be checked in? 
It doesn't seem to be used anywhere.

Also, the trunk nsViewManager2 does not support weak references, so the patch 
won't apply cleanly.

After fixing that up, and using my view manager, I'm seeing all sorts of 
problems. Repaints in many places are happening slowly or not at all. This could 
be some interaction with my changes though, so other people should test...
Um, you never clear mPendingInvalidateEvent :-).
Oops.. I posted an old patchfile which has some problems as you found. Yes the 
ProcessPendingUpdates should reset the mPendingInvalidateEvent.

The NS_VIEW_PUBLIC_FLAG_DONT_DOUBLEBUFFER is for bug 49743, but it should not be 
checked in until we confirm that it really helps to solve that bug.

 I ran all day yesterday with the correct version of the patch and it solved the 
problem described in this bug, and I didn't notice any major problems.


Can you post the real patch? I'm seeing MAJOR problems even after I put in code 
to clear mPendingInvalidateEvent.
I;'ve applied the patch and have been testing it. On the URL
(http://www.iex.net/rouyer/guru_test/) it does seem to works correctly, at least
I don't get flicker when changing the position. However, when the waves are
started it sometimes shows the image in the old location before showing it in
the new location (flickers). Also, those waves eat 100% of my CPU. I don't know
if this is relevant to the bug, but I thought it was interesting.

No other problems seen with this patch so far.
A further not on my last comment: the 100% cpu usage of the waves happens with
or without the patch.
I paired down the guru test to just the figure animation.
I see two problems. One is that highlighting in menus is now really slow, it 
lags considerably behind mouse movement.

The other problem is that if I quickly mouse over the "pos 0" to "pos 9" links 
in the test case, then I see the image jump around. As far as I can tell, there 
is nothing in the code to disable refresh while the moust event handler is 
running.

Do you others see this?
Robert, my SeaMonkey build is not running right now so I have been using Viewer
(clobbering and rebuilding now). I can believe that the menu response will have
slowed down if it is doing asynch invalidates. For some reason I thought Menus
were doing immediate (synchronous) invalidates, but I guess not.
In Viewer, I can still see the image jumping around when mousing over the "pos" 
links, but I have to move the mouse very fast. In Mozilla it's much more 
noticeable.

Is there anything forcing the paint event to be processed after the reflow 
event? Not as far as I know.
I'm also seeing that, when typing, nothing gets painted until I stop typing for
a bit. It's very bad. Do you see this?
Marking nsbeta3- due to high risk.
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3-]
I am seeing slow menu item updates with a pull from today without the patch. 
Robert: can you try it without the patch? 

If you don't want to back out the patch, you can 
modify nsViewManager2::EnableRefresh(PRUint32 aUpdateFlags) so it always calls 
ProcessPendingUpdates instead of Posting the invalidate event.

I am seeing flicker when I move quickly between the positions with te patch 
installed.
I, too, am seeing very slow menu updates in a just pulled build.  When I open a
menu and mouse over an item, it takes a noticeably long time to select the item.
OK, the slow menus are definitely caused by something else. However, we can't 
really see what effect this has on the menus until the slow menus are fixed.

I think the flicker we're still seeing is caused by the following sequence:
-- mouse move event (triggers Javascript, posts invalidate and reflow PLEvents)
-- PLEvents (posts paint event)
-- mouse move event (triggers Javascript, posts invalidate and reflow PLEvents)
-- paint event (paints bogus state)

So I think that this fix is not quite right and should NOT be checked into the 
Mozilla trunk. As a 90% solution, it might still be suitable for the NS6 branch 
if Netscape PDT can be persuaded.
Over to kevein, per our conversation today (more of a rendering issue than a
style issue).
Assignee: attinasi → kmcclusk
Status: ASSIGNED → NEW
Upon managerial request, adding the "testcase" keyword to 84 open layout bugs that
do not have the "testcase" keyword and yet have an attachement with the word
"test" in the description field. Apologies for any mistakes.
Keywords: testcase
Status: NEW → ASSIGNED
*** Bug 60171 has been marked as a duplicate of this bug. ***
*** Bug 60171 has been marked as a duplicate of this bug. ***
*** Bug 62670 has been marked as a duplicate of this bug. ***
Set milestone to mozilla0.8
Target Milestone: Future → mozilla0.8
Analysis after instrumenting (widget, presShell, view) I confirmed the following:

There are two separate problems with the flicker.
(1) Flicker caused by changes (visual+reflow) changes from a timer event handler
(2) Flicker caused by changes (visual+reflow) from a mouse-move event handler

(1) is solved by posting a paint invalidate event and processing the invalidate
at the same time as the reflow event.
(2) is solved by flushing all pending reflows/invalidates at the end of the
mouse-movement. It is the result of the inversion of event priorities which
favors mouse events over paint events

It flickers because of the following sequence:

event PL-event(reflow1)
  Process pending reflows which generates invalidates
event MouseMove
  fires JavaScript event handler
       reflow change - posts PL-event(reflow2)
       visual change - changes state of frame which will be visible if the frame
      is asked to paint
event PAINT (generated from PL-event(reflow1) invalidate)

>>>>>> This is bad because the PAINT event handler renders the state change of
the frame modified in the MouseMove event. (This happens becauses the frame with
the state change intersects the invalidated rect generated by processing
PL-event(reflow1)).


To solve this event sequence problem I added code in nsEventStateManager to
flush the pending reflow and invalidate events after all of the mouse callbacks
have happened. This changes the sequence to be:

event PL-event (reflow1)
   Process pending reflows which generates invalidates
event MouseMove
    reflow change - posts PL-event(reflow2)
    visual change - changes state of frame which will be visible if the frame is
asked to paint
   FlushPendingReflow - causes PL-event(reflow2) in reflows to be processed
generating invalidates
   FlushPendingInvalidates - causes the visual change to generate invalidates
event PAINT
    reflow1, reflow2, and visual changes from the mouse move are rendered by the
event handler of the paint message.


This is needed so the invalidates can be combined with the reflow invalidates.
The event state manager already flushes the pending reflows in
GenerateDragGesture with every mouse move. I added another call to flush the
reflows after the mouse move event handlers have been called.
In most cases there will not be any pending reflows or invalidates because they
are normally flushed by the GenerateDragGesture.
Added joki@netscape.com to CC list, to comment on the EventStateManager change.

What's the deal with the gViewManagers array being added in the patch? Is that 
needed here?

This patch is, IMHO, acceptable, but it seems to me that it's a bit of a 
band-aid. The fix should really apply to all interactive events, not just mouse 
movement --- but applying this patch to all interactive events would be "wrong", 
because interactive events would need to take priority over these reflows and 
paints.
"What's the deal with the gViewManagers array being added in the patch? Is that
needed here?"

I added a sanity check to make sure the Invalidate event had a valid ViewManager
instance in it. The mechanism I use to determine it is a valid view manager is
to cycle through the existing gViewManagers array to determine if the invalidate
event points to valid view manager. The invalidate event should be killed by the
mEventQueue->RevokeEvents(this) in the destructor of the viewmanager, but I have
seen the assertion indicating a bad viewmanager in the invalidate handle event
triggered once after running with an earlier version of the patch.  I'd like to
see if the current patch ever invokes the bad view manager assertion.

"The fix should really apply to all interactive events, not just mouse
movement --- but applying this patch to all interactive events would be "wrong",
because interactive events would need to take priority over these reflows and
paints."

I guess the main issue is "do the interactive events require the state of the
display to be current?." (i.e all reflowing and invalidates) have been peformed
before processing another interactive event.

Currently, The key and mouse events have the highest priority, so they are done
at the expense of keeping the display current. (The patch forces the
reflow/invalidates for each mouse move to keep the display current.) Should this
be done for mouse up/mouse and keyevents as well?. My guess is we only want to
do it were worried about flicker caused by doing both a visual and reflow change
as the result of an event handler. There is a performance tradeoff. No flicker
vs. quicker processing of the events. I felt comfortable adding the forcing of
reflow/invalidates for the mouse move would not impact current performance,
since it already did a flush of all of the reflow commands with every mouse move
in the GenerateDragGesture code.




You're right that there's a tradeoff between responsiveness and throughput which 
is determined by how often we update the screen in response to events. But that 
is not actually what we're dealing with here. My fault for introducing this red 
herring.

The problem here is that updating has two phases --- reflowing and repainting 
--- and if we ever repaint while there are pending reflows, we display something 
that is not a valid rendering of any state of the content. I would argue that 
whenever this happens, it is a bug.

Your patch fixes one instance of this by forcing a (reflow, repaint) cycle after 
mouse moves, which we have observed can trigger the bug. But other high-priority 
events can also trigger the bug. It would be easy to modify the testcase to 
exhibit the bug in response to other input events. But as you and I observe, it 
would be a mistake to apply your fix to all input events.

I think that ultimately we need to tackle the problem head on, by processing any 
relevant pending reflows every time we repaint. I understand that this might not 
be easy to do, but any other complete solution will be much nastier.
"But as you and I observe, it 
would be a mistake to apply your fix to all input events.'

I think it really depends on the target of the event. The target ultimately 
needs to decide if the display needs to synchronized before the target can 
perform it's processing. For example, if the target of a keyevent is a textfield 
then it wouldn't want the display to be synchronized. It needs to batch the 
keyevent's without updating the display. If the target of a keyevent is a 
javaScript event handler that manipulates the DOM to generate a single frame of 
animation driven by the keyboard then it would need the display to be 
synchronized to avoid flicker. 


"I think that ultimately we need to tackle the problem head on, by processing 
any relevant pending reflows every time we repaint."

I think that would be worth investigating, but at this point I would prefer to 
go ahead with the current patch and file a new bug. Forcing the flushing of 
pending reflows with ever repaint would have a far larger impact on the current 
implementation than the change within this patch. 



Tried this patch on Linux, I have not seen any problems so far. I will try mac 
next.
Hmmm, the latest patch has some failed hunks versus
current CVS, even with a large 'fuzz'.
I created bug 65248 to cover the general issue which this bug exposed. Please
add your self to the CC list of this bug if your still interested in this issue.
Adam: I just pulled the tip and applied the patch on WIN32 and it worked fine.
No failed hunks.
The patch looks good to me as well.  However, I do have one architectual concern.
nsVoidArray* nsViewManager::GetViewManagerArray()
returns a pointer to a private data member of nsViewManager.  That means the
caller can do all sorts of evil things to the view manager, like add or delete
from the array, or even insert an element of an unexpected type.
It would be *much* cleaner, and only a little extra work, to have
GetViewManagerArray() return an iterator instead.
How about having it return a `const nsVoidArray*'.
that might work as well, though a little less safe since C++ makes "const" easy
to cast away.
Making it a const should be adequate. The nsViewManager::GetViewManagerArray()
 method is not a nsIViewManager interface method and nsViewManager.h is not
exported so the method can only be accessed within the view module.
I'm ok with that.
Checked in patch + modification to make nsViewManager::GetViewManagerArray()
 return a const.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago19 years ago
Resolution: --- → FIXED
I just snagged a new build and aaaah, very nice, very smooth, like a clean shave 
- that animation is flicker free and happy :)
Marking verified in the March 5 build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.