Closed Bug 129953 Opened 22 years ago Closed 22 years ago

Regression causing "frame drop-offs" [DHTML / painting-supression]

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: markushuebner, Assigned: pavlov)

References

()

Details

(Keywords: regression, testcase, top100, Whiteboard: [adt2] FIXED ON TRUNK [ETA 05/02])

Attachments

(2 files, 3 obsolete files)

Just go to http://www.formula-one.nu/dhtml/3d.htm - in recent builds major 
frame drop-offs happen. Unfortunately this is also present on the 0.9.9 trunk 
(testing 2002031903)

Tested back builds, I find out that the regression started with build 
2002021803 (the testcase is fine with build 2002021708)
The checkin of bug 117061 (02/17/2002 16:10) seems to be the reason.
Blocks: 21762
Markush, please give this to the XP timer owner, pavlov.  I'll help.

/be
Assignee: jst → pavlov
No problems with 0.9.9 build 2002031104
Blocks: 117061
No longer blocks: 117061
Blocks: 117061
A very fine testcase from Peter for playing around
http://www.formula-one.nu/mozilla/timeouttest.htm

You can test with different timeouts and number of elements.
You can see that a small timeout causes frame-drop.
You can also see that a big timeout but with many elements causes the same 
problem.
matic: it would help if you could produce something that measured the time
between SetTimeout() and the timeout being called, and display the distribution
(NOT average) of times.  I suspect the problem is that for small timeouts the
algorithm occasionally does some subtraction and ends up using an effective 0
timeout, causing the appearance of a jump (the reflows may get coalesced for two
updates, even with the current code, before it's drawn).
jesup: I've updated the testcase with timers. (the same url)

Do you think it can be used or do you want me to change it?
bug 117061 has some useful stats from an updated testcase attached.  I'm
duplicating a comment I just made there, since it's really more applicable to
this specific bug:

This test now shows we have serious problems with the SetTimeout processing,
perhaps partly due to the previous attempt to fix timers.  With a SetTimeout of
100ms (which is what matic used, the default settings), you shouldn't see a
pattern of sometimes 250ms, sometimes 10ms.  Now, the 250 may be largely due to
bug 129115, but we shouldn't be seeing 10 or 20, and especially not several of
them in a row.

OS->All
OS: Windows 2000 → All
Hardware: PC → All
http://www.kunstforum-wien.at/index.php?area=newsletter
is another really bad example.
if you move the mouse-cursor over the symbol for scrolling up or down it's 
even hard to realise that this is animating as there seems to be no animation.
if you move around with the mouse-cursor the animation takes place.
as jesup already stated in bug 117061 comment #10 we are having major troubles 
with firing timers.
Attached patch partial fix on windows (obsolete) — Splinter Review
this patch makes things a lot better, but still isn't perfect..
Great - hope to see this in 1.0 as this would ease a few things.
Keywords: patch
Attached patch better fixSplinter Review
this patch makes it up to the app's main loop code to call to turn on idle
timers and makes the appshell code for mozilla on windows do it by default. 
other apps and platforms can make this call and do the make the calls to get
this functionality, otherwise they will continue to operate as they used to.
Attachment #77190 - Attachment is obsolete: true
I'll review this soon, by tonight for sure.  I'm keen to hear from Markush and
others who might try it, if they can (or get an optimized build that contains
this patch from pavlov).  Also, I think someone should r= -- if that's me, who
will sr=?

/be
Keywords: mozilla1.0mozilla1.0+
Keywords: nsbeta1nsbeta1+
Whiteboard: [adt2]
Updated nsAppShell.cpp patch which also processes idle timers when we are
modal.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment on attachment 77335 [details] [diff] [review]
better fix

sr=rpotts@netscape.com
Attachment #77335 - Flags: superreview+
Comment on attachment 78150 [details] [diff] [review]
Updated patch to nsAppShell.cpp  (previous patch required - nsAppShell changes)

sr=rpotts@netscape.com
Attachment #78150 - Flags: superreview+
Comment on attachment 78150 [details] [diff] [review]
Updated patch to nsAppShell.cpp  (previous patch required - nsAppShell changes)

r=rjesup@wgate.com
Comment on attachment 78150 [details] [diff] [review]
Updated patch to nsAppShell.cpp  (previous patch required - nsAppShell changes)

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #78150 - Flags: approval+
Comment on attachment 77335 [details] [diff] [review]
better fix

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77335 - Flags: approval+
Keywords: adt1.0.0
rjesup wanted to know if I was cool with these changes, and I am -- pav already
showed them to me last week -- I was just waiting for others to r=/sr= and
relieve me of contributing to volunteer effect.

/be
We have r/sr/a, and brendan is ok with it.  Let's commit this now!  If you don't
have time/etc to commit it, reassign to me and I can do it.
What kind of risk is associated with this patch and how much testing has been done.
Can we get some builds to QA for testing? Can you checkin on the trunk after Moz
1.0 branches and let it bake for a day or two
Time to RC1 is short. This patch is not huge and it's well reviewed. We should
get this in quickly. 
i've landed this on the trunk.  Will land on the branch once I get adt
approval.......
Let's let it bake on the trunk for a couple of days, have QA take a look at it,
and make sure there are no regressions introduced, then come back to us for
approval - ADT. Removing adt1.0.0. Pls renominate once it has been on the trunk
a couple days,
and been tested. 
Keywords: adt1.0.0
I've tested the patch with 2002041003 NT4.

Since I have a slow machine (PII 300), the testcase I used is a modified version
of     http://www.formula-one.nu/mozilla/timeouttest.htm
(http://bugzilla.mozilla.org/showattachment.cgi?attach_id=78592 see bug#136688)

Results show very stable numbers.

Timeout    500            700            300
---------------------------------------------
Iter 1     501            701            290
 "   2     501            701            301
 "   3     500            701            290
 "   4     501            701            291
 "   5     501            701            301          
 "   6     500            701            300
 "   7     501            701            310
 "   8     501            701            311
 "   9     501            701            300
 "   10    500            701            310
............................................

I'm suprised by times less than the timeout.
Depends on: 136677
I'm going to be out of the office for the next week or so.  I would appriciate
it if someone could land this on the branch while I'm away.
How much work is needed to get this fixed for non-Windows platforms at this
point (per comment 11)?
Comment on attachment 77335 [details] [diff] [review]
better fix

these changes were backed out due to a performance regression (see bug 136677)
Attachment #77335 - Flags: needs-work+
Comment on attachment 78150 [details] [diff] [review]
Updated patch to nsAppShell.cpp  (previous patch required - nsAppShell changes)

marking needs-work since this was backed out (bug 136677)
Attachment #78150 - Flags: needs-work+
Final numbers given seem to be 1% increase in Tp.

For a significant improvement in DHTML functionality (not just performance) this
seems worth it.  DHTML functionality was broken by the previous timer checkins
to reduce latency; this fixes that regression.  If there are Tp issues due to
that, we can resolve them or decide that they're unresolvable; we shouldn't back
out a significant regression fix for 1% Tp.

Note this fixes considerable randomness in timers; without this fix they can
fire way earlier than requested or way later.  100ms SetTimeout was taking 10ms
to 250ms, apparently randomly.

I say check it back in, mirror it for Unix/etc, and then resolve any fallout.
I'm thinking the same as Randell Jesup in comment #31. Just 1%-1.5% increase in
pageload for a much wanted bugfix!? I agree focus on performance is important in
current Mozilla, but...
I would rather back out the whole new timer implementation that caused many of
these regressions. I don't know what good came from it, but since it's probably
too late to back it out we _have_ to fix the regressions so that the stuff that
depends on timers are unbroken.

That it cost performance sucks, but what can we do? The timers as they are right
now are just not acceptable. :-(
is bug 136447 a dup of this? It got around extremely much faster after this
checkin (allthough still around 5-10 times slower than MSIE)
Speed is always an issue, but in this case I would rather see Mozilla 1.0 do the
right thing a little slow than see it doing the wrong thing a little more quickly.
Since these changes can be checked in in 3 independent parts, I'm going to check
them in in 3 pieces to see which piece tweaked the Linux builds (since none of
them should have).  I just checked in the Makefile.in / makefile.win to add the
nsITimerManager.idl to the build.
OK, adding the IDL, not surprisingly, didn't increase the time.  So I just
checked in the rest of the xpcom/threads/ changes.  (Still no factory change or
Windows appshell change.)
So, as I was just discussing with bbaetz & jrgm on IRC, this code *is* used by
linux.  The registering of the class occurs on all platforms.  The change to
nsXPComInit.cpp causes an extra RegisterFactory call from inside of
NS_InitXPCOM2.  It doesn't seem like a single class should cause a 10ms change
in page load...especially since NS_InitXPCOM2 should be called way before
pageload occurs but that's the only common connection that I can see.
pavlov is out; to manager.
Assignee: pavlov → gagan
Status: ASSIGNED → NEW
adt1.0.0+ (on ADT's behalf) for checkin into the 1.0 branch. Pls check this in
to the branch today. After it is checked in, pls add fixed1.0.0. Once QA has
verified it on the branch, then add verified1.0.0.
Keywords: adt1.0.0+
Comment on attachment 78150 [details] [diff] [review]
Updated patch to nsAppShell.cpp  (previous patch required - nsAppShell changes)

removing needs work and re-approving for the 1.0 branch
Attachment #78150 - Flags: needs-work+
Comment on attachment 77335 [details] [diff] [review]
better fix

removing needs work and re-approving for the 1.0 branch.
Attachment #77335 - Flags: needs-work+
doug can you help? :)
Assignee: gagan → dougt
@#$%!!  The patch does not apply cleanly.  It does not include mac or unix
makefiles.  Ugh.

Attaching what I think the patch is suppose to look like.  I am deeply conserned
about the nsAppShell.cpp changes as they are not the same on the trunk.
No longer depends on: 136677
after talking with dbaron, we agreed that the nsAppShell.cpp from the trunk
should be landed.
Checking in widget/src/windows/nsAppShell.cpp;
/cvsroot/mozilla/widget/src/windows/nsAppShell.cpp,v  <--  nsAppShell.cpp
new revision: 3.42.16.2; previous revision: 3.42.16.1
done
Checking in xpcom/build/nsXPComInit.cpp;
/cvsroot/mozilla/xpcom/build/nsXPComInit.cpp,v  <--  nsXPComInit.cpp
new revision: 1.126.12.2; previous revision: 1.126.12.1
done
Checking in xpcom/threads/TimerThread.cpp;
/cvsroot/mozilla/xpcom/threads/TimerThread.cpp,v  <--  TimerThread.cpp
new revision: 1.6.4.2; previous revision: 1.6.4.1
done
Checking in xpcom/threads/makefile.win;
/cvsroot/mozilla/xpcom/threads/makefile.win,v  <--  makefile.win
new revision: 1.22.2.2; previous revision: 1.22.2.1
done
Checking in xpcom/threads/Makefile.in;
/cvsroot/mozilla/xpcom/threads/Makefile.in,v  <--  Makefile.in
new revision: 1.25.4.2; previous revision: 1.25.4.1
done
Checking in xpcom/threads/MANIFEST_IDL;
/cvsroot/mozilla/xpcom/threads/MANIFEST_IDL,v  <--  MANIFEST_IDL
new revision: 3.4.4.2; previous revision: 3.4.4.1
done
Checking in xpcom/threads/nsITimerManager.idl;
/cvsroot/mozilla/xpcom/threads/nsITimerManager.idl,v  <--  nsITimerManager.idl
new revision: 1.1.4.1; previous revision: 1.1
done
Checking in xpcom/threads/nsTimerImpl.cpp;
/cvsroot/mozilla/xpcom/threads/nsTimerImpl.cpp,v  <--  nsTimerImpl.cpp
new revision: 1.11.4.2; previous revision: 1.11.4.1
done
Checking in xpcom/threads/nsTimerImpl.h;
/cvsroot/mozilla/xpcom/threads/nsTimerImpl.h,v  <--  nsTimerImpl.h
new revision: 1.5.4.2; previous revision: 1.5.4.1
done

Landed the above files on the branch.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Um... are we opening other bugs for non-Windows platforms then?
Sorry but this is definitely fixed only on windows. 
I've opened bug 137706 for the Linux platform.
adding fised1.0.0 keyord to make sure this gets QA verification.
Keywords: fixed1.0.0
Reopening because there are problems with the patch!

The nsAppShell.cpp patch caused dialog problems.  Those that are familar with
this patch please review the change which I landed last night.  

rpotts and rjesup, 
You reviewed these changes.  Any idea what could cause 137796?  Note that dbaron
and I landed the trunk version of the nsAppShell.cpp.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch better patch v.2Splinter Review
fixes the dialog problem
Attachment #78150 - Attachment is obsolete: true
Attachment #79376 - Attachment is obsolete: true
Comment on attachment 79482 [details] [diff] [review]
better patch v.2

sr=alecf but I'd sure like a reviewer who's more familiar with this to read
this.
Comment on attachment 79482 [details] [diff] [review]
better patch v.2

r=rjesup@wgate.com - this is the same as the patch I r='d before (2nd part of
pav's patch)
Attachment #79482 - Flags: review+
Comment on attachment 79482 [details] [diff] [review]
better patch v.2

AlecF sr=='d it already
Attachment #79482 - Flags: superreview+
Comment on attachment 79482 [details] [diff] [review]
better patch v.2

i'll approve this; straight backout of this file didn't build for me.
Attachment #79482 - Flags: approval+
Checking in nsAppShell.cpp;
/cvsroot/mozilla/widget/src/windows/nsAppShell.cpp,v  <--  nsAppShell.cpp
new revision: 3.42.16.3; previous revision: 3.42.16.2
done


this should do it.  
there are residual problems with other dialogs, even with your change (cookie
dialogs, for instance, require that i click on ok or cancel in rapid succession
to dismiss the dialog). I don't think we can ship rc1 unless we can clean up
that behaviour.
This isn't fixed. We probably need to just back it out.
Severity: major → blocker
Keywords: fixed1.0.0, patchsmoketest
Whiteboard: [adt2] → [adt2] BRANCH ONLY
Checking in mozilla/xpcom/threads/nsTimerImpl.h;
/cvsroot/mozilla/xpcom/threads/nsTimerImpl.h,v  <--  nsTimerImpl.h
new revision: 1.5.4.3; previous revision: 1.5.4.2
done
Checking in mozilla/xpcom/threads/nsTimerImpl.cpp;
/cvsroot/mozilla/xpcom/threads/nsTimerImpl.cpp,v  <--  nsTimerImpl.cpp
new revision: 1.11.4.4; previous revision: 1.11.4.3
done
Checking in mozilla/xpcom/threads/MANIFEST_IDL;
/cvsroot/mozilla/xpcom/threads/MANIFEST_IDL,v  <--  MANIFEST_IDL
new revision: 3.4.4.3; previous revision: 3.4.4.2
done
Checking in mozilla/xpcom/threads/Makefile.in;
/cvsroot/mozilla/xpcom/threads/Makefile.in,v  <--  Makefile.in
new revision: 1.25.4.3; previous revision: 1.25.4.2
done
Checking in mozilla/xpcom/threads/makefile.win;
/cvsroot/mozilla/xpcom/threads/makefile.win,v  <--  makefile.win
new revision: 1.22.2.3; previous revision: 1.22.2.2
done
Checking in mozilla/xpcom/threads/TimerThread.cpp;
/cvsroot/mozilla/xpcom/threads/TimerThread.cpp,v  <--  TimerThread.cpp
new revision: 1.6.4.3; previous revision: 1.6.4.2
done
Checking in mozilla/xpcom/build/nsXPComInit.cpp;
/cvsroot/mozilla/xpcom/build/nsXPComInit.cpp,v  <--  nsXPComInit.cpp
new revision: 1.126.12.4; previous revision: 1.126.12.3
done
Checking in mozilla/widget/src/windows/nsAppShell.cpp;
/cvsroot/mozilla/widget/src/windows/nsAppShell.cpp,v  <--  nsAppShell.cpp
new revision: 3.42.16.4; previous revision: 3.42.16.3
done
bash-2.05a$ 

backed out.  
Assignee: dougt → pavlov
Status: REOPENED → NEW
Sorry for the spam, but did this get backed out properly?  bonsai shows the 
backout applying against a branch called MOZILLA_1_0_BRANCH (not 
MOZILLA_1_0_0_BRANCH).  Does that matter?
Removing adt1.0.0+, as this was backed out. Pls nominate again, when there is a
patch that fixes the problem.
Keywords: adt1.0.0+
MOZILLA_1_0_BRANCH and MOZILLA_1_0_0_BRANCH should be aliases of each other (for
now, at least).
This is no longer a branch smoketest blocker since it was backed out of the branch.
Severity: blocker → major
Keywords: smoketest
Whiteboard: [adt2] BRANCH ONLY → [adt2]
Any chance this will be fixed for 1.0, or is it just to late now that RC1 is
released ?

There's so much DHTML not working properly, including some stuff I made :-(

Well, sorry for the spam, but I reelly really hope mozilla.org has the nesecary
patience to wait for a fix for this bug. And thanks to those people who have
already been working hard trying to fixing this bug.
Since this isn't fixed shouldn't it be fixed on all platforms? I don't think
that there should be so major differences between behaviour on Windows, Linux
and Mac versions.
Blocks: advocacybugs
There might be something related in bug 138791. I found some problems in the
timer code while making it handle interval rollover correctly.
Depends on: 138791
Whiteboard: [adt2] → [adt2] FIXED ON TRUNK
The reason this wasn't changed on linux or mac is because their old timer 
implementations did not behave this way.
Status: NEW → ASSIGNED
Do we know exactly what is holding up this patch on the branch? This does ruin
alot of DHTML websites, and it would be nice for the patch to make it into 1.0.

The trunk seems to be working fine with this patch on Win. What are we looking
for to get approval to land on the branch?
Timer code has been fixed, but we still have problems. Please see my comment 70
from bug 117061 for some analysis. 
i think most of the remaining problems are being caused by other bugs such as 
the various dhtml sucking due to layout problems.
If you want to get this on the branch you should add adt1.0.0 keyword.
adding adt1.0.0 keyword... We shouldn't need Netscape's permission to check into
the 1.0 branch, but it can't hurt to put this on people's radars. Like Stuart
said, the problems that still remain with DHTML performance are other bugs.
Fixing this one will at least get DHTML animations to work as opposed to simply
stalling the browser with frame-dropoffs until the animation is finished. 

we have a few drivers CCd on this bug already, and it is a dependency on bug
138000 (Make RC2 not suck) so unless there are some negative issues with this
patch, this and probably other related bugs (bug 117061 for example) should be
checked in to the 1.0 branch.
Keywords: adt1.0.0
Doug - Considering taking this one after beta. What Top 100 sites are exhibiting
a problem with this bug?
Keywords: adt1.0.0adt1.0.0-
Whiteboard: [adt2] FIXED ON TRUNK → [adt2 RTM] FIXED ON TRUNK
This bug is a major regressions causing most DHTML sites to hang and many 
DHTML examples have frame-drop-offs. It has already done fine on the trunk - 
so why not checking into 1.0 branch?!
This is a stopship for RC2 (that's why it's blocking 138000). It's been on the
trunk for a long time with nothing but good results. We must get this landed
before tomorrow.
rolling the dice ...

adt1.0.0+ (on ADT's behalf) approval for checkin to the 1.0 branch. Pls check
this in to 1.0 branch today, then add the fixed1.0.0 keyword.
Keywords: adt1.0.0-adt1.0.0+
Whiteboard: [adt2 RTM] FIXED ON TRUNK → [adt2] FIXED ON TRUNK [ETA 05/02]
marking fixed.  now on trunk and branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Keywords: fixed1.0.0
Resolution: --- → FIXED
I must say, that lastest trunk for Windows is amazing. Amazing results with 
moving layers in most combinations (with clipping etc.). The problem still 
apears when Mozilla has to show many, many layers (bug #140789 ).

I hope this patch wont be exluded in RC2?
It's on the 1.0 branch, so ready for RC2 :)
Fixing bug 102321 helped many, many things - also a bit
doing animations over a background image.
Nevertheless bug 139986, bug 87808, bug 107746 still need our attention.
*** Bug 142399 has been marked as a duplicate of this bug. ***
No longer blocks: 21762
Blocks: 21762
The patch in this bug causes a pretty serious problem (see bug 291386).  By suppressing timers until the windows event loop is idle, the timer thread's filter gets greatly skewed, and as a result, the timer thread ends up over-correcting like mad for the skew.  It results in timers firing at 1ms intervals post page load, for example.

Unfortunately, all of the testcases linked from this bug are now 404s.  Can anyone help resurrect a testcase for this bug?  Thanks!
Blocks: 291386
Markus: Thanks!
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: