Closed Bug 186465 Opened 22 years ago Closed 2 months ago

DHTML clock animation causing 100% CPU

Categories

(Core :: Layout, defect)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: stanio, Unassigned)

References

Details

(Keywords: perf, testcase)

Attachments

(13 files, 6 obsolete files)

1.08 KB, text/html
Details
75.23 KB, text/plain
Details
8.37 KB, text/html
Details
144.86 KB, application/x-gzip
Details
218.47 KB, image/jpeg
Details
228.70 KB, image/jpeg
Details
196.36 KB, image/jpeg
Details
7.11 KB, text/html
Details
2.65 MB, text/html
Details
131.26 KB, image/png
Details
108.88 KB, image/png
Details
219.91 KB, image/png
Details
178.90 KB, image/png
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20021221 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20021221 I found a JavaScript drawing a funny clock written exclusively for Netscape4/IE. I modified it a bit so it runs in Mozilla/Netscape7 too. The code for Mozilla and IE is the same except for the mouse move tracking. In fact the code is the same for Netscape 4 with the exception it produces markup with LAYER tags and not DIVs. The performance in Mozilla is very slow. I'm trying it on 2 systems, both under Windows XP and with 512Mb RAM. One has Duron 800Mhz CPU and the other has P4 1.8Mhz CPU. For comparsion it runs alot faster in IE6 and Netscape 4.7x. In Mozilla/Netscape7 the CPU usage is always 100% on both systems while in IE/Netscape4 it is about 50% on the slower system and about 10% on the faster. Reproducible: Always Steps to Reproduce: 1. Open the test case; 2. Move the mouse around; 3. Watch the performance and the CPU usage. Actual Results: The performance is poor and the CPU usage is at 100%. Expected Results: Better performance and lower CPU usage.
Attached file JavaScript code drawing the clock (obsolete) —
Attachment #109940 - Attachment mime type: application/x-javascript → text/javascript
Probably our Math operations could need some tweaking, we also know that ParseInt() could need some optimizing. Anyway, a profile would be needed.
Assignee: asa → other
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout
Ever confirmed: true
Keywords: perf, testcase
QA Contact: asa → markush
Blocks: 21762
Summary: Slow DHTML performance → DHTML clock animation causing 100% CPU
Is this the same as in bug 70156 ?
Removing the style-information didn't have any effect on the performance.
They are somewhat related, though this bug is about a specific testcase. When we can find out what is lacking here we'll also do better in 70156 (which is more like a meta bug).
Attached file warnings, etc.
I put the test case on my own server. With a Linux debug build form this morning I get all these warnings. Going to the page produces the reflow warnings and the CSS warnings. Moving the mouse outside the window and back produces the second batch of CSS warnings. Clicking the back button produces all the assertions. Lots happening here.
How it is with other browsers? source code of this clock is dirty, most of operations in functions |ClockAndAssign| and |Delay| are redundanced, big number of operations are useless. Each step of this animation does whole (not needed) job. Finding layers in tree, counting stable values, checking type of DOM (ns4,ie,w3c) and so on. All of this is done in each step, every 20 miliseconds! I cannot get why, if for eg. type of DOM cannot been changed during animation... I'm not too suprised it's horribly CPU eater. But, of course, if other browsers works ok, it might mean that they have better performance for Math operations, faster CSS machine and faster getElementById(). And it would mean that it is a bug.
And to Comment #7. Those are bugs in code. I think it's not a part of problem. As You can see, moving mouse outside window causes destruction of whole animation.
OS: Windows XP → All
To Comment #8: It seems finding the layers in the tree is not a problem since like I've mentioned it runs at least as twice faster (a lot faster in fact) in Netscape 4.7x. I've made many experiments before I finally decided it's a Mozilla problem and I had made modifications which doesn't use 'document.layers[""]' in the ClockAndAssign() function but only using an arrays already holding references to what is needed. The situation left the same - this could degrade the performance in Netscape 4.7x but there it is fine. I think the problem is in either the calculations run slower or changing the position of the elements and reflecting it is slower. IE6 runs it blazingly fast. I've tested with Opera 7 beta 2 too - it haaardly moves but still it is working. It's not working in Opera 6. I can prepare a cleaner test case excluding the support for Netscape 4 and leaving the little which is needed to run it in IE so there could be made comparison - if somebody thinks this would help?
Thats what i've been asking for. So, Mozilla use more CPU power for this than other browsers. Could it be related to bug 157681 ?
This one works in Mozilla and IE (with very little overhead). The JavaScript code is inlined.
This one is the same as the previous but cleared the code a bit more so no overhead is added at all. More specifically the followMouse() function and one if statement in the ClockAndAssign() function.
Could someone who's seeing this just profile it? The venkman profiler should give you a first approximation (it'll at least tell you what things in the JS take a long time, if not why they do so) and is available XP with all mozilla builds that have Venkman installed...
Keywords: qawanted
[From JS profiles, and experimentation] (almost) all the time is being consumed by these two lines: DL.top = Dy[i] + ClockHeight * 1.5 * Math.sin(currStep + i * Dsplit * Math.PI / 180); DL.left = Dx[i] + ClockWidth * 1.5 * Math.cos(currStep + i * Dsplit * Math.PI / 180); So this is essentially (as suggested above) the same as bug 157681 (i.e., the above lines trigger a reflow when not really required). (And yes, this is the same animation as bug 70156, but that bug has become a bit overloaded with unrelated examples, etc.).
Blocks: 70156
Depends on: 157681
windoze XP build 2003080105 * only attachment 109941 [details] displays the clock in mozilla. ** CPU charge (athlon 1800+) w/ iexplore between 25 and 50% + /!\ 25% for explorer: here's the trick. ** CPU charge with Mozilla between 50 and 100% ** CPU charge with S.O 5 between 30 and 60% **** all 3 attachment displays the clock in iexplore and soffice ****. ** the maximum charge with attachment 110031 [details] with mozilla is only 50% 50% more than with iexplore as attachment 10941 [details] is 100% more but the clock is not displayed. performance in SUn product is between iexplore's and mozilla's. it *should be* a good thing to see what produce this amount of CPU usage but the most difference is that iexplore amount of CPU usage shut nearby 0 when the cursor leaves the window wich is not the case in Mozilla (it shuts to 19%) <- the script is still running when it's off focus.
with trunk build 2003011412 on winxp pro sp1,1.1ghz,512ram Using the improved version at http://bugzilla.mozilla.org/attachment.cgi?id=110031&action=view without moving the mouse I have 60% cpu usage and 100% when moving around. http://bugzilla.mozilla.org/attachment.cgi?id=109941&action=view has 95% cpu usage without moving the mouse and 100% when moving.
test script: sleep 2 kill -PROF $1 sleep 30 kill -USR1 $1 ./jprof ./mozilla-bin jprof-log > ~/jprof$1.html env: rh8.0+ trunk20030112 cpu: 1.8GHZ
i remade the test, vhecking total CPU charge in i explore and netscape: window in focus and moving pointer in it 45/55 CPU charge in iexplore and 85/100 in Mozilla (2003011412) vs 2/8% window not loaded windows out of focus in iexplore: 8/13 CPU total charge windows out of focus in netscape: 40/60% CPU charge: There is the real problem.
leon, mind attaching your analysis of the jprof results?
As a note... on a 1.8GHz machine, profiling for 30 seconds is near-pointless. Also, why are there two profiles? What's really needed here are two profiles, each at least 10 minutes long, one realtime (do not use the system for anything else during this time) and one not realtime.
bz: I will give new profiles which test time will be about 10m.
As a note the profiles in comment 18 show that the most time is spent in libesd.so. I somehow doubt the correctness of that.... leon, are you quitting mozilla before running jprof on the log file? If I recall correctly, the log is not flushed to disk synchronously....
Priority: -- → P2
Target Milestone: --- → Future
Leon, can you provide your mentioned new profiles?
Another thing to try, for all DHTML testcases, is to try the test without any painting --- e.g., move the window offscreen (just the top of the titlebar showing, none of the content showing). Do this ON WINDOWS (GTK will still paint, yes I'm working on that bug). The testcases should go faster, and the amount that they go faster tells us how much of the load is painting.
Using http://bugzilla.mozilla.org/attachment.cgi?id=109941&action=view with trunk build 2003021608 on winxp pro sp1,1.1ghz,512ram I get 65-70% cpu usage when I don't move the mouse at all,when putting Mozilla into the taskbar (minimizing) I get 45% cpu usage. Does that really mean that painting contributes to 20-25% of the whole load?
Yes, I think that's what it means.
Wow - so great potential for a major performance boost ahead :)
Depends on: 170330, 191474
I'm sorry, but it's still unbelivable for me. So positioning vectors of ~40 layers, without painting them (so in some way only setting vaslues of two properties for every layer) consumes 40% of my processor, and painting this (where we have to paint each pixel, repaint spirit [or how it is done in Mozilla], and check z-index) costs only 20%? I cannot imagine this situation...
Hmmm... My results are quite different: IE: Full Window (and moving) - 5-10% processor Full Window - 2% processor Minimized window - 1-2% processor Mozilla Full Window (and moving) - 70% processor Full Window - 50% processor Minimized window - 10-20 % processor So i cannot really base on my IE6 reactions (because it's too near to 0%), but i can say what i see with Mozilla. And as for me paiting costs >30%.
Just a side note: Everybody could open several (two or three) windows with the example loaded for a more robust performance test (IE6 handles it very well).
The painting overhead is also affected by bug 194627
Attached image quantify data(1)
Attached image quantify data(2)
Attached image quantify data(3)
original data package is too big to transfer because of limitation of attachment of bugzilla. env: sol8 + trunk20030305 + quantify
Have we got any findings from Leon's quantify data?
Priority: P2 → --
Target Milestone: Future → ---
no DHTML. only math stuff remains. does dhtml-less case cause problem? (no CPU problem w/ the testcases. moz 2003040708/win98 PIII 128MB RAM)
Attachment #109940 - Attachment is obsolete: true
Attachment #110031 - Attachment is obsolete: true
Leon, regarding the new profile - could you please provide an analysis (how much time/percent is spent in the individual functions).
Here's some data I gathered. Function Calls FunctionTime InvalidateRect 2895 100184.02 ConvertUTF8toUCS2::write 4074 99560.36 CombineRgn 1308 58656.32 new 20354 19743.87 BitBlt 14 17809.14 InvalidateRect is pretty self explanitory, it's being called by nsWindow::Invalidate ConvertUTF8toUCS2::write seems to be driven from RuleHash::EnumerateAllRules CombineRgn is driven by nsViewManager::View and nsViewManager::UpdateWidgetArea equally. new is getting called mainly by nsBaseWidget::GetChildren the most, but there's about ten other functions that called it over 1,000 times. BitBit is interesting, since it was only called 14 times but the function time is really up there. It's being called by nsRenderingContextWin::CopyOffScreenBits
Might this be related to bug 153367 ?
hm, the ConvertUTF8toUCS2 seems suspicious to me. dbradley: can you find out where exactly it's getting called from? EnumerateAllRules does not seem to directly call it.
copy_string NS_ConvertUTF8toUCS2::Init NS_ConvertUTF8toUCS2::NS_ConvertUTF8toUCS2 AtomImpl::ToString RuleHash_CHashKey PL_DHashTableOperate RuleHash::EnumerateAllRules I wonder if before enumerating the rules you could create an atom from the string, and pass that in instead, so you wouldn't have to convert all the atoms to strings.
We _are_ passing in atoms. The problem is that the comparison needs to be case-insensitive so we have to get the string and upcase it... ToString used to be pretty cheap before atoms switched to being UTF-8... not sure it still is. We don't have a ToUpperCase that will work for UTF-8 strings, do we?
that's exactly what we need.. or even better a way to convert, upcase, and hash all in one motion. you wouldn't need any kind of temporary buffer.
True. So maybe we need a HashCIString(aUTF8String) ? (with a better name)
Yeah, and one that takes an argument for uppercasing, lowercasing, or neither, too.
Hasn't there been any bug fixing in the meantime or should we file a seperate bug about that?
I've attached a patch in bug 213813 that might help with some of these DHTML issues. I don't expect tremendous gains, but it might be a small step in improving performance. I'm going to try and get some numbers, but I wanted to get this out to others that might have more time than I have current. The patch is at http://bugzilla.mozilla.org/attachment.cgi?id=128862&action=edit
Whats up with this clock testcase? It doesnt work anymore in Mozilla, i get: Error: uncaught exception: [Exception... "Component returned failure code: 0xffffffff [nsIDOMCSS2Properties.left]" nsresult: "0xffffffff (<unknown>)" location: "JS frame :: http://bugzilla.mozilla.org/attachment.cgi?id=110031&action=view :: ClockAndAssign :: line 200" data: no]
José, that's known regression with setting inline style in current builds...
Depends on: 230170
In a current (somewhat hacked) trunk build, the single biggest part of the flat profile is in XIntersectRegion (5% of total time). It's called (via some other junk) from nsViewManager::AddRectToDirtyRegion. We spend about 10% of total time under AddRectToDirtyRegion...
(In reply to comment #54) > In a current (somewhat hacked) trunk build, the single biggest part of the flat > profile is in XIntersectRegion (5% of total time). Weird. (Without looking at the code) I don't know we couldn't just use nsRects instead of X regions, the former of which at least are inlineable AFAIK.
Using nsRects would mean we're painting extra stuff, no? So it's a tradeoff between too much painting and too much time spent maintaining the exact invalidate region...
Considering that for some widget ports (e.g., GTK, GTK2), we just get the minimal rect surrounding the region when we do a paint, it seems like a bit of a waste. (On Windows, we actually use the region.)
Assignee: layout → nobody
QA Contact: markushuebner → layout
CPU usage is "00" when testing all testcases. Build ID: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a1pre) Gecko/20091113 Minefield/3.7a1pre
Opening one IE 6 window with attachment 110031 [details], and leaving it idle (not moving the mouse) for a moment I observe about 5% CPU usage reported by the Task Manager. Opening four overlapping (i.e. only the top one has the animated clock visible) IE 6 windows with attachment 110031 [details], I observe no visible slow down and about 25% CPU usage, when not moving the mouse. Using: Mozilla/5.0 (Windows NT 5.1; rv:2.0b4pre) Gecko/20100807 Minefield/4.0b4pre Opening one Minefield window with attachment 110031 [details] and leaving it idle I observe varying 5%-10% CPU usage reported. Opening four overlapping windows with attachment 110031 [details] in Minefield, I observe quite noticeable slow down and constant 100% CPU usage reported. Opening four tabs of attachment 110031 [details] in a single Minefield window doesn't show slow down, but CPU usage reported is at minimum 80%. Of course this is on my quite old already machine (CPU: Athlon 1.47GHz, VGA: NVIDIA GeForce2 MX 400) - one may experiment with more windows of attachment 110031 [details] opened with a faster machine to compare how different browsers scale.
The CPU usage stays low in 26.0a1 (2013-08-26), Win 7 x64 using the testcases.
Keywords: qawanted
Attached image CPU consumption using Edge (obsolete) —
Attached image CPU consumption using Fx 48a2 (obsolete) —
Attached image CPU consumption using Chrome 50 (obsolete) —
Attached image CPU consumption using IE 11 (obsolete) —
attachment 8755205 [details]attachment 8755206 [details]attachment 8755207 [details] Using attachment 110033 [details] I'm still seeing Firefox consuming about twice CPU as the rest of the browsers running a single clock instance (Win10 64-bit, Lenovo T540p, 2-cores, 4-logical CPUs), but the causes may are quite different from 13 years ago. So I'm willing to close this report, unless Firefox developers don't want it to further investigate and possibly improve on.
Attached file clock_jprof.html
There are no obvious hotspots... 50% in nsRefreshDriver::Tick() processing pending updates/paints/restyles; 17% in egl_Main/ProxySetProperty
My observation from comment 59 still appears valid. Opening 4 maximized fully overlapping windows with the clock, Firefox consumes 80% CPU and above, and I perceive visible slow down. Edge and IE keep it around 30% CPU and no visible slow down. Chrome keeps at most 50% CPU usage and no visible slow down. It is interesting to point, if I open the 4 clocks in tabs instead of new windows then Firefox uses at most 30% CPU and no visible slow down. Another point to try out to see if it makes any difference for you is to keep moving the mouse around.
> It is interesting to point, if I open the 4 clocks in tabs instead of new windows Then all but one of them are running in background tabs, and this line in the testcase: setTimeout("Delay()", 20); is equivalent to: setTimeout("Delay()", 1000); in a background tab. That obviously uses a lot less CPU. ;) Overlapped windows do not throttle timeouts like that, but see bug 1236512. It's possible that Edge and Chrome do in fact throttle things in overlapped windows.
Attachment #8755208 - Attachment is obsolete: true
Attachment #8755205 - Attachment is obsolete: true
Attachment #8755207 - Attachment is obsolete: true
Attachment #8755206 - Attachment is obsolete: true
(In reply to Boris Zbarsky [:bz] from comment #67) > Overlapped > windows do not throttle timeouts like that, but see bug 1236512. Thanks for the pointer. Leaving this specific scenario aside what I'm currently observing with 4 visible clock windows: * IE 11 and Edge ~ 30% CPU * Chrome 50 ~ 50% CPU * Firefox 48 ~ 75-80% CPU Only with Firefox I observe visible slow down of the clocks, and I can hear my cooling fans about to take off, although the measured difference with Chrome doesn't look that big. Leaving this open for module owners to decide whether to close or not.
Severity: normal → S3

This is WFM on WIndows.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: