780341 - [meta] Get gaia homescreen running at 60fps (again, on otoro)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Description

•

12 years ago

To get there, we're going to need to fix a set of smaller bugs which will be set as dependencies here. None of the individual bugs block in and of themselves, but the goal set by this bug here does (per product requirements), so I'm going to set this to block.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 780336, 779968, 780260

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 1

•

12 years ago

Basing some early work off of this profile http://people.mozilla.com/~bgirard/cleopatra/?report=ba904a35f4179f11f95d2ebf9a992e43ac35045a

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 780330

Andrew Overholt [:overholt]

Comment 2

•

12 years ago

Turning into a tracker bug. All dependencies nommed.

blocking-basecamp: + → ---

Summary: Get gaia homescreen running at 60fps (again, on otoro) → [meta] Get gaia homescreen running at 60fps (again, on otoro)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 3

•

12 years ago

As I said in comment 0, it's hard to justify marking some of the dependencies as blockers because they wouldn't block in and of themselves. Further, we might find a juicier optimization that gets us to 60fps without having fix some of the other deps. Do you have suggestions for the mechanics of tracking this?

Andrew Overholt [:overholt]

Comment 4

•

12 years ago

I'm perfectly happy with keeping this +'d even if it's more of a meta bug. I'm going to set the owner field to you (so queries for "blockers with no owners" don't return anything) if that's okay. You'll have to forgive me for not knowing the answer, but do we have some easy way of measuring and/or tracking this? arewerunningat60fpsyet.com? ;)

Assignee: nobody → jones.chris.g

blocking-basecamp: --- → +

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 5

•

12 years ago

(In reply to Andrew Overholt [:overholt] from comment #4) > I'm perfectly happy with keeping this +'d even if it's more of a meta bug. > I'm going to set the owner field to you (so queries for "blockers with no > owners" don't return anything) if that's okay. > Yep. > You'll have to forgive me for not knowing the answer, but do we have some > easy way of measuring and/or tracking this? arewerunningat60fpsyet.com? ;) Heh, good idea. Better buy that before competitors do! ;) The in-gecko fps counter is (mostly) sufficient for measuring the (large) impact of current bugs. To enable, go to Settings -> About phone and tap Enable FPS counter. It's more useful with patches from bug 779940 and bug 780074.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 6

•

12 years ago

We're being really dumb with event dispatch right now. The gaia homescreen is making things worse too. Here's what happens on an incoming touch event - master process: dispatch raw touch event to content. Build display list + hit-test. Fire touch event off to content. - master process: synthesize mouse event from touch event. Dispatch to content. Display list + hit-test. - (in parallel) homescreen process: dispatch raw touch event to content. Display list + hit-test. - homescreen process: the gaia homescreen doesn't handle raw touch events, so we synthesize a mouse event. Dispatch. Display list + hit-test. If I understand correctly, we'll end up constructing *four* display lists and doing four hit tests for each event. This is especially bad for the homescreen process, because the master process runs at higher priority; all this extra CPU chokes out the homescreen process. There are several things we can do here. Investigating.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 7

•

12 years ago

GetFrameForPoint() is about 10% of the same-process profile. It's like going to be a bit higher for both processes. One way we can optimize that is caching display lists. mattwoodrow suggests: <mattwoodrow> I think the basic idea was just to take the display list that we create in nsLayoutUtils::PaintFrame and retain it instead of deleting <mattwoodrow> not entirely sure how memory management of that would work <mattwoodrow> and then you'd need to destroy the cached display list if you get any calls to nsIFrame::Invalidate(), nsPresShell::DoReflow() <mattwoodrow> maybe nsCSSFrameConstructor::PostPendingRestyle? <mattwoodrow> but then you should be able to use the retained one in nsLayoutUtils::GetFramesForArea <mattwoodrow> roc: ^^ <cjones> does the display list hold onto raw nsIFrame*? <cjones> i seem to recall it does <mattwoodrow> it does, yes <cjones> can the frame tree ever change without Invalidate/DoReflow? <mattwoodrow> so anything that changes the frame tree would need to delete it <cjones> :) <jgilbert> whaaat <mattwoodrow> nsCSSFrameConstructor::ProcessRestyleEventCommon? <jgilbert> glCheckFramebufferStatus generating INVALID_OPERATION? <mattwoodrow> You might need other places in nsCSSFrameConstructor, unsure <bjacob> jgilbert: that would mean that your mobile device has a driver issue? <cjones> mattwoodrow, where would we stick the cached display list? <jgilbert> bjacob: well it is an adreno 205 <mattwoodrow> cjones: on the root frame?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 8

•

12 years ago

I grabbed a profile of the master process while we're panning around, and it's active for about 1/3 of the samples taken. That's really bad; that's CPU time taken away from app processes. The time is all in ViewManager::DispatchEvent. I think I have a simple fix.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 9

•

12 years ago

My patch wins us back 10fps :D.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 788943

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 10

•

12 years ago

Here's a profile from the homescreen's process while panning around http://people.mozilla.com/~bgirard/cleopatra/?report=7e276d3ea2f68065b1e6ee797f20964f9a322a6f The addresses aren't resolved because of an apparent bug in the profiler addon, but there's enough information to see that time is going to approximately the same places as before. This profile was gathered without the patch from bug 788943. Highlights - 45% in restyling - 22% building layers and layer transactions - 14% in display-list construction - 5% gaia event handling in JS Juiciest target is still restyling.

Andrew Overholt [:overholt]

Updated

•

12 years ago

Whiteboard: [WebAPI:P0]

Andrew Overholt [:overholt]

Updated

•

12 years ago

Keywords: feature

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 789664

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 11

•

12 years ago

Attached patch Patch to get data out of the profiler — Details — Splinter Review

I had wanted to fix the profiler addon this weekend, but I had a catastrophic hard drive failure and got derailed. (Second in three weeks :/.) Anyway, attached is a hacky patch that lets us get data from the profiler. I made a script to manually symbolicate profiles http://people.mozilla.com/~cjones/sym.sh The script wants to live in a clone of https://github.com/bgirard/Gecko-Profiler-Addon which has already been configured to build the firefox addon. (See the README.) There are some paths hard-coded in the script that need to be adjusted for a particular clone and build of b2g. With these tools, I gathered profiles like so $ adb shell stop b2g $ adb shell 'MOZ_PROFILER_STARTUP=1 /system/bin/b2g.sh > /dev/null' & [wait for b2g to start up] $ adb shell ps | grep b2g [note pid of first "plugin-container" process; that's the homescreen] [run test] $ adb shell kill -12 $HOMESCREEN_PROCESS_PID [there's a message dumped to logcat that looks something like E/profiler( 561): Saved to /sdcard/profile_2_${HOMESCREEN_PROCESS_PID}.txt ] $ adb pull /sdcard/profile_2_${HOMESCREEN_PROCESS_PID}.txt profile.txt $ ~/mozilla/profiler-addon/sym.sh < profile.txt > profile-syms.txt Then load the UI http://people.mozilla.com/~bgirard/cleopatra and manually upload profile-syms.txt, and then upload that data to the server. Here's a profile gathered with the default 1000Hz sampler and manually symbolicated. http://people.mozilla.com/~bgirard/cleopatra/?report=d5ba9a326909147a3461aa60bf0239a9836a0cf1 and one gathered with 100Hz sampler (for 100s, whew. I think I wore a hole in my otoro screen.) http://people.mozilla.com/~bgirard/cleopatra/?report=87e5b21715fad2a529c33b718cca6e8db6f5ba3b Interestingly, at 100Hz, PresShell::Paint looks more expensive than CSS::ProcessRestyles, but the deltas aren't very big; they still both really hurt. There are two more things we can check for PresShell::Paint, but I personally don't know how to make progress on CSS::ProcessRestyles. I'm going to reach out to some of our friends and see if they can help us get an oprofile to cross-check the results here.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 12

•

12 years ago

Also, to be honest, the results of the symbolicating look like crap to me. I don't know how much to trust it.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 13

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #12) > Also, to be honest, the results of the symbolicating look like crap to me. > I don't know how much to trust it. Actually, I realized that the libxul symbolification generated by my script *is* crap, because we should end up using ASLR for gecko libs. So ignore libxul symbols. If the 100Hz profile is more believable, here's how things look now - 41% in PresShell::Paint * 11% of that in ShadowLayerForwarder::EndTransaction (are we accidentally sending sync transactions here? Are we getting unlucky and context-switched out at the write() here, so this appears more expensive than it really is?) * 9% in BasicLayerManager::EndTxnInternal. Mostly bug 780330. * 9% in BuildDisplayList (for painting). * 7% in ContainerState::ProcessDisplayItems - 37% in CSS::ProcessRestyles. 30% of those samples are going to a libxul address that I can't resolve. Grr! - 16% in nsViewManager::DispatchEvent * 6% of that in display-list hit testing * ~4% of that running the Gaia event handler, although I think this time is actually misrepresented by the sampler since profiling JS code is very expensive * ~2% in what looks like posting IPC messages that shouldn't be running on the homescreen. Easy fix. Also might be an unlucky context-switch point, so possibly misrepresented. This profile was also gathered without the patch from bug 788943, so syscalls might look more expensive here than they should by virtue of the plugin-container process being context-switched out in favor of the b2g process doing unnecessary work.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 14

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #13) > (In reply to Chris Jones [:cjones] [:warhammer] from comment #12) > > Also, to be honest, the results of the symbolicating look like crap to me. > > I don't know how much to trust it. > > Actually, I realized that the libxul symbolification generated by my script > *is* crap, because we should end up using ASLR for gecko libs. So ignore > libxul symbols. Actually, no; the profiles include the library mapping. So something is just not working well.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 15

•

12 years ago

Since results from system libs look believable, I might try a run with --disable-elf-hack. Not sure what problems that might cause, though ...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 16

•

12 years ago

Here's a profile from a 100Hz sampling run with --disable-elf-hack and the patch from bug 788943 applied (to hopefully reduce the skew of syscalls leading to context switch) http://people.mozilla.com/~bgirard/cleopatra/?report=d6f8ad69905a860ecdf671bbdb38b7d9a28af3da The symbol info looks believable! Yay. The analysis of comment 13 is mostly the same, except that - the 11% in ShadowLayerForwarder::EndTransaction really is mostly in write(). It might still be overinflated by process priorities forcing context switching, but we need to fix it. - the time in CSS::ProcessRestyles seems more understandable, but still not by me. - similar to the first point above, the IPC-message-posting during event dispatch is hurting, whether inflated by process priorities or not.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 790183

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 17

•

12 years ago

Attached patch Patch to add labels for Send/Recv IPC methods — Details — Splinter Review

100Hz profile with this patch http://people.mozilla.com/~bgirard/cleopatra/?report=ce1fdb565d6fd0aa076b15949ff3007b93555bfc So that 3% is PBrowser:SetCursor, which is totally useless on "mobile". Argh! This also confirms we're correctly using the async layers update message.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 18

•

12 years ago

I landed the patch to fix symbol resolution for libxul. ./repo sync in your $b2g checkout will pull the change. Also ensure that you have |export B2G_PROF=1| in your $b2g/.userconfig file.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 19

•

12 years ago

Attached patch Patch to separately charge touch events and synth mouse events — Details — Splinter Review

http://people.mozilla.com/~bgirard/cleopatra/?report=c9b1bc8455c35e37ff273705e6c417419307f83a 85% of the event-dispatch time is going to the mouse events. We don't seem to be building a display list for the touch events, so I guess a short-circuit optimization is kicking in.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 20

•

12 years ago

And a profile with dbaron's patch from bug 790379 applied. http://people.mozilla.com/~bgirard/cleopatra/?report=c65f90b04174ef692f3d5b203e6940d03e70f690 dbaron, most of the time is going ComputeStyleChangeFor(). If that rings any bells. We seem to be doing selector matching, which is very surprising.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #19) > 85% of the event-dispatch time is going to the mouse events. We don't seem > to be building a display list for the touch events, so I guess a > short-circuit optimization is kicking in. On devices where there isn't a mouse pointer (do we have a way of detecting those in general?) we shouldn't be doing synth mouse move events at all, since the "current position" of the mouse that we have is basically bogus.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 22

•

12 years ago

How to synthesize fake mouse events from touch series that content doesn't preventDefault() is apparently part of the DOM spec for touch events. (The gaia homescreen only handles mouse events.)

Boris Zbarsky [:bzbarsky]

Comment 23

•

12 years ago

Selector matching during a ComputeStyleChangeFor() is quite possible, depending on why we're doing the style change and what the style rules in the document look like... In this case, are we dynamically toggling classes, say? That would involve rematching selectors on at least the element with the class being toggled. Possibly on others depending on what's in the stylesheets.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 24

•

12 years ago

Er, I guess you were talking about a different sort of synthesized mouse events than I was thinking of. That said, if we dispatch synth mouse events on mobile platforms to handle content moving underneath the mouse pointer, we probably should get a bug filed to stop doing that.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 25

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #23) > In this case, are we dynamically toggling classes, say? That would involve > rematching selectors on at least the element with the class being toggled. > Possibly on others depending on what's in the stylesheets. We certainly shouldn't; the core logic here is just setting -moz-transform. Vivien might be able to tell us; he knows the code better. But do we have any way to, like, help developers debug these kinds of performance issues themselves? :) Is there any way to log *why* we're rerunning selector matching, what property changes triggered it?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 26

•

12 years ago

(In reply to David Baron [:dbaron] from comment #24) > Er, I guess you were talking about a different sort of synthesized mouse > events than I was thinking of. > Yeah, most definitely --- I don't know of any other synthetic mouse events that gecko would dispatch.

Boris Zbarsky [:bzbarsky]

Comment 27

•

12 years ago

> the core logic here is just setting -moz-transform. Setting it how? Via inline style or something else? > But do we have any way to, like, help developers debug these kinds of performance issues > themselves? Other than gdb and lots of pain? ;) I can surface all sorts of information about us requesting style recomputations (e.g. which attr, not property, change triggered it); the hard part is not ending up with a perf hit on every single attr change as a result. Do we have a sane way to cheaply check for a "performance measurement" (not profiling, since this will really skew the results) mode and take some useful action we can surface to developers in the UI? Seems like that's basically what we need here... Plus UI and such.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 28

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #23) > Selector matching during a ComputeStyleChangeFor() is quite possible, > depending on why we're doing the style change and what the style rules in > the document look like... > > In this case, are we dynamically toggling classes, say? That would involve > rematching selectors on at least the element with the class being toggled. > Possibly on others depending on what's in the stylesheets. So on this dirt-simple test http://people.mozilla.com/~cjones/homescreen/homescreen.html I still get a very similar profile to what I see on the real homescreen, with selector matching etc included. http://people.mozilla.com/~bgirard/cleopatra/?report=487be45879a61ce5bb4360c6be9d2a0a7e0f2e95 To the best of my knowledge, this test doesn't change any classes. The core of the logic is Page.prototype = { moveTo: function pg_moveBy(scrollX) { var style = this.movableContainer.style; style.MozTransform = 'translateX(' + scrollX + 'px)'; style.MozTransition = '';

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 29

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #27) > > the core logic here is just setting -moz-transform. > > Setting it how? Via inline style or something else? > See above. > > But do we have any way to, like, help developers debug these kinds of performance issues > > themselves? > > Other than gdb and lots of pain? ;) > > I can surface all sorts of information about us requesting style > recomputations (e.g. which attr, not property, change triggered it); the > hard part is not ending up with a perf hit on every single attr change as a > result. Do we have a sane way to cheaply check for a "performance > measurement" (not profiling, since this will really skew the results) mode > and take some useful action we can surface to developers in the UI? Seems > like that's basically what we need here... Plus UI and such. We implemented something like this for async CSS animations. For example, http://mxr.mozilla.org/mozilla-central/source/layout/style/AnimationCommon.cpp#236 The "should I log" check isn't clever, just looks at a pref basically. There's a checkbox in the gaia "settings app" to toggle this logging code. When it's enabled, it's dumped to the system log (currently, because b2g doesn't have a "developer console" yet). The restyling/reflowing logging sounds like it could get fabulously more complicated, but maybe the basic approach is portable? It might even make sense to share the same "performance logging" pref.

Boris Zbarsky [:bzbarsky]

Comment 30

•

12 years ago

> The core of the logic is Ah. That hits inline style, which does in fact retrigger selector matching on the one element that inline style belongs to. At one point I had a patch series to try to not do that. I'm not sure whether there was a corresponding bug; it's not mentioned in the (nonexistent) commit messages. I didn't finish doing that because 1) it wasn't obviously a win, 2) it conflicted with the selector parallelization work dz was doing at the time, and 3) I think dbaron had some concerns about the generality of the approach and preferred a less-general approach. If desired, I can try reviving that patch series so we can measure with it; I still have it sitting around, so I'd just need to unbitrot it. Looking at the profile, linked above something like 1.5% is in SelectorMatches, right? So about 5% of ComputeStyleChangeFor.... We could win that back but lose even more on the slightly slower FileRules we'd need. :(

Boris Zbarsky [:bzbarsky]

Comment 31

•

12 years ago

> It might even make sense to share the same "performance logging" pref. Sure. It would be simple to add a static pref cache and just check that boolean when posting a restyle. If set, dump out that we're asking for a restyle for an element with certain id/class/tag stuff, and what sort of restyle we're asking for ("self", "self and descendants", "self and descendants and all following siblings and their descendants" being the three kinds we have around)... Would that be useful?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 32

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #30) > > The core of the logic is > > Ah. That hits inline style, which does in fact retrigger selector matching > on the one element that inline style belongs to. > Is there a way to work around that? > Looking at the profile, linked above something like 1.5% is in > SelectorMatches, right? So about 5% of ComputeStyleChangeFor.... We could > win that back but lose even more on the slightly slower FileRules we'd need. > :( Right: I'm not so worried about selector matching per se, just the fact that we seem to be doing a lot more work than we seem to need for just a couple of transform changes. (But don't take that as me making accusations; I know this code is tricky, and I have no idea what I'm talking about ;) .)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 33

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #31) > > It might even make sense to share the same "performance logging" pref. > > Sure. It would be simple to add a static pref cache and just check that > boolean when posting a restyle. If set, dump out that we're asking for a > restyle for an element with certain id/class/tag stuff, and what sort of > restyle we're asking for ("self", "self and descendants", "self and > descendants and all following siblings and their descendants" being the > three kinds we have around)... Would that be useful? Sure. The goal here would be for developers to see that their code is slow and then gather some information to bring to someone else, or locally fix if the bug is obvious. In the example of this test, I was actually surprised that changing the transform triggered restyling --- it certainly makes sense, but that's not the first or even tenth thing that came to my mind.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 34

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #32) > (In reply to Boris Zbarsky (:bz) from comment #30) > > > The core of the logic is > > > > Ah. That hits inline style, which does in fact retrigger selector matching > > on the one element that inline style belongs to. > > > > Is there a way to work around that? I'm not sure that that's the problem. I'm poking at this in gdb right now, and while most of the restyles have aRestyleDescendants=false, I'm seeing a few with aRestyleDescendants=true, which seem like they have a good chance of being the problem. More info in a bit, hopefully.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 35

•

12 years ago

The subtree restyles that I'm seeing are coming from setting a "data-transitioning" attribute to "true" and later removing that attribute. (The attribute changes are happening via a SetProperty and DelProperty on an nsDOMStringMapSH.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 36

•

12 years ago

(In reply to David Baron [:dbaron] from comment #35) > The subtree restyles that I'm seeing are coming from setting a > "data-transitioning" attribute to "true" and later removing that attribute. > (The attribute changes are happening via a SetProperty and DelProperty on an > nsDOMStringMapSH.) Though these do only seem to happen at the very start and very end of the operation. (And it's line 48 of grid.js: document.body.dataset.transitioning = 'true'; and line 155: delete document.body.dataset.transitioning; .) So the middle seems likely to be dominated by style attribute changes. (Did we have a plan at some point to add another eRestyle_* hint that would only replace the style attribute and not do any selector matching? I thought we'd discussed it at some point, but I don't see any mention in bug 494117.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 37

•

12 years ago

That said, the portion of the time that's just constructing style contexts, computing the new data in order to CalcDifference, and doing the CalcDifference is pretty high; such a bypass wouldn't help that.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 790505

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 38

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #30) > > The core of the logic is > > Ah. That hits inline style, which does in fact retrigger selector matching > on the one element that inline style belongs to. > > At one point I had a patch series to try to not do that. I'm not sure > whether there was a corresponding bug; it's not mentioned in the > (nonexistent) commit messages. I didn't finish doing that because 1) it > wasn't obviously a win, 2) it conflicted with the selector parallelization > work dz was doing at the time, and 3) I think dbaron had some concerns about > the generality of the approach and preferred a less-general approach. BTW, I might reconsider those concerns about generality. (I do remember the patch series, I think. It involved reporting which levels of the cascade changed, I think.) > Looking at the profile, linked above something like 1.5% is in > SelectorMatches, right? So about 5% of ComputeStyleChangeFor.... We could > win that back but lose even more on the slightly slower FileRules we'd need. > :( Another random thought that just occurred to me. It's not quite a valid optimization (because of cross-struct dependencies), but it might be close enough that we could make it one: When we're replacing one style context with another, old->GetRuleNode() == new->GetRuleNode(), and we want to compute a struct that's not cached on the rule node, then if the old->GetParent()->GetStyle##struct() == new->GetParent()->GetStyleStruct(), just share the struct from old to new (and set a bit on old saying it no longer owns it). This sharing could propagate down the tree. Then we could benefit from pointer equality checks.

Boris Zbarsky [:bzbarsky]

Comment 39

•

12 years ago

> Sure. The goal here would be for developers to see that their code is slow OK. Can we get a separate bug filed on that? > Did we have a plan at some point to add another eRestyle_* hint that would only replace > the style attribute We considered that, yes. > such a bypass wouldn't help that. That's my worry, yes. > BTW, I might reconsider those concerns about generality. (I do remember the patch > series, I think. It involved reporting which levels of the cascade changed, I think.) Yes, it basically reported separate restyle hints per cascade level. > Another random thought that just occurred to me. That's an interesting idea. We could at least do that for a subset of structs, perhaps?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 40

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #39) > > Another random thought that just occurred to me. > > That's an interesting idea. We could at least do that for a subset of > structs, perhaps? The idea would basically be: * replace the bool &aCanStoreInRuleTree with a reference to a simple struct with a few booleans in it, and some helper methods, so we can simultaneously record both "can store in rule tree" and record the common cases of cross-struct dependence (within style context) * on inherited structs, include in the struct a bitfield (encapsulated in a struct, maybe the same one as above, or maybe not) that stores those common cases of cross-struct dependence (maybe one bit for every struct, or maybe fewer so it can fit in a char) * give the struct that encapsulates that bitfield a method that takes two style contexts and returns whether the struct can just be copied * have DO_STRUCT_DIFFERENCE call a function that uses that method

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 41

•

12 years ago

Here's a profile with the new sample labels http://people.mozilla.com/~bgirard/cleopatra/?report=be3fe54b69331c5378b324f48db59e7292e370f6

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 42

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #39) > > Sure. The goal here would be for developers to see that their code is slow > > OK. Can we get a separate bug filed on that? > Sure! :) Bug 790858

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Whiteboard: [WebAPI:P0] → [WebAPI:P0][LOE:M]

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 43

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #41) > Here's a profile with the new sample labels > > http://people.mozilla.com/~bgirard/cleopatra/ > ?report=be3fe54b69331c5378b324f48db59e7292e370f6 So that's telling me that comment 30 doesn't have much room to help us; though the 1.5% in SelectorMatches last time went down to 0.3% this time, so the profiles aren't quite consistent. It's also saying that there is substantial room for comment 38 to help, perhaps. But I'm also surprised by how much was inside the CSS::ComputeStyleChangeFor label but not inside either of its two child labels. I'd have expected things like nsRuleNode::ComputeDisplayData to be entirely within the nsStyleContext::CalcStyleDifference child label... though, actually, now that I realize it, it's probably coming from nsStyleContext::nsStyleContext -> nsStyleContext::ApplyStyleFixups -> nsStyleContext::GetStyleDisplay. I'm quite curious why the display data wouldn't be cached in the rule tree, though.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 44

•

12 years ago

This was a profile of the real homescreen, not the test case. You can tell the difference by seeing if there are samples from event dispatch in the profile. Let me know which would be more useful.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 45

•

12 years ago

Bug 790505 should get us to ~52-54 fps.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 46

•

12 years ago

The work here had got us up to about 38fps, but recent gaia changes have dropped that back down to ~20fps. Sigh.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 47

•

12 years ago

Recalibrating metrics here based on changes in gaia: I'm now testing quickly panning in a circle between the first and second homescreen pages with icons. (There are only two now.) In that test, I see ~39fps, ironically, where we were with a bunch more icons. So the style recomputation might have be hurting us more in the new design. (The new test has fewer icons and labels.) The optimization in bug 790505 gets us to ~57fps, which is the display refresh rate. It looks like to stay there when the pages are filled with icons, we'll need a little boost from somewhere else too, hopefully style recomputation. I'll try to test full pages of icons later.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 48

•

12 years ago

Attached patch patch to test whether the restyling problem is only 'rem' units (now bug 804970) — Details — Splinter Review

Andreas suggested to me that maybe we could do something specific to 'rem' units (so that we could cache them in the rule tree). I think this is probably doable. So here's a patch to test that: it will break the effect of dynamic style changes on the root element changing 'rem' units, but a performance test with this patch applied will tell us how much is to be gained from such a patch.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 49

•

12 years ago

Unscientific testing suggests this patch wins ~2-3fps.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 50

•

12 years ago

On the default gaia homescreen, bug 790505 gets us to 60fps (well, the display refresh rate, 57fps).

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 51

•

12 years ago

Panning between the homescreen pages that show actual content is now at display refresh rate. The useless landing page was readded, and panning between that and the first page with content is not 60fps, but I think users will pan away from that so quickly it won't matter. If needed, we can fix that transition with bug 796697. So I declare this work v1 done. Would really love to see followups to further optimize restyling though.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 52

•

12 years ago

In a local "eng" build (default, non-release), panning between the third and fourth screens is around 45-48fps. These two screens are 72% full of icons (23/32 slots). So perf is going to degrade as more apps are installed.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 53

•

12 years ago

Two followup bugs on additional style system work: bug 804970 (with patches), to follow up on the rem idea in comment 48 bug 804975 (no patches yet), for a more complex idea that's different from comment 38 and comment 40, but perhaps better

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

12 years ago

Attachment #665154 - Attachment description: patch to test whether the restyling problem is only 'rem' units → patch to test whether the restyling problem is only 'rem' units (now bug 804970)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 54

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #49) > Unscientific testing suggests this patch wins ~2-3fps. The full patch series in bug 804970 should help more than this, because it also fixes a stupid mistake with calc() that's relevant to the homescreen.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 55

•

12 years ago

Cool :). I wanted to make sure that everyone was aware of this test http://people.mozilla.com/~cjones/homescreen/homescreen.html It runs just fine in desktop and exhibited the same profile as the "real" homescreen when fit to run on device.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 56

•

12 years ago

Attached file homescreen.html modified to be a benchmark — Details

The above homescreen.html modified to be a benchmark. Still only about 2% improvement. Hopefully I'll have some time to debug why the week of November 5.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 57

•

12 years ago

(In reply to David Baron [:dbaron] from comment #54) > The full patch series in bug 804970 should help more than this, because it > also fixes a stupid mistake with calc() that's relevant to the homescreen. It turns out that that stupid calc() mistake wasn't relevant to the homescreen since the calc()s on the homescreen are on properties that store calc in style structs rather than compute it to a result that gets stored in the style structs. (Stored calc always faults out of the rule tree due to the way we allocate the memory; that's fixable, but a hack workaround didn't help measurably on the testcase.) However, I did some profiling of the testcase using 'perf' (and using 'perf script' since I can't get 'perf report' to show what I want), which led me to rewrite the idea in bug 572200 in a slightly different way. That gives a 14% improvement on homescreen.html; see bug 572200 comment 9 for details.

Patch to get data out of the profiler 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2.18 KB, patch		Details \| Diff \| Splinter Review
Patch to add labels for Send/Recv IPC methods 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 4.02 KB, patch		Details \| Diff \| Splinter Review
Patch to separately charge touch events and synth mouse events 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 1.46 KB, patch		Details \| Diff \| Splinter Review
patch to test whether the restyling problem is only 'rem' units (now bug 804970) 12 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 886 bytes, patch		Details \| Diff \| Splinter Review
homescreen.html modified to be a benchmark 12 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 5.21 KB, text/html		Details