Closed Bug 1210261 Opened 9 years ago Closed 9 years ago

Animated favicon in bookmark folder breaks smooth scrolling

Categories

(Core :: Graphics, defect)

40 Branch
Unspecified
Windows
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox41 --- wontfix
firefox42 + wontfix
firefox43 + wontfix
firefox44 + wontfix
firefox45 + fixed
firefox-esr38 --- unaffected

People

(Reporter: loneboco, Assigned: mchang)

References

Details

(Keywords: perf, regression, reproducible)

Attachments

(7 files, 12 obsolete files)

4.45 KB, text/plain
Details
3.79 KB, text/plain
Details
113.37 KB, text/x-log
Details
91 bytes, text/plain
Details
91 bytes, text/plain
Details
91 bytes, text/plain
Details
10.92 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0
Build ID: 20150928102225

Steps to reproduce:

1) Make the Bookmarks Toolbar visible.
2) Create a new Test folder in your Bookmarks Toolbar.
3) Create a bookmark to this website: http://www.skindeepcomic.com/
4) Visit the website to load the favicon into cache.
5) Open up a new browser window.
6) Visit a site you can scroll, like reddit.com.
7) Scroll up and down, noticing how smooth it is.
8) Click the Test folder in your Bookmarks Toolbar to expand it, making the animated favicon visible (you don't need to visit the bookmark).
9) Scroll up and down, noticing how smooth it is.
10) Delete the Test folder in the Bookmarks Toolbar.
11) Scroll up and down, noticing how smooth it is.


Actual results:

The page starts off with no issues scrolling.  It is smooth and works fine.  Upon viewing the Test folder in the Bookmarks Toolbar, smooth scrolling breaks for every current and future tab in the current Firefox window (other windows are not affected).  The page stutters when scrolling around.  When the Test folder is deleted (thereby deleting the bookmark with the animated favicon), the stutter disappears and scrolling is smooth once again.


Expected results:

There should be no stutter while scrolling.
I can reproduced on Firefox40.0.3 and 41.0.1 with default setting. And I can reproduced on Aurora43.0a2 and Nightly44.0a1 without e10s.

[Tracking Requested - why for this release]: Regression since Firefox40, Break smooth scrolling, 

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a4ac07b1a28b&tochange=0f1095c9567f

Regressed by: 0f1095c9567f	Mason Chang — Bug 1144317 - Enable vsync refresh driver on Windows. r=kats
Blocks: 1144317
Status: UNCONFIRMED → NEW
Component: Untriaged → Graphics
Ever confirmed: true
OS: Unspecified → Windows
Product: Firefox → Core
Version: 42 Branch → 40 Branch
Blocks: 1071275
Keywords: perf
I don't see this on Linux (Aurora), but the regression range suggests that it's Windows-specific.

Gerv
It looks like it doesn't even need to be a bookmark.  Any animated favicon will kill browser performance.

Just viewing this page in a tab will be enough to ruin scrolling performance:
http://vocaroo.com/?info
Recent visual regression, tracking it.
kats, Mason, could you help us?
Flags: needinfo?(mchang)
Flags: needinfo?(bugmail.mozilla)
I think Mason will need to look into this when he gets back from PTO. My guess is the favicon is in the parent process (just like bookmark icons would be) and so the animation is causing the refresh driver in the parent process to tick a lot more. That might have adverse performance consequences.
Flags: needinfo?(bugmail.mozilla)
Do you know when he is back from PTO? We are going to be build beta 4 today and we don't have much beta.
Flags: needinfo?(bugmail.mozilla)
He'll be back next week. I would pick it up but I'm going on PTO from tomorrow as well and I don't think I'll be able to do anything in the time left today. Passing ni to Milan; maybe somebody else can look into it.
Flags: needinfo?(bugmail.mozilla) → needinfo?(milan)
Benoit, can you take a quick look?  I'm not holding my breath that this will be a quick fix, but let's see how far we can get.
Flags: needinfo?(milan) → needinfo?(bgirard)
I don't see anything on my machine with and without layers acceleration on nightly with e10s. Can someone who can reproduce this post their about:support so I can make sure I'm running with the same code paths?
Flags: needinfo?(bgirard)
(In reply to Benoit Girard (:BenWa) from comment #9)
> I don't see anything on my machine with and without layers acceleration on
> nightly with e10s. Can someone who can reproduce this post their
> about:support so I can make sure I'm running with the same code paths?

You should test without e10s
My bad, my theory in comment 5 doesn't make any sense given the findings in comment 1, since it only reproduces without e10s.
Attached file about_support.txt
No e10s also means no APZ.  Also, 41 & 42 should still have prefs for vsync.
Not seeing it on 41, Windows 7, D2D1.1/D3D11.  Alice's is Windows 8, otherwise comparable.
Steps to reproduce:

1) Make the Bookmarks Toolbar visible.
2) Create a new Test folder in your Bookmarks Toolbar.
3) Create a bookmark to this website: http://www.skindeepcomic.com/
4) Visit the website to load the favicon into cache.

5) Restart browser
6) Open https://developer.mozilla.org/en-US/Add-ons/Code_snippets/Tabbed_browser
7) Scroll up and down (turn mouse wheel slowly) on the web page.
   --- smooth as expected

8) Click the Test folder in your Bookmarks Toolbar to expand it, making the animated favicon visible (you don't need to visit the bookmark).
9) Scroll up and down (turn mouse wheel slowly) on the web page
   --- stutter, jumps few pixels

I can reproduce on Windows7 without e10s (about:support see cooment#12).
https://hg.mozilla.org/mozilla-central/rev/3edc8d4a1e198314f5d7ebd2967b85842beef602
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0 ID:20151006030231
(In reply to Alice0775 White from comment #15)
> Steps to reproduce:
> 
> 1) Make the Bookmarks Toolbar visible.
> 2) Create a new Test folder in your Bookmarks Toolbar.
> 3) Create a bookmark to this website: http://www.skindeepcomic.com/
> 4) Visit the website to load the favicon into cache.
> 
> 5) Restart browser
> 6) Open
> https://developer.mozilla.org/en-US/Add-ons/Code_snippets/Tabbed_browser
> 7) Scroll up and down (turn mouse wheel slowly) on the web page.
>    --- smooth as expected
> 
> 8) Click the Test folder in your Bookmarks Toolbar to expand it, making the
> animated favicon visible (you don't need to visit the bookmark).
> 9) Scroll up and down (turn mouse wheel slowly) on the web page
>    --- stutter, jumps few pixels
> 
> I can reproduce on Windows7 without e10s (about:support see cooment#12).
> https://hg.mozilla.org/mozilla-central/rev/
> 3edc8d4a1e198314f5d7ebd2967b85842beef602
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
> ID:20151006030231

I can mostly only reproduce with a trackpad, and the stuttering isn't very bad, I can still scroll and the browser is responsive. It just is no longer super smooth scrolling. With a mouse wheel, I can't tell. Actually, the best way to reproduce for me was on a lenovo laptop with the newer trackpads on Windows 10 as it's perfectly smooth with Silk, then slightly janky with the animated favicon. Without Silk, it's decently janky without the favicon.

Can you try a few things just to make sure we're seeing the same issue:
1) Use Firefox Release / Beta (41/42).
2) Go to about:config
3) set the preferences gfx.vsync.* to false.
4) restart Firefox
5) Try the test results in comment 15 again. Are you seeing the same issue?

With Silk disabled, It's MUCH harder to tell at least for me, that the smoothness goes down, but that's only because with Silk disabled, scrolling is already janky. It just gets slightly jankier.

Also, can you answer these questions please:
1) If you have a trackpad, do you notice the issue more with a trackpad?
2) Is firefox still responsive for you, just jankier while scrolling? Or can you not scroll at all?

Thanks!
Assignee: nobody → mchang
Flags: needinfo?(mchang) → needinfo?(alice0775)
(In reply to John Norman from comment #3)
> It looks like it doesn't even need to be a bookmark.  Any animated favicon
> will kill browser performance.
> 
> Just viewing this page in a tab will be enough to ruin scrolling performance:
> http://vocaroo.com/?info

I also tried having this tab be open while visiting the site listed in comment 15. I could see no difference with having that page in an open tab and having that favicon run when scrolling.
(In reply to Mason Chang [:mchang] from comment #16)

> 
> Can you try a few things just to make sure we're seeing the same issue:
> 1) Use Firefox Release / Beta (41/42).
> 2) Go to about:config
> 3) set the preferences gfx.vsync.* to false.
> 4) restart Firefox
> 5) Try the test results in comment 15 again. Are you seeing the same issue?
> 
> With Silk disabled, It's MUCH harder to tell at least for me, that the
> smoothness goes down, but that's only because with Silk disabled, scrolling
> is already janky. It just gets slightly jankier.
> 

it is still super smooth with |gfx.vsync.* to false| on Release41.

> Also, can you answer these questions please:
> 1) If you have a trackpad, do you notice the issue more with a trackpad?

Do not have it.

> 2) Is firefox still responsive for you, 

yes.

>  just jankier while scrolling? Or can
> you not scroll at all?
> 

Scroll is not smooth. like 2px 1px 1px 1px 2px 1px 1px 1px 2px ...
Flags: needinfo?(alice0775)
With the gfx.vsync.* preferences to false, does the issue still occur for you? Does scrolling get janky with the favicon? Or does it remain smooth?
Flags: needinfo?(alice0775)
Alternative.

1. Disable E10s and restart
2. Open https://bug894128.bmoattachments.org/attachment.cgi?id=776049
   Click "Add dense text" and then click "Scroll-test"

3. Restart
4. Click the Test folder in your Bookmarks Toolbar to expand it
5. Open https://bug894128.bmoattachments.org/attachment.cgi?id=776049
   Click "Add dense text" and then click "Scroll-test"

Actual Results:

** Results of Step2:
Page: https://bug894128.bmoattachments.org/attachment.cgi?id=776049
Steps (after ignoring 120): 299
Step: 5 px
Duration: 7017 ms (target: 7000)

Window size: 1139 x 782
Average interval: 16.72 ms
STDDEV intervals: 1.81 ms

intervals histogram:
6.0 - 8.0 ms: 1
14.0 - 15.9 ms: 17
15.9 - 17.3 ms: 272
17.3 - 18.0 ms: 8
35.0 - 50.0 ms: 1


** Results of Step5:
Page: https://bug894128.bmoattachments.org/attachment.cgi?id=776049
Steps (after ignoring 120): 299
Step: 5 px
Duration: 7014 ms (target: 7000)

Window size: 1139 x 782
Average interval: 16.72 ms
STDDEV intervals: 3.15 ms

intervals histogram:
4.0 - 6.0 ms: 1
6.0 - 8.0 ms: 1
8.0 - 10.0 ms: 2
10.0 - 12.0 ms: 1
12.0 - 14.0 ms: 46
14.0 - 15.9 ms: 62
15.9 - 17.3 ms: 69
17.3 - 18.0 ms: 38
18.0 - 22.0 ms: 76
22.0 - 32.0 ms: 2
35.0 - 50.0 ms: 1


** The above results indicates, quality of scroll becomes worse after viewing animated bookmark favicon.
Flags: needinfo?(alice0775)
(In reply to Mason Chang [:mchang] from comment #19)
> With the gfx.vsync.* preferences to false, does the issue still occur for
> you? 

No.

> Does scrolling get janky with the favicon? Or does it remain smooth?


smooth

*Tested of Firefox41.0.1 with gfx.vsync.* preferences to false

** Results of Step2:
Page: https://bug894128.bmoattachments.org/attachment.cgi?id=776049
Steps (after ignoring 120): 300
Step: 5 px
Duration: 7018 ms (target: 7000)

Window size: 1139 x 782
Average interval: 16.67 ms
STDDEV intervals: 0.62 ms

intervals histogram:
12.0 - 14.0 ms: 1
14.0 - 15.9 ms: 6
15.9 - 17.3 ms: 284
17.3 - 18.0 ms: 6
18.0 - 22.0 ms: 3


** Results of Step5:
Page: https://bug894128.bmoattachments.org/attachment.cgi?id=776049
Steps (after ignoring 120): 299
Step: 5 px
Duration: 7001 ms (target: 7000)

Window size: 1139 x 782
Average interval: 16.67 ms
STDDEV intervals: 0.69 ms

intervals histogram:
12.0 - 14.0 ms: 1
14.0 - 15.9 ms: 42
15.9 - 17.3 ms: 218
17.3 - 18.0 ms: 34
18.0 - 22.0 ms: 4
I went back to Gecko 42. Once we load the favicon from the bookmarks menu, the refresh driver ticks forever, regardless of whether or not silk is enabled. That seems really off, so the main thread is just a lot busier in general if the refresh driver is always ticking. But it still doesn't seem to explain why the intervals are worse. Still digging.

@Gijs - Do you know why, when loading an animated gif from a favicon from the bookmarks toolbar, we would keep animating the favicon forever? Are we caching the favicon in some open document somewhere? Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
Page: https://www.google.com/search?q=regex&ie=utf-8&oe=utf-8
Steps (after ignoring 120): 57
Step: 5 px
Duration: 3575 ms (target: 7000)

Window size: 1199 x 645
Average interval: 19.95 ms
STDDEV intervals: 2.40 ms

intervals histogram:
6.0 - 8.0 ms: 1
18.0 - 22.0 ms: 55
32.0 - 35.0 ms: 1

---

I then opened this page up in a new tab: http://vocaroo.com/?info
I then tried the test again.

---

Page: https://www.google.com/search?q=regex&ie=utf-8&oe=utf-8
Steps (after ignoring 120): 57
Step: 5 px
Duration: 3606 ms (target: 7000)

Window size: 1199 x 645
Average interval: 20.74 ms
STDDEV intervals: 6.20 ms

intervals histogram:
4.0 - 6.0 ms: 1
8.0 - 10.0 ms: 1
12.0 - 14.0 ms: 3
14.0 - 15.9 ms: 4
15.9 - 17.3 ms: 3
17.3 - 18.0 ms: 2
18.0 - 22.0 ms: 25
22.0 - 32.0 ms: 12
32.0 - 35.0 ms: 4
35.0 - 50.0 ms: 2

---

I then turned off all the gfx.vsync.* settings and restarted Firefox.

---

Page: https://www.google.com/search?q=regex&ie=utf-8&oe=utf-8
Steps (after ignoring 120): 57
Step: 5 px
Duration: 3552 ms (target: 7000)

Window size: 1199 x 645
Average interval: 19.96 ms
STDDEV intervals: 0.62 ms

intervals histogram:
17.3 - 18.0 ms: 1
18.0 - 22.0 ms: 56

---

With this opened in a new tab: http://vocaroo.com/?info

---

Page: https://www.google.com/search?q=regex&ie=utf-8&oe=utf-8
Steps (after ignoring 120): 57
Step: 5 px
Duration: 3543 ms (target: 7000)

Window size: 1199 x 645
Average interval: 20.14 ms
STDDEV intervals: 1.46 ms

intervals histogram:
15.9 - 17.3 ms: 1
18.0 - 22.0 ms: 54
22.0 - 32.0 ms: 2

---

Also of note, since the bug also occurs when a tab is open, the animation that plays when a page is loading (the spinning circle) also affects scrolling performance.  To me, it is often the cause of poor scrolling performance when first loading a page, or when loading a page in the background (middle-clicking a link).
I'm still digging, but another interesting thing is that when scrolling with Silk and after opening the animating favicon, is that we block a lot more and wait on more transactions. So we seem to be throttling the compositor and ticking the refresh driver from the throttled compositor rather than just the refresh driver timer. Still digging.
(In reply to Mason Chang [:mchang] from comment #22)
> I went back to Gecko 42. Once we load the favicon from the bookmarks menu,
> the refresh driver ticks forever, regardless of whether or not silk is
> enabled. That seems really off, so the main thread is just a lot busier in
> general if the refresh driver is always ticking. But it still doesn't seem
> to explain why the intervals are worse. Still digging.
> 
> @Gijs - Do you know why, when loading an animated gif from a favicon from
> the bookmarks toolbar, we would keep animating the favicon forever? Are we
> caching the favicon in some open document somewhere? Thanks!

From your comment, it is not clear to me whether you mean the bookmarks menu (1st para) or bookmarks toolbar (2nd para).

Either way though, that icon is loaded into the browser's XUL document, right? Because it's obviously part of the menu or toolbar XUL (specifically, the <menuitem> or <toolbarbutton> that somehow (image attribute, list-style-image style, whatever) ends up with the animated icon as its... icon).

So the answer to the question seems to me like it would be trivially true.

It seems like this is yet another case of bug 1178141 (and/or maybe bug 1166500 but that talks about CSS animations).

Maybe Seth can help?
Flags: needinfo?(gijskruitbosch+bugs)
Been digging through mozilla-beta with and without Silk and the essential problem comes down to how / when the refresh driver / compositor recover from throttled mode. Every refresh driver tick, we use the timestamp of the tick to calculate the position to scroll when scrolling w/o APZ / e10s. If the TimeStamps aren't uniform, scrolling is janky. Without Silk, the refresh driver / compositor recovers fairly quickly from throttled mode, hence normal operation can continue and smooth scrolling works. With Silk, we don't recover very well from throttled mode, and have timestamps mixed from throttled mode / vsync, causing lots of jank.

The first scroll always takes a long time to tick the refresh driver, upwards of 50+ms, so we hit throttled mode. Without Silk, after the first refresh driver tick, it scheduled a tick from 16 ms from the start of the first tick. After a 50 ms tick, the main thread will run the refresh driver again, giving two layer transactions to the compositor. The Compositor will run immediately from the first refresh due to the scheduler, composite, then send a DidComposite back to the refresh driver. When the refresh driver's ticks again, it will have gotten one DidComposite back and the second transaction will composite since it's been ~16 ms. The second transaction will finish compositing, send the DidComposite back, and the refresh driver will be on it's normal schedule. In this case, it's good that the compositor / refresh driver aren't on the same timers.

With Silk, this doesn't quite happen as nicely. With the first refresh driver tick, the composite waits around a bit until the next vsync. So for example, let's say the first tick took 20ms, we wait 12 ms to composite. Since one vsync happened already, we also tick the refresh driver ASAP again, to have two transactions. At time t=32, we composite the first transaction, but the refresh driver goes into throttled mode since we've already had two ticks. When the composite finishes, we do a force refresh, but the compositor still waits for the next vsync at t=48 to do the next frame, where we composite and tick the refresh driver again. The refresh driver tick waits to tick since a composite hasn't happened again. It gets stuck in throttled mode for a lot longer and usually ticks from a forced refresh.

It gets unstuck if the CompositorChild::RecvDidComposite executes before the next vsync, but since the main thread is busy, this isn't always the case. The ticks in the test case in comment 15 take ~7-12 ms, So it oscillates between scrolling with a forced refresh versus vsync, causing the extra jank. I'm not quite sure why we hit throttled mode more often with the animated favicon, even without Silk it still oscillates between throttled / non-throttled, but the variability is lower since it sometimes composites "Now" and can catch up. e.g. sways from 15-25 ms versus sways from 10ms - 35 ms.

I tried disabling force refresh and instead just wait until the next vsync. It will force 30fps, but it's smoother than the variability, but still not quite as good as without silk. Alice, does this improve things for you a bit?

http://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/mchang@mozilla.com-80eaf982d403/

Thanks!
Flags: needinfo?(alice0775)
hmm
The try server build did not change anything for me.., The problem is reproduced.
Flags: needinfo?(alice0775)
I finally got the gecko profiler add-on working by going back to mozilla-beta and Windows 10. Here are two profiles:

With silk
http://people.mozilla.org/~bgirard/cleopatra/#report=d52a3ebea5b3218b3967b0d40eece585e3a95b37

without Silk
http://people.mozilla.org/~bgirard/cleopatra/#report=8ede2f31a4fd2c865f8f4ce3980dcba01c533cf6

Essentially, we spend 14-17 ms in display list generation. Once we load an animated favicon, we run the refresh driver forever (which is another bug outside of Silk). Without Silk, we tick the refresh driver Now(), which is roughly in line with what the timer was going at with a 60 HZ rate anyway.

With Silk, our refresh driver ticks are kind of all over the place since sometimes we "catch up" from throttling, wait for the vsync to tick, which causes a longer some longer display list generation.

What's confusing is that it seems to spend most of it's time waiting for something. Matt, do you have an idea of what the displaylist generation could be waiting for? Thanks!
Flags: needinfo?(matt.woodrow)
It's really hard to tell without proper stacks :(

I've seen GDI cause long wait times, but we shouldn't be calling into that from display list building..
Flags: needinfo?(matt.woodrow)
Attached file Vtune call stack
I finally got a sync call stack with a release build with vtune. The other profilers just always got "could not find call stack". Attached is the call stack. It basically looks like we spend lots of time flushing d2d1 through [1]. I've seen upwards of 20ms waits in that loop. 

This also occurs without the favicon, see profile - http://people.mozilla.org/~bgirard/cleopatra/#report=713639d3b365d3614cc3e0a7b80a016e9f00ed08.

It essentially looks like, since we tick the refresh driver forever, it slightly puts the refresh driver tick time over the threshold to become a forced refresh, and with vsync scheduling, we oscillate pretty badly with vsync / forced refreshes. 

Does this call stack make sense as to why we would spend lots of time locking?

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/ClientPaintedLayer.cpp?case=true&from=ClientPaintedLayer.cpp#82
Flags: needinfo?(matt.woodrow)
I definitely makes sense that that call would be slow. We're trying to make a copy of the current destination, so we have to flush d2d/d3d (and wait for it to complete?). It would be really nice if we could change the content to not need to hit this path.

This callstack shouldn't be contributing to the display list block in the profile though, it's very much during rasterization. I wonder if there's a bug in the label reporting.

As for why this got regressed because of vsync.. Is it because we're now trying to run content painting and the compositor at the same time, increasing our contention on the GPU?
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #31)
> I definitely makes sense that that call would be slow. We're trying to make
> a copy of the current destination, so we have to flush d2d/d3d (and wait for
> it to complete?). It would be really nice if we could change the content to
> not need to hit this path.
> 
> This callstack shouldn't be contributing to the display list block in the
> profile though, it's very much during rasterization. I wonder if there's a
> bug in the label reporting.

Yeah, since the profiler_labeling is just around the nsViewManager::ProcessPendingUpdates [1].

> As for why this got regressed because of vsync.. Is it because we're now
> trying to run content painting and the compositor at the same time,
> increasing our contention on the GPU?

It's hard to say, since the Intel GPU just randomly locks. From the profiles with silk, we see a lot more 20+ms display lists rather than 15 ms display lists, and the composites don't overlay with the refresh driver at all without silk. I also think it's because we're swinging between throttled mode / unthrottled mode, and we have to wait for the next vsync to tick until we do anything. Before, we schedule the next tick based on when we "should've" ticked if we hit the rate, which usually makes the next tick come up sooner than the next vsync. The uniformity of the ticks are better without vsync in this case. 

This also hurts on the compositor side. If we have a long refresh, the compositor before would start ASAP since it's been a long time since the last composite. With vsync, we wait which causes us to hit the throttled case a bit more as well.

[1] https://dxr.mozilla.org/mozilla-central/source/view/nsViewManager.cpp?from=nsviewmanager.cpp#397
Attached file display list dump (obsolete) —
Attached file display list dump (obsolete) —
Attachment #8676569 - Attachment is obsolete: true
Attached file another display list dump (obsolete) —
I'm not sure I'm looking at the right things here. I never saw a call to PushGroupAndCopyBackground when the favicon was being painted. I only saw it when painting nsDisplaySVGEffects. Here are some of the values, but the most interesting is that aRegionToInvalidate is always empty. Was this what you were looking for? You can probably reproduce this by just adding the favicon to a bookmarks menu, it's probably just fast enough on your machine that you don't notice the decrease in scrolling.

bound rect
    x	1360	int
		y	25920	int
		width	50680	int
		height	2000	int
http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.cpp#5386

aRegionToInvalidate(0, 0, 0, 0)
http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.cpp#5553

nsDisplaySVGEffects - 
-		mozilla::gfx::BaseRect<int,nsRect,nsPoint,nsSize,nsMargin>	{x=-181 y=3060 width=31027 ...}	mozilla::gfx::BaseRect<int,nsRect,nsPoint,nsSize,nsMargin>
		x	-181	int
		y	3060	int
		width	31027	int
		height	1680	int

http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.cpp#5437
Flags: needinfo?(matt.woodrow)
(In reply to Mason Chang [:mchang] from comment #37)
> I'm not sure I'm looking at the right things here. I never saw a call to
> PushGroupAndCopyBackground when the favicon was being painted. I only saw it
> when painting nsDisplaySVGEffects.

That's expected. The confusing piece is why we're painting the nsDisplaySVGEffects, when we should have only invalidated the favicon.

> Here are some of the values, but the most
> interesting is that aRegionToInvalidate is always empty.

This is normal too. aRegionToInvalidate is only used when the layers backend wants to invalidate something extra on top of what DLBI computed (maybe the layer backbuffer got deleted due to memory pressure?).


> Was this what you
> were looking for? You can probably reproduce this by just adding the favicon
> to a bookmarks menu, it's probably just fast enough on your machine that you
> don't notice the decrease in scrolling.
> 
> bound rect
>     x	1360	int
> 		y	25920	int
> 		width	50680	int
> 		height	2000	int
> http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.
> cpp#5386
> 
> aRegionToInvalidate(0, 0, 0, 0)
> http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.
> cpp#5553
> 
> nsDisplaySVGEffects - 
> -		mozilla::gfx::BaseRect<int,nsRect,nsPoint,nsSize,nsMargin>	{x=-181 y=3060
> width=31027 ...}	mozilla::gfx::BaseRect<int,nsRect,nsPoint,nsSize,nsMargin>
> 		x	-181	int
> 		y	3060	int
> 		width	31027	int
> 		height	1680	int
> 
> http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.
> cpp#5437

Kind of. These two rectangles don't appear to intersect at all, so shouldn't we be skipping the nsDisplaySVGEffects because of the condition at [1]?

[1] http://mxr.mozilla.org/mozilla-beta/source/layout/base/FrameLayerBuilder.cpp#5398
Flags: needinfo?(matt.woodrow)
I'm getting these bounds:

nsDisplayOpacity item for 0.5 opacity when we do call CopyGroupAndCopyBackground.

Visible Rect
		x	13657	int
		y	23440	int
		width	700	int
		height	680	int
    
Bounds Rect
		x	0	int
		y	15480	int
		width	55840	int
		height	14760	int

Also, on the page as reported, I'm only getting the CopyGroupAndCopyBackground every couple of frames while scrolling. On an idle about:blank page with the animated favicon running, I don't see it being called.

For an animated gif, I guess it kind of makes sense that we run the refresh driver forever. We probably just loop the gif.
Well those rects intersect, so that makes more sense. As does the fact that we're not invalidating during the idle state.

It seems that for some reason scrolling is triggering invalidations in the chrome layers that intersect with the partially transparent text.

You might need to link into https://wiki.mozilla.org/Gecko:DisplayListBasedInvalidation#Debugging_Invalidations_Problems to try figure out why this happens, because invalidations for scrolling should be limited to the layer subtree for content and shouldn't affect chrome.
Attached file display lits dump + invalidation log (obsolete) —
I'm seeing this:

Invalidating changes of the visible region bounds of the scrolled contents
Invalidating layer 161C4C00: < (x=0, y=736, w=1396, h=10); (x=0, y=1328, w=1396, h=10); >
Display item type XULImage(0FBF7E90) (in layer 0FB56400) belongs to an invalidated frame!
Invalidating layer 0FB56400: < (x=305, y=132, w=25, h=24); >

The XULImage 0xfb7e90 is the animated favicon in this case. Attached is the display list dump intermixed with the invalidation log. Does anything stand out to you? Thanks!
Flags: needinfo?(matt.woodrow)
Looks like we will ship 42 with this issue. Wont fix, however, it would be great to have a patch for 43.
(In reply to Mason Chang [:mchang] from comment #41)
> Created attachment 8677784 [details]
> display lits dump + invalidation log
> 
> I'm seeing this:
> 
> Invalidating changes of the visible region bounds of the scrolled contents
> Invalidating layer 161C4C00: < (x=0, y=736, w=1396, h=10); (x=0, y=1328,
> w=1396, h=10); >
> Display item type XULImage(0FBF7E90) (in layer 0FB56400) belongs to an
> invalidated frame!
> Invalidating layer 0FB56400: < (x=305, y=132, w=25, h=24); >
> 
> The XULImage 0xfb7e90 is the animated favicon in this case. Attached is the
> display list dump intermixed with the invalidation log. Does anything stand
> out to you? Thanks!

Is this invalidation log for a paint where you saw the SVGEffects being painted?

There are two SVGEffects items in that dump, and both belong to layer 0xfb56400 (along with all the other browser chrome content).

The only invalidation for that layer is the XULImage one you mentioned, and that doesn't intersect with either of the SVGEffects.
Flags: needinfo?(matt.woodrow)
In case I forget anything.

Pre-Reqs
1) Bookmark toolbar open, with 1 folder, and the animated favicon in the folder.

STR:
1) Load one tab with https://developer.mozilla.org/en-US/Add-ons/Code_snippets/Tabbed_browser
2) Load a second tab with about:config
3) Enable paint invalidation + display-list dumping
4) Go back to first tab
5) Click bookmark folder with animated favicon
6) Scroll
7) Go to second tab with about:config, disable prefs.
Attached file Invalidation log only (obsolete) —
I couldn't find a display list dump where the SVGEffects bounds overlapped with the favicon bounds. Instead, this is a dump where the gfxContext::PushGroupAndCopyBackground is called. Does this make any more sense?
Flags: needinfo?(matt.woodrow)
This paint looks like it's this content being invalidated and causing PushGroupAndCopyBackground:


Opacity p=0xe445080 f=0x1817a478(Block(_moz_generated_content_before)(-1)) bounds(32425,37260,700,680) layerBounds(32425,154440,700,680) visible(20102,6020,35644,35220) componentAlpha(32425,37260,700,680) clip()  (opacity 0.5)
      Text p=0xe445028 f=0xe272ac8(Text(0)"\uf08e") bounds(32425,37260,700,680) layerBounds(32425,154440,700,680) visible(20102,6020,35644,35220) componentAlpha(32425,37260,700,680) clip(0,6020,75760,35240) 


That's part of the page content, not chrome (though would be fixed by bug 1216851).

Looks like this isn't the right paint for a call from SVGEffects unfortunately.
Flags: needinfo?(matt.woodrow)
Attached file Display list dump (obsolete) —
Maybe the last one? This one, I clicked on the toolbar menu to show the favicon, then hid it away, then let it sit there for a bit until I disabled display list dumping. With this version, the favicon is hidden, visually, behind the bookmarks toolbar. I think I was able to find a place where the SVGEffects visible region overlaps the animated favicon's visible region (maybe? The pointer addresses are the same but the layer addresses are different). Or I could be reading these logs incorrectly. Attached is the full dump, but this is what I was looking at:

SVG Effects
SVGEffects p=0xc7cc0e0 f=0xe43eff8(Box(hbox)(1) id:urlbar-wrapper) bounds(-181,2540,47143,1680) layerBounds(-181,2540,47143,1680) visible(0,2240,74160,2280) componentAlpha(3480,2760,37071,1240) clip(0,2240,74160,2280)  layer=0xf456c00 effects=(clip(trivial))

Animated Icon in Chrome
XULImage p=0x7ce4900 f=0x13eda2d8(ImageBox(image)(0) class:menu-iconic-icon) bounds(320,260,960,960) layerBounds(320,260,960,960) visible(120,120,6506,1240) componentAlpha(0,0,0,0) clip(120,120,6506,1240) 

Animated Icon in Client (I think)
XULImage p=0x7ce4900 f=0xdd401c8(ImageBox(image)(-1) class:toolbarbutton-icon) bounds(72300,920,720,720) layerBounds(72300,920,720,720) visible(0,0,76800,41240) componentAlpha(0,0,0,0) clip(0,0,76800,41240)  layer=0xf456c00

Is this also still off?
Attachment #8676574 - Attachment is obsolete: true
Attachment #8676575 - Attachment is obsolete: true
Attachment #8676576 - Attachment is obsolete: true
Attachment #8677784 - Attachment is obsolete: true
Attachment #8680109 - Attachment is obsolete: true
Attachment #8680113 - Attachment is obsolete: true
Attachment #8680226 - Attachment is obsolete: true
Flags: needinfo?(matt.woodrow)
Attached file display list dump
Attachment #8680862 - Attachment is obsolete: true
We're not ever expecting the XULImage and SVGEffects to overlap, since they're visibly in different places.

I want to find the paints where we decide to repaint the content within the SVGEffects even though they *dont* intersect, and we're expecting only the XULImage to have been invalidated.

That is, find out why we decided the SVGEffects area was invalid, even though it's distinct from the obviously invalidating thing.
Flags: needinfo?(matt.woodrow)
[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

I don't see us fixing this without having it ride the trains; so far the discussion isn't pointing to a small fix.  Feels like wontfix for 43 and 44.
I was kind of able to reproduce with just a small html file that just had the word "text" everywhere. Attached is the display list dump + invalidation log with the favicon.
Just out of curiosity, should we be ticking with the refresh drivers with the same transaction id multiple times? If I just load a blank page and have the favicon running, I see that we tick the same transaction 4-5 times.
Flags: needinfo?(matt.woodrow)
No, that's unexpected.

We have machinery to return a transaction id if we don't end up sending anything to the compositor, but transaction ids that get sent to the compositor should be unique.
Flags: needinfo?(matt.woodrow)
Lots of things are unexpected here. We're never getting shadow layer updates, so the compositor never ticks, hence why it's very easy to get into throttled mode here. STR:

1) Go to skindeepcomic.com
2) Bookmark the site
3) Add the bookmark to a folder, move the folder to the bookmarks toolbar
4) restart firefox
5) Reload, go to about:blank
6) open the folder from (3), don't need to click.
7) See the refresh driver tick forever with the same transaction id.
Flags: needinfo?(seth)
I'm not quite sure what exactly is going on with the painting here, but in regards to scheduling, it looks like we have increased GPU contention. From VTune, I see a lot of extra time spent at [1]. This is a sync transaction from the refresh driver to the compositor. Since we're ticking the refresh driver and compositor at the same time, the layer transaction has 2-3x more contention with the favicon running all the time. The time it takes to send the transaction is roughly 0.5-2 ms whereas before, almost all transactions finished in less than 0.5 ms. At least in the test case, painting itself takes 15 ms or so. With Silk, we have the increased contention which causes the paint to take over 16 ms and forces the refresh driver and compositor into throttled mode. Since each paint is now scheduled via throttling, we have a non-uniform timestamp sent to gfxScrollFrame to calculate the scroll position, which causes janky scroll positions. I'm not really quite sure what we can do about the contention since by design, we're ticking both the compositor and refresh driver together.

As for why the paint itself takes so long, both Matt and I are still confused, but it seems to be a problem outside of Silk and is somewhere in layout.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.cpp?from=ShadowLayers.cpp#666
I think I finally figured out what's going on here. We have a few assumptions:

1) Painting the animated favicon in a folder forces us to go through the non-OMTC route and use basic layers. This causes refresh driver ticks to take 10-20+ ms, frequently putting us in throttled mode. This happens with or without Silk scheduling. But we're not really sure why the refresh driver ticks are so long and nothing stands out in the profile.
2) We have increased GPU contention with the compositor due to vsync scheduling. This has shown 2-3x slower when sending a layer update (from ~0.5ms to ~1-1.5 ms);
3) What usually happens in that when we start to scroll, we take a long time to tick the refresh driver. This puts the refresh driver / compositor into throttled mode.
4) For sake of clarity, composites usually only take 2-3 ms. In the following examples, we'll assume a constant composite time of 4 ms.
5) We throttle if the refresh driver is 1 transaction ahead of the compositor, but we only do a forced refresh when the transaction comes back if another refresh driver tick occurred before the transaction was completed.

The first big problem is the order in which refresh drivers get ticked. When a tick comes from a refresh timer, there isn't a strong ordering on which refresh driver gets ticked first/last. When a forced refresh comes from a throttled transaction, the root refresh driver gets ticked first, but then ticks the non-root refresh drivers before ticking itself [1]. Also, non-root refresh drivers, will not paint if the root refresh driver is throttled. What I'm seeing happen with the favicon is that the root refresh driver, which contains the favicon, gets ticked first when ticked from a refresh timer. This is important in throttled mode. When the refresh driver which has the nsGfxScrollFrame ticks, since the root refresh driver is throttled, the refresh driver with the scroll frame doesn't tick. It also forces the root refresh driver to force refresh when a transaction is completed. WITHOUT the favicon, the inverse occurs: The nsGfxScrollFrame ticks BEFORE the root refresh driver.

Now with Silk, it becomes easier to get into a throttled state. Take the case of coming from idle. With Silk, at time t=0, the refresh driver ticks and allocates a transaction id. At time t=16, both the compositor and refresh driver tick. The refresh driver will tick and automatically become throttled at this point because the compositor hasn't ticked. At time t=20, the compositor finishes the transaction. However, we don't immediately tick the refresh driver since we haven't skipped any paints yet [2]. As long as the refresh driver finishes in time, the refresh driver ticks as normal, but one extra tick will cause it to throttle.

Before Silk, when coming from idle, at time t=0, the refresh driver ticks and sends a transaction id. However long the tick takes (let's say t=5ms), it sends the transaction to the compositor, and the compositor would tick NOW since it's been idle. It could finish the transaction within 4 ms, send the result back at time t=9, and the refresh driver could process it all before 16 ms is up. So one disadvantage with Silk is that we're closer to becoming stuck in a throttled / force refresh mode over pre-silk since we by design, we'll have two ticks before the compositor can finish a transaction.

Both with and without Silk, we can eventually recover from throttled mode, but we get recover differently. Let's say we have a long refresh driver tick, 20ms. It sends the transaction to the compositor and the compositor receives it at time t=21 ms. Without Silk, the Compositor would tick NOW, and the refresh driver would also tick NOW again. The refresh driver would finish quickly since there isn't a lot of work to do, it's throttled, and another scheduled refresh driver tick would come in at time t=32 ms since the refresh driver schedules the next tick BEFORE ticking and it schedules it at the rate it should be ticking at. However, the Compositor is now off sync with the refresh driver and the next composite will occur at time t=37, (21 ms + 16 ms). If composites take 4 ms, the first composite will finish at time t=25, and the refresh driver should get the notification that the transaction finished BEFORE the next refresh driver tick. If the refresh driver tick takes less than 7 ms (32 - 5), which it happens often enough, now the refresh driver will no longer be throttled as long as the next ticks are less than 16 ms (which most are). Also, no forced refresh occurs because the transaction complete came before the next tick.

With Silk, we start with a 20ms tick. The compositor gets the transaction, but waits until the next vsync at a time t=32. We allow the refresh driver to backlog 1 vsync, so it too will tick ASAP and become throttled. At t=32, both the refresh driver and compositor tick at the same time. The refresh driver is still throttled and doesn't do anything. Unlike pre-silk, a transaction didn't finish before the tick and now we skipped a paint. Thus, when the compositor finishes, we'll do a force refresh. The composite finishes at time t=36, and sends a transaction back. The refresh driver ticks at time t=37 since it's throttled and sends another transaction. But the compositor waits to composite at time t=48. The refresh driver and compositor both tick at time t=48, and the refresh driver becomes throttled again. What's even worse, the root refresh driver ticked from the vsync, but the child did not. The child however did check if the parent refresh driver was waiting for a transaction, which it was, and forced the parent to think it missed a tick. Thus, when the compositor comes back, it will force refresh all the refresh drivers. When the compositor finishes at time t=52, we do another force refresh driver tick, which starts the cycle over again.

Sadly, the only time I see the refresh driver become unthrottled is if the composition comes back close to a vsync (v1). In those cases, the forced refresh allocates a transaction id and sends another shadow update. Then the vsync (v1) occurs and thinks no work needs to be done, no transaction id is allocated, and we composite quickly, at which point we can "catch up". However, we quickly go back to throttled mode also. The refresh driver ticks at vsync v2, and everything is ok. At vsync (3), we start compositing the transaction from (v2), and the refresh driver ticks the root refresh driver first. This causes the refresh driver to go into throttled mode. The child refresh driver causes the root refresh driver to think it's been ticked, and now we're back into forced refresh mode. In normal scrolling, we still hit this situation, but the non-root refresh driver ticks first, so we don't fall into the situation where the root refresh driver does a forced refresh. Something with the favicon forces the root refresh driver to tick first. Combined with the delayed compositing, creates many forced refreshes and hence the janky scrolling.

In summary, we have a couple of issues.
1) Should we raise the limit to 2 transactions with Silk before we stop painting? Right now, if we have a long tick, let's say 20 ms, we'll tick twice pretty quickly since we backlog 1 vsync. Then we'll tick again at t=32 from the next vsync and become throttled. If we did 2, we'd at least keep up in this case since the composite should still be fast and we wouldn't do a force refresh. If I understand correctly, the compositor will coalesce transactions if two come in before the compositor gets to it?
2) Is there anything we can do about painting an animated favicon taking 5+ms consistently? Or is this just basic layers is expensive?
3) Should we re-order refresh driver ticks such that the root refresh driver ticks last? Are root refresh drivers the only ones able to allocate transactions?
4) What happens if we allow non-root refresh drivers to tick while the root refresh driver is pending a transaction?

I'm thinking (1) is ok, along with lowering the refresh rate to 30 FPS in these cases. It also means we stop backlocking a parent refresh driver tick and don't allow another one until after the current tick has completely finished. Right now, we queue one up as long as the main thread is going to tick the refresh driver. This will create a more stable 30 FPS scenario than what we have now where we immediately try to tick again. I just don't know if it's going to be ok to produce one extra frame, but it would give us the same amount of buffer as we had pre-silk.

I'm really not the person to ask about (2), but it'd be nice if it wasn't so slow.

As for (3), I wonder how much we'd break by enforcing (3) and it hasn't been a huge problem before, so I'm ok leaving it as is.

Thoughts? Thanks, sorry for the long post. It's complicated and Friday!

[1] http://mxr.mozilla.org/mozilla-release/source/layout/base/nsRefreshDriver.cpp#1722
[2] http://mxr.mozilla.org/mozilla-release/source/layout/base/nsRefreshDriver.cpp#2007
Flags: needinfo?(seth) → needinfo?(matt.woodrow)
Tracking for 43+ to carry this forward. Wow, I'm a little in awe of comment 59. Progress!

I do agree with Milan this sounds like a wontfix for 43.
(In reply to Mason Chang [:mchang] from comment #59)

> The first big problem is the order in which refresh drivers get ticked. When
> a tick comes from a refresh timer, there isn't a strong ordering on which
> refresh driver gets ticked first/last. When a forced refresh comes from a
> throttled transaction, the root refresh driver gets ticked first, but then
> ticks the non-root refresh drivers before ticking itself [1]. Also, non-root
> refresh drivers, will not paint if the root refresh driver is throttled.

We should probably try guarantee this ordering for all paints. I think it makes most sense to tick all child refresh drivers first (and send rAF to content etc) before ticking the root and painting.

> What I'm seeing happen with the favicon is that the root refresh driver,
> which contains the favicon, gets ticked first when ticked from a refresh
> timer. This is important in throttled mode. When the refresh driver which
> has the nsGfxScrollFrame ticks, since the root refresh driver is throttled,
> the refresh driver with the scroll frame doesn't tick. It also forces the
> root refresh driver to force refresh when a transaction is completed.
> WITHOUT the favicon, the inverse occurs: The nsGfxScrollFrame ticks BEFORE
> the root refresh driver.

It's a little unexpected that the favicon is in the root refresh driver, I would have expected the popup window get have it's own refresh driver. The nsGfxScrollFrame is within the content document, and each document has its own refresh driver, so that makes sense.

As above, I think the latter ordering seems saner, and it would be nicer if we always did that.


> 
> In summary, we have a couple of issues.
> 1) Should we raise the limit to 2 transactions with Silk before we stop
> painting? Right now, if we have a long tick, let's say 20 ms, we'll tick
> twice pretty quickly since we backlog 1 vsync. Then we'll tick again at t=32
> from the next vsync and become throttled. If we did 2, we'd at least keep up
> in this case since the composite should still be fast and we wouldn't do a
> force refresh. If I understand correctly, the compositor will coalesce
> transactions if two come in before the compositor gets to it?

This seems ok. Yeah, the compositor will coalesce transactions.

> 2) Is there anything we can do about painting an animated favicon taking
> 5+ms consistently? Or is this just basic layers is expensive?

Hard to say, need to profile this and see what's taking the time. It seems unnecessarily slow though.

> 3) Should we re-order refresh driver ticks such that the root refresh driver
> ticks last? Are root refresh drivers the only ones able to allocate
> transactions?

I think we should, yeah. I'm not entirely sure what happens when we have an OMTC popup, which should be possible on some platforms at least. We'd be allocating transaction ids for painting those, and it's not clear which refresh driver it'd be coming from. Might be more fun bugs :)

> 4) What happens if we allow non-root refresh drivers to tick while the root
> refresh driver is pending a transaction?

Well, we'd do things like fire rAF to content for those non-root refresh drivers. It's a bit weird to do that if we're not actually going to do any painting yet (because the root-refresh driver that controls painting is throttled).

> 
> I'm thinking (1) is ok, along with lowering the refresh rate to 30 FPS in
> these cases. It also means we stop backlocking a parent refresh driver tick
> and don't allow another one until after the current tick has completely
> finished. Right now, we queue one up as long as the main thread is going to
> tick the refresh driver. This will create a more stable 30 FPS scenario than
> what we have now where we immediately try to tick again. I just don't know
> if it's going to be ok to produce one extra frame, but it would give us the
> same amount of buffer as we had pre-silk.

I think skipping the backlogged refresh is the right thing to do. It seems like a lot of this problem is caused by the fact that we let the refresh driver tick again immediately (instead of waiting for vsync), but we don't have the equivalent for the compositor. This disconnect is how we end up out of sync and struggling to catch up. We should do the same for both, and having the refresh driver wait for the next vsync seems like the best and simplest way to achieve this.

Solving that should remove the need to do (1), but it wouldn't be a disaster to have it anyway.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #61)
> (In reply to Mason Chang [:mchang] from comment #59)
> 
> > The first big problem is the order in which refresh drivers get ticked. When
> > a tick comes from a refresh timer, there isn't a strong ordering on which
> > refresh driver gets ticked first/last. When a forced refresh comes from a
> > throttled transaction, the root refresh driver gets ticked first, but then
> > ticks the non-root refresh drivers before ticking itself [1]. Also, non-root
> > refresh drivers, will not paint if the root refresh driver is throttled.
> 
> We should probably try guarantee this ordering for all paints. I think it
> makes most sense to tick all child refresh drivers first (and send rAF to
> content etc) before ticking the root and painting.

Thanks, sounds good!

> I think we should, yeah. I'm not entirely sure what happens when we have an
> OMTC popup, which should be possible on some platforms at least. We'd be
> allocating transaction ids for painting those, and it's not clear which
> refresh driver it'd be coming from. Might be more fun bugs :)

Oh joy! Can't wait lol.

> > I'm thinking (1) is ok, along with lowering the refresh rate to 30 FPS in
> > these cases. It also means we stop backlocking a parent refresh driver tick
> > and don't allow another one until after the current tick has completely
> > finished. Right now, we queue one up as long as the main thread is going to
> > tick the refresh driver. This will create a more stable 30 FPS scenario than
> > what we have now where we immediately try to tick again. I just don't know
> > if it's going to be ok to produce one extra frame, but it would give us the
> > same amount of buffer as we had pre-silk.
> 
> I think skipping the backlogged refresh is the right thing to do. It seems
> like a lot of this problem is caused by the fact that we let the refresh
> driver tick again immediately (instead of waiting for vsync), but we don't
> have the equivalent for the compositor. This disconnect is how we end up out
> of sync and struggling to catch up. We should do the same for both, and
> having the refresh driver wait for the next vsync seems like the best and
> simplest way to achieve this.
> 
> Solving that should remove the need to do (1), but it wouldn't be a disaster
> to have it anyway.

From testing, doing (1) didn't seem to have much of an impact. Doing:

1) Ticking the root refresh driver last
2) Waiting for a vsync when a finished transaction returns
3) Not backlogging another vsync

Seemed to have mostly fixed the issue for me. It's still not quite as good
as without Silk, but it's significantly better. This is because when we do throttle,
we fall back to 30 FPS, which is jankier than 60 smooth FPS.

We do however, backlog the compositor at the moment just like we did with the refresh driver on the parent side. We didn't backlog on the content side. So for now, we'll backlog on the compositor but not on the refresh driver. The backlog didn't matter much before since we coalesce transactions, we'd send two transactions and just composite the latest one. Since composites generally finish quickly, we don't backlog very often.
Hi Alice,

Can you try this build and see if the issue is better for you?

http://archive.mozilla.org/pub/firefox/try-builds/mchang@mozilla.com-71a40b47b4e649fc7dffdd967577666104ff1f11/try-win64/

It won't be exactly as good as without Silk, but it should be significantly better than nightly. Thanks!
Flags: needinfo?(alice0775)
(In reply to Mason Chang [:mchang] from comment #65)
> Or a 32 bit build if you prefer:
> 
> http://archive.mozilla.org/pub/firefox/try-builds/mchang@mozilla.com-
> 71a40b47b4e649fc7dffdd967577666104ff1f11/try-win32/

WOW!, The try build fixed the problem for me.

https://hg.mozilla.org/try/rev/71a40b47b4e649fc7dffdd967577666104ff1f11
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Flags: needinfo?(alice0775)
Attached patch Tick root refresh driver last (obsolete) — Splinter Review
Try looking good:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=71a40b47b4e6

This patch does three things:
1) Always ticks the root refresh driver last during a refresh timer tick.
2) Waits for vsync when the refresh driver is throttled instead of force refreshing.
3) Stops backlogging a refresh timer tick if the main thread is busy. We'll only tick at another vsync after the refresh timer has finished ticking in the parent process.

From comment 66, this looks like it works!
Attachment #8685186 - Attachment is obsolete: true
Attachment #8685443 - Flags: review?(matt.woodrow)
May be worth an uplift to Aurora?
(In reply to Milan Sreckovic [:milan] from comment #68)
> May be worth an uplift to Aurora?

I'd rather not. This is a pretty subtle change that might lead to lots of random ugly bugs. I'd prefer to let this ride the trains.
Comment on attachment 8685443 [details] [diff] [review]
Tick root refresh driver last

Review of attachment 8685443 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsRefreshDriver.cpp
@@ +181,5 @@
> +    }
> +
> +    nsRefreshDriver* refreshDriver = drivers[0];
> +    nsRefreshDriver* rootRefreshDriver = nullptr;
> +    nsPresContext* displayRoot = refreshDriver->PresContext()->GetDisplayRootPresContext();

We'll have one root refresh driver per top level window, so I think you need to make this an array and move all the root ones into it.

@@ +199,5 @@
>  
>        TickDriver(driver, jsnow, now);
>      }
> +
> +    if (rootRefreshDriver) {

Probably still need to check IsTestControllingRefreshesEnabled here.
Attachment #8685443 - Flags: review?(matt.woodrow) → review+
See Also: → 1184283
Attached patch Tick root refresh driver last (obsolete) — Splinter Review
Waiting on try before asking for review:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=606ed80641b0

Includes some conversations from irc, specifically that we'll have two refresh driver arrays: One for content and one for root. This way, we wouldn't have to keep checking every refresh driver to see if it's a root driver every tick.
Attachment #8685443 - Attachment is obsolete: true
Uses two lists to manage the root / content refresh drivers. The changes are big enough that I figured I should ask for another review. The big changes are also around shutting down the PresContext(). During shutdown, we can't reliably detect if a refresh driver is a root refresh driver since the PresContext is disconnecting. 

Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4d365bc329d
Attachment #8686151 - Attachment is obsolete: true
Attachment #8686857 - Flags: review?(matt.woodrow)
Attachment #8686857 - Flags: review?(matt.woodrow) → review+
See all the regressions in bug 1224677.

Tested CanvasMark locally and verified that the regression occurs from not ticking the refresh driver after FinishedWaitingForTransaction() executes and instead waiting for the next vsync. From the test notes itself [1]:

"Results are a score "based on the length of time the browser was able to maintain the test scene at greater than 30 FPS, multiplied by a weighting for the complexity of each test type"."

Since we fall back to 30 FPS, by design in this patch, if we start to fall back instead of having some random interval between 30 - 60 FPS, a fairly large regression kind of makes sense.

[1] https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark
Landed with only ticking the root refresh drivers last. I took out the parts that did:

1) Stop backlogging a vsync 
2) Don't force refresh when throttled.

This still fixed the actual favicon scrolling bug that was reported in this bug.

Talos compare looking good: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dbffe594f8fb&newProject=try&newRevision=6569a6798d53

@matt - Do you have a strong preference to throttle down to 30 FPS and accept the talos regressions?
Flags: needinfo?(matt.woodrow)
(In reply to Mason Chang [:mchang] from comment #77)
> Landed with only ticking the root refresh drivers last. I took out the parts
> that did:
> 
> 1) Stop backlogging a vsync 
> 2) Don't force refresh when throttled.
> 
> This still fixed the actual favicon scrolling bug that was reported in this
> bug.
> 
> Talos compare looking good:
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=dbffe594f8fb&newProject=try&newR
> evision=6569a6798d53
> 
> @matt - Do you have a strong preference to throttle down to 30 FPS and
> accept the talos regressions?

It's probably not that critical, especially if we can fix the bug without it.

Aren't most talos tests using ASAP mode? Shouldn't that supercede this change, and let us still hit the maximum framerate possible?
Flags: needinfo?(matt.woodrow) → needinfo?(mchang)
(In reply to Matt Woodrow (:mattwoodrow) from comment #78)
> (In reply to Mason Chang [:mchang] from comment #77)
> > Landed with only ticking the root refresh drivers last. I took out the parts
> > that did:
> > 
> > 1) Stop backlogging a vsync 
> > 2) Don't force refresh when throttled.
> > 
> > This still fixed the actual favicon scrolling bug that was reported in this
> > bug.
> > 
> > Talos compare looking good:
> > https://treeherder.mozilla.org/perf.html#/
> > compare?originalProject=try&originalRevision=dbffe594f8fb&newProject=try&newR
> > evision=6569a6798d53
> > 
> > @matt - Do you have a strong preference to throttle down to 30 FPS and
> > accept the talos regressions?
> 
> It's probably not that critical, especially if we can fix the bug without it.
> 
> Aren't most talos tests using ASAP mode? Shouldn't that supercede this
> change, and let us still hit the maximum framerate possible?

The Talos Page Switch / Tab animation tests do use ASAP. The super big regressions came from CanvasMark, which isn't using ASAP mode. The non-CanvasMark tests didn't regress that much and some improved on different platforms, so I'd suspect that it would be noise.
Flags: needinfo?(mchang)
https://hg.mozilla.org/mozilla-central/rev/ac65a40dffcd
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Mason, does this need to be uplifted to Aurora44?
Flags: needinfo?(mchang)
(In reply to Ritu Kothari (:ritu) from comment #81)
> Mason, does this need to be uplifted to Aurora44?

This is a pretty risky patch and I'd rather not uplift this if possible.
Flags: needinfo?(mchang)
(In reply to Mason Chang [:mchang] from comment #82)
> (In reply to Ritu Kothari (:ritu) from comment #81)
> > Mason, does this need to be uplifted to Aurora44?
> 
> This is a pretty risky patch and I'd rather not uplift this if possible.

Thanks Mason. In that case, we will mark this as a wontfix for FF44.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: