Closed
Bug 822096
Opened 12 years ago
Closed 3 years ago
Scrolling is jank-y
Categories
(Core :: Graphics, defect, P3)
People
(Reporter: smaug, Assigned: smaug)
(Keywords: perf, Whiteboard: [c=handeye p= s= u=])
Attachments
(5 files)
1. 3.75 MB, text/xml
2. 1.58 KB, patch (roc: review+, avih: feedback+)
3. 6.54 KB, patch
4. 20.48 KB, patch
5. 20.54 KB, patch
I believe this is a regression.
Scrolling isn't smooth in a window which has many tabs. Especially pages
which have many images are slow to scroll.
Most of the time is spent under nsRefreshDriver::Tick / nsViewManager::ProcessPendingUpdatesForViews
This is on an ultrabook running Fedora 17.
Could be; I changed a bunch of stuff in the refresh driver here recently. Any suggested steps to reproduce? Does opening up a bunch of tabs (10? 20? 50?) and then going to one that has a lot of images (e.g. a big-picture blog page, or something heavier) produce janky scrolling?
Comment 2 • 12 years ago (Assignee)
I have 35 tabs. Most of them are bugzilla pages, but also some news sites and Facebook and
etherpad. But an odd thing is that the slowness disappears at some point.
After FF restart it is there again, and then after x minutes (I guess x is more than 10) scrolling
is back to normal. Really odd.
When the slowness happens, it is most visible in http://www.mtv3.fi/uutiset/ (Finnish news site)
Updated • 12 years ago
Keywords: regressionwindow-wanted
Comment 3 • 12 years ago (Assignee)
This is more generic. And I see this even in FF18. FF17 is ok. I'll try to find regression range.
I don't know why it shows up in my main profile.
Comment 4 • 12 years ago (Assignee)
I'll try to find the regression range tomorrow.
Comment 5 • 12 years ago (Assignee)
Hmm, perhaps there are a few different regressions, but the really bad one is recent.
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4dfe323a663d&tochange=634180132e68
Comment 6 • 12 years ago (Assignee)
Comment 7 • 12 years ago (Assignee)
(In reply to Olli Pettay [:smaug] from comment #3)
> This is more generic. And I see this even in FF18. FF17 is ok.
So that is not true. Scrolling is better in FF17, but it is still usable in FF18.
Comment 8 • 12 years ago (Assignee)
Ah, so there is also bug 799242 in the regression range.
I would be surprised if this wasn't related to bug 731974, which directly touched the refresh driver (which caused bug 799242 and was fixed the next day). I didn't get a chance to look at this today, but I will in the morning.
One more question -- how are you scrolling? How do you determine slow scrolling? Just moving the scrollbar up and down, using the mouse wheel, spacebar, what? (If wheel, is pixel scrolling/smooth scrolling enabled?)
Comment 10 • 12 years ago (Assignee)
I scroll using touchpad's two finger scrolling. Testcase is to load http://www.mtv3.fi/uutiset/
and try to scroll it down. If first scroll-down takes 0.5 - 1s and is not smooth at all, the bug
is there.
Smooth scrolling is activated.
Comment 11 • 12 years ago (Assignee)
Another testcase is MoCo phonebook. Both the phonebook and http://www.mtv3.fi/uutiset/ have somewhat
similar background (gradient or large image?). I wonder if that has something to do with this.
Hmmm. I can't reproduce this on Windows. Will ask around for a linux laptop.
Can you install the profiler (see https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler) and grab a profile immediately after you try and get the janky scrolling? Then upload it and post the link here. It might give us some info.
Comment 14 • 12 years ago
I can't reproduce on Linux x86-64, NVIDIA driver, default profile, opening a few dozen bugzilla tabs and the testcase in comment 11.
Comment 15 • 12 years ago (Assignee)
Comment 16 • 12 years ago (Assignee)
Don't even try to open that profile in a browser. It should be opened using sysprof.
Comment 17 • 12 years ago
This page uses a fixed-position background that isn't on the css canvas (it's on a div on top).
We don't ever put these into their own layer (because of their potential to break async scrolling - they don't necessarily cover the entire viewport), and instead have to always repaint everything on scroll since the text has moved relative to the background gradient.
I think the regression range here only applies to BasicLayers, where bug 806099 changed how we choose the active scrolled root for flattened layers. We might have been getting away with only repainting the content and not the background previously.
Comment 18 • 12 years ago (Assignee)
The laptop on which I see the problem is an ultrabook. about:support says:
Adapter Description: Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ivybridge Mobile
Device ID: Mesa DRI Intel(R) Ivybridge Mobile
Driver Version: 2.1 Mesa 8.0.4
GPU Accelerated Windows: 0/2 Basic
AzureCanvasBackend: cairo
AzureContentBackend: none
AzureFallbackCanvasBackend: none
Does that mean Gecko ends up using BasicLayers?
Comment 19 • 12 years ago (Assignee)
Looks like bug 806099 can't be backed out cleanly.
Comment 20 • 12 years ago (Assignee)
Ah, maybe the checked-in patch can be backed out, rather than the patch attached to the bug.
Comment 21 • 12 years ago (Assignee)
But bug 806099 doesn't seem to have caused the problem.
Comment 22 • 12 years ago
It doesn't? That is unexpected.
Probably worth doing a local bisect to find out what actually caused this.
Comment 23 • 12 years ago (Assignee)
Yeah, I'm trying to find what caused the regression.
Compiling takes time on this machine.
On a whim, can you try the build in http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vladimir@pobox.com-fadac9dbe2a8/ ?
Comment 25 • 12 years ago (Assignee)
That tryserver build doesn't behave any better :(
I'm compiling new builds ...
Comment 26 • 12 years ago (Assignee)
Ok, finally: it is bug 799242 which caused this.
I haven't yet debugged what is happening.
Blocks: 799242
bug 799242? not bug 731974? I don't think 799242 ever landed by itself even...
Comment 28 • 12 years ago (Assignee)
Er, sorry, bug 731974. Copy pasted the wrong bug number from the commit message.
Ok; not too surprising that that bug causes it.. but I still can't reproduce here :( I would do any testing along with the patch from bug 823091 though.
Comment 30 • 12 years ago (Assignee)
So the bug is odd. I start FF with some tabs, one of which is http://www.mtv3.fi/uutiset/
The jankiness may not start immediately while the session is being restored
(I have the setting to load all the tabs after restart), but once all the tabs have been loaded,
scrolling is very slow. It is perhaps slow on all pages, but especially on http://www.mtv3.fi/uutiset/
Then, much later, the scrolling may get smooth again, or may not.
Comment 31 • 12 years ago (Assignee)
I don't know what has fixed this, but I haven't seen it for a few days.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Comment 32 • 12 years ago (Assignee)
Er, very much still there with https://phonebook.mozilla.org/tree.php
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 33 • 12 years ago (Assignee)
And once the problem starts, it affects all pages to some extent.
Is it possible that the refresh driver goes mad?
When the problem happens, http://people.mozilla.org/~wmccloskey/example1.html reports
normal numbers, but painting is very janky; the red dot moves only once a second or so.
Updated • 12 years ago (Assignee)
tracking-firefox20:
--- → ?
tracking-firefox21:
--- → ?
Summary: Scrolling is jank-y when the window has many tabs → Scrolling is jank-y
Log calls to nsRefreshDriver::Tick and make sure it's running frequently enough?
Comment 35 • 12 years ago (Assignee)
Based on http://people.mozilla.org/~wmccloskey/example1.html it is running (once the page is painted,
the black line shows the usual numbers, so the animation frame callback got called normally).
Log nsWindow paint (NS_PAINT) events to make sure the window is being redrawn regularly too?
Updated • 12 years ago
status-firefox20:
--- → affected
status-firefox21:
--- → affected
Comment 37 • 12 years ago (Assignee)
I get this on another linux laptop too. Doesn't happen too often, but happens.
Roc, how can I log nsWindow paint events?
instrument nsWindow::OnPaint in nsWindowGfx?
Comment 39 • 12 years ago
Given how hard this is to reproduce and that it seems to only affect Linux, we'll remove tracking but uplift nominations when a fix is found would be welcome.
Updated • 12 years ago (Assignee)
Keywords: regressionwindow-wanted
Comment 40 • 12 years ago (Assignee)
One thing I noticed today when profiling is that when the browser entered a bad state, things
under gfxAlphaBoxBlur::Paint started to page fault a lot.
Low on memory?
Comment 42 • 12 years ago
(In reply to Olli Pettay [:smaug] from comment #0)
> I believe this is a regression.
> Scrolling isn't smooth in a window which has many tabs. Especially pages
> which have many images are slow to scroll.
> Most of the time is spent under nsRefreshDriver::Tick /
> nsViewManager::ProcessPendingUpdatesForViews
>
> This is on an ultrabook running Fedora 17.
Could you please refine your description of "janky" and "slow"? Would you say that it seems like the refresh rate has gone down (say, to 10-20 fps) but otherwise distance/sec didn't change? or does it introduce hangs (200ms? 500ms? ...)? or does the scroll distance get shorter compared to when it's working well? What's the cpu/cores utilization while scrolling is janky?
I'm asking since I possibly noticed a similar issue (on windows xp, 7), though it started quite some time ago, probably at least 2 years back. My symptom was that the actual refresh rate would suddenly go down to 10-20 fps, on pages where I know it can scroll much closer to 60 (it was mostly happening to me on a specific vBulletin forum). And to me as well, I had many (20+) tabs open, it was intermittent, and restarting the browser solved it.
While I haven't experienced it recently, your symptoms do remind me of mine.
Comment 43 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #41)
> Low on memory?
No, which is why the profile was a bit odd. But that is what sysprof told me... not sure how reliable it is in this case.
This is also happening (rarely) on this fast i7-Qx laptop, running Fedora 18.
When the problem happens, the fps on pages like http://www.mtv3.fi/uutiset/ drops to perhaps 2.
Comment 44 • 12 years ago (Assignee)
For some reason this has happened 5 times today.
Comment 45 • 12 years ago (Assignee)
Got it again. CPU doesn't seem to be high. So, since this is a regression from the RefreshDriver changes,
I assume we somehow get into a state where the RefreshDriver fires far too rarely.
Comment 46 • 12 years ago (Assignee)
Or, no: based on http://people.mozilla.org/~wmccloskey/example1.html, the animation frame callback is
called often enough, but painting doesn't happen every time.
try comment #38?
Comment 48 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> instrument nsWindow::OnPaint in nsWindowGfx?
what kind of instrumentation would be useful?
(I'm getting the problem again somewhat regularly today.)
Comment 49 • 12 years ago
Number of milliseconds between paints maybe?
That would work. Or just printf something when you hit OnPaint and see if, while you see rendering frozen, we're still hitting OnPaint or not.
Comment 51 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> instrument nsWindow::OnPaint in nsWindowGfx?
I assume not there, since that is a Windows thingie, and I'm seeing this on linux.
Comment 52 • 12 years ago (Assignee)
nsWindow::OnExposeEvent might be the right place.
Comment 53 • 12 years ago
(In reply to Olli Pettay [:smaug] from comment #52)
> nsWindow::OnExposeEvent might be the right place.
Yeah, that's right.
Comment 54 • 12 years ago (Assignee)
Managed to finally reproduce with an instrumented build.
So we end up painting once or twice a second, but the top-level refresh driver enters some kind of loop
where it runs all the time (every 16 ms).
Need to figure out why we call the refresh timer all the time, yet don't paint.
Check whether nsWindow::Invalidate is getting called often enough.
Comment 56 • 12 years ago
I think I have now seen this, on Linux too, first time I've ever seen it though.
Comment 57 • 12 years ago (Assignee)
I've been running a build with some more instrumentation for about a week now, but
the problem hasn't happened again. I'll keep using that build.
http://www.mtv3.fi/uutiset/ is still the testcase for me. Once scrolling on that page
becomes very slow, I know I'm seeing the problem.
Comment 58 • 12 years ago (Assignee)
Got it again. Managed to get into a state where RefreshDriver and Invalidate were called 150 times before things were actually painted. So RefreshDriver and invalidation seem to be working fine.
Painting itself seems to be fast.
Do we somehow not process messages from gtk2/X?
Maybe we need some kind of X event monitoring at this point...
Comment 60 • 12 years ago
Invalidation is independent from X events. It is managed by GDK which schedules
GDK expose events at a lower than default priority so as to coalesce responses
to native X11 input and XExpose events. XPCOM events run at default priority,
so OnExpose() will only be called if XPCOM has no events pending.
If the next tick event is scheduled before the current ticks are run [1], then running the ticks can take up all the available time before the next tick event,
and the GTK expose events don't get to run.
That scenario doesn't sound consistent with the low CPU use reported in
comment 45, but perhaps the page faulting explains that.
https://hg.mozilla.org/mozilla-central/rev/513ec84b5c88#l1.146
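The priority scheme described here can be illustrated with a toy event loop (Python purely for illustration; the numeric priorities and the always-rescheduling "tick" are invented to show the starvation mechanism, they are not Gecko's actual values):

```python
import heapq

# Toy model: XPCOM events run at default priority, GDK expose (paint)
# events at a lower priority, so expose handlers only run when no
# default-priority event is pending.
DEFAULT_PRIORITY = 0  # XPCOM events (e.g. refresh driver ticks)
EXPOSE_PRIORITY = 1   # GDK expose events (coalesced paints)

def run_loop(events, iterations):
    """Pop events in priority order. A tick that always schedules the
    next tick keeps a default-priority event pending forever."""
    heapq.heapify(events)
    processed = []
    for _ in range(iterations):
        if not events:
            break
        _prio, name = heapq.heappop(events)
        processed.append(name)
        if name == "tick":
            # The refresh driver schedules the next tick immediately.
            heapq.heappush(events, (DEFAULT_PRIORITY, "tick"))
    return processed

log = run_loop([(DEFAULT_PRIORITY, "tick"), (EXPOSE_PRIORITY, "expose")], 10)
# The expose event is starved: every iteration is taken by the next tick.
```

This matches the observation in comment 58: invalidation and ticks run constantly, yet OnExposeEvent rarely fires.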
Comment 61 • 11 years ago
Bug 880036 shows paint handler starvation when consecutive refresh driver iterations take (even slightly) more than the optimal refresh interval to complete. The result is no screen updates until the refresh driver ticks are quick again, but I observed such a total freeze for at most 2 seconds, not intermittent freezes for minutes as with this bug.
This bug could be a similar case, or some pattern of refresh driver intervals which ends up with only one in every few paint events handled on time.
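That starvation condition (ticks overrunning the frame interval) can be expressed as a tiny model (Python purely for illustration; 16 ms stands in for the 60 Hz frame budget, and the all-or-nothing cliff is a simplification):

```python
def paints_per_second(tick_duration_ms, frame_interval_ms=16.0):
    """Paint opportunities per second: the paint handler only runs in
    the idle gap after a tick finishes and before the next tick is due.
    If every tick overruns the interval, that gap never exists."""
    if tick_duration_ms >= frame_interval_ms:
        return 0  # back-to-back ticks: paint events are starved entirely
    return int(1000.0 / frame_interval_ms)

fast = paints_per_second(10.0)  # ticks fit the 16 ms budget
slow = paints_per_second(17.0)  # each tick overruns by just 1 ms
```

The point of the model is the cliff: even a 1 ms overrun per tick takes paints from roughly 60 per second to zero, which is consistent with the "ticks run every 16 ms but we paint once a second" symptom above.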
Comment 62 • 11 years ago (Assignee)
Finally figured it out. Looks like the responsiveness can be really horrible if
the mFavorPerf counting goes wrong.
There are plans to properly fix or remove the favor-perf hint, but could we
perhaps take this kind of patch for now, in order to guarantee that we don't
end up executing the favor-perf-mode 'else' branch all the time?
I added 'limit' to cover the ContentSink case, which may pass 0 as the starvation delay.
(All the favor-perf handling is pretty completely broken. I can see use cases for it,
but it needs to be rewritten.)
https://tbpl.mozilla.org/?tree=Try&rev=f221b43da133
Assignee: nobody → bugs
Attachment #793428 -
Flags: review?(avihpit)
Comment 63 • 11 years ago
I'm documenting the favor perf thingy as much as I understand it, so it's easier to follow and modify/remove.
1. favorPerf mode means that handling Gecko events is preferred and native events are starved (or processed much less). Native events are OS events, including user inputs like mouse and keyboard events, paint events, etc.
2. The method FavorPerformanceHint(bool enable, uint duration) only modifies member vars, as follows:
- Registers "duration" and forgets any previously used/registered duration (mStarvationDelay).
- Increases a counter if enable==true, or decreases this counter otherwise (mFavorPerf).
- Registers the last timestamp at which this method was called with enable==false (mSwitchTime).
3. Within OnProcessNextEvent(...), the values from (2) are interpreted as follows (isFavorPerfMode doesn't actually exist in the code, but it helps to understand the logic as a pseudo variable):
isFavorPerfMode = !(counter <= 0 && now > lastDisabledTimestamp + duration)
or in other words:
isFavorPerfMode = counter > 0 || less than duration has passed since FavorPerformanceHint was called with enable=false.
So far, we can tell a few things from this (too twisted for my taste) logic:
- Calls to FavorPerformanceHint should be symmetric. If it's enabled more times than disabled, it will get stuck in enabled mode regardless of the duration hint (since counter > 0 forever).
- Regardless of the counter (mFavorPerf), if FavorPerformanceHint was called with enable=false, it actually enables favorPerf mode for mStarvationDelay.
Once isFavorPerfMode is determined, it acts upon it as follows (within OnProcessNextEvent(...) ):
if (!isFavorPerfMode) { // normal mode
  start = now();
  do {
    hasMoreEvents = process native event();
  } while (hasMoreEvents && now() - start < 20ms);
} else { // starve native events
  if (it's been more than 20ms since we last processed a native event) {
    process one native event;
  }
}
[ more stuff happens later ]
Both of these two different usages of 20ms are the same variable "limit", which is set to THREAD_EVENT_STARVATION_LIMIT which is 20ms.
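The bookkeeping described in (2) and (3) can be turned into a small executable sketch (Python purely for illustration; a simplification of the described logic, not the actual Gecko code). It demonstrates the stuck-enabled case that follows from asymmetric enable/disable calls:

```python
class FavorPerfModel:
    """Toy model of the favor-perf bookkeeping as described above."""
    def __init__(self):
        self.mFavorPerf = 0         # net enable/disable counter
        self.mSwitchTime = 0.0      # last time the hint was disabled
        self.mStarvationDelay = 0.0

    def favor_performance_hint(self, enable, duration, now):
        self.mStarvationDelay = duration
        if enable:
            self.mFavorPerf += 1
        else:
            self.mFavorPerf -= 1
            self.mSwitchTime = now

    def is_favor_perf_mode(self, now):
        # Favor-perf unless the counter is drained AND the grace period
        # after the last disable has elapsed.
        return not (self.mFavorPerf <= 0 and
                    now > self.mSwitchTime + self.mStarvationDelay)

m = FavorPerfModel()
m.favor_performance_hint(True, 2.0, now=0.0)   # enabled twice ...
m.favor_performance_hint(True, 2.0, now=0.1)
m.favor_performance_hint(False, 2.0, now=0.2)  # ... but disabled once
stuck = m.is_favor_perf_mode(now=1000.0)       # counter is still 1: stuck on
m.favor_performance_hint(False, 2.0, now=0.3)  # the matching disable
freed = not m.is_favor_perf_mode(now=1000.0)   # now the mode can expire
```

With one missing disable call, `is_favor_perf_mode` stays true forever, which is exactly the wedged state this bug hit.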
My eyes bleed... I hope that the original code knew what it wanted to do, but I can't figure out what it was.
The suggested patch changes are:
1. Remember mSwitchTime on every call to FavorPerformanceHint rather than only when disabling it.
2. Change the logic to: isFavorPerfMode = counter > 0 && less than the duration hint has passed since any call to FavorPerformanceHint.
So it should definitely limit the cases where favorPerf is true, and will also prevent it from getting stuck always-on. But since I still can't figure out what the original code wanted to do, and with some vague assumption that this favorPerf mode has some value, I can't predict what would be affected by this change, and how (other than the obviously more limited favorPerf mode).
tn, any insight on this?
Flags: needinfo?(tnikkel)
Comment 64 • 11 years ago (Assignee)
The patch in this bug just tries to remove the possibility of getting the browser into a state
where it is effectively not working at all. The patch tries to be relatively conservative.
I hope we can fix the favorPerf hint in some other bug (and perhaps land such changes early in a cycle).
Comment 65 • 11 years ago (Assignee)
Comment 66 • 11 years ago
Comment on attachment 793428 [details] [diff] [review]
patch
Review of attachment 793428 [details] [diff] [review]:
-----------------------------------------------------------------
Overall, considering what we suspect, I agree that the patch is conservative and would prevent bad cases of being stuck in favor perf mode. But I also don't understand the original logic well enough to approve it within a module which roc owns.
Attachment #793428 -
Flags: review?(roc)
Attachment #793428 -
Flags: review?(avihpit)
Attachment #793428 -
Flags: feedback+
Comment 67 • 11 years ago
Sounds right. smaug's patch should only make a bad situation better.
Flags: needinfo?(tnikkel)
Comment 68 • 11 years ago (Assignee)
Attachment #793428 -
Flags: review?(roc) → review+
Comment 69 • 11 years ago (Assignee)
Comment 70 • 11 years ago
Backed out for causing B2G mochitest-3, mochitest-8 & marionette-webapi failures (the same failures that are visible in the comment 68 try run..! ;-)):
https://tbpl.mozilla.org/php/getParsedLog.php?id=26866728&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=26867398&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=26867015&tree=Mozilla-Inbound
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/e217b4c95302
Comment 71 • 11 years ago (Assignee)
Crap, and odd. That means bugs in other code or in the tests :(
Comment 72 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #69)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ec184d1877cf
For information, this patch caused on AWFY [1] on kraken benchmarks running in FxOS browser.
The Kraken harness that is run can be found at [2] under my user account (it does not start automatically; you have to click Begin to start the benchmark).
[1] http://arewefastyet.com/#machine=14
[2] http://people.mozilla.com/~npierron/kraken/hosted/
Comment 73 • 11 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #72)
> For information, this patch caused on AWFY [1] on kraken benchmarks running
> in FxOS browser.
Caused what on AWFY?
(In reply to Olli Pettay [:smaug] from comment #71)
> crap and odd. Means bugs in other code or in tests :(
Agreed. And considering that:
- This system is supposed to be nothing more than an optimization hint,
- Things broke after changing it in a way which seemed relatively safe,
Unless the things which broke were trivial and/or based on incorrect assumptions and/or on very tight timing, this system could be much more fragile than we thought, which means it could bite us in the future.
Guessing further, it's possible that the logic which is described on comment 63 is actually required for proper functioning of some code. This is kinda sad. I honestly still can't figure out what would require such logic (assuming I interpreted it correctly).
Comment 74 • 11 years ago (Assignee)
This tries to handle the problem in quite a different way. Well, I guess it is trying to fix a different problem: we should never be in perf mode while the user interacts with the browser. In practice that ends up fixing the slow scrolling problem I've seen.
I verified that when running tests, for example (without moving the mouse), or when loading some large or slow-loading pages, we still end up in perf mode.
Also, the stuff in ContentSink is just very broken as far as I can see:
PRIntervalTime is uint32_t, and PR_IntervalToMicroseconds may overflow :/
But I didn't want to change ContentSink's behavior in this patch.
Avih, I think you understand this stuff well enough to give a review ;)
(Not that I know anyone who can really understand the craziness of the perf mode hint.)
https://tbpl.mozilla.org/?tree=Try&rev=c99e0998dea3
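The approach in this comment (never favor Gecko events while the user is interacting) can be sketched as follows (Python purely for illustration; the class name, method names, and the grace period are invented, this is not the actual patch):

```python
class InputAwareFavorPerf:
    """Sketch: favor-perf mode is suppressed while the user interacts."""
    INPUT_GRACE_SECONDS = 1.0  # assumed value for illustration

    def __init__(self):
        self.hint_enabled = False
        self.last_user_input = float("-inf")  # no input seen yet

    def note_user_input(self, now):
        self.last_user_input = now

    def is_favor_perf_mode(self, now):
        if now - self.last_user_input < self.INPUT_GRACE_SECONDS:
            return False  # user is interacting: never starve native events
        return self.hint_enabled

f = InputAwareFavorPerf()
f.hint_enabled = True              # e.g. a page load set the hint
idle = f.is_favor_perf_mode(10.0)  # no recent input: perf mode applies
f.note_user_input(10.5)            # user scrolls
busy = f.is_favor_perf_mode(10.6)  # perf mode suppressed during interaction
```

Note the caveat raised by tn below still applies to any structure like this: automated test runs generate no user input, so they exercise a different code path than real users.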
Attachment #801038 -
Flags: review?(avihpit)
Comment 75 • 11 years ago (Assignee)
Try looks reasonably good.
Comment 76 • 11 years ago (Assignee)
Hmm, or file_screenClientXYConst.html fails on WinXP.
Let me retrigger.
Comment 77 • 11 years ago
So since there is no user input on our testing or talos infrastructure we are basically going to be running different code for real users and for our testing/perf testing.
Comment 78 • 11 years ago (Assignee)
Er, not WinXP but Win7.
Comment 79 • 11 years ago (Assignee)
(In reply to Timothy Nikkel (:tn) from comment #77)
> So since there is no user input on our testing or talos infrastructure we
> are basically going to be running different code for real users and for our
> testing/perf testing.
Well, that already happens now, in certain cases. ContentSink calls FavorPerfHint based on whether
there has been user input.
Comment 80 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #79)
> (In reply to Timothy Nikkel (:tn) from comment #77)
> > So since there is no user input on our testing or talos infrastructure we
> > are basically going to be running different code for real users and for our
> > testing/perf testing.
> Well, that happens already now, in certain cases. ContentSink calls
> FavorPerfHint based on whether
> there has been user input.
True, this just expands on that. What I'm worried about is that I did some testing showing that turning off favor perf mode in the docshell code was a talos hit. So, since probably the majority of page loads are triggered by user input (?), this patch will make all page loads happen without favor perf mode. So users will see a decrease in page load performance that we are essentially hiding from our automated infrastructure.
Comment 81 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #75)
> Try looks reasonable good.
Comparing talos to the latest m-c shows several big regressions and a few improvements: http://compare-talos.mattn.ca/?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
Comparing talos to a revision which is hopefully closer to your base (not easy to find...): http://compare-talos.mattn.ca/?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
That shows far fewer regressions, with the notable ones being: robocop pan is a 300% regression on Android 2.2 and a 35% improvement on 4.0, and svg opacity is a 30% regression on Ubuntu 64. Are these regressions expected in any way?
But I might have got the base cset wrong, so it would be better if you pushed a try run without your patch to use as a reference.
As for (In reply to Timothy Nikkel (:tn) from comment #80)
> True, this just expands on that. What I'm worried about is I did some
> testing that shows that turning off favor perf mode in the docshell code was
> a talos hit. So since probably the majority of page loads are triggered by
> user input (?) this patch will make all page loads happen without favor perf
> mode. So users will see a decrease in page load performance that we are
> essentially hiding from our automated infrastructure.
I think that by now we're expecting this hit and know what it should mean. We should still verify that it means what we think it means.
E.g. tscrollx and tsvgx had better (lower) results when the favor perf hint was not set, and then they "regressed" when I set the hint to 1ms. But if I looked at the screen, before setting the pref the content wasn't actually scrolling or animating, while with the pref set to 1 it was scrolling and animating.
So this is a good verification that the regression in the numbers is not a real regression, since we changed the test to do more (paint as well), which is something we do want to measure; hence the result is just better now than before, even if it appears as a regression.
So IMO we should do the same kind of verification on other tests which show a regression when the favor mode gets disabled/modified.
Comment 82 • 11 years ago
(In reply to Avi Halachmi (:avih) from comment #81)
> Compare talos to latest m-c shows several big regressions and few
> improvements:
> http://compare-talos.mattn.ca/
> ?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
It can be hard to get accurate results with only one talos run before and after.
> As for(In reply to Timothy Nikkel (:tn) from comment #80)
> > True, this just expands on that. What I'm worried about is I did some
> > testing that shows that turning off favor perf mode in the docshell code was
> > a talos hit. So since probably the majority of page loads are triggered by
> > user input (?) this patch will make all page loads happen without favor perf
> > mode. So users will see a decrease in page load performance that we are
> > essentially hiding from our automated infrastructure.
>
> I think that by now we're expecting this hit and know what it should mean.
> We should still verify that it means what we think it means.
If we are expecting this hit and have decided that we want to take it then let's just take it (by killing favor perf mode) and not artificially hide it from our automated performance testing infrastructure.
Comment 83 • 11 years ago
(In reply to Timothy Nikkel (:tn) from comment #82)
> It can be hard to get accurate results with only one talos run before and
> after.
Yup, but the 300% robopan regression seems outside the noise (though it is quite noisy).
> If we are expecting this hit and have decided that we want to take it then
> let's just take it (by killing favor perf mode) and not artificially hide it
> from our automated performance testing infrastructure.
When I said "we" I meant you, smaug, probably roc, and myself, who understand what it means and think (at least I do, but I guess you do as well) that the favor perf mode either causes real bad behavior (users see a hang) or just causes incorrect results compared to what users observe (because it's avoided following some UI events, but is entered within talos).
If we all agree on this, and assuming that ripping favor perf mode out doesn't have other kinds of regressions, then I don't expect objections coming from other places, even if the talos numbers get worse. Do you think otherwise?
And indeed, I suggested to smaug to just rip it out instead of these workarounds. I guess he wants the scroll thing fixed, and ripping it out completely could take longer. I can understand that.
Comment 84 • 11 years ago
Let me rephrase. I think the major point we should get a consensus on is whether or not the intended goal of the favor perf mode is still valid today.
If we agree that it's not a good tradeoff today, because starving OS events for Xs during page load is overall worse than some regression in page load times, then I don't think the talos numbers would be an issue, because they would merely reflect the agreed change in behavior.
Comment 85 • 11 years ago (Assignee)
Uh, that regression is very surprising. I would accept such a regression for users if it improves
responsiveness, but I don't understand why it happens in testing.
I think we should keep perf mode, but if the user is actually interacting with the browser, the user's
input should be processed ASAP.
(But the current situation, where we may end up in perf mode forever, effectively making the
browser non-working, is not acceptable.)
Updated • 11 years ago (Assignee)
Attachment #801038 -
Flags: review?(avihpit)
Comment 86 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #85)
> I think we should keep perf mode, but if user is actually interacting with
> the browser, user's
> input should be processed ASAP.
This is debatable IMO (I don't have a clear stance on it yet). While the goal of rendering fewer intermediate states during page load for the gain of an overall quicker final page rendering is not without merit, the current favor-perf mode (even if not buggy) is not necessarily the best path to that goal IMO.
For instance, we could instead lower the frequency of content flushing during page load, while still flushing chrome as frequently as needed. Also, watching a page load incrementally instead of waiting 2s without any screen updates is not necessarily a bad thing, especially if the overall cost to the final rendering time is small enough.
Also, the current favor-perf mode, even if not buggy and even if disabled on user input (which is questionable, since we don't process user input during this mode, so will we even detect an input event which should abort the mode?), would still hurt animations, which should run regardless of user input.
Comment 87 • 11 years ago
> Olli Pettay [:smaug] 2013-09-07 13:37:27 PDT
> Attachment #801038 [details] [diff] - [canceled] Flags: review?(avihpit@yahoo.com)
I'm not against reviewing and landing this approach for now. Just didn't get to it yet.
Comment 88 • 11 years ago (Assignee)
Approach 3: make sure the hint is reset at some point, at the latest around 6 seconds
after setting it (I just picked 6 as long enough to load some pages, but not so long as to irritate users). In the DocShell case that means we switch to non-perf mode at the latest 6 + 2 seconds from the beginning of the page load.
Unfortunately b2g requires perf mode to be on all the time (fabrice said the
perf was significantly worse with mode '0'; currently it is '2'), so ContentSink
looks a bit ugly.
And the new API isn't super nice either, but at least it is easy to use.
Crossing fingers
https://tbpl.mozilla.org/?tree=Try&rev=b8d3ee8df0db
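The capping approach can be sketched as follows (Python purely for illustration; the class and method names are invented, and only the 6-second cap comes from this comment):

```python
FAVOR_PERF_MAX_SECONDS = 6.0  # the cap picked in the comment above

class CappedFavorPerf:
    """Sketch of approach 3: the hint auto-expires at the latest
    FAVOR_PERF_MAX_SECONDS after being set, so a lost or missing
    'disable' call can no longer wedge the browser in perf mode."""
    def __init__(self):
        self.enabled_at = None  # None means the hint is not set

    def set_hint(self, now):
        self.enabled_at = now

    def clear_hint(self):
        self.enabled_at = None

    def is_active(self, now):
        if self.enabled_at is None:
            return False
        if now - self.enabled_at > FAVOR_PERF_MAX_SECONDS:
            self.enabled_at = None  # guaranteed reset after the cap
            return False
        return True

h = CappedFavorPerf()
h.set_hint(now=0.0)
early = h.is_active(now=3.0)  # within the cap: still favoring perf
late = h.is_active(now=7.0)   # past the cap: forcibly reset
```

The design trade-off is that a well-behaved caller is unaffected (it disables the hint before the cap), while a buggy caller now degrades to at most a few seconds of starved paints instead of a permanently wedged browser.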
Comment 89 • 11 years ago (Assignee)
Comment 90 • 11 years ago
Looks like those latest two try runs are suffering from similar failures as the patch in bug 906811.
But see bug 906811 comment 19 where I say that your first patch in this bug fixes those failures.
That first patch only had 2-3 b2g failures. Maybe those are just flaky tests?
Comment 91 • 11 years ago (Assignee)
Well, all the test failures are bugs in the tests in this case. The tests rely on a certain order
of Gecko and OS events, and that order is somewhat random in practice anyway.
The performance regressions are a different issue.
Comment 92 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #91)
> Well, all the test failures are bugs in tests in this case. Tests rely on
> certain order
> in gecko and OS events, and that order is in any case somewhat random in
> practice.
This is encouraging in this bug, but not in general...
> Performance regressions are different issue.
Yeah, we will need to verify what each of the regressions means, preferably while also looking at the screen during the tests and validating the results.
If there's one thing I already know, it's that we can't trust the numbers blindly, pun intended.
Comment 93 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #88)
> Unfortunately b2g requires perf mode to be on all the time (fabrice said the
> perf was significantly with mode '0', currently it is '2'), so ContentSink
> looks a bit ugly.
We have someone who knows something concrete about the perf implications of favor perf mode?? Hooray!
Comment 94 • 11 years ago
Apparently, robocop pan has been behaving wildly recently and is way too noisy to indicate any regression (you were right, tn). Bug 908823: http://graphs.mozilla.org/graph.html#tests=[[174,64,20]]&sel=none&displayrange=30&datatype=running
Comment 95 • 11 years ago
Since favor perf mode has been removed (bug 930793), can you please retry with your machines?
Flags: needinfo?(bugs)
Comment 96 • 11 years ago (Assignee)
Yup, this is totally fixed now. I'd like to wait a few days before closing this, to get
feedback about bug 930793.
Flags: needinfo?(bugs)
Is this issue something that would currently affect performance on B2G?
After the many comments and bug 930793 being reopened, we're a little unclear on the impact and current status. Can this be closed out?
Flags: needinfo?(bugs)
Priority: -- → P3
Comment 98 • 11 years ago (Assignee)
This should not be closed.
I don't know how this (well, bug 930793) affects b2g.
Flags: needinfo?(bugs)
Comment 99 • 3 years ago (Assignee)
Scrolling is now quite different, and smoother, with APZ.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 3 years ago
Resolution: --- → WORKSFORME