Closed
Bug 822096
Opened 12 years ago
Closed 3 years ago
Scrolling is jank-y
Categories
(Core :: Graphics, defect, P3)
People
(Reporter: smaug, Assigned: smaug)
(Keywords: perf, Whiteboard: [c=handeye p= s= u=])
Attachments
(5 files)
1. 3.75 MB, text/xml
2. 1.58 KB, patch (roc: review+, avih: feedback+)
3. 6.54 KB, patch
4. 20.48 KB, patch
5. 20.54 KB, patch
I believe this is a regression.
Scrolling isn't smooth in a window which has many tabs. Especially pages
which have many images are slow to scroll.
Most of the time is spent under nsRefreshDriver::Tick / nsViewManager::ProcessPendingUpdatesForViews
This is on an ultrabook running Fedora 17.
Could be; I changed a bunch of stuff in the refresh driver here recently. Any suggested steps to reproduce? Does opening up a bunch of tabs (10? 20? 50?) and then going to one that has a lot of images (e.g. a big-picture blog page, or something heavier) produce janky scrolling?
Comment 2 • 12 years ago (Assignee)
I have 35 tabs. Most of them are bugzilla pages, but also some news sites and Facebook and
etherpad. But an odd thing is that the slowness disappears at some point.
After FF restart it is there again, and then after x minutes (I guess x is more than 10) scrolling
is back to normal. Really odd.
When the slowness happens, it is most visible in http://www.mtv3.fi/uutiset/ (Finnish news site)
Updated • 12 years ago
Keywords: regressionwindow-wanted
Comment 3 • 12 years ago (Assignee)
This is more generic. And I see this even in FF18. FF17 is ok. I'll try to find regression range.
I don't know why it shows up in my main profile.
Comment 4 • 12 years ago (Assignee)
I'll try to find the regression range tomorrow.
Comment 5 • 12 years ago (Assignee)
Hmm, perhaps there are a few different regressions, but the really bad one is recent.
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4dfe323a663d&tochange=634180132e68
Comment 6 • 12 years ago (Assignee)
Comment 7 • 12 years ago (Assignee)
(In reply to Olli Pettay [:smaug] from comment #3)
> This is more generic. And I see this even in FF18. FF17 is ok.
So that is not true. Scrolling is better in FF17, but it is still usable in FF18.
Comment 8 • 12 years ago (Assignee)
Ah, so there is also bug 799242 in the regression range.
I would be surprised if this wasn't related to bug 731974, which directly touched the refresh driver (which caused bug 799242 and was fixed the next day). I didn't get a chance to look at this today, but I will in the morning.
One more question -- how are you scrolling? How do you determine slow scrolling? Just moving the scrollbar up and down, using the mouse wheel, spacebar, what? (If wheel, is pixel scrolling/smooth scrolling enabled?)
Comment 10 • 12 years ago (Assignee)
I scroll using touchpad's two finger scrolling. Testcase is to load http://www.mtv3.fi/uutiset/
and try to scroll it down. If first scroll-down takes 0.5 - 1s and is not smooth at all, the bug
is there.
Smooth scrolling is activated.
Comment 11 • 12 years ago (Assignee)
Another testcase is MoCo phonebook. Both the phonebook and http://www.mtv3.fi/uutiset/ have somewhat
similar background (gradient or large image?). I wonder if that has something to do with this.
Hmmm. I can't reproduce this on Windows. Will ask around for a linux laptop.
Can you install the profiler (see https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler) and grab a profile immediately after you try and get the janky scrolling? Then upload it and post the link here. It might give us some info.
Comment 14 • 12 years ago
I can't reproduce on Linux x86-64, NVIDIA driver, default profile, opening a few dozen bugzilla tabs and the testcase in comment 11.
Comment 15 • 12 years ago (Assignee)
Comment 16 • 12 years ago (Assignee)
Don't even try to open that profile in a browser. It should be opened using sysprof.
Comment 17 • 12 years ago
This page uses a fixed-position background that isn't on the css canvas (it's on a div on top).
We don't ever put these into their own layer (because of their potential to break async scrolling - they don't necessarily cover the entire viewport), and instead have to always repaint everything on scroll since the text has moved relative to the background gradient.
I think the regression range here only applies to BasicLayers, where bug 806099 changed how we choose the active scrolled root for flattened layers. We might have been getting away with only repainting the content and not the background previously.
Comment 18 • 12 years ago (Assignee)
The laptop on which I see the problem is an ultrabook. about:support says:
Adapter Description: Tungsten Graphics, Inc -- Mesa DRI Intel(R) Ivybridge Mobile
Device ID: Mesa DRI Intel(R) Ivybridge Mobile
Driver Version: 2.1 Mesa 8.0.4
GPU Accelerated Windows: 0/2 Basic
AzureCanvasBackend: cairo
AzureContentBackend: none
AzureFallbackCanvasBackend: none
Does that mean Gecko ends up using BasicLayers?
Comment 19 • 12 years ago (Assignee)
Looks like bug 806099 can't be backed out cleanly.
Comment 20 • 12 years ago (Assignee)
Ah, maybe the checked-in patch can be backed out, rather than the patch attached to the bug.
Comment 21 • 12 years ago (Assignee)
But bug 806099 doesn't seem to have caused the problem.
Comment 22 • 12 years ago
It doesn't? That is unexpected.
Probably worth doing a local bisect to find out what actually caused this.
Comment 23 • 12 years ago (Assignee)
Yeah, I'm trying to find what caused the regression.
Compiling takes time on this machine.
On a whim, can you try the build in http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vladimir@pobox.com-fadac9dbe2a8/ ?
Comment 25 • 12 years ago (Assignee)
That tryserver build doesn't behave any better :(
I'm compiling new builds ...
Comment 26 • 12 years ago (Assignee)
Ok, finally: it is bug 799242 which caused this.
I haven't yet debugged what is happening.
Blocks: 799242
bug 799242? not bug 731974? I don't think 799242 ever landed by itself even...
Comment 28 • 12 years ago (Assignee)
Er, sorry, bug 731974. Copy pasted the wrong bug number from the commit message.
Ok; not too surprising that that bug causes it.. but I still can't reproduce here :( I would do any testing along with the patch from bug 823091 though.
Comment 30 • 12 years ago (Assignee)
So the bug is odd. I start FF with some tabs, one of which is http://www.mtv3.fi/uutiset/
The jankiness may not start immediately while the session is being restored
(I have the setting to load all the tabs after restart), but once all the tabs have been loaded,
scrolling is very slow. It is perhaps slow on all pages, but especially on http://www.mtv3.fi/uutiset/
Then, much later, the scrolling may get smooth again, or may not.
Comment 31 • 12 years ago (Assignee)
I don't know what has fixed this, but I haven't seen it for a few days.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Comment 32 • 12 years ago (Assignee)
Er, very much still there with https://phonebook.mozilla.org/tree.php
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 33 • 12 years ago (Assignee)
And once the problem starts, it affects all pages to some extent.
Is it possible that the refresh driver goes mad?
When the problem happens, http://people.mozilla.org/~wmccloskey/example1.html reports
normal numbers, but painting is very janky; the red dot moves only once a second or so.
Updated • 12 years ago (Assignee)
tracking-firefox20:
--- → ?
tracking-firefox21:
--- → ?
Summary: Scrolling is jank-y when the window has many tabs → Scrolling is jank-y
Log calls to nsRefreshDriver::Tick and make sure it's running frequently enough?
Comment 35 • 12 years ago (Assignee)
Based on http://people.mozilla.org/~wmccloskey/example1.html it is running (once the page is painted,
the black line shows the usual numbers, so the animation frame callback got called normally).
Log nsWindow paint (NS_PAINT) events to make sure the window is being redrawn regularly too?
Updated • 12 years ago
status-firefox20:
--- → affected
status-firefox21:
--- → affected
Comment 37 • 12 years ago (Assignee)
I get this on another linux laptop too. Doesn't happen too often, but happens.
Roc, how can I log nsWindow paint events?
instrument nsWindow::OnPaint in nsWindowGfx?
Comment 39 • 12 years ago
Given how hard this is to reproduce and that it seems to only affect Linux, we'll remove tracking but uplift nominations when a fix is found would be welcome.
Updated • 12 years ago (Assignee)
Keywords: regressionwindow-wanted
Comment 40 • 12 years ago (Assignee)
One thing I noticed today when profiling is that when the browser entered a bad state, things
under gfxAlphaBoxBlur::Paint started to page fault a lot.
Low on memory?
Comment 42 • 12 years ago
(In reply to Olli Pettay [:smaug] from comment #0)
> I believe this is a regression.
> Scrolling isn't smooth in a window which has many tabs. Especially pages
> which have many images are slow to scroll.
> Most of the time is spent under nsRefreshDriver::Tick /
> nsViewManager::ProcessPendingUpdatesForViews
>
> This is on an ultrabook running Fedora 17.
Could you please refine your description of "janky" and "slow"? Would you say that it seems like the refresh rate has gone down (say, to 10-20 fps) but otherwise distance/sec didn't change? or does it introduce hangs (200ms? 500ms? ...)? or does the scroll distance get shorter compared to when it's working well? What's the cpu/cores utilization while scrolling is janky?
I'm asking since I possibly noticed a similar issue (on windows xp, 7), though it started quite some time ago, probably at least 2 years back. My symptom was that the actual refresh rate would suddenly go down to 10-20 fps, on pages where I know it can scroll much closer to 60 (it was mostly happening to me on a specific vBulletin forum). And to me as well, I had many (20+) tabs open, it was intermittent, and restarting the browser solved it.
While I haven't experienced it recently, your symptoms do remind me of mine.
Comment 43 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #41)
> Low on memory?
No, which is why the profile was a bit odd. But that is what sysprof told me... not sure how reliable it is in this case.
This is also happening (rarely) on this fast i7-Qx laptop, running Fedora 18.
When the problem happens, the fps on pages like http://www.mtv3.fi/uutiset/ drops to perhaps 2.
Comment 44 • 12 years ago (Assignee)
For some reason this has happened 5 times today.
Comment 45 • 12 years ago (Assignee)
Got it again. CPU doesn't seem to be high. So, since this is a regression from the RefreshDriver changes,
I assume we somehow get into a state where the RefreshDriver fires far too rarely.
Comment 46 • 12 years ago (Assignee)
Or, no: based on http://people.mozilla.org/~wmccloskey/example1.html, the animation frame callback is
called often enough, but painting doesn't happen every time.
try comment #38?
Comment 48 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> instrument nsWindow::OnPaint in nsWindowGfx?
what kind of instrumentation would be useful?
(I'm getting the problem again somewhat regularly today.)
Comment 49 • 12 years ago
Number of milliseconds between paints maybe?
That would work. Or just printf something when you hit OnPaint and see if, while you see rendering frozen, we're still hitting OnPaint or not.
Comment 51 • 12 years ago (Assignee)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> instrument nsWindow::OnPaint in nsWindowGfx?
I assume not there, since that is a Windows thingie, and I'm seeing this on linux.
Comment 52 • 12 years ago (Assignee)
nsWindow::OnExposeEvent might be the right place.
Comment 53 • 12 years ago
(In reply to Olli Pettay [:smaug] from comment #52)
> nsWindow::OnExposeEvent might be the right place.
Yeah, that's right.
Comment 54 • 12 years ago (Assignee)
Managed to finally reproduce with an instrumented build.
So we end up painting once or twice a second, but the top-level refresh driver enters some kind of loop
where it runs all the time (every 16 ms).
Need to figure out why we call the refresh timer all the time, yet don't paint.
Check whether nsWindow::Invalidate is getting called often enough.
Comment 56 • 12 years ago
I think I have now seen this, on Linux too, first time I've ever seen it though.
Comment 57 • 12 years ago (Assignee)
I've been running a build with some more instrumentation for about a week now, but
the problem hasn't happened again. I'll keep using that build.
http://www.mtv3.fi/uutiset/ is still the testcase for me. Once scrolling on that page
becomes very slow, I know I'm seeing the problem.
Comment 58 • 12 years ago (Assignee)
Got it again. Managed to get into a state where RefreshDriver and Invalidate were called 150 times before things were actually painted. So RefreshDriver and invalidation seem to be working fine.
Painting itself seems to be fast.
Do we somehow not process messages from gtk2/X?
Maybe we need some kind of X event monitoring at this point...
Comment 60 • 12 years ago
Invalidation is independent from X events. It is managed by GDK which schedules
GDK expose events at a lower than default priority so as to coalesce responses
to native X11 input and XExpose events. XPCOM events run at default priority,
so OnExpose() will only be called if XPCOM has no events pending.
If the next tick event is scheduled before the current ticks are run [1], then running the ticks can take up all the available time before the next tick event,
and the GTK expose events don't get to run.
That scenario doesn't sound consistent with the low CPU use reported in
comment 45, but perhaps the page faulting explains that.
https://hg.mozilla.org/mozilla-central/rev/513ec84b5c88#l1.146
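The priority scheme described here can be illustrated with a toy event loop (Python purely for illustration; the numeric priorities and the always-rescheduling "tick" are invented to show the starvation mechanism, they are not Gecko's actual values):

```python
import heapq

# Toy model: XPCOM events run at default priority, GDK expose (paint)
# events at a lower priority, so expose handlers only run when no
# default-priority event is pending.
DEFAULT_PRIORITY = 0  # XPCOM events (e.g. refresh driver ticks)
EXPOSE_PRIORITY = 1   # GDK expose events (coalesced paints)

def run_loop(events, iterations):
    """Pop events in priority order. A tick that always schedules the
    next tick keeps a default-priority event pending forever."""
    heapq.heapify(events)
    processed = []
    for _ in range(iterations):
        if not events:
            break
        _prio, name = heapq.heappop(events)
        processed.append(name)
        if name == "tick":
            # The refresh driver schedules the next tick immediately.
            heapq.heappush(events, (DEFAULT_PRIORITY, "tick"))
    return processed

log = run_loop([(DEFAULT_PRIORITY, "tick"), (EXPOSE_PRIORITY, "expose")], 10)
# The expose event is starved: every iteration is taken by the next tick.
```

This matches the observation in comment 58: invalidation and ticks run constantly, yet OnExposeEvent rarely fires.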
Comment 61 • 11 years ago
Bug 880036 shows paint handler starvation when consecutive refresh driver iterations take (even slightly) more than the optimal refresh interval to complete. The result is no screen updates until the refresh driver ticks are quick again, but I observed such a total freeze for at most 2 seconds, not intermittent freezes for minutes as with this bug.
This bug could be a similar case, or some pattern of refresh driver intervals which ends up with only one in every few paint events handled on time.
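That starvation condition (ticks overrunning the frame interval) can be expressed as a tiny model (Python purely for illustration; 16 ms stands in for the 60 Hz frame budget, and the all-or-nothing cliff is a simplification):

```python
def paints_per_second(tick_duration_ms, frame_interval_ms=16.0):
    """Paint opportunities per second: the paint handler only runs in
    the idle gap after a tick finishes and before the next tick is due.
    If every tick overruns the interval, that gap never exists."""
    if tick_duration_ms >= frame_interval_ms:
        return 0  # back-to-back ticks: paint events are starved entirely
    return int(1000.0 / frame_interval_ms)

fast = paints_per_second(10.0)  # ticks fit the 16 ms budget
slow = paints_per_second(17.0)  # each tick overruns by just 1 ms
```

The point of the model is the cliff: even a 1 ms overrun per tick takes paints from roughly 60 per second to zero, which is consistent with the "ticks run every 16 ms but we paint once a second" symptom above.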
Comment 62 • 11 years ago (Assignee)
Finally figured it out. Looks like the responsiveness can be really horrible if
the mFavorPerf counting goes wrong.
There are plans to properly fix or remove the favor-perf hint, but could we
perhaps take this kind of patch for now, in order to guarantee that we don't
end up executing the favor-perf-mode 'else' branch all the time?
I added 'limit' to cover the ContentSink case, which may pass 0 as the starvation delay.
(All the favor-perf handling is pretty completely broken. I can see use cases for it,
but it needs to be rewritten.)
https://tbpl.mozilla.org/?tree=Try&rev=f221b43da133
Assignee: nobody → bugs
Attachment #793428 -
Flags: review?(avihpit)
Comment 63 • 11 years ago
I'm documenting the favor perf thingy as much as I understand it, so it's easier to follow and modify/remove.
1. favorPerf mode means that handling Gecko events is preferred and native events are starved (or processed much less). Native events are OS events, including user inputs like mouse and keyboard events, paint events, etc.
2. The method FavorPerformanceHint(bool enable, uint duration) only modifies member vars, as follows:
- Registers "duration" and forgets any previously used/registered duration (mStarvationDelay).
- Increases a counter if enable==true, or decreases this counter otherwise (mFavorPerf).
- Registers the last timestamp at which this method was called with enable==false (mSwitchTime).
3. Within OnProcessNextEvent(...), the values from (2) are interpreted as follows (isFavorPerfMode doesn't actually exist in the code, but it helps to understand the logic as a pseudo variable):
isFavorPerfMode = !(counter <= 0 && now > lastDisabledTimestamp + duration)
or in other words:
isFavorPerfMode = counter > 0 || less than duration has passed since FavorPerformanceHint was called with enable=false.
So far, we can tell a few things from this (too twisted for my taste) logic:
- Calls to FavorPerformanceHint should be symmetric. If it's enabled more times than disabled, it will get stuck in enabled mode regardless of the duration hint (since counter > 0 forever).
- Regardless of the counter (mFavorPerf), if FavorPerformanceHint was called with enable=false, it actually enables favorPerf mode for mStarvationDelay.
Once isFavorPerfMode is determined, it acts upon it as follows (within OnProcessNextEvent(...) ):
if (!isFavorPerfMode) { // normal mode
  start = now();
  do {
    hasMoreEvents = process native event();
  } while (hasMoreEvents && now() - start < 20ms);
} else { // starve native events
  if (it's been more than 20ms since we last processed a native event) {
    process one native event;
  }
}
[ more stuff happens later ]
Both of these two different usages of 20ms are the same variable "limit", which is set to THREAD_EVENT_STARVATION_LIMIT which is 20ms.
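The bookkeeping described in (2) and (3) can be turned into a small executable sketch (Python purely for illustration; a simplification of the described logic, not the actual Gecko code). It demonstrates the stuck-enabled case that follows from asymmetric enable/disable calls:

```python
class FavorPerfModel:
    """Toy model of the favor-perf bookkeeping as described above."""
    def __init__(self):
        self.mFavorPerf = 0         # net enable/disable counter
        self.mSwitchTime = 0.0      # last time the hint was disabled
        self.mStarvationDelay = 0.0

    def favor_performance_hint(self, enable, duration, now):
        self.mStarvationDelay = duration
        if enable:
            self.mFavorPerf += 1
        else:
            self.mFavorPerf -= 1
            self.mSwitchTime = now

    def is_favor_perf_mode(self, now):
        # Favor-perf unless the counter is drained AND the grace period
        # after the last disable has elapsed.
        return not (self.mFavorPerf <= 0 and
                    now > self.mSwitchTime + self.mStarvationDelay)

m = FavorPerfModel()
m.favor_performance_hint(True, 2.0, now=0.0)   # enabled twice ...
m.favor_performance_hint(True, 2.0, now=0.1)
m.favor_performance_hint(False, 2.0, now=0.2)  # ... but disabled once
stuck = m.is_favor_perf_mode(now=1000.0)       # counter is still 1: stuck on
m.favor_performance_hint(False, 2.0, now=0.3)  # the matching disable
freed = not m.is_favor_perf_mode(now=1000.0)   # now the mode can expire
```

With one missing disable call, `is_favor_perf_mode` stays true forever, which is exactly the wedged state this bug hit.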
My eyes bleed... I hope that the original code knew what it wanted to do, but I can't figure out what it was.
The suggested patch changes are:
1. Remember mSwitchTime on every call to FavorPerformanceHint rather than only when disabling it.
2. Change the logic to: isFavorPerfMode = counter > 0 && less than the duration hint has passed since any call to FavorPerformanceHint.
So it should definitely limit the cases where favorPerf is true, and will also prevent it from getting stuck always-on. But since I still can't figure out what the original code wanted to do, and with some vague assumption that this favorPerf mode has some value, I can't predict what would be affected by this change, and how (other than the obviously more limited favorPerf mode).
tn, any insight on this?
Flags: needinfo?(tnikkel)
Comment 64 • 11 years ago (Assignee)
The patch in this bug just tries to remove the possibility of getting the browser into a state
where it is effectively not working at all. The patch tries to be relatively conservative.
I hope we can fix the favorPerf hint in some other bug (and perhaps land such changes early in a cycle).
Comment 65 • 11 years ago (Assignee)
Comment 66 • 11 years ago
Comment on attachment 793428 [details] [diff] [review]
patch
Review of attachment 793428 [details] [diff] [review]:
-----------------------------------------------------------------
Overall, considering what we suspect, I agree that the patch is conservative and would prevent bad cases of being stuck in favor perf mode. But I also don't understand the original logic well enough to approve it within a module which roc owns.
Attachment #793428 -
Flags: review?(roc)
Attachment #793428 -
Flags: review?(avihpit)
Attachment #793428 -
Flags: feedback+
Comment 67 • 11 years ago
Sounds right. smaug's patch should only make a bad situation better.
Flags: needinfo?(tnikkel)
Comment 68 • 11 years ago (Assignee)
Attachment #793428 -
Flags: review?(roc) → review+
Comment 69 • 11 years ago (Assignee)
Comment 70 • 11 years ago
Backed out for causing B2G mochitest-3, mochitest-8 & marionette-webapi failures (the same failures that are visible in the comment 68 try run..! ;-)):
https://tbpl.mozilla.org/php/getParsedLog.php?id=26866728&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=26867398&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=26867015&tree=Mozilla-Inbound
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/e217b4c95302
Comment 71 • 11 years ago (Assignee)
Crap, and odd. That means bugs in other code or in the tests :(
Comment 72 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #69)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ec184d1877cf
For information, this patch caused on AWFY [1] on kraken benchmarks running in FxOS browser.
The Kraken harness that is run can be found at [2] under my user account (it does not start automatically; you have to click Begin to start the benchmark).
[1] http://arewefastyet.com/#machine=14
[2] http://people.mozilla.com/~npierron/kraken/hosted/
Comment 73 • 11 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #72)
> For information, this patch caused on AWFY [1] on kraken benchmarks running
> in FxOS browser.
Caused what on AWFY?
(In reply to Olli Pettay [:smaug] from comment #71)
> crap and odd. Means bugs in other code or in tests :(
Agreed. And considering that:
- This system is supposed to be nothing more than an optimization hint,
- Things broke after changing it in a way which seemed relatively safe,
Unless the things which broke were trivial and/or based on incorrect assumptions and/or on very tight timing, this system could be much more fragile than we thought, which means it could bite us in the future.
Guessing further, it's possible that the logic which is described on comment 63 is actually required for proper functioning of some code. This is kinda sad. I honestly still can't figure out what would require such logic (assuming I interpreted it correctly).
Comment 74 • 11 years ago (Assignee)
This tries to handle the problem in quite a different way. Well, I guess it is trying to fix a different problem: we should never be in perf mode while the user interacts with the browser. In practice that ends up fixing the slow scrolling problem I've seen.
I verified that when running tests, for example (without moving the mouse), or when loading some large or slow-loading pages, we still end up in perf mode.
Also, the stuff in ContentSink is just very broken as far as I can see:
PRIntervalTime is uint32_t, and PR_IntervalToMicroseconds may overflow :/
But I didn't want to change ContentSink's behavior in this patch.
Avih, I think you understand this stuff well enough to give a review ;)
(Not that I know anyone who can really understand the craziness of the perf mode hint.)
https://tbpl.mozilla.org/?tree=Try&rev=c99e0998dea3
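The approach in this comment (never favor Gecko events while the user is interacting) can be sketched as follows (Python purely for illustration; the class name, method names, and the grace period are invented, this is not the actual patch):

```python
class InputAwareFavorPerf:
    """Sketch: favor-perf mode is suppressed while the user interacts."""
    INPUT_GRACE_SECONDS = 1.0  # assumed value for illustration

    def __init__(self):
        self.hint_enabled = False
        self.last_user_input = float("-inf")  # no input seen yet

    def note_user_input(self, now):
        self.last_user_input = now

    def is_favor_perf_mode(self, now):
        if now - self.last_user_input < self.INPUT_GRACE_SECONDS:
            return False  # user is interacting: never starve native events
        return self.hint_enabled

f = InputAwareFavorPerf()
f.hint_enabled = True              # e.g. a page load set the hint
idle = f.is_favor_perf_mode(10.0)  # no recent input: perf mode applies
f.note_user_input(10.5)            # user scrolls
busy = f.is_favor_perf_mode(10.6)  # perf mode suppressed during interaction
```

Note the caveat raised by tn below still applies to any structure like this: automated test runs generate no user input, so they exercise a different code path than real users.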
Attachment #801038 -
Flags: review?(avihpit)
Comment 75 • 11 years ago (Assignee)
Try looks reasonably good.
Comment 76 • 11 years ago (Assignee)
Hmm, or file_screenClientXYConst.html fails on WinXP.
Let me retrigger.
Comment 77 • 11 years ago
So since there is no user input on our testing or talos infrastructure we are basically going to be running different code for real users and for our testing/perf testing.
Comment 78 • 11 years ago (Assignee)
Er, not WinXP but Win7.
Comment 79 • 11 years ago (Assignee)
(In reply to Timothy Nikkel (:tn) from comment #77)
> So since there is no user input on our testing or talos infrastructure we
> are basically going to be running different code for real users and for our
> testing/perf testing.
Well, that already happens now, in certain cases. ContentSink calls FavorPerfHint based on whether
there has been user input.
Comment 80 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #79)
> (In reply to Timothy Nikkel (:tn) from comment #77)
> > So since there is no user input on our testing or talos infrastructure we
> > are basically going to be running different code for real users and for our
> > testing/perf testing.
> Well, that happens already now, in certain cases. ContentSink calls
> FavorPerfHint based on whether
> there has been user input.
True, this just expands on that. What I'm worried about is that I did some testing showing that turning off favor perf mode in the docshell code was a talos hit. So, since probably the majority of page loads are triggered by user input (?), this patch will make all page loads happen without favor perf mode. So users will see a decrease in page load performance that we are essentially hiding from our automated infrastructure.
Comment 81 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #75)
> Try looks reasonable good.
Comparing talos to the latest m-c shows several big regressions and a few improvements: http://compare-talos.mattn.ca/?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
Comparing talos to a revision which is hopefully closer to your base (not easy to find...): http://compare-talos.mattn.ca/?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
That shows far fewer regressions, with the notable ones being: robocop pan is a 300% regression on Android 2.2 and a 35% improvement on 4.0, and svg opacity is a 30% regression on Ubuntu 64. Are these regressions expected in any way?
But I might have got the base cset wrong, so it would be better if you pushed a try run without your patch to use as a reference.
As for (In reply to Timothy Nikkel (:tn) from comment #80)
> True, this just expands on that. What I'm worried about is I did some
> testing that shows that turning off favor perf mode in the docshell code was
> a talos hit. So since probably the majority of page loads are triggered by
> user input (?) this patch will make all page loads happen without favor perf
> mode. So users will see a decrease in page load performance that we are
> essentially hiding from our automated infrastructure.
I think that by now we're expecting this hit and know what it should mean. We should still verify that it means what we think it means.
E.g. tscrollx and tsvgx had better (lower) results when the favor perf hint was not set, and then they "regressed" when I set the hint to 1ms. But if I looked at the screen, before setting the pref the content wasn't actually scrolling or animating, while with the pref set to 1 it was scrolling and animating.
So this is a good verification that the regression in the numbers is not a real regression, since we changed the test to do more (paint as well), which is something we do want to measure; hence the result is just better now than before, even if it appears as a regression.
So IMO we should do the same kind of verification on other tests which show a regression when the favor mode gets disabled/modified.
Comment 82 • 11 years ago
(In reply to Avi Halachmi (:avih) from comment #81)
> Compare talos to latest m-c shows several big regressions and few
> improvements:
> http://compare-talos.mattn.ca/
> ?oldRevs=655ac375b1c7&newRev=c99e0998dea3&submit=true
It can be hard to get accurate results with only one talos run before and after.
> As for(In reply to Timothy Nikkel (:tn) from comment #80)
> > True, this just expands on that. What I'm worried about is I did some
> > testing that shows that turning off favor perf mode in the docshell code was
> > a talos hit. So since probably the majority of page loads are triggered by
> > user input (?) this patch will make all page loads happen without favor perf
> > mode. So users will see a decrease in page load performance that we are
> > essentially hiding from our automated infrastructure.
>
> I think that by now we're expecting this hit and know what it should mean.
> We should still verify that it means what we think it means.
If we are expecting this hit and have decided that we want to take it then let's just take it (by killing favor perf mode) and not artificially hide it from our automated performance testing infrastructure.
Comment 83 • 11 years ago
(In reply to Timothy Nikkel (:tn) from comment #82)
> It can be hard to get accurate results with only one talos run before and
> after.
Yup, but the 300% robopan regression seems outside the noise (though it is quite noisy).
> If we are expecting this hit and have decided that we want to take it then
> let's just take it (by killing favor perf mode) and not artificially hide it
> from our automated performance testing infrastructure.
When I said "we" I meant you, smaug, probably roc, and myself, who understand what it means and think (at least I do, but I guess you do as well) that the favor perf mode either causes real bad behavior (users see a hang) or just causes incorrect results compared to what users observe (because it's avoided following some UI events, but is entered within talos).
If we all agree on this, and assuming that ripping favor perf mode out doesn't have other kinds of regressions, then I don't expect objections coming from other places, even if the talos numbers get worse. Do you think otherwise?
And indeed, I suggested to smaug to just rip it out instead of these workarounds. I guess he wants the scroll thing fixed, and ripping it out completely could take longer. I can understand that.
Comment 84 • 11 years ago
Let me rephrase. I think the major point we should get a consensus on is whether or not the intended goal of the favor perf mode is still valid today.
If we agree that it's not a good tradeoff today, because starving OS events for Xs during page load is overall worse than some regression in page load times, then I don't think the talos numbers would be an issue, because they would merely reflect the agreed change in behavior.
Comment 85 • 11 years ago (Assignee)
Uh, that regression is very surprising. I would accept such a regression for users if it improves
responsiveness, but I don't understand why it happens in testing.
I think we should keep perf mode, but if the user is actually interacting with the browser, the user's
input should be processed ASAP.
(But the current situation, where we may end up in perf mode forever, effectively making the
browser non-working, is not acceptable.)
Updated • 11 years ago (Assignee)
Attachment #801038 -
Flags: review?(avihpit)
Comment 86 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #85)
> I think we should keep perf mode, but if user is actually interacting with
> the browser, user's
> input should be processed ASAP.
This is debatable IMO (I don't have a clear stance on it yet). While the goal of rendering fewer intermediate states during page load for the gain of an overall quicker final page rendering is not without merit, the current favor-perf mode (even if not buggy) is not necessarily the best path to that goal IMO.
For instance, we could instead lower the frequency of content flushing during page load, while still flushing chrome as frequently as needed. Also, watching a page load incrementally instead of waiting 2s without any screen updates is not necessarily a bad thing, especially if the overall cost to the final rendering time is small enough.
Also, the current favor-perf mode, even if not buggy and even if disabled on user input (which is questionable, since we don't process user input during this mode, so will we even detect an input event which should abort the mode?), would still hurt animations, which should run regardless of user input.
Comment 87 • 11 years ago
> Olli Pettay [:smaug] 2013-09-07 13:37:27 PDT
> Attachment #801038 [details] [diff] - [canceled] Flags: review?(avihpit@yahoo.com)
I'm not against reviewing and landing this approach for now. Just didn't get to it yet.
Comment 88 • 11 years ago (Assignee)
Approach 3: make sure the hint is reset at some point, at the latest around 6 seconds
after setting it (I just picked 6 as long enough to load some pages, but not so long as to irritate users). In the DocShell case that means we switch to non-perf mode at the latest 6 + 2 seconds from the beginning of the page load.
Unfortunately b2g requires perf mode to be on all the time (fabrice said the
perf was significantly worse with mode '0'; currently it is '2'), so ContentSink
looks a bit ugly.
And the new API isn't super nice either, but at least it is easy to use.
Crossing fingers
https://tbpl.mozilla.org/?tree=Try&rev=b8d3ee8df0db
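The capping approach can be sketched as follows (Python purely for illustration; the class and method names are invented, and only the 6-second cap comes from this comment):

```python
FAVOR_PERF_MAX_SECONDS = 6.0  # the cap picked in the comment above

class CappedFavorPerf:
    """Sketch of approach 3: the hint auto-expires at the latest
    FAVOR_PERF_MAX_SECONDS after being set, so a lost or missing
    'disable' call can no longer wedge the browser in perf mode."""
    def __init__(self):
        self.enabled_at = None  # None means the hint is not set

    def set_hint(self, now):
        self.enabled_at = now

    def clear_hint(self):
        self.enabled_at = None

    def is_active(self, now):
        if self.enabled_at is None:
            return False
        if now - self.enabled_at > FAVOR_PERF_MAX_SECONDS:
            self.enabled_at = None  # guaranteed reset after the cap
            return False
        return True

h = CappedFavorPerf()
h.set_hint(now=0.0)
early = h.is_active(now=3.0)  # within the cap: still favoring perf
late = h.is_active(now=7.0)   # past the cap: forcibly reset
```

The design trade-off is that a well-behaved caller is unaffected (it disables the hint before the cap), while a buggy caller now degrades to at most a few seconds of starved paints instead of a permanently wedged browser.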
Comment 89 • 11 years ago (Assignee)
Comment 90 • 11 years ago
Looks like those latest two try runs are suffering from similar failures as the patch in bug 906811.
But see bug 906811 comment 19 where I say that your first patch in this bug fixes those failures.
That first patch only had 2-3 b2g failures. Maybe those are just flaky tests?
Comment 91 • 11 years ago (Assignee)
Well, all the test failures are bugs in the tests in this case. The tests rely on a certain order
of Gecko and OS events, and that order is somewhat random in practice anyway.
The performance regressions are a different issue.
Comment 92 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #91)
> Well, all the test failures are bugs in tests in this case. Tests rely on
> certain order
> in gecko and OS events, and that order is in any case somewhat random in
> practice.
This is encouraging in this bug, but not in general...
> Performance regressions are different issue.
Yeah, we will need to verify what each of the regressions means, preferably while also looking at the screen during the tests and validating the results.
If there's one thing I already know, it's that we can't trust the numbers blindly, pun intended.
Comment 93 • 11 years ago
(In reply to Olli Pettay [:smaug] from comment #88)
> Unfortunately b2g requires perf mode to be on all the time (fabrice said the
> perf was significantly with mode '0', currently it is '2'), so ContentSink
> looks a bit ugly.
We have someone who knows something concrete about the perf implications of favor perf mode?? Hooray!
Comment 94 • 11 years ago
Apparently, robocop pan has been behaving wildly recently and is way too noisy to indicate any regression (you were right, tn). Bug 908823: http://graphs.mozilla.org/graph.html#tests=[[174,64,20]]&sel=none&displayrange=30&datatype=running
Comment 95 • 11 years ago
Since favor perf mode has been removed (bug 930793), can you please retry with your machines?
Flags: needinfo?(bugs)
Comment 96 • 11 years ago (Assignee)
Yup, this is totally fixed now. I'd like to wait a few days before closing this, to get
feedback about bug 930793.
Flags: needinfo?(bugs)
Is this issue something that would currently affect performance on B2G?
After the many comments and bug 930793 being reopened, we're a little unclear on the impact and current status. Can this be closed out?
Flags: needinfo?(bugs)
Priority: -- → P3
Comment 98 • 11 years ago (Assignee)
This should not be closed.
I don't know how this (well, bug 930793) affects b2g.
Flags: needinfo?(bugs)
Comment 99 • 3 years ago (Assignee)
Scrolling is now quite different, and smoother, with APZ.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 3 years ago
Resolution: --- → WORKSFORME