Closed Bug 870294 Opened 12 years ago Closed 7 years ago

Change compositor thread for stable FPS

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sinker, Unassigned)

Details

Attachments

(3 files, 1 obsolete file)

Compositor is running with normal priority, it means to compete with other threads for CPU. It should be with real-time priority (SCHED_RR for Linux) to get more stable frame rate (stable interval between consequence frames). And, frame rate of CompositorParent and RefreshDriver is not consistent (60fps vs 66.66...fps). It is also a source of unstable FPS. CompositorParent keep tracking dispatching time of last frame to make sure frames being played with a stable FPS. But, the point of keeping timestamp is too far from the code that make decision of dispatching. It is also a source of unstable FPS. The attached patch will fix these 3 issues to get more stable FPS. I suggest to create followup bugs for synchronizing with composition for input and timer events. They are also very helpful for getting stable FPS.
Attachment #747332 - Flags: feedback?(justin.lebar+bug)
Fix value of mLastSchedule.
Attachment #747332 - Attachment is obsolete: true
Attachment #747332 - Flags: feedback?(justin.lebar+bug)
Attachment #747366 - Flags: feedback?(justin.lebar+bug)
Comment on attachment 747332 [details] [diff] [review] Change compositor thread priority and changes for stable frame rate Review of attachment 747332 [details] [diff] [review]: ----------------------------------------------------------------- Drive by.. ::: gfx/layers/ipc/CompositorParent.cpp @@ +106,5 @@ > return true; > } > sCompositorThreadRefCount = 1; > sCompositorThread = new Thread("Compositor"); > + Thread::Options options(MessageLoop::TYPE_DEFAULT, 0, 1); Add comments about the parameters. @@ +411,5 @@ > mCurrentCompositeTask = NewRunnableMethod(this, &CompositorParent::Composite); > > // Since 60 fps is the maximum frame rate we can acheive, scheduling composition > // events less than 15 ms apart wastes computation.. > + if (!initialSchedule && delta.ToMilliseconds() < 15) { Should we change this to 16.67 instead?
Attachment #747332 - Attachment is obsolete: false
Attachment #747332 - Flags: feedback?(nical.bugzilla)
Attachment #747332 - Attachment is obsolete: true
Attachment #747332 - Flags: feedback?(nical.bugzilla)
Attachment #747366 - Flags: feedback?(nical.bugzilla)
(In reply to Kan-Ru Chen [:kanru] from comment #2) > Comment on attachment 747332 [details] [diff] [review] > Change compositor thread priority and changes for stable frame rate > > Review of attachment 747332 [details] [diff] [review]: > ----------------------------------------------------------------- > > Drive by.. > > ::: gfx/layers/ipc/CompositorParent.cpp > @@ +106,5 @@ > > return true; > > } > > sCompositorThreadRefCount = 1; > > sCompositorThread = new Thread("Compositor"); > > + Thread::Options options(MessageLoop::TYPE_DEFAULT, 0, 1); > > Add comments about the parameters. > > @@ +411,5 @@ > > mCurrentCompositeTask = NewRunnableMethod(this, &CompositorParent::Composite); > > > > // Since 60 fps is the maximum frame rate we can acheive, scheduling composition > > // events less than 15 ms apart wastes computation.. > > + if (!initialSchedule && delta.ToMilliseconds() < 15) { > > Should we change this to 16.67 instead? I think it is int only. So, 16ms is more closed to 60fps.
This looks good to me, Although I don't know for sure what's the policy when it comes to modifying the chromium threads code (since we fetch upstream changes from time to time). Usually we try to not modify whenever possible. Not sure who to ask about this though.
Comment on attachment 747366 [details] [diff] [review] Change compositor thread priority and changes for stable frame rate, v2 Maybe glandium can comment on this. I don't know enough about Linux scheduling to say whether this is a good idea or not.
Attachment #747366 - Flags: feedback?(mh+mozilla)
Attachment #747366 - Flags: feedback?(nical.bugzilla)
Inserts RefreshDriver at the head of the event queue of the main thread. It makes sure that the main thread always run RefreshDriver before other tasks.
This patch log last recent calling of RefreshDriver. |tickcosts| remembers running time of ticks of RefreshDriver. |refreshintervals| remembers intervals of running RefreshDriver. Following is one of samples of running Permissions Test. (gdb) p *(refreshintervals + refreshidx - 100)@100 $7 = {17, 20, 16, 16, 19, 35, 19, 18, 28, 33, 33, 35, 58, 34, 14, 13, 18, 15, 18, 18, 17, 18, 19, 16, 18, 17, 18, 19, 18, 39, 14, 17, 17, 20, 21, 30, 14, 13, 16, 15, 14, 19, 15, 17, 17, 17, 19, 18, 15, 18, 17, 18, 18, 18, 18, 18, 16, 18, 18, 17, 39, 14, 17, 17, 18, 18, 18, 18, 17, 20, 15, 19, 20, 13, 20, 15, 17, 18, 20, 16, 18, 17, 18, 17, 17, 17, 17, 18, 16, 16, 18, 19, 18, 17, 14, 18, 18, 39, 18, 13} (gdb) p *(tickcosts + tickidx - 100)@100 $8 = {15, 12, 13, 13, 35, 16, 12, 28, 33, 33, 35, 58, 34, 14, 13, 15, 14, 15, 15, 14, 15, 14, 13, 15, 16, 14, 14, 14, 38, 14, 14, 14, 16, 14, 30, 14, 13, 16, 15, 14, 15, 14, 12, 13, 13, 14, 12, 12, 13, 13, 13, 13, 12, 13, 13, 12, 11, 11, 13, 39, 14, 14, 15, 14, 14, 14, 15, 14, 15, 12, 15, 20, 11, 14, 12, 15, 14, 14, 12, 13, 12, 12, 12, 12, 14, 13, 13, 12, 14, 15, 15, 13, 12, 13, 15, 14, 39, 15, 0, 0} I had change interval of refreshdriver to 18ms (instead of 15ms). As you see, most ticks cost 12~15ms, but some ticks cost more times (3xms, 5xms). Intervals of running ticks are highly relative to tick costs. Tick costs seems the major source of unstable FPS.
Following is an sample for Twitter App installed from Market Place. (gdb) p *(refreshintervals + refreshidx - 100)@100 $21 = {22, 44, 24, 23, 20, 21, 20, 19, 19, 22, 23, 20, 21, 22, 22, 19, 20, 19, 20, 19, 19, 19, 19, 19, 24, 21, 21, 21, 23, 20, 21, 20, 21, 22, 20, 19, 20, 19, 19, 19, 19, 19, 21, 20, 19, 19, 20, 20, 19, 5, 17, 18, 18, 17, 18, 17, 18, 18, 17, 18, 18, 18, 18, 18, 17, 18, 18, 18, 18, 18, 18, 18, 17, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 17, 18, 18, 18, 18, 18, 18, 17, 18, 18, 18, 18, 18, 17, 18} (gdb) p *(tickcosts + tickidx - 100)@100 $22 = {44, 24, 23, 20, 21, 20, 19, 19, 21, 23, 20, 21, 21, 22, 19, 20, 19, 19, 19, 19, 19, 19, 19, 24, 21, 21, 21, 23, 20, 21, 20, 21, 22, 20, 19, 20, 19, 19, 19, 19, 19, 20, 20, 19, 19, 19, 20, 19, 1, 0 <repeats 51 times>} The scrolling is not smooth for Twitter on Unagi. As you see, time for every frame is definitely longer than the default frame interval. It makes frame rate really unstable. Maybe we should consider to use adaptive frame rate to make frame rate more stable.
Comment on attachment 747366 [details] [diff] [review] Change compositor thread priority and changes for stable frame rate, v2 Review of attachment 747366 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/base/platform_thread_posix.cc @@ +133,5 @@ > if (!joinable) { > pthread_attr_setdetachstate(&attributes, PTHREAD_CREATE_DETACHED); > } > > + if (priority < 0 && priority > 99) sched_get_priority_min/sched_get_priority_max should probably be used here. @@ +139,5 @@ > + > + if (priority > 0) { > + pthread_attr_setschedpolicy(&attributes, SCHED_RR); > + sched_param param = { priority }; > + pthread_attr_setschedparam(&attributes, &param); The main problem you can have with this is that if your thread is using a lot of CPU and not blocked on I/O at some point, it will preempt other threads with priority=0, effectively starving them for CPU (as well as other processes). Also note this requires root or the right capability. ::: layout/base/nsRefreshDriver.cpp @@ +56,5 @@ > #else > #define LOG(...) do { } while(0) > #endif > > +#define DEFAULT_FRAME_RATE 66.66 This seems like a weird frame rate. Does that even match the video refresh rate?
Attachment #747366 - Flags: feedback?(mh+mozilla)
> Also note this requires root or the right capability. The main b2g process runs as root, so this is not a problem. I don't feel like we have the right people on this bug yet. Maybe roc can help.
Comment on attachment 748409 [details] [diff] [review] Dispatch RefreshDriver before others events at the queue of the main thread Starvation is similarly a concern here, I should think.
Comment on attachment 747366 [details] [diff] [review] Change compositor thread priority and changes for stable frame rate, v2 I'm not the right person to give feedback on this (although I can review the ipc bits, if we decide to go with this approach). Maybe roc can help us find the right person.
Attachment #747366 - Flags: feedback?(justin.lebar+bug) → feedback?(roc)
(In reply to Thinker Li [:sinker] from comment #7) Am I reading this right in that one tick of the refresh driver takes at least 12ms (the tickcosts array)? If so, that suggests that merely running the refresh driver at 60hz takes at least 1 - (1/60 - 0.012) / (1/60) = 72% of our time. Add in the compositor and you don't have much time for anything else. I presume this is only the case when we're scrolling or doing other heavy lifting. But it does speak to the starvation question, I'd think.
I think it would be pretty easy for an app or even a Web page to overload the compositor. If it has real-time priority, it may starve the rest of the system and be difficult for the user to recover. I am also worried about putting refresh driver events at the head of the event queue, for similar reasons. As an alternative, how about detecting instability in the frame rate and automatically reducing the frame rate to a lower value that we can maintain? Would that be better than the current situation? (I think we might need to experiment to see --- I've seen people claim that a lower but stable frame rate is better than an unstable frame rate, but I've also seen people claim the opposite.)
(In reply to Justin Lebar [:jlebar] from comment #13) > (In reply to Thinker Li [:sinker] from comment #7) > > Am I reading this right in that one tick of the refresh driver takes at > least 12ms (the tickcosts array)? Right! > > If so, that suggests that merely running the refresh driver at 60hz takes at > least 1 - (1/60 - 0.012) / (1/60) = 72% of our time. Add in the compositor > and you don't have much time for anything else. > > I presume this is only the case when we're scrolling or doing other heavy > lifting. But it does speak to the starvation question, I'd think. I had left a comment for starvation yesterday, but for some unknown reason, the comment had disappeared. With twitter app, starvation is really happened. The twitter app almost not do any response for touch events during scrolling after applying the event queue patch. By giving compositor thread a real-time priority, we can get better output with the same CPU consumption (I mean with the same FPS). To avoid starvation, refresh driver should responsible for throttling the refresh rate. To roc: I had did experiments for lower FPS for the Twitter app. I think we should divide the problem into two parts; scrolling in fast stage, and slowing down stage. In scrolling in fast stage, it overloads the CPU, a lower but stable FPS seems good. But, for slowing down stage, the CPU is not so busy, lower FPS seems poor. So, if we want adaptive FPS, it should be fast in adapting frame rate. (comment 8)
(In reply to Mike Hommey [:glandium] from comment #9) > > +#define DEFAULT_FRAME_RATE 66.66 > > This seems like a weird frame rate. Does that even match the video refresh > rate? refreshdriver is using 15ms as frame interval. 66.66 == 1000/15. It is weird, right! Maybe the compositor thread should base on frame intervals instead of FPS.
Avi and Vlad probably have some advice here. Besides, whatever we do here we probably want to use across platforms. (In reply to Thinker Li [:sinker] from comment #16) > refreshdriver is using 15ms as frame interval. 66.66 == 1000/15. It is > weird, right! > Maybe the compositor thread should base on frame intervals instead of FPS. There's no point in running the refresh driver more frequently than the compositor thread can keep up. So we need to sync the refresh driver to the compositor thread somehow.
And there's no point in running the refresh driver more frequently than the screen can refresh.
Right, we've needed feedback from the compositor thread/process for a while on the content side. We should be getting frame timings back from the compositor which we can feed in to the refresh driver calculation, because we can do things like halve our framerate and the like as necessary.
Note that we have bugs (e.g. 847592) where the main process compositor + refresh driver starves child processes. The effect is somewhat minimal at the moment, but if we increased the priority of the compositor, this could get a lot worse.
There few issues I notice: 1. How to time the compositor correctly. 2. Starving the compositor/refresh driver. 3. Deciding on priorities of this process. About the timing, the best approach would be to trigger using actual vsync signal. That's typically 59.94hz, with 50hz also common, and other frequencies are also possible (72, 75, 100, 120, etc). I tried to summarize the concerns on bug 689418 comment 24. Second best approach would be to trigger with a steady imaginary vsync signal running at 59.94 Hz. That's what the refresh driver does currently (except it does 60). Worse approach would be fixed delays (and especially as integer ms). It's wildly inaccurate, and gets out of sync with the monitor refresh rate very quickly (less than a second almost certainly). As for the starving. If either the compositor of the main thread are starved, we're not good. We need the main thread to process inputs, and the compositor to respond to them. If we keep current mechanisms, I think neither should have higher priority, and the compositor should trigger the refresh driver, possibly sometimes at decimated intervals. However, I have another approach, which will require some work, but by the look of things around, it could be the best balance of responsiveness and smooth respond. My approach is similar to the one used on Firefox for Android, and on IE10 (by empirical observations). Here's what I think is required, from highest priority to lowest: 1. Intercept all user input. 2. handle (and render) chrome UI inputs which don't affect content. 3. Do pan and zoom (the actual animated scrolling of something). 4. Rendering whatever is required for pan and zoom only. Hopefully that's reasonably light. 5. Almost everything else goes here (handle non pan/zoom content user inputs, content callbacks, maintenance, etc). 5. Rendering other content layout changes (DOM changes, hovers, etc). Casualties of this approach is possibly scrolling un-rendered areas into view (checkboard/white/BG/etc), as already done on Android. Zoom, but especially pan (scroll) is where the browser is measured most on responsiveness, and it's even more important when we move to touch-based devices. I don't have a design for this system, but I believe that's where most touch-based systems (and IE10) are going, and I think rightfully so: navigating the page with ultimate responsiveness is more important from a user perspective IMO than some delays in rendering (as long as the delay is reasonably short), or some in-page animation, etc.
The workload of compositor thread is determined by the producing rate of refresh driver. It is not reasonable that refresh driver produces more than compositor thread can be scheduled. Throttling frame rate at refresh driver is better than let compositor thread to compete CPU time with other thread. With the same workload (almost), but get a stable frame rate, why not? Will higher priority for compositor thread cause starving? As I said, compositor thread should be throttled by refresh driver. It is the refresh driver to take the responsibility of preventing starving others. And, the refresh driver is running on main thread. It means we always give main thread enough time to clear its queue, or the compositor thread never run. The user's experience comes from how smooth of refreshing and how long to get response. In our case, a better refreshing seems more important than response time since we can give a reasonable response time.
As I had showed on comment 7 and comment 8, scheduled interval of the refresh driver and the cost of a refresh driver round may be good factors of the frame rate function.
(In reply to Thinker Li [:sinker] from comment #22) > The workload of compositor thread is determined by the producing rate of > refresh driver. Not entirely. Asynchronous animations and video playback are two cases where the compositor workload is independent of the refresh driver. > It is not reasonable that refresh driver produces more than > compositor thread can be scheduled. Agreed. > It is the refresh > driver to take the responsibility of preventing starving others. Apart from the issue that async animations and video aren't under the control of the refresh driver (which is a good thing!), there are multiple refresh drivers involved --- one for each process that has content visible on the screen. So I don't think it works to make throttling the job of the refresh driver(s). Also the compositor thread has more information here, because it's closer to the display.
It is a question if the compositor thread has more information than other places. It is closer to the display, but it is more far from real workload. AFAIK, compositor's workload is not heavy. Since we have several drivers of refreshing for different cases, I think we should aggregate information in a place from these drivers. Maybe the compositor thread is a good choice. Basically, we need a place to choose a frame rate dynamically according collected data, and apply the frame rate to drivers. Right!?
(In reply to Thinker Li [:sinker] from comment #25) > Basically, we need a place to choose a frame rate dynamically according > collected data, and apply the frame rate to drivers. Right!? Right!
(In reply to Avi Halachmi (:avih) from comment #21) > 1. Intercept all user input. > 2. handle (and render) chrome UI inputs which don't affect content. > 3. Do pan and zoom (the actual animated scrolling of something). > 4. Rendering whatever is required for pan and zoom only. Hopefully that's > reasonably light. > 5. Almost everything else goes here (handle non pan/zoom content user > inputs, content callbacks, maintenance, etc). > 5. Rendering other content layout changes (DOM changes, hovers, etc). > > Casualties of this approach is possibly scrolling un-rendered areas into > view (checkboard/white/BG/etc), as already done on Android. This is how the async-pan-zoom-controller works. The pan/zoom happens entirely on the compositor for maximum responsiveness then catch up with content update.
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: