Closed Bug 980171 Opened 7 years ago Closed 7 years ago

[buri] framebuffer is slow which prevents compositing at 60fps with v1.2-device.cfg

Categories

(Firefox OS Graveyard :: Vendcom, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:backlog)

RESOLVED WONTFIX
1.4 S3 (14mar)
tracking-b2g backlog

People

(Reporter: bkelly, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c=handeye p= s=2014.03.14 u=])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #977975 +++

Splitting this off from bug 977975 to track the buri issue described in bug977975 comment 11.

While that issue was originally found with 1.4, I have verified it reproduces on v1.3 as well.  Nom'ing for v1.3 as I understand we're targeting the buri for this release.

For convenience here is the description of the issue duplicated from the other bug:

It seems we are hitting periodic delays in the FBIOPUT_VSCREENINFO ioctl sync'ing the framebuffer to the display.  While this is done on a separate thread, its impacting our compositor thread through the framebuffer lock.

For example, this is the ideal sequence of events:

### ### disp_loop begin ioctl 1
### ### fb_lockBuffer begin locking 2
### ### fb_lockBuffer finish locking 2 [FAST]
### ### disp_loop finish ioctl 1 [16ms]
### ### disp_loop unlock 0
### ### disp_loop begin ioctl 2
### ### fb_lockBuffer begin locking 0
### ### fb_lockBuffer finish locking 0 [FAST]

The disp_loop thread is busy pushing out FB 1 here, while the gecko compositor is asking for a lock on FB 2.  Since FB 2 is not in use at all, the lock is granted immediately.  Here the ioctl completes within our frame budget of ~16ms and so everything stays on track.  Gecko's next lock is also immediately granted, etc.

Sometimes, though, we will get an ioctl that is way too slow.  For example:

### ### disp_loop begin ioctl 0
### ### fb_lockBuffer finish locking 1 [FAST]
### ### fb_lockBuffer begin locking 2
### ### disp_loop finish ioctl 0 [30ms]
### ### disp_loop unlock 2
### ### disp_loop begin ioctl 1
### ### fb_lockBuffer finish locking 2 [13ms]
### ### fb_lockBuffer begin locking 0
### ### disp_loop finish ioctl 1 [15ms]
### ### disp_loop unlock 0
### ### disp_loop begin ioctl 2
### ### fb_lockBuffer finish locking 0 [8ms]
### ### fb_lockBuffer begin locking 1
### ### disp_loop finish ioctl 2 [16ms]
### ### disp_loop unlock 1
### ### disp_loop begin ioctl 0
### ### fb_lockBuffer finish locking 1 [6ms]
### ### fb_lockBuffer begin locking 2
### ### disp_loop finish ioctl 0 [15ms]

Here we see the ioctl for FB 0 taking 30ms which is also twice our frame budget.  This causes gecko to block on the FB lock for 13ms.  (Due to our current architecture this also blocks gfx related IPC calls from the child processes to do things like allocate gralloc buffers, etc.)  Since this was just a single slow ioctl, though, the system recovers over the next few cycles and the lock times drift back down towards the fast case.

Unfortunately, though, we also see long series of slow FBIOPUT_VSCREENINFO ioctl calls:

### ### disp_loop begin ioctl 2
### ### fb_lockBuffer begin locking 0
### ### fb_lockBuffer finish locking 0 [FAST]
### ### fb_lockBuffer begin locking 1
### ### disp_loop finish ioctl 2 [29ms]
### ### disp_loop unlock 1
### ### disp_loop begin ioctl 0
### ### fb_lockBuffer finish locking 1 [8ms]
### ### fb_lockBuffer begin locking 2
### ### disp_loop finish ioctl 0 [31ms]
### ### disp_loop unlock 2
### ### disp_loop begin ioctl 1
### ### fb_lockBuffer finish locking 2 [17ms]
### ### fb_lockBuffer begin locking 0
### ### disp_loop finish ioctl 1 [30ms]
### ### disp_loop unlock 0
### ### disp_loop begin ioctl 2
### ### fb_lockBuffer finish locking 0 [18ms]
### ### fb_lockBuffer begin locking 1
### ### disp_loop finish ioctl 2 [31ms]

Here we are consistently blocking the gecko compositor on the FB lock for longer than our 16ms frame budget.

All of this was measured on an ICS buri last flashed with v1.2-device.cfg OEM firmware.
Michael, I understand that for v1.4 we're focused on the newer chipsets, but since this manifests in v1.3 is it worth investigating?

I think we could use some help to understand what is happening in the ioctl.  From what I can tell its doing a DMA of a fbdev buffer which should be a fixed size.

Thank you.
Flags: needinfo?(mvines)
Note, this problem seems worse on the v1.2-device.cfg firmware compared to the US_20131104_root.cfg firmware.

When I say worse, I mean that I see ~20 over budget frames at the end of a settings app scroll with v1.2-device.cfg, but only ~5 over budget frames with US_20131104_root.cfg.

US_20131104_root.cfg:

03-05 22:21:16.499 I/Gecko   (  782): ### ### HwcComposer2D::Render() took 40.1 ms
03-05 22:21:17.039 I/Gecko   (  782): ### ### HwcComposer2D::Render() took 17.9 ms
03-05 22:21:18.059 I/Gecko   (  782): ### ### HwcComposer2D::Render() took 19.5 ms
03-05 22:21:18.219 I/Gecko   (  782): ### ### HwcComposer2D::Render() took 20.1 ms
03-05 22:21:18.959 I/Gecko   (  782): ### ### HwcComposer2D::Render() took 19.7 ms

v1.2-device.cfg:

03-05 22:57:26.439 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.5 ms
03-05 22:57:26.479 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.4 ms
03-05 22:57:26.509 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.7 ms
03-05 22:57:26.539 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.0 ms
03-05 22:57:26.569 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 22.9 ms
03-05 22:57:26.599 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 27.6 ms
03-05 22:57:26.629 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.2 ms
03-05 22:57:26.659 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 27.0 ms
03-05 22:57:26.689 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 24.9 ms
03-05 22:57:26.719 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.6 ms
03-05 22:57:26.749 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.5 ms
03-05 22:57:26.779 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 27.5 ms
03-05 22:57:26.809 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.5 ms
03-05 22:57:26.839 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.9 ms
03-05 22:57:26.879 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 28.4 ms
03-05 22:57:26.909 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.1 ms
03-05 22:57:26.939 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 24.5 ms
03-05 22:57:26.969 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.5 ms
03-05 22:57:26.999 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 25.1 ms
03-05 22:57:27.029 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.3 ms
03-05 22:57:27.059 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 26.2 ms
03-05 22:57:27.089 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 18.1 ms
03-05 22:57:27.119 I/Gecko   (  788): ### ### HwcComposer2D::Render() took 28.4 ms


Also, I haven't instrumented gonk for the older firmware as I would need a separate b2g checkout to build for v1.1 firmware.  So it could be over budget on the older firmware for a different reason.
No, I don't think it's worth trying to optimize buri any further.
Flags: needinfo?(mvines)
Not blocking per comment 3.
blocking-b2g: 1.3? → backlog
I have verified that this does not occur on the tarako device which is also ICS based.  We don't seem to hit HwcComposer2D::Render on tarako, but composites consistently run at 16.3ms or faster which is in our frame budget.

Given comment 3 I'm not sure what we can do here other than try to decouple composition from the IPC thread as described in bug 980321.
Component: Graphics → Vendcom
Product: Core → Firefox OS
Summary: composites running slower than 60fps on b2g 1.3 → [buri] framebuffer is slow which prevents compositing at 60fps with v1.2-device.cfg
If this problem is buri only and doesn't affect tarako and doesn't affect KitKat/new devices, then I am ok with WONTFIX this.
(In reply to Ben Kelly [:bkelly] from comment #5)
> I have verified that this does not occur on the tarako device which is also
> ICS based.  We don't seem to hit HwcComposer2D::Render on tarako, but
> composites consistently run at 16.3ms or faster which is in our frame budget.
> 
> Given comment 3 I'm not sure what we can do here other than try to decouple
> composition from the IPC thread as described in bug 980321.

Have you seen this on more than one buri device, by the way?  Has anybody else reproduced the same problem on another one?  I know that we had at least one hamachi in Toronto that would not composite above 52fps without any other work to stop it.  Switch to another device, and the same test was happily running >60fps.  Not saying that's the case here, just something to consider.  BenWa, do you still have that simple change you'd make to see how fast the compositor would run?
Flags: needinfo?(bgirard)
Well, I can avoid the problem by flashing older OEM firmware, so I don't think its the particular device.  See comment 2.

That being said, having someone else reproduce would be nice.  Its really necessary to instrument the code to see the problem, though.
Attached patch patchSplinter Review
This patch + layers.offmainthreadcomposition.frame-rate;0 should do it.
Flags: needinfo?(bgirard)
What does this patch do?
The patch is relative to comment 7.  It sort of gives us an empty render (I don't have good words for this :).  And if that is not racing about 60fps, something is wrong with the unit.  It's really just a way to verify your phone is OK in the first place.  BenWa used it to identify one phone that wasn't OK.
Attachment #8387005 - Attachment is patch: true
Attachment #8387005 - Attachment mime type: text/x-patch → text/plain
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Whiteboard: [c=handeye p= s=2014.03.14 u=]
Target Milestone: --- → 1.4 S3 (14mar)
Depends on: 982968
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.