Closed Bug 1155797 Opened 9 years ago Closed 9 years ago

Vsync is unreliable on L devices

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
blocking-b2g 2.2+
Tracking Status
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- fixed
b2g-v2.2 --- verified
b2g-master --- verified

People

(Reporter: mchang, Assigned: jerry)

References

Details

(Whiteboard: [caf priority: p2][CR 834841])

Attachments

(8 files, 18 obsolete files)

115.69 KB, text/plain
Details
121.60 KB, text/plain
Details
2.82 KB, patch
Details | Diff | Splinter Review
4.42 KB, patch
Details | Diff | Splinter Review
3.78 KB, patch
Details | Diff | Splinter Review
3.72 KB, patch
mchang
: review+
Details | Diff | Splinter Review
1.52 MB, video/3gpp
Details
1.57 MB, image/jpeg
Details
On both Android 5.0 and Android 5.1 on Nexus 5 devices, hardware vsync is unreliable and causes lots of strange errors. 

1) We cannot reliably enable / disable vsync - https://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#205

2) If we enable/disable vsync many times in quick succession, the CPUs on the device seem to shut down. We get lots of errors such as:

    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3
    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3
    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu2
    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3
    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3

3) In the worst case, we can get flashing such as in bug 1147753.

Figure out why Vsync doesn't correctly work reliably on L devices. It works fine on Kit-Kat. 
    E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3
Assignee: nobody → hshih
Status: NEW → ASSIGNED
@Jerry, please describe where in the code you found errors in the kernel and maybe we can ask for partner support to see what's going on here.
Hi Diego,
I found some log from hal and kernel driver.

There are the failed log for eventControl() at first call.
01-06 09:59:47.452 I/qdhwcomposer(  195): vsync_loop: dpy:0 fd:89
01-06 09:59:47.452 I/qdhwcomposer(  195): vsync_loop: Reading vsync for dpy=1 from /sys/class/graphics/fb1/vsync_event
01-06 09:59:47.452 I/qdhwcomposer(  195): vsync_loop: dpy:1 fd:90
01-06 09:59:47.453 E/qdhwcomposer(  195): hwc_vsync_control: vsync control failed. fd=7, Dpy=0, enable=1 : Operation not supported on transport endpoint

In kernel video driver, I found the failed reason. We don't have add_vsync_handler and remove_vsync_handler handler in mdss_mdp_overlay_vsync_ctrl() call.

kernel/drivers/video/msm/mdss/mdss_mdp_overlay.c:
int mdss_mdp_overlay_vsync_ctrl(struct msm_fb_data_type *mfd, int en)
{
  ....
  if (!ctl->add_vsync_handler || !ctl->remove_vsync_handler)
    return -EOPNOTSUPP;  //<<== failed here
  ....
}

The mdss_mdp_overlay_vsync_ctrl() call stack:
<4>[   14.578362] [<c010e2b4>] (unwind_backtrace+0x0/0x140) from [<c0a4c8e4>] (dump_stack+0x20/0x24)
<4>[   14.578467] [<c0a4c8e4>] (dump_stack+0x20/0x24) from [<c044776c>] (mdss_mdp_overlay_vsync_ctrl+0xd4/0x170)
<4>[   14.578571] [<c044776c>] (mdss_mdp_overlay_vsync_ctrl+0xd4/0x170) from [<c0447e24>] (mdss_mdp_overlay_ioctl_handler+0x61c/0x1484)
<4>[   14.578678] [<c0447e24>] (mdss_mdp_overlay_ioctl_handler+0x61c/0x1484) from [<c0469b50>] (mdss_fb_ioctl+0xec/0x528)
<4>[   14.578787] [<c0469b50>] (mdss_fb_ioctl+0xec/0x528) from [<c04208a4>] (do_fb_ioctl+0x460/0x5dc)
<4>[   14.578845] [<c04208a4>] (do_fb_ioctl+0x460/0x5dc) from [<c0420a70>] (fb_ioctl+0x50/0x54)
<4>[   14.578945] [<c0420a70>] (fb_ioctl+0x50/0x54) from [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0)
<4>[   14.579044] [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0) from [<c028167c>] (sys_ioctl+0x7c/0x8c)
<4>[   14.579152] [<c028167c>] (sys_ioctl+0x7c/0x8c) from [<c0107400>] (ret_fast_syscall+0x0/0x30)

And the handlers are initialized in mdss_mdp_ctl_start_sub(). It seems that the handler is inited after panel ready. The time between the eventControl and mdss_mdp_ctl_start_sub is about 1 second. I have a test patch(attachment 8594719 [details] [diff] [review]) that try to enable the panel before vsync event control, but it doesn't work.

My question is:
**Do we have a clear time to use eventControl()?
I don't check the flame-kk kernel code yet, but the eventControl() sequence works at flame-kk.

kernel/drivers/video/msm/mdss/mdss_mdp_ctl.c:
static int mdss_mdp_ctl_start_sub(struct mdss_mdp_ctl *ctl)
{
  ....
  if (ctl->start_fnc)
    ret = ctl->start_fnc(ctl); //<<==
  ....
}

The mdss_mdp_ctl_start_sub() call stack:
<4>[   16.888640] [<c010e2b4>] (unwind_backtrace+0x0/0x140) from [<c0a4c8e4>] (dump_stack+0x20/0x24)
<4>[   16.888763] [<c0a4c8e4>] (dump_stack+0x20/0x24) from [<c0434d60>] (mdss_mdp_ctl_start_sub+0x48/0x2b4)
<4>[   16.888901] [<c0434d60>] (mdss_mdp_ctl_start_sub+0x48/0x2b4) from [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8)
<4>[   16.889042] [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8) from [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168)
<4>[   16.889172] [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168) from [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484)
<4>[   16.889295] [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484) from [<c0469b50>] (mdss_fb_ioctl+0xec/0x528)
<4>[   16.889369] [<c0469b50>] (mdss_fb_ioctl+0xec/0x528) from [<c04208a4>] (do_fb_ioctl+0x460/0x5dc)
<4>[   16.889437] [<c04208a4>] (do_fb_ioctl+0x460/0x5dc) from [<c0420a70>] (fb_ioctl+0x50/0x54)
<4>[   16.889571] [<c0420a70>] (fb_ioctl+0x50/0x54) from [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0)
<4>[   16.889701] [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0) from [<c028167c>] (sys_ioctl+0x7c/0x8c)
<4>[   16.889776] [<c028167c>] (sys_ioctl+0x7c/0x8c) from [<c0107400>] (ret_fast_syscall+0x0/0x30)
<4>[   16.892177] mdss_dsi_on:485 Panel already on.
Flags: needinfo?(dwilson)
I think the cpu wake-up error message(comment 0) is from msm_thermal_cpu_callback(). We got NOTIFY_BAD from msm_thermal_cpu_callback() in kernel/drivers/thermal/msm_thermal.c .

    E/MP-Decision(  860): Error 1 setting online status to 1 for cpu3

Does it mean the cpu temperature is too hot?
My L device also fails when I force enable VSync. See my kernel log attached.
Flags: needinfo?(dwilson)
Flags: needinfo?(mchang)
Chatted on irc, just a needinfo to make sure I was aware that there was movement on this bug. Diego will talk with some folks to see if they can find something. Thanks!
Flags: needinfo?(mchang)
The add_vsync_handler and remove_vsync_handler seem inited by MSMFB_OVERLAY_PLAY ioctl call. That will be call in first hwc::prepare().
I will have a try to call the prepare() in HwcComposer2D::RegisterHwcEventCallback(). Then, I will also try to defer the vsync source initialization at the first painting request from compositor.
There is no error if I call hwc::prepare() before hwc::eventControl(), but the system will hang at hwc::blink().
I'm writing a patch to defer the vsync source initialization at the first painting request from compositor.
I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay. Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm checking the vsync related preference setting for L and will upload the patch later.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #10)
> I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay.
> Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm
> checking the vsync related preference setting for L and will upload the
> patch later.

Why do we need to do this versus just making vsync on L be reliable like on kit-kat?
Hi Diego, do you have any updates on what's going on with vsync on L? Thanks!
Flags: needinfo?(dwilson)
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #10)
> > I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay.
> > Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm
> > checking the vsync related preference setting for L and will upload the
> > patch later.
> 
> Why do we need to do this versus just making vsync on L be reliable like on
> kit-kat?

Nexus-5-kk has the same issue. It's also failed at the first time we enable the vsync in gfxAndroidPlatform::CreateHardwareVsyncSource().
Another way is that we defer the init call to the first compositor paint. That needs to change refresh driver initialization flow(including the ipc code for vsync-ready message).
Mason,

I'm still trying to get hold of the display guy. I'll let you know once I have an update.
(In reply to Mason Chang [:mchang] from comment #11)
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #10)
> > I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay.
> > Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm
> > checking the vsync related preference setting for L and will upload the
> > patch later.
> 
> Why do we need to do this versus just making vsync on L be reliable like on
> kit-kat?

Mason, I had discussed with Jerry about the vsync on L yesterday. I agree with you that we should figure out why vsync on L is unreliable. 

But I also think gecko should consider to handle the condition when HW vsync is unreliable because it may happen in other platforms. The following is the proposed idea to fix this issue.
In gecko, the platform only needs to be aware of vsyncsource[1]. Inside vsyncsouce class, it checks the HW vsync availability first and fallback to SW if there is any problem to enable HW vsync for each vsync enable call. For each platform, we can override the function to enable/disable the HW vsync. Therefore, we still can share the sw vsync implementation for each platform.

[1]https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#570
I know why we got creepy frame at nexus5. When we init vsync failed at [1], we use sw timer instead.
But we still try to turn on vsync at [2] when we turn on screen. Once vsycn is ready, NotifyVsync() will be called twice per frame and post a lot of wrong notification in ScheduleNextVsync()[3].

[1]
https://hg.mozilla.org/mozilla-central/annotate/1ad65cbeb2f4/gfx/thebes/gfxAndroidPlatform.cpp#l499
[2]
https://hg.mozilla.org/mozilla-central/annotate/1ad65cbeb2f4/widget/gonk/nsWindow.cpp#l135
[3]
https://hg.mozilla.org/mozilla-central/annotate/1ad65cbeb2f4/gfx/thebes/SoftwareVsyncSource.cpp#l111


Here is the buggy sequence:


        SoftwareDisplay                                     
               |                                            
               |          HWC::Vsync()                        
               v               |                            
          EnableVsync()        |                            
               |               |                            
   +---------> |               |                            
   |           |               |                            
   |           v               |                            
   |      NotifyVsync()<-------+                            
   |           |          GetPlatform()                     
   |           |          ->GetHardwareVsync()              
   +-----------+          ->GetGlobalDisplay().NotifyVsync()
ScheduleNextVsync()
And the message loop will fill with a lot of NotifyVsync() in comment 22.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #5)
> I think the cpu wake-up error message(comment 0) is from
> msm_thermal_cpu_callback(). We got NOTIFY_BAD from
> msm_thermal_cpu_callback() in kernel/drivers/thermal/msm_thermal.c .
> 
>     E/MP-Decision(  860): Error 1 setting online status to 1 for cpu3
> 
> Does it mean the cpu temperature is too hot?

Hi Diego,

When we have this message, I can make sure we hit the error in kernel/drivers/thermal/msm_thermal.c . Even though we have bug in comment 22, why we can't wake up cpu2?


dmesg:
  E/MP-Decision(  861): Error 1 setting online status to 1 forcpu2

Here are the variables value in if statement:
  action: CPU_UP_PREPARE
  core_control_enabled: true
  msm_thermal_info.core_control_mask: 0xe
  cpus_offlined: 0xc
  cpu: 2

kernel/drivers/thermal/msm_thermal.c:
static int __ref msm_thermal_cpu_callback(struct notifier_block *nfb,
		unsigned long action, void *hcpu)
{
  uint32_t cpu = (uint32_t)hcpu;

  if (action == CPU_UP_PREPARE || action == CPU_UP_PREPARE_FROZEN) {
    if (core_control_enabled &&
        (msm_thermal_info.core_control_mask & BIT(cpu)) &&
        (cpus_offlined & BIT(cpu))) {
      pr_debug("%s: Preventing cpu%d from coming online", KBUILD_MODNAME, cpu);
      return NOTIFY_BAD; // <===== we got failed here
    }
  }
  return NOTIFY_OK;
}
Attachment #8598790 - Attachment description: wip Part3: use VsyncDisplay interface to turn on/off vsync. v1 → Part3: use VsyncDisplay interface to turn on/off vsync. v1
Attachment #8598790 - Flags: review?(mwu)
Comment on attachment 8598790 [details] [diff] [review]
Part3: use VsyncDisplay interface to turn on/off vsync. v1

Review of attachment 8598790 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/nsWindow.cpp
@@ -131,5 @@
>  
>  static void
>  displayEnabledCallback(bool enabled)
>  {
> -    HwcComposer2D::GetInstance()->EnableVsync(enabled);

We will not always use hwc vsync event. If we use sw vsync, we should not use the hwc control interface. Instead, we go through VsyncSource to get the right one.
The init sequence between flame-kk and nexus5-l are different.

For flame-kk:

The kernel/drivers/video/msm/mdss/mdp3_ctrl.c::mdp3_ctrl_init() is called much earlier than nexus5-l. Here is the backtrace.

<4>[    0.599646] [<c010bd74>] (unwind_backtrace+0x0/0xf8) from [<c03cb1e8>] (mdp3_ctrl_init+0x24/0x3c0)
<4>[    0.599665] [<c03cb1e8>] (mdp3_ctrl_init+0x24/0x3c0) from [<c03c3b80>] (mdp3_init+0xc/0x58)
<4>[    0.599684] [<c03c3b80>] (mdp3_init+0xc/0x58) from [<c03fa408>] (mdss_fb_probe+0x734/0xb60)
<4>[    0.599703] [<c03fa408>] (mdss_fb_probe+0x734/0xb60) from [<c0477d70>] (platform_drv_probe+0x18/0x1c)
<4>[    0.599720] [<c0477d70>] (platform_drv_probe+0x18/0x1c) from [<c0476b58>] (driver_probe_device+0x8c/0x208)
<4>[    0.599737] [<c0476b58>] (driver_probe_device+0x8c/0x208) from [<c0476d60>] (__driver_attach+0x8c/0x90)
<4>[    0.599756] [<c0476d60>] (__driver_attach+0x8c/0x90) from [<c047532c>] (bus_for_each_dev+0x4c/0x80)
<4>[    0.599773] [<c047532c>] (bus_for_each_dev+0x4c/0x80) from [<c04762e8>] (bus_add_driver+0xe0/0x250)
<4>[    0.599788] [<c04762e8>] (bus_add_driver+0xe0/0x250) from [<c04771f8>] (driver_register+0x78/0x138)
<4>[    0.599806] [<c04771f8>] (driver_register+0x78/0x138) from [<c0d1c6c0>] (mdss_fb_init+0xc/0x20)
<4>[    0.599823] [<c0d1c6c0>] (mdss_fb_init+0xc/0x20) from [<c01005b8>] (do_one_initcall+0x10c/0x170)
<4>[    0.599840] [<c01005b8>] (do_one_initcall+0x10c/0x170) from [<c0d00bcc>] (kernel_init+0xf0/0x1ac)
<4>[    0.599858] [<c0d00bcc>] (kernel_init+0xf0/0x1ac) from [<c0106820>] (kernel_thread_exit+0x0/0x8)

For nexus5-l:

In video driver, it should have add_vsync_handler and remove_vsync_handler handler for vsync control. That will be init in mdss_mdp_ctl_start().
The kernel/drivers/video/msm/mdss/mdss_mdp_ctl.c::mdss_mdp_ctl_start() seems triggered by MSMFB_OVERLAY_PLAY ioctl call.

Here is the backtrace:

<4>[   16.889042] [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8) from [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168)
<4>[   16.889172] [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168) from [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484)
<4>[   16.889295] [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484) from [<c0469b50>] (mdss_fb_ioctl+0xec/0x528)
<4>[   16.889369] [<c0469b50>] (mdss_fb_ioctl+0xec/0x528) from [<c04208a4>] (do_fb_ioctl+0x460/0x5dc)
<4>[   16.889437] [<c04208a4>] (do_fb_ioctl+0x460/0x5dc) from [<c0420a70>] (fb_ioctl+0x50/0x54)
<4>[   16.889571] [<c0420a70>] (fb_ioctl+0x50/0x54) from [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0)
<4>[   16.889701] [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0) from [<c028167c>] (sys_ioctl+0x7c/0x8c)
<4>[   16.889776] [<c028167c>] (sys_ioctl+0x7c/0x8c) from [<c0107400>] (ret_fast_syscall+0x0/0x30)

----

We can see the time stamp for these two function calls. The time stamp of mdp3_ctrl_init() is much earlier at flame-kk. I think that's why we can use eventControl() successfully in gfxPlatform::Init() at flame-kk.

<4>[    0.599665] [<c03cb1e8>] (mdp3_ctrl_init+0x24/0x3c0) <= flame-kk
        ~~~~~~~~
<4>[   16.889042] [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8) <= nexus5-l
       ~~~~~~~~~
(In reply to peter chang[:pchang][:peter] from comment #21)
> (In reply to Mason Chang [:mchang] from comment #11)
> > (In reply to Jerry Shih[:jerry] (UTC+8) from comment #10)
> > > I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay.
> > > Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm
> > > checking the vsync related preference setting for L and will upload the
> > > patch later.
> > 
> > Why do we need to do this versus just making vsync on L be reliable like on
> > kit-kat?
> 
> Mason, I had discussed with Jerry about the vsync on L yesterday. I agree
> with you that we should figure out why vsync on L is unreliable. 
> 
> But I also think gecko should consider to handle the condition when HW vsync
> is unreliable because it may happen in other platforms. The following is the
> proposed idea to fix this issue.
> In gecko, the platform only needs to be aware of vsyncsource[1]. Inside
> vsyncsouce class, it checks the HW vsync availability first and fallback to
> SW if there is any problem to enable HW vsync for each vsync enable call.
> For each platform, we can override the function to enable/disable the HW
> vsync. Therefore, we still can share the sw vsync implementation for each
> platform.
> 
> [1]https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.
> cpp#570

I talked with Jerry about this yesterday as well. I disagree that we should handle the hardware vsync failure unless we have a good reason why we can't fix hardware vsync. Silk is not very effective without hardware vsync and flame kit-kat has reliable hardware vsync. Android actually has reliable hardware vsync, so do OS X and Windows as well. If we want a good smooth product, we need a reliable hardware vsync. I think we're better off making sure vsync is reliable and not covering the hardware bugs so that we know when hardware vsync isn't working such as in this case.
Comment on attachment 8598790 [details] [diff] [review]
Part3: use VsyncDisplay interface to turn on/off vsync. v1

Review of attachment 8598790 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/nsWindow.cpp
@@ +137,5 @@
> +        if (enabled) {
> +            gfxPlatform::GetPlatform()->GetHardwareVsync()->GetGlobalDisplay().EnableVsync();
> +        } else {
> +            gfxPlatform::GetPlatform()->GetHardwareVsync()->GetGlobalDisplay().DisableVsync();
> +        }

nit: change to:
GlobalDisplay &globalDisplay = gfxPlatform::GetPlatform()->GetHardwareVsync()->GetGlobalDisplay();
if (aEnabled) {
    globalDisplay.EnableVsync();
} else {
    globalDisplay.DisableVsync();
}
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #26)
> The init sequence between flame-kk and nexus5-l are different.
> 
> For flame-kk:
> 
> The kernel/drivers/video/msm/mdss/mdp3_ctrl.c::mdp3_ctrl_init() is called
> much earlier than nexus5-l. Here is the backtrace.
> 
> <4>[    0.599646] [<c010bd74>] (unwind_backtrace+0x0/0xf8) from [<c03cb1e8>]
> (mdp3_ctrl_init+0x24/0x3c0)
> <4>[    0.599665] [<c03cb1e8>] (mdp3_ctrl_init+0x24/0x3c0) from [<c03c3b80>]
> (mdp3_init+0xc/0x58)
> <4>[    0.599684] [<c03c3b80>] (mdp3_init+0xc/0x58) from [<c03fa408>]
> (mdss_fb_probe+0x734/0xb60)
> <4>[    0.599703] [<c03fa408>] (mdss_fb_probe+0x734/0xb60) from [<c0477d70>]
> (platform_drv_probe+0x18/0x1c)
> <4>[    0.599720] [<c0477d70>] (platform_drv_probe+0x18/0x1c) from
> [<c0476b58>] (driver_probe_device+0x8c/0x208)
> <4>[    0.599737] [<c0476b58>] (driver_probe_device+0x8c/0x208) from
> [<c0476d60>] (__driver_attach+0x8c/0x90)
> <4>[    0.599756] [<c0476d60>] (__driver_attach+0x8c/0x90) from [<c047532c>]
> (bus_for_each_dev+0x4c/0x80)
> <4>[    0.599773] [<c047532c>] (bus_for_each_dev+0x4c/0x80) from
> [<c04762e8>] (bus_add_driver+0xe0/0x250)
> <4>[    0.599788] [<c04762e8>] (bus_add_driver+0xe0/0x250) from [<c04771f8>]
> (driver_register+0x78/0x138)
> <4>[    0.599806] [<c04771f8>] (driver_register+0x78/0x138) from
> [<c0d1c6c0>] (mdss_fb_init+0xc/0x20)
> <4>[    0.599823] [<c0d1c6c0>] (mdss_fb_init+0xc/0x20) from [<c01005b8>]
> (do_one_initcall+0x10c/0x170)
> <4>[    0.599840] [<c01005b8>] (do_one_initcall+0x10c/0x170) from
> [<c0d00bcc>] (kernel_init+0xf0/0x1ac)
> <4>[    0.599858] [<c0d00bcc>] (kernel_init+0xf0/0x1ac) from [<c0106820>]
> (kernel_thread_exit+0x0/0x8)
> 
> For nexus5-l:
> 
> In video driver, it should have add_vsync_handler and remove_vsync_handler
> handler for vsync control. That will be init in mdss_mdp_ctl_start().
> The kernel/drivers/video/msm/mdss/mdss_mdp_ctl.c::mdss_mdp_ctl_start() seems
> triggered by MSMFB_OVERLAY_PLAY ioctl call.
> 
> Here is the backtrace:
> 
> <4>[   16.889042] [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8) from
> [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168)
> <4>[   16.889172] [<c0444f64>] (mdss_mdp_overlay_start+0x98/0x168) from
> [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484)
> <4>[   16.889295] [<c0448548>] (mdss_mdp_overlay_ioctl_handler+0xd40/0x1484)
> from [<c0469b50>] (mdss_fb_ioctl+0xec/0x528)
> <4>[   16.889369] [<c0469b50>] (mdss_fb_ioctl+0xec/0x528) from [<c04208a4>]
> (do_fb_ioctl+0x460/0x5dc)
> <4>[   16.889437] [<c04208a4>] (do_fb_ioctl+0x460/0x5dc) from [<c0420a70>]
> (fb_ioctl+0x50/0x54)
> <4>[   16.889571] [<c0420a70>] (fb_ioctl+0x50/0x54) from [<c028144c>]
> (do_vfs_ioctl+0x40c/0x5c0)
> <4>[   16.889701] [<c028144c>] (do_vfs_ioctl+0x40c/0x5c0) from [<c028167c>]
> (sys_ioctl+0x7c/0x8c)
> <4>[   16.889776] [<c028167c>] (sys_ioctl+0x7c/0x8c) from [<c0107400>]
> (ret_fast_syscall+0x0/0x30)
> 
> ----
> 
> We can see the time stamp for these two function calls. The time stamp of
> mdp3_ctrl_init() is much earlier at flame-kk. I think that's why we can use
> eventControl() successfully in gfxPlatform::Init() at flame-kk.
> 
> <4>[    0.599665] [<c03cb1e8>] (mdp3_ctrl_init+0x24/0x3c0) <= flame-kk
>         ~~~~~~~~
> <4>[   16.889042] [<c0436688>] (mdss_mdp_ctl_start+0xdc/0x1a8) <= nexus5-l
>        ~~~~~~~~~

Cool, good find! Can you manually create a test where we start with software vsync on a nexus 5 L. Then after mdp3_ctrl_init is called, we swap to hardware vsync. Try booting with SoftwareVsyncSource, then after mdp3_ctrl_init, change to GonkVsyncSource. Use two separate timers, not one single timer that defaults back to software if hardware isn't working. 

In that case, is hardware vsync reliable? As in, if we properly wait for mdp3_ctrl_init then start using vsync, is it always reliable? Or do we still hit the CPU wake up problem? Or does the vsync control still fail sometimes?
Flags: needinfo?(hshih)
(In reply to Mason Chang [:mchang] from comment #27)
> ...
> I talked with Jerry about this yesterday as well. I disagree that we should
> handle the hardware vsync failure unless we have a good reason why we can't
> fix hardware vsync. Silk is not very effective without hardware vsync and
> flame kit-kat has reliable hardware vsync. Android actually has reliable
> hardware vsync, so do OS X and Windows as well. If we want a good smooth
> product, we need a reliable hardware vsync. I think we're better off making
> sure vsync is reliable and not covering the hardware bugs so that we know
> when hardware vsync isn't working such as in this case.

I'm not sure exactly what is being proposed here, so let me chime in.

Using hardware or software vsync is a matter of setting a preference?  We can set the preference per device and be done with it.  Blocklisting is probably a good approach, in that we want to discover HW vsync problems on devices we start using.  If we can detect HW vsync problems, we should be able to alert the developer and then deal with it.  We're not going to be in a situation where the HW vsync is unreliable "per device", but instead "per device class", correct?  Two phones of the same type are going to behave the same?
Also please note that hardware vsync is certainly enabled on Android Nexus 5, so the Nexus 5 issues here appear to be some problem with the B2G port (either to Nexus 5 specifically, or L-gonk in general).
OK, I understand this better a bit, the whole story, so let me add more informed thoughts.

As a background, we're going to add a preference to be able to force vsync to software (or hardware), or leave it as an automatic decision (as we do today.)  Perhaps modify the existing preference, perhaps introduce a new one and remove the existing one.

The decision whether to use hardware vsync or not should be done once and maintained for the lifetime of the application.  There doesn't seem to be a valid reason to continually check and revise that decision - either we have a reliable source for hardware vsync or we don't; if it is reliable, then it's good all the time.  If it isn't reliable, then we do not want to use it at all, and mix up between the hardware and software.

With comment 31, it is suggested that hardware vsync is reliable on Nexus 5, when used with Android.  If that's the case, and it isn't reliable for B2G, we should fix that problem - I don't know if this is the bug for it or we need a separate one, but we should fix it in hardware vsync, not teach the rest of the code to deal with unreliable hardware vsync (see above as to what constitutes reliable :)

It is also possible that Android is not doing something that we are doing, for example, enabling and disabling rapidly. If that's the case, and we can figure out how to change our approach to make it reliable, great; if not, we should just default to software vsync.  Trying to dynamically change our mind during the execution because sometimes we succeed with hardware vsync and sometimes we don't is just asking for trouble.
(In reply to Mason Chang [:mchang] from comment #27)
> (In reply to peter chang[:pchang][:peter] from comment #21)
> > (In reply to Mason Chang [:mchang] from comment #11)
> > > (In reply to Jerry Shih[:jerry] (UTC+8) from comment #10)
> > > > I try to make GonkVsyncSource::GonkDisplay inherent from SoftwareDisplay.
> > > > Thus, we can switch to sw and hw vsync tick dynamically. It seems work. I'm
> > > > checking the vsync related preference setting for L and will upload the
> > > > patch later.
> > > 
> > > Why do we need to do this versus just making vsync on L be reliable like on
> > > kit-kat?
> > 
> > Mason, I had discussed with Jerry about the vsync on L yesterday. I agree
> > with you that we should figure out why vsync on L is unreliable. 
> > 
> > But I also think gecko should consider to handle the condition when HW vsync
> > is unreliable because it may happen in other platforms. The following is the
> > proposed idea to fix this issue.
> > In gecko, the platform only needs to be aware of vsyncsource[1]. Inside
> > vsyncsouce class, it checks the HW vsync availability first and fallback to
> > SW if there is any problem to enable HW vsync for each vsync enable call.
> > For each platform, we can override the function to enable/disable the HW
> > vsync. Therefore, we still can share the sw vsync implementation for each
> > platform.
> > 
> > [1]https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.
> > cpp#570
> 
> I talked with Jerry about this yesterday as well. I disagree that we should
> handle the hardware vsync failure unless we have a good reason why we can't
> fix hardware vsync. Silk is not very effective without hardware vsync and
> flame kit-kat has reliable hardware vsync. Android actually has reliable
> hardware vsync, so do OS X and Windows as well. If we want a good smooth

If we agree the platform should provide the reliable hardware vsync, then I think SW vsync should be used for debug only or fallback path if we don't have the hw vsync implementation.

For hw vsync problem on n5, I think we can force crash the b2g or switch to sw vsync with 30 fps to help developer to notice the problem in early stage.

> product, we need a reliable hardware vsync. I think we're better off making
> sure vsync is reliable and not covering the hardware bugs so that we know
> when hardware vsync isn't working such as in this case.
(In reply to peter chang[:pchang][:peter] from comment #33)
> 
> If we agree the platform should provide the reliable hardware vsync, then I
> think SW vsync should be used for debug only or fallback path if we don't
> have the hw vsync implementation.

Correct, that's what we have now. If we cannot initialize hardware vsync, we already fallback to software vsync. But we should not alternate between hardware and software vsync.

We had one bug on the nexus 5 where we could not initialize hardware vsync and fell back to software vsync. The patch in part 3 is good as it fixes a different bug, but we really need to make sure hardware vsync is both detected correctly and works reliably on a nexus 5.
We have a couple of issues wrapped up in here, thanks for investigating Jerry!

1) During initialization, we failed to init hardware vsync and fell back to software vsync. Then hardware vsync actually starts working and now we have multiple vsync "streams". One on the hardware vsync thread and one on the software vsync thread. This is the issue in comment 22 and addressed by part 3 of this bug.
2) Vsync on a nexus 5 for b2g is detected as unavailable during init due to the panel not loading, this is comment 26.
3) We need a flag to force fallback to software vsync for testing purposes, this is comment 32.
4) Vsync might still be unreliable due to thermal CPU issues as in comment 24. This might go away if we resolve the initialization issues.

We should fix the Nexus 5 vsync issues (2 and 4) in this bug as they seem related right. We can spin off (1) into another bug, and I'll spin off another bug for (3). 

@Jerry - Can you please make another bug with attachment 8598790 [details] [diff] [review] so we can land that? And investigate comment 29? Thanks!
I would like to describe the vsync problem again.

a) nexus5-l and nexus5-kk are all affected. It's related to the kernel video driver, not android version.
b) We just can't turn on vsync at earlier stage. After the first MSMFB_OVERLAY_PLAY ioctl call, we can use the vsync normally. In my opinion, I will not say that the vsync is unreliable. It just needs some time to set up.
c) In our current implementation, we will fall back to software timer at first failed vsync control. This vsync control is before MSMFB_OVERLAY_PLAY ioctl call.
d) I think thermal CPU issues in comment 24 is due to the wrong ScheduleNextVsync() task in message loop as comment 22. CPU is doing the ScheduleNextVsync() task all the time, so we get high temperature and bring up cpu failed.


I also dump the "nexus5 android 5.1" kernel log. It's also failed at the first vsync control call. Please check time stamp [4.632905]. And android set up the panel at [5.526450]. After that, vsync control is ready.

Here is the nexus5 android 5.1 log:
<4>[    4.632905] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1969  <==== failed control
<4>[    4.632973] bignose vsync_ctrl
<4>[    4.633027] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c, l:1428
.......
<4>[    5.518298] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1905
<4>[    5.518783] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1936
<6>[    5.518964] mdss_dsi_cont_splash_on:685 DSI on for continuous splash.
<4>[    5.525900] bignose file:drivers/video/msm/mdss/mdss_mdp_intf_cmd.c, l:630
<4>[    5.526056] bignose file:drivers/video/msm/mdss/mdss_mdp_ctl.c,line:1138
<4>[    5.530183] mdss_dsi_on:485 Panel already on.


Compare to b2g's log:
<4>[   11.703429] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1969  <==== failed control
<4>[   11.703511] bignose vsync_ctrl
<4>[   11.703585] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c, l:1428
......
<4>[   16.991132] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1905
<4>[   16.991231] bignose file:drivers/video/msm/mdss/mdss_mdp_overlay.c line:1936
<6>[   16.991347] mdss_dsi_cont_splash_on:685 DSI on for continuous splash.
<4>[   16.997311] bignose file:drivers/video/msm/mdss/mdss_mdp_intf_cmd.c, l:630
<4>[   16.997355] bignose file:drivers/video/msm/mdss/mdss_mdp_ctl.c,line:1138
<4>[   17.000237] mdss_dsi_on:485 Panel already on.


Please check time stamp [11.703429]. That's the call we want to check vsync capacity at [1], and fall back to sw timer.

[1]
https://hg.mozilla.org/mozilla-central/annotate/4b9b12c248dc/gfx/thebes/gfxAndroidPlatform.cpp#l499


Milan, Mason and Michael,
Is it clear?
Depends on: 1160102
Attachment #8598790 - Attachment is obsolete: true
Flags: needinfo?(hshih)
Attachment #8598790 - Flags: review?(mwu)
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #36)
> I would like to describe the vsync problem again.
> 
> b) We just can't turn on vsync at earlier stage. After the first
> MSMFB_OVERLAY_PLAY ioctl call, we can use the vsync normally. In my opinion,
> I will not say that the vsync is unreliable. It just needs some time to set
> up.

If we turn on vsync after MSMFB_OVERLAY_PLAY, is vsync reliable from that point on? If we properly init vsync, is it always reliable?
Flags: needinfo?(hshih)
Depends on: 1160216
Attached patch [test] vsync reliability test (obsolete) — Splinter Review
squash all patch.
add debug log to test vsync reliability.
Attachment #8594719 - Attachment is obsolete: true
(In reply to Mason Chang [:mchang] from comment #29)
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #26)
> Cool, good find! Can you manually create a test where we start with software
> vsync on a nexus 5 L. Then after mdp3_ctrl_init is called, we swap to
> hardware vsync. Try booting with SoftwareVsyncSource, then after
> mdp3_ctrl_init, change to GonkVsyncSource. Use two separate timers, not one
> single timer that defaults back to software if hardware isn't working. 
> 
> In that case, is hardware vsync reliable? As in, if we properly wait for
> mdp3_ctrl_init then start using vsync, is it always reliable? Or do we still
> hit the CPU wake up problem? Or does the vsync control still fail sometimes?

We can't call mdss_mdp_ctl_start() directly. It's in kernel video driver.
I test with attachment 8600868 [details] [diff] [review]. After mdss_mdp_ctl_start() call(MSMFB_OVERLAY_PLAY ioctl command), I don't see error log when we turn on vsync.

It will failed if we want to turn on vsync after the screen off. In log, I can see B2G still want to turn on vsync just after screen off. At this time, I can see the failed vsync control. Flame-kk, nexus-5-kk and nexus-5-l are all failed in this case.
Flags: needinfo?(hshih)
Attached file b2g vsync control log
This log show the sw and hw vsync tick time stamp.

B2G try to enable vsync at line 513.
Then, it uses sw timer until line 933.
I will try to design a non-mix timer and make a difference list compared with attachment 8598791 [details] [diff] [review].
I would like to check how android does for the first vsync control failed. Does it have sw timer before hw ready? Or it has another mechanism. Let me build the android system and trace again.
Depends on: 1160877
If there is no bootanimation file in device(path: /system/media/bootanimation.zip), show a solid color frame instead.
With this, b2g will use hwc::prepare() and hwc::set() at early stage. Then, we can use vsync event control successfully.
Attachment #8598788 - Attachment is obsolete: true
Attachment #8598789 - Attachment is obsolete: true
Attachment #8598791 - Attachment is obsolete: true
Attachment #8598792 - Attachment is obsolete: true
Milan, Mason and Michael,

There is another solution for vsync control problem. If we can call hwc::prepare() and set() earlier, there is no vsync issue at n5-l device. B2G boot animation will do this.
We don't always have boot animation zip file in device, so I create a solid frame to show if we can't play the bootAnim(attachment 8602108 [details] [diff] [review]).

Mason, what do you think about this?
Flags: needinfo?(mchang)
Attachment #8600868 - Attachment is obsolete: true
Rendering early isn't desirable because there might be something displayed already that we'll wipe out. That being said, this is an edge case and having working vsync is more valuable than covering this edge case.
Comment on attachment 8602108 [details] [diff] [review]
create a solid color frame if there is no bootAnimation. v1

Review of attachment 8602108 [details] [diff] [review]:
-----------------------------------------------------------------

I'm ok with this. Especially if most user facing builds actually have a boot animation, then this isn't a problem. I just want to make sure that if we fake a boot animation, vsync actually becomes reliable. I'm also not qualified to review this code, so please ask from :mwu I think.

::: widget/gonk/libdisplay/BootAnimation.cpp
@@ +660,5 @@
> +    }
> +    const gralloc_module_t* grallocModule = reinterpret_cast<const gralloc_module_t*>(module);
> +
> +    if (!PlayAnim(display, grallocModule)) {
> +        // If we haven't play bootAnim, show a solid color here.

nit: Add a comment as to why we need this and reference bug 1155797.
Flags: needinfo?(mchang)
I would like to describe why I try to show the bootAnimation here.

In nexus5-l video driver, the hwc::eventControl() needs add_vsync_handler and remove_vsync_handler handler ready in mdss_mdp_overlay_vsync_ctrl(). These two handlers are initialized in mdss_mdp_ctl_start() through MSMFB_OVERLAY_PLAY ioctl command. And hwc::prepare() hwc::set() will send MSMFB_OVERLAY_PLAY commend to kernel. If we can set up bootAnimation, b2g could call hwc::prepare() and hwc::set() much earlier.
Please check comment 4 and comment 8.

Our bootAnim is before the first vsync control call in gfxAndroidPlatform::CreateHardwareVsyncSource(). So the control call will not failed after that.

I have tested at n5-l manually, and I don't see the failed message. I will request some QA guys to confirm that.
Attachment #8602108 - Attachment is obsolete: true
Attachment #8602529 - Flags: review?(mwu)
Comment on attachment 8602111 [details] [diff] [review]
P4: turn on vsync for kk and upon. v2

If we already set up the bootAnim, we will not get vsync control failed in gfxAndroidPlatform::CreateHardwareVsyncSource(). Then, we can use vsync at n5-l device.
Attachment #8602111 - Attachment description: turn on vsync for kk and upon. v2 → P4: turn on vsync for kk and upon. v2
Attachment #8602111 - Flags: review?(bugmail.mozilla)
Comment on attachment 8602111 [details] [diff] [review]
P4: turn on vsync for kk and upon. v2

Review of attachment 8602111 [details] [diff] [review]:
-----------------------------------------------------------------

mchang is probably a better reviewer for this. I've lost track of all the things it's disabled on and why.

::: gfx/thebes/gfxAndroidPlatform.cpp
@@ +491,5 @@
>      // Jelly Bean has inaccurate hardware vsync so disable on JB
>      // Android pre-JB doesn't have hardware vsync
>      // Once L HwcComposer issues have been resolved, re-enable for L devices
>      // L is andriod version 21, Kit-kat is 19, 20 is kit-kat for wearables
> +#if defined(MOZ_WIDGET_GONK) && (ANDROID_VERSION >= 19)

According to the comment just above this (which also needs updating) we shouldn't be enabling it on ANDROID_VERSION == 20 because that's for wearables. So this should be ... && (ANDROID_VERSION == 19 || ANDROID_VERSION == 21).

::: widget/gonk/HwcComposer2D.cpp
@@ +180,5 @@
>  
>  bool
>  HwcComposer2D::EnableVsync(bool aEnable)
>  {
> +    // Only support hardware vsync on kitkat and upon due to inaccurate timings

s/upon/up/

@@ +213,5 @@
>      // Disable Vsync first, and then register callback functions.
>      device->eventControl(device, HWC_DISPLAY_PRIMARY, HWC_EVENT_VSYNC, false);
>      device->registerProcs(device, &sHWCProcs);
>  
> +    // Only support hardware vsync on kitkat and upon due to inaccurate timings

ditto

@@ +215,5 @@
>      device->registerProcs(device, &sHWCProcs);
>  
> +    // Only support hardware vsync on kitkat and upon due to inaccurate timings
> +    // with JellyBean.
> +#if ANDROID_VERSION >= 19

Should this be ANDROID_VERSION == 19 || ANDROID_VERSION == 21?

@@ +226,5 @@
>  
>  void
>  HwcComposer2D::Vsync(int aDisplay, nsecs_t aVsyncTimestamp)
>  {
> +    // Only support hardware vsync on kitkat and upon due to inaccurate timings

ditto
Attachment #8602111 - Flags: review?(bugmail.mozilla) → review?(mchang)
Comment on attachment 8602111 [details] [diff] [review]
P4: turn on vsync for kk and upon. v2

Review of attachment 8602111 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxAndroidPlatform.cpp
@@ +491,5 @@
>      // Jelly Bean has inaccurate hardware vsync so disable on JB
>      // Android pre-JB doesn't have hardware vsync
>      // Once L HwcComposer issues have been resolved, re-enable for L devices
>      // L is andriod version 21, Kit-kat is 19, 20 is kit-kat for wearables
> +#if defined(MOZ_WIDGET_GONK) && (ANDROID_VERSION >= 19)

I agree the comment needs updating, I'm not sure I agree that we shouldn't enable it on ANDROID_VERSION==20. If we do have a wearable (which I don't think we have), I'd like the default to be vsync and we explicitly disable if we find problems.

::: widget/gonk/HwcComposer2D.cpp
@@ +233,3 @@
>      TimeStamp vsyncTime = mozilla::TimeStamp::FromSystemTime(aVsyncTimestamp);
>      gfxPlatform::GetPlatform()->GetHardwareVsync()->GetGlobalDisplay().NotifyVsync(vsyncTime);
> +#endif

nit: #else MOZ_ASSERT(false) / NS_WARNING. We shouldn't be getting here without hardware vsync.
Attachment #8602111 - Flags: review?(mchang) → feedback+
Jerry,

Your latest boot animation patches also fix VSync for the CAF v2.2 prototype device. Can we get this fix in the v2.2 branch too please?
blocking-b2g: --- → 2.2?
Flags: needinfo?(dwilson)
Comment on attachment 8602534 [details] [diff] [review]
P3: show a solid color frame if we don't have the bootAnim. v1

Review of attachment 8602534 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/libdisplay/BootAnimation.cpp
@@ +671,5 @@
> +        png_color_16 color;
> +        // The b2g firefox bootAnim bg color.
> +        color.red = 20;
> +        color.green = 132;
> +        color.blue = 210;

I think trying to set this specific color is causing things to be more complicated than necessary. Just set the color to black and don't worry about the exact pixel format.
Attachment #8602534 - Flags: review?(mwu)
Attachment #8602111 - Attachment is obsolete: true
Attachment #8603319 - Flags: review?(mchang)
Just use a black frame if we don't have bootAnim zip file.

If we just use black frame, should we need P1 patch?
Attachment #8602534 - Attachment is obsolete: true
Attachment #8603324 - Flags: review?(mwu)
Attachment #8603319 - Flags: review?(mchang) → review+
Comment on attachment 8603324 [details] [diff] [review]
P3: show a solid color frame if we don't have the bootAnim. v2

Review of attachment 8603324 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/libdisplay/BootAnimation.cpp
@@ +661,5 @@
> +        return nullptr;
> +    }
> +    const gralloc_module_t* grallocModule = reinterpret_cast<const gralloc_module_t*>(module);
> +
> +    if (!PlayAnim(display, grallocModule, format)) {

Instead of wrapping the animation thread and putting the fallback here, make this fallback a helper function and call it from the animation code.

@@ +685,5 @@
> +                                 GRALLOC_USAGE_HW_FB,
> +                                 0, 0, buffer->width, buffer->height,
> +                                 &mappedAddress)) {
> +            wchar_t backgroundColor = AsBackgroundFill(color, format);
> +            wmemset((wchar_t*)mappedAddress,

We're just setting this to black, so we don't need AsBackgroundFill. Just memset the whole buffer to zero.
Attachment #8603324 - Flags: review?(mwu)
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #58)
> If we just use black frame, should we need P1 patch?

No. But that can be filed as a separate bug and landed if you think it's right.
Whiteboard: [CR 834841]
Whiteboard: [CR 834841] → [caf priority: p2][CR 834841]
Attachment #8602529 - Attachment is obsolete: true
Attachment #8603324 - Attachment is obsolete: true
Attachment #8602529 - Flags: review?(mwu)
Attachment #8603800 - Flags: review?(mwu)
blocking-b2g: 2.2? → 2.2+
Comment on attachment 8603800 [details] [diff] [review]
P3: show a black solid color frame if we don't have the bootAnim. v3

Review of attachment 8603800 [details] [diff] [review]:
-----------------------------------------------------------------

Much simplified - thanks.

::: widget/gonk/libdisplay/BootAnimation.cpp
@@ +477,5 @@
> +{
> +    LOGW("Show solid color frame for bootAnim");
> +
> +    ANativeWindowBuffer *buffer = aDisplay->DequeueBuffer();
> +    void* mappedAddress = nullptr;

nit: pick a side for the * to be on and stick with it. I think this file mostly prefers the right side.

@@ +684,5 @@
>              usleep(frameDelayUs * part.pause);
>          }
>      }
>  
> +    if(!animPlayed) {

nit: space after if
Attachment #8603800 - Flags: review?(mwu) → review+
Comment on attachment 8602530 [details] [diff] [review]
P2: extract format BPP util function. v1

Review of attachment 8602530 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gonk/libdisplay/BootAnimation.cpp
@@ +291,5 @@
> +{
> +    uint16_t bpp = 0;
> +
> +    switch (aFormat) {
> +        case HAL_PIXEL_FORMAT_BGRA_8888:

nit: don't indent case labels
Attachment #8602530 - Flags: review?(mwu) → review+
fix indent.
Attachment #8602530 - Attachment is obsolete: true
Attachment #8603800 - Attachment is obsolete: true
Attachment #8603319 - Attachment description: P4: turn on vsync for kk, l and up. v3 → P3: turn on vsync for kk, l and up. v3
Attachment #8603319 - Attachment is obsolete: false
update commit comment.
Attachment #8603319 - Attachment is obsolete: true
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #68)
> Please and p1~p3 to m-c.
s/and/land/
Needs rebasing.
Keywords: checkin-needed
Attachment #8605623 - Attachment is obsolete: true
Attachment #8605626 - Attachment is obsolete: true
Attachment #8605708 - Attachment is obsolete: true
Flags: needinfo?(ryanvm)
Update b2g.js for v2.2
Attachment #8607392 - Flags: review?(mchang)
Comment on attachment 8606790 [details] [diff] [review]
P1: extract format BPP util function. r=mwu

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
Bug 1149480. N5-L, L-mr1 and CAF v2.2 device(comment 55) can't init vsync in CreateHardwareVsyncSource().

User impact if declined:
Init vsync failed in CreateHardwareVsyncSource(), then b2g falls back to software vsync.

Testing completed:
Test locally at m-c and v2.2.

Risk to taking this patch (and alternatives if risky):
low.
This patch just show a solid color frame if we don't have bootAnimation zip file in device. With this, we can turn on vsync later. Please check comment comment 45.

String or UUID changes made by this patch:
none.
Attachment #8606790 - Flags: approval-mozilla-b2g37?
Comment on attachment 8606791 [details] [diff] [review]
P2: show a black solid color frame if we don't have the bootAnim. r=mwu

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
Bug 1149480. N5-L, L-mr1 and CAF v2.2 device(comment 55) can't init vsync in CreateHardwareVsyncSource().

User impact if declined:
Init vsync failed in CreateHardwareVsyncSource(), then b2g falls back to software vsync.

Testing completed:
Test locally at m-c and v2.2.

Risk to taking this patch (and alternatives if risky):
Low.
This patch just show a solid color frame if we don't have bootAnimation zip file in device. With this, we can turn on vsync later. Please check comment comment 45.

String or UUID changes made by this patch:
None.
Attachment #8606791 - Flags: approval-mozilla-b2g37?
Attachment #8606790 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Attachment #8606791 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Keywords: verifyme
Attachment #8607392 - Flags: review?(mchang) → review+
Comment on attachment 8607392 [details] [diff] [review]
P3: turn on vsync for kk, l and up for v2.2. v1


[Approval Request Comment]
Bug caused by (feature/regressing bug #):
Bug 1149480. N5-L, L-mr1 and CAF v2.2 device(comment 55) can't init vsync in CreateHardwareVsyncSource().

User impact if declined:
This patch try to turn on vsync at l and l-mr1 device. Without this, n5-l will still use software vsync for v2.2. 
We had turned on vsync before, but we disable it due to some init problem.

Testing completed:
Test locally at m-c and v2.2.

Risk to taking this patch (and alternatives if risky):
High.
This patch try to turn on vsync for l and l-mr1 device. Flame-kk already has vsync for v2.2 and m-c. This patch will turn on vsync as flame-kk.

String or UUID changes made by this patch:
None.
Attachment #8607392 - Flags: approval-mozilla-b2g37?
I have used below STR to verify this bug on master branch, and will get below errors. During this test, device can be used but slowly; home button also works correctly; scrolling at Homescreen is also fine. There is no error found at bug 1147753.

* STR:
1. reboot N5
2. open several apps, and camera
3. use camera to record video with flashlight
4. keep zoom in/out during recording to increase CPU temperature
5. after recording, playing that video with screen always on (not to sleep mode)

* Error msg:
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu2
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu3
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu1
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu2
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu3
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu1
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu2
E/MP-Decision(  863): Error 1 setting online status to 1 forcpu3


* Test env:
Build ID               20150520162503
Gaia Revision          bc42fbc12d622bffd7e8afcb8d56f8a1d9773c60
Gaia Date              2015-05-20 22:32:56
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/8663598512f7
Gecko Version          37.0
Device Name            hammerhead
Firmware(Release)      5.1
Firmware(Incremental)  eng.cltbld.20150520.193900
Firmware Date          Wed May 20 19:39:15 EDT 2015
Bootloader             HHZ12f
Attachment #8607392 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Hi Hermes,
Thanks for verifying on master.
Please help to verify on 2.2 after patch landed. Thanks!
Flags: needinfo?(hcheng)
Josh, I will take PTO next week. Could you ask Marigold for help?


* STR:
1. reboot N5 (after reboot, use two terminal to catch logcat)
   - terminal A: $ adb logcat | grep vsync
   - terminal B: $ adb logcat | grep cpu
2. open several apps, and camera
3. use camera to record video with flashlight
4. keep zoom in/out during recording to increase CPU temperature. After 2-3 mins, terminal B would get below errors:
   - E/MP-Decision(  863): Error 1 setting online status to 1 forcpu1
5. after recording 10 mins, playing that video with screen always on (not to sleep mode)


* Expected results:
1. Terminal A "cannot" get below error
   -  "Error enabling gonk vsync. Falling back to software vsync"
2. After terminal B get error at step 4, the device should be still workable but maybe a little slow.
   Besides, no errors mentioned at bug 1147753 comment 0 happen.
   a. screen flashing (at Lockscreen, at Homescreen, at some apps)
   b. blank screen only has background at Homescreen
   c. Homescreen only has app names but no app icons
   d. got stuck at Card View
Flags: needinfo?(hcheng) → needinfo?(jocheng)
Thanks for testing! Does the forcecpu issue from comment 82 also occur without this patch? As in, does a nightly from May 17th, 2015 also show this issue? I just want to make sure that these patches in this bug didn't make it worse. From my conversations with Jerry, if a Nexus 5 was left on for long periods of time, the CPU temperature mechanism would shut off CPUs.
Flags: needinfo?(hcheng)
(In reply to Mason Chang [:mchang] from comment #85)
> Thanks for testing! Does the forcecpu issue from comment 82 also occur
> without this patch? As in, does a nightly from May 17th, 2015 also show this
> issue? I just want to make sure that these patches in this bug didn't make
> it worse. From my conversations with Jerry, if a Nexus 5 was left on for
> long periods of time, the CPU temperature mechanism would shut off CPUs.

Yes, this error had shown when I reported bug 1147753 (https://bugzilla.mozilla.org/attachment.cgi?id=8583612).
Flags: needinfo?(hcheng)
Hi Hermes,

Could you try to turn off these 4 prefs and test comment 82 for v2.2 again?
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/file/a1e8f172523d/b2g/app/b2g.js#l1109
Then we can see the temperature different between vsync on and off.

Nightly from 2015/5/17 is the m-c version without vsync on.
https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-nexus-5-l-eng/2015/05/2015-05-17-16-02-01/
Nightly from 2015/5/18 is the first version with vsync on at m-c.
https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-nexus-5-l-eng/2015/05/2015-05-18-16-02-01/

We want to make sure that these patches in this bug didn't make the temperature problem worse
Flags: needinfo?(hcheng)
@Jerry, as we discussed offline, could you provide v2.2 build id for on/off testing?
Flags: needinfo?(hcheng) → needinfo?(hshih)
@Jerry,

My STR:
1. restart phone
2. start timer when Firefox OS icon display
3. unlock phone, and launch Camera at Homescreen
4. start record with flashlight on
5. keep zoom in and out

The CPU error occurs at about 50 seconds for both builds. So, I cannot see too much impact for now.
E/MP-Decision(  870): Error 1 setting online status to 1 forcpu1
Flags: needinfo?(hshih)
Flags: needinfo?(hshih)
Hi Norry,
Please help to verify on 2.2 with STR from comment 84.
Thanks!
Flags: needinfo?(jocheng) → needinfo?(fan.luo)
This bug has been verified as pass on latest build of Flame v2.2, Nexus5 v2.2, by the STR in Comment 84 
Actually Result:
N5 v2.2:
1. Terminal A "cannot" get error
2. After terminal B get error at step 4, no errors mentioned at bug 1147753 comment 0 happen.
  
Flame v2.2 :
1. Terminal A "cannot" get error
2. Terminal B hadn't get any errors and no errors mentioned at bug 1147753 comment 0 happen.
 
Reproduce rate:0/5
See attachment:Verify_2.2.3gp and Verify_2.2.jpg

Device:Flame v2.2 build(pass)
Build ID               20150527162504
Gaia Revision          0f162853ae6a3e2b05164878e49276ead0d6f09c
Gaia Date              2015-05-27 22:50:51
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/f2efe1d20e12
Gecko Version          37.0
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150527.195632
Firmware Date          Wed May 27 19:56:46 EDT 2015
Bootloader             L1TC000118D0

Device:Nexus 5 v2.2 build(pass)
Build ID               20150527162504
Gaia Revision          0f162853ae6a3e2b05164878e49276ead0d6f09c
Gaia Date              2015-05-27 22:50:51
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/f2efe1d20e12
Gecko Version          37.0
Device Name            hammerhead
Firmware(Release)      5.1
Firmware(Incremental)  eng.cltbld.20150527.195418
Firmware Date          Wed May 27 19:54:38 EDT 2015
Bootloader             HHZ11k
Flags: needinfo?(fan.luo)
Attached video Verify_2.2.3gp
Attached image Verify_2.2.jpg
jerry, is there a summary of why vsync is unreliable on L devices? I am wondering if threading difference also caused the problem. On android, hwc hal is always controlled from SurfaceFlinger thread. But on b2g gonk, main thread and compositor thread access to hwc hal.
Flags: needinfo?(hshih)
QA Whiteboard: [MGSEI-Triage+]
(In reply to Sotaro Ikeda [:sotaro] from comment #95)
> jerry, is there a summary of why vsync is unreliable on L devices? I am
> wondering if threading difference also caused the problem. On android, hwc
> hal is always controlled from SurfaceFlinger thread. But on b2g gonk, main
> thread and compositor thread access to hwc hal.

Hi Sotaro,

I don't know other platform's implementation.
For nexus-5-l or kk's hwc, mdss_mdp_ctl_start_sub() should be called before vsync enable/disable. Before mdss_mdp_ctl_start_sub(), all Hwc::eventControl() will get EOPNOTSUPP error. That's why we think the vsync is not unreliable on l device.
At flame-kk device, we have a totally different path for vsync eventControl. We can use Hwc::eventControl() at very early stage unlike nexus-5-l.
So the key function at nexus-5 device is the mdss_mdp_ctl_start_sub() call. This function is called implicitly in hwc::set() hwc::prepare(). Sometimes, we don't have bootAnimation.zip file in device. In this case, we will not call prepare() and set() for bootAnim and get error call in the first Hwc::eventControl() call at [2].
I don't know the hwc::set() hwc::prepare() function's thread module restriction. We always call that in compositor thread except the bootAnim case. The vsync eventControl() should be called at any thread. That's showed in android's header. So I think that it's not related to our thread model for hwc.

The cpu error message in comment 0 is due to a vsync control bug. I have a simple diagram to show that. Please check [1].
  E/MP-Decision(  860): Error 1 setting online status to 1 forcpu3

Is it clear for you?

[1]
https://bugzilla.mozilla.org/show_bug.cgi?id=1155797#c22
[2]
https://hg.mozilla.org/mozilla-central/annotate/45a4d6336c73/gfx/thebes/gfxAndroidPlatform.cpp#l499
Flags: needinfo?(hshih) → needinfo?(sotaro.ikeda.g)
Flags: needinfo?(sotaro.ikeda.g)
You need to log in before you can comment on or make changes to this bug.