Closed Bug 1018836 Opened 10 years ago Closed 10 years ago

Lockscreen fails to respond for a few seconds

Categories

(Firefox OS Graveyard :: Gaia::System::Window Mgmt, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed)

RESOLVED FIXED
2.0 S3 (6june)
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- fixed

People

(Reporter: ladamski, Assigned: alive)

References

()

Details

(Keywords: perf, regression, Whiteboard: [c=effect p= s=2014.06.06.t u=2.0] [systemsfe])

Attachments

(4 files, 2 obsolete files)

Flame 2.0 20140601 (10G-2 base)

When trying to unlock the phone, the lockscreen is unresponsive for 2-3 seconds.  Sometimes it doesn't even render the lockscreen at all and you see just see the translucent overlay and background during that time.
Using the Flame and:

Gaia   82679a5ce84d1b6bf388da6536d5682a3ad56de3
SourceStamp 6a984e21c2ca
BuildID 20140602072051
Version 32.0a1
10-G base

I still see the issue with the lockscreen being sluggish to open, although I haven't seen it not rendering the lockscreen at all. Adding the appropriate keywords to nail down when this regression started.
Thats a profile:
http://people.mozilla.org/~bgirard/cleopatra/#report=4008a136ed93e986d05a173235a06c09ddb1d046

We see 3 touch events where I try to use the slider but nothing happens.
Afterwards the transition to the homescreen is very slow.
I've bisect to hunt the similar symptom, which is the LockScreen would show a semi-transparent background of the homescreen. And here is the result:

450feb56670368032aafcca287418137d874b53e is the first bad commit
commit 450feb56670368032aafcca287418137d874b53e
Author: Alive Kuo <alegnadise@gmail.com>
Date:   Wed Apr 16 15:56:03 2014 +0800

    Bug 950673 - Delay animationend event until active app loaded.


But I don't know whether this is the same bug. I'll do more research on it.
Attached file Similar symptom (obsolete) —
Attachment #8433024 - Attachment is obsolete: true
Attached image Similar symptom
This is what I've bisected.
NI Alive because of the regression. Maybe we need a back out, because this is a regression.
Flags: needinfo?(alive)
Oh, and another reason I did the NI is because it now would make a flashing before the LockScreen got shown, and I don't know whether it's a correct way to show the LockScreen (which also 'hide' the semi-transparent homescreen, if it still exists), or it's just another bug (which means a bug hide another bug's symptom, and this means we'll have two bugs need to be solved).
blocking-b2g: --- → 2.0+
Thanks Greg for the analysis! Please back out the patch that causes the regression if we can't find an immediate fix here.
Is this going to be needed on 2.0 after bug 950673 lands on 1.4? If so, please nom this for 1.4?
I've bisect that the blinking, and the responding delay was caused by

2aa5680b7013157ee1911cee4cefd7bec4151e91 is the first bad commit
commit 2aa5680b7013157ee1911cee4cefd7bec4151e91
Author: Etienne Segonzac <etienne@segonzac.info>
Date:   Wed May 28 12:53:15 2014 +0200

    Bug 1016483 - Stop doing the expensive #screen.edges restyle

Before it it's the old issue that there is a layout, which would show the semi-transparent homescreen before the window got opened, and it got disappeared after the patch landed, with the blinking effect and the responding delay. I'll take a video to show that and NI the author.
Flags: needinfo?(etienne)
Blocks: 1016483
No longer blocks: 950673
I've uploaded the video. There are two problems:

1. Flicking caused by the patch, or a semi-transparent layout after reverting it
2. Fail to response user after the window got opened
I'm having a look but this bug was filed before bug 1016483 landed, and I also encountered the issue before that...

So I'd like to understand exactly what is causing this and if/why bug 1016483 is making things worth.
I think the semi-transparent layout was not caused by the Bug 1016483, and need to bisect again. However, the "unresponding time" was getting obvious because there is a window opened and it seems already ready to be manipulated, which may be not true when user can see a semi-transparent homescreen before the lockscreen.
If you activate the "Flash repainted area" setting you can see that there is a full screen repaint just before the lockscreen finally becomes responsive.
I might be missing something but why aren't we opening the lockscreen app as soon as the screen is turned off?

This is helping a lot for the unresponsiveness issue, does it have unwanted side effects?
Attachment #8433856 - Flags: feedback?(gweng)
Flags: needinfo?(etienne)
Blocks: 1019456
> I might be missing something but why aren't we opening the lockscreen app as soon as the screen is turned off?

I remembered I did it once but there were some issues pushed me to give it up. I may retry this idea and to see if it's a good idea to solve this.
Well it doesn't help. I've tested to do NO revert and just patch LWM as you suggested, but the unresponding time still occur. There would be two attachments to show that. I think the real problem is something change in the transition control, which now would not open the window after the animation like before (so the user would see an 'untouchable' LockScreen, but these are all my guests).

However I would also need to start to bisect to remove the 'semi-trasparent' homescreen first, which seems relevant with this regression.
I've bisected that the 'semi-transparent' layout is a regression by:

Author: Alive Kuo <alegnadise@gmail.com>
Date:   Wed Apr 16 15:56:03 2014 +0800

    Bug 950673 - Delay animationend event until active app loaded.

As usual, NI the author to discuss how to solve the regression.
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #19)
> Patch in LWM only: http://youtu.be/moI4sMJWLUY
> Revert the patch: http://youtu.be/S8j59iWslro

If you turn on the screen right after you turned it off we don't benefit from loading the lockscreen app early.

But in the real life use case of turning the screen off, putting the phone in your pocket, and turning it on a bit later, then there is no delay.
Comment #16 echos what I was wondering.  FWIW I went back to a 5/29 build and so far haven't run into the issue.
(In reply to Lucas Adamski [:ladamski] (plz needinfo) from comment #22)
> Comment #16 echos what I was wondering.  FWIW I went back to a 5/29 build
> and so far haven't run into the issue.

As I mentioned, it's (according to what I've found) actually caused by two patches. So I don't know whether you mean that both two symptoms are all disappeared or not. I'll give it a try.
(In reply to Etienne Segonzac (:etienne) from comment #21)
> (In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #19)
> > Patch in LWM only: http://youtu.be/moI4sMJWLUY
> > Revert the patch: http://youtu.be/S8j59iWslro
> 
> If you turn on the screen right after you turned it off we don't benefit
> from loading the lockscreen app early.
> 
> But in the real life use case of turning the screen off, putting the phone
> in your pocket, and turning it on a bit later, then there is no delay.

I don't think the point is whether this is a real life case, especially the demo I put is also not a edge case. It's a regression, and what I can do is to find out which bugs cause it, just like people would do for other regression bugs. If I bisect the wrong bugs out, it's welcome to correct me and we can find out the real root cause.
I am moving this is window management because the regression happened on AppWindow and regressed LockScreenWindow.
Component: Gaia::System::Lockscreen → Gaia::System::Window Mgmt
Whiteboard: [systemsfe]
Well, I don't have any quick idea and I am afraid I don't have much time on this right now, but
1. What if the animationend event is passed to lockscreenWindow always without delay?
2. I don't see what's the so-called semi-transparent homescreen during lockscreenWindow is opening..
but if you think something is above lockscreenWindow to block the user to touch it, try to use AppManager to see what it is. Maybe a z-index issue which was not found before.

And for sure if this is blocking something, bug 950673 could be backout-ed. But since we are all working on system app, try to find out the root cause and fix it will benefit than just blaming and throwing the ball back.

I know everyone is busy at this point, thank you all for all kind of effort.
Flags: needinfo?(alive)
No, I don't blame anyone. As I said, I just find out the regression. To back out or not to back out, and even who should fix, is not my main purpose, and it can be discussed with leaders who own the actual resources management duty.

The fact is, which in my opinion it's a very clear, the patch caused the regression, and no one has found it before the landing (no blaming). So we need to kill the regression as soon as possible, rather than to guess why I do this or "throwing the ball back".
And I don't know whether we have clear rules about who should fix the regressions except the author (when the author has no resource), or under what situation we can/can't back them out. In this case, I did avoid to directly back out the patch (as others did on my regressions) because it's a System patch, which may cause other regressions, but that doesn't mean the patch didn't break things (unless I bisect it incorrectly), or the bug doesn't important (it exactly block user experience).
Blocks: 950673
Severity: normal → blocker
Priority: -- → P1
Whiteboard: [systemsfe] → [c=effect p= s= u=2.0] [systemsfe]
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #27)
> No, I don't blame anyone. As I said, I just find out the regression. To back
> out or not to back out, and even who should fix, is not my main purpose, and
> it can be discussed with leaders who own the actual resources management
> duty.
> 
> The fact is, which in my opinion it's a very clear, the patch caused the
> regression, and no one has found it before the landing (no blaming). So we
> need to kill the regression as soon as possible, rather than to guess why I
> do this or "throwing the ball back".


Tim,

Please may a call on whether to back this out. It's confirmed as causing a regression and needs to be fixed or backed out asap. This also needs an assignee.

Thanks,
FxOS Perf Triage
Flags: needinfo?(timdream)
Greg - I'm confused about the bisection discussion above. Which bug is the root cause of this regression?
Flags: needinfo?(gweng)
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #27)
> No, I don't blame anyone. As I said, I just find out the regression. To back
> out or not to back out, and even who should fix, is not my main purpose, and
> it can be discussed with leaders who own the actual resources management
> duty.
> 
> The fact is, which in my opinion it's a very clear, the patch caused the
> regression, and no one has found it before the landing (no blaming). So we
> need to kill the regression as soon as possible, rather than to guess why I
> do this or "throwing the ball back".

Finding a regression range is good, but still less useful that understanding the issue.

The fact is, we open the lockscreen app when turning the screen on, so there's always gonna be a delay.
We can work on making this delay shorter, but a setVisible(true) isn't free and there are many things that can cause the phone CPU to be under stress an make this even slower (check for updates, downloads, calendar syncing...).
The WindowManager can't guarantee an instantaneous app opening, even if it's warm, even if the app isn't OOP, but the LockscreenWindowManager is kind of based on this assumption right now.

When we turn the screen off we know we'll want to display the lockscreen next, why wait?
I think it's another issue as I said: the unresponding time wasn't exist even with the LWM without the pre-launching. However, it suddenly occur after some patches. From my point of view, your suggestion is to hide the symptom, not to eliminate it. In fact, this only leave the issue unsolved. 

I my opinion this bug can be solved by taking both actions:

1. kill the unresponding time and restore to the previous states, unless there is some other concerns
2. test and make sure the pre-launching really works and break nothing

I would need to take a detailed test to make sure the 2. really works. As I said, I remembered that I did it once, and something got broken, so I removed the code.
Flags: needinfo?(gweng)
The bisect shows that

1. The Bug 950673 cause the 'semi-transparent' window before the LockScreen got show

In this case user can't manipulate LockScreen until the window got disappeared.

2. The Bug 1016483 hide it

In this cause user still can't manipulate the LockScreen, but now user can see LockScreen, not the strange window. This make things worse because the user would feel the window now is able to manipulate, even though it actually isn't as the case 1. shows.
(In reply to Etienne Segonzac (:etienne) from comment #31)
> The fact is, we open the lockscreen app when turning the screen on, so
> there's always gonna be a delay.
> We can work on making this delay shorter, but a setVisible(true) isn't free
> and there are many things that can cause the phone CPU to be under stress an
> make this even slower (check for updates, downloads, calendar syncing...).
> The WindowManager can't guarantee an instantaneous app opening, even if it's
> warm, even if the app isn't OOP, but the LockscreenWindowManager is kind of
> based on this assumption right now.

I'd really like this point to be addressed before we start ripping things apart. It's a fundamental issue which has many symptoms.
I suggest to open another bug to land the pre-launching in LWM, it can be a separate bug and be solved while the regression be handling.
I recommend we try to fix the regressions than back out complex patches, in which 2.0 is depend on.
Flags: needinfo?(timdream)
Gregor, could your team address this issue? This is a [systemsfe] bug.
Flags: needinfo?(anygregor)
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #37)
> I recommend we try to fix the regressions than back out complex patches, in
> which 2.0 is depend on.

What patch is so complex that we can't back out? What caused the regression here?
I disagree that we should keep this regression without a clear plan and timeframe to fix it.
Flags: needinfo?(anygregor)
Hey Mason, do you think a profile could help us identify the issue(s) here.

If we're going to workaround the debatable design of the LockScreenWindowManager/LockScreenWindow by making app launch time slower I'd like to know exactly what we need to remove.
Flags: needinfo?(mchang)
(In reply to Etienne Segonzac (:etienne) from comment #40)
> Hey Mason, do you think a profile could help us identify the issue(s) here.
> 
> If we're going to workaround the debatable design of the
> LockScreenWindowManager/LockScreenWindow by making app launch time slower
> I'd like to know exactly what we need to remove.

There is a cleopatra profile in comment 3. Is this what you need?
(In reply to Gregor Wagner [:gwagner] from comment #41)
> (In reply to Etienne Segonzac (:etienne) from comment #40)
> > Hey Mason, do you think a profile could help us identify the issue(s) here.
> > 
> > If we're going to workaround the debatable design of the
> > LockScreenWindowManager/LockScreenWindow by making app launch time slower
> > I'd like to know exactly what we need to remove.
> 
> There is a cleopatra profile in comment 3. Is this what you need?

Thanks Gregor, I might be misreading it (so keeping the ni? on Mason) but it looks like we're doing *nothing* during ~2sec which looks suspiciously similar to the OPENING_TRANSITION_TIMEOUT of the AppTransitionController.

I think we might have a very simple logical bug, I'd like to have a look before we start backing out stuff without understanding the issue.
(In reply to Etienne Segonzac (:etienne) from comment #42)
> (In reply to Gregor Wagner [:gwagner] from comment #41)
> > (In reply to Etienne Segonzac (:etienne) from comment #40)
> > > Hey Mason, do you think a profile could help us identify the issue(s) here.
> > > 
> > > If we're going to workaround the debatable design of the
> > > LockScreenWindowManager/LockScreenWindow by making app launch time slower
> > > I'd like to know exactly what we need to remove.
> > 
> > There is a cleopatra profile in comment 3. Is this what you need?
> 
> Thanks Gregor, I might be misreading it (so keeping the ni? on Mason) but it
> looks like we're doing *nothing* during ~2sec which looks suspiciously
> similar to the OPENING_TRANSITION_TIMEOUT of the AppTransitionController.
> 
> I think we might have a very simple logical bug, I'd like to have a look
> before we start backing out stuff without understanding the issue.

The opening transition for lockscreenWindow is immediate.
I am thinking about: what if we jump from closed to opened rightaway if we see the animation name is immediate?
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #43)
> (In reply to Etienne Segonzac (:etienne) from comment #42)
> > (In reply to Gregor Wagner [:gwagner] from comment #41)
> > > (In reply to Etienne Segonzac (:etienne) from comment #40)
> > > > Hey Mason, do you think a profile could help us identify the issue(s) here.
> > > > 
> > > > If we're going to workaround the debatable design of the
> > > > LockScreenWindowManager/LockScreenWindow by making app launch time slower
> > > > I'd like to know exactly what we need to remove.
> > > 
> > > There is a cleopatra profile in comment 3. Is this what you need?
> > 
> > Thanks Gregor, I might be misreading it (so keeping the ni? on Mason) but it
> > looks like we're doing *nothing* during ~2sec which looks suspiciously
> > similar to the OPENING_TRANSITION_TIMEOUT of the AppTransitionController.
> > 
> > I think we might have a very simple logical bug, I'd like to have a look
> > before we start backing out stuff without understanding the issue.
> 
> The opening transition for lockscreenWindow is immediate.
> I am thinking about: what if we jump from closed to opened rightaway if we
> see the animation name is immediate?

Sounds promising I will try this right away :)
I made a patch before I saw your lasted comment.
I just figure out the stuff prevents us to touch the lockscreen is div.touch_blocker. It's visible during transition opening.
What we should do is just skip the opening/closing state if animation name is immediate.
Set feedback since no tests for now.
Attachment #8434780 - Flags: feedback?(etienne)
Have a patch to fix the root cause, stealing. Making unit tests.
Assignee: nobody → alive
Comment on attachment 8434780 [details]
skip open/close states if animation name is immediate

Love it!
Works like a charm and benefits from the appTransitionController instead of just shortcuting it.

Let's add some quick tests and land it :)
Attachment #8434780 - Flags: feedback?(etienne) → feedback+
patch + simple tests.
Attachment #8434804 - Flags: review?(etienne)
Attachment #8433856 - Attachment is obsolete: true
Attachment #8433856 - Flags: feedback?(gweng)
Flags: needinfo?(mchang)
Comment on attachment 8434804 [details]
https://github.com/mozilla-b2g/gaia/pull/20055/files

All good!
Small question about the fakeTimers removal but travis will tell :)
Attachment #8434804 - Flags: review?(etienne) → review+
https://github.com/mozilla-b2g/gaia/commit/0130f0ebfd3deca45c1d5663332221020e698c57
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Thanks guys for the quick fix!
Depends on: 1021533
Target Milestone: --- → 2.0 S3 (6june)
Whiteboard: [c=effect p= s= u=2.0] [systemsfe] → [c=effect p= s=2014.06.06.t u=2.0] [systemsfe]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: