Closed
Bug 1110960
Opened 10 years ago
Closed 10 years ago
Device slowing down after more than one day of uptime
Categories
(Firefox OS Graveyard :: Gaia::System::Lockscreen, defect)
Tracking
(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.0M fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)
People
(Reporter: gerard-majax, Assigned: gweng)
References
Details
(Keywords: regression)
Attachments
(4 files)
This is reproduced consistently since a couple of weeks at least. I cannot provide any STR at all other than "use the device". Reproduced on v2.0 on a v188 base system (OTA from Mozilla).
Symptoms: device getting sluggish.
|adb shell top -m 5 -d 1| shows that B2G main process is around 50%.
Reporter | ||
Comment 1•10 years ago
|
||
This profile shows that B2G is spending quite a lot of time in LockScreenWindow.lockOrientation().
I cannot tell for sure that it's the reason for the slow down, but I find it suspicious enough.
Flags: needinfo?(gweng)
Assignee | ||
Comment 2•10 years ago
|
||
I think if the only STR is using the phone, we should make a test that disable the orientation call, and put it one day long to see if the device become slow. By the way, how you used the phone? If I only left it alone and do not manipulate it, would it become slow, too? Or I must use it as a daily phone, like twittering, browsering and dialing, like what we would do on our phone?
Flags: needinfo?(gweng) → needinfo?(lissyx+mozillians)
Reporter | ||
Comment 3•10 years ago
|
||
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #2)
> I think if the only STR is using the phone, we should make a test that
> disable the orientation call, and put it one day long to see if the device
> become slow. By the way, how you used the phone? If I only left it alone and
> do not manipulate it, would it become slow, too? Or I must use it as a daily
> phone, like twittering, browsering and dialing, like what we would do on our
> phone?
This is a dogfooding device, so its usage is quite straightforward:
- phone calls
- sms/mms
- radio fm
- taking pictures, browsing gallery
Letting the device rot by itself is not an option. Did you had a look at the profile ? Why would lockOrientation makes calls to mozApps ?
Flags: needinfo?(lissyx+mozillians) → needinfo?(gweng)
Assignee | ||
Comment 4•10 years ago
|
||
Yes I've viewed the log with Cleopatra. I just want to make sure it's the root cause as you said, with a real device and real case. Since if I need to pin down the root cause of course I need to make sure how to reproduce it by myself.
And another thing I notice is you mentioned this affects since 2.0, this means it's possible a regression of Bug 1089529, that Alive and I encountered a case we need to workaround with the current Orientation API. I can revert it to see if your symptom would happen again. If not, we need a QA help to find regression window, since the bug is the only one I suspect related.
And I don't understand what you mean that "lockOrientation makes calls to mozApps"? The function only call lock orientation API to do things.
Flags: needinfo?(gweng)
Assignee | ||
Comment 5•10 years ago
|
||
OK. Now I've two Flames: one nullized the function and one does have the function, and I added some log to see what happen. I would try to perform some user actions later to see if I can reproduce it, and if I revert the patch the symptom would disappear.
Reporter | ||
Comment 6•10 years ago
|
||
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #4)
> Yes I've viewed the log with Cleopatra. I just want to make sure it's the
> root cause as you said, with a real device and real case. Since if I need to
> pin down the root cause of course I need to make sure how to reproduce it by
> myself.
Of course, but sadly we have not yet been able to identify proper STR :(
>
> And another thing I notice is you mentioned this affects since 2.0, this
> means it's possible a regression of Bug 1089529, that Alive and I
> encountered a case we need to workaround with the current Orientation API. I
> can revert it to see if your symptom would happen again. If not, we need a
> QA help to find regression window, since the bug is the only one I suspect
> related.
Yes, this is the bug I have spotted also. FYI, the date this bug landed would roughly match the first time the slowdown was experienced by my girlfriend dogfooding this device. Nothing 100% sure, but the timeframe do match.
>
> And I don't understand what you mean that "lockOrientation makes calls to
> mozApps"? The function only call lock orientation API to do things.
Ok, I'll have a look at the Gecko-side of this to see why.
Reporter | ||
Comment 7•10 years ago
|
||
Ok, in the mozLockOrientation() implementation, there's a call to permission checking: http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/dom/base/nsScreen.cpp#269
This does a check of the app status (installed or not): http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/dom/base/nsScreen.cpp#216
Then at some point, there's probably a call made to http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/caps/src/nsScriptSecurityManager.cpp#275 whichs tries to get the local app id at http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/caps/src/nsScriptSecurityManager.cpp#293
Assignee | ||
Comment 8•10 years ago
|
||
So I think there are some possible issues with the codes:
1. If I handle the ID in a wrong way, the setInterval may not stop and therefore we have a performance trouble. However, according to my previous log & test, the interval callback was cleared as the design. So maybe this is not the issue, but I would check it again.
2. If the slowdown happen because the call of the API, the phone would not be slowdown without unlock & lock. And I can guess this can lead to a new (shorter) STR that try to unlock & lock multiple times, the phone should be slowdown significantly. However, I need to do some test about it.
Assignee | ||
Comment 9•10 years ago
|
||
The reason is the set interval is too fast, which would set a new one before we clear the old one. I didn't realize this is because I set a slower value in my WIP patch. From the test logs I believe this can solve the issue.
I would submit other patches for other branches later.
Attachment #8536326 -
Flags: review?(alive)
Assignee | ||
Comment 10•10 years ago
|
||
For master. The content is the same.
Assignee | ||
Comment 11•10 years ago
|
||
Patch v2.1. The content is the same.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → gweng
Assignee | ||
Updated•10 years ago
|
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → affected
status-b2g-v2.2:
--- → affected
Keywords: regression
Comment 12•10 years ago
|
||
Comment on attachment 8536326 [details] [review]
Patch v2.0
Test please (lockOrientation() two times should not setInterval twice)
Also, I am curious if turn screen on -> off quickly, do we still try to lock the orientation? Do we need to clear the interval if screen is off and try lock again later when screen is on?
Attachment #8536326 -
Flags: review?(alive) → feedback+
Assignee | ||
Comment 13•10 years ago
|
||
I've tested the case you mentioned: it would set and clear immediately, and there is no leaking. That is, every time the screen get off and on, the set and clear pair would do the lock work.
Assignee | ||
Comment 14•10 years ago
|
||
Comment on attachment 8536326 [details] [review]
Patch v2.0
Updated. Would update other versions after this is r+.
Attachment #8536326 -
Flags: review?(alive)
Reporter | ||
Comment 15•10 years ago
|
||
Have you been able to expose cases where things may go wrong?
I can't help much more sadly, yet, what I know is that this device has been running fine and started to get crazy like this just a couple of weeks ago, with the symptoms documented in comment 0. And the profile I captured was when it was getting slow. There was just 5-10 secs at most between the time I started profiling and when I did the capture.
Flags: needinfo?(gweng)
Assignee | ||
Comment 16•10 years ago
|
||
?? As I said at Comment 9, I thought that I pin down the root cause and the patch should fix it. If you want to test it before the patch land, you can take the v2.0 patch that I just updated.
Flags: needinfo?(gweng)
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Greg Weng [:snowmantw][:gweng][:λ] from comment #16)
> ?? As I said at Comment 9, I thought that I pin down the root cause and the
> patch should fix it. If you want to test it before the patch land, you can
> take the v2.0 patch that I just updated.
Ok, that's fine and that answers my question: we should keep an eye on this to make sure this was the root cause :)
Comment 18•10 years ago
|
||
Comment on attachment 8536326 [details] [review]
Patch v2.0
r=me
Attachment #8536326 -
Flags: review?(alive) → review+
Assignee | ||
Comment 19•10 years ago
|
||
OK I've updated all patches. Now waiting CI results.
Assignee | ||
Comment 20•10 years ago
|
||
We've encountered CI troubles... need manually re-trigger it. I'll keep to try to make them with clear results.
Assignee | ||
Comment 21•10 years ago
|
||
Current result:
v2.0: with one irrelevant Gij failure at Calendar test, re-triggering.
master: all green.
v2.1: TBPL hook didn't trigger it...I'm now closing & opening to see if the test would be triggered.
As a conclusion, I'll land the master patch first, and to see when the other two patches can be ready for submitting approvals.
Assignee | ||
Comment 22•10 years ago
|
||
TBPL result for master patch:
https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=d554340b50d1
Assignee | ||
Comment 23•10 years ago
|
||
master: https://github.com/mozilla-b2g/gaia/commit/9471af2ff4e995a0d9785ced2073c8fa9c8c357c
Since I landed the master patch, I now set it as RESOLVED FIXED.
Assignee | ||
Comment 24•10 years ago
|
||
OK, now TBPL is testing the v2.1 patch.
Assignee | ||
Comment 25•10 years ago
|
||
I re-triggered the jobs failed at B2G Desktop OS X opt for v2.1 patch, and would wait the result. Although I wonder the failures are not from my patch.
Assignee | ||
Comment 26•10 years ago
|
||
After re-ran it the v2.0 patch is green at TBPL:
https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=cffe717b08e7
So I would land it.
Assignee | ||
Comment 27•10 years ago
|
||
Ahhh sorry I forgot to submit the approval! I now revert it...
Assignee | ||
Comment 28•10 years ago
|
||
Comment on attachment 8536326 [details] [review]
Patch v2.0
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): Bug 1089529
[User impact] if declined: Slowing down continues
[Testing completed]: https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=cffe717b08e7
[Risk to taking this patch] (and alternatives if risky): No
[String changes made]: No
Attachment #8536326 -
Flags: approval-gaia-v2.0?(bbajaj)
Assignee | ||
Comment 29•10 years ago
|
||
Comment on attachment 8536403 [details] [review]
Patch v2.1
[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): Bug 1089529
[User impact] if declined: Slowing down continues
[Testing completed]: With some intermittent failures that are all green after re-runs: https://treeherder.mozilla.org/ui/#/jobs?repo=gaia-try&revision=7ceb1a1b7413
[Risk to taking this patch] (and alternatives if risky): No
[String changes made]: No
Attachment #8536403 -
Flags: approval-gaia-v2.1?(bbajaj)
Comment 30•10 years ago
|
||
I also noticed this issue on FxOS 2.0, both on a Peak and two ZTE Open C devices.
I suspect it might the cause of many slowdowns I encountered.
gerard-majax, maybe it would be worth waiting for this fix to land in the 2.0 branch before releasing a new OTA version for 2.0 in the ZTE Open C community builds?
Comment 31•10 years ago
|
||
Comment on attachment 8536326 [details] [review]
Patch v2.0
Approving, given this is a regression we introduced and a bad one for dogfooders.
Alive/Greg, Thanks for the tests, if there is any additional testing that QA can do here to verify this please add "verifyme" and NI Brian Hunag for help.
Attachment #8536326 -
Flags: approval-gaia-v2.0?(bbajaj) → approval-gaia-v2.0+
Updated•10 years ago
|
Attachment #8536403 -
Flags: approval-gaia-v2.1?(bbajaj) → approval-gaia-v2.1+
Comment 32•10 years ago
|
||
v2.1: https://github.com/mozilla-b2g/gaia/commit/e5c7022367308e90bbbd549fb5a497b7fca6032e
v2.0: https://github.com/mozilla-b2g/gaia/commit/135998648c16d42f7f2f2dc99029553d5ec4db51
Target Milestone: --- → 2.2 S2 (19dec)
Updated•10 years ago
|
status-b2g-v2.0M:
--- → fixed
Reporter | ||
Comment 33•10 years ago
|
||
Applied the update as soon as it was containing the fix. Device is well responsive after nearly three days:
> $ adb shell uptime
> up time: 2 days, 23:34:54, idle time: 05:38:31, sleep time: 2 days, 16:56:35
I'll keep an eye on this, but it looks good, thanks for fixing it!
You need to log in
before you can comment on or make changes to this bug.
Description
•