Closed Bug 980070 Opened 11 years ago Closed 11 years ago

90% CPU load in parent after re-opening message app

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE
1.4 S4 (28mar)

People

(Reporter: zrzut01, Unassigned)

References

Details

(Keywords: perf, power, Whiteboard: [c=power p= s=2014.03.28 u=])

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release) Build ID: 20140212131424 Steps to reproduce: 1. Reboot the device. 2. Check CPU load - it will be about 2-6% for /system/b2g/b2g 3. Try to write message with Messages - keyboard works smoothly 4. Open any web page with use of the Browser or Rocket Bar Actual results: After page was loaded /system/b2g/b2g process consumes 95-99% of CPU continuously. Now keyboard or any activity is laggy. Expected results: /system/b2g/b2g process shouldn't consume as much of CPU power especially if the Browser process is not on foreground.
Found on: Alcatel One Touch Fire production (got from T-mobile Poland) B2G version: 1.4.0.0-prerelease master Platform version: 30.0a1 Build Identifier: 20140305040204 Git commit info: 2014-03-04 17:26:14 6781459a
Before open a page in the Borwser: User 1%, System 2%, IOW 0%, IRQ 0% User 4 + Nice 0 + Sys 7 + Idle 297 + IOW 0 + IRQ 0 + SIRQ 0 = 308 PID PR CPU% S #THR VSS RSS PCY UID Name 545 0 2% R 1 1028K 428K fg root top 140 0 1% S 36 160932K 68060K fg root /system/b2g/b2g 24 0 0% S 1 0K 0K fg root krpcserversd 104 0 0% S 1 0K 0K fg root yaffs-bg-1 4 0 0% S 1 0K 0K fg root kworker/0:0 After open a page in the browser: User 87%, System 12%, IOW 0%, IRQ 0% User 270 + Nice 4 + Sys 39 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 313 PID PR CPU% S #THR VSS RSS PCY UID Name 139 0 98% R 40 166584K 72972K fg root /system/b2g/b2g 429 0 1% R 1 1024K 420K fg root top 499 0 0% R 10 57992K 17208K fg root /system/b2g/plugin-container 111 0 0% S 5 4480K 8K fg root /sbin/adbd 239 0 0% S 1 0K 0K fg root kworker/0:3
After additional tests it looks like STR is even simpler: 1. Reboot the device. 2. Check CPU load - it will be about 2-6% for /system/b2g/b2g 3. Open Messages - threads with messages loads smoothly 4. Close Messages by X button on Task Manager. 5. Open Messages again - this time threads with messages loads noticeably slower and after check CPU load now /system/b2g/b2g process consumes 95-99% of CPU.
Put this in the Browser team so that it's noticed by someone.
blocking-b2g: --- → 1.4?
Component: General → Gaia::Browser
This isn't a browser bug.
Component: Gaia::Browser → General
Keywords: perf
But I always fear that bugs inside "General" are just lost.
Can we get a profile? Mason, can you link your profile guide?
Flags: needinfo?(mchang)
Whiteboard: [systemsfe]
Can QA confirm this?
blocking-b2g: 1.4? → 1.4+
Keywords: qawanted
Comment 3 suggests this is a messages bug.
Component: General → Gaia::SMS
Oh right, I missed this. I also recall that while I was doing some performance test runs, I saw that the time was higher each time. Definitely need a profile here. Gabriele, do you have some time to help here ?
Flags: needinfo?(gsvelto)
Sure, I'll do it ASAP. Will clear the needinfo once I have the profile.
Whiteboard: [systemsfe]
(In reply to Gregor Wagner [:gwagner] from comment #9) > Can QA confirm this? Using comment 0 STR CPU still goes up to 90%, but no typing perf occurs and user is able to type smoothly. 1.4 Environmental Variables: Device: Buri 1.4 MOZ BuildID: 20140307040203 Gaia: 04eb7996543f114133d1367f834a4d88b68c60ac Gecko: b0e7f63c2138 Version: 30.0a1 Firmware Version: v1.2-device.cfg
Keywords: qawanted
QA Contact: gbennett
Can we check what happens on 1.3 to compare testing results in comment 13 vs. 1.3's results?
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #14) > Can we check what happens on 1.3 to compare testing results in comment 13 > vs. 1.3's results? CPU rises to 90% and perf does occur. 1.3 Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140310004001 Gaia: 78c30d7ed6f6e30337d6c05453b867f5e97e42bc Gecko: 00f249d54bda Version: 28.0 Firmware Version: v1.2-device.cfg
Keywords: qawanted
Not a regression, so moving to not blocking.
blocking-b2g: 1.4+ → backlog
Jason, don't you think this is pretty bad for battery? Won't partners call for 1.3 blocker later on?
I agree with Julien. We can't ship with this bug.
blocking-b2g: backlog → 1.4+
(In reply to Gregor Wagner [:gwagner] from comment #18) > I agree with Julien. We can't ship with this bug. We can't block something for the purpose of it being already shippable in a past release, so this won't ever block 1.4. We would need to evaluate this against 1.1 as a point of comparison as well to see if we've become worse. But we don't block bugs for the purpose of wanting something fixed. I agree this important, but it won't block if it isn't a regression. I'm adding qawanted here to compare 1.1 as well.
blocking-b2g: 1.4+ → 1.4?
Keywords: qawanted
Based on the drivers meeting we make a blocking decision before we do QA.
blocking-b2g: 1.4? → 1.4+
(In reply to Gregor Wagner [:gwagner] from comment #20) > Based on the drivers meeting we make a blocking decision before we do QA. Then this is a blocking- - the QA is already done here to prove this reproduces on 1.3, so this isn't a regression from 1.3 --> 1.4. The only blocking argument that could be made here is to block 1.3, but that only will continue to hold true if there's a significant difference between 1.1 & 1.3.
blocking-b2g: 1.4+ → backlog
We definitely want to see how this behaves in 1.1. I'm confident this is a regression though. That said, I don't see why such a big issue couldn't block 1.4 by itself, even if it's not a regression. If it's not a regression, I can accept to not block 1.3 as we're really late, but I don't get why it could not block 1.4.
Summary: 95-99% CPU load after first web page open in the Browser after reboot → 95-99% CPU load after re-opening message app
Summary: 95-99% CPU load after re-opening message app → 95-99% CPU load in parent after re-opening message app
We need to fix that on any version it happens. I don't see how this can be acceptable in any version. I'm marking it 1.4+ because it's only confirmed on 1.4 so far but we'll block on 1.3 if it's confirmed there too.
blocking-b2g: backlog → 1.4+
I'm not understanding here why we would not block on this bug given how bad it is. I understand that we already have shipped the problem in 1.3 and we probably can't take a fix there due to release pressure unless it turns out to be a simple, safe fix. But, with something this bad, I think we should take it for 1.4 and block on it. I think the magnitude of the issue makes a blocking case by itself even though it's not a regression from 1.3. I'd also like to see what the behavior was on 1.1.
Enough reports confirming it, marking NEW
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Fabrice Desré [:fabrice] from comment #23) > We need to fix that on any version it happens. I don't see how this can be > acceptable in any version. I'm marking it 1.4+ because it's only confirmed > on 1.4 so far but we'll block on 1.3 if it's confirmed there too. Then this should be nominated for 1.3. comment 15 already indicates that this reproduces on 1.3. (In reply to Clint Talbert ( :ctalbert ) from comment #24) > I'm not understanding here why we would not block on this bug given how bad > it is. I understand that we already have shipped the problem in 1.3 and we > probably can't take a fix there due to release pressure unless it turns out > to be a simple, safe fix. But, with something this bad, I think we should > take it for 1.4 and block on it. Badness isn't what's causing the debate here. It's the fact that this is already present on 1.3. Our triage criteria indicates that performance blockers fall into two areas of scope: * Significant regression from a past release * New performance requirement from QC Neither of these areas meet 1.4 blocking criteria, as this is confirmed to reproduce on 1.3 & it isn't a performance requirement from QC. 1.3 is a different story to evaluate here, so moving to 1.3? is the right thing to do here. > > I think the magnitude of the issue makes a blocking case by itself even > though it's not a regression from 1.3. I'd also like to see what the > behavior was on 1.1. I don't think so. That's definition of wanting a bug fixed, not holding a release on something.
blocking-b2g: 1.4+ → 1.3?
Summary: 95-99% CPU load in parent after re-opening message app → 90% CPU load in parent after re-opening message app
Rules are guidelines. Particular events call for out of the book decisions.
blocking-b2g: 1.3? → 1.3T+
(In reply to Fabrice Desré [:fabrice] from comment #27) > Rules are guidelines. Particular events call for out of the book decisions. As I said, this gets evaluated 1.3?, not the other blocking flags.
blocking-b2g: 1.3T+ → 1.3?
I've tested the STRs in both comment 0 and comment 3 with today's master build and I can't seem to reproduce the issue. FYI I'm using medium reference workloads. I'll try again with the nightly build on which this bug was first seen to see what's different there.
Flags: needinfo?(gsvelto)
Can you please check if this is being reproduced on 1.3 as well
Flags: needinfo?(gsvelto)
The other thing we need to figure out is if this is a persistent 90% or a spike.
I tested with a v1.3 build on my Hamachi but I cannot reproduce the problem there either. The b2g main process gets the normal load it should get during app startup (slightly less than the actual app) which is caused almost entirely by the IPC needed to pull the threads and contacts out of the databases. Apart from this I cannot see anything strange going on. Keep in mind that I did my tests on my local build, which is basically a vanilla VARIANT=eng build and the device is using the v1.2-device.cfg firmware.
Flags: needinfo?(gsvelto)
I did entire test scenario once again. Base version - my own build, flashed system and userdata was: OS Version: 1.4.0-prerelease Build identifier: 20140214123201 Platform version: 30.0a1 Git info commit: 2014-02-14 09:23:40 729e4cef Rocket Bar present. On the base version issue doesn't exists. Then with use of FOTA I did update to: OS Version: 1.4.0-prerelease Build identifier: 20140311040203 Platform version: 30.0a1 Git info commit: 2014-03-10 17:10:35 a351fe62 Rocket Bar present. I was able to reproduce the issue for both scenarios from Comments 0 and 3. Moreover issue can be reproduced not on every reboot - maybe that's why QA now is unable to reproduce it. Then I reflashed the device, system and userdata again to another of my own builds: OS Version: 1.4.0-prerelease Build identifier: 20140308210648 Platform version: 30.0a1 Git info commit: 2014-03-08 10:15:36 efde3979 Rocket Bar NOT present. On the version issue doesn't exist. Then with use of FOTA I did update to: OS Version: 1.4.0-prerelease Build identifier: 20140311040203 Platform version: 30.0a1 Git info commit: 2014-03-10 17:10:35 a351fe62 Rocket Bar NOT present. Issue doesn't exist.
I wonder which data you used. Whan I saw a manifestation of it (higher loading times after the first launch, using "make test-perf APP=sms") I used the "heavy" reference workload. Therefore, I wonder if the mobile messaging subsystem is "finishing" fetching data from the DB, even if the child process is closed, but do it once again when the messages app is launched again and this is not finished. Mac I think we need to know the data you used, and also if you can try with the heavy reference workload (APP=sms make reference-workload-heavy). Note that the first time you'll launch the app, the workload db will be first converted to the new version which takes some time, so you need to wait before you see the data first and then reboot to get a clean testing state.
Mike Please assign.
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(mlee)
Keywords: power
(In reply to Jason Smith [:jsmith] from comment #19) > (In reply to Gregor Wagner [:gwagner] from comment #18) > > I agree with Julien. We can't ship with this bug. > > We can't block something for the purpose of it being already shippable in a > past release, so this won't ever block 1.4. We would need to evaluate this > against 1.1 as a point of comparison as well to see if we've become worse. > But we don't block bugs for the purpose of wanting something fixed. I agree > this important, but it won't block if it isn't a regression. > > I'm adding qawanted here to compare 1.1 as well. Using comment 0 STR CPU still goes up to 90-94% when in browser (~60% in messages app), but no typing perf occurs and user is able to type smoothly. 1.1 Environmental Variables: Device: Buri 1.1 MOZ BuildID: 20140218041202 Gaia: c5cb252e5d1aa45d14f3a2ea182e73e2271e4496 Gecko: d7f2811943d1 Version: 18.0 Firmware Version: v1.2-device.cfg (In reply to Jason Smith [:jsmith] from comment #31) > The other thing we need to figure out is if this is a persistent 90% or a > spike. I am able to keep it between 90-94% for as long as I'm typing.
Keywords: qawanted
Flags: needinfo?(mlee)
Whiteboard: [c=power p= s= u=]
(In reply to gbennett from comment #36) > I am able to keep it between 90-94% for as long as I'm typing. Could this be related to the autocompletion logic? I'll give it a spin with and without autocompletion enabled to see if I can reproduce this and if it makes a difference.
Can we get a video here showcasing the exact STR used to cause the CPU spike? That could probably help Gabriele here with reproducing this bug.
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #38) > Can we get a video here showcasing the exact STR used to cause the CPU > spike? That could probably help Gabriele here with reproducing this bug. Of course. :) Here you go. https://www.youtube.com/watch?v=wF_H9KNaEb0 STR 1. Open Messages app 2. Create a new message and type rapidly in the body (gets about 60%) 3. Open Browser app and type rapidly in the URL field (gets about 92%)
Keywords: qawanted
(In reply to gbennett from comment #39) > Of course. :) Here you go. https://www.youtube.com/watch?v=wF_H9KNaEb0 > > STR > 1. Open Messages app > 2. Create a new message and type rapidly in the body (gets about 60%) > 3. Open Browser app and type rapidly in the URL field (gets about 92%) Thanks for this, I didn't realize it meant tapping *that* fast :) So, we do already have a bug for this issue in the SMS app, it's bug 888950 so I'd move the investigation of this issue there. For the browser we don't have a specific bug already but it might be a different problem. I'll take a profile and see what's going on there.
I've taken a profile of the browser app when it's exhibiting this behavior you can find it here: http://people.mozilla.org/~bgirard/cleopatra/#report=92e5d7353d50d5638012d404f70b481a10706f1b The sample range you should look at is [12000, 16000]. Tapping a key in the browser mostly causes two events: - Updating the text in the URL bar - Displaying the results of the browser history (or just the Google search for "..." banner if the history is empty). This causes IPC messages to be sent between the browser and b2g app and all related DB activity for probing the history. The former doesn't have much impact but the latter causes an interruptible reflow to be piped up to update the search/history entries below the URL bar for each pressed key. Each reflow will trigger a paint cycle and this ends up burning all the available CPU time. The bottom line is that this is not a regression; it could be improved by not sending each and every key to the main process while tapping quickly but as it is shouldn't be considered as a blocker. My take on this bug is that we should maybe prioritize bug 888950 for the SMS app if we care about making text input smoother and open a follow-up for the browser app to throttle the amount of updates we do when the user is tapping very quickly. AFAIK we're already doing this in desktop Firefox. This bug should then be closed as INVALID since it's not a regression and we should really not block 1.3 on it. I'm moving it to 1.3? for this so that it gets re-triaged. There's also another possibility though and that the original STR in comment 0 didn't involve tapping very fast and was an actual regression. If it was however it seems to be gone as I was unable to reproduce it with the original STR on any version I tried.
blocking-b2g: 1.3+ → 1.3?
I agree to not block this based on your rationale, but I don't think that this is bug is invalid. Based on your comments, it sounds like there are improvements we can make here, so this is valid bug.
blocking-b2g: 1.3? → backlog
(In reply to Jason Smith [:jsmith] from comment #42) > I agree to not block this based on your rationale, but I don't think that > this is bug is invalid. Based on your comments, it sounds like there are > improvements we can make here, so this is valid bug. OK, then let's take bug 888950 for the SMS part and open a new follow-up for the browser. I still think this bug should be closed afterwards as it's too generic and I think it's better to have component-specific issues with a more detailed description of the problem at hand.
(In reply to Gabriele Svelto [:gsvelto] from comment #43) > (In reply to Jason Smith [:jsmith] from comment #42) > > I agree to not block this based on your rationale, but I don't think that > > this is bug is invalid. Based on your comments, it sounds like there are > > improvements we can make here, so this is valid bug. > > OK, then let's take bug 888950 for the SMS part and open a new follow-up for > the browser. I still think this bug should be closed afterwards as it's too > generic and I think it's better to have component-specific issues with a > more detailed description of the problem at hand. Fair enough. Can you open a new bug for the browser issue & then close this out?
Filed bug 985369 for the browser. I'm linking both that bug and bug 888950 to this one. Unblocking as per comment 42 and closing as per comment 44. I'm closing as INCOMPLETE as the referred bugs contain more detailed information of what's happening and what needs to be fixed.
Status: NEW → RESOLVED
blocking-b2g: backlog → ---
Closed: 11 years ago
Resolution: --- → INCOMPLETE
See Also: → 888950, 985369
Whiteboard: [c=power p= s= u=] → [c=power p= s=2014.03.28 u=]
Target Milestone: --- → 1.4 S4 (28mar)
You need to log in before you can comment on or make changes to this bug.