Closed Bug 880390 Opened 11 years ago Closed 11 years ago
Never-ending 'callschanged' events while SIM card not present
358 bytes, text/html
Looking at the following datazilla page, there appears to have been a major load time regression on or around June 4, 2012: http://mzl.la/15OPjCH All apps except the browser increased from around 1 second up to around 3 or 4 seconds. We need to investigate to see if this is a real regression or a change in the test, etc.
Severity: normal → critical
Priority: -- → P1
Whiteboard: c= ,
Initial feedback is that there is an issue with the marionette automation infrastructure. From what I understand Dave Hunt is actively working it. Dave, assigning to you if you don't mind.
Assignee: nobody → dave.hunt
Status: NEW → ASSIGNED
Summary: [gaia] general app load time regression → [gaia] marionette automation showing general app load time regression
My guess was http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.fps/ -- you'll note that we started failing that suite here: Failed > Console Output #630 Jun 4, 2013 3:08:12 AM 628KB Success > Console Output #629 Jun 4, 2013 12:49:08 AM 1MB Of course, I really shouldn't weigh in without Dave; I know that he's working on fixing a regression due to "a) media-file paths (in master/v1-train) and b) databases changing, both of which affected b2gpopulate"
I don't think the issues are related here. The run immediately before this jump is  and the first run showing the increase is   http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1241/  http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1247/ I see 8 gaia changes between those runs, and three gecko changes. I can rerun those locally to see if I can reproduce the spike. For information, the mentioned infrastructure issues are related to recent changes in the DeviceStorage API and the databases used for reference workloads. These issues have pretty much now been resolved, so if it was affecting the performance I would expect it to have dropped back down.
(In reply to Stephen Donner [:stephend] from comment #3) > My guess was > http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia. > master.perf.fps/ -- you'll note that we started failing that suite here: This is the FPS testrun, and not related to the app load time tests.
Note that this increase has also affected the v1-train results. Strangely, it hasn't affected the browser load times. https://datazilla.mozilla.org/b2g/?branch=v1-train&device=unagi&range=7&test=cold_load_time&app_list=browser,calendar,camera,clock,contacts,email%20FTU,fm_radio,gallery,messages,music,phone,settings,usage,video&app=browser&gaia_rev=337674ef250cc8a0&gecko_rev=f8d267e127d848c5 How do I create a neat mzl.la shortened URL? :)
I'm having issues with my unagi, so I've triggered two builds on the CI. They will take a few hours to run, and won't report to DataZilla but the regression should be obvious in the console output. Last good: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/ First bad: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/
That second URL should have been: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.bad/ MV VPN is required for those links.
This does not appear to be related to the automation framework. Here are the average load times (in ms) from two tests using the build immediately before the increase and the first build demonstrating the increase. These were on the same device, and using the same versions of b2gpopulate and b2gperf: Good (gecko:3a432e057b4a0173ad83046cae7668da6c62303b, gaia:4de4354e3a99f151a834743c7b03a041ac8db12f): Phone: 965 Contacts: 871 Messages: 809 Settings: 1105 Gallery: 662 Video: 707 Music: 759 Camera: 904 Email: 1283 Calendar: 973 Browser: 851 Clock: 1132 FM Radio: 669 Usage: 1277 Bad (gecko:83a3e05401785a7c51b2bae088826a60f01cdbad, gaia:5534304aee934055f08f21ce93261ba2a596516a): Phone: 3433 Contacts: 3256 Messages: 3317 Settings: 3772 Gallery: 2828 Video: 3009 Music: 3082 Camera: 2919 Email: 3906 Calendar: 3565 Browser: 881 Clock: 3872 FM Radio: 2900 Usage: 5541 Pushlogs: Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=997cdbf5d012 Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...5534304aee934055f08f21ce93261ba2a596516a I can try to reduce this regression range, as there were a couple of nightly builds where we had no test results.
This regression is also present in the Gaia performance results. These only use b2gperf to submit results to DataZilla. http://mzl.la/ZUJi8r
Dave, thank you! I'm going to start bisecting on my device here. If you have a convenient way to test those intermediate nightly builds, that would help as well.
The following is an earlier bad build: Gecko: 80fd2ae052f7a1ff7d92ff17d06c20b19223017b Gaia: 0a1cf9f90d33092a845c9c791bee4f7455b602b1 https://hg.mozilla.org/releases/mozilla-b2g18/rev/b6f9eab044dd Revised pushlogs: Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=b6f9eab044dd Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...0a1cf9f90d33092a845c9c791bee4f7455b602b1 I have one more build to run, and will update when I have the results.
I can reproduce the delay locally on my Buri. It appears to be related to the gecko revision. With mozilla-b2g18 at 119502:4785b1353fd7 and HEAD of gaia/v1-train the slow down does not seem to occur.
Note, while datazilla shows the slowdown for "master", Dave indicates that the configuration for that set is really "b2g18+gaia/master". So that is consistent with the results we are seeing.
Bad: mozilla-b2g18 119509:efc1e28607d5
jhylands has identified that the slowdown does not occur in v1-train if a sim card is present. This is regardless of the wifi setting, so does not appear related to having a data connection. Bisecting suggests the problem is somewhere in this 5 part change: http://hg.mozilla.org/releases/mozilla-b2g18/rev/4b236b7d48a7 http://hg.mozilla.org/releases/mozilla-b2g18/rev/dd63ed7329f6 http://hg.mozilla.org/releases/mozilla-b2g18/rev/c958c525d075 http://hg.mozilla.org/releases/mozilla-b2g18/rev/d25bd7f89629 http://hg.mozilla.org/releases/mozilla-b2g18/rev/efc1e28607d5
Running top -t in the adb shell shows the b2g process is spinning: PID TID PR CPU% S VSS RSS PCY UID Thread Proc 1712 1712 0 97% R 171816K 65732K fg root b2g /system/b2g/b2g
Adding some debug in, I see this: ### ### Telephony::EnumerateCallStateComplete() And then it spins on this: ### ### CallEvent::Create(null) ### ### Telephony::NotifyCallsChanged(null) ### ### Telephony::DispatchEvent() ### ### Telephony::EnumerationAck::Run() ### ### CallEvent::Create(null) ### ### Telephony::NotifyCallsChanged(null) ### ### Telephony::DispatchEvent() ### ### Telephony::EnumerationAck::Run()
Component: Gaia → DOM: Device Interfaces
Product: Boot2Gecko → Core
:htsai, :bent, do you have any ideas on this? We'll keep investigating this afternoon, but thought I would ask in case it was immediately obvious to you.
Hm... Multiple enumeration requests maybe? If it's the same request that triggers multiple events then we have a problem...
Ok, thanks to jhylands for debugging here. This was caused (kinda) by bug 823958. The problem is that system/js/statusbar.js is adding a 'callschanged' listener inside the 'handleEvent' method when it receives a 'callschanged' event. That used to be a no-op, but after bug 823958 this causes a never-ending loop on b2g18. This should be fixed on trunk because bug 875545 changed the way all this works. We can fix the platform on b2g18 so that this works but it's a little bit complicated since bug 875545 can't be backported to b2g18. I'd much prefer to fix statusbar.js instead to avoid this problem and then move on. Any objections to that plan?
11 years ago
Summary: [gaia] marionette automation showing general app load time regression → Never-ending 'callschanged' events while SIM card not present
Pointer to Github pull-request
I went ahead and made the patch recommended in comment 21 since it was only a couple lines. This does seem to fix the problem on my device.
Thanks for Ben's comment 21. Sounds reasonable to me.
Comment on attachment 759938 [details] Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270 Comment on github. We probably need to add a test for this, but good thing is most of the boilerplate is already there :)
Sure. Would you prefer a separate test case checking state or would it be adequate to have the mock throw an exception if a duplicate handler is added? Also, I have a very busy weekend, so I may not get to this before Monday which I think might be the cutoff for leo.
(In reply to Ben Kelly [:bkelly] from comment #26) > Sure. Would you prefer a separate test case checking state or would it be > adequate to have the mock throw an exception if a duplicate handler is added? I'd say separate test case. But the quicker the better :)
Pointer to Github pull-request
Comment on attachment 759938 [details] Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270 This pull request got into a bad state when I rebased my new changes in. Closed and opened the new PR.
Attachment #759938 - Attachment is obsolete: true
Comment on attachment 760388 [details] Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294 New pull request including unit test.
Comment on attachment 760388 [details] Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294 Thanks! Ready to go!
Attachment #760388 - Flags: review?(etienne) → review+
Merged to gaia master: 063570622c45d79ba40f8148bb7e89e8c1ff8a46
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
John, can you please assist with the uplift? :)
Just a late thought: why couldn't we just add the listener at init, and never remove it ?
(In reply to Julien Wajsberg [:julienw] from comment #35) > Just a late thought: why couldn't we just add the listener at init, and > never remove it ? Seems possible, but its unclear to me. I assumed the code had a good reason to avoid the callschanged events when not in emergency only mode. Unfortunately I took the shortest path to fix the performance regression given the looming leo deadlines.
You need to log in before you can comment on or make changes to this bug.