Closed
Bug 880390
Opened 12 years ago
Closed 12 years ago
Never-ending 'callschanged' events while SIM card not present
Categories
(Core :: DOM: Device Interfaces, defect, P1)
Tracking
()
People
(Reporter: bkelly, Assigned: bkelly)
References
Details
(Keywords: perf, regression, Whiteboard: c= p=3 s=2013.06.14 ,)
Attachments
(1 file, 1 obsolete file)
Looking at the following datazilla page, there appears to have been a major load time regression on or around June 4, 2012:
http://mzl.la/15OPjCH
All apps except the browser increased from around 1 second up to around 3 or 4 seconds.
We need to investigate to see if this is a real regression or a change in the test, etc.
Comment 1•12 years ago
|
||
ouch,
Updated•12 years ago
|
Assignee | ||
Comment 2•12 years ago
|
||
Initial feedback is that there is an issue with the marionette automation infrastructure. From what I understand Dave Hunt is actively working it. Dave, assigning to you if you don't mind.
Assignee: nobody → dave.hunt
Status: NEW → ASSIGNED
Summary: [gaia] general app load time regression → [gaia] marionette automation showing general app load time regression
Comment 3•12 years ago
|
||
My guess was http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.fps/ -- you'll note that we started failing that suite here:
Failed > Console Output #630 Jun 4, 2013 3:08:12 AM 628KB
Success > Console Output #629 Jun 4, 2013 12:49:08 AM 1MB
Of course, I really shouldn't weigh in without Dave; I know that he's working on fixing a regression due to "a) media-file paths (in master/v1-train) and b) databases changing, both of which affected b2gpopulate"
Comment 4•12 years ago
|
||
I don't think the issues are related here. The run immediately before this jump is [1] and the first run showing the increase is [2]
[1] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1241/
[2] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1247/
I see 8 gaia changes between those runs, and three gecko changes. I can rerun those locally to see if I can reproduce the spike.
For information, the mentioned infrastructure issues are related to recent changes in the DeviceStorage API and the databases used for reference workloads. These issues have pretty much now been resolved, so if it was affecting the performance I would expect it to have dropped back down.
Comment 5•12 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #3)
> My guess was
> http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.
> master.perf.fps/ -- you'll note that we started failing that suite here:
This is the FPS testrun, and not related to the app load time tests.
Comment 6•12 years ago
|
||
Note that this increase has also affected the v1-train results. Strangely, it hasn't affected the browser load times.
https://datazilla.mozilla.org/b2g/?branch=v1-train&device=unagi&range=7&test=cold_load_time&app_list=browser,calendar,camera,clock,contacts,email%20FTU,fm_radio,gallery,messages,music,phone,settings,usage,video&app=browser&gaia_rev=337674ef250cc8a0&gecko_rev=f8d267e127d848c5
How do I create a neat mzl.la shortened URL? :)
Comment 7•12 years ago
|
||
I'm having issues with my unagi, so I've triggered two builds on the CI. They will take a few hours to run, and won't report to DataZilla but the regression should be obvious in the console output.
Last good: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/
First bad: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/
Comment 8•12 years ago
|
||
That second URL should have been: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.bad/
MV VPN is required for those links.
Comment 9•12 years ago
|
||
This does not appear to be related to the automation framework. Here are the average load times (in ms) from two tests using the build immediately before the increase and the first build demonstrating the increase. These were on the same device, and using the same versions of b2gpopulate and b2gperf:
Good (gecko:3a432e057b4a0173ad83046cae7668da6c62303b, gaia:4de4354e3a99f151a834743c7b03a041ac8db12f):
Phone: 965
Contacts: 871
Messages: 809
Settings: 1105
Gallery: 662
Video: 707
Music: 759
Camera: 904
Email: 1283
Calendar: 973
Browser: 851
Clock: 1132
FM Radio: 669
Usage: 1277
Bad (gecko:83a3e05401785a7c51b2bae088826a60f01cdbad, gaia:5534304aee934055f08f21ce93261ba2a596516a):
Phone: 3433
Contacts: 3256
Messages: 3317
Settings: 3772
Gallery: 2828
Video: 3009
Music: 3082
Camera: 2919
Email: 3906
Calendar: 3565
Browser: 881
Clock: 3872
FM Radio: 2900
Usage: 5541
Pushlogs:
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=997cdbf5d012
Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...5534304aee934055f08f21ce93261ba2a596516a
I can try to reduce this regression range, as there were a couple of nightly builds where we had no test results.
Comment 10•12 years ago
|
||
This regression is also present in the Gaia performance results. These only use b2gperf to submit results to DataZilla.
http://mzl.la/ZUJi8r
Assignee | ||
Comment 11•12 years ago
|
||
Dave, thank you! I'm going to start bisecting on my device here. If you have a convenient way to test those intermediate nightly builds, that would help as well.
Comment 12•12 years ago
|
||
The following is an earlier bad build:
Gecko: 80fd2ae052f7a1ff7d92ff17d06c20b19223017b
Gaia: 0a1cf9f90d33092a845c9c791bee4f7455b602b1
https://hg.mozilla.org/releases/mozilla-b2g18/rev/b6f9eab044dd
Revised pushlogs:
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=b6f9eab044dd
Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...0a1cf9f90d33092a845c9c791bee4f7455b602b1
I have one more build to run, and will update when I have the results.
Assignee | ||
Comment 13•12 years ago
|
||
I can reproduce the delay locally on my Buri. It appears to be related to the gecko revision.
With mozilla-b2g18 at 119502:4785b1353fd7 and HEAD of gaia/v1-train the slow down does not seem to occur.
Assignee | ||
Comment 14•12 years ago
|
||
Note, while datazilla shows the slowdown for "master", Dave indicates that the configuration for that set is really "b2g18+gaia/master". So that is consistent with the results we are seeing.
Assignee | ||
Comment 15•12 years ago
|
||
Bad: mozilla-b2g18 119509:efc1e28607d5
Assignee | ||
Comment 16•12 years ago
|
||
jhylands has identified that the slowdown does not occur in v1-train if a sim card is present. This is regardless of the wifi setting, so does not appear related to having a data connection.
Bisecting suggests the problem is somewhere in this 5 part change:
http://hg.mozilla.org/releases/mozilla-b2g18/rev/4b236b7d48a7
http://hg.mozilla.org/releases/mozilla-b2g18/rev/dd63ed7329f6
http://hg.mozilla.org/releases/mozilla-b2g18/rev/c958c525d075
http://hg.mozilla.org/releases/mozilla-b2g18/rev/d25bd7f89629
http://hg.mozilla.org/releases/mozilla-b2g18/rev/efc1e28607d5
Assignee | ||
Comment 17•12 years ago
|
||
Running top -t in the adb shell shows the b2g process is spinning:
PID TID PR CPU% S VSS RSS PCY UID Thread Proc
1712 1712 0 97% R 171816K 65732K fg root b2g /system/b2g/b2g
Assignee | ||
Comment 18•12 years ago
|
||
Adding some debug in, I see this:
### ### Telephony::EnumerateCallStateComplete()
And then it spins on this:
### ### CallEvent::Create(null)
### ### Telephony::NotifyCallsChanged(null)
### ### Telephony::DispatchEvent()
### ### Telephony::EnumerationAck::Run()
### ### CallEvent::Create(null)
### ### Telephony::NotifyCallsChanged(null)
### ### Telephony::DispatchEvent()
### ### Telephony::EnumerationAck::Run()
Assignee | ||
Updated•12 years ago
|
Component: Gaia → DOM: Device Interfaces
Product: Boot2Gecko → Core
Assignee | ||
Comment 19•12 years ago
|
||
:htsai, :bent, do you have any ideas on this? We'll keep investigating this afternoon, but thought I would ask in case it was immediately obvious to you.
Flags: needinfo?(htsai)
Flags: needinfo?(bent.mozilla)
Hm... Multiple enumeration requests maybe? If it's the same request that triggers multiple events then we have a problem...
Flags: needinfo?(bent.mozilla)
Ok, thanks to jhylands for debugging here. This was caused (kinda) by bug 823958.
The problem is that system/js/statusbar.js is adding a 'callschanged' listener inside the 'handleEvent' method when it receives a 'callschanged' event. That used to be a no-op, but after bug 823958 this causes a never-ending loop on b2g18.
This should be fixed on trunk because bug 875545 changed the way all this works.
We can fix the platform on b2g18 so that this works but it's a little bit complicated since bug 875545 can't be backported to b2g18. I'd much prefer to fix statusbar.js instead to avoid this problem and then move on.
Any objections to that plan?
Blocks: 823958
Updated•12 years ago
|
Summary: [gaia] marionette automation showing general app load time regression → Never-ending 'callschanged' events while SIM card not present
Assignee | ||
Comment 22•12 years ago
|
||
Pointer to Github pull-request
Assignee | ||
Updated•12 years ago
|
Attachment #759938 -
Flags: review?(etienne)
Assignee | ||
Updated•12 years ago
|
Assignee: dave.hunt → bkelly
Assignee | ||
Comment 23•12 years ago
|
||
I went ahead and made the patch recommended in comment 21 since it was only a couple lines. This does seem to fix the problem on my device.
Comment 24•12 years ago
|
||
Thanks for Ben's comment 21. Sounds reasonable to me.
Flags: needinfo?(htsai)
Assignee | ||
Updated•12 years ago
|
blocking-b2g: --- → leo?
Comment 25•12 years ago
|
||
Comment on attachment 759938 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270
Comment on github.
We probably need to add a test for this, but good thing is most of the boilerplate is already there :)
Attachment #759938 -
Flags: review?(etienne)
Assignee | ||
Comment 26•12 years ago
|
||
Sure. Would you prefer a separate test case checking state or would it be adequate to have the mock throw an exception if a duplicate handler is added?
Also, I have a very busy weekend, so I may not get to this before Monday which I think might be the cutoff for leo.
Comment 27•12 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #26)
> Sure. Would you prefer a separate test case checking state or would it be
> adequate to have the mock throw an exception if a duplicate handler is added?
I'd say separate test case.
But the quicker the better :)
Assignee | ||
Comment 28•12 years ago
|
||
Pointer to Github pull-request
Assignee | ||
Comment 29•12 years ago
|
||
Comment on attachment 759938 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270
This pull request got into a bad state when I rebased my new changes in. Closed and opened the new PR.
Attachment #759938 -
Attachment is obsolete: true
Assignee | ||
Comment 30•12 years ago
|
||
Comment on attachment 760388 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294
New pull request including unit test.
Attachment #760388 -
Flags: review?(etienne)
Attachment #760388 -
Flags: feedback?(felash)
Comment 31•12 years ago
|
||
Comment on attachment 760388 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294
Thanks! Ready to go!
Attachment #760388 -
Flags: review?(etienne) → review+
Assignee | ||
Comment 32•12 years ago
|
||
Merged to gaia master: 063570622c45d79ba40f8148bb7e89e8c1ff8a46
Assignee | ||
Updated•12 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•12 years ago
|
tracking-b2g18:
--- → ?
Updated•12 years ago
|
Comment 34•12 years ago
|
||
v1-train: 43e1b73371c7e7e6b9fde90ea2d0d5e7745b9705
status-b2g18:
--- → fixed
Flags: needinfo?(jhford)
Updated•12 years ago
|
Flags: in-moztrap-
Updated•12 years ago
|
Attachment #760388 -
Flags: feedback?(felash)
Updated•12 years ago
|
Whiteboard: c= , → c= p=3 s=2013.06.14 ,
Comment 35•12 years ago
|
||
Just a late thought: why couldn't we just add the listener at init, and never remove it ?
Assignee | ||
Comment 36•12 years ago
|
||
(In reply to Julien Wajsberg [:julienw] from comment #35)
> Just a late thought: why couldn't we just add the listener at init, and
> never remove it ?
Seems possible, but its unclear to me. I assumed the code had a good reason to avoid the callschanged events when not in emergency only mode. Unfortunately I took the shortest path to fix the performance regression given the looming leo deadlines.
You need to log in
before you can comment on or make changes to this bug.
Description
•