All users were logged out of Bugzilla on October 13th, 2018

Never-ending 'callschanged' events while SIM card not present

RESOLVED FIXED

Status

()

P1
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: bkelly, Assigned: bkelly)

Tracking

({perf, regression})

unspecified
ARM
Gonk (Firefox OS)
perf, regression
Points:
---
Bug Flags:
in-moztrap -

Firefox Tracking Flags

(blocking-b2g:leo+, b2g18 fixed)

Details

(Whiteboard: c= p=3 s=2013.06.14 ,)

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Looking at the following datazilla page, there appears to have been a major load time regression on or around  June 4, 2012:

  http://mzl.la/15OPjCH

All apps except the browser increased from around 1 second up to around 3 or 4 seconds.

We need to investigate to see if this is a real regression or a change in the test, etc.
ouch,

Updated

5 years ago
Severity: normal → critical
Keywords: perf
Priority: -- → P1
Whiteboard: c= ,
(Assignee)

Comment 2

5 years ago
Initial feedback is that there is an issue with the marionette automation infrastructure.  From what I understand Dave Hunt is actively working it.  Dave, assigning to you if you don't mind.
Assignee: nobody → dave.hunt
Status: NEW → ASSIGNED
Summary: [gaia] general app load time regression → [gaia] marionette automation showing general app load time regression
My guess was http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.fps/ -- you'll note that we started failing that suite here:

Failed > Console Output  #630 	Jun 4, 2013 3:08:12 AM	 628KB
Success > Console Output  #629 	Jun 4, 2013 12:49:08 AM	 1MB

Of course, I really shouldn't weigh in without Dave; I know that he's working on fixing a regression due to "a) media-file paths (in master/v1-train) and b) databases changing, both of which affected b2gpopulate"
I don't think the issues are related here. The run immediately before this jump is [1] and the first run showing the increase is [2]

[1] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1241/
[2] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf/1247/

I see 8 gaia changes between those runs, and three gecko changes. I can rerun those locally to see if I can reproduce the spike.

For information, the mentioned infrastructure issues are related to recent changes in the DeviceStorage API and the databases used for reference workloads. These issues have pretty much now been resolved, so if it was affecting the performance I would expect it to have dropped back down.
(In reply to Stephen Donner [:stephend] from comment #3)
> My guess was
> http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.
> master.perf.fps/ -- you'll note that we started failing that suite here:

This is the FPS testrun, and not related to the app load time tests.
I'm having issues with my unagi, so I've triggered two builds on the CI. They will take a few hours to run, and won't report to DataZilla but the regression should be obvious in the console output.

Last good: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/

First bad: http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.unagi.gaia.master.perf.good/
This does not appear to be related to the automation framework. Here are the average load times (in ms) from two tests using the build immediately before the increase and the first build demonstrating the increase. These were on the same device, and using the same versions of b2gpopulate and b2gperf:

Good (gecko:3a432e057b4a0173ad83046cae7668da6c62303b, gaia:4de4354e3a99f151a834743c7b03a041ac8db12f):
Phone: 965
Contacts: 871
Messages: 809
Settings: 1105
Gallery: 662
Video: 707
Music: 759
Camera: 904
Email: 1283
Calendar: 973
Browser: 851
Clock: 1132
FM Radio: 669
Usage: 1277

Bad (gecko:83a3e05401785a7c51b2bae088826a60f01cdbad, gaia:5534304aee934055f08f21ce93261ba2a596516a):
Phone: 3433
Contacts: 3256
Messages: 3317
Settings: 3772
Gallery: 2828
Video: 3009
Music: 3082
Camera: 2919
Email: 3906
Calendar: 3565
Browser: 881
Clock: 3872
FM Radio: 2900
Usage: 5541

Pushlogs: 
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=997cdbf5d012
Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...5534304aee934055f08f21ce93261ba2a596516a

I can try to reduce this regression range, as there were a couple of nightly builds where we had no test results.
This regression is also present in the Gaia performance results. These only use b2gperf to submit results to DataZilla.

http://mzl.la/ZUJi8r
(Assignee)

Comment 11

5 years ago
Dave, thank you!  I'm going to start bisecting on my device here.  If you have a convenient way to test those intermediate nightly builds, that would help as well.
The following is an earlier bad build:

Gecko: 80fd2ae052f7a1ff7d92ff17d06c20b19223017b
Gaia: 0a1cf9f90d33092a845c9c791bee4f7455b602b1

https://hg.mozilla.org/releases/mozilla-b2g18/rev/b6f9eab044dd

Revised pushlogs:
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/pushloghtml?fromchange=4785b1353fd7&tochange=b6f9eab044dd
Gaia: https://github.com/mozilla-b2g/gaia/compare/4de4354e3a99f151a834743c7b03a041ac8db12f...0a1cf9f90d33092a845c9c791bee4f7455b602b1

I have one more build to run, and will update when I have the results.
(Assignee)

Comment 13

5 years ago
I can reproduce the delay locally on my Buri.  It appears to be related to the gecko revision.

With mozilla-b2g18 at 119502:4785b1353fd7 and HEAD of gaia/v1-train the slow down does not seem to occur.
(Assignee)

Comment 14

5 years ago
Note, while datazilla shows the slowdown for "master", Dave indicates that the configuration for that set is really "b2g18+gaia/master".  So that is consistent with the results we are seeing.
(Assignee)

Comment 15

5 years ago
Bad:  mozilla-b2g18   119509:efc1e28607d5
(Assignee)

Comment 16

5 years ago
jhylands has identified that the slowdown does not occur in v1-train if a sim card is present.  This is regardless of the wifi setting, so does not appear related to having a data connection.

Bisecting suggests the problem is somewhere in this 5 part change:

  http://hg.mozilla.org/releases/mozilla-b2g18/rev/4b236b7d48a7
  http://hg.mozilla.org/releases/mozilla-b2g18/rev/dd63ed7329f6
  http://hg.mozilla.org/releases/mozilla-b2g18/rev/c958c525d075
  http://hg.mozilla.org/releases/mozilla-b2g18/rev/d25bd7f89629
  http://hg.mozilla.org/releases/mozilla-b2g18/rev/efc1e28607d5
(Assignee)

Comment 17

5 years ago
Running top -t in the adb shell shows the b2g process is spinning:

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
 1712  1712  0  97% R 171816K  65732K  fg root     b2g             /system/b2g/b2g
(Assignee)

Comment 18

5 years ago
Adding some debug in, I see this:

  ### ### Telephony::EnumerateCallStateComplete()

And then it spins on this:

  ### ### CallEvent::Create(null)
  ### ### Telephony::NotifyCallsChanged(null)
  ### ### Telephony::DispatchEvent()
  ### ### Telephony::EnumerationAck::Run()
  ### ### CallEvent::Create(null)
  ### ### Telephony::NotifyCallsChanged(null)
  ### ### Telephony::DispatchEvent()
  ### ### Telephony::EnumerationAck::Run()
(Assignee)

Updated

5 years ago
Component: Gaia → DOM: Device Interfaces
Product: Boot2Gecko → Core
(Assignee)

Comment 19

5 years ago
:htsai, :bent, do you have any ideas on this?  We'll keep investigating this afternoon, but thought I would ask in case it was immediately obvious to you.
Flags: needinfo?(htsai)
Flags: needinfo?(bent.mozilla)
Hm... Multiple enumeration requests maybe? If it's the same request that triggers multiple events then we have a problem...
Flags: needinfo?(bent.mozilla)
Ok, thanks to jhylands for debugging here. This was caused (kinda) by bug 823958.

The problem is that system/js/statusbar.js is adding a 'callschanged' listener inside the 'handleEvent' method when it receives a 'callschanged' event. That used to be a no-op, but after bug 823958 this causes a never-ending loop on b2g18.

This should be fixed on trunk because bug 875545 changed the way all this works.

We can fix the platform on b2g18 so that this works but it's a little bit complicated since bug 875545 can't be backported to b2g18. I'd much prefer to fix statusbar.js instead to avoid this problem and then move on.

Any objections to that plan?
Blocks: 823958
Summary: [gaia] marionette automation showing general app load time regression → Never-ending 'callschanged' events while SIM card not present
(Assignee)

Comment 22

5 years ago
Created attachment 759938 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270

Pointer to Github pull-request
(Assignee)

Updated

5 years ago
Attachment #759938 - Flags: review?(etienne)
(Assignee)

Updated

5 years ago
Assignee: dave.hunt → bkelly
(Assignee)

Comment 23

5 years ago
I went ahead and made the patch recommended in comment 21 since it was only a couple lines.  This does seem to fix the problem on my device.
Thanks for Ben's comment 21. Sounds reasonable to me.
Flags: needinfo?(htsai)
(Assignee)

Updated

5 years ago
blocking-b2g: --- → leo?
Comment on attachment 759938 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270

Comment on github.

We probably need to add a test for this, but good thing is most of the boilerplate is already there :)
Attachment #759938 - Flags: review?(etienne)
(Assignee)

Comment 26

5 years ago
Sure.  Would you prefer a separate test case checking state or would it be adequate to have the mock throw an exception if a duplicate handler is added?

Also, I have a very busy weekend, so I may not get to this before Monday which I think might be the cutoff for leo.
(In reply to Ben Kelly [:bkelly] from comment #26)
> Sure.  Would you prefer a separate test case checking state or would it be
> adequate to have the mock throw an exception if a duplicate handler is added?

I'd say separate test case.
But the quicker the better :)
(Assignee)

Comment 28

5 years ago
Created attachment 760388 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294

Pointer to Github pull-request
(Assignee)

Comment 29

5 years ago
Comment on attachment 759938 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10270

This pull request got into a bad state when I rebased my new changes in.  Closed and opened the new PR.
Attachment #759938 - Attachment is obsolete: true
(Assignee)

Comment 30

5 years ago
Comment on attachment 760388 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294

New pull request including unit test.
Attachment #760388 - Flags: review?(etienne)
Attachment #760388 - Flags: feedback?(felash)
Comment on attachment 760388 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/10294

Thanks! Ready to go!
Attachment #760388 - Flags: review?(etienne) → review+
(Assignee)

Comment 32

5 years ago
Merged to gaia master:  063570622c45d79ba40f8148bb7e89e8c1ff8a46
(Assignee)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
tracking-b2g18: --- → ?
blocking-b2g: leo? → leo+
tracking-b2g18: ? → ---
Keywords: regression
John, can you please assist with the uplift? :)
Flags: needinfo?(jhford)
v1-train: 43e1b73371c7e7e6b9fde90ea2d0d5e7745b9705
status-b2g18: --- → fixed
Flags: needinfo?(jhford)

Updated

5 years ago
Flags: in-moztrap-
Attachment #760388 - Flags: feedback?(felash)

Updated

5 years ago
Whiteboard: c= , → c= p=3 s=2013.06.14 ,
Just a late thought: why couldn't we just add the listener at init, and never remove it ?
(Assignee)

Comment 36

5 years ago
(In reply to Julien Wajsberg [:julienw] from comment #35)
> Just a late thought: why couldn't we just add the listener at init, and
> never remove it ?

Seems possible, but its unclear to me.  I assumed the code had a good reason to avoid the callschanged events when not in emergency only mode.  Unfortunately I took the shortest path to fix the performance regression given the looming leo deadlines.
You need to log in before you can comment on or make changes to this bug.