Closed Bug 1074379 Opened 10 years ago Closed 10 years ago

Dialer does not always show incoming call # and prevents the user to answer

Categories

(Firefox OS Graveyard :: Gaia::Dialer, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.1+, b2g-v2.0 unaffected, b2g-v2.1 verified, b2g-v2.2 verified)

RESOLVED DUPLICATE of bug 1083729
2.1 S7 (24Oct)

People

(Reporter: khuey, Assigned: gerard-majax)

References

Details

(Keywords: regression, Whiteboard: [priority][planned-sprint][in-sprint=v2.1-S6])

Attachments

(6 files)

David Baron and I have both been seeing an issue lately where incoming calls do not display the number/contact information of the caller.  Unfortunately we haven't been able to find STR yet.
And, worse, if you want to answer the phone, pressing the answer button doesn't do anything.
kgrandon has seen this too.
blocking-b2g: --- → 2.2?
Can you take a look?
Flags: needinfo?(anthony)
I've seen that once and others on the team have seen it too. We suspect it could be happening when we haven't received a call for a while.
Flags: needinfo?(anthony)
Assignee: nobody → anthony
I've seen it too. steps-wanted until we find reproducible STR.
Keywords: steps-wanted
Reproduced on my dogfooding phone. We have this error:

W/GeckoConsole(  279): [JavaScript Error: "TypeError: this._settingsManager._window is null" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsManager.js" line: 118}]
W/GeckoConsole(  279): [JavaScript Error: "NS_ERROR_UNEXPECTED: " {file: "app://callscreen.gaiamobile.org/gaia_build_defer_index.js" line: 154}]

Kyle: Does that ring a bell for you?

I don't have better STRs though.
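
For anyone triaging similar logcats, here is a minimal Python sketch that tallies GeckoConsole JavaScript errors by message, file and line. The regular expression is only inferred from the two lines quoted above, and `count_js_errors` and `SAMPLE` are hypothetical names, not part of any Mozilla tool.

```python
import re
from collections import Counter

# Pattern inferred from the "[JavaScript Error: ...]" payload quoted above;
# it is an assumption, not an official description of the log format.
JS_ERROR = re.compile(
    r'\[JavaScript Error: "(?P<message>.*?)" '
    r'\{file: "(?P<file>[^"]*)" line: (?P<line>\d+)\}\]'
)


def count_js_errors(log_lines):
    """Count GeckoConsole JavaScript errors by (message, file, line)."""
    counts = Counter()
    for text in log_lines:
        if "GeckoConsole" not in text:
            continue
        match = JS_ERROR.search(text)
        if match:
            key = (match["message"], match["file"], int(match["line"]))
            counts[key] += 1
    return counts


# The two log lines quoted in this comment, used as sample input.
SAMPLE = [
    'W/GeckoConsole(  279): [JavaScript Error: "TypeError: this._settingsManager._window is null" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsManager.js" line: 118}]',
    'W/GeckoConsole(  279): [JavaScript Error: "NS_ERROR_UNEXPECTED: " {file: "app://callscreen.gaiamobile.org/gaia_build_defer_index.js" line: 154}]',
]

for (message, path, line), n in count_js_errors(SAMPLE).most_common():
    print(f"{n}x {message} ({path}:{line})")
```

Feeding a full logcat through this (one line per list entry) should make it easier to see whether the `_window is null` error dominates.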
Flags: needinfo?(kyle)
Pretty sure this is a duplicate of bug 1070066. The way the callscreen is closed at the end of a call causes weird issues with settings, because we somehow manage to trigger callbacks after dom-window-destroyed, which means SettingsManager has no window object and can't clean up correctly, which for some reason boots the system into an odd state.

The fact that the proposed solution for bug 1070066 is "just wait 2 seconds" is not exactly heartening, though.

:rik, can you look at bug 1070066, see if it matches what you're seeing, and mark this dupe if it is?
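
To illustrate the failure mode described above (a callback firing after the window is gone), here is a toy Python model. It is not Gecko code: `Window`, `SettingsManagerModel` and both callback methods are hypothetical stand-ins, and the guard shown is only one possible defensive pattern, not a fix proposed in this bug.

```python
class Window:
    """Stand-in for a DOM window; not a real Gecko object."""

    def __init__(self):
        self.messages = []

    def dispatch(self, message):
        self.messages.append(message)


class SettingsManagerModel:
    """Toy model of a component that holds a window until it is destroyed."""

    def __init__(self, window):
        self._window = window

    def on_window_destroyed(self):
        # After teardown the manager no longer has a window to talk to.
        self._window = None

    def unguarded_callback(self, message):
        # Raises AttributeError if it runs after teardown.
        self._window.dispatch(message)

    def guarded_callback(self, message):
        # Drops late callbacks instead of dereferencing a missing window.
        if self._window is None:
            return False
        self._window.dispatch(message)
        return True


manager = SettingsManagerModel(Window())
manager.on_window_destroyed()

try:
    manager.unguarded_callback("settings-changed")
except AttributeError as error:
    print("late callback failed:", error)

print("guarded callback delivered:", manager.guarded_callback("settings-changed"))
```

The unguarded path is the analogue of the `this._settingsManager._window is null` error in the log: the callback assumes a window that has already been torn down.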
Flags: needinfo?(kyle) → needinfo?(anthony)
triage: keep the nom for now before we're sure it's duplicate or not.
We don't really have good STRs so I can't check that it's a duplicate of bug 1070066. Also, this one is about receiving calls, not placing them. Do we have a bug open to fix the Settings API rather than hiding this with the 2s delay?
Flags: needinfo?(anthony)
On IRC, smaug says: "I think it happens when b2g process takes tons of cpu time"
Yeah, Comment 0 and Comment 1 both sound like bug 1070066. It doesn't matter if it's incoming or outgoing, something is wrong with the callscreen app either way because it comes up in both cases. Due to comment 10, I'm now curious if the STR in bug 1070066 would allow us to hit the high CPU load reported in other bugs easily.
Summary: Dialer does not always show incoming call # → Dialer does not always show incoming call # and prevents the user to answer
is this on trunk only?  or do people see this on the 2.0 or 2.1 branches also?
I think it's trunk-only; I'm dogfooding 2.1 on my flame and haven't experienced this. (and IIRC, dbaron and khuey are both using trunk builds)
I tried reproducing this today and experienced a variation of it. I received a black background (missing wallpaper) on an incoming call.  There was no calling number listed.  I was able to answer and get audio.  This was the error I saw in the log: E/GeckoConsole(  278): [JavaScript Error: "ReadOnlyError: A mutation operation was attempted in a READ_ONLY transaction." {file: "resource://gre/modules/SettingsRequestManager.jsm" line: 509}]
(In reply to Tamara Hills [:thills] from comment #15)

> E/GeckoConsole(  278): [JavaScript Error:
> "ReadOnlyError: A mutation operation was attempted in a READ_ONLY
> transaction." {file: "resource://gre/modules/SettingsRequestManager.jsm"
> line: 509}]

This error is not on a valid line for either master or v2.1 currently. This would have to happen on an IDB call, while line 509 in both branches is the beginning of the TaskClear function. How old is the version you are running?
triage: we'll keep nomination for now as there's no 2.2 branch yet. 
added to [priority]
Whiteboard: [priority]
I was able to get this and I agree with comment 11 about Bug 1070066 being a good setup for this bug. In 2.2 and 2.1, I was able to reliably get this bug by stressing out the device in the following manner.

STR:
1. Flash to the latest Flame 2.2 or 2.1
2. Place a call to another device then hang up immediately.
3. Repeat this step 4-5 times to stress the device. (You may see problems related to bug 1070066 here)
4. Lock the device and then RECEIVE a call from a second device.
5. Test device will not show the inbound call information AND most likely the call cannot be answered.

***NOTE: Placing the 4-5 calls then hanging up is just to put stress on the system and may not be necessary if the system is already stressed due to heavy use. (Placing calls was just the fastest way for me to repro.)

Tested with Shallow Flash on 319mb

This bug repros on Flame KK builds: Flame 2.2 KK, Flame 2.1 KK

Actual Results: Missing incoming call information and loss of call acceptance when receiving a call.

Repro Rate: 6/6

Environmental Variables:
Device: Flame Master KK
BuildID: 20141001160320
Gaia: a23d2c490b39c4699c9375e25c4acdf396a2fa85
Gecko: f771fd927304
Version: 35.0a1 (Master) 
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0
-----------------------------------------------------------------
Environmental Variables:
Device: Flame 2.1 KK
BuildID: 20141002071122
Gaia: 94dcc25f2e34a4900ea58310c26be52bcb089161
Gecko: 2886eca89559
Version: 34.0a2
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

-----------------------------------------------------------------
-----------------------------------------------------------------

This bug does NOT repro on Flame KK build: Flame 2.0 KK

Actual Result: Unable to get into a state where inbound calls have missing information on the call screen.

Repro Rate: 0/10

Environmental Variables:
Device: Flame 2.0
BuildID: 20141001174325
Gaia: 9725d188a733a4aeebcfcf4c52d28e1ad8a2ba6f
Gecko: 05c6a4fed6bc
Version: 32.0 (2.0) 
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: steps-wanted → regression
QA Contact: croesch
I saw this once, on a single call, on a 2.1 Flame using:

Gaia   94dcc25f2e34a4900ea58310c26be52bcb089161
SourceStamp baaa0c3ab8fd
BuildID 20141002000202
Version 34.0a2
base: v180

At the time it happened the phone was not stressed with lots of apps. The QA team will investigate this further since I found out this morning others have seen it on 2.1 intermittently.
[Blocking Requested - why for this release]: Moving this to a 2.1 blockers as this issue is being seen on 2.1 as well.
blocking-b2g: 2.2? → 2.1?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
QA Contact: croesch
Logcat would be helpful. 99% sure this is the same issue as bug 1070066, except this won't be fixed by the timeout.
Adding qawanted for repro on 2.1 with logcat
Keywords: qawanted
Triage: This has major impact for all users in one of the core apps in OS.
blocking-b2g: 2.1? → 2.1+
If kyle is right, I feel that this bug 1074379 and bug 1070066, bug 1066718, bug 1072155 could all be duped/fixed by bug 1076597.

I have been testing bug 1076597 on my Xperia ZR, which was exposing several of the conditions quite steadily after some time of use, and it seemed to have fixed all.
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> If kyle is right, I feel that this bug 1074379 and bug 1070066, bug 1066718,
> bug 1072155 could all be duped/fixed by bug 1076597.
> 
> I have been testing bug 1076597 on my Xperia ZR, which was exposing several
> of the conditions quite steadily after some time of use, and it seemed to
> have fixed all.

bug 1064718 and not 1066718
Doing the STR, I see a lot of:

> 10-03 10:18:49.141   301   301 W GeckoConsole: [JavaScript Error: "NS_ERROR_FAILURE: " {file: "chrome://global/content/BrowserElementChildPreload.js" line: 998}]
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> If kyle is right, I feel that this bug 1074379 and bug 1070066, bug 1066718,
> bug 1072155 could all be duped/fixed by bug 1076597.
> 
> I have been testing bug 1076597 on my Xperia ZR, which was exposing several
> of the conditions quite steadily after some time of use, and it seemed to
> have fixed all.

On my flame, bug 1076597 did nothing to fix bug 1070066. I was still hitting the STR reliably even with the patch applied.
Here is a logcat of this issue.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
QA Contact: aalldredge
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Ok, there's

10-03 17:31:17.649   205   205 W GeckoConsole: [JavaScript Error: "TypeError: this._settingsManager._window is null" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsManager.js" line: 118}]


All over the place in the new logcat, so I'd say that whatever fixes bug 1070066 will fix this. I don't think settings is dealing well with how callscreen is launched in the system process.

That said, not sure if I should dupe yet?
QA: Can we test this bug with the patch from bug 1068109 applied? To see if it fixes the issue.
Keywords: qawanted
Sorry, I meant testing with the patch from bug 1076597.
I am not able to reproduce this bug anymore by following the STR in comment 18 when I apply attachment 8498652. Let's mark bug 1076597 as a dependency and clear the QA work here.

We should verify the fix once bug 1076597 is landed.
Keywords: regression
Fixed by bug 1076597.

This should be verified once that lands on m-c.
Status: NEW → RESOLVED
Closed: 10 years ago
Keywords: verifyme
Resolution: --- → DUPLICATE
Target Milestone: --- → 2.1 S6 (10oct)
Assignee: anthony → lissyx+mozillians
(In reply to Daniel Holbert [:dholbert] from comment #14)
> I think it's trunk-only; I'm dogfooding 2.1 on my flame and haven't
> experienced this.

(I have actually now hit this with 2.1 on my flame, yesterday evening -- but it sounds like a fix is coming. hooray!)
(In reply to Johan Lorenzo [:jlorenzo] from comment #32)
> I am not able to reproduce this bug anymore by following the STR in comment
> 18 when I apply attachment 8498652. Let's mark bug
> 1076597 as a dependency and clear the QA work here.
> 
> We should verify the fix once bug 1076597 is landed.

I was wrong. That bug still occurs with attachment 8498652 and the fix in bug 1076597. I had to try the STR 3 times before it appeared. Afterwards, the bug repros each time. See attached logcat for details.

Now that bug 1070066 has landed, I changed the 2s delay as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1070066#c39.

In other words, we still need to fix this issue, but bug 1070066 makes it harder to reproduce: you have to quickly place at least 15 phone calls. Hence, I don't think this is a blocker anymore.
Status: RESOLVED → REOPENED
blocking-b2g: 2.1+ → 2.1?
Resolution: DUPLICATE → ---
triage: this regression might still break the experience even with a low repro rate.
blocking-b2g: 2.1? → 2.1+
Assignee: lissyx+mozillians → nobody
Whiteboard: [priority] → [priority][in-sprint=v2.1-S6]
Target Milestone: 2.1 S6 (10oct) → 2.1 S7 (Oct24)
Whiteboard: [priority][in-sprint=v2.1-S6] → [priority][planned-sprint][in-sprint=v2.1-S6]
The logcat shows a bunch of this:

> W/GeckoConsole( 1382): [JavaScript Error: "TypeError: this._settingsManager._window is null" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsManager.js" line: 118}]

So this seems to be either a Settings or Window Management issue. Let's bring in Alexandre and Michael to take a look.
Flags: needinfo?(mhenretty)
Flags: needinfo?(lissyx+mozillians)
See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1070066#c10

I was seeing that while debugging 1070066. Something is getting slated to run after callscreen kills itself.
(In reply to Doug Sherk (:drs) (use needinfo?) from comment #38)
> The logcat shows a bunch of this:
> 
> > W/GeckoConsole( 1382): [JavaScript Error: "TypeError: this._settingsManager._window is null" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsManager.js" line: 118}]
> 
> So this seems to be either a Settings or Window Management issue. Let's
> bring in Alexandre and Michael to take a look.

Yes, and I spent all of yesterday trying to gather more information on this, unsuccessfully.
Flags: needinfo?(lissyx+mozillians)
Blocks: 1063375
As this bug takes a long time to reproduce, I wrote a Gaia UI test to help check the resolution. I attached the diff; you can apply it in the gaia repo.

It's not exactly the same STR as the manual one, but once the test fails (at the 25th outgoing call), I am able to see the issue manually for an incoming call.
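
The idea behind such a test (repeat outgoing calls until the call screen stops behaving) can be sketched in plain Python. This is not the attached Gaia UI test: `first_failing_call`, `FakeDevice` and its methods are hypothetical names, and the fake device simply pretends to break at a fixed call count so the loop can be run without hardware.

```python
def first_failing_call(place_and_hang_up, incoming_call_is_answerable, max_calls=50):
    """Return the 1-based index of the first call after which the check fails.

    Returns None if the check still passes after max_calls calls.
    """
    for call_number in range(1, max_calls + 1):
        place_and_hang_up()
        if not incoming_call_is_answerable():
            return call_number
    return None


class FakeDevice:
    """Hypothetical stand-in for a device that breaks after a set number of calls."""

    def __init__(self, breaks_after):
        self.calls_made = 0
        self.breaks_after = breaks_after

    def place_and_hang_up(self):
        self.calls_made += 1

    def incoming_call_is_answerable(self):
        return self.calls_made < self.breaks_after


device = FakeDevice(breaks_after=25)
print(first_failing_call(device.place_and_hang_up, device.incoming_call_is_answerable))
```

On a real device the two callables would have to drive the dialer and check the call screen; passing them in as parameters keeps the loop itself independent of whichever automation layer is used.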
Assignee: nobody → drs.bugzilla
Yesterday's investigation was unsuccessful too. Wasted a lot of time just trying to get B2G_DEBUG to work ...

I've set up my system to run jlorenzo's UI test that exposes the issue. I'm collecting data on a KK build with B2G_DEBUG enabled and a lot of debug output turned on on the Settings API side. Let's hope this will reveal some interesting facts. If it does not, I'll look for a regression window and bisect this, since it looks like nobody even cared about this earlier.
Maybe some good news. I have been working with Johan's test since this morning. It used to fail consistently before 30 calls had been made (Flame, KK, 1GB, B2G_DEBUG=1). I've bumped the number of runs to 50.

Then I applied the patches from bug 1073259, and a hack/fix for bug 1078448. So far, I'm able to complete my run of 50 tests.
So far:
 - with hack/fix for bug 1078448: one complete 50 sequential calls run done successfully
 - without this hack/fix: one failure after 25 calls, one after 24 calls
(In reply to Alexandre LISSY :gerard-majax from comment #44)
> So far:
>  - with hack/fix for bug 1078448: one complete 50 sequential calls run done
> successfully
>  - without this hack/fix: one failure after 25 calls, one after 24 calls

Another run of 50 voicemail calls with the hack:
> TEST-PASS | test_Bug_1074379_Dialer_does_not_always_show_incoming_call_number_and_prevents_the_user_to_answer.py TestReceiveCallScreenLocked.test_receive_call_with_locked_screen | took 1307069ms
Depends on: 1078448
Could QA check that bug 1078448 attachment 8505528 fixes this issue?
Keywords: qaurgent, qawanted
QA Contact: aalldredge → pcheng
(In reply to Alexandre LISSY :gerard-majax from comment #46)
> Could QA check that bug 1078448 attachment 8505528 fixes this
> issue?

I assume you meant bug 1078448 attachment 8505522? Attachment 8505528 seems to be a memory report.

After applying the patch on top of latest master build, the bug no longer occurs. I placed ~40 calls out and hung up, and each time the phone number/name was displayed correctly. I didn't see the call screen turning black either.

However, I noticed that after the patch, the call screen uses low resolution icons and lacks the word 'connecting' when making a call. I've attached screenshots comparing the call screens.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qaurgent, qawanted
NI to Alexandre to direct attention to comment 47
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell) → needinfo?(lissyx+mozillians)
QA Contact: pcheng
See my reply on bug 1081714. Thanks for testing :)
Flags: needinfo?(lissyx+mozillians)
Stealing.
Assignee: drs+bugzilla → lissyx+mozillians
Depends on: 1083729
The bfcache problem described in bug 1078448 is the cause of the problem.

Before applying the patch, I ran the test in attachment 8504694 3 times. On each try, the test failed at the 24th call.
After applying it, I was able to reach at least 66 calls (68 on the first and second tries, 66 on the third). In those cases, the test failed because of an OOM on a 319MB device.

Note that the issue only occurred with a French Orange SIM card, not with a SIM from the carrier Free. :gerard-majax said that it might be because Orange SIM cards have the SIM Application Toolkit, but Free's don't.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(mhenretty)
No longer depends on: 1078448
See Also: → 1078448
remove verifyme until bug 1078448 fixed
Keywords: verifyme
This is actually a duplicate of the workaround put in place by bug 1083729. We should still verify this.
Keywords: verifyme
Verified that the problem does not happen on the latest Flame 2.1 and 2.2 builds.

Steps:
1. Place a call to another device then hang up immediately.
2. Repeat this step 4-5 times to stress the device.
3. Lock the device and then RECEIVE a call from a second device.

Actual result:
3. Test device shows the inbound call information correctly, and the call can be answered.

Fail rate: 0/10
We repeated step 1 more than 60 times, and each time the calling page was displayed correctly.
See attachment: 1720.mp4

Flame 2.1 version:
Build ID               20150203001532
Gaia Revision          17bf14f12e43043654498330d610d469b8b55e64
Gaia Date              2015-02-03 05:19:41
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/014f57033b8a
Gecko Version          34.0
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150203.041012
Firmware Date          Tue Feb  3 04:10:23 EST 2015
Bootloader             L1TC000118D0

Flame 2.2 version:
Build ID               20150203002504
Gaia Revision          cd62ff9fe199fb43920ba27bd5fdbc5c311016fc
Gaia Date              2015-02-03 00:56:43
Gecko Revision         https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/11d93135c678
Gecko Version          37.0a2
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150203.041704
Firmware Date          Tue Feb  3 04:17:15 EST 2015
Bootloader             L1TC000118D0
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][MGSEI-Triage+]
Keywords: verifyme
Attached video 1720.MP4