Closed Bug 979626 Opened 6 years ago Closed 6 years ago

Crash in Telephony while running stability scripts

Categories

(Firefox OS Graveyard :: RIL, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:1.3+, firefox29 wontfix, firefox30 fixed, firefox31 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

RESOLVED FIXED
1.4 S4 (28mar)
blocking-b2g 1.3+
Tracking Status
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: ggrisco, Assigned: hsinyi)

References

Details

(Keywords: crash, Whiteboard: [CR 626685] [FT:RIL][b2g-crash])

Attachments

(3 files, 2 obsolete files)

I don't have STR at this point, just the minidump info and log snippet (in the attached .extra file)

 [@ mozPersonalDictionary::AddRef() | mozilla::dom::Telephony::NotifyCdmaCallWaiting(unsigned int, nsAString_internal const&) | mozilla::dom::telephony::TelephonyIPCProvider::NotifyCdmaCallWaiting(unsigned int, nsAString_internal const&) | mozilla::dom::telephony::TelephonyChild::RecvNotifyCdmaCallWaiting(unsigned int const&, nsString const&) ]
blocking-b2g: --- → 1.3?
Hsin-Yi - Is only having the crash stack enough to make this bug actionable?
Flags: needinfo?(htsai)
(In reply to Jason Smith [:jsmith] from comment #2)
> Hsin-Yi - Is only having the crash stack enough to make this bug actionable?

I know STR isn't available right not, but please understand having more context of the scripts is great help. Please keep us posted once you get the steps. I will look at the log first anyway.

And, I guess the gecko/gaia version is mozilla_b2g28_v1_3 and mozillaorg/v1.3. Is that right?
Flags: needinfo?(htsai) → needinfo?(ggrisco)
Yes, gecko and gaia are both v1.3.
according to .extra.log, seems timing issue b/w enumerateCalls and notifyCdmaCallWaiting.
==============
03-03 21:54:00.851   304   304 D NS_TELEPHONY_PROVIDER_QC_B2G: [SUB0] TelephonyListener:NotifyCdmaCallWaiting:
03-03 21:54:01.461   304   304 D NS_TELEPHONY_PROVIDER_QC_B2G: [SUB0] TelephonyListener:EnumerateCalls:
03-03 21:54:01.461   304   304 D NS_TELEPHONY_PROVIDER_QC_B2G: [SUB1] TelephonyListener:EnumerateCalls:
==============

We use 'EnumerateCalls' to synchronize TelephonyDOM with RIL. And we expect when cdmaCallWaiting event comes there's already one call on TelephonyDOM. However, there's a chance that a CdmaCallWaiting event comes before DOM completely retrieving the calls maintained in RIL (see log snippet above). So there's no mCalls[0] in [1] that leads the crash...

A possible quick solution: register ril listener only after enumerateCallComplete. 

[1] http://dxr.mozilla.org/mozilla-central/source/dom/telephony/Telephony.cpp?from=Telephony.cpp&case=true#728
Attached patch WIP (obsolete) — Splinter Review
Please see comment 5.

Hi Greg,

Please help verify if this patch solves the issue. Thanks.
Attachment #8386063 - Flags: feedback?(ggrisco)
Michael

Is this still needed for 1.3?
Flags: needinfo?(mvines)
We'd like to see a fix, yes.  Stability on v1.3 does still not meet the CS goal
Flags: needinfo?(mvines)
blocking-b2g: 1.3? → 1.3+
I saw Hsinyi had the WIP.
Assignee: nobody → htsai
Whiteboard: [FT:RIL]
PM Triage: 1.3 stability needs to reach target MTBFs. Stays 1.3+.
> Hi Greg,
> 
> Please help verify if this patch solves the issue. Thanks.

We're having problems with our internal mirrors right now and so the patch doesn't apply.  Once we get this worked out I can apply the patch and make an eng build for re-test.
Flags: needinfo?(ggrisco)
Whiteboard: [FT:RIL] → [CR 626685] [FT:RIL]
Keywords: crash
Whiteboard: [CR 626685] [FT:RIL] → [CR 626685] [FT:RIL][b2g-crash]
Patch applied.  I will report back when results are available.
We haven't seen this crash repeat with this patch applied.
Hsin-Yi, can you move this forward?
Flags: needinfo?(htsai)
Attachment #8386063 - Flags: feedback?(ggrisco) → feedback+
Okay, I'll ask for a formal review and push this patch into try server. Thanks for the feedback!
Flags: needinfo?(htsai)
Attached patch v1 (obsolete) — Splinter Review
Please see comment 5 for diagnosis.

I'd say it's really an easy but not perfect solution. With this patch, the API behaviour becomes a little bit weird. For example, gaia could get mozTelephony which is unable to receive RIL messages. However, as this is an urgent bug and it's not the right time to refactor the whole structure. I think it's okay to go.

Hi Vicamo, may I have your review? Thanks.
Attachment #8386063 - Attachment is obsolete: true
Attachment #8393306 - Flags: review?(vyang)
(In reply to Hsin-Yi Tsai  [:hsinyi] from comment #16)
> I'd say it's really an easy but not perfect solution. With this patch, the
> API behaviour becomes a little bit weird. For example, gaia could get
> mozTelephony which is unable to receive RIL messages.

IMO, receiving telephony signals is not a big deal here because of Javascript run-to-complete rule.  In contrast, it brings a problem when we make a call right after getting mozTelephony.  Actually, the original design suffers from the same problem, but this patch just makes that a little more serious.

> However, as this is an
> urgent bug and it's not the right time to refactor the whole structure. I
> think it's okay to go.

Please also file an follow-up for rewrite call enumeration mechanism.  Thank you :)
Comment on attachment 8393306 [details] [diff] [review]
v1

Review of attachment 8393306 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8393306 - Flags: review?(vyang) → review+
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #17)
> (In reply to Hsin-Yi Tsai  [:hsinyi] from comment #16)
> > I'd say it's really an easy but not perfect solution. With this patch, the
> > API behaviour becomes a little bit weird. For example, gaia could get
> > mozTelephony which is unable to receive RIL messages.
> 
> IMO, receiving telephony signals is not a big deal here because of
> Javascript run-to-complete rule.  

We've discussed this offline. Let me just write down my explanation for others' reference.

Before this patch, gaia is never able to get mozTelephony when listener registration fails. However, this patch will return telephony first then registering listener later. That means, no matter registration fails or not, gaia still holds a valid mozTelephony, but she will *never* receive any RIL signals.

> In contrast, it brings a problem when we
> make a call right after getting mozTelephony.  Actually, the original design
> suffers from the same problem, but this patch just makes that a little more
> serious.
> 
> > However, as this is an
> > urgent bug and it's not the right time to refactor the whole structure. I
> > think it's okay to go.
> 
> Please also file an follow-up for rewrite call enumeration mechanism.  Thank
> you :)

No problem!
Local tests pass. Let's wait for try result https://tbpl.mozilla.org/?tree=Try&rev=7c172c7a99ee before landing the patch.
try green. Go for land it.
Keywords: checkin-needed
Keywords: checkin-needed
Attachment #8393306 - Attachment is obsolete: true
Attachment #8393954 - Attachment description: v2 - add r=vicamo → patch for b2g-inbound - add r=vicamo
https://hg.mozilla.org/mozilla-central/rev/479570be21be
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S4 (28mar)
Comment on attachment 8393954 [details] [diff] [review]
patch for b2g-inbound - add r=vicamo

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 822210
User impact if declined: no obvious user impact, but this bug is caught by a stability test, it has chances that crash appears.
Testing completed: yes, landed in m-c and passed try server
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: no
Attachment #8393954 - Flags: approval-mozilla-b2g28?
> > 
> > > However, as this is an
> > > urgent bug and it's not the right time to refactor the whole structure. I
> > > think it's okay to go.
> > 
> > Please also file an follow-up for rewrite call enumeration mechanism.  Thank
> > you :)
> 
> No problem!

Here, Bug 986333.
Comment on attachment 8393954 [details] [diff] [review]
patch for b2g-inbound - add r=vicamo

Approving this for 1.3 given the verification and low risk here.

This also needs to be landed on mozilla-aurora so this patch is not missed on 1.4. Flagginf :ryanvm to help with this one-off landings here.
Attachment #8393954 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
ryan, can you please help with aurora landing with a=bajaj, please NI :hsinyi in case the patch does not apply cleanly for a branch specific patch.
Flags: needinfo?(ryanvm)
Flags: in-moztrap-
You need to log in before you can comment on or make changes to this bug.