Closed Bug 991988 Opened 6 years ago Closed 6 years ago

UI Freezes when trying to send MMS while data is down.

Categories

(Firefox OS Graveyard :: RIL, defect)

x86
Linux
defect
Not set

Tracking

(blocking-b2g:1.3+, firefox29 wontfix, firefox30 fixed, firefox31 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

RESOLVED FIXED
1.4 S6 (25apr)

People

(Reporter: pgravel, Assigned: bevis)

Details

(Whiteboard: [cert])

Attachments

(2 files)

Steps to reproduce:

1. Ensure data connection is not enabled
2. Compose and send an MMS message
3. UI will be unresponsive until the MMS is sent, which can take 30-60 seconds while the data call is brought up. Blue circle keeps spinning, but hitting home or trying to scroll the screen does not work.

Does not happen 100% of the time, but is fairly consistent.

This occurs on both com and moz RILs, and doesn't really seem to be a RIL issue in general. Something just seems to block while waiting for the data call to connect.
Blocks: 942267
This is a critical issue that is blocking TA of one of the partner's devices. Please look into this urgently.
blocking-b2g: --- → 1.3?
Can we confirm this on the Moz side?
Keywords: qawanted
At least it works fine on my Peak 1.3, but my build is about 1 week old.
We tested it and could not reproduce it in Spain, but it was reported in Uruguay and it is blocking certification. Is it possible to review the logs provided in comment 2 and try to fix it?
I can't reproduce it either on the latest 1.3 build from geeksphone.

Bevis, do you see something in the logs?
Flags: needinfo?(btseng)
(In reply to Beatriz Rodríguez [:brg] from comment #5)
> We had tested it and we cannot reproduce it in Spain but it was reported in
> Uruguay and it is blocking certification. 

The logs provided are from Uruguay, but the reporter also reproduced this (and I believe he is not based in Uruguay).
Phil, how did you reproduce this issue? Which SIM card and network were you using?
Flags: needinfo?(pgravel)
FWIW - this reads off strongly as being a QC RIL issue. Should we get SR on file?
Component: Gaia::SMS → Vendcom
Keywords: qawanted
The same issue was happening even with Moz ril, so it is not RIL specific. Still trying to get a case that makes this reproduce 100% of the time.
Flags: needinfo?(pgravel)
Component: Vendcom → RIL
blocking-b2g: 1.3? → 1.3+
(In reply to pgravel from comment #9)
> The same issue was happening even with Moz ril, so it is not RIL specific.
> Still trying to get a case that makes this reproduce 100% of the time.

Even 50% of the time is fine, as long as we can reproduce sometimes. Thanks!
Ken

This seems to be fairly important as it is a TA blocker.

Can you please weigh in?
Flags: needinfo?(kchang)
Latest findings:
It seems to be a lot more reproducible if the test is run while wifi is active and connected. I'm seeing the issue near 100% of the time now.
The logs do not show any significant activity while the UI is frozen. Note that the sending animation keeps playing, so draws are still happening, but hitting the home key or scrolling in the SMS app gets no response at all.

b2g-ps:
 APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
 b2g              0 root      3661  1     217700 44208 ffffffff b6ee5888 S /system/b2g/b2g
 (Nuwa)           0 root      3816  3661  52476  3260  ffffffff b6ef6888 S /system/b2g/plugin-container
 Homescreen       2 u0_a3916  3916  3816  69240  16408 ffffffff b6ef6888 S /system/b2g/plugin-container
 Messages         2 u0_a4173  4173  3816  82092  21868 ffffffff b6ef6888 S /system/b2g/plugin-container
 (Preallocated a  2 u0_a5398  5398  3816  60752  15788 ffffffff b6ef6888 S /system/b2g/plugin-container


top:
 PID PR CPU% S  #THR     VSS     RSS PCY UID      Name
 3661  0  30% S    60 236408K  31808K     root     /system/b2g/b2g
 4173  0   9% S    20  87528K  22100K     u0_a4173 /system/b2g/plugin-container

While this is happening, the b2g and Messages processes are the only ones really using the CPU, with b2g using 20-30% and the plugin-container for Messages using 5-10%.

I noticed while running gdb that, whenever I pause execution while the UI is frozen, it is waiting in PR_Wait() at nsDNSService2.cpp:771.
I added a few debug logs to trace it at run-time and see that this PR_Wait often waits for ~25s.
 04-04 14:22:47.625   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 722
 04-04 14:22:47.625   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve calling res->ResolveHost, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 768
 04-04 14:22:47.625   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve calling PR_Wait(), file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 773
 04-04 14:23:02.665   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve FAILED to resolved host, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 778
 
 
 04-04 14:23:02.675   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 722
 04-04 14:23:02.675   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve calling res->ResolveHost, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 768
 04-04 14:23:02.675   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve calling PR_Wait(), file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 773
 04-04 14:23:17.695   222   222 I Gecko   : [Parent 222] ###!!! ASSERTION: nsDNSService::Resolve FAILED to resolved host, file ../../../../../../../../gecko/netwerk/dns/nsDNSService2.cpp, line 778
 

In the normal case when the UI doesn't freeze, I see these resolves occur almost instantly.
Another difference - When wifi isn't enabled, nsDNSService::Resolve does not even get used.
Is there a different code path based on whether there is only 1 connection active vs multiple connections?
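As a quick check on the numbers, the time spent inside each PR_Wait() can be read straight off the log timestamps above. The helper names below (logTimeToMs, waitMs) are made up for this illustration and are not part of Gecko; this is only a rough sketch for doing the arithmetic. For this excerpt it gives about 15 seconds per attempt, so the two back-to-back attempts block for about 30 seconds in total.

```javascript
// Hypothetical helpers for reading the debug log above; not part of Gecko.
// Extracts the "HH:MM:SS.mmm" timestamp from a logcat line and returns
// milliseconds since midnight.
function logTimeToMs(line) {
  const m = /(\d{2}):(\d{2}):(\d{2})\.(\d{3})/.exec(line);
  if (!m) {
    throw new Error("no timestamp found in: " + line);
  }
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Time between the "calling PR_Wait()" line and the matching "FAILED" line.
// Assumes both lines were logged on the same day.
function waitMs(waitLine, failedLine) {
  return logTimeToMs(failedLine) - logTimeToMs(waitLine);
}

// Timestamps copied from the two attempts in the log excerpt.
const first = waitMs("04-04 14:22:47.625", "04-04 14:23:02.665");
const second = waitMs("04-04 14:23:02.675", "04-04 14:23:17.695");
console.log(first, second); // 15040 15020
```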
(In reply to pgravel from comment #12)
It seems that the main thread was blocked for a long time at [1].
We might need to change the design here to resolve the name asynchronously.

[1] http://hg.mozilla.org/releases/mozilla-b2g28_v1_3/file/fba2e7d69356/dom/system/gonk/NetworkManager.js#l433

(In reply to pgravel from comment #13)
> Another difference - When wifi isn't enabled, nsDNSService::Resolve does not
> even get used.
> Is there a different code path based on whether there is only 1 connection
> active vs multiple connections?

I'll double check the code path to see if there is any difference to 
resolve the hostname with/without wifi enabled.
Flags: needinfo?(pgravel)
Flags: needinfo?(kchang)
Flags: needinfo?(btseng)
Flags: needinfo?(pgravel)
Assignee: nobody → btseng
Updated findings so far:
1. The MMSC/MMS proxy hostname is resolved by NetworkManager.setExtraHostRoute() and
   removeExtraHostRoute() on Gecko's main thread.
2. nsDNSService::Resolve() is always used, whether or not WiFi is on.
3. Gecko's main thread is also used to dispatch touch events from nsAppShell.cpp.

Hence, the root causes of this frozen UI behavior are:
1. The use of the blocking DNSService.resolve() API in NetworkManager. (bug 939026)
2. There is no easy way to resolve the hostname using the DNS of a specified network interface.
   (A design change is needed.)

We'll fix root cause #1 by resolving the hostname asynchronously, which removes
this TA blocker caused by the frozen UI, and track root cause #2 in bug 992772
as a future enhancement.
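To illustrate why a blocking lookup on the main thread freezes input, here is a toy model of a single thread with a virtual clock. It is not Gecko code: simulateMainThread, the task names, and the 15 second figure are all assumptions chosen for the illustration. A touch event that arrives while the thread is held by a synchronous resolve has to wait until the resolve returns, while an asynchronous dispatch releases the thread almost immediately.

```javascript
// Toy model of one main thread with a virtual clock in milliseconds.
// Hypothetical illustration only; this is not how Gecko schedules events.
// tasks must be sorted by arrivesAt; each task holds the thread for blocksFor ms.
function simulateMainThread(tasks) {
  let clock = 0;
  return tasks.map((task) => {
    // A task cannot start before it arrives or before the thread is free.
    const startedAt = Math.max(clock, task.arrivesAt);
    clock = startedAt + task.blocksFor;
    return { name: task.name, delay: startedAt - task.arrivesAt };
  });
}

const touch = { name: "touch event", arrivesAt: 1000, blocksFor: 5 };

// Blocking lookup: the thread is held for the whole wait (15 s is illustrative).
const blocking = simulateMainThread([
  { name: "resolve (sync)", arrivesAt: 0, blocksFor: 15000 },
  touch,
]);

// Async lookup: the thread only issues the request; the wait happens elsewhere.
const nonBlocking = simulateMainThread([
  { name: "resolve (async dispatch)", arrivesAt: 0, blocksFor: 1 },
  touch,
]);

console.log(blocking[1].delay, nonBlocking[1].delay); // 14000 0
```

In this model the animation would keep running only if it were drawn off the blocked thread, which matches the report that the spinner keeps spinning while touches are ignored.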
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #15)
...
> Hence, the root causes of this frozen UI behavior are
> 1. The access to the blocking API of DNSService.resolve() in NetworkManager.
> (bug 939026)
...
> We'll try to resolve root cause#1 by resolving it in async way to 
> remove this TA blocker due to the frozen UI and 
...
Adding bug 939026 to the depends-on list.
Depends on: 939026
Whiteboard: [cert]
Whiteboard: [cert] → cert
Whiteboard: cert → [cert]
Duplicate of this bug: 939026
No longer depends on: 939026
Assignee: btseng → chulee
Assigning to Chuck Lee to fix root cause #1, leaving #2 (bug 992772) as a follow-up.
Hi,

We would like to confirm whether the MMS is eventually sent out after the UI unfreezes.
Flags: needinfo?(pgravel)
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #19)
> Hi,
> 
> We would like to double confirm that if the MMS will eventually be sent out
> after UI is unfrozen.

According to my colleague in Uruguay, the MMS is eventually sent out once the UI unfreezes. Anyway, I'm leaving the ni for Phil so he can confirm on his side.
Confirming that the MMS gets sent out eventually (~45-60 seconds usually), at which point the UI becomes responsive again.
Flags: needinfo?(pgravel)
Just realized that, for MMS:
1. If an MMS proxy is set, only the MMS proxy is used for all transactions.
2. Most MMS proxies are IP addresses rather than hostnames.
   This is also the case on the network the reporter tested:
   "mmsproxy":"10.0.2.29","mmsport":"8080","mmsc":"http://mmsc.movistar.com.uy".

Hence, the shortest path to a fix is to resolve the hostname of either the MMS proxy or the MMSC,
with the MMS proxy taking priority when it is available.

With this solution, we don't have to resolve a hostname at all if the MMS proxy is IP-based.
That means the blocking at [1] happens less often when the MMS proxy is IP-based.

[1] http://hg.mozilla.org/releases/mozilla-b2g28_v1_3/file/fba2e7d69356/dom/system/gonk/NetworkManager.js#l433
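The selection logic described in this comment could look roughly like the sketch below. This is not the attached patch or the actual NetworkManager.js code: isIPv4Literal and pickMmsHost are hypothetical names, the settings object simply mirrors the keys quoted above, and only dotted-quad IPv4 literals are recognized (IPv6 is left out to keep the sketch short).

```javascript
// Hypothetical helper, not the actual NetworkManager.js code.
// Returns true for a dotted-quad IPv4 literal such as "10.0.2.29".
function isIPv4Literal(host) {
  const parts = host.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// Picks the one host that an MMS transaction needs a route to.
// The MMS proxy takes priority; the MMSC is used only when no proxy is set.
// needsDnsLookup is false when the chosen host is already an IP address.
function pickMmsHost(settings) {
  const host = settings.mmsproxy
    ? settings.mmsproxy
    : new URL(settings.mmsc).hostname;
  return { host, needsDnsLookup: !isIPv4Literal(host) };
}

// Settings from the network the reporter tested: no lookup is needed.
const ipProxy = pickMmsHost({
  mmsproxy: "10.0.2.29",
  mmsport: "8080",
  mmsc: "http://mmsc.movistar.com.uy",
});
console.log(ipProxy.host, ipProxy.needsDnsLookup);

// Hypothetical configuration without a proxy: the MMSC hostname must be resolved.
const noProxy = pickMmsHost({ mmsproxy: "", mmsc: "http://mmsc.movistar.com.uy" });
console.log(noProxy.host, noProxy.needsDnsLookup);
```

With a hostname-based proxy the lookup is still needed, so this approach reduces how often the blocking call runs but does not remove it.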
Assignee: chulee → btseng
The async solution will be addressed in bug 939026.
This bug will be fixed with the approach described in comment 22.
Depends on: 939026
Hi,

Could you give the attached patch a try?
It skips the unnecessary hostname resolution when the MMS proxy is an IP address.
Flags: needinfo?(pgravel)
Attachment #8404425 - Flags: review?(vyang) → review+
Attachment #8404425 - Attachment description: Patch v1 - Resolve HostName of either MMS Proxy or MMSC. r=vyang. a=1.3+ → Patch v1 - Resolve HostName of either MMS Proxy or MMSC. r=vyang, a=1.3+, a=1.4+
Comment on attachment 8404425 [details] [diff] [review]
Patch v1 - Resolve HostName of either MMS Proxy or MMSC. r=vyang, a=1.3+, a=1.4+

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 886765
User impact if declined: The UI will be frozen while sending MMS. The fix avoids unnecessary hostname resolution so that the main thread is not blocked.
Testing completed: Yes
Risk to taking this patch (and alternatives if risky): No
String or UUID changes made by this patch: N/A
Attachment #8404425 - Flags: approval-mozilla-b2g28?
Comment on attachment 8404425 [details] [diff] [review]
Patch v1 - Resolve HostName of either MMS Proxy or MMSC. r=vyang, a=1.3+, a=1.4+

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 886765
User impact if declined: The UI will be frozen while sending MMS. The fix avoids unnecessary hostname resolution so that the main thread is not blocked.
Testing completed (on m-c, etc.): Yes
Risk to taking this patch (and alternatives if risky): No
String or IDL/UUID changes made by this patch: N/A
Attachment #8404425 - Flags: approval-mozilla-aurora?
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #25)
> Hi,
> 
> Is it possible to give it a try of the attached patch?
> We will skip the unnecessary hostname-resolving if mms proxy is ip-address.

It seems to work on AT&T, but I don't understand why based on your comment. The mms proxy in my case is not an ip-address.
Flags: needinfo?(pgravel)
(In reply to pgravel from comment #28)
> (In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #25)
> > Hi,
> > 
> > Is it possible to give it a try of the attached patch?
> > We will skip the unnecessary hostname-resolving if mms proxy is ip-address.
> 
> It seems to work on AT&T, but I don't understand why based on your comment.
> The mms proxy in my case is not an ip-address.

May I know what the MMS proxy is in your configuration?
Another possible reason is that AT&T's MMS proxy is reachable from both the public network and the MMS connection.
No longer depends on: 939026
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #29)
> (In reply to pgravel from comment #28)
> > (In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #25)
> > > Hi,
> > > 
> > > Is it possible to give it a try of the attached patch?
> > > We will skip the unnecessary hostname-resolving if mms proxy is ip-address.
> > 
> > It seems to work on AT&T, but I don't understand why based on your comment.
> > The mms proxy in my case is not an ip-address.
> 
> May I know what your mms proxy is in your configuration?
> The other reason is that the MMS Proxy of AT&T is reachable from public
> network and mms connection.

Or at least resolvable.
(In reply to Julien Wajsberg [:julienw] from comment #31)
> (In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #29)
> > (In reply to pgravel from comment #28)
> > > (In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #25)
> > > > Hi,
> > > > 
> > > > Is it possible to give it a try of the attached patch?
> > > > We will skip the unnecessary hostname-resolving if mms proxy is ip-address.
> > > 
> > > It seems to work on AT&T, but I don't understand why based on your comment.
> > > The mms proxy in my case is not an ip-address.
> > 
> > May I know what your mms proxy is in your configuration?
> > The other reason is that the MMS Proxy of AT&T is reachable from public
> > network and mms connection.
> 
> Or at least resolvable.

proxy is: proxy.mobile.att.net
mmsport: 80
mmsc: http://mmsc.mobile.att.net
https://hg.mozilla.org/mozilla-central/rev/27eb9024d960
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.5 S1 (9may)
(In reply to pgravel from comment #32)
> 
> proxy is: proxy.mobile.att.net
> mmsport: 80
> mmsc: http://mmsc.mobile.att.net

The MMS proxy is resolvable from the public network:
$ nslookup proxy.mobile.att.net 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
Name:	proxy.mobile.att.net
Address: 172.26.39.1
Attachment #8404425 - Flags: approval-mozilla-b2g28?
Attachment #8404425 - Flags: approval-mozilla-b2g28+
Attachment #8404425 - Flags: approval-mozilla-aurora?
Attachment #8404425 - Flags: approval-mozilla-aurora+
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap+