847592 - Incoming calls take a long time in loaded scenario, sometimes they are not even displayed and ends up as unanswered call.

Reporter

Description

•

11 years ago

Similar to Bug 812059. 

But in a loaded scenario the incoming notification doesn't even showup in some cases. Steps to reproduce the issue. 

1. Make sure a valid SIM is inserted and able to receive calls in the device.
2. Open the browser and visit a page like nytimes.
3. Use another mobile to make initiate a call to the target we want to test. 
4. Mean while pinch and zoom\ pan continuously in the web page we already  opened. 
5. It takes really long time to get the incoming call (more than 2 secs) and sometimes we donot even get the call. 

Please let me know if you need any more details.

godavari@codeaurora.org

Reporter

Updated

•

11 years ago

Summary: Incoming calls take a long time in loaded scenario, sometime they are not displayed ends up as unanswered. → Incoming calls take a long time in loaded scenario, sometimes they are not even displayed and ends up as unanswered call.

godavari@codeaurora.org

Reporter

Updated

•

11 years ago

Whiteboard: [FFOS_perf]

godavari@codeaurora.org

Reporter

Comment 1

•

11 years ago

Anshul is looking into the later part of the issue where the UI doesn't come up some times.

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

11 years ago

> 5. It takes really long time to get the incoming call (more than 2 secs) and sometimes we donot even 
> get the call. 

> Anshul is looking into the later part of the issue where the UI doesn't come
> up some times.

I think these are essentially the same issue.  It takes a lot of CPU in order to get the UI to show up, and if by then the call is over, the UI will only show a missed call.

atula

Updated

•

11 years ago

Severity: critical → blocker

Priority: P2 → P1

Justin Lebar (not reading bugmail)

Assignee

Comment 3

•

11 years ago

I don't quite know how QC is using priority flags -- is this a tef+ bug?

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

11 years ago

The only way I can think of to address this bug is, right when the main process receives the first inkling that there's an incoming call, we immediately drop the CPU priority of all other processes.  This will allow the dialer to launch quickly.

But I need to understand how serious we consider this bug to be before I invest in that.

atula

Comment 5

•

11 years ago

In Mobile world, not able to receive a call is a Highest priority issue so must be fixed for any commercial product.

(In reply to Justin Lebar [:jlebar] from comment #4)
> The only way I can think of to address this bug is, right when the main
> process receives the first inkling that there's an incoming call, we
> immediately drop the CPU priority of all other processes.  This will allow
> the dialer to launch quickly.
> 
> But I need to understand how serious we consider this bug to be before I
> invest in that.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

blocking-b2g: --- → tef?

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Assignee: nobody → justin.lebar+bug

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 844323

Fernando Jiménez Moreno [:ferjm]

Updated

•

11 years ago

Blocks: 812059

Anshul

Comment 6

•

11 years ago

I did some more analysis on the issue where incoming call is not getting displayed sometimes when user is busy zooming in/out and panning on a browser page.

The issue is that phone runs out of memory and kernel kills the communications app right after telephony sends the telephony-new-call message and before communications app gets a chance to launch.

Justin Lebar (not reading bugmail)

Assignee

Comment 7

•

11 years ago

> The issue is that phone runs out of memory and kernel kills the communications app right after 
> telephony sends the telephony-new-call message and before communications app gets a chance to launch.

Interesting.  That...shouldn't happen after bug 836199 was fixed.  Maybe we regressed something.

I'll dig into this, but if you're interested in providing more data, the first thing I'm going to try to collect is the result of modifying ProcessPriorityManager.cpp so it prints to logcat (near the top of the file; comment out an "&& 0") and then recording

 $ adb logcat -v time

and, after you've placed the call and it's died, running

 $ adb shell dmesg.

Anshul

Comment 8

•

11 years ago

Attached file Android logs with ProcessPriorityManager messages enabled — Details

Anshul

Comment 9

•

11 years ago

Attached file Dmesg logs after the incoming call UI failed to show — Details

Anshul

Comment 10

•

11 years ago

Justin, I was able to reproduce the issue after 7th try. The android logs is from the boot up so it would have all 7 incoming call related logs. Please look at the end of the log for the 7th call for which the communications app didn't show the incoming call UI.

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

11 years ago

Thanks for the info, Anshul.

This is all good initially.  But then we're releasing the CPU wake-lock, which should not happen until the call ends:

> 03-06 10:50:08.549 I/Gecko:ProcessPriorityManager(  700): Got wake lock changed event. Now mHoldsCPUWakeLock=0, mHoldsHighPriorityWakeLock=0

We then decide that the communications app is in the background (because it is):

> 03-06 10:50:09.579 I/Gecko:ProcessPriorityManager(  700): Reset priority timer callback; about to ResetPriorityNow.
> 03-06 10:50:09.579 I/Gecko:ProcessPriorityManager(  700): Setting priority to BACKGROUND

We die soon thereafter.

We shouldn't be releasing this wake lock.  But also, holding the lock isn't going to guarantee that the process will run quickly enough to receive the call.

Anshul

Comment 12

•

11 years ago

Justin the message from ProcessPriorityManager that there is no active cpu lock is after the call was disconnected and so isn't that expected. Also, please look at the last call in the logs, which is one that really has the issue; around the log statement below.

> 03-06 10:50:56.829 I/Gecko   (  127): -*- QCContentHelper_QC_B2G: Notify system message manager of telephony-new-call

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

11 years ago

Okay...

> the phone runs out of memory and the kernel kills the communications app right after 
> telephony sends the telephony-new-call message and before communications app gets a 
> chance to launch.

Which telephony PID do you think is getting killed here?

I wish we could match up the times in dmesg and logcat!

Michael Vines [:m1] [:evilmachines]

Updated

•

11 years ago

blocking-b2g: tef? → tef+

Anshul

Comment 14

•

11 years ago

Justin, 753 (Communications) app is getting killed.

Josh Carpenter [:jcarpenter]

Updated

•

11 years ago

Whiteboard: [FFOS_perf] → [FFOS_perf], u=user c=dialer s=ux-most-wanted

Kai-Chih Hu [:khu]

Comment 15

•

11 years ago

Do we have any update for this case? Thanks!

Kai-Chih Hu [:khu]

Comment 16

•

11 years ago

Justin, may I know if you're working on this? Or, should we find someone to help this? Just want to know if there has anything I can help here. Thanks!

Flags: needinfo?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Assignee

Comment 17

•

11 years ago

I'm working on bug 844323, which is a prerequisite to fixing this bug, as I currently understand it.

Flags: needinfo?(justin.lebar+bug)

Chris Lee [:clee]

Comment 18

•

11 years ago

This appears to be a important bug to fix especially if we're missing calls.  Sounds like jlebar is making good progress on bug 844323 (given the risk), but wanted to reiterate the importance.   Thanks.

Justin Lebar (not reading bugmail)

Assignee

Comment 19

•

11 years ago

As an update on my progress here: I have a working draft of bug 844323, and I've implemented code to immediately renice all processes to 20 when we receive an incoming call.

We renice all the processes within 400ms or so from when the RIL notices that we have a call.  But the phone still doesn't ring more than two or three times when under process stress ("bust processes" at http://bit.ly/membuster).  In other words, this change does not have the expected effect of fixing this bug.

There are a number of possible explanations for this.  Perhaps we're so overloaded that RIL doesn't receive the notification until later than it should.  Perhaps the main process is busy doing work for the reniced subprocesses, and this delays the communication app's loading.  Perhaps despite the fact that they've been reniced, the kernel schedules the processes too aggressively.  Maybe it's some combination of these factors, or maybe it's something else.

More investigation to come as I learn more.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 20

•

11 years ago

I will try to profile Monday.

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

11 years ago

You should probably try my git branch.

https://github.com/jlebar/mozilla-central/tree/process-priority-main-process-b2g18

Gregor Wagner [:gwagner]

Updated

•

11 years ago

Depends on: 860097

Justin Lebar (not reading bugmail)

Assignee

Comment 22

•

11 years ago

I have and am exercising tef+ powers for bugs this depends on.  Just ask.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 860273

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

11 years ago

With my whole-system profiler from bug 860273, I see the child processes taking up a lot of CPU time before they get reniced.

I'm running with my patch which gives nice 19 to all child processes, but I see samples like this

  pid=544 Browser nice=19 utime=0.010 stime=0.000
  pid=567 system-profiler nice=0 utime=0.000 stime=0.020
  pid=582 Browser nice=0 utime=0.070 stime=0.030

This sample reflects 100ms of time, and pid 582 got 100ms of CPU time in this sample (utime + stime = 0.1).

I'll see if I can figure out why these processes aren't getting reniced as quickly as I'd expect.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 861441

John Hammink

Updated

•

11 years ago

Whiteboard: [FFOS_perf], u=user c=dialer s=ux-most-wanted → [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442

Alex Keybl [:akeybl]

Updated

•

11 years ago

Whiteboard: [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 → [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid]

Alex Keybl [:akeybl]

Updated

•

11 years ago

Target Milestone: --- → Madrid (19apr)

Andrew Overholt [:overholt]

Updated

•

11 years ago

Whiteboard: [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid] → [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid][status: needs more investigation]

Gabriele Svelto [:gsvelto] (PTO)

Comment 24

•

11 years ago

I did some quite extensive testing of this scenario today as part of my investigation of bug 861441. From what I could gather there's no easy solution to fix it least landing both my patch in bug 861441 and Justin's patch-set for bug 844323.

When I was working on this however I came up with a possible, quicker (and dirtier) alternative which might allow us to fix this specific problem while dealing with the more general issue later. With my patch applied I tried making the Communications process put himself in the highest priority control group as soon as its created and have it stick there (i.e. never lower its priority to background). Since this completely bypasses the ProcessPriorityManager it ensures that when a call arrives Communications has enough CPU time to respond fast. With this trick I was able to consistently receive calls within a 2s time-frame under process-buster stress.

I know it's a really bad solution and it has its drawbacks (mainly related to OOM behavior since our priorities affect that too) but if we need a quick'n'dirty fix it works.

Whiteboard: [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid][status: needs more investigation] → [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid]

Justin Lebar (not reading bugmail)

Assignee

Comment 25

•

11 years ago

What you're talking about here is similar to what I have in bug 847592.  But note that the communications app is also the dialer app, so we can't lock it at the highest priority class.

Anyway I'm done doing quick-and-dirty fixes here.  We have time to do this right.  But your investigation is very helpful because we have a lower bound on what we're trying to achieve here.

Gabriele Svelto [:gsvelto] (PTO)

Comment 26

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #25)
> What you're talking about here is similar to what I have in bug 847592.  But
> note that the communications app is also the dialer app, so we can't lock it
> at the highest priority class.

Yes, it would be really ugly, especially from a memory POV.

> Anyway I'm done doing quick-and-dirty fixes here.  We have time to do this
> right.  But your investigation is very helpful because we have a lower bound
> on what we're trying to achieve here.

I agree, if we can fix this properly by having a proper method to quickly giving most of the CPU to a single task it will also greatly improve the overall responsiveness of the system.

As an extra data-point I've tried to measure how long it takes for a call to show up under perfect conditions (Communications already open, system essentially idle). With my stopwatch I could measure times a bit below one second; say 0.7-0.8s though the exact measure is not really relevant because it was well within the error margin of my reflexes :) So if we can get it to show up within a 2s timeframe under process-buster stress it would be a very good result IMO.

Kai-Chih Hu [:khu]

Comment 27

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #25)
> What you're talking about here is similar to what I have in bug 847592.  But
> note that the communications app is also the dialer app, so we can't lock it
> at the highest priority class.

Does it mean you're proposing another different solution here? Thanks! 

> 
> Anyway I'm done doing quick-and-dirty fixes here.

Thanks, Justin. Because this bug is marked as a certification blocker and it needs to be fixed before April 24th. Do you think it's possible to close this case by that day? 

  We have time to do this
> right.  But your investigation is very helpful because we have a lower bound
> on what we're trying to achieve here.

Justin Lebar (not reading bugmail)

Assignee

Comment 28

•

11 years ago

> Does it mean you're proposing another different solution here? Thanks! 

I'm proposing that we do what I have in bug 847592, instead of something that gsvelto and I think will regress other features of the phone.

> Do you think it's possible to close this case by that day? 

If it needs to be done by April 24, it doesn't really matter whether I think it's possible, right?  I'll do my best.

Kai-Chih Hu [:khu]

Comment 29

•

11 years ago

Reviewed on April 23th: Agreed to have this fixed before the 3rd certification.

Justin Lebar (not reading bugmail)

Assignee

Comment 30

•

11 years ago

Attached patch Part 1: Add an additional lowCPUPriority argument to hal::SetProcessPriority. (obsolete) — Details — Splinter Review

I'm not sure we need to take these patches -- I'd love nothing more than to wontfix this bug -- but I want to save what I have in case we do need it.

Justin Lebar (not reading bugmail)

Assignee

Comment 31

•

11 years ago

Attached patch Part 2: When a process has priority FOREGROUND_HIGH, renice all other processes. (obsolete) — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Comment 32

•

11 years ago

Attached patch Tests for renicing all other processes when one process gets priority FOREGROUND_HIGH (obsolete) — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Comment 33

•

11 years ago

Sorry, I should move these patches to a separate bug, since this is a metabug...

Alex Keybl [:akeybl]

Updated

•

11 years ago

Target Milestone: 1.0.1 Madrid (19apr) → 1.0.1 Cert2 (28may)

Kai-Chih Hu [:khu]

Comment 34

•

11 years ago

The OEM will need around 5 days to prepare the build and do the verification. It means, we need to fix it by May 21th. Justin, do you think it's possible? Thanks.

Justin Lebar (not reading bugmail)

Assignee

Comment 35

•

11 years ago

At this point we don't know how to solve this bug with the testcase we've been using.  My best guess is that the main process is busy doing work for the low-priority background processes, and that's what delays the communications app from loading.  If that's happening, I doubt we can fix it by May 21.

But the testcase we've been using is highly unrealistic; it launches something like twelve processes all at once.  I think we need to try with a more realistic testcase.

Alex Keybl [:akeybl]

Comment 36

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #35)
> But the testcase we've been using is highly unrealistic; it launches
> something like twelve processes all at once.  I think we need to try with a
> more realistic testcase.

Can we propose a more realistic testcase to partners?

Justin Lebar (not reading bugmail)

Assignee

Comment 37

•

11 years ago

> Can we propose a more realistic testcase to partners?

Absolutely; as I said, the next step is to try a more realistic testcase with the patches we have and see what happens.

Sarah Parsons

Comment 38

•

11 years ago

This issue reproduces on the Inari device. 

Build ID: 20130426070207
Environmental  Variables:
Kernel Date: Feb 21
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/54285d67454b
Gaia: c47ef39de04e634d847cc86b6acc616fabce69eb

Incoming calls take a long time display, sometimes they do not show up at all.

John Hammink

Comment 39

•

11 years ago

This issue does not repro on the Ikura device with 4/26 partner build:

Gaia   a3176dbb57dd73fe99397392bf1f8a98a6bf66dd
BuildID 20130426165758
Version 18.0

Justin Lebar (not reading bugmail)

Assignee

Comment 40

•

11 years ago

With all of the patches I've landed in dependent bugs, I haven't been able to cause my Hamachi device running b2g18 (not the 1.0.1 branch) to miss a call.  Calls do take a few extra seconds to get to the phone, but I always get at least two rings (if I'm not panning and zooming at nytimes, I get three rings).

There are three things we need to figure out here.

1. Whether we can make the CPU so busy, we miss a call.
2. If we can't receive a call, exactly what is the delay we can cause in receiving a call?
3. Is that delay acceptable?  If it's not acceptable, we need to decide which releases, if any, the delay blocks.

It would be helpful if QA could test b2g18_v1.0.1 with a rev that includes all the relevant patches to help determine answers to (1) and (2).  I just landed everything on hg: 

https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/41a0854fe988

This bug blocks the next round of certification, which is imminent.  So please consider this testing request to be a high priority.

Keywords: qawanted

Marcia Knous [:marcia]

Comment 41

•

11 years ago

Hey Justin - I pulled the latest 1.0.1 build from the directory today and it shows:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/5c1d67e0c242
Gaia   11477c127ae9be5051e4cfbcbf3da1d4150f9967
BuildID 20130502070201
Version 18.0

Does this version have the relevant patches that you note in Comment 40?

Justin Lebar (not reading bugmail)

Assignee

Comment 42

•

11 years ago

The hg debugancestor command is the best way I know of to find out.

If you want to know whether a tip commit T contains a commit Q, then you can run

  $ hg debugancestor T Q   (or Q T; it doesn't matter)

If the result is Q, then the tip contains Q.  If it's anything else, then it doesn't.

> $ hg debugancestor 5c1d67e0c 41a0854fe
> 118820:41a0854fe9884eb6544578b7cdc9442844496611

So, yes.

Marcia Knous [:marcia]

Comment 43

•

11 years ago

Attached file Testing Log with Browser and Music Player open — Details

I performed some testing with the Inari device and the build referenced in Comment 41, which is a Commercial RIL build with AT&T SIM card.

I had the following apps open on the phone while I was testing:

*Music Player - playing one track
*Browser with one tab open, nytimes.com

I basically called the device while I was aggressively pinching and zooming on the nytimes.com site. I got inconsistent results along the way, with it taking sometimes as much as 8 seconds for the call to ring through. Sometimes it took under 2 seconds.

The log I am attaching has the 8 second delay. I also have another log where the dialer screen simply froze.

I was not able to generate a consistent scenario where the calls were actually missed, but the delay still seems to be an issue. 

We are going to test again on the partner build just to see if this happens on that device.

Angela Hubenya

Comment 44

•

11 years ago

I got the same result as Marcia using the same build in comment 41 on the Inari.
I tested scenarios 1, and 2 a few times.
All of the received calls were within an inconsistent time frame on the Inari device.

Keywords: qawanted

Justin Lebar (not reading bugmail)

Assignee

Comment 45

•

11 years ago

> 2. If we can't receive a call, **exactly what is the delay** we can cause in receiving a call?

> All of the received calls were within an inconsistent time frame on the Inari device.

In the future, please try to be explicit about time frames, if you can.  That will help release drivers make an informed decision here.  Thanks.

Sreenidhi

Comment 46

•

11 years ago

Tested on an Inari using partner build with the following build information :

Gaia   a3176dbb57dd73fe99397392bf1f8a98a6bf66dd
BuildID 20130426165758
Version 18.0

STR:

1.Did an incoming call with memory intensive apps running… was using a browser, pinching and zooming the page in the browser.
2.Camera, dialer apps and a game were running in the background.
3.An incoming call to this phone resulted in a blank screen for few seconds (though the call was then displayd on the phone screen)

The time frame was inconsistent as in other comments. I would give the time range as 3 to 5 seconds (the screen being completely blank before the call and caller info could be displayed on the screen)


The logcat shows something in RED like this :

05-02 12:33:40.804: E/GeckoConsole(1994): [JavaScript Warning: "HTTP "Content-Type" of "text/html" is not supported. Load of media resource app://communications.gaiamobile.org/dialer/oncall.html failed." {file: "app://communications.gaiamobile.org/dialer/oncall.html#unlocked" line: 0}]

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 868521

Justin Lebar (not reading bugmail)

Assignee

Comment 47

•

11 years ago

In my latest tests, I see that merely having the browser open to nytimes.com adds ~1s to the call-received time.  In contrast, having example.com open does not add anything to the call-received time.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 868533

Stephany Wilkes

Comment 48

•

11 years ago

Just reviewing UX tags and what UX help, if any, may still be needed here (per the tags). Please let me know.

Justin Lebar (not reading bugmail)

Assignee

Comment 49

•

11 years ago

Recording some of our progress from today:

bz created a whole-system profile which we've been looking at. (I'll attach the profile in a moment.) He started profiling when the phone he was calling from started to ring.

What we see in this profile is that there are two long extents of time while the communications app gets under 10% cpu. During those periods of time, the browser process gets roughly 60% of the CPU, and the B2G main process gets roughly 25% of the CPU.

The browser process spends all of its time reflowing, and the main process spends half of its time compositing and half of its time sending touch events to the browser.

We'd previously thought that what might be happening is that the browser app causes the main process to do a lot of work on its behalf, and indeed the compositing that the main process is doing is a result of the reflows in the browser. But the compositing happens on its own thread in the main process, so this should not be getting in the way of any events coming from the communications app. So it seems that if the main process is not responding quickly enough to requests from the child, this is likely a result of touch events. But since I seem to get calls pretty quickly when I'm touching the phone a lot, I think touch events are unlikely what's causing our problem here.

The outstanding question is: Why is the communications app getting so little CPU? We want to distinguish between three cases:

1) The communications app is blocked (on I/O, on a sync call to the parent, on page faults, or something else)

2) The communications app is runnable, but is not being scheduled

3) The communications app doesn't /need/ a lot of CPU to start up, but whatever it needs to do to start up isn't happening quickly enough.

Jed is working on getting data out of perf which will take a stack every time a context switch happens. Hopefully we'll be able to look at when the communications app is de-scheduled and distinguish between (1) and (2). To understand (3), it would be helpful to have a profile of receiving a call while not under CPU load.

Justin Lebar (not reading bugmail)

Assignee

Comment 50

•

11 years ago

Attached file Profile from comment 49 — Details

xz -d this file, then open it in http://people.mozilla.com/~bgirard/cleopatra/#.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 51

•

11 years ago

We've determined that the kernel we're using doesn't give a call stack or a PC for PERF_COUNT_SW_CONTEXT_SWITCHES.  (Note that it does work as expected for PERF_COUNT_SW_CPU_CLOCK.)  A more recent kernel on x86 doesn't have this problem, but we don't yet know if the issue is machine-independent (in which case there is presumably a fix) or machine-dependent (in which case it may still be broken upstream).

Note that PERF_COUNT_SW_PAGE_FAULTS does appear to work correctly, so we could try taking a time profile and a page fault profile and merging them.

Something else we could try: using TableTicker with the patch in bug 810526 (and my modified toolchain).  We should either see a sample every N milliseconds, in which case we'll know what the thread of interest is doing, or we don't, in which case we suspect that it's runnable but the kernel chooses not to run it (which would be supported by also seeing an absence of page faults in that period).

Justin Lebar (not reading bugmail)

Assignee

Comment 52

•

11 years ago

> Something else we could try: using TableTicker with the patch in bug 810526 (and my modified 
> toolchain).  We should either see a sample every N milliseconds, in which case we'll know what the 
> thread of interest is doing, or we don't, in which case we suspect that it's runnable but the kernel 
> chooses not to run it (which would be supported by also seeing an absence of page faults in that 
> period).

This samples by sending signals to each thread, right?  I'd expect us to get samples from both runnable and non-runnable threads, except that the blocked threads would sample at a blocking function, and the unblocked threads wouldn't.  Am I missing something?

Anyway, that would be interesting either way...

Justin Lebar (not reading bugmail)

Assignee

Comment 53

•

11 years ago

> 3) The communications app doesn't /need/ a lot of CPU to start up, but whatever it needs to do to 
> start up isn't happening quickly enough.

I guess another way of putting this is, the process of loading the communications app may not be CPU bound.  If this was the case, I guess I'd expect to see a lot of idle CPU time during the app's startup.  But that would be pretty weird...

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 54

•

11 years ago

From my reading of the kernel sources I don't think we're ever going to get stacks from PERF_COUNT_SW_CONTEXT_SWITCHES (latest version of kernel on ARM still uses empty registers as far as I can tell, http://lxr.free-electrons.com/source/include/linux/perf_event.h#L681 )

Justin Lebar (not reading bugmail)

Assignee

Comment 55

•

11 years ago

If we stop sending and receiving messages to non-critical processes as soon as we get an incoming call, it does seem to work to fix this bug.  It's a crappy user experience, but it's something.  I'll post patches and we can take the discussion into bug 868533.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 56

•

11 years ago

(In reply to ben turner [:bent] from comment #54)
> From my reading of the kernel sources I don't think we're ever going to get
> stacks from PERF_COUNT_SW_CONTEXT_SWITCHES (latest version of kernel on ARM
> still uses empty registers as far as I can tell,
> http://lxr.free-electrons.com/source/include/linux/perf_event.h#L681 )

I think the problem might just be that ARM doesn't define perf_arch_fetch_caller_regs: http://lxr.free-electrons.com/ident?i=perf_arch_fetch_caller_regs

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 57

•

11 years ago

Yes, exactly.

Justin Lebar (not reading bugmail)

Assignee

Comment 58

•

11 years ago

Attached file Profile without loading the CPU — Details

As before, xz -d and load in http://people.mozilla.com/~bgirard/cleopatra/.

What I find immediately interesting when comparing this profile to the other one is that here, HTML5 parsing takes place in one short burst, whereas in the loaded profile, HTML5 parsing is spread over almost half of the sample!  The communications app doesn't do anything else until it's done parsing HTML.

I don't immediately see why HTML5 parsing should be special.  But perhaps reading the HTML files out of omni.ja is particularly affected by load for some reason.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 59

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #58)
> I don't immediately see why HTML5 parsing should be special.  But perhaps
> reading the HTML files out of omni.ja is particularly affected by load for
> some reason.

It seems to me, from the first (bad) profile, that the HTML5 parser is running script, and that's where we are in at least some of the cases where the Communications app main thread is absent from the CPU for unusually long periods (up to 450ms in one case, unless I'm misreading).  I wonder if there's something the script is doing there that we want to know about.

Also, the first sample after the 450ms gap is constructing an XPCAutoRequest and the last one before it is in nsGlobalWindow::AddEventListener.  I don't know if that's of interest.

Gregor Wagner [:gwagner]

Comment 60

•

11 years ago

Attached file perf profile for missed call during app installation — Details

I have a STR that never shows the incoming call on b2g18:
Go to marketplace and install the timeout app: TO Barcelona
During the installation call the phone.
I tried it 4 times and it showed the missing call about 5 sec after the network ended the call.

Justin Lebar (not reading bugmail)

Assignee

Comment 61

•

11 years ago

Here's a patch which makes the Mozilla RIL print something when it notices that we receive a call.  I /think/ this is pretty early in the lifecycle.

diff --git a/dom/system/gonk/ril_worker.js b/dom/system/gonk/ril_worker.js
index 0a1d395..b60537c 100644
--- a/dom/system/gonk/ril_worker.js
+++ b/dom/system/gonk/ril_worker.js
@@ -4351,16 +4351,17 @@ let RIL = {
         currentCall.started = new Date().getTime();
       }
       currentCall.state = newCall.state;
       this._handleChangedCallState(currentCall);
     }
 
     // Go through any remaining calls that are new to us.
     for each (let newCall in newCalls) {
+      dump("XXX ril_worker::_process_calls() got new call.\n");
       if (newCall.isVoice) {
         // Format international numbers appropriately.
         if (newCall.number &&
             newCall.toa == TOA_INTERNATIONAL &&
             newCall.number[0] != "+") {
           newCall.number = "+" + newCall.number;
         }
         if (newCall.state == CALL_STATE_INCOMING) {

Justin Lebar (not reading bugmail)

Assignee

Comment 62

•

11 years ago

And here's a page which reflows a lot, in an attempt to automatically replicate the nytimes testcase.  http://bit.ly/jlflow

Justin Lebar (not reading bugmail)

Assignee

Comment 63

•

11 years ago

(In reply to Gregor Wagner [:gwagner] from comment #60)
> I have a STR that never shows the incoming call on b2g18:
> Go to marketplace and install the timeout app: TO Barcelona
> During the installation call the phone.
> I tried it 4 times and it showed the missing call about 5 sec after the
> network ended the call.

FWIW I can't reproduce this locally on my hamachi.  I'll have a look at the profile and see if it's informative.

Justin Lebar (not reading bugmail)

Assignee

Comment 64

•

11 years ago

Gregor, when did you start recording this profile?

One thing I see is us spending a ton of CPU doing some animations in the homescreen process.  Was the homescreen even showing while you recorded this profile?

Gregor Wagner [:gwagner]

Comment 65

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #64)
> Gregor, when did you start recording this profile?
> 
> One thing I see is us spending a ton of CPU doing some animations in the
> homescreen process.  Was the homescreen even showing while you recorded this
> profile?

As said on IRC, we animate the TO Barcelona icon during installation.

Justin Lebar (not reading bugmail)

Assignee

Comment 66

•

11 years ago

Comment on attachment 740801 [details] [diff] [review]
Part 1: Add an additional lowCPUPriority argument to hal::SetProcessPriority.

I'm going to move these patches into a separate bug.

Attachment #740801 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Attachment #740802 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Attachment #740803 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 870181

Andrew Overholt [:overholt]

Updated

•

11 years ago

Whiteboard: [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid] → [status: still investigating, needs work and deps to land] [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid]

Justin Lebar (not reading bugmail)

Assignee

Comment 67

•

11 years ago

Attached patch Part 1: Add an additional LowCPUPriority argument to hal::SetProcessPriority. (obsolete) — Details — Splinter Review

This patch also makes it an error to call hal::SetProcessPriority from a child process.

Attachment #747237 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Comment 68

•

11 years ago

Attached patch Part 2: Lower CPU priority for non-high priority processes when there's an incoming call. (obsolete) — Details — Splinter Review

Attachment #747238 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Comment 69

•

11 years ago

Attached patch Part 3: Test that we downgrade processes' CPU priorities when appropriate. (obsolete) — Details — Splinter Review

Attachment #747239 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Comment 70

•

11 years ago

Comment on attachment 747237 [details] [diff] [review]
Part 1: Add an additional LowCPUPriority argument to hal::SetProcessPriority.

Sorry, wrong bug.

Attachment #747237 - Attachment is obsolete: true

Attachment #747237 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Attachment #747238 - Attachment is obsolete: true

Attachment #747238 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Attachment #747239 - Attachment is obsolete: true

Attachment #747239 - Flags: review?(bent.mozilla)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 870203

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 870598

Justin Lebar (not reading bugmail)

Assignee

Comment 71

•

11 years ago

I think we may have fixed this bug with all of the code that we've landed in the past few days in blocking bugs.

I'd love some QA help here verifying this fix, particularly if you can be creative with your testcases.

In particular, today is the tef+ target milestone, so if you guys are able to test /today/, that gives me a much better (that is, nonzero) chance of addressing before the deadline any new issues you discover.

RyanVM is going to do a birch --> b2g18 merge soon, and I'll give you an hg changeset that you can test once he does that.  But you will probably need to spin your own build.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 870480

Kai-Chih Hu [:khu]

Comment 72

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #71)
> I think we may have fixed this bug with all of the code that we've landed in
> the past few days in blocking bugs.
> 
> I'd love some QA help here verifying this fix, particularly if you can be
> creative with your testcases.
> 
> In particular, today is the tef+ target milestone, so if you guys are able
> to test /today/, 

Tony or Al, is this something you can help? Thanks.

that gives me a much better (that is, nonzero) chance of
> addressing before the deadline any new issues you discover.
> 
> RyanVM is going to do a birch --> b2g18 merge soon, and I'll give you an hg
> changeset that you can test once he does that.  But you will probably need
> to spin your own build.

Flags: needinfo?(tchung)

Flags: needinfo?(atsai)

Jason Smith [:jsmith]

Comment 73

•

11 years ago

Let's avoid using needinfo here - use qawanted for QA support.

Flags: needinfo?(tchung)

Flags: needinfo?(atsai)

Keywords: qawanted

Jason Smith [:jsmith]

Updated

•

11 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

mlevin

Comment 74

•

11 years ago

Tested 10 incoming calls to Inari while zooming in and out on nytimes.com. 

RSSI bars was only 1.

Average time it took to receive incoming call on Inari from the time the originating phone started dialing out was 4.3 seconds.

Inari device build info:
Commercial RIL
Inari Build ID: 20130513070209
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/6e1dbf036d34
Gaia: 54dfb62735c81152eef74eb1768e2bf621c0a258

Removing QAWanted keyword.

Keywords: qawanted

Justin Lebar (not reading bugmail)

Assignee

Comment 75

•

11 years ago

Thanks for testing and reporting a number, mlevin.  But how long does it take to receive a call when we're not under load?  It's very hard for me to understand what 4.3s means without this information.

We also need a lot more QA than this before we can call this bug fixed.  This bug calls for creative QA, as Marcia started doing in comment 43 (by testing with music playing in the background).

Please come up with creative ways to stress the phone and try to break this.

Keywords: qawanted

Justin Lebar (not reading bugmail)

Assignee

Comment 76

•

11 years ago

> RSSI bars was only 1.

Also, what does this mean?

mlevin

Comment 77

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #76)
> > RSSI bars was only 1.
> 
> Also, what does this mean?

RSSI = Received Signal Strength Indicator. It is the cellular bars of reception which appear on the status bar of cell phones. It can range from 0 to 5 usually.

Stephany Wilkes

Updated

•

11 years ago

Whiteboard: [status: still investigating, needs work and deps to land] [FFOS_perf], u=user c=dialer s=ux-most-wanted, IOT, Chile, Ikura, khepara_43442 [madrid] → [status: still investigating, needs work and deps to land] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking

Marcia Knous [:marcia]

Comment 78

•

11 years ago

QA has started looking at these builds and should have some data shortly. I filed Bug 872171 as a separate bug, mainly because I can reproduce it consistently. We will make sure to note what apps are running, timing, and logcat information when we report back.

mlevin

Comment 79

•

11 years ago

Attached file Received Call Times To Inari — Details

mlevin

Comment 80

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #75)
> Thanks for testing and reporting a number, mlevin.  But how long does it
> take to receive a call when we're not under load?  It's very hard for me to
> understand what 4.3s means without this information.
> 
> We also need a lot more QA than this before we can call this bug fixed. 
> This bug calls for creative QA, as Marcia started doing in comment 43 (by
> testing with music playing in the background).
> 
> Please come up with creative ways to stress the phone and try to break this.

Re-tested this on Inari with the following software:

Commercial RIL
Inari Build ID: 20130514070202
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/08196b630a4a
Gaia: 57188a05fe5b669fc98bfcc18a49c9b3effdac06

Please see attached Excel document titled: "Received Call Times to Inari.xlsx"
(See comment 79)

Average time to receive incoming call on non-stressed Inari: 8.28 seconds
Average time to receive incoming call on stressed Inari: 12.65 seconds

Please refer to the aforementioned document for the complete test details.
(See commment 79)

Alex Keybl [:akeybl]

Updated

•

11 years ago

Whiteboard: [status: still investigating, needs work and deps to land] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking → [target:resolved already?] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking

Johnny Stenback (:jst)

Updated

•

11 years ago

Whiteboard: [target:resolved already?] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking → [target: likely done, but waiting on QA results and investigation thereof] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking

Justin Lebar (not reading bugmail)

Assignee

Comment 81

•

11 years ago

> Stressed (wifi on, wifi hotspot on, cellular data & roaming on, gps & Bluetooth on, active Internet 
> page zooming on nytimes.com, Gallery, Marketplace, Here Maps, Settings, Calendar, Clock, Music, 
> Video, E-mail, Contacts, SMS apps all running in the background, music file playing).

The Inari has 256mb of RAM.  On my Hamachi (also 256mb), I can have about two apps running in the background, plus the music app playing music.  My phone definitely cannot run all of these apps at the same time, and I strongly suspect yours can't either.

Reproducing this may depend on exactly which apps are still alive at the point when you make the call, so that information would be helpful.  Long-press the home button to see, or run adb shell b2g-ps.

Justin Lebar (not reading bugmail)

Assignee

Comment 82

•

11 years ago

fwiw I'm having trouble testing with audio playing in the bg due to bug 872655.

mlevin

Comment 83

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #81)
> > Stressed (wifi on, wifi hotspot on, cellular data & roaming on, gps & Bluetooth on, active Internet 
> > page zooming on nytimes.com, Gallery, Marketplace, Here Maps, Settings, Calendar, Clock, Music, 
> > Video, E-mail, Contacts, SMS apps all running in the background, music file playing).
> 
> The Inari has 256mb of RAM.  On my Hamachi (also 256mb), I can have about
> two apps running in the background, plus the music app playing music.  My
> phone definitely cannot run all of these apps at the same time, and I
> strongly suspect yours can't either.
> 
> Reproducing this may depend on exactly which apps are still alive at the
> point when you make the call, so that information would be helpful. 
> Long-press the home button to see, or run adb shell b2g-ps.

Re-tested today on same Inari build as comment 80.
Commercial RIL
Inari Build ID: 20130514070202
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/08196b630a4a
Gaia: 57188a05fe5b669fc98bfcc18a49c9b3effdac06

Unstressed (wifi off, celluar data roaming & connection off, gps & bluetooth off, no music streaming, display screen set to "never").
10 calls.
Average time for call to be received by Inari: 8.5 seconds.

Stressed (wifi on, cellular data & roaming on, gps & Bluetooth on, active Internet page zooming on nytimes.com, background music file playing).
10 Calls.
Average time for call to be received by Inari: 10.9

RSSI Inari: 2 bars
RSSI Unagi: 4 bars

Justin Lebar (not reading bugmail)

Assignee

Comment 84

•

11 years ago

In the excel spreadsheet from comment 79, the first call when we were under load took 18.6s, while the other calls took 10-12s each.

Is the lack of that outlier why the average is lower here?

mlevin

Comment 85

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #84)
> In the excel spreadsheet from comment 79, the first call when we were under
> load took 18.6s, while the other calls took 10-12s each.
> 
> Is the lack of that outlier why the average is lower here?

It could very well be. 

Here is the data for the stressed calls on the 2nd run (seconds):
10.9
11.2
10.5
10.9
10.7
10.4
11.6
11.0
10.7
11.4

Justin Lebar (not reading bugmail)

Assignee

Comment 86

•

11 years ago

I think we need to be focusing on the 18-second outlier.  If we can reproduce that somewhat consistently, that would be a big deal.  A three-second delay is much harder to worry about.

Justin Lebar (not reading bugmail)

Assignee

Comment 87

•

11 years ago

bz and I spent a few hours trying to reproduce this long delay under various load conditions, and we couldn't get a delay more than two or three seconds longer than what we see when the phone is not loaded.

That's consistent with all but one of the data points in mlevin's testing.  It's possible that the outlier is significant -- perhaps it corresponds to the phone starting up the communications app -- and perhaps it is insignificant -- perhaps it corresponds to a momentary blip in the network.  It's hard to say without more testing.

Marcia Knous [:marcia]

Comment 88

•

11 years ago

Sreenidhi and I have been continuing to test this and are tracking our results here: https://docs.google.com/spreadsheet/ccc?key=0Al4xZaJkos_adHdVLXg1bjRMTS01MzZkSzhiM2NVUHc#gid=0. We are using the build from today which seemed to me to bit a better than the one I was testing with yesterday.

During my testing today, the only time I hit a real show stopper is when I was loading nfl.com, aggressively panning and zooming while an inbound call was coming in -  I got a slow script warning which paralyzed the phone, causing an entire manual phone reboot. I also did see at least one case where the call crapped out while I was aggressively panning zooming in the browser.

One thing I consistently noticed today is that while I have the headphones in my ear and the music is playing, there seems to be a bit of delay between the time I stop hearing the audio play in the music player and the time I actually hear the phone start ringing. In other words, it is far from being an immediate transition.

Doug Turner (:dougt)

Comment 89

•

11 years ago

Marcia -- these numbers are meaningless, aren't they?  A lot of the time that you are recording is time spent in the cell network making the actual phone call.  This 'cell connection time' is variable and isn't something we control.

I suppose that you don't have a better way of measuring this time, do you?  Basically what I want to know is how long does it take between the time the modem knows that there is a phone call and the time that it takes for us to show the user the phone UI.

bz, jlebar, do we have that sort of instrumentation?

Justin Lebar (not reading bugmail)

Assignee

Comment 90

•

11 years ago

> Marcia -- these numbers are meaningless, aren't they?

They're not meaningless; we just need a baseline to compare them to.

> bz, jlebar, do we have that sort of instrumentation?

No.  I've asked, and nobody has known how to do this properly.

But what you're asking for also assumes that nothing B2G does affects how long it takes before the modem knows that there's an incoming call.  I don't know if that's true.

Anyway I don't think we can instrument the modem itself.  We can instrument B2G, but certainly what we do in B2G can affect how long it takes before B2G notices that there's an incoming call.

Boris Zbarsky [:bzbarsky]

Comment 91

•

11 years ago

Attached patch Patch that logs when the RIL finds out we have a call — Details — Splinter Review

> bz, jlebar, do we have that sort of instrumentation?

This patch prints to the logcat log when the RIL sees an incoming call, or so we think.

In my measurements, this print happens at exactly the same time (within my perception limits) as the first "brrrring" is heard back from the network on the originating phone, so I believe it fairly accurately reflects the "RIL has a call coming in" event...  It's worth double checking whether the two correlate that well for others.

Basically, what I've been doing is running:

  adb logcat | grep 'ril_worker::_process_calls'

then doing whatever on the phone while at the same time dialing it from another phone, and paying attention to three times:

1)  When the phone that originates the call gets a "brrring" from the network.
2)  When the log line process_calls line appears in logcat.
3)  When the target phone rings.

For me, #1 and #2 always happen at the same time, as I said.  I've been focused on finding and debugging situations in which there is a delay between #2 and #3.

Justin Lebar (not reading bugmail)

Assignee

Comment 92

•

11 years ago

> This patch prints to the logcat log when the RIL sees an incoming call, or so we think.

Doesn't this printf happen /after/ the process priority manager sets the communications app to high priority?  I can't actually see what this patch is, because bugzilla is acting up in a strange way.

Boris Zbarsky [:bzbarsky]

Comment 93

•

11 years ago

> /after/ the process priority manager sets the communications app to high priority?

Hmmm.  Yes, I think we had that going on.  Again, the key part for me was the correlation between this printf and the caller getting the "phone is rining" signal.

Marcia Knous [:marcia]

Comment 94

•

11 years ago

Boris: When I try to grep for 'ril_worker::_process_calls', I am not getting that generated in the logs on my builds - we are testing Commercial RIL builds from the pvt directory - is there something else I need to do on my end to see this? Or will your patch address that (maybe we can get a tryserver build with that patch?)


(In reply to Boris Zbarsky (:bz) from comment #91)
[snip]

Justin Lebar (not reading bugmail)

Assignee

Comment 95

•

11 years ago

If we're going to do a special build here, we want that patch, the patch in bug 870203, and we also want ENABLE_LOGGING uncommented in ProcessPriorityManager.cpp.

bz's patch won't help with the commercial RIL, although the commercial RIL has its own log messages that give some indication of when it receives a call.  Please remember to use logcat -v time, so we can see when things happen.

Anshul

Comment 96

•

11 years ago

If you are using commercial RIL, you can collect radio logs using the following command.

 adb logcat -b radio -v time 

 The radio logs should show the following two messages to specify that there is an incoming call.
  
 UNSOL_CALL_RING <- When network tells the phone to start ringing
 GET_CURRENT_CALLS (with state as INCOMING) <- When network sends an incoming call
These two events usually are within msecs apart.

Gabriele Svelto [:gsvelto] (PTO)

Comment 97

•

11 years ago

I'll add another data-point to the tests you've been doing in the hope it sheds some light on the very long outlier you're seeing from time to time. When testing under process buster stress load I used to keep 'top' open just to get the feeling of what was going on. From time to time I would see a much longer delay than usual and for some reason the 'mmcqd' process would show up in top consuming ~20-25% CPU for a few seconds and then quieting down again.

I don't know if this is the cause for the outliers and I found it a bit strange too because AFAIR it's a kernel-thread that deals with the SD card but I remember seeing it more than once in my testing (always under process-buster stress) which got me suspicious about it.

Justin Lebar (not reading bugmail)

Assignee

Comment 98

•

11 years ago

> During my testing today, the only time I hit a real show stopper is when I was loading nfl.com, 
> aggressively panning and zooming while an inbound call was coming in -  I got a slow script warning 
> which paralyzed the phone, causing an entire manual phone reboot.

This seems like the worst outstanding issue at the moment, so I'm going to see if I can reproduce this, unless someone thinks I should be looking into something else.

The fact that we get a slow-script warning doesn't sound like a bug, however; that's exactly what should happen if we need to give CPU to another process.

The fact that the slow-script warning freezes the phone may not be a Gecko bug.  It certainly doesn't sound like a process-priority bug.  Anyway, we'll see if I can see.

Justin Lebar (not reading bugmail)

Assignee

Comment 99

•

11 years ago

I managed to get the main process to segfault just by panning and zooming on nfl.com and then placing a call.  Unfortunately, this is all I got:

05-16 17:38:44.339 F/libc    (  144): Fatal signal 11 (SIGSEGV) at 0x0000003c (code=1)

I'm starting to believe that this bug is fixed as well as we're going to fix it, relative to the overall quality level of our product at the moment.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 873284

Sreenidhi

Comment 100

•

11 years ago

Hey Anshul, 

I am testing on an Inari, with Commercial RIL (RIL 107) for the bug in the subject. It is using the following build info:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/0b6bcb1f4175
Gaia   9648799c2e45917ff150fa9eef8aeac79a9ac008
BuildID 20130515070203
Version 18.0


Comment 96 advises to use adb -b radio -v time to know about the time when the network tells the phone to start ringing and the time when the network sends an incoming call (UNSOL_CALL_RING and GET_CURRENT_CALLS). 

When I looked at the log, I was surprised to find that there were multiple entries for the GET_CURRENT_CALLS message, say : 

15:27:22.421
15:54:35.114
12:29:13.882
12:32:42.206 

and these timestamps are repetitive all over the log in a cycle. The same is the case for UNSOL_CALL_RING. So, to compute the difference between the UNSOL_CALL_RING and GET_CURRENT_CALLS, which timestamp should be considered?

Flags: needinfo?(anshulj)

Sreenidhi

Comment 101

•

11 years ago

Attached file Logcat for long delay in displaying incoming calls — Details

Attached the corresponding logcat, by using the adb -b radio -v time command

Daniel Coloma:dcoloma

Comment 102

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #99)
> I managed to get the main process to segfault just by panning and zooming on
> nfl.com and then placing a call.  Unfortunately, this is all I got:
> 
> 05-16 17:38:44.339 F/libc    (  144): Fatal signal 11 (SIGSEGV) at
> 0x0000003c (code=1)
> 
> I'm starting to believe that this bug is fixed as well as we're going to fix
> it, relative to the overall quality level of our product at the moment.

We (tef) believe that what you have achieved is enough for 1.0.1. Can we close this one and open a new one for follow-up bugs? Alternatively we could move this to leo+.

Flags: needinfo?(justin.lebar+bug)

Alex Keybl [:akeybl]

Comment 103

•

11 years ago

Given the testing thus far at https://mozqa.etherpad.mozilla.org/Testing-Bug847592, we haven't run into any critical issues in the given user scenarios. Let's reopen when a new blocking scenario is found (if any).

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Jason Smith [:jsmith]

Updated

•

11 years ago

Keywords: qawanted

Jason Smith [:jsmith]

Comment 104

•

11 years ago

(In reply to Alex Keybl [:akeybl] from comment #103)
> Given the testing thus far at
> https://mozqa.etherpad.mozilla.org/Testing-Bug847592, we haven't run into
> any critical issues in the given user scenarios. Let's reopen when a new
> blocking scenario is found (if any).

Given the large array of comments, I'd instead suggest that if we find a new blocking scenario, then let's file new bugs. That way, we don't turn this bug into an endless bug that never ends.

Justin Lebar (not reading bugmail)

Assignee

Comment 105

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #104)
> Given the large array of comments, I'd instead suggest that if we find a new
> blocking scenario, then let's file new bugs. That way, we don't turn this
> bug into an endless bug that never ends.

Yes, please.

Thank you so much for your hard work and patience here, everyone.

Flags: needinfo?(justin.lebar+bug)

Daniel Coloma:dcoloma

Updated

•

11 years ago

status-b2g18: --- → fixed

status-b2g18-v1.0.1: --- → fixed

Anshul

Updated

•

11 years ago

Flags: needinfo?(anshulj)

Mike Lee [:mlee]

Updated

•

11 years ago

Keywords: perf

Whiteboard: [target: likely done, but waiting on QA results and investigation thereof] [FFOS_perf], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking → [target: likely done, but waiting on QA results and investigation thereof] [c= p= s=2013.05.17 u=], IOT, Chile, Ikura, khepara_43442 [madrid], ux-tracking

Android logs with ProcessPriorityManager messages enabled 11 years ago Anshul 886.88 KB, text/plain		Details
Dmesg logs after the incoming call UI failed to show 11 years ago Anshul 52.62 KB, text/plain		Details
Part 1: Add an additional lowCPUPriority argument to hal::SetProcessPriority. 11 years ago Justin Lebar (not reading bugmail) 9.07 KB, patch		Details \| Diff \| Splinter Review
Part 2: When a process has priority FOREGROUND_HIGH, renice all other processes. 11 years ago Justin Lebar (not reading bugmail) 14.15 KB, patch		Details \| Diff \| Splinter Review
Tests for renicing all other processes when one process gets priority FOREGROUND_HIGH 11 years ago Justin Lebar (not reading bugmail) 9.85 KB, patch		Details \| Diff \| Splinter Review
Testing Log with Browser and Music Player open 11 years ago Marcia Knous [:marcia] 504.97 KB, text/plain		Details
Profile from comment 49 11 years ago Justin Lebar (not reading bugmail) 358.91 KB, application/x-xz		Details
Profile without loading the CPU 11 years ago Justin Lebar (not reading bugmail) 206.61 KB, application/x-xz		Details
perf profile for missed call during app installation 11 years ago Gregor Wagner [:gwagner] 1.53 MB, application/x-gzip		Details
Part 1: Add an additional LowCPUPriority argument to hal::SetProcessPriority. 11 years ago Justin Lebar (not reading bugmail) 20.77 KB, patch		Details \| Diff \| Splinter Review
Part 2: Lower CPU priority for non-high priority processes when there's an incoming call. 11 years ago Justin Lebar (not reading bugmail) 12.64 KB, patch		Details \| Diff \| Splinter Review
Part 3: Test that we downgrade processes' CPU priorities when appropriate. 11 years ago Justin Lebar (not reading bugmail) 9.60 KB, patch		Details \| Diff \| Splinter Review
Received Call Times To Inari 11 years ago mlevin 9.29 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet		Details
Patch that logs when the RIL finds out we have a call 11 years ago Boris Zbarsky [:bzbarsky] 496 bytes, patch		Details \| Diff \| Splinter Review
Logcat for long delay in displaying incoming calls 11 years ago Sreenidhi 946.59 KB, text/plain		Details