Closed Bug 942840 Opened 11 years ago Closed 10 years ago

Unable to measure app load time for Phone app on mozilla-central/master builds in CI

Categories

(Firefox OS Graveyard :: Gaia::Dialer, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
1.3 C3/1.4 S3(31jan)

People

(Reporter: davehunt, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c=automation p= s=2014.01.31 u=])

Attachments

(2 files, 1 obsolete file)

The b2g.hamachi.mozilla-central.master.perf job in Jenkins CI is failing to retrieve the cold_load_time metric when launching the Phone app.

See http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.hamachi.mozilla-central.master.perf/620/consoleFull (requires VPN)

$ b2gperf --delay=10 --sources=b2g-distro/sources.xml --testvars=/home/webqa/webqa-credentials/b2g/b2g-3.json --dz-project=b2g --dz-branch=master --dz-device=hamachi --dz-key=******** --dz-secret=******** Phone Contacts Messages Settings Gallery Video Music Camera Email Calendar Clock FM Radio Usage Template Browser
2013-11-25 04:34:36 | B2GPerfRunner | INFO | Running B2GPerfLaunchTest
2013-11-25 04:36:16 | B2GPerfRunner | INFO | Phone [1/30]
Traceback (most recent call last):
  File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.master.perf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 378, in run
    raise MissingMetricError(self.app_name, metric, i)
MissingMetricError: Phone missing cold_load_time metric in iteration 0
Traceback (most recent call last):
  File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.master.perf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 378, in run
    raise MissingMetricError(self.app_name, metric, i)
MissingMetricError: Phone missing cold_load_time metric in iteration 0
...
Traceback (most recent call last):
  File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.master.perf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 244, in measure_app_perf
    test.run()
  File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.master.perf/.env/local/lib/python2.7/site-packages/b2gperf/b2gperf.py", line 388, in run
    raise ExceededThresholdError()
ExceededThresholdError: Exceeded failure threshold for gathering results
I've been unable to replicate this locally. I have set the log level on CI to DEBUG and switched the first two apps (Phone and Contacts) to see if this affects the first app or specifically the Phone app.
Flags: needinfo?(dave.hunt)
It appears that the issue is specific to the Phone app. From adding debug logging I can see that the very first launch of the app appears to provide a metric, but all subsequent launches fail. It could be that the app is not successfully killed. I will try to replicate this again locally, but may need somebody in MV to watch the device while I run tests remotely.
Flags: needinfo?(dave.hunt)
Summary: Unable to measure app load time for Phone app on mozilla-centra/master builds in CI → Unable to measure app load time for Phone app on mozilla-central/master builds in CI
I'm still unable to replicate this locally. Stephen: Could you (or someone else from MV) help me to investigate this? I can trigger a job but will need someone to be watching the phone to see what's happening.
Flags: needinfo?(stephen.donner)
(In reply to Dave Hunt (:davehunt) from comment #3)
> I'm still unable to replicate this locally. Stephen: Could you (or someone
> else from MV) help me to investigate this? I can trigger a job but will need
> someone to be watching the phone to see what's happening.

I'll try to help with this tomorrow, when I'm (hopefully) back in the office.
(In reply to Stephen Donner [:stephend] from comment #4)
> (In reply to Dave Hunt (:davehunt) from comment #3)
> > I'm still unable to replicate this locally. Stephen: Could you (or someone
> > else from MV) help me to investigate this? I can trigger a job but will need
> > someone to be watching the phone to see what's happening.
> 
> I'll try to help with this tomorrow, when I'm (hopefully) back in the office.

I'll try to help with this tomorrrow (be in earlier); if you'll be around too.
(In reply to Stephen Donner [:stephend] from comment #5)
> (In reply to Stephen Donner [:stephend] from comment #4)
> > (In reply to Dave Hunt (:davehunt) from comment #3)
> > > I'm still unable to replicate this locally. Stephen: Could you (or someone
> > > else from MV) help me to investigate this? I can trigger a job but will need
> > > someone to be watching the phone to see what's happening.
> > 
> > I'll try to help with this tomorrow, when I'm (hopefully) back in the office.
> 
> I'll try to help with this tomorrrow (be in earlier); if you'll be around
> too.

This is also now affecting mozilla-aurora/v1.3. When Stephen observed this on Friday it appeared that the phone was doing nothing and sitting at the homescreen. I'm going to take a node offline and see if I can replicate this remotely in a reduced testcase.
Flags: needinfo?(stephen.donner)
Unable to replicate by logging in via SSH and triggering the Phone perf tests on the same device/gecko/gaia where the issue has been seen. I've created a new temporary job (b2g.hamachi.mozilla-central.master.perf.phone) that just runs this test on a dedicated node (b2g-8) with more iterations.
I ran http://qa-selenium.mv.mozilla.com:8080/job/b2g.hamachi.mozilla-central.master.perf.phone/11/console just now and watched it -- the Phone app definitely launches, then closes; rinse, lather, repeat, so I don't know why it's claiming that it's missing the perf metrics.

Dave says he'll try adding debug to the job tomorrow, and we'll reconvene our investigations :-)
So this appears to be caused by the subsequent launches being warm. This implies that the Phone app is not completely killed when we start it. I've added some more debug and running it again now.
B2GPerfRunner DEBUG   | Unlocking device
B2GPerfRunner DEBUG   | Killing all running apps
B2GPerfRunner DEBUG   | Returning to home screen
B2GPerfRunner DEBUG   | Waiting for 10 seconds
B2GPerfRunner DEBUG   | Running apps: Homescreen
B2GPerfRunner DEBUG   | Launching 'Phone'
B2GPerfRunner DEBUG   | Killing 'Phone'   <--- we kill Phone here
B2GPerfRunner DEBUG   | Running apps: Homescreen   <--- and now it's not running
B2GPerfRunner DEBUG   | Metric 'cold_load_time' returned: 817
B2GPerfRunner DEBUG   | Waiting for 10 seconds
B2GPerfRunner DEBUG   | Running apps: Phone, Homescreen   <--- but now it's running again!?
B2GPerfRunner DEBUG   | Launching 'Phone'

As far as I can tell, it looks like the Phone app is being automatically started (perhaps in the background?) after it's killed. I'm not sure what might cause this, perhaps something to do with the SIM/carrier?
Moving the component to Dialer in case anyone is able to help. I will be removing this app from the tests running on Jenkins as it continues to fail.
Component: General → Gaia::Dialer
Blocks: 959707
I've just replicated this manually. The Phone app is shown in cards view, I swipe up to kill it, then a short while later it's back! I'll attach the relevant section from logcat.
Attached file logcat.txt
Etienne: Any idea why this is happening?
Flags: needinfo?(etienne)
The Dialer apps can be launched in background by a bunch of system messages [1].
It's probably the case here, but not sure which one.

If you have everything set up to replicate the issue, can you try with the |mozSetMessageHandler|s commented?

[1] https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/dialer/js/dialer.js#L358-375
Flags: needinfo?(etienne)
My manually reproduced case may differ from what we're seeing in automation. It seems if I launch the Camera app, the Phone app is being launched. Then, even after closing the Camera app, the Phone app keeps coming back after I close it.

(In reply to Etienne Segonzac (:etienne) from comment #15)
> If you have everything set up to replicate the issue, can you try with the
> |mozSetMessageHandler|s commented?

Sorry, it's not clear to me what you're suggesting I change here. Could you provide more details?
Flags: needinfo?(etienne)
(In reply to Dave Hunt (:davehunt) from comment #16)
> (In reply to Etienne Segonzac (:etienne) from comment #15)
> > If you have everything set up to replicate the issue, can you try with the
> > |mozSetMessageHandler|s commented?
> 
> Sorry, it's not clear to me what you're suggesting I change here. Could you
> provide more details?

Here's a patch :)
Just to be clear, we're purposefully breaking the incoming call support (and more) here in order to gain some information about what's causing the dialer to relaunch.
Flags: needinfo?(etienne)
I applied the patch and ran make reset-gaia but I can no longer replicate the issue. Would the patch have that effect? I'm not sure what to expect with all that commented out. :)
Flags: needinfo?(etienne)
(In reply to Dave Hunt (:davehunt) from comment #18)
> I applied the patch and ran make reset-gaia but I can no longer replicate
> the issue. Would the patch have that effect? I'm not sure what to expect
> with all that commented out. :)

Ok. So we just confirmed that the dialer is being launched by a system message [1].

So either the phone is receiving a call, or a notification is triggered etc...

With this new patch you should be able to reproduce the issue, and the logcat (greped on "+++") will tell us what's causing the inadvertent launch.

[1] https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.webapi/o8bkwx0EtmM
Attachment #8362465 - Attachment is obsolete: true
Flags: needinfo?(etienne)
It appears we've found the culprit...

E/GeckoConsole(10156): Content JS LOG at app://communications.gaiamobile.org/dialer/js/dialer.js:177 in headsetCommandHandler: +++ waking up because of a headset message

The devices that run the tests have a headset plugged in so they can run the radio tests, but why would this cause the Phone app the launch? I can confirm that removing the headset causes the behaviour to stop, but connecting it again does not cause it to start again.

I can then cause it to occur again by manually clicking a button on the headset. It seems we have phantom headset button clicking somehow...
Flags: needinfo?(etienne)
(In reply to Dave Hunt (:davehunt) from comment #20)
> I can then cause it to occur again by manually clicking a button on the
> headset. It seems we have phantom headset button clicking somehow...

Yep :/

Randy do you have any idea?
Flags: needinfo?(etienne) → needinfo?(rlin)
Do you use the headset in hamachi box? Or change with another one?
It looks like that hamachi device isn't calibration the headset detection parameter very well.
Flags: needinfo?(rlin)
Keywords: perf
Whiteboard: [c=automation p= s= u=]
(In reply to Randy Lin [:rlin] from comment #22)
> Do you use the headset in hamachi box? Or change with another one?
> It looks like that hamachi device isn't calibration the headset detection
> parameter very well.

Apologies, all :-(

Answering for Dave -- since the performance-measuring Hamachis are hosted by my team, in the Mountain View office -- most of them were actually using a different set of earbuds (we weren't careful in setting them up, clearly - some were ZTE/Inari, others, more generic earbuds).

All Hamachis now are using the factory-provided Alcatel-Lucent earbuds - let's see how subsequent runs fare; again, sorry for this oversight - it's something we'll be sure to get right whenever we set up new phones in the future!
That appears to have fixed it!! This still seems like a valid bug to me though, why would a different set of headphones cause this behaviour?
I've raised bug 962996 for the issue and closing this as we can now run our automation again! :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I'm not seeing the Dialer app on Datazilla. Do we need to wait a bit more or is there another issue?
Never mind, I was looking for Dialer but it shows up as Phone.
Whiteboard: [c=automation p= s= u=] → [c=automation p= s=2014.01.31 u=]
Target Milestone: --- → 1.3 C3/1.4 S3(31jan)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: