Closed Bug 790463 Opened 7 years ago Closed 7 years ago

[B2G] several telephony tests often causes emulator to crash

Categories

(Core :: DOM: Device Interfaces, defect)

All
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla20
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: jgriffin, Assigned: swu)

References

Details

(Keywords: crash, intermittent-failure, regression, Whiteboard: [b2g-crash])

Attachments

(3 files, 1 obsolete file)

The test dom/telephony/test/marionette/test_outgoing_hold_resume.js is often (~90% of the time) causing the emulator to crash.  This problem was introduced with this merge from inbound:

https://hg.mozilla.org/mozilla-central/pushloghtml?startID=23429&endID=23430

Full test log:

http://brasstacks.mozilla.com/autologserver/showlog?file=TM-0oBmQRpunKWZSXszGKA.txt.gz
Bug 784220 is the only thing directly related to telephony in that merge.
I'm disabling this test for now, since it crashes the emulator and prevents other tests from running.
I am not sure about the crash cause, but here's friendly info that bug 795255 improved drawbacks of bug 784220.
This also affects test_incoming_already_connected.js, so I'm disabling this test as well.

The log for this test, prior to the crash, is:

I/Gecko   (   43): MARIONETTE LOG: INFO: TEST-START: /data/jenkins/workspace/webapi-marionette-test/dom/telephony/test/marionette/test_incoming_already_connected.js
E/GeckoConsole(   43): [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.32.67.124:48425/empty.html" line: 0}]
I/Gecko   (   43): MARIONETTE LOG: INFO: Verifying initial state.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object Telephony] was false, expected true
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - null should equal null
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined -  was false, expected true
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - 0 should equal 0
I/Gecko   (   43): MARIONETTE LOG: INFO: Initial call list: OK
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - OK should equal OK
I/Gecko   (   43): MARIONETTE LOG: INFO: Make an outgoing call.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] was false, expected true
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - 5555551111 should equal 5555551111
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - dialing should equal dialing
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - 1 should equal 1
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE LOG: INFO: Call list is now: outbound to  5555551111 : unknown,OK
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - outbound to  5555551111 : unknown should equal outbound to  5555551111 : unknown
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - OK should equal OK
I/Gecko   (   43): MARIONETTE LOG: INFO: Answering the outgoing call.
D/AudioHardwareInterface(   38): setMode(IN_CALL)
W/AudioFlinger(   38): Thread AudioOut_1 cannot connect to the power manager service
I/Gecko   (   43): MARIONETTE LOG: INFO: Received 'connected' call event for the original outgoing call.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - connected should equal connected
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE LOG: INFO: Call list is now: outbound to  5555551111 : active,OK
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - outbound to  5555551111 : active should equal outbound to  5555551111 : active
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - OK should equal OK
I/Gecko   (   43): MARIONETTE LOG: INFO: Simulating an incoming call (with one call already connected).
D/AudioHardwareInterface(   38): setMode(IN_CALL)
W/AudioPolicyManagerBase(   38): setPhoneState() setting same state 2
I/Gecko   (   43): MARIONETTE LOG: INFO: Received 'incoming' call event.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] was false, expected true
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - 5555552222 should equal 5555552222
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - incoming should equal incoming
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - 2 should equal 2
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE LOG: INFO: Call list is now: outbound to  5555551111 : active,inbound from 5555552222 : incoming,OK
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - outbound to  5555551111 : active should equal outbound to  5555551111 : active
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - inbound from 5555552222 : incoming should equal inbound from 5555552222 : incoming
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - OK should equal OK
I/Gecko   (   43): MARIONETTE LOG: INFO: Answering the incoming call.
I/Gecko   (   43): MARIONETTE LOG: INFO: Received 'connecting' call event for incoming/2nd call.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - connecting should equal connecting
D/AudioHardwareInterface(   38): setMode(NORMAL)
D/AudioHardwareInterface(   38): setMode(IN_CALL)
I/Gecko   (   43): MARIONETTE LOG: INFO: Received 'connected' call event for incoming/2nd call.
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - connected should equal connected
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - true was false, expected true
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - [object TelephonyCall] should equal [object TelephonyCall]
I/Gecko   (   43): MARIONETTE TEST RESULT:TEST-PASS | undefined - held should equal held
I/Gecko   (   43): MARIONETTE LOG: INFO: Call list is now: outbound to  5555551111 : held,inbound from 5555552222 : active,OK
test_incoming_already_connected.js disabled:  https://hg.mozilla.org/mozilla-central/rev/e64299ca7439
This also affects test_outgoing_remote_hangup_held.js...disabled: https://hg.mozilla.org/mozilla-central/rev/a2978e251ad7
Summary: [B2G] test_outgoing_hold_resume.js often causes emulator to crash → [B2G] several telephony tests often causes emulator to crash
The emulator will crash if an outgoing call remains active for approx. 800ms or longer. The tests mentioned in this bug all longer WebTelephony tests that keep their outgoing call active for a longer period than the other WebTelephony tests.

To reproduce the emulator crash, run the attached "test_emulator_crash.js" marionette test on the emulator, which simply creates an outgoing call and keeps it active for up to 2 seconds.  Around 800ms the emulator crashes.

Note, this also affects test_swap_held_and_active.js.
Simple marionette test that crashes the B2G emulator, by keeping an outgoing call active.
Ping! Hey guys, if someone could please take a look at this it would be much appreciated, as we are trying to get as many tests as possible up and running on TBPL.  Thanks!
As seen in the links posted above by Ed, this issue also happens sometimes with test_incoming_answer_hangup.js.  Will disable this test as well.
Noming, this could be a legit issue.
blocking-basecamp: --- → ?
blocking-basecamp: ? → +
Keywords: crash
Whiteboard: [orange] → [orange][b2g-crash]
It will be easier to debug with gdb.  Does anyone know how to run gdb on emulator in marionette test?
Below is the emulator crash point with gdb, when running marionette test with test_incoming_answer_hangup.js.

=================================================================
emulator64-arm: external/qemu/telephony/android_modem.c:2159: voice_call_event: Assertion `0 && "unreachable event call state"' failed.

Program received signal SIGABRT, Aborted.
=================================================================

When call accepted, the call->state will be changed from A_CALL_ALERTING to A_CALL_ACTIVE.  However there's no handling for A_CALL_ACTIVE in the switch case, so it goes to default case with assert error.

It can be fixed by adding A_CALL_ACTIVE case handling.

Other test cases to be checked.
There is an automatically call state transition mechanism in emulator in android_modem.c.
1. After handleDial(), the call state changed to A_CALL_DIALING and a timer (one second) initiated state transition.
2. Timeout function voice_call_event() called, and call state changed to A_CALL_ALERTING, and another timer (one second) initiated for next state transition.
3. Timeout function voice_call_event() called, and call state changed to A_CALL_ACTIVE.

During the automatically state transition period, it's possible call state been changed by console commands (ex: 'gsm accpet' or 'gsm hold') for testing purpose. We should not assert on these states.

A pull request has been sent to address this issue.
https://github.com/mozilla-b2g/platform_external_qemu/pull/8
.
Attached is a patch to run gdb on emulator with marionette, for debugging purpose.

After applied to gecko and rebuilt emulator image, run with:
$ DBG=1 ./test.sh marionette
The pull request in comment 16 has been merged on github.

The patch adds test_crash_emulator.js created by Rob Wood and recovers all disabled cases related to this bug.

Testing locally by |./test.sh marionette| is passed.

How am I supposed to know if the result is ok on try server?

https://tbpl.mozilla.org/?tree=Try&rev=c57e3d45b5bf
https://tbpl.mozilla.org/?tree=Try&rev=adb4846413bc
(In reply to Shian-Yow Wu from comment #18)
> How am I supposed to know if the result is ok on try server?

The B2G tests got temporarily hidden due to bug 812149. Add &noignore=1 to the url and look at the B2G arm opt 'Mn' tests.

For future reference you can use the syntax "try -b o -p ics_armv7a_gecko -u marionette-webapi -t none" (yeah I know, totally obvious ;) to run just that set of tests.
If this patch depends on a commit to platform_external_qemu, however, we'll need to update the emulator used by rel-eng in order to see any effect.
assigning to swu since you posted patches (thanks!)
Assignee: nobody → swu
Given that we haven't found the causing patch, shouldn't we do a hg bisect to find the exact regression range? I think it would help a lot to know what has been caused this crash.
(In reply to Jonathan Griffin (:jgriffin) from comment #0)
> The test dom/telephony/test/marionette/test_outgoing_hold_resume.js is often
> (~90% of the time) causing the emulator to crash.  This problem was
> introduced with this merge from inbound:
> 
> https://hg.mozilla.org/mozilla-central/pushloghtml?startID=23429&endID=23430
I have read this Andrew, but there are 31 changesets in there which doesn't make it clear.
Depends on: 812699
(In reply to Henrik Skupin (:whimboo) from comment #22)
> Given that we haven't found the causing patch, shouldn't we do a hg bisect
> to find the exact regression range? I think it would help a lot to know what
> has been caused this crash.

The crash only happens in some racing condition, old revision just didn't hit the timing to cause the crash.  I think it's not a regression.  Let's wait bug 812699 to land and we can see try server result.
Shian-Yow Wu, the emulator used by buildbot had been updated with your qemu changes (bug 812699), so you should be able to run this through try server now, with the following syntax:

try: -b o -p ics_armv7a_gecko -t none -u marionette-webapi
Note that all of the WebTelephony tests are currently disabled because of the emulator crash (they are all disabled in the manifest.ini in mozilla-central):

http://mxr.mozilla.org/mozilla-central/source/dom/telephony/test/marionette/manifest.ini
Rebased from previous patch.

The patch adds test_crash_emulator.js created by Rob Wood and recovers all disabled cases related to this bug.

https://tbpl.mozilla.org/?tree=Try&rev=3a9d9fd7f563
Attachment #682356 - Attachment is obsolete: true
Attachment #683535 - Flags: review?(jgriffin)
Duplicate of this bug: 781604
try run passed!
Comment on attachment 683535 [details] [diff] [review]
Patch enabled previous crash cases

Sorry meant to r+ this yesterday.
Attachment #683535 - Flags: review?(jgriffin) → review+
https://hg.mozilla.org/mozilla-central/rev/bc303103fef9
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Looks like this is still happening on inbound:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17285452&tree=Mozilla-Inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Andrew Halberstadt [:ahal] from comment #35)
> Looks like this is still happening on inbound:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17285452&tree=Mozilla-
> Inbound

According to log, the failure is caused by b2g process crash, while emulator still alive, so it's a different problem.

The b2g process crashed twice.

First crash:

11:42:45     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (102): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 421
11:42:45     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (110): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 421
11:42:45     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (112): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 421
11:42:45     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)
11:42:45     INFO - I/Gecko   (  198): [Child 198] WARNING: pipe error (3): Connection reset by peer: file /builds/slave/m-in-ics-armv7a-gecko/build/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
11:42:45     INFO - I/DEBUG   (   35): debuggerd committing suicide to free the zombie!
11:42:45     INFO - I/ServiceManager(   31): service 'media.audio_policy' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.audio_flinger' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.player' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.camera' died
11:42:45     INFO - I/DEBUG   (  287): debuggerd: Nov 16 2012 22:54:02

Second crash:

1:42:45     INFO - I/Gecko   (  325): 1353613352308	Marionette	INFO	sendToClient: {"from":"conn2.marionette1","error":{"message":{},"status":17,"stacktrace":null}}, undefined, null
11:42:45     INFO - I/Gecko   (  325): MARIONETTE LOG: INFO: Received 'alerting' call event.
11:42:45     INFO - I/Gecko   (  325): MARIONETTE LOG: INFO: Answering the outgoing call.
11:42:45     INFO - I/Gecko   (  325): 1353613352421	Marionette	INFO	sendToClient: {"emulator_cmd":"gsm accept 5555551111","id":0}, undefined, null
11:42:45     INFO - F/libc    (  325): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
11:42:45     INFO - I/DEBUG   (  287): debuggerd committing suicide to free the zombie!
11:42:45     INFO - I/ServiceManager(   31): service 'media.audio_policy' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.audio_flinger' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.player' died
11:42:45     INFO - I/ServiceManager(   31): service 'media.camera' died
11:42:45     INFO - I/DEBUG   (  402): debuggerd: Nov 16 2012 22:54:02
Filed bug 814551 for tracking b2g process crash.

Closing this bug for now and please feel free to reopen it if we found emulator crash again.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Whiteboard: [orange][b2g-crash] → [b2g-crash]
You need to log in before you can comment on or make changes to this bug.