Closed Bug 1137653 Opened 5 years ago Closed 2 years ago

Intermittent crashes when running the Gaia UI functional tests

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect)

defect
Not set

Tracking

(firefox37 wontfix, firefox38 wontfix, firefox39 fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED WONTFIX
2.2 S7 (6mar)
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: davehunt, Unassigned)

References

Details

Attachments

(8 files, 1 obsolete file)

It looks like we've recently picked up a new crasher when running the Gaia UI tests. It looks similar to what bug 1131375 sees in try runs, so is probably not actually related to that change.

Traceback (most recent call last):
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette_test.py", line 265, in run
self.setUp()
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/tests/python/gaia-ui-tests/gaiatest/tests/functional/cards_view/test_cards_view_with_two_apps.py", line 18, in setUp
self.apps.launch(app)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 84, in launch
self.marionette.switch_to_frame(app.frame_id)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 1074, in switch_to_frame
response = self._send_message('switchToFrame', 'ok', element=frame.id, focus=focus)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 615, in _send_message
"Connection timed out", status=errors.ErrorCodes.TIMEOUT)
TimeoutException: TimeoutException: Connection timed out

Most recent errors/exceptions are:
02-26 21:53:53.452 W/GeckoConsole( 6373): Full Message: TypeError: gShutdownBarrier is null
02-26 21:58:00.852 W/GeckoConsole( 6373): [JavaScript Error: "1425016680864 Toolkit.Telemetry ERROR TelemetryEnvironment::_isDefaultBrowser - Could not obtain shell service" {file: "resource://gre/modules/Log.jsm" line: 749}]
02-26 21:58:00.852 W/GeckoConsole( 6373): [JavaScript Error: "Failed to load module resource://gre/modules/services/healthreport/profile.jsm." {file: "resource://gre/modules/XPCOMUtils.jsm" line: 245}]
02-26 21:58:00.862 W/GeckoConsole( 6373): [JavaScript Error: "1425016680874 Toolkit.Telemetry ERROR TelemetryEnvironment::_doGetEnvironmentData - There was an exception collecting profile: [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: resource://gre/modules/XPCOMUtils.jsm :: XPCU_moduleLambda :: line 243" data: no] Stack trace: XPCU_moduleLambda()@resource://gre/modules/XPCOMUtils.jsm:243 < XPCU_defineLazyGetter/<.get()@resource://gre/modules/XPCOMUtils.jsm:193 < this.TelemetryEnvironment._getProfile<()@resource://gre/modules/TelemetryEnvironment.jsm:443 < next()@self-hosted:620 < TaskImpl_run()@resource://gre/modules/Task.jsm:314 < TaskImpl()@resource://gre/modules/Task.jsm:275 < createAsyncFunction/asyncFunction()@resource://gre/modules/Task.jsm:249 < this.TelemetryEnvironment._doGetEnvironmentData</sections.profile()@resource://gre/modules/TelemetryEnvironment.jsm:927 < this.TelemetryEnvironment._doGet
02-26 21:58:00.882 W/GeckoConsole( 6373): [JavaScript Error: "1425016680892 Toolkit.Telemetry ERROR TelemetryEnvironment::_doGetEnvironmentData - There was an exception collecting addons: [Exception... "AddonManager is not initialized" nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)" location: "JS frame :: resource://gre/modules/AddonManager.jsm :: AMI_getAddonsByTypes :: line 2183" data: no] Stack trace: AMI_getAddonsByTypes()@resource://gre/modules/AddonManager.jsm:2183 < AM_getAddonsByTypes()@resource://gre/modules/AddonManager.jsm:2831 < promiseGetAddonsByTypes/<()@resource://gre/modules/TelemetryEnvironment.jsm:92 < promiseGetAddonsByTypes()@resource://gre/modules/TelemetryEnvironment.jsm:91 < this.TelemetryEnvironment._getActiveAddons<()@resource://gre/modules/TelemetryEnvironment.jsm:630 < next()@self-hosted:620 < TaskImpl_run()@resource://gre/modules/Task.jsm:314 < TaskImpl()@resource://gre/modules/Task.jsm:275 < createAsyncFunction/asyncFunction()@resource://gre/modules/Task.jsm:249 < this.TelemetryEnvironment._getAddons<()@resource://g

The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure.
Attached file console.log
Attached file logcat.txt.zip
Attached file report.html
Another apparent crash in the same testrun:

Traceback (most recent call last):
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette_test.py", line 283, in run
testMethod()
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/tests/python/gaia-ui-tests/gaiatest/tests/functional/marketplace/test_marketplace_packaged_app.py", line 29, in test_search_and_install_app
marketplace.search(self.app_search)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/tests/python/gaia-ui-tests/gaiatest/apps/marketplace/app.py", line 60, in search
search_box.send_keys(term)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 113, in send_keys
return self.marionette._send_message('sendKeysToElement', 'ok', id=self.id, value=typing)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 615, in _send_message
"Connection timed out", status=errors.ErrorCodes.TIMEOUT)
TimeoutException: TimeoutException: Connection timed out


Most recent errors/exceptions are:
02-26 22:52:31.798 W/GeckoConsole(30399): [JavaScript Warning: "This site makes use of a SHA-1 Certificate; it's recommended you use certificates with signature algorithms that use hash functions stronger than SHA-1." {file: "https://bam.nr-data.net/jserrors/1/4a1733c9c6?a=2914756&pl=1425019887256&v=536.b800686&to=M1ZRYEEADRBRAkAPWwocdUFdAhcKXw8bC18QHVBbXgwMDUANVQVRSkVaUUQSWQBfDFkJWhRfUldW&xhr=%5B%7B%22params%22:%7B%22method%22:%22GET%22,%22host%22:%22marketplace.cdn.mozilla.net:443%22,%22pathname%22:%22/api/v2/feed/get/%22,%22status%22:200%7D,%22metrics%22:%7B%22count%22:2,%22duration%22:%7B%22t%22:3446,%22min%22:1599,%22max%22:1847,%22sos%22:5968210,%22c%22:2%7D,%22rxSize%22:%7B%22t%22:37650,%22min%22:18825,%22max%22:18825,%22sos%22:708761250,%22c%22:2%7D,%22cbTime%22:%7B%22t%22:2352,%22min%22:1109,%22max%22:1243,%22sos%22:2774930,%22c%22:2%7D,%22time%22:%7B%22t%22:9850,%22min%22:2721,%22max%22:7129,%22sos%22:58226482,%22c%22:2%7D%7D%7D%5D" line: 0}]
02-26 22:55:36.858 W/GeckoConsole(30399): [JavaScript Error: "1425020136864 Toolkit.Telemetry ERROR TelemetryEnvironment::_isDefaultBrowser - Could not obtain shell service" {file: "resource://gre/modules/Log.jsm" line: 749}]
02-26 22:55:36.858 W/GeckoConsole(30399): [JavaScript Error: "Failed to load module resource://gre/modules/services/healthreport/profile.jsm." {file: "resource://gre/modules/XPCOMUtils.jsm" line: 245}]
02-26 22:55:36.868 W/GeckoConsole(30399): [JavaScript Error: "1425020136877 Toolkit.Telemetry ERROR TelemetryEnvironment::_doGetEnvironmentData - There was an exception collecting profile: [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: resource://gre/modules/XPCOMUtils.jsm :: XPCU_moduleLambda :: line 243" data: no] Stack trace: XPCU_moduleLambda()@resource://gre/modules/XPCOMUtils.jsm:243 < XPCU_defineLazyGetter/<.get()@resource://gre/modules/XPCOMUtils.jsm:193 < this.TelemetryEnvironment._getProfile<()@resource://gre/modules/TelemetryEnvironment.jsm:443 < InterpretGeneratorResume()@self-hosted:712 < next()@self-hosted:620 < TaskImpl_run()@resource://gre/modules/Task.jsm:314 < TaskImpl()@resource://gre/modules/Task.jsm:275 < createAsyncFunction/asyncFunction()@resource://gre/modules/Task.jsm:249 < this.TelemetryEnvironment._doGetEnvironmentData</sections.profile()@resource://gre/modules/TelemetryEnvironme
02-26 22:55:36.898 W/GeckoConsole(30399): [JavaScript Error: "1425020136898 Toolkit.Telemetry ERROR TelemetryEnvironment::_doGetEnvironmentData - There was an exception collecting addons: [Exception... "AddonManager is not initialized" nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)" location: "JS frame :: resource://gre/modules/AddonManager.jsm :: AMI_getAddonsByTypes :: line 2183" data: no] Stack trace: AMI_getAddonsByTypes()@resource://gre/modules/AddonManager.jsm:2183 < AM_getAddonsByTypes()@resource://gre/modules/AddonManager.jsm:2831 < promiseGetAddonsByTypes/<()@resource://gre/modules/TelemetryEnvironment.jsm:92 < promiseGetAddonsByTypes()@resource://gre/modules/TelemetryEnvironment.jsm:91 < this.TelemetryEnvironment._getActiveAddons<()@resource://gre/modules/TelemetryEnvironment.jsm:630 < InterpretGeneratorResume()@self-hosted:712 < next()@self-hosted:620 < TaskImpl_run()@resource://gre/modules/Task.jsm:314 < TaskImpl()@resource://gre/modules/Task.jsm:275 < createAsyncFunction/asyncFunction()@resource://gre/modules/Task.jsm:249 < this.Te

The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure.
The above was accompanied by an exception in tearDown:

Traceback (most recent call last):
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette_test.py", line 311, in run
self.tearDown()
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 996, in tearDown
MarionetteTestCase.tearDown(self)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette_test.py", line 531, in tearDown
(self.filepath.replace('\\', '\\\\'), self.methodName))
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 1293, in execute_script
filename=os.path.basename(frame[0]))
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 634, in _send_message
self._handle_error(response)
File "/var/jenkins/2/workspace/flame-kk-319.b2g-inbound.tinderbox.ui.functional.smoke/.env/local/lib/python2.7/site-packages/marionette_client-0.8.7-py2.7.egg/marionette/marionette.py", line 663, in _handle_error
"Malformed packet, expected key 'error' to be a dict: %s" % response)
MarionetteException: MarionetteException: Malformed packet, expected key 'error' to be a dict: {u'message': u'this.curBrowser is null', u'error': u"error occurred while processing 'executeScript"}

This looks like it may be a Marionette issue?
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
I think the teardown error is an issue in marionette but I think you only hit it if the session has already gone pear shaped it might not be much we can do. Would you be able to get a reduced test case for us?
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
(In reply to David Burns :automatedtester from comment #6)
> I think the teardown error is an issue in marionette but I think you only
> hit it if the session has already gone pear shaped it might not be much we
> can do. Would you be able to get a reduced test case for us?

I don't think a reduced testcase will be particularly easy, and I was pretty much able to work out what was happening by looking at the code in the tracebacks - I thought perhaps one of you would be able to pick this up much faster than me though, hence the needinfo.

I looks to me like a call to self.marionette._send_message times out (we're assuming a crash) and then a subsequent call to execute_script [1] fails because the error value is a string instead of a dict [2]. This appears to have been added in bug 1111246, but we seem to not always adhere to this ourselves [3].

[1] https://hg.mozilla.org/mozilla-central/file/c886cd910936/testing/marionette/client/marionette/marionette_test.py#l531
[2] https://hg.mozilla.org/mozilla-central/file/c886cd910936/testing/marionette/driver/marionette_driver/marionette.py#l663
[3] https://hg.mozilla.org/mozilla-central/file/eea6188b9b05/testing/marionette/marionette-server.js#l210
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
I have a try run for the teardown and error data shape ^
Flags: needinfo?(dburns)
Flags: needinfo?(ato)
Assignee: nobody → dburns
/r/4539 - Bug 1137653: If there is a crash don't try any more marionette commands as they will just cause unnecessary errors; r=jgriffin
/r/4541 - Bug 1137653: Correct the shape of the data object sent over if there are are crashes or the packet is malformed; r=jgriffin

Pull down these commits:

hg pull review -r 778460375e8d785dedecac4f606449c7d3cb62da
Attachment #8571988 - Flags: review?(jgriffin)
Comment on attachment 8571988 [details]
MozReview Request: bz://1137653/AutomatedTester

https://reviewboard.mozilla.org/r/4537/#review3753

Nice, thanks for the fixes.  I'd noticied weird error messages recently, this should fix them!
Attachment #8571988 - Flags: review?(jgriffin) → review+
One way of reproducing it is to run functional/email/test_setup_and_send_imap_email.py.  It reproduced about 40% of the time on the nightly build from March 5th.

Another way was to add import pdb pdb.set_trace() in the middle of above script (in line 63) , and wait for a few minutes or enter some import statements

in 50% of the time, it crashes with signatures like below:
https://crash-stats.mozilla.com/report/index/c046cf8a-de73-4887-979f-5590f2150305
Attached file marionettecrash.log
logcat while executing marionette test case. Stopped shortly after the crash occured.
Sorry, didn't mean dup.  I meant possibly related?
Blocks: 1137273
https://hg.mozilla.org/mozilla-central/rev/7a78ca49e38e
https://hg.mozilla.org/mozilla-central/rev/e2358ee2e4c8
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S7 (6mar)
Reopening after confirming with davehunt that the fix is for clearing up the log
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: dburns → nobody
should this be marked as qablocker as well? (It happens when running python automation scripts) pinging Geo.
Flags: needinfo?(gmealer)
(In reply to No-Jun Park [:njpark] from comment #22)
> should this be marked as qablocker as well? (It happens when running python
> automation scripts) pinging Geo.

Depends on how often, whether it's at all deterministic, etc. The other one I just marked was bringing down the confidence of entire test runs.

The other thing is that qablocker would normally be something I'd apply to a product bug that interferes with automation. My assumption is any automation/Marionette bug interferes with our automation, so then it really does come down to whether we're at a hard stop or what.
Flags: needinfo?(gmealer)
After the patch, I'm getting this error (although much less frequent). Still trying to get a complete logcat.

TEST-UNEXPECTED-ERROR | test_setup_and_send_receive_imap_email.py TestSetupAndSendIMAPEmail.test_setup_and_send_imap_email | MarionetteException: MarionetteException: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIDOMWindowUtils.sendTouchEvent]
stacktrace:
	emitTouchEvent@chrome://marionette/content/marionette-listener.js:746:0
	generateEvents@chrome://marionette/content/marionette-listener.js:895:8
	actions@chrome://marionette/content/marionette-listener.js:1162:16
	actionChain@chrome://marionette/content/marionette-listener.js:1233:4
I couldn't get the crash signature in above instance since it rebooted immediately.  I was able to repro it at a different time, and the signature is:

https://crash-stats.mozilla.com/report/index/647c9b95-73a9-4a9d-a863-f531e2150310
nhirata, i vaguely remember you said that you can show us how to view the crash reports with symbols - ni?ing you in case if i remembered correctly
Flags: needinfo?(nhirata.bugzilla)
You can get the raw dumps from the https://crash-stats.mozilla.org/report/index/a9e65dc1-4d60-4195-9057-a23fa2150325 then clicking on the rawdump tab.  (Note: you have to be signed in and have access to download the raw dumps).

You will also need to download the symbols assocated with the build.  The first tab shows this : 20150325083424 / eng.cltbld.20150325.122449:en ;  based on the time stamp : https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/b2g-inbound-flame-kk-eng-debug/20150325083424/

From there it's a matter of having the minidump_stackwalk tool compiled on your machine.  To compile it see : https://developer.mozilla.org/en-US/docs/Debugging_a_minidump
Attached file crash_stack.txt
output of 
 minidump_stackwalk a9e65dc1-4d60-4195-9057-a23fa2150325.dmp b2g-39.0a1.en-US.android-arm.crashreporter-symbols/ > crash_stack.txt
Flags: needinfo?(nhirata.bugzilla)
Looking at the crash stack, it is indeed very similar to Bug 1137512. making it a blocking bug.
Depends on: 1137512
Currently occurring at test_setup_and_send_receive_imap_email.py in imagecompare test suite.
No longer blocks: 1141418
According to bug 1141418 #17 it looks to occur in b2g37 as well, is it possible to uplift?
ni? dburns who proposed this patch.
Flags: needinfo?(dburns)
Paul, feel free to propose this for uplift and set the necessary flags to get the patch in https://reviewboard.mozilla.org/r/4537/ uplifted
Flags: needinfo?(dburns)
Comment on attachment 8571988 [details]
MozReview Request: bz://1137653/AutomatedTester

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
bug 1141418
User impact if declined:
For running over 10 hours, marionette framework will throw exception randomly
Testing completed: Auto-test on master, pass regular full tests
Risk to taking this patch (and alternatives if risky): Low.  Error handling
String or UUID changes made by this patch: None
Attachment #8571988 - Flags: approval-mozilla-b2g37?
Attachment #8571988 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
I changed the queries a few weeks ago to ignore bug resolution on approvals :)
Keywords: checkin-needed
See #20
Maybe another following bug will be appropriate.
Attachment #8571988 - Attachment is obsolete: true
Attachment #8619615 - Flags: review+
Attachment #8619616 - Flags: review+
Assignee: dburns → nobody
Firefox OS is not being worked on
Status: REOPENED → RESOLVED
Closed: 5 years ago2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.