Closed
Bug 990580
Opened 11 years ago
Closed 11 years ago
Tests are failing; app freezes after being launched
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:2.0+)
RESOLVED
WORKSFORME
blocking-b2g | 2.0+ |
People
(Reporter: RobertC, Unassigned)
References
Details
(Keywords: qablocker, regression)
Attachments
(8 files, 3 obsolete files)
Tests on Jenkins are failing because of a timeout in the app launch method. The timeout occurs when trying to switch to the frame of the newly opened app.
This issue is not reproduced locally with manual or automated testing.
Stacktrace:
06:42:52 ERROR: None
06:42:52 ----------------------------------------------------------------------
06:42:52 Traceback (most recent call last):
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 163, in run
06:42:52 testMethod()
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/tests/functional/keyboard/test_keyboard.py", line 19, in test_keyboard_basic
06:42:52 contacts_app.launch()
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/contacts/app.py", line 32, in launch
06:42:52 Base.launch(self)
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 25, in launch
06:42:52 self.app = self.apps.launch(self.name, launch_timeout=launch_timeout)
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 60, in launch
06:42:52 self.marionette.switch_to_frame(app.frame_id)
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 895, in switch_to_frame
06:42:52 response = self._send_message('switchToFrame', 'ok', element=frame.id, focus=focus)
06:42:52 File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 604, in _send_message
06:42:52 "Connection timed out", status=ErrorCodes.TIMEOUT)
06:42:52 TimeoutException: TimeoutException: Connection timed out
06:42:52
06:42:52
06:42:52 Most recent errors/exceptions are:
06:42:52
06:42:52 04-01 06:36:45.129 E/GeckoConsole( 6688): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:397 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/gaia_build_defer_index.js :: loadFile :: line 389" data: no]
06:42:52 04-01 06:36:47.209 E/GeckoConsole( 6688): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:60 in retrieve: Got an exception when trying to load icon "app://music2.gaiamobile.org/style/icons/60/Music.png +" falling back to cached icon. Exception is:
06:42:52 TEST-UNEXPECTED-FAIL | test_keyboard.py test_keyboard.TestKeyboard.test_keyboard_basic | The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure.
Latest build on which it is reproduced:
Gaia 874fe42b82e8d819d592690e74db91c07179e68c
Gecko https://hg.mozilla.org/mozilla-central/rev/1417d180a1d8
BuildID 20140401040202
Version 31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Latest test run where the issue occured:
http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-central.ui.2/255/console
Reporter | ||
Comment 1•11 years ago
|
||
This issue first appeared in yesterdays build (31 March).
Showing version details
application_buildid: 20140331040200
application_changeset: d8e8f13bd4ae
application_name: B2G
application_repository: https://hg.mozilla.org/mozilla-central
application_version: 31.0a1
build_changeset: 1ad48c4be51b279f7f63c1a13025b52fe087d231
device_firmware_date: 1387433095
device_firmware_version_incremental: 324
device_firmware_version_release: 4.0.4
device_id: msm7627a
gaia_changeset: 26839cb46f856d610b192f5655a8c38a6bfe0829
gaia_date: 1396251714
gecko_changeset: 34c6e4261eb036cb6050d5b1a73cd9bc4f5f6251
platform_buildid: 20140331040200
platform_changeset: d8e8f13bd4ae
platform_repository: https://hg.mozilla.org/mozilla-central
Reporter | ||
Updated•11 years ago
|
Summary: Tests are failing on Jenkins with timeout when trying to launch apps → Tests are failing on Jenkins with timeout when trying to switch to the app frame
Reporter | ||
Comment 2•11 years ago
|
||
It seems the error is not limited to the launch method. It appears when calling self.marionette.switch_to_frame(app.frame_id) regardless of the method it's called from.
Comment 3•11 years ago
|
||
Adding qawanted as we can not reproduce this locally with automation and manual testing.
Keywords: qawanted
Comment 4•11 years ago
|
||
(In reply to Florin Strugariu [:Bebe] from comment #3)
> Adding qawanted as we can not reproduce this locally with automation and
> manual testing.
We really only need to do qawanted in cases where there hasn't been a manual investigation already. From what I can tell from the comments above, that already happened here.
Keywords: qawanted
Comment 5•11 years ago
|
||
Andrei, team: when you see timeouts like this, can you start attaching debugging logs, if we suspect a crash?
See https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Debugging_OOMs for info; thanks!
Comment 6•11 years ago
|
||
From the error message:
(In reply to Robert Chira [:RobertC] from comment #0)
> test_keyboard.TestKeyboard.test_keyboard_basic | The B2G process has
> restarted after crashing during the tests so Marionette can't respond due to
> either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors
> are listed. Check logcat for all errors if these errors are not the cause of
> the failure.
This means that during this test, b2g crashed and restarted, so Marionette can't respond since the server was restarted.
The crash might be due to the errors that was printed out:
"04-01 06:36:45.129 E/GeckoConsole( 6688): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:397 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/gaia_build_defer_index.js :: loadFile :: line 389" data: no]
06:42:52 04-01 06:36:47.209 E/GeckoConsole( 6688): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:60 in retrieve: Got an exception when trying to load icon "app://music2.gaiamobile.org/style/icons/60/Music.png +" falling back to cached icon. Exception is: "
But more information about the crash would be in the logcat, or in the minidumps if you have that enabled.
Comment 7•11 years ago
|
||
I found:
Bug 918982 - FILE_NOT_FOUND error fired in logcat looking for singleVariant configuration file (singlevariantconf.json)
Bug 980814 - Fix test_system_message.py, JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE)
That look to contain the same error:
E/GeckoConsole( 2241): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:397 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/gaia_build_defer_index.js :: loadFile :: line 389" data: no]
Also when we reboot the phone we see the same issue:
E/GeckoConsole( 371): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:397 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/gaia_build_defer_index.js :: loadFile :: line 389" data: no]
Manual steps:
1. flash latest build
2. open adb log cat and watch for the error
BuildID:
Gaia 0e974ff33ba47f3d1e59df1e0ad534f1bbe3ef8a
Gecko https://hg.mozilla.org/mozilla-central/rev/91be2828f17e
BuildID 20140403040201
Version 31.0a1
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013
Comment 8•11 years ago
|
||
at line 231 you can find the error
Comment 9•11 years ago
|
||
This issue is reproducible in today’s Jenkins build.
Gaia 0e974ff33ba47f3d1e59df1e0ad534f1bbe3ef8a
Gecko https://hg.mozilla.org/mozilla-central/rev/91be2828f17e
BuildID 20140403040201
Version 31.0a1
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013
We can't reproduce the test failure locally.
Comment 10•11 years ago
|
||
I actually managed to replicate this when running test_url_keyboard.TestUrlKeyboard locally.
When I saw it failed, the UI Tests app failed to launch. After being tapped the app opening animation did not finish.
Hamachi build:
Gaia d9a574284d672f532f7c562a091bb01f531202b1
Gecko https://hg.mozilla.org/mozilla-central/rev/6c924a018540
BuildID 20140404040204
Version 31.0a1
(In reply to [:AndreiH] from comment #10)
> I actually managed to replicate this when running
> test_url_keyboard.TestUrlKeyboard locally.
> When I saw it failed, the UI Tests app failed to launch. After being tapped
> the app opening animation did not finish.
>
> Hamachi build:
> Gaia d9a574284d672f532f7c562a091bb01f531202b1
> Gecko https://hg.mozilla.org/mozilla-central/rev/6c924a018540
> BuildID 20140404040204
> Version 31.0a1
Thanks for isolating that; in the future, as requested in comment 5, can you get a minidump/crash log attached, here? See https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Debugging_OOMs
Comment 12•11 years ago
|
||
Have we determined here that this is a gecko bug or a bug in the test?
Flags: needinfo?(stephen.donner)
(In reply to Jason Smith [:jsmith] from comment #12)
> Have we determined here that this is a gecko bug or a bug in the test?
Looks like a core B2G process/Gecko crasher, to me. Bebe, please work with the team to get this further debugged (look at http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.tinderbox.ui.receive_call/42/ and http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.unittests/370/) for smaller tests to help reproduce. Malini/Gregor, can you take a look with the information we'll provide? Thx.
Flags: needinfo?(stephen.donner) → needinfo?(florin.strugariu)
Comment 14•11 years ago
|
||
Full logcat from
http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.tinderbox.ui.receive_call/42
Flags: needinfo?(florin.strugariu)
Comment 15•11 years ago
|
||
The problem here is not the step that's failing. It's the one just before that.
Strongly suspect this from the logcat:
I/Gecko ( 682): Security problem: Content process does not have `undefined'. It will be killed.
Some process is being killed and thus the following marionette client step fails.
Comment 16•11 years ago
|
||
It would really be helpful to have a live video on how the tests run and what happens when they fail.
I thought to run the whole tests suite locally, thinking that 1 test will not reproduce this.
In my run I caught "test_camera_multiple_shots.py" crashed.
Also attached OOM debugging before the crash.
Screenshot: http://postimg.org/image/ko59q81sp/
Comment 17•11 years ago
|
||
When running the whole test suite I could see some failures locally with:
I/Gecko (23169): 1396968219722 Marionette INFO sendToClient: {"from":"0","value":{"ELEMENT":"{1b27510b-b459-4034-b481-8eab8a079274}"}}, {8d1c8064-c11c-4157-9a50-205e42696fa0}, {8d1c8064-c11c-4157-9a50-205e42696fa0}
I/Gecko (23169): 1396968219736 Marionette INFO sendToClient: {"from":"0","value":{"ELEMENT":"{1b27510b-b459-4034-b481-8eab8a079274}"}}, {0b42070b-9090-43f2-8363-f9511cfe55ed}, {0b42070b-9090-43f2-8363-f9511cfe55ed}
I/Gecko (23169): 1396968219746 Marionette DEBUG Got request: switchToFrame, data: {"to":"0","sessionId":{"rotatable":true,"browserVersion":"31.0a1","takesScreenshot":true,"appBuildId":"20140408040204","cssSelectorsEnabled":true,"javascriptEnabled":true,"XULappId":"{3c2e2abc-06d4-11e1-ac3b-374f68613e61}","secureSsl":false,"platform":"ANDROID","browserName":"B2G","version":"31.0a1","device":"msm7627a","b2g":true,"nativeEvents":false,"platformVersion":"31.0a1","takesElementScreenshot":true,"platformName":"ANDROID","handlesAlerts":false},"name":"switchToFrame","parameters":{"focus":true,"element":"{1b27510b-b459-4034-b481-8eab8a079274}"}}, id: {41470f06-d969-48bb-a572-9d2d3592a703}
I/Gecko (23169): 1396968219751 Marionette INFO Switched to frame: {"frameValue":"{1b27510b-b459-4034-b481-8eab8a079274}"}
I/Gecko (23374): ###################################### forms.js loaded
I/Gecko (23169): 1396968219752 Marionette INFO trying remote frame 0
I/Gecko (23169): 1396968219767 Marionette INFO frame-manager load script: [object ChromeMessageSender]
I/Gecko (23374): ############################### browserElementPanning.js loaded
I/Gecko (23374): ######################## BrowserElementChildPreload.js loaded
I/Gecko (23169): Security problem: Content process does not have `undefined'. It will be killed.
I/Gecko (23169): [Parent 23169] WARNING: waitpid failed pid:23374 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 261
I/Gecko (23169):
I/Gecko (23169): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko (23169):
I/Gecko (23169): ############ ErrorPage.js
E/GeckoConsole(23169): [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:864"]
I/Gonk (23169): Setting nice for pid 23286 to 18
I/Gonk (23169): Changed nice for pid 23286 from 18 to 18.
I/Gecko (23169):
I/Gecko (23169): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
Comment 18•11 years ago
|
||
This is the logcat of
stack trace:
Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 163, in run
testMethod()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/tests/functional/gallery/test_gallery_view.py", line 23, in test_gallery_view
gallery.launch()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/gallery/app.py", line 24, in launch
Base.launch(self)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 25, in launch
self.app = self.apps.launch(self.name, launch_timeout=launch_timeout)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 60, in launch
self.marionette.switch_to_frame(app.frame_id)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 895, in switch_to_frame
response = self._send_message('switchToFrame', 'ok', element=frame.id, focus=focus)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-central.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 604, in _send_message
"Connection timed out", status=ErrorCodes.TIMEOUT)
TimeoutException: TimeoutException: Connection timed out
Most recent errors/exceptions are:
04-08 05:42:18.149 E/GeckoConsole( 2996): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:420 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/gaia_build_defer_index.js :: loadFile :: line 412" data: no]
04-08 05:42:20.459 E/GeckoConsole( 2996): Content JS ERROR at app://homescreen.gaiamobile.org/gaia_build_defer_index.js:81 in retrieve: Got an exception when trying to load icon "app://music2.gaiamobile.org/style/icons/60/Music.png +" falling back to cached icon. Exception is:
The B2G process has restarted after crashing during the tests so Marionette can't respond due to either a Gecko, Gaia or Marionette error. Above, the 5 most recent errors are listed. Check logcat for all errors if these errors are not the cause of the failure.
Build ID:
Showing version details
05:34:44 application_buildid: 20140407160203
05:34:44 application_changeset: d8b2e3738785
05:34:44 application_name: B2G
05:34:44 application_repository: https://hg.mozilla.org/mozilla-central
05:34:44 application_version: 31.0a1
05:34:44 build_changeset: 1ad48c4be51b279f7f63c1a13025b52fe087d231
05:34:44 device_firmware_date: 1387433095
05:34:44 device_firmware_version_incremental: 324
05:34:44 device_firmware_version_release: 4.0.4
05:34:44 device_id: msm7627a
05:34:44 gaia_changeset: 3796021a52948a2997d9e449ab2c83279d3e1074
05:34:44 gaia_date: 1396856401
05:34:44 gecko_changeset: 63e9fff3bca167b1ab0fce67f3ae122fe82b84ba
05:34:44 platform_buildid: 20140407160203
05:34:44 platform_changeset: d8b2e3738785
05:34:44 platform_repository: https://hg.mozilla.org/mozilla-central
Thomas, would you be able to help us see if we're running into a B2G process/kernel or socket crash/error, here?
Flags: needinfo?(tzimmermann)
Comment 20•11 years ago
|
||
QA Blocker - this is causing numerous problems in our UI automation suite right now. We need to get this fixed.
Comment 21•11 years ago
|
||
Zac - Can you find out the last time we didn't see this failure & first time it failed? I'm trying to pinpoint a regression range to help diagnose this.
Flags: needinfo?(zcampbell)
I *think* this regressed between the 19th and the 20th.
1) 19th build (GOOD): http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-central.unittests/291/testReport/ (might expire)
Build failures were: test_cleanup_gaia and test_connect_and_forget_all_networks (both of which were/are due to Wi-Fi, ala bug 987760
2) 20th build (BAD): http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-central.unittests/292/testReport/ (might expire)
Build failures were: test_cold_launch (Clock failed to launch)
Comment 23•11 years ago
|
||
I'm afraid I'm significantly hampered by being unable to reach the MV Jenkins server from London.
I can reach Tpe job server and unfortunately there the data only goes back 1 week. The oldest failing build there is:
Gaia 4cf7ec04b82320a1ff1f825220204f454f951a87
Gecko https://hg.mozilla.org/mozilla-central/rev/1c23863d8dda
BuildID 20140330160201
Version 31.0a1
which is 2 builds older than the one from comment #0.
The last successful report filed to qa-b2g was on Friday before that:
Gaia 9da1b9c11bf518bce882be305ae121c44c5d1e05
Gecko https://hg.mozilla.org/mozilla-central/rev/9afe2a1145bd
BuildID 20140327040202
Version 31.0a1
Which gives a very crude regression range to start with.
We (mostly Bebe and I) think this is more associated with executing javascript, or launching an app (possibly WindowManager) rather than a problem with Marionette, which the bug title suggests, so you could start looking at commits around that. We don't think it is a Marionette problem but we could be wrong!
Stephen might be able to narrow this range further using tinderbox test runs.
Updated•11 years ago
|
Flags: needinfo?(zcampbell)
Comment 24•11 years ago
|
||
Comment 25•11 years ago
|
||
Hmm - as Zac suggests, two of these logcats show the security error firing. When that security error shows up in the logcat, that will end killing the app out right.
Comment 26•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #25)
> Hmm - as Zac suggests, two of these logcats show the security error firing.
> When that security error shows up in the logcat, that will end killing the
> app out right.
Which typically is usually due to a DOM bug.
Andrew - Is there someone you could suggest here that could help diagnose the security errors we are seeing firing here? This is causing major problems in a lot of tests right now running reliably.
Flags: needinfo?(overholt)
Comment 27•11 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #19)
> Thomas, would you be able to help us see if we're running into a B2G
> process/kernel or socket crash/error, here?
I can probably help a bit, but I won't promise anything. I've been chasing these Marionette failures before, but often without much luck. Is the issue reproducible locally by running test_url_keyboard.TestUrlKeyboard?
Flags: needinfo?(tzimmermann)
Comment 28•11 years ago
|
||
I grep'ed for this error message from comment 15 and found it in
'dom/ipc/AppProcessChecker.cpp|, near line 118.
That's an internal function, which gets called by |AssertAppProcessPermission|, |AssertAppProcessManifestURL|, and |AssertAppHasPermission|. Grep'ing for these functions and sorting out impossible results, I get
content/base/src/nsFrameLoader.cpp:2303: return AssertAppProcessPermission(GetRemoteBrowser(),
content/base/src/nsFrameLoader.cpp:2310: return AssertAppProcessManifestURL(GetRemoteBrowser(),
dom/devicestorage/DeviceStorageRequestParent.cpp:365: if (!AssertAppProcessPermission(aParent, permissionName.get())) {
dom/filesystem/FileSystemRequestParent.cpp:68: if (!AssertAppProcessPermission(aParent, permissionName.get())) {
dom/indexedDB/ipc/IndexedDBParent.cpp:146: !AssertAppProcessPermission(mManagerContent, fullPermission.get())) ||
dom/indexedDB/ipc/IndexedDBParent.cpp:148: !AssertAppProcessPermission(mManagerTab, fullPermission.get()))) {
dom/ipc/ContentParent.cpp:3221: return AssertAppProcessPermission(this, NS_ConvertUTF16toUTF8(aPermission).get());
dom/ipc/ContentParent.cpp:3227: return AssertAppProcessManifestURL(this, NS_ConvertUTF16toUTF8(aManifestURL).get());
Can we get a stack trace?
Comment 29•11 years ago
|
||
Taipei CI is seeing the same problem but in the test failure stacktrace is consistently showing this error:
04-09 10:56:18.089 E/GeckoConsole( 7601): [JavaScript Error: "A promise chain failed to handle a rejection.
04-09 10:56:18.089 E/GeckoConsole( 7601): Full Message: TypeError: worker is null
However Taipei CI server does not capture the full logcat so I cannot correlate that with the MV CI server logs.
Comment 30•11 years ago
|
||
I managed to replicate this locally when testing "test_email_keyboard.TestEmailKeyboard" with repeat 30.
UI Tests app just stayed there after launching it, but not opening.
Traceback: https://pastebin.mozilla.org/4780229
Logcat: https://pastebin.mozilla.org/4780232
Comment 31•11 years ago
|
||
(In reply to [:AndreiH] from comment #30)
> I managed to replicate this locally when testing
> "test_email_keyboard.TestEmailKeyboard" with repeat 30.
> UI Tests app just stayed there after launching it, but not opening.
>
> Traceback: https://pastebin.mozilla.org/4780229
> Logcat: https://pastebin.mozilla.org/4780232
The logcat is incomplete; can you add it as an attachment so it will live longer?
Comment 32•11 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #28)
> Can we get a stack trace?
We'd be happy to help you help us; I'm not sure which stack trace, though -- can you clarify on what's needed? FWIW, we're running Engineering builds (with Marionette enabled), but they're not debug AFAIK, otherwise.
Comment 34•11 years ago
|
||
(In reply to Zac C (:zac) from comment #15)
> The problem here is not the step that's failing. It's the one just before
> that.
>
> Strongly suspect this from the logcat:
> I/Gecko ( 682): Security problem: Content process does not have
> `undefined'. It will be killed.
>
> Some process is being killed and thus the following marionette client step
> fails.
This means something is calling AssertAppProcess() passing in "undefined" as the last argument. Looking at the consumers in Gecko, I can't find anywhere that we pass "undefined" down to this function. This function however is indirectly exposed to JS (here is one example, not saying this call site has necessarily anything to do with this bug: http://mxr.mozilla.org/mozilla-central/source/dom/downloads/src/DownloadsAPI.jsm#145)
Now given the above, here's my theory. I bet some JS code is doing something like:
messageMananger.assertPermission(foo.bar);
where foo doesn't have a bar property. That creates a JS undefined value, which stringifies to "undefined".
This means we need to know the JS caller. The easiest way I can think of to do that would be to attach gdb to the child process, reproduce the bug so that this crash happens, and then do: |call DumpJSStack()| in gdb.
Comment 35•11 years ago
|
||
Instructions for attaching gdb to the child process: <https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_B2G_using_gdb#Debugging_out-of-process_tasks>
You basically need to run b2g-ps, get the pid for the process that you're interested in (looking at the name on the first column) and then do ./run-gdb attach <pid>.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #35)
> Instructions for attaching gdb to the child process:
> <https://developer.mozilla.org/en-US/Firefox_OS/Debugging/
> Debugging_B2G_using_gdb#Debugging_out-of-process_tasks>
>
> You basically need to run b2g-ps, get the pid for the process that you're
> interested in (looking at the name on the first column) and then do
> ./run-gdb attach <pid>.
:retornam, would you be able to try to get a stack for us, using one of the failing unittests? (test_cleanup_gaia, run repeatedly until it fails, looks like a good, lean candidate). Thanks either way.
Flags: needinfo?(mozbugs.retornam)
Updated•11 years ago
|
Severity: normal → critical
Comment 37•11 years ago
|
||
(Please try to get a C++ stack using the gdb bt command as well, that will also be useful. Thanks!)
Comment 38•11 years ago
|
||
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #35)
> Instructions for attaching gdb to the child process:
> <https://developer.mozilla.org/en-US/Firefox_OS/Debugging/
> Debugging_B2G_using_gdb#Debugging_out-of-process_tasks>
>
> You basically need to run b2g-ps, get the pid for the process that you're
> interested in (looking at the name on the first column) and then do
> ./run-gdb attach <pid>.
Err, sorry, make that the parent process.
Updated•11 years ago
|
Flags: needinfo?(overholt)
Comment 39•11 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #36)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #35)
> > Instructions for attaching gdb to the child process:
> > <https://developer.mozilla.org/en-US/Firefox_OS/Debugging/
> > Debugging_B2G_using_gdb#Debugging_out-of-process_tasks>
> >
> > You basically need to run b2g-ps, get the pid for the process that you're
> > interested in (looking at the name on the first column) and then do
> > ./run-gdb attach <pid>.
>
> :retornam, would you be able to try to get a stack for us, using one of the
> failing unittests? (test_cleanup_gaia, run repeatedly until it fails, looks
> like a good, lean candidate). Thanks either way.
After attaching the debugger, enter
break AppProcessChecker.cpp:120
before continuing the b2g process. This will make it stop right after the failed permission check happened. Entering
bt
will give you the stack trace.
Comment 40•11 years ago
|
||
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #34)
> (In reply to Zac C (:zac) from comment #15)
> > The problem here is not the step that's failing. It's the one just before
> > that.
> >
> > Strongly suspect this from the logcat:
> > I/Gecko ( 682): Security problem: Content process does not have
> > `undefined'. It will be killed.
> >
> > Some process is being killed and thus the following marionette client step
> > fails.
>
> This means something is calling AssertAppProcess() passing in "undefined" as
> the last argument. Looking at the consumers in Gecko, I can't find anywhere
> that we pass "undefined" down to this function. This function however is
> indirectly exposed to JS (here is one example, not saying this call site has
> necessarily anything to do with this bug:
> http://mxr.mozilla.org/mozilla-central/source/dom/downloads/src/DownloadsAPI.
> jsm#145)
>
> Now given the above, here's my theory. I bet some JS code is doing
> something like:
>
> messageMananger.assertPermission(foo.bar);
>
> where foo doesn't have a bar property. That creates a JS undefined value,
> which stringifies to "undefined".
>
> This means we need to know the JS caller. The easiest way I can think of to
> do that would be to attach gdb to the child process, reproduce the bug so
> that this crash happens, and then do: |call DumpJSStack()| in gdb.
Now I can see this logcat line in passing tests so I don't think it is a problem anymore. Or at least, not this problem!
Comment 41•11 years ago
|
||
(In reply to Zac C (:zac) from comment #40)
> Now I can see this logcat line in passing tests so I don't think it is a
> problem anymore. Or at least, not this problem!
That's actually good, because I have been able to reproduce a crash with test_url_keyboard.py, but wasn't seeing the security error mentioned above. I am seeing the following error though, which I am currently investigating.
E/GeckoConsole( 581): [JavaScript Error: "A promise chain failed to handle a rejection.
E/GeckoConsole( 581):
E/GeckoConsole( 581): Date: Thu Apr 10 2014 12:03:05 GMT-0400 (UTC)
E/GeckoConsole( 581): Full Message: TypeError: worker is null
E/GeckoConsole( 581): Full Stack: post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:417:11
E/GeckoConsole( 581): TaskImpl_run@resource://gre/modules/Task.jsm:282:1
E/GeckoConsole( 581): TaskImpl@resource://gre/modules/Task.jsm:247:3
E/GeckoConsole( 581): createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:224:7
E/GeckoConsole( 581): Task_spawn@resource://gre/modules/Task.jsm:139:5
E/GeckoConsole( 581): post/<@resource://gre/modules/osfile/osfile_async_front.jsm:402:28
E/GeckoConsole( 581): Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:748:11
E/GeckoConsole( 581): this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:627:7
E/GeckoConsole( 581): " {file: "resource://gre/modules/osfile/osfile_async_front.jsm" line: 417 column: 0 source: "417"}]
Comment 42•11 years ago
|
||
That looks like a separate thing Michael.
Just this afternoon I've had some luck replicating this, I can now hit it over 50% of the time. It seems to be a bit of a race between the startup of the device and the launching of the app.
I might have to call in Alive for some help soon :)
Comment 43•11 years ago
|
||
(In reply to Michael Henretty [:mhenretty] from comment #41)
> (In reply to Zac C (:zac) from comment #40)
> > Now I can see this logcat line in passing tests so I don't think it is a
> > problem anymore. Or at least, not this problem!
>
> That's actually good, because I have been able to reproduce a crash with
> test_url_keyboard.py, but wasn't seeing the security error mentioned above.
> I am seeing the following error though, which I am currently investigating.
>
> E/GeckoConsole( 581): [JavaScript Error: "A promise chain failed to handle
> a rejection.
> E/GeckoConsole( 581):
> E/GeckoConsole( 581): Date: Thu Apr 10 2014 12:03:05 GMT-0400 (UTC)
> E/GeckoConsole( 581): Full Message: TypeError: worker is null
> E/GeckoConsole( 581): Full Stack:
> post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:417:11
> E/GeckoConsole( 581): TaskImpl_run@resource://gre/modules/Task.jsm:282:1
> E/GeckoConsole( 581): TaskImpl@resource://gre/modules/Task.jsm:247:3
> E/GeckoConsole( 581):
> createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:224:7
> E/GeckoConsole( 581): Task_spawn@resource://gre/modules/Task.jsm:139:5
> E/GeckoConsole( 581):
> post/<@resource://gre/modules/osfile/osfile_async_front.jsm:402:28
> E/GeckoConsole( 581):
> Handler.prototype.process@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:748:11
> E/GeckoConsole( 581):
> this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:627:7
> E/GeckoConsole( 581): " {file:
> "resource://gre/modules/osfile/osfile_async_front.jsm" line: 417 column: 0
> source: "417"}]
David can tell us what this means.
Flags: needinfo?(dteller)
Comment 44•11 years ago
|
||
Alive, to summarise this bug, in the UI-tests test suite we are having a problem with apps freezing when launched (using the atom which you have worked on with us before).
The app freezes in its launch state with the icon in the middle. The phone is still active, but no apps can be launched, they all fail to launch.
By building a new Gaia profile with NOFTU=1 I managed to get to replicate it >80% of the time. I've narrowed it down to an issue with the timing of the startup. If I had a sleep(10) at the start of the test before launching the app it works fine, 100% passes.
I am thinking that the AppWindowManager or something is not fully initialised before we try to launch the app. Has something changed recently that regressed the startup of APIs? It seems like we need to add another wait into the startup of the b2g process to wait for something to be ready.
Flags: needinfo?(alive)
Comment 45•11 years ago
|
||
A new app which is was introduced in bug 978785. The killAll method in the atoms was killing this app and for reasons to be honest I don't understand was causing apps to fail to launch.
I've modified killAll to not kill this app. (another option would be to remove killAll and only kill the FTU).
Alive, I'm still interested in your ni? if you can provide context to why this fix works in the context of my ni? !
Attachment #8404969 -
Flags: review?(dave.hunt)
Comment 46•11 years ago
|
||
Comment on attachment 8404969 [details] [review]
github pr
This patch worked locally but not on other devices.
I think the patch is valid in another context so I'll file it there.
Attachment #8404969 -
Attachment is obsolete: true
Attachment #8404969 -
Flags: review?(dave.hunt)
Comment 47•11 years ago
|
||
I guess that patch just changed the timing subtly enough to allow the apps to launch.
Interesting. It means that the race condition I thought fixed in bug 961317 is not fully fixed yet. sigh.
I'll take a look tomorrow.
We should really activate OS.File's xpcshell tests on B2G to catch these errors earlier. Is there anything that prevents this?
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #43)
[...]
> > E/GeckoConsole( 581): [JavaScript Error: "A promise chain failed to handle
> > a rejection.
> > E/GeckoConsole( 581):
> > E/GeckoConsole( 581): Date: Thu Apr 10 2014 12:03:05 GMT-0400 (UTC)
> > E/GeckoConsole( 581): Full Message: TypeError: worker is null
> > E/GeckoConsole( 581): Full Stack:
> > post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:417:11
[...]
Comment 49•11 years ago
|
||
(In reply to comment #48)
> We should really activate OS.File's xpcshell tests on B2G to catch these errors
> earlier. Is there anything that prevents this?
Please file a bug for that! I guess the try server can answer that question.
Comment 50•11 years ago
|
||
(In reply to Michael Henretty [:mhenretty] from comment #41)
> I am seeing the following error though, which I am currently investigating.
>
> E/GeckoConsole( 581): [JavaScript Error: "A promise chain failed to handle
> a rejection.
> E/GeckoConsole( 581):
> E/GeckoConsole( 581): Date: Thu Apr 10 2014 12:03:05 GMT-0400 (UTC)
> E/GeckoConsole( 581): Full Message: TypeError: worker is null
> E/GeckoConsole( 581): Full Stack:
> post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:417:11
> E/GeckoConsole( 581): TaskImpl_run@resource://gre/modules/Task.jsm:282:1
> E/GeckoConsole( 581): TaskImpl@resource://gre/modules/Task.jsm:247:3
> E/GeckoConsole( 581):
> createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:224:7
> E/GeckoConsole( 581): Task_spawn@resource://gre/modules/Task.jsm:139:5
> E/GeckoConsole( 581):
> post/<@resource://gre/modules/osfile/osfile_async_front.jsm:402:28
> E/GeckoConsole( 581):
> Handler.prototype.process@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:748:11
> E/GeckoConsole( 581):
> this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm ->
> resource://gre/modules/Promise-backend.js:627:7
> E/GeckoConsole( 581): " {file:
> "resource://gre/modules/osfile/osfile_async_front.jsm" line: 417 column: 0
> source: "417"}]
As Zac mentioned in comment 42, this is probably unrelated to the automation failure. Bug 995162 has been filed to track that.
See Also: → 995162
Updated•11 years ago
|
Flags: needinfo?(dteller)
Comment 51•11 years ago
|
||
Discussed with Clint & Stephen - here's what we're going to do:
* We'll implement the sleep hack to get the tests working again
* We'll do a manual bisection of engineering builds from 3/19 - 3/20 to find the regressing patch
Comment 52•11 years ago
|
||
I've put the workaround in to unblock Taipei on their Monday:
https://github.com/mozilla-b2g/gaia/commit/68d8e6897f326a8980c81da80bf8a76440ef3faf
By reducing the items in the gaiatest setUp we can trigger this bug more easily. Previously we could only hit it 1 in 30 which made bisecting difficult but with some tweaks we should be able to bisect more effectively.
Comment 53•11 years ago
|
||
(In reply to Zac C (:zac) from comment #44)
> Alive, to summarise this bug, in the UI-tests test suite we are having a
> problem with apps freezing when launched (using the atom which you have
> worked on with us before).
>
> The app freezes in its launch state with the icon in the middle. The phone
> is still active, but no apps can be launched, they all fail to launch.
>
> By building a new Gaia profile with NOFTU=1 I managed to get to replicate it
> >80% of the time. I've narrowed it down to an issue with the timing of the
> startup. If I had a sleep(10) at the start of the test before launching the
> app it works fine, 100% passes.
>
> I am thinking that the AppWindowManager or something is not fully
> initialised before we try to launch the app. Has something changed recently
> that regressed the startup of APIs? It seems like we need to add another
> wait into the startup of the b2g process to wait for something to be ready.
Uh....I don't think 'freeze' is a gaia issue anyway. We cannot 'stop' any app from gaia side.
Let's have the regression window instead of just guessing. Thanks!
If it's proven to be gaia side I will take a look.
Flags: needinfo?(alive)
Comment 54•11 years ago
|
||
Thanks Alive,
We are bisecting on-device builds right now, results in a couple of hours.
Comment 55•11 years ago
|
||
Last passing (source: releng): 20140329040206
First failing (source: tbpl): 20140329090231
The Gecko range for this is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c401296f71ae&tochange=8da3aabb044f
The Gaia commits were identical between the last passing build and the first failing.
Comment 56•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 57•11 years ago
|
||
Realize now that this should have been marked leave open.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 58•11 years ago
|
||
(In reply to Zac C (:zac) from comment #55)
> Last passing (source: releng): 20140329040206
> First failing (source: tbpl): 20140329090231
>
> The Gecko range for this is:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=c401296f71ae&tochange=8da3aabb044f
>
> The Gaia commits were identical between the last passing build and the first
> failing.
Good - that at least proves with hard evidence that this is a gecko bug.
Comment 59•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #58)
> (In reply to Zac C (:zac) from comment #55)
> > Last passing (source: releng): 20140329040206
> > First failing (source: tbpl): 20140329090231
> >
> > The Gecko range for this is:
> > http://hg.mozilla.org/mozilla-central/
> > pushloghtml?fromchange=c401296f71ae&tochange=8da3aabb044f
> >
> > The Gaia commits were identical between the last passing build and the first
> > failing.
>
> Good - that at least proves with hard evidence that this is a gecko bug.
Note - you can go a bit a deeper with the window if you continue the bisection using mozilla-inbound builds.
https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/mozilla-inbound-hamachi-eng/
That will get us down to only a couple of changesets.
Comment 60•11 years ago
|
||
Ah I didn't realise they were available. We'll get onto it straight away!
Comment 61•11 years ago
|
||
The first failing m-i build is: https://tbpl.mozilla.org/?showall=1&tree=Mozilla-Inbound&rev=574ec49a35b8
Comment 62•11 years ago
|
||
(In reply to comment #61)
> The first failing m-i build is:
> https://tbpl.mozilla.org/?showall=1&tree=Mozilla-Inbound&rev=574ec49a35b8
Can you please continue bisecting the b2g-inbound range there?
Comment 63•11 years ago
|
||
Yep!
Comment 64•11 years ago
|
||
First failing b2g-i is: https://tbpl.mozilla.org/?showall=1&tree=B2g-Inbound&rev=95130f72756e
Not sure where to go from here, is there a greater resolution of builds I can test?
Comment 65•11 years ago
|
||
(In reply to comment #64)
> First failing b2g-i is:
> https://tbpl.mozilla.org/?showall=1&tree=B2g-Inbound&rev=95130f72756e
>
> Not sure where to go from here, is there a greater resolution of builds I can
> test?
This range includes commits from inbound and fx-team. I suspect the inbound commits are more interesting. I don't really know what the best way to use the inbound/b2g-inbound builds for bisection is but it looks like you're getting stuck in finding the regression on inbound and b2g-inbound back and forth. One option would be building and bisecting locally, I have no further ideas.
Comment 66•11 years ago
|
||
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #65)
> (In reply to comment #64)
> > First failing b2g-i is:
> > https://tbpl.mozilla.org/?showall=1&tree=B2g-Inbound&rev=95130f72756e
> >
> > Not sure where to go from here, is there a greater resolution of builds I can
> > test?
>
> This range includes commits from inbound and fx-team. I suspect the inbound
> commits are more interesting. I don't really know what the best way to use
> the inbound/b2g-inbound builds for bisection is but it looks like you're
> getting stuck in finding the regression on inbound and b2g-inbound back and
> forth. One option would be building and bisecting locally, I have no
> further ideas.
We shouldn't be bisecting b2g-inbound here - the m-c regression ranges points to a regression from a landing on mozilla-inbound, not b2g-inbound.
Comment 67•11 years ago
|
||
Weird. The mozilla-inbound push log indicates that this happened post merging m-c back into mozilla-inbound.
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=245d0cb5a7b3&tochange=574ec49a35b8
Comment 68•11 years ago
|
||
For the sake of my own sanity - can we confirm the m-c & mozilla-inbound ranges are correct with some more testing of the builds for last working & first failed?
Comment 69•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #68)
> For the sake of my own sanity - can we confirm the m-c & mozilla-inbound
> ranges are correct with some more testing of the builds for last working &
> first failed?
I've gone through a lot of the builds on the last-working side and refined the range down to:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=88ae1bfaaf3d&tochange=eb6da95b659e
I will move to m-i bisection tomorrow.
Updated•11 years ago
|
Summary: Tests are failing on Jenkins with timeout when trying to switch to the app frame → Tests are failing; app freezes after being launched
Comment 70•11 years ago
|
||
OK after comment #69 I changed my replication scenario to increase the number of repetitions to 10. Previously it was 5 and sometimes all these tests would pass giving a false positive.
After bisecting on b2g-i the range from comment #69 I have narrowed it down to one changeset:
http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=996fd5d8f606&tochange=0bc0f7044d51
which points to: https://bugzilla.mozilla.org/show_bug.cgi?id=936703
After getting this far I again went back and forward over the lw and ff and ran each again 10 times which confirmed my initial results.
Then I flashed with the latest nightly and re-ran my test scenario to be sure that the latest master is still hitting the problem and I confirmed that it is.
I reverted the Gaia commit landed in bug 936703 and reset-gaia and ran the test cases again 20 times, more than it usually takes to fail. I'm 100% sure this is the offending commit now.
Flags: needinfo?(mozbugs.retornam)
Comment 71•11 years ago
|
||
The offending commit:
https://github.com/mozilla-b2g/gaia/commit/de6c28247b128fcbd2bc8172fd77ced3eeb5872c
Comment 73•11 years ago
|
||
Vicamo, I've left the steps to replicate with Askeing and Eric if you want to replicate it locally.
Updated•11 years ago
|
blocking-b2g: 2.0? → 2.0+
Comment 74•11 years ago
|
||
I have following errors as well with bug 936703 reverted. I don't think those error messages mentioned above have anything to do with bug 936703. I'm going to run Zac's test script locally. Thank you.
E/GeckoConsole( 44): [JavaScript Warning: "mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]]
value using Object.create" {file: "resource://gre/modules/Preferences.jsm" line: 378}]
E/GeckoConsole( 44): [JavaScript Warning: "TypeError: mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[
Prototype]] value using Object.create" {file: "app://system.gaiamobile.org/js/browser_context_menu.js" line: 31}]
I/Gecko ( 44): *************************
I/Gecko ( 44): A coding exception was thrown and uncaught in a Task.
I/Gecko ( 44):
I/Gecko ( 44): Full message: TypeError: worker is null
I/Gecko ( 44): Full stack: post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:417:11
I/Gecko ( 44): TaskImpl_run@resource://gre/modules/Task.jsm:282:1
I/Gecko ( 44): TaskImpl@resource://gre/modules/Task.jsm:247:3
I/Gecko ( 44): createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:224:7
I/Gecko ( 44): Task_spawn@resource://gre/modules/Task.jsm:139:5
I/Gecko ( 44): post/<@resource://gre/modules/osfile/osfile_async_front.jsm:402:28
I/Gecko ( 44): Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:863:11
I/Gecko ( 44): this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:742:7
I/Gecko ( 44):
I/Gecko ( 44): *************************
E/GeckoConsole( 44): [JavaScript Error: "A promise chain failed to handle a rejection.
E/GeckoConsole( 44):
E/GeckoConsole( 44): Date: Wed Apr 16 2014 06:02:08 GMT+0000 (UTC)
E/GeckoConsole( 44): Full Message: TypeError: worker is null" {file: "resource://gre/modules/osfile/osfile_async_front.jsm" line: 417 column: 0 source: "417"}]
E/GeckoConsole( 169): Content JS ERROR at app://homescreen.gaiamobile.org/js/configurator.js:144 in loadSVConfFileError: Failed parsing singleVariant configuration file [js/singlevariantconf.json]: [Exception... "File error: Not found" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: app://homescreen.gaiamobile.org/js/configurator.js :: loadFile :: line 70" data: no]
E/GeckoConsole( 169): Content JS ERROR at app://homescreen.gaiamobile.org/js/icon_retriever.js:67 in onerror: error while HTTP GET: http://inapp-pay-test.paas.allizom.org/img/out-64.png
E/GeckoConsole( 169): Content JS ERROR at app://homescreen.gaiamobile.org/js/icon_retriever.js:67 in onerror: error while HTTP GET: app://music2.gaiamobile.org/style/icons/60/Music.png
Comment 75•11 years ago
|
||
Vicamo yes I agree that the error messages we were investigating in the past are unrelated.
When the app freezes it leave no error messages at all so it's quite confusing!
Comment 76•11 years ago
|
||
In this archive I placed four log files,
1) with bug 936703, run Gaia test_cold_launch.py in FTU,
2) with bug 936703, run Gaia test_cold_launch.py in Homescreen,
3) with bug 936703 REVERTED, run Gaia test_cold_launch.py in FTU,
4) with bug 936703 REVERTED, run Gaia test_cold_launch.py in Homescreen,
It's all based on https://github.com/zacc/gaia/tree/bug_990580-bisection . Commands executed:
shell-1$ cd $(B2G_HOME)
shell-1$ cd gaia; O=`git clean -d -x -n | cut -d ' ' -f 3 | cut -d/ -f 1 | sort -u | grep -v xulrunner`; [ -n "$O" ] && git clean -d -x -f $O; cd -
shell-1$ cd gaia/tests/python/gaia-ui-tests/; python setup.py develop; cd -
shell-1$ nice -n 19 ./build.sh && ./run-emulator.sh
shell-2$ adb forward tcp:2828 tcp:2828
shell-2$ (wait for either FTU or Homescreen)
shell-2$ gaiatest --address=localhost:2828 --testvars=`pwd`/testvars.json --restart --timeout 30000 --repeat 10 gaia/tests/python/gaia-ui-tests/gaiatest/tests/unit/test_cold_launch.py
I never have a successful run in any of them. All the test runs failed with timeout error. Please help point out if I'm wrong in some step.
Flags: needinfo?(vyang) → needinfo?(zcampbell)
Comment 77•11 years ago
|
||
"A timeout error" ? Please be more specific.
I mean this bug does cause a timeout error so you're either hitting the failure or you've done something else wrong.
Flags: needinfo?(zcampbell) → needinfo?(vyang)
Comment 78•11 years ago
|
||
The console would be more helpful than the logcat.
Comment 79•11 years ago
|
||
Flags: needinfo?(vyang)
Comment 80•11 years ago
|
||
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #79)
> Created attachment 8407404 [details]
> bug-990580_gaia-cold-launch_qemu-ics_console.txt
What device and Gaia build is this? It looks like Gaia is failing to start completely, possibly there is no profile or it's corrupted.
I think you should re-flash with the nightly build and just run the steps as you have.
Askeing can you help Vicamo to replicate this?
Flags: needinfo?(fyen)
Comment 81•11 years ago
|
||
(In reply to Zac C (:zac) from comment #80)
> (In reply to Vicamo Yang [:vicamo][:vyang] from comment #79)
> > Created attachment 8407404 [details]
> > bug-990580_gaia-cold-launch_qemu-ics_console.txt
>
> What device and Gaia build is this? It looks like Gaia is failing to start
> completely, possibly there is no profile or it's corrupted.
> I think you should re-flash with the nightly build and just run the steps as
> you have.
It's based on emulator-ics, Gecko m-c HEAD (http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=fa7c5ad357b11c1c11d71e14f28d97cc3c05ef74), Gaia https://github.com/zacc/gaia/tree/bug_990580-bisection . I can surely boot to FTU & Homescreen.
Comment 82•11 years ago
|
||
OK that's why it is timing out - emulator is very slow to load. I don't expect you can replicate this on the emulator build.
You need to use a Hamachi/Buri.
Comment 83•11 years ago
|
||
I don't think that's a valid explanation especially when emulator is mostly idle during the process. I'll still run the test on Nexus-5 again, which is the fastest device I have.
Comment 84•11 years ago
|
||
Ok you can try again with emulator but you need to increase the timeout on line 715 of gaia_test.py to around 120 seconds and on the command line use maybe --timeout=60000 because it is slower. Even then I can't be sure you'll replicate this issue as I've only tried on device!
Comment 85•11 years ago
|
||
We can replicate this issue on Buri. If Vicamo want to try other device, then I will try to help him to setup the environment.
Flags: needinfo?(fyen)
Comment 86•11 years ago
|
||
Thanks Askeing!
I think Buri/Hamachi is the most important because that's the device on MV and Tpe CIs where this is blocking us.
Comment 87•11 years ago
|
||
I can replicate this with a Nexus-5 on the first shot.
Attachment #8407396 -
Attachment is obsolete: true
Attachment #8407404 -
Attachment is obsolete: true
Comment 88•11 years ago
|
||
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #87)
> Created attachment 8408162 [details]
> bug-990580_gaia-cold-launch_nexus-5_adb-logcat.zip
>
> I can replicate this with a Nexus-5 on the first shot.
What do you mean, did the test fail with and without the patch applied?
Flags: needinfo?(vyang)
Comment 89•11 years ago
|
||
(In reply to Zac C (:zac) from comment #88)
> (In reply to Vicamo Yang [:vicamo][:vyang] from comment #87)
> > Created attachment 8408162 [details]
> > bug-990580_gaia-cold-launch_nexus-5_adb-logcat.zip
> >
> > I can replicate this with a Nexus-5 on the first shot.
>
> What do you mean, did the test fail with and without the patch applied?
Test pass with bug 936703 reverted; failed otherwise. However, I have only one success (with bug 936703 reverted) and one failure so far. Nexus-5 often fails to boot to homescreen.
Flags: needinfo?(vyang)
Comment 90•11 years ago
|
||
I think we can narrow down to |asyncStorage.getItem|. If I partially revert what's done to that in bug 936703, then that switchToFrame timeout can't be easily replicated.
Comment 91•11 years ago
|
||
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #90)
> I think we can narrow down to |asyncStorage.getItem|. If I partially revert
> what's done to that in bug 936703, then that switchToFrame timeout can't be
> easily replicated.
Thanks Vicamo that's positive news. How do you think we should proceed; a backout or can you fix it?
Comment 92•11 years ago
|
||
It's "apps/system/js/ftu_launcher.js". Somehow if the |asyncStorage.getItem| call in it only returns result on transaction oncomplete event, then the timeout comes back. My steps to reproduce:
1) add an |asyncStorage.getItemUnsafe| call that returns result in
IDBRequest.onsuccess event. => test PASS,
2) let every call to |asyncStorage.getItem| call
|asyncStorage.getItemUnsafe| instead. Now it's practically
identical to revert bug 936703. => test PASS,
3) call to |asyncStorage.getItem| in "apps/system/js/ftu_launcher.js"
=> test FAIL
Comment 93•11 years ago
|
||
(In reply to Zac C (:zac) from comment #91)
> (In reply to Vicamo Yang [:vicamo][:vyang] from comment #90)
> > I think we can narrow down to |asyncStorage.getItem|. If I partially revert
> > what's done to that in bug 936703, then that switchToFrame timeout can't be
> > easily replicated.
>
> Thanks Vicamo that's positive news. How do you think we should proceed; a
> backout or can you fix it?
You want a backout then you get a backout, I won't stop you from that, but I guess somebody else does. The problem is the unsafe IndexedDB usage in System App, not in those IndexedDB helpers.
Comment 94•11 years ago
|
||
Thanks for your excellent help Vicamo.
I won't force a backout, if it will only take a couple of days then we can wait to fix the unsafe usage in ftu_launcher.js. Let's file a bug for that (I'm supposed to be on holiday today!)
Comment 95•11 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=998255 is filed for the unsafe usage in the System app.
Comment 96•11 years ago
|
||
Something in this Gecko range has solved this bug:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5010b38abf18&tochange=78b0aaa91320
Just going to complete a bit more testing today and then will change the status of this bug.
Comment 97•11 years ago
|
||
Attachment #8410163 -
Flags: review?(florin.strugariu)
Updated•11 years ago
|
Attachment #8410163 -
Flags: review?(florin.strugariu) → review+
Comment 98•11 years ago
|
||
Comment 99•11 years ago
|
||
(In reply to Zac C (:zac) from comment #96)
> Something in this Gecko range has solved this bug:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=5010b38abf18&tochange=78b0aaa91320
>
> Just going to complete a bit more testing today and then will change the
> status of this bug.
This is working very reliably now. I don't think I have time to do a better analysis of how it was fixed.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•