921194 - TypeError: stack is undefined from trusted_ui.js, line 303

Malini Das [:mdas] - Away, not checking bugmail

Reporter

Description

•

12 years ago

In our automation, we call WindowManager's kill command on all applications (except homescreen), and this has worked up until recently, when we started to see: 3:53:14 TEST-UNEXPECTED-FAIL | test_settings_device_info.py TestSettingsDeviceInfo.test_that_device_info_is_populated | JavascriptException: TypeError: stack is undefined 13:53:14 stacktrace: 13:53:14 @app://system.gaiamobile.org/js/trusted_ui.js, line 303 13:53:14 ------------------------------------------------------------------ This is happening frequently, as you can see in some of the gaia-ui-test automation: http://qa-selenium.mv.mozilla.com:8080/view/B2G Hamachi/job/b2g.hamachi.mozilla-central.ui/83/console There is no reduced test case we can think of right now, we only see this with central hamachi, running the unit test suite (where you'd need to set up your environment and testvars file according to https://github.com/mozilla-b2g/gaia/tree/master/tests/python/gaia-ui-tests): gaiatest --testvars=/home/webqa/webqa-credentials/b2g/b2g-2.json --address=localhost:2828 --type=b2g-xfail-unit --restart gaiatest/tests/manifest.ini My guess is that it's from this change: https://github.com/mozilla-b2g/gaia/commit/2e417d84777e3cfc974da1fa4745a2fbf9ad32aa

Malini Das [:mdas] - Away, not checking bugmail

Reporter

Comment 1

•

12 years ago

Or you know, maybe not that change, that was from 11 months ago.

Jason Smith [:jsmith]

Comment 2

•

12 years ago

Alive has done a bunch of changes on this recently, so he probably has some input as to why this is happening.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

12 years ago

Blocks: 801898

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 3

•

12 years ago

This does happen before my change at random rate, and that's why I filed https://bugzilla.mozilla.org/show_bug.cgi?id=911880 because trustedUI has some problem by maintain the app stack.

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 4

•

12 years ago

Could you tell me how to run your automation? (In reply to Malini Das [:mdas] from comment #0) > In our automation, we call WindowManager's kill command on all applications > (except homescreen), and this has worked up until recently, when we started > to see:

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Updated

•

12 years ago

Assignee: nobody → alive

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 5

•

12 years ago

Note: See https://bugzilla.mozilla.org/show_bug.cgi?id=863117#c0 (In reply to Alive Kuo [:alive] from comment #4) > Could you tell me how to run your automation? > > (In reply to Malini Das [:mdas] from comment #0) > > In our automation, we call WindowManager's kill command on all applications > > (except homescreen), and this has worked up until recently, when we started > > to see:

Jason Smith [:jsmith]

Comment 6

•

12 years ago

(In reply to Alive Kuo [:alive] from comment #4) > Could you tell me how to run your automation? https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/README.md could provide information on how to run the tests.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 7

•

12 years ago

The b2gperf tests are also affected by this, causing all following tests to be aborted.

Severity: normal → blocker

Whiteboard: [qa-automation-blocked]

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 8

•

12 years ago

See http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Perf/job/b2g.inari.mozilla-central.master.perf/335/console (requires VPN)

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

12 years ago

Blocks: 921385

Stephen Donner [:stephend] Not actively reading bugmail

Comment 9

•

12 years ago

Bug 905116 was mentioned by :lightsofapollo as a /possible/ suspect, but nobody's yet had a chance to try an on-device backout and re-test.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 10

•

12 years ago

For what it's worth, in the performance startup tests this appears to almost always fail when starting the clock test. I haven't been able to narrow this down yet, but it might be related to launching the browser app 30 times between restarts, which immediately precedes the clock launches. I've added some extra resilience to b2gperf to allow it to continue after this, but it means we're losing measurements for clock at this time.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 11

•

12 years ago

Possibly a coincidence, but the b2gperf test after bug 920944 was fixed passed. Perhaps that had further implications that just the ones seen there.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 12

•

12 years ago

(In reply to Dave Hunt (:davehunt) from comment #11) > Possibly a coincidence, but the b2gperf test after bug 920944 was fixed > passed. Perhaps that had further implications that just the ones seen there. Or not... The Hamachi build with the same revision failed. I have replicated the failure locally but haven't had time to see if I can reduce it down. It currently takes a couple of hours before the failure occurs, however I should be able to test if backing out bug 905116 locally helps.

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 13

•

12 years ago

This does happen occasionally before bug 905116 landed. I don't think back-out that works.

Kyle Wooten

Comment 14

•

12 years ago

Thanks!

Zac C (:zac)

Comment 15

•

12 years ago

Can we just fix it for good instead of a backout? This is really affecting a lot of our automation, sometimes greater than 10% of tests. The failure seems to rely on perfect alignment of this Exception and marionette running execute_aync_script which is why it comes as so intermittent.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

12 years ago

Blocks: 925494

Jason Smith [:jsmith]

Updated

•

12 years ago

No longer blocks: 801898

Jason Smith [:jsmith]

Updated

•

12 years ago

Blocks: 801898

Ben Kelly [:bkelly, not reviewing]

Comment 16

•

12 years ago

I've reproduced locally and noticed this in the logcat just after the trusted UI error: E/GeckoConsole(30228): Content JS LOG at dummy file:971 in GaiaApps.kill: terminating app with origi n 'app://calendar.gaiamobile.org' E/GeckoConsole(30228): Content JS LOG at dummy file:961 in GaiaApps.kill/gt_onAppTerminated/<: app with origin 'app://calendar.gaiamobile.org' has terminated This doesn't appear when we are successful. Doesn't fully explain whats going on, but perhaps a clue.

Ben Kelly [:bkelly, not reviewing]

Comment 17

•

12 years ago

I hacked apps/calendar/js/controllers/alarm.js to always use `duration=1` in _resetSyncAlarm. This forces the calendar sync process to launch every minute. With this in place I can reliably trigger the trusted UI failure by running b2gperf. So it definitely seems related to trying to kill the Calendar app launched from a mozAlarm.

Ben Kelly [:bkelly, not reviewing]

Comment 18

•

12 years ago

Ok, I see what is happening here. For the normal app life cycle trusted_ui.js sets the app origin to `this._lastDisplayedApp` when it receives the `appwillopen` event. This then automatically creates an entry for that origin in the `this._displayedStacks` hash when you access `this.currentStack`. This works out for normal apps because the `appterminated` event will have the same matching origin and will be found in `this._displayedStacks`. In the case of non-normal app launches, for example from mozAlarm, then you may never get `appwillopen`. In this case `this._lastDisplayedApp` is never set which results in a stack being created at `this._displayedStacks["null"]`. Since there is no entry for the origin from the `appterminated` event, no stack is found and the type error occurs. I'm not sure what the correct fix is here. Perhaps something as simple as: if (origin && this._displayedApps[origin]) { stack = this._displayedApps[origin]; } In any case, I really think we should add a test case handling the non-normal app path here. So a test that sets a very quick mozAlarm and then tries to kill it. Since this is such an outlier case that will only happen infrequently it will be easy to regress on this. Alive, what do you think is the best path forward?

Flags: needinfo?(alive)

Ben Kelly [:bkelly, not reviewing]

Comment 19

•

12 years ago

As an aside, should the b2gperf and gaia-ui-tests clear all mozAlarms? On the one hand, it would produce more consistent results, but on the other hand we need the system function well with alarms running. Thoughts?

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 20

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #18) > Ok, I see what is happening here. > > For the normal app life cycle trusted_ui.js sets the app origin to > `this._lastDisplayedApp` when it receives the `appwillopen` event. This > then automatically creates an entry for that origin in the > `this._displayedStacks` hash when you access `this.currentStack`. This > works out for normal apps because the `appterminated` event will have the > same matching origin and will be found in `this._displayedStacks`. > > In the case of non-normal app launches, for example from mozAlarm, then you > may never get `appwillopen`. In this case `this._lastDisplayedApp` is never > set which results in a stack being created at > `this._displayedStacks["null"]`. Since there is no entry for the origin > from the `appterminated` event, no stack is found and the type error occurs. > > I'm not sure what the correct fix is here. Perhaps something as simple as: > > if (origin && this._displayedApps[origin]) { > stack = this._displayedApps[origin]; > } > > In any case, I really think we should add a test case handling the > non-normal app path here. So a test that sets a very quick mozAlarm and > then tries to kill it. Since this is such an outlier case that will only > happen infrequently it will be easy to regress on this. > > Alive, what do you think is the best path forward? Much thanks for spending time on this. The proposal is fine. The said mozAlarm should be invoked by system message, and window manager put the invoked frame in background. In current stage, I would like to have 'appcreated' event in window manager when a app frame is created. (we have similar event for homescreen app already, see https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/homescreen_window.js#L5). |created| would be paired with |terminated| event doubtless. Ben, do you like to steal this bug or want me to complete it? Thank you again for investigating!

Flags: needinfo?(alive)

Ben Kelly [:bkelly, not reviewing]

Comment 21

•

12 years ago

Alive, can you complete? I'm not that familiar with this part of the code and I still have a number of 1.2 launch/fps bugs to deal with. Thanks!

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 22

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #21) > Alive, can you complete? I'm not that familiar with this part of the code > and I still have a number of 1.2 launch/fps bugs to deal with. Thanks! Sure, thank you for fast reply!

(inactive after 6/18) Alive Kuo [:alive]

Assignee

Comment 23

•

12 years ago

Attached file https://github.com/mozilla-b2g/gaia/pull/12804 — Details

Fix async app life state in trustedUI

Attachment #816325 - Flags: review?(timdream)

Tim Guan-tin Chien [:timdream] (please needinfo)

Updated

•

12 years ago

Attachment #816325 - Flags: review?(timdream) → review+

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 24

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #19) > As an aside, should the b2gperf and gaia-ui-tests clear all mozAlarms? On > the one hand, it would produce more consistent results, but on the other > hand we need the system function well with alarms running. Thoughts? We're seeing this in CI after flashing the device, so there shouldn't be any mozAlarms present (should there?). Also, the gaia-ui-tests run with --restart, which removes all persistent storage between tests, so there should be no mozAlarms there either...

Zac C (:zac)

Comment 25

•

12 years ago

Ben and Alive, thanks a lot for your work on this bug!

Ben Kelly [:bkelly, not reviewing]

Comment 26

•

12 years ago

(In reply to Dave Hunt (:davehunt) from comment #24) > We're seeing this in CI after flashing the device, so there shouldn't be any > mozAlarms present (should there?). Also, the gaia-ui-tests run with > --restart, which removes all persistent storage between tests, so there > should be no mozAlarms there either... Dave, are you sure we wipe the mozAlarm database between runs? I was seeing the calendar alarms persist across the b2gperf executed restart locally. I suspect that the alarm was being added after the flash when the Calendar set of tests run. So the failure probably occurs around 15 minutes after those Calendar tests.

Ben Kelly [:bkelly, not reviewing]

Comment 27

•

12 years ago

Thanks for the quick fix! Since we think this might address the automation breakage I took the liberty of merging to get it into the build queue faster: https://github.com/mozilla-b2g/gaia/commit/19cbc0a11fa3f17908027569fdced8421d03ee0f

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 28

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #26) > (In reply to Dave Hunt (:davehunt) from comment #24) > > We're seeing this in CI after flashing the device, so there shouldn't be any > > mozAlarms present (should there?). Also, the gaia-ui-tests run with > > --restart, which removes all persistent storage between tests, so there > > should be no mozAlarms there either... > > Dave, are you sure we wipe the mozAlarm database between runs? I was seeing > the calendar alarms persist across the b2gperf executed restart locally. > > I suspect that the alarm was being added after the flash when the Calendar > set of tests run. So the failure probably occurs around 15 minutes after > those Calendar tests. We flash between runs, so everything should be wiped. We don't clear between apps within the b2gperf run though, and the failure is after the Calendar launch. I don't understand why just opening the Calendar app would add alarms though...

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 29

•

12 years ago

I've raised bug 926454 to clear up between app launches in b2gperf.

Zac C (:zac)

Comment 30

•

12 years ago

Sorry I forgot to mention that I did check this earlier. I ran some gaia-ui-tests clock tests on a device on a loop and they definitely weren't being kept between test runs. I think the tests would have failed long ago if that were the case anyway.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 31

•

12 years ago

Zac: Apparently just launching Calendar creates an alarm for the syncing service ~15 minutes later. This explains the b2gperf issues but not necessarily the Gaia UI tests issues. I wonder if there's a different app other than Calendar that could be involved, or if we're somehow not clearing that particular mozAlarm...

Zac C (:zac)

Comment 32

•

12 years ago

oh pardon me, I didn't understand the STR then!