Closed Bug 1064305 Opened 10 years ago Closed 9 years ago

TEST-UNEXPECTED-FAIL | /builds/slave/test/gaia/apps/system/fxa/test/marionette/fxa_screen_flow_test.js | Firefox Accounts Screen Flow Test (UITest app) should walk screen flow for existing user

Categories

(Firefox OS Graveyard :: FxA, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rpapa, Assigned: rpapa)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [b2g-crash])

Attachments

(4 files)

Marionette JS tests for Firefox Accounts fail intermittently (~50% of the time) due to timeout / element not found error.
Example: https://travis-ci.org/mozilla-b2g/gaia/jobs/34392829
Solution: Need to replace findElement w/ waitForElement
Reference: https://groups.google.com/forum/#!topic/mozilla.dev.gaia/Lfy2lHhjNTM
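To illustrate why swapping findElement for waitForElement can help with timing-dependent failures, here is a minimal, self-contained sketch. It is not the marionette-helper implementation: makeFakeClient, waitForElementSketch, the '#done-button' selector, and the attempt-count budget (standing in for a real time-based timeout) are all hypothetical and exist only for illustration.

```javascript
// Hypothetical stand-in for a client whose one-shot lookup throws
// until the element has "rendered" (here: after a number of calls).
function makeFakeClient(readyAfterCalls) {
  let calls = 0;
  return {
    findElement(selector) {
      calls += 1;
      if (calls < readyAfterCalls) {
        throw new Error('element not found: ' + selector);
      }
      return { selector: selector };
    }
  };
}

// Retry the one-shot lookup until it succeeds or the budget is spent.
// A real helper would bound this by elapsed time, not by attempt count.
function waitForElementSketch(client, selector, maxAttempts) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return client.findElement(selector);
    } catch (err) {
      lastError = err; // element not there yet; try again
    }
  }
  const reason = lastError ? lastError.message : 'no attempts made';
  throw new Error('timeout exceeded! last error: ' + reason);
}
```

With a fake client that only becomes ready on its third call, a single findElement('#done-button') throws, while waitForElementSketch(client, '#done-button', 5) returns the element. If the element never shows up within the budget, the wrapper still fails, which matches the "timeout exceeded!" style of failure seen when the underlying problem is not just timing.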
Assignee: nobody → rpappalardo
Attached file Git PR
Attachment #8485758 - Flags: review?(6a68)
Looks like it's still not passing on TBPL. Maybe the fixes help, but they haven't yet resolved the timeout issues: https://bugzilla.mozilla.org/show_bug.cgi?id=1064069
Perhaps I should remove the PR until I can get this to pass on TBPL?
Hey Richard,
I'd suggest you keep the PR open while you try to isolate the bug. That way, each time you push new changes, a new gaia-try job will be started for you. If you aren't able to reproduce the failures locally, you can ping dev-gaia or ask in #releng to try to figure out environment variables that differ.
If you'd like to kick off multiple gaia-try jobs, I am happy to vouch you for level 1 commit access, and explain how to create gaia-try test runs. Are you interested in this? If so, follow the instructions on the committer page[1]. You can ni? me in the bug to vouch for you (I need to bump my access level first[2], but I think we can get both done this week). The process for creating gaia-try test runs is documented at [3].
Also, a minor thing, but if you could write r=6a68, not r=_6a68, in the commit message, I'd prefer it; the underscore is only there for archaic systems (like bugzilla and IRC) which don't permit initial digits in usernames ^_^
Jared
[1] https://www.mozilla.org/hacking/committer/
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1064469
[3] https://wiki.mozilla.org/ReleaseEngineering/TryServer#Using_a_custom_Gaia (read the whole page for more context)
Comment on attachment 8485758: Git PR
Ah, but I will unset the r? flag until you've got the tests consistently green. Once they're ready, set r? again and I'll look.
Attachment #8485758 - Flags: review?(6a68)
Just for information: when all the integration tests take more than 2 hours to run (which they often do now), the stack traces for your test failures will get eaten by the entire test harness timing out. This makes it much harder to see what is going on, and this happened on your PR here:
https://tbpl.mozilla.org/?rev=6b8664dc03294cf285cb847173f9387c209505f2&tree=Gaia-Try
I retriggered it a few times, to try and get the stack trace. Most likely it is the same issue you mentioned in comment 0, which we've also seen on tbpl:
14:22:57 INFO - 1) Firefox Accounts Screen Flow Test (UITest app) should walk screen flow for new user:
14:22:57 INFO - Error: timeout exceeded!
14:22:57 INFO - at Object.Client.waitForSync (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:682:16)
14:22:57 INFO - at Object.Client.waitFor (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:650:60)
14:22:57 INFO - at Object.MarionetteHelper.waitForElement (/builds/slave/test/gaia/node_modules/marionette-helper/index.js:142:12)
14:22:57 INFO - at Object.FxA.clickDone (/builds/slave/test/gaia/apps/system/fxa/test/marionette/lib/fxa.js:130:12)
14:22:57 INFO - at Context.<anonymous> (/builds/slave/test/gaia/apps/system/fxa/test/marionette/fxa_screen_flow_test.js:44:9)
Disabled this test for now, please re-enable when you fix the intermittent. And feel free to flag me for help. master: https://github.com/mozilla-b2g/gaia/commit/4acd3e69b263b54f4111e3586ff4ade84b49b4da
Hey Jared,
> If you'd like to kick off multiple gaia-try jobs, I am happy to vouch you for level 1 commit access,
> and explain how to create gaia-try test runs. Are you interested in this?
Yes, thanks very much. I'm interested in anything I can do to troubleshoot testcases more quickly. I'm not exactly sure how this could help other than enabling me to push directly to gaia-try (removing the github middleman, I assume). I've sent in a committer's agreement and filed a defect for level 1 access.
https://bugzilla.mozilla.org/show_bug.cgi?id=1064547
Also, I'll use r=6a68 going forward :)
Flags: needinfo?(6a68)
(In reply to Michael Henretty [:mhenretty] from comment #6)
Hey Michael, thanks for all the info!
> Just for information: when all the integration tests take more than 2 hours
> to run (which they often do now), the stack traces for your test failures
> will get eaten by the entire test harness timing out. This makes it much
> harder to see what is going on, and this happened on your PR here:
Since that's the case, would you by chance have any suggestions on how I might further isolate this? It seems like there are several Marionette methods that accomplish the same thing, like findElement & waitForElement; however, I'm getting the feeling there's a set of 'best practices' for making testcases robust enough to run on TBPL.
If I'm able to get commit access, will I then be able to do gaia-try test runs for this 1 testcase and thus won't run into the 2 hour limit (and thus, retain the stack trace)?
Just trying to figure out where to go from here. Thanks!
Flags: needinfo?(mhenretty)
(In reply to Richard Pappalardo from comment #9)
> Since that's the case, would you by chance have any suggestions how I might
> further isolate this? It seems like there are several Marionette methods
> that accomplish the same thing like: findElement & waitForElement, however,
> I'm getting the feeling there's a 'best practices' for making testcases
> robust enough to run on TBPL.
Unfortunately there is no definitive guide, but there is some helpful information sprinkled throughout the internet. Here is a good starting place, I think:
https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_integration_tests#Use_cases_and_design_patterns
> If I'm able to get commit access, will I then be able to do gaia-try test
> runs for this 1 testcase and thus won't run into the 2 hour limit (and thus,
> retain the stack trace)?
Level 1 access will only give you the ability to re-run ALL the Gij for a particular pull request on Gaia-Try. You cannot re-run individual tests using the tbpl web interface (at least not until we get bug 1046694). So for now, you are stuck waiting ~2 hours for each re-run, and hoping it won't cross the two hour threshold. If you really want to have quicker runs, you could go through each app besides system and delete the test/marionette folders, but that sounds like more of a PITA than it's worth.
Flags: needinfo?(mhenretty)
> I'm not exactly sure how this could help other than
> enabling me to push directly to gaia-try (removing the github middleman, I
> assume).
Gaia-try access lets you run multiple tests in parallel, so that you could kick off (say) 20 test runs with and without a patch applied, then use statistics to decide if the measured change in stability is significant. I resorted to this statistical approach when stabilizing Persona's tests, as I couldn't figure out any other way to decide if a given patch was worth merging.
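One way to put numbers on the "with and without a patch" comparison is a one-sided Fisher exact test on the pass/fail counts of the two sets of runs. This is only a sketch under assumed inputs: the comment above does not say which statistic was used for Persona, and the function names and example counts below are hypothetical.

```javascript
// log(n!) by direct summation; fine for the small run counts used here.
function logFactorial(n) {
  let sum = 0;
  for (let i = 2; i <= n; i++) sum += Math.log(i);
  return sum;
}

// log of the binomial coefficient C(n, k).
function logChoose(n, k) {
  return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// Hypergeometric probability of one 2x2 table with fixed margins:
// a = patched failures, b = patched passes,
// c = baseline failures, d = baseline passes.
function tableProbability(a, b, c, d) {
  return Math.exp(
    logChoose(a + b, a) + logChoose(c + d, c) -
    logChoose(a + b + c + d, a + c)
  );
}

// One-sided Fisher exact test: probability of seeing this few (or fewer)
// failures in the patched runs if the patch made no difference at all.
function fisherOneSided(patchedFail, patchedPass, baseFail, basePass) {
  const totalFail = patchedFail + baseFail;
  let p = 0;
  for (let a = patchedFail; a >= 0; a--) {
    const b = patchedPass + (patchedFail - a); // patched run count is fixed
    const c = totalFail - a;                   // total failures are fixed
    const d = basePass - (c - baseFail);       // baseline run count is fixed
    if (d < 0) break;                          // table no longer possible
    p += tableProbability(a, b, c, d);
  }
  return p;
}
```

For example, with 20 runs per arm, a baseline that fails 10 times and a patched build that fails 2 times gives fisherOneSided(2, 18, 10, 10) well below the usual 0.05 threshold, whereas identical results in both arms give a value above 0.5, i.e. no evidence that the patch helped.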
Flags: needinfo?(6a68)
Awesome, Jared. Thanks!
(In reply to Michael Henretty [:mhenretty] from comment #7)
> Disabled this test for now, please re-enable when you fix the intermittent.
> And feel free to flag me for help.
>
> master:
> https://github.com/mozilla-b2g/gaia/commit/4acd3e69b263b54f4111e3586ff4ade84b49b4da
Looks like the "disable" is not working. The test still ran in the current master branch, see the job[1]. We could disable the tests with the `skip` method.
[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=47754558&tree=Gaia-Try#error0
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=47751426&tree=Gaia-Try#error0
And it looks like the failure is in the test code of the fxa module.
Component: Gaia::UI Tests → Gaia::System
Summary: [Marionette] FxA tests - intermittent timeout failures on system/fxa/test/marionette/fxa_screen_flow_test.js → TEST-UNEXPECTED-FAIL | /builds/slave/test/gaia/apps/system/fxa/test/marionette/fxa_screen_flow_test.js | Firefox Accounts Screen Flow Test (UITest app) should walk screen flow for existing user
Attached file Disable the test
Disable the tests, please feel free to re-enable them once we fix the issue.
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #13)
> Looks like the "disable" is not working.
> The test still ran in current master branch, see the job[1].
> We could disable the tests with `skip` method.
My bad! I had the wrong path to the file. Let's correct this so that we are only disabling it in one place (i.e. using the manifest).
master: https://github.com/mozilla-b2g/gaia/commit/6d54cd7f6a04a33aca3882762a29e674d6553035
Richard, when you go to re-enable, please just remove it from the manifest files. I already removed the test.skip() parts. Sorry about the confusion.
This should belong to FxA module if the failure is caused by the test code itself.
Component: Gaia::System → FxA
(In reply to Michael Henretty [:mhenretty] from comment #17)
> (In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #13)
> Richard, when you go to re-enable, please just remove it from the manifest
> files. I already removed the test.skip() parts. Sorry about the confusion.
Thanks, Michael, Evan! Will do.
Hey Richard, How's it going? Would you like me to take this bug off your hands?
Flags: needinfo?(rpappalardo)
Hi Jared, sorry, I got pulled into new projects and our team is down by 2 people right now, but I did schedule time this Friday to pick it back up again come hell or high water. If you're hurting for automation coverage then by all means, feel free to take it. If not, however, I'm determined to get it out the door either way :) Your notes on getting set up with gaia-try were extremely helpful, btw.
Flags: needinfo?(rpappalardo)
Hey Jared,
Looking at your etherpad: https://id.etherpad.mozilla.org/intro-to-gaia-20140303?
I've gotten down to line #129, and have just one question:
> Inside gecko-dev, make a new branch corresponding to your gaia branch
Do you mean, git clone my gaia branch over gecko-dev/b2g/gaia? The rest of the steps seem to be fairly clear for doing gaia-try
Flags: needinfo?(6a68)
Richard and I chatted about this on Vidyo.
Flags: needinfo?(6a68)
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Fixing this would be great, so that we have integration tests in the context of bug 1150823.
Status: RESOLVED → REOPENED
Flags: needinfo?(mbdejong)
Resolution: WORKSFORME → ---
Blocks: 1150823
We're aware of the issues breaking both of these and are working on a fix.
* fxa_screen_flow_test
* fxa_launch_test
see: Bug 1189144 - Intermittent fxa_launch_test.js | Firefox Accounts Launch Tests Should launch FxA flow from FxA-consuming apps: Settings/UITest/test-fxa-client app
(In reply to Richard Pappalardo [:rpapa][:rpappalardo] from comment #27)
> working on a fix.
Awesome! @gerard-majax, I'll stay out of the kitchen, then. :) I just started working on the FxOS-FxA code and I'm eager to learn how it all works, so do let me know if I can help with anything in the future.
Flags: needinfo?(mbdejong)
Depends on: 1205918
A fix has been submitted. Just waiting on review: https://github.com/mozilla-b2g/gaia/pull/31922
(In reply to OrangeFactor Robot from comment #129)
> 86 automation job failures were associated with this bug in the last 7 days.
>
> Repository breakdown:
> * mozilla-inbound: 34
> * fx-team: 20
> * b2g-inbound: 18
> * mozilla-central: 9
> * try: 5
>
> Platform breakdown:
> * b2g-linux64: 83
> * mulet-linux64: 3
We are still seeing this test fail in automation regularly. Richard, is there anyone still looking into this?
Flags: needinfo?(rpappalardo)
(In reply to Michael Henretty [:mhenretty] from comment #130)
> (In reply to OrangeFactor Robot from comment #129)
> > 86 automation job failures were associated with this bug in the last 7 days.
> >
> > Repository breakdown:
> > * mozilla-inbound: 34
> > * fx-team: 20
> > * b2g-inbound: 18
> > * mozilla-central: 9
> > * try: 5
> >
> > Platform breakdown:
> > * b2g-linux64: 83
> > * mulet-linux64: 3
>
> We are still seeing this test fail in automation regularly. Richard, is
> there anyone still looking into this?
Hi Michael, YiFan,
We merged a PR to gaia last week to fix these intermittent failures on apps/system/fxa. I was surprised when I noticed the same failures persisting, but it looks like the tv_apps folks merged these testcases into tv_apps/smart-system/fxa/test prior to us committing our fixes.
see: Bug 1202360 - Bring Firefox Account handling modules back to TV system
YiFan, we have some fixes in apps/system/fxa. Can you copy them over to your tests?
Flags: needinfo?(rpappalardo) → needinfo?(yliao)
Yes, thank you! I'll make a patch for this.
Flags: needinfo?(yliao)
Depends on: 1221390
Pretty sure this was fixed with YiFan's patch in bug 1221390. Let's close this for now and re-open if OrangeFactorBot keeps yelling at us.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Michael Henretty [:mhenretty] from comment #133)
> Pretty sure this was fixed with YiFan's patch in bug 1221390. Let's close
> this for now and re-open if OrangeFactorBot keeps yelling at us.
That's good news! Thanks, Michael, YiFan. We'll keep an eye on it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
YiFan, Richard, any ideas why this is still failing?
Flags: needinfo?(yliao)
Flags: needinfo?(rpappalardo)
(In reply to Michael Henretty [:mhenretty] from comment #138)
> YiFan, Richard, any ideas why this is still failing?
Not sure how to repro these kinds of failures. They pass for us locally 100% of the time. I've seen a thread on dev-gaia indicating that many intermittents may possibly be caused by the framework itself (rather than the tests). Do you know anything about this?
Flags: needinfo?(rpappalardo) → needinfo?(mhenretty)
(In reply to Richard Pappalardo [:rpapa][:rpappalardo] from comment #140)
> (In reply to Michael Henretty [:mhenretty] from comment #138)
> > YiFan, Richard, any ideas why this is still failing?
>
> Not sure how to repro these kinds of failures. They pass for us locally 100%
> of the time. I've seen a thread on dev-gaia indicating that many
> intermittents may possibly be caused by the framework itself (rather than
> the tests). Do you know anything about this?
Yeah, there was a time in the not too distant past where our framework had issues and would cause all tests to fail sporadically. It looks like that has been cleared up, and since this test specifically fails often, I would say the problem is with the test itself. We should figure out why it's only the TV app test that fails and not the regular one; that's a big hint anyway. If we can't figure this out, I'll probably just disable this test next week since it's duplicated anyway.
Flags: needinfo?(mhenretty)
The error ( https://tools.taskcluster.net/task-inspector/#V1NXYCtWTIC4lq9PxkuNLA/4 ) looks the same as the errors we saw on the tv marionette tests after migrating marionette tests to mulet, if I remember correctly. It would probably be resolved if we figure out why tv marionette tests fail. Agreed that we could skip it for now.
Flags: needinfo?(yliao)
Blocks: 1227527
Attachment #8706294 - Flags: review?(mhenretty)
Comment on attachment 8706294: [gaia] ferjm:bug1064305.disable.marionette.fxa.tv > mozilla-b2g:master
That's fine with me, but are we still seeing this test fail? Looks like it hasn't been reported on treeherder since Jan 1.
Attachment #8706294 - Flags: review?(mhenretty) → review+
Adding Phil in case this is relevant for his FxA device-registration patch
The thing is that these tests are basically testing what the ones at apps/system/fxa/test/marionette/fxa_screen_flow_test.js are already testing [1]. So I'd prefer to disable them until bug 1238505 is fixed and we can properly test the FxA TV code.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1238505#c1
I am closing this one as resolved fixed because the phone tests haven't failed since Jan 1 and this commit disables the ones for the TV. https://github.com/mozilla-b2g/gaia/commit/9cd0e343b833b680deb5e4d18f5a43168703fe9b I'll enable the tests again on bug 1238505.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED