Closed Bug 1054793 Opened 5 years ago Closed 5 years ago

Intermittent test_shared_all.py TestSharedUnits.test_units | NoSuchElementException: NoSuchElementException: Unable to locate element: complete

Categories

(Hello (Loop) :: Client, defect, P1)

x86
Linux
defect

Tracking

(firefox32 unaffected, firefox33 fixed, firefox34 fixed, firefox-esr24 unaffected, firefox-esr31 unaffected)

RESOLVED FIXED
mozilla34
Iteration:
34.3
Tracking Status
firefox32 --- unaffected
firefox33 --- fixed
firefox34 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: jib)

References

Details

(Keywords: intermittent-failure)

Attachments

(4 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=46132787&tree=Fx-Team

Ubuntu VM 12.04 fx-team debug test marionette on 2014-08-17 13:46:54 PDT for push 37ac55a26014
slave: tst-linux32-spot-517

14:00:50     INFO -  TEST-START | test_shared_all.py TestSharedUnits.test_units
14:01:04    ERROR -  TEST-UNEXPECTED-ERROR | test_shared_all.py TestSharedUnits.test_units | NoSuchElementException: NoSuchElementException: Unable to locate element: complete
14:01:04     INFO -  Traceback (most recent call last):
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 171, in run
14:01:04     INFO -      testMethod()
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/tests/browser/components/loop/test/shared/test_shared_all.py", line 16, in test_units
14:01:04     INFO -      self.check_page("index.html")
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/tests/browser/components/loop/test/shared/frontend_tester.py", line 89, in check_page
14:01:04     INFO -      self.marionette.find_element("id", 'complete')
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 1274, in find_element
14:01:04     INFO -      response = self._send_message('findElement', 'value', **kwargs)
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/decorators.py", line 35, in _
14:01:04     INFO -      return func(*args, **kwargs)
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 634, in _send_message
14:01:04     INFO -      self._handle_error(response)
14:01:04     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 668, in _handle_error
14:01:04     INFO -      raise errors.NoSuchElementException(message=message, status=status, stacktrace=stacktrace)
14:01:04     INFO -  TEST-INFO expected PASS | took 13406ms
There was substantial discussion and debugging about this yesterday in #ateam, starting at

http://logs.glob.uno/?c=mozilla%23ateam&s=18+Aug+2014&e=18+Aug+2014#c843455

And pretty much any pieces of the chat log that I'm involved up to and including 22:02.

The try server runs that chmanster mentioned pushing did not yield any assertions.  A reasonable next step might be to either push that to mozilla-central since it might only assert in the failing case.  Even better would be to do something like I describe starting around 21:54.
Randell, Jan-Ivar -- I'd like to find a regression window for this bug and then see if we can eliminate this intermittent.
Target Milestone: --- → mozilla34
Priority: -- → P1
Jib -- Can you look at this?  First we want to find regression window and then find the root cause.  We suspect that Standard8's checkin of the initial web sockets work (bug 1022594) on Friday caused this intermittent, but we don't know.  Standard8 is on PTO until Monday and dmose is unavailable for questions until tomorrow (Thursday).  However, please feel free to ask for support in #media.
Assignee: nobody → jib
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #12)

Sorry for leaving this hanging. The IRC chat in comment 7 suggests fixing the logging harness to emit the AssertionError message that would tell us which assertion failed on future runs on tbpl (though I see no more TBPL Robot messages here since Tuesday) - I don't have access rights to push to inbound as suggested nor do I have a good grasp on the marionette harness, so I may not be the best person to pursue that avenue.

However, it reproduces well locally for me (but again without assertion message):
> ★ ~/moz/mozilla-central $ ./mach marionette-test browser/components/loop/manifest.ini
>  ...
>  0:11.45 LOG: MainThread INFO starting httpd
>  0:11.45 LOG: MainThread INFO running webserver on http://127.0.0.1:56581/
>  0:11.45 SUITE_START: MainThread 3
>  0:11.46 TEST_START: MainThread test_shared_all.py TestSharedUnits.test_units
>  0:27.58 TEST_END: MainThread ERROR, expected PASS
> Traceback (most recent call last):
>   File "/Users/Jan/moz/mozilla-central/testing/marionette/client/marionette/marionette_test.py", line 171, in run
>     testMethod()
>   File "/Users/Jan/moz/mozilla-central/browser/components/loop/test/shared/test_shared_all.py", line 16, in test_units
>     self.check_page("index.html")
>   File "/Users/Jan/moz/mozilla-central/browser/components/loop/test/shared/frontend_tester.py", line 89, in check_page
>     self.marionette.find_element("id", 'complete')
>   File "/Users/Jan/moz/mozilla-central/testing/marionette/client/marionette/marionette.py", line 1278, in find_element
>     response = self._send_message('findElement', 'value', **kwargs)
>   File "/Users/Jan/moz/mozilla-central/testing/marionette/client/marionette/decorators.py", line 35, in _
>     return func(*args, **kwargs)
>   File "/Users/Jan/moz/mozilla-central/testing/marionette/client/marionette/marionette.py", line 638, in _send_message
>     self._handle_error(response)
>   File "/Users/Jan/moz/mozilla-central/testing/marionette/client/marionette/marionette.py", line 672, in _handle_error
>     raise errors.NoSuchElementException(message=message, status=status, stacktrace=stacktrace)
> NoSuchElementException: NoSuchElementException: Unable to locate element: complete

But I'm not sure, is it possible to run marionette tests on mozregression builds?
Flags: needinfo?(dmose)
As I mentioned in the linked discussion on Monday, I haven't reproduced a case where the test harness omits relevant logging. The log pasted above and the one associated with the failure in automation look to me like an assertion originating from python that logs a relevant call stack and message.

My hypothesis wasn't proved by the push to try, and relies on the idea that somewhere in the test run the current working directory is intermittently changed to something unexpected, so this may be a long shot.
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #12)
> We suspect that Standard8's checkin of the
> initial web sockets work (bug 1022594) on Friday caused this intermittent,
> but we don't know.

Ok, useful data point... starting to try and link some dots...

(In reply to Jan-Ivar Bruaroey [:jib] from comment #13)
> However, it reproduces well locally for me (but again without assertion
> message):

Ok, I think the assertion message is a separate bug/issue.


> > NoSuchElementException: NoSuchElementException: Unable to locate element: complete

This could be the key. With this message, there's two possible issues that I know of:

1) The page is taking too long to process, and hits the 10 second timeout.
2) There's been a fatal error on the page.

Jib, as you can reproduce, can you try a couple of things:

1) Set the timeout in this line to something like 60 seconds (or even longer maybe): http://hg.mozilla.org/mozilla-central/annotate/daa84204a11a/browser/components/loop/test/shared/frontend_tester.py#l66

See if that removes the issue.

2) Grab a screenshot of the failing case.

- In http://hg.mozilla.org/mozilla-central/annotate/daa84204a11a/browser/components/loop/test/shared/frontend_tester.py#l89
- Wrap the find_element('complete') in a try..except..raise with dumping a screenshot, similar to:

try:
    self.marionette.find_element("id", 'complete')
except:
    print "data:image/png;base64,%s" % self.marionette.screenshot()
    raise

> But I'm not sure, is it possible to run marionette tests on mozregression
> builds?

Lets see what the above give us first. To explain the dots I'm starting to link:

If "complete" is missing, the likelyhood is that something has gone wrong in the page or timed out. Bug 1022594 added some new code, that was working fine locally, but did introduce ~150ms delays into the test occasionally.

test_shared_all.py is the first test to run in the sequence of tests for the Loop folder (though I don't know how Marionette runs wrt browser loads/restarts on tbpl). If the browser is slow starting up and was previously on the edge, then a 150ms delay or two could push us over the 10 seconds that we wait for "complete" to be displayed.
Meant to needinfo Jib on that last comment.
Flags: needinfo?(dmose) → needinfo?(jib)
Blocks: 1058038
(In reply to Mark Banner (:standard8) from comment #15)
> 1) The page is taking too long to process, and hits the 10 second timeout.
> 2) There's been a fatal error on the page.
> 
> Jib, as you can reproduce, can you try a couple of things:
> 
> 1) Set the timeout in this line to something like 60 seconds (or even longer
> maybe):
> http://hg.mozilla.org/mozilla-central/annotate/daa84204a11a/browser/
> components/loop/test/shared/frontend_tester.py#l66
> 
> See if that removes the issue.

That was totally it! On my machine the first test takes around 17s, and this timed out at around 12s. I need a faster machine apparently. :-)

Is 60 the number we want?
Flags: needinfo?(jib) → needinfo?(standard8)
Comment on attachment 8478299 [details] [diff] [review]
increase marionette find_element timeout for intermittent

Review of attachment 8478299 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I think 60 is reasonable here as it is a timeout value.
Attachment #8478299 - Flags: review?(standard8) → review+
Flags: needinfo?(standard8)
Attached image snapshot1
Attached image snapshot2
https://hg.mozilla.org/mozilla-central/rev/78a46df3e3bd
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Thanks for the fix, jib!  One not is that the comment above the statement you changed has become wrong.  A good fix for that might be to not put the constant in the comment at all.  :-)
Attached patch comment fixSplinter Review
(In reply to Dan Mosedale (:dmose - away until 8/21) from comment #25)
> Thanks for the fix, jib!  One not is that the comment above the statement
> you changed has become wrong.  A good fix for that might be to not put the
> constant in the comment at all.  :-)

Ugh, sorry missed that.
Attachment #8478695 - Flags: review?(dmose)
Comment on attachment 8478695 [details] [diff] [review]
comment fix

Review of attachment 8478695 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  r=dmose
Attachment #8478695 - Flags: review?(dmose) → review+
Status: RESOLVED → REOPENED
Keywords: checkin-needed
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/d3b46bc636ad
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Untracking for QA. Please needinfo me to request specific testing.
Whiteboard: [qa-]
I'm not entirely sure, but I think bug 1000240 has encountered this bug again, preventing it to land; see:

https://bugzilla.mozilla.org/show_bug.cgi?id=1000240#c40
https://tbpl.mozilla.org/php/getParsedLog.php?id=49008339&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=49009152&tree=Fx-Team

As this might be something different and me not being an expert with Marionette, NI :dmose and :Standard8 here.
Flags: needinfo?(standard8)
Flags: needinfo?(dmose)
(In reply to Nicolas Perriault (:NiKo`) — needinfo me if you need my attention from comment #33)
> I'm not entirely sure, but I think bug 1000240 has encountered this bug
> again, preventing it to land; see:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1000240#c40
> https://tbpl.mozilla.org/php/getParsedLog.php?id=49008339&tree=Fx-Team
> https://tbpl.mozilla.org/php/getParsedLog.php?id=49009152&tree=Fx-Team
> 
> As this might be something different and me not being an expert with
> Marionette, NI :dmose and :Standard8 here.

Unlikely, for two reasons:

16:46:07     INFO -  TEST-START
...
16:46:16    ERROR -  TEST-UNEXPECTED-ERROR

That's ~ 9 seconds, whereas we've now got a 60 second timeout.

Also, there's:

16:46:16    ERROR -  PROCESS-CRASH | test_standalone_all.py TestDesktopUnits.test_units | application crashed [None]

Which is slightly more critical. I'll follow-up in bug 1000240.
Flags: needinfo?(standard8)
Flags: needinfo?(dmose)
Flags: qe-verify-
Whiteboard: [qa-]
Iteration: --- → 34.3
You need to log in before you can comment on or make changes to this bug.