Closed Bug 1320629 Opened 3 years ago Closed 3 years ago

Intermittent test_execute_script.py TestExecuteContent.test_window_set_timeout_is_not_cancelled | AssertionError: 0 != 1

Categories

(Testing :: Marionette, defect)

Version 3
defect
Not set

Tracking

(firefox51 fixed, firefox52 fixed, firefox53 fixed)

RESOLVED FIXED
mozilla53
Tracking Status
firefox51 --- fixed
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ato)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

 16:58:50     INFO - TEST-UNEXPECTED-FAIL | test_execute_script.py TestExecuteContent.test_window_set_timeout_is_not_cancelled | AssertionError: 0 != 1
 16:58:50     INFO - Traceback (most recent call last):
 16:58:50     INFO -   File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette_test/testcases.py", line 166, in run
 16:58:50     INFO -     testMethod()
 16:58:50     INFO -   File "/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit/test_execute_script.py", line 251, in test_window_set_timeout_is_not_cancelled
16:58:50 INFO - "return window.contentTimeoutTriggered", sandbox=None))

Tracing output:

1480294729812	Marionette	TRACE	conn614 -> [0,8,"executeScript",{"scriptTimeout":null,"newSandbox":true,"args":[],"filename":"test_execute_script.py","script":"return window.contentTimeoutTriggered","sandbox":null,"line":251}]
1480294729829	Marionette	TRACE	conn614 <- [1,8,null,{"value":1}]

Sadly the actual and expected value are in the wrong order in the call to assertEqual() so it's a bit hard to see. But the first invocation of the script returns already 1. So maybe we have a race here, which let us return late from this call. It's a debug build btw which might explain it.
This test was introduced in the patch for bug 1297551, and there is a chance of a race condition occurring in this test simply because I couldn’t find a better way to test this.

(In reply to Henrik Skupin (:whimboo) from comment #1)
>  16:58:50     INFO - TEST-UNEXPECTED-FAIL | test_execute_script.py
> TestExecuteContent.test_window_set_timeout_is_not_cancelled |
> AssertionError: 0 != 1
>  16:58:50     INFO - Traceback (most recent call last):
>  16:58:50     INFO -   File
> "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/
> marionette_test/testcases.py", line 166, in run
>  16:58:50     INFO -     testMethod()
>  16:58:50     INFO -   File
> "/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/
> marionette/tests/unit/test_execute_script.py", line 251, in
> test_window_set_timeout_is_not_cancelled
> 16:58:50 INFO - "return window.contentTimeoutTriggered", sandbox=None))
> 
> Tracing output:
> 
> 1480294729812	Marionette	TRACE	conn614 ->
> [0,8,"executeScript",{"scriptTimeout":null,"newSandbox":true,"args":[],
> "filename":"test_execute_script.py","script":"return
> window.contentTimeoutTriggered","sandbox":null,"line":251}]
> 1480294729829	Marionette	TRACE	conn614 <- [1,8,null,{"value":1}]
> 
> Sadly the actual and expected value are in the wrong order in the call to
> assertEqual() so it's a bit hard to see.

They’re not?

> But the first invocation of the script returns already 1. So maybe we
> have a race here, which let us return late from this call. It's a debug
> build btw which might explain it.

It’s definitely a race condition.  I tried reproducing it with a debug build
with no success, but we could try increasing the `time.sleep(1)` to 1.5 or 2
and see if things improve.
Assignee: nobody → ato
Status: NEW → ASSIGNED
(In reply to Andreas Tolfsen ‹:ato› from comment #2)
> > Sadly the actual and expected value are in the wrong order in the call to
> > assertEqual() so it's a bit hard to see.
> 
> They’re not?

No, you should do assert*(current, expected)

> > But the first invocation of the script returns already 1. So maybe we
> > have a race here, which let us return late from this call. It's a debug
> > build btw which might explain it.
> 
> It’s definitely a race condition.  I tried reproducing it with a debug build
> with no success, but we could try increasing the `time.sleep(1)` to 1.5 or 2
> and see if things improve.

Please get rid of time.sleep() and assertEqual() and replace it with a Wait().until(). Benefit is that we do not loose valuable time for faster builds. So I think the default timeout of 5s should be enough.
(In reply to Henrik Skupin (:whimboo) from comment #4)
> (In reply to Andreas Tolfsen ‹:ato› from comment #2)
> > > Sadly the actual and expected value are in the wrong order in the call to
> > > assertEqual() so it's a bit hard to see.
> > 
> > They’re not?
> 
> No, you should do assert*(current, expected)

 assertEqual(first, second, msg=None)
    Test that first and second are equal. If the values do not compare equal,
    the test will fail.

> > It’s definitely a race condition.  I tried reproducing it with a debug build
> > with no success, but we could try increasing the `time.sleep(1)` to 1.5 or 2
> > and see if things improve.
> 
> Please get rid of time.sleep() and assertEqual() and replace it with a
> Wait().until(). Benefit is that we do not loose valuable time for faster
> builds. So I think the default timeout of 5s should be enough.

Please see the patch.
(In reply to Andreas Tolfsen ‹:ato› from comment #5)
>  assertEqual(first, second, msg=None)
>     Test that first and second are equal. If the values do not compare equal,
>     the test will fail.

It's not defined, right. But it's good practice to stay with one solution and don't mix-up those two. It makes it really hard to investigate failures - I can say that because I did lot of these in the past weeks.
Comment on attachment 8814851 [details]
Bug 1320629 - Increase timeout of test_window_set_timeout_is_not_cancelled;

https://reviewboard.mozilla.org/r/95976/#review95992

Yes, this looks better. See inline comments for nits.

::: testing/marionette/harness/marionette/tests/unit/test_execute_script.py:252
(Diff revision 1)
> +        # subsequent call to execute_script after this
> +        # should not cancel the setTimeout event
>          self.marionette.navigate(inline("""
>              <script>
> -            window.contentTimeoutTriggered = 0;
> -            window.contentTimeoutID = setTimeout(
> +            window.n = 0;
> +            setTimeout(() => ++window.n, 4000);

Maybe set the timeout to 2500ms so it's in the middle of our 5s timeout for until().

::: testing/marionette/harness/marionette/tests/unit/test_execute_script.py:258
(Diff revision 1)
>              </script>"""))
>  
> -        # first execute script call should not cancel event
> +        # as debug builds are inherently slow,
> +        # we need to assert the event did not already fire
>          self.assertEqual(0, self.marionette.execute_script(
> -            "return window.contentTimeoutTriggered", sandbox=None))
> +            "return window.n", sandbox=None),

I would generalize content_timeout_triggered to return the value which we can directly check and avoid code duplication.

::: testing/marionette/harness/marionette/tests/unit/test_execute_script.py:262
(Diff revision 1)
> -        time.sleep(1)
> -        self.assertEqual(1, self.marionette.execute_script(
> -            "return window.contentTimeoutTriggered", sandbox=None))
>  
> -        # ../../../../evaluate.js:/scriptTimeoutID/
> -        # sets the script timeout handler using the content frame script
> +        # if event was cancelled, this will time out
> +        Wait(self.marionette).until(content_timeout_triggered)

Please add a message so we get an explanation instead of a plain TimeoutError.
Attachment #8814851 - Flags: review?(hskupin) → review+
OK, I feel this is completely off topic and nitpicky, but I’ll bite.

(In reply to Henrik Skupin (:whimboo) from comment #6)
> It's not defined, right. But it's good practice to stay with one solution
> and don't mix-up those two. It makes it really hard to investigate failures
> - I can say that because I did lot of these in the past weeks.

You will find that most assertion libraries either do not define this or say that the expected condition should come first.  This is true for unittest, gtest, JUnit, NUnit, and XUnit.

On the other hand, I’m sympathetic to putting the actual test value first as you would if you were using plain assertions:

    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char* argv[])
    {
            int fd;
            fd = open("/does/not/exist", O_RDONLY);

            if (fd < 0) {
                    perror("badness occurred");
                    assert(errno == 0);
                    goto done;
            }

    done:
            close(fd);
            return 0;
    }

However, in the case of AssertionError: 0 != 1, I really don’t think it matters.  It isn’t a great assertion anyway without the context of the code.
Comment on attachment 8814851 [details]
Bug 1320629 - Increase timeout of test_window_set_timeout_is_not_cancelled;

https://reviewboard.mozilla.org/r/95976/#review95992

> Maybe set the timeout to 2500ms so it's in the middle of our 5s timeout for until().

jgraham suggested using four times the original timeout value in the case of debug builds.  This is based on experience had with WPT.

> I would generalize content_timeout_triggered to return the value which we can directly check and avoid code duplication.

I really don’t think this is necessary.
Comment on attachment 8814851 [details]
Bug 1320629 - Increase timeout of test_window_set_timeout_is_not_cancelled;

https://reviewboard.mozilla.org/r/95976/#review95992

> jgraham suggested using four times the original timeout value in the case of debug builds.  This is based on experience had with WPT.

Then better also bump the timeout for the until() call. Having only 1s remaining is less than ideal. We should balance the timer triggering between both assertion checks.
Comment on attachment 8814851 [details]
Bug 1320629 - Increase timeout of test_window_set_timeout_is_not_cancelled;

https://reviewboard.mozilla.org/r/95976/#review95992

> I really don’t think this is necessary.

Go with it.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d2c1a3a02f40
Increase timeout of test_window_set_timeout_is_not_cancelled; r=whimboo
Backed out for TestExecuteContent.test_window_set_timeout_is_not_cancelled | TypeError: until() got an unexpected keyword argument 'timeout' bustage.
https://hg.mozilla.org/integration/autoland/rev/e274953486641dbfe3425d2a855537da6aae0304

https://treeherder.mozilla.org/logviewer.html#?job_id=7505972&repo=autoland
Comment on attachment 8814851 [details]
Bug 1320629 - Increase timeout of test_window_set_timeout_is_not_cancelled;

https://reviewboard.mozilla.org/r/95976/#review96862

::: testing/marionette/harness/marionette/tests/unit/test_execute_script.py:263
(Diff revision 3)
> -        self.assertEqual(1, self.marionette.execute_script(
> -            "return window.contentTimeoutTriggered", sandbox=None))
>  
> -        # ../../../../evaluate.js:/scriptTimeoutID/
> -        # sets the script timeout handler using the content frame script
> -        # so the in-content setTimeout should always return 2
> +        # if event was cancelled, this will time out
> +        Wait(self.marionette).until(content_timeout_triggered,
> +                                    timeout=8,

That part I haven't reviewed again. :/ So the timeout argument has to be part of the Wait() call.
Blocks: 1321539
Thanks for noticing.  I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1321539 as a follow-up once this lands.
(In reply to Andreas Tolfsen ‹:ato› from comment #17)
> Thanks for noticing.  I filed bug 1321539 as a follow-up once this lands.

I consider bug 1321539 as invalid. So lets get this intermittent fixed with how Wait().until() is currently used.
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1627dd9be98d
Increase timeout of test_window_set_timeout_is_not_cancelled; r=whimboo
https://hg.mozilla.org/mozilla-central/rev/1627dd9be98d
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Andreas Tolfsen ‹:ato› from comment #2)
> This test was introduced in the patch for bug 1297551, and there is a chance
> of a race condition occurring in this test simply because I couldn’t find a
> better way to test this.

The patch on bug 1297551 got landed on Aurora and Beta, but this test-only patch hasn't been uplifted yet.
Whiteboard: [checkin-needed-aurora][checkin-needed-beta]
Whiteboard: [checkin-needed-aurora][checkin-needed-beta] → [checkin-needed-beta]
You need to log in before you can comment on or make changes to this bug.