Intermittent test_ocsp_timeout.js | xpcshell return code: 0 | - false == true

RESOLVED FIXED in Firefox 37

Status

()

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: RyanVM, Assigned: Cykesiopka)

Tracking

({intermittent-failure})

Trunk
mozilla39
x86
Windows XP
intermittent-failure
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox37 fixed, firefox38 fixed, firefox39 fixed, firefox-esr31 unaffected)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

4 years ago
08:36:10 INFO - TEST-START | security/manager/ssl/tests/unit/test_ocsp_timeout.js
08:36:21 WARNING - TEST-UNEXPECTED-FAIL | security/manager/ssl/tests/unit/test_ocsp_timeout.js | xpcshell return code: 0
08:36:21 INFO - TEST-INFO took 10525ms
08:36:21 INFO - >>>>>>>
08:36:21 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
08:36:21 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
08:36:21 INFO - running event loop
08:36:21 INFO - security/manager/ssl/tests/unit/test_ocsp_timeout.js | Starting
08:36:21 INFO - (xpcshell/head.js) | test pending (2)
08:36:21 INFO - TEST-PASS | security/manager/ssl/tests/unit/test_ocsp_timeout.js | - true == true
08:36:21 INFO - TEST-PASS | security/manager/ssl/tests/unit/test_ocsp_timeout.js | - true == true
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
08:36:21 INFO - (xpcshell/head.js) | test pending (2)
08:36:21 INFO - (xpcshell/head.js) | test finished (2)
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 1 pending (2)
08:36:21 INFO - (xpcshell/head.js) | test finished (2)
08:36:21 INFO - security/manager/ssl/tests/unit/test_ocsp_timeout.js | Starting
08:36:21 INFO - (xpcshell/head.js) | test pending (2)
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 2 pending (3)
08:36:21 INFO - (xpcshell/head.js) | test finished (3)
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 1 finished (2)
08:36:21 INFO - security/manager/ssl/tests/unit/test_ocsp_timeout.js | Starting
08:36:21 INFO - (xpcshell/head.js) | test pending (2)
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 2 finished (2)
08:36:21 INFO - "handling ocsp-stapling-none.example.com"
08:36:21 INFO - TEST-PASS | security/manager/ssl/tests/unit/test_ocsp_timeout.js | - 2153389959 == 2153389959
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 3 pending (2)
08:36:21 INFO - (xpcshell/head.js) | test finished (2)
08:36:21 INFO - security/manager/ssl/tests/unit/test_ocsp_timeout.js | Starting
08:36:21 INFO - (xpcshell/head.js) | test pending (2)
08:36:21 WARNING - TEST-UNEXPECTED-FAIL | security/manager/ssl/tests/unit/test_ocsp_timeout.js | - false == true
08:36:21 INFO - C:/slave/test/build/tests/xpcshell/tests/security/manager/ssl/tests/unit/test_ocsp_timeout.js:add_tests_in_mode/<:62
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_run_next_test:1378
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:do_execute_soon/<.run:640
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_do_main:208
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_execute_test:500
08:36:21 INFO - -e:null:1
08:36:21 INFO - null:null:0
08:36:21 INFO - exiting test
08:36:21 INFO - 2147500036
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_run_next_test:1380
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:do_execute_soon/<.run:640
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_do_main:208
08:36:21 INFO - C:\slave\test\build\tests\xpcshell\head.js:_execute_test:500
08:36:21 INFO - -e:null:1
08:36:21 INFO - null:null:0
08:36:21 INFO - exiting test
08:36:21 INFO - (xpcshell/head.js) | test run_next_test 3 finished (2)
08:36:21 INFO - <<<<<<<
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 33

4 years ago
Created attachment 8572211 [details] [diff] [review]
bug1121117_test_ocsp_timeout-add-logging_v1.patch

I've been investigating this issue every once in a while (e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=091fd641644c ).

It would be nice to see exactly what the time difference is, so this patch adds some print logging to reveal what the exact values are.
Attachment #8572211 - Flags: review?(dkeeler)
Comment on attachment 8572211 [details] [diff] [review]
bug1121117_test_ocsp_timeout-add-logging_v1.patch

Review of attachment 8572211 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good to me.
Attachment #8572211 - Flags: review?(dkeeler) → review+
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 37

4 years ago
Thanks for the review.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c97b0d327cb5
Assignee: nobody → cykesiopka.bmo
Status: NEW → ASSIGNED
Keywords: checkin-needed, leave-open
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 53

4 years ago
Created attachment 8578942 [details] [diff] [review]
bug1121117_add-fuzz_v1.patch

After chasing one or two red herrings, it looks like the non-monotonicity of Date.now() is the issue here (see https://treeherder.mozilla.org/#/jobs?repo=try&revision=0bdd62e8d5c1 where I compared Date.now() to what should be a monotonic time source). 

I'm not aware of any way to access a monotonic time source in a xpcshell test other than abusing nsITelemetry::msSinceProcessStart(), so this patch just adds a 300ms fuzz to the assertion. So far, the lowest time difference has been 9765ms, so 300ms fuzz should hopefully be enough.

Try push looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=09b4269a5353
(BuildAPI says this is all green, don't know why TH still says some jobs are running).
Attachment #8578942 - Flags: review?(dkeeler)
Comment on attachment 8578942 [details] [diff] [review]
bug1121117_add-fuzz_v1.patch

Review of attachment 8578942 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/tests/unit/test_ocsp_timeout.js
@@ +65,5 @@
>      // With OCSP hard-fail on, we timeout after 10 seconds.
>      // With OCSP soft-fail, we timeout after 2 seconds.
> +    // Date.now() is not guaranteed to be monotonic, so add extra fuzz time
> +    // to prevent intermittent failures (see Bug 1121117).
> +    const FUZZ_MS = 300;

Seems reasonable. Let's also mention that this only appeared to be a problem on Win XP, so when we no longer support that platform, we can revert this. Alternatively, do you know of a reason we shouldn't use nsITelemetry.msSinceProcessStart() as in the debugging patch?
Attachment #8578942 - Flags: review?(dkeeler) → review+
(Assignee)

Comment 55

4 years ago
(In reply to David Keeler [:keeler] (use needinfo?) from comment #54)
> Alternatively, do you know of a reason we shouldn't use
> nsITelemetry.msSinceProcessStart() as in the debugging patch?

Other than "it felt hacky" to me, none. I guess this is just a test file though... I'll submit a new patch.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 61

4 years ago
Created attachment 8579927 [details] [diff] [review]
bug1121117_monotonic-clock_v1.patch

This patch implements the alternative approach of using nsITelemetry.msSinceProcessStart().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e453e7550fe
Attachment #8578942 - Attachment is obsolete: true
Attachment #8579927 - Flags: review?(dkeeler)
Comment on attachment 8579927 [details] [diff] [review]
bug1121117_monotonic-clock_v1.patch

Review of attachment 8579927 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is quite the approach we should take. More specifically, this test will be simpler and more maintainable if we use only one timing implementation - either telemetry if it doesn't throw exceptions or regular js Date() with the fuzz factor.

::: security/manager/ssl/tests/unit/test_ocsp_timeout.js
@@ +41,5 @@
>    run_next_test();
>  }
>  
>  function add_tests_in_mode(useHardFail) {
> +  const nsITelemetry = Cc["@mozilla.org/base/telemetry;1"]

My feeling about this is that it should be called something like "telemetry" or "telemetryService", since "nsITelemetry" is the name of the interface. Not a big deal, though.

@@ +50,5 @@
>    add_test(function () {
>      Services.prefs.setBoolPref("security.OCSP.require", useHardFail);
> +    startTimeNonMonotonic = new Date();
> +    try {
> +      startTime = nsITelemetry.msSinceProcessStart();

Does this tend to throw exceptions? (does telemetry need to be initialized earlier in the test?) I think it would be best to rely on only one timing implementation. If using telemetry is not reliable, then we'll be falling back to using Date() anyway, and without the fuzz factor we'll still see intermittent failures.
Attachment #8579927 - Flags: review?(dkeeler) → review-
Comment hidden (Treeherder Robot)
(Assignee)

Comment 64

3 years ago
(In reply to David Keeler [:keeler] (use needinfo?) from comment #62)
> I don't think this is quite the approach we should take. More specifically,
> this test will be simpler and more maintainable if we use only one timing
> implementation - either telemetry if it doesn't throw exceptions or regular
> js Date() with the fuzz factor.

Sure.

> My feeling about this is that it should be called something like "telemetry"
> or "telemetryService", since "nsITelemetry" is the name of the interface.
> Not a big deal, though.

Fair.

> Does this tend to throw exceptions? (does telemetry need to be initialized
> earlier in the test?)

I don't think this will throw often (assuming I'm understanding the current implementation correctly).

> If using telemetry is not reliable, then we'll be falling back to using Date()
> anyway, and without the fuzz factor we'll still see intermittent failures.

Right. I guess it's simpler to just add the fuzz factor then.
(Assignee)

Comment 65

3 years ago
Created attachment 8580465 [details] [diff] [review]
bug1121117_add-fuzz_v2.patch

This is attachment 8578942 [details] [diff] [review] with the feedback raised in comment 54 addressed.
Attachment #8579927 - Attachment is obsolete: true
Attachment #8580465 - Flags: review+
(Assignee)

Comment 66

3 years ago
Comment on attachment 8580465 [details] [diff] [review]
bug1121117_add-fuzz_v2.patch

Thanks for the reviews!

(Try push is in comment 53).
Attachment #8580465 - Flags: checkin?
(Assignee)

Updated

3 years ago
Keywords: leave-open
(Reporter)

Updated

3 years ago
Attachment #8580465 - Flags: checkin? → checkin+
Comment hidden (Treeherder Robot)
https://hg.mozilla.org/mozilla-central/rev/8cd33e72eb47
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
(Reporter)

Updated

3 years ago
status-firefox37: --- → affected
status-firefox38: --- → affected
status-firefox-esr31: --- → unaffected
(Reporter)

Comment 70

3 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/8358c6c2c417
status-firefox37: affected → fixed
Flags: in-testsuite+
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
You need to log in before you can comment on or make changes to this bug.