Closed Bug 1772416 Opened 2 years ago Closed 2 years ago

Address intermittent WPT failure: measure_associated_with_navigation_timing.html

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: mcomella, Assigned: mcomella)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Bug 1772416 - enable measure_associated_with_navigation_timing on android. r=sefeng 2 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772416 - print failure values in user-timing/ test_greater_than*. r=sefeng 2 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772416 - sleep in measure_associated_with_navigation_timing. r=sefeng 2 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772416 - increase sleep duration in measure_associated... WPT. r=sefeng 2 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 48 bytes, text/x-phabricator-request		Details \| Review

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Description

•

2 years ago

•

Edited

We currently pass 6/9 tests, according to wpt.fyi: https://wpt.fyi/results/user-timing/measure_associated_with_navigation_timing.html?label=experimental&label=master&aligned

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

2 years ago

Type: task → enhancement

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 1

•

2 years ago

I don't know what changed but these are all green now.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → INVALID

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 2

•

2 years ago

•

Edited

These actually appear to be intermittently failing so reopening.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

2 years ago

Blocks: 1564975

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 3

•

2 years ago

This doesn't fail in our CI on desktop because the test isn't disabled. Similarly, I've been unable to reproduce the error on macOS with ./mach wpt --retry-until-unexpected. I wonder what's different about WPT's configuration that it fails intermittently on Linux but not in our CI.

Curiously, the tests that fail intermittently are actually the same tests that are disabled on Android for intermittent failures. I was unable to reproduce these failures on the Android emulator at all. I tried with ./mach wpt --retry-until-unexpected or ./mach wpt <test> --verify (the latter failed but due to a harness timeout). I was also unable to reproduce after 10 runs on try or by visiting http://wpt.live/user-timing/measure_associated_with_navigation_timing.html and rerunning the tests manually tens of times.

After talking with :sefeng, we decided to take two approaches:

re-enable the tests on Android in CI and see if there are failures (since we can't reproduce by hand)
when the tests fail, add the specific timings to the error message so we get debug info in wpt.fyi

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 4

•

2 years ago

See bug 1461298 for the intermittent failures that caused these tests to be disabled on Android.

Comment 5

•

2 years ago

Attached file Bug 1772416 - enable measure_associated_with_navigation_timing on android. r=sefeng — Details

This test failed intermittently 4 years ago when it was disabled. However, I'm
unable to reproduce the failure today on try (10 iterations) or locally so let's
re-enable to see if it's still an issue.

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → michael.l.comella

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 6

•

2 years ago

Attached file Bug 1772416 - print failure values in user-timing/ test_greater_than*. r=sefeng — Details

Here is sample output if this assertion function fails:
FAIL Second measure of current mark to navigationStart should be negative value.

assert_greater_than: Second measure of current mark to navigationStart should
be negative value. expected a number greater than 352.1 but got 332.44

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

2 years ago

Keywords: leave-open

Pulsebot

Comment 7

•

2 years ago

Pushed by mcomella@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e25f63783185
enable measure_associated_with_navigation_timing on android. r=sefeng

Pulsebot

Comment 8

•

2 years ago

Pushed by mcomella@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/47a07cd0d028
print failure values in user-timing/ test_greater_than*. r=sefeng

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 9

•

2 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/35506 for changes under testing/web-platform/tests

Marian-Vasile Laza

Comment 10

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e25f63783185

Norisz Fay [:noriszfay]

Comment 11

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/47a07cd0d028

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 12

•

2 years ago

Upstream PR merged by moz-wptsync-bot

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 13

•

2 years ago

With the additional debugging info, the new error messages on wpt.fyi (after clicking the "Show Details" selector) are:

FAIL message: assert_greater_than: Second measure of current mark to navigationStart should be negative value. expected a number greater than 84 but got 84

FAIL message: assert_greater_than: Measure from domComplete event to most recent mark "a" should have longer duration. expected a number greater than 2 but got 2

FAIL message: assert_greater_than: Measure from most recent mark to navigationStart should have longer duration. expected a number greater than -84 but got -84

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 14

•

2 years ago

Attached file Bug 1772416 - sleep in measure_associated_with_navigation_timing. r=sefeng — Details

The root cause of the intermittent failure is that the test records one set of
PerformanceMarks, records another, and verifies the duration of the measures of
the second set are larger than the first set but they're unexpectedly equal.

I speculate this happens because either the test executes fast enough that the
clock doesn't increment or our precision reduction code (to mitigate against
timing attacks) is rounding the before/after values to the same value.

This solution is speculative because I was unable to reproduce the error
locally so need to test it on WPT CI. The solution may fail intermittently if we
don't sleep for long enough.

A more correct solution to this problem would be to mock the clock and
programmatically advance it but I didn't want to take the time. Another possible
solution, that is stable, is to change the assertions to test_greater_or_equal
but this could hide legitimate bugs, e.g. we accidentally return the same
PerformanceMeasure for both queries.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

2 years ago

Summary: Make WPT tests pass: measure_associated_with_navigation_timing.html → Address intermittent WPT failure: measure_associated_with_navigation_timing.html

Pulsebot

Comment 15

•

2 years ago

Pushed by mcomella@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/93de313ddf70
sleep in measure_associated_with_navigation_timing. r=sefeng

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 16

•

2 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/35603 for changes under testing/web-platform/tests

Cosmin Sabou [:CosminS]

Comment 17

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/93de313ddf70

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 18

•

2 years ago

Upstream PR merged by moz-wptsync-bot

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 19

•

2 years ago

Over the past two weeks, we've seen 1 failure (on 2022-09-09T10:52:01.202Z) out of 17 runs. This is much improved but not good enough.

My best idea is to just sleep longer in the test. To make this less random, we can determine how much we reduce precision by and calculate the theoretical worst case and sleep longer than that (assuming the precision reduction is the root cause of this issue).

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 20

•

2 years ago

Attached file Bug 1772416 - increase sleep duration in measure_associated... WPT. r=sefeng — Details

Over two weeks, this test intermittently failed on Firefox in 1/17 runs. After
reading the timestamp precision reduction code, the cause of the failure
appears to be that we're not sleeping long enough. We clamp the value (i.e.
discard sub-millisecond precision) and randomly round up to the next millisecond
value via the midpoint & clampedAndJittered code. In the worst case, this will
cause values 1.999ms apart to be rounded to the same value, e.g. 10ms and
11.999ms round to 11ms. In theory, we need to sleep at least 2ms but we've been
sleeping 1ms.

Pulsebot

Comment 21

•

2 years ago

Pushed by mcomella@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/61831cfe799a
increase sleep duration in measure_associated... WPT. r=sefeng

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 22

•

2 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/35855 for changes under testing/web-platform/tests

Iulian Moraru

Comment 23

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/61831cfe799a

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 24

•

2 years ago

Upstream PR merged by moz-wptsync-bot

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 25

•

2 years ago

Next step: review in ~2 weeks to see if we're still experiencing intermittent test failures.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 26

•

2 years ago

There have been no failures in 20 runs from 9/29 to 10/4 (6 days) so I'm closing this as fixed.

Status: REOPENED → RESOLVED

Closed: 2 years ago → 2 years ago

Keywords: leave-open

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.