Closed Bug 1784938 Opened 3 years ago Closed 6 months ago

High frequency dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug

Categories

(Core :: DOM: Animation, defect, P3)

Tracking

RESOLVED FIXED
142 Branch
Tracking Status
firefox142 --- fixed

People

(Reporter: jmaher, Assigned: mconley)

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

No description provided.

Additional information about this bug failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1784938

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 1 year ago → 8 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

There have been 69 total failures in the last 7 days.
There are:

  • 7 failures on linux1804-64-asan-qr opt
  • 1 failure on linux1804-64-ccov-qr opt
  • 1 failure on linux2204-64-wayland opt
  • 4 failures on linux2204-64-wayland-shippable opt
  • 12 failures on macosx1470-64 & macosx1470-64-shippable opt
  • 3 failures on windows10-64-2009-qr opt
  • 13 failures on windows10-64-2009-shippable-qr opt
  • 1 failure on windows11-32-24h2 opt
  • 27 failures on windows11-64-24h2, windows11-64-24h2-devedition and windows11-64-24h2-shippable opt

Recent failure log.

Joel, could your changes from https://bugzilla.mozilla.org/show_bug.cgi?id=1968587#c5 cause this frequent failure?

Flags: needinfo?(jmaher)

yes, this seems to be related, great detective work to hunt this down and let me know. let me work on a solution.

Flags: needinfo?(jmaher)

this is easy to reproduce; removing the 1 second sleep before starting tests seems to be what triggers it. I noticed this is disabled on linux/opt already (probably due to frequent/perma failure).

in general this seems to work very reliably on debug/asan/tsan. Is there a way to determine what this test depends on that would cause it to fail in opt mode?

If that isn't realistic, would you have an idea how to add a 1 second delay before tests in this file start running?

lastly, I could expand the skip-if to be "opt"

Flags: needinfo?(boris.chiou)

What does it mean when document.timeline.currentTime is negative? Does it indicate a problem in the test, or a platform bug?

Flags: needinfo?(dholbert)

(In reply to Florian Quèze [:florian] from comment #124)

What does it mean when document.timeline.currentTime is negative? Does it indicate a problem in the test, or a platform bug?

It seems we may have to redesign this test. We assume the document is not loaded yet at this moment, per the comment. So I suspect this test has some timing constraints (i.e., other changes may introduce a race condition).

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #120)

this is easy to reproduce; removing the 1 second sleep before starting tests seems to be what triggers it. I noticed this is disabled on linux/opt already (probably due to frequent/perma failure).

I cannot reproduce this locally. However, we probably have to redesign this test. Feel free to update the annotation as intermittent for now.

Hiro, do you recall anything for this part? Should we update the design to make sure we don't get negative document.timeline.currentTime?

Flags: needinfo?(hikezoe.birchill)
Flags: needinfo?(dholbert)
Flags: needinfo?(boris.chiou)

(In reply to Boris Chiou [:boris] from comment #125)

I cannot reproduce this locally. However, we probably have to redesign this test. Feel free to update the annotation as intermittent for now.

Locally for me on a fast Linux machine it reproduces when running with --run-until-failure (typically after 4 or 5 runs).

(In reply to Boris Chiou [:boris] from comment #126)

Hiro, do you recall anything for this part? Should we update the design to make sure we don't get negative document.timeline.currentTime?

If we get a negative number there, it definitely means there's a platform bug. :/

Flags: needinfo?(hikezoe.birchill)
Summary: Intermittent dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug → High frequency dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug

The DocumentTimeline.currentTime calculation could return negative values when the
refresh driver timestamp was behind the navigation start timestamp due to IPC or
compositor delays. This occurred because GetCurrentTimeStamp() didn't apply the
same protection logic that UpdateLastRefreshDriverTime() already had.
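
To illustrate the failure mode described above, here is a minimal sketch of how a timeline time computed as "refresh tick minus navigation start" can go negative, and how clamping the tick avoids it. This is not the Gecko code or the landed patch: `TimelineModel`, its fields, and both methods are hypothetical names chosen for illustration, and the clamp is only assumed to resemble the protection the comment attributes to UpdateLastRefreshDriverTime().

```cpp
#include <algorithm>
#include <chrono>

using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;
using Millis = std::chrono::duration<double, std::milli>;

// Hypothetical stand-in for a document timeline; not Gecko's actual class.
struct TimelineModel {
  TimePoint navigationStart;  // assumed zero point of the timeline
  TimePoint lastRefreshTime;  // assumed most recent refresh driver tick

  // Unprotected calculation: negative when the refresh tick predates
  // navigation start (for example, because the tick was delayed).
  double UnclampedCurrentTimeMs() const {
    return Millis(lastRefreshTime - navigationStart).count();
  }

  // Protected calculation: treat any tick earlier than navigation start
  // as if it happened at navigation start, so the result is never negative.
  double ClampedCurrentTimeMs() const {
    const TimePoint effective = std::max(lastRefreshTime, navigationStart);
    return Millis(effective - navigationStart).count();
  }
};
```

With a tick 5 ms behind navigation start, the unclamped method in this model yields a negative value while the clamped one yields zero; for ticks at or after navigation start the two agree.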

Assignee: nobody → mconley
Whiteboard: [stockwell disable-recommended]
Pushed by mconley@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/b077c064ef13 https://hg.mozilla.org/integration/autoland/rev/8c892cf54059 Fix intermittent negative DocumentTimeline.currentTime values due to timing race condition. r=hiro
Status: REOPENED → RESOLVED
Closed: 8 months ago → 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 142 Branch

(In reply to Intermittent Failures Robot from comment #140)

This test has failed more than 150 times in the last 21 days. It should be disabled until it can be fixed.

Fortunately this is getting much better, per below.

21 failures were associated with this bug in the last 7 days.
[...]

Repository breakdown:

  • mozilla-release: 7
  • mozilla-beta: 14

Nice -- only beta/release were affected last week. No failures on mozilla-central/autoland. So presumably the patch in comment 138 fixed things here.

Today is merge day, so beta should start being "good" as of today, too. But release will still be affected for the next 4 weeks (until our next merge day).

If the fix were a test-change, we could trivially uplift it to release since there's no user impact -- but that's not the case here. The fix here was an actual code-change. So probably not the sort of thing we'd want to just uplift directly to the release branch mid-cycle.

So: I think that means this test is doomed to keep failing on release (but nowhere else) for the next 4 weeks. If the rate is too high, we could disable the test on release, but assuming trends continue with 0-7 failures per week on release, this should hopefully be minimally-painful to live with for the time being.

Depends on: 1980607