1784938 - High frequency dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug

7 failures on linux1804-64-asan-qr opt
1 failure on linux1804-64-ccov-qr opt
1 failure on linux2204-64-wayland opt
4 failures on linux2204-64-wayland-shippable opt
12 failures on macosx1470-64 & macosx1470-64-shippable opt
3 failures on windows10-64-2009-qr opt
13 failures on windows10-64-2009-shippable-qr opt
1 failure on windows11-32-24h2 opt
27 failures on windows11-64-24h2, windows11-64-24h2-devedition and windows11-64-24h2-shippable opt

Recent failure log.

Joel, could your changes from https://bugzilla.mozilla.org/show_bug.cgi?id=1968587#c5 cause this frequent failure?

Flags: needinfo?(jmaher)

Comment hidden (Intermittent Failures Robot)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 119

•

7 months ago

Yes, this seems to be related. Great detective work hunting this down and letting me know. Let me work on a solution.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 120

•

7 months ago

This is easy to reproduce: removing the 1-second sleep before starting tests seems to be the problem. I noticed this test is already disabled on linux/opt (probably due to frequent/perma failure).

In general this seems to work very reliably on debug/asan/tsan. Is there a way to determine what this test depends on that would cause it to fail in opt mode?

If that isn't realistic, do you have an idea how to add a 1-second delay before the tests in this file start running?

Lastly, I could expand the skip-if to cover "opt".
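
For reference, a minimal sketch of the delay idea, written as a generic promise-based helper. The names `delay` and `runAfterDelay` are hypothetical and are not part of the mochitest harness or of this test file. A fixed delay like this would only hide the timing dependency rather than explain it.

```typescript
// Hypothetical helpers for illustration; not taken from the test or the harness.

// Resolve after roughly `ms` milliseconds.
function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Wait `ms` milliseconds, then run the given test body and return its result.
async function runAfterDelay<T>(
  ms: number,
  run: () => T | Promise<T>
): Promise<T> {
  await delay(ms);
  return run();
}
```

Under this sketch, the test body would be passed as the `run` callback with `ms` set to 1000.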

Flags: needinfo?(boris.chiou)

Comment hidden (Intermittent Failures Robot)

Florian Quèze [:florian]

Comment 124

•

7 months ago

What does it mean when document.timeline.currentTime is negative? Does it indicate a problem in the test, or a platform bug?

Flags: needinfo?(dholbert)

Boris Chiou [:boris]

Comment 125

•

7 months ago

(In reply to Florian Quèze [:florian] from comment #124)

What does it mean when document.timeline.currentTime is negative? Does it indicate a problem in the test, or a platform bug?

It seems we may have to redesign this test. Per the comment, we assume the document is not loaded yet at this moment. So I suspect there are some timing constraints in this test (i.e. some other changes may introduce a race condition).

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #120)

This is easy to reproduce: removing the 1-second sleep before starting tests seems to be the problem. I noticed this test is already disabled on linux/opt (probably due to frequent/perma failure).

I cannot reproduce this locally. However, we probably have to redesign this test. Feel free to update the annotation as intermittent for now.

Boris Chiou [:boris]

Comment 126

•

7 months ago

Hiro, do you recall anything for this part? Should we update the design to make sure we don't get negative document.timeline.currentTime?

Flags: needinfo?(hikezoe.birchill)

Flags: needinfo?(dholbert)

Flags: needinfo?(boris.chiou)

Florian Quèze [:florian]

Comment 127

•

7 months ago

(In reply to Boris Chiou [:boris] from comment #125)

I cannot reproduce this locally. However, we probably have to redesign this test. Feel free to update the annotation as intermittent for now.

Locally for me, on a fast Linux machine, it reproduces when running with --run-until-failure (typically after 4 or 5 runs).

Hiroyuki Ikezoe (:hiro)

Comment 128

•

7 months ago

(In reply to Boris Chiou [:boris] from comment #126)

Hiro, do you recall anything for this part? Should we update the design to make sure we don't get negative document.timeline.currentTime?

If we get a negative number there, it definitely means there's a platform bug. :/
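
As a rough illustration of that invariant, the sketch below classifies a timeline reading. `TimelineLike` and `classifyTimelineTime` are hypothetical names introduced here. The sketch assumes that `currentTime` is either `null` (timeline inactive) or a number of milliseconds.

```typescript
// Hypothetical names; a minimal model of the check, not code from the test.
interface TimelineLike {
  currentTime: number | null;
}

type TimelineState = "inactive" | "negative" | "ok";

// A negative reading is treated as a platform bug, per the comment above.
function classifyTimelineTime(timeline: TimelineLike): TimelineState {
  const t = timeline.currentTime;
  if (t === null) {
    return "inactive";
  }
  return t < 0 ? "negative" : "ok";
}
```

In a page, something like this could be called on `document.timeline`, converting or casting the value as the local type definitions require.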

Flags: needinfo?(hikezoe.birchill)

Comment hidden (Intermittent Failures Robot)

amarc

Updated

•

7 months ago

Summary: Intermittent dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug → High frequency dom/animation/test/document-timeline/test_document-timeline.html | single tracking bug

Comment hidden (Intermittent Failures Robot)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 133

•

6 months ago

Attached file Bug 1784938 - Fix intermittent negative DocumentTimeline.currentTime values due to timing race condition. r=hiro! — Details

The DocumentTimeline.currentTime calculation could return negative values when the
refresh driver timestamp was behind the navigation start timestamp due to IPC or
compositor delays. This occurred because GetCurrentTimeStamp() didn't apply the
same protection logic that UpdateLastRefreshDriverTime() already had.
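
A simplified model of the problem, not the Gecko source: the sketch assumes the protection amounts to never letting the timestamp used for the calculation fall behind navigation start. All names below are hypothetical, and times are plain millisecond numbers.

```typescript
// Illustrative model only; field and function names are assumptions.
interface TimelineInputs {
  navigationStartMs: number; // time origin of the document
  refreshDriverMs: number;   // most recent refresh driver timestamp
}

// Without protection: negative when the refresh driver timestamp
// is behind navigation start.
function currentTimeUnprotected(t: TimelineInputs): number {
  return t.refreshDriverMs - t.navigationStartMs;
}

// With protection: the timestamp is not allowed to precede navigation
// start, so the result is never negative.
function currentTimeProtected(t: TimelineInputs): number {
  const effectiveMs = Math.max(t.refreshDriverMs, t.navigationStartMs);
  return effectiveMs - t.navigationStartMs;
}
```

With `navigationStartMs = 1000` and `refreshDriverMs = 990`, the unprotected version yields -10 while the protected version yields 0. Once the refresh driver timestamp passes navigation start, both agree.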

Phabricator Automation

Updated

•

6 months ago

Assignee: nobody → mconley

Comment hidden (Intermittent Failures Robot)

Cristina Horotan [:chorotan]

Updated

•

6 months ago

Whiteboard: [stockwell disable-recommended]

Pulsebot

Comment 138

•

6 months ago

Pushed by mconley@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/b077c064ef13 https://hg.mozilla.org/integration/autoland/rev/8c892cf54059 Fix intermittent negative DocumentTimeline.currentTime values due to timing race condition. r=hiro

Cristina Horotan [:chorotan]

Comment 139

•

6 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8c892cf54059

Status: REOPENED → RESOLVED

Closed: 8 months ago → 6 months ago

status-firefox142: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 142 Branch

Comment hidden (Intermittent Failures Robot)

Daniel Holbert [:dholbert]

Comment 141

•

6 months ago

•

Edited

(In reply to Intermittent Failures Robot from comment #140)

This test has failed more than 150 times in the last 21 days. It should be disabled until it can be fixed.

Fortunately this is getting much better, per below.

21 failures were associated with this bug in the last 7 days.
[...]

Repository breakdown:

mozilla-release: 7

mozilla-beta: 14

Nice -- only beta/release were affected last week. No failures on mozilla-central/autoland. So presumably the patch in comment 138 fixed things here.

Today is merge day, so beta should start being "good" as of today, too. But release will still be affected for the next 4 weeks (until our next merge day).

If the fix were a test-change, we could trivially uplift it to release since there's no user impact -- but that's not the case here. The fix here was an actual code-change. So probably not the sort of thing we'd want to just uplift directly to the release branch mid-cycle.

So: I think that means this test is doomed to keep failing on release (but nowhere else) for the next 4 weeks. If the rate is too high, we could disable the test on release, but assuming trends continue with 0-7 failures per week on release, this should hopefully be minimally-painful to live with for the time being.

Comment hidden (Intermittent Failures Robot)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

5 months ago

Depends on: 1980607

Comment hidden (Intermittent Failures Robot)

	1proc	1proc-s	1proc-swr	a11y-checks	a11y-checks-s	a11y-checks-swr	a11y-checks-swr-s	condprof-s	fis	fis-hv	fis-hv-s	fis-s	headless	headless-s	http2	http3	msix	msix-s	no_variant	nofis	nofis-aab	nofis-s	nofis-ship	nofis-ship-s	nofis-spi	nofis-swr	nogpu	nogpu-s	s	spi	spi-nw	spi-nw-1proc	spi-nw-cf	spi-nw-nofis	spi-nw-s	swr	swr-s	vt	vt-s	wmfme	wmfme-s	xorig	xorig-s
linux2204-64-wayland/opt																			1
linux2204-64-wayland-shippable/opt																													3

	!fission	headless	http2	http3	no_variant	socketprocess_networking	xorigin
linux1804-x86_64/asan					1
linux2204-x86_64/opt					1
macosx1470-x86_64/opt					1
windows10-x86_64/opt					1
windows11-x86/opt					1
windows11-x86_64/opt					1

	!fission	headless	http2	http3	no_variant	socketprocess_networking	xorigin
linux1804-x86_64/asan					1
linux2204-x86_64/opt					1
macosx1470-x86_64/opt					1
windows10-x86_64/opt					1
windows11-x86_64/opt					1

	headless	no_variant	xorigin
linux1804-x86_64/asan		3
linux2204-x86_64/opt		1
macosx1470-x86_64/opt		4
windows10-x86_64/opt		5
windows11-x86_64/opt		8

	headless	no_variant	xorigin
linux1804-x86_64/asan		9
linux1804-x86_64/ccov		1
linux2204-x86_64/opt		8
macosx1470-x86_64/opt		18
windows10-x86_64/opt		16
windows11-x86_64/opt		26

	!fission	headless	http2	http3	no_variant	socketprocess_networking	xorigin
linux2204-x86_64/opt					1
linux2404-x86_64/ccov					1
macosx1470-x86_64/opt					6
windows10-x86_64/opt					5
windows11-x86_64/opt					14

	!fission	headless	http2	http3	no_variant	socketprocess_networking	xorigin
linux2204-x86_64/opt					14/18
linux2404-x86_64/ccov					3/17
macosx1470-x86_64/opt					45
windows10-x86_64/opt					39
windows11-x86_64/opt					51

	no_variant	xorigin
linux2204-x86_64/opt	1/18
macosx1470-x86_64/opt	10/44
windows10-x86_64/opt	5/20
windows11-x86_64/opt	5/20

	no_variant	xorigin
linux2204-x86_64/opt	2/18
macosx1470-x86_64/opt	2/18
windows10-x86_64/opt	1/18
windows11-x86_64/opt	1

	no_variant	xorigin
linux2204-x86_64/opt	2/14
macosx1470-x86_64/opt	2/13
windows10-x86_64/opt	3/14
windows11-x86_64/opt	2/17
