Open Bug 1857246 Opened 1 year ago Updated 1 year ago

Intermittent layout/generic/test/test_scroll_position_restore.html failures on new Ubuntu 22.04 Wayland debug pool

Tracking

()

Status:

NEW

People

(Reporter: ahal, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

Bug 1857246: Request longer timeout threshold for mochitest test_scroll_position_restore.html. r?#layout,ahal 1 year ago Daniel Holbert [:dholbert] 48 bytes, text/x-phabricator-request		Details \| Review

Andrew Halberstadt [:ahal]

Reporter

Description

•

1 year ago

This is happening on a new Ubuntu 22.04 Wayland enabled pool we're in the process of standing up.

Example failure log:
https://treeherder.mozilla.org/logviewer?job_id=431356063&repo=try&lineNumber=6870

[task 2023-10-04T20:59:03.544Z] 20:59:03     INFO - TEST-START | layout/generic/test/test_scroll_position_restore.html
[task 2023-10-04T20:59:58.467Z] 20:59:58     INFO - GECKO(6317) | [Child 6500, Main Thread] WARNING: flushing shaped-word cache: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFont.cpp:3194
[task 2023-10-04T21:02:03.140Z] 21:02:03     INFO - GECKO(6317) | [Child 6500, Main Thread] WARNING: flushing shaped-word cache: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFont.cpp:3194
[task 2023-10-04T21:03:41.515Z] 21:03:41     INFO - GECKO(6317) | [Child 6500, Main Thread] WARNING: flushing shaped-word cache: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFont.cpp:3194
[task 2023-10-04T21:04:14.727Z] 21:04:14     INFO - TEST-INFO | started process screentopng
[task 2023-10-04T21:04:18.830Z] 21:04:18     INFO - TEST-INFO | screentopng: exit 0
[task 2023-10-04T21:04:18.830Z] 21:04:18     INFO - TEST-UNEXPECTED-FAIL | layout/generic/test/test_scroll_position_restore.html | Test timed out. - 
[task 2023-10-04T21:04:18.830Z] 21:04:18     INFO - TEST-PASS | layout/generic/test/test_scroll_position_restore.html | undefined assertion name 
[task 2023-10-04T21:04:18.830Z] 21:04:18     INFO - GECKO(6317) | MEMORY STAT | vsize 2682MB | residentFast 373MB | heapAllocated 202MB
[task 2023-10-04T21:04:18.832Z] 21:04:18    ERROR - TEST-UNEXPECTED-FAIL | SimpleTest | this test already called finish!
[task 2023-10-04T21:04:18.833Z] 21:04:18     INFO - TEST-UNEXPECTED-ERROR | layout/generic/test/test_scroll_position_restore.html | called finish() multiple times
[task 2023-10-04T21:04:18.834Z] 21:04:18     INFO - TEST-INFO took 312969ms

I believe this is just platform that pushes this test to the edge of the timeout window.

Daniel Holbert [:dholbert]

Comment 1

•

1 year ago

Attached file Bug 1857246: Request longer timeout threshold for mochitest test_scroll_position_restore.html. r?#layout,ahal — Details

This test does take a little while (in part because it loads an extremely-tall
2.6MB HTML helper-file). We're thinking that this is making it overshoot the
default test-timeout value on the new Ubuntu 22.04 Wayland test pool, so let's
just use a longer timeout threshold and see if that's enough to make it
reliably pass.

Phabricator Automation

Updated

•

1 year ago

Assignee: nobody → dholbert

Status: NEW → ASSIGNED

Daniel Holbert [:dholbert]

Comment 2

•

1 year ago

Try: https://treeherder.mozilla.org/jobs?repo=try&revision=45c5cbf2f17bd258c8c733cb967bb2f7f22afa0d

Daniel Holbert [:dholbert]

Comment 3

•

1 year ago

Hmm, it looks like the task itself is close to overstepping the max run time for the task-as-a-whole, and reenabling this test can make that happen as in this orange log from Try:
https://firefoxci.taskcluster-artifacts.net/SAoPT-M1TMe_R18YYfR2jg/0/public/logs/live_backing.log

So we need to further subdivide into additional "buckets" of tests before we can reenable this one, I think.

ahal, do you know how to do that?

Severity: -- → S3

Flags: needinfo?(ahal)

Daniel Holbert [:dholbert]

Comment 4

•

1 year ago

•

Edited

Alternately, maybe we could increase the timeout threshold for the task as a whole?

One thing I'm noticing when looking at that log from comment 3: the tests don't actually get a chance to start running for 13min, so that time is occupied by various testrunner-environment-setup tasks (from timestamp 7:35 - 7:48):

[taskcluster 2023-10-14T07:35:37.397Z] Worker Type (gecko-t/t-linux-vm-2204-wayland) settings:
[...]
[task 2023-10-14T07:48:24.941Z] 07:48:24     INFO - SUITE-START | Running 664 tests
[task 2023-10-14T07:48:24.942Z] 07:48:24     INFO - TEST-START | dom/base/test/test_bug444546.html
[...]

And then the task hits its max run time at timestamp 8:34, about an hour after starting:

[taskcluster:error] Aborting task...
[taskcluster:warn 2023-10-14T08:34:09.055Z] no such process
[taskcluster 2023-10-14T08:34:09.116Z] === Task Finished ===
[taskcluster 2023-10-14T08:34:09.116Z] Task Duration: 58m30.094373522s
[...]
[taskcluster:error] Task aborted - max run time exceeded

i.e the first quarter of the max run time is just environment-setup-overhead.

Maybe we could/should just increase the task "max run time" for this task? (instead of splitting it up)

Daniel Holbert [:dholbert]

Comment 5

•

1 year ago

Hmm, that 13min env-setup-time might have been an anomaly; in the log from comment 0, SUITE-START is only 4min after the start of the log, which is much nicer. Might be something to look into there.)

Daniel Holbert [:dholbert]

Comment 6

•

1 year ago

•

Edited

(In reply to Daniel Holbert [:dholbert] from comment #3)

Hmm, it looks like the task itself is close to overstepping the max run time for the task-as-a-whole,

Another issue: it looks like the requestLongerTimeout(2) in the attached patch isn't actually sufficient, too. The test itself still times out occasionally (after ~10 minutes), as i noticed after ~30 retriggers of this task on Try with the attached patch:
https://treeherder.mozilla.org/logviewer?job_id=432915850&repo=try&lineNumber=12358
https://treeherder.mozilla.org/logviewer?job_id=433268911&repo=try&lineNumber=12538

At this point it's probably best to leave the test disabled on this platform/configuration, if it really takes that long. Though it's also a bit interesting/concerning that it's this timeout-prone on this particular configuration but not on others.

(I guess we do have bug 1786425 filed on this test intermittently timing out on debug builds very occasionally, but this new platform seems to be timing out with a higher frequency (even with 2x the timeout threshold) as compared to the frequency of intermittent timeouts in bug 1786425.)

Assignee: dholbert → nobody

Status: ASSIGNED → NEW

Daniel Holbert [:dholbert]

Updated

•

1 year ago

Comment 7

•

1 year ago

•

Edited

Sorry for the late reply.

The 16 chunks is coming from here:
https://searchfox.org/mozilla-central/rev/e4afef5d3ff67781dc1377912344694f3cf3a226/taskcluster/ci/test/mochitest.yml#141

We could add another line below that like:

linux.*-wayland/debug: 20

But yes, I agree it's strange that doubling the timeout still isn't always enough. Definitely feels like something else is going on.

Flags: needinfo?(ahal)

Daniel Holbert [:dholbert]

Updated

•

1 year ago

Depends on: 1860872

Daniel Holbert [:dholbert]

Comment 8

•

1 year ago

I think my patch in bug 1860872 will fix this more properly, by making the test reliably complete faster.

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Intermittent layout/generic/test/test_scroll_position_restore.html failures on new Ubuntu 22.04 Wayland debug pool

Categories

(Core :: Layout, defect)

Tracking

()

People

(Reporter: ahal, Unassigned)

References

(Depends on 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Updated

Comment 8

Attachment

General

Description

File Name

Content Type