Closed Bug 1675073 Opened 4 years ago Closed 3 years ago

Intermittent gtest timeout during ImageDecoders.AVIFStackCheck

Categories

(Core :: Graphics: ImageLib, defect, P3)

defect

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox82 --- unaffected
firefox83 --- unaffected
firefox84 --- disabled
firefox85 --- fixed

People

(Reporter: gbrown, Assigned: bas.schouten)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Beginning around Oct 27, intermittent gtest timeouts began being reported to bug 1115253. Most (all?) such failures are on Windows 10 debug, and occur during the ImageDecoders.AVIFStackCheck test:

[task 2020-11-03T13:57:52.158Z] 13:57:52     INFO -  TEST-START | ImageDecoders.AVIFStackCheck
[task 2020-11-03T14:02:52.220Z] 14:02:52     INFO -  gtest INFO | gtest | process wait complete, returncode=572
[task 2020-11-03T14:02:52.220Z] 14:02:52  WARNING -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
Blocks: 1115253
Flags: needinfo?(bas)
Regressed by: 1672597
Has Regression Range: --- → yes

There was some initial discussion about this issue in bug 1115253; note especially https://bugzilla.mozilla.org/show_bug.cgi?id=1115253#c306.

Set release status flags based on info from the regressing bug 1672597

Assignee: nobody → bas
Status: NEW → ASSIGNED
Flags: needinfo?(bas)

Andrew, this issue appears to be addressed by the patch here https://treeherder.mozilla.org/jobs?repo=try&revision=cb1eef5a7364e8b6743466c34a69095d770cab0c&selectedTaskRun=b8ltO3y4SxCjUn-ahy8Hnw.0

Apparently we set a non-default size for new thread sizes so I have to manually force the threads to have a larger stack. The old code did this as well. You seemed to believe this code wasn't used in the wild, do you think we should do this?

Flags: needinfo?(aosmond)

Yes it is off by default:

https://searchfox.org/mozilla-central/rev/277ab3925eae21b419167a34624ec0ab518d0c94/image/DecoderFactory.cpp#87
https://searchfox.org/mozilla-central/rev/277ab3925eae21b419167a34624ec0ab518d0c94/modules/libpref/init/StaticPrefList.yaml#5016

But the gtests do force the pref on. You can choose to increase the stack size if the pref is set. I thought most of the work was done in the RDD process these days, so I'm sort of surprised it still has such a high stack requirement for AVIF.

Flags: needinfo?(aosmond)

AVIF doesn't use RDD (yet).

The severity field is not set for this bug.
:ahal, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahal)
Severity: -- → S3
Component: GTest → ImageLib
Flags: needinfo?(ahal)
Priority: -- → P3
Product: Testing → Core
Version: Default → unspecified

In addition to the failures reported here, there were another 32 failures reported in bug 1115253; it looks like they are all in AVIFStackCheck.

:bas.schouten -- Thanks for figuring this out! What's the plan going forward? If this won't be fixed for a while, can we disable AVIFStackCheck on Windows?

Flags: needinfo?(bas)

Honestly, I'm willing to land the patch as well despite the memory cost, I think this is up to the ImageLib guys to decide and depends a lot on what the future of the AVIF looks like for us.

Flags: needinfo?(bas) → needinfo?(aosmond)

Recent roadmap discussions indicate the intent is to flip it on in nightly "soon," so we will need a solution. What I don't know is if the RDD work will be done before that happens. jbauman knows all things AVIF better than me, so I will defer to him :). I'm fine with signing off on the increased stacks if the RDD work won't block turning it on in nightly -- we don't have much of a choice.

Flags: needinfo?(aosmond) → needinfo?(jbauman)

We don't plan to move AVIF decoding to RDD before enabling it by default (in nightly and release), but we may well do so in the future.

Flags: needinfo?(jbauman)

Can we go ahead and disable bug 1115253 until you have a fix here?

Flags: needinfo?(jbauman)
Flags: needinfo?(aosmond)

I'm still a bit unclear as to why this test failure is occurring or why it is (or was?) intermittent, but it does seem clear that the change Bas landed for bug 1672597 had the effect of removing the stack size increase which I made in order to support dav1d decoding in bug 1639409. The point of adding the ImageDecoders.AVIFStackCheck test was to catch exactly this sort of regression, and considering we're on the cusp of enabling AVIF in release (with dav1d as the default decoder), I think we should put back the stack size increase rather than disable the test. Do you have any objections to that, Bas?

Flags: needinfo?(jbauman) → needinfo?(bas)

(In reply to Jon Bauman [:jbauman:] from comment #16)

I'm still a bit unclear as to why this test failure is occurring or why it is (or was?) intermittent, but it does seem clear that the change Bas landed for bug 1672597 had the effect of removing the stack size increase which I made in order to support dav1d decoding in bug 1639409. The point of adding the ImageDecoders.AVIFStackCheck test was to catch exactly this sort of regression, and considering we're on the cusp of enabling AVIF in release (with dav1d as the default decoder), I think we should put back the stack size increase rather than disable the test. Do you have any objections to that, Bas?

Not at all, it would be a small memory increase now that we don't lower the thread count anymore. I actually already have a patch for this. I thought I attached it to this bug but I guess I didn't. Do I need to do it in every process for now or are we okay just doing it in the RDD process?

Flags: needinfo?(bas)

The increased stack size (512K) is needed in any process which uses dav1d. Currently that's RDD for AV1 and content processes for AVIF. We may subsequently move the AVIF decoding into RDD, but that hasn't happened yet, and likely won't change before we enable AVIF by default in release. More specifically, the current AVIF implementation needs the expanded stack size on the image decoding threads, which I implemented here: https://hg.mozilla.org/mozilla-central/rev/301cb0eb1e48#l4.12. Is that clear, or should we chat synchronously?

Flags: needinfo?(bas)

(In reply to Jon Bauman [:jbauman:] from comment #18)

The increased stack size (512K) is needed in any process which uses dav1d. Currently that's RDD for AV1 and content processes for AVIF. We may subsequently move the AVIF decoding into RDD, but that hasn't happened yet, and likely won't change before we enable AVIF by default in release. More specifically, the current AVIF implementation needs the expanded stack size on the image decoding threads, which I implemented here: https://hg.mozilla.org/mozilla-central/rev/301cb0eb1e48#l4.12. Is that clear, or should we chat synchronously?

It is, there's some memory cost to this as the TaskController threadpool doesn't grow or shrink. But it does only allocate its threads once they're actually used for image decoding, so I suspect it will be okay. I will upload a patch here this week.

Flags: needinfo?(bas)
Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/de82276dbe0f
Manually set the stack size for TaskController pool threads. r=aosmond,smaug
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: