Intermittent gtest timeout during ImageDecoders.AVIFStackCheck
Categories
(Core :: Graphics: ImageLib, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | --- | disabled |
firefox85 | --- | fixed |
People
(Reporter: gbrown, Assigned: bas.schouten)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
Beginning around Oct 27, intermittent gtest timeouts began being reported to bug 1115253. Most (all?) such failures are on Windows 10 debug, and occur during the ImageDecoders.AVIFStackCheck test:
[task 2020-11-03T13:57:52.158Z] 13:57:52 INFO - TEST-START | ImageDecoders.AVIFStackCheck
[task 2020-11-03T14:02:52.220Z] 14:02:52 INFO - gtest INFO | gtest | process wait complete, returncode=572
[task 2020-11-03T14:02:52.220Z] 14:02:52 WARNING - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
Reporter | ||
Comment 1•4 years ago
|
||
https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=windows%2Cgtest&tochange=9cbcab2100578fca1accdfa5eca4360f131d0c5a&fromchange=29b5f82540f0eff1203e41bab6582f599cea233f
suggests that this failure began with bug 1672597. (While there are intermittent gtest failures on earlier revisions, the ImageDecoders.AVIFStackCheck hang is only seen in https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=W0RicLmjSMiDI5VpYaXBww.0&searchStr=windows%2Cgtest&revision=9cbcab2100578fca1accdfa5eca4360f131d0c5a and later.)
:bas.schouten - Any idea?
Updated•4 years ago
|
Reporter | ||
Comment 2•4 years ago
|
||
There was some initial discussion about this issue in bug 1115253; note especially https://bugzilla.mozilla.org/show_bug.cgi?id=1115253#c306.
Comment 3•4 years ago
|
||
Set release status flags based on info from the regressing bug 1672597
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 5•4 years ago
|
||
Andrew, this issue appears to be addressed by the patch here https://treeherder.mozilla.org/jobs?repo=try&revision=cb1eef5a7364e8b6743466c34a69095d770cab0c&selectedTaskRun=b8ltO3y4SxCjUn-ahy8Hnw.0
Apparently we set a non-default size for new thread sizes so I have to manually force the threads to have a larger stack. The old code did this as well. You seemed to believe this code wasn't used in the wild, do you think we should do this?
Comment hidden (Intermittent Failures Robot) |
Comment 7•4 years ago
|
||
Yes it is off by default:
https://searchfox.org/mozilla-central/rev/277ab3925eae21b419167a34624ec0ab518d0c94/image/DecoderFactory.cpp#87
https://searchfox.org/mozilla-central/rev/277ab3925eae21b419167a34624ec0ab518d0c94/modules/libpref/init/StaticPrefList.yaml#5016
But the gtests do force the pref on. You can choose to increase the stack size if the pref is set. I thought most of the work was done in the RDD process these days, so I'm sort of surprised it still has such a high stack requirement for AVIF.
Comment 8•4 years ago
|
||
AVIF doesn't use RDD (yet).
Comment 9•4 years ago
|
||
The severity field is not set for this bug.
:ahal, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 11•3 years ago
|
||
In addition to the failures reported here, there were another 32 failures reported in bug 1115253; it looks like they are all in AVIFStackCheck.
:bas.schouten -- Thanks for figuring this out! What's the plan going forward? If this won't be fixed for a while, can we disable AVIFStackCheck on Windows?
Assignee | ||
Comment 12•3 years ago
|
||
Honestly, I'm willing to land the patch as well despite the memory cost, I think this is up to the ImageLib guys to decide and depends a lot on what the future of the AVIF looks like for us.
Comment 13•3 years ago
|
||
Recent roadmap discussions indicate the intent is to flip it on in nightly "soon," so we will need a solution. What I don't know is if the RDD work will be done before that happens. jbauman knows all things AVIF better than me, so I will defer to him :). I'm fine with signing off on the increased stacks if the RDD work won't block turning it on in nightly -- we don't have much of a choice.
Comment 14•3 years ago
|
||
We don't plan to move AVIF decoding to RDD before enabling it by default (in nightly and release), but we may well do so in the future.
Updated•3 years ago
|
Comment 15•3 years ago
|
||
Can we go ahead and disable bug 1115253 until you have a fix here?
Comment 16•3 years ago
|
||
I'm still a bit unclear as to why this test failure is occurring or why it is (or was?) intermittent, but it does seem clear that the change Bas landed for bug 1672597 had the effect of removing the stack size increase which I made in order to support dav1d decoding in bug 1639409. The point of adding the ImageDecoders.AVIFStackCheck
test was to catch exactly this sort of regression, and considering we're on the cusp of enabling AVIF in release (with dav1d as the default decoder), I think we should put back the stack size increase rather than disable the test. Do you have any objections to that, Bas?
Assignee | ||
Comment 17•3 years ago
|
||
(In reply to Jon Bauman [:jbauman:] from comment #16)
I'm still a bit unclear as to why this test failure is occurring or why it is (or was?) intermittent, but it does seem clear that the change Bas landed for bug 1672597 had the effect of removing the stack size increase which I made in order to support dav1d decoding in bug 1639409. The point of adding the
ImageDecoders.AVIFStackCheck
test was to catch exactly this sort of regression, and considering we're on the cusp of enabling AVIF in release (with dav1d as the default decoder), I think we should put back the stack size increase rather than disable the test. Do you have any objections to that, Bas?
Not at all, it would be a small memory increase now that we don't lower the thread count anymore. I actually already have a patch for this. I thought I attached it to this bug but I guess I didn't. Do I need to do it in every process for now or are we okay just doing it in the RDD process?
Comment 18•3 years ago
|
||
The increased stack size (512K) is needed in any process which uses dav1d. Currently that's RDD for AV1 and content processes for AVIF. We may subsequently move the AVIF decoding into RDD, but that hasn't happened yet, and likely won't change before we enable AVIF by default in release. More specifically, the current AVIF implementation needs the expanded stack size on the image decoding threads, which I implemented here: https://hg.mozilla.org/mozilla-central/rev/301cb0eb1e48#l4.12. Is that clear, or should we chat synchronously?
Assignee | ||
Comment 19•3 years ago
|
||
(In reply to Jon Bauman [:jbauman:] from comment #18)
The increased stack size (512K) is needed in any process which uses dav1d. Currently that's RDD for AV1 and content processes for AVIF. We may subsequently move the AVIF decoding into RDD, but that hasn't happened yet, and likely won't change before we enable AVIF by default in release. More specifically, the current AVIF implementation needs the expanded stack size on the image decoding threads, which I implemented here: https://hg.mozilla.org/mozilla-central/rev/301cb0eb1e48#l4.12. Is that clear, or should we chat synchronously?
It is, there's some memory cost to this as the TaskController threadpool doesn't grow or shrink. But it does only allocate its threads once they're actually used for image decoding, so I suspect it will be okay. I will upload a patch here this week.
Assignee | ||
Comment 20•3 years ago
|
||
Comment 21•3 years ago
|
||
Pushed by bschouten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/de82276dbe0f Manually set the stack size for TaskController pool threads. r=aosmond,smaug
Comment 22•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•