Open Bug 1661368 Opened 4 years ago Updated 2 years ago

Slow testcase impacts AVIF fuzzing performance

Categories

(Core :: Graphics: ImageLib, defect, P4)

Tracking Status
firefox82 --- disabled

People

(Reporter: tsmith, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: leave-open, testcase)

Attachments

(5 files)

Attached image testcase.avif

The attached (relatively small) testcase takes just over 10 seconds to run.

Ideally, for this target we would like to reach tens of iterations per second (or more, if possible) for most inputs, with exceptions such as large images. I think a timeout of 5 seconds per testcase is reasonable (perhaps higher if the slow cases are infrequent); if not, please let me know.
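A rough sketch of the throughput arithmetic behind these targets, assuming one testcase runs at a time with no harness overhead. The helper names and the 10 iterations/s floor (the low end of "tens of iterations per second") are illustrative assumptions, not part of any fuzzing framework:

```python
# Hypothetical helpers illustrating the throughput targets discussed above.
# Assumption: one testcase at a time, no harness or startup overhead.

TIMEOUT_S = 5.0           # proposed per-testcase timeout
TARGET_ITERS_PER_S = 10   # assumed floor for "tens of iterations per second"


def iterations_per_second(runtime_s: float) -> float:
    """Throughput if every input took runtime_s seconds."""
    return 1.0 / runtime_s


def max_runtime_for_target(iters_per_s: float) -> float:
    """Longest per-input runtime (seconds) that still meets the target rate."""
    return 1.0 / iters_per_s


def exceeds_timeout(runtime_s: float, timeout_s: float = TIMEOUT_S) -> bool:
    """True if a single input would be killed by the per-testcase timeout."""
    return runtime_s > timeout_s


if __name__ == "__main__":
    slow = 10.0  # the attached testcase takes just over 10 seconds
    print(f"{iterations_per_second(slow):.2f} iterations/s")  # 0.10 iterations/s
    budget_ms = max_runtime_for_target(TARGET_ITERS_PER_S) * 1000
    print(f"budget per input: {budget_ms:.0f} ms")  # 100 ms
    print("exceeds timeout:", exceeds_timeout(slow))  # True
```

Under these assumptions, a single 10-second input costs as much fuzzing time as roughly 100 inputs that meet the target rate.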

:jbauman, can you comment on this bug?

Flags: needinfo?(jbauman)
Severity: -- → S3

(In reply to Sotaro Ikeda [:sotaro] from comment #1)

:jbauman, can you comment on this bug?

Yes, planning to look at this today.

Assignee: nobody → jbauman
Status: NEW → ASSIGNED
Flags: needinfo?(jbauman)

Adding leave-open, since I want to add some additional logging here.

Keywords: leave-open

Add finer-grained logging to evaluate where decoding is taking too long.

For reference, the attached testcase is this file from the Netflix suite of AVIF test images. It hasn't been modified in any way, so we don't expect any sort of pathological performance issues.

Pushed by abutkovits@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b89f24b3eddd
Slow testcase impacts AVIF fuzzing performance. r=tsmith

This issue is proving somewhat mysterious. Running a non-fuzzing build locally, this image takes about 1 second for dav1d to decode on a debug build and ~0.15 seconds on an optimized build. Given that the automated fuzzing runs with optimization level O2, it's especially confusing why this image would take orders of magnitude longer.

Attached image testcase_2.avif

Here is another one that seems to consistently hang (for about 5 seconds) on my local fuzzing build (44ee384376ce). It seems to load fine in my Nightly opt build.

Attached file hang_2.txt

(In reply to Tyson Smith [:tsmith] from comment #8)

Here is another one that seems to consistently hang (for about 5 seconds) on my local fuzzing build (44ee384376ce). It seems to load fine in my Nightly opt build.

Same here. In a local non-fuzzing, release-optimized build, the entire rendering process takes about 295 ms, but it takes ~10,576 ms in your log above. Could you share the mozconfig you're using for your local fuzzing build?

Flags: needinfo?(twsmith)

While I can't run the actual fuzzing build on my macOS machine, I created a build as close to the fuzzing configuration as I could and got some interesting results.

While the first testcase renders in about 1.6 seconds with this fuzzing-like build (considerably slower than the 0.3 seconds it takes on a normal release-optimized local build), the second testcase takes nearly 10 seconds with the fuzzing-like build, compared to about 0.3 seconds on a normal one. It looks like a particular nsAVIFDecoder::ReadSource call is taking a suspiciously long time to complete. This doesn't really explain the behavior of the first testcase, but it does give me something specific to investigate. I'll look into this more on Monday.
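The slowdown factors implied by these timings can be worked out directly. The figures below are the rounded numbers quoted in this comment (the second case is "nearly" 10 seconds, so its ratio is a slight overestimate), and `slowdown` is a hypothetical helper:

```python
# Approximate timings in seconds, as quoted above:
# (fuzzing-like build, normal release-optimized local build)
timings = {
    "testcase.avif": (1.6, 0.3),
    "testcase_2.avif": (10.0, 0.3),
}


def slowdown(fuzz_s: float, normal_s: float) -> float:
    """How many times slower the fuzzing-like build is than the normal one."""
    return fuzz_s / normal_s


for name, (fuzz_s, normal_s) in timings.items():
    print(f"{name}: {slowdown(fuzz_s, normal_s):.1f}x slower")
# testcase.avif: 5.3x slower
# testcase_2.avif: 33.3x slower
```

The gap between roughly 5x and roughly 33x is what makes the second testcase the more promising one to profile.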

Attached file sample_mozconfig

Looks like you got it, but just for the record, here is the mozconfig.

Flags: needinfo?(twsmith)

The leave-open keyword is there and there has been no activity for 6 months.
:jbauman, maybe it's time to close this bug?

Flags: needinfo?(jbauman)

I'd still intended to come back to track this down. Tyson, is this still impacting our fuzzing, or do you think we should close it?

Flags: needinfo?(jbauman) → needinfo?(twsmith)

(In reply to Jon Bauman [:jbauman:] from comment #15)

I'd still intended to come back to track this down. Tyson, is this still impacting our fuzzing, or do you think we should close it?

I tested the attached test cases with m-c 20210325-2da6d806f457 and here is what I see:

testcase.avif still triggers a significant slowdown.

Running: testcase.avif
Executed testcase.avif in 8049 ms

The slowdown triggered by testcase_2.avif may have been caused by large allocations. Running with ASAN_OPTIONS="max_allocation_size_mb=512", I see:

Running: testcase_2.avif
==358926==WARNING: AddressSanitizer failed to allocate 0x7a8380000 bytes
==358926==WARNING: AddressSanitizer failed to allocate 0x7a8380000 bytes
Executed testcase_2.avif in 75 ms
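For scale, the size in the ASan warning can be decoded and compared with the cap. This sketch assumes max_allocation_size_mb is counted in units of 1 MiB (2**20 bytes). Under that assumption the rejected request is about 61 times the 512 MB cap, which is consistent with (though not proof of) the idea that the earlier slowdown came from attempting allocations of this size:

```python
# Decode the size from the AddressSanitizer warning above and compare it with
# the max_allocation_size_mb=512 cap set via ASAN_OPTIONS.
# Assumption: the cap is measured in MiB (2**20 bytes).

requested_bytes = 0x7A8380000  # "failed to allocate 0x7a8380000 bytes"
cap_mb = 512                   # ASAN_OPTIONS="max_allocation_size_mb=512"

requested_mib = requested_bytes / 2**20
requested_gib = requested_bytes / 2**30

print(f"requested: {requested_bytes} bytes")       # 32887013376 bytes
print(f"= {requested_mib:.1f} MiB")                # = 31363.5 MiB
print(f"= {requested_gib:.1f} GiB")                # = 30.6 GiB
print(f"ratio to cap: {requested_mib / cap_mb:.0f}x")  # 61x
```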
Flags: needinfo?(twsmith)

The leave-open keyword is there and there has been no activity for 6 months.
:jbauman, maybe it's time to close this bug?

Flags: needinfo?(jbauman)

Still worth addressing; I just haven't been able to get to it yet.

Flags: needinfo?(jbauman)

The leave-open keyword is there and there has been no activity for 6 months.
:jbauman, maybe it's time to close this bug?
For more information, please see the auto_nag documentation.

Flags: needinfo?(jbauman)

Leave open; I think this may be a good issue for someone new to the AVIF codebase to investigate as a way to learn.

Flags: needinfo?(jbauman)
Assignee: jbauman → nobody
Status: ASSIGNED → NEW
Priority: -- → P4

This sample is YCbCr, 4:4:4, 12-bit and PQ (PQ means HDR). You really broke support for HDR in the last 5 or 6 versions. BTW, other such PQ samples are here: https://github.com/mpv-player/mpv/issues/10247
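For reference, PQ is the perceptual quantizer transfer function from SMPTE ST 2084 (also used in ITU-R BT.2100), which maps a non-linear signal to absolute luminance of up to 10,000 cd/m². The sketch below implements its EOTF and assumes full-range 12-bit code values for simplicity; real AVIF content is often limited (video) range, which this ignores. It only illustrates the term and is not a diagnosis of the slowdown:

```python
# PQ (SMPTE ST 2084) EOTF: non-linear signal E' in [0, 1] -> luminance in cd/m^2.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_eotf(e: float) -> float:
    """Convert a PQ-encoded signal value in [0, 1] to luminance in cd/m^2."""
    p = e ** (1.0 / M2)
    num = max(p - C1, 0.0)
    den = C2 - C3 * p  # stays positive for e in [0, 1]
    return PEAK_NITS * (num / den) ** (1.0 / M1)


def code_to_signal(code: int, bit_depth: int = 12) -> float:
    """Normalize a code value to [0, 1].

    Assumption: full-range quantization; limited range would need an
    offset and scale before applying the EOTF.
    """
    return code / ((1 << bit_depth) - 1)


if __name__ == "__main__":
    for code in (0, 1024, 2048, 4095):
        print(code, round(pq_eotf(code_to_signal(code)), 1))
```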

The leave-open keyword is there and there has been no activity for 6 months.
:aosmond, maybe it's time to close this bug?
For more information, please see the auto_nag documentation.

Flags: needinfo?(aosmond)