Open
Bug 1412889
Opened 8 years ago
Updated 3 years ago
av1 SIMD functions are huge on Windows, probably due to __forceinline
Categories
(Core :: Audio/Video: Playback, enhancement, P3)
Core
Audio/Video: Playback
Tracking
()
NEW
People
(Reporter: away, Unassigned)
References
Details
Attachments
(1 file)
2.71 KB,
patch
|
Details | Diff | Splinter Review |
Here are the largest functions in a recent win32 nightly:
0 0 350388 1 av1_fht32x32_sse2
0 0 281652 1 style::properties::LonghandId::parse_value
0 0 243092 1 av1_fht16x32_sse2
0 0 228504 1 av1_fht32x16_sse2
0 0 93176 1 av1_fht16x16_sse2
0 0 80841 1 style::properties::PropertyDeclaration::parse_into
0 0 80333 1 style::properties::{{impl}}::to_css<nsstring::nsAString>
0 0 69058 1 style::properties::animated_properties::AnimationValue::uncompute
0 0 68247 1 style::properties::animated_properties::AnimationValue::from_declaration
0 0 67047 1 style::properties::{{impl}}::eq
0 0 66583 1 style::properties::{{impl}}::eq
0 0 63900 1 av1_iht16x16_256_add_sse2
0 0 58704 1 webrender::frame::FlattenContext::flatten_item
0 0 57632 1 webrender::frame_builder::FrameBuilder::build
0 0 55976 1 av1_fwd_txfm2d_16x16_sse4_1
0 0 52276 1 style::properties::animated_properties::{{impl}}::animate
0 0 49496 1 av1_fht32x32_avx2
0 0 47756 1 av1_inv_txfm2d_add_16x16_sse4_1
0 0 43919 1 webrender::renderer::Renderer::new
0 0 40476 1 av1_fht8x16_sse2
0 0 37396 1 vpx_fdct32x32_sse2
0 0 37235 1 nsHtml5AttributeName::initializeStatics
0 0 36536 1 style::properties::animated_properties::{{impl}}::fmt
0 0 36313 1 style::values::specified::image::{{impl}}::parse
0 0 35051 1 vpx_fdct32x32_avx2
0 0 34501 1 style::properties::{{impl}}::clone
0 0 34437 1 style::properties::{{impl}}::clone
0 0 34286 1 style::properties::{{impl}}::substitute_variables::{{closure}}::{{closure}}
0 0 33556 1 av1_iht16x32_512_add_sse2
0 0 33200 1 av1_iht32x16_512_add_sse2
These av1 functions are huge, something like 1.5MB just for the top few.
Taking a quick look at av1_fht16x16_avx2: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#916
It calls a bunch of functions like load_buffer_16x16 that are marked INLINE: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#21
On Windows, aom_config.h (and vpx_config.h) defines INLINE as __forceinline, but on Linux it's merely "inline" which I bet the compiler would rather ignore than create such huge functions. Can we consider doing the same on Windows? Do we really need such intense inlining?
Flags: needinfo?(giles)
Comment 1•8 years ago
|
||
Only profiling will tell, but I suspect this is an oversight. We should try flipping the config setting.
Updated•8 years ago
|
Flags: needinfo?(giles)
I also noticed that these functions are super repetitive, which worsens the over-inlining.
For example av1_fht8x16_sse2 goes like
case N:
load_buffer_8x16(input, in, stride, 0, 0);
array_transpose_8x8(t, t);
array_transpose_8x8(b, b);
{ call a function that depends on the case }
{ call a function that depends on the case }
row_8x16_rounding(in, 2);
{ call a function that depends on the case }
break;
case N+1
...
Could we factor those out into function pointers, and call do this "call seven functions in a row" bit only once?
Updated•8 years ago
|
Priority: -- → P3
This patch removed 0.6MB from xul.dll on win64-pgo and 1.4MB on win32-pgo! (I don't understand why the difference is so dramatic. Also I haven't measured codec speed.)
But it's not suitable for landing, sadly, since this file is meant to be generated from upstream configures.
Comment 4•8 years ago
|
||
I notice there's also a very popular SIMD_INLINE define which maps to AOM_FORCE_INLINE which is defined on both compilers. So things could be smaller, but maybe the inline-as-unrolling issue is already handled by that one.
Comment 5•8 years ago
|
||
Comment 6•8 years ago
|
||
I filed https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for libvpx
Comment 7•7 years ago
|
||
Upstream took the fix, so this should be addressed as soon as we vendor a new update.
Updated•7 years ago
|
Assignee: giles → nobody
Does the libvpx component of this depend on a separate merge bug?
Flags: needinfo?(giles)
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•