Open
Bug 1412889
Opened 6 years ago
Updated 1 year ago
av1 SIMD functions are huge on Windows, probably due to __forceinline
Categories
(Core :: Audio/Video: Playback, enhancement, P3)
Core
Audio/Video: Playback
Tracking
()
NEW
People
(Reporter: away, Unassigned)
References
Details
Attachments
(1 file)
2.71 KB,
patch
|
Details | Diff | Splinter Review |
Here are the largest functions in a recent win32 nightly: 0 0 350388 1 av1_fht32x32_sse2 0 0 281652 1 style::properties::LonghandId::parse_value 0 0 243092 1 av1_fht16x32_sse2 0 0 228504 1 av1_fht32x16_sse2 0 0 93176 1 av1_fht16x16_sse2 0 0 80841 1 style::properties::PropertyDeclaration::parse_into 0 0 80333 1 style::properties::{{impl}}::to_css<nsstring::nsAString> 0 0 69058 1 style::properties::animated_properties::AnimationValue::uncompute 0 0 68247 1 style::properties::animated_properties::AnimationValue::from_declaration 0 0 67047 1 style::properties::{{impl}}::eq 0 0 66583 1 style::properties::{{impl}}::eq 0 0 63900 1 av1_iht16x16_256_add_sse2 0 0 58704 1 webrender::frame::FlattenContext::flatten_item 0 0 57632 1 webrender::frame_builder::FrameBuilder::build 0 0 55976 1 av1_fwd_txfm2d_16x16_sse4_1 0 0 52276 1 style::properties::animated_properties::{{impl}}::animate 0 0 49496 1 av1_fht32x32_avx2 0 0 47756 1 av1_inv_txfm2d_add_16x16_sse4_1 0 0 43919 1 webrender::renderer::Renderer::new 0 0 40476 1 av1_fht8x16_sse2 0 0 37396 1 vpx_fdct32x32_sse2 0 0 37235 1 nsHtml5AttributeName::initializeStatics 0 0 36536 1 style::properties::animated_properties::{{impl}}::fmt 0 0 36313 1 style::values::specified::image::{{impl}}::parse 0 0 35051 1 vpx_fdct32x32_avx2 0 0 34501 1 style::properties::{{impl}}::clone 0 0 34437 1 style::properties::{{impl}}::clone 0 0 34286 1 style::properties::{{impl}}::substitute_variables::{{closure}}::{{closure}} 0 0 33556 1 av1_iht16x32_512_add_sse2 0 0 33200 1 av1_iht32x16_512_add_sse2 These av1 functions are huge, something like 1.5MB just for the top few. Taking a quick look at av1_fht16x16_avx2: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#916 It calls a bunch of functions like load_buffer_16x16 that are marked INLINE: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#21 On Windows, aom_config.h (and vpx_config.h) defines INLINE as __forceinline, but on Linux it's merely "inline" which I bet the compiler would rather ignore than create such huge functions. Can we consider doing the same on Windows? Do we really need such intense inlining?
Flags: needinfo?(giles)
Comment 1•6 years ago
|
||
Only profiling will tell, but I suspect this is an oversight. We should try flipping the config setting.
Updated•6 years ago
|
Flags: needinfo?(giles)
I also noticed that these functions are super repetitive, which worsens the over-inlining. For example av1_fht8x16_sse2 goes like case N: load_buffer_8x16(input, in, stride, 0, 0); array_transpose_8x8(t, t); array_transpose_8x8(b, b); { call a function that depends on the case } { call a function that depends on the case } row_8x16_rounding(in, 2); { call a function that depends on the case } break; case N+1 ... Could we factor those out into function pointers, and call do this "call seven functions in a row" bit only once?
Updated•6 years ago
|
Priority: -- → P3
This patch removed 0.6MB from xul.dll on win64-pgo and 1.4MB on win32-pgo! (I don't understand why the difference is so dramatic. Also I haven't measured codec speed.) But it's not suitable for landing, sadly, since this file is meant to be generated from upstream configures.
Comment 4•6 years ago
|
||
I notice there's also a very popular SIMD_INLINE define which maps to AOM_FORCE_INLINE which is defined on both compilers. So things could be smaller, but maybe the inline-as-unrolling issue is already handled by that one.
Comment 6•6 years ago
|
||
I filed https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for libvpx
Comment 7•6 years ago
|
||
Upstream took the fix, so this should be addressed as soon as we vendor a new update.
Updated•6 years ago
|
Assignee: giles → nobody
Does the libvpx component of this depend on a separate merge bug?
Flags: needinfo?(giles)
Updated•1 year ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•