Open Bug 1412889 Opened 4 years ago Updated 4 years ago

av1 SIMD functions are huge on Windows, probably due to __forceinline

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: away, Unassigned)

References

Details

Attachments

(1 file)

Here are the largest functions in a recent win32 nightly:

0	0	350388	1	av1_fht32x32_sse2	
0	0	281652	1	style::properties::LonghandId::parse_value	
0	0	243092	1	av1_fht16x32_sse2	
0	0	228504	1	av1_fht32x16_sse2	
0	0	 93176	1	av1_fht16x16_sse2	
0	0	 80841	1	style::properties::PropertyDeclaration::parse_into	
0	0	 80333	1	style::properties::{{impl}}::to_css<nsstring::nsAString>	
0	0	 69058	1	style::properties::animated_properties::AnimationValue::uncompute	
0	0	 68247	1	style::properties::animated_properties::AnimationValue::from_declaration	
0	0	 67047	1	style::properties::{{impl}}::eq	
0	0	 66583	1	style::properties::{{impl}}::eq	
0	0	 63900	1	av1_iht16x16_256_add_sse2	
0	0	 58704	1	webrender::frame::FlattenContext::flatten_item	
0	0	 57632	1	webrender::frame_builder::FrameBuilder::build	
0	0	 55976	1	av1_fwd_txfm2d_16x16_sse4_1	
0	0	 52276	1	style::properties::animated_properties::{{impl}}::animate	
0	0	 49496	1	av1_fht32x32_avx2	
0	0	 47756	1	av1_inv_txfm2d_add_16x16_sse4_1	
0	0	 43919	1	webrender::renderer::Renderer::new	
0	0	 40476	1	av1_fht8x16_sse2	
0	0	 37396	1	vpx_fdct32x32_sse2	
0	0	 37235	1	nsHtml5AttributeName::initializeStatics	
0	0	 36536	1	style::properties::animated_properties::{{impl}}::fmt	
0	0	 36313	1	style::values::specified::image::{{impl}}::parse	
0	0	 35051	1	vpx_fdct32x32_avx2	
0	0	 34501	1	style::properties::{{impl}}::clone	
0	0	 34437	1	style::properties::{{impl}}::clone	
0	0	 34286	1	style::properties::{{impl}}::substitute_variables::{{closure}}::{{closure}}	
0	0	 33556	1	av1_iht16x32_512_add_sse2	
0	0	 33200	1	av1_iht32x16_512_add_sse2

These av1 functions are huge, something like 1.5MB just for the top few.

Taking a quick look at av1_fht16x16_avx2: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#916

It calls a bunch of functions like load_buffer_16x16 that are marked INLINE: https://dxr.mozilla.org/mozilla-central/source/third_party/aom/av1/encoder/x86/hybrid_fwd_txfm_avx2.c#21

On Windows, aom_config.h (and vpx_config.h) defines INLINE as __forceinline, but on Linux it's merely "inline" which I bet the compiler would rather ignore than create such huge functions. Can we consider doing the same on Windows? Do we really need such intense inlining?
Flags: needinfo?(giles)
Component: Audio/Video → Audio/Video: Playback
Only profiling will tell, but I suspect this is an oversight. We should try flipping the config setting.
Flags: needinfo?(giles)
I also noticed that these functions are super repetitive, which worsens the over-inlining.

For example av1_fht8x16_sse2 goes like

case N:
      load_buffer_8x16(input, in, stride, 0, 0);
      array_transpose_8x8(t, t);
      array_transpose_8x8(b, b);
      { call a function that depends on the case }
      { call a function that depends on the case }
      row_8x16_rounding(in, 2);
      { call a function that depends on the case }
      break;
case N+1
      ...

Could we factor those out into function pointers, and call do this "call seven functions in a row" bit only once?
Attached patch un-forceinlineSplinter Review
This patch removed 0.6MB from xul.dll on win64-pgo and 1.4MB on win32-pgo! (I don't understand why the difference is so dramatic. Also I haven't measured codec speed.)

But it's not suitable for landing, sadly, since this file is meant to be generated from upstream configures.
Assignee: nobody → giles
I notice there's also a very popular SIMD_INLINE define which maps to AOM_FORCE_INLINE which is defined on both compilers. So things could be smaller, but maybe the inline-as-unrolling issue is already handled by that one.
Upstream took the fix, so this should be addressed as soon as we vendor a new update.
Assignee: giles → nobody
Depends on: 1445683
Does the libvpx component of this depend on a separate merge bug?
Flags: needinfo?(giles)
Good point. Yes, it does.
Depends on: 1433158
Flags: needinfo?(giles)
You need to log in before you can comment on or make changes to this bug.