Optimize libyuv, libvpx, ffvpx et al. for usage on ARM64

RESOLVED FIXED in Firefox 68

Status

()

enhancement
P2
normal
RESOLVED FIXED
4 months ago
3 months ago

People

(Reporter: dminor, Assigned: dminor)

Tracking

67 Branch
mozilla68
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox68 fixed)

Details

Attachments

(6 attachments, 9 obsolete attachments)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

We need to ensure that all of the third-party media libraries that we rely upon are being built with the appropriate optimizations for ARM64.

For example, in Bug 1494503 we disabled neon for libyuv for arm64 on Windows.

ffvpx was disabled in Bug 1481513.

The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!

Flags: needinfo?(nfroyd)

(In reply to Dan Minor [:dminor] from comment #2)

The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!

I have hints, but probably not the ones you want to hear. :(

armasm64 accepts the same syntax for instructions as the GNU assembler, but its macro directives are completely different and it doesn't invoke the preprocessor by default. For ICU, I worked around this by generating a completely different assembly file than we generate for aarch64-linux. For our low-level xptcall stuff, I copied the aarch64-linux assembly and manually preprocessed the file.

I think ffvpx has an assembly conversion script that is mostly intelligent enough to convert from the GNU-style to the armasm64 style; have you tried that?

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #3)

(In reply to Dan Minor [:dminor] from comment #2)

The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!

I have hints, but probably not the ones you want to hear. :(

armasm64 accepts the same syntax for instructions as the GNU assembler, but its macro directives are completely different and it doesn't invoke the preprocessor by default. For ICU, I worked around this by generating a completely different assembly file than we generate for aarch64-linux. For our low-level xptcall stuff, I copied the aarch64-linux assembly and manually preprocessed the file.

I think ffvpx has an assembly conversion script that is mostly intelligent enough to convert from the GNU-style to the armasm64 style; have you tried that?

I have the conversion script (gas-preprocessor.pl) used by ffmpeg running similarly to what is done here [1]. The problem I'm hitting is that armasm64 is being called with the wrong arguments. I haven't been able to find a way to process the ASFLAGS to filter out the unnecessary arguments and reformat the include paths, so armasm dies before it ever hits the assembly file and I'm not really sure whether or not the preprocessor is working properly :/

[1] https://searchfox.org/mozilla-central/rev/8e0ea968308dd692ba5394b14e584234acf7d4ca/config/external/ffi/moz.build#94

What does the armasm64 invocation look like?

This is what I'm getting:

0:10.48 C:/PROGRA2/MICROS1/2017/COMMUN1/VC/Tools/MSVC/14161.270/bin/HostX64/arm64/armasm64.exe -o float_dsp_neon.obj -DNDEBUG=1 -DTRIMMED=1 -D_USE_MATH_DEFINES -Dinline=__inline -DHAVE_AV_CONFIG_H -DASSERT_LEVEL=1 -Ic:/src/firefox/media/ffvpx -Ic:/src/firefox/media/ffvpx/compat/atomics/win32 -DPIC -DWIN64 -Ic:/src/firefox/media/ffvpx/ -Ic:/src/firefox/media/ffvpx/libavcodec/x86/ -Ic:/src/firefox/media/ffvpx/libavutil/x86/ -c c:/src/firefox/media/ffvpx/libavutil/aarch64/float_dsp_neon.S
0:10.55 Microsoft (R) ARM Macro Assembler Version 14.16.27027.1 for 64 bits
0:10.55 Copyright (C) Microsoft Corporation. All rights reserved.
0:10.55 error A2029: unknown command-line argument or argument value -DNDEBUG=1
0:10.55 Usage: armasm [<options>] sourcefile objectfile
0:10.55 armasm [<options>] -o objectfile sourcefile
0:10.55 armasm -h for help

Most of the defines come from ffvpx moz.build files so I can remove them conditionally if necessary, but I'm not sure where -DNDEBUG gets set. The -I<path> options need to be -i <path> for armasm64 I think. Thanks for having a look.

-DNDEBUG and -DTRIMMED come from MOZ_DEBUG_DEFINES in build/autoconfig/compiler-opts.m4, and I guess they get added to DEFINES in python/mozbuild/mozbuild/frontend/context.py. I think that if you make all the ffvpx-related DEFINES conditional on !aarch64, -DNDEBUG and -DTRIMMED will magically go away; if I understand the context.py correctly, those two defines only get added if there are other defines already present. I think that's how ICU's aarch64 assembly gets away with not having to deal with those flags: there are no DEFINES present in config/external/icu/data/moz.build for aarch64 windows.

It's possible that just defining ASFLAGS, as you'll need to do for the -i arguments, will drag in the magic DEFINES, and then we'll have to do something else.

(In reply to Nathan Froyd [:froydnj] from comment #7)

-DNDEBUG and -DTRIMMED come from MOZ_DEBUG_DEFINES in build/autoconfig/compiler-opts.m4, and I guess they get added to DEFINES in python/mozbuild/mozbuild/frontend/context.py. I think that if you make all the ffvpx-related DEFINES conditional on !aarch64, -DNDEBUG and -DTRIMMED will magically go away; if I understand the context.py correctly, those two defines only get added if there are other defines already present. I think that's how ICU's aarch64 assembly gets away with not having to deal with those flags: there are no DEFINES present in config/external/icu/data/moz.build for aarch64 windows.

It's possible that just defining ASFLAGS, as you'll need to do for the -i arguments, will drag in the magic DEFINES, and then we'll have to do something else.

It occurred to me just after I wrote that on Friday that the -i arguments might not be necessary if the preprocessor step is doing its job properly. So hopefully I can make the ASFLAGS conditional like you suggest, pass the flags in to the preprocessor step instead and things will work :) Thanks for your help on this!

It turns out that gas-preprocessor.pl expects to run the assembler itself rather than just generate output for the assembler. If I modify it slightly to generate an output file that ends with .asm (rather than .s or .S) and then add the generated file to the moz.build file things more or less just work.

I think the least invasive option will be run gas-preprocessor.pl as part of a ffvpx update and check in the generated files. That will avoid having to add a perl dependency to the build as well as whatever changes are required to convince the build system not to pass extra arguments when we run armasm64.

dav1d is being worked on over in Bug 1540124.

See Also: → 1540124

We're using the generic build on arm64 for libaom, but I'm not sure it's worth spending time on it when we're planning to switch to dav1d anyway.

That leaves the following that appear to only use neon for arm, not arm64:

  • libspeex_resampler
  • libtheora
  • libtremor
  • openmax_dl

resample_neon.c in libspeex_resampler does not build for aarch64, so I'm going to skip it:

INFO - /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang -std=gnu99 --target=aarch64-linux-android -o resample_neon.o -c -flto=thin -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DNDEBUG=1 -DTRIMMED=1 -DOUTSIDE_SPEEX -DEXPORT= -DFIXED_POINT -D_USE_NEON -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/media/libspeex_resampler/src -I/builds/worker/workspace/build/src/obj-firefox/media/libspeex_resampler/src -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include/aarch64-linux-android -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include -gcc-toolchain /builds/worker/workspace/build/src/android-ndk/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64 -D__ANDROID_API__=21 -fstack-protector-strong -fno-short-enums -fno-exceptions -fstack-protector-strong -fno-strict-aliasing -ffunction-sections -fdata-sections -fno-math-errno -fPIC -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -Oz -fno-omit-frame-pointer -funwind-tables -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-sign-compare -MD -MP -MF .deps/resample_neon.o.pp /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c
[task 2019-04-10T17:17:43.994Z] 17:17:43 ERROR - /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c:60:12: error: unknown register name 'q0' in asm
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - : "q0");
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - ^
[task 2019-04-10T17:17:43.994Z] 17:17:43 ERROR - /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c:119:13: error: unknown register name 'q0' in asm
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - : "cc", "q0",
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - ^
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - 2 errors generated

libtremor is unused in win64 builds, and the assembly is not supported by aarch64, so I'm also dropping it.

openmax_dl is limited to arm builds [1]. I think switching FFT implementations (if it is even necessary) for aarch64 is fodder for another bug.

https://searchfox.org/mozilla-central/rev/dd7e27f4a805e4115d0dbee70e1220b23b23c567/config/external/moz.build#49

And finally, gas-preprocessor.pl isn't able to handle .s files in libtheora and I think hand converting them is out of scope for this bug. So I'm going to stick with libyuv, libvpx and ffvpx as originally described.

libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.

EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.

(In reply to Thomas Daede [:TD-Linux] from comment #18)

libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.

EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.

libvpx seems to be ok, but with libtheora I get a ton of syntax errors and unrecognized opcodes if I attempt to build the .s files directly with armasm. That is using the lib/arm directory in tree. Perhaps we're missing a lib/aarch64 directory, but I wasn't able to turn one up.

Depends on D27377

We will use gas-preprocessor.pl for ffvpx and libdav1d. Having it in tree will
also make it available to Treeherder builds of the OpenH264 plugin.

Depends on D27379

Depends on D27380

We know clang-cl is available on Windows build machines, cpp is not necessarily
present.

Depends on D27381

Depends on D27382

This adds a Python script which automates running gas-preprocessor.pl
for each .S file in the aarch64 directories. The next time we do a
ffvpx update we should consider adding an update.sh file to automate
generating the config files and running this script.

Depends on D27383

Alex pointed out that the integrated assembler in clang seems to handle the .S files without preprocessing. I'm going to try a patch to add an option to use the clang assembler in moz.build files and see how it goes.

(In reply to Dan Minor [:dminor] from comment #29)

Alex pointed out that the integrated assembler in clang seems to handle the .S files without preprocessing. I'm going to try a patch to add an option to use the clang assembler in moz.build files and see how it goes.

That looks like a bit of a rabbit hole. We have very limited aarch64 assembly used in tree at the moment, maybe just icudata.asm and ffi.S, so it probably makes sense to decide whether it is safe to switch to use the integrated clang-cl assembler for everything rather than trying to special case ffvpx and libdav1d.

Alex has a try run here [1] that is green for aarch64 using clang-cl as the assembler.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=00c1379e121b3b0e6976721cb5bf3b81bd89fadf

Nathan, sorry to bother you again, but do you have any thoughts on using armasm64 or the integrated assembler in clang? Thanks!

Flags: needinfo?(nfroyd)

(In reply to Dan Minor [:dminor] from comment #31)

Nathan, sorry to bother you again, but do you have any thoughts on using armasm64 or the integrated assembler in clang? Thanks!

It seems reasonable to me to try and use the integrated assembler in clang-cl. I am slightly worried about future updates to ffvpx/libdav1d/etc. using syntax that clang-cl doesn't support, though, and finding ourselves blocked because of that.

Of course, making a change like Alex's would require an all-platforms build, at the very least, as Alex's patch changes build configuration for every platform, not just aarch64 windows. We might have to limit the scope of that change to aarch64 windows (possibly windows generally), since we permit using older versions of clang that may not (?) have the same level of support for gas syntax.

Flags: needinfo?(nfroyd)

As it turns out libffi has syntax that clang-cl does not support. Alex's patch works because it only changes how .S files are compiled and so still uses armasm64 to build the hand converted win64.asm. So we will end up having to compile some files with clang-cl and some with armasm64.

It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e

(In reply to Dan Minor [:dminor] from comment #34)

It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e

FWIW, it looks like the media/ffvpx/libavcodec/aarch64 .asm files were written for the standard armasm64 assembler, probably run through gas-preprocessor, whereas the .S files in that same directory would be parseable by clang-cl. You will want to make sure you're selecting the correct set of assembly files to hand to whichever assembler you decide to use.

(In reply to Nathan Froyd [:froydnj] from comment #35)

(In reply to Dan Minor [:dminor] from comment #34)

It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e

FWIW, it looks like the media/ffvpx/libavcodec/aarch64 .asm files were written for the standard armasm64 assembler, probably run through gas-preprocessor, whereas the .S files in that same directory would be parseable by clang-cl. You will want to make sure you're selecting the correct set of assembly files to hand to whichever assembler you decide to use.

Yes, you're right, I messed that up, I missed switching over to float_dsp_neon.S when I updated the moz.build file. I'll spin up another try job. Thank you for pointing that out!

Some media libraries use gas syntax in their assembly files. Rather than
converting these arm assembly syntax files for aarch64, we can use clang-cl
to build them directly.

Depends on D27785

Depends on D27786

Attachment #9058726 - Attachment description: Bug 1540760 - Make it possible to clang-cl as an assembler; r=froydnj! → Bug 1540760 - Make it possible to use clang-cl as an assembler; r=froydnj!
Attachment #9058006 - Attachment is obsolete: true
Attachment #9058005 - Attachment is obsolete: true
Attachment #9058004 - Attachment is obsolete: true
Attachment #9058003 - Attachment is obsolete: true
Attachment #9058002 - Attachment is obsolete: true
Attachment #9058001 - Attachment is obsolete: true
Attachment #9058000 - Attachment is obsolete: true
Attachment #9057999 - Attachment is obsolete: true
Attachment #9057998 - Attachment is obsolete: true

Sorry, it looks like the commit mapping got dropped along the way. I've just obsoleted the old revisions. Sigh.

(In reply to Thomas Daede [:TD-Linux] from comment #18)

libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.

EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.

Even if e.g. libvpx carries arm assembly files in "armasm" format originally (which is converted to GAS syntax into *.s for assembling on e.g. linux), that armasm format is for the official ARM armasm tool. Microsoft's armasm tool is close but not exactly the same, and e.g. for libvpx for arm32, there's the issue that it also needs converting from arm to thumb form. In libvpx, this is done with the build/make/ads2armasm_ms.pl script though (which can be used instead of ads2gas.pl which is otherwise normally used). Converting first to gas form and then from that to armasm form with gas-preprocessor is unknown if it works. Although this isn't an issue for win/arm64 because libvpx has no standalone assembly for that, only intrinsics.

Even if gas-preprocessor looks like a generic utility, in practice it kind of requires the codebase to be cleaned up a little and prepared for gas-preprocessor to run on it. libav/ffmpeg, openh264 and dav1d all build fine with gas-preprocessor though. But assembling them directly with clang is of course way better, as gas-preprocessor is just a hack really.

(In reply to Nathan Froyd [:froydnj] from comment #32)

It seems reasonable to me to try and use the integrated assembler in clang-cl. I am slightly worried about future updates to ffvpx/libdav1d/etc. using syntax that clang-cl doesn't support, though, and finding ourselves blocked because of that.

You needn't worry about that. dav1d is continuously tested with assembling with clang (and outside of the official CI, I run nightly build tests with clang for win/arm64 and gas-preprocessor+armasm64), so it really shouldn't break wrt that.

In general, since clang 5.0, clang/llvm's built in assembler supports all the major gas features. There are certain cases where it's more strict than GNU binutils gas though (especially when targeting ios, it interprets macros more strictly, requiring commas between parameters where it otherwise is optional), but for well written code it shouldn't be an issue.

Some optional features, like the .func directive (which only is for debug info afaik) is missing, but as long as config.h and similar files are updated according to what a real run of configure/meson with such tools would generate, it should be fine.

Also, fwiw, for assembling, if it makes any difference, you can assemble just as well with the traditional clang-named tool instead of clang-cl. (I just noticed that some patch talked about "USE_INTEGRATED_CLANGCL_AS", even though the same goes just as well to plain clang, not only clang-cl.) But maybe windows is the only case where you otherwise have a separate tool for some assembly files but not all, and thus naming it based on CLANGCL is fine.

(In reply to Martin Storsjö from comment #47)

Also, fwiw, for assembling, if it makes any difference, you can assemble just as well with the traditional clang-named tool instead of clang-cl. (I just noticed that some patch talked about "USE_INTEGRATED_CLANGCL_AS", even though the same goes just as well to plain clang, not only clang-cl.) But maybe windows is the only case where you otherwise have a separate tool for some assembly files but not all, and thus naming it based on CLANGCL is fine.

I struggled with the naming a bit, we only really need it on Windows, so I named it CLANGCL, but I'll go with whatever my reviewer suggests.

Attachment #9058726 - Attachment description: Bug 1540760 - Make it possible to use clang-cl as an assembler; r=froydnj! → Bug 1540760 - Make it possible to use clang-cl as an assembler; r=mshal!

:jya, could you have another look at https://phabricator.services.mozilla.com/D27789 please?

Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard)
Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f1bc4fcd152e
Make it possible to use clang-cl as an assembler; r=firefox-build-system-reviewers,mshal
https://hg.mozilla.org/integration/autoland/rev/cd666f0befca
Enable neon for libyuv for aarch64; r=jya
https://hg.mozilla.org/integration/autoland/rev/a338bdcb894f
Use arm sources for libvpx; r=jya
https://hg.mozilla.org/integration/autoland/rev/f40ae51578ac
Rerun generate_sources_mozbuild.sh for arm64 windows; r=jya
https://hg.mozilla.org/integration/autoland/rev/742b7c0a4bdb
Add missing aarch64 files for ffvpx; r=jya
https://hg.mozilla.org/integration/autoland/rev/d000d40067de
Build system changes for aarch64-win64 support in ffvpx; r=jya
You need to log in before you can comment on or make changes to this bug.