Optimize libyuv, libvpx, ffvpx et al. for usage on ARM64
Categories
(Core :: Audio/Video, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox68 | --- | fixed |
People
(Reporter: dminor, Assigned: dminor)
References
Details
Attachments
(6 files, 9 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
We need to ensure that all of the third-party media libraries that we rely upon are being built with the appropriate optimizations for ARM64.
For example, in Bug 1494503 we disabled neon for libyuv for arm64 on Windows.
Assignee | ||
Comment 1•6 years ago
|
||
ffvpx was disabled in Bug 1481513.
Assignee | ||
Comment 2•6 years ago
|
||
The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!
Comment 3•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #2)
The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!
I have hints, but probably not the ones you want to hear. :(
armasm64 accepts the same syntax for instructions as the GNU assembler, but its macro directives are completely different and it doesn't invoke the preprocessor by default. For ICU, I worked around this by generating a completely different assembly file than we generate for aarch64-linux. For our low-level xptcall stuff, I copied the aarch64-linux assembly and manually preprocessed the file.
I think ffvpx has an assembly conversion script that is mostly intelligent enough to convert from the GNU-style to the armasm64 style; have you tried that?
Assignee | ||
Comment 4•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #3)
(In reply to Dan Minor [:dminor] from comment #2)
The main problem I'm having with ffvpx at the moment is that we're passing in a bunch of incorrect ASFLAGS when trying to build the aarch64 assembly. It chokes on -DNDEBUG, but it won't be happy with how the -I flags are passed in either. Nathan, I noticed you got this working for ICU, I was wondering if you had any hints for me? Thanks!
I have hints, but probably not the ones you want to hear. :(
armasm64 accepts the same syntax for instructions as the GNU assembler, but its macro directives are completely different and it doesn't invoke the preprocessor by default. For ICU, I worked around this by generating a completely different assembly file than we generate for aarch64-linux. For our low-level xptcall stuff, I copied the aarch64-linux assembly and manually preprocessed the file.
I think ffvpx has an assembly conversion script that is mostly intelligent enough to convert from the GNU-style to the armasm64 style; have you tried that?
I have the conversion script (gas-preprocessor.pl) used by ffmpeg running similarly to what is done here [1]. The problem I'm hitting is that armasm64 is being called with the wrong arguments. I haven't been able to find a way to process the ASFLAGS to filter out the unnecessary arguments and reformat the include paths, so armasm dies before it ever hits the assembly file and I'm not really sure whether or not the preprocessor is working properly :/
Comment 5•6 years ago
|
||
What does the armasm64 invocation look like?
Assignee | ||
Comment 6•6 years ago
|
||
This is what I'm getting:
0:10.48 C:/PROGRA2/MICROS1/2017/COMMUN1/VC/Tools/MSVC/14161.270/bin/HostX64/arm64/armasm64.exe -o float_dsp_neon.obj -DNDEBUG=1 -DTRIMMED=1 -D_USE_MATH_DEFINES -Dinline=__inline -DHAVE_AV_CONFIG_H -DASSERT_LEVEL=1 -Ic:/src/firefox/media/ffvpx -Ic:/src/firefox/media/ffvpx/compat/atomics/win32 -DPIC -DWIN64 -Ic:/src/firefox/media/ffvpx/ -Ic:/src/firefox/media/ffvpx/libavcodec/x86/ -Ic:/src/firefox/media/ffvpx/libavutil/x86/ -c c:/src/firefox/media/ffvpx/libavutil/aarch64/float_dsp_neon.S
0:10.55 Microsoft (R) ARM Macro Assembler Version 14.16.27027.1 for 64 bits
0:10.55 Copyright (C) Microsoft Corporation. All rights reserved.
0:10.55 error A2029: unknown command-line argument or argument value -DNDEBUG=1
0:10.55 Usage: armasm [<options>] sourcefile objectfile
0:10.55 armasm [<options>] -o objectfile sourcefile
0:10.55 armasm -h for help
Most of the defines come from ffvpx moz.build files so I can remove them conditionally if necessary, but I'm not sure where -DNDEBUG gets set. The -I<path> options need to be -i <path> for armasm64 I think. Thanks for having a look.
Comment 7•6 years ago
|
||
-DNDEBUG
and -DTRIMMED
come from MOZ_DEBUG_DEFINES
in build/autoconfig/compiler-opts.m4, and I guess they get added to DEFINES
in python/mozbuild/mozbuild/frontend/context.py. I think that if you make all the ffvpx-related DEFINES
conditional on !aarch64
, -DNDEBUG
and -DTRIMMED
will magically go away; if I understand the context.py correctly, those two defines only get added if there are other defines already present. I think that's how ICU's aarch64 assembly gets away with not having to deal with those flags: there are no DEFINES
present in config/external/icu/data/moz.build for aarch64 windows.
It's possible that just defining ASFLAGS
, as you'll need to do for the -i
arguments, will drag in the magic DEFINES
, and then we'll have to do something else.
Assignee | ||
Comment 8•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #7)
-DNDEBUG
and-DTRIMMED
come fromMOZ_DEBUG_DEFINES
in build/autoconfig/compiler-opts.m4, and I guess they get added toDEFINES
in python/mozbuild/mozbuild/frontend/context.py. I think that if you make all the ffvpx-relatedDEFINES
conditional on!aarch64
,-DNDEBUG
and-DTRIMMED
will magically go away; if I understand the context.py correctly, those two defines only get added if there are other defines already present. I think that's how ICU's aarch64 assembly gets away with not having to deal with those flags: there are noDEFINES
present in config/external/icu/data/moz.build for aarch64 windows.It's possible that just defining
ASFLAGS
, as you'll need to do for the-i
arguments, will drag in the magicDEFINES
, and then we'll have to do something else.
It occurred to me just after I wrote that on Friday that the -i
arguments might not be necessary if the preprocessor step is doing its job properly. So hopefully I can make the ASFLAGS conditional like you suggest, pass the flags in to the preprocessor step instead and things will work :) Thanks for your help on this!
Assignee | ||
Comment 9•6 years ago
|
||
It turns out that gas-preprocessor.pl expects to run the assembler itself rather than just generate output for the assembler. If I modify it slightly to generate an output file that ends with .asm (rather than .s or .S) and then add the generated file to the moz.build file things more or less just work.
I think the least invasive option will be run gas-preprocessor.pl as part of a ffvpx update and check in the generated files. That will avoid having to add a perl dependency to the build as well as whatever changes are required to convince the build system not to pass extra arguments when we run armasm64.
Assignee | ||
Comment 11•6 years ago
|
||
We're using the generic build on arm64 for libaom, but I'm not sure it's worth spending time on it when we're planning to switch to dav1d anyway.
That leaves the following that appear to only use neon for arm, not arm64:
- libspeex_resampler
- libtheora
- libtremor
- openmax_dl
Assignee | ||
Comment 12•6 years ago
|
||
resample_neon.c in libspeex_resampler does not build for aarch64, so I'm going to skip it:
INFO - /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang -std=gnu99 --target=aarch64-linux-android -o resample_neon.o -c -flto=thin -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DNDEBUG=1 -DTRIMMED=1 -DOUTSIDE_SPEEX -DEXPORT= -DFIXED_POINT -D_USE_NEON -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/media/libspeex_resampler/src -I/builds/worker/workspace/build/src/obj-firefox/media/libspeex_resampler/src -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include/aarch64-linux-android -isystem /builds/worker/workspace/build/src/android-ndk/sysroot/usr/include -gcc-toolchain /builds/worker/workspace/build/src/android-ndk/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64 -D__ANDROID_API__=21 -fstack-protector-strong -fno-short-enums -fno-exceptions -fstack-protector-strong -fno-strict-aliasing -ffunction-sections -fdata-sections -fno-math-errno -fPIC -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -Oz -fno-omit-frame-pointer -funwind-tables -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-sign-compare -MD -MP -MF .deps/resample_neon.o.pp /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c
[task 2019-04-10T17:17:43.994Z] 17:17:43 ERROR - /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c:60:12: error: unknown register name 'q0' in asm
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - : "q0");
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - ^
[task 2019-04-10T17:17:43.994Z] 17:17:43 ERROR - /builds/worker/workspace/build/src/media/libspeex_resampler/src/resample_neon.c:119:13: error: unknown register name 'q0' in asm
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - : "cc", "q0",
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - ^
[task 2019-04-10T17:17:43.994Z] 17:17:43 INFO - 2 errors generated
Assignee | ||
Comment 13•6 years ago
|
||
libtremor is unused in win64 builds, and the assembly is not supported by aarch64, so I'm also dropping it.
Assignee | ||
Comment 14•6 years ago
|
||
openmax_dl is limited to arm builds [1]. I think switching FFT implementations (if it is even necessary) for aarch64 is fodder for another bug.
Assignee | ||
Comment 15•6 years ago
|
||
And finally, gas-preprocessor.pl isn't able to handle .s files in libtheora and I think hand converting them is out of scope for this bug. So I'm going to stick with libyuv, libvpx and ffvpx as originally described.
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Comment 18•6 years ago
•
|
||
libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.
EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.
Assignee | ||
Comment 19•6 years ago
|
||
(In reply to Thomas Daede [:TD-Linux] from comment #18)
libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.
libvpx seems to be ok, but with libtheora I get a ton of syntax errors and unrecognized opcodes if I attempt to build the .s files directly with armasm. That is using the lib/arm directory in tree. Perhaps we're missing a lib/aarch64 directory, but I wasn't able to turn one up.
Assignee | ||
Comment 20•6 years ago
|
||
Assignee | ||
Comment 21•6 years ago
|
||
Depends on D27377
Assignee | ||
Comment 22•6 years ago
|
||
Depends on D27378
Assignee | ||
Comment 23•6 years ago
|
||
We will use gas-preprocessor.pl for ffvpx and libdav1d. Having it in tree will
also make it available to Treeherder builds of the OpenH264 plugin.
Depends on D27379
Assignee | ||
Comment 24•6 years ago
|
||
Depends on D27380
Assignee | ||
Comment 25•6 years ago
|
||
We know clang-cl is available on Windows build machines, cpp is not necessarily
present.
Depends on D27381
Assignee | ||
Comment 26•6 years ago
|
||
Depends on D27382
Assignee | ||
Comment 27•6 years ago
|
||
This adds a Python script which automates running gas-preprocessor.pl
for each .S file in the aarch64 directories. The next time we do a
ffvpx update we should consider adding an update.sh file to automate
generating the config files and running this script.
Depends on D27383
Assignee | ||
Comment 28•6 years ago
|
||
Depends on D27384
Assignee | ||
Comment 29•6 years ago
|
||
Alex pointed out that the integrated assembler in clang seems to handle the .S files without preprocessing. I'm going to try a patch to add an option to use the clang assembler in moz.build files and see how it goes.
Assignee | ||
Comment 30•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #29)
Alex pointed out that the integrated assembler in clang seems to handle the .S files without preprocessing. I'm going to try a patch to add an option to use the clang assembler in moz.build files and see how it goes.
That looks like a bit of a rabbit hole. We have very limited aarch64 assembly used in tree at the moment, maybe just icudata.asm and ffi.S, so it probably makes sense to decide whether it is safe to switch to use the integrated clang-cl assembler for everything rather than trying to special case ffvpx and libdav1d.
Alex has a try run here [1] that is green for aarch64 using clang-cl as the assembler.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=00c1379e121b3b0e6976721cb5bf3b81bd89fadf
Assignee | ||
Comment 31•6 years ago
|
||
Nathan, sorry to bother you again, but do you have any thoughts on using armasm64 or the integrated assembler in clang? Thanks!
Comment 32•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #31)
Nathan, sorry to bother you again, but do you have any thoughts on using armasm64 or the integrated assembler in clang? Thanks!
It seems reasonable to me to try and use the integrated assembler in clang-cl
. I am slightly worried about future updates to ffvpx/libdav1d/etc. using syntax that clang-cl
doesn't support, though, and finding ourselves blocked because of that.
Of course, making a change like Alex's would require an all-platforms build, at the very least, as Alex's patch changes build configuration for every platform, not just aarch64 windows. We might have to limit the scope of that change to aarch64 windows (possibly windows generally), since we permit using older versions of clang that may not (?) have the same level of support for gas syntax.
Assignee | ||
Comment 33•6 years ago
|
||
As it turns out libffi has syntax that clang-cl does not support. Alex's patch works because it only changes how .S files are compiled and so still uses armasm64 to build the hand converted win64.asm. So we will end up having to compile some files with clang-cl and some with armasm64.
Assignee | ||
Comment 34•6 years ago
|
||
It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e
Comment 35•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #34)
It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e
FWIW, it looks like the media/ffvpx/libavcodec/aarch64
.asm
files were written for the standard armasm64
assembler, probably run through gas-preprocessor
, whereas the .S
files in that same directory would be parseable by clang-cl
. You will want to make sure you're selecting the correct set of assembly files to hand to whichever assembler you decide to use.
Assignee | ||
Comment 36•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #35)
(In reply to Dan Minor [:dminor] from comment #34)
It also turns out that ffvpx has syntax that clang-cl does not support, see the rough wip at [1]. My initial tests were with libavutil which works, but libavcodec does not. I'm going to stick with gas-preprocessor.pl for this bug.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4ef786e57142936edc4668f95dca9567a59b78e
FWIW, it looks like the
media/ffvpx/libavcodec/aarch64
.asm
files were written for the standardarmasm64
assembler, probably run throughgas-preprocessor
, whereas the.S
files in that same directory would be parseable byclang-cl
. You will want to make sure you're selecting the correct set of assembly files to hand to whichever assembler you decide to use.
Yes, you're right, I messed that up, I missed switching over to float_dsp_neon.S when I updated the moz.build file. I'll spin up another try job. Thank you for pointing that out!
Assignee | ||
Comment 37•6 years ago
|
||
Green try run for aarch64 windows: https://treeherder.mozilla.org/#/jobs?repo=try&revision=533c81db07d40e60b90a7bc798d1bad801a03fd7
Assignee | ||
Comment 38•6 years ago
|
||
Some media libraries use gas syntax in their assembly files. Rather than
converting these arm assembly syntax files for aarch64, we can use clang-cl
to build them directly.
Assignee | ||
Comment 39•6 years ago
|
||
Depends on D27785
Assignee | ||
Comment 40•6 years ago
|
||
Depends on D27786
Assignee | ||
Comment 41•6 years ago
|
||
Depends on D27787
Assignee | ||
Comment 42•6 years ago
|
||
Depends on D27788
Assignee | ||
Comment 43•6 years ago
|
||
Depends on D27789
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 44•6 years ago
|
||
Sorry, it looks like the commit mapping got dropped along the way. I've just obsoleted the old revisions. Sigh.
Assignee | ||
Comment 45•6 years ago
|
||
Comment 46•6 years ago
|
||
(In reply to Thomas Daede [:TD-Linux] from comment #18)
libvpx and libtheora already have their own variants of gas-preprocessor.pl in the mozilla-central tree (ads2gas.pl for libvpx, arm2gnu.pl for libtheora), and custom Makefile.in files that call the perl scripts. For these it might be a matter of just enabling them for win64.EDIT: totally wrong, those perl scripts do the exact opposite. Sorry for the confusion. Those projects' asm files can be shoved directly into armasm.
Even if e.g. libvpx carries arm assembly files in "armasm" format originally (which is converted to GAS syntax into *.s for assembling on e.g. linux), that armasm format is for the official ARM armasm tool. Microsoft's armasm tool is close but not exactly the same, and e.g. for libvpx for arm32, there's the issue that it also needs converting from arm to thumb form. In libvpx, this is done with the build/make/ads2armasm_ms.pl script though (which can be used instead of ads2gas.pl which is otherwise normally used). Converting first to gas form and then from that to armasm form with gas-preprocessor is unknown if it works. Although this isn't an issue for win/arm64 because libvpx has no standalone assembly for that, only intrinsics.
Even if gas-preprocessor looks like a generic utility, in practice it kind of requires the codebase to be cleaned up a little and prepared for gas-preprocessor to run on it. libav/ffmpeg, openh264 and dav1d all build fine with gas-preprocessor though. But assembling them directly with clang is of course way better, as gas-preprocessor is just a hack really.
(In reply to Nathan Froyd [:froydnj] from comment #32)
It seems reasonable to me to try and use the integrated assembler in
clang-cl
. I am slightly worried about future updates to ffvpx/libdav1d/etc. using syntax thatclang-cl
doesn't support, though, and finding ourselves blocked because of that.
You needn't worry about that. dav1d is continuously tested with assembling with clang (and outside of the official CI, I run nightly build tests with clang for win/arm64 and gas-preprocessor+armasm64), so it really shouldn't break wrt that.
In general, since clang 5.0, clang/llvm's built in assembler supports all the major gas features. There are certain cases where it's more strict than GNU binutils gas though (especially when targeting ios, it interprets macros more strictly, requiring commas between parameters where it otherwise is optional), but for well written code it shouldn't be an issue.
Some optional features, like the .func directive (which only is for debug info afaik) is missing, but as long as config.h and similar files are updated according to what a real run of configure/meson with such tools would generate, it should be fine.
Comment 47•6 years ago
|
||
Also, fwiw, for assembling, if it makes any difference, you can assemble just as well with the traditional clang
-named tool instead of clang-cl
. (I just noticed that some patch talked about "USE_INTEGRATED_CLANGCL_AS", even though the same goes just as well to plain clang, not only clang-cl.) But maybe windows is the only case where you otherwise have a separate tool for some assembly files but not all, and thus naming it based on CLANGCL is fine.
Assignee | ||
Comment 48•6 years ago
|
||
(In reply to Martin Storsjö from comment #47)
Also, fwiw, for assembling, if it makes any difference, you can assemble just as well with the traditional
clang
-named tool instead ofclang-cl
. (I just noticed that some patch talked about "USE_INTEGRATED_CLANGCL_AS", even though the same goes just as well to plain clang, not only clang-cl.) But maybe windows is the only case where you otherwise have a separate tool for some assembly files but not all, and thus naming it based on CLANGCL is fine.
I struggled with the naming a bit, we only really need it on Windows, so I named it CLANGCL, but I'll go with whatever my reviewer suggests.
Updated•6 years ago
|
Assignee | ||
Comment 49•6 years ago
|
||
:jya, could you have another look at https://phabricator.services.mozilla.com/D27789 please?
Updated•6 years ago
|
Comment 50•6 years ago
|
||
Comment 51•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f1bc4fcd152e
https://hg.mozilla.org/mozilla-central/rev/cd666f0befca
https://hg.mozilla.org/mozilla-central/rev/a338bdcb894f
https://hg.mozilla.org/mozilla-central/rev/f40ae51578ac
https://hg.mozilla.org/mozilla-central/rev/742b7c0a4bdb
https://hg.mozilla.org/mozilla-central/rev/d000d40067de
Description
•