Closed Bug 1061339 - Opened 5 years ago, Closed 5 years ago

Enable AVX2 in libvpx for VS2012+

Categories

(Firefox Build System :: General, defect)

x86_64
Windows 7
defect
Not set

Tracking

(firefox38 fixed)

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: dmajor, Assigned: m_kato)

Details

Attachments

(2 files, 4 obsolete files)

This was already done for libyuv in bug 1054811. Let's do the same for libvpx. (I don't see any other uses besides those two libs)
I think the patch is very easy.
Attachment #8482628 - Flags: review?(ted)
I think HAVE_X86_AVX2 is enough
Attachment #8482628 - Attachment is obsolete: true
Attachment #8482628 - Flags: review?(ted)
Attachment #8482755 - Flags: review?(ted)
(In reply to xunxun from comment #2)
> Created attachment 8482755 [details] [diff] [review]
> Enable AVX2 in libvpx for VS2012+
> 
> I think HAVE_X86_AVX2 is enough

Not enough.  You have to generate all of the headers, such as vp9_rtcd_x86-win32-vs11.h, from the original source code:
- Running configure in libvpx-1.3.0 generates vpx_config.* for that platform.
- Running make after configure generates vp9_rtcd.h, vp8_rtcd.h and vpx_scale_rtcd.h for that platform.
- The build then includes these as vp8_rtcd.h, vp9_rtcd.h, vpx_config.asm and vpx_config.h.

Only the vp9_rtcd.h generated for the VS2012+ targets will call the AVX2 code, so your patch alone never calls the AVX2 intrinsics.
Also, vpx_rtcd.h isn't used at the moment; that is bug 1061538.
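
For reference, the regeneration step is roughly the following (a sketch against an upstream libvpx-1.3.0 checkout; the exact configure flags used for the in-tree headers may differ):

$ cd libvpx-1.3.0
$ ./configure --target=x86-win32-vs11 --disable-examples
  # configure writes vpx_config.h / vpx_config.asm / vpx_config.c for this target
$ make
  # make then generates vp8_rtcd.h, vp9_rtcd.h and vpx_scale_rtcd.h for this target
# repeat with the other targets (e.g. --target=x86_64-win64-vs11) and copy the
# generated files into media/libvpx/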
(In reply to Makoto Kato (:m_kato) from comment #4)
> - Running configure in libvpx-1.3.0 generates vpx_config.* for that platform.
> - Running make after configure generates vp9_rtcd.h, vp8_rtcd.h and
> vpx_scale_rtcd.h for that platform.
> - The build then includes these as vp8_rtcd.h, vp9_rtcd.h, vpx_config.asm
> and vpx_config.h.
> 
> Only the vp9_rtcd.h generated for the VS2012+ targets will call the AVX2
> code, so your patch alone never calls the AVX2 intrinsics.

Can you take the bug?

I found that the Mozilla in-tree libvpx 1.3.0 also lacks vp9/encoder/x86/*avx2.c, and since I have no AVX2-capable CPU, I can't test this.
Attachment #8482755 - Attachment is obsolete: true
Attachment #8482755 - Flags: review?(ted)
(In reply to xunxun from comment #6)

> I found that the Mozilla in-tree libvpx 1.3.0 also lacks
> vp9/encoder/x86/*avx2.c, and since I have no AVX2-capable CPU, I can't test this.

Those files are upstream only.
Attached patch Add vs2012+ configuration (obsolete) — Splinter Review
Attachment #8483995 - Flags: review?(giles)
Attachment #8483995 - Flags: review?(giles) → review+
Maybe we should add -arch:AVX2 to the CFLAGS for vp9_loopfilter_intrin_avx2?
Assignee: nobody → m_kato
(In reply to xunxun from comment #9)
> Maybe we should add -arch:AVX2 to the CFLAGS for vp9_loopfilter_intrin_avx2?

It is unnecessary.  Visual Studio can compile AVX2 intrinsics without that flag; gcc requires -mavx2 for them.
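
For illustration, the flag difference looks roughly like this (a sketch, not the real build invocations; vp9_loopfilter_intrin_avx2.c stands in for any of the AVX2 intrinsic files):

$ cl /c vp9_loopfilter_intrin_avx2.c
  # MSVC: AVX2 intrinsics from <immintrin.h> build without any /arch switch
$ gcc -mavx2 -c vp9_loopfilter_intrin_avx2.c
  # gcc: the ISA has to be enabled explicitly or the intrinsics won't compile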
Makefile.in changes require build peer review. Please request post-facto review or back this out.
Flags: needinfo?(m_kato)
Flags: needinfo?(giles)
OK. I will back it out soon.
Flags: needinfo?(m_kato)
Flags: needinfo?(giles)
Comment on attachment 8483995 [details] [diff] [review]
Add vs2012+ configuration

There is no way to detect the Visual Studio version from within yasm, so I want to pass the VC version to the yasm assembler to use as a condition.
Attachment #8483995 - Flags: review?(gps)
Also, DEFINES['xxx'] in moz.build is not applied to YASM's flags.
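
In other words, the version has to reach the assembler as a pre-defined macro on the yasm command line, something like the following (a sketch; the macro name and file name are placeholders, not what the patch actually uses):

$ yasm -f win32 -DVPX_MSVS_VERSION=11 -I media/libvpx/ some_file.asm
  # yasm cannot detect the compiler version by itself, and moz.build DEFINES[]
  # are not forwarded to the assembler, so the define has to go into the
  # assembler flags explicitly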
Comment on attachment 8483995 [details] [diff] [review]
Add vs2012+ configuration

Review of attachment 8483995 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the review latency - I was on holiday.
Attachment #8483995 - Flags: review?(gps) → review+
Now that it has review+, will the patch be committed to m-i/m-c?
The patch can be simplified a lot now that VS2013 is the minimum requirement.
This needs a rebase and re-review.
Attachment #8483995 - Attachment is obsolete: true
Comment on attachment 8565275 [details] [diff] [review]
Part 1. Always use VS2013 target

The minimum requirement for Windows MSVC builds is Visual Studio 2013, so we should use x86-win32-vs12 and x86_64-win64-vs12 as the targets.
Attachment #8565275 - Flags: review?(giles)
Attachment #8565304 - Attachment is obsolete: true
Comment on attachment 8565275 [details] [diff] [review]
Part 1. Always use VS2013 target

Review of attachment 8565275 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the clean update.
Attachment #8565275 - Flags: review?(giles) → review+
Comment on attachment 8565305 [details] [diff] [review]
Part 2. Build AVX code on all Windows builds

Works for me. Adding ted for build peer review.
Attachment #8565305 - Flags: review?(ted)
Attachment #8565305 - Flags: review?(giles)
Attachment #8565305 - Flags: review+
Comment on attachment 8565305 [details] [diff] [review]
Part 2. Build AVX code on all Windows builds

Review of attachment 8565305 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libvpx/moz.build
@@ +42,5 @@
>          SOURCES += files['X86-64_ASM']
>  
>      # AVX2 only supported on
> +    # Darwin and Windows toolchains right now
> +    if CONFIG['OS_TARGET'] in ('Darwin', 'WINNT'):

You can actually use a set literal for this; not that it makes much difference, but that would be {'Darwin', 'WINNT'}.
Attachment #8565305 - Flags: review?(ted) → review+
To clarify, membership tests on set literals are in general faster than on tuples, since sets use a hash lookup.
(In reply to Ralph Giles (:rillian) from comment #28)
> To clarify, membership tests on set literals are in general faster than on
> tuples, since sets use a hash lookup.

That's a very common misconception that should be beaten to death. A hash lookup in a small hash table is generally slower than a linear lookup in a small array; in Python, it's an order of magnitude slower.
(In reply to Mike Hommey [:glandium] from comment #29)

> That's a very common misconception

$ python -m timeit "'hello' in ('hello', 'goodbye')"
10000000 loops, best of 3: 0.035 usec per loop
$ python -m timeit "'hello' in ('goodbye', 'hello')"
10000000 loops, best of 3: 0.045 usec per loop

$ python -m timeit "'hello' in {'hello', 'goodbye'}"
10000000 loops, best of 3: 0.115 usec per loop
$ python -m timeit "'hello' in {'goodbye', 'hello'}"
10000000 loops, best of 3: 0.115 usec per loop

So the tuple is 2-3 times faster in this test. In fact, there seems to be no effective crossover on my system. Worst case with 10000 strings ('benchmark' isn't in /usr/share/dict/words):

$ python -m timeit "'benchmark' in ( $(for word in $(head -10000 /usr/share/dict/words); do echo "'$word', "; done) 'benchmark')"
10000 loops, best of 3: 115 usec per loop
$ python -m timeit "'benchmark' in { $(for word in $(head -10000 /usr/share/dict/words); do echo "'$word', "; done) 'benchmark'}"
1000 loops, best of 3: 418 usec per loop

I guess the overhead of constructing the data structure dominates when there's only a single lookup.
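
One way to separate the construction cost from the lookup itself would be timeit's -s setup option (commands only; I haven't re-run the numbers):

$ python -m timeit -s "t = ('hello', 'goodbye')" "'hello' in t"
$ python -m timeit -s "s = {'hello', 'goodbye'}" "'hello' in s"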

> that should be beaten to death.

Can we discuss best coding practices using less violent metaphors, please?
Greenish on try.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ba871c6324bf
https://hg.mozilla.org/mozilla-central/rev/8d7af888331b
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Product: Core → Firefox Build System