Closed Bug 1061339 Opened 11 years ago Closed 11 years ago

Enable AVX2 in libvpx for VS2012+

Categories

(Firefox Build System :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(firefox38 fixed)

RESOLVED FIXED
mozilla38

People

(Reporter: away, Assigned: m_kato)

Details

Attachments

(2 files, 4 obsolete files)

This was already done for libyuv in bug 1054811. Let's do the same for libvpx. (I don't see any other uses besides those two libs)
I think the patch is very easy.
Attachment #8482628 - Flags: review?(ted)
I think HAVE_X86_AVX2 is enough
Attachment #8482628 - Attachment is obsolete: true
Attachment #8482628 - Flags: review?(ted)
Attachment #8482755 - Flags: review?(ted)
(In reply to xunxun from comment #2)
> Created attachment 8482755 [details] [diff] [review]
> Enable AVX2 in libvpx for VS2012+
>
> I think HAVE_X86_AVX2 is enough

Not enough. You have to regenerate all the headers, such as vp9_rtcd_x86-win32-vs11.h, from the original source code.
- configure in libvpx-1.3.0 can generate vpx_config.* for that platform.
- make after configure can generate vp9_rtcd.h, vp8_rtcd.h and vpx_scale_rtcd.h for that platform.
- these are included via vp8_rtcd.h, vp9_rtcd.h, vpx_config.asm and vpx_config.h.

A vp9_rtcd.h generated for VS2012+ will call the AVX2 code. So without regenerating it, your patch never calls the AVX2 intrinsics.
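As a rough sketch of those steps against an upstream libvpx-1.3.0 checkout (the target name is one example; exact configure flags may differ):

```sh
# In an upstream libvpx-1.3.0 checkout (not the Mozilla in-tree copy):
# configure writes vpx_config.h / vpx_config.asm for the chosen target,
# and make then generates the rtcd headers for that platform.
./configure --target=x86-win32-vs11
make   # produces vp8_rtcd.h, vp9_rtcd.h and vpx_scale_rtcd.h
```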
Also, vpx_rtcd.h isn't used now; that is bug 1061538.
(In reply to Makoto Kato (:m_kato) from comment #4)
> - configure into libvpx-1.3.0 can generate vpx_config.* for that platform.
> - make after configure can generate vp9_rtcd.h, vp8_rtcd.h and
> vpx_scale_rtcd.h for that platform
> - include these by vp8_rtcd.h, vp9_rtcd.h and vpx_config.asm and vpx_config.h
>
> generated vp9_rtcd.h for VS2012+ will call AVX2 code. So your patch never
> calls AVX2's intrinsic.

Can you take the bug? I found that Mozilla's in-tree libvpx 1.3.0 also lacks vp9/encoder/x86/*avx2.c, and since I have no AVX2 CPU, I can't test it.
Attachment #8482755 - Attachment is obsolete: true
Attachment #8482755 - Flags: review?(ted)
(In reply to xunxun from comment #6)
> I found that Mozilla in-tree libvpx 1.3.0 also lack vp9/encoder/x86/*avx2.c,
> and I have no AVX2 CPU, I can't test it.

Those files are upstream-only.
Attached patch Add vs2012+ configuration (obsolete) — Splinter Review
Attachment #8483995 - Flags: review?(giles)
Attachment #8483995 - Flags: review?(giles) → review+
Maybe we should add -arch:AVX2 to the CFLAGS for vp9_loopfilter_intrin_avx2?
Assignee: nobody → m_kato
(In reply to xunxun from comment #9)
> Maybe we should add vp9_loopfilter_intrin_avx2 's CFLAGS -arch:AVX2?

It is unnecessary; VS can compile AVX2 intrinsics without it. gcc requires -mavx2 for them.
Makefile.in changes require build peer review. Please request post-facto review or backout.
Flags: needinfo?(m_kato)
Flags: needinfo?(giles)
OK. I will backout soon.
Flags: needinfo?(m_kato)
Flags: needinfo?(giles)
Comment on attachment 8483995 [details] [diff] [review]
Add vs2012+ configuration

There is no way to detect the Visual Studio version in yasm, so I want to add the VC version as a condition for the yasm assembler.
Attachment #8483995 - Flags: review?(gps)
Also, DEFINES['xxx'] in moz.build does not apply to YASM's flags.
Comment on attachment 8483995 [details] [diff] [review]
Add vs2012+ configuration

Review of attachment 8483995 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the review latency - I was on holiday.
Attachment #8483995 - Flags: review?(gps) → review+
Now that it has review+, will the patch land on m-i/m-c?
The patch can be simplified a lot now that we require minimum VS2013.
This needs a rebase and another review.
Attachment #8483995 - Attachment is obsolete: true
Comment on attachment 8565275 [details] [diff] [review]
Part 1. Always use VS2013 target

The minimum requirement for Windows MSVC builds is VS2013, so we should use x86-win32-vs12 and x86_64-win64-vs12 as targets.
Attachment #8565275 - Flags: review?(giles)
Attachment #8565304 - Attachment is obsolete: true
Comment on attachment 8565275 [details] [diff] [review]
Part 1. Always use VS2013 target

Review of attachment 8565275 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the clean update.
Attachment #8565275 - Flags: review?(giles) → review+
Comment on attachment 8565305 [details] [diff] [review]
Part 2. Build AVX code on all Windows build

Works for me. Adding ted for build peer review.
Attachment #8565305 - Flags: review?(ted)
Attachment #8565305 - Flags: review?(giles)
Attachment #8565305 - Flags: review+
Comment on attachment 8565305 [details] [diff] [review]
Part 2. Build AVX code on all Windows build

Review of attachment 8565305 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libvpx/moz.build
@@ +42,5 @@
>          SOURCES += files['X86-64_ASM']
>
>      # AVX2 only supported on
> +    # Darwin and Windows toolchains right now
> +    if CONFIG['OS_TARGET'] in ('Darwin', 'WINNT'):

You can actually use a set literal for this, not that it makes much difference; that'd be {'Darwin', 'WINNT'}.
Attachment #8565305 - Flags: review?(ted) → review+
To clarify, searching set literals is in general faster than searching tuples, since there's a hash lookup function.
(In reply to Ralph Giles (:rillian) from comment #28)
> To clarify, searching set literals is in general faster than searching
> tuples, since there's a hash lookup function.

That's a very common misconception that should be beaten to death. A hash lookup in a small hash table is generally slower than a linear lookup in a small array. In Python, it's an order of magnitude slower.
(In reply to Mike Hommey [:glandium] from comment #29)
> That's a very common misconception

$ python -m timeit "'hello' in ('hello', 'goodbye')"
10000000 loops, best of 3: 0.035 usec per loop
$ python -m timeit "'hello' in ('goodbye', 'hello')"
10000000 loops, best of 3: 0.045 usec per loop
$ python -m timeit "'hello' in {'hello', 'goodbye'}"
10000000 loops, best of 3: 0.115 usec per loop
$ python -m timeit "'hello' in {'goodbye', 'hello'}"
10000000 loops, best of 3: 0.115 usec per loop

So the tuple is 2-3 times faster in this test. In fact, there seems to be no effective crossover on my system. Worst case with 10000 strings ('benchmark' isn't in /usr/share/dict/words):

$ python -m timeit "'benchmark' in ( $(for word in $(head -10000 /usr/share/dict/words); do echo "'$word', "; done) 'benchmark')"
10000 loops, best of 3: 115 usec per loop
$ python -m timeit "'benchmark' in { $(for word in $(head -10000 /usr/share/dict/words); do echo "'$word', "; done) 'benchmark'}"
1000 loops, best of 3: 418 usec per loop

I guess the overhead of constructing the data structure dominates when there's only a single lookup.

> that should be beaten to death.

Can we discuss best coding practices using less violent metaphors, please?
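The same comparison can be reproduced from the timeit module directly (a sketch; absolute numbers and even the winner vary across Python versions, since newer CPython folds small constant set literals into frozenset constants):

```python
import timeit

# Membership in a small tuple vs. a small set literal, mirroring the
# moz.build check `CONFIG['OS_TARGET'] in ('Darwin', 'WINNT')`.
tuple_time = timeit.timeit("'WINNT' in ('Darwin', 'WINNT')", number=1_000_000)
set_time = timeit.timeit("'WINNT' in {'Darwin', 'WINNT'}", number=1_000_000)

print(f"tuple: {tuple_time:.3f}s  set: {set_time:.3f}s")

# Whatever the timings, both spellings give the same answer.
assert ('WINNT' in ('Darwin', 'WINNT')) == ('WINNT' in {'Darwin', 'WINNT'})
```

Either way, for a two-element collection evaluated once while reading moz.build, the difference is noise; readability is the only real criterion here.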
Greenish on try.
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Product: Core → Firefox Build System