Closed Bug 498770 Opened 11 years ago Closed 11 years ago

Enable optimized Theora code in Windows builds

Categories

(Core :: Audio/Video, defect)

x86
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: kinetik, Assigned: ds)

References

Details

Attachments

(1 file, 2 obsolete files)

We were forced to disable the assembly optimized Theora code on Windows due to crashes, but that should only be a temporary fix.  I can't find a bug to get the root cause fixed and enable this code again, so I'm filing a new one.

This is really a bug in the upstream code, but we should track this and work towards getting a fix because it's important for playback performance.

There's a bunch of analysis and context in bug 474937.
There's documentation about the preservation of registers inside inline asm blocks here: http://msdn.microsoft.com/en-us/library/k1a8ss06%28VS.80%29.aspx

In general, ebx should be saved and restored by the compiler... but there's a note about use of ebx when FPO is not used:

> If the compiler does not perform FPO, it will use EBX and EBP. To ensure code runs correctly, do not modify EBX in asm code if the function requires dynamic stack alignment as it could modify the frame pointer.

I'm not sure if this case applies here, though.
For what hearsay is worth, using EBX inappropriately is my best understanding of what's wrong with the 1.0 msvc assembly. However, we (upstream) don't have anyone with domain expertise to develop the fix.

Note that a work-around is to link to a version of the theora library
cross-compiled with gcc. For example, those available from
http://people.xiph.org/~tterribe/theora-win32-builds/ This would provide temporary relief of the performance complaints, which we've been hearing a lot of lately.

We have a volunteer working upstream on the msvc assembly toward the 1.1
release, but that may not be ready for some time. This is a separate issue from fixing the 1.0 msvc assembly; the code has changed significantly on that branch.
There's more information here: http://msdn.microsoft.com/en-us/library/aa290049.aspx

...however, it's slightly conflicting with the earlier link.  In any case, it does seem like avoiding use of ebx (or spilling and restoring it manually) may solve this problem.

I'm busy with other bugs for now, and I'm no expert in this area, but I can take a crack at fixing it later if nobody else beats me to it.  However, if Theora 1.1 is going to be released before we do another Firefox release, it may not be worth spending time fixing this in Theora 1.0.
I think, if PGO is disable on theora using NO_PROFILE_GUIDED_OPTIMIZE=1 in Makefile, this may be able to fix it.  This hack is used on sqlite3.
Attached patch patch (obsolete) — Splinter Review
This is an untested patch to save ebx on the stack and/or use a different register.
Attached patch patch (obsolete) — Splinter Review
Oops.  Forgot to remove some irrelevant cruft in my tree in the first patch.
Attachment #383594 - Attachment is obsolete: true
Thanks Makoto and David!  I'm in the processing of testing now.  It'll take a while to get the results, as PGO builds on my little Windows machine will take a good part of day.  I'll post results when I have them.
With David's patch applied, I noticed the following warnings:

libtheora\lib\dec\x86_vc\mmxstate.c(144) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(144) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(156) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(156) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(156) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(160) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code
libtheora\lib\dec\x86_vc\mmxstate.c(160) : warning C4731: 'oc_state_frag_recon_mmx' : frame pointer register 'ebx' modified by inline assembly code

I'm not sure if they occurred in the earlier PGO build I ran (to make sure I could reproduce the crash), since the scrollback log doesn't go back far enough.  The patched build has only just started the LTCG phase, so I'll test it tomorrow morning.
We do always get warnings like the ones above when the optimized assembly is built, so it's not new with David's patch.

The patch fixes the crashes.  Videos that I'd crash on (in the MMX code) immediately play fine with the patch applied.

Mochitests past, but a couple of reftests fail.  I'll investigate tomorrow.
Oops, the reftests are just fine.  They failed because I was running them against a build that also included an unrelated patch to disable EXTEND_PAD.

We should land this on trunk.
Attached patch FixSplinter Review
This is David's patch packaged up with README_MOZILLA changes and fixes to makefile to enable it.
Attachment #383595 - Attachment is obsolete: true
Pushed to mozilla-central:
http://hg.mozilla.org/mozilla-central/rev/240cad5e94b6
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attachment #384046 - Flags: review+
(In reply to comment #4)
> I think, if PGO is disable on theora using NO_PROFILE_GUIDED_OPTIMIZE=1 in
> Makefile, this may be able to fix it.  This hack is used on sqlite3.

For clarification note that "FPO" and "PGO" are not the same thing. "FPO" is "Frame pointer optimization", making use of ebp as a general purpose register. "PGO" is "Profile guided optimization", running an instrumented version of the binary to generate information about what codepaths are taken, and then relinking the binaries using that information for further optimization.

We use both in our Win32 nightly/release builds, FWIW.
Assignee: nobody → ds
Would be good to fix this for Firefox 3.5 too, currently theora video is almost unusable on older cpus/netbooks under windows, a lot slower/higher cpu usage than flash video, which is not exactly helpful for promoting <video> ;)
Flags: blocking1.9.1.1?
I'm not sure we should enable this for 1.9.1.1, but I'd love to hear comments from the <video> guys about relative safety of doing so.

Either way, this doesn't block 1.9.1.1 and we'd probably enable it for 1.9.1.2 or later.
Flags: blocking1.9.1.1? → wanted1.9.1.x?
I've been hearing a lot of complaints about performance, so it would be nice to turn this on in 1.9.1 at some point.  It's quite low risk, as that code is called for each frame decode, so it has had a exposure through trunk and our testsuite for 3 weeks or so now.

Having said that, I'm not sure if enabling the optimized Theora code is going to be enough to make a noticeable different on the machines that are hurting now, as our performance problems are mostly higher up the abstraction layers.

Perhaps the next step should be to find someone with a slow Windows machine (the Atom netbooks are probably an ideal testbed) to test builds with this enabled and disabled, and find out if it'll make enough of a difference to be worthwhile.
Duplicate of this bug: 475167
This patch causes problems with compilation on mingw. I've filled bug 530313.
You need to log in before you can comment on or make changes to this bug.