<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Updated

•

16 years ago

No longer depends on: 462082

Comment 1

•

16 years ago

Are you sure that is the only cause? In bug 462082 comment 11, a Core 2 Duo system is listed as affected. Also, the P3 supports SSE, but not SSE2.

Assignee

Updated

•

16 years ago

Summary: Playing Theora videos using video element causes crashes on non-SSE enabled machines → Playing Theora videos using video element causes crashes in Theora MMX/SSE routines

Assignee

Comment 2

•

16 years ago

I've changed the title to reflect that. Feel free to change it to whatever you feel is more appropriate.

timeless

Updated

•

16 years ago

Severity: normal → critical

Keywords: crash

Summary: Playing Theora videos using video element causes crashes in Theora MMX/SSE routines → Playing Theora videos using video element causes crashes in Theora MMX/SSE routines [@ oc_state_frag_recon_mmx]

Assignee

Comment 3

•

16 years ago

Attached patch Disable optimized MMX routines for Win32 (obsolete) — Details — Splinter Review

This disables the new optimized MMX/SSE routines for Windows

Assignee: nobody → chris.double

Status: NEW → ASSIGNED

Assignee

Comment 4

•

16 years ago

blocking1.9.1? This is needed on top of bug 462082 to fix crashes on windows. That bug is blocking1.9.1+.

Comment 5

•

16 years ago

(In reply to comment #3) > This disables the new optimized MMX/SSE routines for Windows Bummer. Theora performance for me is pathetic (100% CPU, stuttering--easily < 10 frames/sec, crackling audio) on Windows. Is there any possibility the optimizations will be fixed and reintroduced any time soon?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 6

•

16 years ago

> Is there any possibility the optimizations will be fixed and reintroduced any time soon? Yes, I'm disabling them until the cause is tracked down. There's some bugs that may interest you that are tracking ideas to improve performance for the things you are seeing (crackling audio, etc). See bug 474748, bug 474749 and bug 474540.

Updated

•

16 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 7

•

16 years ago

Ok. Thank you.

Makoto Kato [:m_kato]

Updated

•

16 years ago

Blocks: 475226

Comment 8

•

16 years ago

Chris, we need this fixed for beta3. Please make sure it lands.

Priority: -- → P1

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 9

•

16 years ago

Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/b62f99aa1fb6

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Priority: P1 → --

Resolution: --- → FIXED

Whiteboard: [baking for 1.9.1]

Comment 10

•

16 years ago

Er, I have a question about this patch ... it looks like it will cause dec/x86 to be compiled on Windows, which can't be right.

Jim Jeffery not reading bug-mail 1/2/11

Comment 11

•

16 years ago

I have an old non-SSE AMD 1.4ghz Thunderbird and I am no longer crashing using the latest hourly build that includes the patch. Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2a1pre) Gecko/20090125 Minefield/3.2a1pre Firefox/3.0.4 ID:20090125104555 changeset: http://hg.mozilla.org/mozilla-central/rev/abd5a41c0b70

Status: RESOLVED → VERIFIED

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 12

•

16 years ago

Pushed to mozilla-1.9.1: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/766a9eaa2989

Keywords: fixed1.9.1

Whiteboard: [baking for 1.9.1]

Comment 13

•

16 years ago

Jeff, I wonder if you could look into getting Theora's SSE2 detection working on Windows... We may need an old PC to do testing on, maybe there's one in the Toronto office?

Assignee

Comment 14

•

16 years ago

It may not be the detection, it may be the MSVC version of the SSE/MMX code itself is buggy. It was a fairly recent addition by a contributor to libtheora.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

16 years ago

Yeah, I can take a look. I don't know if we have an old PC in the toronto office, but I have a pre-SSE2 box at home, are there builds that show the problem?

Comment 16

•

16 years ago

(In reply to comment #15) > but I have a pre-SSE2 box at home, Great! > are there builds that show the problem? Changeset 53a4746eb1ba should have the problem. Thanks!!!

Comment 17

•

16 years ago

If I had to make a guess, it looks like the mmx code is overwriting the return address on the stack. I'm seeing a crash while exiting oc_frag_recon_inter_mmx(). The problem doesn't look related to cpu detection.

Comment 18

•

16 years ago

Attached patch Only disable broken mmx code. — Details — Splinter Review

The attached patch re-enables all of the mmx code except for the crashing oc_frag_recon_inter_mmx(). This seems to fix the problem for me and at least lets us get some performance improvement for the time being. I have a try server build here if other people want to test it out: https://build.mozilla.org/tryserver-builds/2009-01-26_21:20-jmuizelaar@mozilla.com-1233033586/

Attachment #358674 - Attachment is obsolete: true

Comment 19

•

16 years ago

I don't know if what I wrote above is correct. With the build above oc_frag_recon_inter_mmx is still called through oc_state_frag_recon_mmx and there are no problems. However, the code for oc_frag_recon_inter_mmx is different then the failing nightly. The failing nightly appears to have additional code at the end that calls __security_check_cookie(). I'm not sure why this there as I didn't think we were building with -GS... More investigation is needed...

Comment 20

•

16 years ago

Something weird is happening. I tried re-enabling the mmx code and the build from try server no longer crashes for me...

n.pipenbrinck

Comment 21

•

16 years ago

Hi folks. SSE detection is not the problem. The code in question is only using pentium 1 style mmx. Also the asm-code itself looks fine and is most likely not the problem, but I will double-check that. At the moment my guess is that something is going wrong in the function prolog. That doesn't explain why it works on some machines and not on others, but it's a start. If anyone of you have access to one of the machines that triggers the crash, could you please give it another try and - if it crashes - send me the object file please? I'd like to have a look at the generated code.

Comment 22

•

16 years ago

(In reply to comment #21) > At the moment my guess is that something is going wrong in the function prolog. > That doesn't explain why it works on some machines and not on others, but it's > a start. > > If anyone of you have access to one of the machines that triggers the crash, > could you please give it another try and - if it crashes - send me the object > file please? I'd like to have a look at the generated code. I've been able to reproduce the crash on all the window machines that I've tried with the build from here http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009-01-24-03-mozilla-central/

n.pipenbrinck

Comment 23

•

16 years ago

Thanks a lot, Jeff. That was a great help to track it down. I've been able to reproduce the crash as well. Now I know quite a bit more what's going on. The function in question get's some special treatment from the compiler that other (non crashing) functions don't get. VC decides to add an epilog to the code. That by itself is not a problem, but unfortunately VC forgot to add the matching function prolog. At the end a stack-frame gets removed that never existed, and this blows everything up. I guess I've triggered a code-generator bug here. We can do two things to fix it: 1. Use the C version and take the performance hit. Wait until the next compiler revision comes out and then give it another try. 2. Take the code out of VC's hand and use an external assembly-file. The problem may be that VC does not come with an assembler and we may have to use third party tools or add some unmaintainable hack via __emit... There are other strange things in the code as well. I'd like to point out the failed attempt to align the loop entry to 16 bytes. The compiler added a paranoia emms instruction as well. Also the loads of the local variables from EBP-A0h upwards look strange, but that may have something to do with whole program optimization. Here's the relevant part of the disassembly with some annotations. oc_frag_recon_inter_mmx: 104239C5 pxor mm0,mm0 104239C8 mov esi,dword ptr [ebp-9Ch] 104239CE mov edi,dword ptr [ebp-0A8h] 104239D4 mov eax,dword ptr [ebp-94h] 104239DA mov edx,dword ptr [ebp-0A0h] 104239E0 mov ebx,dword ptr [ebp-0ACh] 104239E6 mov ecx,4 ; next instruction is just a fancy NOP. It should ; align to a 16 byte boundary but fails to do so. ; (take a look at the address!) 104239EB lea esp,[esp] 104239EF movq mm3,mmword ptr [esi] 104239F2 movq mm1,mmword ptr [edi] 104239F5 movq mm2,mmword ptr [edi+8] 104239F9 movq mm7,mmword ptr [esi+eax] 104239FD movq mm4,mm3 10423A00 movq mm5,mmword ptr [edi+10h] 10423A04 punpckhbw mm4,mm0 10423A07 movq mm6,mmword ptr [edi+18h] 10423A0B punpcklbw mm3,mm0 10423A0E paddsw mm4,mm2 10423A11 movq mm2,mm7 10423A14 paddsw mm3,mm1 10423A17 punpckhbw mm2,mm0 10423A1A packuswb mm3,mm4 10423A1D punpcklbw mm7,mm0 10423A20 movq mmword ptr [edx],mm3 10423A23 paddsw mm2,mm6 10423A26 add edi,20h 10423A29 paddsw mm7,mm5 10423A2C sub ecx,1 10423A2F packuswb mm7,mm2 10423A32 lea esi,[esi+eax*2] 10423A35 movq mmword ptr [edx+ebx],mm7 10423A39 lea edx,[edx+ebx*2] 10423A3C jne 104239EF ; Unasked function prolog from here on... 10423A3E 0F 77 emms 10423A40 8B 4D FC mov ecx,dword ptr [ebp-4] 10423A43 5F pop edi 10423A44 33 CD xor ecx,ebp 10423A46 5E pop esi ; WTF is this? 10423A47 E8 E4 6B EB FF call 102DA630 ; Root of evil is most likely the next instruction. ; EBP does not point to the stack here. 10423A4C 8B E5 mov esp,ebp ; Crash happends here (ESP is trashed) 10423A4E 5D pop ebp 10423A4F 8B E3 mov esp,ebx 10423A51 5B pop ebx 10423A52 C3 ret Cheers, Nils

(not currently active) Ted Mielczarek

Comment 24

•

16 years ago

(In reply to comment #23) > Thanks a lot, Jeff. That was a great help to track it down. > > I've been able to reproduce the crash as well. Now I know quite a bit more > what's going on. > > The function in question get's some special treatment from the compiler that > other (non crashing) functions don't get. VC decides to add an epilog to the > code. That by itself is not a problem, but unfortunately VC forgot to add the > matching function prolog. > > At the end a stack-frame gets removed that never existed, and this blows > everything up. I guess I've triggered a code-generator bug here. > > > We can do two things to fix it: > > 1. Use the C version and take the performance hit. Wait until the next compiler > revision comes out and then give it another try. > > 2. Take the code out of VC's hand and use an external assembly-file. The > problem may be that VC does not come with an assembler and we may have to use > third party tools or add some unmaintainable hack via __emit... > What's interesting is that it doesn't seem broken all the time. For example, this build https://build.mozilla.org/tryserver-builds/2009-01-29_15:03-jmuizelaar@mozilla.com-1233270144/ should be using the mmx code but doesn't seem to have the same problem. If I recall it might not be doing the call to __security_check_cookie. I'm not sure why this call gets emitted as I don't think we're compiling with that option... > ; WTF is this? > 10423A47 E8 E4 6B EB FF call 102DA630 That's as far as I can tell a call to __security_check_cookie: http://msdn.microsoft.com/en-us/library/aa290051.aspx > > ; Root of evil is most likely the next instruction. > ; EBP does not point to the stack here. > 10423A4C 8B E5 mov esp,ebp

n.pipenbrinck

Comment 25

•

16 years ago

> What's interesting is that it doesn't seem broken all the time. > For example, this build > https://build.mozilla.org/tryserver-builds/2009-01-29_15:03- > jmuizelaar@mozilla.com-1233270144/ > should be using the mmx code but doesn't seem to have the same problem. > If I recall it might not be doing the call to __security_check_cookie. > I'm not sure why this call gets emitted as I don't think we're compiling > with that option... Pure luck if you ask me.. I've never had such a problem in my projects, but I've also never build a library as huge as xul.dll. Maybe Microsoft is using some randomized algorithm somewhere, or they stop optimizing based on available memory at certain points. That could explain why the same code compiles to different assembly each other day. I think the best thing we can do at the moment is to remove the frag_recon functions from the codebase and reimplement them using MMX intrinsics. That way the functions aren't anything special for the compiler anymore. My hope is that this will make the problem go away. The generated code will become a good deal slower, but it should still be fast enough to be usable. I'll ask the theora guys what they think about the issue.

Comment 26

•

16 years ago

note that -GS is the default in VC8+, and we don't change that, so we get the security_check_cookie stuff.

(not currently active) Ted Mielczarek

Comment 27

•

16 years ago

Oh, also, the official nightlies build with PGO, so that could definitely have an effect on codegen.

Ralph Giles (:rillian)

Comment 28

•

16 years ago

If you go with separate asm, I'd suggest just checking in the object files, rather than adding an assembler to the dependencies. As mentioned, this code doesn't change often.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 29

•

16 years ago

That seems bad, since we'd need object files per OS or maybe even finer grained than that.

Ralph Giles (:rillian)

Comment 30

•

16 years ago

The source files (and the bug) are MSVC specific. I don't see how object files are much different. My point was that the extra build dependency on an external assembler is avoidable for everyone but the module maintainer, since the code is both upstream and rarely changing.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 31

•

16 years ago

(In reply to comment #30) > The source files (and the bug) are MSVC specific. I don't see how object files > are much different. My point was that the extra build dependency on an external > assembler is avoidable for everyone but the module maintainer, since the code > is both upstream and rarely changing. But the main reason for using an external assembler would be to have platform independence of the source files. We have the same platform independence problem in other parts of the source tree so there are other non-upstream places that would also benefit. -Jeff

Comment 32

•

16 years ago

Sorry, I assumed some context that wasn't actually here ... what we've been considering is (I think) writing assembler in as/gas syntax, and on Windows bundling yasm into the MozillaBuild tools package. With some hackery to work around calling conventions, we could then potentially share x86 asm across OSes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

16 years ago

I guess whether that's a good idea for Ogg upstream is a separate and more difficult question.

Ralph Giles (:rillian)

Comment 34

•

16 years ago

Ah, sorry for the misunderstanding. Yes, if you have asm elsewhere in the tree, it makes sense to have a unified policy. For upstream, we want it to build out of the box with minimal dependencies, so we've resisted the external assembler so far. Also, inline asm is nice because it handles arguments for you, preprocessor is available, etc. But as always, it's also up to whatever people prefer to maintain...

Timothy B. Terriberry (:derf)

Comment 35

•

16 years ago

So, I'm confused. The original crash report at https://bugzilla.mozilla.org/show_bug.cgi?id=462082#c12 shows the fault occurring in oc_state_frag_recon_mmx(), unfortunately without enough information to identify the exact build. Nils shows a clear error in the assembly in oc_frag_recon_inter_mmx() in one particular build. The former function _already_ uses intrinsics, and contains no inline asm. So I do not think rewriting the code with intrinsics will fix anything. The latter function uses no intrinsics, so I don't think replacing the existing ones with inline asm will fix anything, either. The current patch attempts to disable the latter function, and the latter function only, by not putting it in the function table. However, oc_state_frag_recon_mmx() calls oc_frag_recon_inter_mmx() directly, without using the function table (an optimization used because we know there's only one accelerated version of the function, and we know that MMX is available). So the patch actually doesn't disable this function at all, and if that patch fixes anything, it is just because you got lucky and didn't trip the compiler bug in the next build, not because you've actually solved the problem. So you're left with the only safe option being to completely disable the asm on Windows. I think if you want to enable it, you need to determine a) whether or not this is a known bug (I find it highly unlikely that we are the first to expose it) b) what versions of the compiler are affected and c) whether there is a workaround based on _confirmed_ information about the bug from the upstream vendor. Simply applying a patch and saying "this build worked for me" is insufficient, since we do not know the real cause, and thus can't guarantee that the bug won't reappear in any arbitrary future build. I'm going to put my foot down and refuse to support "hackery to work around calling conventions": these are different not only for every platform, but also different "gcc-compatible" compilers on that platform, and even different command-line switches to that compiler. Mozilla may get to dictate these things on each platform. Xiph does not, and does not want to. I also don't personally have access to most of these to test them, even if I thought it was worth the extra maintenance hassle, which I do not. Every set of macros I've ever seen to mangle symbol names or handle different addressing modes has been deeply flawed in some way. That kind of garbage is why we have compilers in the first place. But unfortunately, right now Windows is already a special case. There isn't really anyone in the core Xiph team that can maintain it. I do want to thank Nils for coming in to help take a look at this problem. He's the only reason you have Win32 asm (without using mingw32) at all. If Nils wants to do a yasm version, at this point I'm not going to stop him (disabling the asm if it's not installed seems no worse off than the current situation, given the compiler bug). But our intent would still be to use it for MSVC on Windows only.