Closed Bug 572034 Opened 11 years ago Closed 11 years ago

Enable asm optimized Y'CbCr conversion on OS X x86_64

Categories

(Core :: Audio/Video, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: kinetik, Assigned: kinetik)

Details

Attachments

(1 file, 2 obsolete files)

We're currently falling back to C code on x86_64 OS X.  The x86_64 Linux asm already in the tree seems to work fine when built on Mac, so we can probably just use that directly.
Summary: Enable asm optimized Y'CbCr conversion on OS X 64-bit → Enable asm optimized Y'CbCr conversion on OS X x86_64
Attached patch patch v0 (obsolete) — Splinter Review
This works for me on x86_64.
Pushed to tryserver.
Comment on attachment 451208 [details] [diff] [review]
patch v0

Looks good on tryserver, and reftests pass for me locally.
Attachment #451208 - Flags: review?(chris.double)
Attachment #451208 - Flags: review?(chris.double) → review+
Keywords: checkin-needed
Whiteboard: [needs landing]
Assignee: nobody → kinetik
http://hg.mozilla.org/mozilla-central/rev/f2835c78ef3f
Status: NEW → RESOLVED
Closed: 11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing]
Backed out due to crash on Tinderbox running reftests on OSX x86_64:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1276728072.1276728777.19294.gz

Backout: http://hg.mozilla.org/mozilla-central/rev/53510454716e
Merge backout: http://hg.mozilla.org/mozilla-central/rev/71f62ccea457
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Uh, yeah...

The code is supposed to look like this:
http://mxr.mozilla.org/mozilla-central/source/gfx/ycbcr/yuv_row_linux.cpp#252

After the optimizer is done with it, it looks like:

0x100c6dc30 <FastConvertYUVToRGB32Row>:	push   %rbp
0x100c6dc31 <FastConvertYUVToRGB32Row+1>:	mov    %rsp,%rbp
0x100c6dc34 <FastConvertYUVToRGB32Row+4>:	lea    0x7544e5(%rip),%rax        # 0x1013c2120 <kCoefficientsRgbY>
0x100c6dc3b <FastConvertYUVToRGB32Row+11>:	jmpq   0x100c6dc9e <convertend>
0x100c6dc40 <convertloop>:	movzbq (%rsi),%r10
0x100c6dc44 <convertloop+4>:	add    $0x1,%rsi
0x100c6dc48 <convertloop+8>:	movzbq (%rdx),%r11
0x100c6dc4c <convertloop+12>:	add    $0x1,%rdx
0x100c6dc50 <convertloop+16>:	movq   0x800(%rax,%r10,8),%xmm0
0x100c6dc5a <convertloop+26>:	movzbq (%rdi),%r10
0x100c6dc5e <convertloop+30>:	movq   0x1000(%rax,%r11,8),%xmm1
0x100c6dc68 <convertloop+40>:	movzbq 0x1(%rdi),%r11
0x100c6dc6d <convertloop+45>:	paddsw %xmm1,%xmm0
0x100c6dc71 <convertloop+49>:	movq   (%rax,%r10,8),%xmm2
0x100c6dc77 <convertloop+55>:	add    $0x2,%rdi
0x100c6dc7b <convertloop+59>:	movq   (%rax,%r11,8),%xmm3
0x100c6dc81 <convertloop+65>:	paddsw %xmm0,%xmm2
0x100c6dc85 <convertloop+69>:	paddsw %xmm0,%xmm3
0x100c6dc89 <convertloop+73>:	shufps $0x44,%xmm3,%xmm2
0x100c6dc8d <convertloop+77>:	psraw  $0x6,%xmm2
0x100c6dc92 <convertloop+82>:	packuswb %xmm2,%xmm2
0x100c6dc96 <convertloop+86>:	movq   %xmm2,(%rcx)
0x100c6dc9a <convertloop+90>:	add    $0x8,%rcx
0x100c6dc9e <convertend>:	sub    $0x2,%r8d
0x100c6dca2 <convertend+4>:	jns    0x100c6dc40 <convertloop>
0x100c6dca8 <convertend+10>:	nop    
0x100c6dca9 <convertend+11>:	nop    
0x100c6dcaa <convertend+12>:	nop    
0x100c6dcab <convertend+13>:	nop    
0x100c6dcac <convertend+14>:	nop    
0x100c6dcad <convertend+15>:	nop    
0x100c6dcae <convertend+16>:	nop    
0x100c6dcaf <convertend+17>:	nop    
0x100c6dcb0 <jpeg_consume_input>:	push   %rbp

Note how convertend falls off into jpeg_consume_input and that convertnext (where it's supposed to fall into) is completely missing.
convertnext is never referenced (other than the declaration) in the asm.  It seems like the optimizer is noticing this and removing the code (and also removing convertdone which becomes unreferenced when convertnext is removed).  Removing the convertnext label fixes the problem for me.

I can't find any documentation that states this behaviour is expected.  I was also surprised to discovered that declaring the asm as "volatile" didn't stop GCC from removing convertnext, et al.
Attached patch patch v1 (obsolete) — Splinter Review
This works and passes reftests on optimized x86_64 OS X for me.
Attachment #451208 - Attachment is obsolete: true
Attachment #451855 - Flags: review?(chris.double)
I think this has to be link-time optimization that's pruning these, not gcc. gcc feeds the asm string verbatim to the assembler after expanding the macros, and does not try to parse it for optimization purposes at all (except to guess how many instructions are in it for evaluating inlining costs and deciding if jumps around it must be made long jumps, but it does this from counting newlines and semi-colons).

It's generally a bad idea to use global labels like this in inline asm, anyway. Even if there weren't linker problems, gcc can duplicate entire asm blocks as it inlines functions, unrolls loops, etc.
Yeah, I wondered about the whole global label thing.  It should be pretty trivial to convert all of this code to local labels--just substitute the named labels with numbered ones starting from 1, right?
Attachment #451855 - Flags: review?(chris.double) → review+
Or just stick a capital L in front of the label names.  I'll post a new patch soon.
Attached patch patch v2Splinter Review
Convert all labels to local variants.  Don't remove convertnext label (it's useless, but harmless--keeping it makes it easier to resync with upstream code).  The code in yuv_row_mac.cpp already uses local labels, so no change needed there.

Sorry to spam you with review requests, Chris!
Attachment #451855 - Attachment is obsolete: true
Attachment #451866 - Flags: review?(chris.double)
Attachment #451866 - Flags: review?(chris.double) → review+
Keywords: checkin-needed
Whiteboard: [needs landing]
http://hg.mozilla.org/mozilla-central/rev/d268e54fbfcf
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing]
Something is causing random orange on Linux m1 (bug 573161).  Backed this out as it's the only Linux-specific change in the window of suspicion:

http://hg.mozilla.org/mozilla-central/rev/c96409d30136
http://hg.mozilla.org/mozilla-central/rev/3e0eb9fe8f48
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #11)
> Or just stick a capital L in front of the label names.  I'll post a new patch
> soon.

This doesn't actually generate local labels on my Fedora 12 machine. :-(
I relanded this as
http://hg.mozilla.org/mozilla-central/rev/d6c0afe65fc7
since it does not appear to be responsible for the failures.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
(In reply to comment #15)
> This doesn't actually generate local labels on my Fedora 12 machine. :-(

Fixing this in bug 573590.
You need to log in before you can comment on or make changes to this bug.