Open Bug 952048 Opened 7 years ago Updated 5 years ago

SIGBUS on OpenBSD since libvpx 1.3.0 update

Categories

(Core :: WebRTC: Audio/Video, defect)

x86_64
OpenBSD
defect
Not set
normal

Tracking

()

Tracking Status
firefox27 --- unaffected
firefox28 --- affected
firefox29 --- affected
Blocking Flags:
backlog parking-lot

People

(Reporter: gaston, Unassigned)

References

Details

Probably fallout from bug 918550: right now nightly & aurora SIGBUS at startup, after showing the main window, on OpenBSD/amd64 with the following backtrace:

#0  0x000017c04503e1b6 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
(gdb) bt
#0  0x000017c04503e1b6 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#1  0x000017c0450aae7f in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#2  0x000017c045003a4d in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#3  0x000017c044ffcfec in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#4  0x000017c04501f84d in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#5  0x000017c044ff82b1 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#6  0x000017c04502e0af in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#7  0x000017c0450b6d59 in std::vector<short, std::allocator<short> >::_M_insert_aux () from /home/landry/firefox/libxul.so.1.0
#8  0x000017c043d28876 in imgLoader::SupportImageWithMimeType () from /home/landry/firefox/libxul.so.1.0
#9  0x000017c044d7617c in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
Blocks: 918550
Jan, this is still broken and is preventing work on trunk for me - any idea ?

Looking for that identifier in mxr, i only see a declaration, no definition... 

http://mxr.mozilla.org/mozilla-central/ident?i=vp9_half_horiz_variance8x_h_sse2
Flags: needinfo?(j)
(In reply to Landry Breuil (:gaston) from comment #1)
> Looking for that identifier in mxr, i only see a declaration, no
> definition... 

Try http://mxr.mozilla.org/mozilla-central/source/media/libvpx/vp9/encoder/x86/vp9_variance_impl_sse2.asm#631
I believe the definition is https://mxr.mozilla.org/mozilla-central/source/media/libvpx/vp9/encoder/x86/vp9_variance_impl_sse2.asm#631. SIGBUS is weird. Can that be an illegal instruction, or is it only a memory access error? An alignment problem? Does your CPU support SSE2? Can you verify your toolchain is assembling this file correctly?

As a workaround, try adding this to your mozconfig:

ac_add_options --disable-webm --disable-webrtc

or maybe:

#define HAVE_SSE2 0

in media/libvpx/config_x86-linux-gcc.* etc.
Flags: needinfo?(j)
The cpu supports SSE2, the arch is amd64, and my builds are done with ac_add_options --enable-gstreamer --disable-webrtc - and i'm using yasm 1.2/clang 3.3 on OpenBSD.

A full build log is at http://buildbot.rhaalovely.net/builders/mozilla-central-amd64/builds/995/steps/build/logs/stdio

That file is built with:

vp9_variance_impl_sse2.o
yasm -o vp9_variance_impl_sse2.o -f elf64 -rnasm -pnasm -DPIC -I. -I/var/buildslave-mozilla/mozilla-central-amd64/build/media/libvpx/ -I/var/buildslave-mozilla/mozilla-central-amd64/build/media/libvpx/vpx_ports/  -g dwarf2   /var/buildslave-mozilla/mozilla-central-amd64/build/media/libvpx/vp9/encoder/x86/vp9_variance_impl_sse2.asm


I, of course, would rather see that fixed instead of having to disable webm or sse2...
Tried aurora, broken too the same way (not surprising since libvpx 1.3 is there too now)
Note that the crash only happens when loading some web content (for example about:support) - interestingly i can load http://mozilla.github.io/webrtc-landing/gum_test.html without a crash, as well as about: and about:about. google.fr segfaults..

Looking at the thread list in gdb when it crashes on about:support, two of them are in libvpx:

Thread 2 (process 31150):
#0  0x0000077a41f35d56 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#1  0x0000077a41fa2a1f in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#2  0x0000077a41efc51d in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#3  0x0000077a41ef8d9c in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#4  0x0000077a41f1b6ed in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#5  0x0000077a41ef4611 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#6  0x0000077a41f25c4f in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#7  0x0000077a41faeb79 in std::_Rb_tree<void const*, void const*, std::_Identity<void const*>, std::less<void const*>, std::allocator<void const*> >::_M_erase () from /home/landry/firefox/libxul.so.1.0
#8  0x0000077a40ce2e31 in imgLoader::SupportImageWithMimeType () from /home/landry/firefox/libxul.so.1.0
#9  0x0000077a41c813bc in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
#10 0x0000077a41c6426e in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
#11 0x0000077a40b6047d in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#12 0x0000077a40b6050a in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#13 0x0000077a4083d065 in NS_InvokeByIndex () from /home/landry/firefox/libxul.so.1.0
#14 0x0000077a4083ccdc in NS_InvokeByIndex () from /home/landry/firefox/libxul.so.1.0
#15 0x0000077a407c7210 in NS_NewLocalFile () from /home/landry/firefox/libxul.so.1.0
#16 0x0000077a407d8f86 in XRE_AddJarManifestLocation () from /home/landry/firefox/libxul.so.1.0
#17 0x0000077a40785115 in ?? () from /home/landry/firefox/libxul.so.1.0
#18 0x0000077a4098364f in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#19 0x0000077a4095bd4d in std::_Rb_tree<int, std::pair<int const, std::string>, std::_Select1st<std::pair<int const, std::string> >, std::less<int>, std::allocator<std::pair<int const, std::string> > >::_M_erase () from /home/landry/firefox/libxul.so.1.0
#20 0x0000077a411fc6fb in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf ()
   from /home/landry/firefox/libxul.so.1.0
#21 0x0000077a41c4262e in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
#22 0x0000077a41c099f7 in XRE_InitCommandLine () from /home/landry/firefox/libxul.so.1.0
#23 0x0000077a41c09bc2 in XRE_InitCommandLine () from /home/landry/firefox/libxul.so.1.0
#24 0x0000077a41c0a04e in XRE_main () from /home/landry/firefox/libxul.so.1.0
#25 0x0000077830003eb7 in __register_frame_info () from /home/landry/firefox/firefox
#26 0x0000077830003821 in _start () from /home/landry/firefox/firefox
#27 0x0000000000000000 in ?? ()

Thread 1 (thread 1031150):
#0  0x0000077a41f35d56 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#1  0x0000077a41fa2a1f in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#2  0x0000077a41efc51d in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#3  0x0000077a41ef8d9c in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#4  0x0000077a41f1b6ed in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#5  0x0000077a41ef4611 in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#6  0x0000077a41f25c4f in vp9_half_horiz_variance8x_h_sse2 () from /home/landry/firefox/libxul.so.1.0
#7  0x0000077a41faeb79 in std::_Rb_tree<void const*, void const*, std::_Identity<void const*>, std::less<void const*>, std::allocator<void const*> >::_M_erase () from /home/landry/firefox/libxul.so.1.0
#8  0x0000077a40ce2e31 in imgLoader::SupportImageWithMimeType () from /home/landry/firefox/libxul.so.1.0
#9  0x0000077a41c813bc in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
#10 0x0000077a41c6426e in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
#11 0x0000077a40b6047d in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#12 0x0000077a40b6050a in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#13 0x0000077a4083d065 in NS_InvokeByIndex () from /home/landry/firefox/libxul.so.1.0
#14 0x0000077a4083ccdc in NS_InvokeByIndex () from /home/landry/firefox/libxul.so.1.0
#15 0x0000077a407c7210 in NS_NewLocalFile () from /home/landry/firefox/libxul.so.1.0
#16 0x0000077a407d8f86 in XRE_AddJarManifestLocation () from /home/landry/firefox/libxul.so.1.0
#17 0x0000077a40785115 in ?? () from /home/landry/firefox/libxul.so.1.0
#18 0x0000077a4098364f in std::vector<std::string, std::allocator<std::string> >::vector () from /home/landry/firefox/libxul.so.1.0
#19 0x0000077a4095bd4d in std::_Rb_tree<int, std::pair<int const, std::string>, std::_Select1st<std::pair<int const, std::string> >, std::less<int>, std::allocator<std::pair<int const, std::string> > >::_M_erase () from /home/landry/firefox/libxul.so.1.0
#20 0x0000077a411fc6fb in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf ()
   from /home/landry/firefox/libxul.so.1.0
#21 0x0000077a41c4262e in XRE_StartupTimelineRecord () from /home/landry/firefox/libxul.so.1.0
maps.google.fr and www.youtube.com load fine, so something is fishy - maybe the crash is only triggered when some specific mimetype is accessed, given that SupportImageWithMimeType is in the trace ?
Hmmm, now thinking about what was committed to libvpx in the past, maybe 785638 & 774598 need revisiting for vp9 ?

Jan, does trunk run fine for you on freebsd without crashes ?
Looking more closely at https://hg.mozilla.org/mozilla-central/rev/f4f8faa3771c#l358.24 - this might be AVX support - is that a cpu flag ? According to http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support we (openbsd) dont have support for AVX.

But still, that doesnt look used anywhere in libvpx's code.... so i might be on the wrong track.
FWIW, those stacks are almost certainly wrong. In fact, AFAICT, nothing ever calls vp9_half_horiz_variance8x_h_sse2(). Thanks to the RTCD macro magic, it's hard to be certain... but I bet you could delete the code entirely with no ill effects. It was probably simply copied over from the corresponding VP8 code (vp8_half_horiz_variance8x_h_sse2(), which _is_ called from media/libvpx/vp8/common/x86/variance_sse2.c).
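One quick way to sanity-check these stacks, as a minimal sketch: measure how far apart the frame addresses attributed to that single symbol are. The addresses are copied from frames #0-#6 of the first backtrace in this bug. The interpretation (that gdb, lacking full symbol information, labels each address with the nearest preceding symbol it knows about) is an assumption here, not something verified on this build.

```python
# Frame addresses #0-#6 from the first backtrace in this bug, all reported
# by gdb as being inside vp9_half_horiz_variance8x_h_sse2().
frames = [
    0x000017C04503E1B6,
    0x000017C0450AAE7F,
    0x000017C045003A4D,
    0x000017C044FFCFEC,
    0x000017C04501F84D,
    0x000017C044FF82B1,
    0x000017C04502E0AF,
]

# Distance between the lowest and highest address attributed to the symbol.
span = max(frames) - min(frames)
print(f"addresses attributed to one symbol span {span} bytes")
```

The span comes out at several hundred KiB, far more than a single hand-written asm routine would occupy, which is consistent with the symbol names in these traces being unreliable.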
libvpx has a standalone decoder. Why not try to crash there based on gdb output and build config?

(In reply to Landry Breuil (:gaston) from comment #8)
> Hmmm, now thinking about what was committed to libvpx in the past, maybe
> 785638 & 774598 need revisiting for vp9 ?

That shouldn't matter unless you disable *.asm code in the port or forget to apply port-specific fixes.

(In reply to Landry Breuil (:gaston) from comment #8)
> Jan, does trunk run fine for you on freebsd without crashes ?

It does, no issues viewing VP9 samples on my amd64 box or within 32bit jail. As both PkgSrc and FreeBSD ports now have 1.3.0 using --with-system-libvpx works, too.
(In reply to Jan Beich from comment #11)
> It does, no issues viewing VP9 samples on my amd64 box or within 32bit jail.
> As both PkgSrc and FreeBSD ports now have 1.3.0 using --with-system-libvpx
> works, too.

Do you mean it works for you, both with system libvpx and bundled one ?
Fwiw, a build of trunk on powerpc runs fine - of course, since it doesnt have all this asm goo.
Jan, are you interested in trying to fix this? It would be nice to get it resolved.
Flags: needinfo?(jbeich)
_I_ am interested in fixing this before 28 hits beta, but i have no idea what could be the root cause of the SIGBUS, besides the whole libvpx update... i can only test diffs, or provide logs, but gdb is unusable for me.
(In reply to Ralph Giles (:rillian) from comment #14)
> Jan, are you interested in trying to fix this? It would be nice to get it
> resolved.

No, I don't use OpenBSD, so I can't hunt for clues such as:
- testing with different toolchains (recent gcc/binutils, clang -no-integrated-as) and on i386 with sse2
- crafting a VP9 sample or emulating mozilla cflags/environment to try crashing vpxdec and/or ffmpeg
- getting a gdb backtrace (with locals) from a default -O0 -g non-debug build
- bisecting upstream libvpx commit history, --with-system-libvpx may be faster
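The bisecting item in the list above boils down to a binary search over an ordered commit list. A minimal sketch follows, assuming a known-good first commit, a known-bad last commit, and that every commit after the first bad one is also bad. The `is_bad` callback is hypothetical and stands in for "build against this libvpx revision and check whether the browser crashes". In practice the version control tool's own bisect command does the same bookkeeping.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

With N commits between the good and bad revisions this needs about log2(N) builds, which is why a standalone system libvpx (quick to rebuild) may make it faster than rebuilding the whole tree each time.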
Flags: needinfo?(jbeich)
technically, libvpx 1.3.0 hasnt even been released, only tagged in hg - so i'll have to wrap up my own system libvpx..
I'm trying to disable sse2/sse3/ssse3/sse4.1/avx in libvpx by setting the various *SSE* values to zero in media/libvpx/vpx_config_x86_64-linux-gcc.{h,asm}, but the corresponding asm/c files still seem to be built - is this the correct way to disable those optimisations ?
(In reply to Landry Breuil (:gaston) from comment #17)
> technically, libvpx 1.3.0 hasnt even been released, only tagged in hg

http://webm.googlecode.com/files/libvpx-v1.3.0.tar.bz2

They finally posted a tarball based on the 1.3.0 tag a couple of days ago.
So i tried building with HAVE_SSE2/HAVE_SSE3/HAVE_SSE4_1/HAVE_SSSE3/HAVE_AVX set to 0, but libxul linking fails :

: In function `vp8_loop_filter_row_normal':
/home/landry/src/m-c/media/libvpx/vp8/common/loopfilter.c:229: undefined reference to `vp8_loop_filter_mbv_sse2'


Grr.
Interestingly, a build from last night's tip with --enable-pulseaudio --enable-gstreamer (and then, --enable-webrtc implied) on amd64 unpatched doesnt seem to segfault like it used to, and displays about:support fine (and gmaps, and gmail and google's homepage...) My previous segfaulting builds were with --disable-pulseaudio --disable-webrtc.

A build of aurora with --enable-gstreamer --disable-webrtc (and --disable-pulseaudio implied) works on some pages, then SIGBUSes on about:support, but the trace doesnt show libvpx.

(gdb) bt
#0  0x00000bbc75363a56 in std::vector<void*, std::allocator<void*> >::_M_fill_insert () from /home/landry/firefox/libxul.so.1.0
#1  0x0000000000000000 in ?? ()

So i'm wondering if all this could be linked to webrtc being enabled or not - and shown more since the libvpx update ?
(In reply to Landry Breuil (:gaston) from comment #20)
> So i tried building with HAVE_SSE2/HAVE_SSE3/HAVE_SSE4_1/HAVE_SSSE3/HAVE_AVX
> set to 0, but libxul linking fails :

libvpx supports doing so at runtime. Try VPX_SIMD_CAPS_MASK=0xfb to disable only SSE2 or VPX_SIMD_CAPS=0 for everything.
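For reference, a simplified model of how these two environment variables interact. The flag values are as I recall them from libvpx's vpx_ports/x86.h and should be treated as assumptions to verify against the tree being built. `effective_caps` is a hypothetical helper for illustration, not a libvpx function.

```python
# Capability bits, assumed from memory of vpx_ports/x86.h (verify locally).
HAS_MMX, HAS_SSE, HAS_SSE2, HAS_SSE3 = 0x01, 0x02, 0x04, 0x08
HAS_SSSE3, HAS_SSE4_1, HAS_AVX = 0x10, 0x20, 0x40


def effective_caps(detected, env):
    """Simplified model of the runtime override.

    VPX_SIMD_CAPS, when set, replaces the detected flags outright;
    otherwise VPX_SIMD_CAPS_MASK, when set, is ANDed with them.
    """
    caps = env.get("VPX_SIMD_CAPS")
    if caps:
        return int(caps, 0)
    mask = env.get("VPX_SIMD_CAPS_MASK")
    return detected & (int(mask, 0) if mask else ~0)


# A CPU reporting everything up to AVX in this model.
ALL_FLAGS = HAS_MMX | HAS_SSE | HAS_SSE2 | HAS_SSE3 | HAS_SSSE3 | HAS_SSE4_1 | HAS_AVX

# 0xfb has every low bit set except 0x04, so only the SSE2 bit is cleared.
sse2_off = effective_caps(ALL_FLAGS, {"VPX_SIMD_CAPS_MASK": "0xfb"})
print(hex(sse2_off), bool(sse2_off & HAS_SSE2))
```

In this model a mask of 0 clears every bit, so VPX_SIMD_CAPS_MASK=0 and VPX_SIMD_CAPS=0 both leave only the plain C code paths selected.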

(In reply to Landry Breuil (:gaston) from comment #21)
> but the trace doesnt show libvpx.

Couldn't -fomit-frame-pointer be corrupting the backtrace ? It's added by default for non-debug builds.
(In reply to Jan Beich from comment #22)
> (In reply to Landry Breuil (:gaston) from comment #20)
> > So i tried building with HAVE_SSE2/HAVE_SSE3/HAVE_SSE4_1/HAVE_SSSE3/HAVE_AVX
> > set to 0, but libxul linking fails :
> 
> libvpx supports doing so at runtime. Try VPX_SIMD_CAPS_MASK=0xfb to disable
> only SSE2 or VPX_SIMD_CAPS=0 for everything.

Thanks for the tip - i think this rules out optimizations in libvpx, since

$ ~/firefox-aurora/firefox -no-remote -P Aurora
Bus error (core dumped)

$ VPX_SIMD_CAPS_MASK=0 ~/firefox-aurora/firefox -no-remote -P Aurora
Bus error (core dumped)
Blocks: 960426
No longer blocks: 960426
Fwiw, i've done some testing with fx 28.0b3 built within our ports infrastructure, and i'm writing this comment from it - browsed a bit, saw no crash. That build still has --disable-webrtc (actually:  --disable-webrtc --enable-gstreamer --with-system-zlib=/usr --with-system-libevent=/usr/ --with-system-bz2=/usr/local --with-system-nspr --with-system-nss --enable-official-branding --enable-gio --disable-gconf --disable-necko-wifi --disable-optimize --disable-tests --disable-updater --disable-dbus --enable-application=browser --prefix=/usr/local --sysconfdir=/etc --mandir=/usr/local/man --infodir=/usr/local/info --localstatedir=/var --disable-silent-rules) so i dont get what was wrong at the time.. and why it's not breaking the same way. I'll retest aurora and central (the latter, once bug 973310 is fixed)
Interestingly, trunk still crashes after browsing some patches, and this time i've seen it crash with webrtc enabled (that is  --enable-gstreamer --enable-pulseaudio --cache-file=/dev/null were the only configure args)
Component: Audio/Video → WebRTC: Audio/Video
Landry - is this still a problem?  Libvpx has been updated since the last report.  Thanks!
backlog: --- → parking-lot
Flags: needinfo?(landry)
I still have some local patches working around sse build config issues (ie https://bugzilla.mozilla.org/show_bug.cgi?id=1122745) and i havent been able to get back to this. I also need to figure out if the issue was only with bundled libvpx and not present with systemwide libvpx (1.4.0 on OpenBSD nowadays)
Flags: needinfo?(landry)