Open Bug 1718280 Opened 1 year ago Updated 5 months ago

Release mozregression-gui binary segfaults on start

Categories

(Testing :: mozregression, defect, P3)

defect

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: yyc1992, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0

Steps to reproduce:

Download release 4.0.18 and run it.

Running under rr triggers an rr assertion (https://github.com/rr-debugger/rr/issues/2903). Running under gdb suggests that the signal was raised deliberately: the backtrace shows the SIGSEGV coming from raise().

Starting program: /home/yuyichao/workspace/mozilla/mozregression-gui/mozregression-gui
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[Detaching after fork from child process 3818085]
Fontconfig warning: "/etc/fonts/fonts.conf", line 5: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/10-hinting-slight.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/10-scale-bitmap-fonts.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/20-unhint-small-vera.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/30-metric-aliases.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/40-nonlatin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/45-generic.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/45-latin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/49-sansserif.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/51-local.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/60-generic.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/60-latin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/65-nonlatin.conf", line 4: unknown element "description"
Fontconfig warning: FcPattern object weight does not accept value [0 45)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f52702 in raise () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007ffff7f52702 in raise () from /usr/lib/libpthread.so.0
#1  0x000055555555a213 in ?? ()
#2  0x000055555555815d in ?? ()
#3  0x00007ffff7d9ab25 in __libc_start_main () from /usr/lib/libc.so.6
#4  0x000055555555650a in ?? ()

The release binary doesn't seem to have debug symbols, and trying to run it locally fails with a complaint about a missing pyside2-rcc (which seems to have been removed on Arch Linux), so I couldn't debug any further.

Actual results:

Segfault.

Expected results:

Not to segfault.

Deleting the bundled libfontconfig got me further, so I assume the crash is caused by a fontconfig version mismatch.
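One way to test the version-mismatch theory is to ask each libfontconfig which version it is. The sketch below uses Python's ctypes to call fontconfig's FcGetVersion(), which encodes the version as major * 10000 + minor * 100 + revision. The helper names and any library paths you pass in are assumptions for illustration, not part of the release.

```python
import ctypes


def decode_fc_version(value):
    """Split an FcGetVersion() value into (major, minor, revision)."""
    return value // 10000, (value // 100) % 100, value % 100


def fontconfig_version(lib_path):
    """Load the libfontconfig at lib_path and return its decoded version.

    lib_path is an assumption: e.g. the bundled copy inside the extracted
    release, or the bare soname to get the system copy.
    """
    lib = ctypes.CDLL(lib_path)
    lib.FcGetVersion.restype = ctypes.c_int  # int FcGetVersion(void)
    lib.FcGetVersion.argtypes = []
    return decode_fc_version(lib.FcGetVersion())
```

For example, compare fontconfig_version("./libfontconfig.so.1") (hypothetical bundled file name) with fontconfig_version("libfontconfig.so.1") for the system copy, each in a separate Python process so the two copies are never loaded side by side.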

OK, removing all the bundled libraries (the lib*.so* files) except the following, and symlinking libpython* from the system, works for me:

libffi-806b1a9d.so.6.0.4
libffi.so.6
libpyside2.abi3.so.5.15
libshiboken2.abi3.so.5.15

Removing only libfontconfig, on the other hand, makes loading of the QPA backend fail.
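The workaround above can be scripted. This is a minimal sketch, assuming the release is extracted into one directory that holds the lib*.so* files next to the mozregression-gui binary; the function names are hypothetical helpers, and the keep-list is the four files listed above. It defaults to a dry run so you can review what would be deleted first.

```python
from pathlib import Path

# The four bundled libraries the reporter kept; other lib*.so* files go.
KEEP = {
    "libffi-806b1a9d.so.6.0.4",
    "libffi.so.6",
    "libpyside2.abi3.so.5.15",
    "libshiboken2.abi3.so.5.15",
}


def prune_bundled_libs(bundle_dir, keep=KEEP, dry_run=True):
    """List (and, unless dry_run, delete) top-level lib*.so* files not in keep.

    Assumption: bundle_dir is the extracted release directory.
    """
    doomed = sorted(
        p for p in Path(bundle_dir).glob("lib*.so*")
        if p.name not in keep and (p.is_file() or p.is_symlink())
    )
    if not dry_run:
        for p in doomed:
            p.unlink()
    return [p.name for p in doomed]


def link_system_libpython(bundle_dir, system_libpython):
    """Replace the bundle's libpython with a symlink to the system copy."""
    target = Path(system_libpython)
    link = Path(bundle_dir) / target.name
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
    return link
```

For example, prune_bundled_libs("mozregression-gui", dry_run=False) followed by link_system_libpython("mozregression-gui", "/usr/lib/libpython3.9.so.1.0"); the libpython version must match the one the bundle was built against (3.9 in the reports here).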

The job that builds the GUI for Linux is here. I'm open to modifying it (so long as it doesn't break existing users):

https://github.com/mozilla/mozregression/blob/b654dc7413ed8aa2bddfd78f02daa82dd991dbf9/.github/workflows/build.yml#L56

(note that I'll be AFK until August though, so it might be a while before I can look into a patch/PR)

Confirmed: I removed all the libraries, tried running it, and added back what was needed until it worked. I ended up where the reporter did, with those four .so files plus a libpython3.9.so.1.0 symlink to the copy in /usr/lib. It also wouldn’t run properly under Wayland, so I run it under X11 via mozregression-gui --platform xcb or with the environment variable QT_QPA_PLATFORM=xcb. I also removed PySide2/plugins/xcbglintegrations/*.so, because otherwise it would crash after a short while with a symbol lookup error in libqxcb-glx-integration.so. As it stands, it hits a symbol lookup error in PySide2/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so during shutdown, but I’m not fussing about that. (Removing PySide2 or PySide2/plugins wholesale doesn’t work; then it doesn’t start.)

I presume there’s something missing in the distribution. Pity the assertion that’s failing doesn’t give a helpful message; that makes life much harder.
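The X11 part of this workaround can be scripted as well. This is a sketch, assuming the extracted release directory contains PySide2/plugins/ as the paths above suggest; the helper names are hypothetical. It deletes the xcbglintegrations plugins, then launches the binary with QT_QPA_PLATFORM=xcb and reports whether the process died from SIGSEGV.

```python
import os
import signal
import subprocess
from pathlib import Path


def remove_glx_integration(bundle_dir):
    """Delete PySide2/plugins/xcbglintegrations/*.so and return removed names."""
    plugin_dir = Path(bundle_dir) / "PySide2" / "plugins" / "xcbglintegrations"
    removed = []
    for so in sorted(plugin_dir.glob("*.so")):
        so.unlink()
        removed.append(so.name)
    return removed


def run_under_xcb(argv, timeout=None):
    """Run argv with Qt forced onto X11; return (returncode, died_of_sigsegv)."""
    # Same effect as passing --platform xcb on the command line.
    env = dict(os.environ, QT_QPA_PLATFORM="xcb")
    proc = subprocess.run(argv, env=env, timeout=timeout)
    # On POSIX, a returncode of -N means the child was killed by signal N.
    return proc.returncode, proc.returncode == -signal.SIGSEGV
```

For example, remove_glx_integration("mozregression-gui") and then run_under_xcb(["mozregression-gui/mozregression-gui"]); a result of (-11, True) means it still segfaulted.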

(In reply to Chris Morgan from comment #4)

Thanks for looking into this!

We do remove some bundled stuff in the build script, but it's hard to see how any of that would cause these problems (I guess the 3D stuff might be needed now? Seems like a stretch).

We could also experiment with building the GUI on Ubuntu 20.04 instead of 18.04, in case something needs updating for modern Linux distributions. Perhaps the place to start would be to check whether a PyInstaller bundle built on your local machine shows the same issues:

https://github.com/mozilla/mozregression#building-and-developing-mozregression

Just tried 4.0.18 on openSUSE Tumbleweed and had the same error - https://pastebin.com/s1yXNn7K

Went back to 4.0.15, which still works.

I'm seeing what I believe to be the same issue:

➜ coredumpctl debug
           PID: 138269 (mozregression-g)
           UID: 1000 (hugo)
           GID: 1000 (hugo)
        Signal: 11 (SEGV)
     Timestamp: Thu 2022-01-13 14:56:46 CET (42s ago)
  Command Line: /opt/mozregression-gui/mozregression-gui --platform xcb
    Executable: /opt/mozregression-gui/mozregression-gui
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/slaunch-alacritty-1642081642579715585.service
          Unit: user@1000.service
     User Unit: slaunch-alacritty-1642081642579715585.service
         Slice: user-1000.slice
     Owner UID: 1000 (hugo)
       Boot ID: 8b04bace1bbe417b83d787ba559d3b85
    Machine ID: 0eb151491e4b4df28772f5c0f99b004e
      Hostname: victory
       Storage: /var/lib/systemd/coredump/core.mozregression-g.1000.8b04bace1bbe417b83d787ba559d3b85.138269.1642082206000000.zst (present)
     Disk Size: 31.0K
       Message: Process 138269 (mozregression-g) of user 1000 dumped core.

                Module linux-vdso.so.1 with build-id 3b23eb9fafd9a6683da8f2a15381914a95d54b1e
                Module ld-linux-x86-64.so.2 with build-id 040cc3dd10461562f177df39e3be2f3704258c3c
                Module libc.so.6 with build-id 4b406737057708c0e4c642345a703c47a61c73dc
                Module libpthread.so.0 with build-id 07c8f95b4f3251d08550217ad8a1f31066229996
                Module libz.so.1 with build-id 0c1459c56513efd5d53eb3868290e9afee6a6a26
                Module libdl.so.2 with build-id 5abc547e7b0949f89f3c0e21ab0c8331a7440a8a
                Module mozregression-gui with build-id 24675b0091f38f45c5c1c5484cf24925b439b164
                Stack trace of thread 138269:
                #0  0x00007fbd6d0c4702 raise (libpthread.so.0 + 0x13702)
                #1  0x000055a7fd5f1213 n/a (mozregression-gui + 0x6213)
                #2  0x000055a7fd5ef15d n/a (mozregression-gui + 0x415d)
                #3  0x00007fbd6cf0cb25 __libc_start_main (libc.so.6 + 0x27b25)
                #4  0x000055a7fd5ed50a n/a (mozregression-gui + 0x250a)
                ELF object binary architecture: AMD x86-64
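The module-relative offsets in this core dump line up with the gdb backtrace from the original report. Assuming gdb loaded the PIE executable at 0x555555554000 (its usual base on x86-64 Linux when it disables address randomization), subtracting that base from the gdb addresses gives the same offsets, which suggests both crashes come from the same call sites inside the mozregression-gui executable itself. The worked arithmetic:

```python
# Frames #1, #2 and #4 of the original gdb backtrace (absolute addresses).
GDB_FRAMES = [0x55555555A213, 0x55555555815D, 0x55555555650A]
# Assumption: gdb's load base for the PIE executable (ASLR disabled).
GDB_BASE = 0x555555554000

# The same frames from coredumpctl: absolute address and "mozregression-gui + off".
CORE_FRAMES = [0x55A7FD5F1213, 0x55A7FD5EF15D, 0x55A7FD5ED50A]
CORE_OFFSETS = [0x6213, 0x415D, 0x250A]


def offsets(addresses, base):
    """Convert absolute addresses into offsets from the module's load base."""
    return [addr - base for addr in addresses]


# Every coredump frame should imply one and the same load base.
core_bases = {addr - off for addr, off in zip(CORE_FRAMES, CORE_OFFSETS)}
core_base = next(iter(core_bases)) if len(core_bases) == 1 else None

gdb_offsets = offsets(GDB_FRAMES, GDB_BASE)
print("gdb offsets:", [hex(o) for o in gdb_offsets])
print("coredump load base:", hex(core_base) if core_base is not None else "inconsistent")
print("same call sites:", gdb_offsets == CORE_OFFSETS)
```

Running it prints the gdb offsets 0x6213, 0x415d and 0x250a, a single page-aligned coredump load base of 0x55a7fd5eb000, and "same call sites: True".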

Trying to use X11 also does not work:

➜ mozregression-gui --platform xcb
Fontconfig warning: "/etc/fonts/fonts.conf", line 5: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/10-hinting-slight.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/10-scale-bitmap-fonts.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/10-sub-pixel-rgb.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/20-unhint-small-vera.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/30-metric-aliases.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/40-nonlatin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/45-generic.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/45-latin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/49-sansserif.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/51-local.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/60-generic.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/60-latin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/65-nonlatin.conf", line 4: unknown element "description"
Fontconfig warning: "/etc/fonts/conf.d/70-no-bitmaps.conf", line 4: unknown element "description"
Fontconfig warning: FcPattern object weight does not accept value [40 210)
zsh: segmentation fault (core dumped)  mozregression-gui --platform xcb

Severity: -- → S2
Priority: -- → P3