Closed Bug 1236830 Opened 8 years ago Closed 6 years ago

[emulator-x86-kk][mochitest] Run valgrind on emulator tests

Categories

(Firefox OS Graveyard :: Emulator, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cyu, Unassigned)

Attachments

(9 files)

I got this idea from bug 1234981, a crash in jemalloc in which the arena magic contains an unexpected value. One possible cause is memory corruption, so we may consider enabling valgrind on b2g tests to detect memory bugs. The target would be local emulator x86 runs, which can run at near-native speed with KVM support, making valgrind's slowdown more acceptable.
Depends on: 1229348
I got this crash in the linker. It seems to fail in calling soinfo_alloc().

ADB Location:  adb
remount succeeded
Compressing libxul.so...
==1749== Memcheck, a memory error detector
==1749== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1749== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1749== Command: /data/valgrind-b2g/b2g
==1749==
WARNING: linker: vgpreload_memcheck-x86-linux.so has text relocations. This is wasting memory and is a security risk. Please fix.
==1749== Invalid read of size 4
==1749==    at 0x40049A7: add_vdso (bionic/linker/linker.cpp:1596)
==1749==    by 0x40049A7: __dl__ZL29__linker_init_post_relocationR19KernelArgumentBlockj (bionic/linker/linker.cpp:1733)
==1749==    by 0x40078C7: __dl___linker_init (bionic/linker/linker.cpp:1855)
==1749==    by 0x4008703: __dl__start (bionic/linker/arch/x86/begin.c:38)
==1749==  Address 0x1c is not stack'd, malloc'd or (recently) free'd
==1749==
==1749==
==1749== Process terminating with default action of signal 11 (SIGSEGV)
==1749==  Access not within mapped region at address 0x1C
==1749==    at 0x40049A7: add_vdso (bionic/linker/linker.cpp:1596)
==1749==    by 0x40049A7: __dl__ZL29__linker_init_post_relocationR19KernelArgumentBlockj (bionic/linker/linker.cpp:1733)
==1749==    by 0x40078C7: __dl___linker_init (bionic/linker/linker.cpp:1855)
==1749==    by 0x4008703: __dl__start (bionic/linker/arch/x86/begin.c:38)
==1749==  If you believe this happened as a result of a stack
==1749==  overflow in your program's main thread (unlikely but
==1749==  possible), you can try to increase the size of the
==1749==  main thread stack using the --main-stacksize= flag.
==1749==  The main thread stack size used in this run was 8388608.
==1749==
==1749== HEAP SUMMARY:
==1749==     in use at exit: 0 bytes in 0 blocks
==1749==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1749==
==1749== All heap blocks were freed -- no leaks are possible
==1749==
==1749== For counts of detected and suppressed errors, rerun with: -v
==1749== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #1)
> I got this crash in the linker. It seems to fail in calling soinfo_alloc().

Cervantes, can you re-run with -v added to the flags for Valgrind and
post the results as an attachment?
Attached file Linker debug log
I cross-checked the linker on KK against the one on L and found that add_vdso() has a bug: it doesn't check ehdr_vdso for null. Adding a check as L does fixes this bug.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> I cross-checked linker on KK with the one on L and found that add_vdso() has
> a bug that it doesn't perform nullity check against ehdr_vdso. Adding a
> check as L does fixes this bug.

So .. am I correct to understand that -- at least for the crash --
there is nothing that needs to be changed in Valgrind?


I should point out one other thing, though.  In the log from comment 5 I see
a lot of false errors in calls to __dl_strlen.  You might be able to
get rid of these by adding

#     if defined(VGPV_x86_linux_android)
      add_hardwired_spec(
         "NONE", "__dl_strlen",
         (Addr)&VG_(x86_linux_REDIR_FOR_strlen),
         NULL
      );
#     endif

in the section guarded by "#  if defined(VGP_x86_linux)" in
VG_(redir_initialise) in m_redir.c.
(In reply to Julian Seward [:jseward] from comment #7)
> (In reply to Cervantes Yu [:cyu] [:cervantes] from comment #6)
> > I cross-checked linker on KK with the one on L and found that add_vdso() has
> > a bug that it doesn't perform nullity check against ehdr_vdso. Adding a
> > check as L does fixes this bug.
> 
> So .. am I correct to understand that -- at least for the crash --
> there is nothing that needs to be changed in Valgrind?
Yes. It can be fixed in the linker. Nothing needs to be done in valgrind.
I got b2g on emulator-x86-kk to boot to the homescreen with the following changes:

- Apply the changes to gonk-misc/default-gecko-config and .userconfig described in https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Debugging/Debugging_B2G_using_valgrind
- Increase the system and user data partition image sizes
- Increase the qemu memory size
- Rebuild the goldfish kernel with CONFIG_HIGHMEM and CONFIG_HIGHMEM4G so the kernel can use memory > 895 MB
- Build valgrind from source and adb push it to the system partition after the emulator starts (it's supposed to be built when B2G_VALGRIND=1 is set, but that doesn't happen for emulator-x86-kk, so I built it manually)
- Apply 2 fixes to the bionic linker: one for the startup crash and the other for a SIGFPE that deadlocks Nuwa

With these changes, b2g starts with run-valgrind.sh. The next step is integration with mochitest and other tests.
A quick hack for running mochitest with valgrind on emulator-x86-kk.

This patch modifies the command line used by "./mach mochitest" so that it starts b2g under valgrind. I can now run a single folder (tested with dom/ipc/tests/). Running a chunk passes mochitest-plain but gets stuck in mochitest-plain with the webgl subsuite.
Attached file valgrind-logs.tar.bz2
Emulator and adb logcat logs for running "./mach mochitest -f plain --total-chunks 9 --this-chunk 2"
Kernel source downloaded from

> git clone https://github.com/mozilla-b2g/kernel_goldfish.git
> git checkout remotes/origin/b2g-goldfish-3.4.67

and is built with the following configs enabled:

> CONFIG_HIGHMEM=y
> CONFIG_HIGHMEM4G=y

so that we may give qemu more memory for running valgrind.
This patches $B2G_HOME/build/target/board/generic_x86/BoardConfig.mk to create larger images, on which we can put valgrind and the b2g binaries with symbols.
This patches $B2G_HOME/bionic/linker/linker.cpp to fix:
1. the valgrind startup crash.
2. the Nuwa deadlock in SIGFPE when calling dlsym().
To run emulator-x86-kk with valgrind:
1. download B2G source from git://github.com/mozilla-b2g/B2G.git to $B2G_HOME
2. run under $B2G_HOME: ./config.sh emulator-x86-kk
3. apply attachment 8712624 under $B2G_HOME/build
4. apply attachment 8712628 under $B2G_HOME/bionic
5. download attachment 8712620 and overwrite $B2G_HOME/prebuilts/qemu-kernel/x86/kernel-qemu
6. build valgrind for android x86 as in http://valgrind.org/docs/manual/dist.readme-android.html
7. add the following to $B2G_HOME/.userconfig as in https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Debugging/Debugging_B2G_using_valgrind
> export B2G_VALGRIND=1
> export DISABLE_JEMALLOC=1
and add the following to $B2G_HOME/gonk-misc/default-gecko-config:
> ac_add_options --enable-optimize="-g -O2"
> ac_add_options --enable-valgrind
> ac_add_options --disable-jemalloc
> ac_add_options --disable-sandbox
8. build the emulator by running ./build.sh under $B2G_HOME. If the build fails because -lX11 or -lGL is not found (as on Ubuntu 14.04), creating symbolic links as follows works around the error:
> cd /usr/lib
> sudo ln -s ./i386-linux-gnu/libX11.so.6 libX11.so
> sudo ln -s ./i386-linux-gnu/mesa/libGL.so.1 libGL.so.1
9. After the emulator is successfully built, start it by running
> ./run-emulator.sh
under $B2G_HOME
10. push the valgrind binaries to the system partition. Run the following under the valgrind source directory:
> adb remount
> adb push Inst /
11. b2g should then be able to start with the script:
> ./run-valgrind.sh
There are lots of false errors caused by Nuwa's stack tricks, and for android x86 we need to do what was done in bug 1125091. Julian, could you help with fixing the false errors? Thanks.
Flags: needinfo?(jseward)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #15)
> To run emulator-x86-kk with valgrind:
> 1. download B2G source from git://github.com/mozilla-b2g/B2G.git to $B2G_HOME
> 2. run under $B2G_HOME: ./config.sh emulator-x86-kk
> 3. apply attachment 8712624 under $B2G_HOME/build
> 4. apply attachment 8712628 under $B2G_HOME/bionic
> 5. download attachment 8712620 and overwrite
> $B2G_HOME/prebuilts/qemu-kernel/x86/kernel-qemu
> 6. build valgrind for android x86 as in
> http://valgrind.org/docs/manual/dist.readme-android.html
> 7. add the following to $B2G_HOME/.userconfig as in
> https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Debugging/
> Debugging_B2G_using_valgrind
> > export B2G_VALGRIND=1
> > export DISABLE_JEMALLOC=1
> and add the following to $B2G_HOME/gonk-misc/default-gecko-config:
> > ac_add_options --enable-optimize="-g -O2"
> > ac_add_options --enable-valgrind
> > ac_add_options --disable-jemalloc
> > ac_add_options --disable-sandbox
> 8. build the emulator by running ./build.sh under $B2G_HOME. If there are
> errors that -lX11 or -lGL not found (like on ubuntu 14.04), creating
> symbolic links as the following works the error around
> > cd /usr/lib
> > sudo ln -s ./i386-linux-gnu/libX11.so.6 libX11.so
> > sudo ln -s ./i386-linux-gnu/mesa/libGL.so.1 libGL.so.1
8.1 Apply attachment 8713005 to $B2G_HOME/run-emulator.sh to grant more memory (2GB) to qemu
> 9. After emulator is successfully built, start it by running
> > ./run-emulator.sh
> under $B2G_HOME
> 10. push valgrind binaries to the system partition:
> > adb remount
> Run under valgrind source directory.
> > adb push Inst /
> 11. Then b2g should be able to start with the script:
> > ./run-valgrind.sh
Flags: needinfo?(jseward)
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX