Closed Bug 1587288 Opened 5 years ago Closed 5 years ago

Firefox crashes on a thread doing LZ4 compression in unoptimized debug build

Categories

(Toolkit :: Startup and Profile System, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 1587107

People

(Reporter: TYLin, Unassigned)

Details

I noticed that in my Linux environment, an unoptimized debug build of Firefox starts crashing about a minute after launch. Removing the objdir and rebuilding doesn't fix the issue.

The call stack from rr shows the following:

Thread 5 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4452.4588]
__memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:141
141	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  0x00007fe21eb41f2d in __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:141
#1  0x000055812f386cae in LZ4_streamHC_t_alignment () at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4hc.c:830
#2  0x000055812f3869d6 in LZ4_initStreamHC (buffer=0x7fe1ec52a000, size=262200) at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4hc.c:917
#3  0x000055812f386c1d in LZ4_createStreamHC () at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4hc.c:896
#4  0x000055812f38248d in LZ4F_compressBegin_usingCDict (cctxPtr=0x7fe1ece7e040, dstBuffer=0x7fe1ec47e000, dstCapacity=262156, cdict=0x0, preferencesPtr=0x7fe1f95f8878) at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4frame.c:621
#5  0x000055812f3836fd in LZ4F_compressBegin (cctxPtr=0x7fe1ece7e040, dstBuffer=0x7fe1ec47e000, dstCapacity=262156, preferencesPtr=0x7fe1f95f8878) at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4frame.c:715
#6  0x000055812f3aa66a in mozilla::Compression::LZ4FrameCompressionContext::BeginCompressing(mozilla::Span<char, 18446744073709551615ul>) (this=0x7fe1f95f8bc8, aWriteBuffer=...) at /home/aethanyc/Projects/gecko/mfbt/Compression.cpp:126
#7  0x00007fe20f7211c1 in mozilla::scache::StartupCache::WriteToDisk() (this=0x7fe21e756c40) at /home/aethanyc/Projects/gecko/startupcache/StartupCache.cpp:531
#8  0x00007fe20f723bb2 in mozilla::scache::StartupCache::ThreadedWrite(void*) (aClosure=0x7fe21e756c40) at /home/aethanyc/Projects/gecko/startupcache/StartupCache.cpp:654
#9  0x00007fe21fdd55ea in _pt_root (arg=0x7fe1ece25ee0) at /home/aethanyc/Projects/gecko/nsprpub/pr/src/pthreads/ptthread.c:201
#10 0x00007fe21f8ee6db in start_thread (arg=0x7fe1f95f9700) at pthread_create.c:463
#11 0x00007fe21ead488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) f 1
#1  0x000055812f386cae in LZ4_streamHC_t_alignment () at /home/aethanyc/Projects/gecko/mfbt/lz4/lz4hc.c:830
830	    struct { char c; LZ4_streamHC_t t; } t_a;

My mozconfig contains the following. (I guess --disable-optimize is the key; otherwise our CI should have detected this.)

ac_add_options --enable-debug
ac_add_options --disable-optimize

I can reproduce the crash more quickly by changing the number 60000 to 1000 in
https://searchfox.org/mozilla-central/rev/7cc0f0e89cb40e43bf5c96906f13d44705401042/startupcache/StartupCache.cpp#740,756

I disable optimization in my debug builds to get accurate information in rr and gdb, so this may affect other people's day-to-day work.

This may be related to bug 1550108. Doug, could you take a look?

Flags: needinfo?(dothayer)

That is a very strange stack. I'll try to reproduce on my end, but in the meantime could you print the disassembly of LZ4_streamHC_t_alignment?

Flags: needinfo?(dothayer)

Sure.

(rr) f 1
#1  0x000055c89924bcae in LZ4_streamHC_t_alignment () at /home/tlin/Projects/gecko/mfbt/lz4/lz4hc.c:830
830	    struct { char c; LZ4_streamHC_t t; } t_a;
(rr) disassemble 
Dump of assembler code for function LZ4_streamHC_t_alignment:
   0x000055c89924bc80 <+0>:	push   %rbp
   0x000055c89924bc81 <+1>:	mov    %rsp,%rbp
   0x000055c89924bc84 <+4>:	sub    $0x40050,%rsp
   0x000055c89924bc8b <+11>:	mov    %fs:0x28,%rax
   0x000055c89924bc94 <+20>:	mov    %rax,-0x8(%rbp)
   0x000055c89924bc98 <+24>:	lea    -0x40048(%rbp),%rdi
   0x000055c89924bc9f <+31>:	mov    $0xaa,%esi
   0x000055c89924bca4 <+36>:	mov    $0x40040,%edx
   0x000055c89924bca9 <+41>:	callq  0x55c89928ba80 <memset@plt>
=> 0x000055c89924bcae <+46>:	mov    %fs:0x28,%rdx
   0x000055c89924bcb7 <+55>:	mov    -0x8(%rbp),%rdi
   0x000055c89924bcbb <+59>:	cmp    %rdi,%rdx
   0x000055c89924bcbe <+62>:	mov    %rax,-0x40050(%rbp)
   0x000055c89924bcc5 <+69>:	jne    0x55c89924bcd9 <LZ4_streamHC_t_alignment+89>
   0x000055c89924bccb <+75>:	mov    $0x8,%eax
   0x000055c89924bcd0 <+80>:	add    $0x40050,%rsp
   0x000055c89924bcd7 <+87>:	pop    %rbp
   0x000055c89924bcd8 <+88>:	retq   
   0x000055c89924bcd9 <+89>:	callq  0x55c89928ba50 <__stack_chk_fail@plt>
End of assembler dump.

I don't know why it's calling memset on 0x40040 bytes of the stack just to compute the offset of a member of a struct - I assume for some kind of instrumented sanity check. But in any case, this looks like it's just a dupe of bug 1550108.
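
For reference, a minimal sketch of why this helper touches so much stack in an unoptimized build. The names big_stream_t, alignment_via_local, and alignment_via_type are hypothetical stand-ins, not code from mfbt/lz4. The union is sized to the 262200 bytes that frame #2 passes to LZ4_initStreamHC. On x86-64 the probe struct is 8 bytes larger (the char padded up to the member's alignment), which gives 262208 = 0x40040, the memset length in the disassembly; the difference of 8 is also the value the function returns. The 0xaa fill byte is consistent with Clang's -ftrivial-auto-var-init=pattern, but that is an assumption; nothing in this bug confirms which flag was active. The second function computes the same value from the type alone and is shown only for comparison; it is not claimed to be what D48570 does.

```c
#include <stddef.h>

/* Hypothetical stand-in for LZ4_streamHC_t, sized to the 262200 bytes
   seen in frame #2 (LZ4_initStreamHC, size=262200). */
typedef union {
    size_t table[262200 / sizeof(size_t)];
} big_stream_t;

/* Same shape as lz4hc.c:830: a local probe struct. Without optimization
   the compiler reserves the whole struct (about 256 KiB) in this frame,
   and any automatic-variable initialization has to write to all of it. */
size_t alignment_via_local(void)
{
    struct { char c; big_stream_t t; } t_a;
    return sizeof(t_a) - sizeof(t_a.t);
}

/* Same value derived from the type only, so no object of that size
   is ever placed on the stack. */
size_t alignment_via_type(void)
{
    typedef struct { char c; big_stream_t t; } probe_t;
    return offsetof(probe_t, t);
}
```

Both functions return the same alignment; only the first one needs a stack frame large enough to hold the probe object, which matters on a thread with a small stack.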

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE

Did you mean to dup this to bug 1587107? I just applied https://phabricator.services.mozilla.com/D48570, and it does help.

Woops! Yes.
