Crash in [@ EnterBaseline] affecting users in the es-ar locale doing searches on Google
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
People
(Reporter: gsvelto, Unassigned)
Details
(Keywords: crash, topcrash, topcrash-startup)
Crash report: https://crash-stats.mozilla.org/report/index/1590b3bf-fe4b-467f-950c-df6c80230616
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 ? @0x00001cb414863436
1 ? @0x00001cb4148634ed
2 libxul.so EnterBaseline js/src/jit/BaselineJIT.cpp:142
2 libxul.so js::jit::EnterBaselineInterpreterAtBranch js/src/jit/BaselineJIT.cpp:198
3 libxul.so Interpret js/src/vm/Interpreter.cpp:2225
4 libxul.so js::RunScript js/src/vm/Interpreter.cpp:431
4 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:585
4 libxul.so InternalCall js/src/vm/Interpreter.cpp:620
4 libxul.so js::Call js/src/vm/Interpreter.cpp:652
5 libxul.so js::fun_call js/src/vm/JSFunction.cpp:956
I'm chucking this in the JIT component because it seems to be happening in JIT-compiled code. We've got a large spike in release, primarily affecting Spanish-speaking users. Out of all the crashes under these signatures, the top affected locale is es-ar, with 76.90% of the crashes.
Almost all of those crashes are in the release channel but spread over several versions. Additionally, many users seem to be crashing while doing searches on Google, so maybe this was triggered by a change on their side which unearthed a bug in Firefox.
Reporter | ||
Comment 1•1 year ago
Even more crashes affecting users in the es-ar locale, but under a different signature (with a similar stack, and still doing searches on Google).
Reporter | ||
Comment 2•1 year ago
Even more crashes. These are all on very old versions of Firefox, so this is definitely something triggered by a server-side change.
Reporter | ||
Comment 3•1 year ago
Ouch, even more, this is bad.
Comment 4•1 year ago
The spike is coming from the Huayra distro (https://es.wikipedia.org/wiki/Huayra_GNU/Linux), an Argentinian distro for education. They had a major release (v6) last week, which is probably starting to be deployed.
Comment 5•1 year ago
I sent an email to info [at] educar.gob.ar, which maintains that distro.
Comment 6•1 year ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #5)
I sent an email to info [at] educar.gob.ar, which maintains that distro.
Thanks, they will get 2 emails then, I sent them one as well :)
Comment 7•1 year ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 10 content process crashes on release
- Top 5 desktop browser crashes on Linux on beta
- Top 5 desktop browser crashes on Linux on release
For more information, please visit BugBot documentation.
Comment 8•1 year ago
Looking at the crash linked in comment 1, we're crashing in code that looks like this:
1cb414863417: mov %rsp,%rbx // rbx = rsp - (rdx * 8)
1cb41486341a: mov %rdx,%rax
1cb41486341d: shl $0x3,%rax
1cb414863421: sub %rax,%rbx
1cb414863424: mov %rsp,%rax // rax = rsp - 0x800
1cb414863427: sub $0x800,%rax
1cb41486342d: cmp %rbx,%rax // while rax >= rbx
1cb414863430: jb 0x1cb414863444
1cb414863436: movl $0x0,(%rax) // *rax = 0
^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
1cb41486343c: sub $0x800,%rax // rax -= 0x800
1cb414863442: jmp 0x1cb41486342d
This appears to be some sort of stack-probing code. I've left out the preceding context, but it's right at the beginning of a function. rdx is being passed in from the caller. In this case, it's 19535. It looks like we're allocating room for that many 8-byte values. In 2048-byte steps, we walk the stack and touch each page.
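The loop in the disassembly can be modeled in a few lines. This is a minimal sketch in Python, not SpiderMonkey's code: `probe_addresses` and the example `rsp` value are made up for illustration, while the 8-byte slot size, the 0x800 stride, and the count of 19535 come from the analysis above.

```python
PROBE_STEP = 0x800  # 2048 bytes, the stride used by the disassembled loop
SLOT_SIZE = 8       # each stack value is 8 bytes (the shl $0x3)


def probe_addresses(rsp, num_values):
    """Return the addresses the probing loop would write to, in order."""
    bottom = rsp - num_values * SLOT_SIZE  # rbx = rsp - (rdx * 8)
    addr = rsp - PROBE_STEP                # rax = rsp - 0x800
    touched = []
    while addr >= bottom:                  # jb leaves the loop once rax < rbx
        touched.append(addr)               # movl $0x0,(%rax)
        addr -= PROBE_STEP                 # sub $0x800,%rax
    return touched


rsp = 0x7FFF_0000_0000  # hypothetical stack pointer, for illustration only
probes = probe_addresses(rsp, 19535)
print(19535 * SLOT_SIZE)  # bytes reserved for the copied values
print(len(probes))        # number of probe writes
```

With these inputs the frame is 156280 bytes and the loop performs 76 writes, so a crash partway through means one of those writes landed on an address that was not mapped.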
Oh, it's this code. That's used in EnterJIT, so we're apparently calling into JIT code with ~20000 values on the stack and running out of space. Specifically, we're doing on-stack replacement to tier up from the C++ interpreter to the baseline interpreter, which entails copying all the values that are currently on the interpreter's stack (arguments, local variables, intermediate results) from the heap onto the native stack.
I should look at more than one crash, but for now one hypothesis is that the distro changed the default stack size.
A couple other things I've noticed:
- We're already doing a stack overflow check in EnterBaseline, so whatever's going wrong here is somehow circumventing that.
- 20000 is our default limit for max stack arguments.
Comment 9•1 year ago
Oh, one other thing about the crash I've looked at is that we crash when we still have several iterations left to go in the loop, so our check is way off. Maybe cx->nativeStackLimit is being set up wrong somehow?
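One way to picture this: a limit check can pass while a probe still faults if the limit it compares against is lower than the lowest address that is actually usable. The sketch below is a toy model under that assumption only; `check_passes`, `first_faulting_probe`, and all addresses and sizes in the usage lines are hypothetical and are not taken from the crash or from SpiderMonkey.

```python
PROBE_STEP = 0x800  # same 2048-byte stride as the probing loop


def check_passes(rsp, frame_bytes, assumed_limit):
    """Model an overflow check as a comparison against an assumed limit."""
    return rsp - frame_bytes >= assumed_limit


def first_faulting_probe(rsp, frame_bytes, lowest_mapped):
    """Return the 1-based index of the first probe below lowest_mapped.

    Returns None if every probe stays within mapped memory.
    """
    bottom = rsp - frame_bytes
    addr = rsp - PROBE_STEP
    index = 1
    while addr >= bottom:
        if addr < lowest_mapped:
            return index
        addr -= PROBE_STEP
        index += 1
    return None


# Hypothetical numbers: the check believes 8 MiB is available,
# but only 64 KiB below rsp is really usable.
rsp = 0x7FFF_0000_0000
frame = 19535 * 8
assumed_limit = rsp - 8 * 1024 * 1024
lowest_mapped = rsp - 64 * 1024

print(check_passes(rsp, frame, assumed_limit))          # True
print(first_faulting_probe(rsp, frame, lowest_mapped))  # 33
```

In this model the check succeeds, yet the 33rd of 76 probes faults, which matches the shape of "several iterations left to go" without saying anything about the real cause.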
Comment 10•1 year ago
Looking at another four EnterBaseline crashes with useragent-locale "es-ar", they're all crashing in the same code. I suggest asking the maintainer whether anything changed regarding stack limits.
Comment 11•1 year ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on beta
- Top 10 content process crashes on release
- Top 5 desktop browser crashes on Linux on beta
- Top 5 desktop browser crashes on Linux on release
For more information, please visit BugBot documentation.
Comment 12•1 year ago
The bug is marked as tracked for firefox114 (release), tracked for firefox115 (beta) and tracked for firefox116 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
:sdetar, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 13•1 year ago
Did we ever hear anything back yet from emails sent by Pascal and Emilio to maintainers of the distro? It seems like we won't be able to make much progress until we are able to talk to them.
Comment 14•1 year ago
(In reply to Steven DeTar [:sdetar] from comment #13)
Did we ever hear anything back yet from emails sent by Pascal and Emilio to maintainers of the distro? It seems like we won't be able to make much progress until we are able to talk to them.
I haven't received any response.
Comment 15•1 year ago
Hi all! I'm working on Huayra GNU/Linux. We use the official binary build downloaded from download.mozilla.org (SHA verified).
We just install it in /opt/firefox with very minimal customization (we only disable the updater).
Huayra 6 is based on Debian 11.x.
You can check our package for this purpose at:
https://github.com/HuayraLinux/firefox-installer
If there is anything we can help with, let us know.
Regards!
Comment 16•1 year ago
using kvm/qemu:
kvm -cdrom huayra-amd64-6.0.iso -m 2G
Comment 17•1 year ago
(In reply to Fernando Toledo from comment #15)
Hi all! i'm working on Huayra GNU/Linux, we use the official build/binary download from download.mozilla.org (sha verified)
Just install on /opt/firefox and have very minimal customization (only disable the updater)
Huayra 6 is base Debian 11.x
You can check our package for this purpose at:
https://github.com/HuayraLinux/firefox-installer
whatever thing we can help, let us know
Saludos!
FYI: The Firefox version that was shipped in Huayra 6.0 is 114.0.1
Comment 18•1 year ago
Hi Fernando,
It looks like something is going wrong with the amount of stack memory that is available. Have you recently changed the system default stack size? I believe our default stack limit on Linux is 8MB.
If my math is correct, we're crashing when allocating a ~160KB stack frame.
Edit to add: this bit of code seems like it might be relevant. Did you lower the value of RLIMIT_STACK in Huayra 6.0?
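For reference, the arithmetic behind the ~160KB figure and a way to read the stack limit the kernel reports can be sketched as follows. This is an illustrative Python snippet using the standard `resource` module (Unix only); `frame_bytes` and `stack_soft_limit` are hypothetical helper names, and the value reported is for the Python process itself, not for a running Firefox.

```python
import resource

SLOT_SIZE = 8  # bytes per stack value, as in the probing code


def frame_bytes(num_values):
    """Bytes needed to hold num_values 8-byte stack slots."""
    return num_values * SLOT_SIZE


def stack_soft_limit():
    """Return the soft RLIMIT_STACK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


needed = frame_bytes(19535)  # value count seen in the crash
limit = stack_soft_limit()
print(f"frame: {needed} bytes")
print("stack soft limit:", "unlimited" if limit is None else f"{limit} bytes")
```

The frame works out to 156280 bytes for 19535 values (160000 for the full 20000), a small fraction of an 8MB limit, which is why a lowered RLIMIT_STACK is a natural first question.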
Comment 19•1 year ago
(In reply to Iain Ireland [:iain] from comment #18)
Hi Fernando,
It looks like something is going wrong with the amount of stack memory that is available. Have you recently changed the system default stack size? I believe our default stack limit on Linux is 8MB.
If my math is correct, we're crashing when allocating a ~160KB stack frame.
Edit to add: this bit of code seems like it might be relevant. Did you lower the value of RLIMIT_STACK in Huayra 6.0?
Nope, we did not change these settings.
ragnarok@huayra:~/Descargas/isos$ ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30691
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 95
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30691
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Comment 20•1 year ago
I see that the crash report was on Huayra 5.0 (the previous, older release) with FF 112.0.1.
We shipped Firefox 90 at its release time.
Anyway, in Huayra 5 it is possible to update to the latest version of FF too.
Were the crash reports all from the same version?
Reporter | ||
Comment 21•1 year ago
The crashes are coming from all versions including 114.0.1, but all from Huayra 5.0. Did something change in that version of the distribution?
Comment 22•1 year ago
(In reply to Gabriele Svelto [:gsvelto] from comment #21)
The crashes are coming from all versions including 114.0.1, but all from Huayra 5.0. Did something change in that version of the distribution?
No changes were made in our repo, but users can receive updates directly from the Debian repo.
I will run some more tests. Can someone reproduce the problem?
I still can't reproduce it.
Comment 23•1 year ago
Crash in bug 1839669 comment 5 looks nearly identical to this, and that seems like a non-Argentinian user.
Comment 24•1 year ago
They also seem to be able to reproduce with official binaries...
Comment 25•1 year ago
They all seem to be in Debian/Debian-based distros. I'm on Arch and can't repro...
Comment 26•1 year ago
Julien could repro this on a Debian 10 VM.
Comment 27•1 year ago
I could reproduce it in a qemu+kvm VM using Huayra 5 (Debian 10) and FF 90.0:
- Go to google.com
- Search for something, then click on the "Images" results
Reporter | ||
Comment 28•1 year ago
Good, what does ulimit -a give you there?
Comment 29•1 year ago
Duping this. The other bug explains that this is a Google-side change that causes huge resource usage, but the question remains why only Debian users are actually hitting some system limit and crashing.
Comment 30•1 year ago
The duplicate bug is already tracking all the necessary releases, so dropping the flags from this one.