1839139 - Crash in [@ EnterBaseline] affecting users in the es-ar locale doing searches on Google

Gabriele Svelto [:gsvelto]

Reporter

Description

•

1 year ago

Crash report: https://crash-stats.mozilla.org/report/index/1590b3bf-fe4b-467f-950c-df6c80230616

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  ?  @0x00001cb414863436  
1  ?  @0x00001cb4148634ed  
2  libxul.so  EnterBaseline  js/src/jit/BaselineJIT.cpp:142
2  libxul.so  js::jit::EnterBaselineInterpreterAtBranch  js/src/jit/BaselineJIT.cpp:198
3  libxul.so  Interpret  js/src/vm/Interpreter.cpp:2225
4  libxul.so  js::RunScript  js/src/vm/Interpreter.cpp:431
4  libxul.so  js::InternalCallOrConstruct  js/src/vm/Interpreter.cpp:585
4  libxul.so  InternalCall  js/src/vm/Interpreter.cpp:620
4  libxul.so  js::Call  js/src/vm/Interpreter.cpp:652
5  libxul.so  js::fun_call  js/src/vm/JSFunction.cpp:956

I'm chucking this in the JIT component because it seems to be happening in JIT-compiled code. We've got a large spike in release, primarily among Spanish-speaking users. Out of all the crashes under these signatures, the top affected locale is es-ar with 76.90% of the crashes.

Almost all of those crashes are in the release channel but spread over several versions. Additionally, many users seem to be crashing while doing searches on Google, so maybe this was triggered by a change on their side which unearthed a bug in Firefox.

Gabriele Svelto [:gsvelto]

Reporter

Comment 1

•

1 year ago

Even more crashes affecting users in the es-ar locale but under a different signature (but with a similar stack and still doing searches on Google).

Crash Signature: [@ EnterBaseline] → [@ EnterBaseline] [@ chunk_alloc | <unknown in libxul.so>]

Gabriele Svelto [:gsvelto]

Reporter

Updated

•

1 year ago

Summary: Crash in [@ EnterBaseline] → Crash in [@ EnterBaseline] affecting users in the es-ar locale doing searches on Google

Gabriele Svelto [:gsvelto]

Reporter

Comment 2

•

1 year ago

Even more crashes. These are all on very old versions of Firefox, so this is definitely something triggered by a server-side change.

Crash Signature: [@ EnterBaseline] [@ chunk_alloc | <unknown in libxul.so>] → [@ EnterBaseline] [@ chunk_alloc | <unknown in libxul.so>] [@ <unknown in firefox-bin>] [@ base_alloc]

Gabriele Svelto [:gsvelto]

Reporter

Comment 3

•

1 year ago

Ouch, even more, this is bad.

Crash Signature: [@ EnterBaseline] [@ chunk_alloc | <unknown in libxul.so>] [@ <unknown in firefox-bin>] [@ base_alloc] → [@ EnterBaseline] [@ chunk_alloc | <unknown in libxul.so>] [@ <unknown in firefox-bin>] [@ base_alloc] [@ js::jit::EnterBaselineInterpreterAtBranch]

Pascal Chevrel:pascalc

Updated

•

1 year ago

Severity: -- → S2

status-firefox114: --- → affected

status-firefox115: --- → affected

status-firefox116: --- → affected

status-firefox-esr102: --- → affected

tracking-firefox114: --- → +

tracking-firefox115: --- → +

tracking-firefox116: --- → +

Priority: -- → P1

Pascal Chevrel:pascalc

Comment 4

•

1 year ago

The spike is coming from the Huayra distro (https://es.wikipedia.org/wiki/Huayra_GNU/Linux), which is an Argentinian distro for education. They had a major release (v6) last week, and it is probably starting to be deployed.

Emilio Cobos Álvarez (:emilio)

Comment 5

•

1 year ago

I sent an email to info [at] educar.gob.ar, which maintains that distro.

Pascal Chevrel:pascalc

Comment 6

•

1 year ago

•

Edited

(In reply to Emilio Cobos Álvarez (:emilio) from comment #5)

> I sent an email to info [at] educar.gob.ar, which maintains that distro.

Thanks, they will get 2 emails then, I sent them one as well :)

BugBot [:suhaib / :marco/ :calixte]

Comment 7

•

1 year ago

The bug is linked to a topcrash signature, which matches the following criteria:

Top 10 content process crashes on release
Top 5 desktop browser crashes on Linux on beta
Top 5 desktop browser crashes on Linux on release

For more information, please visit BugBot documentation.

Keywords: topcrash

Iain Ireland [:iain]

Comment 8

•

1 year ago

Looking at the crash linked in comment 1, we're crashing in code that looks like this:

    1cb414863417:   mov    %rsp,%rbx         // rbx = rsp - (rdx * 8)
    1cb41486341a:   mov    %rdx,%rax
    1cb41486341d:   shl    $0x3,%rax
    1cb414863421:   sub    %rax,%rbx
    1cb414863424:   mov    %rsp,%rax         // rax = rsp - 0x800
    1cb414863427:   sub    $0x800,%rax
    1cb41486342d:   cmp    %rbx,%rax         // while rax >= rbx
    1cb414863430:   jb     0x1cb414863444
    1cb414863436:   movl   $0x0,(%rax)       //   *rax = 0
^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^
    1cb41486343c:   sub    $0x800,%rax       //   rax -= 0x800
    1cb414863442:   jmp    0x1cb41486342d

This appears to be some sort of stack-probing code. I've left out the preceding context, but it's right at the beginning of a function. rdx is being passed in from the caller. In this case, it's 19535. It looks like we're allocating room for that many 8-byte values. In 2048-byte steps, we walk the stack and touch each page.
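
As a rough illustration of the loop in the disassembly, here is a minimal Python model of it. The starting `rsp` value is hypothetical (the real one is not given here), and the function name is made up for this sketch; only the 0x800 stride, the multiply-by-8, and the value count of 19535 come from the crash.

```python
PROBE_STEP = 0x800  # stride of the `sub $0x800,%rax` instructions (2048 bytes)
SLOT_SIZE = 8       # `shl $0x3,%rax` turns a value count into a byte count


def probe_addresses(rsp: int, num_values: int) -> list[int]:
    """Return the addresses the probing loop would write to, in order."""
    target = rsp - num_values * SLOT_SIZE  # rbx = rsp - (rdx * 8)
    addr = rsp - PROBE_STEP                # rax = rsp - 0x800
    touched = []
    while addr >= target:                  # `jb` leaves the loop once rax < rbx
        touched.append(addr)               # movl $0x0,(%rax)
        addr -= PROBE_STEP                 # rax -= 0x800
    return touched


# Hypothetical stack pointer, chosen only to make the example concrete.
rsp = 0x7FFF_0000_0000
probes = probe_addresses(rsp, 19535)
print(f"{len(probes)} probes, from {probes[0]:#x} down to {probes[-1]:#x}")
```

With 19535 values the model performs 76 writes, so a fault partway through the loop means the stack ran out well before the requested frame was fully probed.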

Oh, it's this code. That's used in EnterJIT, so we're apparently calling into jit code with ~20000 values on the stack and running out of space. Specifically, we're doing on-stack-replacement to tier up from the C++ interpreter to the baseline interpreter, which entails copying all the values that are currently on the interpreter's stack (arguments, local variables, intermediate results) from the heap onto the native stack.

I should look at more than one crash, but for now one hypothesis is that the distro changed the default stack size.

A couple other things I've noticed:

We're already doing a stack overflow check in EnterBaseline, so whatever's going wrong here is somehow circumventing that.

20000 is our default limit for max stack arguments.

Iain Ireland [:iain]

Comment 9

•

1 year ago

Oh, one other thing about the crash I've looked at is that we crash when we still have several iterations left to go in the loop, so our check is way off. Maybe cx->nativeStackLimit is being set up wrong somehow?

Iain Ireland [:iain]

Comment 10

•

1 year ago

Looking at another four EnterBaseline crashes with useragent-locale "es-ar", they're all crashing in the same code. I suggest asking the maintainer whether anything changed regarding stack limits.

BugBot [:suhaib / :marco/ :calixte]

Comment 11

•

1 year ago

The bug is linked to a topcrash signature, which matches the following criteria:

Top 20 desktop browser crashes on release (startup)
Top 10 content process crashes on beta
Top 10 content process crashes on release
Top 5 desktop browser crashes on Linux on beta
Top 5 desktop browser crashes on Linux on release

For more information, please visit BugBot documentation.

Keywords: topcrash-startup

BugBot [:suhaib / :marco/ :calixte]

Comment 12

•

1 year ago

The bug is marked as tracked for firefox114 (release), tracked for firefox115 (beta) and tracked for firefox116 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.

:sdetar, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Comment 13

•

1 year ago

Have we heard anything back yet from the emails Pascal and Emilio sent to the maintainers of the distro? It seems like we won't be able to make much progress until we are able to talk to them.

Flags: needinfo?(sdetar)

Pascal Chevrel:pascalc

Comment 14

•

1 year ago

(In reply to Steven DeTar [:sdetar] from comment #13)

> Have we heard anything back yet from the emails Pascal and Emilio sent to the maintainers of the distro? It seems like we won't be able to make much progress until we are able to talk to them.

I haven't received any response.

Fernando Toledo

Comment 15

•

1 year ago

Hi all! I work on Huayra GNU/Linux. We use the official binary build downloaded from download.mozilla.org (SHA verified).
We just install it in /opt/firefox with very minimal customization (only the updater is disabled).

Huayra 6 is based on Debian 11.x.

You can check our package for this purpose at:

https://github.com/HuayraLinux/firefox-installer

If there is anything we can help with, let us know.

Regards!

Fernando Toledo

Comment 16

•

1 year ago

Attached image: firefox running on Huayra 6

using kvm/qemu:

kvm -cdrom huayra-amd64-6.0.iso -m 2G

Fernando Toledo

Comment 17

•

1 year ago

(In reply to Fernando Toledo from comment #15)

> Hi all! I work on Huayra GNU/Linux. We use the official binary build downloaded from download.mozilla.org (SHA verified).
> We just install it in /opt/firefox with very minimal customization (only the updater is disabled).

> Huayra 6 is based on Debian 11.x.

> You can check our package for this purpose at:

> https://github.com/HuayraLinux/firefox-installer

> If there is anything we can help with, let us know.

> Regards!

FYI: The Firefox version that was shipped in Huayra 6.0 is 114.0.1

Iain Ireland [:iain]

Comment 18

•

1 year ago

•

Edited

Hi Fernando,

It looks like something is going wrong with the amount of stack memory that is available. Have you recently changed the system default stack size? I believe our default stack limit on Linux is 8MB.

If my math is correct, we're crashing when allocating a ~160KB stack frame.
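
The ~160KB estimate can be checked with a little arithmetic. This sketch assumes the value count of 19535 read from rdx in comment 8, 8 bytes per value, and an 8 MiB stack as a stand-in for the 8MB default mentioned above; the variable names are just for illustration.

```python
num_values = 19535                    # rdx in the crash from comment 8
bytes_per_value = 8                   # each stack slot is an 8-byte value
assumed_stack_bytes = 8 * 1024 * 1024 # assumption: 8 MiB default stack limit

frame_bytes = num_values * bytes_per_value
fraction_of_stack = frame_bytes / assumed_stack_bytes

print(f"frame size: {frame_bytes} bytes (~{frame_bytes / 1000:.0f} KB)")
print(f"share of an 8 MiB stack: {fraction_of_stack:.1%}")
```

Under these assumptions the frame is under 2% of the stack, which is why a crash while probing it points at much less stack being available than expected.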

Edit to add: this bit of code seems like it might be relevant. Did you lower the value of RLIMIT_STACK in Huayra 6.0?
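
For anyone who wants to check this from inside a process rather than from a shell, a minimal sketch using Python's standard `resource` module (Unix only) is below. The helper name `describe_limit` is made up for this example, and this only reports the process-wide limit; it says nothing about how Firefox itself derives its stack limit.

```python
import resource


def describe_limit(value: int) -> str:
    """Format an rlimit value the way `ulimit -s` reports it (kbytes)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kbytes"


# getrlimit returns a (soft, hard) pair, both in bytes for RLIMIT_STACK.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("RLIMIT_STACK soft:", describe_limit(soft))
print("RLIMIT_STACK hard:", describe_limit(hard))
```

The soft limit is the one that corresponds to the `stack size` row of `ulimit -a`.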

Flags: needinfo?(ragnarok)

Fernando Toledo

Comment 19

•

1 year ago

(In reply to Iain Ireland [:iain] from comment #18)

> Hi Fernando,

> It looks like something is going wrong with the amount of stack memory that is available. Have you recently changed the system default stack size? I believe our default stack limit on Linux is 8MB.

> If my math is correct, we're crashing when allocating a ~160KB stack frame.

> Edit to add: this bit of code seems like it might be relevant. Did you lower the value of RLIMIT_STACK in Huayra 6.0?

Nope, we do not change these settings.

    ragnarok@huayra:~/Descargas/isos$ ulimit -a
    real-time non-blocking time  (microseconds, -R)  unlimited
    core file size               (blocks, -c)        0
    data seg size                (kbytes, -d)        unlimited
    scheduling priority          (-e)                0
    file size                    (blocks, -f)        unlimited
    pending signals              (-i)                30691
    max locked memory            (kbytes, -l)        unlimited
    max memory size              (kbytes, -m)        unlimited
    open files                   (-n)                1024
    pipe size                    (512 bytes, -p)     8
    POSIX message queues         (bytes, -q)         819200
    real-time priority           (-r)                95
    stack size                   (kbytes, -s)        8192
    cpu time                     (seconds, -t)       unlimited
    max user processes           (-u)                30691
    virtual memory               (kbytes, -v)        unlimited
    file locks                   (-x)                unlimited

Flags: needinfo?(ragnarok)

Fernando Toledo

Comment 20

•

1 year ago

I see that this crash report was from Huayra 5.0 (the previous, older release) and FF 112.0.1.
We shipped Firefox 90 at release time.
Anyway, in Huayra 5 it is also possible to update to the latest version of FF.

Were the crash reports all from the same version?

Gabriele Svelto [:gsvelto]

Reporter

Comment 21

•

1 year ago

The crashes are coming from all versions, including 114.0.1, but all from Huayra 5.0. Did something change in that version of the distribution?

Flags: needinfo?(ragnarok)

Fernando Toledo

Comment 22

•

1 year ago

(In reply to Gabriele Svelto [:gsvelto] from comment #21)

> The crashes are coming from all versions, including 114.0.1, but all from Huayra 5.0. Did something change in that version of the distribution?

No changes were made in our repo, but users can receive updates directly from the Debian repo.
I will run some more tests. Can someone reproduce the problem?
I still can't reproduce it.

Flags: needinfo?(ragnarok)

Emilio Cobos Álvarez (:emilio)

Comment 23

•

1 year ago

•

Edited

Crash in bug 1839669 comment 5 looks nearly identical to this, and that seems like a non-Argentinian user.

Comment 24

•

1 year ago

They also seem to be able to reproduce with official binaries...

Emilio Cobos Álvarez (:emilio)

Comment 25

•

1 year ago

They all seem to be in Debian/Debian-based distros. I'm on Arch and can't repro...

Emilio Cobos Álvarez (:emilio)

Comment 26

•

1 year ago

Julien could repro this on a Debian 10 VM.

Fernando Toledo

Comment 27

•

1 year ago

I could reproduce it in a qemu+kvm VM using Huayra 5 (Debian 10) and FF 90.0:

Go to google.com
Search, then click on the "Images" results

Gabriele Svelto [:gsvelto]

Reporter

Comment 28

•

1 year ago

Good, what does ulimit -a give you there?

Gian-Carlo Pascutto [:gcp]

Comment 29

•

1 year ago

Duping this. The other bug explains that this is a Google-side change that causes huge resource usage, but the question remains why only Debian users are actually hitting some system limit and crashing.

Status: NEW → RESOLVED

Closed: 1 year ago

Duplicate of bug: 1839669

Resolution: --- → DUPLICATE

Ryan VanderMeulen [:RyanVM]

Comment 30

•

1 year ago

The duplicate bug is already tracking all the necessary releases, so dropping the flags from this one.

status-firefox114: affected → ---

status-firefox115: affected → ---

status-firefox116: affected → ---

status-firefox-esr102: affected → ---

tracking-firefox114: + → ---

tracking-firefox115: + → ---

tracking-firefox116: + → ---