TM: SIGILL Crash [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]

VERIFIED FIXED in mozilla1.9.1b4

Status

()

P2
critical
VERIFIED FIXED
10 years ago
3 years ago

People

(Reporter: philip.chee, Assigned: gal)

Tracking

({crash, regression})

Trunk
mozilla1.9.1b4
x86
All
crash, regression
Points:
---
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(crash signature, URL)

Attachments

(1 attachment)

(Reporter)

Description

10 years ago
+++ This bug was initially created as a clone of Bug #477471 +++

From Bug 477471 Comment 12:

Build ID:  20090228000503

Build identifier: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1b3pre)
Gecko/20090228 SeaMonkey/2.0b1pre

Toggling javascript.options.jit.content back to true and not using safe-mode,
it launches, but, if a web page has a Flash ad on it (and which ones do not?),
SeaMonkey crashes every time.   :(

Crash ID: bp-429da3c6-0fe7-4506-b651-a51542090228
Crash ID: bp-a559e867-c944-44b8-b94c-3de882090228
Crash ID: bp-f688369b-00c7-4a96-9a12-dc3b32090228

0  	 	@0xb131bbad  	
1 		@0xbfbd88d7 	
2 	libmozjs.so 	js_MonitorLoopEdge 	js/src/jstracer.cpp:4285
3 	libmozjs.so 	js_Interpret 	js/src/jsinterp.cpp:3098

Updated

10 years ago
Summary: TM: SIGILL Crash [js_MonitorLoopEdge(JSContext*, unsigned int&)] → TM: SIGILL Crash [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]
(Assignee)

Comment 1

10 years ago
Can you try to capture a disassembly of the crashing code? What instruction causes the SIGILL?

Comment 2

10 years ago
Perhaps just a "me too" comment, but I'm getting the SIGILL crashes with this signature on an AMD K6-III/450.  The first page encountered after successfully logging into http://www.chase.com reliably triggers the crash.  Setting javascript.options.jit.content to "false" seems to be a valid workaround.

I've had to "roll my own" firefox for many months now (binary packages are for i686 and later).  If you need more information about my build environment, don't hesitate to ask.  I have attempted to perform a build with symbols in the recent past, but I have neither enough disk space nor enough memory (physical and swap).  I can generate a core dump, but gdb can't seem to make any sense out of the mess: I get a "no function contains the referenced address" error when attempting disassembly of the function address given in the backtrace.

Mozilla version info as below (my build is from the release 3.1b3 sources):

Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1b3) Gecko/20090315 Shiretoko/3.1b3

Comment 3

10 years ago
This is the number one crasher on the trunk.
http://crash-stats.mozilla.com/query/query?do_query=1&branch=1.9.2&date=&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=
Flags: blocking1.9.2?
OS: Linux → All
(Assignee)

Comment 4

10 years ago
This is a SIGSEGV crash involving lr after an invocation, not SIGILL. Looks bad. Must fix.
Assignee: general → gal
Flags: blocking1.9.2? → blocking1.9.1?
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1b4
(Assignee)

Comment 5

10 years ago
I will try to take a look at this tonight. If someone can link a crashing website or a testcase or find the regression range, that would be enormously useful.
Whiteboard: need-regression-window, need-testcase

Comment 6

10 years ago
Comments in crash-stats say Google reader or Facebook. I use Facebook a bit but without any 3rd party Facebook applications and have not run into this.

Comment 7

10 years ago
First reported crash was on February 10th. Was there a tm merge around then?
http://crash-stats.mozilla.com/report/index/7ca39b1a-4e13-4900-ba4b-e01462090210

Updated

10 years ago
Flags: blocking1.9.1? → blocking1.9.1+
Priority: P1 → P2
(In reply to comment #4)
> This is a SIGSEGV crash involving lr after in invocation, not SIGILL. Looks
> bad. Must fix.

Should be resummarized, or are people still seeing SIGILL? Usually crashes due to a segv (load or store from a bad pointer) are quite different from illegal instruction crashes (jit bug, cpu version dependency, jump to random data).

/be
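
For readers triaging similar reports, the distinction drawn above can be sketched as a tiny classifier over POSIX signal numbers. This is only an illustration: `crash_class` and `crash_class_example` are hypothetical names, not part of the Mozilla tree or the crash reporter.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: map a signal number to the coarse crash class
 * discussed in this comment. */
const char *
crash_class(int signo)
{
    switch (signo) {
    case SIGILL:
        /* Typical causes: JIT bug, CPU version dependency, jump to random data. */
        return "illegal instruction";
    case SIGSEGV:
        /* Typical cause: load or store through a bad pointer. */
        return "bad memory access";
    default:
        return "other";
    }
}

/* Usage example. */
void
crash_class_example(void)
{
    assert(strcmp(crash_class(SIGILL), "illegal instruction") == 0);
    assert(strcmp(crash_class(SIGSEGV), "bad memory access") == 0);
}
```

A report that consistently shows SIGILL on one CPU model and not on others points toward the "cpu version dependency" case rather than a bad pointer.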

Comment 9

10 years ago
<http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_MonitorLoopEdge%28JSContext*%2C%20unsigned%20int%26%29>

This looks fixed. There are still crashes on trunk, but the build ids are all a month old.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Resolution: FIXED → WORKSFORME

Comment 11

10 years ago
All of the build ids in that link are from 2009-04-23. This push

http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?changeset=6dde8411585a

or something in the area probably fixed it.
(Assignee)

Comment 12

10 years ago
It's not clear to me which patch in that merge fixed this crash. If anyone feels like bisecting this down to the patch that resolved it, that would be awesome.

Comment 13

10 years ago
NOT fixed here.

Contrary to an earlier comment, this is a SIGILL crash (illegal instruction), not SIGSEGV.  The problem still exists in the released firefox 3.5b99 on an AMD K6-III/450 built from unmodified source.  The "about" information is

Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1b99) Gecko/20090609 Shiretoko/3.5b99

The crash is 100% reproducible by setting javascript.options.jit.content to its default value (true), and then attempting to access the USAA "bill pay" page (lots of javascript).  Toggling the above config option to false is still a valid workaround.
Can you provide a stacktrace or crash-id (in about:crashes) of this crash?
Keywords: regression, regressionwindow-wanted, testcase-wanted
Whiteboard: need-regression-window, need-testcase
(Assignee)

Comment 15

10 years ago
I don't think the stacktrace will be very useful. This looks like bad instruction set detection. K6-III/450 is the important hint here. If the reporter is willing to help diagnose the problem and test patches, we can try to address this. We have no way to test locally (this is an issue with the K6-III not supporting SSE and conditional moves, and us not detecting that correctly).
Adding Ted here, who has some non-SSE hardware and mandate!
Can anyone provide a publicly accessible page that reproduces the problem for them? Pages behind login are unhelpful here. I have a Shiretoko nightly on an old Pentium III machine with the flash plugin installed, and I don't crash clicking around Yahoo Finance. I'll try running through our unittest suites and see if anything triggers in the meantime.

Comment 18

10 years ago
I don't think you will be able to reproduce the K6 problem on a PIII processor. There is an OS/2 user with a K6-2/500 CPU who also sees SIGILL crashes every now and then with JIT turned on. See

http://groups.google.de/group/mozilla.dev.ports.os2/msg/e0f871f5ed375968
http://groups.google.de/group/mozilla.dev.ports.os2/msg/7c21fb982c57255a

So his FF was crashing on
   www.spiegel.de
and
   www.heise.de     in conjunction with
   http://www.heise.de/open/Distributionsreigen-Zielgerade-und-Einlauf-fuer-Ubuntu-und-Mandriva--/artikel/136574/1

But that was with older OS/2 builds, and not fully reproducible, so probably doesn't help to fix the problem.

(This bug should be reopened, right?)
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(In reply to comment #18)
> I don't think you will be able to reproduce the K6 problem on a PIII processor.

Hm, I don't know that we have anything older available. :-/ Do you think that this is a crash due to SSE1 instructions? The PIII would have that, but the K6 wouldn't. (Although the P3 lacks SSE2 or better.)

Comment 20

10 years ago
Definitely willing to help with diagnosis and patch testing.  Let me know how I can help.  However, do note that I cannot do a debug build: takes far more disk space than I've got, not to mention the RAM and swap required to compile/link it.  I agree that a web page not hiding behind a login would be useful: I'll try to find one that reliably triggers the crash.  Thanks for the assist!
I bet we could get a one-off debug build for you.  Ted, am I crazy?

Comment 22

10 years ago
Careful on the one-off debug build suggestion :-).  If the standard i686 build would work with my processor, I wouldn't be having to roll my own i586 version.  The other thing is, the motherboard has the maximum RAM it can support: 384 MB.  From what I've seen as far as the potential size of a debug build, I might not have enough system resources to execute it.

That's not to say we can't enable debugging for a carefully chosen subset of the build tree if that would be useful.

I guess it might be appropriate at this point to apologize for having a certain fondness for antiques.
(Assignee)

Comment 23

10 years ago
I would suspect that we get the SSE detection wrong, so that piece should be easy to test separately.
No, no -- I mean a build of the usual i686 build with debug symbols.  We want your brokenness, it's going to help us find the problem.

(Debug symbols aren't loaded by default, so you should still be able to run it, but we might be able to just give you a JS shell to use.)
(Assignee)

Comment 25

10 years ago
rct, want to run this on your machine and tell us what it returns?

static bool
js_CheckForSSE2()
{
    int features = 0;
#if defined _MSC_VER
    __asm
    {
        pushad
        mov eax, 1
        cpuid
        mov features, edx
        popad
    }
#elif defined __GNUC__
    asm("xchg %%esi, %%ebx\n" /* we can't clobber ebx on gcc (PIC register) */
        "mov $0x01, %%eax\n"
        "cpuid\n"
        "mov %%edx, %0\n"
	"xchg %%esi, %%ebx\n"
        : "=m" (features)
        : /* We have no inputs */
        : "%eax", "%esi", "%ecx", "%edx"
       );
#elif defined __SUNPRO_C || defined __SUNPRO_CC
    asm("push %%ebx\n"
        "mov $0x01, %%eax\n"
        "cpuid\n"
        "pop %%ebx\n"
        : "=d" (features)
        : /* We have no inputs */
        : "%eax", "%ecx"
       );
#endif
    /* CPUID leaf 1 reports SSE2 support in bit 26 of EDX. */
    return (features & (1<<26)) != 0;
}

If this returns false correctly, we might have a bug in the FPU code. That would suck a lot since we have zero test coverage for that atm.
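
The snippet above tests only bit 26 (SSE2). CPUID leaf 1 reports conditional-move support in a separate EDX bit (bit 15), so the two can differ on a given CPU. A minimal sketch of decoding both from an already-captured EDX value follows; `cpu_features`, `decode_features` and the enum names are hypothetical and are not taken from jstracer.cpp or nanojit.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in EDX after executing CPUID with EAX = 1. */
enum {
    CPUID_EDX_FPU  = 1u << 0,
    CPUID_EDX_TSC  = 1u << 4,
    CPUID_EDX_CX8  = 1u << 8,
    CPUID_EDX_CMOV = 1u << 15,
    CPUID_EDX_SSE  = 1u << 25,
    CPUID_EDX_SSE2 = 1u << 26
};

/* Hypothetical result type: one independent flag per feature. */
struct cpu_features {
    int has_cmov;
    int has_sse;
    int has_sse2;
};

/* Decode a captured EDX value; each feature is tested on its own bit. */
struct cpu_features
decode_features(uint32_t edx)
{
    struct cpu_features f;
    f.has_cmov = (edx & CPUID_EDX_CMOV) != 0;
    f.has_sse  = (edx & CPUID_EDX_SSE)  != 0;
    f.has_sse2 = (edx & CPUID_EDX_SSE2) != 0;
    return f;
}

/* Usage example: a CPU reporting only fpu, tsc and cx8 has none of the three. */
void
decode_features_example(void)
{
    struct cpu_features f =
        decode_features(CPUID_EDX_FPU | CPUID_EDX_TSC | CPUID_EDX_CX8);
    assert(!f.has_cmov && !f.has_sse && !f.has_sse2);
}
```

The point of keeping the flags separate is that "no SSE2" does not by itself say anything about CMOV, and the reverse also holds.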
(In reply to comment #22)
> Careful on the one-off debug build suggestion :-).  If the standard i686 build
> would work with my processor, I wouldn't be having to roll my own i586 version.

FWIW, our official nightly builds are not i686 targeted. They should run fine on any x86 machine. (Although clearly we have a JIT bug here somewhere.)

Comment 27

10 years ago
With a trivial main() wrapper that simply returns the value of js_CheckForSSE2(), the value returned is 0.
(Assignee)

Comment 28

10 years ago
Ok, found something. Taking the bug.
(Assignee)

Updated

10 years ago
Blocks: 468484
(Assignee)

Comment 29

10 years ago
Created attachment 382591 [details] [diff] [review]
disable conditional moves if the processor doesn't support them

Can the reporter(s) test this patch? OPT or DEBUG are both fine. You can just rebuild the JS engine separately instead of a full browser rebuild.
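
The attachment itself is not reproduced in this thread. Purely as an illustration of the idea in its title, a code generator could gate its choice of lowering for `cond ? a : b` on a CPU capability flag. Everything below (`select_lowering`, `choose_select_lowering`, `select_value`) is a hypothetical sketch and not the actual patch.

```c
#include <assert.h>

/* Hypothetical lowering strategies for "result = cond ? a : b". */
enum select_lowering {
    LOWER_WITH_CMOV,   /* emit a conditional move */
    LOWER_WITH_BRANCH  /* emit compare, conditional jump and plain moves */
};

/* Pick a lowering the running CPU can actually execute. */
enum select_lowering
choose_select_lowering(int cpu_has_cmov)
{
    return cpu_has_cmov ? LOWER_WITH_CMOV : LOWER_WITH_BRANCH;
}

/* Reference semantics that either lowering must preserve. */
int
select_value(int cond, int a, int b)
{
    if (cond)
        return a;
    return b;
}

/* Usage example: without CMOV support the branch form must be chosen. */
void
choose_select_lowering_example(void)
{
    assert(choose_select_lowering(0) == LOWER_WITH_BRANCH);
    assert(select_value(0, 1, 2) == 2);
}
```

Both lowerings compute the same value; the branch form is just the one that cannot raise SIGILL on a CPU without conditional moves.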

Comment 30

10 years ago
Patch downloaded.  Will apply it and start a build.  Report to follow, but it will be several hours due to demands of life away from the keyboard.  Thanks!
(Assignee)

Updated

10 years ago
Depends on: 497455
(Assignee)

Comment 31

10 years ago
Moving the patch into a separate bug so we can block on that one.

https://bugzilla.mozilla.org/show_bug.cgi?id=497455

Closing this back down, but added dependency.
No longer blocks: 468484
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
is this fixed1.9.1 as well as per comment 31?
(Assignee)

Comment 33

10 years ago
Not sure how to properly triage this. I opened a new bug to fix this issue, instead of re-opening this one. Whatever was done to this bug to re-open it should be undone.

Comment 34

10 years ago
Fix verified.  No more SIGILL on K6-III/450 with javascript.options.jit.content set to default value of "true".  Thanks Andreas!
Not worth checking for a regression range for an already fixed bug.

Resolving as verified fixed based on comment 34.
Status: RESOLVED → VERIFIED
Keywords: regressionwindow-wanted

Comment 36

9 years ago
I no longer see this bug when javascript.options.jit.content is set to true, but when javascript.options.jit.chrome is set to true, FF crashes. I tested on FF 3.6b4 (Mozilla 1.9.2). Please let me know if anyone else is experiencing this.

The machine it runs on has a Vortex processor that does not support cmov/SSE2 instructions.

Thank you.
What is the stack or crash report ID for the chrome.jit crash?

Comment 38

9 years ago
This is the CPU info:

processor       : 0
vendor_id       : Vortex86 SoC
cpu family      : 5
model           : 2
model name      : 05/02
stepping        : 2
cpu MHz         : 1000.072
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc cx8
bogomips        : 2007.74
clflush size    : 32
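
The `flags` line above lists only `fpu tsc cx8`, so neither `cmov` nor `sse2` is advertised. A minimal sketch of checking such a line programmatically follows, assuming the flags string has already been read from /proc/cpuinfo. `cpuinfo_has_flag` is a hypothetical helper, and it matches whole words so that `sse` does not match `sse2`.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `flags`. */
int
cpuinfo_has_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == flags) || (p[-1] == ' ') || (p[-1] == '\t');
        int ends_word = (p[n] == '\0') || (p[n] == ' ') || (p[n] == '\n');
        if (starts_word && ends_word)
            return 1;
        p += n; /* keep scanning after this partial match */
    }
    return 0;
}

/* Usage example with the flags reported in this comment. */
void
cpuinfo_has_flag_example(void)
{
    const char *flags = "fpu tsc cx8";
    assert(cpuinfo_has_flag(flags, "tsc"));
    assert(!cpuinfo_has_flag(flags, "cmov"));
    assert(!cpuinfo_has_flag(flags, "sse2"));
}
```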



Here is the stack dump:

*** Registering components in: Apprunner
WARNING: NS_ENSURE_TRUE(mHiddenWindow) failed: file nsAppShellService.cpp, line 399
pldhash: for the table at address 0xb52b07c8, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b0760 == 1
pldhash: for the table at address 0xb494a268, the given entrySize of 48 probably favors chaining over double hashing.
++DOMWINDOW == 1 (0xb7061020) [serial = 1] [outer = (nil)]
pldhash: for the table at address 0xb52b0ba8, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b0b40 == 2
++DOMWINDOW == 2 (0xb7061590) [serial = 2] [outer = (nil)]
++DOMWINDOW == 3 (0xb7061760) [serial = 3] [outer = 0xb7061560]
++DOMWINDOW == 4 (0xb7062410) [serial = 4] [outer = 0xb7060ff0]
pldhash: for the table at address 0xb52b1938, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b18d0 == 3
++DOMWINDOW == 5 (0xb7062d20) [serial = 5] [outer = (nil)]
pldhash: for the table at address 0xb52b1b28, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b1ac0 == 4
++DOMWINDOW == 6 (0xb7062ef0) [serial = 6] [outer = (nil)]
pldhash: for the table at address 0xb52b2aa8, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b2a40 == 5
WARNING: NS_ENSURE_TRUE(browserChrome) failed: file nsDocShell.cpp, line 9897
WARNING: Something wrong when creating the docshell for a frameloader!: file nsFrameLoader.cpp, line 912
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file nsFrameLoader.cpp, line 936
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file nsFrameLoader.cpp, line 193
pldhash: for the table at address 0xb52b2e88, the given entrySize of 48 probably favors chaining over double hashing.
++DOCSHELL 0xb52b2e20 == 6
++DOMWINDOW == 7 (0xb7054a30) [serial = 7] [outer = (nil)]

Program /opt/mozilla.org/lib/firefox-3.6b4/firefox-bin (pid = 4744) received signal 4.
Stack:
UNKNOWN 0xffffe420
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x000E0780]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x000F5EF3]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x0005DC94]
js_Invoke+0x00000746 [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x0007475C]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x00074B1E]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x00074CB9]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x00081539]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x000843F7]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x00065042]
js_Invoke+0x00000746 [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x0007475C]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x00074B1E]
JS_CallFunctionValue+0x00000066 [/opt/mozilla.org/lib/firefox-3.6b4/libmozjs.so +0x0001413C]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x004DE339]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x004D0B3F]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x004EB009]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x001630C0]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x004FC2E2]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x004FC633]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x00502153]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libgklayout.so +0x00502642]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libnecko.so +0x0005AC9F]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libjar50.so +0x00010540]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libnecko.so +0x000326DD]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libnecko.so +0x000327B1]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libxpcom_core.so +0x00064B3E]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/libxpcom_core.so +0x00086DC5]
NS_ProcessNextEvent_P(nsIThread*, int)+0x00000059 [/opt/mozilla.org/lib/firefox-3.6b4/libxpcom_core.so +0x0002F6AB]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libwidget_gtk2.so +0x00049142]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/components/libtoolkitcomps.so +0x00006F8D]
XRE_main+0x00001C4F [/opt/mozilla.org/lib/firefox-3.6b4/libxul.so +0x0001CFA7]
UNKNOWN [/opt/mozilla.org/lib/firefox-3.6b4/firefox-bin +0x000017EF]
__libc_start_main+0x0000012E [/lib/libc.so.6 +0x0001620E]
Sleeping for 300 seconds.
Type 'gdb /opt/mozilla.org/lib/firefox-3.6b4/firefox-bin 4744' to attach your debugger to this thread.
(Assignee)

Comment 39

9 years ago
Looks like we are crashing in JIT code. What does the code in comment #25 return for your CPU? Maybe the CPU sets its flags wrong.

Comment 40

9 years ago
I compiled the code in comment #25 and ran it on the vortex machine; it returned 0 (as expected)
(Assignee)

Comment 41

9 years ago
Ok, that's pretty strange. We really shouldn't be emitting cmovs or SSE instructions when that flag is off. I will review the corresponding code a bit to double check.

Comment 42

9 years ago
Were you able to find out what happened? Here is the mozconfig for FF3.6b4 in case you want to reproduce the bug.

# sh
# Build configuration script
# Options for client.mk.
# mk_add_options MOZ_MAKE_FLAGS=-j4

# Options for 'configure' (same as command-line options).
ac_add_options --prefix=/opt/mozilla.org
ac_add_options --libdir=/opt/mozilla.org/lib
ac_add_options --sysconfdir=/etc/firefox
ac_add_options --localstatedir=/var
ac_add_options --enable-default-mozilla-five-home
ac_add_options --with-default-mozilla-five-home=/opt/mozilla.org/lib/firefox-3.6b4 
ac_add_options --host=i486-t2-linux-gnu
ac_add_options --disable-debug
ac_add_options --enable-optimize 
ac_add_options --disable-dtd-debug
ac_add_options --disable-tests
ac_add_options --disable-logging
ac_add_options --disable-pedantic
ac_add_options --enable-xft
ac_add_options --enable-default-toolkit=gtk2
ac_add_options --with-system-zlib
ac_add_options --with-system-jpeg
ac_add_options --with-system-png
ac_add_options --with-system-mng
ac_add_options --enable-system-cairo
ac_add_options --enable-crypto

export BUILD_OFFICIAL=1
export MOZILLA_OFFICIAL=1
mk_add_options BUILD_OFFICIAL=1
mk_add_options MOZILLA_OFFICIAL=1
. $topsrcdir/browser/config/mozconfig
export MOZ_PHOENIX=1
mk_add_options MOZ_PHOENIX=1

ac_add_options --enable-default-toolkit=cairo-gtk2
ac_add_options --disable-mailnews
ac_add_options --disable-composer
ac_add_options --enable-extensions=default #,cookie,permissions,xml-rpc,xmlextras,pref,transformiix,webservices,auth
ac_add_options --enable-mathml
ac_add_options --enable-crypto
ac_add_options --enable-module=psm
ac_add_options --without-system-png
ac_add_options --disable-profilesharing

ac_add_options --disable-javaxpcom
ac_add_options --disable-startup-notification
ac_add_options --disable-necko-wifi
ac_add_options --disable-parental-controls
ac_add_options --disable-activex
ac_add_options --disable-activex-scripting
ac_add_options --disable-ogg
ac_add_options --disable-wave
ac_add_options --disable-accessibility
ac_add_options --disable-dbus
ac_add_options --disable-crashreporter
# speedup build
ac_add_options --disable-test
ac_add_options --disable-tests
ac_add_options --disable-glibtest
ac_add_options --disable-freetypetest
ac_add_options --disable-libIDLtest

# Some debug functions when firefox fail to start
#ac_add_options --enable-debug
#ac_add_options --enable-debug-modules
#ac_add_options --disable-strip
#ac_add_options --disable-install-strip
#ac_add_options --disable-optimize

# More to strip down functions in Firefox 3.6.x
ac_add_options --disable-libnotify
ac_add_options --disable-accessibility
ac_add_options --disable-view-source
ac_add_options --disable-plugins
ac_add_options --disable--jsd
ac_add_options --disable-universalchardet
##ac_add_options --disable-libxul
ac_add_options --disable-libIDL
ac_add_options --disable-profilelocking
##ac_add_options --disable-rdf
#ac_add_options --disable-necko-disk-cache
#ac_add_options --disable-necko-wifi
#ac_add_options --disable-necko-small-buffers
ac_add_options --disable-safe-browsing
ac_add_options --disable-help-viewer
#ac_add_options --disable-places
#ac_add_options --disable-canvas
ac_add_options --disable-canvas3d
ac_add_options --disable-updater
ac_add_options --disable-javaxpcom
ac_add_options --disable-xpctools
ac_add_options --disable-parental-control
ac_add_options --disable-leaky


ac_add_options --disable-ldap
Crash Signature: [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]
Filter on qa-project-auto-change:

Bug in removed tracer code, setting in-testsuite- flag.
Flags: in-testsuite-
Keywords: testcase-wanted