top crash [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]

VERIFIED FIXED in mozilla1.9.1

Status

defect
P1
critical
VERIFIED FIXED
10 years ago
6 years ago

People

(Reporter: samuel.sidler+old, Assigned: gal)

Tracking

(5 keywords)

1.9.1 Branch
mozilla1.9.1
Bug Flags:
blocking1.9.1 +
wanted1.9.1.x +
in-testsuite +

Details

(Whiteboard: fixed-in-tracemonkey, crash signature)

Attachments

(3 attachments, 4 obsolete attachments)

The current topcrash in Firefox 3.5b99 and Firefox 3.5 (all RCs) happens with a signature of js_MonitorLoopEdge(JSContext*, unsigned int&). On trunk, it appears further down in the topcrash list and, currently, doesn't have any crashes on Windows.

It seems to happen across platforms on 1.9.1, however, and the top two frames both appear to be random hex numbers.

Query for Firefox 3.5 (RC) crashes: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_MonitorLoopEdge%28JSContext*%2C%20unsigned%20int%26%29

Typically, the crash happens with the following stack, taken from bp-3d4e216c-f4bf-4444-b663-1e82d2090617:

Frame  	Module  	Signature  	Source
0 		@0x4122c24 	
1 		@0x12ecbb 	
2 	js3250.dll 	js_MonitorLoopEdge 	js/src/jstracer.cpp:4862
3 	js3250.dll 	js_Interpret 	js/src/jsinterp.cpp:3308 

However, sometimes it appears with the following stack, taken from bp-b786c839-de39-4468-925b-065b72090618:

Frame  	Module  	Signature  	Source
0 		@0x162d0c0c 	
1 		@0xbfffc5f7 	
2 	libmozjs.dylib 	js_MonitorLoopEdge 	js/src/jstracer.cpp:4862
3 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:3308
4 	libmozjs.dylib 	js_Invoke 	js/src/jsinterp.cpp:1394
5 	libmozjs.dylib 	js_fun_apply 	js/src/jsfun.cpp:2074
6 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:5147
7 	libmozjs.dylib 	js_Invoke 	js/src/jsinterp.cpp:1394
8 	libmozjs.dylib 	js_fun_apply 	js/src/jsfun.cpp:2074
9 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:5147
10 	libmozjs.dylib 	js_Invoke 	js/src/jsinterp.cpp:1394
11 	libmozjs.dylib 	js_fun_apply 	js/src/jsfun.cpp:2074
12 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:5147
13 	libmozjs.dylib 	js_Invoke 	js/src/jsinterp.cpp:1394
14 	libmozjs.dylib 	js_fun_apply 	js/src/jsfun.cpp:2074
15 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:5147
16 	libmozjs.dylib 	js_Invoke 	js/src/jsinterp.cpp:1394
17 	libmozjs.dylib 	js_fun_apply 	js/src/jsfun.cpp:2074
18 	libmozjs.dylib 	js_Interpret 	js/src/jsinterp.cpp:5147
19 	libmozjs.dylib 	js_Execute 	js/src/jsinterp.cpp:1622
20 	libmozjs.dylib 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5145
21 	XUL 	nsJSContext::EvaluateString 	dom/src/base/nsJSEnvironment.cpp:1631
22 	XUL 	nsScriptLoader::EvaluateScript 	content/base/src/nsScriptLoader.cpp:686
23 	XUL 	nsScriptLoader::ProcessRequest 	content/base/src/nsScriptLoader.cpp:600
24 	XUL 	nsScriptLoader::ProcessPendingRequests 	content/base/src/nsScriptLoader.cpp:740
25 	XUL 	nsRunnableMethod<nsScriptLoader>::Run 	nsThreadUtils.h:264
26 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:510
27 	XUL 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:227
28 	XUL 	nsThread::Shutdown 	xpcom/threads/nsThread.cpp:465
29 	XUL 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
30 	XUL 	nsProxyObjectCallInfo::Run 	xpcom/proxy/src/nsProxyEvent.cpp:181
31 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:510
32 	XUL 	NS_ProcessPendingEvents_P 	nsThreadUtils.cpp:180
33 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:121
34 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:405
35 	CoreFoundation 	CFRunLoopRunSpecific 	
36 	CoreFoundation 	CFRunLoopRunInMode 	
37 	HIToolbox 	RunCurrentEventLoopInMode 	
38 	HIToolbox 	ReceiveNextEventCommon 	
39 	HIToolbox 	BlockUntilNextEventMatchingListInMode 	
40 	AppKit 	_DPSNextEvent 	
41 	AppKit 	-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 	
42 	AppKit 	-[NSApplication run] 	
43 	XUL 	nsAppShell::Run 	widget/src/cocoa/nsAppShell.mm:720
44 	XUL 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:193
45 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3298
46 	firefox-bin 	main 	browser/app/nsBrowserApp.cpp:156
47 	firefox-bin 	firefox-bin@0x1541 	
48 	firefox-bin 	firefox-bin@0x1468 	
49 		@0x1

Filing as security-sensitive for now just to be safe.
Flags: wanted1.9.1.x+
Flags: blocking1.9.1?
Note that this is likely the same as bug 487317, which was marked WFM. I'm guessing it's "random".
Sorry, forgot to say, this is the #1 topcrash, with ~3x as many crashes as the #2 topcrash.
I probably know the answer to this, but what do we think the frequency is?
Well, it's 3x more frequent than the next crash down. I should note that the reason there are so few 3.5 RC crashes is that we throttle them and only process 15% of the reports. (We throttle all major versions and the RC appears as a major version to the server.)

In the last week, there have been ~7700 crashes using b99.
out of 800,000 users that's ... a lot :(

Sayrer: stack helpful? any ideas?

Comment 6

10 years ago
It looks like Firebug... could that be right?
(In reply to comment #6)
> It looks like Firebug... could that be right?

Possibly? robcee?

Comment 8

10 years ago
hmm, a bunch of these mention a discussion forum. We should get URLs for this.
Lars: Can you generate a list of URLs for this crash signature? Anything newer than June 9 with a version of Firefox 3.5b99, Firefox 3.5, or Firefox 3.5pre (in that order).
(Assignee)

Comment 10

10 years ago
Waldo, this looks like a NULL pointer access on trace to me. Didn't you work on a related bug a while back?
I'd like to see URLs and a set of STRs, ideally.

Can't be sure it's Firebug. One user wrote this in his crash report:

"random crashes even after removing the Firefox folder in Library/Application Support, disable all addons, disable all plugins"

this one (translated from French) does point at firebug though:

"I'm still surprised that an RC crashes this often. When I use firebug with sites that are fairly heavy on CSS and JS, I get a crash every 5 minutes, e.g. http://www.blue-days.org"

I'm going to do some digging and see if we can reproduce it.
filed bug 499299 to get the url list from socorro. I'll keep mining those crash reports.
like a champ, I've been commenting away in bug 492041 thinking it was this one. Way to go, Rob.

From that bug's c#26:

(ignore the trailing commas from those URLs, I blame Numbers.app)

there are lots of about: pages in there. Some about:blank, some
about:sessionrestore and some about:rights.

There is also an about:ubiquity link.

A number of chrome: pages (adblockplus, autopager, downbar, fastdial and
google-toolbar to name a few).

I'm going to go through some of these remaining URLs and try to find crashers
with Firebug. Time to get my 4chan on.

...

from c#27:
was able to crash:
http://202.181.195.27/forum-5-1.html

Load that URL, open Firebug. Click the "Yes I am 18 Years or Older" button.
(optional: take a shot of something strong).

Page loads, albeit strangely. Clicking from the script panel to the net panel,
I think, caused the crash.

Trying again to verify.

http://crash-stats.mozilla.com/report/index/d45392dc-cd3a-41be-88f7-c1f852090619?p=1

ted replied in c#29:

This crashed me on a Win32 trunk build without Firebug installed, FWIW:
http://crash-stats.mozilla.com/report/index/1afc6f5c-0901-4cd2-8452-e3f9a2090619?p=1
another tidbit from lars' report. Earliest reports with this signature are:

2009-06-09 00:17:10.288931 running Firefox 3.5pre &
2009-06-09 00:59:01.635172 running Firefox 3.5b99
(In reply to comment #10)

I probably have worked on such bugs, but then again, I've worked on a lot of different null derefs.  Nothing here jumps out at me from the bug, the crash signatures, or anything else to say that it's something I would have more or less knowledge of than anyone else.
problem begins between 3.1b3 and 3.5b4.

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/3d9704097cd8

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/afac8b5958bc

that's a fairly big range and I don't have time to narrow this down any more at the moment. Any west-coasters feel like hunting for this?
QA: Can we get a regression range for this issue based on the URLs from comment 13 and comment 14?
final range:
http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=7e2facde0c95&tochange=9b52390838f0

Looks like the candidate is likely in rsayre's merge at:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8940504c799e

Andreas' changeset 
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/54bd8a0c1c4b

looks promising. but I'm only guessing based on the checkin comment "Recording continues across loop edge" and some scary-looking pointer math.
(Assignee)

Comment 22

10 years ago
54bd8a0c1c4b enables tracing for some code that was accidentally blacklisted too early. It might be a red herring (before that fix we don't trace some code we really should, but we trace it incorrectly). So my best guess is it's either not that changeset, or it's not caused by it.
(Assignee)

Comment 23

10 years ago
Note that upvar2 is in the regression window. However, we fall on our face on trace, so that's somewhat unusual behavior for an upvar bug.

79606200f871	Brendan Eich — upvar2, aka the big one take 2 (452498, r=mrbkap).
(Assignee)

Comment 24

10 years ago
rc, just to humor me could you disable the jit and see if it still crashes? I know we are crashing on trace, but if we get something wrong in the code generation for upvar maybe the interpreter dies too (which would be a lot easier to analyze from the stack trace).
andreas, sure thing, checking now.

(and I believe your patch may well be a red herring. It just stood out as "interesting" which is why I mentioned it).
um, with jit.content set to false, and jit.chrome false, the page loads completely.

If I turn on jit.content, the page crashes.

Still not necessarily that particular patch, but it's one of the ones in that merge.
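
For anyone repeating this check, here is a minimal user.js fragment for turning the tracing JIT off. It assumes "jit.content" and "jit.chrome" in the comment above abbreviate the prefs javascript.options.jit.content and javascript.options.jit.chrome; the same values can be toggled in about:config instead.

```javascript
// user.js fragment (read by Firefox at startup from the profile directory).
// Assumption: these are the full names of the prefs the comment abbreviates.
user_pref("javascript.options.jit.content", false); // no tracing for web content
user_pref("javascript.options.jit.chrome", false);  // no tracing for browser chrome
```

Setting only the content pref back to true should be enough to bring the crash back, per the observation above.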

Updated

10 years ago
Duplicate of this bug: 499349
Now that RC is out there (and we took some fixes between b99 and RC, iirc) do we know if this crash is still happening?
Gary: can we get a TM regression range worked up based on the regression range posted in comment 21?
(In reply to comment #28)
> Now that RC is out there (and we took some fixes between b99 and RC, iirc) do
> we know if this crash is still happening?

Yes, it is. And it's still #1, with about double the number of crashes of the #2 topcrash.
Gary's telling me he needs a more definitive/reduced testcase to be able to autobisect here. Any chance?
auto-bisect what? The merge lists about 10 changesets. By the time you've written a reduced testcase, you could've bisected them with the given testcase.

and yes, this still happens on latest nightlies/RCs.
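
As a rough illustration of why roughly 10 changesets are cheap to bisect by hand, here is a sketch of a binary search over an ordered list. The changeset names and the isBad predicate are hypothetical placeholders; in practice isBad means "build that revision, load the crashing page, and see whether it crashes".

```javascript
// Find the first bad entry in an ordered list, assuming every entry before
// it is good and every entry from it onward is bad (the last one is known bad).
function firstBad(changesets, isBad) {
  var lo = 0;                      // lowest index that could be the first bad one
  var hi = changesets.length - 1;  // known-bad upper bound
  var steps = 0;                   // number of builds we had to test
  while (lo < hi) {
    var mid = Math.floor((lo + hi) / 2);
    steps++;
    if (isBad(changesets[mid])) hi = mid; else lo = mid + 1;
  }
  return { changeset: changesets[lo], steps: steps };
}

// Hypothetical merge of 10 changesets where the regression landed in "cs6".
var merge = ["cs0", "cs1", "cs2", "cs3", "cs4", "cs5", "cs6", "cs7", "cs8", "cs9"];
var result = firstBad(merge, function (cs) { return merge.indexOf(cs) >= 6; });
console.log(result.changeset, result.steps);
```

With 10 entries the loop never needs more than 4 test builds.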
Yes, I agree with Rob in comment 32; this is our #1 topcrash and should be the #1 priority to fix in case we decide to block on it. Can we get some manual testing done to see which of the patches on the list regressed it?
I have some disturbing new data:

I was testing this on my Mac Pro at home (dual quadcore XEON 5150, 1st generation) and that page (http://202.181.195.27/forum-5-1.html, and click the over 18 link) crashed every time I loaded it.

On my Mac Book Pro (last year's model), I can't get this page to crash at all.

I don't have access to my mac at home to verify that this is still a problem. Not sure whether they've changed the page or whether hardware is a factor.
correction, dual dual core mac pro.
Do we seriously not have a blamed patch yet? It's been all day.

Comment 37

10 years ago
(In reply to comment #36)
> Do we seriously not have a blamed patch yet? It's been all day.

Yep, seriously. It stopped crashing for us as we were searching.

Comment 38

10 years ago
Note that for http://202.181.195.27/forum-5-1.html on mac os x and flash 10.0.22.87 and svn trunk valgrind I get a number of invalid reads and writes with sizes 1, 2, 4 in Flash_EnforceLocalSecurity
I think they may have changed some of the ad content on that page.

Bob, have you tried running this with qm-xserve03? I think it's roughly comparable to the machine I've got at home, hardware-wise.
At this point, with no understanding of what's causing it, I'm having a hard time blocking release on this bug. Rob, are you able to reproduce reliably?

Comment 41

10 years ago
Since we don't have great steps to reproduce or understanding of this bug, I think we should make it public (not security-sensitive).  Keeping it private isn't really protecting users, and might exclude people who can help figure out how to reproduce it.
Beltzner: I said in c#34 that I could reproduce it reliably on my home machine, not at all on my macbook. I think it may be hardware dependent.

There's no question in my mind that there is a bug here that is causing a large number of crashes, but no idea where it's coming from or which exact patch is causing it.

still waiting to hear from Bob if he can reproduce on his xserve. failing that, it's going to be two+ days before I have access to my desktop machine.

Jesse: some of my crashes have been bus errors and memory access violations. There could be an exploit here, but it'd be nice to get some extra hands on this so we could at least develop a hardware profile for crashing machines.

Could we try to get some of the QA machinery on this?

Comment 43

10 years ago
(In reply to comment #39)
> I think they may have changed some of the ad content on that page.
> 
> Bob, have you tried running this with qm-xserve03? I think it's roughly
> comparable to the machine I've got at home, hardware-wise.

no, but I will right now.
(Assignee)

Comment 44

10 years ago
Rob, what is your desktop configuration?
Desktop's a Mac Pro (early generation, 1,1, Late 2007, I think?) dual processor, dual core 2.66 GHz XEON 5150 running OS X 10.5.7.

I just ran a test on an Xserve 1,1, dual-dual core 2.66GHz running 10.4.11 but didn't get a crash.

I did some informal requests for people to load this site in #qa and #firefox and the few responders didn't get crashes either. It's quite possible the cause of the crash has been removed from the page(s). Deeply troubling.
I'm going to leave this as a nomination for now; we're going to start building Firefox 3.5 RC3 with what we have, and hopefully we'll be able to identify what's causing this crash and find a trivial fix, at which point we can discuss respinning RC3 or waiting for 3.5.1.
(Assignee)

Comment 47

10 years ago
Can we open up the bug then? Would be good if people can search for this instead of filing random bugs we have to triage and dup against this if this comes back.
I'm ok with opening this up.

/be

Comment 49

10 years ago
Bug 499299 now has a list of URLs associated with this crash.  For privacy reasons, only Mozilla employees can access bug 499299.

*This* bug doesn't seem to have any security-sensitive information in it, so I'm making it public.
Group: core-security
I've asked QA to dig into this and indicated this is high priority.

Comment 51

10 years ago
Oops, comment 13 already contains a scrubbed, public list of URLs.
This one crashes for me consistently.

 http://www.skyfunny.com/thread-3548-1-5.html
(gdb) x/20i $pc-24
0x1b667bf9:	jne    0x1b665d81
0x1b667bff:	mov    0x20(%edx),%edx
0x1b667c02:	cmp    $0x4f0e,%edx
0x1b667c08:	jne    0x1b665d90
0x1b667c0e:	mov    0x8(%ecx),%ecx
0x1b667c11:	mov    (%ecx),%ecx
0x1b667c13:	mov    (%ecx),%edx
0x1b667c15:	mov    (%edx),%edx
0x1b667c17:	test   %edx,%edx
0x1b667c19:	jne    0x1b665d9f
0x1b667c1f:	mov    0x20(%ecx),%ecx
0x1b667c22:	cmp    $0x4f0e,%ecx
0x1b667c28:	jne    0x1b665dae
0x1b667c2e:	mov    (%eax),%ecx
0x1b667c30:	mov    (%ecx),%ecx
0x1b667c32:	mov    0xc(%ecx),%ecx
0x1b667c35:	cmp    $0x13025a,%ecx
0x1b667c3b:	jne    0x1b665dbd
0x1b667c41:	mov    (%eax),%ecx
0x1b667c43:	mov    (%ecx),%edx
(gdb) p $pc
$1 = (void (*)()) 0x1b667c11
(gdb)
Posted file: source file for webpage crash (obsolete)
i can confirm the crash on http://www.skyfunny.com/thread-15010-1-1.html.   Attached is the source file, saved from Fx3.0.11
Andreas has this in a debugger now.  Stay tuned.
(In reply to comment #54)
> Created an attachment (id=384752) [details]
> source file for webpage crash
> 
> i can confirm the crash on http://www.skyfunny.com/thread-15010-1-1.html.  
> Attached is the source file, saved from Fx3.0.11

More information:  This was run against 
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090623 Shiretoko/3.5pre.   Clean profile.
I checked this on Mac Shiretoko, 20090417 (no crash) - 20090418 (crash).
(In reply to comment #57)
> I checked this on Mac Shiretoko, 20090417 (no crash) - 20090418 (crash).

can you add the regression changeset?

Comment 59

10 years ago
(In reply to comment #52)
> This one crashes for me consistently.
>  http://www.skyfunny.com/thread-3548-1-5.html
The two required files from that page to crash are mt2.js and jp.js.

beginning of the two files:
mt2.js: var MooTools={version:"1.2.2",
jp.js: MooTools.More={'version':'1.2.2.1'}
(Assignee)

Comment 60

10 years ago
    cx = ld JSVAL_TO_PSEUDO_BOOLEAN(JSVAL_HOLE)[8]
    ld3217 = ld cx[NULL]
    eos = ld ld3217[NULL]
    ld3218 = ld eos[NULL]
    eor = eq ld3218, NULL
    xf2957: xf eor -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
              mov eax,8(eax)                  eax(JSVAL_TO_PSEUDO_BOOLEAN(JSVAL_HOLE)) ebx(cx) esi(state) edi(sp)
              mov ecx,0(eax)                  eax(cx) ebx(cx) esi(state) edi(sp)
              mov edx,0(ecx)                  eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
              mov edx,0(edx)                  eax(cx) ecx(ld3217) edx(eos) ebx(cx) esi(state) edi(sp)
              test edx,edx                    eax(cx) ecx(ld3217) edx(ld3218) ebx(cx) esi(state) edi(sp)
              jne 0x1b688d63                  eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
--------------------------------------- exit block (LIR_xt|LIR_xf)
        0x1b688d63:
                                              merging registers (intersect) with existing edge
              mov ecx,-12(ebp)                 <= restore state
              mov eax,449640044              
              mov esp,ebp                    
        0x1b688d6d:
              jmp 0x1b676ff8                 
--------------------------------------- end exit block 0x1accf688
    shape = ld ld3217[skip257]
    guard(shape) = eq shape, #0xfbe5
    $stack1: xf guard(shape) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
              mov ecx,32(ecx)                 eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
              cmp ecx,64485                   eax(cx) ecx(shape) ebx(cx) esi(state) edi(sp)
              jne 0x1b688d72                  eax(cx) ebx(cx) esi(state) edi(sp)
--------------------------------------- exit block (LIR_xt|LIR_xf)
        0x1b688d72:
                                              merging registers (intersect) with existing edge
              mov ecx,-12(ebp)                 <= restore state
              mov eax,449640120              
              mov esp,ebp                    
        0x1b688d7c:
              jmp 0x1b676ff8                 
--------------------------------------- end exit block 0x1accf6d4
    ld825 = ld cx[8]
    ld3219 = ld ld825[NULL]
    ops = ld ld3219[NULL]
    ld3220 = ld ops[NULL]
    guard(native-map) = eq ld3220, NULL
    xf2958: xf guard(native-map) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
              mov ecx,8(eax)                  eax(cx) ebx(cx) esi(state) edi(sp)
              mov edx,0(ecx)                  ecx(ld825) ebx(cx) esi(state) edi(sp)
              mov eax,0(edx)                  ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
              mov eax,0(eax)                  eax(ops) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
              test eax,eax                    eax(ld3220) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
              mov eax,-36(ebp)                ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)  <= restore GetProperty_tn145
              jne 0x1b688d81                  eax(GetProperty_tn145) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
--------------------------------------- exit block (LIR_xt|LIR_xf)
        0x1b688d81:
                                              merging registers (intersect) with existing edge
              mov ecx,-12(ebp)                 <= restore state
              mov eax,449640348              
              mov esp,ebp                    
        0x1b688d8b:
              jmp 0x1b676ff8                 
--------------------------------------- end exit block 0x1accf7b8
    shape = ld ld3219[skip257]
    guard(shape) = eq shape, #0xfbe5
    sp: xf guard(shape) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
              mov edx,32(edx)                 eax(GetProperty_tn145) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
              cmp edx,64485                   eax(GetProperty_tn145) ecx(ld825) edx(shape) ebx(cx) esi(state) edi(sp)
              jne 0x1b688d90                  eax(GetProperty_tn145) ecx(ld825) ebx(cx) esi(state) edi(sp)


Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x1b676c11 in ?? ()

0x1b676bf9:	jne    0x1b688d81
0x1b676bff:	mov    0x20(%edx),%edx
0x1b676c02:	cmp    $0xfbe5,%edx
0x1b676c08:	jne    0x1b688d90
0x1b676c0e:	mov    0x8(%ecx),%ecx
0x1b676c11:	mov    (%ecx),%ecx
0x1b676c13:	mov    (%ecx),%edx
0x1b676c15:	mov    (%edx),%edx
0x1b676c17:	test   %edx,%edx
Based on conversations with Andreas and Bkap, we have to block on this.
Flags: blocking1.9.1? → blocking1.9.1+
tony, juan: I already narrowed this to:

http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=7e2facde0c95&tochange=9b52390838f0

Feel free to bisect it again, but I'm pretty sure the problem's in that merge.
Flags: blocking1.9.1+ → blocking1.9.1?

Comment 64

10 years ago
This crashes my mozilla-central opt build on Mac, with a null deref from JIT code.  Will try to reduce more.
Attachment #384752 - Attachment is obsolete: true

Comment 65

10 years ago
To reduce it, I'm using:

./lithium.py --testcase=t.js ./ok-shell-crashes-browser.py 12 ~/central/opt-obj/dist/Firefox.app/Contents/MacOS/firefox-bin s1.html

I also tried to hack out all the browser-dependent bits (e.g. window, document, navigator), but I got stuck on the last |document| :(
Flags: blocking1.9.1? → blocking1.9.1+

Comment 66

10 years ago
robcee doesn't have time to bisect among the changesets that went into the merge identified in comment 63.  Anyone else want to pick up where he left off, and try to identify the changeset that introduced this crash?
(In reply to comment #66)
> robcee doesn't have time to bisect among the changesets that went into the
> merge identified in comment 63.  Anyone else want to pick up where he left off,
> and try to identify the changeset that introduced this crash?

autoBisect can take over once we get a shell testcase.... :)
for what it's worth i crash every time if i refresh any page with firebug 1.3.3's html inspect console open.

if i disable jit.content i no longer crash.
Assignee: general → gal

Comment 69

10 years ago

Updated

10 years ago
Attachment #384784 - Attachment is obsolete: true

Comment 70

10 years ago
I'm scanning the urls from comment 13. 

Crashes so far on 1.9.1/mac os x.

http://www.latio.lv/lv/
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00054c51 in ?? ()
(gdb) bt
#0  0x00054c51 in ?? ()
#1  0xa03a7690 in __sF ()

http://www.latio.lv/lv/piedavajuma/?view=67988
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00054c51 in ?? ()
(gdb) bt
#0  0x00054c51 in ?? ()
#1  0xa03a7690 in __sF ()
Previous frame inner to this frame (gdb could not unwind past this frame)

http://www.latio.lv/lv/870/
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x1750ec51 in ?? ()
(gdb) bt
#0  0x1750ec51 in ?? ()
#1  0xa03a7690 in __sF ()
Previous frame inner to this frame (gdb could not unwind past this frame)

Comment 71

10 years ago
Attachment #384793 - Attachment is obsolete: true
(Assignee)

Comment 72

10 years ago
Not exploitable, but fairly bad bug. Fix in a sec.
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1
(Assignee)

Comment 73

10 years ago
Posted patch: patch
Attachment #384795 - Flags: review?(mrbkap)
(Assignee)

Comment 74

10 years ago
Removed a bogus line from the patch in test_property_cache.
Attachment #384795 - Flags: review?(mrbkap) → review+
(Assignee)

Comment 75

10 years ago
Reliably reproduced by Damon, analyzed by mrbkap, fix confirmed by brendan.
Confirmed to fix the particular test case I was looking at. Independent
confirmation very welcome, also a reduced test case (I will give it a try,
too).

http://hg.mozilla.org/tracemonkey/rev/72f8b38ed38d

Comment 77

10 years ago
Gal, can you create a testcase from scratch, using your understanding of the bug?  This code is hard to reduce all the way.
Attachment #384794 - Attachment is obsolete: true

Comment 78

10 years ago
I want to see a minimal testcase so I know what I'm failing to fuzz ;)
autoBisect shows this is probably related to bug 478525 :

The first bad revision is:
changeset:   26145:f449fe8bd097
parent:      26142:33c5c42a29c7
user:        Andreas Gal
date:        Tue Mar 17 15:39:42 2009 -0700
summary:     Try harder to trace array access with non-int / non-string index (478525, r=brendan).

Strange - this isn't in the regression windows of the previous comments.
Blocks: 478525
Keywords: testcase

Comment 80

10 years ago
That's expected: mrbkap narrowed the previous range to http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ab0047adeb64 but realized that change would only affect it in the browser, not in the shell.

Comment 81

10 years ago
function a() {}
function b() {}
a.prototype = null;
var o1 = new a();
var o2 = new b();
function test(o)
{
  for (var i = 0; i < 5; i++)
    o.foobar;
}
test(o1);
test(Object.getPrototypeOf(Object.getPrototypeOf(o2)));

Et voilà!
Even more minimal, without ES5 shenanigans:

function a() { }
a.prototype = null;
var o = new a();
function test(o)
{
  for (var i = 0; i < 5; i++)
    o.foobar;
}
test(o);
test(Object.prototype);

So |o| and |Object.prototype| both have the same shape, but the former hops once more before nulling out while the second nulls out immediately.
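
The difference in chain length can be checked from script. This sketch reuses the testcase's objects and adds a hypothetical helper, protoHops, that uses Object.getPrototypeOf only for inspection; it says nothing about shapes, which are engine-internal.

```javascript
function a() { }
a.prototype = null;   // not an object, so |new a()| falls back to Object.prototype
var o = new a();

// Count how many non-null prototypes sit above an object.
function protoHops(obj) {
  var hops = 0;
  for (var p = Object.getPrototypeOf(obj); p !== null; p = Object.getPrototypeOf(p))
    hops++;
  return hops;
}

var hopsFromO = protoHops(o);                    // o -> Object.prototype -> null
var hopsFromRoot = protoHops(Object.prototype);  // Object.prototype -> null
console.log(hopsFromO, hopsFromRoot);
```

A trace recorded while looking up |o.foobar| expects one more prototype load than |Object.prototype| can supply.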
(In reply to comment #79)
> autoBisect shows this is probably related to bug 478525 :
> 
> The first bad revision is:
> changeset:   26145:f449fe8bd097
> parent:      26142:33c5c42a29c7
> user:        Andreas Gal
> date:        Tue Mar 17 15:39:42 2009 -0700
> summary:     Try harder to trace array access with non-int / non-string index
> (478525, r=brendan).
> 
> Strange - this isn't in the regression windows of the previous comments.

It seems the bug went in with the patch for bug 478512.

/be
Gary, I think bug 478525 is innocent, although it may be enabling tracing of something in the testcase you were running autoBisect on. Could you try Waldo's smallest testcase and see if it doesn't confirm the patch for bug 478512 being the regressing change?

/be
Blocks: 478512
Let's talk about the fix for a second? How localized is the codepath? Are we
talking about something which is touched every time we trace, which would
require a full beta cycle, or something akin to adding a null check to ensure
we don't go somewhere we shouldn't be?
Beltzner: The fix adds a JITted null check, indeed. The factoring out of guardHasPrototype is simple, "constant" in complexity. It builds on common methods used all over, using them in conventional ways.

/be
(Assignee)

Comment 88

10 years ago
I would sleep better if we can give this a week of rc coverage before we ship final, but we don't need a beta for this. It's a localized additional null pointer check, targeting a side exit that was already present previously, so we exit trace instead of exploding with a bus error. At the machine level it's an additional "test reg, reg ; jz exit". We want to run every test we can think of against this, but I think it's pretty low risk overall.
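
To make the effect of the extra check concrete, here is a script-level model of it. This is not the jstracer code: replayMissingLookup, its parameters, and the returned strings are all hypothetical, and the TypeError thrown by Object.getPrototypeOf(null) merely stands in for the native null dereference.

```javascript
// Model of replaying a recorded lookup of a missing property.
// hopsToNull: number of prototype loads seen while recording, the last of
// which yielded null. withNullGuard: whether each load is null-checked.
function replayMissingLookup(obj, hopsToNull, withNullGuard) {
  var cur = obj;
  for (var i = 0; i < hopsToNull - 1; i++) {
    cur = Object.getPrototypeOf(cur);   // load the next prototype
    if (withNullGuard && cur === null)
      return "side-exit";               // chain ended early: leave the trace
  }
  // Last recorded load. Throws if cur is null, modelling the crash.
  return Object.getPrototypeOf(cur) === null ? "stayed-on-trace" : "side-exit";
}

function a() { }
a.prototype = null;
var o = new a();

// Recorded on |o|: two loads (Object.prototype, then null).
var onRecorded = replayMissingLookup(o, 2, true);
var onShorter = replayMissingLookup(Object.prototype, 2, true);

var unguarded;
try {
  unguarded = replayMissingLookup(Object.prototype, 2, false);
} catch (e) {
  unguarded = "crash (" + e.name + ")";
}
console.log(onRecorded, onShorter, unguarded);
```

In the model, the guarded replay bails out on the shorter chain, while the unguarded one fails on the second load, which is the load the recording assumed would always have something to read from.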
OK, so what's the testplan here? Should we land on 1.9.1 and trunk immediately and start testing on nightly builds? Should we also redo RC3?

My feeling is that if this is just a null check, we don't have as much to risk as otherwise thought, so getting it on more branches and in the new RC is better than waiting.
bc: Jesse suggested taking the list of URLs from the crash reports (first attachment here) and running them with a build with this patch through load-crash-urls to see if it manages to solve most of them. Sound good?
I think we should optimize this by building it into RC3. If we do and it bites back somehow we back out and respin. But it is a straightforward fix.

IMHO the odds are higher that a week's worth of RC testing will emphasize other known topcrashes, or perhaps put a new one on our radar due to new content and/or user cohort in some locale (for example). We need to fix what is topmost and if the next one down is much less frequent, release 3.5 and put the rest of the fixes into the dot release.

/be
I'm inclined to agree with the honourable representative from Sunnyvale in comment 91; I'll tell the build team to scupper rc3build1.

Waldo's pushing this to mozilla-central and mozilla-1.9.1 as we speak.
http://hg.mozilla.org/mozilla-central/rev/3cf02352f0e1 (m-c)
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/2747b209db85 (191 tip)
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a625a31a0ad1 (GECKO191_20090623_RELBRANCH)
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey

Comment 95

10 years ago
(In reply to comment #90)
> bc: Jesse suggested taking the list of URLs from the crash reports (first
> attachment here) and running them with a build with this patch through
> load-crash-urls to see if it manages to solve most of them. Sound good?

Well, obsolete already. :-) I guess we go with fresh builds.

current run at 2239/3054 urls. Note the scan is missing sites that require a click to "I'm over 18" and such. Reducing the list to homepages brings it down to 1177 urls. That might be good enough for a quick run.

I'll continue to list the reproducible crash urls here so others can check as well. 

http://www.skyfunny.com/thread-3548-1-5.html (known not reproducible locally)
http://www.skyfunny.com/thread-15010-1-1.html (known not reproducible locally)
http://www.latio.lv/ru/novosti/1917/ (same stack as before)
http://www.latio.lv/lv/piedavajuma/?view=67122 (same stack as before)
http://www.colchones-online.com/literas.html?from=adwords&gclid=CPCdi7uPjpsCFZkA4wodcXGboQ (same stack as latio)
http://www.colchones-online.com/ (same stack as latio)
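
The comment does not say how the homepage list was produced. One plausible way, sketched here with a hypothetical helper toHomepages and the six URLs listed above, is to keep only each URL's origin and drop duplicates.

```javascript
// Reduce a list of crash URLs to one homepage URL per site.
function toHomepages(urls) {
  var seen = new Set();   // a Set keeps insertion order and drops duplicates
  urls.forEach(function (u) {
    seen.add(new URL(u).origin + "/");   // scheme + host (+ port), then "/"
  });
  return Array.from(seen);
}

var crashUrls = [
  "http://www.skyfunny.com/thread-3548-1-5.html",
  "http://www.skyfunny.com/thread-15010-1-1.html",
  "http://www.latio.lv/ru/novosti/1917/",
  "http://www.latio.lv/lv/piedavajuma/?view=67122",
  "http://www.colchones-online.com/literas.html?from=adwords&gclid=CPCdi7uPjpsCFZkA4wodcXGboQ",
  "http://www.colchones-online.com/"
];

var homepages = toHomepages(crashUrls);
console.log(homepages);
```

As the comment notes, a homepage-only run can miss crashes that occur only on deeper pages.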
Checking the js shell test from comment 83 and the attached html testcase with a debug build on OS X made from http://hg.mozilla.org/releases/mozilla-1.9.1/rev/2747b209db85 looks good. No crash anymore.
(In reply to comment #85)
> Gary, I think bug 478525 is innocent, although it may be enabling tracing of
> something in the testcase you were running autoBisect on. Could you try Waldo's
> smallest testcase and see if it doesn't confirm the patch for bug 478512 being
> the regressing change?
> 
> /be

Brendan, you're right. :) Using Waldo's smallest testcase, autoBisect confirms bug 478512 might be related instead.

The first bad revision is:
changeset:   25416:707f96a1de28
parent:      25413:c63cf255ec3b
user:        Andreas Gal
date:        Thu Feb 26 19:01:02 2009 -0800
summary:     Trace reading undefined properties (478512, r=jwalden).
No longer blocks: 478525
(Assignee)

Comment 98

10 years ago
478512 is definitively the regressor. Broken originally by me, reviewed by Waldo, identified using his testcase, and fixed by my patch. It all stays in the family :)

Comment 99

10 years ago
only 9 crashers found in the original list. none crash with a fresh build from this morning. am scanning the homepages of the listed urls now with a fresh build.
(In reply to comment #92)
> I'm inclined to agree with the honourable representative from Sunnyvale

Santa Clara, but don't call me a congresscritter if you please (or if I am, where are those bribes?).

The followup fix with the FIXME comment is over-optimistic (proto-chaining means there's still a many to one invalidation hazard when mutating a proto after we cache or guard on a not-found property, even with inherited shapes), but I'll deal with it in bug 497789.

/be
Running the url again from comment 54 no longer crashes on today's branch and trunk nightly.  

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090624 Shiretoko/3.5pre

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090624 Minefield/3.6a1pre
Alongside comment #101, I gave it a whirl on Linux using all test case URLs in comment #95

No crashes on 06/24 trunk and RC3 build 2 

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090624 Minefield/3.6a1pre

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5

Comment 103

10 years ago
no crashes on the homepage run with mac os x xserve, but i did get two hangs:

http://www.kafic.net/
https://ibank.standardchartered.com.my/

but they aren't reproducible locally.
(In reply to comment #103)
> no crashes on the homepage run with mac os x xserve, but i did get two hangs:
> 
> http://www.kafic.net/
> https://ibank.standardchartered.com.my/
> 
> but they aren't reproducible locally.

Also no crashes on the crash-url-run with Windows and a build that contains this fix.
fwiw, the topcrash part of this is fixed, but there's still another (much smaller!) lingering crash with a similar stack. I'd give you a set URL with them, but it's not really possible to search by build ID, so... click on the "Table" view to see how many.

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_MonitorLoopEdge%28JSContext*%2C%20unsigned%20int%26%29

(To be clear, no reason to reopen this bug, just a confirmation that the topcrash looks fixed.)
Marking bug verified given all the verifications in the comments.
Status: RESOLVED → VERIFIED
Filed bug 500936 on the remaining (non-#1) topcrash.
Crash Signature: [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]
Automatically extracted testcase for this bug was committed:

https://hg.mozilla.org/mozilla-central/rev/2e891e0db397
Flags: in-testsuite+