Closed Bug 465784 Opened 16 years ago Closed 15 years ago

JIT crashes JSSpeccy ZX Spectrum emulator [@ Queue<unsigned char*>::add]

Categories

(Core :: JavaScript Engine, defect, P2)

x86
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: bugs, Assigned: dmandelin)

Details

(Keywords: crash, verified1.9.1)

Attachments

(1 file, 1 obsolete file)

Click on a ROM, say, the first one, then Run.
Crashes.  Different trace from bug #465773

These emulators do seem to give the JIT trouble. :)

http://crash-stats.mozilla.com/report/index/3e279837-b578-4d10-baeb-37fd20081119?p=1
Andreas, do you think we hit OOM in trackCfgMerges?  Probably easiest to tear or #if 0 that code out, it looks dead.
Yeah, take it out. It's not used atm.
OS: Linux → All
ID: 3e279837-b578-4d10-baeb-37fd20081119
Signature: Queue<unsigned char*>::add(unsigned char*)
UUID 3e279837-b578-4d10-baeb-37fd20081119 
Time 2008-11-19 11:27:52-08 
Uptime 854 
Product Firefox 
Version 3.1b2pre 
Build ID 20081119072659 
OS Windows NT 
OS Version 5.1.2600 Service Pack 2 
CPU x86 
CPU Info GenuineIntel family 6 model 15 stepping 6 
Crash Reason EXCEPTION_ACCESS_VIOLATION 
Crash Address 0x0 
Comments trying out emulators to kill the JIT. The JIT seems to not like emulators. 

Crashing Thread
Frame Module Signature Source
0 js3250.dll Queue<unsigned char*>::add js/src/jstracer.h:91  
1 js3250.dll TraceRecorder::trackCfgMerges js/src/jstracer.cpp:2493  
2 js3250.dll TraceRecorder::fuseIf js/src/jstracer.cpp:2543  
3 js3250.dll TraceRecorder::cmp js/src/jstracer.cpp:4628  
4 js3250.dll js_Interpret js/src/jsopcode.tbl:135  
5 js3250.dll js_Invoke js/src/jsinterp.cpp:1331  
6 js3250.dll js_InternalInvoke js/src/jsinterp.cpp:1388  
7 js3250.dll JS_CallFunctionValue js/src/jsapi.cpp:5242  
8 xul.dll nsJSContext::CallEventHandler dom/src/base/nsJSEnvironment.cpp:1979  
9 xul.dll nsGlobalWindow::RunTimeout dom/src/base/nsGlobalWindow.cpp:7661  
10 xul.dll nsGlobalWindow::TimerCallback dom/src/base/nsGlobalWindow.cpp:7993  
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420  
12 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512  
13 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510  
14 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170  
15 nspr4.dll PR_GetEnv  
16 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87  
17 firefox.exe firefox.exe@0x2197  
18 kernel32.dll BaseProcessStart
Severity: normal → critical
Keywords: crash
Summary: JIT crashes JSSpeccy ZX Spectrum emulator → JIT crashes JSSpeccy ZX Spectrum emulator [@ Queue<unsigned char*>::add]
Flags: blocking1.9.1+
Depends on: 465886
retest
http://crash-stats.mozilla.com/report/index/06770993-356d-4005-ad30-4ef5d2081206

0  	js3250.dll  	TypeMap::captureStackTypes  	js/src/jstracer.cpp:937
1 	js3250.dll 	js_RecordTree 	js/src/jstracer.cpp:2942
2 	js3250.dll 	js_Interpret 	js/src/jsinterp.cpp:4495
This is a dup of the out-of-memory bug timeless reported. I can't believe we hit OOM on a desktop Windows box.
Well you do. :)
Very repeatable. Multiple machines.
Well at least we have a test case and a tester now. We should probably get rid of all dynamic allocation and put everything into the code cache which we know how to flush if we OOM.
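For illustration, here is a minimal sketch of a growable queue whose add() reports allocation failure instead of writing through a null pointer, which is one way a crash at address 0x0 inside an add() routine can arise. CheckedQueue and all of its members are hypothetical names; this is not the Queue<T> from jstracer.h, and it is not the fix that landed for this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for a tracer-style queue; not the real jstracer.h code.
template <typename T>
class CheckedQueue {
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc-based growth is only safe for trivially copyable types");

    T* data_ = nullptr;
    std::size_t len_ = 0;
    std::size_t cap_ = 0;

    // Grow the buffer so it can hold `needed` elements.
    // Returns false on allocation failure and leaves the queue unchanged.
    bool ensure(std::size_t needed) {
        if (needed <= cap_)
            return true;
        std::size_t newCap = cap_ ? cap_ : 16;
        while (newCap < needed)
            newCap *= 2;
        void* p = std::realloc(data_, newCap * sizeof(T));
        if (!p)
            return false;  // old buffer is still valid; the caller handles OOM
        data_ = static_cast<T*>(p);
        cap_ = newCap;
        return true;
    }

public:
    CheckedQueue() = default;
    CheckedQueue(const CheckedQueue&) = delete;
    CheckedQueue& operator=(const CheckedQueue&) = delete;
    ~CheckedQueue() { std::free(data_); }

    // Append a value; returns false (and stores nothing) if memory ran out.
    bool add(T value) {
        if (!ensure(len_ + 1))
            return false;
        data_[len_++] = value;
        return true;
    }

    std::size_t length() const { return len_; }

    const T& operator[](std::size_t i) const {
        assert(i < len_);
        return data_[i];
    }
};
```

A caller would check the result of add() and abort the current recording when it returns false, rather than continuing with a queue that silently dropped an entry.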
I can't reproduce this on either OS X or Windows.  Is this a desktop machine?  I'm curious as to how much memory it has (physical + virtual) that we'd be running out.

Of course, we need to fix our OOM handling regardless :)
1152MiB on one machine with 512MiB of virtual (just noticed this - guess I need to bump that up - was lower because this machine used to have less mem)
2048MiB on the other with 2048MiB of virtual.
The crash trace I linked in original report (the one timeless pasted into his comment) was on the machine with 2GiB/2GiB.

Actually, I have yet to find a windows machine that *doesn't* crash this emulator - I'll try a couple of others tomorrow at work.

David, are you using the nightly builds or one you're making for yourself?
Slight correction (doesn't help David's case though ;) )

The machine with 2GiB of memory had the windows paging file set to:
3069-3869MiB

Retested it in latest nightly build. Still crashed.
http://crash-stats.mozilla.com/report/index/189f5cfb-779e-46af-b658-c4e132081211
Hey. Um sorry.
Autoupdate moved me onto the beta and I didn't notice.
I refetched a new nightly.  Still crashes.  Below is with actual latest nightly :)

http://crash-stats.mozilla.com/report/index/7ac840c5-89e3-4f23-b639-ce00a2081211

0   js3250.dll      Queue<unsigned char*>::add       js/src/jstracer.h:91
1   js3250.dll  TraceRecorder::trackCfgMerges   js/src/jstracer.cpp:2489
2   js3250.dll  TraceRecorder::fuseIf   js/src/jstracer.cpp:2539
3   js3250.dll  TraceRecorder::cmp  js/src/jstracer.cpp:4624
4   js3250.dll  js_Interpret    js/src/jsopcode.tbl:135
5   js3250.dll  js_Invoke   js/src/jsinterp.cpp:1331
6   js3250.dll  js_InternalInvoke   js/src/jsinterp.cpp:1388
7   js3250.dll  JS_CallFunctionValue    js/src/jsapi.cpp:5245
8   xul.dll     nsJSContext::CallEventHandler   dom/src/base/nsJSEnvironment.cpp:1989
9   xul.dll     nsGlobalWindow::RunTimeout  dom/src/base/nsGlobalWindow.cpp:7661
10  xul.dll     nsGlobalWindow::TimerCallback   dom/src/base/nsGlobalWindow.cpp:7993
11  xul.dll     nsTimerImpl::Fire   xpcom/threads/nsTimerImpl.cpp:420
12  xul.dll     nsTimerEvent::Run   xpcom/threads/nsTimerImpl.cpp:512
13  xul.dll     nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:510
14  xul.dll     nsBaseAppShell::Run     widget/src/xpwidgets/nsBaseAppShell.cpp:170
15  xul.dll     nsAppStartup::Run   toolkit/components/startup/src/nsAppStartup.cpp:192
16  nspr4.dll   PR_GetEnv   
17  firefox.exe     wmain   toolkit/xre/nsWindowsWMain.cpp:87
18  firefox.exe     firefox.exe@0x2197  
19  kernel32.dll    BaseProcessStart
Retesting in the hourly tinderbox as suggested by Littlemutt.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1228998184/

Crashing is no longer instantly reproducible.  Similar to bug #465773, I now get unpredictable crashes.
Also similar to bug #465773, it takes longer :).
I've tried loading various roms, poking at keyboard in vain attempt to get them to do something (shows how little I've looked into what the emulators are actually doing) and all of a sudden. Boom.

http://crash-stats.mozilla.com/report/index/b386cc6a-6deb-4a55-9e94-fe96a2081211
0   js3250.dll      js3250.dll@0x40bbf      
1   js3250.dll  js3250.dll@0x33ede  
2   js3250.dll  js3250.dll@0x2e8ba  
3   js3250.dll  js3250.dll@0x2e9cd  
4   js3250.dll  js3250.dll@0x79c6   
5   xul.dll     xul.dll@0xb9eb2     
6   xul.dll     xul.dll@0x1334d6    
7   xul.dll     xul.dll@0xd6d1f     
8   xul.dll     xul.dll@0x4081b

http://crash-stats.mozilla.com/report/index/8ec372c5-25c9-4e4d-8089-36c5a2081211
And another.

Guess I'll see if this unpredictableness is carried over into tomorrow's nightly.

Also like bug #465773 I had plenty of available memory when it crashed, Firefox having barely gone over 100 megs with over a gig free.
Possibly useful addendum.
Most repeatable crash in nightly (not hourly) was just to click Run without clicking on any roms.
Nightly now behaves like hourly.
Crashing is more rare, can no longer just hit run.

It does, however, happen.
This one was almost instantaneous.  Hit run, nothing, loaded Manic Miner, few seconds later, kaboom.
Did try punching some keys, but I don't even know if this emulator responds to keyboard input.
http://crash-stats.mozilla.com/report/index/6aa75635-086d-42b6-81a9-4bc0d2081212
0   js3250.dll      js_ValueToNumber     js/src/jsnum.cpp:858
1   js3250.dll  js_ValueToECMAInt32     js/src/jsnum.cpp:928
2   js3250.dll  js_Interpret    js/src/jsinterp.cpp:3537
3   js3250.dll  js_Invoke   js/src/jsinterp.cpp:1333
4   js3250.dll  js_InternalInvoke   js/src/jsinterp.cpp:1390
5   js3250.dll  JS_CallFunctionValue    js/src/jsapi.cpp:5247
6   xul.dll     nsJSContext::CallEventHandler   dom/src/base/nsJSEnvironment.cpp:1989
7   xul.dll     nsGlobalWindow::RunTimeout  dom/src/base/nsGlobalWindow.cpp:7659
8   xul.dll     nsGlobalWindow::TimerCallback   dom/src/base/nsGlobalWindow.cpp:7991
9   xul.dll     nsTimerImpl::Fire   xpcom/threads/nsTimerImpl.cpp:420
10  xul.dll     nsTimerEvent::Run   xpcom/threads/nsTimerImpl.cpp:512
11  xul.dll     nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:510
12  xul.dll     nsBaseAppShell::Run     widget/src/xpwidgets/nsBaseAppShell.cpp:170
13  xul.dll     nsAppStartup::Run   toolkit/components/startup/src/nsAppStartup.cpp:192
14  nspr4.dll   PR_GetEnv   
15  firefox.exe     wmain   toolkit/xre/nsWindowsWMain.cpp:87
16  firefox.exe     firefox.exe@0x2197  
17  kernel32.dll    BaseProcessStart

2nd attempt, switching between ROMs did nothing interesting for a good long while.
However, after giving up and just letting Manic Miner run for about 30 seconds, it once again crashed.
http://crash-stats.mozilla.com/report/index/dcb5283e-ffa8-4b5f-88f9-d54ca2081212
0       @0x0    
1       @0x82d03b   
2   js3250.dll  js_MonitorLoopEdge  js/src/jstracer.cpp:3784
3   js3250.dll  js_Interpret    js/src/jsinterp.cpp:3098

That not-very-verbose trace is the same as in bug #465773.
Should I dupe this one perhaps?  The other trace sure didn't match, but you guys know the internals, I'm just a user playing tester.
Priority: -- → P2
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081218 Minefield/3.2a1pre
http://crash-stats.mozilla.com/report/index/b90a64bd-dc7a-4939-95bb-048dd2081218

Essentially same as comment #14.

Was just checking to see if I was still getting a weird (corrupted?) trace like in the jsMSX bug.  I'm not.
And for those who've posted that they can't reproduce, yeah, it is a little inconsistent.
In this case, fired it up, loaded miner ROM, waited a bit, got bored, started launching ROMs working down from top.
Crashed on 3rd one or so.
I made a little bit of progress on this today:

* Steps to reproduce: Basically go to the site and play one of the games for 5 minutes or so. Sometimes it crashes much sooner, sometimes a little later. But I was never able to duplicate in a debug build. I can do it in an opt build running under gdb, though.

* Proximate cause: Running in MacOS, I get different stack traces from kyberneticist. In my case, the program executes a jump to 0x0. And this happens because Assembler::patch is called with a GuardRecord with a 0x0 jump target. There is an assertion for this but of course it is turned off for opt builds, which are the only ones where I can reproduce. The nearby code:

    void Assembler::patch(GuardRecord *lr) {
        Fragment *frag = lr->exit->target;
        NanoAssert(frag->fragEntry != 0);
        NIns* was = nPatchBranch((NIns*)lr->jmp, frag->fragEntry);
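
To make the failure mode concrete, here is a minimal sketch of why an assertion alone does not protect an optimized build, and what an explicit runtime check could look like. All types below are simplified, hypothetical stand-ins for the nanojit ones named in the excerpt, SKETCH_ASSERT is an invented macro that mimics an assertion compiled out of opt builds, and patchChecked is not the actual fix.

```cpp
#include <cassert>
#include <cstdint>

// Mimics an assertion macro that disappears from optimized builds.
// SKETCH_DEBUG is a hypothetical build flag for this example only.
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) assert(cond)
#else
#define SKETCH_ASSERT(cond) ((void)0)
#endif

// Hypothetical, simplified stand-ins for the types in the excerpt above.
struct Fragment {
    const std::uint8_t* fragEntry = nullptr;  // start of compiled code, if any
};
struct SideExit {
    Fragment* target = nullptr;
};
struct GuardRecord {
    SideExit* exit = nullptr;
    const std::uint8_t** jmp = nullptr;  // slot holding the branch target
};

// Returns true if the branch was patched, false if the target has no code.
bool patchChecked(GuardRecord* lr) {
    Fragment* frag = lr->exit->target;
    SKETCH_ASSERT(frag->fragEntry != nullptr);  // no-op without SKETCH_DEBUG
    if (frag->fragEntry == nullptr)             // still checked in opt builds
        return false;                           // never install a jump to 0x0
    *lr->jmp = frag->fragEntry;
    return true;
}
```

With only the assertion, an opt build would store the null entry point as the branch target and later jump to 0x0; the explicit check turns that into a recoverable failure the caller can handle.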

I'm not familiar with this stuff so I'll just have to keep grinding away at the debugging. If anyone has any bright ideas that might help me direct effort, let me know, because playing Sinclair Z80 games at half speed for 10 minutes to reproduce the bug is getting annoying. ;-)
Attached patch First patch (obsolete)
The version of this bug that I have reproduced turned out to be a missing OOM check. The attached patch should fix that one, although it will blacklist the pc at which the OOM compile takes place so it probably wants an extra check specific to OOM. I'm going to let it run overnight with the patch to see if it still crashes.
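The trade-off mentioned here (treating an OOM during compilation differently from an ordinary compile failure, so the loop's pc is not blacklisted for something that is not its fault) might be sketched as follows. CompileResult, LoopState, kMaxFailures and handleCompileResult are all hypothetical names for illustration; the attached patches are not reproduced here and may work differently.

```cpp
#include <cassert>

// Hypothetical outcome of trying to compile a recorded trace.
enum class CompileResult { Ok, Failed, OutOfMemory };

// Hypothetical per-loop bookkeeping, keyed by the loop header pc in a real JIT.
struct LoopState {
    int failures = 0;
    bool blacklisted = false;
};

// Assumed threshold: stop trying to trace a loop after this many failures.
constexpr int kMaxFailures = 2;

// Returns true if the caller should flush the trace pool and start over.
bool handleCompileResult(CompileResult result, LoopState& loop) {
    switch (result) {
    case CompileResult::Ok:
        return false;
    case CompileResult::OutOfMemory:
        // The pool ran out, which says nothing about this particular loop:
        // request a flush but do not count it as a failure of the loop.
        return true;
    case CompileResult::Failed:
        // A genuine failure counts against the loop and may blacklist it.
        if (++loop.failures >= kMaxFailures)
            loop.blacklisted = true;
        return false;
    }
    return false;
}
```

If OOM tends to recur at the same pc, counting it toward blacklisting could also be defended; this sketch only shows the variant that keeps the two cases separate.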

kyberneticist's bug may be different but we think it is probably a missing OOM check at least. I'll switch over and debug on windows if the problem persists after patching the bug I found so far.
Assignee: general → dmandelin
Comment on attachment 357903 [details] [diff] [review]
First patch

Drive-by r=me. Looks good.
Attachment #357903 - Flags: review+
As discussed with dmandelin today the bug mentioned in the title looks like a general malloc OOM condition [@ Queue<unsigned char*>::add]. timeless had a bug on this. Try reproducing by turning off swap space.
I'm puzzled about this OOM thing since Firefox memory stays pretty regular throughout.
Is there some bug causing it to suddenly request a couple of gigs of memory after having been perfectly happy with 100-150 megs up to that point?

Oh. And still crashes for me.  Same trace as usual. Almost instantaneously last time.
http://crash-stats.mozilla.com/report/index/cef0e923-b65f-4ca5-9eeb-3c0f32090121
(In reply to comment #20)
> I'm puzzled about this OOM thing since Firefox memory stays pretty regular
> throughout.

Tracing OOMs are special--there is a fixed-size preallocated pool of memory for traces, so they run out of memory independently of your system and without you seeing any increase in memory consumption.
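As a rough illustration of that point, a fixed-size pool can report out-of-memory while the process as a whole has plenty of free memory, because the pool's capacity is set once up front. TracePool and its methods are hypothetical names, alignment is ignored for brevity, and this is not the allocator the tracer actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity bump allocator standing in for a trace pool.
class TracePool {
    std::vector<unsigned char> storage_;  // allocated once, never grown
    std::size_t used_ = 0;
    bool oom_ = false;

public:
    explicit TracePool(std::size_t capacity) : storage_(capacity) {}

    // Hand out n bytes, or return nullptr and latch the OOM flag.
    // Alignment is deliberately ignored in this sketch.
    void* alloc(std::size_t n) {
        assert(used_ <= storage_.size());
        if (n > storage_.size() - used_) {
            oom_ = true;
            return nullptr;
        }
        void* p = storage_.data() + used_;
        used_ += n;
        return p;
    }

    bool outOfMemory() const { return oom_; }
    std::size_t used() const { return used_; }

    // Discard everything in the pool so recording can start again.
    void flush() {
        used_ = 0;
        oom_ = false;
    }
};
```

Every caller of alloc() has to check for nullptr (or poll outOfMemory()) before using the result; a single missing check is enough to produce a crash like the ones reported here, even though the browser's overall memory footprint never moves.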

> Oh. And still crashes for me.  Same trace as usual. Almost instantaneously last
> time.
> http://crash-stats.mozilla.com/report/index/cef0e923-b65f-4ca5-9eeb-3c0f32090121

That stack trace looks like the exact same bug I found on MacOS. I haven't even sent the patch for review yet, so unless you're doing your own builds from the tracemonkey repo it shouldn't be fixed.
Attached patch Patch
Andreas, I was thinking this would be a better way to do it. If so, please plus it. If not, just minus it and I'll check in the first one.
Comment on attachment 357991 [details] [diff] [review]
Patch

Yeah I think an argument can be made both ways (we might run OOM at the same point all the time), but this looks good too. Obsoleting the other patch.
Attachment #357991 - Flags: review+
Attachment #357903 - Attachment is obsolete: true
Checked in to TM as f58f6e91f2b3.
http://hg.mozilla.org/mozilla-central/rev/f58f6e91f2b3
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Keywords: fixed1.9.1
verified FIXED on builds: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090421 Minefield/3.6a1pre ID:20090421032809

and

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090421 Shiretoko/3.5b4pre ID:20090421030848
Status: RESOLVED → VERIFIED
Crash Signature: [@ Queue<unsigned char*>::add]