Closed Bug 634711 Opened 13 years ago Closed 13 years ago

Firefox 4.0b11 Crash [@ avmplus::BitSet::grow(int) ]

Categories

(Core :: JavaScript Engine, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: marcia, Assigned: paul.biggar)

References

Details

(Keywords: crash, Whiteboard: fixed-in-nanojit)

Crash Data

Attachments

(2 files)

Seen while reviewing crash stats. Low volume crash that is present in B11 as well as on trunk. http://tinyurl.com/4cy6nxa links to the crashes which are all Win XP. Only one comment, but it is in Spanish. 

First crash showed in up in crash stats using the 2010120300 build.



Frame 	Module 	Signature [Expand] 	Source
0 	mozjs.dll 	avmplus::BitSet::grow 	js/src/nanojit/avmplus.h:333
1 	mozjs.dll 	js::Oracle::Oracle 	js/src/jstracer.cpp:1286
2 	mozjs.dll 	js::InitJIT 	js/src/jstracer.cpp:7712
3 	mozjs.dll 	JSCompartment::init 	js/src/jscompartment.cpp:111
4 	mozjs.dll 	js::gc::NewCompartment 	js/src/jsgc.cpp:2862
5 	mozjs.dll 	JS_NewCompartmentAndGlobalObject 	js/src/jsapi.cpp:2979
6 	xul.dll 	CreateNewCompartment 	js/src/xpconnect/src/nsXPConnect.cpp:965
7 	xul.dll 	xpc_CreateGlobalObject 	js/src/xpconnect/src/nsXPConnect.cpp:1003
8 	xul.dll 	xpc_CreateSandboxObject 	js/src/xpconnect/src/xpccomponents.cpp:3219
9 	xul.dll 	nsXPCComponents_utils_Sandbox::CallOrConstruct 	js/src/xpconnect/src/xpccomponents.cpp:3415
10 	xul.dll 	nsXPCComponents_utils_Sandbox::Construct 	js/src/xpconnect/src/xpccomponents.cpp:3306
This is fallout from moving the tracer into the compartment. We make a sandbox, which makes a compartment, which allocates a new tracer set, which OOMs, which we don't treat right. pbiggar wanted to look at this.
Assignee: general → pbiggar
A few more of these crashes showed up on the trunk today: http://tinyurl.com/64rjrup
njn, what's the appropriate way to add OOM checks to nanojit? I can just make bitset functions return bool - would upstream accept that?
pbiggar: you've stepped into a mess.  Various pertinent facts:

- In theory, Nanojit has infallible allocators.  This is implemented by pre-allocating some reserve space that we use if an OOM happens.  See the comment in jstracer.cpp above the definition of DataReserveSize.  Also see bug 624590 for more details and suggestions for a longjmp/setjmp alternative that wouldn't require reserve space.

- Also, all Nanojit allocation goes through its allocators, not via vanilla malloc/calloc/new.  See class Allocator in nanojit/Allocator.h, and also the sub-class VMAllocator in jstracer.h.

- So why does BitSet use calloc?  Well, nanojit/avmplus.h is actually not part of Nanojit per se.  It's a TM-specific file.  Tamarin has its own avmplus.h file which is in a different spot in the Tamarin tree.  (Actually, it has two different files called avmplus.h!  But that's irrelevant here.)  So we don't have to worry about upstreaming any changes to it.  I guess this explains how a call to calloc() crept in.

- Also, there are two BitSet implementations in TM;  the one in nanojit/avmplus.h and also the one in nanojit/Containers.cpp (which is a proper part of Nanojit).  They're almost the same, but the latter uses Nanojit's allocators instead of calloc.  Bug 552355 is open to remove the one in avmplus.h.

Confusing, eh?

So, the proper fix here is to get rid of the BitSet in avmplus.h which uses calloc and change TM to use the BitSet in nanojit/Containers.{h,cpp}.  Then the BitSet would use Nanojit's infallible allocators which have the reserve space as fallback.  That would fix this bug and bug 552355.  I'd love it if you could do this;  I don't think it would be hard.

The quick hacky fix would be to change BitSet in avmplus.h to return bool on operations such as grow() and set() that can fail, and check for failure in TM.  No need to upstream.
Two questions: 

Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const P1 &p1|? This is hitting me here.

Secondly, which vmallocator to use for the bitsets? An existing one (which) or a new one?
(In reply to comment #5)
> Two questions: 
> 
> Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const
> P1 &p1|? This is hitting me here.

I don't remember... how is it hitting you?

> Secondly, which vmallocator to use for the bitsets? An existing one (which)
> or a new one?

Use an existing one.  Their lifetimes are described in jscompartment.h, look for "There are 4 allocators here".
(In reply to comment #6)
> (In reply to comment #5)
> > Two questions: 
> > 
> > Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const
> > P1 &p1|? This is hitting me here.
> 
> I don't remember... how is it hitting you?

Consider:
   Oracle::Oracle(Allocator& a) { .. } // constructor
   cx->new_<Oracle> (allocator);


The const is added to allocator, and the compiler complains because allocator is not const (and should not be). (Actually, I'm surprised there aren't more subtle bugs where we think we're copying when we pass to a constructor, but actually pass by reference).


> Use an existing one.  Their lifetimes are described in jscompartment.h, look
> for "There are 4 allocators here".

What's the lifetime of the bitsets?
(In reply to comment #7)
> 
> > Use an existing one.  Their lifetimes are described in jscompartment.h, look
> > for "There are 4 allocators here".
> 
> What's the lifetime of the bitsets?

IIRC they're in the Oracle, right?  The Oracle is in the TraceRecorder, alongside the TraceMonitor.  So I think you want 'dataAlloc'.
That's what I thought, but it was causing problems, so I wanted to be clear on that. I'll have a closer look then.

Any thoughts on comment 7 part 1 (constness)?
If you post what you have so far I can try to fix your compile error for you.
Remove avmplus:bitset, and use nanojit::bitset instead. The semantics of the two are the same (the implementation is nearly identical). The only difference is the allocators.

It needed a small bit of consting because Oracle uses const properly.

I removed a call to Oracle::clear(). I'm not sure of the right thing to do here. It was segfaulting (valgrind said that dataAlloc->reset() was freeing the memory first). In truth, I dont understand why we flush() during initialization, or why we would be reset()ing allocators that just had things allocated out of them. I'll CC gal, he added this in bug 515852, if my archaeology is correct.
(In reply to comment #10)
> If you post what you have so far I can try to fix your compile error for you.

Moved to bug 657384.
Attachment #532686 - Flags: review?(gal)
gal, review ping?
Comment on attachment 532686 [details] [diff] [review]
Remove the second bitset implementation

Review of attachment 532686 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me.
Attachment #532686 - Flags: review?(gal) → review+
(In reply to comment #4)
> 
> - So why does BitSet use calloc?  Well, nanojit/avmplus.h is actually not
> part of Nanojit per se.  It's a TM-specific file.  Tamarin has its own
> avmplus.h file which is in a different spot in the Tamarin tree.  (Actually,
> it has two different files called avmplus.h!  But that's irrelevant here.) 
> So we don't have to worry about upstreaming any changes to it.  I guess this
> explains how a call to calloc() crept in.

I'm wrong about that.  nanojit/avmplus.h is part of Nanojit.  However, Tamarin doesn't use it.  So even though it's part of Nanojit per se, it's TM-specific.  So amvplus.h and Containers.h changes have to land on Nanojit first.  I'll do that.
I landed the nanojit part:

http://hg.mozilla.org/projects/nanojit-central/rev/8bf26319d7f1

I was about to merge it to TM and land the TM part, but Valgrind started complaining:

==16988==    at 0x82342CE: nanojit::BitSet::get(int) const (Containers.h:81)
==16988==    by 0x81F6AE9: js::Oracle::isStackSlotUndemotable(JSContext*, unsigned int, void const*) const (jstracer.cpp:1331)
==16988==    by 0x81F6B1C: js::Oracle::isStackSlotUndemotable(JSContext*, unsigned int) const (jstracer.cpp:1337)
==16988==    by 0x8239B99: js::CaptureTypesVisitor::visitStackSlots(js::Value*, int, js::StackFrame*) (jstracer.cpp:2066)
==16988==    by 0x82311C3: bool js::VisitFrameSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, unsigned int, js::StackFrame*, js::StackFrame*) (jstracer.cpp:1838)
==16988==    by 0x822F2CE: bool js::VisitStackSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, unsigned int) (jstracer.cpp:1849)
==16988==    by 0x822F7F3: void js::VisitSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, JSObject*, unsigned int, unsigned int, unsigned short*) (jstracer.cpp:1879)
==16988==    by 0x822ED67: void js::VisitSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, JSObject*, unsigned int, js::Queue<unsigned short> const&) (jstracer.cpp:1897)
==16988==    by 0x81F7CCF: js::TypeMap::captureTypes(JSContext*, JSObject*, js::Queue<unsigned short>&, unsigned int, bool) (jstracer.cpp:2111)
==16988==    by 0x81F7430: js::TreeFragment::initialize(JSContext*, js::Queue<unsigned short>*, bool) (jstracer.cpp:1633)
==16988==    by 0x8202745: js::RecordTree(JSContext*, js::TraceMonitor*, js::TreeFragment*, JSScript*, unsigned char*, unsigned int, js::Queue<unsigned short>*) (jstracer.cpp:5739)
==16988==    by 0x822C153: js::RecordTracePoint(JSContext*, js::TraceMonitor*, unsigned int&, bool*, bool) (jstracer.cpp:16806)
==16988==  Address 0x53b1530 is 264 bytes inside a block of size 2,008 free'd
==16988==    at 0x47E1B3A: free (vg_replace_malloc.c:366)
==16988==    by 0x8049D2E: js_free (jsutil.h:255)
==16988==    by 0x8058466: js::Foreground::free_(void*) (jsutil.h:497)
==16988==    by 0x81F3D84: nanojit::Allocator::freeChunk(void*) (jstracer.cpp:218)
==16988==    by 0x825185B: nanojit::Allocator::reset() (Allocator.cpp:62)
==16988==    by 0x81FB674: js::TraceMonitor::flush() (jstracer.cpp:2838)
==16988==    by 0x820A4F6: js::TraceMonitor::init(JSRuntime*) (jstracer.cpp:7719)
==16988==    by 0x80A6A62: JSCompartment::init() (jscompartment.cpp:134)
==16988==    by 0x80EBFF6: js::gc::NewCompartment(JSContext*, JSPrincipals*) (jsgc.cpp:2762)
==16988==    by 0x8071D72: JS_NewCompartmentAndGlobalObject (jsapi.cpp:3029)
==16988==    by 0x8057453: NewGlobalObject(JSContext*, CompartmentKind) (js.cpp:5810)
==16988==    by 0x8057757: Shell(JSContext*, int, char**, char**) (js.cpp:5862)

Lifetime problems!  The dataAlloc->reset() call in TraceMonitor::flush() frees all memory in dataAlloc, including the BitSets, but the Oracle doesn't realize this and keeps trying to access them.  You removed the oracle->clear() call, which used to zero all the BitSets.  Where that call was we now need code that reallocates and clears the BitSets.  Eg. add a BitSet::resetAndRealloc() function and call that for each BitSet within Oracle::clear().
Whiteboard: fixed-in-nanojit
Patch on top of the previous patch.
Attachment #534682 - Flags: review?
Attachment #534682 - Flags: review? → review?(pbiggar)
Comment on attachment 534682 [details] [diff] [review]
patch to fix lifetime problems

It's a bit weird that the Oracle immediately deallocates and reallocates itself, but that's not this bug.
Attachment #534682 - Flags: review?(pbiggar) → review+
(In reply to comment #19)
> 
> It's a bit weird that the Oracle immediately deallocates and reallocates
> itself, but that's not this bug.

Where does it do that?
Oracle::Oracle() -> Oracle::clear() -> Oracle::clearDemotability() -> resetAndAlloc()

Also: TraceMonitor::init() -> TraceMonitor::flush() -> Oracle::clear() ...
Crash Signature: [@ avmplus::BitSet::grow(int) ]
Am pretty sure we're done here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: