Closed
Bug 634711
Opened 13 years ago
Closed 13 years ago
Firefox 4.0b11 Crash [@ avmplus::BitSet::grow(int) ]
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: marcia, Assigned: paul.biggar)
References
Details
(Keywords: crash, Whiteboard: fixed-in-nanojit)
Crash Data
Attachments
(2 files)
9.22 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
2.63 KB,
patch
|
paul.biggar
:
review+
|
Details | Diff | Splinter Review |
Seen while reviewing crash stats. Low volume crash that is present in B11 as well as on trunk. http://tinyurl.com/4cy6nxa links to the crashes which are all Win XP. Only one comment, but it is in Spanish. First crash showed in up in crash stats using the 2010120300 build. Frame Module Signature [Expand] Source 0 mozjs.dll avmplus::BitSet::grow js/src/nanojit/avmplus.h:333 1 mozjs.dll js::Oracle::Oracle js/src/jstracer.cpp:1286 2 mozjs.dll js::InitJIT js/src/jstracer.cpp:7712 3 mozjs.dll JSCompartment::init js/src/jscompartment.cpp:111 4 mozjs.dll js::gc::NewCompartment js/src/jsgc.cpp:2862 5 mozjs.dll JS_NewCompartmentAndGlobalObject js/src/jsapi.cpp:2979 6 xul.dll CreateNewCompartment js/src/xpconnect/src/nsXPConnect.cpp:965 7 xul.dll xpc_CreateGlobalObject js/src/xpconnect/src/nsXPConnect.cpp:1003 8 xul.dll xpc_CreateSandboxObject js/src/xpconnect/src/xpccomponents.cpp:3219 9 xul.dll nsXPCComponents_utils_Sandbox::CallOrConstruct js/src/xpconnect/src/xpccomponents.cpp:3415 10 xul.dll nsXPCComponents_utils_Sandbox::Construct js/src/xpconnect/src/xpccomponents.cpp:3306
Comment 1•13 years ago
|
||
This is fallout from moving the tracer into the compartment. We make a sandbox, which makes a compartment, which allocates a new tracer set, which OOMs, which we don't treat right. pbiggar wanted to look at this.
Updated•13 years ago
|
Assignee: general → pbiggar
Reporter | ||
Comment 2•13 years ago
|
||
A few more of these crashes showed up on the trunk today: http://tinyurl.com/64rjrup
Assignee | ||
Comment 3•13 years ago
|
||
njn, what's the appropriate way to add OOM checks to nanojit? I can just make bitset functions return bool - would upstream accept that?
Comment 4•13 years ago
|
||
pbiggar: you've stepped into a mess. Various pertinent facts: - In theory, Nanojit has infallible allocators. This is implemented by pre-allocating some reserve space that we use if an OOM happens. See the comment in jstracer.cpp above the definition of DataReserveSize. Also see bug 624590 for more details and suggestions for a longjmp/setjmp alternative that wouldn't require reserve space. - Also, all Nanojit allocation goes through its allocators, not via vanilla malloc/calloc/new. See class Allocator in nanojit/Allocator.h, and also the sub-class VMAllocator in jstracer.h. - So why does BitSet use calloc? Well, nanojit/avmplus.h is actually not part of Nanojit per se. It's a TM-specific file. Tamarin has its own avmplus.h file which is in a different spot in the Tamarin tree. (Actually, it has two different files called avmplus.h! But that's irrelevant here.) So we don't have to worry about upstreaming any changes to it. I guess this explains how a call to calloc() crept in. - Also, there are two BitSet implementations in TM; the one in nanojit/avmplus.h and also the one in nanojit/Containers.cpp (which is a proper part of Nanojit). They're almost the same, but the latter uses Nanojit's allocators instead of calloc. Bug 552355 is open to remove the one in avmplus.h. Confusing, eh? So, the proper fix here is to get rid of the BitSet in avmplus.h which uses calloc and change TM to use the BitSet in nanojit/Containers.{h,cpp}. Then the BitSet would use Nanojit's infallible allocators which have the reserve space as fallback. That would fix this bug and bug 552355. I'd love it if you could do this; I don't think it would be hard. The quick hacky fix would be to change BitSet in avmplus.h to return bool on operations such as grow() and set() that can fail, and check for failure in TM. No need to upstream.
Assignee | ||
Comment 5•13 years ago
|
||
Two questions: Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const P1 &p1|? This is hitting me here. Secondly, which vmallocator to use for the bitsets? An existing one (which) or a new one?
Comment 6•13 years ago
|
||
(In reply to comment #5) > Two questions: > > Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const > P1 &p1|? This is hitting me here. I don't remember... how is it hitting you? > Secondly, which vmallocator to use for the bitsets? An existing one (which) > or a new one? Use an existing one. Their lifetimes are described in jscompartment.h, look for "There are 4 allocators here".
Assignee | ||
Comment 7•13 years ago
|
||
(In reply to comment #6) > (In reply to comment #5) > > Two questions: > > > > Back in bug 624878, why do parameters to js_new (now in rt->new_) use |const > > P1 &p1|? This is hitting me here. > > I don't remember... how is it hitting you? Consider: Oracle::Oracle(Allocator& a) { .. } // constructor cx->new_<Oracle> (allocator); The const is added to allocator, and the compiler complains because allocator is not const (and should not be). (Actually, I'm surprised there aren't more subtle bugs where we think we're copying when we pass to a constructor, but actually pass by reference). > Use an existing one. Their lifetimes are described in jscompartment.h, look > for "There are 4 allocators here". What's the lifetime of the bitsets?
Comment 8•13 years ago
|
||
(In reply to comment #7) > > > Use an existing one. Their lifetimes are described in jscompartment.h, look > > for "There are 4 allocators here". > > What's the lifetime of the bitsets? IIRC they're in the Oracle, right? The Oracle is in the TraceRecorder, alongside the TraceMonitor. So I think you want 'dataAlloc'.
Assignee | ||
Comment 9•13 years ago
|
||
That's what I thought, but it was causing problems, so I wanted to be clear on that. I'll have a closer look then. Any thoughts on comment 7 part 1 (constness)?
Comment 10•13 years ago
|
||
If you post what you have so far I can try to fix your compile error for you.
Assignee | ||
Comment 11•13 years ago
|
||
Remove avmplus:bitset, and use nanojit::bitset instead. The semantics of the two are the same (the implementation is nearly identical). The only difference is the allocators. It needed a small bit of consting because Oracle uses const properly. I removed a call to Oracle::clear(). I'm not sure of the right thing to do here. It was segfaulting (valgrind said that dataAlloc->reset() was freeing the memory first). In truth, I dont understand why we flush() during initialization, or why we would be reset()ing allocators that just had things allocated out of them. I'll CC gal, he added this in bug 515852, if my archaeology is correct.
Assignee | ||
Comment 12•13 years ago
|
||
(In reply to comment #10) > If you post what you have so far I can try to fix your compile error for you. Moved to bug 657384.
Assignee | ||
Updated•13 years ago
|
Attachment #532686 -
Flags: review?(gal)
Assignee | ||
Comment 14•13 years ago
|
||
gal, review ping?
Comment 15•13 years ago
|
||
Comment on attachment 532686 [details] [diff] [review] Remove the second bitset implementation Review of attachment 532686 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine to me.
Attachment #532686 -
Flags: review?(gal) → review+
Comment 16•13 years ago
|
||
(In reply to comment #4) > > - So why does BitSet use calloc? Well, nanojit/avmplus.h is actually not > part of Nanojit per se. It's a TM-specific file. Tamarin has its own > avmplus.h file which is in a different spot in the Tamarin tree. (Actually, > it has two different files called avmplus.h! But that's irrelevant here.) > So we don't have to worry about upstreaming any changes to it. I guess this > explains how a call to calloc() crept in. I'm wrong about that. nanojit/avmplus.h is part of Nanojit. However, Tamarin doesn't use it. So even though it's part of Nanojit per se, it's TM-specific. So amvplus.h and Containers.h changes have to land on Nanojit first. I'll do that.
Comment 17•13 years ago
|
||
I landed the nanojit part: http://hg.mozilla.org/projects/nanojit-central/rev/8bf26319d7f1 I was about to merge it to TM and land the TM part, but Valgrind started complaining: ==16988== at 0x82342CE: nanojit::BitSet::get(int) const (Containers.h:81) ==16988== by 0x81F6AE9: js::Oracle::isStackSlotUndemotable(JSContext*, unsigned int, void const*) const (jstracer.cpp:1331) ==16988== by 0x81F6B1C: js::Oracle::isStackSlotUndemotable(JSContext*, unsigned int) const (jstracer.cpp:1337) ==16988== by 0x8239B99: js::CaptureTypesVisitor::visitStackSlots(js::Value*, int, js::StackFrame*) (jstracer.cpp:2066) ==16988== by 0x82311C3: bool js::VisitFrameSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, unsigned int, js::StackFrame*, js::StackFrame*) (jstracer.cpp:1838) ==16988== by 0x822F2CE: bool js::VisitStackSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, unsigned int) (jstracer.cpp:1849) ==16988== by 0x822F7F3: void js::VisitSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, JSObject*, unsigned int, unsigned int, unsigned short*) (jstracer.cpp:1879) ==16988== by 0x822ED67: void js::VisitSlots<js::CaptureTypesVisitor>(js::CaptureTypesVisitor&, JSContext*, JSObject*, unsigned int, js::Queue<unsigned short> const&) (jstracer.cpp:1897) ==16988== by 0x81F7CCF: js::TypeMap::captureTypes(JSContext*, JSObject*, js::Queue<unsigned short>&, unsigned int, bool) (jstracer.cpp:2111) ==16988== by 0x81F7430: js::TreeFragment::initialize(JSContext*, js::Queue<unsigned short>*, bool) (jstracer.cpp:1633) ==16988== by 0x8202745: js::RecordTree(JSContext*, js::TraceMonitor*, js::TreeFragment*, JSScript*, unsigned char*, unsigned int, js::Queue<unsigned short>*) (jstracer.cpp:5739) ==16988== by 0x822C153: js::RecordTracePoint(JSContext*, js::TraceMonitor*, unsigned int&, bool*, bool) (jstracer.cpp:16806) ==16988== Address 0x53b1530 is 264 bytes inside a block of size 2,008 free'd ==16988== at 0x47E1B3A: free (vg_replace_malloc.c:366) ==16988== by 0x8049D2E: js_free (jsutil.h:255) ==16988== by 0x8058466: js::Foreground::free_(void*) (jsutil.h:497) ==16988== by 0x81F3D84: nanojit::Allocator::freeChunk(void*) (jstracer.cpp:218) ==16988== by 0x825185B: nanojit::Allocator::reset() (Allocator.cpp:62) ==16988== by 0x81FB674: js::TraceMonitor::flush() (jstracer.cpp:2838) ==16988== by 0x820A4F6: js::TraceMonitor::init(JSRuntime*) (jstracer.cpp:7719) ==16988== by 0x80A6A62: JSCompartment::init() (jscompartment.cpp:134) ==16988== by 0x80EBFF6: js::gc::NewCompartment(JSContext*, JSPrincipals*) (jsgc.cpp:2762) ==16988== by 0x8071D72: JS_NewCompartmentAndGlobalObject (jsapi.cpp:3029) ==16988== by 0x8057453: NewGlobalObject(JSContext*, CompartmentKind) (js.cpp:5810) ==16988== by 0x8057757: Shell(JSContext*, int, char**, char**) (js.cpp:5862) Lifetime problems! The dataAlloc->reset() call in TraceMonitor::flush() frees all memory in dataAlloc, including the BitSets, but the Oracle doesn't realize this and keeps trying to access them. You removed the oracle->clear() call, which used to zero all the BitSets. Where that call was we now need code that reallocates and clears the BitSets. Eg. add a BitSet::resetAndRealloc() function and call that for each BitSet within Oracle::clear().
Whiteboard: fixed-in-nanojit
Updated•13 years ago
|
Attachment #534682 -
Flags: review? → review?(pbiggar)
Assignee | ||
Comment 19•13 years ago
|
||
Comment on attachment 534682 [details] [diff] [review] patch to fix lifetime problems It's a bit weird that the Oracle immediately deallocates and reallocates itself, but that's not this bug.
Attachment #534682 -
Flags: review?(pbiggar) → review+
Comment 20•13 years ago
|
||
My follow-up fix: http://hg.mozilla.org/projects/nanojit-central/rev/8eec2117b714
Comment 21•13 years ago
|
||
(In reply to comment #19) > > It's a bit weird that the Oracle immediately deallocates and reallocates > itself, but that's not this bug. Where does it do that?
Assignee | ||
Comment 22•13 years ago
|
||
Oracle::Oracle() -> Oracle::clear() -> Oracle::clearDemotability() -> resetAndAlloc() Also: TraceMonitor::init() -> TraceMonitor::flush() -> Oracle::clear() ...
Updated•13 years ago
|
Crash Signature: [@ avmplus::BitSet::grow(int) ]
Assignee | ||
Comment 23•13 years ago
|
||
Am pretty sure we're done here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•