Closed
Bug 958980
Opened 9 years ago
Closed 9 years ago
crash in js::SN_IS_TERMINATOR(unsigned char*)
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla29
People
(Reporter: alice0775, Assigned: shu)
References
Details
(Keywords: crash, regression, reproducible, Whiteboard: [dupe me])
Crash Data
Attachments
(2 files, 1 obsolete file)
43 bytes,
text/html
|
Details | |
1.28 KB,
patch
|
jorendorff
:
review+
bajaj
:
approval-mozilla-aurora+
bajaj
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-4ca3b08a-0c49-4095-9c4c-7ef652140112. ============================================================= This crash happens on Nightly29.0a1, holly29.0a1, Aurora28.0a2 as well as Firefox27.b5 Steps To Reproduce: 1. Start Firefox with newly created profile 2. Open testcase 3. Open debugger(Ctrl+Shift+S) (4. Repeat from Step1 if necessary) Actual Results: Crash
![]() |
Reporter | |
Comment 1•9 years ago
|
||
Regression window(m-c) Good: http://hg.mozilla.org/mozilla-central/rev/679410425704 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131121113308 Bad: http://hg.mozilla.org/mozilla-central/rev/cdbe5f3adeff Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131121114210 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=679410425704&tochange=cdbe5f3adeff Regression window(m-i) Good: http://hg.mozilla.org/integration/mozilla-inbound/rev/d9abdb232a25 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131120224907 Bad: http://hg.mozilla.org/integration/mozilla-inbound/rev/22a26c92bf00 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 ID:20131120225324 Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d9abdb232a25&tochange=22a26c92bf00 Suspected: 9ee5c2664d78 Shu-yu Guo — Bug 933882 - Invalidate JIT code instead of doing full GC on debug mode toggle. (r=bhackett)
![]() |
Reporter | |
Comment 2•9 years ago
|
||
Attachment #8358979 -
Attachment is obsolete: true
![]() |
Reporter | |
Comment 3•9 years ago
|
||
Step3 of STR has to perform it after execution of Step2 as promptly as possible.
Assignee | ||
Comment 4•9 years ago
|
||
The perils of GC-dependent API.
Attachment #8359003 -
Flags: review?(jimb)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → shu
Status: NEW → ASSIGNED
Updated•9 years ago
|
I applied the patch and tried the repro steps of the duplicate at bug 953261 comment 4. About 75% of the time, it doesn't crash anymore. But sometimes it crashes at this stack. It might be an independent bug, or perhaps this patch just delayed the crash? xul!mozilla::dom::CallbackObject::Callback xul!nsEventListenerInfo::GetJSVal xul!nsEventListenerInfo::GetListenerObject xul!NS_InvokeByIndex xul!XPCWrappedNative::CallMethod xul!XPC_WN_GetterSetter mozjs!js::Invoke
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to David Major [:dmajor] from comment #6) > I applied the patch and tried the repro steps of the duplicate at bug 953261 > comment 4. About 75% of the time, it doesn't crash anymore. But sometimes it > crashes at this stack. It might be an independent bug, or perhaps this patch > just delayed the crash? > > xul!mozilla::dom::CallbackObject::Callback > xul!nsEventListenerInfo::GetJSVal > xul!nsEventListenerInfo::GetListenerObject > xul!NS_InvokeByIndex > xul!XPCWrappedNative::CallMethod > xul!XPC_WN_GetterSetter > mozjs!js::Invoke David, that stack is unfamiliar to me. Is there an assert or something? What's going on in xul!mozilla::dom::CallbackObject::Callback?
It's an opt build since we had trouble with debug builds on the other bug. The |this| of the CallbackObject is null. It looks like that |this| corresponds to |Ptr()| at: https://hg.mozilla.org/mozilla-central/file/80a27198344a/content/events/src/nsEventListenerService.cpp#l97 The full stack is here (I didn't want to pollute this bug in case it's a different issue): https://pastebin.mozilla.org/4006317 There is some jsinspector stuff at line 35 if that offers any clues.
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to David Major [:dmajor] from comment #8) > It's an opt build since we had trouble with debug builds on the other bug. > The |this| of the CallbackObject is null. It looks like that |this| > corresponds to |Ptr()| at: > https://hg.mozilla.org/mozilla-central/file/80a27198344a/content/events/src/ > nsEventListenerService.cpp#l97 > > The full stack is here (I didn't want to pollute this bug in case it's a > different issue): https://pastebin.mozilla.org/4006317 There is some > jsinspector stuff at line 35 if that offers any clues. Sorry, no ideas here. This bug is because of the Debugger API letting you access a not-fully-initialized JSScript. I don't think that can get into a CallbackObject though. Jim, correct me if I'm wrong.
Flags: needinfo?(jimb)
![]() |
Reporter | |
Comment 10•9 years ago
|
||
Firefox27b7 : bp-7bdbcbf6-6322-453f-8216-7e1d12140118 Please fix. this beta cycle is two weeks only, becomes end.
Comment 11•9 years ago
|
||
Comment on attachment 8359003 [details] [diff] [review] Stop Debugger from exposing partially initialized JSScripts. Review of attachment 8359003 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Debugger.cpp @@ +2609,5 @@ > + // It is possible that the script was created and thus exposed to GC, > + // but *not* fully initialized from fullyInit{FromEmitter,Trivial} due > + // to errors. > + if (!script->code()) > + return; Great. Please add a blank line before the comment.
Attachment #8359003 -
Flags: review?(jimb) → review+
Assignee | ||
Comment 12•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bc5af785b3b
Comment 13•9 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #9) > (In reply to David Major [:dmajor] from comment #8) > > The full stack is here (I didn't want to pollute this bug in case it's a > > different issue): https://pastebin.mozilla.org/4006317 There is some > > jsinspector stuff at line 35 if that offers any clues. > > Sorry, no ideas here. This bug is because of the Debugger API letting you > access a not-fully-initialized JSScript. I don't think that can get into a > CallbackObject though. Jim, correct me if I'm wrong. Yes, that's right. Debugger.prototype.findScripts scans the debuggee compartments for all JSScripts that match the query it is given. It scans by actually iterating over all the objects allocated in the Zone, not by following references from one object to another --- which means that it may return scripts that are not actually reachable. This would include scripts that were partially constructed, but then dropped due to an error. Shu, once you found the problem, were you able to identify better steps to reproduce? Can we force an error to occur at the right place, for example?
Flags: needinfo?(jimb) → needinfo?(shu)
Assignee | ||
Comment 14•9 years ago
|
||
(In reply to Jim Blandy :jimb from comment #13) > (In reply to Shu-yu Guo [:shu] from comment #9) > > (In reply to David Major [:dmajor] from comment #8) > > > The full stack is here (I didn't want to pollute this bug in case it's a > > > different issue): https://pastebin.mozilla.org/4006317 There is some > > > jsinspector stuff at line 35 if that offers any clues. > > > > Sorry, no ideas here. This bug is because of the Debugger API letting you > > access a not-fully-initialized JSScript. I don't think that can get into a > > CallbackObject though. Jim, correct me if I'm wrong. > > Yes, that's right. Debugger.prototype.findScripts scans the debuggee > compartments for all JSScripts that match the query it is given. It scans by > actually iterating over all the objects allocated in the Zone, not by > following references from one object to another --- which means that it may > return scripts that are not actually reachable. This would include scripts > that were partially constructed, but then dropped due to an error. > > Shu, once you found the problem, were you able to identify better steps to > reproduce? Can we force an error to occur at the right place, for example? STR with timing bugs are always hard. STR that require a particularly unfortunate interleaving is still an open problem for software engineering AFAICT (where's cjones's record & replay system? :)). For an easy test case in the shell you'd need the ability to interleave (not necessarily multithreading, but some ability to call D.p.findScripts *while* parsing some other thread) and fine control over suppressing/triggering GC. Can we do that in the shell? I haven't looked too hard, but my intuition is no. I'm not sure what you mean by "force an error to occur at the right place".
Flags: needinfo?(shu)
https://hg.mozilla.org/mozilla-central/rev/5bc5af785b3b
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment 16•9 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #14) > STR with timing bugs are always hard. STR that require a particularly > unfortunate interleaving is still an open problem for software engineering > AFAICT (where's cjones's record & replay system? :)). For an easy test case > in the shell you'd need the ability to interleave (not necessarily > multithreading, but some ability to call D.p.findScripts *while* parsing > some other thread) and fine control over suppressing/triggering GC. Can we > do that in the shell? I haven't looked too hard, but my intuition is no. > > I'm not sure what you mean by "force an error to occur at the right place". Wait, is this bug caused by inter-thread timing? Off-thread compilation always does its work in a separate compartment, which the debugger should never be able to observe; if findScripts is somehow encountering scripts that are in mid-construction, then that's a very serious bug. I thought this bug was caused by fullyInit{FromEmitter,Trivial} returning an error, and leaving a partially-constructed script in the Zone as garbage. If that's the case, then if the shell could force the error to occur in the right place, we could test it reliably. If the shell can't force such an error to occur now, I don't think it's worth adding such a primitive just to test this.
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Jim Blandy :jimb from comment #16) > (In reply to Shu-yu Guo [:shu] from comment #14) > > STR with timing bugs are always hard. STR that require a particularly > > unfortunate interleaving is still an open problem for software engineering > > AFAICT (where's cjones's record & replay system? :)). For an easy test case > > in the shell you'd need the ability to interleave (not necessarily > > multithreading, but some ability to call D.p.findScripts *while* parsing > > some other thread) and fine control over suppressing/triggering GC. Can we > > do that in the shell? I haven't looked too hard, but my intuition is no. > > > > I'm not sure what you mean by "force an error to occur at the right place". > > Wait, is this bug caused by inter-thread timing? Off-thread compilation > always does its work in a separate compartment, which the debugger should > never be able to observe; if findScripts is somehow encountering scripts > that are in mid-construction, then that's a very serious bug. > I don't know if there are actually threads or just polling, but this is what's going on: +------------------------------+ +------------------------------+ | Main Thread | | UI Thread | |------------------------------| |------------------------------| | Start parsing bad.js | | | | Allocate JSScript for bad.js | | | | | | | | | | Open Debugger panel | | | | D.p.findScripts finds bad.js | | | | | | Parse error | | | | ... | | | | GC | | | +------------------------------+ +------------------------------+ I don't know how to interrupt the parsing process to make a Debugger and call findScripts inside the shell. The error happens in the right place (in the JS frontend), it just happens *after* the Debugger has already opened, which is why Alice's STR say that the Debugger must be opened quickly after the navigating to the page. This patch prevents crashes by ignoring not fully initialized scripts. The frontend should probably block until all scripts have been parsed.
Comment 18•9 years ago
|
||
Thank you for the ASCII art! I'm not sure what you mean by "Main Thread" and "UI Thread". The Debugger UI runs on the main thread. Do you mean that: - We call JS::CompileOffThreadScript, which then queues a job for a worker thread. - The worker thread does the compilation (WorkerThread::handleParseWorkload) in a separate, temporary compartment which Debugger should not be able to see at all, and encounters an error. - The main thread eventually calls JS::FinishOffThreadScript, which waits for the worker thread to finish, and then merges the temporary compartment with the one that originally requested the compilation (WorkerThread::finishParseTask, called from JS::FinishOffThreadScript). - Then, D.p.findScripts finds the dropped JSScript in the merged compartment. Note that finishParseTask does the compartment merge even if an error is thrown. If that's right, then the question is, why can't we get the same error (and thus the same mutant JSScript) without using off-thread compilation?
Comment 19•9 years ago
|
||
I think the "quickly" is simply because the GC will eventually clean up the mutant JSScript.
Comment 20•9 years ago
|
||
This would reproduce the crash, wouldn't it? It crashes on my shell, which doesn't contain your fix's changeset: var text = "function drag(ev)\n\ {\n\ \n\ "; var g = newGlobal(); var dbg = new Debugger(g); g.text = text; g.eval('offThreadCompileScript(text, { filename: "moof" });'); g.eval('try { runOffThreadScript(); } catch (ex) { }'); for (s of dbg.findScripts({})) { print(s.url + " " + s.startLine + " " + s.lineCount); }
Comment 21•9 years ago
|
||
Some of that's noise; this is a little better: var g = newGlobal(); var dbg = new Debugger(g); g.eval('offThreadCompileScript("function drag(ev) {");'); g.eval('try { runOffThreadScript(); } catch (ex) { }'); for (s of dbg.findScripts()) s.lineCount;
Comment 22•9 years ago
|
||
Actually, all the off-thread compilation stuff is a red herring; this crashes: var g = newGlobal(); var dbg = new Debugger(g); try { g.eval('function drag(ev) {'); } catch (ex) { } for (s of dbg.findScripts()) s.lineCount;
Comment 23•9 years ago
|
||
Filed follow-up bug 962441 to add the regression test.
Comment 24•9 years ago
|
||
(Can you tell I've got a test tomorrow?)
Assignee | ||
Comment 25•9 years ago
|
||
(In reply to Jim Blandy :jimb from comment #22) > Actually, all the off-thread compilation stuff is a red herring; this > crashes: > > var g = newGlobal(); > var dbg = new Debugger(g); > try { g.eval('function drag(ev) {'); } catch (ex) { } > for (s of dbg.findScripts()) > s.lineCount; Oh nice, how simple! I thought it would be about interleavings, but it really is only about GC timing. I also didn't really know how the Debugger UI worked. I assumed the UI had a different thread for responsiveness, and that played into it. Too many assumptions.
Assignee | ||
Comment 26•9 years ago
|
||
In retrospect, it doesn't make sense at all why I thought the error from the frontend would immediately collect the errant JSScript, and required off-thread-compilation to trigger.
Comment 27•9 years ago
|
||
Reproduced in nightly 2014-01-17. Verified fixed FF 29.0a1 2014-01-22, win 7 x64.
Comment 28•9 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #26) > In retrospect, it doesn't make sense at all why I thought the error from the > frontend would immediately collect the errant JSScript, and required > off-thread-compilation to trigger. It also shows a shortcoming in our testing --- this is a case that could easily have been covered.
Comment 29•9 years ago
|
||
Can you please request uplift on this with risk analysis. If low risk this should get into one of last beta's going to build tomorrow. Thanks !
Updated•9 years ago
|
Flags: needinfo?(shu)
Assignee | ||
Comment 30•9 years ago
|
||
Comment on attachment 8359003 [details] [diff] [review] Stop Debugger from exposing partially initialized JSScripts. [Approval Request Comment] Bug caused by (feature/regressing bug #): 933882 User impact if declined: GC timing related crashes when using Debugger Testing completed (on m-c, etc.): on m-c Risk to taking this patch (and alternatives if risky): none String or IDL/UUID changes made by this patch: none
Attachment #8359003 -
Flags: approval-mozilla-beta?
Attachment #8359003 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(shu)
Comment 31•9 years ago
|
||
(In reply to Jim Blandy :jimb from comment #28) > It also shows a shortcoming in our testing --- this is a case that could > easily have been covered. So, do you think this is not completely fixed? I marked this verified since the initial crash is no longer reproducible on ff 29
Flags: needinfo?(jimb)
Updated•9 years ago
|
Attachment #8359003 -
Flags: approval-mozilla-beta?
Attachment #8359003 -
Flags: approval-mozilla-beta+
Attachment #8359003 -
Flags: approval-mozilla-aurora?
Attachment #8359003 -
Flags: approval-mozilla-aurora+
Comment 32•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/b1aca8e61c5c https://hg.mozilla.org/releases/mozilla-beta/rev/46a0f202ff0b
Comment 33•9 years ago
|
||
I suck at parentheses. https://hg.mozilla.org/releases/mozilla-aurora/rev/e2600ecd9a9e https://hg.mozilla.org/releases/mozilla-beta/rev/664bc0ef6421
Comment 34•9 years ago
|
||
Verified as fixed on FF 27 beta 9 Build Id: 20140123185438 Environments used: Win XP x86, Win7 X64 I've used the STR provided in comment 0 and the bug is not reproducing anymore.
Comment 35•9 years ago
|
||
(In reply to Paul Silaghi, QA [:pauly] from comment #31) > (In reply to Jim Blandy :jimb from comment #28) > > It also shows a shortcoming in our testing --- this is a case that could > > easily have been covered. > So, do you think this is not completely fixed? > I marked this verified since the initial crash is no longer reproducible on > ff 29 No, no, I just meant that more thorough testing when the feature was first implemented could have caught this bug sooner. This bug is fixed.
Comment 36•9 years ago
|
||
(In reply to Jim Blandy :jimb from comment #35) Jim, did you mean to resent the status flags?
Updated•9 years ago
|
Comment 37•9 years ago
|
||
based on comments 27, 34
Comment 38•9 years ago
|
||
Verified as fixed on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0 (20140126004002). I reproduced this issue on a 01/17 Aurora build by simply repeating the steps from comment 0 2 or 3 times. On the 01/26 Aurora it didn't reproduce after multiple tries. Socorro doesn't show any crashes on builds post 01/23 either: https://crash-stats.mozilla.com/report/list?signature=js%3A%3ASN_IS_TERMINATOR%28unsigned+char*%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-01-27+12%3A00%3A00&range_value=4#reports
Keywords: verifyme
You need to log in
before you can comment on or make changes to this bug.
Description
•