Closed Bug 611653 Opened 9 years ago Closed 9 years ago

Assertion failure: obj->containsSlot(slot)

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b12
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: bc, Assigned: dmandelin)

References

(Blocks 1 open bug, )

Details

(Keywords: assertion, Whiteboard: [hardblocker][has patch][fixed-in-tracemonkey])

Attachments

(5 files, 7 obsolete files)

Attached file spider.xpi
1. install Spider
2. firefox -spider -url 'http://gamehay.net/' -start -quit 
3. Assertion failure: obj->containsSlot(slot), at /work/mozilla/builds/2.0.0/mozilla/js/src/jsinterp.cpp:5316

Doesn't appear to happen on mac at least just loading the page in the browser.

See also: bug 596145

2.0.0 mac/win/linux

Assertion failure: obj->containsSlot(slot), at /work/mozilla/builds/2.0.0/mozilla/js/src/jsinterp.cpp:5316

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x0649e465 in JS_Assert (s=0x696d7cf "obj->containsSlot(slot)", file=0x696bb38 "/work/mozilla/builds/2.0.0/mozilla/js/src/jsinterp.cpp", ln=5316) at /work/mozilla/builds/2.0.0/mozilla/js/src/jsutil.cpp:80
80	    *((int *) NULL) = 0;  /* To continue from here in GDB: "return" then "continue". */
(gdb) bt
#0  0x0649e465 in JS_Assert (s=0x696d7cf "obj->containsSlot(slot)", file=0x696bb38 "/work/mozilla/builds/2.0.0/mozilla/js/src/jsinterp.cpp", ln=5316) at /work/mozilla/builds/2.0.0/mozilla/js/src/jsutil.cpp:80
#1  0x063cc633 in js::Interpret () at /work/mozilla/builds/2.0.0/mozilla/js/src/jsinterp.cpp:5316
#2  0x063daea7 in js::RunScript (cx=0x22d18530, script=0x1ee56aa0, fp=0x1000048) at jsinterp.cpp:665
#3  0x063dbc63 in js::Invoke (cx=0x22d18530, argsRef=@0xbfffce54, flags=8192) at jsinterp.cpp:768
#4  0x063dc35b in js::ExternalInvoke (cx=0x22d18530, thisv=@0xbfffceb8, fval=@0xbfffcef8, argc=1, argv=0xbfffd0f8, rval=0xbfffd170) at jsinterp.cpp:881
#5  0x06311511 in js::ExternalInvoke (cx=0x22d18530, obj=0x22c5f888, fval=@0xbfffcef8, argc=1, argv=0xbfffd0f8, rval=0xbfffd170) at jsinterp.h:954
#6  0x06311669 in JS_CallFunctionValue (cx=0x22d18530, obj=0x22c5f888, fval={asBits = 18446462629381047232, s = {payload = {i32 = 583435200, u32 = 583435200, boo = 583435200, str = 0x22c683c0, obj = 0x22c683c0, ptr = 0x22c683c0, why = 583435200}, tag = JSVAL_TAG_OBJECT}, asDouble = -nan(0xf000722c683c0), asPtr = 0x22c683c0}, argc=1, argv=0xbfffd0f8, rval=0xbfffd170) at /work/mozilla/builds/2.0.0/mozilla/js/src/jsapi.cpp:4908
#7  0x058ed612 in nsXPCWrappedJSClass::CallMethod (this=0x22b83c30, wrapper=0x23351b00, methodIndex=3, info=0x889e28, nativeParams=0xbfffd40c) at /work/mozilla/builds/2.0.0/mozilla/js/src/xpconnect/src/xpcwrappedjsclass.cpp:1694
#8  0x058e4589 in nsXPCWrappedJS::CallMethod (this=0x23351b00, methodIndex=3, info=0x889e28, params=0xbfffd40c) at /work/mozilla/builds/2.0.0/mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp:577
#9  0x06150bf8 in PrepareAndDispatch (self=0x24966d20, methodIndex=3, args=0xbfffd534) at /work/mozilla/builds/2.0.0/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_unixish_x86.cpp:93
#10 0x0614b73b in nsXPTCStubBase::Stub3 (this=0x24966d20) at xptcstubsdef.inc:1
#
blocking2.0: --- → ?
Depends on: 605015
Bob, can you still reproduce this? Looks ok here.
blocking2.0: ? → betaN+
I was able to reproduce 1 out of 8 tries with Mac OS X 10.5 intel and 1 out of 3 tries with WinXP and 0 out of 3 tries on Linux.
I can repro this under a debugger on Linux.  The assertion hits in JSOP_GETGLOBAL which wants to access slot 152 where the (global) object's slotSpan() is 124.

Bob, can you think of any way that the spider addon is doing anything with script globals that could break compile-n-go assumptions (e.g., adding or removing properties)?  The assert happens at the very beginning of 'handleEvent' on http://gamehay.net/game/js/media.js line 57, which is called to handle the event of an XHR OnStopRequest.
When Spider starts and before the content page is loaded in the xul:browser element, Spider sets up a number of global variables in its own window object, a window watcher for dialogs, a console listener and a load handler for the xul:browser element where the content page is loaded. In this case Spider doesn't need to load the external hook script to crash, so I think that covers most of what would violate the assumption. The source (a little out of date) can be found at <http://hg.mozilla.org/automation/sisyphus/file/687aa4ec9cf1/spider/chrome/content/spider>
Is this the same bug as bug 614714 or bug 613619? Asking because bug 614714 has a shell test case which triggers this assert.
Bug 620991 covers comment 5.
No longer depends on: 605015
Whiteboard: [orange]
Blocks: 620991
No longer blocks: 438871
Depends on: 605015
I repro this in Windows about 1 out of 3. The global object that it crashes on is weird:

object 090040A0
class 04A4F44C Window
flags: delegate branded has_equality inDictionaryMode
properties:
    enumerate readonly permanent "window": slot 123
proto <XPC_WN_ModsAllowed_NoCall_Proto_JSClass object at 09006150>
parent null
private 0B5B38C0
slots:
   0 (reserved) = undefined
   1 (reserved) = undefined
   [ Many more undefined reserved slots ]
 120 (reserved) = undefined
 121 (reserved) = <unnamed function at 090072D0 (JSFunction at 090072D0)>
 122 (reserved) = <RegExpStatics object at 090040C8>
 123 = <Proxy object at 09008048>

That's printed out after the crash in the debugger. I went and made it print out the global when it is compiling the script, and it's an object with the same address, and the same first 124 properties, and then many more properties, with property 'http' in slot 152, which is exactly what the code is trying to do.

Interestingly, this repros iff I first get an assertion dialog "XPConnect is being called on a scope without a 'Components' property!". (Actually, I get it twice before the crash.) That assertion is generated from this code in xpcwrappednativescope.cpp:

    // This is pretty much always bad. It usually means that native code is
    // making a callback to an interface implemented in JavaScript, but the
    // document where the JS object was created has already been cleared and the
    // global properties of that document's window are *gone*. Generally this
    // indicates a problem that should be addressed in the design and use of the
    // callback code.
    NS_ERROR("XPConnect is being called on a scope without a 'Components' property!");

That would seem to explain why we see an empty Window object.
Bob, do you think this might be a bug in Spider? Comment 8 says we are tripping an XPConnect assertion about when we make a callback. It seems to me that's probably a bug in the extension's use of JSAPI, or maybe something about event generation and delegation in the browser (DOM/xpconnect)?
David, it could very well be a bug in Spider. I'll see what I can find out.
(In reply to comment #10)
> David, it could very well be a bug in Spider. I'll see what I can find out.

OK, unblocking for now. Let us know if you find out that it is a browser bug.
blocking2.0: betaN+ → -
I hit this all the time with a vanilla debug build and google voice. We should block on this.
blocking2.0: - → ?
Assertion failure: obj->containsSlot(slot), at ../../../js/src/jsinterp.cpp:5334

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x0000000101a707d6 in JS_Assert (s=0x101f56554 "obj->containsSlot(slot)", file=0x101f56a88 "../../../js/src/jsinterp.cpp", ln=5334) at ../../../js/src/jsutil.cpp:80
80	    *((int *) NULL) = 0;  /* To continue from here in GDB: "return" then "continue". */
(gdb) bt 10
#0  0x0000000101a707d6 in JS_Assert (s=0x101f56554 "obj->containsSlot(slot)", file=0x101f56a88 "../../../js/src/jsinterp.cpp", ln=5334) at ../../../js/src/jsutil.cpp:80
#1  0x000000010199697a in js::Interpret () at ../../../js/src/jsinterp.cpp:5334
#2  0x00000001019a47f8 in js::RunScript (cx=0x126db0a90, script=0x11bf087c0, fp=0x1189060b0) at jsinterp.cpp:657
#3  0x00000001019a5612 in js::Invoke (cx=0x126db0a90, argsRef=@0x7fff5fbfb310, flags=8192) at jsinterp.cpp:737
#4  0x00000001019556f5 in CallOrConstructBoundFunction (cx=0x126db0a90, argc=1, vp=0x118906038) at ../../../js/src/jsfun.cpp:2290
#5  0x00000001019a75bf in js::CallJSNative (cx=0x126db0a90, native=0x10195550b <CallOrConstructBoundFunction(JSContext*, unsigned int, js::Value*)>, argc=1, vp=0x118906038) at jscntxtinlines.h:692
#6  0x00000001019a5406 in js::Invoke (cx=0x126db0a90, argsRef=@0x7fff5fbfb500, flags=0) at jsinterp.cpp:700
#7  0x00000001019a5e51 in js::ExternalInvoke (cx=0x126db0a90, thisv=@0x7fff5fbfb590, fval=@0x7fff5fbfb5c8, argc=1, argv=0x7fff5fbfbc48, rval=0x7fff5fbfb840) at jsinterp.cpp:858
#8  0x00000001018d5262 in js::ExternalInvoke (cx=0x126db0a90, obj=0x11b8e0058, fval=@0x7fff5fbfb5c8, argc=1, argv=0x7fff5fbfbc48, rval=0x7fff5fbfb840) at jsinterp.h:961
#9  0x00000001018d539d in JS_CallFunctionValue (cx=0x126db0a90, obj=0x11b8e0058, fval={asBits = 18445477441071579648, debugView = {payload47 = 4757225984, tag = JSVAL_TAG_OBJECT}, s = {payload = {i32 = 462258688, u32 = 462258688, why = 462258688, word = 18445477441071579648}}, asDouble = -nan(0xb80011b8d8200), asPtr = 0xfffb80011b8d8200}, argc=1, argv=0x7fff5fbfbc48, rval=0x7fff5fbfb840) at ../../../js/src/jsapi.cpp:5019
(More stack frames follow...)
(gdb) call js_DumpObject(obj)
No symbol "obj" in current context.
(gdb) up
#1  0x000000010199697a in js::Interpret () at ../../../js/src/jsinterp.cpp:5334
5334	    JS_ASSERT(obj->containsSlot(slot));
(gdb) call js_DumpObject(obj)
object 0x11b100120
class 0x126e3a038 Window
flags: delegate branded has_equality inDictionaryMode
properties:
    enumerate readonly permanent "window": slot 117
proto <XPC_WN_ModsAllowed_NoCall_Proto_JSClass object at 0x11b103210>
parent null
private 0x11d325f60
slots:
   0 (reserved) = undefined
   1 (reserved) = undefined
   2 (reserved) = undefined
   3 (reserved) = undefined
   4 (reserved) = undefined
   5 (reserved) = undefined
   6 (reserved) = undefined
   7 (reserved) = undefined
   8 (reserved) = undefined
   9 (reserved) = undefined
  10 (reserved) = undefined
  11 (reserved) = undefined
  12 (reserved) = undefined
  13 (reserved) = undefined
  14 (reserved) = undefined
  15 (reserved) = undefined
  16 (reserved) = undefined
  17 (reserved) = undefined
  18 (reserved) = undefined
  19 (reserved) = undefined
  20 (reserved) = undefined
  21 (reserved) = undefined
  22 (reserved) = undefined
  23 (reserved) = undefined
  24 (reserved) = undefined
  25 (reserved) = undefined
  26 (reserved) = undefined
  27 (reserved) = undefined
  28 (reserved) = undefined
  29 (reserved) = undefined
  30 (reserved) = undefined
  31 (reserved) = undefined
  32 (reserved) = undefined
  33 (reserved) = undefined
  34 (reserved) = undefined
  35 (reserved) = undefined
  36 (reserved) = undefined
  37 (reserved) = undefined
  38 (reserved) = undefined
  39 (reserved) = undefined
  40 (reserved) = undefined
  41 (reserved) = undefined
  42 (reserved) = undefined
  43 (reserved) = undefined
  44 (reserved) = undefined
  45 (reserved) = undefined
  46 (reserved) = undefined
  47 (reserved) = undefined
  48 (reserved) = undefined
  49 (reserved) = undefined
  50 (reserved) = undefined
  51 (reserved) = undefined
  52 (reserved) = undefined
  53 (reserved) = undefined
  54 (reserved) = undefined
  55 (reserved) = undefined
  56 (reserved) = undefined
  57 (reserved) = undefined
  58 (reserved) = undefined
  59 (reserved) = undefined
  60 (reserved) = undefined
  61 (reserved) = undefined
  62 (reserved) = undefined
  63 (reserved) = undefined
  64 (reserved) = undefined
  65 (reserved) = undefined
  66 (reserved) = undefined
  67 (reserved) = undefined
  68 (reserved) = undefined
  69 (reserved) = undefined
  70 (reserved) = undefined
  71 (reserved) = undefined
  72 (reserved) = undefined
  73 (reserved) = undefined
  74 (reserved) = undefined
  75 (reserved) = undefined
  76 (reserved) = undefined
  77 (reserved) = undefined
  78 (reserved) = undefined
  79 (reserved) = undefined
  80 (reserved) = undefined
  81 (reserved) = undefined
  82 (reserved) = undefined
  83 (reserved) = undefined
  84 (reserved) = undefined
  85 (reserved) = undefined
  86 (reserved) = undefined
  87 (reserved) = undefined
  88 (reserved) = undefined
  89 (reserved) = undefined
  90 (reserved) = undefined
  91 (reserved) = undefined
  92 (reserved) = undefined
  93 (reserved) = undefined
  94 (reserved) = undefined
  95 (reserved) = undefined
  96 (reserved) = undefined
  97 (reserved) = undefined
  98 (reserved) = undefined
  99 (reserved) = undefined
 100 (reserved) = undefined
 101 (reserved) = undefined
 102 (reserved) = undefined
 103 (reserved) = undefined
 104 (reserved) = undefined
 105 (reserved) = undefined
 106 (reserved) = undefined
 107 (reserved) = undefined
 108 (reserved) = undefined
 109 (reserved) = undefined
 110 (reserved) = undefined
 111 (reserved) = undefined
 112 (reserved) = undefined
 113 (reserved) = undefined
 114 (reserved) = undefined
 115 (reserved) = <unnamed function at 0x11b102e00 (JSFunction at 0x11b102e00)>
 116 (reserved) = <RegExpStatics object at 0x11b100168>
 117 = <Proxy object at 0x11b105068>

(gdb) p slot
$1 = 157
(gdb) call DumpJSStack()
0 anonymous() ["https://www.google.com/voice/resources/3747311955-gc_prod.js":434]
    this = [object Object]
1 anonymous() ["https://www.google.com/voice/resources/3747311955-gc_prod.js":433]
    this = [object Object]
2 anonymous([object Event @ 0x11ac6e9c0 (native @ 0x127083910)]) ["https://www.google.com/voice/resources/3747311955-gc_prod.js":433]
    this = [object Object]
(gdb)
Andreas, (a) are you still getting this, and (b) if so do you get the XPConnect "assertion" dialog before the containsSlot assertion? (If not it might be an OS thing.) It still seems to me this is a bug in either XPConnect or the add-on where a function is getting called on a window that is closed. Do we look for an xpconnect bug? File bugs on Google Voice for Fx? Change xpconnect so that it squelches the function call in that condition? I think we need some help from xpconnect experts here.
I can run with a debug build and see if this reproduces. Though, this is a JS engine bug. What slot is not found here?
(google voice uses flash by the way, and flash is know to make a lot of weird calls into JS, but still, we shouldn't assert here)
(In reply to comment #16)
> I can run with a debug build and see if this reproduces. Though, this is a JS
> engine bug. What slot is not found here?

I'm not convinced it is a JS engine bug. See comment 8: the reason the slot is not found is that we are calling a function that belongs (i.e., has as its global) a window that has been torn down, which clears out its slots. Asserting this way is new because of the global var optimizations: we just read the slot directly because it's compile-n-go. Maybe we should start making calls to closed windows fail before they even start.
The api doesn't disallow doing clear scope and then calling a function, does it?
(In reply to comment #19)
> The api doesn't disallow doing clear scope and then calling a function, does
> it?

It seems like it should. If a global object has permanent properties, the only reason those properties should be removed is if the object is no longer reachable by language code. Deleting them, and then invoking code which has baked in assumptions about the object layout, is bizarre.

We don't want to introduce shape checks on GETGLOBAL, it'd cost us perf (we already lost some on SETGLOBAL for spec conformance).

It's hard to imagine that the JS code would run properly, even with shape guards. Why exactly is this being done?
The clear should compensate for this and refuse to remove read-only properties. That might be a security issue though (for the outer window, not for the inner).
Even if it's only permanent properties (that would fix this bug), what does it mean to grab a random script in compile-n-go code, clear out the global, and try to run it again?
I am ok with throwing in the API if we can.
This seems to be causing the orange in bug 620991.  Specifically, it seems to be breaking a clearInterval call in one of our tests, which is interfering with mochitest-chrome runs on Windows XP.

See bug 620991 comment 73 and later for an analysis of what's happening here.
Whiteboard: hardblocker
Smaug, can you help us out here? This is related to nsGlobalWindow::ClearScope. Recap of the problem so you don't have to read all the history above:

STR are in comment 0. The basic idea is that we are calling JS scripted functions that belong to a window that has been ClearScope'd. In Fx4.0, this is very bad, because there is a new optimization for global variables that requires that the global object not be invalidated. I guess we might someday not call ClearScope, but for now, it would probably be OK to prevent those calls from going through. (See Details below for full details.)

Does stopping those calls make sense? jst suggested that you would know an efficient way to detect those calls and the right places to detect them.

Details:

Following those STR, we assert in the JS engine in a bad way (may crash) like this:

1. nsGlobalWindow::ClearScope gets called (via either ReallyClearScope or SetDocShell, it looks like).

2. An event handler tries to call a JS function that belongs to that global. Specific steps here: The outer event loop generates an XHR stop-request event, calling XHR::OnStopRequest. This detects that the target isn't really there anymore, so it generates an XHR error. This generates a DOM change event (ready state change, I think), and that tries to call a JS event handler function.

3. DEBUG_CheckForComponentsInScope detects that we are calling a method on a window with cleared scope and shows a dialog (in debug builds, of course).

4. We continue trying to call the event handler, calling JS_CallFunctionValue. This calls a JS scripted function.

5. We assert trying to read the value of a global variable declared with 'var'.

The reason this now asserts is that for a read of a global variable declared with 'var' (or any other permanent property of the global object), we compile that into a direct slot read for any compile-and-go script. So, a script that is compiled with compile-and-go (a) must only ever be called with that same global object, and (b) may not be called after that global is JS_ClearScope'd. Making a call that violate (a) or (b) is now a JSAPI violation and can be expected to crash.
Event listeners don't get called if the inner window their (current)target is
bound to is not the current inner window of the outer window.
But is this a case where the event listener lives in a different context
than the event target (XHR)?

I'll try to reproduce this.
Can't reproduce on linux.
Will try on osx.

But anyway, if the event listener is from different context than than the
target of the event, there isn't much DOM can do.
XPConnect might be able to detect?
Attached file a minimal testcase
This triggers the same assertion, without any event handling.
Comment on attachment 505053 [details]
a minimal testcase

And DEBUG_CheckForComponentsInScope doesn't seem to detect this case.
Just in case this is better for debugging...
(In reply to comment #25)
> The reason this now asserts is that for a read of a global variable declared
> with 'var' (or any other permanent property of the global object), we compile
> that into a direct slot read for any compile-and-go script. So, a script that
> is compiled with compile-and-go (a) must only ever be called with that same
> global object, and (b) may not be called after that global is JS_ClearScope'd.
> Making a call that violate (a) or (b) is now a JSAPI violation and can be
> expected to crash.
I don't understand how anything in DOM could prevent that happening.
It is rather radical change to the API, IMO.
(Not that I understand JS API too well :) )
In the testcase it is just js in window A using a js function from window B, 
and window B just happened to be destroyed already.
Why do we clear scope that object if the window is destroyed? That seems unnecessary.
Blake or jst should know. See also Bug 470510.
This doesn't destroy the window the same way as previous test, yet the assertion crash happens, though not 100% reliably.
FYI, on 3.6 I get
WARNING: Not same origin error!: file /home/smaug/mozilla/hg/192/dom/base/nsJSEnvironment.cpp, line 478
JavaScript error: javascript:var%20foo%20=%20'bar';%20function%20test()%20{%20return%20foo;%20};%20void(0);, line 1: foo is not defined
(In reply to comment #33)
> Why do we clear scope that object if the window is destroyed? That seems
> unnecessary.

It's basically leftovers from the pre-inner/outer window era. We've wanted to remove that for some time now, but doing so causes leaks in some cases that we have yet to track down.
smaug, jst, IIUC, as long as we call JS_ClearScope when 'nuking' the global window, DOM can't prevent a violation of our new API requirement, because it can be done with pure JS (windows A, B, fun on A has ref to fun on B, B gets closed, call fun on A), where DOM has no opportunity to even see the violation. 

So, I think we should try not to call JS_ClearScope. Two ways to do this come to mind:

1. Stop calling JS_ClearScope and debug the memory leaks.

2. Instead of calling JS_ClearScope, set all the property values of the global object to |undefined| (possibly using a new JSAPI function for this purpose).

#2 seems hackier but I'm not sure we have the time for #1. Do you think these memory leaks could reasonably be debugged in time for Fx4?
#2 is not sound from a security perspective. The property names alone might leak information. Debugging leaks is a real pain though and we might not catch all of them (only those our tests currently finger).
smaug, do you have cycles to debug the leaks we've seen from removing the JS_ClearScope calls in the past? IIRC we were seeing leaks during mochitest runs, so try could help there. Peter, do you think this is too risky at this point, from a leak point of view or otherwise?

Andreas, I don't see a security problem with #2, leaving the properties on the old inner with undefined values is no worse than not clearing scope at all AFAICT.
As long we make a new JSAPI and call that only in the case that the window is going dead that might work, but we can't make JS_ClearScope behave like this.
(In reply to comment #42)
> As long we make a new JSAPI and call that only in the case that the window is
> going dead that might work, but we can't make JS_ClearScope behave like this.

Yes, it would be some new API function that no one else is allowed to use. I was kind of hoping that we might be able to get away with just calling JS_DeleteProperty or the like on just a few selected things, but I don't know how hard it would be to find those things.
(In reply to comment #41)
> smaug, do you have cycles to debug the leaks we've seen from removing the
> JS_ClearScope calls in the past? IIRC we were seeing leaks during mochitest
> runs, so try could help there. Peter, do you think this is too risky at this
> point, from a leak point of view or otherwise?

Changing things which might cause leaks at this point sounds risky.

Also, are the leaks really the only problem; is 
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp?mark=1905-1909#1885 not a problem anymore?

Also, are we talking about all the ClearScope calls in nsGlobalWindow, or 
only the one Bug 470510 removes? That bug removes the call from
nsGlobalWindow::FreeInnerObjects and seems to fix my testcases.
(well, the patch doesn't apply to trunk).
(In reply to comment #44)
> (In reply to comment #41)
> > smaug, do you have cycles to debug the leaks we've seen from removing the
> > JS_ClearScope calls in the past? IIRC we were seeing leaks during mochitest
> > runs, so try could help there. Peter, do you think this is too risky at this
> > point, from a leak point of view or otherwise?
> 
> Changing things which might cause leaks at this point sounds risky.

I kind of figured that too.

> Also, are the leaks really the only problem; is 
> http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp?mark=1905-1909#1885
> not a problem anymore?
> 
> Also, are we talking about all the ClearScope calls in nsGlobalWindow, or 
> only the one Bug 470510 removes? That bug removes the call from
> nsGlobalWindow::FreeInnerObjects and seems to fix my testcases.
> (well, the patch doesn't apply to trunk).

We would be removing only JS_ClearScope calls after which a method can be called with that global object; I'm not sure what set that is. I'm also still unclear on which is the global object, the inner or outer window. AFAIK only the real global object matters.

A thought occurs to me for a simple hack that might solve the immediate problem here. We could provide a variant of JS_ClearScope that is exactly like the current one, except that (a) it doesn't shrink the slot vector (so that reads to it are memory safe) and (b) sets all the values in the slot vector to |undefined| (so that reads are type safe and don't leak information). But the properties would still be gone, so that unoptimized property reads or queries would be errors, exactly as for JS_ClearScope'd objects.

I think that might not quite work, because I think there is code in ClearScope that does stuff to the object after calling JS_ClearScope, which could shrink its slot vector at that point. But maybe we can solve that.

Taking a broader view of the problem might help, too. If we could figure out how this would ideally work, that might point to an approximation we can hack up quickly. The fundamental problem with these "bad" calls is that the system (a) creates permanent properties on the global object, then (b) deletes those properties, and (c) then runs scripts that were compiled for that global. Deleting permanent properties on a live object seems bad. I don't know exactly what requirements the JS_ClearScopes are providing on the DOM side, but it seems we should be able to find a way to satisfy them a better way.
I would imagine that a JS_ClearScope variant that simply set all properties to undefined and nothing else would do the trick here. I doubt any of the other stuff the JS_ClearScope() does is important in this case.
(In reply to comment #44)
> Changing things which might cause leaks at this point sounds risky.

Agreed.

> Also, are the leaks really the only problem; is 
> http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp?mark=1905-1909#1885
> not a problem anymore?

mrbkap should confirm this, but I doubt that's relevant any longer since we brain transplant the outer window nowadays, so nothing should leak even w/o clearing the outer's scope. It's the inner that's important here AFAICT.

> Also, are we talking about all the ClearScope calls in nsGlobalWindow, or 
> only the one Bug 470510 removes? That bug removes the call from
> nsGlobalWindow::FreeInnerObjects and seems to fix my testcases.
> (well, the patch doesn't apply to trunk).

Long term they should all go, short term maybe replace all of them with a new variant that doesn't actually delete the properties?
Attached patch patch (obsolete) — Splinter Review
Add an inline method to JSObject that clears alll slots. Call this instead of JS_ClearScope() right before the window dies.
(In reply to comment #48)
> Created attachment 505664 [details] [diff] [review]
> patch
> 
> Add an inline method to JSObject that clears alll slots. Call this instead of
> JS_ClearScope() right before the window dies.

Doesn't branding make it unsafe to do that without changing shape? What if there is a PIC or property cache entry that points to a branded slot value, and then we clear it, it gets GC'd, and then we try to call it?

We also call JS_DefineProperty and JS_ClearRegExpStatics on the inner window after calling JS_ClearScope, so we have to make sure whatever we do doesn't trash the object so badly those fail, or else do something different there.
I'm starting to think maybe we shouldn't fix this. It doesn't seem like that serious of a problem--the worst that can happen is an invalid read, and it seems only to happen with extensions. (Andreas, are you still seeing this with Google Voice?)

Anyway, I tried removing the JS_ClearScope call from nsJSContext::ClearScope and that had leaks on mochitests 1, 2, and 4. I don't think we have time to debug that.

Otherwise, we could try clearing out the slot values, while keeping the properties, but:

- The 'window' property must not be cleared out. Currently, the code reads that property and then defines it again after JS_ClearScope as non-writable, non-configurable. So it seems we would need to either (a) skip clearing that slot, or (b) after blanking everything, make that property writable, set it, and then make it non-writable again.

- Branded objects and method-valued properties are also a problem. For them, I guess we could unbrand the object? Not sure if we need to do anything else.

It's hard to see a solution that isn't pretty hacky. So I worry we'd run a high risk of regressions trying to fix this. 

I'd rather go with a simpler mitigating measure, but I'm not sure what that could be, either. We could try not shrinking the slots. But I think GC can shrink them anyway at any time.

Or maybe we could overwrite the slots only for the "builtin" properties of the global object (global reserved slots? not sure exactly what they would be)? Not sure if there's a way to do that.
What, not fixing this?
The assertion may happen with any web page. I could assume that several
even a bit more complex web apps will trigger it - it is just very easy
to write such code even accidentally.
(In reply to comment #51)
> What, not fixing this?
> The assertion may happen with any web page. I could assume that several
> even a bit more complex web apps will trigger it - it is just very easy
> to write such code even accidentally.

I'm just not convinced we have any quick fix that hurts us less than this bug. I'm trying a super-hacky variant of clearing the values, though. Maybe it will work.
Attached patch v2 (obsolete) — Splinter Review
This patch fixes the comment 0 test case (we get the called on thing with no components warning, but no assert in JS). Currently running on try.

I started with Andreas's patch. It turns out that:

- we better not clear the regexp statics, because nsJSContext::ClearSlots wants to clear that
- |obj| can be an outer window proxy, in which case we have to call JS_ClearSlots as before

I unbrand the object before clearing, to avoid any future problems there. This can fail (should be very rare), which is a small problem. In that case, I just call JS_ClearSlots as usual. It seems we could also consider not doing anything in that case,
Assignee: general → dmandelin
Attachment #505664 - Attachment is obsolete: true
Status: NEW → ASSIGNED
(continuing sentence cut off in comment 53), in which case we would leak. Again, that should be very rare.
What does unbrand do? And when does it fail?
Catching up on bugmail, sorry for not getting here sooner.

Dave, unbrand does not fail, rather it returns false if shape overflow could not be cured by a GC. No error, rather a disable-caches/abort-or-bail-traces thing.

JS_ClearScope is a bad old sharp API from the dark ages. We really need to debug those leaks, but in the mean time, changing it to leave the object's slots vector intact and setting almost all slots to undefined seems best.

/be
Attached patch v3 (obsolete) — Splinter Review
Attachment #506851 - Flags: review?(jwalden+bmo)
Attachment #506851 - Flags: review?(Olli.Pettay)
Attachment #506048 - Attachment is obsolete: true
Comment on attachment 506851 [details] [diff] [review]
v3


>-    JS_ClearScope(mContext, obj);
>+    // Hack fix for bug 611653. This replaces a call to JS_ClearScope, which
>+    // was required to avoid leaks. The intent of this code is to try to avoid
>+    // the leaks without shrinking the slot array by clearing the values of
>+    // the object.
>+    if (obj->isNative())
Could you add a comment why this 'if ()'

Also, style nit, 
if () {

} else {

}
even if there is just one statement.


I'll test this a bit still today, once my build is ready.
Attachment #506851 - Flags: review?(mrbkap)
Comment on attachment 506851 [details] [diff] [review]
v3

>+JS_FRIEND_API(bool)
>+js_UnbrandAndClearSlots(JSContext *cx, JSObject *obj)
>+{
>+    JS_ASSERT(obj->isNative());
>+    JS_ASSERT(obj->isGlobal());
>+
>+    if (!obj->unbrand(cx))
>+        return false;
>+
>+    for (int key = JSProto_Null; key < JSProto_LIMIT * 3; key++)
Not that I'm reviewing js/ part of the patch at all, but where is the * 3 coming?

And nit, ++key is faster than key++ (unless the latter is optimized).


Seems to work in the cases I tried. Things just became undefined.
But mrbkap or jst should review this too.
Attachment #506851 - Flags: review?(Olli.Pettay) → review+
Attached patch v4 (fixes style, comments) (obsolete) — Splinter Review
This just picks up Smaug's stylistic comments, so I won't spam everyone with canceled and new review requests.
For an int or other POD type ++ prefix vs. postfix does not matter if the result is not used.

Now the if-else is overbraced, contrary to JS house style! Smaug, we do not follow the Gecko C++ style guide of always brace (not all Gecko hackers do, either, for short control-flow-transfer consequents: break/continue/return).

Bleah.

/be
Oh, but that if-else was in nsJSEnvironment.cpp -- then (brace yourself dmandelin) it was using the Gecko brace style, but it wrongly indents by 4 not 2.

/be
Comment on attachment 506897 [details] [diff] [review]
v4 (fixes style, comments)

>+    for (int key = JSProto_Null; key < JSProto_LIMIT * 3; key++)
>+        JS_SetReservedSlot(cx, obj, key, JSVAL_VOID);

Aha, now I can finally make a substantive comment:

JSProto_LIMIT * 3 should be a macro #defined in jsapi.h to the value from JSCLASS_GLOBAL_FLAGS:

#define JSCLASS_GLOBAL_FLAGS                                                  \
    (JSCLASS_IS_GLOBAL |                                                      \
     JSCLASS_HAS_RESERVED_SLOTS(JSProto_LIMIT * 3 + JSRESERVED_GLOBAL_SLOTS_COUNT))

/be
Attached patch v4.1 (more style nits) (obsolete) — Splinter Review
Attachment #506897 - Attachment is obsolete: true
Attachment #506851 - Flags: review?(jwalden+bmo) → review?(cam)
Attachment #506963 - Attachment is obsolete: true
Comment on attachment 506851 [details] [diff] [review]
v3

Argh, sent review to cam on wrong bug. (Should have been for bug 628612.)
Attachment #506851 - Flags: review?(cam)
Comment on attachment 506972 [details] [diff] [review]
v5 (use global reserve slots formula)

>+    // Hack fix for bug 611653. Originally, this always called JS_ClearScope,
>+    // which was required to avoid leaks. But for native objects, the JS
>+    // engine has an optimization that requires that permanent properties of
>+    // the global object are never deleted. So instead, we call a new special
>+    // API that clears the values of the global, thus avoiding leaks without
>+    // We just overwrote all slots to undefined, so the freelist has
>+    // been trashed. We need to clear the head pointer or else we will
>+    // crash later. This leaks slots but the object is all but dead
>+    // anyway.
>+    if (obj->hasPropertyTable())
>+        obj->lastProperty()->table->freelist = SHAPE_INVALID_SLOT;
. . .
>+/*
>+ * Hack fix for bug 611653: Do not use for any other purpose.
>+ *
>+ * Unbrand and set all slot values to undefined (except reserved slots that
>+ * are not used for cached prototypes).
>+ */
>+JS_FRIEND_API(bool) js_UnbrandAndClearSlots(JSContext *cx, JSObject *obj);

These don't follow prevailing comment and declaration style but that may be good, since they're hack fix parts and should not be used or emulated.

shaver used to like this when I did it, inspired by old BSD Unix code I'd seen (I think Bill Joy code):

#ifdef HACK
  { int ugly; /* two space indentation, { on same line as first stmt! */
    . . .
  }
#endif

Breaks all the rules!

/be
Comment on attachment 506972 [details] [diff] [review]
v5 (use global reserve slots formula)

So I hope you will accept my drive-by r+.

/be
Attachment #506972 - Flags: review+
(In reply to comment #68)
> Comment on attachment 506972 [details] [diff] [review]
> v5 (use global reserve slots formula)
> 
> So I hope you will accept my drive-by r+.
> 
> /be

Of course! :-)

http://hg.mozilla.org/tracemonkey/rev/b5ca98debed0
Whiteboard: hardblocker → hardblocker, fixed-in-tracemonkey
Attachment #506851 - Flags: review?(mrbkap) → review+
Since this landed, every single opt Mac mochitest-chrome has failed with the same set of 8 things starting with "test_autocomplete2.xul | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - parseInt is not a function at chrome://global/content/bindings/autocomplete.xml:0"
Backed out due to the Mac (opt-only actually) orange:

http://hg.mozilla.org/tracemonkey/rev/36a4ecbcb4be

It looks like "test_autocomplete.xul", which runs only on Mac, is screwing things up.
Whiteboard: hardblocker, fixed-in-tracemonkey → hardblocker
The mochitest failures seem to be timing-related. They happen only on Mac opt builds. And even a few printfs seems enough to make them not reproduce. It seems that some global window object is probably getting closed while a test is running, and so the test sees undefined values it doesn't expect. I really don't know what global window objects are in play, and it may be hard to figure out without being able to print or debug. Anyway, I'll continue tomorrow.

But it might be better just to disable those tests on Mac: this bug is pretty serious and it could take a long time to figure the mochitest failure out, unless someone out there has good knowledge about how mochitest-chrome works, and what global window objects might be getting cleared in these tests.
Could we take one more run at figuring out the leaks that result from not doing this bogus clearing/voiding? I know that looks hard but maybe dbaron or peterv could see something the rest of us can't.

/be
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/b5ca98debed0
http://hg.mozilla.org/mozilla-central/rev/36a4ecbcb4be (backout)
Note: not marking as fixed because last changeset is a backout.
Attached patch v6 (fix autocomplete mochitests) (obsolete) — Splinter Review
Attachment #506851 - Attachment is obsolete: true
Attachment #506972 - Attachment is obsolete: true
Attachment #508561 - Flags: review?(enndeakin)
Whiteboard: hardblocker → [hardblocker][has patch]
Attachment #508561 - Attachment is obsolete: true
Attachment #508580 - Flags: review?(enndeakin)
Attachment #508561 - Flags: review?(enndeakin)
Comment on attachment 506851 [details] [diff] [review]
v3

Assuming the errantly-mis-transferred r? should have been re-requested, r+ on it, at least as far as total awful hacks go.
Attachment #506851 - Flags: review+
Comment on attachment 508580 [details] [diff] [review]
v7 (better fix autocomplete tests)

>+    if (obj->isNative()) {
>+      js_UnbrandAndClearSlots(mContext, obj);

Note you are ignoring the r.v. (not a bug, see below, though).

>+JS_FRIEND_API(bool)
>+js_UnbrandAndClearSlots(JSContext *cx, JSObject *obj)
>+{
>+    JS_ASSERT(obj->isNative());
>+    JS_ASSERT(obj->isGlobal());
>+
>+    if (!obj->unbrand(cx))
>+        return false;

This is a silent failure but you ignore it. As noted in comment 56, unbrand is not fallible. Its return status tells whether to proceed assuming property-caching/guarding/PIC'ing or not (shape overflow and exhaustion).

So just make this hack API void and don't check unbrand's r.v.

/be
Attachment #508580 - Flags: review?(enndeakin) → review+
Landed with Brendan's latest changes.

http://hg.mozilla.org/tracemonkey/rev/6882211c4a80
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch][fixed-in-tracemonkey]
http://hg.mozilla.org/mozilla-central/rev/6882211c4a80
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 631494
I think this caused the other part of bug 631494.
comment 81: Yes, I already have a fix.
Depends on: 632433
Target Milestone: --- → mozilla2.0b12
You need to log in before you can comment on or make changes to this bug.