Closed Bug 635548 Opened 13 years ago Closed 12 years ago

Spurious "attempt to run compile-and-go script on a cleared scope" errors in wake of bug 630072 patch

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla2.0b12
Tracking Status
firefox5 - wontfix
firefox6 - affected

People

(Reporter: brendan, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(2 files, 3 obsolete files)

Followup fix per bug 630072 comment 112.

/be
Attachment #513810 - Flags: review?(gal)
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P1
Summary: quora.com bloats out of control, part 1: js_UnbrandAndClearSlots leaks → Spurious "attempt to run compile-and-go script on a cleared scope" errors in wake of bug 630072 patch
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

Going for r/a=shaver too.

Pushed to try (http://tbpl.mozilla.org/?tree=MozillaTry&rev=8e56cdade7dc), expect this will look only as good or better.

/be
Attachment #513810 - Flags: review?(shaver)
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

r+a=shaver
Attachment #513810 - Flags: review?(shaver)
Attachment #513810 - Flags: review?(gal)
Attachment #513810 - Flags: review+
Attachment #513810 - Flags: approval2.0+
http://hg.mozilla.org/tracemonkey/rev/1a8670f6d41e

/be
Whiteboard: fixed-in-tracemonkey
Sorry I didn't see the bug/review request until now. This is a great.
Attachment #513810 - Flags: feedback+
Should this go into b12?

/be
http://hg.mozilla.org/mozilla-central/rev/4f8d5b10e4ef

/be
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Backed out on orange -- the containsSlot assertbotch, so something is accessing globals even though the script->hasGlobals() check failed (or compileAndGo? Hmm).

http://hg.mozilla.org/tracemonkey/rev/3a7bba27f6c3
http://hg.mozilla.org/mozilla-central/rev/e77f4eda0bad

/be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: fixed-in-tracemonkey
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_581593.js | Exited with code 1 during test run

Full log: http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1298176958.1298178899.1282.gz&fulltext=1

    JS_ASSERT(obj->containsSlot(slot));

assertion in JSOP_{GET,CALL}GLOBAL botching. This must mean that either script->compileAndGo is false or script->hasGlobals() is false for a (function) script that uses JSOP_*GLOBAL. Debugging...

/be
Status: REOPENED → ASSIGNED
Still rebuilding my tm debug build, but pacing around was productive. It occurred to me that the tree of compile-n-go functions rooted at the function to be run on a cleared global could be such that that outermost function had no globals, but inner functions do. We don't (and don't want to) check script->hasGlobals() and throw the new exception when inline-calling within js::Interpret.

Really want some kind of script->hasTransitiveGlobals() predicate to make this faster and not throw the new exception at pure functions that don't deserve it.

/be
http://hg.mozilla.org/tracemonkey/rev/c307f596458f

/be
Whiteboard: fixed-in-tracemonkey
Argh, wrong bug.

/be
Whiteboard: fixed-in-tracemonkey
Whiteboard: [hardblocker]
nope, backed out, sorry.
This looks like it has a reviewed, approved patch, but I'm not sure where it ended up.  Was it backed out as part of the search for the compiler bug?
Whiteboard: [hardblocker] → [hardblocker][has patch]
See comment 10. Need new patch.

/be
Whiteboard: [hardblocker][has patch] → [hardblocker]
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

(bookkeeping)
Attachment #513810 - Flags: approval2.0+
Attached patch proposed fix with test (obsolete) — Splinter Review
I fixed an old clear shell function bug and in the course of auditing code paths exercised by this patch I tidied up some things while there and in jsscript.cpp.

Pushed to try: http://tbpl.mozilla.org/?tree=MozillaTry&rev=d43713eeb533

/be
Attachment #515167 - Flags: review?(gal)
Whiteboard: [hardblocker] → [hardblocker] [has patch]
The test shows the new check's conservatism. Even though global_f.call()'s return value, the inner function that uses the global var x, is discarded, we reject the attempt to run global_f. Finer-grained checking would hit every inline call and methodjitted call.

I hope this is not still too conservative. It will help all the pure functions that use only arguments and have been rejected wrongly so far (this bug is about those at least).

/be
Comment on attachment 515167 [details] [diff] [review]
proposed fix with test

Why the new JS_THIS_OBJECT(cx, vp) in Clear?
Attachment #515167 - Flags: review?(gal) → review+
(In reply to comment #20)
> Comment on attachment 515167 [details] [diff] [review]
> proposed fix with test
> 
> Why the new JS_THIS_OBJECT(cx, vp) in Clear?

Because calling clear() (note no actual params) segfaults on uninitialized obj!

/be
And the usage message for clear is "clear([obj])" -- implying a method of its global. That's how the test uses it, for consistency with .evaluate() usage.

/be
Attached patch proposed fix with test, v2 (obsolete) — Splinter Review
Crucial fix, dvander was right to raise the issue. TCF_FUN_FLAGS had a latent bug: it should not have included TCF_HAS_SHARPS. I scaled back to propagating only TCF_TRANSITIVE_HAS_GLOBALS up the cg tree, post-order, in the emitter.

/be
Attachment #515167 - Attachment is obsolete: true
Attachment #515253 - Flags: review?(dvander)
Attachment #515253 - Flags: review?(dvander) → review+
jsreftests caught the mistake in trying to remove TCF_HAS_SHARPS. I commented.

/be
Attachment #515253 - Attachment is obsolete: true
Attachment #515260 - Flags: review+
(In reply to comment #25)
> http://tbpl.mozilla.org/?tree=MozillaTry&rev=d023f7ec2eea

As philor pointed out on IRC, still obj->containsSlot(slot) botches. I will debug.

/be
I ran under gdb to try to catch the obj->containsSlot(slot) assertion, and instead the tests ran all night and ended up spewing this endlessly. Anyone able to interpret?

I'm adding assertions to inline_call and elsewhere that we never call a function that (transitively) has globals and is scoped by a cleared global.

/be
Clearly there are many entry points other than via RunScript. One I'm hitting that blows up per comment 27 is:

#1991 0x000000010134bbc9 in nsNavBookmarks::InsertBookmark (this=0x105e8aca0, aFolder=93, aURI=0x12fcdd9f0, aIndex=-1, aTitle=@0x130b4db00, aNewBookmarkId=0x7fff5fbf5cb0) at ../../../../../toolkit/components/places/src/nsNavBookmarks.cpp:929
929	  NOTIFY_OBSERVERS(mCanNotify, mCacheObservers, mObservers,
(gdb) l
924	
925	  if (!mBatching) {
926	    ForceWALCheckpoint(mDBConn);
927	  }
928	
929	  NOTIFY_OBSERVERS(mCanNotify, mCacheObservers, mObservers,
930	                   nsINavBookmarkObserver,
931	                   OnItemAdded(*aNewBookmarkId, aFolder, index, TYPE_BOOKMARK,
932	                               aURI));
933	

which flows from native code into XPConnect, where

(gdb) down 5
#1986 0x0000000100f4ab49 in XPCConvert::NativeData2JS (ccx=@0x7fff5fbf5370, d=0x7fff5fbf5090, s=0x7fff5fbf5600, type=@0x7fff5fbf523b, iid=0x7fff5fbf54e0, pErr=0x0) at xpcprivate.h:3262
3262	        return NativeData2JS(lccx, d, s, type, iid, pErr);
(gdb) down
#1985 0x0000000100f2086c in XPCConvert::NativeData2JS (lccx=@0x7fff5fbf4d40, d=0x7fff5fbf5090, s=0x7fff5fbf5600, type=@0x7fff5fbf523b, iid=0x7fff5fbf54e0, pErr=0x0) at ../../../../../js/src/xpconnect/src/xpcconvert.cpp:485
485	                    if(!NativeInterface2JSObject(lccx, d, nsnull, helper, iid,
(gdb) 
#1984 0x0000000100f1f311 in XPCConvert::NativeInterface2JSObject (lccx=@0x7fff5fbf4d40, d=0x7fff5fbf5090, dest=0x0, aHelper=@0x7fff5fbf4c00, iid=0x7fff5fbf54e0, Interface=0x0, allowNativeWrapper=1, isGlobal=0, pErr=0x0) at ../../../../../js/src/xpconnect/src/xpcconvert.cpp:1147
1147	        XPCWrappedNativeScope::FindInJSObjectScope(cx, jsscope);
(gdb) 
#1983 0x0000000100f661cb in XPCWrappedNativeScope::FindInJSObjectScope (cx=0x13480feb0, obj=0x11aa8ecf0, OKIfNotInitialized=0, runtime=0x105c451d0) at ../../../../../js/src/xpconnect/src/xpcwrappednativescope.cpp:839
839	                                        OKIfNotInitialized, runtime);
(gdb) 
#1982 0x0000000100f65dce in DEBUG_CheckForComponentsInScope (cx=0x13480feb0, obj=0x11aa8ecf0, startingObj=0x128d8a6e8, OKIfNotInitialized=0, runtime=0x105c451d0) at ../../../../../js/src/xpconnect/src/xpcwrappednativescope.cpp:763
763	    xpc_DumpJSStack(cx, JS_TRUE, JS_TRUE, JS_TRUE);

we freak out dumping the JS stack over and over because there's no Components in the global for an object to convert (i=4 parameter to onItemAdded).

The global is indeed flagged as "cleared". But we haven't even entered the JS engine yet.

As Luke pointed out, we can also cross globals when calling inside a flow that entered via RunScript, so we'll have to check then too. That could hurt, although if it's only when static scope (global of callee vs. caller) changes it might be ok. But this is starting to look bad.

How much worse is removing JS_ClearScope or no-oping it and diagnosing the leaks? I have no idea, but if peterv were free I'd want him doing that in parallel as I work on this.

/be
Taking out JS_ClearScope is pretty high risk at this time. Peter and I can take turns diagnosing and fixing any resulting leaks we can see, but there might be a lot of stuff we don't see and we are out of betas. I will start working on an alternative patch to see how bad this is.
The cross-global-scope call case can't happen with TM. So that leaves JM and the interpreter needing checks. I'll work on that patch after getting this XPConnect debuggery out of the way.

/be
Agree the leak fixing could be too hard or time-consuming, but we don't yet know. Also, clearing is just wrong -- violates JS semantics even in the best light. It is true hanging onto functions scoped by old inner windows can entrain too much memory, so perhaps there are bloat bugs in addition to permanent leaks.

/be
Absolutely, that JS_ClearScope is dead wrong. I am just terrified to remove it now. Anyway, on it.
Whiteboard: [hardblocker] [has patch] → [hardblocker]
I have been looking at taking out the scope clearing and diagnosing the leaks, but with leaks it's always hard to predict when we'll nail the last one :-/.
blocking2.0: final+ → ---
Whiteboard: [hardblocker]
unblocking on this, since we don't have a good idea of the severity, but we know it is not particularly common.
Not common? Well, simply hitting the reload button on my test WebGL page causes that message to pop-up (it does not happen when the page first loads).

It even happens on this tutorial:
http://learningwebgl.com/lessons/lesson06/index.html 

(load it and then reload it with the reload button...

Result: attempt to run compile-and-go script on a cleared scope
[Break On This Error] requestAnimFrame(tick); 

)
This is a show stopper for our product. Please address.
Pushing the test for bug 642145 to try makes subsequent chrome mochitests fail with this error. Also, if I run the test case locally, it first passes (assuming the fix for bug 642145 is applied, too) but then if I press the reload button in the browser window to rerun the test, I get this error.

Do you have advice on how I could modify the test to work around this bug so that I can get the test landed without introducing a fix for this bug as a dependency?
Blocks: 651465
Hey all,

i get the error in this here:
function renderallcontexts()
{
	glInfo.glrenderscene();	
}

The function is called in body onload. 
glInfo is a global variable and inside 
glrenderscene() I render on a canvas 
using webgl.
The first testcase in bug 655435 when closing/refreshing the page with it running always gives this error in 4.0.1 and above. I think its cause is bug 655227 related.
I get this error quite often.  It always points to the same place in my code where I am notifying an observer:

myObserver.prototype = {
	observe: function(subject, topic, data) {
We see hundreds of these messages when testing Firebug, all seem to point to the first statement in an observer. 

We see this now occasionally in our consoleService observer, where it causes an infinite number of error messages by triggering itself. We've not isolated the steps to trigger it.
Jan "Honza" Odvarko fix the problem in our code. We stopped seeing these messages. The fix seems be related to removing observers loaded in popup windows.
This is tracking-firefox5, so it deserves a summary comment.

1. The ultimate problem is that closing a browser window calls JS_ClearScope. Prior to Fx4, this nuked all the properties of the object. Semantically, this is very bad, because it removes properties that are non-configurable, which ES disallows. Practically, it is also bad, because if the script accesses a global variable, it may contain a baked-in slot load, and JS_ClearScope nukes that slot, so we would make a memory-unsafe read.

For Fx4, we changed this (most recently in bug 630072) so that JS_ClearScope sets a flag on the global object to indicate that it is cleared, and RunScript (the primary entry point that runs a script) checks that flag, and throws an exception if the global object of the script has been cleared.

2. The immediate problem here is (A) that we tend to get a lot of those cleared-global errors in practice, and they are annoying to look at. Also, (B) if the script being called doesn't actually use global variables, then it's perfectly safe to run it, so the error can stop perfectly valid and useful code from running.

3. The proposed fix here was to have the front end track which functions actually use global variables, and throw an exception only if the function would actually use globals. This fixes problem (B), and it alleviates problem (A).

But Brendan points out in comment 28 that the entire approach (of forbidding calls to scripts that would unsafely access properties of cleared globals) is seriously flawed, because there are plenty of ways to unsafely access cleared globals that don't go through RunScript.

4. Options to fix this:

4.1. The real fix is clearly to get rid of JS_ClearScope, which is bug 470510. This is apparently hard, and not expected to be done quickly.

4.2. We could still take the patch here, which will mitigate the problem. I don't think the patch here makes anything worse.

4.3. We could retry the approach of replacing the properties of the global with dummy values in JS_ClearScope. We had done that before, but it seems that left us with a bunch of leaks. Bug 630072 comment 60 has more information on that, but it ends with the ominous "There might be more problems still". So it looks like we might back with the same hard problems we get with bug 470510 if we try to go that way.

5. My recommendation. It depends on what our goal actually is here:

  (i) Less annoying spew messages. 

      - One thing we could do is not throw the error--just don't run the 
        script. I know it sounds bad, but it might actually be an improvement
        over now.

      - We could also land the patch in this bug, which should cut things
        down somewhat.

  (ii) Be able to run somewhat more code that is parented to a closed window.

      - Then land the patch in this bug.

  (iii) Be able to run any code that is parented to a closed window.

      - Then we have to fix bug 470510.
(iv) give the identity of the cleared scope and script in the error message and 
     an event when the scope is cleared, 
     so devs can find the observer registration and 
     unregister it before the clear.
peterv has a browser building and no longer leaking without JS_ClearScope. I will ask him to post a summary of how much more work is needed to land it.
Removal of JS_ClearScope is being tracked in bug 637099.
flag changes mean we're not going to take any mozilla-side changes for Firefox 5 and this seems to have been worked around in Firebug for 5.
This problem is not related to Firebug. We just happened to complain about it.
Depends on: 637099
It would seem to me that John's (#44) suggestion of at least giving the developer more information about the error would be very helpful and would have very low risk. Can we at least get that in to 7?

The current warning of "attempt to run compile-and-go script on a cleared scope" doesn't really make much sense to developers. I don't believe I have seen the term used before outside of this error and from all the questions on the web most developers have not either. Comment #43 is the first good explanation I have seen (thank you very much for it).
Just quick note that may be obvious to those working on this, but wasn't obvious to me. Use of bind() causes a function to raise this error after a scope clear also.

So the following will raise the error
function() {  this.doSomething(); }.

but the following will not
var this2 = this;
function() {  this2.doSomething(); }.
(In reply to David Rees from comment #49)
> It would seem to me that John's (#44) suggestion of at least giving the
> developer more information about the error would be very helpful and would
> have very low risk. Can we at least get that in to 7?
> 
> The current warning of "attempt to run compile-and-go script on a cleared
> scope" doesn't really make much sense to developers. I don't believe I have
> seen the term used before outside of this error and from all the questions
> on the web most developers have not either. Comment #43 is the first good
> explanation I have seen (thank you very much for it).

IIUC, we're getting rid of the calls to ClearScope, so this will go away eventually.
So bug 637099 is fixed. What now? I'm not a good owner, and the bug Henri blocked on this bug is unowned too. Who is in charge? :-P

/be
Assignee: brendan → general
(In reply to Brendan Eich [:brendan] from comment #52)
> So bug 637099 is fixed. What now? I'm not a good owner, and the bug Henri
> blocked on this bug is unowned too. Who is in charge? :-P

AFAIK this bug is now done per comment 51. I'll file a lower-priority followup to remove ClearScope entirely. It still does have 1 use in Gecko.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → WORKSFORME
A few more than one, but not many. There's one remaining case where we do clear a global object in the browser and that's in our js component loader when we're unloading JS components, i.e. not something that can directly be controlled by webpages. The other remaining calls are only a hand-full, specifically some js shell and xpcshell code, the js component loader case, and one call in a quickstub for DOM storage. I just filed bug 749392 on removing the remainders of these calls, which is something Peter and I have discussed several times already, but neither of us filed a bug on it until now.

Given that, I'm marking this fixed as I don't believe there's any remaining ways in the browser to trigger the error that this bug was filed for.
Resolution: WORKSFORME → FIXED
You need to log in before you can comment on or make changes to this bug.