Last Comment Bug 635548 - Spurious "attempt to run compile-and-go script on a cleared scope" errors in wake of bug 630072 patch
: Spurious "attempt to run compile-and-go script on a cleared scope" errors in ...
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: P1 major with 9 votes (vote)
: mozilla2.0b12
Assigned To: general
:
Mentors:
Depends on: 637099
Blocks: 661405 719285 630072 642145 651465
  Show dependency treegraph
 
Reported: 2011-02-19 14:48 PST by Brendan Eich [:brendan]
Modified: 2012-04-26 14:39 PDT (History)
60 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
-
affected


Attachments
Refuse to run against cleared global only if script->hasGlobals() (1.68 KB, patch)
2011-02-19 14:48 PST, Brendan Eich [:brendan]
shaver: review+
gal: feedback+
Details | Diff | Review
proposed fix with test (16.64 KB, patch)
2011-02-25 13:07 PST, Brendan Eich [:brendan]
gal: review+
Details | Diff | Review
proposed fix with test, v2 (16.29 KB, patch)
2011-02-25 16:51 PST, Brendan Eich [:brendan]
dvander: review+
Details | Diff | Review
patch to commit, pushing to try first (15.85 KB, patch)
2011-02-25 17:17 PST, Brendan Eich [:brendan]
brendan: review+
Details | Diff | Review
fragment of stdout/stderr from looping browser chrome mochitests (134.23 KB, text/plain)
2011-02-26 07:59 PST, Brendan Eich [:brendan]
no flags Details

Description Brendan Eich [:brendan] 2011-02-19 14:48:02 PST
Created attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

Followup fix per bug 630072 comment 112.

/be
Comment 1 Brendan Eich [:brendan] 2011-02-19 14:54:25 PST
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

Going for r/a=shaver too.

Pushed to try (http://tbpl.mozilla.org/?tree=MozillaTry&rev=8e56cdade7dc), expect this will look only as good or better.

/be
Comment 2 Mike Shaver (:shaver -- probably not reading bugmail closely) 2011-02-19 16:39:10 PST
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

r+a=shaver
Comment 3 Brendan Eich [:brendan] 2011-02-19 16:44:22 PST
http://hg.mozilla.org/tracemonkey/rev/1a8670f6d41e

/be
Comment 4 Andreas Gal :gal 2011-02-19 16:53:38 PST
Sorry I didn't see the bug/review request until now. This is a great.
Comment 5 Brendan Eich [:brendan] 2011-02-19 20:42:29 PST
Should this go into b12?

/be
Comment 6 Mike Shaver (:shaver -- probably not reading bugmail closely) 2011-02-19 20:43:16 PST
yes.
Comment 7 Brendan Eich [:brendan] 2011-02-19 20:57:15 PST
http://hg.mozilla.org/mozilla-central/rev/4f8d5b10e4ef

/be
Comment 8 Brendan Eich [:brendan] 2011-02-19 22:37:50 PST
Backed out on orange -- the containsSlot assertbotch, so something is accessing globals even though the script->hasGlobals() check failed (or compileAndGo? Hmm).

http://hg.mozilla.org/tracemonkey/rev/3a7bba27f6c3
http://hg.mozilla.org/mozilla-central/rev/e77f4eda0bad

/be
Comment 9 Brendan Eich [:brendan] 2011-02-19 22:42:09 PST
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_581593.js | Exited with code 1 during test run

Full log: http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1298176958.1298178899.1282.gz&fulltext=1

    JS_ASSERT(obj->containsSlot(slot));

assertion in JSOP_{GET,CALL}GLOBAL botching. This must mean that either script->compileAndGo is false or script->hasGlobals() is false for a (function) script that uses JSOP_*GLOBAL. Debugging...

/be
Comment 10 Brendan Eich [:brendan] 2011-02-19 23:00:19 PST
Still rebuilding my tm debug build, but pacing around was productive. It occurred to me that the tree of compile-n-go functions rooted at the function to be run on a cleared global could be such that that outermost function had no globals, but inner functions do. We don't (and don't want to) check script->hasGlobals() and throw the new exception when inline-calling within js::Interpret.

Really want some kind of script->hasTransitiveGlobals() predicate to make this faster and not throw the new exception at pure functions that don't deserve it.

/be
Comment 11 Brendan Eich [:brendan] 2011-02-20 00:39:47 PST
http://hg.mozilla.org/tracemonkey/rev/c307f596458f

/be
Comment 12 Brendan Eich [:brendan] 2011-02-20 00:41:02 PST
Argh, wrong bug.

/be
Comment 13 Robert Sayre 2011-02-22 14:53:48 PST
is this fixed by http://hg.mozilla.org/mozilla-central/rev/1a8670f6d41e ?
Comment 14 Robert Sayre 2011-02-22 14:54:30 PST
nope, backed out, sorry.
Comment 15 Mike Shaver (:shaver -- probably not reading bugmail closely) 2011-02-23 00:03:01 PST
This looks like it has a reviewed, approved patch, but I'm not sure where it ended up.  Was it backed out as part of the search for the compiler bug?
Comment 16 Brendan Eich [:brendan] 2011-02-23 09:08:07 PST
See comment 10. Need new patch.

/be
Comment 17 Mike Beltzner [:beltzner, not reading bugmail] 2011-02-23 11:42:49 PST
Comment on attachment 513810 [details] [diff] [review]
Refuse to run against cleared global only if script->hasGlobals()

(bookkeeping)
Comment 18 Brendan Eich [:brendan] 2011-02-25 13:07:46 PST
Created attachment 515167 [details] [diff] [review]
proposed fix with test

I fixed an old clear shell function bug and in the course of auditing code paths exercised by this patch I tidied up some things while there and in jsscript.cpp.

Pushed to try: http://tbpl.mozilla.org/?tree=MozillaTry&rev=d43713eeb533

/be
Comment 19 Brendan Eich [:brendan] 2011-02-25 13:12:56 PST
The test shows the new check's conservatism. Even though global_f.call()'s return value, the inner function that uses the global var x, is discarded, we reject the attempt to run global_f. Finer-grained checking would hit every inline call and methodjitted call.

I hope this is not still too conservative. It will help all the pure functions that use only arguments and have been rejected wrongly so far (this bug is about those at least).

/be
Comment 20 Andreas Gal :gal 2011-02-25 14:56:23 PST
Comment on attachment 515167 [details] [diff] [review]
proposed fix with test

Why the new JS_THIS_OBJECT(cx, vp) in Clear?
Comment 21 Brendan Eich [:brendan] 2011-02-25 16:07:43 PST
(In reply to comment #20)
> Comment on attachment 515167 [details] [diff] [review]
> proposed fix with test
> 
> Why the new JS_THIS_OBJECT(cx, vp) in Clear?

Because calling clear() (note no actual params) segfaults on uninitialized obj!

/be
Comment 22 Brendan Eich [:brendan] 2011-02-25 16:08:16 PST
And the usage message for clear is "clear([obj])" -- implying a method of its global. That's how the test uses it, for consistency with .evaluate() usage.

/be
Comment 23 Brendan Eich [:brendan] 2011-02-25 16:51:17 PST
Created attachment 515253 [details] [diff] [review]
proposed fix with test, v2

Crucial fix, dvander was right to raise the issue. TCF_FUN_FLAGS had a latent bug: it should not have included TCF_HAS_SHARPS. I scaled back to propagating only TCF_TRANSITIVE_HAS_GLOBALS up the cg tree, post-order, in the emitter.

/be
Comment 24 Brendan Eich [:brendan] 2011-02-25 17:17:54 PST
Created attachment 515260 [details] [diff] [review]
patch to commit, pushing to try first

jsreftests caught the mistake in trying to remove TCF_HAS_SHARPS. I commented.

/be
Comment 25 Brendan Eich [:brendan] 2011-02-25 17:27:39 PST
http://tbpl.mozilla.org/?tree=MozillaTry&rev=d023f7ec2eea

/be
Comment 26 Brendan Eich [:brendan] 2011-02-25 22:15:12 PST
(In reply to comment #25)
> http://tbpl.mozilla.org/?tree=MozillaTry&rev=d023f7ec2eea

As philor pointed out on IRC, still obj->containsSlot(slot) botches. I will debug.

/be
Comment 27 Brendan Eich [:brendan] 2011-02-26 07:59:15 PST
Created attachment 515347 [details]
fragment of stdout/stderr from looping browser chrome mochitests

I ran under gdb to try to catch the obj->containsSlot(slot) assertion, and instead the tests ran all night and ended up spewing this endlessly. Anyone able to interpret?

I'm adding assertions to inline_call and elsewhere that we never call a function that (transitively) has globals and is scoped by a cleared global.

/be
Comment 28 Brendan Eich [:brendan] 2011-02-26 12:43:06 PST
Clearly there are many entry points other than via RunScript. One I'm hitting that blows up per comment 27 is:

#1991 0x000000010134bbc9 in nsNavBookmarks::InsertBookmark (this=0x105e8aca0, aFolder=93, aURI=0x12fcdd9f0, aIndex=-1, aTitle=@0x130b4db00, aNewBookmarkId=0x7fff5fbf5cb0) at ../../../../../toolkit/components/places/src/nsNavBookmarks.cpp:929
929	  NOTIFY_OBSERVERS(mCanNotify, mCacheObservers, mObservers,
(gdb) l
924	
925	  if (!mBatching) {
926	    ForceWALCheckpoint(mDBConn);
927	  }
928	
929	  NOTIFY_OBSERVERS(mCanNotify, mCacheObservers, mObservers,
930	                   nsINavBookmarkObserver,
931	                   OnItemAdded(*aNewBookmarkId, aFolder, index, TYPE_BOOKMARK,
932	                               aURI));
933	

which flows from native code into XPConnect, where

(gdb) down 5
#1986 0x0000000100f4ab49 in XPCConvert::NativeData2JS (ccx=@0x7fff5fbf5370, d=0x7fff5fbf5090, s=0x7fff5fbf5600, type=@0x7fff5fbf523b, iid=0x7fff5fbf54e0, pErr=0x0) at xpcprivate.h:3262
3262	        return NativeData2JS(lccx, d, s, type, iid, pErr);
(gdb) down
#1985 0x0000000100f2086c in XPCConvert::NativeData2JS (lccx=@0x7fff5fbf4d40, d=0x7fff5fbf5090, s=0x7fff5fbf5600, type=@0x7fff5fbf523b, iid=0x7fff5fbf54e0, pErr=0x0) at ../../../../../js/src/xpconnect/src/xpcconvert.cpp:485
485	                    if(!NativeInterface2JSObject(lccx, d, nsnull, helper, iid,
(gdb) 
#1984 0x0000000100f1f311 in XPCConvert::NativeInterface2JSObject (lccx=@0x7fff5fbf4d40, d=0x7fff5fbf5090, dest=0x0, aHelper=@0x7fff5fbf4c00, iid=0x7fff5fbf54e0, Interface=0x0, allowNativeWrapper=1, isGlobal=0, pErr=0x0) at ../../../../../js/src/xpconnect/src/xpcconvert.cpp:1147
1147	        XPCWrappedNativeScope::FindInJSObjectScope(cx, jsscope);
(gdb) 
#1983 0x0000000100f661cb in XPCWrappedNativeScope::FindInJSObjectScope (cx=0x13480feb0, obj=0x11aa8ecf0, OKIfNotInitialized=0, runtime=0x105c451d0) at ../../../../../js/src/xpconnect/src/xpcwrappednativescope.cpp:839
839	                                        OKIfNotInitialized, runtime);
(gdb) 
#1982 0x0000000100f65dce in DEBUG_CheckForComponentsInScope (cx=0x13480feb0, obj=0x11aa8ecf0, startingObj=0x128d8a6e8, OKIfNotInitialized=0, runtime=0x105c451d0) at ../../../../../js/src/xpconnect/src/xpcwrappednativescope.cpp:763
763	    xpc_DumpJSStack(cx, JS_TRUE, JS_TRUE, JS_TRUE);

we freak out dumping the JS stack over and over because there's no Components in the global for an object to convert (i=4 parameter to onItemAdded).

The global is indeed flagged as "cleared". But we haven't even entered the JS engine yet.

As Luke pointed out, we can also cross globals when calling inside a flow that entered via RunScript, so we'll have to check then too. That could hurt, although if it's only when static scope (global of callee vs. caller) changes it might be ok. But this is starting to look bad.

How much worse is removing JS_ClearScope or no-oping it and diagnosing the leaks? I have no idea, but if peterv were free I'd want him doing that in parallel as I work on this.

/be
Comment 29 Andreas Gal :gal 2011-02-26 12:47:51 PST
Taking out JS_ClearScope is pretty high risk at this time. Peter and I can take turns diagnosing and fixing any resulting leaks we can see, but there might be a lot of stuff we don't see and we are out of betas. I will start working on an alternative patch to see how bad this is.
Comment 30 Brendan Eich [:brendan] 2011-02-26 12:48:53 PST
The cross-global-scope call case can't happen with TM. So that leaves JM and the interpreter needing checks. I'll work on that patch after getting this XPConnect debuggery out of the way.

/be
Comment 31 Brendan Eich [:brendan] 2011-02-26 12:50:07 PST
Agree the leak fixing could be too hard or time-consuming, but we don't yet know. Also, clearing is just wrong -- violates JS semantics even in the best light. It is true hanging onto functions scoped by old inner windows can entrain too much memory, so perhaps there are bloat bugs in addition to permanent leaks.

/be
Comment 32 Andreas Gal :gal 2011-02-26 12:51:39 PST
Absolutely, that JS_ClearScope is dead wrong. I am just terrified to remove it now. Anyway, on it.
Comment 33 Peter Van der Beken [:peterv] 2011-02-28 06:35:09 PST
I have been looking at taking out the scope clearing and diagnosing the leaks, but with leaks it's always hard to predict when we'll nail the last one :-/.
Comment 34 Robert Sayre 2011-02-28 12:00:39 PST
unblocking on this, since we don't have a good idea of the severity, but we know it is not particularly common.
Comment 35 Goffredo Marocchi 2011-03-28 03:28:44 PDT
Not common? Well, simply hitting the reload button on my test WebGL page causes that message to pop-up (it does not happen when the page first loads).

It even happens on this tutorial:
http://learningwebgl.com/lessons/lesson06/index.html 

(load it and then reload it with the reload button...

Result: attempt to run compile-and-go script on a cleared scope
[Break On This Error] requestAnimFrame(tick); 

)
Comment 36 Andrew Carter 2011-03-29 13:02:20 PDT
This is a show stopper for our product. Please address.
Comment 37 Henri Sivonen (:hsivonen) 2011-03-29 23:31:17 PDT
Pushing the test for bug 642145 to try makes subsequent chrome mochitests fail with this error. Also, if I run the test case locally, it first passes (assuming the fix for bug 642145 is applied, too) but then if I press the reload button in the browser window to rerun the test, I get this error.

Do you have advice on how I could modify the test to work around this bug so that I can get the test landed without introducing a fix for this bug as a dependency?
Comment 38 Mischa 2011-05-04 01:34:08 PDT
Hey all,

i get the error in this here:
function renderallcontexts()
{
	glInfo.glrenderscene();	
}

The function is called in body onload. 
glInfo is a global variable and inside 
glrenderscene() I render on a canvas 
using webgl.
Comment 39 Hugh Nougher [:Hughman] 2011-05-07 00:52:24 PDT
The first testcase in bug 655435 when closing/refreshing the page with it running always gives this error in 4.0.1 and above. I think its cause is bug 655227 related.
Comment 40 Chuck Baker 2011-05-14 13:58:42 PDT
I get this error quite often.  It always points to the same place in my code where I am notifying an observer:

myObserver.prototype = {
	observe: function(subject, topic, data) {
Comment 41 John J. Barton 2011-05-19 08:57:14 PDT
We see hundreds of these messages when testing Firebug, all seem to point to the first statement in an observer. 

We see this now occasionally in our consoleService observer, where it causes an infinite number of error messages by triggering itself. We've not isolated the steps to trigger it.
Comment 42 John J. Barton 2011-05-20 11:15:46 PDT
Jan "Honza" Odvarko fix the problem in our code. We stopped seeing these messages. The fix seems be related to removing observers loaded in popup windows.
Comment 43 David Mandelin [:dmandelin] 2011-05-24 18:43:28 PDT
This is tracking-firefox5, so it deserves a summary comment.

1. The ultimate problem is that closing a browser window calls JS_ClearScope. Prior to Fx4, this nuked all the properties of the object. Semantically, this is very bad, because it removes properties that are non-configurable, which ES disallows. Practically, it is also bad, because if the script accesses a global variable, it may contain a baked-in slot load, and JS_ClearScope nukes that slot, so we would make a memory-unsafe read.

For Fx4, we changed this (most recently in bug 630072) so that JS_ClearScope sets a flag on the global object to indicate that it is cleared, and RunScript (the primary entry point that runs a script) checks that flag, and throws an exception if the global object of the script has been cleared.

2. The immediate problem here is (A) that we tend to get a lot of those cleared-global errors in practice, and they are annoying to look at. Also, (B) if the script being called doesn't actually use global variables, then it's perfectly safe to run it, so the error can stop perfectly valid and useful code from running.

3. The proposed fix here was to have the front end track which functions actually use global variables, and throw an exception only if the function would actually use globals. This fixes problem (B), and it alleviates problem (A).

But Brendan points out in comment 28 that the entire approach (of forbidding calls to scripts that would unsafely access properties of cleared globals) is seriously flawed, because there are plenty of ways to unsafely access cleared globals that don't go through RunScript.

4. Options to fix this:

4.1. The real fix is clearly to get rid of JS_ClearScope, which is bug 470510. This is apparently hard, and not expected to be done quickly.

4.2. We could still take the patch here, which will mitigate the problem. I don't think the patch here makes anything worse.

4.3. We could retry the approach of replacing the properties of the global with dummy values in JS_ClearScope. We had done that before, but it seems that left us with a bunch of leaks. Bug 630072 comment 60 has more information on that, but it ends with the ominous "There might be more problems still". So it looks like we might back with the same hard problems we get with bug 470510 if we try to go that way.

5. My recommendation. It depends on what our goal actually is here:

  (i) Less annoying spew messages. 

      - One thing we could do is not throw the error--just don't run the 
        script. I know it sounds bad, but it might actually be an improvement
        over now.

      - We could also land the patch in this bug, which should cut things
        down somewhat.

  (ii) Be able to run somewhat more code that is parented to a closed window.

      - Then land the patch in this bug.

  (iii) Be able to run any code that is parented to a closed window.

      - Then we have to fix bug 470510.
Comment 44 John J. Barton 2011-05-24 20:59:23 PDT
(iv) give the identity of the cleared scope and script in the error message and 
     an event when the scope is cleared, 
     so devs can find the observer registration and 
     unregister it before the clear.
Comment 45 Andreas Gal :gal 2011-05-25 00:03:44 PDT
peterv has a browser building and no longer leaking without JS_ClearScope. I will ask him to post a summary of how much more work is needed to land it.
Comment 46 Peter Van der Beken [:peterv] 2011-06-01 01:45:26 PDT
Removal of JS_ClearScope is being tracked in bug 637099.
Comment 47 Asa Dotzler [:asa] 2011-06-06 15:16:14 PDT
flag changes mean we're not going to take any mozilla-side changes for Firefox 5 and this seems to have been worked around in Firebug for 5.
Comment 48 John J. Barton 2011-06-06 15:20:38 PDT
This problem is not related to Firebug. We just happened to complain about it.
Comment 49 David Rees 2011-09-29 10:20:08 PDT
It would seem to me that John's (#44) suggestion of at least giving the developer more information about the error would be very helpful and would have very low risk. Can we at least get that in to 7?

The current warning of "attempt to run compile-and-go script on a cleared scope" doesn't really make much sense to developers. I don't believe I have seen the term used before outside of this error and from all the questions on the web most developers have not either. Comment #43 is the first good explanation I have seen (thank you very much for it).
Comment 50 David Rees 2011-09-29 10:49:11 PDT
Just quick note that may be obvious to those working on this, but wasn't obvious to me. Use of bind() causes a function to raise this error after a scope clear also.

So the following will raise the error
function() {  this.doSomething(); }.

but the following will not
var this2 = this;
function() {  this2.doSomething(); }.
Comment 51 David Mandelin [:dmandelin] 2011-10-05 11:06:20 PDT
(In reply to David Rees from comment #49)
> It would seem to me that John's (#44) suggestion of at least giving the
> developer more information about the error would be very helpful and would
> have very low risk. Can we at least get that in to 7?
> 
> The current warning of "attempt to run compile-and-go script on a cleared
> scope" doesn't really make much sense to developers. I don't believe I have
> seen the term used before outside of this error and from all the questions
> on the web most developers have not either. Comment #43 is the first good
> explanation I have seen (thank you very much for it).

IIUC, we're getting rid of the calls to ClearScope, so this will go away eventually.
Comment 52 Brendan Eich [:brendan] 2012-04-26 12:17:09 PDT
So bug 637099 is fixed. What now? I'm not a good owner, and the bug Henri blocked on this bug is unowned too. Who is in charge? :-P

/be
Comment 53 David Mandelin [:dmandelin] 2012-04-26 13:56:35 PDT
(In reply to Brendan Eich [:brendan] from comment #52)
> So bug 637099 is fixed. What now? I'm not a good owner, and the bug Henri
> blocked on this bug is unowned too. Who is in charge? :-P

AFAIK this bug is now done per comment 51. I'll file a lower-priority followup to remove ClearScope entirely. It still does have 1 use in Gecko.
Comment 54 Johnny Stenback (:jst, jst@mozilla.com) 2012-04-26 14:39:42 PDT
A few more than one, but not many. There's one remaining case where we do clear a global object in the browser and that's in our js component loader when we're unloading JS components, i.e. not something that can directly be controlled by webpages. The other remaining calls are only a hand-full, specifically some js shell and xpcshell code, the js component loader case, and one call in a quickstub for DOM storage. I just filed bug 749392 on removing the remainders of these calls, which is something Peter and I have discussed several times already, but neither of us filed a bug on it until now.

Given that, I'm marking this fixed as I don't believe there's any remaining ways in the browser to trigger the error that this bug was filed for.

Note You need to log in before you can comment on or make changes to this bug.