Closed Bug 479198 Opened 15 years ago Closed 15 years ago

"Deep" property cache entries not invalidated when shadowed (TIBCO General Interface regression)

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: jesse, Assigned: jorendorff)

References

Details

(Keywords: testcase, verified1.9.1)

Attachments

(6 files, 5 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b3pre

It appears that under some circumstances JavaScript memory may not be properly synchronized so that seemingly impossible conditions occur. This is causing TIBCO General Interface Builder 3.7 to fail to load. 

I've spent a lot of time debugging and the issue seems exceedingly complicated. Whether the bug shows up or not seems to be related to whether and how many logging statements are sent to a separate browser window and the number of sync and async XHR requests. The only consistent way to prevent the bug is to turn off jit.client in about:config. 

I'm attaching both the 3.7 GA release and a version of Model.js that I have used to debug this issue. In the modified file the "impossible" condition is shown on lines 455-458. 

            var try1 = this.getServer();
            jsx3.log("Hello world!");
            var try2 = this.getServer();
            jsx3.log(try1 + " " + try2);

"try1" logs null while "try2" logs the server object. However, the implementation of getServer() clearly does nothing to affect the state of memory (line 563):

  Model_prototype.getServer = function() {
    var node = this;
    while (node) {
      if (node._jsxserver) return node._jsxserver;
      node = node._jsxparent;
    }
    return null;
  };

The call to jsx3.log() modifies some parts of memory and may call browser DOM methods but it certainly does not modify anything that might affect the getServer() method.

Reproducible: Always

Steps to Reproduce:
1. Unzip attachment
2 [review]. Launch GI_Builder.html in Firefox 3.1+ with jit.client enabled
3. Click through security warnings (suggest selecting "remember")

Actual Results:  
Start-up fails. Error console shows "objServer is null" or "x.getServer() is null" usually in Model.js.

Or, replace JSX/js/jsx3/app/Model.js with the attached file and then launch GI_Builder. In this case a long red stack track shows up in the Application Monitor external window. Just above it you can see the logging messages corresponding to Model.js lines 455-458:

23:17:15.526 global (INFO) - Hello world!
23:17:15.527 global (INFO) - null @jsx3.app.Server GI_Builder:jsx3.IDE




Expected Results:  
Start-up completes with no such errors. This is the case in Firefox 3.0 or 3.1 with jit.client disabled.

Testing on a dual-core mac book and able to reproduce in XP running in Fusion with two cores enabled.
And of course I meant jit.content, not jit.client.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Summary: JavaScript memory synchronization bug with JIT enabled in 3.1+ → TM: JavaScript memory synchronization bug with JIT enabled in 3.1+
Version: unspecified → Trunk
Jesse, is this something your tools can reduce?
By "reduce" do you mean, can we work around the issue? or, can we produce a simpler test case?
He means "produce a simpler test case".  And I think he missed the fact that you and I are both "Jesse" ;)

I don't think the tools and tricks in http://www.squarefree.com/2009/01/11/reducing-real-world-scripts/ will help here, given what you said in comment 0:

"Whether the bug shows up or not seems to be related to whether and how many logging statements are sent to a separate browser window and the number of sync and async XHR requests."
The Jesse Project comes to Mozilla...

I spent some time trying to create a simple test case for this bug thinking it was correlated with XPathEvaluator.evaluate or XMLHttpRequest.open/send, both of which we use extensively, but unfortunately I was not successful. 

Our compiled build, which consists of many fewer files (and so fewer XMLHttpRequest's) seems to be more stable than the source build that I attached. Additionally, I was able to reproduce the bug over http:// as well as file:// so it does not seem to be dependent on XMLHttpRequest to the file system.
I just confirmed that this is still an issue in the latest b3 build. At this point there is no way for us to work around this issue short of instructing all users to disable content jit via about:config. We will have to drop support for Fx 3.5 if this issue is not fixed by 3.5 GM.
Assignee: general → gal
Flags: blocking1.9.1?
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1b4
Any luck with creating a more reduced testcase? That would be much more useful than making hostile-sounding comments to the bug...
Does this happen with 3.2a1pre build?
(In reply to comment #9)
> hostile-sounding comments

Sorry, I was with Jesse when he wrote that and it wasn't meant to be hostile. If he/we can't figure this out then his product won't work with Firefox 3.5, plain and simple.

Efforts to reduce the testcase have been documented in comment 0 and comment 7, and so far none of these efforts have resulted in anything that makes sense.

(In reply to comment #10)
> Does this happen with 3.2a1pre build?

Yes, it does.
Status: UNCONFIRMED → NEW
Ever confirmed: true
One interesting data point - the app seems to work properly in a recent debug trunk build...
(In reply to comment #12)

Oops, please ignore me. My debug profile had the JIT disabled. Problem persists in a debug build with the JIT enabled.
Flags: blocking1.9.1? → blocking1.9.1+
Summary: TM: JavaScript memory synchronization bug with JIT enabled in 3.1+ → TM: TIBCO General Interface regression with JIT
Assignee: gal → jorendorff
Attached file slightly reduced test case (obsolete) —
I looked into this briefly, but did not really get anywhere. Here's a slightly reduced test case that reproduces the bug (I just deleted a bunch of plugins and images; if you delete much more, it stops getting far enough to trigger the bug).

I am out for two weeks.  Resisting the urge to assign this to graydon.  :)
Jesse, are there any other tests in your suite that we fail?
I certainly understand the value of a simple reproducible test case, which is why I have spent time trying to create one. Unfortunately, I only see the problem when loading GI_Builder.html, which happens to be a very complex JavaScript application. None of our pre-existing unit tests fail on 3.1+jit so we're obviously not capturing whatever condition is causing this issue. 

Since I spent some time trying to create a simple case and failed, I was hoping that you engineers would have some tools at your disposal that would help pinpoint the issue. 

After some further investigation I can say that the bug behavior does not require XSLT transforms or Sync XHR to show up. Turning off all transforms still led to the same error. And I can compile the first test case such that no files are loaded sync XHR. (I'll attach that test case too).

I don't have a clear idea of what other parts of the JavaScript or browser APIs might cause these sorts of synchronization errors. In addition to sync XHR and transforms, we also use async XHR, window.setTimeout, DOM manipulation, and XPCOM extensively in the failing test case. Async XHR and setTimeout are impossible to factor out of the test case. I may be able to get rid of the XPCOM code to make sure that that is not the culprit.
Ok, I can reduce the XPCOM code out and the issue still exists so I guess that's not the issue either.
Attachment #367675 - Attachment is obsolete: true
Bug 485775 has similar symptoms, though the regression range presented there is more recent than this bug...
Attached file Slightly more-reduced test case (obsolete) —
The previous tarball doesn't reproduce the problem for me, so I produced this starting from the "slightly reduced test case" tarball.

Now it produces a stack trace, but I don't think we're on trace at the point of the error.  The bug must happen earlier.  Will keep hacking away at this tomorrow.
Attached file test case reduction snapshot 3 (obsolete) —
down to 17000 lines in 9 files
Down to 5299 lines in 7 files.
Attachment #367519 - Attachment is obsolete: true
Attachment #370309 - Attachment is obsolete: true
Attachment #370695 - Attachment is obsolete: true
down to 1315 lines in 4 files

Still behaving too strangely to figure out.
function Model(x) {}
Model.prototype._parent = null;
Model.prototype._engine = null;
Model.prototype.getEngine = function() {
    var node = this;
    while (node) {
        if (node._engine)
            return node._engine;
        node = node._parent;
    }
    return null;
};
function add(parent, child) {
    parent.getEngine();
    parent._children.push(child);
    child._parent = parent;
}
function Block(init) {
    this._children = [];
}
Block.prototype = new Model;
var root = new Block;
root._engine = {};
var layout = new Block;
add(layout, new Block);
add(layout, new Block);
add(layout, new Block);
add(root, layout);
var engine = layout._children[0].getEngine();
print(engine ? "no bug" : "bug");
OK, this one records only one trace, not two, so it should be easier to decipher...

function Model() {}
Model.prototype.parent = null;
Model.prototype.engine = null;
function Block() {}
Block.prototype = new Model;
function getEngine(node) {
    while (node) {
        if (node.engine)
            return node.engine;
        node = node.parent;
    }
    return null;
}
var root = new Block;
root.engine = {};
var layout = new Block;
getEngine(layout);
getEngine(layout);
layout.parent = root;
var engine = getEngine(layout);
print(engine ? "no bug" : "bug");
OK, final simplification:

function h(node) {
    var x = 0;
    while (node) {
        x++;
        node = node.parent;
    }
    return x;
}
var tree = {__proto__: {__proto__: {parent: null}}};
h(tree);
h(tree);
tree.parent = {};
assertEq(h(tree), 2);
The code emitted for `node.parent` guards on the exact pointer value of `node`, not its shape.

The assignment `tree.parent = {}` changes the shape only.
`tree.parent = {}` does end up calling js_PurgeScopeChain(cx, tree), but `tree` is not a delegate, so it does nothing.  (Otherwise we would end up in PurgeProtoChain, which would change the shape of the object we're actually guarding on, the one that has the .parent property, ensuring correctness.)

The generated code for `node.parent`, after all guards, is walking the prototype chain, which I find strange.  That is unrelated though.
Works in Firefox 3.0.x -- that DELEGATE optimization (bug 454035) bites again -- sorry about that. That change assumes shadowing in a non-delegate will change the shape of that directly referenced object, so invalidate any cached/compiled code for this lookup.

What's emitting the code to walk up the prototype chain?

/be
(In reply to comment #30)
> Works in Firefox 3.0.x -- that DELEGATE optimization (bug 454035) bites again
> -- sorry about that. That change assumes shadowing in a non-delegate will
> change the shape of that directly referenced object, so invalidate any
> cached/compiled code for this lookup.

It does change the shape, but that's not enough to invalidate the compiled code (which does not shape-guard, comment 28).

Still pondering how to fix.  I'll catch up to you on IRC, I hope.

> What's emitting the code to walk up the prototype chain?

TraceRecorder::prop(JSObject* obj, LIns* obj_ins, uint32& slot, LIns*& v_ins)
{
...
        /*
         * We're getting a proto-property. Walk up the prototype chain emitting
         * proto slot loads, updating obj as we go, leaving obj set to obj2 with
         * obj_ins the last proto-load.
         */
        while (obj != obj2) {
            obj_ins = stobj_get_slot(obj_ins, JSSLOT_PROTO, dslots_ins);
            obj = STOBJ_GET_PROTO(obj);
        }
...
}

This is easier to fix.  It is always safe to teleport in this case (just emit obj2 as a constant), right?
I do not know why Jesse C-G observed the bug not reproducing with jit.content false -- it should happen (and jorendorff's great reduction does happen) with or without the jit. Anyway, the bug seems not to need the "TM: " and "with JIT" bits of the summary.

The property cache distinguishes direct, immediate-prototype, and deep hit cases. The DELEGATE optimization assumes the first two, forgetting that in the deep hit case there is no relation between directly-referenced shape and path through the scope and proto chains to the object holding the found property.

Therefore a fix is to purge even when the DELEGATE bit is not set if (a) there is shadowing going on; (b) the shadowed property's object is more than unit distance away from the directly referenced object in the PCVCAP_TAG metric. I.e. if this is a "deep shadowing" hazard.

/be
Summary: TM: TIBCO General Interface regression with JIT → TIBCO General Interface regression with
(In reply to comment #31)
> (In reply to comment #30)
> > Works in Firefox 3.0.x -- that DELEGATE optimization (bug 454035) bites again
> > -- sorry about that. That change assumes shadowing in a non-delegate will
> > change the shape of that directly referenced object, so invalidate any
> > cached/compiled code for this lookup.
> 
> It does change the shape, but that's not enough to invalidate the compiled code
> (which does not shape-guard, comment 28).

I know, that's what I was trying to say -- the bad assumption is the whole sentence, including "..., so invalidate any cached/compiled code ...."

> > What's emitting the code to walk up the prototype chain?
> 
> TraceRecorder::prop(JSObject* obj, LIns* obj_ins, uint32& slot, LIns*& v_ins)
> {
> ...
>         /*
>          * We're getting a proto-property. Walk up the prototype chain emitting
>          * proto slot loads, updating obj as we go, leaving obj set to obj2
> with
>          * obj_ins the last proto-load.
>          */
>         while (obj != obj2) {
>             obj_ins = stobj_get_slot(obj_ins, JSSLOT_PROTO, dslots_ins);
>             obj = STOBJ_GET_PROTO(obj);
>         }
> ...
> }
> 
> This is easier to fix.  It is always safe to teleport in this case (just emit
> obj2 as a constant), right?

That's what test_property_cache does, indeed. Why is this code there? I forget.

/be
Summary: TIBCO General Interface regression with → TIBCO General Interface regression
Argh, classic mistake #27(http://www.codinghorror.com/blog/archives/000889.html) from last August:

$ hg log -r17502 jstracer.cpp
changeset:   17502:86ef943700ac
user:        Brendan Eich <brendan@mozilla.org>
date:        Wed Aug 13 17:10:18 2008 -0700
summary:     Fix prototype hit case in prop to advance obj and obj_ins up the proto chain.

My apologies again. The loop in prop (which moved) is bogus. Removing it could well net a perf win.

Jason proposed flagging an object being stored as a cache key (trace guard) as a delegate, which it obviously is (the delegate flag should be set when there are one or more, potentially many, saved references to an object, and the refs are used to lookup properties). This could de-optimize non-deep hits on the same object from other sites, but removing that loop could easily win more than is lost, on average for our benchmarks.

/be
Blocks: 454035
Attached patch v1 (obsolete) — Splinter Review
This fixes the JS test case in comment 27.  Applying to my tm-browser tree now, to see if it fixes TIBCO...
Attachment #371271 - Flags: review?(brendan)
(In reply to comment #34)
> Jason proposed flagging an object being stored as a cache key (trace guard) as
> a delegate, which it obviously is (the delegate flag should be set when there
> are one or more, potentially many, saved references to an object, and the refs
> are used to lookup properties).

To be a bit more complete and precise (need to write that paper...) the saved refs are a problem not merely because many to one -- but because many different call sites (call, get, set, what's the diff?).

/be
Attached patch v2Splinter Review
As brendan mentioned to me, v1 only fixes this for the particular "saved reference" that's baked into trace.  But there is a saved reference in the property cache too, so the right place to set the flag is in js_FillPropertyCache.
Attachment #371271 - Attachment is obsolete: true
Attachment #371279 - Flags: review?(brendan)
Attachment #371271 - Flags: review?(brendan)
we need a bug for #34
Comment on attachment 371279 [details] [diff] [review]
v2

>diff --git a/js/src/jsinterp.cpp b/js/src/jsinterp.cpp
>--- a/js/src/jsinterp.cpp
>+++ b/js/src/jsinterp.cpp
>@@ -288,16 +288,22 @@ js_FillPropertyCache(JSContext *cx, JSOb
>         JS_ASSERT_IF(scopeIndex == 0,
>                      protoIndex != 1 || OBJ_GET_PROTO(cx, obj) == pobj);
>         if (scopeIndex != 0 || protoIndex != 1) {
>             khash = PROPERTY_CACHE_HASH_ATOM(atom, obj, pobj);
>             PCMETER(if (PCVCAP_TAG(cache->table[khash].vcap) <= 1)
>                         cache->pcrecycles++);
>             pc = (jsbytecode *) atom;
>             kshape = (jsuword) obj;
>+
>+            /*
>+             * Make sure that a later shadowing assignment will enter
>+             * PurgeProtoChain and invalidate this entry, bug 479198.
>+             */
>+            OBJ_SET_DELEGATE(cx, obj);

Comment also the thread-safety design intentions here (jsobj.h makes it reasonably clear that the caller of OBJ_SET_DELEGATE has to ensure no races to update obj->classword lose bits): only the DELEGATE bit may be set at runtime, never cleared; SYSTEM must be set at birth when obj is guaranteed inaccessible to other threads.

r=me with that, thanks.

/be
Attachment #371279 - Flags: review?(brendan) → review+
Confirmed, this fixes the full test case (in the first attachment above).

Thanks for the complete test case, Jesse.  Truly something is better than nothing.

http://hg.mozilla.org/tracemonkey/rev/606a2df1ed0b

I forgot to add the comment Brendan asked for.  Follow-up push coming.
Summary: TIBCO General Interface regression → "Deep" property cache entries not invalidated when shadowed (TIBCO General Interface regression)
http://hg.mozilla.org/mozilla-central/rev/805dd817a3d4
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
I just tested with the 3.6a1 nightly and I see no problems with JIT enabled or disabled. Thanks Jason, Brendan, Ben and others for tracking this down and fixing it!
Don't thank me, I introduced the bug with that freaky DELEGATE optimization of mine (I'm an optimist, it's a problem... I'm working on it...). All praise to jorendorff's heroic reduction/deduction work.

/be
Verified fixed with testcase given in comment 27 on trunk and 1.9.1 with the
following debug builds:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre)
Gecko/20090422 Minefield/3.6a1pre ID:20090422224452

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre)
Gecko/20090422 Shiretoko/3.5b4pre ID:20090422122043
Status: RESOLVED → VERIFIED
Hardware: x86 → All
Target Milestone: mozilla1.9.1b4 → mozilla1.9.2a1
js1_8_1/trace/trace-test.js	
http://hg.mozilla.org/tracemonkey/rev/61892f57b46a
Flags: in-testsuite+
cvsroot/mozilla/js/tests/js1_8_1/trace/trace-test.js,v  <--  trace-test.js
new revision: 1.14; previous revision: 1.13

/cvsroot/mozilla/js/tests/shell.js,v  <--  shell.js
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: