Closed Bug 614347 Opened 14 years ago Closed 13 years ago

XPConnect-wrapped JSObjects must clear their gray bit when they are handed out

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b11
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: bent.mozilla, Assigned: bent.mozilla)

References

Details

(Keywords: regression, Whiteboard: [hardblocker][has patch] fixed-in-tracemonkey)

Attachments

(1 file, 9 obsolete files)

Now that CC and GC are split we need to make sure that the JSObject colors stay in sync if we hand out a gray object after GC. I fixed XPCWrappedNative, but peterv believes that we should be doing something similar with tearoffs and XPCWrappedJS too.
We need this before release to possibly avoid JS objects being collected too early. Marking this a regression, which it sort of is, though not strictly. This is needed because of the gc/cc split work that's already been done.
blocking2.0: --- → betaN+
Keywords: regression
Ben, you'll get first chops at this one, but if you think peterv needs to own this, feel free to move it to him.
Assignee: nobody → bent.mozilla
Whiteboard: hardblocker
Whiteboard: hardblocker → [hardblocker]
Attached patch XPCWrappedJS changes (obsolete) — Splinter Review
Attachment #505478 - Flags: review?(peterv)
This doesn't need a beta.
blocking2.0: betaN+ → final+
Downgrading this to a softblocker per discussion with bent.
Whiteboard: [hardblocker] → [softblocker]
Based on the stack traces in bug 630517 and bug 630519, those might
depend on this one.
Blocks: 630517, 630519
...and if that is the case, this should be a hardblocker.
Other option is to back out the changes to cycle collector scheduling.
Marking hardblocker per smaug's observations.
No longer blocks: 624549
Whiteboard: [softblocker] → [hardblocker]
Blocks: 624549
(In reply to comment #8)
> Other option is to back out the changes to cycle collector scheduling.

No, let's fix this bug.

/be
This always smelled like a hardblocker to me.
Summary: Need to mark more XPConnect-wrapped JSObjects black when handing them out → XPConnect-wrapped JSObjects must clear their gray bit when they are handed out
Attached patch Patch, v1 (obsolete) — Splinter Review
This fixes the bug for real. Andreas and I realized that we were doing this wrong before - we don't want to make sure the object is marked black, rather that the gray bit is cleared. Marking black in some cases leads to crashes, and leaving the gray bit set will allow the cycle collector to collect the object when it shouldn't.
Attachment #505478 - Attachment is obsolete: true
Attachment #508951 - Flags: review?(peterv)
Attachment #505478 - Flags: review?(peterv)
Whiteboard: [hardblocker] → [hardblocker][has patch]
I was going to push the patch to tryserver, but it doesn't apply cleanly to trunk.
Blocks: 630169
(In reply to comment #13)
> I was going to push the patch to tryserver, but it doesn't apply cleanly to
> trunk.

I did already. There are some build errors, but mochitests look ok on mac. I'll see if I can figure those our today.
Blocks: 630877
Blocks: 630957
New patch that fixes the build errors?

Also, don't you need to either do this recursively to everything traced by that JSObject or change nsXPConnect::Traverse, given that nsXPConnect::Traverse doesn't traverse objects that don't have the gray bit set. It's a performance optimisation in nsXPConnect::Traverse to avoid retraversing all JS objects that the last GC already traversed (traced) to mark them black. Anything hanging off of them should be black anyway, and not recording edges from those JS objects to C++ should just lead to missing refcounts (and black C++ objects). But given that we now can run JS between the GC and the CC I suppose we really need to traverse, in case some of the edges have changed (object hanging off of a gray object, so gray itself, now hanging off of a black object). It'll probably make CC a lot slower again :-(.
Component: XPConnect → XSLT
Target Milestone: --- → mozilla2.0b11
Patch looks ok otherwise, though probably need to comment next to the various JSObject members that they should only be accessed through one of the getters. Don't think we can easily enforce that :-(.
Comment on attachment 508951 [details] [diff] [review]
Patch, v1

-ing for now, given comment 15.
Attachment #508951 - Flags: review?(peterv) → review-
Component: XSLT → XPConnect
Not traversing means we can't find a cycle there. Shouldn't it always be safe to not traverse? We marked one member of the cycle black so we will never find any cycle with that member in it.
Not for GC-ed objects, think about a cycle that's held with an additional edge from an XPCWrappedNative's JS object. The XPCWrappedNative's JS object and the JS objects in the cycle are gray. Now you mark the XPCWrappedNative's JS object non-gray. If we don't traverse then the CC won't know about the edge from the XPCWrappedNative's JS object to the cycle, and so we'll happily collect that cycle even though it's still held by an object that's not collected.
The GC will mark all the JS objects, but we will unlink any XPCOM objects in the cycle or held by it.
I see. We only unset the gray bit on the suspected object. Whe we Traverse we can skip past that even if its gray but then stop along any path that is not gray? That should preserve the shortcut but still find that link. No?
Whiteboard: [hardblocker][has patch] → [hardblocker]
Blocks: 631478
Attached patch Patch, v2 (obsolete) — Splinter Review
This implements our latest plan, but something is leaking...
Attachment #508951 - Attachment is obsolete: true
Attached patch Patch, v2.1 (obsolete) — Splinter Review
Fixed the leak! I was getting the IsGray check backwards :(

I think this is ready to go, I've pushed to try as well so fingers crossed!
http://tbpl.mozilla.org/?tree=MozillaTry&rev=450475da8eb7
Attachment #509787 - Attachment is obsolete: true
Attachment #510025 - Flags: review?(peterv)
Attachment #510025 - Flags: review?(gal)
Bingo. Tryserver is happy.
Whiteboard: [hardblocker] → [hardblocker][has patch]
I applied to v2.1 to my tracemonkey tree. I had one compile problem -- xpcjsruntime.cpp still had a reference to GetFlatJSObjectAndMark() that I replaced with GetFlatJSObject(). I am seeing a number of occurrences of this assert with only this patch + the fix applied:

  SuspectWrappedNative attempting to touch dead object

Every one I've looked at have been Window objects.

I see it on 3 different profiles, one with no extensions. I am rebuilding now without this patch to check if I get the asserts without it.
Wha?! Tryserver like it just fine... Did it rot already?
(In reply to comment #25)
>   SuspectWrappedNative attempting to touch dead object

Wait, I removed that assertion. Did you do a full rebuild?
(In reply to comment #27)
> (In reply to comment #25)
> >   SuspectWrappedNative attempting to touch dead object
> 
> Wait, I removed that assertion. Did you do a full rebuild?

Yes. (I had to for it to compile.)

You removed that assertion in this patch? I don't see that in the v2.1 attached to this bug, nor in the version you pushed to try. (The version pushed to try seems to match v2.1 at least for that file.)

Maybe it's a tracemonkey vs m-c difference?
Must be a m-c vs. tm difference. This patch is meant to apply to tm.
I'm using tm.

Your push to try had a parent of db8be4e3f373. The assert was added later, in e039358204f5. As was the code that doesn't compile for me.

Yer bits are rotten, sir.
Err... or maybe I'm full of it. e039358204f5 is from Feb 2, db8be4e3f373 from Feb 4. Let me look again.
Attached patch Patch, v2.2 (obsolete) — Splinter Review
Unrotted.
Attachment #510025 - Attachment is obsolete: true
Attachment #510484 - Flags: review?(peterv)
Attachment #510484 - Flags: review?(gal)
Attachment #510025 - Flags: review?(peterv)
Attachment #510025 - Flags: review?(gal)
Comment on attachment 510484 [details] [diff] [review]
Patch, v2.2

Steve, can you test this?
With v2.2 I get:

../../../../../mozilla/js/src/xpconnect/wrappers/../src/xpcinlines.h: In member function ‘void XPCLazyCallContext::SetWrapper(XPCWrappedNative*, XPCWrappedNativeTearOff*)’:
../../../../../mozilla/js/src/xpconnect/wrappers/../src/xpcinlines.h:794:40: error: ‘class XPCWrappedNative’ has no member named ‘GetFlatJSObjectAndMark’
Er, nm, that's due to another conflicting patch in my queue.
(In reply to comment #33)
> Comment on attachment 510484 [details] [diff] [review]
> Patch, v2.2
> 
> Steve, can you test this?

I can verify that it applies, compiles, and no longer shows the assert. The last doesn't mean much, though, since the assert is removed as part of the patch. :-) I can't speak for whether it works or not; it does not resolve the problem for which I was applying it, but that was just a shot in the dark anyway.
Comment on attachment 510484 [details] [diff] [review]
Patch, v2.2

Could we move the XPC_GC_COLOR_* to xpcpublic.h?
Could we move nsWrapperCache::GetWrapper to xpcpublic.h? If not, please s/nsWrapperCache-inl.h/nsWrapperCacheInlines.h/.
Please add comments to the *PreserveColor functions to explain how they're different from the non-PreserveColor variant: that they don't make sure that the object returned will be kept alive past the next CC/GC and they should only be used if we are certain that the return value won't be passed into a JS API function or stored without rooting it or without signaling the stored value to the CC.
Attachment #510484 - Flags: review?(peterv) → review+
blocking2.0: final+ → betaN+
Good call on beta+.
Attachment #510484 - Flags: review?(gal) → review+
This bounced off tracemonkey, something in just the debug mochitest-chrome failed.

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1297227039.1297230992.22103.gz&fulltext=1#err2

JavaScript strict warning: chrome://browser/content/tabview.js, line 418: reference to undefined property this[0]
TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js | Console message: [JavaScript Warning: "Error in parsing value for '-moz-column-count'.  Declaration dropped." {file: "chrome://browser/content/tabview.html" line: 0}]
TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js | Console message: [JavaScript Warning: "WARN addons.manager: Exception calling callback: TypeError: PREF_GETADDONS_CACHE_ENABLED is undefined" {file: "chrome://mozapps/content/extensions/extensions.js" line: 1657}]
TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js | Console message: trace: tabview assert: does not yet support multi-objects (or null objects)
iQClass_width()@chrome://browser/content/tabview.js:392
GroupItem_getContentBounds()@chrome://browser/content/tabview.js:2607
GroupItem_arrange([object Object])@chrome://browser/content/tabview.js:3384
GroupItem_reorderTabItemsBasedOnTabOrder()@chrome://browser/content/tabview.js:3853
([object Object],0,[object Array])@chrome://browser/content/tabview.js:7532
UI_showTabView(true)@chrome://browser/content/tabview.js:7531
([object Event])@chrome://browser/content/tabview.js:7217
()@chrome://browser/content/browser.js:4879
TabView__initFrame((function () {var event = document.createEvent("Events");event.initEvent("tabviewshow", false, false);dispatchEvent(event);}))@chrome://browser/content/browser.js:4833
()@chrome://browser/content/browser.js:4876
()@chrome://browser/content/browser.js:4898
test()@chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js:43
Tester_execTest()@chrome://mochikit/content/browser-test.js:235
Test_realNextTest()@chrome://mochikit/content/browser-test.js:196
([object Proxy],29)@chrome://mochikit/content/browser-test.js:110
TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js | Console message: [JavaScript Warning: "reference to undefined property this[0]" {file: "chrome://browser/content/tabview.js" line: 418}]
NEXT ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/tabview/browser_tabview_apptabs.js | Test timed out
Er, mochitest-browser-chrome.
Did this patch cause a regression? http://arewefastyet.com/ seemed to have jumped up 600ms due to this patch
(In reply to comment #41)
> Did this patch cause a regression? http://arewefastyet.com/ seemed to have
> jumped up 600ms due to this patch

It was backed out, so I have no idea why the numbers moved.
Attached patch Patch, v2.3 (obsolete) — Splinter Review
This fixes XPCVariant and the test failures.
Attachment #510484 - Attachment is obsolete: true
Comment on attachment 511119 [details] [diff] [review]
Patch, v2.3

This bounced again yesterday. Something is wrong, the variant thing seems to be a red herring.
Attachment #511119 - Attachment is obsolete: true
Attached patch Patch v2.4 (obsolete) — Splinter Review
This patch fixes XPCWrappedNative's "wrapper wrapper" (BLEGH!) and the expando maps too, but it still doesn't fix the tabcandy mochitests. We're missing something still.
Whiteboard: [hardblocker][has patch] → [hardblocker]
What's the status on this? If this indeed fixes bug 630517, it fixes the #3 crash on Beta11.
No good progress, we still have an undiagnosed problem here. I will confirm with bent and come up with an emergency plan if we can't fix this in time.
We actually have a much better idea of what's going wrong, patch waiting on tryserver now.
Sad news, http://hg.mozilla.org/try/rev/22cb19dd492f failed again. Next idea is to recursively un-gray objects.
Attached patch Patch, v3 (obsolete) — Splinter Review
Woo! This one passed the tryserver for the tabcandy tests that were failing before. Full tryserver run is in progress, http://hg.mozilla.org/try/rev/ae0c55d09422 but assuming that passes let's get some eyes on this.

Now we ungray all objects traced from the object we're handing out. Dbaron pointed out a hole in our logic before so I think we have to do this. We're also correctly telling the CC about black objects now... Before we were claiming that the black objects were GCUnmarked :-(
Attachment #511616 - Attachment is obsolete: true
Attachment #512704 - Flags: review?(peterv)
Attachment #512704 - Flags: review?(gal)
Attachment #512704 - Flags: feedback?(Olli.Pettay)
(In reply to comment #51)
> Dbaron pointed out a hole in our logic before

What's the hole?
What's that?! You want ASCII art?!

Basically we have a cycle of objects exactly like you describe in comment 19, something like this (where A, B, and C are all XPConnect roots with their own JSObjects, and D is some JS object found from JS roots):

  XPConnect roots:    JS roots:

  A -> B ---+         D
       ^    |
       |    |
       C <--+

(trying to show A owns B, B owns C, C owns B, D is off in its own happy world. A, B, and C are gray, D is black.)

If we hand out the JSObject for A (turning it black, of course) then script could do anything it wants with it, like doing several GetProperty calls and finding the JSObject for C. If it then defines that JSObject for C as a new property on D, and then drops the link from A to B, we have created a new owning reference reachable only through JS roots:

  XPConnect roots:    JS roots:

  A    B ---+         D
       ^    |         |
       |    |         |
       C <--+         |
       ^              |
       |              |
       +--------------+

So now the cycle collector will see B and C as collectable. The only way to prevent this is to mark B and C black when we hand out A.
Comment on attachment 512704 [details] [diff] [review]
Patch, v3


> #ifdef DEBUG
>     if (!mWillReparent) {
>       // We really shouldn't have a wrapper here but if we do we need to make sure
>       // it has the correct parent.
>-      JSObject *obj = GetWrapper();
>+      JSObject *obj = GetWrapperPreserveColor();

We preserve the color here because its DEBUG right?

>+// static
>+void
>+nsXPConnect::UnmarkGrayInternal(void *thing)
>+{

XXXInternal, MyXXX, DoXXX are really bad names. How about RecursiveUnmarkGrayObject?

>+    // Remove the gray color from the given GCThing and any other GCThings that
>+    // can be reached through it.
>+    static void UnmarkGray(void *thing)
>+        {if(thing && nsXPConnect::IsGray(thing)) UnmarkGrayInternal(thing);}
>+

thing is really always an object, so we could also type it at such.
Attachment #512704 - Flags: review?(gal) → review+
Attached patch Patch, v3.1 (obsolete) — Splinter Review
Fixed a crash, and addressed gal's comments. Yeah, DEBUG only code shouldn't change color imo, otherwise we'd get weird behavioral differences maybe.
Attachment #512704 - Attachment is obsolete: true
Attachment #512718 - Flags: review?(peterv)
Attachment #512718 - Flags: feedback?(Olli.Pettay)
Attachment #512704 - Flags: review?(peterv)
Attachment #512704 - Flags: feedback?(Olli.Pettay)
Awesome work, bent et al.!

/be
Whiteboard: [hardblocker] → [hardblocker][has patch]
Comment on attachment 512718 [details] [diff] [review]
Patch, v3.1

I tested builds (64bit opt Linux builds which have Bug 630932) with and without this patch.
I didn't see any major changes.
I tested FF startup, Google Maps, Gmail, BBC news opened in 78 tabs,
closing those tabs, the testcase for bug 631494 (with and without ABP)
and then also on OSX 10.5 FOTN.

I think there was a small CC perf regression noticeable in the case when 78 tabs were opened.
Few % or so. But could be just noise.
Attachment #512718 - Flags: feedback?(Olli.Pettay) → feedback+
Comment on attachment 512718 [details] [diff] [review]
Patch, v3.1

>diff --git a/dom/base/nsDOMClassInfo.h b/dom/base/nsDOMClassInfo.h

>   static nsresult WrapNative(JSContext *cx, JSObject *scope,
>                              nsISupports *native, nsWrapperCache *cache,
>                              const nsIID* aIID, jsval *vp,
>                              nsIXPConnectJSObjectHolder** aHolder,
>-                             PRBool aAllowWrapping)
>-  {
>-    if (!native) {

This is annoying, IIRC inlining was a performance improvement here.

>diff --git a/dom/base/nsWrapperCache.cpp b/dom/base/nsWrapperCache.cpp

>+JSObject*
>+nsWrapperCache::GetWrapper() const

Same here :-(.

>diff --git a/dom/base/nsWrapperCache.h b/dom/base/nsWrapperCache.h

>+  /**
>+   * This getter clears the gray bit before handing out the JSObject which means
>+   * that the object is guaranteed to be kept alive past the next CC.
>+   *
>+   * Implemented in xpcpublic.h because we have to include some JS headers that
>+   * don't play nicely with the rest of the codebase. Include xpcpublic.h if you
>+   * need to call this method.

That's not correct anymore. Can you now remove a bunch of xpcpublic.h includes that you added?

>diff --git a/js/src/xpconnect/src/nsXPConnect.cpp b/js/src/xpconnect/src/nsXPConnect.cpp

>@@ -675,17 +720,17 @@ nsXPConnect::Traverse(void *p, nsCycleCo
>             PL_DHashTableOperate(&mJSRoots, p, PL_DHASH_LOOKUP);
>         type = markJSObject || PL_DHASH_ENTRY_IS_BUSY(entry) ? GCMarked :
>                                                                GCUnmarked;
>     }
>     else
> #endif
>     {
>         // Normal codepath (matches non-DEBUG_CC codepath).
>-        type = !markJSObject && IsGray(p) ? GCUnmarked : GCMarked;
>+        type = !markJSObject && IsGray(obj) ? GCUnmarked : GCMarked;

This change is wrong (why is it even needed?), obj might be uninitialized.

>@@ -756,31 +801,24 @@ nsXPConnect::Traverse(void *p, nsCycleCo

>-    // There's no need to trace objects that have already been marked by the JS
>-    // GC. Any JS objects hanging from them will already be marked. Only do this
>-    // if DEBUG_CC is not defined, else we do want to know about all JS objects
>-    // to get better graphs and explanations.
>-    if(!cb.WantAllTraces() && type == GCMarked)
>-        return NS_OK;

Why do we still need this change, given that we're now doing the recursive ungraying in the getters?

>diff --git a/js/src/xpconnect/src/xpcjsruntime.cpp b/js/src/xpconnect/src/xpcjsruntime.cpp

>@@ -498,28 +500,22 @@ XPCJSRuntime::SuspectWrappedNative(JSCon

>-    // Only record objects that might be part of a cycle as roots, unless
>-    // the callback wants all traces (a debug feature).
>-    if(nsXPConnect::IsGray(obj) || cb.WantAllTraces())
>-        cb.NoteRoot(nsIProgrammingLanguage::JAVASCRIPT, obj,
>-                    nsXPConnect::GetXPConnect());
>+    cb.NoteRoot(nsIProgrammingLanguage::JAVASCRIPT, obj,
>+                nsXPConnect::GetXPConnect());

Why do we still need this change, given that we're now doing the recursive ungraying in the getters?

>diff --git a/js/src/xpconnect/src/xpcwrappednativescope.cpp b/js/src/xpconnect/src/xpcwrappednativescope.cpp

> static JSDHashOperator
> WrappedNativeSuspecter(JSDHashTable *table, JSDHashEntryHdr *hdr,
>                        uint32 number, void *arg)
> {
>     XPCWrappedNative* wrapper = ((Native2WrappedNativeMap::Entry*)hdr)->value;
> 
>-    if(wrapper->HasExternalReference())
>+    if(wrapper->HasExternalReference() ||
>+       !nsXPConnect::IsGray(wrapper->GetFlatJSObjectPreserveColor()))

Why do we still need this change, given that we're now doing the recursive ungraying in the getters?
Attachment #512718 - Flags: review?(peterv) → review-
Also, UnmarkGrayChildren casts a void* potentially pointing to a JSXML object to JSObject*, how does that work?
Attached patch Patch v3.2Splinter Review
Should address all comments.
Attachment #512718 - Attachment is obsolete: true
Attachment #512840 - Flags: review?(peterv)
(In reply to comment #58)
> I tested builds (64bit opt Linux builds which have Bug 630932) with and without
> this patch.
> I didn't see any major changes.

Thanks a ton smaug!
Comment on attachment 512840 [details] [diff] [review]
Patch v3.2

>diff --git a/js/src/xpconnect/src/nsXPConnect.cpp b/js/src/xpconnect/src/nsXPConnect.cpp

> static void
> NoteJSChild(JSTracer *trc, void *thing, uint32 kind)
> {
>+    TraversalTracer *tracer = static_cast<TraversalTracer*>(trc);
>+
>+    // There's no point in further traversing a non-gray object here unless we
>+    // explicitly want to see all traces.
>+    if(!xpc_IsGrayGCThing(thing) && !tracer->cb.WantAllTraces())
>+        return;
>+
>     if(ADD_TO_CC(kind))
>     {
>-        TraversalTracer *tracer = static_cast<TraversalTracer*>(trc);

You need to move the |if(!xpc_IsGrayGCThing(thing) && !tracer->cb.WantAllTraces())| inside this block.
Now it just needs to build on Windows.
Attachment #512840 - Flags: review?(peterv) → review+
Fixed that, and fixed the windows build.

http://hg.mozilla.org/tracemonkey/rev/14dda9fd9686
Status: NEW → ASSIGNED
blocking2.0: betaN+ → ---
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch] fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/52246c1b1799
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 634865
Blocks: 634880
Restoring BetaN+ flag that seems to have been accidentally removed.
blocking2.0: --- → betaN+
Depends on: 637605
You need to log in before you can comment on or make changes to this bug.