Closed Bug 684711 Opened 12 years ago Closed 10 years ago

Don't add objects that are used off of the main thread to the cycle collector

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: peterv, Assigned: peterv)

References

Details

(Keywords: sec-moderate, Whiteboard: [sg:moderate])

I stumbled on this while debugging bug 675605. In that case an nsXPCWrappedJS is added to the cycle collector and is then released by a non-main thread while we're running cycle collection. So part of this is bug 464678, but we're also adding XPCWrappedNatives that are not marked main-thread-only.
This probably causes a number of crashes similar to bug 675605, when the cycle collector tries to traverse dangling pointers.

I have a patch that fixes this, but it causes leaks. We apparently already rely on cycle collecting these objects to fix leaks :-(.
Maybe when you find a non-main thread nsXPCWrappedJS during cycle collection, you could throw it into a ref-counted array, which will addref it, and keep it alive during collection.  You'd have to give it an extra +1 to the number of found references.  Then, at the end of a CC, you throw away the extra array, which would drop the extra ref count.

Kind of a hack.  Is the addref stuff threadsafe?  Hopefully...
Luke thinks that we'll have single-threaded JSRuntimes real soon now, and thinks that enforcing single-threaded XPConnect might be the next step.

Is that a reasonable goal? I'd love to work on it, and it sounds like it would clear up nasty bugs like this one (in addition to simplifying a lot of code).
(In reply to Andrew McCreight [:mccr8] from comment #1)
> Maybe when you find a non-main thread nsXPCWrappedJS during cycle
> collection, you could throw it into a ref-counted array, which will addref
> it, and keep it alive during collection.  You'd have to give it an extra +1
> to the number of found references.  Then, at the end of a CC, you throw away
> the extra array, which would drop the extra ref count.

The problem is that the traversal info will potentially be wrong, since the refcount can change after we've traversed but before we try to unlink.
Hmm, yeah I forgot about that.  The original cycle collection paper that the Firefox algorithm is based on deals with that by rechecking once you find some garbage.  But that rechecking can also fail, somehow.  And there was a lot more complexity that I never really tried to understand, so this probably is not a rabbit hole we want to go down.
Looks like bug 674597 might help make this easier to fix.
Depends on: 674597
Depends on: 684938
Actually, xpcom proxies are almost gone.
Depends on: 675221
No longer depends on: 674597
There are other ways than using proxies to hit this though.
(In reply to Peter Van der Beken [:peterv] from comment #7)
Sure.  I was just commenting on why I changed the dependency from bug 674597 to bug 675221.
guessing sg:moderate because the timing dependency would make this an unreliable exploit, but is it common enough that a frame trying the same thing over and over again might have reasonable certainty? If so this might be sg:critical
Whiteboard: [sg:moderate]
The bugs I added have stacks with nsXPCWrappedJS::cycleCollection::Traverse that calls into nsXPCOMCycleCollectionParticipant::CheckForRightISupports, that calls into some random unrelated function, or crashes with an invalid address.  Kind of weird that it seems to often sink into SVG code.  These may not all be related, but they look suspicious.

Bug 604865 is another instance of this, but has been resolved invalid, because the random code that CheckForRightISupports was crashing in does not exist any more.

Bug 633557, bug 547987 and bug 590721 also look a little suspicious, but the call stacks are not obviously mangled: there's a nsXPCWrappedJS::cycleCollection::Traverse that goes to a NoteXPCOMChild that crashes with a bad access.  Bug 607867 and bug 505016 involve stacks that look similar to that.
Blocks: 542078, 602772, 604176
Peter talked about one line of attack for fixing this would involve marking off-thread objects black or gray as you go, in XPCJSRuntime::TraceJS.  This would involve switching colors back and forth.

Unfortunately, Bill pointed out that this isn't sound.  If you have an object x reachable from both a gray and a black object, and you visit the gray one first, then x will end up getting colored gray, which is wrong.

Generally speaking, it is only safe to switch colors to gray once per GC.  I'd forgotten about that when I was talking to Peter.  This could be worked around by scanning the objects twice: once looking for things you want to mark black, and once for things you want to mark gray.  Kind of gross.  This could be made slightly less bad by creating a list of gray objects to scan later while looking for black objects.
(Bill is planning to eventually split up the XPConnect root marking into separate gray and black callbacks, so the color switching will all be done on the GC side, which will make the color switching stuff a little less footgun-y.)
Group: core-security
JSRuntime should be single threaded now, so hopefully not...
Peter, is this still a problem given that we only allow XPCWJS on the main thread?
Flags: needinfo?(peterv)
Sure shouldn't be...
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(peterv)
You need to log in before you can comment on or make changes to this bug.