Data race in Paris Bindings due to use of xpc::StringToJsval

RESOLVED FIXED in Firefox 26

Status

()

defect
RESOLVED FIXED
6 years ago
3 months ago

People

(Reporter: bent.mozilla, Assigned: mccr8)

Tracking

(4 keywords)

unspecified
mozilla28
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox25 wontfix, firefox26+ fixed, firefox27+ fixed, firefox28+ fixed, firefox-esr2426+ fixed, b2g18 wontfix, b2g-v1.1hd wontfix, b2g-v1.2 fixed)

Details

(Whiteboard: [qa-][adv-main26+][adv-esr24.2+])

Attachments

(2 attachments, 12 obsolete attachments)

1.39 KB, patch
jonco
: review+
Details | Diff | Splinter Review
6.75 KB, patch
bzbarsky
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
xpc::StringToJsval uses some static variables. When used from multiple threads this causes problems.

Here's the stack:


xul.dll!StringBufferToJSVal - xpcpublic.h:214
xul.dll!ReadableToJSVal - xpcstring.cpp:71
xul.dll!NonVoidStringToJsval - xpcquickstubs.cpp:791
xul.dll!NonVoidStringToJsval - xpcpublic.h:291
xul.dll!get_responseText - xmlhttprequestbinding.cpp:2492
xul.dll!genericGetter - xmlhttprequestbinding.cpp:2769
mozjs.dll!CallJSNative - jscntxtinlines.h:218
mozjs.dll!Invoke - interpreter.cpp:489
mozjs.dll!Invoke - interpreter.cpp:539
mozjs.dll!InvokeGetterOrSetter - interpreter.cpp:610
mozjs.dll!get - shape-inl.h:256
mozjs.dll!NativeGetInline<1> - jsobj.cpp:4029
mozjs.dll!GetPropertyHelperInline<1> - jsobj.cpp:4200
mozjs.dll!GetPropertyHelper - jsobj.cpp:4209
mozjs.dll!GetPropertyOperation - interpreter.cpp:292
mozjs.dll!Interpret - interpreter.cpp:2323
mozjs.dll!RunScript - interpreter.cpp:446
mozjs.dll!ExecuteKernel - interpreter.cpp:630
mozjs.dll!Execute - interpreter.cpp:667
mozjs.dll!Evaluate - jsapi.cpp:5156
xul.dll!WorkerRun - scriptloader.cpp:660
xul.dll!Run - workerprivate.cpp:1705
xul.dll!RunSyncLoop - workerprivate.cpp:3612
xul.dll!LoadAllScripts - scriptloader.cpp:717
xul.dll!Load - scriptloader.cpp:945
xul.dll!ImportScripts - workerscope.cpp:419
mozjs.dll!CallJSNative - jscntxtinlines.h:218
mozjs.dll!Invoke - interpreter.cpp:489
mozjs.dll!Interpret - interpreter.cpp:2484
mozjs.dll!RunScript - interpreter.cpp:446
mozjs.dll!ExecuteKernel - interpreter.cpp:630
mozjs.dll!Execute - interpreter.cpp:667
mozjs.dll!Evaluate - jsapi.cpp:5156

Marking s-s until we know otherwise.
Assignee

Updated

6 years ago
Keywords: csec-race
Yeah, this is bad.

Is there a convenient per-thread data structure we could stick this cache in or something?
Flags: needinfo?(bobbyholley+bmo)
I asked about that in #jsapi last week and apparently there isn't anything free
in JSRuntime or Zone or such, but such could be added.
Flags: needinfo?(luke)
Something on a Zone would be ideal, actually.  Right now StringBufferToJSVal has to check that the cached-string zone matches the context zone, but if I could just get the cached buffer of the zone directly that would rock.  That would still make it easy to clear the cache when strings are finalized, because I should be able to get to the Zone from the finalizer...
Seems fine to add a jsfriendapi.h js::(Get|Set)ZoneUserData for now.  If more cases like this come up, we'd probably want to be able to hang a general xpc::ZonePrivate struct off of the JS Zone (w/ hooks to create/destroy), but that seems like overkill atm.
Flags: needinfo?(luke)
(In reply to Luke Wagner [:luke] from comment #4)
> Seems fine to add a jsfriendapi.h js::(Get|Set)ZoneUserData for now.  If
> more cases like this come up, we'd probably want to be able to hang a
> general xpc::ZonePrivate struct off of the JS Zone (w/ hooks to
> create/destroy), but that seems like overkill atm.

If this is going to be used on multiple threads, I really don't want to get xpconnect involved.
Flags: needinfo?(bobbyholley+bmo)
Assignee

Comment 6

6 years ago
Zones can't migrate between runtimes, so presumably the ZoneUserData would be used for per-thread caching, so it should be okay for XPConnect to be involved.

Setting to high because it is probably difficult to exploit, but feel free to raise to critical if you want.
Keywords: sec-high
(In reply to Andrew McCreight [:mccr8] from comment #6)
> Zones can't migrate between runtimes, so presumably the ZoneUserData would
> be used for per-thread caching, so it should be okay for XPConnect to be
> involved.

Yes, I was talking about luke's suggestion for storing this information in a generalized xpc::ZonePrivate.
(In reply to Bobby Holley (:bholley) from comment #7)
> Yes, I was talking about luke's suggestion for storing this information in a
> generalized xpc::ZonePrivate.

... which would also hang off the zone (via a pointer in the Zone) and be only used on the JSRuntime's owner thread (so, off the Gecko main thread for workers).  Maybe your point is just that xpc:: is the wrong namespace ;)
(In reply to Luke Wagner [:luke] from comment #8)
> ... which would also hang off the zone (via a pointer in the Zone) and be
> only used on the JSRuntime's owner thread (so, off the Gecko main thread for
> workers).  Maybe your point is just that xpc:: is the wrong namespace ;)

Precisely. If it's general information, it should go in namespace js. If the values are embedder-dependent, we should have different machinery in xpc and mozilla::dom::workers.
I think having a mozilla::dom::ZoneUserData would be fine.  ;)
(In reply to Boris Zbarsky [:bz] from comment #10)
> I think having a mozilla::dom::ZoneUserData would be fine.  ;)

Yeah, that works too.
Assignee

Comment 12

6 years ago
If XPConnect is intended to be mainthread-only, maybe it makes sense to move all of these string functions into some other namespace.
(In reply to Andrew McCreight [:mccr8] from comment #12)
> If XPConnect is intended to be mainthread-only, maybe it makes sense to move
> all of these string functions into some other namespace.

Agreed. I have a heart attack every time I see "xpc" in dom/workers.
Assignee

Comment 14

6 years ago
This looks easy enough, I'll prototype a JS API and run it past Luke or Bill.
Assignee: nobody → continuation
Assignee

Updated

6 years ago
Depends on: 909490
Assignee

Comment 15

6 years ago
I noticed another problem while working on this.  If, at the start of a GC, we have an otherwise unreachable string in the cache, and during marking we hand it out, we can end up with a dead pointer in a live object, which can lead to a use-after-free.  The case where we allocate a string is safe, because we will never throw out a string allocated during the GC. 

The code currently deals with an equivalent problem during sweeping by calling ClearCache() during sweeping, so I can just fix this by moving ClearCache() to the start of GC rather than the start of sweeping.
Keywords: csec-uaf
Assignee

Comment 18

6 years ago
For some reason, after running mochitest browser Chrome for about 10 minutes, it pretty consistently ends up pulling a dead JSString out of the cache.
Attachment #810132 - Attachment is obsolete: true
Assignee

Comment 21

6 years ago
Thanks to Bill for spending a bunch of time helping me figure out why browser-chrome was hitting a use-after-free after ten minutes.  If the browser starts a GC, then puts something in the cache, then gets a reset GC, then the cache purging callback isn't triggered, and the string doesn't end up marked, so the GC will collect it, then it can get handed out again.  So, just a bug in my patch, not an existing GC problem.
Assignee

Updated

6 years ago
Depends on: 924706
Assignee

Comment 22

6 years ago
This whole thing is turning into kind of a mess.  I wonder how bad it would be to just disable this cache for workers, for backporting.
Having the cache just for main thread sounds ok to me.
Why would that be ok? We're trying to push more things on workers, and that will be much harder if they are fundamentally slower...
Assignee

Comment 25

6 years ago
Sure, for trunk (and whatever is the equivalent of b2g trunk) I'm going to fix it for real, with per-zone caching.  I guess none of the patches are particularly complex, so they shouldn't be very high risk, there's just a lot of them.  I'm concerned the most about bug 924706, which reworks how an infrequent GC is done.
Assignee

Comment 26

6 years ago
Bug 928066 looks like it is hitting this reproducibly, on B2G, with Ringmark.
Comment on attachment 819854 [details] [diff] [review]
only use the string cache on the main thread, at least semi-works

I can confirm that applying this patch fixes bug 924525.
Assignee

Updated

6 years ago
Blocks: 924525, 928066
Assignee

Updated

6 years ago
Keywords: crash
Have you tested the performance impact of this NS_IsMainThread() call on the hot path here?
Flags: needinfo?(continuation)
Assignee

Comment 33

6 years ago
No.  Do you have a suggestion for a microbenchmark that could test that?

For the bindings case, it looks like we codegen a version of eg get_responseText for workers and for non-workers, so we could add a template argument IsWorker or whatever and avoid any dynamic overhead.  I'm not sure if that's possible in the 10 or so places that call this method outside of bindings generated code, or if those places are more or less perf sensitive.
Flags: needinfo?(continuation)
We could just change the codegen to call a different function for worker string args.
> No.  Do you have a suggestion for a microbenchmark that could test that?

Sure.  getAttribute is a good testcase, when the attribute has a value.  Worth testing what the numbers look like without the patch, with the NS_IsMainThread() call, and with that call replaced by "false" just to see a baseline...

> We could just change the codegen to call a different function for worker string args.

No, because some bindings (e.g. EventTarget) are the same on workers and non-workers, and long term they all will be, no?
(In reply to On vacation Oct 12 - Oct 27 from comment #35)
> No, because some bindings (e.g. EventTarget) are the same on workers and
> non-workers, and long term they all will be, no?

Actually, I wasn't sure about that. I mean, we could have two bindings that are generated that have the same methods in most cases, but not in all, right?
We don't have plans to ever converge the bindings for XHR, though I suppose we could if we needed to by introducing some sort of interface class abstraction.
We can generate two separate EventTarget bindings, but we don't particularly want to.

In any case, I assume the main-thread-check bit is just for branches; I guess we could munge the codegen a bit for branches if we have to...
Assignee

Comment 39

6 years ago
I tried out a microbenchmark I found http://jsperf.com/demo-of-microbenchmark-silliness that happens to be by Boris (top search result on Google for "getAttribute microbenchmark"!).  The results are not great.

In a regular opt build, I get about 22.3 million ops/sec.  With the main thread checks, I get about 17.6 million ops/sec, so something like 20% slower.  With the cache disabled entirely, I get about 14 million ops/sec.  I'll try some tests with per-zone caching.
Assignee

Updated

6 years ago
Attachment #810848 - Attachment is obsolete: true
Assignee

Updated

6 years ago
Attachment #810850 - Attachment is obsolete: true
Assignee

Comment 40

6 years ago
With the zone cache patch, I'm seeing 20.5 million ops/sec.  On the fast path, there's an additional function call, JS_GetZoneUserData, and the additional indirection of reading data out of a struct.  I could maybe reduce the former by moving the zone data into the shadow zone, and inline JS_GetZoneUserData.
Assignee

Comment 41

6 years ago
Well, with further testing, the zone cache is about the same speed with my patch as the existing code, maybe a few % slower.  Inlining the zone data access seems to close the gap.

I tried caching the zone of the string, but that didn't seem to make it any faster.  Just zone caching might avoid crashes we're seeing, but of course it won't help the raciness.
Assignee

Comment 42

6 years ago
Boris, I assume the performance regressions in comment 39 for the patch with explicit main thread checks are unacceptable and I should pursue an alternate approach?
Flags: needinfo?(bzbarsky)
Assignee

Comment 43

6 years ago
I only just now noticed the part in comment 3 where we don't need to check that the string is from the right zone because we're pulling from a per-zone cache. Oops.  With this version of the patch, we get about the same performance on the getAttribute microbenchmark as without.  I tried inlining JS_GetZoneUserData but that didn't seem to improve performance with this version.  I'll attach that patch for posterity.

I'm also working on a patch to disable the cache on workers without using NS_IsMainThread, by passing around a boolean that indicates if we're on workers or not. I'll see what kind of effect that has on performance on the main thread.  As part of that, I'm planning to figure out in which version we started using this cache on workers, in the perhaps naive hope that we've only done it recently (eg not in b2g18).  The initial landing of the string cache is bug 773520, Firefox 18.  If b2g18 isn't affected by this problem, then I'm tempted to just use this version of the patch on branches.  Bill said he's not too worried about regressions from his GC patch.

Anyways, the regression from NS_IsMainThread is pretty bad, almost as bad as just disabling the cache, so I think we don't want to do that.
Attachment #810847 - Attachment is obsolete: true
Attachment #819854 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Comment on attachment 824390 [details] [diff] [review]
alternate patch, pass in a boolean everywhere to indicate if you use the cache or not, WIP

You could probably just define a 3-arg version that calls the 4-arg version with "true" and avoid most of the changes here, now that you've found all the bindings bits that use the function, right?

>+                # I'm not sure what these are, but they don't seem to be workerAndNotWorker.

They're descriptor providers as opposed to descriptors.  Think dictionary codegen (for a worker or not).  You want this to useCache = false, I believe.
Assignee

Comment 47

6 years ago
Posted file provider files (obsolete) —
Here's a list of files where we hit the provider case.  I'll set them to not use the cache, as bz suggested.  Hopefully nothing too critical in there.  No elements.
Yeah, those are all dictionaries containing strings, basically.
Assignee

Comment 49

6 years ago
Cleaned up a bit, including changes from bug 932505.
Attachment #824390 - Attachment is obsolete: true
Assignee

Comment 50

6 years ago
Here's another approach.  The goal is to have a version we are okay with on all branches.  It uses per-zone caching, but does not require modifications to the GC.  I still need to check the performance, but it is at least stable enough to run 40 minutes of Mochitests.

Instead of clearing the caches at the beginning of a GC, this version uses a two-pronged approach:

1) First, if we read from the cache during the GC, we mark the string black.  This handles the problem where a string is present only in the cache at the start of a GC, and we pull the string out during marking.  This is hopefully inexpensive in the common case, when we're not actively GCing. (This case is not currently handled by the cache.)

2) Second, at the start of sweeping, we clear the caches.  This is done currently, and handles the case where something is only present in the cache prior to sweeping, and it would get handed out during sweeping, when it becomes garbage.  Because we now cache per-zone, and sweeping is done in groups that don't include all zones, I had to add a new callback that clears the cache of a single zone.  We invoke that callback when we're sweeping the zone.
Assignee

Comment 51

6 years ago
Attachment #824103 - Attachment is obsolete: true
Attachment #824105 - Attachment is obsolete: true
Attachment #825381 - Attachment is obsolete: true
Attachment #825650 - Attachment is obsolete: true
Attachment #826838 - Attachment is obsolete: true
Assignee

Comment 53

6 years ago
Linux64 debug run was green: https://tbpl.mozilla.org/?tree=Try&rev=d66269c58900

Performance was about the same as the other implementation of per-zone string caching, which is 0-10% slower than the current implementation.
Assignee

Updated

6 years ago
Depends on: 934990
Assignee

Comment 54

6 years ago
Full try run was green: https://tbpl.mozilla.org/?tree=Try&rev=dcfdb01479f3
...modulo some Win-debug build bustage that is fixed by bug 934990: https://tbpl.mozilla.org/?tree=Try&rev=9094f41d50c6
Assignee

Comment 55

6 years ago
Comment on attachment 827057 [details] [diff] [review]
part 1: implement a read barrier for GC things

This is basically a hacked up version of ExposeGCThingToActiveJS, because we don't care about grayness for strings, and we're mostly concerned about whether something is alive at all.  Hopefully the GGC parts make sense.
Attachment #827057 - Flags: review?(jcoppeard)
Assignee

Updated

6 years ago
Attachment #827058 - Flags: review?(bzbarsky)
Comment on attachment 827057 [details] [diff] [review]
part 1: implement a read barrier for GC things

Review of attachment 827057 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #827057 - Flags: review?(jcoppeard) → review+
Comment on attachment 827058 [details] [diff] [review]
part 2: implement a per-zone string cache

r=me.  Thank you!
Attachment #827058 - Flags: review?(bzbarsky) → review+
Assignee

Comment 58

6 years ago
Comment on attachment 827058 [details] [diff] [review]
part 2: implement a per-zone string cache

(this approval request is for both patches)

[Security approval request comment]
How easily could an exploit be constructed based on the patch? There are two problems here. One, the race condition, can probably be found by anybody with TSAN, so I don't think that's an issue in terms of discoverability from the patch.  The other, the GC thing, isn't entirely obvious, but it is clear some GC stuff is being changed, which is a red flag.  It depends on GC timing, so I'm not sure how easy it would be to actually make an exploit.  Jesse hasn't fuzzed up a test case for this, with the additional GC control we have, which seems like a reasonably good sign for how hard it is to write an exploit.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? Everything.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  b2g18 will require some extra work, because it doesn't have bug 834877. But generally the code is fairly similar.

How likely is this patch to cause regressions; how much testing does it need?  This patch is a little tricky, but it is also code that is run a lot, so any regressions will hopefully be caught quickly.
Attachment #827058 - Flags: sec-approval?
Comment on attachment 827058 [details] [diff] [review]
part 2: implement a per-zone string cache

sec-approval+.

Let's make Aurora, Beta, and ESR24 patches for this once it is in.
Attachment #827058 - Flags: sec-approval? → sec-approval+
Target Milestone: --- → mozilla28
Can we get patches for uplift here (or noms on the existing patch if it applies?)
Flags: needinfo?(continuation)
Assignee

Comment 63

6 years ago
I started putting together backports, but I realized the fix here isn't quite right yet: in between when marking stops and an individual zone is swept, we can end up pulling dying objects out of the cache.  I'll try to think of how to fix this.
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #63)
> I started putting together backports, but I realized the fix here isn't
> quite right yet: in between when marking stops and an individual zone is
> swept, we can end up pulling dying objects out of the cache.  I'll try to
> think of how to fix this.

Andrew, we're going to build our second-to-last Beta for FF26 on Monday, where is this at?  What can we safely uplift or should we wontfix this for FF26 and get a proper fix into FF27?
Flags: needinfo?(continuation)
Assignee

Comment 65

6 years ago
Comment on attachment 827058 [details] [diff] [review]
part 2: implement a per-zone string cache

Yeah, I should just land this.  I'll file a followup for whatever other problem I think there may be.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): something last year
User impact if declined: possible sec problems
Testing completed (on m-c, etc.): it has been on m-c for a bit
Risk to taking this patch (and alternatives if risky): shouldn't be too bad
String or IDL/UUID changes made by this patch: none
Attachment #827058 - Flags: approval-mozilla-esr24?
Attachment #827058 - Flags: approval-mozilla-beta?
Attachment #827058 - Flags: approval-mozilla-aurora?
Assignee

Updated

6 years ago
Flags: needinfo?(continuation)
Assignee

Comment 66

6 years ago
(This also includes requesting approval for the various blocking bugs.)
Attachment #827058 - Flags: approval-mozilla-esr24?
Attachment #827058 - Flags: approval-mozilla-esr24+
Attachment #827058 - Flags: approval-mozilla-beta?
Attachment #827058 - Flags: approval-mozilla-beta+
Attachment #827058 - Flags: approval-mozilla-aurora?
Attachment #827058 - Flags: approval-mozilla-aurora+
Assignee

Comment 67

6 years ago
Ryan, I'll land these patches myself.  They require some tedious fixups for backporting.  Unfortunately, my patches are on a machine that is currently not responding to requests due to the Great MV Power Shutdown of 2013, but hopefully I can retrieve the patches in an hour or so.  If not, I'll start rebackporting.
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/70e625ea860f
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/50f1542687a6

Andrew, can you please prepare patches for b2g18 as well? Being a sec-high and tracking an ESR release, the fix needs to land there as well unless wontfixed.
Assignee

Comment 72

6 years ago
Oh, rats, I forgot that Zones didn't land until Fx22.  It isn't really going to be possible to backport this as is.  I'll have to think about that.  Maybe a per-JSRuntime cache will work.
pre-zone, why not just a per-compartment cache?
Assignee

Comment 74

6 years ago
(In reply to Boris Zbarsky [:bz] from comment #73)
> pre-zone, why not just a per-compartment cache?

Good idea.  I suppose two pointers per compartment won't break the bank.
Assignee

Comment 75

6 years ago
So, somehow or another way, on Esr24 I ended up only registering the new callbacks in XPCJSRuntime, which is bad (the other branches I've landed stuff for are post-CycleCollectedJSRuntime, and are ok).  Kyle, where should I register JS engine callbacks on workers, prior to your CycleCollectedJSRuntime changes?
Flags: needinfo?(khuey)
Assignee

Comment 76

6 years ago
Never mind, it looks like it is CreateJSContextForWorker.
Flags: needinfo?(khuey)
Assignee

Comment 78

6 years ago
Minor followup to register the zone callbacks on workers in ESR24...
https://hg.mozilla.org/releases/mozilla-esr24/rev/0457194b79d6
Bug file(s)? Test case(s)? Any other way to reproduce, test or verify?
Assignee

Comment 80

6 years ago
Not really, unfortunately.
Thanks Andrew. Marking qa- based on this info.
Whiteboard: [qa-]
Whiteboard: [qa-] → [qa-][adv-main26+][adv-esr24.2+]
Assignee

Comment 82

5 years ago
I think this late in the b2g18 cycle we shouldn't fix it there.  Given that I'd have to rewrite this patch from scratch, it would be risky, and I think it isn't as much of a problem on b2g18, as less things were converted to WebIDL there.
Group: core-security
Component: DOM → DOM: Core & HTML
Product: Core → Core
You need to log in before you can comment on or make changes to this bug.