Closed Bug 1487660 Opened 2 years ago Closed 2 years ago

AddressSanitizer: heap-use-after-free [@ IsArray] with READ of size 4 with IndexedDB (again)

Categories

(Core :: Storage: IndexedDB, defect, P2)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla64
Tracking Status
firefox-esr60 63+ verified
firefox62 --- wontfix
firefox63 + verified
firefox64 + verified

People

(Reporter: decoder, Assigned: janv)

References

(Blocks 1 open bug)

Details

(5 keywords, Whiteboard: DWS_NEXT[post-critsmash-triage][adv-main63+][adv-esr60.3+])

Attachments

(1 file)

The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 63.0a1-20180830010250-https://hg.mozilla.org/mozilla-central/rev/e547a1a4ac86d4725d37a57821a1b2979cda9844.

For detailed crash information, see attachment.


This bug as the same signature as bug 1478573, but that bug is fixed for almost a month now and this was reported on a recent build.
Needinfo for :janv as he fixed bug 1478573.
Flags: needinfo?(jvarga)
The only difference I can see to the trace in bug 1478573 is that this bug here has "RemoveElementsAtUnsafe" on the free'd stack while the other doesn't. This might also be an inlining difference, so it might well be that this is the same bug and we didn't fix it correctly.
One question I have is whether we were overly naive about the structured clone.  Specifically, since we're cloning into the same global rather than a sandbox global, could we still end up being the victim of prototype manipulation or things like that?  needinfo-ing :sfink for his JS expertise (and already having had to deal with the static analysis anger about our cloning mechanism here!)  (Also, we'll forever involve him anytime we touch code related to this... :)

If we weren't naive about the clone stopping content JS from running, then the options that jump to mind are:
- There was a path that avoided the clone occurring at all.  Since it's really only the index loop that does this, and that triggers a clone, I'm not sure enough about that.

I think one mitigation we can absolutely do without understanding the exact nuances of this is to cause the IndexedDB API to hard-fail on re-entrancy.  We should create an RAII thing increments/decrements a counter that causes all of the calls to createIndex/deleteIndex/etc. to fail when we've got an AddOrPut()/other on the stack.  If that's not standards compliant, we'll fix the standard or just deviate.  It's a major footgun.
Flags: needinfo?(sphink)
Group: core-security → dom-core-security
(In reply to Andrew Sutherland [:asuth] from comment #5)
> One question I have is whether we were overly naive about the structured
> clone.  Specifically, since we're cloning into the same global rather than a
> sandbox global, could we still end up being the victim of prototype
> manipulation or things like that?

I'm not really up to speed on what's going on here, but perhaps I can answer some things.

Structured cloning is generally pretty resistant to prototype manipulation, because cloning does not clone the prototype chain at all. You get plain Objects out, unless you're serializing one of the various known special cases based on the presence of "internal slots" in the spec, for things like Date/Array/Map/Set, etc. -- see the list at https://www.w3.org/TR/html5/infrastructure.html#section-structuredserializeinternal and note that it also handles serializable platform objects. Bleh, the spec has changed yet again; now they're specifying the abstract thing that gets created with a [[Type]] internal slot and things.

needinfo-ing :sfink for his JS expertise
> (and already having had to deal with the static analysis anger about our
> cloning mechanism here!)  (Also, we'll forever involve him anytime we touch
> code related to this... :)

I'm pretty much the main developer on the internal cloning algorithm that is (at least for now) implemented in the spidermonkey source tree, so yeah, sadly I'm probably a good person to needinfo for cloning stuff.

> If we weren't naive about the clone stopping content JS from running, then
> the options that jump to mind are:

Can you explain what you mean by that? The serialization procedure totally allows code to run, since the spec explicitly says to call getters -- see the last step (currently step 24, specifically 24.4) of https://www.w3.org/TR/html5/infrastructure.html#section-structuredserializeinternal

But I'm guessing you meant that accessing the cloned value won't run content JS? I *think* that is correct for all JS objects. I can't speak for the platform objects implemented in Gecko.

> - There was a path that avoided the clone occurring at all.  Since it's
> really only the index loop that does this, and that triggers a clone, I'm
> not sure enough about that.
> 
> I think one mitigation we can absolutely do without understanding the exact
> nuances of this is to cause the IndexedDB API to hard-fail on re-entrancy. 
> We should create an RAII thing increments/decrements a counter that causes
> all of the calls to createIndex/deleteIndex/etc. to fail when we've got an
> AddOrPut()/other on the stack.  If that's not standards compliant, we'll fix
> the standard or just deviate.  It's a major footgun.

Sorry, I'll need you to explain what's going on here so I won't exhaust any of my half dozen remaining brain cells. All I can see here is that GetJSValFromKeyPathString is triggering a getter that removes a key, and something else is trying to access that key. At what point is the clone happening? The reported stack is too shallow for me to tell, unless I'm misreading it.

If this were about plain JS objects, disaster would be avoided by (1) enumerating all the [[GetOwnPropertyKeys]] in one step before fetching any of them, and then checking each one's existence before serializing it; and (2) having the clone procedure omit any getter/setter information so the deserialization is safe. But before I could think about whether an analogous tactic of procedure would fix this case, I'd have to understand what this case is.
Flags: needinfo?(sphink)
ni?(asuth) for comment 6
Flags: needinfo?(bugmail)
(In reply to Steve Fink [:sfink] [:s:] from comment #6)
> > If we weren't naive about the clone stopping content JS from running, then
> > the options that jump to mind are:
> 
> Can you explain what you mean by that? The serialization procedure totally
> allows code to run, since the spec explicitly says to call getters -- see
> the last step (currently step 24, specifically 24.4) of
> https://www.w3.org/TR/html5/infrastructure.html#section-
> structuredserializeinternal
> 
> But I'm guessing you meant that accessing the cloned value won't run content
> JS? I *think* that is correct for all JS objects. I can't speak for the
> platform objects implemented in Gecko.

Yes, this is exactly what I mean.

The scenario with IndexedDB is that when we put()/add() an object, we structured-clone-serialize it to disk, which inherently means getters will trigger.  The wrinkle is that IndexedDB supports indices that are specified as a traversal of the JS object.  These can also trigger the getters, which can result in semantic problems if the getter returns different values for each invocation.  This is why the spec explicitly calls for cloning the value at step 10 of put(): https://w3c.github.io/IndexedDB/#dom-idbobjectstore-put (but we had not yet implemented)

We implemented the clone step recently to help fix security bug 1478573 under the guise of fixing the correctness problem of not doing the defensive clone.  (There are indeed WPT tests that verify a getter will only be invoked once per put() regardless of the presence of indices).  However, it appears our fix was insufficient and I'm trying to understand all the bases.


The heart of the security bug is this: there was a getter that ran JS code that changed the IndexedDB schema in the middle of a put() operation.  The lack of the defensive structured clone made this extra bad because it meant there were multiple points that this mutation could occur, some with the index's key traversal on the stack, which made creating a user-after free very possible.  In the specific repro, an index was deleted out from under the existing iteration and processing, resulting in a use-after-free.


The heart of our fix was this: Implement the defensive structured clone so that content code only runs at a known, controlled point, and that's before we get into the index traversals which is where we go wrong.  This gives content one chance to do jerk things before we start eentraining state.  One wrinkle is that we were clever and we won't trigger the defensive clone if we think there are no indices (and/or no primary-key getter, because you also can have the primary key extracted from the object), so that could be what sinks us.

Note that in our fix we did not do anything to address the problem where it's okay for the IndexedDB schema to change with the put() on the stack.  I'm proposing here we make that super illegal, it's just that it happens that that paints a pretty big bulls-eye on the security issue.


So the heart of my question is/was: Is it possible that the defensive clone was not inert, and that our index traversals were in fact still happening on an object capable of triggering content code via getters.  It sounded like your answer to this was roughly: structured clone produces inert objects that won't have any getters to be invoked if the structured clone algorithm is run on the object a second time, this should be safe.

But then you also said "(2) having the clone procedure omit any getter/setter information so the deserialization is safe" which confused me again.  But I think you were just trying to do some blue-sky sketching about how this could work without a structured clone since I didn't provide you with enough context about what was happening.  But you get a needinfo again for closure :)

Thank you for your feedback and apologies for not providing more initial context.


In the meantime, :janv, I'm assigning this to you for clarity as well :)
Assignee: nobody → jvarga
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
Ugh, and we have the answer in bug 1489020.  It is a prototype thing.  Please see the repro test case on that bug.  I suggest hitting "details" rather than accidentally viewing the page and crashing your content process.

:sfink, should we be creating our own sandbox or alternate global to clone into then?  I've cc'ed you there.

Also, :jvarga, I super think we should do the thing where we prevent the re-entrancy.

I'm not sure if we should dupe to that bug or not given that it has the repro, etc.
Flags: needinfo?(sphink)
See Also: → 1489020
Wow, having a test case is so helpful.
Andrew, thank you for all the info.
(In reply to Andrew Sutherland [:asuth] from comment #8)
> But then you also said "(2) having the clone procedure omit any
> getter/setter information so the deserialization is safe" which confused me
> again.  But I think you were just trying to do some blue-sky sketching about
> how this could work without a structured clone since I didn't provide you
> with enough context about what was happening.  But you get a needinfo again
> for closure :)

Oops, sorry, my thunderbird-based secure bugmail filter seems to have fallen asleep again, and I didn't see any of these updates.

My (1) (2) were both a description of how "normal" structured clone avoids getting into trouble, as well as a sketch of how this might be fixed. But the problem seems to be more subtle than I was thinking, so I don't think what I said is relevant.

In fact, if I am understanding things correctly (highly unlikely!), then this doesn't seem to have anything to do with structured clone at all? Oh, I guess that's arguable, if you think of structured clones as being completely safe regardless of their environment.

The sequence I see is:

    set up a nasty getter on Object.prototype's getter for "id".
    add a row to a DB, calling AddOrPut https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/IDBObjectStore.cpp#1631
    that makes its way to GetAddInfo
    clone the value -- I don't think any weird getters fire here? One would if the data had a getter, but that's commented out in the (great) example. https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/IDBObjectStore.cpp#1561
    call GetAddInfo
    loop over indexes, calling AppendIndexUpdateInfo https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/IDBObjectStore.cpp#1597
    one of the indexes is on "id", but the data does not have an id, so search goes to the prototype
    call the nasty getter
    the getter deletes the index
    I guess that causes the IndexedDB::Key under consideration to be deleted?
    aKey.AppendItem is called on the deleted key?

I guess to my eye, this is sort of an iterator invalidation problem when processing indexes. Which is what you said way back in comment 5. I guess the structured clone is relevant here, since we're looking for ways that content JS could run when inserting into a DB. And the structured clone *does* prevent eg making the data value be a Proxy that intercepts the property [[Get]] and trashes the state (deleting the index or whatever).

Which is good, because my first thought was to change https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/KeyPath.cpp#120 to call JS_HasOwnProperty first. Sadly, JS_HasOwnUCProperty seems to be missing -- but you don't want to call anything resembling JS_GetProperty anyway, so you should probably replace JS_GetUCProperty with JS_GetOwnUCPropertyDescriptor as in https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/base/ChromeUtils.cpp#230 and then use its value().

That seems like a valid change independent of other hardening you might do. JS_GetProperty is implicitly "given a property name, look through prototypes for it, calling [[Has]] and getters", which is far from "safely fetch the value of a property off of an object that is known to be simple and safe despite potential utter chaos everywhere around it."

I'm not much of an expert on the JavaScript execution model, though. I can't *think* of other ways to interfere here that would still work despite operating on a structured clone, at least for just getting the value. Once you have the value, though, Object.prototype.toString and Object.prototype.valueOf both seem pretty scary. As is substituting a Proxy for Object.prototype, but I don't think you can do that anymore?

Oh, and I'm not sure what indexing on "__proto__" or that path "id.__proto__" would do. I *think* you'd be ok with JS_GetOwnUCPropertyDescriptor.

needinfoing someone who is an expert on this JS goop. Waldo: what I'm seeing right now is a user-controlled sequence of: JS_GetUCProperty (which should die), then JS_HasUCProperty and JS_NewPlainObject and JS_DefineUCProperty and JS_DeleteUCProperty, with property names being user-controlled (and so potentially '__proto__' and/or 'prototype'). Is it possible for this to avoid running content JS code, or do we need a sandbox global to create these in (or some other way of making this safe)?
Flags: needinfo?(sphink) → needinfo?(jwalden+bmo)
I messed up the stack -- GetAddInfo shows up twice. Importantly, GetAddInfo contains the loop over the indexes, which is *after* the clone. So AFAICT you won't clone more than once, so getters during cloning should be all done by the time you start retrieving indexed values.
Priority: -- → P2
After discussing it with Waldo, it seems like it should work as long as you stick to APIs that do not consider the proto chain. So:

  - Do the clone before retrieving any index values (already done)
  - Do not use JS_GetProperty or any of its variants; stick with JS_GetOwnUCPropertyDescriptor.
  - Do not use JS_HasUCProperty (it looks at the proto); use JS_AlreadyHasOwnUCProperty (except JS_GetOwnUCPropertyDescriptor should remove the need for this JS_HasUCProperty/JS_GetUCProperty pair anyway.)
  - JS_DefineUCProperty is ok.
  - JS_DeleteUCProperty should be ok if the logic requires the property you're deleting to be an existing own property. To be extra safe, you could do a JS_AlreadyHasOwnUCProperty check, I guess.

The spec seems to handle this stuff: "Assertions can be made in the above steps because this algorithm is only applied to values that are the output of StructuredDeserialize and only access "own" properties." 

By http://w3c.github.io/IndexedDB/#evaluate-a-key-path-on-a-value step 4 final substep, https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/KeyPath.cpp#120 is just plain wrong. (The spec says HasOwnProperty; the code calls JS_HasUCProperty.) That change alone *might* be enough to fix everything here, though it seems better to eliminate the redundant lookups and by fetching the descriptor.

The GetAddInfoCallback also has the potential to do weird stuff. I guess it's doing another structured clone. https://searchfox.org/mozilla-central/rev/dd965445ec47fbf3cee566eff93b301666bda0e1/dom/indexedDB/IDBObjectStore.cpp#457 But that's on an already-cloned value? (I'm getting a little lost.) If so, then structured clone only works with own properties, so it seems ok.
Flags: needinfo?(jwalden+bmo)
Whiteboard: DWS_NEXT
Thank you for all the information, it now looks trivial to fix.
Ok, JS_GetOwnUCPropertyDescriptor fixes the problem with calling proto getters, however in some cases the method doesn't match
the JS_HasUCProperty/JS_GetUCProperty pair.

I don't know if this is on purpose or a bug in JS_GetOwnUCPropertyDescriptor implementation, but if call the method for an object that looks like this:
let obj = {
  foo: { id: 7 }
};
then the method reports that the "foo" property doesn't exist

If I call it for:
let obj = {
  foo: 5
};
then the method works as expected and I can get the foo property and its value.

Here's how I call the method:
JS::Rooted<JS::PropertyDescriptor> desc(aCx);
JS_GetOwnUCPropertyDescriptor(aCx, obj, keyPathChars, &desc);
if (desc.object) {
  // property exists
  doSomething(desc.value());
} else {
  // property doesn't exist
}
Flags: needinfo?(sphink)
Flags: needinfo?(sphink)
I solved the issue described in comment 15. I have it almost working except JS_GetUCPropertyDescriptor doesn't play well with DOM blob/file objects. It always returns "undefined" when trying to access the "length" or "type" property.
It seems I have to special case it.
Ok, so I just need to check if the descriptor is an accessor, if there's a getter and eventually call the getter.
This will be fixed in bug 1492737.
Flags: needinfo?(jvarga)
(In reply to Jan Varga [:janv] from comment #15)
> Here's how I call the method:
> JS::Rooted<JS::PropertyDescriptor> desc(aCx);
> JS_GetOwnUCPropertyDescriptor(aCx, obj, keyPathChars, &desc);
> if (desc.object) {
>   // property exists
>   doSomething(desc.value());
> } else {
>   // property doesn't exist
> }

Oops, sorry. I tangled up my bugmail and didn't see this.

It sounds like you figured it out, but you want desc.object(), not desc.object. desc.object is a weird accessor function thing defined by MutableWrappedPtrOperations for Rooted. Though I can't explain your observed behavior; I would expect it to be a compile error, or unconditionally true (if it somehow resolved to a method pointer??).

I'll go look at bug 1492737.
(In reply to Steve Fink [:sfink] [:s:] from comment #19)
> (In reply to Jan Varga [:janv] from comment #15)
> > Here's how I call the method:
> > JS::Rooted<JS::PropertyDescriptor> desc(aCx);
> > JS_GetOwnUCPropertyDescriptor(aCx, obj, keyPathChars, &desc);
> > if (desc.object) {
> >   // property exists
> >   doSomething(desc.value());
> > } else {
> >   // property doesn't exist
> > }
> 
> Oops, sorry. I tangled up my bugmail and didn't see this.
> 
> It sounds like you figured it out, but you want desc.object(), not
> desc.object. desc.object is a weird accessor function thing defined by
> MutableWrappedPtrOperations for Rooted. Though I can't explain your observed
> behavior; I would expect it to be a compile error, or unconditionally true
> (if it somehow resolved to a method pointer??).
> 
> I'll go look at bug 1492737.

I actually used desc.object(), but the problem was that I was passing |keyPathChar| that was a substring and I guess it wasn't null terminated sometimes. I solved it by adding a |namelen| parameter to JS_GetOwnUCPropertyDescriptor.
Should we close this bug now jan?
Flags: needinfo?(jvarga)
Yes, closing.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jvarga)
Resolution: --- → FIXED
Sounds like bug 1492737 needs to be uplifted to at least Beta? What about ESR60?
Depends on: 1492737
Flags: needinfo?(jvarga)
Target Milestone: --- → mozilla64
Group: dom-core-security → core-security-release
Yes, needs to be uplifted to Beta.
Flags: needinfo?(jvarga)
Unfortunately we don't have access to the testcase from bug 1489020 in order to reproduce this issue on old builds. Thus QA can't verify the fix in this bug. 
Can you help us with that Jan Varga? (if QA is needed to verify it).
Flags: needinfo?(jvarga)
(In reply to Bogdan Maris [:bogdan_maris], Release Desktop QA from comment #25)
> Unfortunately we don't have access to the testcase from bug 1489020 in order
> to reproduce this issue on old builds. Thus QA can't verify the fix in this
> bug. 
> Can you help us with that Jan Varga? (if QA is needed to verify it).

I cc'ed you on that bug.
Flags: needinfo?(jvarga)
Thanks. Marking it as qe-verify+
Flags: qe-verify+
Whiteboard: DWS_NEXT → DWS_NEXT[post-critsmash-triage]
Bug 1492737 is now landed across all branches.
Whiteboard: DWS_NEXT[post-critsmash-triage] → DWS_NEXT[post-critsmash-triage][adv-main63+][adv-esr60.3+]
I have managed to reproduce this crash using the test case from bug 1489020, on an affected asan Nightly build 2018-08-31.

I can no longer reproduce it on the latest asan builds under Ubuntu 16.04 x64: 60.3.0esr (20181016222912), 63.0 (20181016201042), and 64.0b2 (20181016172822).
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1489020
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.