Closed Bug 1573032 Opened 5 years ago Closed 4 years ago

Crash in [@ mozilla::dom::IDBCursor::Continue]

Categories

(Core :: Storage: IndexedDB, defect, P1)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1599420
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: philipp, Assigned: sg)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

Attachments

(2 obsolete files)

This bug is for crash report bp-92422d0a-a9af-4908-9361-d164e0190807.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::IDBCursor::Continue dom/indexedDB/IDBCursor.cpp:373
1 xul.dll static bool mozilla::dom::IDBCursor_Binding::_continue_ dom/bindings/IDBCursorBinding.cpp:552
2 xul.dll mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> dom/bindings/BindingUtils.cpp:3181
3 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:540
4 xul.dll static bool Interpret js/src/vm/Interpreter.cpp:3089
5 xul.dll js::RunScript js/src/vm/Interpreter.cpp:425
6 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:568
7 xul.dll js::Call js/src/vm/Interpreter.cpp:611
8 xul.dll JS::Call js/src/jsapi.cpp:2659
9 xul.dll mozilla::dom::EventHandlerNonNull::Call dom/bindings/EventHandlerBinding.cpp:267

crash reports with this signature have been recorded since firefox 65, they seem to have become more frequent across multiple release channels starting three weeks ago.

some comments and submitted urls indicate that crashes would be happening while scrolling through twitter, but there are also multiple other domains showing up in crash stats.

Group: core-security → dom-core-security

Simon, do you think you can help take a look at this? Thank you.

https://wiki.mozilla.org/Security/Firefox_security_bug_fixing is a nice read regards dealing with a security bug.

Flags: needinfo?(sgiesecke)
Priority: -- → P1

(In reply to Olli Pettay [:smaug] from comment #1)

Hmm, raw pointer, and probably the comment isn't valid
https://searchfox.org/mozilla-central/rev/3366c3d24f1c3818df37ec0818833bf085e41a53/dom/indexedDB/IDBCursor.h#65-66

Is there a particular reason why you think the comment isn't valid? I also think that might be the case, so if you have the same thought it might be good to confirm this.

I found that:

  • The comment seems to imply that IDBCursor::mTransaction is owned either by IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex
  • If that were true, IDBCursor::mSourceIndex, IDBCursor::mSourceObjectStore, IDBIndex::mObjectStore and IDBObjectStore::mTransaction are all RefPtr's which are never modified. They could be const except for syntactical reasons (see Bug 1575173). So unless they are invalid from the time of construction, they cannot become invalid during the lifetime of an IDBCursor.
  • However, IDBCursor::mTransaction is not retrieved from the IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex, but from the mRequest, and IDBRequest::mTransaction indeed is modified. In particular,

I am not so familiar yet with this. Is the IDBRequest::mTransaction expected to be the same as the IDBTransaction that is owned either by IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex?

Flags: needinfo?(sgiesecke) → needinfo?(bugs)

I'm wrong person to ask really. Perhaps the comment as such is valid, but it isn't enough to guarantee that mTransaction doesn't point to a deleted object. At least the stack trace strongly hints it is mTransaction which is the deleted object.

Flags: needinfo?(bugs)

I'm wrong person to ask really (janv, asuth, ytausky would know better). Perhaps the comment as such is valid, but it isn't enough to guarantee that mTransaction doesn't point to a deleted object. At least the stack trace strongly hints it is mTransaction which is the deleted object.

The comment is probably correct, but we had a similar problem when we had to add a temporary strong ref to mTransaction. See bug 1502871.

FWIW, this doesn't look like kungfuDeathGrip type of variable would help (like in bug 1502871). JS has access to IDBCursor and uses it when the native object already has dead mTransaction.

I have no access to bug 1502871.

Jan, could you give me access to that and comment on the expectation?

(In reply to Simon Giesecke [:sg] [he/him] from comment #3)

I am not so familiar yet with this. Is the IDBRequest::mTransaction expected to be the same as the IDBTransaction that is owned either by IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex?

Flags: needinfo?(jvarga)

Also, why are there fields that are cycle-collected but not unlinked. e.g. here: https://searchfox.org/mozilla-central/source/dom/indexedDB/IDBCursor.cpp#871 There is a comment at https://searchfox.org/mozilla-central/source/dom/indexedDB/IDBCursor.cpp#889 implying this is done on purpose. but it does not explain the reason.

And, shouldn't the cycle collection be removed as well if there is no unlink? cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1575173

Assignee: nobody → htsai

Simon has been looking into this. Thanks!

Assignee: htsai → sgiesecke
Flags: needinfo?(bugmail)

Unfortunately I'm not sure why that was done. Jan might have a better idea. I think it's probably a fairly safe assumption invariant-wise that:

  1. The cycle collection choices were made assuming that IDB transactions and requests would always make forward progress and so their cycle collection wasn't necessarily a big concern.
  2. This stuff was probably not entirely thought out, or at least not super thoroughly updated each time. It's reasonable to revisit from first principles.

Adding needinfo to Jan who might have better historical information about what was happening. My research hit a brick wall on this for history purposes.

Flags: needinfo?(bugmail)

It's done that way because we need to keep alive the actor tree, so we don't want to unlink parent objects.
The cycle collector needs to know about the parent objects despite they won't be unlinked by us.
See also bug 786771.

Flags: needinfo?(jvarga)

(In reply to Simon Giesecke [:sg] [he/him] from comment #3)

(In reply to Olli Pettay [:smaug] from comment #1)

Hmm, raw pointer, and probably the comment isn't valid
https://searchfox.org/mozilla-central/rev/3366c3d24f1c3818df37ec0818833bf085e41a53/dom/indexedDB/IDBCursor.h#65-66

Is there a particular reason why you think the comment isn't valid? I also think that might be the case, so if you have the same thought it might be good to confirm this.

I found that:

  • The comment seems to imply that IDBCursor::mTransaction is owned either by IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex
  • If that were true, IDBCursor::mSourceIndex, IDBCursor::mSourceObjectStore, IDBIndex::mObjectStore and IDBObjectStore::mTransaction are all RefPtr's which are never modified. They could be const except for syntactical reasons (see Bug 1575173). So unless they are invalid from the time of construction, they cannot become invalid during the lifetime of an IDBCursor.
  • However, IDBCursor::mTransaction is not retrieved from the IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex, but from the mRequest, and IDBRequest::mTransaction indeed is modified. In particular,

I am not so familiar yet with this. Is the IDBRequest::mTransaction expected to be the same as the IDBTransaction that is owned either by IDBCursor::mSourceObjectStore or the IDBIndex::mObjectStore of the IDBCursor::mSourceIndex?

I am still missing feedback on this, in particular if my expectation in the last sentence is valid. That was actually the central question I had on this bug.

Flags: needinfo?(jvarga)

Yes, the transaction should be the same. The stack trace resembles bug 1502871. You may want to look at bug 1539219 too. I don't say it's the same issue, but you can get more info there how these things work.

Flags: needinfo?(jvarga)

I found that the address that is trying to be read is 0xe5e5e679, which looks suspiciously like an offset to the 0xe5e5e5e5 pattern that is written to deallocated memory, which could indicate that it is the IDBCursor object itself that is no longer valid, rather than the object pointed to by IDBCursor::mTransaction, when trying to call mTransaction->IsOpen(). However, it is not clear if the call to mTransaction->IsOpen() is inlined, so it might also have happened with IDBTransaction::IsOpen(). Is there a way to retrieve the disassembly of the code at the offending location?

We are in the middle of the onsuccess handler to the response received for an IDBCursor.open or IDBCursor.continue/advance operation. Unfortunately, I cannot see from the crash report what JS code this handler was running. Is there a way to get this information?

Flags: needinfo?(bugmail)

C++ disassembly, yes, but it involves manual steps. See my recent tweet at https://twitter.com/asutherland/status/1184583950573940742 for general context. You'd want to download the minidump by having crash-stats private data access bits set. Because the builds were public, there's no need to download the symbols manually because the tool in question knows how to consult the mozilla symbol server and it will do it by default. (So just use the single-argument form that takes the minidump as the sole argument. If you specify a symbol directory path it will inhibit automatic symbol fetching from the mozilla symbol server.)

JS backtrace: Not reliably. The raw minidump includes some stack memory excerpts and some other tiny memory excerpts, but we don't attempt to perform a JS backtrace when we encounter crashes[1], and I think you would need to manually extract any such information. (But I think the file/function names would be on the heap and what was on the stack would only be pointers into that data we don't have.)

1: We could potentially file a bug on this or find an existing bug and attempt to escalate, but we are unlikely to want to get JS backtraces of content because of privacy concerns. Backtraces of Chrome JS could be fine, but that's usually not helpful for IndexedDB.

Flags: needinfo?(bugmail)
Flags: needinfo?(sgiesecke)

You marked this as sec-high above, but I don't think this is exploitable. As said in https://bugzilla.mozilla.org/show_bug.cgi?id=1573032#c15, the address trying to be dereferenced is an offset from the 0xe5e5e5e5 pattern whose purpose is to make such errors non-exploitable. I think this should be reclassified as sec-other.

Flags: needinfo?(bugs)

Pretty sure we classify this kind of crashes usually as sec-high - accessing deleted object is bad.

Flags: needinfo?(bugs)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #16)

C++ disassembly, yes, but it involves manual steps. See my recent tweet at https://twitter.com/asutherland/status/1184583950573940742 for general context.

You'd want to download the minidump by having crash-stats private data access bits set.

I am not sure what this means. I do not seem to have the necessary permissions to download the raw minidump. How do I get those?

Flags: needinfo?(sgiesecke) → needinfo?(bugmail)

I was now able to get the minidump and it shows this:

[simon@sigibln rust-minidump]$ target/debug/get-minidump-instructions ~/work/mozilla-unified/dumps/04ae0751-2a42-40a0-9b0e-4859c0191028.dmp 
Faulting instruction pointer: 0x7fff324cfb45
https://hg.mozilla.org/releases/mozilla-beta/annotate/d5a8eb9b5da997aa995a4aa6199ba37bbb581063/dom/indexedDB/IDBCursor.cpp#l373
  373   if (!mTransaction->IsOpen()) {
   7fff324cfb45:	8b 85 f0 00 00 00    	mov eax, dword ptr [rbp + 0xf0]
^^^^^^^^^^^^^^^^	^^^^^^^^^^^^^^^^^^^^^	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   7fff324cfb4b:	85 c0                	test eax, eax
   7fff324cfb4d:	74 0e                	je 0x7fff324cfb5d
   7fff324cfb4f:	83 f8 01             	cmp eax, 1
   7fff324cfb52:	75 7d                	jne 0x7fff324cfbd1
   7fff324cfb54:	80 bd f8 00 00 00 00 	cmp byte ptr [rbp + 0xf8], 0
   7fff324cfb5b:	74 66                	je 0x7fff324cfbc3
  378   if (IsSourceDeleted() || !mHaveValue || mContinueCalled) {
   7fff324cfb5d:	41 8b 86 28 01 00 00 	mov eax, dword ptr [r14 + 0x128]

This code apparently results from inlining IDBTransaction::IsOpen (https://searchfox.org/mozilla-central/source/dom/indexedDB/IDBTransaction.cpp#442). rbp is 0xe5e5e5e5. Unfortunately you don't see the origin of this value in this disassembly fragment, but since 0xf8 is the offset of mReadyState within IDBTransaction in my build (it's off by 8 bytes actually, but I suspect this is due to me having a debug build and the dump being from a release build), this means that the mTransaction member of this (an IDBCursor) has been overwritten with 0xe5e5e5e5, i.e. the IDBCursor itself has been deleted already. The cause is not that the referenced mTransaction has been deleted (the current transaction might have been deleted as well, but this is irrelevant to the problem).

Still, I don't know what to do with this information. From the dump, one can only see that an IDBCursor.Continue operation is called from a onsuccess handler for some IDBCursor request, which is a very common pattern. Usually, this would be the same cursor, but we don't know if this is the case here. In any case, the JS runtime calls a bound method on a deleted IDBCursor object. I don't know if this is anything that is under control of the dom/indexedDB code. It might result from a state corruption that is not related to IndexedDB at all. However, I have no idea how to proceed to reproduce this without knowing which JS code is being executed.

Flags: needinfo?(bugmail)

We have URLs only in ca. 10% of the cases, but several of them involve twitter, and there is also one user comment which says that it happened when just looking at the tweet feed. I tried the Nightly ASAN build and played around with twitter, but wasn't able to reproduce it. The number of occurrences is also very low, so this probably only happens under very specific circumstances.

I was thinking it might have to do with some extension, but I didn't find one that seemed generally uncommon but common to the crash reports on the other hand.

Also note that that there are different crash signatures (different source code lines within IDBCursor::Continue), but they might still have the same cause, depending on whether the deallocated IDBCursor instance has been reclaimed in the meantime.

Status: NEW → ASSIGNED
Flags: needinfo?(ttung)

Maybe sync XHR is causing this somehow ?

Re: "Unfortunately you don't see the origin of this value in this disassembly fragment", I reverted https://github.com/luser/rust-minidump/commit/aaa0e8cff6d4ec63cbdccd2a1b4df796dec35b24 locally in order to get the disassembly of the whole thing.

A syncloop via XHR does seem like a viable way to try and violate expected lifecycle guarantees.

Flags: needinfo?(bugmail)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #25)

Re: "Unfortunately you don't see the origin of this value in this disassembly fragment", I reverted https://github.com/luser/rust-minidump/commit/aaa0e8cff6d4ec63cbdccd2a1b4df796dec35b24 locally in order to get the disassembly of the whole thing.

Yes, I did something equivalent to that too.

A syncloop via XHR does seem like a viable way to try and violate expected lifecycle guarantees.

ttung and me tried but were not able to reproduce the error.

ttung suggested that this might have something to do with errors in cycle collection.

I am unassigning myself from this, but I would be happy to pair with whoever has ideas on how to approach this further.

Assignee: sgiesecke → nobody
Status: ASSIGNED → NEW

(In reply to Simon Giesecke [:sg] [he/him] from comment #26)

ttung suggested that this might have something to do with errors in cycle collection.

Can you please sync up with smaug or mccr8 about this potential?

Flags: needinfo?(ttung) → needinfo?(sgiesecke)

(In reply to Andrew Overholt [:overholt] from comment #27)

(In reply to Simon Giesecke [:sg] [he/him] from comment #26)

ttung suggested that this might have something to do with errors in cycle collection.

Can you please sync up with smaug or mccr8 about this potential?

Simon discussed it with me and I will ask them about this. Redirect the ni to myself.

Flags: needinfo?(sgiesecke) → needinfo?(ttung)

(In reply to Tom Tung [:tt, :ttung] from comment #28)

Simon discussed it with me and I will ask them about this. Redirect the ni to myself.

I suggested Simon maybe investigating in the direction of CC, refcounting of IDBCursor, but I think I found something after I looking into the reports deeper.

I read one of the crash reports on Nightly (https://crash-stats.mozilla.org/report/index/04ae0751-2a42-40a0-9b0e-4859c0191028) and opened the mini dump. And I found:
PBackgroundCursorChild::mTransaction mRawPtr=NULL note that it should be held here [1; crash stack 24].
PBackgroundCursorChild::mCursor mRawPtr=NULL.

Also, because from the crashing memory addresses for all the member variables of the IDBCursor are 0xe5e5e5e5e5 (poison) (e.g. IDBCursor::mTransaction, IDBCursor::mBackgroundActor, ...). I assume IDBCursor had been deleted before it crashed. From this point, I can assume IDBCurosr::mStrongCursor wasn't set with the expected mCursor.

Thus, I'm suspecting either BackgroundCursorChild::ActorDestroy() or BackgroundCursorChild::SendDeleteMeInternal() is called before BackgroundCursorChild::SendContinueInternal() is called (because this is the only place where we set the mStrongCursor) .

I'll keep looking into this.

[1] https://searchfox.org/mozilla-central/rev/a78233c11a6baf2c308fbed17eb16c6e57b6a2ac/dom/indexedDB/ActorsChild.cpp#3716

I found out:

There might be multiple issues invlove so we probably need to fix that one by one.

To verify my thoughts, we probably want to ensure that by changing the MOZ_ASSERT_IF for mCursor and mStrongCursor to MOZ_DIAGNOSTIC_ASSERT_IF.

Also, it would be great if we can add flags to understand if one of BackgroundCursorChild::ActorDestroy() or BackgroundCursorChild::SendDeleteMeInternal() is called.

One more thing I found is that the crash rate is dropped a bit after Nov. That probably indicate Simon's works on IDBCursor might fix or reduce the original issues.

Flags: needinfo?(ttung)

I think this was a flaw in the old/original cursor preloading implementation and it was probably copied from there. Take a look at the old patch Part 3:
https://bug1168606.bmoattachments.org/attachment.cgi?id=8627381

|nsRefPtr<IDBCursor> newCursor| used to be placed in the main scope (the same scope in which DispatchSuccessEvent is called), so it was ok because the new cursor was kept alive until DispatchSuccessEvent called ResultHelper::GetResult which called GetOrCreateDOMReflector which created a DOM binding for that so it took over the ownership of the new cursor object.

However, the original crash reported in comment 0 was probably a different issue.

(In reply to Jan Varga [:janv] from comment #32)

However, the original crash reported in comment 0 was probably a different issue.

Or maybe not, patches for cursor preloading started landing roughly 4 months ago and this bug was filed 4 months ago too.

(In reply to Jan Varga [:janv] from comment #31)

Yeah, but the crash also happens on esr68 and even earlier. So I wonder if there are multiple causes.

There might be other places with similar or different issues which result in this crash signature, but I think it's clear now that the newCursor RefPtr is misplaced and we can fix that right away.

(In reply to Jan Varga [:janv] from comment #35)

There might be other places with similar or different issues which result in this crash signature, but I think it's clear now that the newCursor RefPtr is misplaced and we can fix that right away.

Yes, this should be fixed, and I already have a patch prepared for that. But I am not sure if this is really related to this bug.

Assignee: nobody → sgiesecke

Comment on attachment 9111558 [details]
Bug 1573032 - Added assertions to diagnose crashes. r=#dom-workers-and-storage

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Unsure. On the one hand, the patch doesn't fix the original issue, but merely increases the chance that it will be caught earlier by diagnostic assertions. Since we have been unable to reproduce the issue, we hope to get more information from this.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?: esr68, 70, 71
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Technically, it would probably be easy to do that. However, without change the diagnostic assertions to release assertions, this wouldn't have any effect. If we find the original cause on Nightly using this patch, we need to see then how hard it is to backport the patch. If this doesn't work out, maybe because the issue was resolved by other recent changes on Nightly, we might actually need to backport and enable release assertions to find the original cause.
  • How likely is this patch to cause regressions; how much testing does it need?: Pending final review, this is not likely. Theoretically, the additional assertions might break rare but valid use cases.
Attachment #9111558 - Flags: sec-approval?

Andrew, can you please land the patch D54703 provided you are ok with it and we get sec-approval?

Flags: needinfo?(bugmail)

Yes. (Clearing needinfo so I can be re-needinfo'ed if needed.)

Flags: needinfo?(bugmail)

(In reply to Simon Giesecke [:sg] [he/him] (PTO until Jan 5th) from comment #17)

the address trying to be dereferenced is an offset from the 0xe5e5e5e5 pattern whose purpose is to make such errors non-exploitable.

If only that worked! The actual purpose is to make Firefox crash more often in recognizable ways so we can fix these bugs. If we didn't clear memory using this pattern then a use-after-free might reference the previous dead object, with fairly reasonable values. The program might continue completely unhindered, or it might carry on for a while and then crash somewhere else mysteriously far from the actual problem. The exploitable case is when an attacker can cause something "useful" to be written to that memory before the use-after-free. That would overwrite our 0xe5e5e5e5 pattern so there's no actual protection provided. When we see the 0xe5e5.... pattern we know there hasn't been an overwrite before the re-use, but that doesn't tell us whether it can happen or not. Since we've often been proved wrong when we assume it's not possible it's better to just fix them.

Comment on attachment 9111558 [details]
Bug 1573032 - Added assertions to diagnose crashes. r=#dom-workers-and-storage

Withdrawing sec-approval request, as the underlying issue appears to have been solved by other changes.

Attachment #9111558 - Flags: sec-approval?
Attachment #9111558 - Attachment is obsolete: true

Comment on attachment 9105795 [details]
Bug 1573032 - Use CheckedUnsafePtr for IDBCursor::mTransaction to improve safety r=ttung

Revision D51411 was moved to bug 1608115. Setting attachment 9105795 [details] to obsolete.

Attachment #9105795 - Attachment is obsolete: true

There are no new reports from 72 beta/release since the beginning of December, so this has probably been solved by other changes, probably from Bug 1599420.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE

Should we backport the changes from 1599420 to FF71?

Flags: needinfo?(lhenry)

No, since 72 was just released a couple of days ago, we aren't making any more changes to 71.
But we might want to backport to esr68.

Flags: needinfo?(lhenry) → needinfo?(sgiesecke)

I think the issue from esr68 (which is much more low volume than on FF71) must be a very different issue, if the assumption that Bug 1599420 solved this is correct, since esr68 didn't have the code that was fixed by Bug 1599420 at all. Is it possible to create a new bug for this crash signature for esr68 only?

Flags: needinfo?(sgiesecke)
Group: dom-core-security → core-security-release

OK, I filed 1608580 as the new bug to track ESR crashes. Thanks!

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: