Closed Bug 795281 Opened 12 years ago Closed 12 years ago

Read after free in nsXPCWrappedJS::Release()

Categories

(Toolkit :: Safe Browsing, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 19
Tracking Status
firefox17 + fixed
firefox18 + fixed
firefox19 + fixed
firefox-esr10 --- wontfix

People

(Reporter: jseward, Assigned: gcp)

References

Details

(Keywords: csectype-uaf, sec-moderate, valgrind, Whiteboard: [adv-track-main17-])

Attachments

(1 file, 1 obsolete file)

I've seen this twice now, on m-c startups on Android (4.0).  Sorry for no
line numbers.  I will try to get a line-numberised version.

I couldn't figure out what Component to select.

Thread 33:
Invalid read of size 4
   at 0x256A5922: nsXPCWrappedJS::Release() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A6AF8F: nsXPTCStubBase::Release() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2579531D: UrlClassifierUpdateObserverProxy::Release() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25159D15: nsRefPtr<nsCaret>::assign_with_AddRef(nsCaret*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2578F613: nsUrlClassifierDBServiceWorker::FinishUpdate() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2500DDB9: nsRunnableMethodImpl<void (nsPACMan::*)(), true>::Run() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A5CB97: nsThread::ProcessNextEvent(bool, bool*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A32F75: NS_ProcessNextEvent_P(nsIThread*, bool) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A5C98F: nsThread::ThreadFunc(void*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x24D00C47: _pt_root (ptthread.c:156)
   by 0x482418F: __thread_entry (pthread.c:217)
   by 0x4823CC7: pthread_create (pthread.c:357)
   by 0xFFFFFFFF: ???
 Address 0x20a7b600 is 32 bytes inside a block of size 60 free'd
   at 0x4806184: free (vg_replace_malloc.c:446)
   by 0x24C10673: moz_free (mozalloc.cpp:51)
   by 0x256A6573: nsXPCWrappedJS::~nsXPCWrappedJS() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x256A593D: nsXPCWrappedJS::Release() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2554D95F: nsProxyReleaseEvent::Run() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A5CB97: nsThread::ProcessNextEvent(bool, bool*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A32F75: NS_ProcessNextEvent_P(nsIThread*, bool) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2592B369: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A87B2D: MessageLoop::RunInternal() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A87B3B: MessageLoop::RunHandler() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x25A87C3B: MessageLoop::Run() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2588CC69: nsBaseAppShell::Run() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x2577A4C9: nsAppStartup::Run() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x24FAE369: XREMain::XRE_mainRun() (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x24FB0821: XREMain::XRE_main(int, char**, nsXREAppData const*) (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
   by 0x24FB0A39: XRE_main (in /data/data/org.mozilla.fennec_sewardj/cache/libxul.so)
And now with line numbers:

Thread 33:

Invalid read of size 4
   at 0x30083922: nsXPCWrappedJS::Release() (ff-opt/js/xpconnect/src/../../../dist/include/nsISupportsImpl.h:268)
   by 0x30448F8F: nsXPTCStubBase::Release() (xpcom/reflect/xptcall/src/xptcall.cpp:33)
   by 0x3017331D: UrlClassifierUpdateObserverProxy::Release() (ff-opt/toolkit/components/url-classifier/../../../dist/include/nsCOMPtr.h:408)
   by 0x2FB37D15: nsRefPtr<nsCaret>::assign_with_AddRef(nsCaret*) (xpcom/base/nsAutoPtr.h:862)
   by 0x3016D613: nsUrlClassifierDBServiceWorker::FinishUpdate() (ff-opt/toolkit/components/url-classifier/../../../dist/include/nsCOMPtr.h:622)
   by 0x2F9EBDB9: nsRunnableMethodImpl<void (nsPACMan::*)(), true>::Run() (ff-opt/xpcom/threads/../../dist/include/nsThreadUtils.h:349)
   by 0x3043AB97: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:612)
   by 0x30410F75: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:220)
   by 0x3043A98F: nsThread::ThreadFunc(void*) (xpcom/threads/nsThread.cpp:256)
   by 0x2F1A5C47: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:156)
   by 0x482418F: __thread_entry (/home/sewardj/AND4-xxray/bionic/libc/bionic/pthread.c:217)
   by 0x4823CC7: pthread_create (/home/sewardj/AND4-xxray/bionic/libc/bionic/pthread.c:357)
   by 0xFFFFFFFF: ???

 Address 0x21000b20 is 32 bytes inside a block of size 60 free'd
   at 0x4806184: free (coregrind/m_replacemalloc/vg_replace_malloc.c:446)
   by 0x226BE673: moz_free (memory/mozalloc/mozalloc.cpp:51)
   by 0x30084573: nsXPCWrappedJS::~nsXPCWrappedJS() (ff-opt/js/xpconnect/src/../../../dist/include/mozilla/mozalloc.h:224)
   by 0x3008393D: nsXPCWrappedJS::Release() (js/xpconnect/src/XPCWrappedJS.cpp:197)
   by 0x2FF2B95F: nsProxyReleaseEvent::Run() (ff-opt/xpcom/build/nsProxyRelease.cpp:19)
   by 0x3043AB97: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:612)
   by 0x30410F75: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:220)
   by 0x30309307: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (ipc/glue/MessagePump.cpp:82)
   by 0x30465B2D: MessageLoop::RunInternal() (ipc/chromium/src/base/message_loop.cc:208)
   by 0x30465B3B: MessageLoop::RunHandler() (ipc/chromium/src/base/message_loop.cc:201)
   by 0x30465C3B: MessageLoop::Run() (ipc/chromium/src/base/message_loop.cc:175)
   by 0x3026AC69: nsBaseAppShell::Run() (widget/xpwidgets/nsBaseAppShell.cpp:163)
   by 0x301584C9: nsAppStartup::Run() (toolkit/components/startup/nsAppStartup.cpp:290)
   by 0x2F98C369: XREMain::XRE_mainRun() (toolkit/xre/nsAppRunner.cpp:3782)
   by 0x2F98E821: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3848)
   by 0x2F98EA39: XRE_main (toolkit/xre/nsAppRunner.cpp:3923)
   by 0x2F992BF7: GeckoStart (toolkit/xre/nsAndroidStartup.cpp:73)
   by 0x223C6C5D: Java_org_mozilla_gecko_GeckoAppShell_nativeRun (mozglue/android/APKOpen.cpp:981)
   by 0x540EBF3: dvmPlatformInvoke (/home/sewardj/AND4-xxray/dalvik/vm/arch/arm/CallEABI.S:258)
Guessing xpconnect.
Component: General → XPConnect
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #2)
> Guessing xpconnect.

Only as much as a null deref of an nsCOMPtr is an XPCOM bug. I'm pretty sure this is a bug in the url classifier.
Component: XPConnect → Phishing Protection
Product: Core → Firefox
Guessing url classifier as being part of Firefox -> Phishing Protection then.
lowering to sec-moderate on the theory that users can't directly control when the URLClassifier does it's thing.
I am a bit concerned about this, not because it is exploitable necessarily,
but because I suspect that Valgrind is complaining about the "--mRefCnt;" 
and I suspect also that V's duplicate-error-removal machinery causes it to
detect but not report the write to the freed block caused by the --mRefCnt.

Hence I am concerned that this is modifying the block and could be a potential
crasher.
Speaking from a position of complete ignorance about this part of the
FX code base .. one reason I have a bad feeling about this is that it
looks like there's more than one thread involved.  That is, it looks
like the block was freed by the main thread, but is later read by the
URL classifier thread?  Hence perhaps there is a threading issue as
well / instead of a memory management issue.

Just guessing here, you understand.
I wonder if it's related to bug 798778. 

    mClassifier->MarkSpoiled(mUpdateTables);
  }
  mUpdateObserver = nullptr;

The last line is the suspect one in this bug, and the one above is crashing in that bug (rather mysteriously so).
nsProxyReleaseEvent::Run is in the trace. That's interesting, because it's only used here:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#840

which doesn't have anything to do with UrlClassifierUpdateObserverProxy::Release

They're consecutive in memory:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#175

Is something else entirely just scribbling over them?
(In reply to Gian-Carlo Pascutto (:gcp) from comment #9)

If I had to guess (with an emphasis on the word guess, btw), I'd say
this is some kind of refcounting snafu that has lead to the same
object being Release()d one too many times.  The reason I say that is
that nsXPCWrappedJS::Release() appears in both stacks.  It makes more
sense if you understand that the second stack happened first.

So we have some object at address 0x21000b20-32 being freed ..

   at 0x4806184: free (coregrind/m_replacemalloc/vg_replace_malloc.c:446)
   by 0x226BE673: moz_free (memory/mozalloc/mozalloc.cpp:51)
   by 0x30084573: nsXPCWrappedJS::~nsXPCWrappedJS()
                  (ff-opt/js/xpconnect/src/../../../dist/include/mozilla/mozalloc.h:224)
   by 0x3008393D: nsXPCWrappedJS::Release() (js/xpconnect/src/XPCWrappedJS.cpp:197)
   by 0x2FF2B95F: nsProxyReleaseEvent::Run() (ff-opt/xpcom/build/nsProxyRelease.cpp:19)
   by 0x3043AB97: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:612)

and later we're back poking at it again, in nsXPCWrappedJS::Release().
This time we don't run the destructor nsXPCWrappedJS::~nsXPCWrappedJS(),
which might be because the refcount has gone to -1 and the destructor
only gets run on a 1->0 refcount transition:

Invalid read of size 4 (at 0x21000b20)
   at 0x30083922: nsXPCWrappedJS::Release()
                  (ff-opt/js/xpconnect/src/../../../dist/include/nsISupportsImpl.h:268)
   by 0x30448F8F: nsXPTCStubBase::Release()
                  (xpcom/reflect/xptcall/src/xptcall.cpp:33)
   by 0x3017331D: UrlClassifierUpdateObserverProxy::Release()
                  (ff-opt/toolkit/components/url-classifier/../../../dist/include/nsCOMPtr.h:408)
   by 0x2FB37D15: nsRefPtr<nsCaret>::assign_with_AddRef(nsCaret*) (xpcom/base/nsAutoPtr.h:862)
   by 0x3016D613: nsUrlClassifierDBServiceWorker::FinishUpdate()
                  (ff-opt/toolkit/components/url-classifier/../../../dist/include/nsCOMPtr.h:622)
(In reply to Gian-Carlo Pascutto (:gcp) from comment #8)
> I wonder if it's related to bug 798778. 
> 
>     mClassifier->MarkSpoiled(mUpdateTables);
>   }
>   mUpdateObserver = nullptr;
> 
> The last line is the suspect one in this bug, and the one above is crashing
> in that bug (rather mysteriously so).

I think that's consistent with the analysis above.  The first stack
(viz, the later-occurring one) contains assign_with_AddRef, which
might be

>   mUpdateObserver = nullptr;

and V is saying this causes a call to Release, but the object has
already been Release-to-zero'd before.  Implication is that
mUpdateObserver has already been freed at this point.  It follows that
its use in the conditional above ..

  if (NS_SUCCEEDED(mUpdateStatus)) {
    LOG(("Notifying success: %d", mUpdateWait));
    mUpdateObserver->UpdateSuccess(mUpdateWait);
  } else {
    LOG(("Notifying error: %d", mUpdateStatus));
    mUpdateObserver->UpdateError(mUpdateStatus);
    /*
     * mark the tables as spoiled, we don't want to block hosts
     * longer than normal because our update failed
    */
    mClassifier->MarkSpoiled(mUpdateTables);
  }

.. is dangerous because it is free.

Could it be that instead of crashing at 

    mClassifier->MarkSpoiled(mUpdateTables);

it is actually crashing at the line before, viz

    mUpdateObserver->UpdateError(mUpdateStatus);

?
(Speculating wildly now ..) my impression is that an event to
delete the object (by some other thread) was added to the event
queue, and that got processed ...

  nsThread::ProcessNextEvent(bool, bool*)
    --> nsProxyReleaseEvent::Run()
        --> nsXPCWrappedJS::Release()

but the thread that added that event used the object after it 
added the event.
I understand something now; I couldn't understand how that nsProxyReleaseEvent got into the stack without NS_ProxyRelease being called on any relevant object, but indeed it's nsXPCWrappedJS that's calling that, to make sure the release happens in the main thread.

This is being passed down from nsUrlClassifierStreamUpdater::DownloadUpdates, which is passing "this" there. So probably this nsUrlClasssifierStreamUpdater gets deleted before it's time. That would be consistent with the crash being near the UpdateError/MarkSpoiled path.
So, an nsUrlClassifierStreamUpdater gets created as a locally scoped JavaScript variable here:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/content/listmanager.js#417

gets passed to nsUrlClassifierDBService here: 

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp#199

Turned into a proxy to the main thread here:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1360

and back to an nsCOMPtr here in the worker thread:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#458

Could it be that the call to
nsUrlClassifierDBService::BeginUpdate(nsIUrlClassifierUpdateObserver *observer,
which takes a simple pointer, fails to properly get a reference added to nsUrlClassifierStreamUpdater when the latter is passing itself down like this?
  rv = mDBService->BeginUpdate(this, aRequestTables, aClientKey); 

This could cause the reference going out of scope in listmanger.js to allow GC to collect the nsUrlClassifierStreamUpdater object, which in turn would cause a failure when mUpdateObserver = nsnull is called.

What I can't jive with this reasoning is that in comment 10, the second trace should correspond to the = nsnull assignment, which means the first trace would be the GC. But the trace contains nsProxyReleaseEvent, which from reading nsXPCWrappedJS::Release, can only be on the stack if the request to Release() the object happened NOT on the main thread.

Is this something the GC can do? 

Both frees in the stack trace look to me like they originate from elsewhere than the main thread, which confuses me as it somehow suggests that both are happening in nsUrlClassifierDBServiceWorker.
Actually nope, that nsIUrlClassifierUpdateObserver* gets used to construct the Proxy, still in the main thread, and that Proxy immediately puts a reference in an nsCOMPtr:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierProxies.h#226
Note that I have patches in bug 773610 that might be relevant here. See in particular:

https://github.com/bholley/mozilla-central/commit/a816a1a3fe235f33060787ab43df93a82014f72c
Very interesting. The description in the patch for bug 771074 says:

>Classes like XPCWrappedJS are main-thread-only, which means that it is
>forbidden to call methods on instances of these classes off the main thread.
>For various reasons (see bug 771074), this restriction recently began to
>apply to AddRef/Release as well.

And:
https://bugzilla.mozilla.org/show_bug.cgi?id=771074#c0

>Off-main-thread Release is easy to handle, because we can just detect it and convert it into a proxied 
>release. But AddRef has to be synchronous, so our only option in the status quo is to acquire a lock. This, 
>in turn, requires _all_ the XPConnect cycle collector machinery to be lock-protected, which is undesirable.

However, your other patches seem to be about removing this lock. So this implies that the lock is still there and the code should work?

The AddRef off the main thread is happening here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#458

If this, for some reason, doesn't take effect, then it would explain this bug.
Also, when you say "this restriction recently began to apply to AddRef/Release as well", what patch (in what release?) did this?
(In reply to Gian-Carlo Pascutto (:gcp) from comment #18)
> Also, when you say "this restriction recently began to apply to
> AddRef/Release as well", what patch (in what release?) did this?

It hasn't happened yet. Ideally things should still work with the lock. But the vestigial threading code in XPConnect is pretty arcane, so I wouldn't really be surprised if it were broken...
I made some progress here. :decoder confirmed that bholley's patch removes his crash in 750988, but he crashes elsewhere. I investigated that and the problem is that one of the Proxies is using NS_NewRunnableMethod which surely isn't using bholley's new main-thread-addref.

I'm going to prepare a combination of bholley's patch + fix for this, and then hopefully sewardj can check if this still reproduces.
Christian and Julian, can you try this patch in your testcases and see if the problems are gone?
Attachment #671785 - Flags: feedback?
Attachment #671785 - Attachment is obsolete: true
Attachment #671785 - Flags: feedback?
Attachment #671812 - Flags: feedback?(jseward)
Attachment #671812 - Flags: feedback?(choller)
This is a sec-moderate bug, and we're not sure it's exploitable. Does this really need to be tracked for release?
(In reply to Gian-Carlo Pascutto (:gcp) from comment #22)
> Patch 1. v2 Make sure AddRef happens on the main thread.

It appears to fix it for me, yes, on two V test runs.

This is hardly a good test for the presence/non-presence of threading
bugs, since the visibility or otherwise of the use-after-free depends
entirely on the relative rates of progress for the two threads
involved.  But it's better than nothing, I guess.
Attachment #671812 - Flags: feedback?(jseward) → feedback+
(In reply to Alex Keybl [:akeybl] from comment #23)
> This is a sec-moderate bug, and we're not sure it's exploitable. Does this
> really need to be tracked for release?

Per comment 6 last para and comment 8 first line, I am not particularly
concerned about exploitability here, but I am concerned that this could be a
cause of unreproducible (timing-related) crashes.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #26)
> I suspect this may be the root cause for 3 other bugs:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=750988
> https://bugzilla.mozilla.org/show_bug.cgi?id=727947
> https://bugzilla.mozilla.org/show_bug.cgi?id=798778
> 
> The last one is a topcrasher.

Thanks gcp. After fixing, please nominate for uplift asap.
Assignee: nobody → gpascutto
Whiteboard: [fixes top crash 798778?]
Comment on attachment 671812 [details] [diff] [review]
Patch 1. v2 Make sure AddRef happens on the main thread.

Julian reported good results. If I understood Christian correctly, he got good results too.

From Bob Holley's comments, this code is actually still supposed to work correctly, but it's complicated enough he's got bad feelings about it (comment 19), and is working to change it so it certainly works correctly.

Myself, after applying Bob's original patch, get a reproducible crash (in code that should "in theory" work) that goes away when I extend Bob's fix to the remaining parts of this code.

Based on this, I think we want to take this patch.
Attachment #671812 - Flags: review?(bobbyholley+bmo)
Attachment #671812 - Flags: review?(bobbyholley+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4d95f422871

Let's see what this does to the crash stats and aggressively uplift if they're looking good.
https://hg.mozilla.org/mozilla-central/rev/b4d95f422871
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 19
Bug 798778 is likely caused by bug 797302, not this one.
Whiteboard: [fixes top crash 798778?]
(In reply to Gian-Carlo Pascutto (:gcp) from comment #31)
> Bug 798778 is likely caused by bug 797302, not this one.

Great - let's get some uplift noms and land this to branches before EOD PT tomorrow (Monday) so this is in 17.0 beta 3
Comment on attachment 671812 [details] [diff] [review]
Patch 1. v2 Make sure AddRef happens on the main thread.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Probably has been there since Firefox 4 or the removal of Proxy Objects way before Firefox 10.
User impact if declined: Use-after-free in rare race conditions, potentially leading to memory corruption. 
Testing completed (on m-c, etc.): Landed on m-c a few days ago.
Risk to taking this patch (and alternatives if risky): Similar as with leaving the bug in place, but without the potential security issue.
Attachment #671812 - Flags: approval-mozilla-beta?
Attachment #671812 - Flags: approval-mozilla-aurora?
Attachment #671812 - Flags: approval-mozilla-beta?
Attachment #671812 - Flags: approval-mozilla-beta+
Attachment #671812 - Flags: approval-mozilla-aurora?
Attachment #671812 - Flags: approval-mozilla-aurora+
Whiteboard: [adv-track-main17-]
Blocks: 808146
Comment on attachment 671812 [details] [diff] [review]
Patch 1. v2 Make sure AddRef happens on the main thread.

Clearing feedback, bug is long gone now in ASan :)
Attachment #671812 - Flags: feedback?(choller) → feedback+
Group: core-security
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: