B2G FTU crash when accessing e.me privacy page (after opening marketplace privacy page)

RESOLVED FIXED in Firefox 28

Status

()

defect
--
critical
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: gwagner, Assigned: smaug)

Tracking

(5 keywords)

Trunk
mozilla30
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(blocking-b2g:1.3+, firefox27 wontfix, firefox28+ fixed, firefox29+ fixed, firefox30+ fixed, firefox-esr24 unaffected, b2g18 unaffected, b2g-v1.1hd unaffected, b2g-v1.2 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Details

(Whiteboard: [b2g-crash], [systemsfe][adv-main28+])

Attachments

(1 attachment, 5 obsolete attachments)

I still get a consistent crash when I access e.me after I wipe the phone with make reset-gaia. current gaia and gecko trunk.


STR:
make reset-gaia
enable data connection
choose wifi (mozilla guest in my case)
click on privacy link
open b2g privacy page,
open marketplace privacy page,
open e.me privacy page.

Seems like cleaning the marketplace appcache is having problems during cycle collection: https://marketplace.firefox.com/manifest.appcache?repo=fireplace

On my nexus 4 with debug gecko I could catch it in gdb:

Program received signal SIGSEGV, Segmentation fault.
0xb4ca1a6e in nsCOMPtr<nsIFile>::~nsCOMPtr (this=0xb037fc3c, 
    __in_chrg=<optimized out>) at ../../../xpcom/glue/nsCOMPtr.h:517
517	            NSCAP_RELEASE(this, mRawPtr);
(gdb) bt
#0  0xb4ca1a6e in nsCOMPtr<nsIFile>::~nsCOMPtr (this=0xb037fc3c, 
    __in_chrg=<optimized out>) at ../../../xpcom/glue/nsCOMPtr.h:517
#1  0xb555f500 in nsDOMOfflineResourceList::~nsDOMOfflineResourceList (
    this=0xb037fbe0, __in_chrg=<optimized out>)
    at ../../../../dom/src/offline/nsDOMOfflineResourceList.cpp:96
#2  0xb555f564 in nsDOMOfflineResourceList::~nsDOMOfflineResourceList (
    this=0xb037fbe0, __in_chrg=<optimized out>)
    at ../../../../dom/src/offline/nsDOMOfflineResourceList.cpp:96
#3  0xb53d48e2 in nsTransactionManager::DeleteCycleCollectable (
    this=<optimized out>)
    at ../../../../editor/txmgr/src/nsTransactionManager.cpp:58
#4  0xb5172a76 in nsDOMEventTargetHelper::cycleCollection::DeleteCycleCollectable (this=<optimized out>, p=0xb037fbe0)
    at ../../dist/include/nsDOMEventTargetHelper.h:58
#5  0xb4cbbe54 in SnowWhiteKiller::~SnowWhiteKiller (this=0xbee75cfc, 
    __in_chrg=<optimized out>) at ../../../xpcom/base/nsCycleCollector.cpp:2309
#6  0xb4cbbece in nsCycleCollector::FreeSnowWhite (this=0xb3e7d000, 
    aUntilNoSWInPurpleBuffer=<optimized out>)
    at ../../../xpcom/base/nsCycleCollector.cpp:2457
#7  0xb5405a0a in AsyncFreeSnowWhite::Run (this=0xb33e7640)
    at ../../../../js/xpconnect/src/XPCJSRuntime.cpp:209
#8  0xb4ceeaaa in ProcessNextEvent (result=0xbee75da7, 
    mayWait=<optimized out>, this=0xb3e2f600)
---Type <return> to continue, or q <return> to quit---
    at ../../../xpcom/threads/nsThread.cpp:637
#9  nsThread::ProcessNextEvent (this=0xb3e2f600, mayWait=<optimized out>, 
    result=0xbee75da7) at ../../../xpcom/threads/nsThread.cpp:568
#10 0xb4ca9048 in NS_ProcessNextEvent (thread=0xb3e2f600, 
    mayWait=<optimized out>) at ../../../xpcom/glue/nsThreadUtils.cpp:263
#11 0xb4e86c70 in mozilla::ipc::MessagePump::Run (this=0xb3e01b80, 
    aDelegate=0xbee75f00) at ../../../ipc/glue/MessagePump.cpp:95
#12 0xb4e7892a in MessageLoop::RunInternal (this=0xbee75f00)
    at ../../../ipc/chromium/src/base/message_loop.cc:226
#13 0xb4e78942 in RunHandler (this=0xbee75f00)
    at ../../../ipc/chromium/src/base/message_loop.cc:219
#14 MessageLoop::Run (this=0xbee75f00)
    at ../../../ipc/chromium/src/base/message_loop.cc:193
#15 0xb53caaca in nsBaseAppShell::Run (this=0xb316a9c0)
    at ../../../widget/xpwidgets/nsBaseAppShell.cpp:157
#16 0xb5c3e36e in XRE_RunAppShell ()
    at ../../../toolkit/xre/nsEmbedFunctions.cpp:679
#17 0xb4e86d6e in mozilla::ipc::MessagePumpForChildProcess::Run (
    this=0xb3e01b80, aDelegate=0xbee75f00)
    at ../../../ipc/glue/MessagePump.cpp:253
#18 0xb4e7892a in MessageLoop::RunInternal (this=0xbee75f00)
    at ../../../ipc/chromium/src/base/message_loop.cc:226
#19 0xb4e78942 in RunHandler (this=0xbee75f00)
---Type <return> to continue, or q <return> to quit---
    at ../../../ipc/chromium/src/base/message_loop.cc:219
#20 MessageLoop::Run (this=0xbee75f00)
    at ../../../ipc/chromium/src/base/message_loop.cc:193
#21 0xb5c3e252 in XRE_InitChildProcess (aArgc=5, aArgv=<optimized out>, 
    aProcess=<optimized out>) at ../../../toolkit/xre/nsEmbedFunctions.cpp:516
#22 0x0000876e in main (argc=6, argv=0xbee769f4)
    at ../../../ipc/app/MozillaRuntimeMain.cpp:137
(gdb) p this
$1 = (nsCOMPtr<nsIFile> * const) 0xb037fc3c
(gdb) p *this
$2 = {mRawPtr = 0xb037f9b0}
(gdb) up
#1  0xb555f500 in nsDOMOfflineResourceList::~nsDOMOfflineResourceList (
    this=0xb037fbe0, __in_chrg=<optimized out>)
    at ../../../../dom/src/offline/nsDOMOfflineResourceList.cpp:96
96	}
(gdb) p this
$3 = (nsDOMOfflineResourceList * const) 0xb037fbe0
(gdb) p *this
$4 = {<nsDOMEventTargetHelper> = {
    <mozilla::dom::EventTarget> = {<nsIDOMEventTarget> = {<nsISupports> = {
          _vptr.nsISupports = 0xb6bbdf80}, <No data fields>}, <nsWrapperCache> = {_vptr.nsWrapperCache = 0xb6bbe07c, 
        mWrapper = {<js::HeapBase<JSObject*>> = {<No data fields>}, 
          ptr = 0x0}, mFlags = 2}, <No data fields>}, mRefCnt = {
      mRefCntAndFlags = 5}, _mOwningThread = {mThread = 0xb3e2e080}, 
    static _cycleCollectorGlobal = {
    <nsXPCOMCycleCollectionParticipant> = {<nsScriptObjectTracer> = {<nsCycleCollectionParticipant> = {_vptr.nsCycleCollectionParticipant = 0xb6bb73a0, 
            mMightSkip = true}, <No data fields>}, <No data fields>}, <No data fields>}, mListenerManager = {mRawPtr = 0x0}, mParentObject = 0x0, 
    mOwnerWindow = 0x0, 
    mHasOrHasHadOwnerWindow = true}, <nsIDOMOfflineResourceList> = {<nsISupports> = {
      _vptr.nsISupports = 0xb6bbe088}, <No data fields>}, <nsIObserver> = {<nsISupports> = {
      _vptr.nsISupports = 0xb6bbe100}, <No data fields>}, <nsIOfflineCacheUpdateObserver> = {<nsISupports> = {
      _vptr.nsISupports = 0xb6bbe118}, <No data fields>}, 
    <nsSupportsWeakReference> = {<nsISupportsWeakReference> = {<nsISupports> = {_vptr.nsISupports = 0xb6bbe134}, <No data fields>}, mProxy = 0xb04cb5e0}, 
  static _cycleCollectorGlobal = {<nsDOMEventTargetHelper::cycleCollection> = {<---Type <return> to continue, or q <return> to quit---
nsXPCOMCycleCollectionParticipant> = {<nsScriptObjectTracer> = {<nsCycleCollectionParticipant> = {_vptr.nsCycleCollectionParticipant = 0xb6bbe150, 
            mMightSkip = true}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, mInitialized = true, mManifestURI = {mRawPtr = 
    0xb038a5c0}, mManifestSpec = {<nsACString_internal> = {
      mData = 0xb01c7338 "https://marketplace.firefox.com/manifest.appcache?repo=fireplace", mLength = 64, mFlags = 5}, <No data fields>}, mDocumentURI = {
    mRawPtr = 0xb09b9640}, mApplicationCacheService = {mRawPtr = 0x0}, 
  mAvailableApplicationCache = {mRawPtr = 0x0}, mCacheUpdate = {mRawPtr = 
    0xb037f9b0}, mExposeCacheUpdateStatus = false, mStatus = 1, 
  mCachedKeys = 0x0, mCachedKeysCount = 0, 
  mPendingEvents = {<nsCOMArray_base> = {
      mArray = {<nsTArray_Impl<nsISupports*, nsTArrayInfallibleAllocator>> = {<nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>> = {
            mHdr = 0xb6d1a60c}, <nsTArray_TypedBase<nsISupports*, nsTArray_Impl<nsISupports*, nsTArrayInfallibleAllocator> >> = {<nsTArray_SafeElementAtHelper<nsISupports*, nsTArray_Impl<nsISupports*, nsTArrayInfallibleAllocator> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}}, <No data fields>}}
blocking-b2g: --- → 1.3?
Blocks: 958780
Group: core-security
Posted patch 964462.diff (obsolete) — Splinter Review
bent and mrbkap debugged.
Assignee: nobody → anygregor
Attachment #8366268 - Flags: review?(bugs)
blocking-b2g: 1.3? → 1.3+
Smaug is taking this according to bent.
Assignee: anygregor → bugs
Attachment #8366268 - Attachment is obsolete: true
Attachment #8366268 - Flags: review?(bugs)
Posted patch patch? (obsolete) — Splinter Review
Want to test this?
Attachment #8366360 - Flags: feedback?(anygregor)
Attachment #8366360 - Flags: feedback?(continuation)
Comment on attachment 8366360 [details] [diff] [review]
patch?

Review of attachment 8366360 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me, though I should refresh my knowledge of the stabilization stuff.
Attachment #8366360 - Flags: feedback?(continuation) → feedback+
I think the problem here, as diagnosed by bent, smaug and mrbkap, is that an object gets put in the snow white list of dying objects, then it gets resurrected via weak ref, then it dies again, and gets put into the snow white list a second time, then we end up calling the destructor twice on the same object.  This is a use-after-free, but I think that we're only running destructors, not random user code, so that may limit exploitability a bit.  This could be to blame for other crashes, too.
Component: Networking → XPCOM
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
Looks like the patch leaks on windows... but I think I know why... It seems to reveal another bug.
Severity: normal → critical
Whiteboard: [b2g-crash]
(In reply to Olli Pettay [:smaug] from comment #3)
> Created attachment 8366360 [details] [diff] [review]
> patch?
> 
> Want to test this?

I will test tomorrow. Thanks for the quick turnaround!
Posted patch v2 (obsolete) — Splinter Review
Duplicate of this bug: 965595
Depends on: 965893
Attachment #8366360 - Flags: feedback?(anygregor)
Posted patch guess fix (obsolete) — Splinter Review
(nsOfflineCacheUpdate code is scary, and this kind of raw pointer in runnables are pure evil.)

Still compiling this.
Attachment #8366360 - Attachment is obsolete: true
Attachment #8366465 - Attachment is obsolete: true
Posted patch guess fix (obsolete) — Splinter Review
Attachment #8368823 - Attachment is obsolete: true
Posted patch guess fix v2Splinter Review
Attachment #8368825 - Attachment is obsolete: true
Attachment #8368923 - Flags: feedback?(anygregor)
I don't have access to my build environment for the nexus.
Alex will test this patch.
Flags: needinfo?(lissyx+mozillians)
(In reply to Gregor Wagner [:gwagner] from comment #13)
> I don't have access to my build environment for the nexus.
> Alex will test this patch.

Is it the patch we tested on my Inari last monday ?
Flags: needinfo?(lissyx+mozillians)
(In reply to Alexandre LISSY :gerard-majax from comment #14)
> (In reply to Gregor Wagner [:gwagner] from comment #13)
> > I don't have access to my build environment for the nexus.
> > Alex will test this patch.
> 
> Is it the patch we tested on my Inari last monday ?

yes
So we retested it, on a debug build, and it fixes the issue :)
Attachment #8368923 - Flags: feedback?(anygregor) → feedback+
Awesome. Thanks for testing.
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

So the thing to fix the crasher in this bug is to not use manual delete,
but always rely on refcounting.
Currently it is possible to end up calling
DeallocPOfflineCacheUpdateChild
while refcnt of offlineCacheUpdate is still > 0.

Making nsOfflineCacheUpdateOwner to support WeakPtr is just to not have
this super scary raw pointer there. It doesn't seem to fix this crash, but if I read the code correctly there are other ways to trigger that issue.
Attachment #8368923 - Flags: review?(honzab.moz)
A dupe of bug 773487 ?
Sounds similar.
Manual delete on refcounted objects is always super scary, and very error prone.
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

Review of attachment 8368923 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

I think for trunk fix we should have a strong ref to mOwner, but it might well be a followup to bake this before backport properly.

r=honzab

Thanks for fixing this!!
Attachment #8368923 - Flags: review?(honzab.moz) → review+
Duplicate of this bug: 773487
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Seems to be hard, though looks like we have had Bug 773487 open quite some time.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
changing refcnt handling is rather obvious.

Which older supported branches are affected by this flaw?
All, but the crash happens only in multiprocess setups. So in practice it should be b2g only.

How likely is this patch to cause regressions; how much testing does it need?
This area of code isn't too well tested since it has been used lately only in b2g.


NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): NA
User impact if declined: crashes 
Testing completed: NA
Risk to taking this patch (and alternatives if risky): No alternatives, and the risk is some kind of change
to offline cache behavior in the child processes. 
String or UUID changes made by this patch: NA
Attachment #8368923 - Flags: sec-approval?
Attachment #8368923 - Flags: approval-mozilla-b2g28?
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

sec-approval+ for trunk.

Please create Aurora, Beta, and ESR24 patches after it is in and nominate them.
Attachment #8368923 - Flags: sec-approval? → sec-approval+
Do we need this on aurora/beta or esr24? As far as I see, this is needed only on b2g branches.
But the patch seems to apply (with some fuzz) to release/beta/aurora
1.3 is based off of 28, so I think it needs to land on beta to end up there.  And at that point, you might as well land it on Aurora, too, just in case.
b2g28_v1_3 diverges from aurora now so one may need to create a b2g28_v1_3-specific patch
Please land on branches for consistency.
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

sec-highs have auto-approval for uplift to affected B2G branches [1], but why am I being pinged about landing this on 1.3 when it hasn't landed on trunk yet?

[1] https://wiki.mozilla.org/Release_Management/B2G_Landing
Attachment #8368923 - Flags: approval-mozilla-b2g28?
Flags: needinfo?(bugs)
Because I forgot to land it to m-i earlier today and m-i is now closed.
Flags: needinfo?(bugs)
landed on central -> https://hg.mozilla.org/mozilla-central/rev/41fbeb789ade
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Whiteboard: [b2g-crash] → [b2g-crash], [systemsfe]
Comment on attachment 8368923 [details] [diff] [review]
guess fix v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): old stuff, from first e10s-era 
User impact if declined: crashes in multiprocess setups, so in practice b2g only
Testing completed (on m-c, etc.): landed to m-c today
Risk to taking this patch (and alternatives if risky): I haven't found any other way to fix the crash. The code was odd (using manual delete with refcounted object). Should be quite safe fix. 
String or IDL/UUID changes made by this patch: NA
Attachment #8368923 - Flags: approval-mozilla-beta?
Attachment #8368923 - Flags: approval-mozilla-aurora?
Attachment #8368923 - Flags: approval-mozilla-beta?
Attachment #8368923 - Flags: approval-mozilla-beta+
Attachment #8368923 - Flags: approval-mozilla-aurora?
Attachment #8368923 - Flags: approval-mozilla-aurora+
Whiteboard: [b2g-crash], [systemsfe] → [b2g-crash], [systemsfe][adv-main28+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.