Closed Bug 1253883 Opened 8 years ago Closed 8 years ago

Intermittent ASAN test_browserElement_oop_getWebManifest.html AddressSanitizer: heap-use-after-free in cache2 code

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: RyanVM, Assigned: michal)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-uaf, intermittent-failure, sec-other, Whiteboard: [necko-active][post-critsmash-triage][adv-main48-])

Attachments

(1 file, 1 obsolete file)

https://treeherder.mozilla.org/logviewer.html#?job_id=23100626&repo=mozilla-inbound

14:03:20     INFO -  1865 INFO TEST-START | dom/browser-element/mochitest/test_browserElement_oop_getWebManifest.html
14:03:23     INFO -  [Child 7242] WARNING: g_path_get_basename: assertion `file_name != NULL' failed: 'glib warning', file /builds/slave/m-in-l64-asan-0000000000000000/build/src/toolkit/xre/nsSigHandlers.cpp, line 142
14:03:23     INFO -  (process:7242): GLib-CRITICAL **: g_path_get_basename: assertion `file_name != NULL' failed
14:03:25     INFO -  -----------------------------------------------------
14:03:25     INFO -  Suppressions used:
14:03:25     INFO -    count      bytes template
14:03:25     INFO -      820      26216 nsComponentManagerImpl
14:03:25     INFO -        6        816 mozJSComponentLoader::LoadModule
14:03:25     INFO -        1        384 pixman_implementation_lookup_composite
14:03:25     INFO -      360      15936 libfontconfig.so
14:03:25     INFO -        8       1816 MessageLoop::MessageLoop
14:03:25     INFO -        1         24 base::WaitableEvent::TimedWait
14:03:25     INFO -        6        192 libcairo.so
14:03:25     INFO -        1         32 libdl.so
14:03:25     INFO -       26       6492 libglib-2.0.so
14:03:25     INFO -  -----------------------------------------------------
14:03:26     INFO -  ###################################### BrowserElementCopyPaste.js loaded
14:03:26     INFO -  ############################### browserElementPanningAPZDisabled.js loaded
14:03:26     INFO -  ############################### browserElementPanning.js loaded
14:03:26     INFO -  ######################## BrowserElementChildPreload.js loaded
14:03:26     INFO -  ######################## extensions.js loaded
14:03:26     INFO -  ###################################### BrowserElementCopyPaste.js loaded
14:03:26     INFO -  ############################### browserElementPanningAPZDisabled.js loaded
14:03:26     INFO -  ############################### browserElementPanning.js loaded
14:03:26     INFO -  ######################## BrowserElementChildPreload.js loaded
14:03:26     INFO -  ######################## extensions.js loaded
14:03:27     INFO -  ###################################### BrowserElementCopyPaste.js loaded
14:03:27     INFO -  ############################### browserElementPanningAPZDisabled.js loaded
14:03:27     INFO -  ############################### browserElementPanning.js loaded
14:03:27     INFO -  ######################## BrowserElementChildPreload.js loaded
14:03:27     INFO -  ######################## extensions.js loaded
14:03:27     INFO -  *************************
14:03:27     INFO -  A coding exception was thrown and uncaught in a Task.
14:03:27     INFO -  Full message: TypeError: NetworkError when attempting to fetch resource.
14:03:27     INFO -  Full stack:
14:03:27     INFO -  *************************
14:03:27     INFO -  *************************
14:03:27     INFO -  A coding exception was thrown and uncaught in a Task.
14:03:27     INFO -  Full message: TypeError: NetworkError when attempting to fetch resource.
14:03:27     INFO -  Full stack:
14:03:27     INFO -  *************************
14:03:27     INFO -  MEMORY STAT | vsize 20973037MB | residentFast 652MB
14:03:27     INFO -  1866 INFO TEST-OK | dom/browser-element/mochitest/test_browserElement_oop_getWebManifest.html | took 6796ms
14:03:30     INFO -  =================================================================
14:03:30     INFO -  ==5209==ERROR: AddressSanitizer: heap-use-after-free on address 0x627000230100 at pc 0x440ba3 bp 0x7fcb3807a750 sp 0x7fcb3807a730
14:03:30     INFO -  READ of size 13480 at 0x627000230100 thread T15 (Cache2 I/O)
14:03:31     INFO -      #0 0x440ba2 in write /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:325
14:03:31     INFO -      #1 0x7fcb40dd52ef in pt_Write /builds/slave/m-in-l64-asan-0000000000000000/build/src/nsprpub/pr/src/pthreads/ptio.c:1315
14:03:31     INFO -  -----------------------------------------------------
14:03:31     INFO -  Suppressions used:
14:03:31     INFO -    count      bytes template
14:03:31     INFO -      820      26216 nsComponentManagerImpl
14:03:31     INFO -        6        816 mozJSComponentLoader::LoadModule
14:03:31     INFO -        1        384 pixman_implementation_lookup_composite
14:03:31     INFO -      360      15936 libfontconfig.so
14:03:31     INFO -        8       1816 MessageLoop::MessageLoop
14:03:31     INFO -        1         24 base::WaitableEvent::TimedWait
14:03:31     INFO -        6        192 libcairo.so
14:03:31     INFO -        1         32 libdl.so
14:03:31     INFO -       26       6492 libglib-2.0.so
14:03:31     INFO -  -----------------------------------------------------
14:03:33     INFO -      #2 0x7fcb26133d1f in (anonymous namespace)::interposedWrite(PRFileDesc*, void const*, int) /builds/slave/m-in-l64-asan-0000000000000000/build/src/xpcom/build/NSPRInterposer.cpp:72
14:03:33     INFO -      #3 0x7fcb2651943f in mozilla::net::CacheFileIOManager::WriteInternal(mozilla::net::CacheFileHandle*, long, char const*, int, bool, bool) /builds/slave/m-in-l64-asan-0000000000000000/build/src/netwerk/cache2/CacheFileIOManager.cpp:2024
14:03:33     INFO -      #4 0x7fcb265696f6 in mozilla::net::WriteEvent::Run() /builds/slave/m-in-l64-asan-0000000000000000/build/src/netwerk/cache2/CacheFileIOManager.cpp:725
14:03:33     INFO -      #5 0x7fcb2653f739 in mozilla::net::CacheIOThread::LoopOneLevel(unsigned int) /builds/slave/m-in-l64-asan-0000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:294
14:03:33     INFO -      #6 0x7fcb2653ef3c in mozilla::net::CacheIOThread::ThreadFunc() /builds/slave/m-in-l64-asan-0000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:224
14:03:33     INFO -      #7 0x7fcb2653e983 in mozilla::net::CacheIOThread::ThreadFunc(void*) /builds/slave/m-in-l64-asan-0000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:173
14:03:33     INFO -      #8 0x7fcb40ddd3cf in _pt_root /builds/slave/m-in-l64-asan-0000000000000000/build/src/nsprpub/pr/src/pthreads/ptthread.c:216
14:03:33     INFO -      #9 0x7fcb453afe99 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7e99)
14:03:33     INFO -      #10 0x7fcb444bf38c (/lib/x86_64-linux-gnu/libc.so.6+0xf338c)
14:03:33     INFO -  ASAN:SIGSEGV
14:03:33     INFO -  ==5209==AddressSanitizer: while reporting a bug found another one.Ignoring.
14:03:33     INFO -  TEST-INFO | Main app process: exit 1
14:03:33     INFO -  1867 INFO TEST-START | Shutdown
14:03:33     INFO -  1868 INFO Passed:  1517
14:03:33     INFO -  1869 INFO Failed:  0
14:03:33     INFO -  1870 INFO Todo:    12
14:03:33     INFO -  1871 INFO Slowest: 12181ms - /tests/dom/browser-element/mochitest/test_browserElement_oop_AudioChannel.html
14:03:33     INFO -  1872 INFO SimpleTest FINISHED
14:03:33     INFO -  1873 INFO TEST-INFO | Ran 1 Loops
14:03:33     INFO -  1874 INFO SimpleTest FINISHED
14:03:33  WARNING -  TEST-UNEXPECTED-FAIL | dom/browser-element/mochitest/test_browserElement_oop_getWebManifest.html | application terminated with exit code 1
Assignee: nobody → michal.novotny
Group: core-security → network-core-security
Whiteboard: [necko-active]
Attached patch fix (obsolete) — Splinter Review
I was not able to find exact cause of this but I guess that the fix in bug 1248958 doesn't work in all cases. I think this write event must be from CacheIndex because both CacheFileMetadata and CacheFileChunk release their write buffer in the callback or in destructor.
Attachment #8740947 - Flags: review?(honzab.moz)
Comment on attachment 8740947 [details] [diff] [review]
fix

Review of attachment 8740947 [details] [diff] [review]:
-----------------------------------------------------------------

I really don't know what to think about this patch.  There is a problem (serious) in this code that we don't fully understand.  This flag is something I wouldn't trust, because it's built with assumption of otherwise functional code, which we don't have.  And in the past it showed up having a patch built on a hunch was not a good thing to do, specially in case of a security bug.

Could we rather change the buffering ownership somehow, or even better - find out the true cause?  I think that is the fragile thing here at the first place.
Attachment #8740947 - Flags: review?(honzab.moz)
Comment on attachment 8740947 [details] [diff] [review]
fix

(In reply to Honza Bambas (:mayhemer) from comment #7)
> I really don't know what to think about this patch.  There is a problem
> (serious) in this code that we don't fully understand.  This flag is
> something I wouldn't trust, because it's built with assumption of otherwise
> functional code, which we don't have.  And in the past it showed up having a
> patch built on a hunch was not a good thing to do, specially in case of a
> security bug.

This is simply a proper fix of the problem discovered in bug 1248958. The buffer handling in CacheIndex is incorrect because the buffer can be freed when read/write event is pending. After many many tries I was able to reproduce this crash and the patch in bug 1248958 doesn't work because it relies on assumption that no read/write event can be posted once CacheIndex::mShuttingDown is set to true, but it can. The sequence is as follows:

1) CacheIndex::WriteIndexToDisk
2) CacheIndex::ChangeState		READY -> WRITING
3) CacheFileIOManager::OpenFile		index.tmp
4) CacheFileIOManager::Shutdown
5) CacheIndex::PreShutdown		PreShutdownInternal posted on WRITE level
6) CacheIndex::OnFileOpenedInternal	doesn't return early because mState != READY
7) CacheIndex::WriteRecords
8) CacheFileIOManager::Write            write event pointing to mRWBuf is posted
9) CacheIndex::PreShutdownInternal
10) CacheIndex::FinishWrite		mRWBuf is freed

This patch will solve the crash. I also need to review mShuttingDown check because I'm not sure why the condition includes mState == READY. Anyway, it's much safer to take this patch for now than changing mShuttingDown check. There is also a benefit from this patch that we don't need to post PreShutdownInternal on WRITE level so we'll shut the index down faster.
Attachment #8740947 - Flags: review?(honzab.moz)
Is this really sec-high? It can happen only during shutdown and the crash needs exact timing which IMO cannot be influenced by any web page, add-on or plug-in.
Flags: needinfo?(mwobensmith)
(In reply to Michal Novotny (:michal) from comment #8)
> This is simply a proper fix of the problem discovered in bug 1248958. The

The true reason I don't much like the patch: having a flag for buffer management where there can potentially be more then one pending operation the flag relates to - there is nothing that ensures just one write op is happening, I also mean when we later change something - is simply a very bad coding architecture.  It's fragile, and will probably break with future changes, even just not directly related, like what we've done to reveal this very bug.  Or at least might make finding a cause of any issues around this code hard, simply because it's not reliable from its nature.

I would prefer to simply change the buffer referencing here (have a referenced wrapper for your buffers), which is IMO the only true correct and safe fix here, which would remove any potential for a hard-to-find crash like this once and for all.  

I don't remember I would ever suggest changing mShuttingDown logic, or did I?

Anyway, it's your code, your decision.  I will try to get to this patch review soon if you don't change your mind, but patches like these are extremely hard to check on also because the cache index code is pretty much complex already and as shown with bug 1248958 pretty much sensitive to changes of the above layers.
The rating was determined by a consensus at our weekly triage. What do you feel would be a better rating?
Flags: needinfo?(mwobensmith) → needinfo?(michal.novotny)
(In reply to Matt Wobensmith [:mwobensmith][:matt] from comment #11)
> The rating was determined by a consensus at our weekly triage. What do you
> feel would be a better rating?

This bug is IMO not exploitable, so sec-moderate or sec-low would be probably more appropriate.
Flags: needinfo?(michal.novotny)
Andrew, do you recall our logic in rating this a sec-high? Or, based on the information above, do you think a lower rating is more appropriate?
Flags: needinfo?(continuation)
(In reply to Honza Bambas (:mayhemer) from comment #10)
> The true reason I don't much like the patch: having a flag for buffer
> management where there can potentially be more then one pending operation
> the flag relates to - there is nothing that ensures just one write op is
> happening

This is nonsense and it shows that you're commenting without understanding the patch and the code. Depending on the state in which CacheIndex is, there can be reading or writing in progress, but not both at the same time. In both states there can be only one read or write operation in progress. All we need to do is to ensure that the buffer is not freed before the read/write operation finishes. This is currently broken only in case we want to cancel reading/writing of the index which we do in case of shutdown and from CacheIndex::RemoveAll (this code is currently unused).


> I don't remember I would ever suggest changing mShuttingDown logic, or did I?

I don't remember I said you suggested it, or did I? I mentioned it because changing the mShuttingDown check would probably make the patch in bug 1248958 working, but doing this would be IMO a bad decision just like in bug 1248958 (comment #2).
(In reply to Matt Wobensmith [:mwobensmith][:matt] from comment #13)
> Andrew, do you recall our logic in rating this a sec-high? Or, based on the
> information above, do you think a lower rating is more appropriate?

We usually rate use-after-frees sec-critical, so one that is hard to trigger gets sec-high. But it sounds like this is very removed from content, and timing-dependent, and at shutdown, so it probably doesn't need to be rated like that. I'll just mark it sec-other.
Flags: needinfo?(continuation)
Keywords: sec-highsec-other
Comment on attachment 8740947 [details] [diff] [review]
fix

Review of attachment 8740947 [details] [diff] [review]:
-----------------------------------------------------------------

I'm leaving r? as few answers are needed first and also two small but important changes are definitely needed.


As I understand, today cache index cannot be writing anything before reading is done, right?  The only way to start index write is through WriteIndexToDiskIfNeeded(), which happens on any updated coming from the platform side and when the index is READY.  So write scheduling is bypassed also when WRITING, is that so?

::: netwerk/cache2/CacheIndex.cpp
@@ +1564,5 @@
>  
>  bool
>  CacheIndex::WriteIndexToDiskIfNeeded()
>  {
> +  if (mState != READY || mShuttingDown || mRWPending) {

do we need to reschedule this?  as I understand, this method is called whenever we do an update to the index data from the platform side.  could it happen we end up with some unwritten data because of this?  is it worth complicating the code?

also, can it actually ever happen that mState == READY && mRWPending?  because when I'm looking into the code, it seems it can only be READING or WRITING while mRWPending == true, or with different state somehow during shutdown (but that is beyond my willingness of putting even more cycles to this review to find out exactly)

based on that, this condition will never reach a point when mState == READY && !mShuttingDown && mRWPending.  so, adding || mRWPending is probably a dead code.

if my assertion here is true, then your change in CacheIndex::OnDataWritten (the handle check) is probably not correct.

or am I wrong?

@@ +3182,5 @@
>  
>  void
>  CacheIndex::ReleaseBuffer()
>  {
> +  if (!mRWBuf || mRWPending) {

ReleaseBuffer() may be called on the main thread:

CacheIndex::~CacheIndex()
CacheIndex::Shutdown()  ->  index->FinishWrite(false)
CacheIndex::RemoveAll()  ->  index->FinishWrite(false)
CacheIndex::Shutdown()  ->  index->FinishRead(false)
CacheIndex::RemoveAll()  ->  index->FinishRead(false)
CacheIndex::Init  ->  CacheIndex::InitInternal  ->  CacheIndex::ReadIndexFromDisk()  ->  FinishRead(false)

according how this code and the flag work, it is probably OK to make the mRWPending flag simply Atomic (at least rel-acq)

@@ +3412,5 @@
>      return NS_ERROR_NOT_AVAILABLE;
>    }
>  
> +  MOZ_ASSERT(mRWPending);
> +  mRWPending = false;

why not before if (!IsIndexUsable()) check?  I think we should drop this flag regardless the index state, no?  because otherwise ReleaseBuffers() called from dtor will not release the buffer and we will leak.

@@ +3421,5 @@
>  
>    switch (mState) {
>      case WRITING:
> +      // mRWPending prevents dispatching another write event, so the handle must
> +      // be mIndexHandle.

hm.. so what write operation was that before?  because the flag prevents (IIUC) only WriteRecords, which is the only one working with mRWBuf.  Do you want to say it was possible that two write operations where sharing the mRWBuf reference?

@@ +3445,5 @@
>      default:
>        // Writing was canceled.
>        LOG(("CacheIndex::OnDataWritten() - ignoring notification since the "
>             "operation was previously canceled [state=%d]", mState));
> +      ReleaseBuffer();

There are possible code paths that may lead to not releasing the buffer at all (until dtor), like when here we case for WRITING, then do the file rename, ending up at CacheIndex::OnFileRenamed, that may not get to call FinishWrite (which would otherwise release the buffer).

This may not be a big deal as we release the buffer in dtor of CacheIndex (when this patch is fixed, as above), but still it may be a waste to keep the buffer around when not needed.

@@ +3464,5 @@
>      return NS_ERROR_NOT_AVAILABLE;
>    }
>  
> +  MOZ_ASSERT(mRWPending);
> +  mRWPending = false;

here as well, why not before if (!IsIndexUsable()) check?

::: netwerk/cache2/CacheIndex.h
@@ +990,5 @@
>    uint32_t                  mRWBufSize;
>    uint32_t                  mRWBufPos;
>    RefPtr<CacheHash>         mRWHash;
>  
> +  // True if read or write operation is pending. It is used to ensure that

probably worth mentioning that it applies only to write operations that work with mRWBuf member.
Thanks for the feedback.

(In reply to Honza Bambas (:mayhemer) from comment #16)
> As I understand, today cache index cannot be writing anything before reading
> is done, right?  The only way to start index write is through
> WriteIndexToDiskIfNeeded(), which happens on any updated coming from the
> platform side and when the index is READY.  So write scheduling is bypassed
> also when WRITING, is that so?

Right.

> >  bool
> >  CacheIndex::WriteIndexToDiskIfNeeded()
> >  {
> > +  if (mState != READY || mShuttingDown || mRWPending) {
> 
> do we need to reschedule this?  as I understand, this method is called
> whenever we do an update to the index data from the platform side.  could it
> happen we end up with some unwritten data because of this?  is it worth
> complicating the code?

Right, we call it from methods that could have changed the data in index. There is definitely no need to reschedule it, because it will be called on the next change. If there is no other change then all unwritten changes will be written to the journal during shutdown. If the process crashes, then those unwritten changes will be restored during update process on the next start.

> also, can it actually ever happen that mState == READY && mRWPending? 
> because when I'm looking into the code, it seems it can only be READING or
> WRITING while mRWPending == true, or with different state somehow during
> shutdown (but that is beyond my willingness of putting even more cycles to
> this review to find out exactly)
> 
> based on that, this condition will never reach a point when mState == READY
> && !mShuttingDown && mRWPending.  so, adding || mRWPending is probably a
> dead code.
> 
> if my assertion here is true, then your change in CacheIndex::OnDataWritten
> (the handle check) is probably not correct.
> 
> or am I wrong?

This happens when read/write operation is pending and FinishRead/FinishWrite is called to cancel the read/write process. In current code this can happen only during shutdown and from CacheIndex::RemoveAll. Given that RemoveAll() is and probably will stay unused, I could remove this method and then the current check would be sufficient.

> 
> @@ +3182,5 @@
> >  
> >  void
> >  CacheIndex::ReleaseBuffer()
> >  {
> > +  if (!mRWBuf || mRWPending) {
> 
> ReleaseBuffer() may be called on the main thread:
> 
> CacheIndex::~CacheIndex()
> CacheIndex::Shutdown()  ->  index->FinishWrite(false)
> CacheIndex::RemoveAll()  ->  index->FinishWrite(false)
> CacheIndex::Shutdown()  ->  index->FinishRead(false)
> CacheIndex::RemoveAll()  ->  index->FinishRead(false)
> CacheIndex::Init  ->  CacheIndex::InitInternal  -> 
> CacheIndex::ReadIndexFromDisk()  ->  FinishRead(false)
> 
> according how this code and the flag work, it is probably OK to make the
> mRWPending flag simply Atomic (at least rel-acq)

All calls to ReleaseBuffer are called under sLock (also all other mRWPending accesses) except the call from destructor. So is it necessary to make it atomic? What could go wrong? Anyway, I know you don't like locks, but since it is already there it would be probably cleaner to grab the static lock in the destructor and put sLock.AssertCurrentThreadOwns() into ReleaseBuffer().

> > +  MOZ_ASSERT(mRWPending);
> > +  mRWPending = false;
> 
> why not before if (!IsIndexUsable()) check?  I think we should drop this
> flag regardless the index state, no?  because otherwise ReleaseBuffers()
> called from dtor will not release the buffer and we will leak.

Good question. In fact IsIndexUsable must return true in these methods. It is not usable when the state is INITIAL or SHUTDOWN. In case the state is INITIAL nobody called CacheFileIOManager::Read or CacheFileIOManager::Write so we cannot be here. When the state is SHUTDOWN IO thread no longer exists so these callbacks cannot be called. I'll change it to assertion.

> >    switch (mState) {
> >      case WRITING:
> > +      // mRWPending prevents dispatching another write event, so the handle must
> > +      // be mIndexHandle.
> 
> hm.. so what write operation was that before?  because the flag prevents
> (IIUC) only WriteRecords, which is the only one working with mRWBuf.  Do you
> want to say it was possible that two write operations where sharing the
> mRWBuf reference?

This code was simply wrong and it couldn't happen that mIndexHandle != aHandle. Not even back in the days when method CacheIndex::RemoveAll was used. Simply because the index was empty and to start writing the index we would need to make 300 changes in the index and those changes would be queued after the pending write event that was canceled.

> @@ +3445,5 @@
> >      default:
> >        // Writing was canceled.
> >        LOG(("CacheIndex::OnDataWritten() - ignoring notification since the "
> >             "operation was previously canceled [state=%d]", mState));
> > +      ReleaseBuffer();
> 
> There are possible code paths that may lead to not releasing the buffer at
> all (until dtor), like when here we case for WRITING, then do the file
> rename, ending up at CacheIndex::OnFileRenamed, that may not get to call
> FinishWrite (which would otherwise release the buffer).

I don't think there is such code path. The state is changed in FinishRead/FinishWrite so if don't call it we will stay in READ/WRITE state forever.

> ::: netwerk/cache2/CacheIndex.h
> @@ +990,5 @@
> >    uint32_t                  mRWBufSize;
> >    uint32_t                  mRWBufPos;
> >    RefPtr<CacheHash>         mRWHash;
> >  
> > +  // True if read or write operation is pending. It is used to ensure that
> 
> probably worth mentioning that it applies only to write operations that work
> with mRWBuf member.

I'm not sure I understand this. All read or write operations that call OnDataRead and OnDataWritten back use mRWBuf. CacheFileIOManager::Write that marks the index dirty (in CacheIndex::ParseRecords) doesn't call OnDataWritten because it is called with aCallback==nullptr.
Comment on attachment 8740947 [details] [diff] [review]
fix

I will attach a new patch soon.
Attachment #8740947 - Attachment is obsolete: true
Attachment #8740947 - Flags: review?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #17)
> it would be probably cleaner to grab the static lock in
> the destructor and put sLock.AssertCurrentThreadOwns() into ReleaseBuffer().

AFAICS, destructor is already called under the lock because CacheIndex::Shutdown should release the last reference and it does it under the lock.
(In reply to Michal Novotny (:michal) from comment #17)
> > also, can it actually ever happen that mState == READY && mRWPending? 
> > because when I'm looking into the code, it seems it can only be READING or
> > WRITING while mRWPending == true, or with different state somehow during
> > shutdown (but that is beyond my willingness of putting even more cycles to
> > this review to find out exactly)
> > 
> > based on that, this condition will never reach a point when mState == READY
> > && !mShuttingDown && mRWPending.  so, adding || mRWPending is probably a
> > dead code.
> > 
> > if my assertion here is true, then your change in CacheIndex::OnDataWritten
> > (the handle check) is probably not correct.
> > 
> > or am I wrong?
> 
> This happens when read/write operation is pending and FinishRead/FinishWrite
> is called to cancel the read/write process. In current code this can happen
> only during shutdown and from CacheIndex::RemoveAll. Given that RemoveAll()
> is and probably will stay unused, I could remove this method and then the
> current check would be sufficient.

Based on your comment (1) below, this is moot.

> > according how this code and the flag work, it is probably OK to make the
> > mRWPending flag simply Atomic (at least rel-acq)
> 
> All calls to ReleaseBuffer are called under sLock (also all other mRWPending
> accesses) except the call from destructor. So is it necessary to make it
> atomic? What could go wrong? Anyway, I know you don't like locks, but since
> it is already there it would be probably cleaner to grab the static lock in
> the destructor and put sLock.AssertCurrentThreadOwns() into ReleaseBuffer().

Please definitely assert the lock in ReleaseBuffer() and make sure that each method itself that is using this new flag does so as well.

> > >    switch (mState) {
> > >      case WRITING:
> > > +      // mRWPending prevents dispatching another write event, so the handle must
> > > +      // be mIndexHandle.
> > 
> > hm.. so what write operation was that before?  because the flag prevents
> > (IIUC) only WriteRecords, which is the only one working with mRWBuf.  Do you
> > want to say it was possible that two write operations where sharing the
> > mRWBuf reference?
> 

(1)

> This code was simply wrong and it couldn't happen that mIndexHandle !=
> aHandle. Not even back in the days when method CacheIndex::RemoveAll was
> used. Simply because the index was empty and to start writing the index we
> would need to make 300 changes in the index and those changes would be
> queued after the pending write event that was canceled.

Good!  So there will be no if and no assertion in the next patch?

> 
> > @@ +3445,5 @@
> > >      default:
> > >        // Writing was canceled.
> > >        LOG(("CacheIndex::OnDataWritten() - ignoring notification since the "
> > >             "operation was previously canceled [state=%d]", mState));
> > > +      ReleaseBuffer();
> > 
> > There are possible code paths that may lead to not releasing the buffer at
> > all (until dtor), like when here we case for WRITING, then do the file
> > rename, ending up at CacheIndex::OnFileRenamed, that may not get to call
> > FinishWrite (which would otherwise release the buffer).
> 
> I don't think there is such code path. The state is changed in
> FinishRead/FinishWrite so if don't call it we will stay in READ/WRITE state
> forever.

I still think there are potential paths.  When the index state changes between (e.g.) file rename, we may not get to the point OnFileRenamed calls FinishWrite.  There is a potential for that, and when there is a potential, there is usually a problem.

> 
> > ::: netwerk/cache2/CacheIndex.h
> > @@ +990,5 @@
> > >    uint32_t                  mRWBufSize;
> > >    uint32_t                  mRWBufPos;
> > >    RefPtr<CacheHash>         mRWHash;
> > >  
> > > +  // True if read or write operation is pending. It is used to ensure that
> > 
> > probably worth mentioning that it applies only to write operations that work
> > with mRWBuf member.
> 
> I'm not sure I understand this. All read or write operations that call
> OnDataRead and OnDataWritten back use mRWBuf. CacheFileIOManager::Write that
> marks the index dirty (in CacheIndex::ParseRecords) doesn't call
> OnDataWritten because it is called with aCallback==nullptr.

Ups, you are right, not sure now what was this comment exactly about.  Ignore me at this point.

(In reply to Michal Novotny (:michal) from comment #19)
> (In reply to Michal Novotny (:michal) from comment #17)
> > it would be probably cleaner to grab the static lock in
> > the destructor and put sLock.AssertCurrentThreadOwns() into ReleaseBuffer().
> 
> AFAICS, destructor is already called under the lock because
> CacheIndex::Shutdown should release the last reference and it does it under
> the lock.

Even better.

Thanks.
Attached patch patch v2Splinter Review
changes in this patch:
 - releasing of mRWBuf is now logged
 - sLock.AssertCurrentThreadOwns() is in every method using mRWPending
 - more assertions (with comment) in CacheIndex::FinishWrite and CacheIndex::FinishRead
 - added assertion to check correct thread in CacheFileIOListener methods
 - IsIndexUsable() check changed to assertion in CacheFileIOListener methods


(In reply to Honza Bambas (:mayhemer) from comment #20)
> > This happens when read/write operation is pending and FinishRead/FinishWrite
> > is called to cancel the read/write process. In current code this can happen
> > only during shutdown and from CacheIndex::RemoveAll. Given that RemoveAll()
> > is and probably will stay unused, I could remove this method and then the
> > current check would be sufficient.

In the end I didn't remove the method because I would need to remove CacheFileIOManager::EvictAll(), CacheFileIOManager::EvictAllInternal() and all references to CacheIndex::mRemovingAll too. This would be quite a large change.

> > This code was simply wrong and it couldn't happen that mIndexHandle !=
> > aHandle. Not even back in the days when method CacheIndex::RemoveAll was
> > used. Simply because the index was empty and to start writing the index we
> > would need to make 300 changes in the index and those changes would be
> > queued after the pending write event that was canceled.
> 
> Good!  So there will be no if and no assertion in the next patch?

I left the assertion just to be consistent with OnDataRead which contains the assertion too. But I removed the comment which was a bit confusing.

> I still think there are potential paths.  When the index state changes
> between (e.g.) file rename, we may not get to the point OnFileRenamed calls
> FinishWrite.  There is a potential for that, and when there is a potential,
> there is usually a problem.

In every On... method we either continue with next step of the read/write process or we call FinishRead/Write to fail it which then switches index to READY state. I agree that this isn't the nicest code but it seems to work. OTOH it is true that if we get stuck in READ or WRITE state we probably would not notice it. I'll file a follow up bug to clean this up.
Attachment #8746530 - Flags: review?(honzab.moz)
Attachment #8746530 - Flags: review?(honzab.moz) → review+
https://hg.mozilla.org/mozilla-central/rev/398b93819760
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Group: network-core-security → core-security-release
Probably too late for Beta at this point, but can we please uplift this to Aurora at least?
Flags: needinfo?(michal.novotny)
Comment on attachment 8746530 [details] [diff] [review]
patch v2

Approval Request Comment
[Feature/regressing bug #]: probably 923016
[User impact if declined]: possibly using a freed memory during shutdown 
[Describe test coverage new/current, TreeHerder]: the cache index is written to disk when running mochitests which use the cache, but there is no test of canceled writing of the index during shutdown
[Risks and why]: probably low, the code has been in the tree for a while already and there is no crash or use-after-free report
[String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8746530 - Flags: approval-mozilla-aurora?
Comment on attachment 8746530 [details] [diff] [review]
patch v2

Fix an intermittent, taking it
Attachment #8746530 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [necko-active] → [necko-active][post-critsmash-triage]
Whiteboard: [necko-active][post-critsmash-triage] → [necko-active][post-critsmash-triage][adv-main48-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.