Closed Bug 949250 Opened 9 years ago Closed 9 years ago

crash in jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()

Categories

(Core :: Networking: Cache, defect)

Other Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla29

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 7 obsolete files)

Crash Signature: [@ jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()] → [@ jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()] [@ jemalloc_crash | free | mozilla::net::CacheFileChunk::Release() ] [@ moz_abort | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mozilla::net::CacheFileC…
Attached patch v1 (obsolete) — Splinter Review
Since all crashes has mozilla::net::MetadataWriteTimer::Notify on the stack and there is cross-thread usage of non-thread-safe nsWeakReference I decided to remove using it altogether.

I've spent some time on trying to implement threadsafe weak ptr but that was really hard and would just introduce a new complex and potentially unreliable code.

Omitting the weak ref and using hard ref leads to leaks:  there are relations like:

  CacheFile <--> MetadataWriteTimer <--> nsTimerImpl

During shutdown it only changes to:

  CacheFile <--> MetadataWriteTimer --> nsTimerImpl

..so the cycle CacheFile <--> MetadataWriteTimer remains.

I went a simple way, there is actually no need for allocating and referring any new MetadataWriteTimer object.  I made it (to keep the patch simpler) a nested member class that manages the timer while CacheFile receives the notification directly itself.


This bug should have higher attention then 945945 since we have much more crash reports for this signature then CacheFileHandle.  Also, it might be that this is the cause of all the crashes.

Michal, please try to take a look at this quickly.

https://tbpl.mozilla.org/?tree=Try&rev=69c739009534 (alone)
https://tbpl.mozilla.org/?tree=Try&rev=afbff2422fa9 (over bug 945945)
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8349407 - Flags: review?(michal.novotny)
Comment on attachment 8349407 [details] [diff] [review]
v1

Review of attachment 8349407 [details] [diff] [review]:
-----------------------------------------------------------------

Hmm, it seems that this bug could be really cause of all crashes. One thing I don't like about the new patch is that we can loose entries or at least some data during shutdown. Without this patch, following sequence happens during shutdown when timer is fired and only CacheStorageService references the CacheFile (via CacheEntry):

1) CacheStorageService::Shutdown()
2) CacheFile::Release()
3) CacheFile::WriteMetadataIfNeeded() - since this was the last reference
4) CacheFileIOManager::Shutdown()
5) nsTimerImpl::Shutdown()

With the new patch, timer references the CacheFile, so following happens.

1) CacheStorageService::Shutdown()
2) CacheFile::Release()
3) CacheFileIOManager::Shutdown()
4) nsTimerImpl::Shutdown()
5) CacheFile::WriteMetadataIfNeeded() - this fails since CacheFileIOManager is already shutdown

I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called when removing all CacheEntries either in CacheStorageService::Shutdown() or in CacheStorageService::ShutdownBackground().

::: netwerk/cache2/CacheFile.cpp
@@ +103,2 @@
>  {
> +  Cancel();

IMO this has no effect. When mTimer is active it refers CacheFile, so we cannot be here. And once the timer is cancelled or executed, there is nothing to cancel.

::: netwerk/cache2/CacheFile.h
@@ +190,5 @@
> +    nsresult Fire(CacheFile * aFile);
> +    nsresult Cancel();
> +    void Release();
> +    bool ShouldFireNew();
> +  } mTimer;

Maybe rename this member to mMetaTimer so it cannot be confused with the nested mTimer. Btw, I know this came from my original code, but I didn't realized it until I did the review :)
Attachment #8349407 - Flags: review?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #2)
> I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called
> when removing all CacheEntries either in CacheStorageService::Shutdown() or
> in CacheStorageService::ShutdownBackground().

And, of course, don't allow any other timer to be posted after this point.
Checking whether this patch alone leaks:
https://tbpl.mozilla.org/?tree=Try&rev=10cdf3035c2c
(In reply to Michal Novotny (:michal) from comment #2)
> I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called
> when removing all CacheEntries either in CacheStorageService::Shutdown() or
> in CacheStorageService::ShutdownBackground().

Understand.  I think ~CacheEntry may force this write.  It will also conform the plan on fast shutdown.  Writes will be bypassed when shutdown would be blocked.

> 
> ::: netwerk/cache2/CacheFile.cpp
> @@ +103,2 @@
> >  {
> > +  Cancel();
> 
> IMO this has no effect. When mTimer is active it refers CacheFile, so we
> cannot be here. And once the timer is cancelled or executed, there is
> nothing to cancel.

No.  In case the initiation of the timer fails, we still reference it.  But it still must be released on the same thread it has been created on.
I decided to change this patch from scratch.  I think we need one global timer with one global place to call on callbacks at one time.  This way it's simple to call them all at the shutdown time, synchronously and safely before IOman shutdown.
Attached patch v2 (obsolete) — Splinter Review
- based on v1 (on current gum's tip)
- global array at ioman where files are hreffed
- one timer for them all
- shutdown simply iterates the array
Attachment #8355380 - Flags: feedback?(michal.novotny)
Attached patch followup 1 to v2 (obsolete) — Splinter Review
Since crashes on https://tbpl.mozilla.org/?tree=Gum&rev=f1e41e89c176 I rather rewrote the Release() method altogether.

Explanation:
- Current CacheFile::Release() is missing one important feature - refcnt stabilization
- Lack of it causes reenter of Release with same count value (actually a recursion) when something AddRefs/Releases |this| before we delete it (same reason why default Release impl does stabilization before deleting the object - before dtor is called)
- I wanted to clean the code up a bit so I used CacheFileAutoLock after stabilization to actually do two things:
  1. provide mutex protection
  2. provide kung-fu death grip over call to WriteMetadataIfNeeded
- This way we are fully independent on state of the object after WriteMetadataIfNeeded exited
- Subsequent calls (even recursive) to Release() will delete the object correctly (definitely only ones)

https://tbpl.mozilla.org/?tree=Try&rev=b814b16c2db6
Attachment #8349407 - Attachment is obsolete: true
Attachment #8356409 - Flags: feedback?(michal.novotny)
Attached patch followup 1 (v2) to v2 (obsolete) — Splinter Review
Showed up we could not release mMetadata.  Using flag to indicate the point of no return.

https://tbpl.mozilla.org/?tree=Try&rev=df52d95fed57
Attachment #8356409 - Attachment is obsolete: true
Attachment #8356409 - Flags: feedback?(michal.novotny)
Attachment #8356602 - Flags: feedback?(michal.novotny)
Depends on: 957707
Comment on attachment 8355380 [details] [diff] [review]
v2

Review of attachment 8355380 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFile.cpp
@@ +1420,5 @@
>    return mDataIsDirty || mMetadata->IsDirty();
>  }
>  
>  void
> +CacheFile::MaybeWriteMetadataIfNeeded()

This is really ugly name :) Please rename this method to WriteMetadataIfNeeded() and the original WriteMetadataIfNeeded() to WriteMetadataIfNeededLocked(). Then remove mMemoryOnly check. After fixing bug #949175, we cannot be here when mMemoryOnly == true.

::: netwerk/cache2/CacheFileIOManager.cpp
@@ +1087,5 @@
>    Telemetry::AutoTimer<Telemetry::NETWORK_DISK_CACHE_SHUTDOWN_V2> shutdownTimer;
>  
>    CacheIndex::PreShutdown();
>  
> +  ShutdownMetadataWriteScheduling();

There is no need to post another event. Call the shutdown function from CacheFileIOManager::ShutdownInternal() which is executed on IO thread.

@@ +1245,5 @@
>  }
>  
> +// static
> +nsresult
> +CacheFileIOManager::ScheduleMetadataWrite(CacheFile * aFile)

Please use the style that is common for this file. I.e. split logic into static method (which can be called from any thread) and a method with suffix Internal, which is guaranteed to run on IO thread.
Attachment #8355380 - Flags: feedback?(michal.novotny) → feedback-
(In reply to Michal Novotny (:michal) from comment #11)
> Comment on attachment 8355380 [details] [diff] [review]
> v2
> 
> Review of attachment 8355380 [details] [diff] [review]:
> -----------------------------------------------------------------
> 

Thanks.

> Then remove mMemoryOnly check. 

Not until the MOZ_ASSERT(!mMemoryOnly); is removed.
> 
> ::: netwerk/cache2/CacheFileIOManager.cpp
> @@ +1087,5 @@
> >    Telemetry::AutoTimer<Telemetry::NETWORK_DISK_CACHE_SHUTDOWN_V2> shutdownTimer;
> >  
> >    CacheIndex::PreShutdown();
> >  
> > +  ShutdownMetadataWriteScheduling();
> 
> There is no need to post another event. Call the shutdown function from
> CacheFileIOManager::ShutdownInternal() which is executed on IO thread.

There is a need.  This fires as an xpcom event and schedules write events on open handles.  The shutdown internal is at the close level and removes all handles, so we would loose the metadata writings.
Attached patch v3 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=3b2d99756911

- as v2 but:
- addressed feedback
- as discussed on IRC, there is a standard impl of isupports
- metadata write (thanks that metadata are represented by a separate class and don't refer the file them self) is forced from cache files's dtor
- this very much simplifies and stabilize the CacheFile referencing code
Attachment #8355380 - Attachment is obsolete: true
Attachment #8356602 - Attachment is obsolete: true
Attachment #8356602 - Flags: feedback?(michal.novotny)
Attachment #8357509 - Flags: feedback?(michal.novotny)
Depends on: 958311
Comment on attachment 8357509 [details] [diff] [review]
v3

Review of attachment 8357509 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFile.cpp
@@ +1374,3 @@
>         this));
>  
> +  rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this);

This will assert in CacheFileMetadata::OnDataWritten() on MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write() without callback, but make sure to pass a copy of buffer since the buffer is released in WriteEvent::~WriteEvent() in this case.

::: netwerk/cache2/CacheFile.h
@@ +165,5 @@
>    bool           mDataAccessed;
>    bool           mDataIsDirty;
>    bool           mWritingMetadata;
>    bool           mKeyIsHash;
> +  bool           mMetadataClosed;

This seems to be unused.

::: netwerk/cache2/CacheFileIOManager.cpp
@@ +1273,5 @@
> +  MOZ_ASSERT(IsOnIOThread());
> +
> +  nsresult rv;
> +
> +  if (!gInstance->mMetadataWritesTimer) {

Internal methods should not be static.

@@ +1333,5 @@
> +CacheFileIOManager::ShutdownMetadataWriteScheduling()
> +{
> +  NS_ENSURE_TRUE(gInstance, NS_ERROR_NOT_INITIALIZED);
> +
> +  if (!IsOnIOThread()) {

I think we will be off the IO thread probably always. So I would always post event to reduce number of possible codepaths.
Attachment #8357509 - Flags: feedback?(michal.novotny) → feedback-
No longer depends on: 958311
(In reply to Michal Novotny (:michal) from comment #14)
> > +  rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this);
> 
> This will assert in CacheFileMetadata::OnDataWritten() on
> MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally
> CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write()
> without callback, but make sure to pass a copy of buffer since the buffer is
> released in WriteEvent::~WriteEvent() in this case.

Why so complicated?  I will remove that assertion and we are done.
(In reply to Honza Bambas (:mayhemer) from comment #15)
> (In reply to Michal Novotny (:michal) from comment #14)
> > > +  rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this);
> > 
> > This will assert in CacheFileMetadata::OnDataWritten() on
> > MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally
> > CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write()
> > without callback, but make sure to pass a copy of buffer since the buffer is
> > released in WriteEvent::~WriteEvent() in this case.
> 
> Why so complicated?  I will remove that assertion and we are done.

BTW, to not let mMetadata refer |this| is the whole point of this patch!
(In reply to Honza Bambas (:mayhemer) from comment #16)
> BTW, to not let mMetadata refer |this| is the whole point of this patch!

Hmm.. got the point now. This could also prevent any potential leaks when the result from some reason could not be fired back.  I'll try to do it.
Attached patch v4 (obsolete) — Splinter Review
- the hunk at CacheFileMetadata::OnDataWritten should be removed yet (forgot)
- addressed review comments

https://tbpl.mozilla.org/?tree=Try&rev=6bb4ca1f0d65
Attachment #8357509 - Attachment is obsolete: true
Comment on attachment 8361373 [details] [diff] [review]
v4

I think it's for r now.
Attachment #8361373 - Flags: review?(michal.novotny)
Attached patch v4.1 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=e8d67f8df5c3

fingers crossed...
Attachment #8361373 - Attachment is obsolete: true
Attachment #8361373 - Flags: review?(michal.novotny)
Attachment #8363276 - Flags: review?(michal.novotny)
Comment on attachment 8363276 [details] [diff] [review]
v4.1

Review of attachment 8363276 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with following fixed

::: netwerk/cache2/CacheFileIOManager.cpp
@@ +1002,5 @@
> +CacheFileIOManager::ScheduleMetadataWrite(CacheFile * aFile)
> +{
> +  NS_ENSURE_TRUE(gInstance, NS_ERROR_NOT_INITIALIZED);
> +
> +  nsRefPtr<CacheFileIOManager> ioMan = gInstance;

gInstance could be nulled out on other thread just before this line. The order should be opposite:

nsRefPtr<CacheFileIOManager> ioMan = gInstance;
NS_ENSURE_TRUE(ioMan, NS_ERROR_NOT_INITIALIZED);

The same issue is in UnscheduleMetadataWrite() and ShutdownMetadataWriteScheduling().

@@ +1054,5 @@
> +  NS_ENSURE_TRUE(target, NS_ERROR_UNEXPECTED);
> +  return target->Dispatch(event, nsIEventTarget::DISPATCH_NORMAL);
> +}
> +
> +// static

not a static method

::: netwerk/cache2/CacheFileMetadata.cpp
@@ +251,5 @@
> +    writeBuffer = mWriteBuf;
> +  } else {
> +    // We are not going to pass |this| as callback to CacheFileIOManager::Write
> +    // so we must allocate a new buffer that will be released automatically when
> +    // write is finished.  This is actually better then to let

then -> than
Attachment #8363276 - Flags: review?(michal.novotny) → review+
Attached patch v4.2 [to land]Splinter Review
Attachment #8363276 - Attachment is obsolete: true
Attachment #8363680 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/570b33dbbd70
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Verified by cache2 trial (bug 967693)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.