Closed
Bug 949250
Opened 10 years ago
Closed 10 years ago
crash in jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()
Categories
(Core :: Networking: Cache, defect)
Tracking
()
VERIFIED
FIXED
mozilla29
People
(Reporter: mayhemer, Assigned: mayhemer)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file, 7 obsolete files)
30.70 KB,
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-7582a8fa-9609-4d41-b60e-36b942131210. ============================================================= Other reports: https://crash-stats.mozilla.com/report/index/c9de2d85-c3e9-44fd-8e14-bad702131202 https://crash-stats.mozilla.com/report/index/17453456-ea0e-4b1c-a036-fb3782131127 https://crash-stats.mozilla.com/report/index/c4bcadfe-f1e8-41c6-a3f7-620b32131203
![]() |
||
Updated•10 years ago
|
Crash Signature: [@ jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()] → [@ jemalloc_crash | arena_dalloc | mozilla::net::CacheFileChunk::Release()]
[@ jemalloc_crash | free | mozilla::net::CacheFileChunk::Release() ]
[@ moz_abort | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | je_free | mozilla::net::CacheFileC…
![]() |
Assignee | |
Comment 1•10 years ago
|
||
Since all crashes has mozilla::net::MetadataWriteTimer::Notify on the stack and there is cross-thread usage of non-thread-safe nsWeakReference I decided to remove using it altogether. I've spent some time on trying to implement threadsafe weak ptr but that was really hard and would just introduce a new complex and potentially unreliable code. Omitting the weak ref and using hard ref leads to leaks: there are relations like: CacheFile <--> MetadataWriteTimer <--> nsTimerImpl During shutdown it only changes to: CacheFile <--> MetadataWriteTimer --> nsTimerImpl ..so the cycle CacheFile <--> MetadataWriteTimer remains. I went a simple way, there is actually no need for allocating and referring any new MetadataWriteTimer object. I made it (to keep the patch simpler) a nested member class that manages the timer while CacheFile receives the notification directly itself. This bug should have higher attention then 945945 since we have much more crash reports for this signature then CacheFileHandle. Also, it might be that this is the cause of all the crashes. Michal, please try to take a look at this quickly. https://tbpl.mozilla.org/?tree=Try&rev=69c739009534 (alone) https://tbpl.mozilla.org/?tree=Try&rev=afbff2422fa9 (over bug 945945)
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8349407 -
Flags: review?(michal.novotny)
Comment 2•10 years ago
|
||
Comment on attachment 8349407 [details] [diff] [review] v1 Review of attachment 8349407 [details] [diff] [review]: ----------------------------------------------------------------- Hmm, it seems that this bug could be really cause of all crashes. One thing I don't like about the new patch is that we can loose entries or at least some data during shutdown. Without this patch, following sequence happens during shutdown when timer is fired and only CacheStorageService references the CacheFile (via CacheEntry): 1) CacheStorageService::Shutdown() 2) CacheFile::Release() 3) CacheFile::WriteMetadataIfNeeded() - since this was the last reference 4) CacheFileIOManager::Shutdown() 5) nsTimerImpl::Shutdown() With the new patch, timer references the CacheFile, so following happens. 1) CacheStorageService::Shutdown() 2) CacheFile::Release() 3) CacheFileIOManager::Shutdown() 4) nsTimerImpl::Shutdown() 5) CacheFile::WriteMetadataIfNeeded() - this fails since CacheFileIOManager is already shutdown I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called when removing all CacheEntries either in CacheStorageService::Shutdown() or in CacheStorageService::ShutdownBackground(). ::: netwerk/cache2/CacheFile.cpp @@ +103,2 @@ > { > + Cancel(); IMO this has no effect. When mTimer is active it refers CacheFile, so we cannot be here. And once the timer is cancelled or executed, there is nothing to cancel. ::: netwerk/cache2/CacheFile.h @@ +190,5 @@ > + nsresult Fire(CacheFile * aFile); > + nsresult Cancel(); > + void Release(); > + bool ShouldFireNew(); > + } mTimer; Maybe rename this member to mMetaTimer so it cannot be confused with the nested mTimer. Btw, I know this came from my original code, but I didn't realized it until I did the review :)
Attachment #8349407 -
Flags: review?(michal.novotny)
Comment 3•10 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #2) > I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called > when removing all CacheEntries either in CacheStorageService::Shutdown() or > in CacheStorageService::ShutdownBackground(). And, of course, don't allow any other timer to be posted after this point.
![]() |
Assignee | |
Comment 4•10 years ago
|
||
Checking whether this patch alone leaks: https://tbpl.mozilla.org/?tree=Try&rev=10cdf3035c2c
![]() |
Assignee | |
Comment 5•10 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #2) > I think we need to make sure CacheFile::WriteMetadataIfNeeded() is called > when removing all CacheEntries either in CacheStorageService::Shutdown() or > in CacheStorageService::ShutdownBackground(). Understand. I think ~CacheEntry may force this write. It will also conform the plan on fast shutdown. Writes will be bypassed when shutdown would be blocked. > > ::: netwerk/cache2/CacheFile.cpp > @@ +103,2 @@ > > { > > + Cancel(); > > IMO this has no effect. When mTimer is active it refers CacheFile, so we > cannot be here. And once the timer is cancelled or executed, there is > nothing to cancel. No. In case the initiation of the timer fails, we still reference it. But it still must be released on the same thread it has been created on.
![]() |
Assignee | |
Comment 6•10 years ago
|
||
I decided to change this patch from scratch. I think we need one global timer with one global place to call on callbacks at one time. This way it's simple to call them all at the shutdown time, synchronously and safely before IOman shutdown.
![]() |
Assignee | |
Comment 7•10 years ago
|
||
- based on v1 (on current gum's tip) - global array at ioman where files are hreffed - one timer for them all - shutdown simply iterates the array
Attachment #8355380 -
Flags: feedback?(michal.novotny)
![]() |
Assignee | |
Comment 8•10 years ago
|
||
Since crashes on https://tbpl.mozilla.org/?tree=Gum&rev=f1e41e89c176 I rather rewrote the Release() method altogether. Explanation: - Current CacheFile::Release() is missing one important feature - refcnt stabilization - Lack of it causes reenter of Release with same count value (actually a recursion) when something AddRefs/Releases |this| before we delete it (same reason why default Release impl does stabilization before deleting the object - before dtor is called) - I wanted to clean the code up a bit so I used CacheFileAutoLock after stabilization to actually do two things: 1. provide mutex protection 2. provide kung-fu death grip over call to WriteMetadataIfNeeded - This way we are fully independent on state of the object after WriteMetadataIfNeeded exited - Subsequent calls (even recursive) to Release() will delete the object correctly (definitely only ones) https://tbpl.mozilla.org/?tree=Try&rev=b814b16c2db6
Attachment #8349407 -
Attachment is obsolete: true
Attachment #8356409 -
Flags: feedback?(michal.novotny)
![]() |
Assignee | |
Comment 9•10 years ago
|
||
Showed up we could not release mMetadata. Using flag to indicate the point of no return. https://tbpl.mozilla.org/?tree=Try&rev=df52d95fed57
Attachment #8356409 -
Attachment is obsolete: true
Attachment #8356409 -
Flags: feedback?(michal.novotny)
Attachment #8356602 -
Flags: feedback?(michal.novotny)
Comment 11•10 years ago
|
||
Comment on attachment 8355380 [details] [diff] [review] v2 Review of attachment 8355380 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/cache2/CacheFile.cpp @@ +1420,5 @@ > return mDataIsDirty || mMetadata->IsDirty(); > } > > void > +CacheFile::MaybeWriteMetadataIfNeeded() This is really ugly name :) Please rename this method to WriteMetadataIfNeeded() and the original WriteMetadataIfNeeded() to WriteMetadataIfNeededLocked(). Then remove mMemoryOnly check. After fixing bug #949175, we cannot be here when mMemoryOnly == true. ::: netwerk/cache2/CacheFileIOManager.cpp @@ +1087,5 @@ > Telemetry::AutoTimer<Telemetry::NETWORK_DISK_CACHE_SHUTDOWN_V2> shutdownTimer; > > CacheIndex::PreShutdown(); > > + ShutdownMetadataWriteScheduling(); There is no need to post another event. Call the shutdown function from CacheFileIOManager::ShutdownInternal() which is executed on IO thread. @@ +1245,5 @@ > } > > +// static > +nsresult > +CacheFileIOManager::ScheduleMetadataWrite(CacheFile * aFile) Please use the style that is common for this file. I.e. split logic into static method (which can be called from any thread) and a method with suffix Internal, which is guaranteed to run on IO thread.
Attachment #8355380 -
Flags: feedback?(michal.novotny) → feedback-
![]() |
Assignee | |
Comment 12•10 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #11) > Comment on attachment 8355380 [details] [diff] [review] > v2 > > Review of attachment 8355380 [details] [diff] [review]: > ----------------------------------------------------------------- > Thanks. > Then remove mMemoryOnly check. Not until the MOZ_ASSERT(!mMemoryOnly); is removed. > > ::: netwerk/cache2/CacheFileIOManager.cpp > @@ +1087,5 @@ > > Telemetry::AutoTimer<Telemetry::NETWORK_DISK_CACHE_SHUTDOWN_V2> shutdownTimer; > > > > CacheIndex::PreShutdown(); > > > > + ShutdownMetadataWriteScheduling(); > > There is no need to post another event. Call the shutdown function from > CacheFileIOManager::ShutdownInternal() which is executed on IO thread. There is a need. This fires as an xpcom event and schedules write events on open handles. The shutdown internal is at the close level and removes all handles, so we would loose the metadata writings.
![]() |
Assignee | |
Comment 13•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=3b2d99756911 - as v2 but: - addressed feedback - as discussed on IRC, there is a standard impl of isupports - metadata write (thanks that metadata are represented by a separate class and don't refer the file them self) is forced from cache files's dtor - this very much simplifies and stabilize the CacheFile referencing code
Attachment #8355380 -
Attachment is obsolete: true
Attachment #8356602 -
Attachment is obsolete: true
Attachment #8356602 -
Flags: feedback?(michal.novotny)
Attachment #8357509 -
Flags: feedback?(michal.novotny)
Comment 14•10 years ago
|
||
Comment on attachment 8357509 [details] [diff] [review] v3 Review of attachment 8357509 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/cache2/CacheFile.cpp @@ +1374,3 @@ > this)); > > + rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this); This will assert in CacheFileMetadata::OnDataWritten() on MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write() without callback, but make sure to pass a copy of buffer since the buffer is released in WriteEvent::~WriteEvent() in this case. ::: netwerk/cache2/CacheFile.h @@ +165,5 @@ > bool mDataAccessed; > bool mDataIsDirty; > bool mWritingMetadata; > bool mKeyIsHash; > + bool mMetadataClosed; This seems to be unused. ::: netwerk/cache2/CacheFileIOManager.cpp @@ +1273,5 @@ > + MOZ_ASSERT(IsOnIOThread()); > + > + nsresult rv; > + > + if (!gInstance->mMetadataWritesTimer) { Internal methods should not be static. @@ +1333,5 @@ > +CacheFileIOManager::ShutdownMetadataWriteScheduling() > +{ > + NS_ENSURE_TRUE(gInstance, NS_ERROR_NOT_INITIALIZED); > + > + if (!IsOnIOThread()) { I think we will be off the IO thread probably always. So I would always post event to reduce number of possible codepaths.
Attachment #8357509 -
Flags: feedback?(michal.novotny) → feedback-
![]() |
Assignee | |
Comment 15•10 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #14) > > + rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this); > > This will assert in CacheFileMetadata::OnDataWritten() on > MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally > CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write() > without callback, but make sure to pass a copy of buffer since the buffer is > released in WriteEvent::~WriteEvent() in this case. Why so complicated? I will remove that assertion and we are done.
![]() |
Assignee | |
Comment 16•10 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #15) > (In reply to Michal Novotny (:michal) from comment #14) > > > + rv = mMetadata->WriteMetadata(mDataSize, aFireAndForget ? nullptr : this); > > > > This will assert in CacheFileMetadata::OnDataWritten() on > > MOZ_ASSERT(mListener) when aFireAndForget is true. Ideally > > CacheFileMetadata::WriteMetadata() should call CacheFileIOManager::Write() > > without callback, but make sure to pass a copy of buffer since the buffer is > > released in WriteEvent::~WriteEvent() in this case. > > Why so complicated? I will remove that assertion and we are done. BTW, to not let mMetadata refer |this| is the whole point of this patch!
![]() |
Assignee | |
Comment 17•10 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #16) > BTW, to not let mMetadata refer |this| is the whole point of this patch! Hmm.. got the point now. This could also prevent any potential leaks when the result from some reason could not be fired back. I'll try to do it.
![]() |
Assignee | |
Comment 18•10 years ago
|
||
- the hunk at CacheFileMetadata::OnDataWritten should be removed yet (forgot) - addressed review comments https://tbpl.mozilla.org/?tree=Try&rev=6bb4ca1f0d65
Attachment #8357509 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 19•10 years ago
|
||
Comment on attachment 8361373 [details] [diff] [review] v4 I think it's for r now.
Attachment #8361373 -
Flags: review?(michal.novotny)
![]() |
Assignee | |
Comment 20•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=e8d67f8df5c3 fingers crossed...
Attachment #8361373 -
Attachment is obsolete: true
Attachment #8361373 -
Flags: review?(michal.novotny)
Attachment #8363276 -
Flags: review?(michal.novotny)
Comment 21•10 years ago
|
||
Comment on attachment 8363276 [details] [diff] [review] v4.1 Review of attachment 8363276 [details] [diff] [review]: ----------------------------------------------------------------- r+ with following fixed ::: netwerk/cache2/CacheFileIOManager.cpp @@ +1002,5 @@ > +CacheFileIOManager::ScheduleMetadataWrite(CacheFile * aFile) > +{ > + NS_ENSURE_TRUE(gInstance, NS_ERROR_NOT_INITIALIZED); > + > + nsRefPtr<CacheFileIOManager> ioMan = gInstance; gInstance could be nulled out on other thread just before this line. The order should be opposite: nsRefPtr<CacheFileIOManager> ioMan = gInstance; NS_ENSURE_TRUE(ioMan, NS_ERROR_NOT_INITIALIZED); The same issue is in UnscheduleMetadataWrite() and ShutdownMetadataWriteScheduling(). @@ +1054,5 @@ > + NS_ENSURE_TRUE(target, NS_ERROR_UNEXPECTED); > + return target->Dispatch(event, nsIEventTarget::DISPATCH_NORMAL); > +} > + > +// static not a static method ::: netwerk/cache2/CacheFileMetadata.cpp @@ +251,5 @@ > + writeBuffer = mWriteBuf; > + } else { > + // We are not going to pass |this| as callback to CacheFileIOManager::Write > + // so we must allocate a new buffer that will be released automatically when > + // write is finished. This is actually better then to let then -> than
Attachment #8363276 -
Flags: review?(michal.novotny) → review+
![]() |
Assignee | |
Comment 22•10 years ago
|
||
Attachment #8363276 -
Attachment is obsolete: true
Attachment #8363680 -
Flags: review+
![]() |
Assignee | |
Comment 23•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/570b33dbbd70
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in
before you can comment on or make changes to this bug.
Description
•