Closed Bug 1226941 Opened 9 years ago Closed 8 years ago

crash in mozilla::net::CacheFileIOManager::Write

Categories

(Core :: Networking: Cache, defect)

43 Branch
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox43 + wontfix
firefox44 + ?

People

(Reporter: philipp, Assigned: michal)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-2b676c0e-7cc3-4c64-bb81-f1dca2151120.
=============================================================
Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::CacheFileIOManager::Write(mozilla::net::CacheFileHandle*, __int64, char const*, int, bool, bool, mozilla::net::CacheFileIOListener*) 	netwerk/cache2/CacheFileIOManager.cpp
1 	xul.dll 	mozilla::net::CacheFileMetadata::WriteMetadata(unsigned int, mozilla::net::CacheFileMetadataListener*) 	netwerk/cache2/CacheFileMetadata.cpp
2 	xul.dll 	mozilla::net::CacheFile::WriteMetadataIfNeededLocked(bool) 	netwerk/cache2/CacheFile.cpp
3 	xul.dll 	mozilla::net::CacheFile::WriteMetadataIfNeeded() 	netwerk/cache2/CacheFile.cpp
4 	xul.dll 	mozilla::net::CacheFileIOManager::Notify(nsITimer*) 	netwerk/cache2/CacheFileIOManager.cpp
5 	xul.dll 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
6 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/TimerThread.cpp

this low-level crash signature has been around for a while but it's now somewhat increasing over the last week. on 43.0b5 it's on #30 making up 0.44% of all crashes...
[Tracking Requested - why for this release]:
This is 1.2% of our 43.0 release crashes while the signature was very low in 42.

Honza, can you find out what's going on there or point to someone who can?
Flags: needinfo?(honzab.moz)
From a glance over CacheFileIOManager::Write, it looks like this could be that either aHandle or ioMan (or IoMan->mIOThread) are null.   Thread 7 in the stacktrace shows that we're in shutdown IIUC (ClosingService.cpp is being hit), so I'm guessing we need to add some null checks here?
Flags: needinfo?(michal.novotny)
ClosingSevice is running all the time, it is a thread that just do all PR_Close calls, so we are not in shutdown

I have spend some time today trying to figure it out but still nothing. 
I was assuming that aHandle is null, or freed
this crash:
https://crash-stats.mozilla.com/report/index/9cc8a590-6ff8-4e4c-9076-9b8572151229

happens through function CacheFile::OnChunkWritten which checks mHandle as well:
http://hg.mozilla.org/releases/mozilla-release/diff/65219cf254c7/netwerk/cache2/CacheFile.cpp#l347
it is a MOZ_ASSERT so I was looking for debug build crashes with that signature, but no luck

There are some crashes on Read as well, but less:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Anet%3A%3ACacheFileIOManager%3A%3ARead&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
(In reply to Jason Duell [:jduell] (needinfo? me) from comment #3)
> From a glance over CacheFileIOManager::Write, it looks like this could be
> that either aHandle or ioMan (or IoMan->mIOThread) are null.   Thread 7 in
> the stacktrace shows that we're in shutdown IIUC (ClosingService.cpp is
> being hit), so I'm guessing we need to add some null checks here?

There are 3 different types of the crash (2 on IO thread and 1 on main thread) and none of them happens during shutdown. Even if ioMan was null the code wouldn't crash. So aHandle is probably null, but we never null out mHandle in CacheFileMetadata, so this isn't use-after-free case. It seems that we try to write metadata of an entry that doesn't have any associated handle, i.e. a memory only entry. But this is checked in CacheFile::WriteMetadataIfNeeded so we must somehow end up with an entry that we think is stored on the disk but it isn't.
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Could this be a duplicate of bug 1203655?
Or better of bug 1121672?
Still a top crash on 43 (#12) but it is too late to ship another fix for 43. 
We should aim to fix this on 44.
I'm slowly running out of ideas here. Let's take one of the 3 possible crashes:

0 mozilla::net::CacheFileIOManager::Write(mozilla::net::CacheFileHandle*, __int64, char const*, int, bool, bool, mozilla::net::CacheFileIOListener*)
1 mozilla::net::CacheFileMetadata::WriteMetadata(unsigned int, mozilla::net::CacheFileMetadataListener*)
2 mozilla::net::CacheFile::WriteMetadataIfNeededLocked(bool)
3 mozilla::net::CacheFile::OnChunkWritten(nsresult, mozilla::net::CacheFileChunk*)
4 mozilla::net::CacheFileChunk::OnDataWritten(mozilla::net::CacheFileHandle*, char const*, nsresult)
5 mozilla::net::WriteEvent::Run()
6 mozilla::net::CacheIOThread::LoopOneLevel(unsigned int)

We must have a valid handle at frame 5, otherwise we would crash in WriteEvent::Run() earlier. WriteEvent holds a reference to the handle, WriteEvent::Run() is called on IO thread and the handle is always deleted only on IO thread, so it simply cannot happen that the handle is deleted between frame 6 and frame 0. WriteEvent gets the handle from CacheFile via CacheFileChunk::Write(), so CacheFile has a valid handle. CacheFile::mHandle is assigned in CacheFile::OnFileOpened() and the handle in CacheFileMetadata is set just few lines below if we already have CacheFileMetadata without the handle. If we don't have metadata at this time we create it later in CacheFile::OnFileOpened() with a valid handle. So if we have a valid handle in CacheFile we should always have a valid handle in CacheFileMetadata. Also it's not possible that we replace CacheFileMetadata which has a valid handle with a newly created CacheFileMetadata, because we do this only in CacheFile::Init() and this method is called in CacheEntry::Load() always on newly created CacheFile.
looking at crash stats, the particular signature doesn't seem to be present in 44 builds or higher at all though...
based on this having been cleared up in 44.. wfm
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.