Closed
Bug 784414
Opened 12 years ago
Closed 12 years ago
Intermittent crash in xpcshell-unpack/test_hotfix.js [@ PR_Seek | nsDiskCacheMap::WriteCacheClean]
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla17
People
(Reporter: emorley, Assigned: bbondy)
References
Details
(Keywords: crash, intermittent-failure)
Crash Data
Attachments
(1 file)
1.95 KB,
patch
|
michal
:
review+
|
Details | Diff | Splinter Review |
Rev3 Fedora 12x64 mozilla-inbound pgo test xpcshell on 2012-08-21 07:30:51 PDT for push eb4392d9bc48 slave: talos-r3-fed64-023 https://tbpl.mozilla.org/php/getParsedLog.php?id=14563427&tree=Mozilla-Inbound { TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell-unpack/head_addons.js | [null : 1269] false == false TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell-unpack/head_addons.js | [null : 1272] false == false *** LOG addons.manager: shutdown *** LOG addons.xpi: shutdown *** LOG addons.xpi-utils: shutdown *** LOG addons.xpi-utils: Database closed TEST-PASS | (xpcshell/head.js) | 140 (+ 0) check(s) passed TEST-INFO | (xpcshell/head.js) | 0 check(s) todo <<<<<<< Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-pgo/1345554102/firefox-17.0a1.en-US.linux-x86_64.crashreporter-symbols.zip PROCESS-CRASH | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell-unpack/test_hotfix.js | application crashed (minidump found) Crash dump filename: /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell-unpack/0960b613-c79c-ecdf-650739c8-754853b2.dmp Operating system: Linux 0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 CPU: amd64 family 6 model 23 stepping 10 2 CPUs Crash reason: SIGSEGV Crash address: 0x0 Thread 8 (crashed) 0 libnspr4.so!PR_Seek [priometh.c:eb4392d9bc48 : 119 + 0x0] rbx = 0x47539450 r12 = 0x80004005 r13 = 0x00000002 r14 = 0x475ac8a8 r15 = 0x00000002 rip = 0x5781ed80 rsp = 0x466feb68 rbp = 0x466feb90 Found by: given as instruction pointer in context 1 libxul.so!nsDiskCacheMap::WriteCacheClean [nsDiskCacheMap.cpp:eb4392d9bc48 : 1281 + 0x8] rbx = 0x47539450 r12 = 0x80004005 r13 = 0x00000002 r14 = 0x475ac8a8 r15 = 0x00000002 rip = 0x582980ad rsp = 0x466feb70 rbp = 0x466feb90 Found by: call frame info 2 libxul.so!nsDiskCacheMap::RevalidateCache [nsDiskCacheMap.cpp:eb4392d9bc48 : 1388 + 0x4] rbx = 0x47539450 r12 = 0x58297faa r13 = 0x00000002 r14 = 0x475ac8a8 r15 = 0x00000002 rip = 0x58297a7c rsp = 0x466feba0 rbp = 0x466febc0 Found by: call frame info 3 libxul.so!nsDiskCacheMap::RevalidateTimerCallback [nsDiskCacheMap.cpp:eb4392d9bc48 : 1355 + 0x7] rbx = 0x47539450 r12 = 0x58297faa r13 = 0x00000002 r14 = 0x475ac8a8 r15 = 0x00000002 rip = 0x58297ff8 rsp = 0x466febd0 rbp = 0x466febe0 Found by: call frame info 4 libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:eb4392d9bc48 : 473 + 0x9] rbx = 0x475ac880 r12 = 0x58297faa r13 = 0x00000002 r14 = 0x475ac8a8 r15 = 0x00000002 rip = 0x58ec7811 rsp = 0x466febf0 rbp = 0x466fec40 Found by: call frame info 5 libxul.so!nsTimerEvent::Run [nsTimerImpl.cpp:eb4392d9bc48 : 556 + 0x7] rbx = 0x475ac880 r12 = 0x00000000 r13 = 0x5a42d970 r14 = 0x00000000 r15 = 0x466fecb0 rip = 0x58ec78f5 rsp = 0x466fec50 rbp = 0x466fec60 Found by: call frame info 6 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:eb4392d9bc48 : 624 + 0x6] rbx = 0x48fa31a0 r12 = 0x00000000 r13 = 0x5a42d970 r14 = 0x00000000 r15 = 0x466fecb0 rip = 0x58ec3eb6 rsp = 0x466fec70 rbp = 0x466fecf0 Found by: call frame info 7 libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp:eb4392d9bc48 : 220 + 0xd] rbx = 0x48fa31a0 r12 = 0x466fed50 r13 = 0x48fa31f0 r14 = 0x00000000 r15 = 0x00000003 rip = 0x58e8ec0c rsp = 0x466fed00 rbp = 0x466fed30 Found by: call frame info 8 libxul.so!nsThread::ThreadFunc [nsThread.cpp:eb4392d9bc48 : 257 + 0xc] rbx = 0x48fa31a0 r12 = 0x466fed50 r13 = 0x48fa31f0 r14 = 0x00000000 r15 = 0x00000003 rip = 0x58ec4af8 rsp = 0x466fed40 rbp = 0x466fed80 Found by: call frame info 9 libnspr4.so!_pt_root [ptthread.c:eb4392d9bc48 : 156 + 0x6] rbx = 0x56512890 r12 = 0xb0fb0c10 r13 = 0x00000000 r14 = 0x00000000 r15 = 0x00000003 rip = 0x57836064 rsp = 0x466fed90 rbp = 0x00000000 }
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → netzen
Assignee | ||
Comment 1•12 years ago
|
||
I think what's happening is that after a Close(), even know it is protected by the lock, and even know it calls Cancel() first before closing the file handle, I think the callback can sometimes still get through.
Attachment #653890 -
Flags: review?(michal.novotny)
Assignee | ||
Comment 2•12 years ago
|
||
Probably it is waiting on the lock inside the callback while Close() is running. Then Close() completes and it keeps running. And now the cache clean file is set to nullptr causing a crash when you seek.
Comment 3•12 years ago
|
||
Comment on attachment 653890 [details] [diff] [review] Patch v1. I would much rather see a proper solution to not process RevalidateTimerCallback() when the timer was canceled. Btw. I've found another potential problem in nsDiskCacheMap::RevalidateTimerCallback(). The disk cache map passed as arg to the timer can be already deleted in case the disk device was shutdown in nsCacheService::Shutdown() instead of nsCacheService::OnProfileShutdown(). We should first check if the disk device nsCacheService::gService->mDiskDevice still exists, then we could access the cache map as nsCacheService::gService->mDiskDevice.mCacheMap. And since we shouldn't use aClosure argument to pass the cache map to the timer, we can use it for the proper canceling. We could have a counter in the cache map, that we increment every time we cancel the timer. We would pass the actual value of the counter as aClosure argument and in nsDiskCacheMap::RevalidateTimerCallback() we would compare it with the counter in the cache map. If it differs the timer was canceled.
Attachment #653890 -
Flags: review?(michal.novotny) → review-
Assignee | ||
Comment 4•12 years ago
|
||
> I would much rather see a proper solution to not process > RevalidateTimerCallback() when the timer was canceled. The problem is that the callback timer is already called, not that it will be called (I was wrong in comment 1, but correct in comment 2). We are inside the timer callback already, context switch happens, we do Close(), and then we go back to the timer callback thread and it waits on the lock. Then Close() finishes, then back in our callback we proceed and obtain the lock. In that case I think this is the proper solution, check if the file is already closed. ---- Re the other comments, I'll work on those.
Assignee | ||
Comment 5•12 years ago
|
||
Actually for the other comments since they aren't related to this exact crash, I'll post another bug. I don't want to hold up a good fix for another potential problem. I'll still work on it right away though.
Assignee | ||
Comment 6•12 years ago
|
||
Comment on attachment 653890 [details] [diff] [review] Patch v1. Review of attachment 653890 [details] [diff] [review]: ----------------------------------------------------------------- Resetting review based on Comment 4. I think this is the correct solution. For the other comments I will implement those in another bug soon.
Attachment #653890 -
Flags: review- → review?(michal.novotny)
Comment 7•12 years ago
|
||
Comment on attachment 653890 [details] [diff] [review] Patch v1. (In reply to Brian R. Bondy [:bbondy] from comment #4) > The problem is that the callback timer is already called, not that it will > be called (I was wrong in comment 1, but correct in comment 2). Sure, I know. But what is the difference? We cancel the timer to avoid performing actions in the timer callback. So the proper solution would be to check in the callback if the timer was canceled. Btw I think that the last diskCacheMap->ResetCacheTimer() in nsDiskCacheMap::RevalidateTimerCallback() should be also inside the lock. Anyway, I'm fine with the current fix if we're going to change it in bug #784657.
Attachment #653890 -
Flags: review?(michal.novotny) → review+
Assignee | ||
Comment 8•12 years ago
|
||
The crash isn't because it is canceled or not though, it is just because the handle gets set to nullptr in the Close() while the timer is waiting on the lock in the callback. This patch will ensure that this specific crash won't happen though so as you mentioned I'll land it. Thanks for the review! > Btw I think that the last diskCacheMap->ResetCacheTimer() > in nsDiskCacheMap::RevalidateTimerCallback() should be also inside the lock. Ya I moved it into the lock in bug 784657.
Assignee | ||
Comment 9•12 years ago
|
||
BTW if we get through bug 784657 today then I'll land these together.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 11•12 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/281b36542acd
Target Milestone: --- → mozilla17
Comment 12•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/281b36542acd
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange]
You need to log in
before you can comment on or make changes to this bug.
Description
•