https://tbpl.mozilla.org/php/getParsedLog.php?id=16757321&tree=Try I occasionally see this leak on try server when I'm testing patches for bug #725993. Although I didn't see this leak without these patches, I'm almost sure the patches don't cause it, so I'm filing a separate bug.
Created attachment 684028 [details] [diff] [review] fix We leak a lot of cache classes when some other code releases output or input stream of the cache entry descriptor after we do a shutdown of the cache service. This patch changes handling of output and input streams so that we know about all the streams in cache descriptor and we can close them when needed. With this change the output or input stream that is released too late is closed during shutdown and no longer references the cache service or any other cache class.
Created attachment 684234 [details] [diff] [review] patch v2 - fixed deadlock when running test_bug654926.js
Created attachment 684524 [details] [diff] [review] patch v3 Yet another (hopefully final) patch version. - fixed wrong cast - fixed intermittent deadlock when running content/base/test/unit/test_bug558431.js - fixed possible race condition in the new as well as in an old code - fixed wrong name of telemetry probe Pushed to try server: https://tbpl.mozilla.org/?tree=Try&rev=0b0e8ee5fc16
Created attachment 685003 [details] [diff] [review] patch v4 Yet another version: - CloseInternal() didn't work correctly in case of nsDecompressInputStreamWrapper and nsCompressOutputStreamWrapper - there was no need for nsICacheStreamInternal, CloseInternal() is now a private method of stream wrappers
Comment on attachment 685003 [details] [diff] [review] patch v4 Clearing the review request for now. Testing with compression enabled pointed out that even the current code is wrong. In some situation we doom the entry in nsCacheEntry::RemoveDescriptor() which might lead to deleting of the cache entry, but we use it later in nsCacheService::CloseDescriptor().
Created attachment 685576 [details] [diff] [review] patch v5 This patch passes tryserver successfully except few intermittent failures. compression disabled: https://tbpl.mozilla.org/?tree=Try&rev=cac35163fad2 compression enabled: https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc Changes from v4: - fixed problem with dooming entries in nsCacheEntry::RemoveDescriptor() and nsCacheEntry::DetachDescriptors() - compressed output stream is correctly finalized when nsCacheEntryDescriptor::Close() is called
Comment on attachment 685576 [details] [diff] [review] patch v5 Review of attachment 685576 [details] [diff] [review]: ----------------------------------------------------------------- This looks good to me. I am a bit worried about the linux opt xpcshell failure listed on https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc (the compression enabled try run) that doesn't seem to have any known failures listed for it. I might have just missed something in my bugzilla search, though. Any insight, Michal? r=me once the linux opt xpcshell failure is taken care of, either through a patch update or pointing out what I missed :)
Attachment #685576 - Flags: review?(hurley) → review+
(In reply to Nick Hurley [:hurley] from comment #7) > This looks good to me. I am a bit worried about the linux opt xpcshell > failure listed on https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc (the > compression enabled try run) that doesn't seem to have any known failures > listed for it. That failure is new to me. I've requested a retrigger (blue plus symbol in the middle of the lower TBPL UI panel; available to anyone with an LDAP login), which should hopefully confirm either way. Could this be related to bug 689247?
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #8) > Could this be related to bug 689247? s/this/the leak that this bug is fixing/
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #9) > (In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #8) > > Could this be related to bug 689247? > > s/this/the leak that this bug is fixing/ It might be. Although it is not clear to me why bug #689247 doesn't occur anymore anywhere except esr10. This leak is caused by change in #725993 because it replaces entries cached as file (which are released on the main thread) with normal entries (which are released on the cache IO thread). So after the change, when HTTP channel closes cache's output stream during the shutdown, the cache IO thread might be already destroyed and the stream is leaked with all referenced objects.
Created attachment 690900 [details] [diff] [review] patch v6 Re-requesting review since the patch has changed significantly. While trying to find the failure of xpcshell test mentioned in comment #7 I found also another problem. There is possible race condition between methods Close(), Read(), Write(), ... called on the stream on the background thread and method CloseInternal() called on the main thread during shutdown. The stream's methods need to be synchronized with a lock, but this can't be done with the current approach. CloseInternal() is called under the cache lock and would acquire stream's lock, on the other hand Close() would grab stream's lock and then later it would grab cache lock. So the changes in patch v6 are: - streams are protected with the lock - streams are always closed with Close() outside the cache lock in nsCacheService::CloseAllStreams() and nsCacheEntryDescriptor::Close() - interface nsIDiskCacheStreamInternal is no longer needed - the termination of cache entries is now the same in nsCacheService::Shutdown() and nsCacheService::OnProfileShutdown() (I never understood why they were different) https://tbpl.mozilla.org/?tree=Try&rev=d6fcd4cd24cc - result without compression https://tbpl.mozilla.org/?tree=Try&rev=c9afb9a4327e - result with compression I'm not sure why test "WinXP debug Mochitest 2" with compression enabled is marked as failing. Maybe because of few assertions in the log, but they are not related to the cache code. Similar assertions are in the log from "WinXP debug Mochitest 4", the test also timed out but there is no cache (even no necko) code in the backtrace. Another run of the same test passed successfully. Other failures in both results seem to be known intermittent failures.
Comment on attachment 690900 [details] [diff] [review] patch v6 Review of attachment 690900 [details] [diff] [review]: ----------------------------------------------------------------- This all looks good to me, and I'm satisfied with the state of try and your explanations of what's going on. r=hurley
Attachment #690900 - Flags: review?(hurley) → review+
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
You need to log in before you can comment on or make changes to this bug.