Intermittent leak of nsCacheEntryDescriptor, nsCacheEntryDescriptor::nsOutputStreamWrapper, nsCacheEntryHashTable, nsCacheService and nsDiskCacheStreamIO in mochitest

RESOLVED FIXED in mozilla20

Status

()

RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: michal.novotny, Assigned: michal.novotny)

Tracking

({intermittent-failure})

Trunk
mozilla20
intermittent-failure
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

6 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16757321&tree=Try

I occasionally see this leak on try server when I'm testing patches for bug #725993. Although I didn't see this leak without these patches, I'm almost sure the patches don't cause it, so I'm filing a separate bug.
(Assignee)

Comment 1

6 years ago
Created attachment 684028 [details] [diff] [review]
fix

We leak a lot of cache classes when some other code releases output or input stream of the cache entry descriptor after we do a shutdown of the cache service. This patch changes handling of output and input streams so that we know about all the streams in cache descriptor and we can close them when needed. With this change the output or input stream that is released too late is closed during shutdown and no longer references the cache service or any other cache class.
Attachment #684028 - Flags: review?(hurley)
(Assignee)

Comment 2

6 years ago
Created attachment 684234 [details] [diff] [review]
patch v2 - fixed deadlock when running test_bug654926.js
Attachment #684028 - Attachment is obsolete: true
Attachment #684028 - Flags: review?(hurley)
Attachment #684234 - Flags: review?(hurley)
(Assignee)

Comment 3

6 years ago
Created attachment 684524 [details] [diff] [review]
patch v3

Yet another (hopefully final) patch version.

- fixed wrong cast
- fixed intermittent deadlock when running content/base/test/unit/test_bug558431.js
- fixed possible race condition in the new as well as in an old code
- fixed wrong name of telemetry probe

Pushed to try server: https://tbpl.mozilla.org/?tree=Try&rev=0b0e8ee5fc16
Attachment #684234 - Attachment is obsolete: true
Attachment #684234 - Flags: review?(hurley)
Attachment #684524 - Flags: review?(hurley)
(Assignee)

Comment 4

6 years ago
Created attachment 685003 [details] [diff] [review]
patch v4

Yet another version:

- CloseInternal() didn't work correctly in case of nsDecompressInputStreamWrapper and nsCompressOutputStreamWrapper
- there was no need for nsICacheStreamInternal, CloseInternal() is now a private method of stream wrappers
Attachment #684524 - Attachment is obsolete: true
Attachment #684524 - Flags: review?(hurley)
Attachment #685003 - Flags: review?(hurley)
(Assignee)

Comment 5

6 years ago
Comment on attachment 685003 [details] [diff] [review]
patch v4

Clearing the review request for now. Testing with compression enabled pointed out that even the current code is wrong. In some situation we doom the entry in nsCacheEntry::RemoveDescriptor() which might lead to deleting of the cache entry, but we use it later in nsCacheService::CloseDescriptor().
Attachment #685003 - Flags: review?(hurley)
(Assignee)

Comment 6

6 years ago
Created attachment 685576 [details] [diff] [review]
patch v5

This patch passes tryserver successfully except few intermittent failures.

compression disabled: https://tbpl.mozilla.org/?tree=Try&rev=cac35163fad2
compression enabled: https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc

Changes from v4:
- fixed problem with dooming entries in nsCacheEntry::RemoveDescriptor() and nsCacheEntry::DetachDescriptors()
- compressed output stream is correctly finalized when nsCacheEntryDescriptor::Close() is called
Attachment #685003 - Attachment is obsolete: true
Attachment #685576 - Flags: review?(hurley)
Comment on attachment 685576 [details] [diff] [review]
patch v5

Review of attachment 685576 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. I am a bit worried about the linux opt xpcshell failure listed on https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc (the compression enabled try run) that doesn't seem to have any known failures listed for it. I might have just missed something in my bugzilla search, though. Any insight, Michal?

r=me once the linux opt xpcshell failure is taken care of, either through a patch update or pointing out what I missed :)
Attachment #685576 - Flags: review?(hurley) → review+

Updated

6 years ago
Keywords: intermittent-failure

Comment 8

6 years ago
(In reply to Nick Hurley [:hurley] from comment #7)
> This looks good to me. I am a bit worried about the linux opt xpcshell
> failure listed on https://tbpl.mozilla.org/?tree=Try&rev=3d2bf79ba4bc (the
> compression enabled try run) that doesn't seem to have any known failures
> listed for it.

That failure is new to me. I've requested a retrigger (blue plus symbol in the middle of the lower TBPL UI panel; available to anyone with an LDAP login), which should hopefully confirm either way.

Could this be related to bug 689247?

Comment 9

6 years ago
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #8)
> Could this be related to bug 689247?

s/this/the leak that this bug is fixing/
(Assignee)

Updated

6 years ago
Depends on: 817556
(Assignee)

Comment 10

6 years ago
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #9)
> (In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #8)
> > Could this be related to bug 689247?
> 
> s/this/the leak that this bug is fixing/

It might be. Although it is not clear to me why bug #689247 doesn't occur anymore anywhere except esr10. This leak is caused by change in #725993 because it replaces entries cached as file (which are released on the main thread) with normal entries (which are released on the cache IO thread). So after the change, when HTTP channel closes cache's output stream during the shutdown, the cache IO thread might be already destroyed and the stream is leaked with all referenced objects.
(Assignee)

Comment 11

6 years ago
Created attachment 690900 [details] [diff] [review]
patch v6

Re-requesting review since the patch has changed significantly. While trying to find the failure of xpcshell test mentioned in comment #7 I found also another problem. There is possible race condition between methods Close(), Read(), Write(), ... called on the stream on the background thread and method CloseInternal() called on the main thread during shutdown. The stream's methods need to be synchronized with a lock, but this can't be done with the current approach. CloseInternal() is called under the cache lock and would acquire stream's lock, on the other hand Close() would grab stream's lock and then later it would grab cache lock. So the changes in patch v6 are:

- streams are protected with the lock
- streams are always closed with Close() outside the cache lock in nsCacheService::CloseAllStreams() and nsCacheEntryDescriptor::Close()
- interface nsIDiskCacheStreamInternal is no longer needed
- the termination of cache entries is now the same in nsCacheService::Shutdown() and nsCacheService::OnProfileShutdown() (I never understood why they were different)


https://tbpl.mozilla.org/?tree=Try&rev=d6fcd4cd24cc - result without compression
https://tbpl.mozilla.org/?tree=Try&rev=c9afb9a4327e - result with compression

I'm not sure why test "WinXP debug Mochitest 2" with compression enabled is marked as failing. Maybe because of few assertions in the log, but they are not related to the cache code. Similar assertions are in the log from "WinXP debug Mochitest 4", the test also timed out but there is no cache (even no necko) code in the backtrace. Another run of the same test passed successfully. Other failures in both results seem to be known intermittent failures.
Attachment #685576 - Attachment is obsolete: true
Attachment #690900 - Flags: review?(hurley)
Comment on attachment 690900 [details] [diff] [review]
patch v6

Review of attachment 690900 [details] [diff] [review]:
-----------------------------------------------------------------

This all looks good to me, and I'm satisfied with the state of try and your explanations of what's going on. r=hurley
Attachment #690900 - Flags: review?(hurley) → review+
https://hg.mozilla.org/mozilla-central/rev/1c74122a6fcf
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla20

Updated

6 years ago
Blocks: 814010

Updated

6 years ago
Blocks: 405407
(Assignee)

Updated

6 years ago
Depends on: 822036, 822074, 824484, 825525
You need to log in before you can comment on or make changes to this bug.