Closed Bug 405407 Opened 17 years ago Closed 12 years ago

Merge nsDiskCacheStreamIO and nsDiskCacheStreamOutput

Categories

(Core :: Networking: Cache, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox17 --- unaffected
firefox18 + disabled
firefox19 + disabled
blocking2.0 --- -

People

(Reporter: alfredkayser, Assigned: alfredkayser)

References

Details

(Keywords: memory-footprint, Whiteboard: [should (help) fix assertions too])

Attachments

(2 files, 11 obsolete files)

nsDiskCacheStreamOutput is always one-to-one to nsDiskCacheStreamIO (there can only be one writing stream towards the cache entry at any time). 
So, instead of allocating a separate nsDiskCacheStreamOutput object (which just refers back to the nsDiskCacheStreamIO object), make the StreamIO object itself support the nsIOutputStream and return itself as the output stream for that cache entry. 

This saves a separate malloc for each cache entry that is written to. It also allows to cleanup the links between the two objects, and cleanup the internal interface (Tell() is not used, and Seek & SetEOF are only used when the output stream is opened to set the offset at the desired position).

It also saves some codesize:
Before:
Suite-debug:
  109725 Nov 23 16:57 nsDiskCacheStreams.obj
 1823484 Nov 23 16:57 nkcache_s.lib

FF-Opt:
   81218 Nov 26 10:14 nsDiskCacheStreams.obj
 1432058 Nov 26 10:14 nkcache_s.lib

After:
Suite-debug:
   97694 Nov 26 10:18 nsDiskCacheStreams.obj
 1810006 Nov 26 10:19 nkcache_s.lib
Savings: 13478 bytes

FF-Opt:
   71025 Nov 26 10:17 nsDiskCacheStreams.obj
 1420420 Nov 26 10:17 nkcache_s.lib
Savings: 11638 bytes

So, about 11,5K codesize in total, and less memory allocations.
Note, the nsDiskCacheOutputStreamIO object itself is after the merge even smaller than before.
Flags: blocking1.9?
Attachment #290196 - Flags: review?(dcamp)
Note, this patch also fixes bug 284594 and bug 187034.
Version: unspecified → Trunk
Keywords: footprint
+ing because of codesize/malloc wins but setting P5..
Flags: blocking1.9? → blocking1.9+
Priority: -- → P5
Comment on attachment 290196 [details] [diff] [review]
V1: Merge StreamIO and StreamOutput objects

>-    // no one is interested in us anymore, so we don't need to grab any locks

Maybe this comment should follow its code to the destructor.

Looks good, make sure biesi sr's it.
Attachment #290196 - Flags: review?(dcamp) → review+
Note, the 'Close' function is now part of the nsIOutputStream.
Attachment #290196 - Attachment is obsolete: true
Attachment #292372 - Flags: superreview?(cbiesinger)
Attachment #292372 - Attachment is obsolete: true
Attachment #292375 - Flags: superreview?(cbiesinger)
Attachment #292372 - Flags: superreview?(cbiesinger)
Flags: tracking1.9+ → wanted-next+
Requesting 1.9.1 as it is ready to go, except for the last SR, and this is both a sizeable code saving as well as sizeable memory allocation saving.
Flags: wanted1.9.1?
Attached patch V4: Updated to current trunk (obsolete) — Splinter Review
Attachment #292375 - Attachment is obsolete: true
Attachment #372028 - Flags: review?(cbiesinger)
Attachment #292375 - Flags: superreview?(cbiesinger)
Blocks: 284594, 187034
Depends on: 328196
Blocks: 459117
Linked to Fennec performance improvement bug, as this is a valid, working patch to optimizes cache memory and is a sizeable code reduction.
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Darin, can you have a look on this. We would like to have this for Fennec (and others as well).
Attachment #372028 - Flags: review?(cbiesinger) → review?(darin.moz)
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Alfred,

I'll take this review (Darin is not actively working on necko any more, and biesi's time for it is limited, so we tend to save him for superreviews).
Attachment #372028 - Flags: review?(darin.moz) → review?(jduell.mcbugs)
Note, I have run this also on tryserver and it passed all tests.
Jason, any chance of reviewing this?
blocking2.0: --- → ?
Whiteboard: [should (help) fix assertions too]
Blocks: deadcode
Not holding a release back on this fix.
blocking2.0: ? → -
Attachment #372028 - Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Now that more bugs in the cache area are being fixed, can we also have this one reviewed and committed? This reduces code size, and complexity in the cache IO area.
I'm really not a good reviewer for the cache stuff; I'm not that familiar with
it.  Perhaps Michal?  Or maybe Jason can recommend someone?
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Jason, can you review this, or can you recommend someone? I am desperately looking for someone to review this.
Attachment #372028 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Seeing if Michal can take this review.
Attachment #372028 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Now that more cache code is being revised (awaiting the modern cache of bug 512849), can we at least review and checkin this code? Codesize and memory usage saving and performance improvements are not to be neglected.
Functionally no change, just optimization of memory allocation (one alloc instead of two, and a smaller object).
Michal, ping for review.
(Someone was asking on IRC if they could get this reviewed, since it's been a year!?!).

I really think we need a review dashboard with top ten outstanding reviews listed by time - it's the only way we're going to get rid of this kind of thing...
(In reply to Ed Morley [:edmorley] from comment #19)
I'd suggest a dashboard of pending reviews.
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Shifting review back to Jason, since Michal hasn't responded in the 18 months since.
Attachment #372028 - Flags: review?(michal.novotny) → review?(jduell.mcbugs)
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Sorry to play hot-potato with this patch, but michal really is the better reviewer.

I think part of the reason this patch has lingered is that saving 11K in executable size and a malloc call hasn't seemed pressing enough to spend time on this.  But Dave Camp liked what he saw (comment 3), and either way Alfred deserves an answer as to whether we take this.  I just don't think I'm the right person to make the risk/reward call.
Attachment #372028 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

Looks good, r+ provided it passes tryserver.
Attachment #372028 - Flags: review?(michal.novotny) → review+
Comment on attachment 372028 [details] [diff] [review]
V4: Updated to current trunk

This won't apply, it is at least three years old. Also, it adds usage of nspr types we don't use any more, like PRBool and PRUint32.
We messed up letting this sit in review for so long, we shouldn't make Alfred update the patch. Volunteers?
Eh, I'll take a shot at cleaning up the patch.
Assignee: alfredkayser → joshmoz
Attached patch fix v5 (obsolete) — Splinter Review
A number of things changed between when the last patch was written and now, it isn't just type changes. Please check on the way I integrated telemetry changes and nsIDiskCacheStreamInternal, for example.

https://tbpl.mozilla.org/?tree=Try&rev=e32c6cf56366
Attachment #372028 - Attachment is obsolete: true
Attachment #657355 - Flags: review?(michal.novotny)
Comment on attachment 657355 [details] [diff] [review]
fix v5

Review of attachment 657355 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the above fixed

::: netwerk/cache/nsDiskCacheStreams.cpp
@@ +210,2 @@
>  {
> +    // no one is interested in us anymore, so we don't need to grab any locks

I overlooked this in the previous review. This is no longer true since CloseInternal() calls Flush() which needs to be called under the lock. This code seems to be called always under the lock from nsDiskCacheBinding::~nsDiskCacheBinding() since the binding should IMO live longer than any input/output stream. Just to be sure, please add nsCacheService::AssertOwnsLock() here.

@@ +260,4 @@
>  
>      mozilla::Telemetry::ID id;
>      if (NS_IsMainThread())
>          id = mozilla::Telemetry::NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE_MAIN_THREAD;

This will report both NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE_MAIN_THREAD and NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE_INTERNAL_MAIN_THREAD. I don't think we really need to know whether CloseInternal() was called directly or via Close(), so remove the telemetry from Close().
Attachment #657355 - Flags: review?(michal.novotny) → review+
Attached patch fix v6 (obsolete) — Splinter Review
Attachment #657355 - Attachment is obsolete: true
Attachment #658109 - Flags: review?(michal.novotny)
Comment on attachment 658109 [details] [diff] [review]
fix v6

Also please remove NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE_MAIN_THREAD and NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE from toolkit/components/telemetry/Histograms.json
Attachment #658109 - Flags: review?(michal.novotny) → review+
Attached patch fix v7 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=956aaa5bb3ee
Attachment #658109 - Attachment is obsolete: true
Lots of failures on 10.6 try debug:

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug248970_cache.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug651100.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug654926.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug654926_doom_and_read.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug654926_test_seek.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug712914_secinfo_validation.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_doomentry.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_compressappend.js | test failed (with xpcshell return code: -6), see following log:
Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:222
Attached patch fix (obsolete) — Splinter Review
(In reply to Josh Aas (Mozilla Corporation) from comment #32)
> Lots of failures on 10.6 try debug:

This patch should these failures. Apply on the top of your patch.
Attached patch fix v8 (obsolete) — Splinter Review
Thanks Michal!

https://tbpl.mozilla.org/?tree=Try&rev=e6749605cc4d
Attachment #658266 - Attachment is obsolete: true
Attachment #658513 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/c12db325f863
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Depends on: 804614
Depends on: 810781
Depends on: 812853
Attached patch backout v1.0 (obsolete) — Splinter Review
Preparing to back this out due to regressions. Backout try run:

https://tbpl.mozilla.org/?tree=Try&rev=6d7a92b4c6d4
backed out

http://hg.mozilla.org/integration/mozilla-inbound/rev/452ecece0b17
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 683265 [details] [diff] [review]
backout v1.0

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 804614 810781 812853
User impact if declined: bad regressions, see bugs
Testing completed (on m-c, etc.): 
Risk to taking this patch (and alternatives if risky): some risk, but minor and worth it
String or UUID changes made by this patch: no string or interface changes
Attachment #683265 - Flags: approval-mozilla-aurora?
The two bugs bug 810781 and bug 812853 don't seem to be directly related.

The crash pattern of 804614 seem to hint that the line:
 memcpy(mBuffer + mBufPos, buffer, chunkSize);
causes the crash.
The code directly above is the only place where this buffer is allocated.
If the realloc fails for some reason the memcpy will just break.

There is there some "smart" code to smartly reallocate mBuffer.
However for writing, the buffer is always 16K, so for writing streams, it would be more robust to just allocate that buffer of 16K (as soon as the write stream is opened) and freed on closing of the writestream. 
The variable mBuffer is used for both read and write modus, both the DiskCacheStream cannot do overlapped IO so never doing both read and write at the same time (see other asserts on this in the code).
(In reply to Josh Aas (Mozilla Corporation) from comment #39)
> Comment on attachment 683265 [details] [diff] [review]
> backout v1.0
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): 804614 810781 812853
> User impact if declined: bad regressions, see bugs
> Testing completed (on m-c, etc.): 
> Risk to taking this patch (and alternatives if risky): some risk, but minor
> and worth it
> String or UUID changes made by this patch: no string or interface changes

Aurora=19, Beta=18, so I'll approve for both branches.

Also, can you go into more detail about the risk here? This is a backout, so it's not entirely clear to me. Please add qawanted once you've explained what could go wrong, so that QA can perform exploratory testing.
(In reply to Alex Keybl [:akeybl] from comment #42)

> Also, can you go into more detail about the risk here? This is a backout, so
> it's not entirely clear to me. Please add qawanted once you've explained
> what could go wrong, so that QA can perform exploratory testing.

The back out was pretty clean, so the risk should be minimal. The only win from the patch in the first place was cleaner code and one less malloc (very minor perf win). We need to verify that the regressions are cleared up, and basically stress the cache in testing. Two of the regressions have already been verified fixed in Firefox 20.
Keywords: qawanted
Attachment #683265 - Flags: approval-mozilla-beta?
Depends on: 572011
Attachment #683265 - Flags: approval-mozilla-beta?
Attachment #683265 - Flags: approval-mozilla-beta+
Attachment #683265 - Flags: approval-mozilla-aurora?
Attachment #683265 - Flags: approval-mozilla-aurora+
QA Contact: virgil.dicu
(In reply to Josh Aas (Mozilla Corporation) from comment #43)
> (In reply to Alex Keybl [:akeybl] from comment #42)
> We need to verify that the regressions are cleared up, and
> basically stress the cache in testing. 

What else should be done here apart from clearing/populating the cache? Could you provide any other cases to test here?
Josh, there is an approval+, will you land the patch?

I experienced the issue again.  Some time (days) after a complete cache folder removal.  Could the bug trigger when we evict from cache based on size?
See bug 814010: I am working on a more crash-proof solution and want to apply that before doing the merge of nsDiskCacheStreamIO and nsDiskCacheStreamOutput.
Depends on: 814010
pushed backout to mozilla-beta, mozilla-aurora was closed

http://hg.mozilla.org/releases/mozilla-beta/rev/a1d18db38f23
Josh, can you please expand on what QA can cover, in regards to Virgil's question in comment 44?
Alfred - any interest in taking over this bug and figuring out why it caused so many problems?
(In reply to Virgil Dicu [:virgil] [QA] from comment #44)

> What else should be done here apart from clearing/populating the cache?
> Could you provide any other cases to test here?

The best thing QA can do is make sure the crashes that should be fixed by the backout actually go away. This can be done on a per-bug basis, see the deps list in this bug.

Aside from that, all the information I have access to is in the individual regression bugs.

Thanks for your help!
Yes, I will attempt to find the root cause of these problems.
Particular, I am splitting this patch into two steps:
1. Optimize buffer usage (only for first 16K): bug 814010
2. merge the Write object, this bug.
Bug 814010 should also fix bug 572011.
Assignee: joshmoz → alfredkayser
Two theories as causing issues with the disk cache:
1. See bug 808532 and bug 771832: Possibly in some error cases (or binding issues), the mFD file handle is not closed correctly.
2. See bug 810781: the same css file is offered by the site compressed and uncompressed, possibly that can confused the cache, thinking it is the same url (but with different compression).
A third thing to consider: if for some reason the diskcache file has incorrect size (when compared to the expected size according to the cache entry), it should be thrown away, and reported back as FILE_NOT_FOUND (see bug 812483 to handle such cases) instead of crashing on reading an empty or partly filled file.
(In reply to Josh Aas (Mozilla Corporation) from comment #51)
> (In reply to Virgil Dicu [:virgil] [QA] from comment #44)
> 
> > What else should be done here apart from clearing/populating the cache?
> > Could you provide any other cases to test here?
> 
> The best thing QA can do is make sure the crashes that should be fixed by
> the backout actually go away. This can be done on a per-bug basis, see the
> deps list in this bug.

Virgil, can you look into this please?
Bug 804614 (no crashes after the backout was made) and bug 810781 (verified by Pauly in F18) are verified.
Bug 812853 was verified by Alice on Nightly - F18 beta still remaining (will follow up in that bug)

All looks good.
Keywords: qawanted
Depends on: 808997
Depends on: 725993
Blocks: 824244
try run:
https://tbpl.mozilla.org/?tree=Try&rev=19caeb564470

Crash issue should be solved by:
1. File/buffer changes in byg 814010
2. Previous versions merged FlushOutputStream with Flush, so that it could be called as such on the nsIOutputStream, but FlushOutputStream didn't support to be called multiple times.
Attachment #658808 - Attachment is obsolete: true
Attachment #692735 - Attachment is obsolete: true
Attachment #699063 - Flags: review?(michal.novotny)
Comment on attachment 699063 [details] [diff] [review]
V9: Rebased, after 814010 and others

r+ with the following fixed:


>  nsDiskCacheStreamIO::~nsDiskCacheStreamIO()
>  {
> -    Close();
> +    // Close (without locking) the outputstream
> +    if (mBinding && mOutputStreamIsOpen) {
> +        (void)CloseOutputStream();
> +    }

The comment is wrong or at least misleading. CloseOutputStream() must be called under the lock and AFAICS nsDiskCacheStreamIO::~nsDiskCacheStreamIO() is always called under the cache lock. Please remove the comment and add nsCacheService::AssertOwnsLock() either here or at the beginning of nsDiskCacheStreamIO::CloseOutputStream().


> -nsDiskCacheStreamIO::CloseOutputStream(nsDiskCacheOutputStream *  outputStream)
> +NS_IMETHODIMP
> +nsDiskCacheStreamIO::Close()
>  {
> -    nsCacheServiceAutoLock lock(LOCK_TELEM(NSDISKCACHESTREAMIO_CLOSEOUTPUTSTREAM)); // grab service lock
...
> +    nsCacheServiceAutoLock lock(LOCK_TELEM(NSDISKCACHESTREAMIO_CLOSEOUTPUTSTREAM));

Name of telemetry probe now doesn't correspond with method's name.


> +        id = mozilla::Telemetry::NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE_MAIN_THREAD;
> +    else
> +        id = mozilla::Telemetry::NETWORK_DISK_CACHE_OUTPUT_STREAM_CLOSE;
> +    mozilla::Telemetry::AccumulateTimeDelta(id, start);

The same as above.
Attachment #699063 - Flags: review?(michal.novotny) → review+
Attachment #683265 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/2eba5f66565a
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: