Closed Bug 586859 Opened 10 years ago Closed 10 years ago

Move startupcache write off the main thread

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+
fennec 2.0+ ---

People

(Reporter: benedict, Assigned: taras.mozilla)

References

(Blocks 1 open bug)

Details

(Whiteboard: [softblocker])

Attachments

(2 files, 5 obsolete files)

Post-startup writes in startupcache should be batched into an in-memory cache, and only written out after a long idle period. All writes should happen off the main thread.
Depends on: 520309
Assignee: nobody → bhsieh
Depends on: 595924
The design that we had in mind was:

Writes always go to an in-memory cache. Attempts to read will check this cache first and the on-disk JAR second. After a sufficient idle period (say, no user input for 5 minutes), this in-memory cache is written out on a separate thread. When this is done, the in-memory cache is deleted and the zipArchive (the representation of the on-disk JAR) is reloaded. 

Need to think about the best way to synchronize this. One way is just to have a lock around the in-memory cache and another lock around the zipArchive, but there may be a better approach.
Assignee: bhsieh → tglek
Attached patch this should do (obsolete) — Splinter Review
This does a couple of things
a) it always batches writes, instead of just first ones. This reduces fragmentation, by writing less. I'm not sure if I broke anything by changing this logic.
b) Writes off main thread. There is no locking, just pr_join in case there is an outstanding write request. I think this is good enough since cache access should be relatively rare, so there is a good chance of a write getting through without anyone having to wait on it.
c) Since we are writing off main thread, I turned on compression to reduce amount of io performed on subsequent startups.
d) Make use of the timestamp field in the jar. File stamps will enable fragmentation analysis to see how many times the startup cache was written to. This will be useful if we later want to trigger rewrites of the cache
Attached patch simplify zipitem reading (obsolete) — Splinter Review
This accomplishes the following
* move zip writing to a single function
* group zip writes
* start the write timer on demand 
* postpone write timer with every cache write
* turn on compression, add timestamps

This passes try
Attachment #488494 - Flags: review?
Attachment #487701 - Attachment is obsolete: true
Attachment #488494 - Flags: review? → review?(ben.hsieh)
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing

Ben, if you don't have time to review this, please just glance over to see if I made any fatal assumptions
This is my first piece of threaded C++, so I might've picked the wrong approach here. The idea is to spawn off a worker thread for writes and hope that nobody else tries to read/write from the cache(which is relatively rare) while the thread runs(otherwise they will block).
Attachment #488599 - Flags: review?(jones.chris.g)
Flags: wanted-fennec1.0?
tracking-fennec: --- → ?
Flags: wanted-fennec1.0?
blocking2.0: --- → final+
Comment on attachment 488599 [details] [diff] [review]
run zipwriter + reload zip on a thread

Approach looks OK as far as I understand it.  I don't know startupcache well to confirm the assumption of cache/read writes being relatively rare.

It's not clear to me how this patch gets startupcache reloaded off the main thread.  It doesn't seem like that's possible, given the apparently synchronous API.

What prevents gStartupCache from being deleted while the writer thread is writing it?  Are all the methods/functions that can mutate gStartupCache protected by WaitOnWriteThread?  Need to document that.

Things this patch needs
 - Document the threading model.  The are exactly two, main and writer thread.  There's only one startupcache "instance" at any time, and it only travels from main-->writer to be written to disk.  Attempts to reincarnate a new instance are serialized after the writing of the old instance.
 - I would rename WaitOnWriteThread() to something more specific, like FinishWritingPreviousCache().
 - Make strong assertions about which functions can be called on which thread.  It seem like WaitOnWriteThread() is main-thread only, right?
Attachment #488599 - Flags: review?(jones.chris.g) → feedback+
(In reply to comment #7)
> Comment on attachment 488599 [details] [diff] [review]
> run zipwriter + reload zip on a thread
> 
> Approach looks OK as far as I understand it.  I don't know startupcache well to
> confirm the assumption of cache/read writes being relatively rare.

Every write to the cache delays the writeout by 60 seconds. Writes occur on cache misses when something is trying to read from cache. Cache read/writes are usually clustered together. Thus there is a high chance of the writeout occurring without anyone blocking on it.

> 
> It's not clear to me how this patch gets startupcache reloaded off the main
> thread.  It doesn't seem like that's possible, given the apparently synchronous
> API.

I meant the part where it does LoadArchive() after the write.

> 
> What prevents gStartupCache from being deleted while the writer thread is
> writing it?  Are all the methods/functions that can mutate gStartupCache
> protected by WaitOnWriteThread?  Need to document that.

The destructor calls WriteToDisk which does WaitOnWriteThread().

> 
> Things this patch needs
>  - Document the threading model.  The are exactly two, main and writer thread. 
> There's only one startupcache "instance" at any time, and it only travels from
> main-->writer to be written to disk.  Attempts to reincarnate a new instance
> are serialized after the writing of the old instance.
>  - I would rename WaitOnWriteThread() to something more specific, like
> FinishWritingPreviousCache().
>  - Make strong assertions about which functions can be called on which thread. 
> It seem like WaitOnWriteThread() is main-thread only, right?

That's the confusing part: it is not mainthread-only. LoadArchive() is called from the main thread upon opening the startup cache. WriteToDisk() is also called from the main thread in the destructor. Ie in the these cases these are inherently synchronous operations, makes no sense to do them on a worker thread.

The write-thread is spawned on a timeout while the app is running. This way we can avoid a slow shutdown. After writing out the cache, the zipreader is reloaded on the same thread.

WaitOnWriteThread() can be called from a main thread to wait for the worker thread to finish. However since the same code is used in the worker thread and main thread, the worker thread can also call WaitOnWriteThread() which does a thread.join on itself which is a no-op.


It's a bit messy, but this is the simplest threading model I could come up with for startupcache.
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing

I'm guessing benh is too busy atm
Attachment #488494 - Flags: review?(ben.hsieh) → review?(dwitte)
(In reply to comment #8)

This seems OK.  Just ensure all that makes it into code comments ;).
Hey, sorry about the delay. (As though I ever could get _my_ patches reviewed in less than a week, ;) ).
Just a few small comments:

>+  nsCOMPtr<nsIZipWriter> zipW = do_CreateInstance("@mozilla.org/zipwriter;1");
>+  if (!zipW)
>+    return;
Probably want a NS_WARNING in this case, as below.

>@@ -388,7 +345,6 @@ StartupCache::InvalidateCache()
> 
>  // This is usually closed, but it's possible to get into
>   // an inconsistent state.
>-  mZipW->Close();
I think you can remove the comment now, right?

>+nsresult
>+StartupCache::ResetStartupWriteTimer()
>+{
>+  mStartupWriteInitiated = PR_FALSE;
Do you still need this variable?

From the other patch:
>+  //stick a functiontimer thing here
>+  //NS_WARNING("Waiting on startupcache write");
Are you still planning on doing either of these?

> nsresult
> StartupCache::PutBuffer(const char* id, const char* inbuf, PRUint32 len) 
> {
>+  WaitOnWriteThread();
This Wait is only necessary in debug, right? Otherwise the PutBuffer doesn't check the existing archive or in-memory table at all? (I'm not sure about this, though.)

So basically the only case where we would block at all is if we did a write, then nothing for 10 minutes, then did another access operation, correct?

Also, maybe we need another test in TestStartupCache now to test interweaving read/writes. One suggestion for that is, you could make ResetStartupWriteTimer() take a param that says how long to wait, which defaults to 10 seconds. Then you can make that shorter in tests.

Other than that, patch looks pretty good. Thanks for the cleanup bits too.
tracking-fennec: ? → 2.0+
Blocks: 593349
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing

>diff --git a/startupcache/StartupCache.cpp b/startupcache/StartupCache.cpp

>@@ -334,7 +286,7 @@ CacheCloseHelper(const nsACString& key, 
>   NS_ASSERTION(NS_SUCCEEDED(rv) && hasEntry == PR_FALSE, 
>                "Existing entry in disk StartupCache.");
> #endif
>-  rv = writer->AddEntryStream(key, 0, 0, stream, false);
>+  rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);

Why is providing holder->time relevant? Are you planning to use it later? It costs cycles to compute it...

>@@ -388,7 +345,6 @@ StartupCache::InvalidateCache()
> 
>   // This is usually closed, but it's possible to get into
>   // an inconsistent state.
>-  mZipW->Close();

No need for the comment now.

>@@ -599,20 +551,29 @@ StartupCacheWrapper::StartupWriteComplet
>   return NS_OK;
> }
> 
>+

Extra newline.

>+nsresult
>+StartupCache::ResetStartupWriteTimer()
>+{

Move this up with the other SC methods, rather than by SCW?

>+  mStartupWriteInitiated = PR_FALSE;
>+  nsresult rv;
>+  if (!mTimer)
>+    mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
>+  else
>+    rv = mTimer->Cancel();
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  // Wait for 10 seconds, then write out the cache.

Presumably you mean 60. :)

Also, throw some newlines around?

Can you cancel the timer in ~StartupCache? Also, do you want to force a write there, or throw away any unwritten data?

r=dwitte with fixes. Tests? Talos results?
Attachment #488494 - Flags: review?(dwitte) → review+
> >-  rv = writer->AddEntryStream(key, 0, 0, stream, false);
> >+  rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);
> 
> Why is providing holder->time relevant? Are you planning to use it later? It
> costs cycles to compute it...

It's useful for analysis of how startup cache is populated/etc. In the future it will be useful for automatic defrag-logic. CPU overhead is irrelevant in this case.
 
> 
> Can you cancel the timer in ~StartupCache? Also, do you want to force a write
> there, or throw away any unwritten data?
> 
> r=dwitte with fixes. Tests? Talos results?

It's already canceled+flushed. No tests, this just changes existing behavior to be more efficient, no new functionality.
(In reply to comment #13)
> > >-  rv = writer->AddEntryStream(key, 0, 0, stream, false);
> > >+  rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);
> > 
> > Why is providing holder->time relevant? Are you planning to use it later? It
> > costs cycles to compute it...
> 
> It's useful for analysis of how startup cache is populated/etc. In the future
> it will be useful for automatic defrag-logic. CPU overhead is irrelevant in
> this case.
> 
> > 
> > Can you cancel the timer in ~StartupCache? Also, do you want to force a write
> > there, or throw away any unwritten data?
> > 
> > r=dwitte with fixes. Tests? Talos results?
> 
> It's already canceled+flushed. No tests, this just changes existing behavior to
> be more efficient, no new functionality.

you can haz approval?
Addressed review comments
Attachment #487702 - Attachment is obsolete: true
Attachment #488494 - Attachment is obsolete: true
Attachment #498255 - Flags: review+
Added comments on threading as requested
Attachment #488599 - Attachment is obsolete: true
Attachment #498257 - Flags: review?(jones.chris.g)
Comment on attachment 498257 [details] [diff] [review]
run zipwriter + reload zip on a thread

I still don't particularly like WaitOnWriteThread() but wouldn't r- for that.
Attachment #498257 - Flags: review?(jones.chris.g) → review+
Whiteboard: [softblocker]
http://hg.mozilla.org/mozilla-central/rev/c2cc892c5d53
http://hg.mozilla.org/mozilla-central/rev/908f598737d7
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [softblocker]
backed out the threading part in case it was causing an intermittent gc-on-shutdown crash
http://hg.mozilla.org/mozilla-central/rev/dc0cff820b22
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Are you joining the helper thread on xpcom-shutdown? We mustn't leave threads running past the XPCOM shutdown process.
Whiteboard: [softblocker]
(In reply to comment #20)
> Are you joining the helper thread on xpcom-shutdown? We mustn't leave threads
> running past the XPCOM shutdown process.

No, I this patch joins the thread from startupcache destructor. Perhaps that happens too late in shutdown.
Yes, that happens too late.
This is the same patch as before, but it joins the writer thread on shutdown.
Attachment #498257 - Attachment is obsolete: true
Attachment #505205 - Flags: approval2.0?
Blocks: 627208
Attachment #505205 - Flags: approval2.0? → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/ff8275cbf3b1
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Looks like the original landing caused a small Ts regression (~40ms) in Mobile

before:
663b7d2d4b6d ts: 6881.78

after:
22a635e15db1 ts: 6923.56

Unexpected?
(In reply to comment #25)
> Looks like the original landing caused a small Ts regression (~40ms) in Mobile
> 
> before:
> 663b7d2d4b6d ts: 6881.78
> 
> after:
> 22a635e15db1 ts: 6923.56
> 
> Unexpected?

I turned on compression to reduce cold io :(.
Blocks: 637461
Depends on: 706343
You need to log in before you can comment on or make changes to this bug.