1614795 - Migrate the StartupCache write thread to NS_DispatchBackgroundTask

Reporter

Description

•

5 years ago

This would probably be a bit cleaner and maybe a little more performant. Also it should not be difficult. We would just need to replace the mWriteThread checks when we're waiting on the write thread with some atomic boolean.

:Gijs (he/him)

Assignee

Comment 1

•

5 years ago

Tentatively fxperf:p2 but it's hard to tell if we expect this to make a significant dent in things - if not, p3 is probably better.

Whiteboard: [fxperf] → [fxperf:p2]

:Gijs (he/him)

Assignee

Updated

•

5 years ago

Assignee: nobody → gijskruitbosch+bugs

Status: NEW → ASSIGNED

Alex Thayer [:alexical] (she/her)

Reporter

Comment 2

•

5 years ago

So, something to note about this: we currently call WaitOnWriteThread from the main thread during operations like GetBuffer and HasEntry. I actually don't think we need to do this anymore(?) But it will probably be bad if we're waiting for the write to complete on the main thread if that write is dispatched via NS_DispatchBackgroundTask (it's bad today that we're waiting on the main thread for this, it's worse if we move it to NS_DispatchBackgroundTask.) I think in most cases we can do away with blocking the main thread to wait for the write thread, and that should probably just be part of this bug.

:Gijs (he/him)

Assignee

Comment 3

•

5 years ago

(In reply to Doug Thayer [:dthayer] from comment #2)

So, something to note about this: we currently call WaitOnWriteThread from the main thread during operations like GetBuffer and HasEntry. I actually don't think we need to do this anymore(?) But it will probably be bad if we're waiting for the write to complete on the main thread if that write is dispatched via NS_DispatchBackgroundTask (it's bad today that we're waiting on the main thread for this, it's worse if we move it to NS_DispatchBackgroundTask.) I think in most cases we can do away with blocking the main thread to wait for the write thread, and that should probably just be part of this bug.

Right, I think we don't need it for GetBuffer and HasEntry. However, I don't think we can do away with it for PutBuffer. As you say, it'd be worse to wait on the background task -- but I don't see another solution short of copying mTable when writing, which also seems bad, given that it's likely to be a substantial amount of data. I was trying to come up with some way of basically telling the background task to abort the write attempt, and stop reading mTable, but I don't think that works either - not without a mutex/lock, and then we're back to the main thread having to take that mutex/lock when PutBuffer is called, which means it could end up waiting for a read of mTable (and potentially the IO) to be completing on the background task's thread, which would again be bad. It seems like we're likely to hit this in practice given that we will call PutBuffer whenever we load some new chrome document or script.

Some other options I'm considering but not very enthusiastic about:

queue PutBuffer calls if a write is pending. That means we'll miss the startup cache for subsequent loads until the write has finished, but that should not be too bad - all of these caches will fall back to "real" loads until the write finishes and all the PutBuffer calls execute.
similar, but instead of just queueing modifications to mTable, do so in a secondary mPendingTable, and store items there if PutBuffer is called while a write is ongoing. This is annoying because it adds complexity to GetBuffer and HasEntry (which would have to check both if the write is ongoing), but is otherwise probably straightforward to implement - the main downside is additional code and whatever the memory overhead of the second hashtable would be. We can reconcile the table when the write is complete.
use a lock-free hashtable for mTable. But I don't think we've got an implementation of this in the tree. bug 1625101 suggests some movement, but no assignee, and I'm probably not the right person to write one...

We could combine (1) or (2) with some kind of cooperative signaling, where the main thread can call Cancel() on the write task, but then has to wait on the write task to report back to the mainthread that it's got that memo and will stop using mTable before it can proceed with (1) or (2).

Doug, am I missing something obvious? Do you have a preference on any of these approaches and/or do you see an obvious issue with one of them? I guess if I had to pick, (2) seems like it'd be the easiest to switch to (3) once an implementation materializes...

Flags: needinfo?(dothayer)

:Gijs (he/him)

Assignee

Comment 4

•

5 years ago

Hm, another option would be to iterate mTable on the main thread and dispatch multiple tasks to the background task scheduler to do the actual writing -- tasks that would rely on stable references to specific buffers referenced in mTable, so those should be thread-safe. But we'd have to either create a queue of such tasks in one pass, or be able to iterate over mTable in a stable way even if it gets modified - not sure the latter is easily possible with our hashtable implementation.

Alex Thayer [:alexical] (she/her)

Reporter

Comment 5

•

5 years ago

I think we could just add a mutex and TryLock it in PutBuffer - then do nothing if we can't get it. WriteToDisk should only be running if it's 60s after startup, or if we're shutting down, at which point having our PutBuffer succeed really isn't all that important, I think. Thoughts?

Flags: needinfo?(dothayer)

:Gijs (he/him)

Assignee

Comment 6

•

5 years ago

(In reply to Doug Thayer [:dthayer] from comment #5)

I think we could just add a mutex and TryLock it in PutBuffer - then do nothing if we can't get it. WriteToDisk should only be running if it's 60s after startup, or if we're shutting down, at which point having our PutBuffer succeed really isn't all that important, I think. Thoughts?

Wouldn't this risk permanently not caching certain information in startupcache, if startup is slow? Or would we be hoping that it'd get cached the next time we start up?

Flags: needinfo?(dothayer)

Alex Thayer [:alexical] (she/her)

Reporter

Comment 7

•

5 years ago

(In reply to :Gijs (he/him) from comment #6)

(In reply to Doug Thayer [:dthayer] from comment #5)

I think we could just add a mutex and TryLock it in PutBuffer - then do nothing if we can't get it. WriteToDisk should only be running if it's 60s after startup, or if we're shutting down, at which point having our PutBuffer succeed really isn't all that important, I think. Thoughts?

Wouldn't this risk permanently not caching certain information in startupcache, if startup is slow? Or would we be hoping that it'd get cached the next time we start up?

I think it's unlikely - when I say 60s after startup I mean 60s after the most recent PutBuffer call. So startup would have to be very slow AND very non-uniformly slow such that there's a 60s window in the middle of it wherein we don't use the startup cache.

We can also check telemetry to make sure the number of misses doesn't jump up after this.

Flags: needinfo?(dothayer)

:Gijs (he/him)

Assignee

Comment 8

•

5 years ago

Attached file Bug 1614795 - use the background task queue for startupcache writes, r?dthayer!,decoder! — Details

Prior to this patch, the startupcache created its own mWriteThread off which it
wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called
at shutdown, to do the shutdown write if there was any reason to do so, and
from a timer that is re-initialized after every addition to the startup cache,
to run 60s after the last change to the cache.

It then joined that write thread on the main thread (in other words, blocks
on that off-main-thread write completing from the main thread) when:

xpcom-shutdown fired
the startupcache itself gets destroyed
someone calls any of:
- HasEntry
- GetBuffer
- PutBuffer
- InvalidateCache

This patch removes the separate write thread, and instead dispatches a task to
the background task queue, indicating it can block. The task is started in
the same circumstances where we previously used to write (timer from the last
PutBuffer call, and shutdown if necessary).

To ensure it cannot be trying to use the data it writes out (mTable) from
the other thread while that data changes on the main thread, we use a mutex.
The task locks the mutex before starting, and unlocks when finished.
Enumerating the cases that we used to block on joining the thread:

In terms of application shutdown, we expect the background task queue to
either finish the write task, or fail to run it if it hasn't started it yet.

The task keeps a reference to the startupcache object, so it cannot be
destroyed while the task is pending.

Because the write does not modify mTable, and neither does HasEntry,
we do not need to do anything there.

In the GetBuffer case, we do not modify the table unless we have to read
the entry off disk (memmapped into mCacheData). This can only happen if
mCacheData.initialized() returns true, and we specifically call
mCacheData.reset() before firing off the write task to avoid this.
mCacheData is only re-initialized if someone calls LoadArchive(),
which can only happen from Init() (which is guaranteed not to run
again because this is a singleton), or InvalidateCache(), where we lock
the mutex (see below). So this is safe - but we assert on the lock to try
and avoid people breaking this chain of assumptions in the future.

When PutBuffer is called, we try to lock the mutex - but if locking fails
(ie the background thread is writing), we simply fail to store the entry
in the startupcache. In practice, this should be rare - it'd happen if
new calls to PutBuffer happen while writing during shutdown (when really,
we don't care) or when it's been 60 seconds since the last PutBuffer so
we started writing the startupcache.

When InvalidateCache is called, we lock the mutex - we shouldn't try to
write while invalidating, or invalidate while writing. This may be slow,
but in practice nothing should call InvalidateCache except developer
restarts or the -purgecaches commandline flag, so it shouldn't
matter a great deal.

Phabricator Automation

Updated

•

5 years ago

Attachment #9139537 - Attachment description: Bug 1614795 - use the background task queue for startupcache writes, r?dthayer!,froydnj! → Bug 1614795 - use the background task queue for startupcache writes, r?dthayer!,decoder!

Pulsebot

Comment 9

•

5 years ago

Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/4f66650c338d use the background task queue for startupcache writes, r=dthayer,decoder

Noemi Erli[:noemi_erli]

Comment 10

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4f66650c338d

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox77: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla77

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

Regressions: 1647378

Jens Stutte [:jstutte]

Updated

•

4 years ago

Regressions: 1701178

Bugzilla

Migrate the StartupCache write thread to NS_DispatchBackgroundTask

Categories

(Core :: XPCOM, enhancement)

Tracking

()

People

(Reporter: alexical, Assigned: Gijs)

References

Details

(Whiteboard: [fxperf:p2])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Comment 9

Comment 10

Updated

Updated

Attachment

General

Description

File Name

Content Type