Closed Bug 1121282 Opened 9 years ago Closed 2 years ago

Relax threadOpenedOn constraints to allow use of thread-pools or other shifting ownership of connections

Categories: Toolkit :: Storage, defect, P3
Platform: x86_64 Linux

Tracking

Status: RESOLVED FIXED
Target Milestone: 107 Branch
Tracking Status: firefox107 fixed

People: Reporter: asuth, Assigned: jjalkanen

References: Blocks 4 open bugs

Attachments: 1 file

mozStorage sorta has the rule that connections must be created and used on only one thread.  AFAICT this is not actually needed, and it's posing some problems for ServiceWorkers using a thread-pool.

==== Analysis

Connections save the thread they are opened on as threadOpenedOn.

== Assertions / Loud errors

We assertion check that we are on threadOpenedOn in:
- Connection::Clone() via MOZ_ASSERT
- Connection::setClosedState() via NS_ERROR

The assertion here effectively serves as a poor man's way to ensure that we're not racing another thread that's closing the connection.  However, we don't need the poor man's solution anymore since we have mAsyncExecutionThreadShuttingDown (now a misnomer) and mConnectionClosed that are protected by sharedAsyncExecutionMutex.  Clone could acquire the mutex, ensure we aren't closed/closing, and then perform the clone.
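
A rough sketch of that idea (illustrative only, not the actual patch; the error code and locking granularity are guesses):

  nsresult
  Connection::Clone(bool aReadOnly, mozIStorageConnection** _connection)
  {
    // Instead of asserting we are on threadOpenedOn, check the shared
    // closed/closing state under the mutex.
    {
      MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
      if (mAsyncExecutionThreadShuttingDown || mConnectionClosed) {
        return NS_ERROR_NOT_INITIALIZED;
      }
    }
    // ... perform the clone as today, with no threadOpenedOn assertion ...
    return NS_OK;
  }

In a real patch the check and the clone itself would need to be sequenced so a close can't slip in between, but the shape is the same.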

(We also assertion check that we're on the main thread for all of our explicit async API stuff.  We don't need to get into that for this bug.)

== Improper thread checks

- AsyncStatement::getAsyncStatement has a DEBUG check to make sure it's not being called on the opening thread.  I suspect this was a hack to avoid needing to retrieve the async thread or to avoid saving an extra reference.  (The mDBConnection is already directly accessible, whereas we try to be tricky about handing out that async thread.)  This check could likely be rewritten using a debug-only field or by doing the legwork to retrieve the thread (illustrative sketch below).
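
For illustration, the debug-only field version (the member name here is made up) would look roughly like:

  #ifdef DEBUG
    // Hypothetical member, recorded where the statement is created.
    PRThread* mDebugOpeningThread;
  #endif

  // ...and then in AsyncStatement::getAsyncStatement:
  #ifdef DEBUG
    MOZ_ASSERT(PR_GetCurrentThread() != mDebugOpeningThread,
               "getAsyncStatement should not run on the opening thread");
  #endif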

== Dispatching

We depend on threadOpenedOn to dispatch runnables somewhere appropriate, and these all use threadOpenedOn:
- AsyncStatement proxies the connection release
- AsyncStatementFinalizer::Run proxies the connection release
- LastDitchSqliteStatementFinalizer::Run proxies the connection release 
- AsyncInitializeClone::Run dispatches the callback (and threadOpenedOn must be the main thread based on the NS_IsMainThread check in AsyncClone)

- Soon, MinimizeMemory wants to use threadOpenedOn as part of bug 998330.  But really all it wants is a thread that isn't the main thread; using threadOpenedOn under the current threadOpenedOn invariant is a pretty good choice, since we are potentially forcing whatever thread we post the message to into joining on the I/O activity of that connection.  Note that MinimizeMemory is able to run as an atomic operation and its isClosed check is already thread-safe, so it doesn't exactly matter where this goes.

== Other alleged thread issues

- Bug 656675 "SQLite storage statements should assert that they're only ever used on one thread".  I've commented there, but I don't think that bug is a reason to limit the use of connections or statements across threads.

==== Solutions

== Simple

I think we can just:
- update the setClosedState and Clone stuff to be properly thread-safe and stop them freaking out about threadOpenedOn
- let the caller specify an nsIEventTarget to use for dispatch purposes that is initialized to threadOpenedOn if not specified.
- let the caller update the nsIEventTarget too, so that the value can switch to a specific thread when a Manager effectively binds a Connection to a specific thread and then back to the pool when it's unbound.  (This avoids wedging other threads for no reason; a rough sketch of this shape follows the list.)
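
A hypothetical shape for those last two bullets; none of these names exist in mozStorage, they are purely illustrative:

  class Connection {
    // (existing members threadOpenedOn and sharedAsyncExecutionMutex elided)

    // Defaults to threadOpenedOn; a consumer that binds the connection to a
    // thread pool (or to one specific pool thread) can swap this out.
    nsCOMPtr<nsIEventTarget> mDispatchTarget;

   public:
    void SetDispatchTarget(nsIEventTarget* aTarget)
    {
      MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
      if (aTarget) {
        mDispatchTarget = aTarget;
      } else {
        mDispatchTarget = threadOpenedOn;
      }
    }

    already_AddRefed<nsIEventTarget> DispatchTarget()
    {
      MutexAutoLock lockedScope(sharedAsyncExecutionMutex);
      nsCOMPtr<nsIEventTarget> target = mDispatchTarget;
      return target.forget();
    }
  };

Internal dispatch sites (the release proxies, AsyncInitializeClone, MinimizeMemory) would then go through DispatchTarget() instead of touching threadOpenedOn directly.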

The primary risk from this is that code starts dispatching things more complex/dangerous than releases or MinimizeMemory to a thread-pool and things get reordered in a bad way.  However, since threadOpenedOn is an internal thing, a comment and general competence should keep us safe!

== Bad ideas

- Auto-affinity: When using a thread pool, the connection could save what thread it was most recently used on and favor dispatching to that thread, failing over to the thread pool if no longer available.

For my purposes, creating the connection with a single nsIEventTarget and never changing it would be just dandy.  I would like to use nsStreamTransportService (also known as STS):

  https://dxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStreamTransportService.h

This avoids the need to manage a bunch of threads manually just to keep the storage connection open.

I'm just a little bit worried by thread-safety issues due to GC/CC (for example bug 1005991, which sooner or later we should fix), XPCOM, PR_LOG and all the other unsafe stuff around not using thread-safe refcounting. I'd be far more comfortable if we wrote a completely new thread-safe, synchronous, minimal C++ API that cannot run on the main thread :( But sure, that has a cost.

It's true the SQLite API doesn't have issues with API calls from different threads, but our boundaries have issues if we're not super cautious.
For example, bug 998330 (minimize memory) blocks bug 1101419, and that crash is very likely a thread-safety problem on our side, because SQLite should not have any issue with executing a query on any thread.

At a minimum, the change should be easy to back out, and we should refrain from making other changes to Storage until we are sure there's no huge crash spike and it will stick.

Service workers could just use SQLite directly, right?

(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #2)
> I'm just a little bit worried by thread-safety issues due to GC/CC (for
> example bug 1005991, which sooner or later we should fix), XPCOM

So, yes, everything XPCOM related is potentially a concern.  That specific bug is about us misusing XPCVariant.  For the motivating service worker cache use-case and other new WebAPI things where it's just C++ code talking to non-XPCOM JS Runtimes via WebIDL, we'd have to go out of our way to create screw-ups on this.

> PR_LOG and

I'm under the hazy impression that if PR_LOG had thread-safety issues, they were only a concern if you were dynamically messing with the module list?

> all the other unsafe stuff around not using thread-safe refcounting. I'd
> be far more comfortable if we wrote a completely new thread-safe,
> synchronous, minimal C++ API that cannot run on the main thread :( But
> sure, that has a cost.

hm, yeah, there are some ISUPPORTS declarations that are not THREADSAFE, which is concerning.  But I rather suspect we can just change those over.  I would expect it's just laziness/insufficient maintenance that led us there.  It's possible there was some intent related to concerns over XPCOM not really being threadsafe but it being supported.  I expect if the nsIVariant issues on bug 1005991 are addressed, we can mark everything threadsafe.

> It's true the SQLite API doesn't have issues with API calls from different
> threads, but our boundaries have issues if we're not super cautious.
> For example, bug 998330 (minimize memory) blocks bug 1101419, and that
> crash is very likely a thread-safety problem on our side, because SQLite
> should not have any issue with executing a query on any thread.

Yeah, that seems to almost certainly be minimizeMemory on the main thread racing IndexedDB servicing alarms DB requests.  It'd be great if we could get some PR_LOG off of that.

I do think we'd want bug 998330 fixed before doing this bug, especially because of that crasher.


(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #3)
> Service workers could just use SQLite directly, right?

Yes, but that potentially throws out the baby with the bathwater.  Besides the mozStorage C++ binding helpers being useful, there's stuff like TelemetryVFS, the about:memory memory reporters, the PR_LOGging, the lock notification stuff that got added for IndexedDB, etc.

At least, while doing some service worker cache review for :bkelly, I was surprised by how reasonable the use of mozStorage was.  mozStorage is still quite crufty on the inside; it's just not clear that fixing the known issues we have right now is any more work than writing some new code and then still having the mozStorage albatross.
Depends on: 998330
(In reply to Ben Kelly [:bkelly] from comment #1)
> For my purposes, creating the connection with a single nsIEventTarget and
> never changing it would be just dandy.  I would like to use
> nsStreamTransportService (also known as STS):

Looking at the stream transport service, for scheduling it simply hands things off to nsThreadPool so it's basically just a thread pool with up to 25 threads that everyone can share.  And nsThreadPool has no magic affinity logic.  If whoever's calling Dispatch on the runnables isn't ensuring only one mozStorageConnection-user is ever scheduled at a time, then there's the potential for a single consumer to block up the entire thread pool as they get distributed over all the idle threads and all but one block on the mutex.  (Or worse yet, they interleave while blocking each other if anything is multi-step and doesn't entirely check out the connection mutex.)

I think DOMCache and mozStorage's async execution use-case want the hoisted-from-media TaskQueue abstraction that runs on top of the also-hoisted SharedThreadPool:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TaskQueue.h
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/SharedThreadPool.h
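
For illustration, with present-day signatures (these have shifted over the years, so treat this as a sketch rather than the exact API), a per-connection TaskQueue over the shared pool looks roughly like:

  #include "mozilla/SharedThreadPool.h"
  #include "mozilla/TaskQueue.h"

  // Up to 4 threads shared by everyone who asks for the "StorageIO" pool;
  // the pool name and thread limit here are placeholders.
  nsCOMPtr<nsIEventTarget> pool =
      mozilla::SharedThreadPool::Get("StorageIO"_ns, 4);
  RefPtr<mozilla::TaskQueue> queue =
      mozilla::TaskQueue::Create(pool.forget(), "mozStorage Connection");

  // Tasks on |queue| run one at a time and in order, but on whichever pool
  // thread is free, so one consumer can't wedge the whole pool.
  queue->Dispatch(NS_NewRunnableFunction("Storage::DoWork", [] {
    // touch the connection here
  }));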

(NB: I recognize that your comments pre-date the hoisting of these things out of media and these may already be your plans going forward.  I came across a reference to blocked bug 1119864 while trying to grok cache to finish your review which sent me on this understanding sub-sub-sub-quest.  Also, it would be more appropriate to comment in that bug, but we do want to fix the threading model of mozStorage so I give :mak a free heads-up this way!)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> Looking at the stream transport service, for scheduling it simply hands
> things off to nsThreadPool so it's basically just a thread pool with up to
> 25 threads that everyone can share.

I remember a comment on dev-platform (IIRC) about the fact that STS is not a free-for-all; effectively, at a certain point people were asked to stop trying to put everything into the STS thread pool.

> I think DOMCache and mozStorage's async execution use-case want the
> hoisted-from-media TaskQueue abstraction that runs on top of the
> also-hoisted SharedThreadPool:
> https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TaskQueue.h
> https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/
> SharedThreadPool.h

TaskQueue looks very interesting: it preserves our need to execute stuff in order while still reducing the global number of threads we use.
Honestly, the number of threads was not a big deal until we started exposing mozStorage more and more to content, but now it may be a concern.
The downside is that it sounds like a huge task to move the whole of mozStorage to a TaskQueue, and I'm not sure where we'd find resources to do that. Unless we make it a GSoC or similar mentored project, and maybe try to write a new kind of connection (isolated code) that we can use to replace the existing ones (but again, we are already struggling to find resources to move the hybrid sync/async connections to purely async connections).
(In reply to Marco Bonardo [::mak] from comment #6)
> The downside is that it sounds like a huge task to move the whole of
> mozStorage to a TaskQueue, and I'm not sure where we'd find resources to do that.

I think you may be inferring something I didn't mean to imply, or there's an iceberg hiding below the waterline that I'm not aware of.  I'm talking about two main things:

A) Having getAsyncExecutionTarget() return an AbstractThread (backed by a TaskQueue under the hood) and have us use that instead of a full thread.
B) Support (sync) API consumers who use an AbstractThread since the TaskQueue abstraction seems useful and easier to reason about for all involved parties.

== A: Have getAsyncExecutionTarget() return AbstractThread ==

As you point out, this is not super important since mozStorage's async consumers are somewhat bounded in number; on a currentish Nightly I see mozStorage #1-#7 threads.  We'd probably want our pool to be able to grow that big anyway to ensure things don't step on each other's toes too much.

But complexity-wise, the conversion doesn't seem so bad.  Connection implements nsIInterfaceRequestor for nsIEventTarget, so we do need to potentially deal with the type cascade.  But Places' Database::DispatchToAsyncThread is hopefully the only consumer.  In bug 599970 where we added it we decided to keep it a secret footgun, and I doubt any code but Places is clever/nefarious enough to want to do that.  If there are other in-tree call-sites, we should be able to patch those fairly easily.  Out-of-tree call-sites are not a problem because binary extensions are forbidden and JS code cannot safely do anything with the nsIEventTarget they get out of the nsIInterfaceRequestor since XPConnect will explode on other threads.

== B: Support (sync) API consumers who use an AbstractThread ==

Also seems not that bad.  Although AbstractThread is its own non-XPCOM type, it is reference counted and it exposes a static CreateXPCOMThreadWrapper that we can use to normalize non-AbstractThread consumers into looking like them.
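
Illustrative only (the exact signature may have changed since these classes were hoisted out of media), wrapping an existing XPCOM thread would look roughly like:

  #include "mozilla/AbstractThread.h"

  nsCOMPtr<nsIThread> thread = do_GetCurrentThread();
  RefPtr<mozilla::AbstractThread> wrapped =
      mozilla::AbstractThread::CreateXPCOMThreadWrapper(
          thread, /* aRequireTailDispatch */ false);
  // |wrapped| can be handed to code that expects an AbstractThread, while
  // dispatches still land on the underlying nsIThread.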

I would expect us (in the guise of me or bkelly) to implement this one more promptly so DOMCache can reduce the number of threads it can spin up.

Aside: It would also be interesting if QuotaManager could help limit thread DoS attacks by forcing all threads for a given group into their own shared thread pool used by worker threads, indexeddb threads, dom/cache threads, etc.
Priority: -- → P3
Blocks: 1541548
Blocks: 1607603

To update this bug, we'd want to use nsISerialEventTarget instead of nsIEventTarget. And any caller code would want to be using NS_CreateBackgroundTaskQueue rather than using TaskQueue.
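
A minimal sketch of that shape, assuming the consumer just needs a serial target to hand to the connection; the queue name here is a placeholder:

  #include "nsThreadUtils.h"

  nsCOMPtr<nsISerialEventTarget> target;
  nsresult rv =
      NS_CreateBackgroundTaskQueue("StorageConnection", getter_AddRefs(target));
  if (NS_SUCCEEDED(rv)) {
    // Hand |target| to the connection so releases and async work go there
    // instead of to threadOpenedOn; tasks on a serial target never overlap.
    target->Dispatch(NS_NewRunnableFunction("Storage::DoWork", [] {
      // use the connection here
    }));
  }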

Blocks: OPFS
Assignee: nobody → jjalkanen
Pushed by jjalkanen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c2a1541ff12e
Allow use of thread pools for storage connections. r=asuth,dom-storage-reviewers,janv
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 107 Branch
See Also: → 1791767