Investigate an alternative way to chunk async statements results notifications

NEW
Unassigned

Status

()

enhancement
P2
normal
2 years ago
Last year

People

(Reporter: mak, Unassigned)

Tracking

(Blocks 1 bug)

55 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fxsearch])

Reporter

Description

2 years ago
Some relevant previous comments:
https://bugzilla.mozilla.org/show_bug.cgi?id=1320301#c18
https://bugzilla.mozilla.org/show_bug.cgi?id=1320301#c23

We currently throttle results, when an async statement is executed and a row is returned, we add it to an array of results, and notify once one of these conditions is met:
n_results > MAX_ROWS_PER_RESULT || time_from_last_notification = MAX_MILLISECONDS_BETWEEN_RESULTS

The reason is mostly to avoid flooding the main-thread with events that may in the end just cause jank, something the async API was created to avoid.

The problems with the current approach are:
1. many queries may return far less than 15 results, but we don't have a good measurement to make better calls. In any case, this means that any query taking more then 75ms to completion won't notify any result before 75ms.
2. The timeout is actually just a delay and results of a query are not supposed to always arrive evenly spaced. Supposing a query returns 1 result after 5ms and one after 1s, both results will be notified after 1s, since it's just when examining the second result we check the delay. This is particularly bad for linear scans, recursive queries and queries that Sqlite can parallelize (UNION ALL, multi-threaded-sorting...)

Andrew suggested looking into ThrottledEventQueue to avoid the main-thread flooding, though it seems to be only partially applicable to our case.
We need a notifications thread where we can enqueue results, and that thread should have an actual watchdog to notify ONCE at given time/count thresholds (likely less than 75ms). We basically need something like a ThrottledEventQueue that can merge notifications, we could potentially have a single notifications thread for the whole Storage service.

Whether we can reuse ThrottledEventQueue or another approach, is what this bug is up to.

I actually think we could gain some interesting responsiveness benefits out of this, for most mozStorage consumers, though we must always take into account the main-thread flooding problem.
I don't think ThrottledEventQueue really helps here.  You probably need to build something specific to storage.
ThrottledEventQueue was an attractive solution to band-aid back-pressure onto the existing implementation without having to worry about doing clever synchronization things, but I wasn't thinking about how our existing batching could indeed use an overhaul.  (In particular, it only took a very small amount of code and its single-dispatch was oriented around dealing with nested event loops, which was a nice freebie if Places could end up doing that.)

Given that we no longer need to worry about rogue extensions doing pathologically bad stuff and Quantum DOM has given us multiple prioritized event queues, I think main-thread flooding is a greatly reduced concern.  If Places is regularly issuing queries with result counts that approach thousands and there's some JS-digesting logic that we could move off the main thread, then I think we should pursue WebIDL bindings to let the logic run in a ChromeWorker.

I asked :bkelly if we could use the https://streams.spec.whatwg.org/ ReadableStream implementation we've now landed, but it sounds like the freebies I was hoping for would require us to implement stream support for transferables/etc. which aren't there yet.

I think my favored approach is to change the results notification process to be something like:
- We have a mutex-guarded: results collection, "event-in-flight" boolean, error-to-signal errorObj, and "need-to-signal-completion" boolean.
- Whenever we have something to report, we acquire the mutex and put our result in the collection and/or set the various flags.  Prior to releasing the mutex, if "event-in-flight" is not true, we set it, drop the mutex, and dispatch the runnable.
- When the notification event runnable runs, it acquires the mutex and pulls all the current results out, any error, and the need-to-signal completion boolean, then drops the mutex.  It does the callback for all results, it signals the error if it exists, and it signals completion if the boolean was set.

As long as the nsIEventTarget the results runnable event is dispatched at is sufficiently highly prioritized and not backlogged, I think this minimizes latency.  It's conceivable that we might need to enable awesomebar and other queries to specify that they should have their results go to a higher priority nsIEventTarget than the one, say, used for viewed-link-colorizing.  That seems doable.

If we find that the time-slices the runnable is consuming are too long, the runnable could limit how many results it reports at a time, self-reschedule itself before issuing the callbacks, and not clear the "event-in-flight" boolean until it has consumed the results array down to 0.
Reporter

Comment 3

2 years ago
(In reply to Andrew Sutherland [:asuth] from comment #2)
> If
> Places is regularly issuing queries with result counts that approach
> thousands and there's some JS-digesting logic that we could move off the
> main thread, then I think we should pursue WebIDL bindings to let the logic
> run in a ChromeWorker.

I'm actually looking for a shorter term solution that could be done in the 58 cycle. While on paper there are a lot of good ways to reimplement the Places views, in reality we are very much resource constrained.

> - Whenever we have something to report, we acquire the mutex and put our
> result in the collection and/or set the various flags.  Prior to releasing
> the mutex, if "event-in-flight" is not true, we set it, drop the mutex, and
> dispatch the runnable.

I see that with QF we are less concerned, but the existence of ThrottledEventQueue itself seems to point out we should not just drop the ball and assume things will be great if we notify at every result. I actually think the main-thread will still be quite busy for a while, and we're not yet at a point where we can ignore its issues.
If I correctly understand your suggestion, basically we keep adding results in the time that it takes for the runnable to start after we dispatched it. Do we have an idea of how long that time is?
And I assume that if we drop chunking we could indeed use ThrottledEventQueue to avoid flooding (that could indeed enlarge the above time).

My doubts are:
1. chunking will start to depend deeper on the thread scheduler, that means it will vary across OS and may create annoying-to-debug differences.
2. we may not chunk enough and still have too much runnables overhead

My idea involved a third thread, but now that I look at your suggestion to have mutex-protected-results, it could be done all on the main-thread.
The main-thread could issue a timed runnable to itself after executing an async query, and in that runnable basically do what you suggest, and eventually dispatch itself again on the timer if it didn't consume all the results, until query completion.
It's a little bit more complex, but would retain a controllable chunking behavior.

> It's conceivable that we might need to enable awesomebar and other
> queries to specify that they should have their results go to a higher
> priority nsIEventTarget than the one, say, used for viewed-link-colorizing. 
> That seems doable.

I'm not too much concerned about this. The location bar speed is not in an horrible position today, and if we reduce the 75ms and remove the "slower result dictate rules" thing, it will gain a lot. Surely there's space for more micro-optimizations, but I'm not willing to expand the reach of this for now.
Reporter

Comment 4

2 years ago
I said main-thread, but I should have said caller-thread.
(In reply to Marco Bonardo [::mak] from comment #3)
> I see that with QF we are less concerned, but the existence of
> ThrottledEventQueue itself seems to point out we should not just drop the
> ball and assume things will be great if we notify at every result.

ThrottledEventQueue is more about stopping ill-mannered content-controlled DOM Workers from overwhelming the main thread.

> If I correctly understand your suggestion, basically we keep adding results
> in the time that it takes for the runnable to start after we dispatched it.

Yes.  The crux of the idea is that we know that results from a query are likely to be bursty and that we expect there to be some non-zero latency between when we dispatch the event and when it actually gets to run on the main thread.

As you say later in your reply, nsITimer could be involved to help guarantee some minimum delay at the expense of slightly increased minimum latency.  I think the timer thread logic also recently got very smart about time-slices and not dominating the main thread forever, so we'd benefit from that (at the expense of latency) too.

> Do we have an idea of how long that time is?

No clue at this point.  But I think the argument can be made that as long as the main thread is not hopelessly overwhelmed, the feedback loop works out for everyone.

At the one extreme where the main-thread is completely free for immediate dispatch, we get one event per result.  This potentially incurs some overhead, but means that our time-slice is minimal and we're minimizing the latency for other main-thread events.  As the main thread gets busier, our batching increases, arguably improving our efficiency because we spend less time in the mutex and the system benefits from some degree of cache locality.

> 1. chunking will start to depend deeper on the thread scheduler, that means
> it will vary across OS and may create annoying-to-debug differences.

The observable differences are how the result rows are broken up over different runnables.  Given that many of our users are on spinning disks that may have serious fragmentation issues and/or high I/O contention that could easily result in delays >= 75ms between results, I don't think this is anything new.  Places' in-memory tables may see some differences, but I would expect Places to have experienced many scenarios where those have been paged to disk and subject to the same I/O delays as they are paged back in.

> 2. we may not chunk enough and still have too much runnables overhead
> 
> My idea involved a third thread, but now that I look at your suggestion to
> have mutex-protected-results, it could be done all on the main-thread.
> The main-thread could issue a timed runnable to itself after executing an
> async query, and in that runnable basically do what you suggest, and
> eventually dispatch itself again on the timer if it didn't consume all the
> results, until query completion.
> It's a little bit more complex, but would retain a controllable chunking
> behavior.

Yeah, there of course does come a point though where the main thread is too busy and the naive batches too big and if we're being a good citizen we would ideally start using timers as you propose to avoid dominating the main thread.  If we're concerned about the memory bloat of this backlog, back-pressure as a function of the number of queued results *at the connection level* is appropriate.

Along those lines, if you're worried about confusing debugging situations, I do think we want to ensure that separate statement executions appear to run 100% sequentially so you don't sometimes get results runnables from different statements on the same connection interleaving themselves because of the timers.  In that case, I think we'd want to move results notification coordination onto the connection and use a Monitor instead of a mutex so that we can signal the async execution logic when it can resume because the number of outstanding results has dipped low enough again.  (We'd still store results and state on the AsyncExecuteStatements, but the connection would have an array of the ones with pending results, and the connection would own the runnable which would process them in order.)

I think the pathological case for us would be a backlogged main thread, a fairly responsive async execution thread, and an onslaught of requests, like a fast-typing user indecisively typing and deleting awesomebar queries.  But as long as the awesomebar logic cancels these requests (and not just interrupts them!) and our cancellation logic aggressively clears out the results, that might be enough on its own without fancy back-pressure.
You need to log in before you can comment on or make changes to this bug.