general sluggish, lack of responsiveness during indexing of gloda

VERIFIED FIXED in Thunderbird 3.0b4

Status

defect
P3
major
VERIFIED FIXED
11 years ago
10 years ago

People

(Reporter: wsmwk, Assigned: rkent)

Tracking

({perf})

Trunk
Thunderbird 3.0b4
x86
Windows XP
Dependency tree / graph
Bug Flags:
blocking-thunderbird3 +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [no l10n impact])

Attachments

(2 attachments, 4 obsolete attachments)

compose sluggish during indexing of gloda, so it looks like additional throttling may be needed.

I was attempting, unsuccessfully, to recreate  Bug 465009 -  Long pauses (windows hourglass 100% cpu) switching folders while indexing gloda

Steps:
1. set compose to spell check as you type
2. start indexing
3. open compose window
4. type

results: compose process is sluggish, screen doesn't keep up with typing.  CPU usage while typing and doing other UI work in Thunderbird, 8-20%.  CPU usage while UI is idle, 20-70%.

IBM T40, XP, 1.5GHz, 512MB, plugged in.  Results may be worse on battery power.  Before indexing, total .msf files after compact ~200MB
Isn't gloda supposed to index only while idling, and throttle itself appropriately?

For me it certainly seemed to do the indexing while I was doing other things (in and outside thunderbird), which made the whole system unusable with tb eating 20-50% CPU and a lot of memory until (I assume) the indexing was done.
Priority: -- → P3
Yes it is. Andrew and I had discussed this on IRC prior to me filing the bug.  He understands the operating parameters, but I don't.

But my system was completely usable, except compose was sluggish, b/c my CPU usage wasn't as high as you are seeing.  Did you have spell check as you type enabled? You are linux?

The memory issue is in another bug.
Gloda indexes all the time.  It indexes more heavily when idle.  If gloda didn't index when you were using the system, gloda-based views would be entirely useless.

If you have gloda's dump logging enabled, you may find that your terminal or syslog daemon is the one eating the CPU.  At least, that is how it is on my machine; with gnome-terminal displaying the debug output, processor utilization is very high, but when I switch to another tab it plummets.

Currently, the indexing rate is controlled by constants in indexer.js, namely: _indexInterval (and variants) and _indexTokens (and variants).  The constants may benefit from further tweaking.  Ideally we would move to some form of system that dynamically adapts based on the performance of the machine and other active processes.  Unfortunately, unless someone wants to help, that is probably aways away.
Ok, thx for the info. Yes, I'm on linux - spell check as you type enabled.
I didn't have a terminal for it at the time. OTOH, maybe it wasn't gloda indexing causing it after all...
I need to verify whether this is gone on the t40 to see if fixed by bug 466438. 
Perhaps again after landing of  Bug 470329 -  gloda indexer throttling logic needs to be adaptive

xref  Bug 465009  Long pauses (windows hourglass 100% cpu) switching folders while indexing
Whiteboard: wsm to test
P4 3.2GHz, desktop, windows XP pro, initial indexing in progress. takes 3-4 sec to load message. Same amount of time from clicking reply to usable compose window. Total 6-8 sec.

clicking folders also slow - perhaps not as bad in days of but 479008 (which may yet be reopened or new bug filed), but still not good enough for users not to be frustrated.

cpu graph seems to indicate indexing idle (or something else) doesn't happen.

Haven't tested yet on T40 - task for this weekend.

from IRC
<asuth>	wsmwk: gloda has limited granularity. it can't control how much cpu that message streaming uses, for example. well, not easily. so in your case it may end up a little imbalanced. that might explain why you see 0% then 80%
wsmwk: gloda is backing off extremely because the message streaming hits you so hard, so when it is doing things it controls, it does almost no work
(in order to control the message streaming cpu hit gloda would need to control the rate at which the message data is delivered to libmime from the offline store)

I'll look into profiling - mighit not happen soon
Severity: normal → major
Whiteboard: wsm to test
I've been looking at this, so let me make some comments. I'm commenting generally on gloda performance management, not just on issues with compose.

Gloda performance management logic clearly performs badly when there is a large external load on the system. For example, if I start a video playback that takes >60% of my system cpu, gloda will do everything in its power to try to increase its share of the system, trying to achieve its target of 40% of cpu (when not idling) and 80% (when idling). That means it will continually increase its parameters that determing gloda load. Also, once the performance parameters get out of whack, it is very slow to restore them to sane values.

I'l like to propose an alternate approach. First, let gloda continually estimate the d(cpu time)/d(token processed) so that it can make intelligent guesses of appropriate tokens per batch, given the actual system performance. Second, try to detect non-gloda demand on the cpu, and back off instantly if a load is detected. I hope that with these changes, the effect of gloda on the perceived system performance will be negligible. And I think that is important, as otherwise initial impressions of most people will be that Thunderbird is a very sluggish program. For that reason I think this should block TB3. But I won't request that until I have a specific improvement patch to propose, as I know that TB3 will ship with sluggish startup indexing performance if no specific alternative is available.
Status: NEW → ASSIGNED
Component: Backend → Database
QA Contact: backend → database
This is just an experiment, but shows the direction that I am headed in case anyone is interested.

The approach is this:

1) We adjust the tokens per batch to try to achieve a target time that the indexing batch takes. That time is short when active (48 ms) and longer during idle (240 ms).

2) We measure the unexpected delay in returning from the pause, and the cpu time during the pause. If either of these show that the mail process is loaded, we do another pause (up to a limit).

The net effect of all of this is that we can respond instantly to cpu demands from the process, typically because of UI by the user. With this patch running, I see no perceptible UI impact from running gloda indexing.
I'm going to go ahead and request blocking on this, and target beta3 just to ask the question. The approach that I am taking in the patch I just added, though quite different from existing code, is fairly simple and easily understood. I think there is a reasonable possibility that it could be landed in a couple of days. But I'll need others to give input on how serious they think this problem is, and how risky is my proposed solution.

Just for the record, this is not at all what I proposed in comment 7.
Flags: blocking-thunderbird3?
Target Milestone: --- → Thunderbird 3.0b3
Thanks Kent. Short, short term I think this needs to hit trunk post haste or provide a tryserver build for quick testing.  And not surprisingly I am in favor of blocking b3 because I think we want the mass of testers to use Gloda search in b3, and not wait to b4 to get big usage/big testing.

Bear in mind our public's machines are far more varied than ours (eg less powerful) and they are not so forgiving. So many won't accept the current performance hit even if what they get in return (Gloda) is great. Users universally hate reduced performance.
This wants to block Tb3, but probably not b3, since we're not targetting GloDa to be turned on by default for b3.  That said, we'd at least consider approving this patch after the slushy freeze, because it would make the experience of folks who want to help testing GloDa better.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
Duplicate of this bug: 487501
Assignee: nobody → kent
follow up to comment 6. I blew away the index to start from scratch and it's still indexing after 1hr (it's now doing local folders). But so far on the P4 with kent's patch performance is much better.  There is a barely perceptible delay in most operations (UI via keyboard, clicks, etc).  Worst is a) when typing fast in compose and b) scrolling folder pane - both have very obvious delay for me (note: I also have issues with folder scroll without Gloda).  

But so far nothing annoying.  Performance when switching folders and selecting folders is excellent in a non-memory constrained environment.  And navigating UI is great.
Summary: compose sluggish during indexing of gloda → general sluggish, lack of responsiveness during indexing of gloda
With my patch, there is a noticable delay when I first select an operation, such as select the scrollbar on the folderpane, similar to reported by Wayne.

The allowed time for gloda operations is 48 milliseconds in the patch while active. I'm not sure that we need that much - and in any case that should probably be a preference, so that we can tell people who have issues to turn it down. Gloda only took some hours to do the initial indexing of my years of emails, even on my slow tablet, and after initial indexing only runs occasionally. So I'm feeling that turning that down to 32 ms or even 16 ms per gloda batch might work. During idle, the gloda time slices are 240 milliseconds, and that does not seem to be a problem AFAICT.

The other improvement, where I detect cpu usage during pauses and repeat the pause if TB is busy, seems to work pretty well, and is particularly active during startup when there is a lot of other activity going on. (If you are running a console, with the patch you see a "!" whenever a pause is repeated).

I need to fix the time per token calculation to use the proper count of consumed tokens, now that I understand better how that part of the code works.
I wouldn't expect 48ms to actually be perceptible.  My understanding is that even 100-150ms is perceived by the human brain as causal/instant w.r.t. an action.  Which would make me want to look for other possible code paths that are absorbing more time than that...
Here's another issue. The idle service uses fairly slow polling for its observer callbacks, up to 5 seconds, so you cannot rely on it to get you out of idle mode in milliseconds, which is what we are trying to do here. But you can do the polling yourself and get fast response. So I think that the slowness that I sometimes see may be a slowness to come out of idle, which I believe that I can solve. That would tend to agree with comment 15, that 48 ms should be seen as instantaneous, while 240 ms is not.
I'm also seeing a glitch in the UI when the database commits (and possibly when garbage collection occurs.) I can use direct detection of idle condition to delay those, if possible, until an idle time. I'm picturing a minimum and maximum rate of commit and GC, with trying to delay the commit and GC up to the maximum time when active, to try to avoid the UI glitch. Even waiting for 1-2 seconds of non activity would help a lot.
I don't think we need to do as much of a demarcation between idle/non-idle processing with your revised algorithm.  It made sense for the steady-state algorithm, but since your algorithm is more adaptive you could potentially abandon it entirely.  Under idle I think our goals are to 1) maintain efficiency by keeping the cache warm-ish and 2) avoid throwing away cycles by sampling too much in regards to load.  Power efficiency should also be a concern; we want to facilitate the processor being able to go into deep wait-states.

The main suggestion I would make then is to drop from 240 ms down to perhaps just always 48ms but perhaps not doing load sampling every cycle.


The database commit glitches could be one of two things:

1) System-wide fsync-related pauses.  The actual commit happens on a background thread and we should not be doing anything synchronous with the database from the main thread.

2) Because we do not want to set properties on message headers until the commits take place, once the commit happens, we do touch the mork file.  Maybe we are triggering a mork commit there?

The current heuristic on when to commit is not tuned *at all*.  I like your idea of deferring commit/GC until a period when the user is unlikely to notice.  We should be greatly reducing the GC overhead once we adapt gloda to use the enumeration filters bienvenu implemented.
Mork commits are really really cheap, unless we do a compress commit, which is highly unlikely to be caused by gloda.
Here's my latest, with fixes for some of the issues in my previous comments. Still WIP.

I probably agree with asuth's "I don't think we need to do as much of a demarcation between idle/non-idle processing with your revised algorithm" but this patch leaves the distinction.

I'm not sure of his concern though for doing excessive performance monitoring. I am assumming that this is a fairly low overhead operation, just reading some counters maintained by the OS. The main feature of my algorithm is rapid detection of user interaction, so I really think those checks need to be done at least every 30 - 40 milliseconds.
Attachment #387031 - Attachment is obsolete: true
for grins I used tryserver build during vista sp2 download. Thunderbird performance with gloda on (initial indexing in progress) is not worse than I would expect during vista update, which is good.
Whiteboard: [has patch]
Note to self: At one point while running my current patch, cpu time per token became NaN. I need to protect against that, as NaN is viral, and does not recover from the damping mechanism.
back on the same system 2 days later as comment 21 with tryserver build still - I haven't touched the dell d531 since then - and thunderbird is in some kinda funk. 

Windows and it's apps seemed OK, but when I brought thunderbird to the foreground TB was very sluggish to the point of being mostly unresponsive. Activity manager shows stuff like indexing 2 messages taking 31 seconds. Thunderbird is mostly at high CPU. I don't think I have any addons active that are whittlers.  Had to kill it (couldn't shut it down).

Addressbooks Synchronizer 0.9.4
Bugmail 1.5 [DISABLED]
Bugzilla Link Grabber 1.0
Clippings 3.0.1 [DISABLED]
Console² 0.5
Crash Me Now! Advanced 0.2
Crash Report Helper 1.1
DOM Inspector 1.8.1.2 [DISABLED]
Experimental Toolbar 1.0 [DISABLED]
Gloda Plugin: Bugzilla 0.1 [DISABLED]
InspectorWidget 2.11.20080214 [DISABLED]
Lightning 0.8 [DISABLED]
Nightly Tester Tools 2.0.2
Nightly Tester Tools Lite 0.2.2 [DISABLED]
Provider for Google Calendar 0.5pre [DISABLED]
Restart Thunderbird 0.9.1
STEEL test extension 0.0.1
ThreadBubble 0.8 [DISABLED]
ViewAbout 1.4
Lose the STEEL extension.  It's unlikely to be causing anything, but there's no real point to it anymore.

A very likely explanation is that you are experience the garbage collection cost of the message enumeration.  I doubt that has to do with rkent's algorithm.
Posted patch What I have been running (obsolete) — Splinter Review
I have been running this patch for a few weeks now. It is designed to spew info to the console, which helps me see what is going on. That would be removed of course in an official patch.

Overall, I am happy with this approach. The only complaint that I have about this method at the moment is that the sync actions take much less time than other actions, and tend to occur all at once, so it drives the estimate of time per token down too low temporarily. But in practice I have not noticed this affecting my ui.

Asuth, where do you want me to go with this? Are you happy with this alternate approach? I see that it has two big advantages. First, it is robust against other CPU-intensive applications running at the same time (like a video display). Second, it responds quickly to UI operations that need CPU.
Attachment #387404 - Attachment is obsolete: true
(In reply to comment #24)
> Lose the STEEL extension.  

required for bugzilla link grabber ... which isn't working for me for some time now :( don't know why
(In reply to comment #25)
> Asuth, where do you want me to go with this? Are you happy with this alternate
> approach? I see that it has two big advantages. First, it is robust against
> other CPU-intensive applications running at the same time (like a video
> display). Second, it responds quickly to UI operations that need CPU.

I am very happy with this approach.  I would like to get this on the trunk as soon as possible.  I am busy today and tomorrow but will have time this weekend to review and test.

I would like the gloda unit test (test_index_adaptive.js) to be updated to cover the decision points for the new approach.  If you have the time to do this, that would be ideal.  If not, I can do so, but it would help me a lot if you could describe your vision for testing the new approach (and what changes might be required to the implementation to make this easy).
(In reply to comment #26)
> (In reply to comment #24)
> > Lose the STEEL extension.  
> 
> required for bugzilla link grabber ... which isn't working for me for some time
> now :( don't know why

We landed the FUEL part of STEEL.  It is probably winning over the STEEL extension when it comes to exposing the XPCOM component.

In any event, the STEEL extension has bad performance characteristics when there are a lot of message events happening.  I would not use it in any profile where you are going to consider reporting performance problems.
Posted patch Cleaned up for review (obsolete) — Splinter Review
I cleaned this up a little for review, but there were no substantial changes to the code. It would definitely be worth someone else tweaking it a little.

The updates to the report of number of messages indexing occurs much less often in this version (once about every 5 seconds), which may not be desireable.

It passes gloda unit tests, except the one dealing with the adaptive performance, as expected. Because I am relatively unfamiliar with gloda code, and this is a relatively large patch, it would be great if asuth could write/fix unit tests rather than me.
Attachment #394328 - Attachment is obsolete: true
Attachment #394626 - Flags: review?
can someone throw up a tryserver build?
Attachment #394626 - Flags: review? → review?(bugmail)
The performance on this is fantastic.  I am getting excellent responsiveness while maintaining an indexing rate that is 50% of the highly synthetic xpcshell-only benchmark I have.

Great work, rkent!

I will kick off some try server builds right now...
Whiteboard: [has patch] → [has patch][no l10n impact]
update ... after some initial indexing activity, it's pretty much stalled. same for rkent.  new messages since last previously disabling gloda aren't indexing
Two things are happening:

The state of things is that bug 511609 (without any gloda compensation) is causing indexing to abandon folders before they are fully indexed.  Autosync, OS search indexing (vista/spotlight), as well as user navigation through folders can all cause gloda to abort.

The calls to _beginTransaction/_commitTransaction are not consistently paired since the latter call is conditional in workBatch (with this patch) while the former is always made.  This will tend to happen once the indexing process has moved into event-driven indexing mode because only then will the batch be short enough that doCommit will be false.

Once the transaction becomes unbalanced, a database commit is not made until you quit Thunderbird (when we force a commit regardless of transaction depth).  It looks like indexing is broken because we do not mark messages as clean until after the commit occurs.  (In this manner we avoid inconsistency although we might do extra work if thunderbird crashes in the time period after the commit ins concluded and before we modify the headers and commit the mork database to disk.)  You will likely notice that the sqlite journal file for the gloda database starts getting really large (and memory may appear to be leaking) once this happens.

The gloda indexer should likely just maintain its own boolean that indicates whether it has its own transaction active and use that to decide whether to start a new one.

One issue I'll float about deferring transaction is that it's possible we can leave a transaction outstanding for longer than the max duration limit if the indexer runs out of work and doesn't get any new work.  The simplest solution to this would be to keep us subscribed to the idle service and use its idle notification to check whether we have a pending transaction and use that as an opportunity to commit and GC.

That doesn't cover the case where we are already idle but the commit duration has not elapsed.  For that case I propose we just commit.  So if we are already idle and there is no more work, we commit regardless of the timer.

I need to look into the unit tests, but I think I was getting explosions because the idle service doesn't exist (for me) under xpcshell (I think).

rkent, if you can address the commit issues, that would be great.
I will address the commit issues.
I fixed the issue of mismatched begin/commit transactions.

In this patch, I added some additional notification calls, because I did not seem to be getting enough updates of the status line. Not sure if that is correct, though. I also notice in my debug builds that indexing is not waiting 10 seconds to start like it is supposed to. I don't believe there is anything that I did that could cause that, however.

There are still some cases where gloda indexing slows things down, but they do not seem to be occurring in the routine that is the focus of this patch, they seem to be in whatever activity this patch kicks off. But I'm not sure it is a serious issue.
Attachment #394626 - Attachment is obsolete: true
Attachment #395852 - Flags: review?
Attachment #394626 - Flags: review?(bugmail)
Attachment #395852 - Flags: review? → review?(bugmail)
This patch makes a few changes on top of rkent's fix, the key differences are:

- The reason for the bienvenu sr request... we had one bug and one annoyance on linux in our version of nsStopwatch.  The bug was that the errno check was incredibly likely to screw up and set our timebase off by a factor of 10,000.  The annoyance was that times() has a granularity of 10ms on my linux box, at least.  Since our desired interval is 32ms when it matters, which is a lot of precision we are losing.  So we use gettimeofday for the wall-clock instead of times().  In theory this could lead to inconsistency between the clock and wall time, but since the Gloda indexer is the only user of the code and we no longer use it in that way, I'm okay with it.

(Note: this led to the implementation seeming much more aggressive than it was intended to be (for me), since it always chose a batch size of 100 tokens.)

- Make xpcshell tests work by getting a reference to nsIIdleService during indexer initialization and keeping it so that glodaTestHelper can replace it.  This is required because GetIdleTime returns an error when there is no display attached, as in the xpcshell case.

- I NOP'ed test_index_adaptive since it is no longer applicable.  I rationalize not having a replacement test immediately available because rkent's new algorithm has fewer decision points and is generally much less likely to go off into the weeds.  I do think we can adapt the tests, however, so am leaving the file around with its contents rather than nuking it.

- At least on linux, every call to GetIdleTime involves an X request.  It looks like the idle check was accidentally inside the loop anyways, so I moved it out a level.

- I changed the GC-forcing to just force a GC after a commit.  The counter was basically dead-weight.

- I removed the added notifyListeners calls.  They were breaking the unit test framework which did not expect to get multiple notifications about indexing ceasing.  Instead I added a decimated (signal processing usage of the term) call inside the steady-state of the loop.

- I changed the pdump calls to use a temporary string rather than dump calls to reduce the potential performance impact.  We should, of course, remove the pdump and string-building entirely before committing.

- I reduced the _INDEX_INTERVAL constant, which is the thing I am most concerned about.  The idea as I understand it behind the new implementation is to get out of the way when there is other stuff going on, and a hard-coded constant does not seem entirely in keeping with that.

The duty cycle with the previous sleep value of 48ms limited us to 40% CPU consumption with a non-idle user, which is in keeping with the current implementation, but seems conservative.  If the user has the spare cycles and it won't hurt, why not let it be much more aggressive?  Changing the constant to 16 lets us get up to 67%.  My concern is that the pause state where we sense other activity and move out of the way should perhaps be altering the sleep value rather than repeatedly sleeping... trying to force ourselves active every 16ms is pretty aggressive and somewhat defeats the purpose.  (48 was much more reasonable for that case)  Not to mention the limit of 10 times at 16ms does not add up to be that much.

A simple fix would be to boost the sleep duration the first time our 16ms sleep fails to be sufficient.  So we go to 48, but then restore it to 16 as soon as we exit the sleep loop.  I suspect exponential back-off would mislead us as the OS is likely to honor a longer sleep request even if there is something else that wants the CPU badly.


rkent, if you could take a look at things and try and run with it, that would be great.  Feel free to fold the changes in to your patch and then make further modifications, especially if you have any ideas on the _INDEX_INTERVAL case.

I'm happy to sign-off on this once we get the _INDEX_INTERVAL kink worked out/consensus reached.
Attachment #396411 - Flags: superreview?(bienvenu)
Attachment #396411 - Flags: review?(kent)
In the (In reply to comment #38)
> times().  In theory this could lead to inconsistency between the clock and wall
> time, but since the Gloda indexer is the only user of the code and we no 

Whoops, I meant to say between wall ("real") time and CPU Time...
Attachment #396411 - Flags: superreview?(bienvenu) → superreview+
Comment on attachment 396411 [details] [diff] [review]
v1 some mods on top of rkent's patch...

sr=me for the unix nsStopWatch.cpp changes.
I need to run this for a day or two to really evaluate, but let me make some initial comments.

>    */
>-  _INDEX_INTERVAL: 48,
>+  _INDEX_INTERVAL: 16,

This is only one tick of the Windows CPU measurement clock. You can consider a single tick as just noise, so you really need to see 2 cpu ticks, or 32 milliseconds, to be sure that there was activity. Depending on the specifics of the implementation of the timer callbacks in js, you may never see that. I really think this is a risky move. And why? Rate of indexing is not a significant issue on even my slow system, while UI responsiveness is. On a faster system, rate of indexing should be even less of an issue.

>-        // We want to stop ASAP when leaving idle, so we can't rely on the
>-        // standard polled callback. We do the polling ourselves.
>-        let idleService = Cc["@mozilla.org/widget/idleservice;1"].
>-                            getService(Ci.nsIIdleService);
>-        if (idleService.idleTime < this._INDEX_IDLE_ADJUSTMENT_TIME) {
>-          inIdle = false;
>-          this._cpuTargetIndexTime = this._CPU_TARGET_INDEX_TIME_ACTIVE;
>-        }
>-        else {
>-          inIdle = true;
>-          this._cpuTargetIndexTime = this._CPU_TARGET_INDEX_TIME_IDLE;
>-        }
>       }
>       this._perfIndexStopwatch.stop();
>-      this._pdump(",")
>+      worklog += ",";
>+
>+      // We want to stop ASAP when leaving idle, so we can't rely on the
>+      // standard polled callback. We do the polling ourselves.
>+      if (this._idleService.idleTime < this._INDEX_IDLE_ADJUSTMENT_TIME) {
>+        inIdle = false;
>+        this._cpuTargetIndexTime = this._CPU_TARGET_INDEX_TIME_ACTIVE;
>+      }
>+      else {
>+        inIdle = true;
>+        this._cpuTargetIndexTime = this._CPU_TARGET_INDEX_TIME_IDLE;
>+      }

By moving this idle check to the end of the batch loop, you are slowing down the speed at which it responds to leaving idle. I assume you are doing this for performance reasons. I have assumed that the idle check is very cheap, just reading a performance counter somewhere. Do you have any evidence that it is not cheap? If cheap, better to leave it where it was.

>+    const HEADER_CHECK_BLOCK_SIZE = 25;

This is good, the sync operations were much faster than the others.
(In reply to comment #41)
> This is only one tick of the Windows CPU measurement clock. You can consider a
> single tick as just noise, so you really need to see 2 cpu ticks, or 32
> milliseconds, to be sure that there was activity. Depending on the specifics of
> the implementation of the timer callbacks in js, you may never see that. I
> really think this is a risky move. And why? Rate of indexing is not a
> significant issue on even my slow system, while UI responsiveness is. On a
> faster system, rate of indexing should be even less of an issue.

I am likely guilty of optimizing for my use-case.  My development system is a quad-core system and I blow away my database a lot.  (Note: I do have the slowest of the quad-cores though; Q6600.)  At the same time, I think we do want to perform our initial indexing as quickly as possible so that the global search is usable as quickly as possible.

I definitely agree that my system is not representative and that our primary concern when active is to not interfere with the use.  I defer to your testing/experimentation/investigation on this.  Certainly a sleep of 48ms is no worse than what we have now, and even 32ms would be better.

Perhaps a better long term strategy is simply to write platform-specific code for our primary platforms to have a better picture of the system load and available cores.
 
> By moving this idle check to the end of the batch loop, you are slowing down
> the speed at which it responds to leaving idle. I assume you are doing this for

I'm not sure I follow.  The check sets two variables, "inIdle" and "this._cpuTargetIndexTime", neither of which is accessed in/by the loop.  There is no break statement or anything that would alter the control flow.  Only the last execution in the loop has any impact anyways.

If the intent was that there would be a change in control flow, I think that the check should 1) be only made when we believe we are currently in idle (no need to make the active case any more expensive), 2) be decimated so that we still only issue that check every (estimates) 80 ms.  For example, every _indexTokens / 2.  This is given my next point...

> performance reasons. I have assumed that the idle check is very cheap, just
> reading a performance counter somewhere. Do you have any evidence that it is
> not cheap? If cheap, better to leave it where it was.

http://mxr.mozilla.org/mozilla-central/source/widget/src/gtk2/nsIdleServiceGTK.cpp#124

http://mxr.mozilla.org/mozilla-central/source/widget/src/qt/nsIdleServiceQt.cpp#105

http://cgit.freedesktop.org/xorg/lib/libXScrnSaver/tree/src/XScrnSaver.c#n208

http://linux.die.net/man/3/xscreensaverqueryinfo

The linux code is issuing multiple calls to the X screensaver extension.  Based on the source code, this appears synchronous and not fast-pathed via shared memory.


> >+    const HEADER_CHECK_BLOCK_SIZE = 25;
> 
> This is good, the sync operations were much faster than the others.

Yeah, I did this in response to your comment and my observation of similar phenomenon with your nice debugging output enabled.
Comment on attachment 395852 [details] [diff] [review]
Fixed commit issues.

er, so this is r=asuth with my changes but _INDEX_INTERVAL set (back) to whatever rkent wants and the debug string building and print-out removed.

I can land it, but need approval of my changes and to know what to set _INDEX_INTERVAL to.

I think the migration of the idle check out of the loop is still under debate, too.
Attachment #395852 - Flags: review?(bugmail) → review+
(In reply to comment #43)

I'm starting to run this now, and am running into my usual sets of problems that may or may not be related to these changes. But I still need to investigate. I want to test the _INDEX_INTERVAL value with the lower value as you suggested, though even if it works I will still have my doubts.

> 
> I think the migration of the idle check out of the loop is still under debate,
> too.

You are correct that the code as written did not really do what I intended. I think something got lost in one of my last interations. The goal was to exit the batch immediately when we leave idle to recalculate the correct index count. I think that should be fixed, and the loop left where it was. If you do, then there is very little risk to setting the amount of time that it executes in idle to a fairly high value, as it will then leave idle immediately if the user touches their system.
(In reply to comment #44)
> (In reply to comment #43)
> > I think the migration of the idle check out of the loop is still under debate,
> > too.
> 
> You are correct that the code as written did not really do what I intended. I
> think something got lost in one of my last interations. The goal was to exit
> the batch immediately when we leave idle to recalculate the correct index
> count. I think that should be fixed, and the loop left where it was. If you do,
> then there is very little risk to setting the amount of time that it executes
> in idle to a fairly high value, as it will then leave idle immediately if the
> user touches their system.

I like the idea... my concern mainly stems from the inconsistency in the cost of each token.  Because of our use of generators and love of extensions, anything in the worker 'process' that wants to call something that could potentially be asynchronous or long-running needs to go through our async driver mechanism rather than a normal function call.

For example, a plugin that only did things when the e-mail sender is facebook/twitter/etc. will do almost no work when the e-mail is from someone else.  Yet it has to pass through that call mechanism, and would incur both the XPConnect call cost as well as the X synchronous call costs (involving context-switches at least on a single-core box).

Mitigating strategies that occur to me are to 1) add new yield value variants to convey that a call was cheap and/or 2) use Date.now() to make sure that some actual number of milliseconds have passed.  Option 1 is not tremendously elegant, and option 2 has its own costs, although they should be relatively cheap given that the JS engine should do a fast native call and the syscall should not be egregious.  I would suspect option 1 would be preferable for the platforms where idle checks are basically free.
Although what you say is true, I don't really understand its bearing on the issue of leaving idle. If we are doing a long idle batch, and we leave idle, then stop processing more tokens, pause, recalculate the tokens per batch, and continue. That has nothing to do with the inconsistency of the cost of each token.

The inconsistency is a concern to the whole algorithm, though. What saves us is 1) we average over about 200 tokens to calculate the cost, and 2) if we get a long token, we immediately slow down.

As I've watched the output, any UI glitches that I see are probably associated with the processes that are going on in the tokens themselves. Sometimes I'll see a token that lasts a second or so, for example (which your change to the debug output display masks, btw). So generally I believe that the techniques here in workbatch are working fine, and any further performance improvements would need to come from understanding other sources of delay, and breaking them up into several tokens if possible. But that is another bug, and I don't think is needed in the short term.

My current problem is that my main big profile that I use in testing gloda is causing gloda indexing to hang after an async token. This smells like an independent bug, but I really need to be sure before I can encourage the patches of this bug to move forward.
(In reply to comment #46)
> Although what you say is true, I don't really understand its bearing on the
> issue of leaving idle. If we are doing a long idle batch, and we leave idle,
> then stop processing more tokens, pause, recalculate the tokens per batch, and
> continue. That has nothing to do with the inconsistency of the cost of each
> token.

I just mean it's wasteful if checking idle costs us more than processing the token does, and it's excessive to check if the user has un-idled every millisecond (or more frequently).  Since new extensions are the most likely source of additional tokens in the process, and are likely to be cheap analyses, I just fear additional waste.

As you point out, the algorithm provides a pretty good smoothed estimate of the cost of tokens, which is why I suggested decimating using that value.  Checking idle every 30 milliseconds is not too silly, and with the averaging, that should be fairly reliable.

> with the processes that are going on in the tokens themselves. Sometimes I'll
> see a token that lasts a second or so, for example (which your change to the

The most likely explanation for that is garbage collection of headers from enumeration.  (Which should go away when we used filtered enumerators, which I think you have volunteered to implement :)
After some testing with a few different values of _INDEX_INTERVAL, I can
clearly see some differences between 16 ms and 32 ms, but at 32 ms I have no
problem getting the pause to take effect when, for example, scrolling the
folder pane. So I'm happy with 32 milliseconds.

Let me try this on a slower single-processor system.
Comment on attachment 396411 [details] [diff] [review]
v1 some mods on top of rkent's patch...

OK I'm happy with this, with the one issue that you should set _INDEX_INTERVAL to 32, otherwise the repeated pauses are not as reliable.
Attachment #396411 - Flags: review?(kent) → review+
Running on my slow main system, the UI glitch that I still see is associated with sqlite transactions, and it is a doozy. I added some additional this._pdump statements to show me when transactions and garbage collect are occurring, and committing transactions is the slow step (sometimes the transaction commits quickly, but then delays on the new transaction setup). At the moment, this seems to lock up my UI for about one second each time the transaction commits, which is once every 5 seconds during idle, and once every 20 seconds when active. I don't really understand why this is, as the transaction statements are supposed to be executed async. But I am not an sqlite expert.

This does not discourage landing of this patch, it only points out that we still have additional work. If we could solve the transaction delay problem, and with this patch, we could make gloda indexing have no visible UI effect on my slow system.
changes made and pushed:
http://hg.mozilla.org/comm-central/rev/d92896496cf2

This includes a minor change I ran by rkent to stall indexing while the commit is in progress.
Whiteboard: [has patch][no l10n impact] → [no l10n impact]
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
much improved. in fact, I detect no ill effects from indexing.
v.fixed

iirc performance tests aren't currently possible. do we want a new bug for that?
Status: RESOLVED → VERIFIED
(In reply to comment #52)
> much improved. in fact, I detect no ill effects from indexing.
> v.fixed

woo!

> iirc performance tests aren't currently possible. do we want a new bug for
> that?

yes, that would be good, thank you.
(In reply to comment #53)
> > iirc performance tests aren't currently possible. do we want a new bug for
> > that?
> 
> yes, that would be good, thank you.

We already have a bug on performance tests for Thunderbird. So it depends what you're going to file. https://wiki.mozilla.org/User:Standard8/Thunderbird_Automated_Tests_Planning has some info.
You need to log in before you can comment on or make changes to this bug.