Closed Bug 415442 Opened 16 years ago Closed 16 years ago

Sunbird consumes much cpu time

Categories

(Calendar :: Internal Components, defect)

defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: aryx, Assigned: berend.cornelius09)

References

Details

(Keywords: perf, regression)

Attachments

(1 file)

After updating to today's nightly (20080202**), Sunbird consumes always more than 60 percent (often 90) of the available cpu performance, resulting in a startup time of several minutes and hangs while using Sunbird. This also happens in safe mode. I was not able to reproduce this with a new profile. There are several mozStorage, local and remote .ics files in this profile.

Regression from bug 413715?
Flags: blocking-calendar0.8?
WFM

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12pre) Gecko/20080202 Calendar/0.8pre 
Yes, I can confirm it also. When I minimize Sunbird CPU usage shows about 3 or 4 %
Umm... wow.  Seriously?  I'll confirm that bug 413715 is the source, and it should be readily apparent why upon casual inspection of the code (How did this get through review?).  To make things short, refreshEventTreeInternal has become incredibly inefficient from 0.7.  After subscribing to http://ical.mac.com/ebony.ivory/Musicians'%20Birthdays.ics in Sunbird I timed repeated calls to the function with the unifinder's existing array (see below).  Results as follows:
0.7: 11,12,12,13,12
0.8pre (20080202): 9173, 9147, 9159, 9154, 9159.

Quick inspection shows that from 0.7 to 0.8pre, the function has moved from being O(n) to being O(n^2) for dealing with hashIds, hash-object setting, and array lookups.  It's also moved from being O(1) to being O(n) for tree invalidations.

For those of you wondering about user-impact, refreshEventTreeInternal is called from refreshEventTree on startup, and in a variety of other places (when timezones are changed, when calendars are added, etc)  That's exactly what archaeopteryx is seeing here with the CPU issue.

Perf script:
var wm = Cc["@mozilla.org/appshell/window-mediator;1"].
         getService(Ci.nsIWindowMediator);
var win = wm.getMostRecentWindow("calendarMainWindow");

// 0.7 vs 0.8 differences
var array = win.gEventArray || win.unifinderTreeView.eventArray;

var start = Date.now();
win.refreshEventTreeInternal(array);
var end = Date.now();

// Too lazy for LOG
Components.utils.reportError(end-start);
Just to illustrate the absurdity... Adding the following lines to the perf script takes the numbers down to the 200ms range.  It's still not even close to where 0.7 was, but fixing that is probably more involved.  Also, since Simon trashed the old unifinder.js, I don't know how to look at what the old code did.

win.refreshEventTreeInternal = function(aEventArray) {
  with(win) {
    var items = aEventArray.filter(isItemInFilter);
    unifinderTreeView.eventArray = items;
    unifinderTreeView.tree = null;
    unifinderTreeView.sortItems();
    unifinderTreeView.setSelectedItems();
    unifinderTreeView.treeElement.view = unifinderTreeView;
  }
}
CC'ing Philipp on this one. Philipp, can you take a look at comment 3 and comment 4 in this bug, please.

> Also, since Simon trashed the old unifinder.js, I don't know how to look at 
> what the old code did.

Joey, the old code is available via bonsai at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/calendar/resources/content/Attic/unifinder.js&rev=&cvsroot=/cvsroot
Flags: blocking-calendar0.8? → blocking-calendar0.8+
Assignee: nobody → Berend.Cornelius
(In reply to comment #3)
> After subscribing to
> http://ical.mac.com/ebony.ivory/Musicians'%20Birthdays.ics in Sunbird I timed
> repeated calls to the function with the unifinder's existing array (see below).
>  Results as follows:
> 0.7: 11,12,12,13,12
> 0.8pre (20080202): 9173, 9147, 9159, 9154, 9159.

FYI, I have NOT reproduced the 0.7 results, using 'All Events' filter.
I replaced Cc and Ci with Components.classes and Components.interfaces resp.
in code from comment 3, and pasted into a javascript shell window.  I got: 
  SB0.7:                5047 4867 5147 4727
  SB0.8pre (20080215):  5227 4837 5618 4636
Maybe the 0.7 had a different filter selected, so its array was much shorter?
I did change the unifinder a bit in a different bug, the unifinder should now behave about like the replacement from comment 4. The values on my system are indeed values ~1780 for 0.8pre and ~200 for sunbird 0.7.

I started taking out code and what really took a chunk out of it was caching the sort keys. This got me down to ~440 for 0.8pre.

Using venkman I found the following data in the 1000 - 2500 ms range:

    Function Name: uTV_setItems  (Lines 416 - 427)
    Total Calls: 1 (max recurse 0)
    Total Time: 2116.68 (min/max/avg 2116.68/2116.68/2116.68)
    Time (ex. calls): 11.62 (min/max/avg 11.62/11.62/11.62)

    Function Name: uTV_sortItems  (Lines 441 - 468)
    Total Calls: 1 (max recurse 0)
    Total Time: 2104.93 (min/max/avg 2104.93/2104.93/2104.93)
    Time (ex. calls): 42.78 (min/max/avg 42.78/42.78/42.78)

    Function Name: compareEvents  (Lines 749 - 778)
    Total Calls: 14391 (max recurse 0)
    Total Time: 1669.56 (min/max/avg 0.08/190.16/0.12)
    Time (ex. calls): 156.48 (min/max/avg 0.01/0.15/0.01)

    Function Name: uTV_cache_get  (Lines 704 - 713)
    Total Calls: 28782 (max recurse 0)
    Total Time: 1484.33 (min/max/avg 0.03/189.73/0.05)
    Time (ex. calls): 1038.48 (min/max/avg 0.02/189.72/0.04)


This supports my findings, as changing compareEvents to behave like 0.7 significantly lowers the needed time. The cache was introduced in bug 278799. gekachecka, maybe you have means of improving this cache?

My tests where with "All Events" enabled on both systems.
gekacheka, do you think we should remove the sortkey caching, or do you have a smarter solution? I guess there was a reason for you to introduce them.
So far I still have not been able to reproduce the fast results.
Could you post the profile data for the fast version?

I reinstalled Sunbird 0.7 (on w2k), created a fresh profile, subscribed to the Musician's Birthdays calendar from comment 3, sorted it by start date (click on column header), in the error console ran the test code from comment 3 (edited all onto one line), and still got numbers around 4900 (not 400 or 200).

I installed venkman javascript debugger 0.87.1, turned off "Debug | Exclude Browser Files", and ran the test code in the error console again.  With the profiler it gets a little slower results, 5200-5800.

The profiler produces numbers like these (SB0.7):

    Function Name: refreshEventTreeInternal  (Lines 748 - 806)
    Total Calls: 1 (max recurse 0)
    Total Time: 5848.41 (min/max/avg 5848.41/5848.41/5848.41)
    Time (ex. calls): 100.15 (min/max/avg 100.15/100.15/100.15)

    Function Name: compareEvents  (Lines 548 - 587)
    Total Calls: 14933 (max recurse 0)
    Total Time: 5718.22 (min/max/avg 0/440.63/0.38)
    Time (ex. calls): 761.09 (min/max/avg 0/30.04/0.05)

    Function Name: msNextOrPreviousRecurrenceStart  (Lines 606 - 618)
      (this just calls calEvent.startDate.nativeTime in 0.7 (comment 5).)
    Total Calls: 29866 (max recurse 0)
    Total Time: 4927.08 (min/max/avg 0/440.63/0.16)
    Time (ex. calls): 3234.65 (min/max/avg 0/440.63/0.11)

    Function Name: compareMSTime  (Lines 601 - 604)
    Total Calls: 14933 (max recurse 0)
    Total Time: 30.04 (min/max/avg 0/10.02/0)
    Time (ex. calls): 30.04 (min/max/avg 0/10.02/0)

    Function Name: dateFilter  (Lines 778 - 784)
    Total Calls: 919 (max recurse 0)
    Total Time: 30.04 (min/max/avg 0/10.02/0.03)
    Time (ex. calls): 30.04 (min/max/avg 0/10.02/0.03)

It might be a platform difference or a hardware difference or measurement error, so could someone post a fast profile data so I can compare?
(In reply to comment #9)
> So far I still have not been able to reproduce the fast results.
> Could you post the profile data for the fast version?
(by "profile data" I mean a Venkman profiler report, from a save to .text file)
One possible 'measurement error':  In a new profile, no sort column has been chosen yet, so the compareEvents method just returns 0 without accessing the events to be compared (link in comment 5, line 586).  The events remain unsorted, and refresh 4-5 sec faster.  Once you click a sort column, it sorts.  The sort column persists, so thereafter results will be sorted, and refresh is 4-5 sec slower.  (Times are for the "Musician's Birthdays" calendar tested, which has ~900 repeating birthdays, many starting 50-100 years ago.)
Just filed bug 418296 with a patch that improves sort time by a factor 3 or 4.

However, even after this 3-4 second improvement, warm startup time is still about 70sec of mostly 100% CPU for me in a profile subscribed to the 'Musician's Birthdays' ics calendar, compared to about 12 sec without this calendar.

The description of this bug also describes "hangs while using sunbird".  
Is this because some ics calendars are being periodically reloaded?
(see also Options | General | "Refresh calendar every [30] minutes")

(If so, maybe it would help to be able to set the reload period separately per calendar: someone might want to load the calendar of a spouse or coworker frequently, but would be happy with much less frequent updates for a holiday or entertainment calendar.)

Some discussion of ics file loading performance is in bug 315959.

Depends on: 418296
Depends on: 415956
(In reply to comment #9)
>     Function Name: compareEvents  (Lines 548 - 587)
>     Total Calls: 14933 (max recurse 0)
>     Total Time: 5718.22 (min/max/avg 0/440.63/0.38)
>     Time (ex. calls): 761.09 (min/max/avg 0/30.04/0.05)
> 
>     Function Name: msNextOrPreviousRecurrenceStart  (Lines 606 - 618)
>       (this just calls calEvent.startDate.nativeTime in 0.7 (comment 5).)
>     Total Calls: 29866 (max recurse 0)
>     Total Time: 4927.08 (min/max/avg 0/440.63/0.16)
>     Time (ex. calls): 3234.65 (min/max/avg 0/440.63/0.11)
> 
>     Function Name: compareMSTime  (Lines 601 - 604)
>     Total Calls: 14933 (max recurse 0)
>     Total Time: 30.04 (min/max/avg 0/10.02/0)
>     Time (ex. calls): 30.04 (min/max/avg 0/10.02/0)
> 
>     Function Name: dateFilter  (Lines 778 - 784)
>     Total Calls: 919 (max recurse 0)
>     Total Time: 30.04 (min/max/avg 0/10.02/0.03)
>     Time (ex. calls): 30.04 (min/max/avg 0/10.02/0.03)
> 

Is it me or do these numbers look strange? An average of .03 with a max of 10.02 means there have been three calls of about 10.02 and 916 calls which are handled very fast, in about 0.00 (or 1 call of 10.02 and 4 calls of 4.4 sec or something like this). The functions above show similar behaviour. Could there be some function which chokes the app under certain data-conditions and, more importantly, is there a mechanism to find this function and data combination?
Sorting is always faster if the set is already sorted, I'll do some more speed checking tomorrow, but I think the numbers are ok (in a sense of not strange)

Manual profiling can be done with inserting log statements, then we could find out these combinations.
Attached patch patch v. #1Splinter Review
Looking at the code I spotted one function where is still room for improvement. Yet it's only called when deleting events and will only have (little) practical effect in cobmbination with deleting recurring events - A penny saved is a penny got.
Attachment #304437 - Flags: review?(philipp)
OS: Windows XP → All
Hardware: PC → All
Version: Mozilla 1.8 Branch → unspecified
Comment on attachment 304437 [details] [diff] [review]
patch v. #1


>     removeItems: function uTV_removeItems(aItemArray) {
>+        var count = 0;
>         for each (var item in aItemArray) {
>             var row = this.getItemRow(item);
>             if (row > -1) {
>                 this.eventArray.splice(row, 1);
>-                if (this.tree) {
>-                    this.tree.rowCountChanged(row, -1);
>-                }
>+                count++;
>             }
>         }
>+        if (this.tree) {
>+            this.tree.rowCountChanged(row, -count);
>+        }
>         this.calculateIndexMap();
>     },
I don't think this will work. I'm not 100% sure when and how often rowCountChanged should be called, but as far as I understand, you need to call it for every row that changes (-> the first argument) and tell it how many rows have been removed/added (-> the second argument).

The way you do it here tells the tree that at row (which is loosely defined inside the for each. While this works since js variables are always scoped on a function level, it will always call for the row of the last item removed.
Attachment #304437 - Flags: review?(philipp) → review-
(In reply to comment #14)
> Sorting is always faster if the set is already sorted, I'll do some more speed
> checking tomorrow, but I think the numbers are ok (in a sense of not strange)

How is the sdb-database sorted on disk? From your comment I conclude the sorting sdb-database is the culprit in this story? Isn't there room for improvement there, adding indexes or reshuffeling existing data?
Depends on: 419210
As we discussed on our last status meeting we will put this from the blocker list. Any input to further reduce the performance is welcome.
Flags: blocking-calendar0.8+ → blocking-calendar0.8-
(In reply to comment #18)
> As we discussed on our last status meeting we will put this from the blocker
> list. Any input to further reduce the performance is welcome.

Let me add that with the fixes that have gone in since this bug was filed, we believe that this is no longer serious enough to warrant the blocking status.

Yes, it doesn't take anymore 15 minutes on start-up. 
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Verified by the reporter per comment#20.
Status: RESOLVED → VERIFIED
Target Milestone: --- → 0.8
Flags: blocking-calendar0.8-
You need to log in before you can comment on or make changes to this bug.