Open Bug 412914 Opened 16 years ago Updated 2 years ago

Enable offline caching for ical calendar with many events -> startup horrible slow with high CPU

Categories

(Calendar :: General, defect)

defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: andreas.treumann, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [patchlove][calperf-key] [not needed beta][no l10n impact][gs])

Attachments

(4 files, 2 obsolete files)

STEPS TO REPRODUCE:
===================

- create a new profile in thunderbird
- install lightning
- create a new calendar with many events, e.g. http://ical.mac.com/ebony.ivory/Musicians'%20Birthdays.ics
- open the properties dialog
- mark the cache option
- restart thunderbird

RESULT:
=======

- wait.... cpu usage is 100%
- thunderbird is unusable for a long, long time

EXPECTED RESULT:
================

- the startup time should not be noticeably longer when the cache option is enabled

REPRODUCIBLE:
=============

- always
Flags: blocking-calendar0.8?
It's reasonable that the first startup after switching the option on is slow, because the cache calendar is being filled with events. IMO we need some sort of progress indicator to show that the calendar is syncing (is there a bug filed for that?).
Every further startup should perform better though, but at least for large calendars (with many recurring items, like my work calendar) it doesn't, because of the storage calendar's poor performance. Even browsing the month view after startup is usually slower than requesting the remote calendar used to be. Switching over to a plain memory calendar shows bad startup performance as well, but much better browsing performance clicking through the months. So I am currently working on adding an item cache (bug 366177).
I am taking care of this issue and have set the blocking flag, so it doesn't fall off our radar for 0.8.
Flags: blocking-calendar0.8? → blocking-calendar0.8+
Depends on: 366177
This really is serious. It completely blocked my computer with a 500 KB ics file (no recurring events). I wasn't able to read my mail or even to switch to the calendars to turn off caching. The ics produced a 1200 KB sqlite file (seems right) in about 10 minutes, but after that TB still hung. It did fire an alarm window which was blank and couldn't be dismissed (Dismiss All). If caching is on the roadmap for 0.8, there seems to be a big problem.
setting severity to critical. The only way to get a working TB-L again was to start in safe mode and downgrade Lightning to a version from before caching was implemented.
Severity: normal → critical
As caching is experimental, a relnote explaining how to turn off caching for calendars by hand-editing preferences.js (or the correct file which holds the calendar-specific settings) should be enough imho.
Bas, fortunately caching is disabled by default. The user needs to explicitly enable it via calendar.cache.enabled in the prefs. When this is enabled, caching can be turned on/off via the calendar properties dialog.
As stated the app freezes, so it's not possible to get to the preferences. 
I tried to check this, but it is unclear to me how to do it.
Do I need to make a new string "calendar.cache.enabled" and set it to "1" ?
There is no global preference to enable/disable caching, at least not with the name mentioned in Comment #5, see <http://lxr.mozilla.org/mozilla1.8/search?string=calendar.cache>. Caching can be enabled in the Calendar Properties dialog for each calendar separately.
Regarding comment 5: in the nightly builds, simply enabling it in the properties of the calendar builds the cache, as ssitter states. Lars, you should try this on a fresh profile to avoid having TB hang with your email in it.
This is a flaw of the storage provider, but I don't think we can really speed this up much within the 0.8 timeframe. All that can be done is bug 366177.

The preference was indeed removed, but it should be possible to turn caching on/off on a per-calendar basis in the properties dialog.

With the checkin of bug 366177, we have done everything we can in the 0.8 timeframe. Removing blocking for this bug. 
Flags: blocking-calendar0.8+
Flags: wanted-calendar0.9+
Assignee: daniel.boelzle → nobody
Depends on: 450104
All the blockers are fixed. Is this also the case for this one?
Regards
Flags: wanted-calendar1.0+
Flags: wanted-calendar0.9+
Flags: blocking-calendar1.0?
This looks very similar to my problem with shared iCals on a local network drive.

It is a very serious problem, crippling slow speed for just a few shared iCals.

I know it is not my LAN, as Calendarscope ( http://www.calendarscope.com/ ) accessed the same iCals instantly, but I prefer the Lightning interface with TB email integration... tough situation.

Does anybody know if TB 3.0 will have the same problem?
Bug 390036 has a fix (though not perfect), does this work for you?
I retested using Lightning 1.0b2pre with Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.5pre) Gecko/20100428 Lanikai/3.1pre. I subscribed to <http://www.mozilla.org/projects/calendar/caldata/GermanHolidays.ics> and enabled the cache.

Console output shows that a full sync is done on every startup - not only the first startup as stated above. During this time Thunderbird is completely unresponsive.
Flags: wanted-calendar1.0+
Flags: blocking-calendar1.0?
Flags: blocking-calendar1.0+
Whiteboard: [not needed beta][no l10n impact]
(In reply to comment #17)
> Console output shows that a full sync is done on every startup - not only the
> first startup as stated above. During this time Thunderbird is completely
> unresponsive.

as in bug 569555?
Keywords: perf
(ignore comment 18)

fallen on irc "etag ... is a possible solution"
Summary: Enable offline caching for a calendar with many events -> startup horrible slow → Enable offline caching for ical calendar with many events -> startup horrible slow
The problem also appears if I import dozens of events into a calendar with cache enabled (Import... menu).

I added some logging to display the execution time of functions.

It appears that the running time of 'this.calendar.addTargetCalendarItem' (calDavRequestHandlers.js) is significantly longer with the cache enabled.
addTargetCalendarItem (calDavCalendar.js) calls adoptItem (calStorageCalendar.js), which calls 'flushItem' (calStorageCalendar.js).
In the function 'flushItem', I see that a call to 'this.releaseTransaction()' can take about 100 ms. A transaction is created for each new element.

If I comment out 'this.acquireTransaction()' and 'this.releaseTransaction()' (flushItem - calStorageCalendar.js), I see a normal load time, without freezes.

The reason may be that SQLite can only handle a few dozen transactions per second.
See the SQLite FAQ entry "INSERT is really slow - I can only do few dozen INSERTs per second":
http://www.sqlite.org/faq.html

The solution might be to set up a global transaction, i.e. one transaction for all elements of a CalDAV document?
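The effect described above is easy to reproduce outside Lightning. A minimal sketch (Python with the stdlib sqlite3 module; the table and event names are invented for the demo, this is not Lightning code) comparing one commit per INSERT against a single wrapping transaction:

```python
import os
import sqlite3
import tempfile
import time

def insert_events(conn, n, batch):
    """Insert n rows, committing either once per row or once per batch."""
    conn.execute("DROP TABLE IF EXISTS cal_events")
    conn.execute("CREATE TABLE cal_events (id INTEGER PRIMARY KEY, title TEXT)")
    conn.commit()
    start = time.perf_counter()
    if batch:
        with conn:  # one transaction wrapping all INSERTs
            for i in range(n):
                conn.execute("INSERT INTO cal_events (title) VALUES (?)",
                             ("event %d" % i,))
    else:
        for i in range(n):  # one transaction (and one journal sync) per INSERT
            conn.execute("INSERT INTO cal_events (title) VALUES (?)",
                         ("event %d" % i,))
            conn.commit()
    return time.perf_counter() - start

db_path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
conn = sqlite3.connect(db_path)
t_per_insert = insert_events(conn, 500, batch=False)
t_batched = insert_events(conn, 500, batch=True)
rows = conn.execute("SELECT COUNT(*) FROM cal_events").fetchone()[0]
print("per-insert: %.2fs, batched: %.2fs" % (t_per_insert, t_batched))
```

On a real disk, the per-insert variant pays one journal sync per commit, which is exactly the "few dozen INSERTs per second" ceiling the SQLite FAQ describes; the batched variant pays it once.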
Since my last comment, I made some changes to the file 'calStorageCalendar.js' to test adding dozens of events in a single transaction.
The goal is to test the initial loading of a calendar with cache enabled.

Code modification basics:
--------------------------
During the initial loading phase, the elements returned by the server (events, metadata) are not recorded in the database.
They are temporarily stored in arrays of objects.
To achieve this, the function adoptItem changes: instead of recording the event in the database, it stores it in an array. The setMetaData function is changed in the same way. The initial loading is fast in this case.

For testing, the getItems function is also disabled during the initial phase.
After a certain time, it becomes active and triggers writing the elements to the database.

The elements are written as follows:
- create a transaction
- for each item in the arrays, call adoptItem or setMetaData
- end the transaction

In the function 'flushItem', the calls to 'deleteItemById', 'acquireTransaction' and 'releaseTransaction' are also commented out.

Test results:
-------------
Created a CalDAV calendar with cache enabled, containing 72 events.

Without modifications:
The initial load time is about 12 s on my machine. Meanwhile, the application appears frozen.

With modifications:
Initial load is very fast. Nothing is displayed, as 'getItems' is disabled.
After the initial download, deactivating/reactivating the calendar forces a call to getItems, and the data is stored in the database.
Fast execution time (< 0.5 s for 72 events).
This patch implements the solution described in comment 21.
It can be applied to 1.0b5, for testing purposes.
I do not know if it's the right approach, but it shows the impact of using a single transaction when saving data to the database.
Attachment #554439 - Flags: review?(philipp)
Assignee: nobody → philippe.martinak
Status: NEW → ASSIGNED
Attachment #554439 - Attachment is patch: true
Mohit, since you've been working on the batch mode in the views, what do you think of this approach? I know the batch modes are a bit different, but I'd still value your opinion (same goes for Ludovic and Wolfgang)
Hmm... I am not completely sure, but this seems to be the right approach, except for the disabling of getItems() until loading has finished. The views are going to call getItems() anyway, and getItems fetches its data from the sqlite cache, not from the cached object arrays. If I am right, the views would be completely frozen while the data is loading.

However, I like the concept of having a nice wrapper around the transaction instead of putting individual releaseTransaction and acquireTransaction calls inside the add/modify/delete functions.

I mean, if we have a number of operations coming in together, why not put them into one transaction when we know sqlite is going to be slow.
Philippe, as you've disabled the initial getItems call, it's not quite clear how much of a performance win we get through this patch. Could you directly compare the time for creation of events, with and without the patch?
(In reply to Philipp Kewisch [:Fallen] from comment #25)
> Philippe, as you've disabled the initial getitems call its not quite clear
> how much performance win we get through this patch. Could you directly
> compare the time for creation of events, with and without the patch?

Here are some comparisons of load times.

The tests are performed against a local CalDAV server.
For each calendar, I created 100 simple events (no recurrence, ...).
The values, in seconds, are an estimate of the initial load time.

In test 4, a transaction is created for each operation
(acquireTransaction/releaseTransaction inside the loop in the function processStorageOpCache).
So we can see that when loading a lot of events with cache enabled,
using one transaction per event, the load time is much longer.


-----------------------------------------------------------------
| N° | Test                       | 1 calendar    | 4 calendars
|    |                            | x 100 events  | x 100 events
-----------------------------------------------------------------
| 1  | cache disabled             |      0.5      |      2.15
-----------------------------------------------------------------
| 2  | cache enabled              |     15.6      |     61
-----------------------------------------------------------------
| 3  | cache enabled + demo patch |      0.5      |      2.1
-----------------------------------------------------------------
| 4  | cache enabled + demo patch |     15.3      |     58
|    |   (1 transaction/event)    |               |
-----------------------------------------------------------------

Sorry for the late reply!
This patch is different from the previous one (554439).

In batch mode, the first call to 'addItem', 'adoptItem', etc. starts a new transaction.

The transaction is ended at the end of the batch.
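A rough sketch of that batch-mode idea (in Python with invented names, a stand-in rather than the actual patch): the first write inside a batch opens the transaction, further acquire/release calls become no-ops, and ending the batch commits:

```python
import sqlite3

class BatchedStore:
    """Toy analogue of the storage calendar's batched transaction handling."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.isolation_level = None  # manage transactions by hand
        self.db.execute("CREATE TABLE IF NOT EXISTS cal_events (title TEXT)")
        self.batch_depth = 0       # analogue of a mBatchCount member
        self.in_transaction = False

    def start_batch(self):
        self.batch_depth += 1

    def end_batch(self):
        self.batch_depth -= 1
        if self.batch_depth == 0 and self.in_transaction:
            self.db.execute("COMMIT")  # close the single batch transaction
            self.in_transaction = False

    def _acquire(self):
        if not self.in_transaction:   # first write opens the transaction
            self.db.execute("BEGIN")
            self.in_transaction = True

    def _release(self):
        # Outside a batch, commit per item; inside, defer to end_batch().
        if self.batch_depth == 0 and self.in_transaction:
            self.db.execute("COMMIT")
            self.in_transaction = False

    def add_item(self, title):
        self._acquire()
        self.db.execute("INSERT INTO cal_events (title) VALUES (?)", (title,))
        self._release()

store = BatchedStore(":memory:")
store.start_batch()
for i in range(100):
    store.add_item("event %d" % i)  # all 100 inserts share one transaction
store.end_batch()
count = store.db.execute("SELECT COUNT(*) FROM cal_events").fetchone()[0]
```

The same add_item code path works unchanged outside a batch; it then commits per item, as before.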
Attachment #554439 - Attachment is obsolete: true
Attachment #560876 - Flags: review?(philipp)
Attachment #554439 - Flags: review?(philipp)
Whiteboard: [not needed beta][no l10n impact] → [not needed beta][no l10n impact][gs]
I happily enabled caching in Lightning 1.0 now that it's not a beta function only to have Lightning freeze up for 13 minutes!

I have five calendars (all Google Apps) full of events from about 2 years back until today. Enabling caching on any of them results in a long freeze while the calendar is downloaded. I can understand that this takes a while the first time it is enabled, but at least give us a progress bar and run the process in the background so that the user can continue working with other parts of Thunderbird.

As previously mentioned in other comments, it also makes startup horribly slow. Losing connectivity and then regaining it (moving between wifi hotspots, for example) also causes a long freeze.

I feel that caching is really a great feature but as it is implemented in Lightning 1.0 it doesn't scale very well with large calendars.
(In reply to Robert "Unlogic" Olofsson from comment #28)
> I happily enabled caching in Lightning 1.0

Does caching now also work for multiple Google calendars in the trunk builds (1.0b10pre)?
I had no problems enabling caching on all my Google calendars except for the freezes and slowdowns.
Hi Robert,


> I happily enabled caching in Lightning 1.0 now that it's not a beta function
> only to have Lightning freeze up for 13 minutes!

First of all, I think it's acceptable that the first synchronisation takes a lot of time, since it's supposed to parse everything into the cache. Subsequent uses, however, must not take that long. So it would be nice to know what is really happening:
does this occur every time, or only during the first launch (which could indicate a parsing issue leading to an incomplete synchronisation)? How many events do you have? Do you see any Lightning-related exceptions in the JavaScript console?

> First of all, I think it's acceptable that a first synchronisation takes a
> lot of time since it's supposed to parse everything into the cache.

Commenting myself: the parsing code could certainly be enhanced further.
Comment on attachment 560876 [details] [diff] [review]
Patch with a single transaction in batch mode.

Review of attachment 560876 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, aside from the nits below. I've changed this locally and will push it soon. If you object, let me know.

::: calendar/providers/storage/calStorageCalendar.js
@@ +99,5 @@
>      mRecEventCache: null,
>      mRecTodoCache: null,
>      mLastStatement: null,
> +    
> +    mInBatch: false,

the base provider uses mBatchCount, which can be used instead.

@@ +114,5 @@
> +    //
> +    startBatch: function () {
> +      this.mInBatch=true;
> +      
> +      this.__proto__.__proto__.startBatch.apply(this, arguments); 

Which makes this redundant

@@ +453,2 @@
>          let newItem = aItem.clone();
> +        return this.adoptItemInt(newItem, aListener, transact);

Instead of adding a new *Int function, we could rather put the if block into a function prepareBatchTransaction() that returns the transact variable.
Attachment #560876 - Flags: review?(philipp) → review+
I think it's time for a new bug for all other/new perf issues. Let's hope this gets rid of them!

Pushed to comm-central <http://hg.mozilla.org/comm-central/rev/45bca33cdcfd>
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4
(In reply to Philipp Kewisch [:Fallen] from comment #35)
> I think its time for a new bug for all other/new perf issues. Lets hope this
> gets rid of them!
> 
> Pushed to comm-central <http://hg.mozilla.org/comm-central/rev/45bca33cdcfd>

Hi Philipp,

we have tested this patch on different configurations.
When loading a calendar, the load time is similar to a calendar without cache.
There is, however, a problem when first loading multiple calendars:
loading multiple calendars generates an exception when calling 'acquireTransaction'.

For example, when loading two calendars, the second startBatch occurs before the endBatch call of the first calendar.
The first transaction in this case has not been completed.
this.mDB.transactionInProgress is false when calling addItem, etc. for the second calendar.

I don't have a solution yet.
Might using a single mDB be a good approach?
I think we tried a single db at some point, but this caused other problems. I have backed out the change for now until we find a solution.

comm-central changeset d6deaa2e2337
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 1.4 → ---
On an sqlite database, only one transaction can be active at a time.
So when loading multiple calendars, using a single transaction in batch mode is not currently possible.
This patch (patch_412914_v3) allows a single transaction in batch mode with multiple calendars.

In this patch:
------------------
A transaction is started on the first call to 'acquireTransaction'.
Subsequent calls are ignored in batch mode.

The transaction is completed when 'releaseTransaction' is called, if a transaction is in progress and the batch is completed.

To execute multiple transactions simultaneously, a separate database file is used for each calendar.
The calendar identifier is used to build the database file name.

To avoid reloading the content of existing calendars, the existing data from cache.sqlite is copied to the new database (prepareInitDB).
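As an aside, a hypothetical illustration (helper name invented, not part of the patch) of deriving a per-calendar database file name from a free-form calendar identifier; percent-encoding guards against ids containing characters that are not valid in file names:

```python
import urllib.parse

def cache_db_filename(calendar_id):
    # Percent-encode everything outside the unreserved set so that
    # free-form provider ids (not just UUIDs) map to valid, unique names.
    return "cache-" + urllib.parse.quote(calendar_id, safe="") + ".sqlite"

name = cache_db_filename("caldav://example.com/cal?id=1")
print(name)
```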
Attachment #594195 - Flags: review?(philipp)
Isn't there some different way to count the number of transactions and still do the right thing? Moving calendars into separate files is quite a step, and migration would happen for each calendar.

We need to make sure backups of sqlite files on upgrade still work (maybe move them to a subdirectory, since the directory will have a lot of files).

Also, it's not quite clear to me how this will work out with the current concept of a separate cache and local.sqlite. It seems you are changing it only for calendars subscribed via a file:// uri?
Attached patch: patch v4 (Splinter Review)
Changes from patch v3:

The migration of calendar data into a calendar-specific database is performed in the 'migrateCalendarDB' method.
This method could be reused for local calendars.

If an error occurs in 'migrateCalendarDB', the new calendar database is not created
and the corresponding file is deleted. The file 'cache.sqlite' is used in this case.
In all cases, the original data in cache.sqlite is not removed, only copied to the new database.

The fix implemented in the file 'calStorageUpgrade.jsm' is needed to close and delete the new database if necessary.
Without the fix :
  dbService.openDatabase(DB);
  upgradeDB(DB);
  DB.close();
raises an exception with error code 0x80630001.

The patch also includes the removal of the calendar's database file when the calendar is deleted.

The 'finalizeStatements' method is needed to close the database before deleting the file.



Upgrade and backup (tests) :
============================

Test Conditions 
---------------
2 caldav calendars with data
2 local calendars with data

calendars are created with ln 1.0b3 (schema v19)

For the tests, migrateCalendarDB is used for local calendars (not in the patch v4).

calendar-data files :
cache.sqlite
local.sqlite

Migration test without patch v4 
-------------------------------
ln 1.2 -  original  code

upgradeDB => backupDB
=>
calendar-data files :
cache.sqlite
local.sqlite
backup\local.v19.sqlite
backup\local.v19-1.sqlite (cache.sqlite backup)

If the user must restore the file, how can he determine which file is the backup of cache.sqlite? 


Migration test with patch v4/ln1.0b3 
------------------------------------

* ln 1.0b3 with patch v4

migrateCalendarDB
=>
calendar-data files :
<id1>.sqlite
<id2>.sqlite
<id3>.sqlite
<id4>.sqlite
cache.sqlite
local.sqlite

(<idn> -> calendar id)

* ln 1.2 with patch v4

upgradeDB => backupDB
=>
calendar-data files :
<id1>.sqlite
<id2>.sqlite
<id3>.sqlite
<id4>.sqlite
cache.sqlite
local.sqlite
backup\local.v19-1.sqlite
backup\local.v19-2.sqlite
backup\local.v19-3.sqlite
backup\local.v19.sqlite

'local.vxxx.sqlite' could be changed to '<calendar id>.vxx.sqlite'?


Migration test with patch v4/ln1.2
-------------------------------------------
ln 1.2

upgradeDB => backupDB
=>
calendar-data files :
<id1>.sqlite
<id2>.sqlite
<id3>.sqlite
<id4>.sqlite
cache.sqlite
local.sqlite
backup\local.v19-1.sqlite
backup\local.v19.sqlite


Files in backup are backups of 'cache.sqlite' and 'local.sqlite'.
Attachment #594195 - Attachment is obsolete: true
Attachment #596029 - Flags: review?(philipp)
Attachment #594195 - Flags: review?(philipp)
Status: REOPENED → ASSIGNED
review? ping
Flags: needinfo?(philipp)
Summary: Enable offline caching for ical calendar with many events -> startup horrible slow → Enable offline caching for ical calendar with many events -> startup horrible slow with high CPU
Comment on attachment 596029 [details] [diff] [review]
patch v4

Review of attachment 596029 [details] [diff] [review]:
-----------------------------------------------------------------

I'm sorry this is taking so long, but I still have a bad feeling about this because of the impact and risk of failure with this patch. This kind of change can't easily be undone in case something goes wrong. Before we go this route I'd prefer exploring other options. Write-Ahead Logging has been enabled in the nightly builds; another thing that should be tested is whether moving to the async sqlite API improves perceived performance.

Aside from that, this patch still has a few flaws. Mostly, hardcoding things is not the right way to go. If we ever get to a final version here, I'd also like to ask you to remove whitespaces at the ends of lines and make sure indentation is correct.

r- for discussion, I'm happy to do a video conference about this as your manager had proposed when we met at Linagora.

::: calendar/base/src/calCachedCalendar.js
@@ +190,5 @@
>                          }
>                          break;
>                      case "storage":
>                          let file = getCalendarDirectory();
> +                        file.append(this.id + ".sqlite");

I'd prefer "cache" + this.id + ".sqlite". Also, what happens if the id contains a character that is not valid for a filename? Lightning itself will use UUIDs, but other providers can create free-form strings.

::: calendar/providers/storage/calStorageCalendar.js
@@ +178,5 @@
>          }
> +        
> +        //delete calendar file 
> +        let calDBFile=cal.getCalendarDirectory();
> +        calDBFile.append(this.id+".sqlite");

If opened via file:// uri this is not the correct way to delete it.

@@ +200,5 @@
>              this.logError("error calling listener.onDeleteCalendar", ex);
>          }
>      },
> +    
> +    finalizeStatements: function cSC_finalizeStatements() {

I think we have some code to finalize statements now, but in general you are right of course.

@@ +329,5 @@
> +        
> +            let fromDBFile = cal.getCalendarDirectory();
> +            fromDBFile.append("cache.sqlite");
> +              
> +            this.migrateCalendarDB(fromDBFile);

I don't agree. If the calendar is initialized using a file:// uri, then it should open just that uri, not the cache.sqlite. Even if, hardcoding cache.sqlite here is not the right way to go.

@@ +538,5 @@
> +
> +              for each (let table in ["cal_alarms", "cal_attachments",
> +                                      "cal_attendees", "cal_events",
> +                                      "cal_metadata", "cal_properties",
> +                                      "cal_recurrence", "cal_relations",

If this is being done outside of the calStorageUpgrades file then all tables with the cal_id field need to be copied, otherwise we will lose data when the schema changes.

@@ +558,5 @@
> +            this.mDB.executeSimpleSQL("DETACH DATABASE caldb");
> +          }
> +          
> +          this.mDB = calDB;             
> +          cal.LOG("[calStorageCalendar] Migrating to 1 database per calendar done.");

replace 1 with one please.
Attachment #596029 - Flags: review?(philipp) → review-
Flags: needinfo?(philipp)
Hi,

This looks like an issue I am experiencing. I just installed TB 24.1.1 on a new Windows 8.1 machine, added the Lightning plugin v2.6.3, and added a Google calendar using the CalDAV protocol. Everything works fine except that it takes about a minute to load the calendar when I start TB, but that's another issue. This is when I figured turning on caching might improve things. But now when I open TB it just hangs. Resource Monitor (screen shot attached) shows that TB is crazy busy writing calendar-data\cache.sqlite-journal. In the end I had to kill the process. On a subsequent app start I quickly went and disabled caching.

Hope this helps. If there is any more data I can provide - I think I can reproduce the issue, just not too keen :) - I could take a process memory dump or something.

Thanks
I experience exactly the same: 25% CPU load for minutes. Disabling offline support decreases the CPU load in steps, one step per calendar (I have 2 remote calendars).
Different idea: to me it makes sense to sync "current" events, but not all of the history/archive. Unfortunately the ical settings of Google do not allow filtered URLs.

So it would definitely help (in my case) to have a date range filter in the calendar settings of Lightning. Whatever the calendar load is doing internally, caching or not, it could probably skip a lot of (old) entries and be faster.

Not a solution, but a workaround - and, by the way, a little feature?
By the way:
In my case TB (and Lightning) is slow, consuming one CPU for itself and grabbing up to 400 MB of memory.
But Lightning (and TB) is still usable, only slowed down while loading/syncing the remote calendar. If you navigate to the Lightning tab you can see (or enable) the event list and set its filter to "all entries". You can also enable the "calendar name" column to see which calendar each item belongs to, and scroll to the place in the list where new items are showing up (line by line).

In my case I can see every single calendar item showing up in the list as it synchronizes with the remote calendar.

Maybe this helps somebody check whether it is the same problem occurring, or whether something else is slowing down TB or Lightning.
When enabling "Offline Support" in a calendar's properties, it takes a lot of disk activity for Lightning to create the cache.sqlite, which in my case holds fewer than 500 rows in total. I thought the slowness might have something to do with the hardware, but on the same machine it is almost instant to insert over 30,000 rows of 4 columns each, if I only commit after all the insertions.

Most of those 500 rows are in the cal_attendees, cal_events, and cal_properties tables. In calStorageCalendar.js, the insertions into those three tables appear to be executed synchronously via executeStep(). Since Mozilla "strongly recommend[s] that you use asynchronous execution", is there a reason to use synchronous INSERTs in Lightning?
The synchronous API usage is mostly historical, I think back then there was no asynchronous API. Unfortunately there are still other parts of the code that depend on the storage provider being synchronous. We have put up a Google Summer of Code project to fix this situation, see the following page. Essentially it means fixing bug 501689.

https://wiki.mozilla.org/Community:SummerOfCode14:Brainstorming#Calendar

The idea is to get rid of all those extra tables and only add columns/tables for things that specifically need to be retrieved in SQL queries. Are you a student by any chance? Maybe this would be a great opportunity, you've demonstrated in the other bug that you are capable of tackling problems without requiring prolonged mentoring.
Flags: needinfo?(philipp)
Philipp, thanks. As a weekend hacker who left school many years ago, let me see if I can digest the code first.
A few observations:

The disk activity I experienced happened within calCachedCalendar.js -> setupCachedCalendar.

First, it deletes all items of the calendar whose "Offline" mode was just switched on (or does so at the start of Thunderbird). This step performs one synchronous delete on each of 7 tables. It is sufficiently fast.

Second, it adds items one by one; this part generates most of the disk activity. addItem (adoptItem actually) sometimes does a SELECT to get the item id, then DELETEs items with that id, with the deletions wrapped in a transaction; it then tries to DELETE the same item again in flushItem() before writing the item to the db. The this.acquireTransaction in flushItem() doesn't appear to help the I/O problem, because the writeItem() inside it uses synchronous executeStep(); in the worst case it INSERTs into 7 tables, sometimes using more than 7 synchronous writes because the cal_properties table may need multiple INSERTs. Therefore one addItem could generate 3-9, or even more, synchronous writes (I'm not sure whether a DELETE on an already-deleted item causes disk I/O or not).

I am thinking of a dirty fix: add a new function, e.g. addItems, that wraps all the db operations of the second part in a dbConnection.executeAsync(), executing them all at once and in the background.

Is this approach going to work?
I can't say for sure, but there may be code that expects addItem to be synchronous. This is definitely a bad design decision, but instead of hacking something in specifically for addItem, I think it would be better to bring forward the new, fully async, storage provider in bug 501689, which is scheduled for GSoC in case a good application comes up. The code in that bug is almost there, we just need to fix some more callers and figure out the timezone issue. If you would like to commit to work on that bug, I can take it down from the GSoC list, but if you are not sure you have the time it would be too bad if we were to turn a student down. What do you think? Feel free to email me or see me on IRC in case you don't want to discuss inside a bug.
I agree it is a bad design. It feels heavy, requires adding a large chunk of code that is mostly just duplication of the current code, and that new code is not reusable elsewhere.

A new storage provider is very exciting, but I don't have the time and, more importantly, the experience for it. It will make a great GSoC project. In the meantime, it is still very cold in my part of the planet, and the fruit of GSoC may only arrive this autumn. Perhaps we can make some temporary, simple fix before the summer comes, to give our disks a little relief.

I have a second thought. How about:

1) Add a boolean property this.mExecuteMany. It decides whether the actual executeAsync() should be done immediately or postponed. It defaults to false, meaning executeAsync() is done immediately.

2) Add a helper function prepareAsyncStatement.

Use it to prepare INSERT / DELETE / UPDATE statements. It will push those statements into a statements array, this.mAsyncSTMTS.

CREATE TABLE and SELECT are still done synchronously. I am not sure whether, once executeAsync() has started, a SELECT gets the correct result from the disk file plus the write queue, or whether the SELECT is blocked until executeAsync() finishes.

3) In the current code, wherever executeStep() involves INSERT / DELETE / UPDATE, replace it with a block like:
if (!this.mExecuteMany) {
    dbConnection.executeAsync(this.mAsyncSTMTS, this.mAsyncSTMTS.length);
    this.mAsyncSTMTS = [];  // empty the statements queue
}

4) If a db operation involves multiple SQL modification statements:
 - set this.mExecuteMany to true, then executeAsync() and empty this.mAsyncSTMTS first
 - continue modifying items as before. Now when addItem is called, the statement is not executed immediately because this.mExecuteMany is true.
 - at the end of the modifications, call dbConnection.executeAsync(this.mAsyncSTMTS, this.mAsyncSTMTS.length) to execute the modifications together, then empty this.mAsyncSTMTS. We can also wrap this in a try {} finally {} block to make sure this.mExecuteMany is reset to false, so that later addItem calls are again executed immediately.

This approach mostly uses mozIStorageConnection.executeAsync() as a batch processor and transaction manager.
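A small stand-in sketch of that queue-and-flush idea (Python; the member names mirror the invented mExecuteMany/mAsyncSTMTS above, and mozIStorageConnection.executeAsync is emulated by running the queued statements inside one transaction):

```python
import sqlite3

class QueuedWriter:
    """Collect write statements and flush them as one batch."""

    def __init__(self, conn):
        self.conn = conn
        self.execute_many = False  # analogue of this.mExecuteMany
        self.pending = []          # analogue of this.mAsyncSTMTS

    def queue(self, sql, params=()):
        self.pending.append((sql, params))
        if not self.execute_many:
            self.flush()           # immediate mode: execute right away

    def flush(self):
        with self.conn:            # one transaction for the whole batch
            for sql, params in self.pending:
                self.conn.execute(sql, params)
        self.pending = []          # empty the statements queue

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cal_properties (key TEXT, value TEXT)")
w = QueuedWriter(conn)
w.execute_many = True              # postpone execution, queue everything
for i in range(50):
    w.queue("INSERT INTO cal_properties VALUES (?, ?)", ("k%d" % i, "v%d" % i))
w.flush()                          # execute all modifications together
rows = conn.execute("SELECT COUNT(*) FROM cal_properties").fetchone()[0]
```

The real executeAsync would additionally run off the main thread; this sketch only shows the batching, not the asynchrony.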
Just before uploading this patch, I found that the latest comm-central calStorageCalendar has WAL enabled, and it solves my disk problem completely. Amazing - I should have found that out earlier!

Since I had finished this patch, I'm uploading it anyway. The executeAsync batch execution is as fast as advertised; it reduces disk activity to almost none, and most of the time is spent preparing statements.
Yes, WAL is enabled in Lightning 2.9 and newer with Bug 918175.
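For reference, the WAL effect is easy to demonstrate with the stdlib sqlite3 module: after PRAGMA journal_mode=WAL, each commit appends to a -wal log file instead of syncing a rollback journal plus the main database file, which is what made the frequent small commits above so expensive (table name invented for the demo):

```python
import os
import sqlite3
import tempfile

# WAL needs a file-based database; :memory: won't do.
path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # -> "wal"

conn.execute("CREATE TABLE cal_events (title TEXT)")
for i in range(200):
    conn.execute("INSERT INTO cal_events VALUES (?)", ("event %d" % i,))
    conn.commit()  # cheap under WAL: appends to cache.sqlite-wal
count = conn.execute("SELECT COUNT(*) FROM cal_events").fetchone()[0]
```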
(PMARTINAK is gone, so unassigned)

Fallen, is comment 53 worth pursuing?  
If not, perhaps Chen Wei could lend a hand in bug 501689?
Assignee: philippe.martinak → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(philipp)
Whiteboard: [not needed beta][no l10n impact][gs] → [not needed beta][no l10n impact][gs][patchlove]
I think for bug 501689 all we need to do is fix the tests; I've looked over the code many times and I think it should be OK. If Chen could take a look at the code to see if there are any potential issues, that would be nice.

If possible, I think we should still try to land the async storage patch on central/aurora ASAP so it is available for TB38. We should make sure that storage backups are made during the upgrade, and if it gets too risky, back it out.
Flags: needinfo?(weichen302)
Flags: needinfo?(philipp)
Flags: needinfo?(anderson)
Lightning has worked very well for me in the past year. No complaints. The last time I looked at the Lightning storage code was one year ago, so I am a bit rusty.

To me, WAL has offloaded the async part from Lightning to sqlite. Have there been any performance complaints recently?
Flags: needinfo?(weichen302)
I think that the tests have been fixed on bug 501689, and (hopefully) everything is working as it should.
Flags: needinfo?(anderson)
See Also: → 501689
Whiteboard: [not needed beta][no l10n impact][gs][patchlove] → [patchlove][calperf-key] [not needed beta][no l10n impact][gs]

Is this looking any better since 102.x?
