Closed Bug 479867 Opened 15 years ago Closed 14 years ago

Cached calendars don't set id correctly, causing duplicate events to be shown for multiple cached calendars

Categories

(Calendar :: Internal Components, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Fallen, Assigned: Fallen)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [needed beta][no l10n impact][STR in comment 8])

Attachments

(2 files, 4 obsolete files)

I'm surprised this didn't show up earlier, but here:

http://hg.mozilla.org/comm-central/annotate/3b7f1a26cde5/calendar/base/src/calCachedCalendar.js#l205

we only set ?id=NNN if the current calendar uri also contains it. This means for most calendars, we don't explicitly set a calId, so this.mCalId is always 0 for storage calendars created here.

This means all cached calendars will claim to have the same set of items.

The solution could be to make the calId column in the database a TEXT field and instead use |this.id|. This means we either have to do some tricky migration, or we have to let the ?id= trump over this.id and tell users to clear their cache file in the next version, which also kind of sucks...
Flags: blocking-calendar1.0?
Severity: normal → major
Attached patch WiP Patch - v1 (obsolete) β€” β€” Splinter Review
This is the quick-hack solution, which we obviously shouldn't do, but I wanted to attach it to get some discussion going.

What this patch is missing is to do a database upgrade to convert all cal_id INTEGER colums to TEXT columns, and then use this.id instead of this.mCalId. This requires the tricky migration I'm talking about.

The alternative is to prefer ?id=NNN over this.id, but then we'd need to do some extra migration for calendars that pass ?id=NNN when the id is set. This makes the code look ugly, even more ugly than it already is.

I think we should move the updateDB code to a new file.
Attached patch Fix - v2 (obsolete) β€” β€” Splinter Review
This should take care. It upgrades the database to use a TEXT field and also uses :cal_id as a parameter since this.id might be quite arbitrary and we should rather escape it.
Assignee: nobody → philipp
Attachment #363772 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #363849 - Flags: review?(dbo.moz)
Comment on attachment 363849 [details] [diff] [review]
Fix - v2

Asking for additional review
Attachment #363849 - Flags: review?(ssitter)
Since cal_id is pretty much always involved when querying the database, I suspect that using a TEXT column slows down performance.
What about an additional table that maps UUID calids to the internally used INT cal_ids? I am uneasy about large schema changes which had lead to regressions quite too often.
This means we have to keep track of both fields somehow. We also have to find the next available ID for a new calendar, with the TEXT field we get rid of that need.

Maybe we can hash down the calendar id to something smaller that fits into an INT field, but this could cause collisions.

Remember also that we aren't really using indexes, so I'm not sure how much of a performance loss this would be.


I think we should fix this bug before the release since it will break cached calendars completely and I can imagine at least one party that would frown upon that.
Flags: blocking-calendar1.0? → blocking-calendar1.0+
Confirmed as regression.

Works in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090108 Calendar/1.0pre (BuildID=20090108032337)

Fails in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090109 Calendar/1.0pre (BuildID=20090109032231)

Checkins during regression range: https://hg.mozilla.org/comm-central/pushloghtml?startdate=2009-01-08+03:23:37&enddate=2009-01-09+03:22:31

Most probably regressed by Bug 403006.
Blocks: 403006
Keywords: regression
(In reply to comment #5)
> This means we have to keep track of both fields somehow. We also have to find
> the next available ID for a new calendar, with the TEXT field we get rid of
> that need.
I suspect I don't understand your comment. The cal_id would serve as a primary/unique incrementing column in that additional table (e.g. cal_calendars). What's the problem? Adding another table seems far simpler to me, leaving the rest of the code as is.

> I think we should fix this bug before the release since it will break cached
> calendars completely and I can imagine at least one party that would frown upon
> that.
I agree.
As written in comment #6 this seems to be regressed by Bug 403006. Can it be fixed by a small regression fix instead of a big database change?

I used the following steps to reproduce the issue:
1. Start Sunbird with clean profile
2. Subscribe three different calendars from
   <http://www.mozilla.org/projects/calendar/holidays.html>
3. Check the events in the main calendar view
4. Enable cache for all calendars and restart Sunbird
5. Check the events in the main calendar view again
Whiteboard: [STR in comment 8]
(In reply to comment #7)
> (In reply to comment #5)
> > This means we have to keep track of both fields somehow. We also have to find
> > the next available ID for a new calendar, with the TEXT field we get rid of
> > that need.
> I suspect I don't understand your comment. The cal_id would serve as a
> primary/unique incrementing column in that additional table (e.g.
> cal_calendars). What's the problem? Adding another table seems far simpler to
> me, leaving the rest of the code as is.

I think I understand now what you suggest. This sounds like quite a hack to me though, we have an extra table that almost acts as a calendar registry. I guess I could produce a such patch to fix the issue for now to avoid regressions, but do we really want to do this?

I agree the above patch is quite invasive though.
(In reply to comment #9)
> I think I understand now what you suggest. This sounds like quite a hack to me
I don't think it's hack but just another relation. Textual UUID are not that handy as int calids are, and if it's easy to join them in, I'd opt to use them. I don't see the need for such an invasive change, and schema changes have hurt us badly in the past.
Attachment #363849 - Flags: review?(dbo.moz)
Attachment #363849 - Flags: review?(ssitter)
Comment on attachment 363849 [details] [diff] [review]
Fix - v2

Removing review request until discussion about final solution is finished.
I looked into this solution, but I'm not quite sure it will work out: How do we correctly migrate calenders?

table:
"  cal_id    INTEGER," +
"  cal_uuid  TEXT"

Calendars with ?id=N need to have the cal_id field set to that id. Those without (i.e new calendars or the cached calendars) won't have an id, so the maximum value or auto incremented value will be used. Now picture this:


register moz-profile-calendar://?id=1    -> id 1
register moz-profile-calendar://         -> id 2
register moz-profile-calendar://?id=2    -> !! Overwrite id 2

Its quite possible that this condition won't happen, but it seems like a quite fragile process to me...
(In reply to comment #12)
> I looked into this solution, but I'm not quite sure it will work out: How do we
> correctly migrate calenders?
> 
> table:
> "  cal_id    INTEGER," +
> "  cal_uuid  TEXT"
> 
> Calendars with ?id=N need to have the cal_id field set to that id. Those
> without (i.e new calendars or the cached calendars) won't have an id, so the
> maximum value or auto incremented value will be used. Now picture this:

I think there's not problem migrating existing calendar. If there's no id encoded into the url, this.mCalId is and has been set to 0, i.e.

register moz-profile-calendar://?id=1    -> id 1
register moz-profile-calendar://         -> id 0
register moz-profile-calendar://?id=2    -> id 2

Before creating new storage calendars, the calendar creation wizard currently checks for an empty id slot. We should change that code to just create a UUID and encode it into the url, like &id=uuid.

W.r.t. cached calendars the UUID should be shared, e.g.
remote calendar id: uuid_a
=> storage calendar url: &id=uuid_a

Does this make sense?
This is the status of what will happen *with* the patch, not without.

register moz-profile-calendar://?id=1    -> calid 1
register moz-profile-calendar://         -> calid 2
register moz-profile-calendar://?id=2    -> !! Overwrite calid 2

Even if I let new calendars use moz-profile-calendar://?id=<uuid> then we still have the problem:

register moz-profile-calendar://?id=1       -> calid 1
register moz-profile-calendar://?id=<uuid>  -> calid 2
register moz-profile-calendar://?id=2       -> !! Overwrite id 2

Of course I could use a hashing algorithm similar to the uuid generation for the maximum length of what the calid field can do. I still think this mapping is quite hacked.

Daniel, do you want to take over this bug?
(In reply to comment #14)
> Daniel, do you want to take over this bug?
I try to find some time.
Assignee: philipp → dbo.moz
Whiteboard: [STR in comment 8] → [not needed beta][no l10n impact][STR in comment 8]
I've asked my way around #sqlite regarding performance and I've found out that the physical layout of the file is still row-id based, i.e the PRIMARY KEY internally maps to the row id, so if we make the primary key a TEXT field, there will still only be one indirection.

In any case, the performance decrease should only be minimal.
Is there any progress on this? Because the interaction of this with bug 523987 makes alarms quite unusable. I would happily take a performance decrease over the (very high) possibility of missing alarms.
Sorry, I currently don't have any bandwidth to do this.
Assignee: dbo.moz → nobody
Status: ASSIGNED → NEW
A testing server (caldav one) has been set-up. It could help most of sunbird developpers to test 

See explanation on http://caldav-test.ioda.net
Attached patch patch - interim no cache (obsolete) β€” β€” Splinter Review
Since this bug bugs people having the experimental cache switched on I propose to simply switch off using it until this bug is thoroughly fixed.
Attachment #416880 - Flags: review?(philipp)
Oh please, no! Would it be difficult to make the cache flag only available for a single calendar - working like a radio button?
Caching is by default not switched on for any calendar since it's still experimental. Users need to explicitly switch it on for a calendar via the calendar properties (a radio button). But since caching is currently not working correctly, I'd like to keep it off until this bug is fixed.
Please do not disable caching possibility. Seems to me a lazy solution. The problem appears only if user switches caching on, default is off.
And more, it appears to be non working only if user set more than one calendar.

Perharps better ( if not fixing id in cache ) to make a string telling that only one calendar should be locally cached.
Comment on attachment 416880 [details] [diff] [review]
patch - interim no cache

I don't think we should disable the cache either. I agree the cache is quite unusable with multiple calendars, but we shouldn't disable it completly.

We can't add strings this late in the release process, but maybe we can check if there is already a cached calendar registered. This isn't the best user experience, but it is better than showing duplicate events.

What do you think?
Attachment #416880 - Flags: review?(philipp) → review-
I was not thinking it was possible, but I definitively vote for. 
In this case ( and the #530423 ) I can always but the biggest schedule in cache, to optimize time access.
(In reply to comment #31)
> I was not thinking it was possible, but I definitively vote for. 
> In this case ( and the #530423 ) I can always but the biggest schedule in
> cache, to optimize time access.
Sorry, we can't guarantee the largest schedule. Just the first that is configured by the calendar manager. Everything else would be a lot of unneeded work.
(In reply to comment #30)
> ... maybe we can check
> if there is already a cached calendar registered. This isn't the best user
> experience, but it is better than showing duplicate events.

This is what I meant when I said "make the cache flag only available for
a single calendar - working like a radio button?" in comment 26.

The possibilities would be to
a) disable the option if another calendar is already cached, or
b) deactivate caching for the other calendar and cache the currently edited instead.

Option a) would likely be doable without strings changes.
(In reply to comment #30)
> (From update of attachment 416880 [details] [diff] [review])
> I don't think we should disable the cache either. I agree the cache is quite
> unusable with multiple calendars, but we shouldn't disable it completly.

I don't understand this reasoning. Please mind that this regression has yet produced a lot of duplicate bugs and much communication around it. I can imagine when beta comes this would be even more. IMHO release notes don't help much since most users don't read them.

> We can't add strings this late in the release process, but maybe we can check
> if there is already a cached calendar registered. This isn't the best user
> experience, but it is better than showing duplicate events.
> 
> What do you think?

I think we should fix this bug instead of adding further UI. But for the moment: If we know that caching has flaws I'd vote to disable it until it's fixed to not lead users into trouble. Don't get me wrong, this needs to be fixed ASAP.
> (In reply to comment #31)
sorry that's what I want to say. enable only one schedule ( the size is my problem :-) 

(In reply to comment #32)
Don't disable cache in we cannot render calendars quickly. 
How would you resolve the offline use of remote schedules ?
Although I haven't found a real benchmark suite for sqlite, I did some bash scripting foo together with setting ".timer on" in the sqlite3 shell, and found that over 1000 queries including cal_id, there is no notable difference between using cal_id as a TEXT field and what we have now.

Therefore I think we should go ahead and do it. I'll unbitrot this patch soon, I'd like to see this fixed.
Assignee: nobody → philipp
Attached patch Fix - v3 (obsolete) β€” β€” Splinter Review
I'm not totally happy with the conversion mechanism, but I see no better alternative right now. To migrate calendars to the correct ID, we need to have it available. This is not the case when running the normal upgraders.

I was thinking maybe we could introduce another level of upgraders that runs after the calendar is fully initialized, but this will make the process even more confusing. We could also postpone the complete upgrade process until we have an id and uri set, this might work.

What this patch doesn't do is drop the events that are already borked. While they are easy to spot (they all have the calendar id "0"), I'm a bit uncomfortable with dropping all events with an id "0", as this might cause dataloss in unusual cases. For the record, in my development profile my database has two events in local.sqlite that also have id 0, I have no idea how they got there.

The way we have the cache implemented, its impossible to make this upgrade dependent of if the calendar in question is on local.sqlite or cache.sqlite without breaking the encapsulation, so we can't make removing events with id 0 dependent on the calendar either.

I could imagine making a check for this part of the fix database extension created in a previous bug though, we could offer this to users that experience slowness, or maybe very large local/cache.sqlite files.

This patch needs the patch to bug 546300 applied to apply cleanly. Stefan, since you did such a marvelous job reviewing bug 470430, I'd appreciate if you could also review this patch.
Attachment #363849 - Attachment is obsolete: true
Attachment #416880 - Attachment is obsolete: true
Attachment #426988 - Flags: review?(ssitter)
Status: NEW → ASSIGNED
Comment on attachment 426988 [details] [diff] [review]
Fix - v3

Also requesting review from simon since ssitter has many items in his queue. Stefan, if you don't mind I'll check this in as soon as simon finds time for review.
Attachment #426988 - Flags: review?(simon.at.orcl)
Attached patch debitotted patch β€” β€” Splinter Review
I debitrotted the patch (Hopefully without making a mistake).

Here is what I did, without the patch:
- I added 3 calendars each with 2 or 3 meetings
- I enabled the cache and restarted, noticed everything was messed up.

Then I installed the patch and restarted:
2 of my 3 calendars were fine but one of them did not show anything, here are the relevant console messages:
Storage: Migrating numeric cal_id to uuid
Storage: Preparing to upgrade v18 to v19
Storage: Upgrading to v19
Storage: Preparing to upgrade v18 to v19
Storage: Upgrading to v19
Error: Error selecting item by id bcecc37d-c494-484c-a2eb-95262e8a4d34!
[Exception... "Component returned failure code: 0x8052000e (NS_ERROR_FILE_IS_LOCKED) [mozIStorageStatementWrapper.step]"
  nsresult: "0x8052000e (NS_ERROR_FILE_IS_LOCKED)"  location: "JS frame :: file:///C:/Documents%20and%20Settings/svailla
n/Application%20Data/Thunderbird/Profiles/epplsn6t.tb3/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/modules/cal
Utils.jsm -> file:///C:/Documents%20and%20Settings/svaillan/Application%20Data/Thunderbird/Profiles/epplsn6t.tb3/extensi
ons/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/calendar-js/calStorageCalendar.js :: cSC_getItemById :: line 1691"  data:
 no]
DB Error: database table is locked
Error: Error selecting item by id bcecc37d-c494-484c-a2eb-95262e8a4d34!
[Exception... "Component returned failure code: 0x8052000e (NS_ERROR_FILE_IS_LOCKED) [mozIStorageStatementWrapper.step]"
  nsresult: "0x8052000e (NS_ERROR_FILE_IS_LOCKED)"  location: "JS frame :: file:///C:/Documents%20and%20Settings/svailla
n/Application%20Data/Thunderbird/Profiles/epplsn6t.tb3/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/modules/cal
Utils.jsm -> file:///C:/Documents%20and%20Settings/svaillan/Application%20Data/Thunderbird/Profiles/epplsn6t.tb3/extensi
ons/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/calendar-js/calStorageCalendar.js :: cSC_getItemById :: line 1706"  data:
 no]
DB Error: database table is locked
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8052000E: file c:/cygwin/home/svaillan/mozsrc/tb/comm-central/mo
zilla/storage/src/mozStorageStatement.cpp, line 702
Attachment #426988 - Attachment is obsolete: true
Attachment #434655 - Flags: review?(philipp)
Attachment #426988 - Flags: review?(ssitter)
Attachment #426988 - Flags: review?(simon.at.orcl)
(Disclaimer: I have not looked at the patch yet.)

If updating the cache database file is an issue or complicated, why not just remove it completely and recreate it during the next sync operation?
I agree with Stefan comment #43, it would also have the benefit of "fixing" old client/server sync bugs that might have happened during over time...
The problem might actually be the locks. I'm not sure if there is an easy way to delete the database during upgrade, because a connection might already be open. See the complicated upgrade code between storage.sdb and local.sqlite in the calendar manager.
Reads like a race condition. I can imagine it helps to use an exclusive transaction lock (on local.sqlite) while running both id=>uuid conversion and upgradeDB().
If it's a race condition it doesn't seem recuperate from it, ie: once I get a "database locked" message all subsequent requests return the same error.. I noticed that many statements were not "reset" in a finally block but even after fixing all of them, the error still happens. I also tried to figure out what could leave the database in a "locked" state without any luck...

Note that I don't think the patch is the cause of the error I got, I could reproduce it without the patch by:
1) Create 3 calendars each with ~5 meetings
2) Enable cache on all 3 calendars
3) Exit thunderbird and backup the profile for easy re-test
4) Start thunderbird 

I don't mind investigating this further, if anyone has suggestions let me know. I don't see any errors logged so it's pretty hard to identify the initial call that leaves the db in a locked state.
Attached patch debitrotted + lock β€” β€” Splinter Review
Simon, though I haven't tested this patch further, could you give it a try on windows? thanks! It still has the upgrade step out of the exclusive lock section; this may be something to experiment further.
Attachment #434655 - Flags: review?(philipp)
Attachment #438055 - Flags: review?(simon.at.orcl)
Comment on attachment 438055 [details] [diff] [review]
debitrotted + lock

The errors are gone and the cache seems to be getting updated properly when doing test case I described in a previous comment.
Attachment #438055 - Flags: review?(simon.at.orcl) → review+
Pushed to comm-central <http://hg.mozilla.org/comm-central/rev/ba2e735f04b2>
-> FIXED
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.0b2
(In reply to comment #50)
> -> FIXED

Does this mean we can now turn on the "Cache" for our multiple calendars?
Depends on: 561026
Depends on: 561735
(In reply to comment #51)
> Does this mean we can now turn on the "Cache" for our multiple calendars?

Yes, aside from the regressions :-)
(In reply to comment #52)
> (In reply to comment #51)
> > Does this mean we can now turn on the "Cache" for our multiple calendars?
> Yes, aside from the regressions :-)

I don't know if it's related, but Thunderbird is now crashing whenever I try to open the Lightning tab. :-\
It's not related. It's caused by a change in the Toolkit part in Shredder that breaks Lightning, see Bug 561649.
When can we expect this fix pushed to a Lightning update please?  Many thanks for all the hard yards:)
Whiteboard: [not needed beta][no l10n impact][STR in comment 8] → [needed beta][no l10n impact][STR in comment 8]
Same question from me. Currently using Lightning 1.0b1 and still encounter same problem.
As you can see read from the status fields above this bug is fixed for Lightning 1.0b2. It will be released together with Thunderbird 3.1. The current test build 3 for Lightning 1.0b2 can be downloaded from <http://weblogs.mozillazine.org/calendar/>
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: