Closed Bug 402518 Opened 17 years ago Closed 17 years ago

Enhance storage provider to store foreign timezones

Categories

(Calendar :: Internal Components, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: dbo, Assigned: dbo)

References

Details

Attachments

(1 file, 1 obsolete file)

The storage provider needs to self host foreign timezone definitions used in its stored calendar items. Foreign timezone definitions are not available via the (yet to come) reference timezone service (aka the Olson set).
Flags: wanted-calendar0.8+
Attached patch full patch (obsolete) — — Splinter Review
This patch handles timezone columns (the "_tz" ones) as a union {tzid, tz-definition}. This makes it possible to save arbitrary foreign timezones side-by-side in the same storage calendar, e.g. when importing from multiple sources. However we have to keep in mind that foreign timezones are more expensive due to the added parsing and calITimezone creation. I've added a small local cache, so timezones are not parsed over and over again. Moreover, I've removed to save the full timezone information with EXDATEs, coercing those to UTC|floating which is IMO ok. Finally, I've added a getIcsService helper function and consolidated the code base to use it.
Assignee: nobody → daniel.boelzle
Status: NEW → ASSIGNED
Attachment #294664 - Flags: review?(ctalbert)
Can you split of the part that deals with getIcsService? More then half the patch is cleanup, instead of fixing the bug. That's a bit too much imho.
Comment on attachment 294664 [details] [diff] [review]
full patch

Another nit:

>Index: calendar/providers/storage/calStorageCalendar.js
>+function calStorageTimezone(component_) {

The good old fashioned style is to use aComponent. Let's jsut use that, instead of inventing a new naming style.
Comment on attachment 294664 [details] [diff] [review]
full patch

Since you already started the review... please go on.
Attachment #294664 - Attachment description: patch → full patch
Attachment #294664 - Flags: review?(ctalbert) → review?(mvl)
Attached patch calStorageCalendar part — — Splinter Review
Attachment #294723 - Flags: review?(mvl)
Depends on: 410080
Attachment #294664 - Attachment is obsolete: true
Attachment #294664 - Flags: review?(mvl)
Comment on attachment 294723 [details] [diff] [review]
calStorageCalendar part

>+function stripTimezone(dt) {
as discussed on irc: change into getInUtcOrKeepFloating

>+function calStorageTimezone(component_) {
Add a comment that this is an implementation of calITimezone.
And does it need a QueryInterface function? (Maybe xpconnect takes care of that magic, I'm not sure)

>     setDateParamHelper: function (params, entryname, cdt) {
>+                params[entryname + "_tz"] = 

Storing the timezone defintion in each and every item that came from a different client goes directly against my idea of how a database should be normalized. Adding a table would be much nicer. The problem of unused timezeons after deletion can be solved by a smart prune query. I think those queries won't be too slow, and can be done when idle.
(In reply to comment #6)
> Storing the timezone defintion in each and every item that came from a
> different client goes directly against my idea of how a database should be
> normalized. Adding a table would be much nicer. The problem of unused timezeons
> after deletion can be solved by a smart prune query. I think those queries
> won't be too slow, and can be done when idle.

We need to be prepared that multiple calendars (or iMIP invitations) from different sources enter the same storage calendar and it's likely that TZIDs collide. I don't think that holding them in a separate table is that cheap if you can't use the TZID as key (especially w.r.t. to performance). Finally, IMO the timezone definitions are not that big, so I think holding them in a separate table don't buy us much.
How often will it happen that there's different definitions with the same TZID? And is it really neccasserry to delete unused timezones from a database? Furthermore, doesn't this idea collide with the patch ctalbert is (was) working on in bug 314339? Imho (I've seen a lot of bugreports for foreign timezones) it should be enough to check wether a foreign timezone is already in the table and perhaps check for the last-modified date of the timezone (either in the LAST-MODIFIED property or in the TZID). If the timezone-definitions aren't that big, perhaps it's easier to store them with each item, not only foreign ones?
(In reply to comment #8)
> How often will it happen that there's different definitions with the same TZID?
IIRC Clint mentioned that Outlook comes with the same TZID but different definitions from different Outlook versions.

> And is it really neccasserry to delete unused timezones from a database?
Yes, IMHO it is.

> Furthermore, doesn't this idea collide with the patch ctalbert is (was) working
> on in bug 314339? Imho (I've seen a lot of bugreports for foreign timezones) it
Clint and I have overcome the idea to store the foreign timezones in a central database for the mentioned reasons: no solution for clashing TZIDs and no (easy) possibility to purge out the unused ones. Result of that discussion is this bug (providers should store the foreign timezones by themselves, see bug 314339 comment #49).

> big, perhaps it's easier to store them with each item, not only foreign ones?
That way we decouple our internal Olson set from being upgradable.
(In reply to comment #7)
> We need to be prepared that multiple calendars (or iMIP invitations) from
> different sources enter the same storage calendar and it's likely that TZIDs
> collide. I don't think that holding them in a separate table is that cheap if
> you can't use the TZID as key (especially w.r.t. to performance).

I don't understand that. You can use a simple integer as key, nobody said it _has_ to be the tzid. Why would that hurt performance? A inner-join query won't be that much slower than a normal one.

> Finally, IMO
> the timezone definitions are not that big, so I think holding them in a
> separate table don't buy us much.

Well, if you import a reasonably-sized calendar, you can get the same information a lot of times. Or if you regularly deal with invitations coming from other apps.
(In reply to comment #10)
> I don't understand that. You can use a simple integer as key, nobody said it
> _has_ to be the tzid. Why would that hurt performance? A inner-join query won't
> be that much slower than a normal one.
Hmm, please ignore my ignorance (I am no SQL wizard), but I can imagine
- this leads to even longer queries
- this leads to multiple result rows (for individual events) in the case that start and end both refer to foreign timezones.
- slower queries, because I still think inner joins have O(log n) lookups
I suspect this leads to significant overhead (both w.r.t code and performance). Why should we create another table indirection for some simple (not overly lengthy) strings? I don't think that's sensible.

> > Finally, IMO
> > the timezone definitions are not that big, so I think holding them in a
> > separate table don't buy us much.
> 
> Well, if you import a reasonably-sized calendar, you can get the same
> information a lot of times. Or if you regularly deal with invitations coming
> from other apps.
As stated, I doubt that these short strings overly count.
This blocks 0.8, because it blocks bug 314339.
Flags: blocking-calendar0.8+
IMO we need to proceed on this bug since it's (part of) of a top-prio blocker. Any more opinions on the discussed stuff, maybe from Clint/Philipp/...?
(In reply to comment #11)
> As stated, I doubt that these short strings overly count.

From a quick calculation, importing my current copy of campuscalendar.ics would take about 140kb of duplicated timezone strings. I don't know if sqlite compresses that.
(In reply to comment #14)
> From a quick calculation, importing my current copy of campuscalendar.ics would
> take about 140kb of duplicated timezone strings. I don't know if sqlite
> compresses that.
Even if not, I doubt such sizes are relevant on modern computers. Don't get me wrong: I don't want to avoid saving bytes, but I fear an extra indirection will hurt performance.
(In reply to comment #9)
> IIRC Clint mentioned that Outlook comes with the same TZID but different
> definitions from different Outlook versions.

How would that be handled when the calendar is serialized to ICS? (maybe for export) DO we need to change the duplicated tzid's?
Exporting the whole storage calendar (containing clashing TZIDs) as ICS will currently purge out definitions (calIcalComponent::AddTimezoneReference). We would need to create unique TZIDs before exporting the data to avoid that. However, strictly thought this is a falsification of the original data...
I just noticed that currently, a TZID=America/New_York gets mapped to out initial timezone. I'd say that that's the right thing to do, because you get upgrades if the timezone definition ever changes. That's obviously what the user wants.
How does the patch deal with that?
I think what we discuss is beyond the scope of this bug/patch. With this patch the storage provider stores all related tz definitions (regardless of their TZID) at item level, so there's no loss when multiple items share the same TZIDs. Using the ICS service/serializer for serializing into an ICS file we can face this problem (comment #17), because the VTIMEZONEs are organized by TZID. That way, IMO it's the scope of the ICS service to handle that situation correctly.
I think this patch does exactly what it needs to do.  It stores the foreign timezones properly and ensures that each event contains a reference to its specific foreign timezone.  I think that the considerations of exporting to an ICS file, should work as follows:

1. On export each event is written out
2. As the event is written out, the associated VTIMEZONE is added to the VCALENDAR.
2a. If a VTIMEZONE block already exists with this tzid (from step 2) then we do NOT duplicate the VTIMEZONE.
2b. If the VTIMEZONE block does not exist with the tzid (from step 2) then it is added.  

This is the only way that every event will continue to reference a proper VTIMEZONE field.  However, I think that the exporting issue should be a separate bug, and can be dealt with during its own cycle.  Let's just focus on the storage provider piece for now.

As far as database growth due to adding more information into the sqlite tables is concerned, I don't think this is really our problem.  The fact that sqlite TEXT column types are unlimited is a well known and often utilized feature of that database engine.  We can depend (and I argue, we must depend) on the developers of sqlite to store that data in an optimal fashion.  Speaking from experience as a database engineer (Pervasive Software, 7 years), that is always something foremost in their minds: how to be as reliable and optimized as possible.  I don't find any mention of the details of how they store text, I'm sure I could go read the code if I wanted to.  But, I think we can trust that these individuals are going to do the job for us.  And I say we should let them. We have other things to worry about.

A couple of other notes I like about this patch:
* It doesn't change the database structure.  This is awesome!  We always incur a huge headache in terms of QA and in terms of difficulty when we change the database schema and have to code a database migration patch.
* The only other way to do this would be to add a separate timezone table, then every query to fetch an event would also have to fetch a timezone:
SELECT * FROM events, timezones WHERE events.eventID = timezones.eventID
Then you also have to wonder at the other things that can take timezones: alarms, recurrence ID's.  You'd have to perform this join for each of those items, as you can't depend on all the tzid's being the same.  This would have a much much larger performance/overhead impact than would storing extra data in tzid fields.  Joins are done by creating temporary tables either in memory (or on disk if the set is large) and then presenting that table as the result set of the sql query.  This would always be more intensive an operation than pulling the data from a specific field in the database.

So, from my experience writing a similar patch (bug 314339) and from working in the Db world, I really recommend that we take Daniel's approach here.

I also recommend that we file a follow on bug for the ICS issue, and make sure that gets solved properly there.
Comment on attachment 294723 [details] [diff] [review]
calStorageCalendar part

Mvl, no hard feelings please, but we really need to move on and get this tested for 0.8...

Reassigning review to Clint.
Attachment #294723 - Flags: review?(mvl) → review?(ctalbert)
Flags: in-testsuite?
Comment on attachment 294723 [details] [diff] [review]
calStorageCalendar part

r=ctalbert.  Looks good.
Attachment #294723 - Flags: review?(ctalbert) → review+
renamed:
- stripTimezone => getInUtcOrKeepFloating
- component_ => comp

Checked in on HEAD and MOZILLA_1_8_BRANCH.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Target Milestone: --- → 0.8
Flags: wanted-calendar0.8+
Checked in latest nightly build -> task is fixed and verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: