Closed Bug 1121415 Opened 9 years ago Closed 9 years ago

Replace timezones.sqlite with a non-binary format

Categories

(Calendar :: Internal Components, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
4.0.0.1

People

(Reporter: darktrojan, Assigned: darktrojan)

References

Details

Attachments

(1 file, 3 obsolete files)

For various reasons that we've discussed on IRC, and others, I'd like to replace the SQLite database of timezones with a JSON file. This will give us a proper change history and removes a binary blob from the XPI (once libical is gone, that means we can leave to XPI packed on install).
Attached patch 1121415-1.diff (obsolete) β€” β€” Splinter Review
This is a WIP patch. The new files are:

zones.py: extracts the timezone info from the vzic output
zones.json: output of zones.py
aliases.json: list of timezone aliases

The timezone service loads zones.json and aliases.json into the map mZones asynchronously, and zones are created from the map as needed. I haven't yet replaced the bit where a different timezones file can be loaded from the timezones add-on.

Everything else is just fixes to make it all work.
Assignee: nobody → geoff
Status: NEW → ASSIGNED
Attachment #8548842 - Flags: feedback?(philipp)
Comment on attachment 8548842 [details] [diff] [review]
1121415-1.diff

Review of attachment 8548842 [details] [diff] [review]:
-----------------------------------------------------------------

I suspect you've found this, but in case not here is a fork of the vzic tool that has been working a bit better for me: https://code.google.com/p/tzurl/

Generally I think this brings us into a good direction. I especially like there being the python script to generate the data, maybe it could be expanded to do more of the work (like downloading the latest tzdata and calling vzic/tzurl. It could also make sure the --pure version of timezones like Asia/Jerusalem is used without needing to muck around with the files every time.

You already mentioned this on IRC briefly so I assume you'll be taking the timezones extension into account? Maybe we should also revisit the option of reading timezone data from the OS where available, for example on Linux where zoneinfo is installed. The latter would be for a different bug though.

::: calendar/base/src/calTimezone.js
@@ +46,5 @@
> +        let displayName = this.tzid;
> +        try {
> +            displayName = bundle.GetStringFromName(stringName);
> +        } catch (e) {
> +            cal.WARN('No display name for ' + this.tzid);

I don't think we need to warn about this, otherwise we might have a lot of messages for locales that are missing them. Maybe just add a comment in the catch block why we are swallowing it.

::: calendar/base/src/calTimezoneService.js
@@ +77,5 @@
> +                return;
> +            }
> +
> +            let json = NetUtil.readInputStreamToString(inputStream, inputStream.available()).trim();
> +            for (let [tzid, data] of Iterator(JSON.parse(json))) {

I think |Iterator| is being phased out. Not sure what the suggested replacement is though.

@@ +197,4 @@
>      },
>  
>      get version() {
> +        return "2014j";

I think you may need to get this from the JSON file instead. This means we could also put aliases and ical data into one file:

{
   version: "2.2014j",
   other_metadata: "when its needed",
   zones: {
       ...
   },
   aliases: {
       ...
   }
}

What do you think?


Also, the version format 1.2014j was meant to take major updates into account, like this one. You'd be switching to 2.2014j now. (Or would it be 0.2.2014j, given the next patch chunk shows it that way?)
Attachment #8548842 - Flags: feedback?(philipp) → feedback+
(In reply to Philipp Kewisch [:Fallen] from comment #2)
> I suspect you've found this, but in case not here is a fork of the vzic tool
> that has been working a bit better for me: https://code.google.com/p/tzurl/

That's what is listed in the Makefile, so yes, that's what I used. The TZ data link needs updating though.

> You already mentioned this on IRC briefly so I assume you'll be taking the
> timezones extension into account? Maybe we should also revisit the option of
> reading timezone data from the OS where available, for example on Linux
> where zoneinfo is installed. The latter would be for a different bug though.

I've done this bit now, at least as far as making the timezone service check for it.

> @@ +197,4 @@
> >      },
> >  
> >      get version() {
> > +        return "2014j";
> 
> I think you may need to get this from the JSON file instead. This means we
> could also put aliases and ical data into one file:
> 
> {
>    version: "2.2014j",
>    other_metadata: "when its needed",
>    zones: {
>        ...
>    },
>    aliases: {
>        ...
>    }
> }
> 
> What do you think?

That's what I've done since this patch was posted.

> Also, the version format 1.2014j was meant to take major updates into
> account, like this one. You'd be switching to 2.2014j now. (Or would it be
> 0.2.2014j, given the next patch chunk shows it that way?)

The SQLite file has it as 1.2014b, so I guess 2.2014j is the way to go. I'm not sure why the locale file has an extra "0.".
Attached patch 1121415-inter-1.diff (obsolete) β€” β€” Splinter Review
This is just the changes since the first patch, since most of it would be the same. Notable changes:

* I'm now loading the zones into the timezone service with a promise chain, for code readability's sake.
* The Python script takes the non-pure version of the timezones and swaps in  the RRULEs from the pure version, if different. Is this valid? You can see the 7 timezones affected in the JSON file.

I think anyone using the script would have to at least have vzic compiled. The location of the TZ data is compiled in, which is annoying. I could make the script run vzic though (twice, once with --pure).
Attachment #8554134 - Flags: feedback?(philipp)
Comment on attachment 8554134 [details] [diff] [review]
1121415-inter-1.diff

Review of attachment 8554134 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Geoff Lankow (:darktrojan) from comment #4)
> Created attachment 8554134 [details] [diff] [review]
> 1121415-inter-1.diff
> 
> This is just the changes since the first patch, since most of it would be
> the same. Notable changes:
> 
> * I'm now loading the zones into the timezone service with a promise chain,
> for code readability's sake.
> * The Python script takes the non-pure version of the timezones and swaps in
> the RRULEs from the pure version, if different. Is this valid? You can see
> the 7 timezones affected in the JSON file.
So the ics changed is the pure output? I always thought that Asia/Jerusalem was at least 30 lines of ical data.

What we could consider is including both the pure and non-pure data to the zones file for zones where it differs. This way we could (in another bug) use the pure data for calculating events internally, and the non-pure version when exporting for brevity and/or compatibility with outlook. On the other hand, there is a notion towards omitting VTIMEZONE data in exports all together, and letting the client use their own definitions for Olson timezones.

Anyway, I think it would be valid to do what you did, its simple and saves us from figuring out manually which timezones might need pure data. If you think its worth including both data sets go ahead, otherwise stick with what you have.

> 
> I think anyone using the script would have to at least have vzic compiled.
That is totally fine.


> The location of the TZ data is compiled in, which is annoying. I could make
> the script run vzic though (twice, once with --pure).

Can't you pass --output-dir ? This overrides the compiled in value.

Given vzic is installed, I think the script should do everything else on its own: downloading the tzdata from its url if the files don't exist, unpacking them, running vzic pure and non-pure, and finally putting together the zones.json

::: b/calendar/base/src/calTimezoneService.js
@@ +79,5 @@
> +                    deferred.reject(status);
> +                    return;
> +                }
> +
> +                let jsonData = NetUtil.readInputStreamToString(inputStream, inputStream.available()).trim();

Why does it need to be trimmed? Does the JSON parser barf on extra whitespaces?

@@ +80,5 @@
> +                    return;
> +                }
> +
> +                let jsonData = NetUtil.readInputStreamToString(inputStream, inputStream.available()).trim();
> +                let tzData = JSON.parse(jsonData);

Might want to do this on a try/catch block in case there is a parsing error and then reject the promise.

Note that the new way to use promises is now:

return new Promise((resolve, reject) => {
    NetUtil.asyncFetch((inputStream, status) => {
        ...
        resolve(tzData);
    });
});

If you'd like to stay with deferreds, you need to import PromiseUtils.jsm.
Attachment #8554134 - Flags: feedback?(philipp) → feedback+
Attached patch 1121415-3.diff (obsolete) β€” β€” Splinter Review
Okay, I think this is reviewable. I haven't done anything to remove the old system yet, and there's no notes on how to run anything, apart from the usage information in zones.py (although I think that is reasonably good).
Attachment #8548842 - Attachment is obsolete: true
Attachment #8554134 - Attachment is obsolete: true
Attachment #8554148 - Flags: review?(philipp)
Comment on attachment 8554148 [details] [diff] [review]
1121415-3.diff

Review of attachment 8554148 [details] [diff] [review]:
-----------------------------------------------------------------

The js code looks fine, but the python script could use a little love. Normally I'd say we do it in a followup, but I think this is the kind of script that will never get changed until it breaks so I'd prefer to clean it up now.

::: calendar/base/src/calTimezoneService.js
@@ +100,5 @@
> +        }
> +
> +        fetchJSON("resource://" + resNamespace + "/timezones/zones.json").then((tzData) => {
> +            for (let [tzid, data] of Iterator(tzData)) {
> +                if (typeof data == "object") {

if data is null then it will also be "object". Sure you want to set that?

@@ +118,5 @@
> +
> +            return fetchJSON("resource://" + resNamespace + "/timezones/aliases.json");
> +        }).then((tzData) => {
> +            for (let [alias, real] of Iterator(tzData)) {
> +                this.mZones.set(alias, {real: real});

I think a key like "aliasTo" would be nicer, or just "alias":


{
  ...
  "Cologne/Karneval": { alias: "Europe/Berlin" }
}

what do you think?

@@ +171,5 @@
>  
>      // calITimezoneProvider:
>      getTimezone: function calTimezoneService_getTimezone(tzid) {
> +        if (tzid == "floating") return this.floating;
> +        if (tzid == "UTC") return this.UTC;

Do we need these checks? Shouldn't it be auto-resolved using the code below?

::: calendar/lightning/Makefile.in
@@ +105,5 @@
>  
>  # xxx todo: unless our packaging story is revised (bug 406579) we package up timezones.sqlite
>  libs::
> +	$(NSINSTALL) -m 0644 $(srcdir)/../timezones/zones.json $(FINAL_TARGET)/timezones
> +	$(NSINSTALL) -m 0644 $(srcdir)/../timezones/aliases.json $(FINAL_TARGET)/timezones

Not sure if I was unclear, sorry if it didn't come across: In comment 3 you said you had already taken care of what I mentioned before with having both aliases and real timezone in one file. Why do we have both zones.json and aliases.json then? I thought we could just use one file in total containing both aliases and zones?

::: calendar/timezones/zones.py
@@ +23,5 @@
> +    tzdata_download_path = tempfile.mktemp(prefix="zones")
> +    subprocess.check_call(["wget", TZDATA_URL, "-O", tzdata_download_path])
> +    tzdata_path = tempfile.mkdtemp(prefix="zones")
> +    sys.stderr.write("Extracting %s to %s\n" % (tzdata_download_path, tzdata_path))
> +    subprocess.check_call(["tar", "-xzf", tzdata_download_path, "-C", tzdata_path])

I would have expected use of httplib2 and tarfile here :-)

@@ +29,5 @@
> +else:
> +    tzdata_path = args.tzdata_path
> +
> +# Extract version number of tzdata files.
> +fp = open(os.path.join(tzdata_path, "Makefile"), "r")

Can you organize the toplevel code into functions a bit more?

@@ +89,5 @@
> +    data[filename[:-4]] = {
> +        "ics": ics_data
> +    }
> +
> +def read_dir(path, prefix=""):

Instead of using a global, do you think you could change read_dir to return the data?

@@ +101,5 @@
> +# Read zone files into our data.
> +read_dir(zoneinfo_path)
> +
> +tab = open(os.path.join(zoneinfo_path, "zones.tab"), "r")
> +for l in tab:

If you like you can use a context manager here, i.e.

with open(os.path.join(zoneinfo_path, "zones.tab"), "r") as tab:
    # do stuff with file
Attachment #8554148 - Flags: review?(philipp) → review-
Attached patch 1121415-4.diff β€” β€” Splinter Review
(In reply to Philipp Kewisch [:Fallen] from comment #7)
> I think a key like "aliasTo" would be nicer, or just "alias":

I've used aliasTo for now, so it's not confusing which is the alias and which is real. Considering using aliasOf, instead.

> Not sure if I was unclear, sorry if it didn't come across: In comment 3 you
> said you had already taken care of what I mentioned before with having both
> aliases and real timezone in one file. Why do we have both zones.json and
> aliases.json then? I thought we could just use one file in total containing
> both aliases and zones?

I missed some of what you asked for there. I don't really see too much advantage in having one file for everything, but I've done it that way now.

> ::: calendar/timezones/zones.py
> @@ +23,5 @@
> > +    tzdata_download_path = tempfile.mktemp(prefix="zones")
> > +    subprocess.check_call(["wget", TZDATA_URL, "-O", tzdata_download_path])
> > +    tzdata_path = tempfile.mkdtemp(prefix="zones")
> > +    sys.stderr.write("Extracting %s to %s\n" % (tzdata_download_path, tzdata_path))
> > +    subprocess.check_call(["tar", "-xzf", tzdata_download_path, "-C", tzdata_path])
> 
> I would have expected use of httplib2 and tarfile here :-)

Yeah, um, got a bit lazy. My Python-fu is rusty.



Since the previous patch I also:
* Took the schema version out of the comments in timezone.properties (it's not really relevant there, and besides it doesn't even match what's used elsewhere)
* Removed timezones.sqlite and the old update script
* Renamed zones.py to update-zones.py
Attachment #8554148 - Attachment is obsolete: true
Attachment #8555797 - Flags: review?(philipp)
Comment on attachment 8555797 [details] [diff] [review]
1121415-4.diff

Review of attachment 8555797 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good now. One minor thing, but setting r+ :)

::: calendar/timezones/zones.json
@@ +75,5 @@
> +    "utc": {
> +      "aliasTo": "UTC"
> +    }
> +  },
> +  "version": "2.2014j",

I'd suggest putting the version key at the top.
Attachment #8555797 - Flags: review?(philipp) → review+
(In reply to Philipp Kewisch [:Fallen] from comment #9)
> Comment on attachment 8555797 [details] [diff] [review]
> 1121415-4.diff
> 
> Review of attachment 8555797 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good now. One minor thing, but setting r+ :)
> 
> ::: calendar/timezones/zones.json
> @@ +75,5 @@
> > +    "utc": {
> > +      "aliasTo": "UTC"
> > +    }
> > +  },
> > +  "version": "2.2014j",
> 
> I'd suggest putting the version key at the top.

I would but Python sorts the keys (it's a complete mess otherwise). I could call it "_version", I suppose, but don't really want to.
Shipped it!
https://hg.mozilla.org/comm-central/rev/74916ecc3d77

(Also fixed to put version first, and to make test_gdata_provider.js work.)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.0
Depends on: 1128070
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: