Bug 403222 - Pattern matching / natural language parsing in e-mail for dynamic event creation
Status: RESOLVED FIXED
Keywords: student-project
Product: Calendar
Classification: Client Software
Component: Lightning Only
Version: Trunk
Platform: All All
Importance: -- enhancement with 2 votes
Target Milestone: 2.6
Assigned To: Merike (:merike)
URL: http://merike.github.com/event-extract/
Duplicates: 412956 481196 482842
Depends on: 442003
Blocks: 886124
Reported: 2007-11-09 10:25 PST by stephan.rubin
Modified: 2013-06-23 21:14 PDT (History)


Attachments
GUI Mockup (195.92 KB, image/jpeg)
2007-11-09 10:47 PST, stephan.rubin
no flags
part one (49.63 KB, patch)
2012-12-30 08:19 PST, Merike (:merike)
no flags
part one (59.02 KB, patch)
2013-01-03 11:24 PST, Merike (:merike)
philipp: review-
part one - v2 (64.67 KB, patch)
2013-04-14 02:58 PDT, Merike (:merike)
philipp: review+
part one - v2 with improved comments (65.91 KB, patch)
2013-06-22 12:57 PDT, Merike (:merike)
no flags

Description stephan.rubin 2007-11-09 10:25:33 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Build Identifier: version 2.0.0.6 (20070728)

I think it would be great to see a basic pattern matching engine in the Lightning codebase that scans incoming emails for possible event info. Upon reading the e-mail, it would attempt to dynamically construct an event instance from that information, which could then be changed/customized if necessary and added to one of the user's Lightning calendars. Google Calendar has functionality similar to this. Mockup will be forthcoming.

Comment 1 stephan.rubin 2007-11-09 10:47:02 PST
Created attachment 288022 [details]
GUI Mockup

One possible mockup of what the prompt could look like. This would maximize the reuse of existing UI code.
Comment 2 Pete Riley 2007-11-11 21:57:51 PST
Such a parsing engine could also be used in the Today Pane -- events could be quickly created by typing in a textfield instead of opening the event dialog.

(The wiki screenshots show that there is already a plan to have such a textfield for tasks, but not for events).
Comment 3 Philipp Kewisch [:Fallen] 2008-05-12 04:35:07 PDT
Will be taken care of in 2008 Summer of Code.

http://wiki.mozilla.org/Community:SummerOfCode08#Calendar
Comment 4 Philipp Kewisch [:Fallen] 2008-05-12 08:56:06 PDT
*** Bug 412956 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Sitter 2008-10-10 11:24:45 PDT
(In reply to comment #3)

When can we expect the first results from the GSoC 2008 project?
Comment 6 Daniel Boelzle [:dbo] 2008-10-10 11:37:21 PDT
AFAIK that project has been stopped.
Comment 7 Stefan Sitter 2009-03-03 09:41:52 PST
*** Bug 481196 has been marked as a duplicate of this bug. ***
Comment 8 Philipp Kewisch [:Fallen] 2009-06-14 05:59:07 PDT
This might make a very interesting student-project. From what I know, there is a regex based parser that can only do en-us in the spicebird project. This does not mean that the parser implemented here needs to be regex based though.

To make this part of the core, the parser needs to be:
* localizable (i.e. can also understand other installed languages)
* easily extendable (i.e. after adding basic constructs like "Get Drunk on July 4th", make it possible to extend the parser to also understand other constructs that add information, like "Get Drunk every July 4th")

Depending on how much time the student has, I'm sure that implementing even more features might also make this a nice diploma or bachelor/master thesis project.
Comment 9 cmtalbert 2009-06-15 11:05:25 PDT
(In reply to comment #8)
> This might make a very interesting student-project. From what I know, there is
> a regex based parser that can only do en-us in the spicebird project. This does
> not mean that the parser implemented here needs to be regex based though.
> 
> To make this part of the core, the parser needs to be:
> * localizable (i.e. can also understand other installed languages)
> * easily extendable (i.e. after adding basic constructs like "Get Drunk on July
> 4th", make it possible to extend the parser to also understand other constructs
> that add information, like "Get Drunk every July 4th")
An interesting way to get started here might be to codify verbs, nouns and actions the way the Ubiquity project in Mozilla Labs does. It's a natural language based command system that is being localized into three different languages now, and they are trying to make that process as easy as possible.
Comment 10 Markus Adrario [:Taraman] 2009-07-01 08:42:36 PDT
We also already have a language parsing module!
It's in the DateTimePicker:
http://mxr.mozilla.org/comm-central/source/calendar/resources/content/datetimepickers/datetimepickers.xml#151

Date literals can also be parsed.
The function ParseDateTime takes dates in various formats:
http://mxr.mozilla.org/comm-central/source/calendar/resources/content/datetimepickers/datetimepickers.xml#1417

Maybe we can make this code available for use everywhere in Calendar...
Comment 11 Philipp Kewisch [:Fallen] 2009-11-07 10:45:09 PST
anirvana has decided to work on this bug, mostly starting December. I'm glad to hear this bug is getting some attention; it would greatly improve productivity. Please coordinate with anirvana if you would like to contribute to this bug.
Comment 12 anirvana 2009-11-09 09:40:09 PST
I think there is a very simple mapping between the keywords (to be sought) in the email and the event to be triggered. We don't need a full-fledged language parser for that, as we are looking only for those keywords which have a corresponding event linked with them. Implementing it for various languages is going to be challenging.
Comment 13 Nipun 2009-11-13 20:42:06 PST
Is there any other software/calendar manager which implements the same, be it open source or closed?
Comment 14 anirvana 2009-11-13 21:30:11 PST
Google Calendar has functionality similar to this.
Comment 15 Philipp Kewisch [:Fallen] 2009-11-14 07:33:44 PST
As mentioned, Spicebird has its event filter as an extension. See https://svn.spicebird.org/viewvc/collab/trunk/extensions/event-filter/
Comment 16 Philipp Kewisch [:Fallen] 2010-03-05 07:18:08 PST
*** Bug 482842 has been marked as a duplicate of this bug. ***
Comment 17 anirvana 2010-04-15 20:31:47 PDT
I was thinking of using an auto-complete text box for event/task 'venue' and 'purpose' whose values would be filled by suggestions from the parser. For example, if somebody writes "Let's have a party today, 9:30 at my home", the parser would extract the date and time directly, and the auto-suggest text box for 'purpose' would suggest 'party', 'a party' etc. Similarly, the auto-suggest text box for 'venue' would suggest 'home', 'my home', '{sender}'s home' etc. This would make it more flexible.
Any thoughts, Fallen?
Comment 18 Philipp Kewisch [:Fallen] 2010-04-20 01:58:08 PDT
As mentioned on IRC, I think we should move all invasive UI changes to a different bug. This bug should create the parser along with some unit tests to make sure it works. A further bug could then integrate some UI.

Just a quick idea I had regarding UI: Obviously, we could show a bar similar to the itip bar if the parser found something. Since we won't be able to do fully automatic parsing at first, we could then overlay fragments the parser has detected (e.g. dates) with a colored rounded box. When the user hovers over that box, a dropdown arrow appears, allowing the user to select what relation this fragment (e.g. a date) has with the event. For example, the user could decide that the selected date is the start date.

Time is too short for a mockup, please let me know if the above is not really understandable.
Comment 19 Felix Möller 2011-09-22 13:43:26 PDT
This is a great idea, I just wanted to report it.

I strongly support Philipp's comment #18, because the approach in attachment #288022 [details] cannot cope with a mail containing multiple events. Moreover, I would be afraid of random stuff being detected as an appointment.

For the highlighting idea, it would be worth looking at https://addons.mozilla.org/de/firefox/addon/sipgateffx/ which has its source at https://github.com/sipgate/sipgateffx. sipgateffx has an overlay which detects phone numbers and allows initiating calls to them.
Comment 20 Michael Bauer 2011-11-27 07:35:07 PST
One point though regarding the localization process: however it's done, it needs to be able to handle multiple languages (or at least 2-3) at the same time. My email is predominantly in English, Scots Gaelic and German. For German users, I suspect German and English would both feature, and so on. So if the system restricted me to one language, that would significantly reduce its usefulness.
Comment 21 Merike (:merike) 2011-11-27 14:10:08 PST
It will most certainly support multiple languages. In the current prototype it goes over the installed dictionaries and checks how much of the email content matches each one. If it can't find one that matches over 50%, it falls back to the patterns for the application locale.
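
A rough sketch of the selection heuristic described above, for illustration only. The `guessLocale` function, the `dictionaries` map of locale to word set, and the tokenization are all assumptions; the prototype's actual data structures are not shown in this bug.

```javascript
// Hypothetical sketch: pick the locale whose dictionary covers the largest
// share of the words in the email, falling back to the application locale.
function guessLocale(text, dictionaries, appLocale, threshold = 0.5) {
    // Split on anything that is not a letter; drop empty fragments.
    const words = text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);
    if (words.length === 0) {
        return appLocale;
    }
    let best = appLocale;
    let bestShare = threshold;
    for (const [locale, dictionary] of Object.entries(dictionaries)) {
        const known = words.filter((word) => dictionary.has(word)).length;
        const share = known / words.length;
        // Strictly greater: a locale has to match over the threshold to win.
        if (share > bestShare) {
            best = locale;
            bestShare = share;
        }
    }
    return best;
}

// Tiny made-up dictionaries, just enough to exercise the function.
const dictionaries = {
    "en-US": new Set(["lunch", "at", "noon", "tomorrow"]),
    "de": new Set(["mittagessen", "um", "morgen"]),
};
console.log(guessLocale("Lunch tomorrow at noon", dictionaries, "et")); // en-US
console.log(guessLocale("Lõuna homme", dictionaries, "et"));            // et
```

This sketch picks a single locale per message; handling mail that mixes several languages would need more than this.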
Comment 22 Benny Beat 2011-11-28 08:52:40 PST
This is a great idea!!!
Nice if I can help you with Catalan translation, or testing about this...

Cheers,
Benny ^_^"
Comment 23 Stefano Fraccaro 2011-12-01 01:49:53 PST
I can help with Italian translation   :-)
Comment 24 Merike (:merike) 2012-12-30 08:19:10 PST
Created attachment 696627 [details] [diff] [review]
part one

Starting from basics, this includes:
 * basic extraction functionality for any language
 * some tests
and excludes:
 * all of UI
 * all of language detection parts
which will come in next patch(es).
Comment 25 Magnus Melin 2012-12-30 11:31:24 PST
Comment on attachment 696627 [details] [diff] [review]
part one

Review of attachment 696627 [details] [diff] [review]:
-----------------------------------------------------------------

Driveby.. really excited to see this move forwards!

::: calendar/base/modules/calExtract.jsm
@@ +1,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +var EXPORTED_SYMBOLS = ["extractor"];

const

@@ +14,5 @@
> +  dailyNumbers: [],
> +  allMonths: "",
> +  months: [],
> +  dayStart: 6,
> +  now: undefined,

I think it's unusual to have it be "undefined"; null would be normal.
I notice later in the code there are a lot of != and == undefined checks. Those should be strict comparisons (!== and ===) or just falsy comparisons where applicable. (x != undefined checks both undefined and null.)
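
For reference, a small standalone illustration of the difference between loose, strict and falsy checks. The `hasValue` helper is hypothetical and not part of the patch.

```javascript
// Loose comparison treats null and undefined as equal; strict does not.
const now = null;
console.log(now == undefined);   // true
console.log(now === undefined);  // false
console.log(now === null);       // true

// A falsy check also rejects 0, "" and NaN, so it only fits values where
// those are not meaningful (an hour of 0 would be wrongly skipped).
function hasValue(value) {
    return value !== undefined && value !== null;
}
console.log(hasValue(0));        // true
console.log(Boolean(0));         // false
```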

@@ +59,5 @@
> +  },
> +
> +  checkBundle: function checkBundle(locale) {
> +    let service = Components.classes["@mozilla.org/intl/stringbundle;1"]
> +                 .getService(Components.interfaces.nsIStringBundleService);

There's Services.strings

@@ +631,5 @@
> +    let startTimes = collected.filter(function(val) {
> +        return (val.relation == "start");});
> +    if (startTimes.length == 0)
> +      return {};
> +    else {

no else after return.

@@ +888,5 @@
> +
> +    try {
> +      let value = this.bundle.GetStringFromName(name);
> +      if (value.trim() == "")
> +        throw "";

Shouldn't throw strings. Throw new Error("fooo") if you need. 
But shouldn't it just return "" or something?
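
A sketch of both options mentioned here, assuming a plain `Map` as a stand-in for the string bundle. `getPattern` and `requirePattern` are hypothetical names, not functions from the patch.

```javascript
// Hypothetical stand-in for a string bundle: a Map of pattern names to values.
const bundle = new Map([
    ["from.today", "today"],
    ["until.noon", "  "],
]);

// Option 1: missing or blank entries yield a fallback instead of throwing.
function getPattern(source, name, fallback = "") {
    const value = source.get(name);
    if (value === undefined || value.trim() === "") {
        return fallback;
    }
    return value;
}

// Option 2: if an exception is really wanted, throw an Error object, which
// carries a message and a stack trace (a thrown string has neither).
function requirePattern(source, name) {
    const value = getPattern(source, name, null);
    if (value === null) {
        throw new Error("No pattern for " + name);
    }
    return value;
}

console.log(JSON.stringify(getPattern(bundle, "until.noon"))); // ""
console.log(requirePattern(bundle, "from.today"));             // today
```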

@@ +1003,5 @@
> +  normalizeHour: function normalizeHour(hour) {
> +    if (hour < this.dayStart && hour <= 11)
> +      return hour + 12;
> +    else
> +      return hour;

no else after return

@@ +1010,5 @@
> +  normalizeYear: function normalizeYear(year) {
> +    if (year.length == 2)
> +      return "20" + year;
> +    else
> +      return year;

no else after return

or simply 
return (year.length == 2) ? "20" + year : year;

@@ +1101,5 @@
> +      }
> +      return -1;
> +    } else {
> +      return r;
> +    }

No else after return
Comment 26 Merike (:merike) 2013-01-03 11:24:21 PST
Created attachment 697565 [details] [diff] [review]
part one

This should improve on the points Magnus mentioned. It also adds patterns file which I forgot to include in last patch.
Comment 27 Philipp Kewisch [:Fallen] 2013-03-06 11:13:57 PST
Comment on attachment 697565 [details] [diff] [review]
part one

Review of attachment 697565 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review. Here is a code level review to get things going. r- for now to get a new patch and discussion on my comments:

::: calendar/base/modules/calExtract.jsm
@@ +5,5 @@
> +const EXPORTED_SYMBOLS = ["extractor"];
> +Components.utils.import("resource://calendar/modules/calUtils.jsm");
> +Components.utils.import("resource://gre/modules/Services.jsm");
> +
> +var extractor = {

Is this really a singleton that can cope with callers from different areas operating it? If not, you should make this a function with a prototype.

Also, this file uses 2 space indent instead of the usual 4.
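
A minimal sketch of the "function with a prototype" shape, to show why it avoids shared state between callers. The `Extractor` name, its fields and the `collect` method are assumptions for illustration, not the patch's actual API.

```javascript
// Hypothetical sketch: each caller gets its own instance, so state such as
// collected matches is not shared the way it is on one exported object.
function Extractor(fallbackLocale, dayStart) {
    this.fallbackLocale = fallbackLocale;
    // The default of 6 follows the doc comment quoted in this review.
    this.dayStart = (dayStart === undefined || dayStart === null) ? 6 : dayStart;
    this.collected = [];
}

Extractor.prototype = {
    constructor: Extractor,

    // Methods live on the prototype and are shared; data lives per instance.
    collect: function collect(match) {
        this.collected.push(match);
        return this.collected.length;
    },
};

const first = new Extractor("en-US");
const second = new Extractor("en-US", 8);
first.collect({relation: "start", hour: 9});
console.log(first.collected.length, second.collected.length); // 1 0
console.log(first.dayStart, second.dayStart);                 // 6 8
```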

@@ +33,5 @@
> +   *                          be in the afternoon, when null then by default
> +   *                          set to 6
> +   */
> +  init: function init(baseUrl, fallbackLocale, dayStart) {
> +    this.bundleFile = baseUrl;

Suggest naming this bundleUrl

@@ +196,5 @@
> +    this.extractDuration("duration.hours", 60);
> +    this.extractDuration("duration.days", 60 * 24);
> +
> +    if (sel !== undefined)
> +      this.markSelected(sel, title);

Brackets around one-line if

@@ +624,5 @@
> +        } else if (one.minute > two.minute) {
> +          return 1;
> +        } else {
> +          return 0;
> +        }

You could shorten this function a bit:

let rc = (one.hour > two.hour) - (one.hour < two.hour);
if (!rc) {
  rc = (one.minute > two.minute) - (one.minute < two.minute);
}
return rc;

Same applies to the other part with days and months, and probably something similar with the null checks.
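
A standalone sketch of the boolean-subtraction idiom, with made-up sample data. The sign convention is an assumption: the result is positive when the first time is later, which matches the `return 1` branch for `one.minute > two.minute` in the quoted hunk.

```javascript
// Three-way comparison: booleans coerce to 0 or 1 under subtraction, so the
// result is 1 when a > b, -1 when a < b and 0 when they are equal.
function compare(a, b) {
    return (a > b) - (a < b);
}

// Compare by hour first and use minutes only to break ties.
function compareTimes(one, two) {
    return compare(one.hour, two.hour) || compare(one.minute, two.minute);
}

const times = [
    {hour: 14, minute: 30},
    {hour: 9, minute: 45},
    {hour: 9, minute: 15},
];
times.sort(compareTimes);
console.log(times.map((t) => t.hour + ":" + t.minute).join(" "));
// prints "9:15 9:45 14:30"
```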

@@ +640,5 @@
> +  guessStart: function guessStart(collected, isTask) {
> +    let startTimes = collected.filter(function(val) {
> +        return (val.relation == "start");});
> +    if (startTimes.length == 0)
> +      return {};

Brackets around one-line if.

@@ +643,5 @@
> +    if (startTimes.length == 0)
> +      return {};
> +
> +    for (let val in startTimes) {
> +      cal.LOG("Start: " + JSON.stringify(startTimes[val]));

Maybe you could prefix your log functions with an identifying message, i.e [calItemExtractor] or [calExtract]. If you want, since you are in a JS module, you could also create a helper function that is not exported.
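
One way such a non-exported helper could look. `makeLog` and the injected `logger` callback are hypothetical; in the module the callback would presumably be the existing logging function.

```javascript
// Hypothetical helper kept private to the module (not exported); `logger`
// stands in for whatever logging function the module has available.
function makeLog(prefix, logger) {
    return function log(message) {
        logger("[" + prefix + "] " + message);
    };
}

// Collect lines in an array here so the example runs anywhere.
const lines = [];
const LOG = makeLog("calExtract", (line) => lines.push(line));
LOG("Start: " + JSON.stringify({hour: 9, minute: 30}));
console.log(lines[0]);
// prints [calExtract] Start: {"hour":9,"minute":30}
```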

@@ +646,5 @@
> +    for (let val in startTimes) {
> +      cal.LOG("Start: " + JSON.stringify(startTimes[val]));
> +    }
> +
> +    var guess = {};

var on purpose?

@@ +718,5 @@
> +   * @param isTask    whether start time should be guessed for task or event
> +   * @return          datetime object for end time
> +   */
> +  guessEnd: function guessEnd(collected, start, isTask) {
> +    var guess = {};

var on purpose?

@@ +723,5 @@
> +    let endTimes = collected.filter(function(val) {
> +        return (val.relation == "end");});
> +    let durations = collected.filter(function(val) {
> +        return (val.relation == "duration");});
> +    if (endTimes.length == 0 && durations.length == 0)

brackets around if block

@@ +737,5 @@
> +        return (val.ambiguous === undefined);});
> +      let wMinute = endTimes.filter(function(val) {
> +        return (val.minute != null);});
> +      let wMinuteNA = wMinute.filter(function(val) {
> +        return (val.ambiguous === undefined);});

I'd appreciate if you could put the closing brackets on the next line for this kind of construct:

let wMinuteNA = wMinute.filter(function(val) {
   return (val.ambiguous === undefined);
});

In the new Javascript we have you can also do:

let wMinuteNA = wMinute.filter(function(val) val.ambiguous === undefined);

If you feel like it, you can also use the shortened function syntax in other places.
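
Note that the shortened `function(val) expr` form is a non-standard SpiderMonkey extension. For comparison, the same filter written as a standard arrow function, with made-up sample data:

```javascript
// Sample entries; only the first has no `ambiguous` flag.
const wMinute = [
    {minute: 30},
    {minute: 0, ambiguous: true},
];

// Arrow function with an expression body: the expression is the return value.
const wMinuteNA = wMinute.filter((val) => val.ambiguous === undefined);
console.log(wMinuteNA.length); // 1
```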

@@ +828,5 @@
> +      }
> +
> +      // no zero length events/tasks
> +      if (guess.year == start.year && guess.month == start.month
> +        && guess.day == start.day && guess.hour == start.hour

&& on end of line before.

@@ +849,5 @@
> +
> +  getPatterns: function getPatterns(name) {
> +    let value;
> +    // this should never be found in an email
> +    let def = "061dc19c-719f-47f3-b2b5-e767e6f02b7a";

What happens if I put this in my signature? ;-) (just curious)

::: calendar/locales/filter.py
@@ +18,5 @@
> +  # most extraction related strings are not required
> +  if path == "chrome/calendar/calendar-extract.properties":
> +    if not re.match(r"from.today", entity):
> +      return "report"
> +  

Whitespaces here.
Comment 28 Merike (:merike) 2013-04-14 02:58:50 PDT
Created attachment 737222 [details] [diff] [review]
part one - v2

(In reply to Philipp Kewisch [:Fallen] from comment #27)
> Is this really a singleton that can cope with callers from different areas
> operating it? If not, you should make this a function with a prototype.
I'm not entirely sure so I've gone with the prototype approach (and hopefully I got that right as I'm quite unfamiliar with most JavaScript patterns). Is it possible to call into a JavaScript module from two threads at the same time? If yes then the previous patch would have produced some "interesting" results.

> You could shorten this function a bit:
> 
> let rc = (one.hour > two.hour) - (one.hour < two.hour);
> if (!rc) {
>   rc = (one.minute > two.minute) - (one.minute < two.minute);
> }
> return rc;
> 
> Same applies to the other part with days and months, and probably something
> similar with the null checks.
I've shortened comparisons. I don't see a simple way to shorten null checks though. Comparisons can give three different result values while null checks need to differentiate four different cases.

> > +    var guess = {};
> 
> var on purpose?
No. At some point it used to not work with let and I didn't want to investigate closer at the time. It seems to work just fine with let now.

> let wMinuteNA = wMinute.filter(function(val) {
>    return (val.ambiguous === undefined);
> });
> 
> In the new Javascript we have you can also do:
> 
> let wMinuteNA = wMinute.filter(function(val) val.ambiguous === undefined);
> 
> If you feel like it, you can also use the shortened function syntax in other
> places.
As long as it's readable (and in my opinion it is) I'm all for shorter syntax. This lengthened some lines, hopefully that's ok.

> > +  getPatterns: function getPatterns(name) {
> > +    let value;
> > +    // this should never be found in an email
> > +    let def = "061dc19c-719f-47f3-b2b5-e767e6f02b7a";
> 
> What happens if I put this in my signature? ;-) (just curious)

061dc19c-719f-47f3-b2b5-e767e6f02b7a will be returned for all patterns which are present but empty in the properties file. So the result depends on which ones are empty in the language that is used. Since completely missing patterns are allowed too there are likely no such patterns for most languages. Currently for en-US this results in matches for weekdays, today and noon for end time (empty until.* patterns). Unless there are any other matches in content the only additional match is email date itself. So I got today as start date and noon (tomorrow) as end time. The event you get based on these depends on UI code.
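
A small illustration of why a blank pattern needs some placeholder at all. That this is the motivation in the patch is an assumption, and `patternOrPlaceholder` is a hypothetical helper: an empty regular expression matches any text, while the placeholder only matches if that exact string is present.

```javascript
const NEVER_MATCHES = "061dc19c-719f-47f3-b2b5-e767e6f02b7a";

// Hypothetical helper: substitute the placeholder for blank patterns.
function patternOrPlaceholder(value) {
    return value.trim() === "" ? NEVER_MATCHES : value;
}

const email = "See you on Friday";
// An empty pattern matches everywhere, which would produce bogus hits.
console.log(new RegExp("").test(email));                                   // true
// The placeholder matches only mail that contains it literally.
console.log(new RegExp(patternOrPlaceholder("")).test(email));             // false
console.log(new RegExp(patternOrPlaceholder("friday"), "i").test(email));  // true
console.log(new RegExp(patternOrPlaceholder("")).test(NEVER_MATCHES));     // true
```

The last line is the signature case asked about: once the placeholder appears in the mail, every blank pattern matches it.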
Comment 29 Philipp Kewisch [:Fallen] 2013-06-22 10:42:29 PDT
Comment on attachment 737222 [details] [diff] [review]
part one - v2

Review of attachment 737222 [details] [diff] [review]:
-----------------------------------------------------------------

r=philipp

The review comments are mostly about adding comments, so I will just push the patch as is on Sunday evening. If you have time to add the comments until then, please ping me and/or push beforehand.

::: calendar/locales/en-US/chrome/calendar/calendar-extract.properties
@@ +3,5 @@
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +# LOCALIZATION NOTE:
> +# you don't have to fill all from.*, until.*, *.prefix and *.suffix patterns
> +# it's ok to leave some empty

Maybe you can add an explanatory note on what all the properties are used for in general, i.e. briefly explain how the extractor works, and maybe mention that some strings shouldn't just be translated: where there are language-specific variants, those should be added too.

Explain that the strings are not necessarily (at all?) displayed in the UI, but used for detection. If there is a way to test this as a localizer, please also add some steps describing that.

@@ +215,5 @@
> +month.11 = november | nov | nov.
> +month.12 = december | dec | dec.
> +
> +# LOCALIZATION NOTE (weekday.0):
> +# this is Sunday no matter which day the week starts with

Consider rewording so people don't think this note goes for all strings.

"Regardless of what the first day of the week is in your country, 0 is Sunday here"

These also need a localization note about what the strings are used for in general
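
For what it's worth, numbering Sunday as 0 is also what JavaScript's `Date.prototype.getDay()` returns, so the keys can be indexed by it directly. Whether the extractor relies on this is an assumption; the key list below is built only for the example.

```javascript
// Build the key names from.weekday.0 .. from.weekday.6.
const weekdayKeys = [0, 1, 2, 3, 4, 5, 6].map((n) => "from.weekday." + n);

// getDay() returns 0 for Sunday regardless of locale or first day of week.
const sunday = new Date(2013, 5, 23); // 23 June 2013, a Sunday
console.log(weekdayKeys[sunday.getDay()]); // from.weekday.0
```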

@@ +226,5 @@
> +from.weekday.6 = saturday | saturdays
> +
> +until.weekday.0 =
> +until.weekday.1 =
> +until.weekday.2 =

These need a localization note about what the string is used for and in which cases it can stay empty.

@@ +235,5 @@
> +
> +# LOCALIZATION NOTE (number.*):
> +# can be a list of values, separate variants by |
> +number.0 = zero
> +number.1 = one | first

Could you add a comment that explains this a bit better? IIUC it can be a list of anything that describes the number and doesn't have to be only the two values.

I can imagine languages where there are multiple forms of "first" depending on the declension of the following words. They might want to specify more than two values.

Putting that information into the comment somehow would be good.
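
A sketch of how a `|`-separated value with any number of variants could be turned into a single pattern. `variantsToRegExp` is a hypothetical helper; how the extractor actually consumes these strings is not shown in this review.

```javascript
// Hypothetical parser for a value such as "one | first": split on "|",
// trim each variant, drop blanks and escape regex metacharacters.
function variantsToRegExp(value) {
    const variants = value
        .split("|")
        .map((variant) => variant.trim())
        .filter((variant) => variant !== "")
        .map((variant) => variant.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
    // Non-capturing group with one alternative per variant.
    return new RegExp("(?:" + variants.join("|") + ")", "i");
}

const numberOne = variantsToRegExp("one | first");
console.log(numberOne.test("the First meeting"));      // true
console.log(variantsToRegExp("nov | nov.").source);    // (?:nov|nov\.)
```

Nothing in this shape limits a value to two variants, so a language with several declined forms of "first" would simply list them all.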
Comment 30 Merike (:merike) 2013-06-22 12:57:32 PDT
Created attachment 766345 [details] [diff] [review]
part one - v2 with improved comments

This should improve comments and hopefully minimize the number of questions localizers will have :)
Comment 31 Philipp Kewisch [:Fallen] 2013-06-23 21:14:16 PDT
https://hg.mozilla.org/comm-central/rev/8e475124e59f
