Make the calExtract module work better
Categories
(Thunderbird :: General, task)
Tracking
(Not tracked)
People
(Reporter: pmorris, Unassigned)
References
Details
(Keywords: leave-open)
Attachments
(2 files, 4 obsolete files)
A follow-up on bug 1608610. See comment 106 (https://bugzilla.mozilla.org/show_bug.cgi?id=1608610#c106). "I predict the calExtract module is broken outside of en-US, but then I think that's already the case. It depends on other locales being available, but they aren't."
Comment 1•4 years ago
Per discussion with Magnus, making this non-blocking for bug 1493008 since it was already an issue before calendar integration.
Comment 2•4 years ago
I think we could improve the detection in general. Using multiple localizations is probably no longer feasible: this was possible in the extension, which could repack its localizations, but we can't really do that.
Things like detecting a full standard ISO date in text should just work, as should detecting the length of the event.
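As a rough illustration of the locale-independent detection suggested above, here is a minimal sketch in JavaScript. The function names `findIsoDates` and `durationMinutes` are hypothetical and are not part of calExtract.jsm. The sketch assumes dates are written as YYYY-MM-DD, optionally followed by THH:MM, and treats all times as UTC for simplicity.

```javascript
// Hypothetical helpers for illustration only; not part of calExtract.jsm.

// Find ISO-style dates (YYYY-MM-DD, optionally followed by THH:MM) in text.
function findIsoDates(text) {
  const pattern = /\b(\d{4})-(\d{2})-(\d{2})(?!\d)(?:T(\d{2}):(\d{2}))?/g;
  const found = [];
  for (const match of text.matchAll(pattern)) {
    const [year, month, day] = match.slice(1, 4).map(Number);
    // Reject impossible dates such as 2021-13-40 by round-tripping through Date.
    const check = new Date(Date.UTC(year, month - 1, day));
    if (
      check.getUTCFullYear() !== year ||
      check.getUTCMonth() !== month - 1 ||
      check.getUTCDate() !== day
    ) {
      continue;
    }
    const entry = { index: match.index, year, month, day };
    if (match[4] !== undefined) {
      const hour = Number(match[4]);
      const minute = Number(match[5]);
      // Keep the time only if it is a valid clock time.
      if (hour < 24 && minute < 60) {
        entry.hour = hour;
        entry.minute = minute;
      }
    }
    found.push(entry);
  }
  return found;
}

// Length of an event in minutes, given two entries from findIsoDates.
// Assumption: both entries are interpreted as UTC.
function durationMinutes(start, end) {
  const toMs = e =>
    Date.UTC(e.year, e.month - 1, e.day, e.hour ?? 0, e.minute ?? 0);
  return (toMs(end) - toMs(start)) / 60000;
}

const dates = findIsoDates(
  "Sync from 2021-03-04T10:30 to 2021-03-04T11:15 (not 2021-13-40)"
);
console.log(dates.length, durationMinutes(dates[0], dates[1]));
```

Because nothing here depends on month or weekday names, a check like this would behave the same in every locale.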
Comment 3•4 years ago
Magnus, could you comment on some of the desired improvements here for reference?
Comment 4•4 years ago
First, we could do away with things that are no longer accurate or relevant, like
https://searchfox.org/comm-central/rev/a8444d358c7abb921d81ee97d73b6f6ba26c7c8a/calendar/base/modules/calExtract.jsm#46-49
... and all the other stuff related to multilocale.
I think we could then see how what we have holds up; check it against https://github.com/wanasit/chrono/tree/master/test. Potentially we could incorporate that library if it's better.
Comment 5•3 years ago
In TB 78.7.0 (64-bit) there seem to be a lot of "[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #1 calExtract.jsm:1229" messages in the error console. See some details below. Is that linked to this bug?
[Exception... "Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIXPCComponents_Utils.readUTF8URI]" nsresult: "0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH)" location: "JS frame :: resource://gre/modules/L10nRegistry.jsm :: L10nRegistry.loadSync :: line 658" data: no] 2 L10nRegistry.jsm:658:19
[Exception... "Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIXPCComponents_Utils.readUTF8URI]" nsresult: "0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH)" location: "JS frame :: resource://gre/modules/L10nRegistry.jsm :: L10nRegistry.loadSync :: line 658" data: no] 2 L10nRegistry.jsm:658:19
[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #1 calExtract.jsm:1229
getPositionsFor resource:///modules/calendar/calExtract.jsm:1229
getRepPatterns resource:///modules/calendar/calExtract.jsm:1206
extractHourMinutes resource:///modules/calendar/calExtract.jsm:718
extract resource:///modules/calendar/calExtract.jsm:318
extractFromEmail chrome://calendar/content/calendar-extract.js:125
oncommand chrome://messenger/content/messenger.xhtml:1
[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #2 calExtract.jsm:1229
getPositionsFor resource:///modules/calendar/calExtract.jsm:1229
getRepPatterns resource:///modules/calendar/calExtract.jsm:1206
extractHourMinutes resource:///modules/calendar/calExtract.jsm:718
extract resource:///modules/calendar/calExtract.jsm:318
extractFromEmail chrome://calendar/content/calendar-extract.js:125
oncommand chrome://messenger/content/messenger.xhtml:1
(...)
Comment 7•3 years ago
This first step attempts to convert to Fluent without changing too
much of the logic. Multi-locale and dictionary use have been removed.
The next step is to refactor the Extractor class for better efficiency.
Comment 8•3 years ago
Comment 9•3 years ago
This adds a CalExtractParser that can be configured to use different lexical
and parse rules as needed. Unit tests are included that demonstrate the
concept. This parser could probably be made more efficient and could possibly
detect potential errors. However, if I attempt to do all of that in one go,
this patch will never be finished.
The next step is to translate some of the existing extract rules and compare
results.
Depends on D121651
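To illustrate the general idea of a parser driven by configurable lexical and parse rules, here is a minimal sketch in JavaScript. It is not the CalExtractParser API. The class name `RuleBasedParser`, the rule shapes, and the token types are all assumptions made up for this example.

```javascript
// Hypothetical sketch of a rule-configurable parser; not the actual
// CalExtractParser implementation or API.
class RuleBasedParser {
  // tokenRules: [{ type, pattern (sticky RegExp), skip? }]
  // parseRules: [{ sequence: [token types], action(tokens) }]
  constructor(tokenRules, parseRules) {
    this.tokenRules = tokenRules;
    this.parseRules = parseRules;
  }

  // Lexical phase: try each token rule at the current position, in order.
  tokenize(text) {
    const tokens = [];
    let pos = 0;
    while (pos < text.length) {
      let advanced = false;
      for (const { type, pattern, skip } of this.tokenRules) {
        pattern.lastIndex = pos; // sticky patterns match only at lastIndex
        const m = pattern.exec(text);
        if (m && m[0].length > 0) {
          if (!skip) {
            tokens.push({ type, text: m[0] });
          }
          pos += m[0].length;
          advanced = true;
          break;
        }
      }
      if (!advanced) {
        pos++; // ignore characters no rule recognises
      }
    }
    return tokens;
  }

  // Parse phase: apply the first rule whose token sequence matches here.
  parse(text) {
    const tokens = this.tokenize(text);
    const results = [];
    let i = 0;
    while (i < tokens.length) {
      const rule = this.parseRules.find(r =>
        r.sequence.every((type, k) => tokens[i + k]?.type === type)
      );
      if (rule) {
        results.push(rule.action(tokens.slice(i, i + rule.sequence.length)));
        i += rule.sequence.length;
      } else {
        i++;
      }
    }
    return results;
  }
}

// Example English-like rules; another locale would supply its own sets.
const to24 = (hour, meridiem) =>
  (hour % 12) + (meridiem.toLowerCase() === "pm" ? 12 : 0);

const parser = new RuleBasedParser(
  [
    { type: "SPACE", pattern: /\s+/y, skip: true },
    { type: "NUMBER", pattern: /\d+/y },
    { type: "COLON", pattern: /:/y },
    { type: "MERIDIEM", pattern: /(?:am|pm)\b/iy },
    { type: "WORD", pattern: /[a-z]+/iy },
  ],
  [
    {
      sequence: ["NUMBER", "COLON", "NUMBER", "MERIDIEM"],
      action: ([h, , m, mer]) => ({
        type: "time",
        hour: to24(Number(h.text), mer.text),
        minute: Number(m.text),
      }),
    },
    {
      sequence: ["NUMBER", "MERIDIEM"],
      action: ([h, mer]) => ({
        type: "time",
        hour: to24(Number(h.text), mer.text),
        minute: 0,
      }),
    },
  ]
);

console.log(parser.parse("Lunch at 12:30 pm then call at 3pm"));
```

Keeping the rules as plain data is what would let a different locale swap in its own token patterns and sequences without touching the parsing loop.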
Comment 10•3 years ago
This is still early days; only rules for parsing the included tests have been added so far. The CalExtractParserService
can be used instead of Extractor via a pref for experimentation.
Depends on D123287
Comment 11•3 years ago
This is still early days; only rules for parsing the included tests have been added so far. The CalExtractParserService
can be used instead of Extractor via a pref for experimentation.
Depends on D123287
Comment 12•3 years ago
This is still early days; only rules for parsing the included tests have been added so far. The CalExtractParserService
can be used instead of Extractor via a pref for experimentation.
Comment 13•3 years ago
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/b8045b20b18f
Add customisable parser for calendar item extraction. r=darktrojan
https://hg.mozilla.org/comm-central/rev/cfa14d7c3650
Add CalExtractParserService to allow parsing and extract event info using alternative locales. r=darktrojan
Comment 14•3 years ago
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/c64418e8f10b
followup - fix black linting. rs=black-lint