Open Bug 1612170 Opened 4 years ago Updated 1 year ago

Make the calExtract module work better

Categories

(Thunderbird :: General, task)

Tracking

(Not tracked)

94 Branch

People

(Reporter: pmorris, Unassigned)

References

Details

(Keywords: leave-open)

Attachments

(2 files, 4 obsolete files)

A follow-up on bug 1608610. See comment 106 (https://bugzilla.mozilla.org/show_bug.cgi?id=1608610#c106). "I predict the calExtract module is broken outside of en-US, but then I think that's already the case. It depends on other locales being available, but they aren't."

Per discussion with Magnus, making this non-blocking for bug 1493008 since it was already an issue before calendar integration.

No longer blocks: 1493008

https://searchfox.org/comm-central/rev/6c7626bd9ad949ec1641453a11cb40e72bce54ee/calendar/base/modules/calExtract.jsm

I think we could improve the detection in general. Using multiple localizations is probably no longer feasible. This was possible in the extension, which could repack its localizations, but we can't really do that.

Things like detecting a full standard ISO date in text should just work, as should detecting the length of the event.
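
To illustrate the kind of locale-independent detection meant here, a minimal sketch follows. It is not code from calExtract.jsm; the names `ISO_DATE_TIME`, `findIsoDates` and `minutesBetween` are hypothetical, and it only handles `YYYY-MM-DD` with an optional `HH:MM` part, ignoring seconds and time zones.

```javascript
// Hypothetical helpers, not part of calExtract.jsm.
// Matches YYYY-MM-DD, optionally followed by "T" or a space and HH:MM.
const ISO_DATE_TIME = /\b(\d{4})-(\d{2})-(\d{2})(?:[T ](\d{2}):(\d{2}))?\b/g;

function findIsoDates(text) {
  const results = [];
  for (const match of text.matchAll(ISO_DATE_TIME)) {
    const [raw, y, mo, d, h, mi] = match;
    const year = Number(y);
    const month = Number(mo);
    const day = Number(d);
    const hour = h === undefined ? null : Number(h);
    const minute = mi === undefined ? null : Number(mi);

    // Reject impossible dates (e.g. February 30) by round-tripping through Date.
    const check = new Date(Date.UTC(year, month - 1, day));
    if (
      check.getUTCFullYear() !== year ||
      check.getUTCMonth() !== month - 1 ||
      check.getUTCDate() !== day
    ) {
      continue;
    }
    // Reject impossible times.
    if (hour !== null && (hour > 23 || minute > 59)) {
      continue;
    }
    results.push({ raw, index: match.index, year, month, day, hour, minute });
  }
  return results;
}

// Length of an event in minutes, given two results from findIsoDates.
// A missing time is treated as midnight.
function minutesBetween(start, end) {
  const toMs = m => Date.UTC(m.year, m.month - 1, m.day, m.hour ?? 0, m.minute ?? 0);
  return (toMs(end) - toMs(start)) / 60000;
}

const found = findIsoDates("Meeting from 2021-09-14 10:30 to 2021-09-14T12:00, not 2021-02-30.");
const lengthInMinutes = minutesBetween(found[0], found[1]);
```

Because nothing here depends on localized strings, a check like this would behave the same in every locale, which is the property asked for above.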

Assignee: paul → lasana
Type: defect → task
Summary: Make the calExtract module work with non-en-US locales → Make the calExtract module work better

Magnus, could you comment on some of the desired improvements here for reference?

Flags: needinfo?(mkmelin+mozilla)

First we could do away with things that are no longer accurate or relevant, like
https://searchfox.org/comm-central/rev/a8444d358c7abb921d81ee97d73b6f6ba26c7c8a/calendar/base/modules/calExtract.jsm#46-49
... and all the other stuff related to multilocale.

I think we could then see how what we have holds up: check https://github.com/wanasit/chrono/tree/master/test. Potentially we could incorporate that library if it's better.

Flags: needinfo?(mkmelin+mozilla)

In TB 78.7.0 (64-bit) there seem to be a lot of "[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #1 calExtract.jsm:1229" messages in the error console. See some details below. Is that linked to this bug?

[Exception... "Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIXPCComponents_Utils.readUTF8URI]" nsresult: "0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH)" location: "JS frame :: resource://gre/modules/L10nRegistry.jsm :: L10nRegistry.loadSync :: line 658" data: no] 2 L10nRegistry.jsm:658:19
[Exception... "Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIXPCComponents_Utils.readUTF8URI]" nsresult: "0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH)" location: "JS frame :: resource://gre/modules/L10nRegistry.jsm :: L10nRegistry.loadSync :: line 658" data: no] 2 L10nRegistry.jsm:658:19
[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #1 calExtract.jsm:1229
getPositionsFor resource:///modules/calendar/calExtract.jsm:1229
getRepPatterns resource:///modules/calendar/calExtract.jsm:1206
extractHourMinutes resource:///modules/calendar/calExtract.jsm:718
extract resource:///modules/calendar/calExtract.jsm:318
extractFromEmail chrome://calendar/content/calendar-extract.js:125
oncommand chrome://messenger/content/messenger.xhtml:1
[calExtract] Faulty extraction pattern from.hour.minutes, missing parameter #2 calExtract.jsm:1229
getPositionsFor resource:///modules/calendar/calExtract.jsm:1229
getRepPatterns resource:///modules/calendar/calExtract.jsm:1206
extractHourMinutes resource:///modules/calendar/calExtract.jsm:718
extract resource:///modules/calendar/calExtract.jsm:318
extractFromEmail chrome://calendar/content/calendar-extract.js:125
oncommand chrome://messenger/content/messenger.xhtml:1
(...)

This first step attempts to convert to fluent without changing too
much of the logic. Multi-locale and dictionary use has been removed.

Next steps are to refactor the Extractor class for better efficiency.

Status: NEW → ASSIGNED
Attachment #9234521 - Attachment description: WIP: Bug 1612170 - Part 1: Move extract patterns out of localisation. r=mkmelin → Bug 1612170 - Part 1: Move extract patterns out of localisation. r=mkmelin
Attachment #9228345 - Attachment is obsolete: true

This adds a CalExtractParser that can be configured to use different lexical
and parse rules as needed. Unit tests are included that demonstrate the
concept. This parser could probably be made more efficient and could possibly
detect potential errors. However, if I attempt to do all that in one go, this
patch will never be finished.

The next step is to translate some of the existing extract rules and compare
results.
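
For readers unfamiliar with the approach, here is a rough sketch of what "configurable lexical and parse rules" can look like. It is not the CalExtractParser from the patch; the class `TinyRuleParser`, the rule shapes and the example rules are all assumptions made for illustration. Token rules turn text into typed tokens, and parse rules repeatedly replace a matching run of types at the top of a stack with a new node.

```javascript
// Hypothetical sketch; names and rule shapes are assumptions,
// not the CalExtractParser API.
class TinyRuleParser {
  // tokenRules: array of [RegExp, type]; a null type means "skip this text".
  // parseRules: array of { name, pattern: [types...], action(matchedItems) }.
  constructor(tokenRules, parseRules) {
    // Recompile each regex as sticky so it only matches at the current position.
    this.tokenRules = tokenRules.map(([re, type]) => [
      new RegExp(re.source, re.flags.replace(/[gy]/g, "") + "y"),
      type,
    ]);
    this.parseRules = parseRules;
  }

  tokenize(text) {
    const tokens = [];
    let pos = 0;
    while (pos < text.length) {
      let matched = false;
      for (const [re, type] of this.tokenRules) {
        re.lastIndex = pos;
        const m = re.exec(text);
        if (m && m[0].length > 0) {
          if (type) {
            tokens.push({ type, text: m[0] });
          }
          pos += m[0].length;
          matched = true;
          break;
        }
      }
      if (!matched) {
        pos++; // Skip characters no rule recognises.
      }
    }
    return tokens;
  }

  // Shift each token onto a stack, then reduce while any rule matches the top.
  parse(text) {
    const stack = [];
    for (const token of this.tokenize(text)) {
      stack.push(token);
      let reduced = true;
      while (reduced) {
        reduced = false;
        for (const rule of this.parseRules) {
          const n = rule.pattern.length;
          const tail = stack.slice(-n);
          if (tail.length === n && tail.every((t, i) => t.type === rule.pattern[i])) {
            stack.splice(-n, n, { type: rule.name, value: rule.action(tail) });
            reduced = true;
            break;
          }
        }
      }
    }
    return stack;
  }
}

// Example configuration (illustrative English-only rules).
const parser = new TinyRuleParser(
  [
    [/\d+/, "number"],
    [/:/, "colon"],
    [/(?:am|pm)\b/i, "meridiem"],
    [/\s+/, null],
    [/[a-z]+/i, "word"],
  ],
  [
    {
      name: "time",
      pattern: ["number", "colon", "number"],
      action: ([h, , m]) => ({ hour: Number(h.text), minute: Number(m.text) }),
    },
    {
      name: "time",
      pattern: ["time", "meridiem"],
      action: ([t, mer]) => ({
        hour: (t.value.hour % 12) + (mer.text.toLowerCase() === "pm" ? 12 : 0),
        minute: t.value.minute,
      }),
    },
  ]
);
```

Swapping in a different pair of rule arrays is all it takes to target another language or format, which is the appeal of keeping the rules as data rather than baking them into the parser.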

Depends on D121651

Attachment #9237355 - Attachment description: Bug 1612170 - Part 2: Add customisable parser for calendar item extraction. r=darktrojan → Bug 1612170 - Add customisable parser for calendar item extraction. r=darktrojan

This is still early days; only rules for parsing the included tests have been added so far. The CalExtractParserService
can be used instead of Extractor via a pref for experimentation.

Depends on D123287

Attachment #9234521 - Attachment is obsolete: true

Attachment #9240662 - Attachment is obsolete: true
Attachment #9240668 - Attachment is obsolete: true

Target Milestone: --- → 94 Branch
Keywords: leave-open

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/b8045b20b18f
Add customisable parser for calendar item extraction. r=darktrojan
https://hg.mozilla.org/comm-central/rev/cfa14d7c3650
Add CalExtractParserService to allow parsing and extract event info using alternative locales. r=darktrojan

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/c64418e8f10b
followup - fix black linting. rs=black-lint
Severity: normal → S3
Assignee: lasana → nobody
Status: ASSIGNED → NEW