Closed Bug 527018 Opened 12 years ago Closed 12 years ago

Attachment reminder does not works with Greek (and other non-latin)keywords

Categories

(Thunderbird :: Message Compose Window, defect, P2)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: pkst, Assigned: bwinton)

References

Details

(Keywords: l12y, Whiteboard: [has l10n impact])

Attachments

(3 files, 2 obsolete files)

Attachment reminder that recognizes only latin words as keywords
Steps to reproduce
1)Add a a Greek (or a Russian keyword) in the Keyword reminder dialog (or just copy and paste any greek word you like from http://el.wikipedia.org )
2)write this word to the composer window 

Result : nothing - no dialog appearing about a missing attachment

Expected result: Any word that is the same with the word in the attachments keyword list, should trigger the attachment reminder dialog.

This bug is a little bit annoying, since attachment reminder will be one of the new features of TB 3.0, but seems that it fails in most of the non-latin languages.
This looks like a serious issue for localizers. I can't test this at the moment, but if this turns out to be true, then one of the great hidden features of TB3 will be unusable for these locales:

Greek, Russian, Ukrainian, Arabian, Hebrew, Georgian, Punjabi, Sinhala, Korean, Japanese, Chinese, Tamil, Bengalese.

Furthermore, locales like French, German, Vietnamese also use non-Latin characters. 

Does this warrant an RC2?
Severity: normal → major
Flags: blocking-thunderbird3?
Keywords: l12y
Whiteboard: [has l10n impact]
Version: Trunk → 3.0
The German standard set of key words includes a word with an umlaut ("angehängt"). This key word triggers the attachment reminder as expected. The same for individual added key words including umlauts.
Copying a greek word from wikipedia, adding it as a key word and finally composing a new mail with this word, doesn't trigger the attachment reminder in the latest nightly 1.9.1 build.
Ok, I tested this with Kyrilic letters (Ukrainian and Russian), with Arabic letters, Georgian (Kartuli) letters and the letters of the Tamil language. None of them worked.

A test with Vietnamese words however showed a positive results. So this only seems to affect these locales: Greek, Russian, Ukrainian, Arabian, Hebrew, Georgian, Punjabi, Sinhala, Korean, Japanese, Chinese, Tamil, Bengalese
Japanese keywords works well, by the patch of bug 520706.
I don't test for Korean and Chinese, but it will be works for them.
Hmmm, (I am not a good coder) But if we follow the spirit of the  patch of https://bugzilla.mozilla.org/show_bug.cgi?id=520706 and add an a few extra lines in AttachmentChecker.js:

  else if (code >= 0x0370 && code <= 0x006FF)
    {
      // Greek, Cyrillic, Armenian, Hebrew,Arabic
      return true;
    }

reminder works OK in the above five languages
(In reply to comment #6)
> Hmmm, (I am not a good coder) But if we follow the spirit of the  patch of
> https://bugzilla.mozilla.org/show_bug.cgi?id=520706 and add an a few extra
> lines in AttachmentChecker.js:
> 
>   else if (code >= 0x0370 && code <= 0x006FF)
>     {
>       // Greek, Cyrillic, Armenian, Hebrew,Arabic
>       return true;
>     }
> 
> reminder works OK in the above five languages

Kostas would you mind doing a proper patch and attach it here ?
Attached patch Add support for 11 languages (obsolete) — Splinter Review
Greek, Cyrillic, Armenian, Hebrew,Arabic, Tamil, Telugu, Kannada, Malayam, Sinhala, Thai keywords should trigger attachment reminder now
Attachment #411160 - Flags: superreview?(bugzilla)
Attachment #411160 - Flags: review?(bugmail)
Whiteboard: [has l10n impact] → [has l10n impact][has patch][needs review asuth, sr standard8]
Marking blocking and P2, as we would hold for a day to get this.  

Note that the patch can't land as attached, because it's adding code to check for non-CJK languages in a method called IsCJKWord.  It would also be very very desirable to have automated mozmill tests for both the languages that are currently broken, as well as other languages that currently work.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Priority: -- → P2
Target Milestone: --- → Thunderbird 3.0rc1
Kostas, is this something you'll have time to work on more over the next day or so, or should I try and find someone else who can poke at it?
Comment on attachment 411160 [details] [diff] [review]
Add support for 11 languages

r- as per comment 9; no sr necessary since this patch only touches mail/.
Attachment #411160 - Flags: superreview?(bugzilla)
Attachment #411160 - Flags: review?(bugmail)
Attachment #411160 - Flags: review-
Whiteboard: [has l10n impact][has patch][needs review asuth, sr standard8] → [has l10n impact][needs input Kostas; new patch]
So, I've done some tests with http://www.mochikit.com/examples/mochiregexp/index.html, and it looks like putting "\b" before or after the Greek string "Θεωρία" causes the match to fail.

(Just to spell that out a little more, my input text is "abc aa Θεωρία def aaa bc".  The regexp text "/Θεωρία/" works just fine, but the regexp text "/\bΘεωρία/" and "/Θεωρία\b/" both fail.  I don't know why.

There's another example of it failing at http://pastebin.mozilla.org/682437
)

However, "/(\s|^)Θεωρία(\s|$)/" does still match correctly (as per http://stackoverflow.com/questions/629377/regex-word-boundary-for-multi-byte-strings).

So, how about a patch which uses those instead of "\b"?

Later,
Blake.
(And yeah, I would be willing to write the patch and take a shot at some unit-tests if Kostas doesn't have time to.)
Kostas says in comment #8 that he's not a good coder -- I'm not sure I believe it, but I'm not sure it's nice to put a short-deadline blocker on his shoulders.  So I suggest you go ahead and give it a whack, Blake.  Kostas can I'm sure help review the tests.

(It seems to me that we need some better understanding of what \b implies in the various languages, regardless.)

(In reply to comment #10)
> Kostas, is this something you'll have time to work on more over the next day or
> so, or should I try and find someone else who can poke at it?

Dan,better poke someone else. No time for a proper patch. 
Blake go for it.
Assignee: nobody → bwinton
Whiteboard: [has l10n impact][needs input Kostas; new patch] → [has l10n impact][needs new patch bwinton]
(In reply to comment #14)
> (It seems to me that we need some better understanding of what \b implies in
> the various languages, regardless.)

From the Stack Overflow link:
> POSIX regex doesn't specify any character classes for bytes outside the ASCII
> range, so it doesn't know that any of the bytes in "\xe7\xb4\x99" (the UTF-8
> representation of '紙') could be considered word-letters; hence it sees no word
> boundaries.

Now, I don't know if that is what's actually happening, but it makes sense.
Status: NEW → ASSIGNED
Attached patch A first cut at the patch. (obsolete) — Splinter Review
We may run into the behaviour described in bug 526724, but I've been running with this for a while, and we haven't run into it yet, so then again, we might not.

I know, I know, no tests.  I'm working on those next, but we probably want to get started on the review soon.

More info about the patch: NOT_W is the massaged output from http://www.codeproject.com/KB/dotnet/UnicodeCharCatHelper.aspx, as one of the examples on that page.  It's basically anything that isn't "Ll", "Lu" and "Lt", and should work like \W, if \W knew a little bit about Unicode.  Because of that, it doesn't catch other CJK characters, so it looks like I can remove the CJK check.  So I did.

Thanks,
Blake.
Attachment #411160 - Attachment is obsolete: true
Attachment #411312 - Flags: review?(mkmelin+mozilla)
I would also really appreciate it if Makoto Kato could check this change out, and make sure it doesn't break the fix for bug 520706.  (It seemed to mostly work in my testing, but I'm really shockingly unilingual.)
Whiteboard: [has l10n impact][needs new patch bwinton] → [has l10n impact][patch up, needs r=mkmelin]
Blake's patch works fine with all the languages that I've tested so far. (Chinese and Korean seem to working fine also)
Comment on attachment 411312 [details] [diff] [review]
A first cut at the patch.

Looks good to me, r=mkmelin
Please add a code comment about NOT_W though
Attachment #411312 - Flags: review?(mkmelin+mozilla) → review+
And here are some xpcshell tests for the behaviour.
Attachment #411485 - Flags: review?(bugzilla)
Just so that I don't lose it.

Carrying forward r=mkmelin from the previous patch.
Attachment #411312 - Attachment is obsolete: true
Attachment #411486 - Flags: review+
Whiteboard: [has l10n impact][patch up, needs r=mkmelin] → [has l10n impact][fix ready to land][tests need r?standard8]
Attachment #411485 - Flags: review?(bugzilla) → review+
Whiteboard: [has l10n impact][fix ready to land][tests need r?standard8] → [has l10n impact][ready to land]
Pushed as:
http://hg.mozilla.org/releases/comm-1.9.1/rev/d34978771235
http://hg.mozilla.org/comm-central/rev/6e6860f5b85c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [has l10n impact][ready to land] → [has l10n impact]
I will add Japanese test case more.
m_kato mentioned in irc that "添付mailを送る" should trigger "添付", so I'm re-opening this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ta-da.  Now with a fixed test, and four more in the same vein.

(I'm heading to the office in 30 minutes, so if it's okay and you want to push it before I get back, please feel free to, and I'll check it when I arrive.)

Thanks,
Blake.
Attachment #411657 - Flags: review?(bugzilla)
Attachment #411657 - Flags: review?(bugzilla) → review+
Comment on attachment 411657 [details] [diff] [review]
A patch to un-break the CJK support, as per m_kato's comments.

>+    // If the keyword starts (ends) with a CJK character, we don't char

As mentioned on irc the spelling needs fixing. r=Standard8 with that.
Fixed and pushed as:
http://hg.mozilla.org/releases/comm-1.9.1/rev/e3ebf36a049c
http://hg.mozilla.org/comm-central/rev/9542780ebf48

And with that, I believe this bug is well and truly fixed.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Flags: in-testsuite+
Verified fixed in Tb3 RC1 zh-TW.
Depends on: 647871
You need to log in before you can comment on or make changes to this bug.