Closed
Bug 527018
Opened 12 years ago
Closed 12 years ago
Attachment reminder does not works with Greek (and other non-latin)keywords
Categories
(Thunderbird :: Message Compose Window, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 3.0rc1
People
(Reporter: pkst, Assigned: bwinton)
References
Details
(Keywords: l12y, Whiteboard: [has l10n impact])
Attachments
(3 files, 2 obsolete files)
4.30 KB,
patch
|
standard8
:
review+
|
Details | Diff | Splinter Review |
3.23 KB,
patch
|
bwinton
:
review+
|
Details | Diff | Splinter Review |
4.67 KB,
patch
|
standard8
:
review+
|
Details | Diff | Splinter Review |
Attachment reminder that recognizes only latin words as keywords Steps to reproduce 1)Add a a Greek (or a Russian keyword) in the Keyword reminder dialog (or just copy and paste any greek word you like from http://el.wikipedia.org ) 2)write this word to the composer window Result : nothing - no dialog appearing about a missing attachment Expected result: Any word that is the same with the word in the attachments keyword list, should trigger the attachment reminder dialog. This bug is a little bit annoying, since attachment reminder will be one of the new features of TB 3.0, but seems that it fails in most of the non-latin languages.
Comment 1•12 years ago
|
||
This looks like a serious issue for localizers. I can't test this at the moment, but if this turns out to be true, then one of the great hidden features of TB3 will be unusable for these locales: Greek, Russian, Ukrainian, Arabian, Hebrew, Georgian, Punjabi, Sinhala, Korean, Japanese, Chinese, Tamil, Bengalese. Furthermore, locales like French, German, Vietnamese also use non-Latin characters. Does this warrant an RC2?
Severity: normal → major
Flags: blocking-thunderbird3?
Keywords: l12y
Whiteboard: [has l10n impact]
Version: Trunk → 3.0
Comment 2•12 years ago
|
||
The German standard set of key words includes a word with an umlaut ("angehängt"). This key word triggers the attachment reminder as expected. The same for individual added key words including umlauts.
Comment 3•12 years ago
|
||
Copying a greek word from wikipedia, adding it as a key word and finally composing a new mail with this word, doesn't trigger the attachment reminder in the latest nightly 1.9.1 build.
Comment 4•12 years ago
|
||
Ok, I tested this with Kyrilic letters (Ukrainian and Russian), with Arabic letters, Georgian (Kartuli) letters and the letters of the Tamil language. None of them worked. A test with Vietnamese words however showed a positive results. So this only seems to affect these locales: Greek, Russian, Ukrainian, Arabian, Hebrew, Georgian, Punjabi, Sinhala, Korean, Japanese, Chinese, Tamil, Bengalese
Comment 5•12 years ago
|
||
Japanese keywords works well, by the patch of bug 520706. I don't test for Korean and Chinese, but it will be works for them.
Reporter | ||
Comment 6•12 years ago
|
||
Hmmm, (I am not a good coder) But if we follow the spirit of the patch of https://bugzilla.mozilla.org/show_bug.cgi?id=520706 and add an a few extra lines in AttachmentChecker.js: else if (code >= 0x0370 && code <= 0x006FF) { // Greek, Cyrillic, Armenian, Hebrew,Arabic return true; } reminder works OK in the above five languages
Comment 7•12 years ago
|
||
(In reply to comment #6) > Hmmm, (I am not a good coder) But if we follow the spirit of the patch of > https://bugzilla.mozilla.org/show_bug.cgi?id=520706 and add an a few extra > lines in AttachmentChecker.js: > > else if (code >= 0x0370 && code <= 0x006FF) > { > // Greek, Cyrillic, Armenian, Hebrew,Arabic > return true; > } > > reminder works OK in the above five languages Kostas would you mind doing a proper patch and attach it here ?
Reporter | ||
Comment 8•12 years ago
|
||
Greek, Cyrillic, Armenian, Hebrew,Arabic, Tamil, Telugu, Kannada, Malayam, Sinhala, Thai keywords should trigger attachment reminder now
Updated•12 years ago
|
Attachment #411160 -
Flags: superreview?(bugzilla)
Attachment #411160 -
Flags: review?(bugmail)
Updated•12 years ago
|
Whiteboard: [has l10n impact] → [has l10n impact][has patch][needs review asuth, sr standard8]
Comment 9•12 years ago
|
||
Marking blocking and P2, as we would hold for a day to get this. Note that the patch can't land as attached, because it's adding code to check for non-CJK languages in a method called IsCJKWord. It would also be very very desirable to have automated mozmill tests for both the languages that are currently broken, as well as other languages that currently work.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Priority: -- → P2
Target Milestone: --- → Thunderbird 3.0rc1
Comment 10•12 years ago
|
||
Kostas, is this something you'll have time to work on more over the next day or so, or should I try and find someone else who can poke at it?
Comment 11•12 years ago
|
||
Comment on attachment 411160 [details] [diff] [review] Add support for 11 languages r- as per comment 9; no sr necessary since this patch only touches mail/.
Attachment #411160 -
Flags: superreview?(bugzilla)
Attachment #411160 -
Flags: review?(bugmail)
Attachment #411160 -
Flags: review-
Updated•12 years ago
|
Whiteboard: [has l10n impact][has patch][needs review asuth, sr standard8] → [has l10n impact][needs input Kostas; new patch]
Assignee | ||
Comment 12•12 years ago
|
||
So, I've done some tests with http://www.mochikit.com/examples/mochiregexp/index.html, and it looks like putting "\b" before or after the Greek string "Θεωρία" causes the match to fail. (Just to spell that out a little more, my input text is "abc aa Θεωρία def aaa bc". The regexp text "/Θεωρία/" works just fine, but the regexp text "/\bΘεωρία/" and "/Θεωρία\b/" both fail. I don't know why. There's another example of it failing at http://pastebin.mozilla.org/682437 ) However, "/(\s|^)Θεωρία(\s|$)/" does still match correctly (as per http://stackoverflow.com/questions/629377/regex-word-boundary-for-multi-byte-strings). So, how about a patch which uses those instead of "\b"? Later, Blake.
Assignee | ||
Comment 13•12 years ago
|
||
(And yeah, I would be willing to write the patch and take a shot at some unit-tests if Kostas doesn't have time to.)
Comment 14•12 years ago
|
||
Kostas says in comment #8 that he's not a good coder -- I'm not sure I believe it, but I'm not sure it's nice to put a short-deadline blocker on his shoulders. So I suggest you go ahead and give it a whack, Blake. Kostas can I'm sure help review the tests. (It seems to me that we need some better understanding of what \b implies in the various languages, regardless.)
Reporter | ||
Comment 15•12 years ago
|
||
(In reply to comment #10) > Kostas, is this something you'll have time to work on more over the next day or > so, or should I try and find someone else who can poke at it? Dan,better poke someone else. No time for a proper patch. Blake go for it.
Updated•12 years ago
|
Assignee: nobody → bwinton
Whiteboard: [has l10n impact][needs input Kostas; new patch] → [has l10n impact][needs new patch bwinton]
Assignee | ||
Comment 16•12 years ago
|
||
(In reply to comment #14) > (It seems to me that we need some better understanding of what \b implies in > the various languages, regardless.) From the Stack Overflow link: > POSIX regex doesn't specify any character classes for bytes outside the ASCII > range, so it doesn't know that any of the bytes in "\xe7\xb4\x99" (the UTF-8 > representation of '紙') could be considered word-letters; hence it sees no word > boundaries. Now, I don't know if that is what's actually happening, but it makes sense.
Status: NEW → ASSIGNED
Assignee | ||
Comment 17•12 years ago
|
||
We may run into the behaviour described in bug 526724, but I've been running with this for a while, and we haven't run into it yet, so then again, we might not. I know, I know, no tests. I'm working on those next, but we probably want to get started on the review soon. More info about the patch: NOT_W is the massaged output from http://www.codeproject.com/KB/dotnet/UnicodeCharCatHelper.aspx, as one of the examples on that page. It's basically anything that isn't "Ll", "Lu" and "Lt", and should work like \W, if \W knew a little bit about Unicode. Because of that, it doesn't catch other CJK characters, so it looks like I can remove the CJK check. So I did. Thanks, Blake.
Attachment #411160 -
Attachment is obsolete: true
Attachment #411312 -
Flags: review?(mkmelin+mozilla)
Assignee | ||
Comment 18•12 years ago
|
||
I would also really appreciate it if Makoto Kato could check this change out, and make sure it doesn't break the fix for bug 520706. (It seemed to mostly work in my testing, but I'm really shockingly unilingual.)
Whiteboard: [has l10n impact][needs new patch bwinton] → [has l10n impact][patch up, needs r=mkmelin]
Reporter | ||
Comment 19•12 years ago
|
||
Blake's patch works fine with all the languages that I've tested so far. (Chinese and Korean seem to working fine also)
Comment 20•12 years ago
|
||
Comment on attachment 411312 [details] [diff] [review] A first cut at the patch. Looks good to me, r=mkmelin Please add a code comment about NOT_W though
Attachment #411312 -
Flags: review?(mkmelin+mozilla) → review+
Assignee | ||
Comment 21•12 years ago
|
||
And here are some xpcshell tests for the behaviour.
Attachment #411485 -
Flags: review?(bugzilla)
Assignee | ||
Comment 22•12 years ago
|
||
Just so that I don't lose it. Carrying forward r=mkmelin from the previous patch.
Attachment #411312 -
Attachment is obsolete: true
Attachment #411486 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Whiteboard: [has l10n impact][patch up, needs r=mkmelin] → [has l10n impact][fix ready to land][tests need r?standard8]
Updated•12 years ago
|
Attachment #411485 -
Flags: review?(bugzilla) → review+
Updated•12 years ago
|
Whiteboard: [has l10n impact][fix ready to land][tests need r?standard8] → [has l10n impact][ready to land]
Assignee | ||
Comment 23•12 years ago
|
||
Pushed as: http://hg.mozilla.org/releases/comm-1.9.1/rev/d34978771235 http://hg.mozilla.org/comm-central/rev/6e6860f5b85c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [has l10n impact][ready to land] → [has l10n impact]
Comment 24•12 years ago
|
||
I will add Japanese test case more.
Assignee | ||
Comment 25•12 years ago
|
||
m_kato mentioned in irc that "添付mailを送る" should trigger "添付", so I'm re-opening this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 26•12 years ago
|
||
Ta-da. Now with a fixed test, and four more in the same vein. (I'm heading to the office in 30 minutes, so if it's okay and you want to push it before I get back, please feel free to, and I'll check it when I arrive.) Thanks, Blake.
Attachment #411657 -
Flags: review?(bugzilla)
Updated•12 years ago
|
Attachment #411657 -
Flags: review?(bugzilla) → review+
Comment 27•12 years ago
|
||
Comment on attachment 411657 [details] [diff] [review] A patch to un-break the CJK support, as per m_kato's comments. >+ // If the keyword starts (ends) with a CJK character, we don't char As mentioned on irc the spelling needs fixing. r=Standard8 with that.
Assignee | ||
Comment 28•12 years ago
|
||
Fixed and pushed as: http://hg.mozilla.org/releases/comm-1.9.1/rev/e3ebf36a049c http://hg.mozilla.org/comm-central/rev/9542780ebf48 And with that, I believe this bug is well and truly fixed.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Flags: in-testsuite+
Comment 29•12 years ago
|
||
Verified fixed in Tb3 RC1 zh-TW.
You need to log in
before you can comment on or make changes to this bug.
Description
•