Closed
Bug 221024
Opened 21 years ago
Closed 21 years ago
transliterator needs an option to turn ignorable characters to nothingness
Categories
(Core :: Internationalization, defect)
RESOLVED
FIXED
People
(Reporter: andreasprilopwww, Assigned: jshin1987)
Details
(Keywords: intl)
Attachments
(3 files, 1 obsolete file)
107.45 KB, patch | smontagu: review+, rbs: superreview+
108.11 KB, patch | smontagu: review+
4.57 KB, text/plain
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0

Unix systems usually have Hebrew fonts with ISO-8859-8 repertoire. There are no points in ISO-8859-8. Currently Unix Mozilla displays a question mark for each Hebrew point, which makes such text unreadable. IMHO it is better to suppress the points and to show only the Hebrew letters.

Reproducible: Always

Steps to Reproduce:
1. Install Hebrew ISO-8859-8 fonts on your Unix system.
2. <http://www.unics.uni-hannover.de/nhtcapri/hebrew.win>
3. <http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html#hebrew>

Actual Results: Hebrew points (U+0591 to U+05C4) are displayed as question marks.

Expected Results: Hebrew points (U+0591 to U+05C4) should not be displayed as question marks but should be suppressed. Hebrew text with points should be displayed on Unix just as letters without points.
Comment 1•21 years ago
This is a fonts and text layout issue, not a Bidi issue.
Component: BiDi Hebrew & Arabic → Layout: Fonts and Text
Comment 3•21 years ago
This makes a lot of sense. Hebrew script is an "abjad", i.e. the main letters of the writing system are consonants or long vowels, and other vowels are represented by secondary marks. Unlike diacritics in European languages, if these secondary marks are not rendered at all, the text is still correct and still readable, so as long as no glyphs are available for them, it would be much better to fall back to nothing than to the usual question mark. The same would apply to Arabic and Syriac vowel signs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee
Comment 4•21 years ago
In Gfx:Gtk, the 'transliterator' (nsISaveAsCharset) is called when no font is available to cover a character. We may need to modify our 'transliterator' to give finer-grained control over how the characters passed in are handled. Related to this bug (for Xft) is bug 204993.
Keywords: intl
Comment 5•21 years ago
I was wondering about the transliterator. Does it allow for transliterating to nothing? If so, the fix is easy.
Assignee
Comment 6•21 years ago
Yes, it does (attr_FallbackNone). Not only Hebrew/Arabic/Syriac vowel marks but also Unicode default ignorable code points (that are not covered by any other loaded fonts) have to be turned into nothing. As it stands, Gfx:GTK invokes the transliterator with attr_FallbackQuestionMark. However, we can't just call it with attr_FallbackNone because some characters had better be represented by question marks instead of being turned to nothing. What might be done easily is to remove Hebrew/Arabic/Syriac vowel signs and default ignorable code points before invoking the transliterator in nsFontGTKSubstitute::Convert.
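The pre-filtering idea in comment 6 can be sketched roughly as follows. This is a standalone illustration, not the Gecko code: isIgnorable, stripIgnorables, questionMarkFallback and convertUncovered are hypothetical names, the ranges are only a small sample of ignorable code points (the Hebrew range is the one given in the original report), and the stand-in fallback simply treats every non-ASCII character as uncovered.

```cpp
#include <string>

// Hypothetical helper: true for a small, illustrative sample of code points
// that could be dropped when no font covers them.
bool isIgnorable(char32_t c) {
    return (c >= 0x0591 && c <= 0x05C4)   // Hebrew points, range as given in the report
        || c == 0x00AD                    // SOFT HYPHEN (default ignorable)
        || (c >= 0x200B && c <= 0x200F);  // zero-width and directional marks (default ignorable)
}

// Step 1: remove ignorable code points before any fallback is applied.
std::u32string stripIgnorables(const std::u32string& text) {
    std::u32string out;
    for (char32_t c : text) {
        if (!isIgnorable(c)) out.push_back(c);
    }
    return out;
}

// Step 2: stand-in for a transliterator called with a question-mark fallback.
// In this sketch every non-ASCII character counts as uncovered.
std::u32string questionMarkFallback(const std::u32string& text) {
    std::u32string out;
    for (char32_t c : text) out.push_back(c < 0x80 ? c : U'?');
    return out;
}

// Ignorables vanish; every other uncovered character still becomes '?'.
std::u32string convertUncovered(const std::u32string& text) {
    return questionMarkFallback(stripIgnorables(text));
}
```

With this ordering, a Hebrew point inside otherwise covered text disappears, while an uncovered character outside the ignorable set is still replaced by a question mark, which is the distinction attr_FallbackNone alone cannot make.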
Assignee
Comment 7•21 years ago
A couple of alternatives to what I suggested in comment #6 are:
1. Add attr_FallbackIgnoreIgnorable (and attr_FallbackIgnoreHebrewArabicSyriacVowels if they need to be controlled separately) to nsISaveAsCharset.
2. Explicitly add default_ignorable_codepoints and Hebrew/Arabic/Syriac vowel marks/points/signs to the transliterate.properties file (gentransliterate.pl) to turn them to nothingness (as is done for InvisibleTimes/ApplyFunction).
I prefer the first method to both the second and what I came up with in comment #6 because it's an XP solution and doesn't bloat the transliteration table unnecessarily. I'm taking this and will upload a patch once my computer desk arrives :-)
Assignee: font → jshin
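The first alternative in comment 7, an extra attribute bit that callers can combine with the existing fallback attribute, might look roughly like the sketch below. The constant names and bit values here are hypothetical and chosen only for illustration; the real nsISaveAsCharset constants are not shown in this bug and may differ. As a simplification, every non-ASCII character is treated as not covered by the target charset.

```cpp
#include <cstdint>
#include <string>

// Hypothetical attribute bits (illustrative values only).
enum : std::uint32_t {
    kFallbackNone         = 0,        // uncovered characters become nothing
    kFallbackQuestionMark = 1u << 0,  // uncovered characters become '?'
    kIgnoreIgnorables     = 1u << 1   // models the proposed attr_FallbackIgnoreIgnorable
};

// Hypothetical predicate with a small sample of ignorable code points;
// the Hebrew range is the one given in the original report.
bool isIgnorableSample(char32_t c) {
    return (c >= 0x0591 && c <= 0x05C4) || c == 0x00AD ||
           (c >= 0x200B && c <= 0x200F);
}

// Sketch of a converter that honours the extra attribute bit.
std::u32string convertWithAttrs(const std::u32string& text, std::uint32_t attrs) {
    std::u32string out;
    for (char32_t c : text) {
        if (c < 0x80) {               // "covered" characters pass through
            out.push_back(c);
            continue;
        }
        if ((attrs & kIgnoreIgnorables) && isIgnorableSample(c)) {
            continue;                 // ignorable: emit nothing
        }
        if (attrs & kFallbackQuestionMark) {
            out.push_back(U'?');      // ordinary uncovered character
        }
        // With kFallbackNone, uncovered characters are silently dropped.
    }
    return out;
}
```

A caller such as the GTK font code could then keep its question-mark fallback and opt in to dropping ignorables by passing kFallbackQuestionMark | kIgnoreIgnorables, without any platform-specific pre-filtering.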
Assignee
Comment 8•21 years ago
I'm changing the summary line because this applies not only to Hebrew points but also to a number of other characters (default ignorable code points in Unicode) that need to be turned to 'nothingness' if we can't find any font to cover them.
Status: NEW → ASSIGNED
Component: Layout: Fonts and Text → Internationalization
OS: SunOS → All
Hardware: Sun → All
Summary: Hebrew points are shown as question marks on Unix → transliterator needs an option to turn ignorable characters to nothingness
Assignee
Comment 9•21 years ago
This patch makes intl/ depend on gfx/, which I'd rather avoid, but if it's all right, we can go with this. smontagu, can you come up with the list of Hebrew/Arabic/Syriac points to turn to nothing in transliteration?
Assignee
Comment 10•21 years ago
This patch has to go along with my patch for bug 221666. With this, intl doesn't depend on gfx.
Assignee
Comment 11•21 years ago
To avoid an unnecessary hassle, I decided to go with the first approach. This patch is the same as the first one except that two files I forgot to 'cvs diff' (GFX:Win and GFX:Mac) are included. The character list is still tentative, though, because I'm not sure exactly which characters in the Hebrew/Arabic/Syriac blocks can be turned to nothing. Vowel points are obvious, but there are some other points I'm not sure of.
Assignee
Updated•21 years ago
Attachment #132929 -
Attachment is obsolete: true
Comment 12•21 years ago
I think that the list of codepoints is going to contain more or less all codepoints in the Hebrew and Arabic blocks (i.e. from U+0590 to U+06FF) with NSM as bidi category. In theory it should contain Syriac and Thaana as well, but we need more work to support these scripts in the first place.
Comment 13•21 years ago
This list is grep'ed from the Unicode Character Database as described in my last comment. I am happy with the Hebrew part, but I would like to get confirmation for the Arabic part.
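The selection described in comments 12 and 13 can be sketched as a filter over UnicodeData.txt records, which are semicolon-separated with the code point in field 0 and the bidirectional class in field 4. The function names below are hypothetical, and this is only an approximation of the grep that was actually run, which is not shown in the bug.

```cpp
#include <exception>
#include <string>
#include <vector>

// Split one semicolon-separated UnicodeData.txt record into its fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = line.find(';', start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}

// True if the record describes a code point in U+0590..U+06FF whose
// bidirectional class (field 4) is NSM.
bool isAbjadPointRecord(const std::string& line) {
    const std::vector<std::string> fields = splitFields(line);
    if (fields.size() < 5) return false;
    unsigned long cp = 0;
    try {
        cp = std::stoul(fields[0], nullptr, 16);  // field 0 is hexadecimal
    } catch (const std::exception&) {
        return false;                             // malformed code point field
    }
    return cp >= 0x0590 && cp <= 0x06FF && fields[4] == "NSM";
}
```

Running isAbjadPointRecord over every line of UnicodeData.txt and collecting the matching code points would give a candidate list of this kind, which still needs the manual review asked for above.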
Assignee
Comment 14•21 years ago
Comment on attachment 132978: same as the first patch with two missing files for GFX:Win and GFX:Mac added
Asking for r/sr. Thank you for the list. I've merged the list with the list of default ignorable codepoints and generated a new version of ignorables_abjadpoints.x-ccmap. Because it's mostly for machines, I'm not going to upload the new version here.
Attachment #132978 -
Flags: superreview?(rbs)
Attachment #132978 -
Flags: review?(smontagu)
Comment 15•21 years ago
Comment on attachment 132978: same as the first patch with two missing files for GFX:Win and GFX:Mac added
r=smontagu
Attachment #132978 -
Flags: review?(smontagu) → review+
Comment 16•21 years ago
> To avoid an unnecessary hassle,
What hassle? I like the alternative better since the undue dependency can be avoided. No need to jump the queue :-) Once you finish off bug 221666, this will follow too.
Assignee
Comment 17•21 years ago
I meant splitting nsCompressedCharMap.h into two parts, with the macros for accessing CCMaps put in a new file nsCCMapDefines.h (in intl/unicharutil/public) and the rest left in gfx (see attachment 132925). In addition, nsCompressedCharMap.h (with the macros for accessing CCMaps removed) may as well be moved out of gfx/public into gfx/src (because it doesn't have to be exported any more). If you think avoiding the undue dependency is important, I can follow that path. Can you review attachment 132925 (in bug 221666)? I asked blizzard for sr (for removing) without even CCing him, but you should be as appropriate as he is.
Assignee
Updated•21 years ago
Attachment #132978 -
Flags: superreview?(rbs)
Assignee
Comment 19•21 years ago
Comment on attachment 132930: alternative
With bug 221666 fixed (not yet landed, but I'll land it as soon as 1.5b starts), this one works with the last line in the following code snippet (nsSaveAsCharset)
#include "nsSaveAsCharset.h"
#include "nsCRT.h"
#include "nsUnicharUtils.h"
+#include "nsCCMapDefines.h"
replaced by
+#include "nsCompressedCharMap.h"
So, I'm asking for r/sr now.
Attachment #132930 -
Flags: superreview?(rbs)
Attachment #132930 -
Flags: review?(smontagu)
Comment 20•21 years ago
Comment on attachment 132930: alternative
sr=rbs
Attachment #132930 -
Flags: superreview?(rbs) → superreview+
Comment 21•21 years ago
Comment on attachment 132930: alternative
r=smontagu
Attachment #132930 -
Flags: review?(smontagu) → review+
Assignee
Comment 22•21 years ago
Fix checked into the trunk.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED