Closed Bug 221024 Opened 21 years ago Closed 21 years ago

transliterator needs an option to turn ignorable characters to nothingness

Categories

(Core :: Internationalization, defect)

Not set
normal

RESOLVED FIXED

People

(Reporter: andreasprilopwww, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020921 Netscape/7.0

Unix systems usually have Hebrew fonts that cover only the ISO-8859-8 repertoire.
ISO-8859-8 contains no Hebrew points. Currently Unix Mozilla displays a
question mark for each Hebrew point, which makes such text unreadable.
IMHO it is better to suppress the points and to show only the Hebrew letters.

Reproducible: Always

Steps to Reproduce:
1. Install Hebrew ISO-8859-8 fonts on your Unix system.
2. <http://www.unics.uni-hannover.de/nhtcapri/hebrew.win>
3. <http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html#hebrew>
Actual Results:  
Hebrew points (U+0591 to U+05C4) are displayed as question marks.

Expected Results:  
Hebrew points (U+0591 to U+05C4) should not be displayed
as question marks but should be suppressed.
Hebrew text with points should be displayed on Unix just
as letters without points.
This is a fonts and text layout issue, not a Bidi issue.
Component: BiDi Hebrew & Arabic → Layout: Fonts and Text
Assignee: mkaply → font
QA Contact: zach → ian
This makes a lot of sense. Hebrew script is an "abjad", i.e. the main letters of
the writing system are consonants or long vowels, and other vowels are
represented by secondary marks. Unlike diacritics in European languages, if
these secondary marks are not rendered at all, the text is still correct and
still readable, so as long as no glyphs are available for them, it would be much
better to fall back to nothing than to the usual question mark.

The same would apply to Arabic and Syriac vowel signs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
In Gfx:Gtk, the 'transliterator' (nsISaveAsCharset) is invoked when no font is
available to cover a character. We may need to modify our 'transliterator' to
allow finer-grained control over how the characters passed in are handled.
The related bug for Xft is bug 204993.
Keywords: intl
I was wondering about the transliterator. Does it allow for transliterating to
nothing? If so, the fix is easy.
Yes, it does (attr_FallbackNone). Not only Hebrew/Arabic/Syriac vowel marks but
also Unicode default ignorable code points (that are not covered by any other
loaded font) have to be turned into nothing. As it stands, Gfx:GTK invokes the
transliterator with attr_FallbackQuestionMark. However, we can't just call it
with attr_FallbackNone, because some characters are better represented by
question marks than turned into nothing. What might be done easily is
to remove Hebrew/Arabic/Syriac vowel signs and default ignorable code points
before invoking the transliterator in nsFontGTKSubstitute::Convert.
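A minimal sketch of that pre-filtering step, in standalone C++ rather than Mozilla code. The names IsIgnorable and StripIgnorables are hypothetical, and the character test is only a stand-in: it uses the coarse U+0591 to U+05C4 range from the original report plus a handful of well-known default ignorable code points, not the full list that a real patch would need.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate, for illustration only. The range below is the
// coarse one quoted in the report; it is not an exact list of points.
static bool IsIgnorable(char16_t ch) {
  if (ch >= 0x0591 && ch <= 0x05C4) return true;  // Hebrew points (approximate)
  switch (ch) {
    case 0x00AD:  // SOFT HYPHEN
    case 0x200B:  // ZERO WIDTH SPACE
    case 0x200C:  // ZERO WIDTH NON-JOINER
    case 0x200D:  // ZERO WIDTH JOINER
    case 0x2060:  // WORD JOINER
    case 0xFEFF:  // ZERO WIDTH NO-BREAK SPACE
      return true;
    default:
      return false;
  }
}

// Returns a copy of `in` with ignorable characters dropped. The result is
// what would then be handed to the transliterator with a '?' fallback.
std::u16string StripIgnorables(const std::u16string& in) {
  std::u16string out;
  out.reserve(in.size());
  for (char16_t ch : in) {
    if (!IsIgnorable(ch)) out.push_back(ch);
  }
  assert(out.size() <= in.size());  // filtering never grows the string
  return out;
}
```

With this in place, pointed Hebrew text would reach the fallback converter as bare letters, while any other uncovered character would still come out as a question mark.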
A couple of alternatives to what I suggested in comment #6 are:

  1. add attr_FallbackIgnoreIgnorable (and
     attr_FallbackIgnoreHebrewArabicSyriacVowels if they need to be controlled
     separately) to nsISaveAsCharset

  2. explicitly add default_ignorable_codepoints and Hebrew/Arabic/Syriac vowel
     marks/points/signs to the transliterate.properties file (gentransliterate.pl)
     to turn them to nothingness (as is done for InvisibleTimes/ApplyFunction)

I prefer the first method to both the second and what I came up with in comment #6,
because it's an XP (cross-platform) solution and doesn't bloat the transliteration
table unnecessarily.
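To illustrate the first alternative, here is a rough standalone sketch of a converter whose fallback behaviour is controlled by flags. Everything in it is an assumption made for illustration: the flag names and values (kFallbackQuestionMark, kIgnoreIgnorables) do not reproduce the real nsISaveAsCharset constants, the target charset is plain 7-bit ASCII, and the ignorable test is a coarse placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag values, not the real nsISaveAsCharset attributes.
enum : std::uint32_t {
  kFallbackQuestionMark = 0x1,  // unconvertible characters become '?'
  kIgnoreIgnorables     = 0x2,  // ignorable characters are dropped instead
};

// Placeholder test: reported Hebrew point range plus a few zero-width characters.
static bool IsIgnorablePoint(char16_t ch) {
  return (ch >= 0x0591 && ch <= 0x05C4) ||
         ch == 0x200B || ch == 0x200C || ch == 0x200D;
}

// Converts UTF-16 code units to a toy 7-bit ASCII target, applying the
// fallback policy selected by `flags` to everything outside that range.
std::string ToAsciiWithFallback(const std::u16string& in, std::uint32_t flags) {
  std::string out;
  for (char16_t ch : in) {
    if (ch < 0x80) {
      out.push_back(static_cast<char>(ch));
      continue;
    }
    // The ignorable check runs first, so it overrides the '?' fallback.
    if ((flags & kIgnoreIgnorables) && IsIgnorablePoint(ch)) continue;
    if (flags & kFallbackQuestionMark) out.push_back('?');
  }
  assert(out.size() <= in.size());  // at most one output char per code unit
  return out;
}
```

The point of the design is that callers keep the question-mark fallback for characters that really are missing, and opt in to silent removal only for the ignorable set, without touching the transliteration table.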

I'm taking this and gonna upload a patch once my computer desk arrives :-)
Assignee: font → jshin
Depends on: 221666
I'm changing the summary line because it's not only Hebrew points but also
a bunch of other characters (default ignorable code points in Unicode) that need
to be turned to 'nothingness' if we can't find any font to cover them.
Status: NEW → ASSIGNED
Component: Layout: Fonts and Text → Internationalization
OS: SunOS → All
Hardware: Sun → All
Summary: Hebrew points are shown as question marks on Unix → transliterator needs an option to turn ignorable characters to nothingness
Attached patch a simple patch (obsolete)
This patch makes intl/ depend on gfx/, which I'd rather avoid, but if it's all
right, we can go with this. smontagu, can you come up with the list of
Hebrew/Arabic/Syriac points to turn to nothing in transliteration?
Attached patch alternative
This patch has to go along with my patch for bug 221666. With this, intl
doesn't depend on gfx.
To avoid an unnecessary hassle, I decided to go with the first approach. This
patch is the same as the first one except that two files I forgot to 'cvs diff'
(GFX:Win and GFX:Mac) are included. The character list is still tentative,
though, because I'm not sure exactly which characters can be turned to nothing in
the Hebrew/Arabic/Syriac blocks. Vowel points are obvious, but there are some other
points I'm not sure of.
Attachment #132929 - Attachment is obsolete: true
I think that the list of codepoints is going to contain more or less all
codepoints in the Hebrew and Arabic blocks (i.e. from U+0590 to U+06FF) that have
NSM as their bidi category. In theory it should contain Syriac and Thaana as
well, but we need more work to support these scripts in the first place.
This list is grep'ed from the Unicode Character Database as described in my
last comment. I am happy with the Hebrew part, but I would like to get
confirmation for the Arabic part.
Comment on attachment 132978
same as the first patch with two missing files for GFX:Win and GFX:Mac added

Asking for r/sr.

Thank you for the list. I've merged the list with the list of default ignorable
codepoints and generated a new version of ignorables_abjadpoints.x-ccmap.
Because it's mostly for machines, I'm not going to upload the new version here.
Attachment #132978 - Flags: superreview?(rbs)
Attachment #132978 - Flags: review?(smontagu)
No longer depends on: 221666
Comment on attachment 132978
same as the first patch with two missing files for GFX:Win and GFX:Mac added

r=smontagu
Attachment #132978 - Flags: review?(smontagu) → review+
> To avoid an unnecessary hassle,

what hassle? I like the alternative better since the undue dependency can be
avoided. No need to jump the queue :-) Once you finish off  bug 221666, this
will follow too.
I meant splitting nsCompressedCharMap.h into two parts, with the macros for accessing
CCMaps put in a new file nsCCMapDefines.h (in intl/unicharutil/public) and the
rest left in gfx (see attachment 132925). In addition, nsCompressedCharMap.h
(with the macros for accessing CCMaps removed) may as well be moved out of
gfx/public into gfx/src (because it doesn't have to be exported any more).
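For readers unfamiliar with the idea behind a compressed character map, the following standalone sketch shows a generic two-level sparse bitmap over the BMP. It is not the actual CCMap layout or its accessor macros, which this thread does not spell out; the class name SparseCharSet and its methods are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Generic two-level bitmap: 256 pages of 256 bits each, with pages
// allocated only when a character in them is added.
class SparseCharSet {
 public:
  void Add(char16_t ch) {
    auto& page = pages_[ch >> 8];
    if (!page) page = std::make_unique<Page>();  // value-initialised to zero
    (*page)[(ch & 0xFF) >> 5] |= 1u << (ch & 31);
    assert(Has(ch));  // postcondition: the bit is now set
  }

  bool Has(char16_t ch) const {
    const auto& page = pages_[ch >> 8];
    if (!page) return false;  // whole page absent: character not in set
    return (((*page)[(ch & 0xFF) >> 5] >> (ch & 31)) & 1u) != 0;
  }

 private:
  using Page = std::array<std::uint32_t, 8>;  // 8 x 32 = 256 bits
  std::array<std::unique_ptr<Page>, 256> pages_;
};
```

A set of ignorable characters clusters in a few pages, so most of the 256 slots stay empty and a membership test costs two array lookups. Keeping only the read-side accessors in a small header is what would let intl use such a map without depending on the rest of gfx.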

If you think avoiding the undue dependency is important, I can follow that path.
Can you review attachment 132925 (in bug 221666)? I asked blizzard for sr (for
removing) without even CCing him, but you should be as appropriate a reviewer as he is.
bring back the dependency
Depends on: 221666
Attachment #132978 - Flags: superreview?(rbs)
Comment on attachment 132930
alternative

With bug 221666 fixed (not yet landed, but I'll land it as soon as 1.5b starts), this
one works with the last line in the following code snippet (nsSaveAsCharset)

 #include "nsSaveAsCharset.h"
 #include "nsCRT.h"
 #include "nsUnicharUtils.h"
+#include "nsCCMapDefines.h"

replaced by

+#include "nsCompressedCharMap.h"

So, I'm asking for r/sr now.
Attachment #132930 - Flags: superreview?(rbs)
Attachment #132930 - Flags: review?(smontagu)
Comment on attachment 132930
alternative

sr=rbs
Attachment #132930 - Flags: superreview?(rbs) → superreview+
Comment on attachment 132930
alternative

r=smontagu
Attachment #132930 - Flags: review?(smontagu) → review+
fix checked into the trunk
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED