Bug 232487 (Closed)
Opened 21 years ago; closed 21 years ago
Text identified as lang="sa" (Sanskrit) uses Western fonts, not Devanagari fonts
Categories: Core :: Internationalization, defect
Status: RESOLVED FIXED
People: Reporter: jamie; Assigned: jshin1987
Keywords: fixed1.7, intl, l12y
Attachments (5 files, 2 obsolete files):
- 8.79 KB, patch
- 3.18 KB, text/plain
- 886 bytes, text/plain
- 9.75 KB, patch (momoi: review+; blizzard: superreview+; chofmann: approval1.7+)
- 469 bytes, patch (jshin1987: review+; blizzard: superreview+)
User-Agent:
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031225 Firebird/0.7
In the example page, the first piece of devanagari text (marked as lang="hi") is
displayed by Mozilla (and Mozilla Firebird) using whatever font the user has set
for Devanagari. The second piece of devanagari text (marked as lang="sa") is
displayed by those browsers using whatever font the user has set for Western.
This is problematic because some fonts which include devanagari characters do
not handle conjunct characters properly (e.g., Arial Unicode MS), but Mozilla may
use characters from this font if it cannot find them in the specified Western
font. This makes texts marked up as Sanskrit display very poorly (and is indeed
how I noticed the problem).
Reproducible: Always
Steps to Reproduce:
1. Set Devanagari font to Raghindi (or other font suitable for displaying
devanagari)
2. Set Western font to Code2000 (or other font different from the first, but
suitable for displaying devanagari)
3. Visit the supplied URL
Actual Results:
The two pieces of Devanagari text are displayed using different fonts.
Expected Results:
Both pieces of Devanagari text are displayed using the same font (the one
specified for Devanagari).
Comment 1•21 years ago
Good catch. "sa", "mr" (Marathi), "ne" (Nepali) and probably other languages
should all be treated as in the Devanagari langGroup, by which we really mean a
script group.
There is a list of languages written in Devanagari script at
http://omniglot.com/writing/devanagari.htm.
Status: UNCONFIRMED → NEW
Component: Layout: Fonts and Text → Internationalization
Ever confirmed: true
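Comment 1's point can be illustrated with a small lookup. This is a hypothetical sketch: the table is not the real contents of langGroups.properties, the Devanagari group is assumed to be spelled "x-devanagari", and the fallback to "x-western" only mimics the behaviour reported above. It shows why an unmapped "sa" ends up with the Western font preferences and how mapping "sa", "mr" and "ne" changes that.

```python
# Hypothetical language -> langGroup table in the spirit of langGroups.properties.
# Not the real file's contents; "x-devanagari" is an assumed group name.
LANG_GROUPS = {
    "hi": "x-devanagari",  # Hindi
    "mr": "x-devanagari",  # Marathi
    "ne": "x-devanagari",  # Nepali
    "sa": "x-devanagari",  # Sanskrit, the language from this bug report
    "en": "x-western",
}


def lang_group(tag: str, default: str = "x-western") -> str:
    """Return the langGroup for a tag such as "sa" or "hi-IN".

    Only the primary subtag is consulted. Languages missing from the table
    fall back to `default`, which mimics how an unmapped "sa" was rendered
    with the Western font preferences.
    """
    primary = tag.strip().lower().split("-", 1)[0]
    return LANG_GROUPS.get(primary, default)


if __name__ == "__main__":
    for tag in ("hi", "sa", "ne-NP", "mi"):
        print(tag, "->", lang_group(tag))
```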
Assignee
Comment 2•21 years ago
Indeed, a good catch. Currently, there are 170 languages listed in
intl/locale/src/language.properties, while there are only 86 languages in
langGroups.properties. We may have to go through the two lists and add the
missing ones to langGroups.properties.
Assignee: nobody → jshin
Assignee
Comment 3•21 years ago
Sorry. Actually, langGroups.properties has only 47 languages mapped to langGroups
(script groups).
Simon, what do you think we should do for languages written in multiple scripts
(I don't mean languages like Japanese, but languages like Azeri and Mongolian)?
Would adding script 'identifiers' like '-latn' work?
Comment 4•21 years ago
I would dearly like to move towards including ISO 15924 script codes in language
tags, but we should probably wait for the successor of RFC 3066. The latest
draft is at http://www.ietf.org/internet-drafts/draft-phillips-langtags-00.txt
and discussion takes place in the ietf-languages list archived at
http://eikenes.alvestrand.no/pipermail/ietf-languages/
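The question about script identifiers like '-latn', together with Simon's wish to carry ISO 15924 codes in language tags, can be sketched as a lookup that prefers an explicit script subtag. This assumes, hypothetically, that the script code appears as a four-letter subtag right after the language subtag (e.g. "az-Latn") and that it overrides a per-language default; both tables and all group names are illustrative, not Mozilla's.

```python
# Hypothetical tables for illustration only; the group names are assumptions.
SCRIPT_GROUPS = {           # ISO 15924 code (lowercased) -> langGroup
    "latn": "x-western",
    "cyrl": "x-cyrillic",
    "deva": "x-devanagari",
}
LANG_DEFAULTS = {           # used when the tag carries no script subtag
    "az": "x-western",      # Azeri
    "mn": "x-cyrillic",     # Mongolian
    "sa": "x-devanagari",   # Sanskrit
}


def lang_group_with_script(tag: str, default: str = "x-western") -> str:
    """Pick a langGroup, preferring an explicit script subtag such as "az-Cyrl"."""
    subtags = tag.strip().lower().split("-")
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        # A four-letter alphabetic subtag right after the language: a script code.
        return SCRIPT_GROUPS.get(subtags[1], default)
    # No script subtag (e.g. "mn" or "mn-MN"): use the language's default.
    return LANG_DEFAULTS.get(subtags[0], default)
```

With this rule, "az-Cyrl" and "az-Latn" resolve to different groups, while a bare "az" still gets a single default.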
Reporter
Comment 5•21 years ago
I've gone through langGroups.properties and language.properties and filled in a
few gaps in the former among languages listed in the latter. Patch to follow.
Which files determine which languages are recognised by Mozilla? It would be
nice to add recognition/support for, say, Maori (which uses the Latin alphabet).
Reporter
Comment 6•21 years ago
Assignee
Comment 7•21 years ago
I went about half way through the list the other day. I'll finish that up
sometime soon.
Status: NEW → ASSIGNED
Keywords: l12y
Assignee
Comment 8•21 years ago
I built a new patch upon Jamie's patch.
Attachment #140226 -
Attachment is obsolete: true
Assignee
Comment 9•21 years ago
same as before except that I cleaned up a little bit.
Attachment #143095 -
Attachment is obsolete: true
Assignee
Comment 10•21 years ago
Comment on attachment 143097 [details] [diff] [review]
update
asking for r/sr.
smontagu, if you happen to read this email, please feel free to chime in.
Attachment #143097 -
Flags: superreview?(blizzard)
Attachment #143097 -
Flags: review?(momoi)
Comment 11•21 years ago
Just a heads-up. This will take a few days to review. Looking at the first 12 or
so entries, there are already some problematic cases where recent language
policy changes don't match what you find in Ethnologue. In such cases, we should
probably honor the official language policy of the affected country rather than
what Ethnologue describes.
Assignee
Comment 12•21 years ago
I guess you're concerned about some entries mapped to 'x-cyrillic'. Anyway,
please take your time and I'll be happy to incorporate your findings.
Comment 13•21 years ago
Comment on attachment 143097 [details] [diff] [review]
update
Clearing, waiting for momoi.
Attachment #143097 -
Flags: superreview?(blizzard)
Comment 14•21 years ago
I'm OK with most of what you have added but I want you to address some issues
before checking these in.
For langGroups.properties file:
1. Some languages in the list are added but commented out. You have certain
script categories that do not seem to be officially defined in the code, e.g.
Ethiopic. What are your plans? Can we do something like "x-ethiopic-u" to refer
to any category that is defined in the Unicode Standard and have all "..-u"
entries map to "x-unicode" for the purpose of font selection?
2. For those commented out, you can simply take note of my comments and add
comments as needed. But for those not commented out, i.e. ce, gd, gl, and om,
please indicate your agreement or disagreement on my suggestions.
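Kat's suggestion in point 1, collapsing every "..-u" group onto "x-unicode" for font selection, can be sketched together with a minimal reader for "code=group" lines. The reader is a simplification built on assumptions: it skips '#' and '!' comment lines (so commented-out entries have no effect) but does not handle continuation lines, escapes, or other features of the full .properties format. The sample entries, including "x-ethiopic-u", are hypothetical.

```python
def parse_lang_groups(text: str) -> dict:
    """Read "code=group" lines; blank lines and '#'/'!' comment lines are skipped.

    Simplified: no continuation lines, escapes, or ':' separators.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # commented-out entries have no effect
        key, sep, value = line.partition("=")
        if sep:
            table[key.strip()] = value.strip()
    return table


def font_selection_group(group: str) -> str:
    """Map any hypothetical "...-u" group (e.g. "x-ethiopic-u") to "x-unicode"."""
    return "x-unicode" if group.endswith("-u") else group


SAMPLE = """
# ti=x-ethiopic    (commented out: no such langGroup is defined yet)
am=x-ethiopic-u
sa=x-devanagari
"""

if __name__ == "__main__":
    for code, group in parse_lang_groups(SAMPLE).items():
        print(code, group, "->", font_selection_group(group))
```

The appeal of the "-u" convention is that the finer-grained script name stays recorded in the file, while font selection only has to know about one fallback group.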
For language.properties file:
3. I went through most of them carefully, but since it takes a lot of time to
figure out why certain languages are "false" and should not appear on the
Accept Language list, I've decided to focus my energy on the ones that you
added to appear on the list. There was only one entry that concerned me under
the "true" category. This is "ve (Venda)". See my comment on this. In general,
I did not get a good sense of why certain langs should not appear. (See my
comments on 6 languages with "false" value.) But as long as they don't appear,
there is no practical harm. We may simply wait until someone alerts us about a
language that's not there and then decide to see if we should change from
"false" to "true".
Reporter
Comment 15•21 years ago
I stated before that Maori (mi) should be x-western, without remembering that it
requires macronised vowels and therefore falls outside the scope of ISO-8859-1.
Those ten characters from Latin Extended-A (upper- and lowercase a, e, i, o, u
with macron) are the only non-ISO-8859-1 characters used.
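Jamie's observation can be checked mechanically: ISO-8859-1 covers exactly U+0000 to U+00FF, so any character above U+00FF falls outside it. The sketch below (helper names are hypothetical) confirms that the ten macronised Maori vowels lie in Latin Extended-A (U+0100 to U+017F) and flags the one such character in the word "Māori".

```python
# Upper- and lowercase a, e, i, o, u with macron, written as escapes.
MAORI_MACRON_VOWELS = "\u0100\u0101\u0112\u0113\u012A\u012B\u014C\u014D\u016A\u016B"


def outside_latin1(text: str) -> list:
    """Distinct characters of `text` that ISO-8859-1 cannot represent, sorted."""
    return sorted({ch for ch in text if ord(ch) > 0xFF})


def in_latin_extended_a(ch: str) -> bool:
    """True if `ch` lies in the Latin Extended-A block (U+0100..U+017F)."""
    return 0x0100 <= ord(ch) <= 0x017F


if __name__ == "__main__":
    print(outside_latin1("M\u0101ori"))  # only U+0101 is outside Latin-1
    print(all(in_latin_extended_a(c) for c in MAORI_MACRON_VOWELS))
```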
Comment 16•21 years ago
Thanks, Jamie. I've gone back and looked at those classified as x-western and
found that a number of them use characters from Latin Extended A, B, and/or
Latin Extended Additional. They are listed in this attachment. I still think
the list may contain some inaccuracies but these corrections should improve the
situation considerably.
Assignee
Comment 17•21 years ago
(In reply to comment #14)
Thanks for your thorough review.
> For langGroups.properties file:
>
> 1. There are those languages in the list that are added but commented out. You
> have certain script categories that seem undefined in the code officially, e.g.
> Ethiopic. What are your plans? Can we do something like "x-ethiopic-u" to refer
> to any category that is defined in Unicode Standard and have all "..-u" entries
> map to "x-unicode" for the purpose of font selection?
I think the current langGroup approach has to be overhauled eventually (when?),
and the 'Unicode' langGroup (which is at best a hack) has to be removed when we
do that (see also bug 91190). It doesn't scale very well, as you found out
reviewing my changes. For instance, Maori is not fully covered by x-western. Is
there any alternative? There may be, but it would not be easy for speakers of
languages like Maori or African languages (that use the Latin alphabet) to
figure out which langGroup their languages belong to.
We have to move on to using Unicode code ranges (or something similar). The
only platform where that doesn't work well is the X11corefont build. We still
have to support that platform (on Linux we may not have to, but on other
commercial Unix systems we do), so we have to come up with a way to support it
when we make changes.
In the meantime, we may add 'Ethiopic', 'Georgian', 'Armenian' and some others.
I'd love to add all Indic scripts as well, but our support for Indic scripts is
not uniform across platforms, and there's no way to add them selectively in a
platform-dependent manner.
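The code-range approach described above could start from a per-character classification like the one below. It is only a sketch of the idea, not Mozilla's font-selection code: the table lists a handful of Unicode blocks (ranges taken from the Unicode block list), and the step of choosing a font for each block is left out.

```python
# A few Unicode blocks (inclusive code point ranges); far from the full list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0530, 0x058F, "Armenian"),
    (0x0900, 0x097F, "Devanagari"),
    (0x10A0, 0x10FF, "Georgian"),
    (0x1200, 0x137F, "Ethiopic"),
]


def block_of(ch: str):
    """Return the block name for `ch`, or None if it is not in the table."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return None


def blocks_used(text: str) -> set:
    """Blocks needed to render `text`, ignoring whitespace."""
    return {block_of(ch) for ch in text if not ch.isspace()}


if __name__ == "__main__":
    # A Sanskrit word in Devanagari, then a Maori word with a macron.
    print(blocks_used("\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D"))
    print(blocks_used("M\u0101ori"))
```

Classifying by code range sidesteps the question of which langGroup a language "belongs to": Maori text simply needs a font covering Basic Latin plus Latin Extended-A.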
> 2. But for those not commented out, i.e. ce, gd, gl, and om,
> please indicate your agreement or disagreement on my suggestions.
I'll check them out and comment later.
> For language.properties file:
>
> 3. I went through most of them carefully but since it takes a lot of time to
> figure out why certain languages are "false" and should not appear on the
I gave up rationalizing my choices there :-) At the beginning, I may have had
some criteria, but soon enough it became quite arbitrary as you found out (my
choice is likely to be biased toward European minority languages). I'm sorry for
forcing you to spend your time figuring out my 'rationale'.
> Accept Language list, I've decided to focus my energy on the ones that you
> added to appear on the list. There was only one entry that concerned me under
> the "true" category. This is "ve (Venda)". See my comment on this. ... as
I'll do what you suggested about this.
> long as they don't appear, there is no practical harm. We may simply wait
> until someone alerts us about a language that's not there and then decide
> to see if we should change from "false" to "true".
I agree with you on this point.
Comment 18•21 years ago
I propose to move forward on this after making the changes suggested in my 2
attachments. With those changes, the 2 lists will be much better than before.
Though some inaccuracies may remain, the ones that will appear on the
Accept-Language list ("true") have been examined carefully.
Assignee
Comment 19•21 years ago
Comment on attachment 143466 [details]
A list of those currently classified as x-western but that use non-Latin 1 characters.
>The ones below are classified currently as x-western but use characters from Latin Extended A/B and/or Latin Extended Additional
>+eo=x-western (Esperanto uses non Latin 1 characters from Latin Extended A.)
x-western covers not only Latin-1 but also Latin-3, which covers Esperanto
rather well. So, we should be fine here.
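The Latin-3 point can be verified with Python's standard codecs ("iso8859_1" and "iso8859_3"); the helper name below is hypothetical. The check shows that Esperanto's letters with circumflex or breve are missing from Latin-1 but present in Latin-3, while a Maori macron vowel is in neither, which is consistent with Maori having no obvious langGroup.

```python
# Esperanto's accented letters: C, G, H, J, S with circumflex and U with breve.
ESPERANTO_LETTERS = (
    "\u0108\u0109\u011C\u011D\u0124\u0125"
    "\u0134\u0135\u015C\u015D\u016C\u016D"
)


def encodable(text: str, codec: str) -> bool:
    """True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for codec in ("iso8859_1", "iso8859_3"):
        # Esperanto letters, then a lowercase a with macron (U+0101).
        print(codec, encodable(ESPERANTO_LETTERS, codec), encodable("\u0101", codec))
```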
Do you have any alternative for the others? The font selection dialog in Mozilla
is not really about coverage (that only matters on the X11corefont build, and
much less in other builds: Windows, Xft on Linux, and MacOS X). If you're an
Upper Sorbian/Welsh/Cornish speaker, what langGroup would you use to specify
your font preferences? It's likely to be 'Western'. For Fijian, Maori and
Yoruba, there's no obvious choice...
Assignee
Updated•21 years ago
Attachment #143097 -
Flags: review?(momoi)
Assignee
Comment 20•21 years ago
I addressed Kat's comments. I added comments to the languages in attachment
143466 while keeping their current assignments, because I can't think of a
better alternative at the moment. We may later add 'x-celtic'.
Assignee
Comment 21•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
asking for review.
Attachment #143866 -
Flags: review?(momoi)
Comment 22•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
r=momoi. All my concerns have been addressed in this new patch. Hopefully we
can overhaul the langGroup issues raised in jungshik's and my comments in the
future.
Attachment #143866 -
Flags: review?(momoi) → review+
Assignee
Comment 23•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
thanks for r.
asking for sr.
Attachment #143866 -
Flags: superreview?(blizzard)
Comment 24•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
I'm not terribly happy about all the commented-out stuff without explanations
in the various files, but I can live with it.
Attachment #143866 -
Flags: superreview?(blizzard) → superreview+
Assignee
Comment 25•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
thanks for r/sr.
asking for a1.7
This is to add ~10 mappings from languages to langGroups and ~20 languages to
the list of our supported languages that don't require any special handling in
Mozilla (that is, they've been supported for a long time, but haven't been in
the list)
risk : almost zero if not none.
affected users : speakers/users of those languages added
affected platforms : all
Attachment #143866 -
Flags: approval1.7?
Comment 26•21 years ago
Comment on attachment 143866 [details] [diff] [review]
update addressing Kat's comments
a=chofmann for 1.7
Attachment #143866 -
Flags: approval1.7? → approval1.7+
Assignee
Comment 27•21 years ago
sorry I forgot to mark this as fixed.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 28•21 years ago
Updated•21 years ago
Attachment #152111 -
Flags: superreview?(blizzard)
Attachment #152111 -
Flags: review?(jshin)
Updated•21 years ago
Attachment #152111 -
Flags: superreview?(blizzard) → superreview+
Assignee
Comment 29•21 years ago
Comment on attachment 152111 [details] [diff] [review]
Fix duplicate line
r=jshin
thanks for catching it.
Attachment #152111 -
Flags: review?(jshin) → review+