Closed Bug 248792 Opened 20 years ago Closed 19 years ago

Pages with a bad "lang=" setting in <html> fail to display national-language characters for that language

Categories

(Core :: Internationalization, defect)

x86
OS/2
defect
Not set
normal

RESOLVED EXPIRED

People

(Reporter: acrab001, Unassigned)

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (OS/2; U; Warp 4.5; ko-KR; rv:1.7) Gecko/20040617
Build Identifier: Mozilla/5.0 (OS/2; U; Warp 4.5; ko-KR; rv:1.7) Gecko/20040617

If a page has a bad "lang=" value in its <html> tag, such as "lang=ko   "
(several blanks after 'ko'), Mozilla fails to display any of its
national-language characters.

A similar problem occurs in the Mozilla Mail & News reader when displaying mail
encoded in UTF-8.

I tested http://google.co.kr/, which has no "lang=" in its <html> tag. After
downloading the page and adding "lang=ko   " to the <html> tag, I get the same
result as above.


Reproducible: Always
Steps to Reproduce:
1. Set country=082 in CONFIG.SYS and set the codepage to 949 in OS/2
2. Set the primary language of Mozilla Navigator to "ko"
3. View http://kldp.net/ with Mozilla 1.7 for OS/2 (you should have a Korean
font installed)

Actual Results:  
Mozilla fails to display Korean characters.


Expected Results:  
Mozilla should display Korean characters properly.
Since the problems are caused by bad HTML coding, I'd say this is more of a Tech
Evangelism issue than a problem with the browser itself. The lang attribute is
supposed to have only the two letter ISO code for the desired country/language,
so having spaces in the lang attribute is actually bad HTML coding. If you ever
see this problem, please contact the webmaster so that he can correct the error.

I tried what you suggested on the Google Web site. There _is_ a slight change
whether I put "ko" or "ko   " in the lang attribute, Gecko seems to change some
subtle things, which I can't quite point out, but no symbols are changed. I'm
guessing that if Gecko reads a bad lang attribute, it ignores it altogether.

I suggest filing this as Tech Evangelism.
I forgot to mention: since I haven't been able to reproduce the bug, it seems
that this is another OS/2-specific bug.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040625 Firefox/0.9
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616
(In reply to comment #1)
> Since the problems are caused by bad HTML coding, I'd say this is more of a Tech
> Evangelism issue than a problem with the browser itself. The lang attribute is
> supposed to have only the two letter ISO code for the desired country/language,
> so having spaces in the lang attribute is actually bad HTML coding. If you ever
> see this problem, please contact the webmaster so that he can correct the error.
> 
> I tried what you suggested on the Google Web site. There _is_ a slight change
> whether I put "ko" or "ko   " in the lang attribute, Gecko seems to change some
> subtle things, which I can't quite point out, but no symbols are changed. I'm
> guessing that if Gecko reads a bad lang attribute, it ignores it altogether.
> 
> I suggest filing this as Tech Evangelism.

I think that depends on how you define "letter" (as in "only the two letter
ISO code for the desired country/language"): whether or not it includes
whitespace characters.

Also, I don't know what you mean by Tech Evangelism.
(In reply to comment #2)

Not displaying national characters with a bad "lang=" is an OS/2-specific bug.

But there is still an issue on other OSes too. With a bad "lang=", Mozilla uses
two fonts, one for each charset, but with a proper "lang=" it uses only one font
(the national-language font) for both. (The "Allow documents to use other fonts"
option should be off.)

So, in the first case, Mozilla needs a font association function, provided
either by the OS or by Mozilla itself, and I don't know which one Mozilla uses.
If it is the second (Mozilla's own font association scheme), then I think
something can be done.
(In reply to comment #1)

By the way, where can I find the spec for this "lang=" attribute? (I mean the
two-letter ISO code requirement.)
http://www.w3.org/TR/html4/struct/dirlang.html#h-8.1

The above is the part about the lang attribute in W3C's HTML 4.01 specification.
The section right below it (8.1.1) describes the syntax of language codes. It
seems that the two-letter code is not the only form allowed, contrary to what I
said before.

Here's the important part as far as you're concerned, though, in section 6.8 of
the "Basic HTML data types" chapter:
http://www.w3.org/TR/html4/types.html#type-langcode

It explicitly states that "whitespace is not allowed within the language-code".
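
For illustration only, here is a minimal sketch of that rule in Python. It
assumes the RFC 1766 syntax that HTML 4.01 refers to (a primary tag of 1 to 8
letters, optionally followed by "-" separated subtags of 1 to 8 letters each).
The function names are hypothetical and this is not Gecko code; the second
function only shows what a lenient reading of "ko   " could look like.

```python
import re

# RFC 1766 language tag: a primary tag of 1-8 ASCII letters, followed by
# zero or more "-"-separated subtags of 1-8 ASCII letters each.
LANGUAGE_CODE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_valid_language_code(value):
    """Return True if value is a well-formed code containing no whitespace."""
    return LANGUAGE_CODE.fullmatch(value) is not None


def lenient_language_code(value):
    """Hypothetical lenient reading: trim surrounding blanks, then validate.

    Returns the trimmed code, or None so that the caller can behave as if
    no lang attribute had been specified at all.
    """
    trimmed = value.strip()
    return trimmed if is_valid_language_code(trimmed) else None
```

Under this sketch "ko" and "en-US" are valid, "ko   " is rejected by the strict
check, and the lenient reading recovers "ko" from it.
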
(In reply to comment #6)

I've got that. So this is a problem with the web site.

But I still wonder which font Mozilla uses in this case.
When a bad lang= attribute is used, it looks like Mozilla uses lang="en"
instead, right?
If Mozilla does fall back to lang="en" in this case, it would be better to
treat the page as if no lang attribute were specified. Is this (possible)
solution still a Tech Evangelism matter?
Same statement as

http://bugzilla.mozilla.org/show_bug.cgi?id=248790#c9

If something is wrong, please correct me.
Component: Browser-General → Internationalization
This is both a tech-evangelism issue and a Mozilla 'bug'. Mozilla should cope
better with a bad lang specification. Anyway, why don't you write to the KLDP
admin to fix their problem?
I have installed the Innotek Font Engine, and now get a better view of
http://kldp.net .

But some characters are still missing (are they too small, or bold-faced?). I
can't judge whether the real problem lies in OS/2 or in Mozilla, but I think
Mozilla should be able to handle this better.

I have consulted Ko Myung Hun (who made a patch for FT/2) about this
matter. He said that it is an OS/2 problem and that he can make a patch for
this specific case, but that it would break the OS/2 system's font association
scheme. He also said that Mozilla would be able to provide a better solution
for this matter.
If I remove fake bold support from FreeType/2, I can see bold-faced characters
rendered as normal-faced with the Innotek font engine. Something goes wrong
between my fake bold patch and Innotek's font engine, Eeeeh!
What do you have specified as your unicode font?

In the case of a bad lang, we treat the page as Western, and all characters
above 0xFF are displayed using the Unicode font set in preferences.
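
As a rough sketch of the behaviour described above, and not the actual Gecko
code, the selection could be modelled like this in Python. The group names,
the preference table and the "UnicodeFontFromPrefs" entry are all hypothetical
placeholders; Helv and Gulim are used only because they come up elsewhere in
this bug.

```python
# Hypothetical preference table: language group -> configured font.
FONT_PREFS = {
    "western": "Helv",
    "ko": "Gulim",
    "unicode": "UnicodeFontFromPrefs",  # placeholder for the user's choice
}

# Hypothetical mapping from primary language subtag to language group.
LANG_GROUPS = {"en": "western", "ko": "ko"}


def language_group(lang_attr):
    """Map a lang attribute value to a language group.

    A missing or unrecognized value, including "ko   " with trailing
    blanks, falls back to the western group.
    """
    if lang_attr is None:
        return "western"
    primary = lang_attr.split("-")[0].lower()
    return LANG_GROUPS.get(primary, "western")


def font_for_char(ch, lang_attr):
    """Pick the font for one character under the assumed fallback rule."""
    group = language_group(lang_attr)
    # On a page treated as western, characters above 0xFF go to the
    # Unicode font from preferences instead of the western font.
    if group == "western" and ord(ch) > 0xFF:
        return FONT_PREFS["unicode"]
    return FONT_PREFS[group]
```

With lang="ko" every character maps to the single Korean font, while with
lang="ko   " a Latin letter maps to the western font and a Hangul syllable maps
to the Unicode font, which matches the two-font observation made earlier in
this bug.
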
(In reply to comment #15)

Now I have a picture of what causes this.

CJK fonts fail in this case because FreeType/2 registers CJK Unicode fonts with
pifi->szGlyphlistName set to "PMCHT", "PMJPN" or "PMKOR", not "UNICODE". If I
change FT/2 to use "UNICODE" instead of "PMxxx", I can see all the characters
correctly.

Problem solved?

NO!!! The font association scheme of OS/2
(OS2.INI->PM_SystemFonts->PM_AssociateFont) does not allow an associated font
whose encoding differs from the base font's encoding. If I use Helv as the base
font and Gulim registered as "UNICODE" (not "PMKOR") as the associated font,
Gulim is displayed broken.

So both approaches have a problem in FT/2.

Of course, there is the Innotek font engine, which ignores OS/2 font drivers and
uses its own font rendering, but it has problems too. First, as I stated in
comment #13, it uses FT/2's font rendering for DBCS bold face, so no characters
appear for DBCS bold face in this case. Second, some Korean characters are
missing on some web pages (e.g.
http://news.msn.co.kr/service/msnnews/ShellView.asp?ArticleID=2004071311380550004&LinkID=102
) with Mozilla 1.7 (Moz 1.7a is fine though, which is strange).

So I couldn't find a flawless solution for this case.

Using the Unicode font for characters above 0xFF in Western pages could also be
a problem. For this to work, the Western font must be one with
szGlyphlistName="UNICODE", but again, a "UNICODE" font breaks the font
association scheme described above; in this case the base font is "UNICODE" and
the associated font is "PMxxx". But using "PMUGL" instead of "UNICODE" causes
broken Unicode characters in Western encoding and broken Cyrillic characters in
Cyrillic encoding. So this can be quite a problem, but only for DBCS users. (For
SBCS users it is not a problem at all, because they don't need the font
association scheme.)

As a possible solution, I'm trying to fix FT/2 to register a Unicode TrueType
font under both "UNICODE" and "PMxxx", each with a different font family name.
But I'm not certain this will work.

P.S. to mkaply:
Can you help me find the IFI (Intelligent Font Interface) specification
document? I've searched the internet, but have only found IFI header files in
the FT/2 source. I think IBM is the only source.
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
This bug has been automatically resolved after a period of inactivity (see above
comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → EXPIRED