Closed Bug 63608 Opened 24 years ago Closed 16 years ago

What's related (related links) sidebar can't handle non-Western characters

Categories: SeaMonkey :: Sidebar; Type: defect; Priority: Not set; Severity: normal
Status: RESOLVED INCOMPLETE (not tracked)
People: Reporter: jshin; Assignee: Unassigned
Keywords: intl

* Symptom:
The What's Related (related links) sidebar renders
everything in a font for ISO-8859-1, so
non-Western-European characters are not rendered
correctly in the 'related links' sidebar.
The default encoding (set in Edit|Preference|Navigator|Language)
doesn't seem to affect how the 'related links'
sidebar is rendered.


* To reproduce: go to any popular non-English (or
non-Western-European) site likely to have related links
in a non-Western-European language and see how the related
links are rendered in the 'related links' sidebar.
Change the default encoding in the Preferences dialog
to the encoding of the site and see if there's
any difference.


* Suggestion:

Given that so many sites have no encoding
information or incorrect encoding information,
it's very hard to get the encoding of links in
the 'related links' sidebar right.

However, as a zeroth-order approximation, Mozilla
could render everything in the 'related links' sidebar
as if it were in the default encoding set in the
Preferences dialog. If a user sets her/his
default encoding to a particular encoding,
the sites (s)he visits are more likely to be in
that encoding than in any other, and so are the
related links to those sites.

However, for some users only a small portion
of the sites they visit (rather than the majority,
as may be the case for other users) is actually
in the default encoding. For them, the zeroth-order
approximation doesn't work very well. A better
approach would be to use the encoding of the page
currently being viewed to render the links in the
'related links' sidebar. There's no guarantee that
related links to a site are in the same encoding
as the site, but it's pretty likely that they are.

Perhaps, when the encoding of a related link
can be determined explicitly (from the HTTP header
or a meta tag), that encoding should be honored, and
the two approximations above should be used
only as fallbacks when the encoding cannot
be determined explicitly (or recovered from a
cache of a previous visit).
There's a problem, though, with depending on
the encoding information provided via the HTTP
header or meta tag: a lot of sites get it wrong.
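The fallback order suggested above can be sketched in Python. This is a minimal, hypothetical illustration, not Mozilla code: the names `choose_link_charset`, `is_known_charset` and `decode_title` are made up, and skipping labels that Python's codec registry does not recognize is an added assumption. That check only catches unknown labels; it cannot detect a site that declares a valid but wrong charset, which is the problem noted in the last paragraph.

```python
import codecs
from typing import Optional


def is_known_charset(label: Optional[str]) -> bool:
    """True if Python's codec registry recognizes the charset label."""
    if not label:
        return False
    try:
        codecs.lookup(label)
    except LookupError:
        return False
    return True


def choose_link_charset(explicit: Optional[str],
                        current_page: Optional[str],
                        default: str) -> str:
    """Pick a charset for a related-link title (hypothetical helper).

    1. explicit: from the linked page's HTTP header or meta tag,
       or cached from a previous visit
    2. current_page: encoding of the page currently being viewed
    3. default: the user's default encoding from Preferences
    Unrecognized labels are skipped (an assumption, see lead-in).
    """
    for candidate in (explicit, current_page):
        if is_known_charset(candidate):
            return candidate
    return default


def decode_title(raw: bytes, explicit: Optional[str],
                 current_page: Optional[str], default: str) -> str:
    charset = choose_link_charset(explicit, current_page, default)
    # errors="replace" shows U+FFFD for undecodable bytes instead of failing
    return raw.decode(charset, errors="replace")
```

For example, `decode_title("한국어".encode("euc-kr"), None, "euc-kr", "iso-8859-1")` returns `"한국어"`, while falling straight through to the ISO-8859-1 default would produce Latin-1 garbage.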
The URL of a site with non-English related links
has been added.
Moving QA contact to blee, since he is familiar with the international
issues involved in Related Links. I thought the related
links (NS6) were tagged with UTF-8 encoding, so the problem
has to do with the server database not being correct.
QA Contact: shrir → blee
This needs to be fixed for the next release -- added the nsbeta1 keyword.

This may be a server-side problem.  In the first What's Related release, the
server only converted Latin1 and Japanese encodings to UTF-8.  I don't know if
the WR vendor fixed this for other encodings.  So the first step would be to
verify that the client is receiving good data from the WR server.
Keywords: nsbeta1
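One way to do that verification is to check the bytes that actually arrive from the server. The sketch below is hypothetical (the function name and messages are made up): it rejects payloads that are not well-formed UTF-8 and flags titles whose only non-ASCII characters fall in the Latin-1 range, a typical sign of text converted with the wrong source charset. The second check is a heuristic that only makes sense for sites expected to be in a non-Latin script; a correct German title would also trigger it.

```python
from typing import Optional


def check_wr_title(payload: bytes) -> Optional[str]:
    """Return a description of a problem, or None if none is detected."""
    try:
        text = payload.decode("utf-8")  # strict mode raises on malformed input
    except UnicodeDecodeError:
        return "not valid UTF-8"
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    # Heuristic: non-ASCII present, but none of it above U+00FF.
    if non_ascii and all(ord(ch) <= 0xFF for ch in non_ascii):
        return "only Latin-1 non-ASCII characters; possibly wrong source charset"
    return None
```

Valid UTF-8 alone proves little: a title converted from the wrong charset is still well-formed UTF-8, which is why the second check exists.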
Adding tpringle, kmurray and lbaliman to cc: list.
Assigning to TPringle for resolution from Alexa. May need help from Bob Jung.

Adding Bobj to cc: list.
Assignee: matt → tpringle
Priority: -- → P2
nav triage team:

Marking nsbeta1+
Whiteboard: nsbeta1+
nav triage team:

Resetting priority so that this bug gets retriaged.
Priority: P2 → --
Changing QA Contact to andreasb.
QA Contact: blee → andreasb
Removing nsbeta1+ from the status whiteboard; we need to figure out what to do in
general with What's Related.
Whiteboard: nsbeta1+ (removed)
Changing QA contact to jonrubin@netscape.com.
QA Contact: andreasb → jonrubin
Jon: is this still a problem in NS 6.01, which uses the Netscape WR tab rather
than the Alexa tab?
Is Alexa sending back info correctly converted to UTF-8?  In Alexa's
original implementation, it only did so for ISO-Latin1 and Japanese
charset encodings.  I don't know if Alexa ever fixed its server to convert
other charsets (e.g., Korean charsets) to UTF-8.

It appears from external usage that Mozilla only displays ISO-Latin1 WR
titles. For pages with non-ISO-Latin1 titles (even Japanese), the WR sidebar 
displays the URL.

Is this because that is what Alexa is returning?  Or is the browser doing
something?

Netscape 4.x displayed Japanese titles in the WR dropdown.  But it appears
that the WR info returned to 4.x is different than what is returned to
Mozilla.  When I try both, I get a different list of related URLs for the
same URL.  Are they pointing to the same WR server/URL or is Alexa sniffing
the browser?
CCing Matt and Myron - do you guys know the answer to this?
Does this affect 6.01, as vishy asked? We are checking in that code instead of
using the Alexa tab. If it doesn't, this bug is only for Mozilla and not
Netscape.
Vishy, 6.01 appears to be fine. Japanese characters do display correctly.
I checked Korean as well. I can see Korean characters in 6.01, but I cannot
verify whether they make any sense. But Japanese is definitely displaying
properly.
In NS 6.0, Korean characters look *mostly* fine,  but Korean characters
in some sites are treated as if they're ISO-8859-1. I guess this is due
to the fact that they're regarded as *non-Korean* sites (and as a result
the conversion to Unicode was not done properly) when the DB entries
for them were made. For instance, try <http://www.ohmynews.com>
and there are two entries in 'What's related' with Korean characters
properly displayed and 5 entries with Korean characters garbled
(rendered as though they're ISO-8859-1).
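The garbling described above is what you get if EUC-KR bytes are treated as ISO-8859-1 before being converted to UTF-8. The following sketch reproduces it; `server_to_utf8` is a hypothetical stand-in for the What's Related server's conversion step, not its actual code, and the sample string is arbitrary. Because ISO-8859-1 maps every byte to some character, the wrong conversion never fails: it silently produces valid UTF-8 that renders as Latin-1 characters, and the damage can be undone if the real charset is known.

```python
def server_to_utf8(raw_title: bytes, assumed_charset: str) -> bytes:
    """Hypothetical model of a server converting a stored title to UTF-8."""
    return raw_title.decode(assumed_charset).encode("utf-8")


original = "한국어"
raw = original.encode("euc-kr")  # how a Korean site might store its title

good = server_to_utf8(raw, "euc-kr").decode("utf-8")
bad = server_to_utf8(raw, "iso-8859-1").decode("utf-8")

# The bad title contains only Latin-1 characters (U+00A1..U+00FE here).
print(good == original, bad == original)  # True False

# The damage is reversible if the real charset is known.
repaired = bad.encode("iso-8859-1").decode("euc-kr")
print(repaired == original)  # True
```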
marking as nsbeta1- per i18n triage.
Keywords: nsbeta1 → nsbeta1-
Todd, who's got the answer to this one? This is a server-side issue, correct?
Matt, do you think Shawkat would know?
Assigning TM = M0.9.2 | P3.

Linda - Can you work with Todd on this one?
Keywords: rtm
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2
Todd, this was first reported quite a while ago, so I checked W/R results today 
(05.25.01) and I am seeing corrupted (German) characters in the W/R results.

Please let me know how you want to proceed and what I can do to help.
Lynn, I think I see What's Related extended characters working in DE 6.1b, but 
not FR 6.1b.  Would you please confirm?

Teruko, would you please check JA 6.1b?  
It depends on what you're looking at. The default language for the browser is currently set wrong on FR, so you won't see the right
information.
Assignee: tpringle → vishy
Target Milestone: mozilla0.9.2 → mozilla1.0
Changing milestone, reassigning to vishy.
Keywords: nsBranch
I see that a lot of sites are still broken, even Netscape ones,
for example:

http://www.atour.co.jp/golf/index2.html
http://home.netscape.com/zh/tw/
http://home.netscape.com/zh/cn/
http://home.netscape.com/ko/
http://www.edu.cn/

etc. This is a server-side issue. I think we should first run the top 100 intl
QA sites against this bug (to see how many of the What's Related links for those
top 100 intl sites are broken). It is very sad that this kind of problem still
happens after years of integrating "What's related" service into the client.
I quickly walked through the JA top 100 sites and at least the following sites
are broken for "What's related":

http://www.rakuten.co.jp
http://www.cool.ne.jp
http://www.tok2.com
http://www.suntory.co.jp
http://www.otd.co.jp
http://www.fujitv.co.jp
http://www.melma.com
http://www.alpha-net.ne.jp
-> samir for investigation with help from Frank. 
Assignee: vishy → sgehani
Target Milestone: mozilla1.0 → mozilla0.9.5
Mass change: switching QA contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu
Blocks: 99227
Marking nsbranch- as it was decided in the August bug triage that we wouldn't
have enough time in eMojo to fix this. Let's revisit for MachV.
Keywords: nsbranch-
I just tried these again:
> http://www.rakuten.co.jp
> http://www.cool.ne.jp
> http://www.tok2.com
> http://www.suntory.co.jp
> http://www.otd.co.jp
> http://www.fujitv.co.jp
> http://www.melma.com
> http://www.alpha-net.ne.jp

and all but fujitv returned Japanese results in the What's Related sidebar
panel.  fujitv reported no related links at all.

removed keyword nsbranch since it now has nsbranch-, per pdt mtg.
Keywords: nsbranch (removed)
Mass-moving lower-priority 0.9.5 bugs off to 0.9.6 to make way for remaining
0.9.4/eMojo bugs, and MachV planning, performance and feature work. If you
disagree with any of these targets, please let me know.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
No longer blocks: 99227
Blocks: 107067
Keywords: nsbranch-
Moving to mozilla0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
-> mozilla0.9.9
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Sidebar triage team: commercial client investigation to be done by Sujay with
help from i18n QA. (The mozilla What's Related content is entirely web-based.)
Target Milestone: mozilla0.9.9 → Future
No longer blocks: 107067
As per Sujay's email, filed bug 12290 in bugscape for the commercial build.
Product: Browser → Seamonkey
Assignee: samir_bugzilla → nobody
Priority: P3 → --
QA Contact: ruixu → sidebar
Target Milestone: Future → ---
Depends on: 468337
Currently the Alexa server seems smart enough to return just the URL as the link if the <title> of the linked page contains non-Western characters. Please re-open if this is not the case.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE