What's related (related links) sidebar can't handle non-Western characters

RESOLVED INCOMPLETE

Status

SeaMonkey
Sidebar
RESOLVED INCOMPLETE
18 years ago
10 years ago

People

(Reporter: Jungshik Shin, Unassigned)

Tracking

({intl})

Firefox Tracking Flags

(Not tracked)

Details

(URL)

(Reporter)

Description

18 years ago
* Symptom:
The What's Related (related links) sidebar renders
everything in a font for ISO-8859-1, so
non-Western-European characters are not rendered
correctly in the 'related links' sidebar.
The default encoding (set in Edit|Preference|Navigator|Language)
doesn't seem to affect how the 'related links'
sidebar is rendered.


* To reproduce: go to any popular non-English (or non-Western-European)
site likely to have related links in a
non-Western-European language and see how the related links
are rendered in the 'related links' sidebar.
Then change the default encoding in the Preference dialog
to the encoding of the site and see if there's
any difference.


* Suggestion:

Given that there are so
many sites with no encoding information or incorrect
encoding information, it's very hard to get
the encoding of links in the 'related links'
sidebar right.

However, as a zeroth order approximation, Mozilla
may render everything in 'related links' sidebar
as if they're in the default encoding set in
Preference dialog. If a user sets her/his
default encoding to a particular encoding,
sites (s)he visits are more likely to be in
that encoding than any other encodings and
so are related links to sites (s)he visits.

However, for some users, only a small portion
of the sites visited (far from a majority)
may actually be in the default encoding. For them,
the zeroth order approximation doesn't work
very well. A better approach would be to use
the encoding of the page currently being viewed
to render links in the 'related links' sidebar.
There's no guarantee that related links to
the site are in the same encoding as the site,
but it's pretty likely that they are.

Maybe, in case the encoding of related links
can be explicitly determined (from http header
or meta tag), that should be honored and
two approximations mentioned above can be used
only as fallbacks when the encoding cannot
be determined explicitly (or the encoding
is cached in memory from the previous visit).
There's a problem, though, with depending on
the encoding info. provided via http header
or meta tag because there are a lot of sites
that get it wrong.
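
The fallback order suggested above (explicit encoding first, then the encoding of the page being viewed, then the default encoding from the Preference dialog) could be sketched roughly as follows. This is only an illustration in Python, not Mozilla code: the names `pick_encoding` and `decode_title` and the final ISO-8859-1 fallback are assumptions made for the example.

```python
import codecs
from typing import Optional


def is_known_encoding(name: str) -> bool:
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def pick_encoding(explicit: Optional[str],
                  page: Optional[str],
                  default: Optional[str]) -> str:
    """Hypothetical helper: the first usable candidate wins.

    explicit -- charset from an HTTP header or meta tag, if any
    page     -- encoding of the page currently being viewed
    default  -- default encoding chosen by the user in preferences
    """
    for candidate in (explicit, page, default):
        if candidate and is_known_encoding(candidate):
            return candidate
    # Assumed last resort, matching the behavior described in the symptom.
    return "iso-8859-1"


def decode_title(raw: bytes,
                 explicit: Optional[str] = None,
                 page: Optional[str] = None,
                 default: Optional[str] = None) -> str:
    """Decode a related-link title with the chosen encoding."""
    encoding = pick_encoding(explicit, page, default)
    # errors="replace" keeps undecodable bytes from raising an exception.
    return raw.decode(encoding, errors="replace")
```

As noted above, an explicit charset can itself be wrong, so a real implementation would probably need some sanity check before trusting it.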
(Reporter)

Comment 1

18 years ago
The url for a site with non-English related links
is added.
Keywords: intl

Comment 2

18 years ago
QA contact to blee since he is familiar with the international
issues involved in the Related links. I thought the related
links data (NS6) is tagged with UTF-8 encoding, so the problem
would have to do with the server database not being correct.
QA Contact: shrir → blee

Comment 3

18 years ago
This needs to be fixed for the next release -- added the nsbeta1 keyword.

This may be a server-side problem.  In the first What's Related release, the
server only converted Latin1 and Japanese encodings to UTF-8.  I don't know if
the WR vendor fixed this for other encodings.  So the first step would be to
verify that the client is receiving good data from the WR server.
Keywords: nsbeta1
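
A first check on whether the client is receiving good data could be as simple as testing whether the raw response bytes are well-formed UTF-8. The sketch below is a hedged Python illustration; `is_valid_utf8` is a hypothetical helper, and how the bytes are captured from the WR server is left out because the thread does not describe the request format.

```python
def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the payload decodes cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes in a legacy encoding usually fail the check.
legacy_bytes = "日本語".encode("shift_jis")
utf8_bytes = "日本語".encode("utf-8")
print(is_valid_utf8(legacy_bytes), is_valid_utf8(utf8_bytes))
```

Passing this check does not prove the conversion was right: text that was wrongly treated as Latin1 and then converted to UTF-8 is still valid UTF-8, just garbled.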

Comment 4

18 years ago
Adding tpringle, kmurray and lbaliman to cc: list.

Comment 5

18 years ago
Assigning to TPringle for resolution from Alexa. May need help from Bob Jung.

Adding Bobj to cc: list.
Assignee: matt → tpringle
Priority: -- → P2

Comment 6

18 years ago
nav triage team:

Marking nsbeta1+
Whiteboard: nsbeta1+

Comment 7

18 years ago
nav triage team:

Resetting priority so that this bug gets retriaged.
Priority: P2 → --

Comment 8

18 years ago
Changing QA Contact to andreasb.
QA Contact: blee → andreasb

Comment 9

18 years ago
Removing nsbeta1+ from status whiteboard, need to figure out what to do in general
with What's Related.
Whiteboard: nsbeta1+

Comment 10

18 years ago
Changing QA contact to jonrubin@netscape.com.
QA Contact: andreasb → jonrubin
Jon: is this still a problem in NS 6.01, which uses the Netscape WR tab rather
than the Alexa tab?

Comment 12

18 years ago
Is Alexa sending back info correctly converted to UTF-8?  In Alexa's
original implementation, it only did so for ISO-Latin1 and Japanese
charset encodings.  I don't know if Alexa ever fixed its server to convert
other charsets (e.g., Korean charsets) to UTF-8.

It appears from external usage that Mozilla only displays ISO-Latin1 WR
titles. For pages with non-ISO-Latin1 titles (even Japanese), the WR sidebar
displays the URL.

Is this because that is what Alexa is returning?  Or is the browser doing
something?

Netscape 4.x displayed Japanese titles in the WR dropdown.  But it appears
that the WR info returned to 4.x is different than what is returned to
Mozilla.  When I try both, I get a different list of related URLs for the
same URL.  Are they pointing to the same WR server/URL or is Alexa sniffing
the browser?

Comment 13

18 years ago
Ccing Matt and Myron - do you guys know the answer to this?

Comment 14

18 years ago
Does this affect 6.01, as vishy asked? We are checking in that code instead of
using the Alexa tab. If it doesn't, this bug is only for Mozilla and not
Netscape.

Comment 15

18 years ago
Vishy, 6.01 appears to be fine.  Japanese characters do display correctly.

Comment 16

18 years ago
I checked Korean as well.  I can see Korean characters in 6.01, but I cannot 
verify as to whether they make any sense.  But Japanese is definitely displaying 
properly.
(Reporter)

Comment 17

18 years ago
In NS 6.0, Korean characters look *mostly* fine,  but Korean characters
in some sites are treated as if they're ISO-8859-1. I guess this is due
to the fact that they're regarded as *non-Korean* sites (and as a result
the conversion to Unicode was not done properly) when the DB entries
for them were made. For instance, try <http://www.ohmynews.com>
and there are two entries in 'What's related' with Korean characters
properly displayed and 5 entries with Korean characters garbled
(rendered as though they're ISO-8859-1).
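
The kind of garbling described here can be reproduced in a few lines. The Python sketch below assumes the affected pages were in EUC-KR (the comment does not name the encoding) and shows what happens when those bytes are interpreted as ISO-8859-1 before conversion to Unicode.

```python
original = "한국"                      # sample Korean text
raw = original.encode("euc-kr")        # bytes as a Korean page might serve them (assumed encoding)

# Wrong step: treating the bytes as ISO-8859-1, one character per byte.
garbled = raw.decode("iso-8859-1")

# Once stored this way, the text is valid Unicode but unreadable.
print(repr(garbled), len(garbled))

# The damage is reversible only if the original encoding is known.
recovered = garbled.encode("iso-8859-1").decode("euc-kr")
print(recovered == original)
```

Since the result is still valid Unicode, the client cannot detect the error on its own, which would be consistent with the fault lying in the DB entries rather than in the browser.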

Comment 18

17 years ago
marking as nsbeta1- per i18n triage.
Keywords: nsbeta1 → nsbeta1-

Comment 19

17 years ago
Todd - Who's got the answer to this one? This is a server side issue, correct?

Comment 20

17 years ago
Matt, do you think Shawkat would know?

Comment 21

17 years ago
Assigning TM = M0.9.2 | P3.

Linda - Can you work with Todd on this one?
Keywords: rtm
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2

Comment 22

17 years ago
Todd, this was first reported quite a while ago, so I checked W/R results today 
(05.25.01) and I am seeing corrupted (German) characters in the W/R results.

Please let me know how you want to proceed and what I can do to help.

Comment 23

17 years ago
Lynn, I think I see What's Related extended characters working in DE 6.1b, but 
not FR 6.1b.  Would you please confirm?

Teruko, would you please check JA 6.1b?  

Comment 24

17 years ago
It depends what you're looking at. The default language for the browser is currently set wrong on FR, so you won't see the right 
information. 

Updated

17 years ago
Assignee: tpringle → vishy
Target Milestone: mozilla0.9.2 → mozilla1.0

Comment 25

17 years ago
Changing milestone, reassigning to vishy.

Updated

17 years ago
Keywords: nsBranch

Comment 26

17 years ago
I see a lot of sites are still broken, even Netscape ones,
for example

http://www.atour.co.jp/golf/index2.html
http://home.netscape.com/zh/tw/
http://home.netscape.com/zh/cn/
http://home.netscape.com/ko/
http://www.edu.cn/

etc. This is a server side issue. I think we should first run the top 100 intl
QA sites against this bug (see how many of the What's Related links for those
top 100 intl sites are broken). It is very sad that this kind of problem still
happens years after integrating the "What's Related" service into the client.

Comment 27

17 years ago
I quickly walked through the JA top 100 sites and at least the following sites
are broken for "What's related":

http://www.rakuten.co.jp
http://www.cool.ne.jp
http://www.tok2.com
http://www.suntory.co.jp
http://www.otd.co.jp
http://www.fujitv.co.jp
http://www.melma.com
http://www.alpha-net.ne.jp
-> samir for investigation with help from Frank. 
Assignee: vishy → sgehani
Target Milestone: mozilla1.0 → mozilla0.9.5

Comment 29

17 years ago
mass change, switching qa contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu

Updated

17 years ago
Blocks: 99227

Comment 30

17 years ago
Marking nsbranch- as it was decided in the August bug triage that we wouldn't
have enough time in eMojo to fix this.  Let's revisit for MachV.
Keywords: nsbranch-

Comment 31

17 years ago
I just tried these again:
> http://www.rakuten.co.jp
> http://www.cool.ne.jp
> http://www.tok2.com
> http://www.suntory.co.jp
> http://www.otd.co.jp
> http://www.fujitv.co.jp
> http://www.melma.com
> http://www.alpha-net.ne.jp

and all but fujitv returned Japanese results in the What's Related sidebar
panel.  fujitv reported no related links at all.

Comment 32

17 years ago
removed keyword nsbranch since it now has nsbranch-, per pdt mtg.
Keywords: nsbranch

Comment 33

17 years ago
Mass-moving lower-priority 0.9.5 bugs off to 0.9.6 to make way for remaining
0.9.4/eMojo bugs, and MachV planning, performance and feature work. If you
disagree with any of these targets, please let me know.
Target Milestone: mozilla0.9.5 → mozilla0.9.6

Updated

17 years ago
No longer blocks: 99227

Updated

17 years ago
Blocks: 107067

Updated

17 years ago
Keywords: nsbranch-

Comment 34

17 years ago
Moving to mozilla0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7

Comment 35

17 years ago
-> mozilla0.9.9
Target Milestone: mozilla0.9.7 → mozilla0.9.9

Comment 36

17 years ago
Sidebar triage team: commercial client investigation to be done by Sujay with
help from i18n QA.  (The Mozilla What's Related content is entirely web-based.)
Target Milestone: mozilla0.9.9 → Future

Updated

17 years ago
No longer blocks: 107067

Comment 37

17 years ago
As per Sujay's email, filed bug 12290 in bugscape for commercial build.
Product: Browser → Seamonkey
Assignee: samir_bugzilla → nobody
Priority: P3 → --
QA Contact: ruixu → sidebar
Target Milestone: Future → ---

Updated

10 years ago
Depends on: 468337

Comment 38

10 years ago
Currently the Alexa server seems smart enough to return just the URL as the link if the <title> of the linked page contains non-western characters. Please re-open if this is not the case.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → INCOMPLETE
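
The server behavior described in this comment (showing the URL when the title contains non-Western characters) could look roughly like the Python sketch below. This is a guess at the logic, not Alexa's actual code: `link_label` is a hypothetical name, and "non-Western" is approximated here as "not representable in ISO-8859-1".

```python
def link_label(title: str, url: str) -> str:
    """Return the title if it fits in ISO-8859-1, otherwise the URL."""
    try:
        title.encode("iso-8859-1")
    except UnicodeEncodeError:
        # The title has characters outside Latin-1; fall back to the URL.
        return url
    # An empty title also falls back to the URL.
    return title or url
```

This hides garbled titles but does not display non-Western titles correctly, which is what the original report asked for.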

Updated

10 years ago
Duplicate of this bug: 57127