Closed Bug 361098 Opened 18 years ago Closed 11 years ago

alltheweb.com - bad browser-sniffing causing encoding problems with non-IE browsers

Categories

(Tech Evangelism Graveyard :: English US, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jerry, Unassigned)

Details

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

I was testing Unicode with the current build and found some serious problems that produced garbage search results.



Reproducible: Always

Steps to Reproduce:
This was the sequence of events: 

(1) Go to http://www.alltheweb.com using FF2.
(2) Make certain the browser's character set is set to Unicode.
(3) Copy and paste Chinese characters into the search box with the language set to "all".
(4) Run the search; what we get appears to be garbage.


Actual Results:  
.. 12289;#27599;#22825;#37117;#23545;#21271;#26397;#40092;#30340;#22934;#39764;#21270;#25253 ... 20154;#25903;#25345;#23545;#21271;#26397;#40092; "#37319;#21462;#34892;#21160; "#12290 ...
more hits from: http://campus.5341.com/msg/6568.html  -  61 KB

fudanls88 : Messages : 105-134 of 175
... 24320;了\ 一家全台北最棒的PLUB,&#25 ... 32467;束掉台北朝鲜这家PLUB&#30\ 340 ...
more hits from: http://groups.yahoo.com/group/fudanls88/messages/105?expand=1  -  204 KB

Recent Posts
... 35752;论贸易、朝鲜、台湾以&#21450 ... 27861;」前夕,北京消息人&#22763 ...
more hits from: http://users.boardnation.com/~mansonviva/index.php?action=recent  -  100 KB

Blackjack Strategy Secrets Tips
... 20234; # 26391; # 21644; # 21271; # 26397; # 40092; # 21451; # 22909; # 65292; # 22312 ... 20013; # 22269; # 20154; # 12289; # 26397; # 40092; # 20154; # 12289; # 20234; # 26391 ...
more hits from: http://alexayakovlev.myopenweb.com/blackjack/strategy-secrets-tips.html  -  17 KB

Spymac.com - Social Online Community Network :: Forums :: Personal :: The Sky of Thunderbird :: 

Expected Results:  
Using IE 6, we get the following results:

用Google Maps发现了北朝鲜的军事基地 - 月光博客
用Google Maps发现了北朝鲜的军事基地 ... 一个奇怪的现象,就是北朝鲜的高解析度地图区域超级多,北朝鲜其实更应该抗议Google ... Google地图上赫然发现了高解析度的北朝鲜泰川(Taechon)军事基地的卫星地图(这个位于 ... 
more hits from: http://www.williamlong.info/archives/288.html  -  52 KB 

第 十 二 课  [Adobe PDF]
... 国 家 安 全、军 事 战 略 和 军 控 立 场 南 韩 和 北 朝 鲜) L12 1 防扩散研究中心东亚部 ... 国家安全、军事战略及 军控立场 (南韩和北朝鲜) (放幻灯片 12-1 ... 
more hits from: http://cns.miis.edu/cns/projects/eanp/training/ttt/lessons/chinese/chle12.pdf  -  646 KB 

Internet Public Library: North Korea
The Internet Public Library (IPL) features a searchable, subject-categorized directory of authoritative websites; links to online texts, newspapers, and magazines; and the Ask A
more hits from: http://www.ipl.org/cgi-bin/reading/news.out.pl?co=North+Korea  -  32 KB 

朝鲜民主主义人民共和国 - Wikipedia
朝鲜民主主义人民共和国. 维基百科,自由的百科全书. (重定向自朝鲜) "朝鲜"重定向至此,关于朝鲜的其它含义请 ... 민주주의인민공화국),简称朝鲜(조선),又稱北韓或北朝鲜,位于亚洲东部朝鲜半岛北端,南部与大韓民 ... 稱作"北韩"。日本將朝鲜民主主义人民共和国稱作"北朝鲜(北朝鮮)",双方沒有外交關係 ... 
more hits from: http://zh.wikipedia.org/wiki/%E6%9C%9D%E9%B2%9C  -  82 KB 

熊猫的走向-方觉政论集:北朝鲜的约会
百家争鸣 熊猫的走向-方觉政论集 方觉自传:《龙会转型吗?- 我在中国的5座监狱》 欢迎在此做广告 北朝鲜的约会 2006年7月5日 ... 这些导弹的类型,北朝鲜也多次表示它拥有不少这类导弹,北朝鲜甚至向一些流氓国家 ... 哈德利(Stephen Hadley)当时指出北朝鲜的核裁军说法是"不严肃的"。北朝鲜试图与美国进行"核 ... 
more hits from: http://www.boxun.com/hero/2006/fangjue/19_1.shtml  -  30 KB 

熊猫的走向-方觉政论集:难以置信的中国-北朝鲜关系
百家争鸣 熊猫的走向-方觉政论集 方觉自传:《龙会转型吗?- 我在中国的5座监狱》 欢迎在此做广告 难以置信的中国-北朝鲜关系 2006年7月22日 ... 如果中国愿意运用自己对北朝鲜的有效压力促使北朝鲜停止导弹开发并放弃寻求核武器,就 ... 
more hits from: http://www.boxun.com/hero/2006/fangjue/21_1.shtml  -  30 KB 

RFA: 中国会对北朝鲜进行怎样的制裁?(林保华)
Radio Free Asia Mandarin 2006.11.14 您在东亚的自由报道 中国会对北朝鲜进行怎样的制裁?(林保华) 2006.10.28 (特约评论文章只代表评论员个人的立场和观点) ... 对零票、无异议通 过对北朝鲜施以制裁,在中、俄要求下,决议中明文排除对北朝鲜使用武力 的任何可能 ... 
more hits from: http://www.rfa.org/mandarin/pinglun/2006/10/28/lin_baohua/  -  15 KB 



We may need to change our platform design efforts, as FF 2 seems not to work reliably in the Unicode area, unless of course there's an obvious fix that we are missing.

The Alltheweb people (Yahoo) said that they have received complaints of a similar nature from FF users.

Note: we have also tested in FF safe mode and saw the same problem.

Any help would be appreciated.


Thanks, Jerry

jerry@gwu.edu
Note: in this email, the Chinese characters that we got using IE were converted to numbers. In our submission to you, the FF output showed garbage, and IE 6 showed Chinese-character search results.
I confirmed this bug by going to http://www.alltheweb.com and copying Chinese characters into the search area. The result showed the raw Unicode codes rather than the Chinese characters. I then ran the same test using IE 6, and the site worked fine.
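The "raw unicode codes" in the Actual Results look like decimal numeric character references (NCRs) that lost their leading "&" somewhere along the way. As a rough check, and only as a sketch, the numbers can be turned back into characters. The function name decode_ncrs and the tolerant pattern (optional "&", optional space after "#") are assumptions made for this illustration, not anything alltheweb.com or Firefox does.

```python
import re

# Matches "&#21271;", "#21271;" and "# 21271;" (the variants seen in the
# garbled result snippets). The tolerant pattern is an assumption.
NCR_PATTERN = re.compile(r"&?#\s*(\d+);")

def decode_ncrs(text: str) -> str:
    """Replace decimal numeric character references with their characters."""
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Leave anything outside the Unicode range untouched.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)
    return NCR_PATTERN.sub(replace, text)

# A fragment taken from the Actual Results above.
print(decode_ncrs("#21271;#26397;#40092;"))  # prints 北朝鲜
```

The decoded fragment is the same three-character string that appears throughout the IE 6 results, which suggests the search term reached the server as NCRs rather than as encoded characters.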

This looks like a problem that may be at the Blocker level, as it may mean that the internal browser code is broken. The other issue is practicality: in a side-by-side comparison on the XP platform, FF 2 does not work and IE 6 does. So FF may be great for surfers but not for professional deployment, as it may not have the built-in flexibility to deal with the real world.

The fix should be such that it just works so that an average user can see these pages.
*** Bug 361177 has been marked as a duplicate of this bug. ***
They set us up to fail, so it's hardly a surprise when we do. The page they send to IE has a content-type header saying it's UTF-8, and the form has a hidden input saying to interpret the form submission as UTF-8, and so any characters you search for, which will all be capable of being encoded in UTF-8 since essentially everything except Klingon is, will be correctly interpreted. The page they send us has a content-type header saying it's ISO8859-1, so we do exactly the same thing that IE does when faced with the need to submit characters from outside ISO8859-1 in an ISO8859-1 form, and first encode them as numeric character references, and then URL-encode those, and alltheweb.com then fails to properly interpret them.
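The two submission paths described above can be sketched in a few lines of Python. This is an approximation, not browser code: the function name encode_form_value is made up for the example, and real form submission also encodes spaces as "+", which is ignored here.

```python
from urllib.parse import quote

def encode_form_value(value: str, form_charset: str) -> str:
    """Approximate how a search term is serialized for a form in form_charset.

    Characters the charset cannot represent are first replaced by decimal
    numeric character references, then the resulting bytes are URL-encoded.
    """
    raw = value.encode(form_charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")

# ISO-8859-1 form: each character becomes an NCR, and the NCR is then
# percent-encoded.
print(encode_form_value("北朝鲜", "iso-8859-1"))
# %26%2321271%3B%26%2326397%3B%26%2340092%3B

# UTF-8 form: the characters are percent-encoded directly.
print(encode_form_value("北朝鲜", "utf-8"))
# %E5%8C%97%E6%9C%9D%E9%B2%9C
```

The first form is what a server has to undo (URL-decode, then resolve the NCRs) to recover the search term; skipping the second step would leave the numeric codes seen in the Actual Results.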

Simplest way to see that this is the case: install the User Agent Switcher extension in Firefox, and making no other adjustments than to tell it to pretend that you are using IE6/Windows, repeat your test, and you'll see the correct results. For bonus points, save alltheweb.com as served to IE, edit the saved file so that both the content-type meta element and the hidden input element say that it's ISO8859-1, like what they serve us, and repeat your test with that, seeing that the URL IE requests contains the exact same NCRed-then-URL-encoded characters we send.
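The same comparison can be scripted instead of using an extension. The following is only a sketch: the helper names (charset_of, request_as, compare_charsets) are hypothetical, the IE user-agent string is a shortened form of the one in the report, and the site may no longer respond at all, so the network part is left uncalled.

```python
from email.message import Message
from typing import Dict, Optional
from urllib.request import Request, urlopen

# User-agent strings based on the ones quoted in this report (IE one shortened).
IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
FF2_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) "
          "Gecko/20061010 Firefox/2.0")

def charset_of(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def request_as(url: str, user_agent: str) -> Request:
    """Build a request that identifies itself with the given user agent."""
    return Request(url, headers={"User-Agent": user_agent})

def compare_charsets(url: str) -> Dict[str, Optional[str]]:
    """Fetch url as IE 6 and as Firefox 2 and report each declared charset."""
    result = {}
    for label, agent in (("IE6", IE6_UA), ("FF2", FF2_UA)):
        with urlopen(request_as(url, agent), timeout=10) as response:
            content_type = response.headers.get("Content-Type", "")
            result[label] = charset_of(content_type)
    return result

# Offline check of the header parsing only:
print(charset_of("text/html; charset=ISO-8859-1"))  # prints iso-8859-1
```

If the two labels come back with different charsets for the same URL, the server is varying its response by user agent, which is the behaviour this comment describes.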
Assignee: nobody → english-us
Component: General → English US
Product: Firefox → Tech Evangelism
QA Contact: general → english-us
Summary: Unicode Problems with current FF2 XP build → alltheweb.com - asking for iso8859-1, misinterpreting the results
Phil, you say that they, Yahoo (the owners of AllTheWeb), set us up to fail. Does that mean that Yahoo is, for some reason, targeting and trying to sabotage the Open Source movement? What would they gain by doing that? The main Yahoo site seems to work well with the FF browser. On the other hand, in reviewing our bug reports, it seems that there is a significant cluster involving the rendering of non-Roman character sets and other Unicode issues. Is anybody following those bugs as a Unicode rendering issue, not just as isolated bugs? That might be a good idea.

Btw, the user agent change to IE6 works fine; however, I was wondering if there is a way to make the user experience a bit easier by automating the problem detection, so that the changes are made automatically or something else is brought into play.



(In reply to comment #4)
> They set us up to fail, so it's hardly a surprise when we do. The page they
> send to IE has a content-type header saying it's UTF-8, and the form has a
> hidden input saying to interpret the form submission as UTF-8, and so any
> characters you search for, which will all be capable of being encoded in UTF-8
> since essentially everything except Klingon is, will be correctly interpreted.
> The page they send us has a content-type header saying it's ISO8859-1, so we do
> exactly the same thing that IE does when faced with the need to submit
> characters from outside ISO8859-1 in an ISO8859-1 form, and first encode them
> as numeric character references, and then URL-encode those, and alltheweb.com
> then fails to properly interpret them.
> Simplest way to see that this is the case: install the User Agent Switcher
> extension in Firefox, and making no other adjustments than to tell it to
> pretend that you are using IE6/Windows, repeat your test, and you'll see the
> correct results. For bonus points, save alltheweb.com as served to IE, edit the
> saved file so that both the content-type meta element and the hidden input
> element say that it's ISO8859-1, like what they serve us, and repeat your test
> with that, seeing that the URL IE requests contains the exact same
> NCRed-then-URL-encoded characters we send.

Comment 4 describes this bug perfectly. Smoky and Jerald, you guys should both contact AllTheWeb with a standard evangelism letter and inform them their site is broken in non-IE browsers.

http://www.mozilla.org/projects/tech-evangelism/site/procedures.html#contacting
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: alltheweb.com - asking for iso8859-1, misinterpreting the results → alltheweb.com - bad browser-sniffing causing encoding problems with non-IE browsers
404
Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Product: Tech Evangelism → Tech Evangelism Graveyard