Bug 657153 - Yahoo search is broken for non-ascii search requests
Status: RESOLVED FIXED
[Yahoo broken by our removing Accept-...
Product: Tech Evangelism Graveyard
Classification: Graveyard
Component: English US
Version: unspecified
Platform: All All
Importance: -- major
Target Milestone: ---
Assigned To: english-us
Duplicates: 657156
Blocks: 572652
 
Reported: 2011-05-14 11:39 PDT by Alexander L. Slovesnik
Modified: 2015-04-19 23:39 PDT (History)


Attachments
Screenshot of yahoo.com in 2011-05-09-03-mozilla-central (121.63 KB, image/png)
2011-05-14 11:39 PDT, Alexander L. Slovesnik
Screenshot of yahoo.com in 2011-05-08-03-mozilla-central (148.16 KB, image/png)
2011-05-14 11:40 PDT, Alexander L. Slovesnik
charset log (53.07 KB, text/html)
2011-08-21 22:25 PDT, Bob Clary [:bc:]

Description Alexander L. Slovesnik 2011-05-14 11:39:09 PDT
Created attachment 532455 [details]
Screenshot of yahoo.com in 2011-05-09-03-mozilla-central

STR:
1) Open http://search.yahoo.com/
2) Type a search request in Russian (for example "виски")

Expected results:
A list of search results for "виски"

Actual results:
All non-ASCII characters are displayed as ????

Last good build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-08-03-mozilla-central/

First broken build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-09-03-mozilla-central/

Regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a8f07cad55e2&tochange=9e31df64bfd7

I suspect that Bug 572652 is the culprit.
Comment 1 Alexander L. Slovesnik 2011-05-14 11:40:02 PDT
Created attachment 532456 [details]
Screenshot of yahoo.com in 2011-05-08-03-mozilla-central
Comment 2 Alexander L. Slovesnik 2011-05-14 11:42:18 PDT
FWIW, faking user agent as Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1) helps.
Comment 3 Alexander L. Slovesnik 2011-05-14 12:45:00 PDT
*** Bug 657156 has been marked as a duplicate of this bug. ***
Comment 4 Kev Needham [:kev] 2011-05-14 12:52:59 PDT
Email sent to Yahoo! asking them to investigate. I assume this would be a Firefox 6 change if it stays on track.
Comment 5 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-05-15 22:58:21 PDT
Note to Yahoo!:
Accept-Charset already had a constant value and all Firefox builds had the same support for encodings, so paying attention to Accept-Charset was already useless before the header was removed.
Comment 6 Masatoshi Kimura [:emk] 2011-06-01 01:12:35 PDT
(In reply to comment #5)
> Note to Yahoo!:
> Accept-Charset already had a constant value and all Firefox builds had the
> same support for encodings, so paying attention to Accept-Charset was
> already useless before the header was removed.
That's not true. "intl.charset.default" is localizable, and some localized builds of Firefox have actually changed the value. For example, Japanese Firefox 4 sends "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7".
If it was just a constant, it would not be a problem from http-fingerprint's perspective in the first place.
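To make the difference between the localized header values concrete, here is a minimal sketch that splits an Accept-Charset value into (charset, q-value) pairs. The function name parse_accept_charset is hypothetical and the parsing is deliberately simplified; it is not how Firefox or Yahoo! handle the header.

```python
from typing import List, Tuple


def parse_accept_charset(value: str) -> List[Tuple[str, float]]:
    """Split an Accept-Charset value into (charset, q) pairs, highest q first.

    Simplified illustration: a missing q parameter defaults to 1.0 and a
    malformed q is treated as 0.0.
    """
    prefs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: -pair[1])


# The two header values quoted in this bug and in bug 572652.
japanese = parse_accept_charset("Shift_JIS,utf-8;q=0.7,*;q=0.7")
english = parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7")
print(japanese)
print(english)
```

The first entry differs between the two values, which is why the header is not a constant across localized builds.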
Comment 7 Asa Dotzler [:asa] 2011-07-14 14:52:21 PDT
Kev, any response here?
Comment 8 Alexander L. Slovesnik 2011-07-28 12:45:51 PDT
Looks like Yahoo! Web search (http://search.yahoo.com/) is fixed for non-ascii search requests.
Non-ascii search in Yahoo! Image Search (http://images.search.yahoo.com) and Yahoo! Video Search (http://video.search.yahoo.com/) is still broken, though.
Comment 9 christian 2011-08-03 17:26:16 PDT
I'm going to back out bug 572652 as the risk vs reward seems off. Backing out bug 572652 shouldn't have any adverse effects on those that already changed server side to ignore / not use the header.

Marking as unaffected for 6 due to the backout in bug 572652. I have not backed it out on Aurora or central, so this will crop up again and affects those versions.
Comment 10 chris hofmann 2011-08-16 14:33:11 PDT
bc, is there any way for the spider to help in testing this if we turned it back on in aurora and beta?
Comment 11 Bob Clary [:bc:] 2011-08-16 14:44:51 PDT
chofmann: is there a header I could look for in the response that would indicate the presence of the problem?
Comment 12 chris hofmann 2011-08-16 17:10:29 PDT
OK, so this might be tricky. There may be some ideas on how to test over in https://bugzilla.mozilla.org/show_bug.cgi?id=572652

One idea would be to send Accept-Charset in one request, then omit it in the next request, and then look at the diffs to try to detect broken content.

Then we can do more evangelizing to the list of sites that are sending different responses.
Comment 13 Bob Clary [:bc:] 2011-08-17 00:48:03 PDT
chofmann: I tried your idea, using XMLHttpRequest to send a default GET to a site followed by a GET with Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7, as described in bug 572652. Looking at the content, it is very difficult to filter out the differences caused by changing session/tracking crap, but it appears the charset in the response headers may be the ticket. Spidering the Yahoo home page 1 level deep quickly gives:

charset differs: http://images.search.yahoo.com/images : charset1 = ISO-8859-1, charset2 = UTF-8
charset differs: http://video.search.yahoo.com/video : charset1 = ISO-8859-1, charset2 = UTF-8

which matches what Alexander said in comment 8. The scan is still running, so I don't have a full answer. If this is sufficient, we could scan Yahoo deeper, scan more of the top sites, piggyback this on the userhook script used in crash testing, or have Tomcat use his VMs to collect data. Let me know what you think.
Comment 14 chris hofmann 2011-08-17 07:41:36 PDT
We have contacted Yahoo and they are working on fixes, so I think scanning Yahoo deeper isn't as valuable as doing a broader scan across sites using a good top-site list and/or crash URLs.
Comment 15 Bob Clary [:bc:] 2011-08-18 05:11:10 PDT
Ok. I've started a scan with the alexa top-1m list 1 level deep on 1 vm in the colo and will let it run for a while. I doubt that we want it to scan all 1m sites, but I'll let you know before I kill it off.
Comment 16 Bob Clary [:bc:] 2011-08-21 22:25:40 PDT
Created attachment 554793 [details]
charset log

This is the result of comparing 55000+ pages on the top 389 sites using a Nightly build on Windows XP:

1. using XHR with the default headers to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset1
2. using XHR with the header Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset2.
3. output a charset differs line if charset1 != charset2.
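The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the spider's actual XHR-based script: the names charset_from_content_type, fetch_charset, report and compare are hypothetical, and urllib stands in for XMLHttpRequest. Charset names are lowercased here, so the output differs in case from the log lines quoted in comment 13.

```python
import urllib.request
from email.message import Message
from typing import Optional

# Header value from bug 572652, as quoted in step 2 above.
ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset= parameter of a Content-Type value."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def fetch_charset(url: str, accept_charset: Optional[str] = None,
                  timeout: float = 10.0) -> Optional[str]:
    """GET url, optionally sending Accept-Charset, and return the response charset."""
    headers = {"Accept-Charset": accept_charset} if accept_charset else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return charset_from_content_type(response.headers.get("Content-Type", ""))


def report(url: str, charset1: Optional[str], charset2: Optional[str]) -> Optional[str]:
    """Step 3: return a 'charset differs' line, or None when the charsets match."""
    if charset1 != charset2:
        return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"
    return None


def compare(url: str) -> Optional[str]:
    """Steps 1 and 2: request without and with Accept-Charset, then compare."""
    charset1 = fetch_charset(url)
    charset2 = fetch_charset(url, ACCEPT_CHARSET)
    return report(url, charset1, charset2)
```

Calling compare(url) performs two live requests, so results depend on what the server returns at that moment.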

I believe most of these results are false positives. I don't think this approach is sufficient, since most of the pages appear to display properly with both Namoroka and Nightly. Looking at the results in the browser, it appears that in most cases we respect the charset specified in <meta http-equiv="Content-Type" content="text/html; charset=BLAH">

A notable exception pointed out in this bug is http://video.search.yahoo.com/, which Nightly still treats as ISO-8859-1 even though the http-equiv specifies UTF-8.

Perhaps a different approach would be more informative:

1. get charset1 as above.
2. get charset3 from http-equiv="Content-Type"
3. output a charset differs line if charset3 exists and charset1 != charset3
4. if charset3 does not exist then get charset2 as above and output a charset differs line if charset1 != charset2
Comment 17 Bob Clary [:bc:] 2011-08-21 22:35:15 PDT
(In reply to Bob Clary [:bc:] from comment #16)

I screwed up the alternative approach.

> 1. get charset4 from document.characterSet
> 2. get charset3 from http-equiv="Content-Type"
> 3. output a charset differs line if charset3 exists and charset4 != charset3
> 4. if charset3 does not exist then get charset1 and charset2 as above and output a charset differs line if charset1 != charset2
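A rough sketch of the corrected approach, assuming the four charsets have already been collected. The names MetaCharsetFinder, meta_charset and charset_differs are hypothetical. charset4 (document.characterSet) can only come from a browser, so it is a plain input here.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lowercased charset= parameter of a Content-Type value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Records the charset of the first <meta http-equiv="Content-Type"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_of(attr.get("content") or "")


def meta_charset(html: str) -> Optional[str]:
    """Step 2: get charset3 from http-equiv="Content-Type", if present."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset


def charset_differs(charset4: Optional[str], charset3: Optional[str],
                    charset1: Optional[str], charset2: Optional[str]) -> bool:
    """Steps 3 and 4: prefer the meta comparison, else compare the header charsets."""
    def norm(name: Optional[str]) -> Optional[str]:
        return name.lower() if name else None

    if charset3 is not None:
        return norm(charset4) != norm(charset3)
    return norm(charset1) != norm(charset2)


page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head></html>'
# Mirrors the video.search.yahoo.com case from comment 16: the page declares
# UTF-8 but the browser treats it as ISO-8859-1.
print(charset_differs("ISO-8859-1", meta_charset(page), None, None))
```

With this ordering, a page whose meta declaration matches what the browser actually used is not reported even when the two header charsets disagree, which is the class of false positives comment 16 describes.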
Comment 18 Alexander L. Slovesnik 2011-09-05 00:26:12 PDT
Non-ascii search in Yahoo! Image Search (http://images.search.yahoo.com) is fixed. Yahoo! Video Search is still broken for non-ascii search.
Comment 19 Alexander L. Slovesnik 2011-11-01 10:49:57 PDT
Looks like non-ascii search in Yahoo! Video Search is fixed (tested on Mozilla/5.0 (X11; Linux i686; rv:10.0a1) Gecko/20111101 Firefox/10.0a1 ID:20111101031108)
Marking this bug as fixed.
