Yahoo search is broken for non-ascii search requests

RESOLVED FIXED

Status

Tech Evangelism Graveyard
English US
--
major
RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: Alexander L. Slovesnik, Unassigned)

Tracking

Details

(Whiteboard: [Yahoo broken by our removing Accept-Charset header])

Attachments

(3 attachments)

(Reporter)

Description

6 years ago
Created attachment 532455
Screenshot of yahoo.com in 2011-05-09-03-mozilla-central

STR:
1) Open http://search.yahoo.com/
2) Type a search query in Russian (for example, "виски")

Expected results:
A list of search results for "виски"

Actual results:
All non-ASCII characters are displayed as ????

Last good build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-08-03-mozilla-central/

First broken build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-09-03-mozilla-central/

Regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a8f07cad55e2&tochange=9e31df64bfd7

I suspect that bug 572652 is the culprit.
(Reporter)

Comment 1

6 years ago
Created attachment 532456
Screenshot of yahoo.com in 2011-05-08-03-mozilla-central
(Reporter)

Comment 2

6 years ago
FWIW, faking the user agent string as Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1) works around the problem.
(Reporter)

Updated

6 years ago
Duplicate of this bug: 657156

Comment 4

6 years ago
Email sent to Yahoo! asking them to investigate. I assume this change would ship in Firefox 6 if it stays on track.

Updated

6 years ago
tracking-firefox6: --- → ?

Comment 5

Note to Yahoo!:
Accept-Charset already had a constant value and all Firefox builds had the same support for encodings, so paying attention to Accept-Charset was already useless before the header was removed.

Updated

6 years ago
Whiteboard: Dao nominated without comment

Updated

6 years ago
tracking-firefox6: ? → +

Comment 6

(In reply to comment #5)
> Note to Yahoo!:
> Accept-Charset already had a constant value and all Firefox builds had the
> same support for encodings, so paying attention to Accept-Charset was
> already useless before the header was removed.
That's not true. "intl.charset.default" is localizable, and some localized builds of Firefox actually change the value. For example, Japanese Firefox 4 sends "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7".
If it were just a constant, it would not have been a problem from an HTTP-fingerprinting perspective in the first place.
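To illustrate why the two values are not interchangeable, here is a minimal Python sketch of how a server that sniffs Accept-Charset might rank a header's charsets by their q-weights. It assumes the standard HTTP q-value syntax (default weight 1.0); the function name parse_accept_charset is hypothetical, and real servers may weigh these values differently. The first value is the default quoted from bug 572652, the second is the Japanese build's value from this comment.

```python
def parse_accept_charset(value):
    """Parse an Accept-Charset value into (charset, q) pairs, highest q first."""
    prefs = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        name = parts[0]
        if not name:
            continue  # skip empty list elements such as a trailing comma
        q = 1.0  # HTTP default weight when no q parameter is given
        for param in parts[1:]:
            key, _, raw = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((name, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: -pair[1])


default_prefs = parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7")
japanese_prefs = parse_accept_charset("Shift_JIS,utf-8;q=0.7,*;q=0.7")
print(default_prefs[0][0])   # ISO-8859-1
print(japanese_prefs[0][0])  # Shift_JIS
```

A server keying off the top-ranked charset would pick different encodings for the two builds, which is exactly the per-locale variation described above.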

Comment 7

6 years ago
Kev, any response here?

Updated

6 years ago
Whiteboard: Dao nominated without comment → [Yahoo broken by our removing Accept-Charset header]
(Reporter)

Comment 8

6 years ago
Looks like Yahoo! Web search (http://search.yahoo.com/) is fixed for non-ASCII search requests.
Non-ASCII search in Yahoo! Image Search (http://images.search.yahoo.com) and Yahoo! Video Search (http://video.search.yahoo.com/) is still broken, though.

Comment 9

6 years ago
I'm going to back out bug 572652, as the risk vs. reward seems off. Backing out bug 572652 shouldn't have any adverse effects on sites that have already changed their server-side code to ignore the header.

Marking as unaffected for 6 due to the backout of bug 572652. I have not backed it out on Aurora or central, so the problem will crop up again and still affects those versions.
status-firefox6: --- → fixed
status-firefox7: --- → affected
status-firefox8: --- → affected
tracking-firefox7: --- → +
tracking-firefox8: --- → +

Updated

6 years ago
status-firefox6: fixed → unaffected
status-firefox7: affected → unaffected

Comment 10

6 years ago
bc, is there any way for the spider to help test this if we turned it back on in Aurora and Beta?

Comment 11

6 years ago
chofmann: is there a header I could look for in the response that would indicate the presence of the problem?

Comment 12

6 years ago
OK, so this might be tricky. There may be some ideas on how to test over in https://bugzilla.mozilla.org/show_bug.cgi?id=572652

One idea would be to send Accept-Charset in one request, omit it in the next, and then diff the two responses to try to detect broken content.

Then we can do more evangelizing to the list of sites that are sending different responses.
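A minimal Python sketch of the two-request idea, assuming urllib.request rather than the XHR approach used later in this bug. make_request_pair and fetch_pair are hypothetical helpers, and LEGACY_ACCEPT_CHARSET is the header value quoted from bug 572652. Because urllib otherwise sends its own Python-urllib User-Agent, and sites may sniff user agents (see comment 2), the sketch lets you set the same User-Agent on both requests so that only Accept-Charset varies.

```python
import urllib.request

# Header value quoted from bug 572652 (what Firefox sent before the removal).
LEGACY_ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"


def make_request_pair(url, user_agent=None):
    """Build two GET requests for url that differ only in the Accept-Charset header."""
    base_headers = {"User-Agent": user_agent} if user_agent else {}
    without_header = urllib.request.Request(url, headers=dict(base_headers))
    with_header = urllib.request.Request(
        url, headers={**base_headers, "Accept-Charset": LEGACY_ACCEPT_CHARSET}
    )
    return without_header, with_header


def fetch_pair(url, user_agent=None, timeout=30):
    """Fetch url both ways; return [(content_type, body_bytes), ...] in that order."""
    results = []
    for request in make_request_pair(url, user_agent):
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            results.append((content_type, response.read()))
    return results
```

Diffing the two bodies directly tends to be noisy (session IDs, tracking parameters, rotating ads), so comparing the Content-Type values is the cheaper first signal.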

Comment 13

6 years ago
chofmann: I tried your idea, using XMLHttpRequest to send a default GET to a site followed by a GET with Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 as described in bug 572652. Looking at the content, it is very difficult to filter out the differences caused by changing session/tracking crap, but it appears the charset in the response headers may be the ticket. Spidering the Yahoo home page 1 level deep quickly gives:

charset differs: http://images.search.yahoo.com/images : charset1 = ISO-8859-1, charset2 = UTF-8
charset differs: http://video.search.yahoo.com/video : charset1 = ISO-8859-1, charset2 = UTF-8

which matches what Alexander said in comment 8. The scan is still running, so I don't have a full answer yet. If this is sufficient, we could scan Yahoo deeper, scan more of the top sites, piggyback this on the userhook script used in crash testing, or Tomcat could use his VMs to collect data. Let me know what you think.
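One possible reading of "1 level deep" is the start page plus every page it links to. A minimal sketch under that assumption, using Python's html.parser to collect the pages to visit; LinkCollector and one_level_links are hypothetical names, and a real spider would also need rate limiting and robots.txt handling.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href> tags, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Resolve relative links and drop #fragments so one page is not visited twice
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                self.links.append(absolute)


def one_level_links(start_url, html):
    """Return the pages a 1-level-deep spider would visit after start_url itself."""
    collector = LinkCollector(start_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Each returned URL would then be fetched with and without Accept-Charset and checked for a charset difference.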

Comment 14

6 years ago
We have contacted Yahoo and they are working on fixes, so I think scanning Yahoo deeper isn't as valuable as a broader scan across sites using a good top-sites list and/or crash URLs.

Comment 15

6 years ago
OK. I've started a scan of the Alexa top-1m list, 1 level deep, on 1 VM in the colo and will let it run for a while. I doubt we want it to scan all 1m sites, but I'll let you know before I kill it off.

Comment 16

6 years ago
Created attachment 554793
charset log

This is the result of comparing 55000+ pages on the top 389 sites using a Nightly build on Windows XP:

1. using XHR with the default headers to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset1
2. using XHR with the header Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset2.
3. output a charset differs line if charset1 != charset2.
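Steps 1 to 3 can be sketched in Python as follows, assuming the two Content-Type header values have already been fetched. charset_from_content_type and charset_differs_line are hypothetical helpers, and the output format copies the log lines in comment 13. One deliberate change, not necessarily in the original script: charset names are compared case-insensitively, since "UTF-8" and "utf-8" name the same encoding and would otherwise add false positives.

```python
def charset_from_content_type(content_type):
    """Return the charset= parameter of a Content-Type header value, or None."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip('"') or None
    return None


def charset_differs_line(url, content_type1, content_type2):
    """Steps 1-3: compare the charsets of the two responses and report a difference."""
    charset1 = charset_from_content_type(content_type1)
    charset2 = charset_from_content_type(content_type2)
    # Charset names are case-insensitive, so normalize before comparing
    if (charset1 or "").lower() != (charset2 or "").lower():
        return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"
    return None


print(charset_differs_line(
    "http://images.search.yahoo.com/images",
    "text/html; charset=ISO-8859-1",
    "text/html; charset=UTF-8",
))
# charset differs: http://images.search.yahoo.com/images : charset1 = ISO-8859-1, charset2 = UTF-8
```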

I believe most of these results are false positives. I don't think this approach is sufficient, since most of the pages appear to display properly with both Namoroka and Nightly. Looking at the results in the browser, it appears that in most cases we respect the charset specified in <meta http-equiv="Content-Type" content="text/html; charset=BLAH">.

A notable exception, pointed out in this bug, is http://video.search.yahoo.com/: even though its http-equiv specifies UTF-8, Nightly still treats it as ISO-8859-1.

Perhaps a different approach would be more informative:

1. get charset1 as above.
2. get charset3 from http-equiv="Content-Type"
3. output a charset differs line if charset3 exists and charset1 != charset3
4. if charset3 does not exist then get charset2 as above and output a charset differs line if charset1 != charset2
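Step 2, reading the charset declared by <meta http-equiv="Content-Type">, could be sketched with Python's html.parser as below. http_equiv_charset is a hypothetical helper; only the http-equiv form discussed here is handled, so the HTML5 <meta charset> form and any byte-level encoding sniffing a browser performs are outside this sketch.

```python
from html.parser import HTMLParser


def _charset_param(content_type):
    """Return the charset= parameter of a Content-Type value, or None."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip('"') or None
    return None


class HttpEquivCharsetParser(HTMLParser):
    """Remembers the first charset declared by a <meta http-equiv="Content-Type"> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so case variants match
        if tag != "meta" or self.charset is not None:
            return
        attr_map = {name: value or "" for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "content-type":
            self.charset = _charset_param(attr_map.get("content", ""))


def http_equiv_charset(html):
    """Step 2: return charset3 from the page's http-equiv meta tag, or None."""
    parser = HttpEquivCharsetParser()
    parser.feed(html)
    parser.close()
    return parser.charset
```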

Comment 17

6 years ago
(In reply to Bob Clary [:bc:] from comment #16)

I screwed up the alternative approach.

> 1. get charset4 from document.characterSet
> 2. get charset3 from http-equiv="Content-Type"
> 3. output a charset differs line if charset3 exists and charset4 != charset3
> 4. if charset3 does not exist then get charset1 and charset2 as above and output a charset differs line if charset1 != charset2
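The corrected decision logic might look like this in Python. The sketch assumes charset4 (document.characterSet) and charset3 have already been collected by the browser-side script, and that get_header_charsets is a hypothetical callable returning (charset1, charset2) for a URL. It is only invoked in step 4, so the two extra header requests are skipped when the page declares its charset. The log format for the step 3 case is an assumption modeled on the lines in comment 13.

```python
def corrected_check(url, charset4, charset3, get_header_charsets):
    """Corrected steps 1-4 from comment 17.

    charset4: encoding the browser actually used (document.characterSet).
    charset3: charset from <meta http-equiv="Content-Type">, or None if absent.
    get_header_charsets: hypothetical callable returning (charset1, charset2) for url.
    """
    def norm(name):
        return (name or "").lower()  # charset names are case-insensitive

    if charset3 is not None:
        # Step 3: the page declares a charset, so check that the browser honored it
        if norm(charset4) != norm(charset3):
            return f"charset differs: {url} : charset4 = {charset4}, charset3 = {charset3}"
        return None
    # Step 4: no declaration, so fall back to comparing the two response headers
    charset1, charset2 = get_header_charsets(url)
    if norm(charset1) != norm(charset2):
        return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"
    return None
```

With the video.search.yahoo.com case from comment 16 (Nightly used ISO-8859-1 while the page declared UTF-8), step 3 reports the difference without refetching the page.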
(Reporter)

Comment 18

6 years ago
Non-ASCII search in Yahoo! Image Search (http://images.search.yahoo.com) is fixed. Yahoo! Video Search is still broken for non-ASCII search.
(Reporter)

Comment 19

6 years ago
Looks like non-ASCII search in Yahoo! Video Search is fixed (tested on Mozilla/5.0 (X11; Linux i686; rv:10.0a1) Gecko/20111101 Firefox/10.0a1 ID:20111101031108)
Marking this bug as fixed.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
status-firefox8: affected → unaffected
Product: Tech Evangelism → Tech Evangelism Graveyard