Closed Bug 657153 Opened 14 years ago Closed 14 years ago

Yahoo search is broken for non-ascii search requests

Tracking

(firefox6+ unaffected, firefox7+ unaffected, firefox8+ unaffected)

Status:

RESOLVED FIXED

People

(Reporter: unghost, Unassigned)

References

Details

(Whiteboard: [Yahoo broken by our removing Accept-Charset header])

Attachments

(3 files)

Screenshot of yahoo.com in 2011-05-09-03-mozilla-central 14 years ago Alexander L. Slovesnik 121.63 KB, image/png		Details
Screenshot of yahoo.com in 2011-05-08-03-mozilla-central 14 years ago Alexander L. Slovesnik 148.16 KB, image/png		Details
charset log 14 years ago Bob Clary [:bc] (inactive) 53.07 KB, text/html		Details

Alexander L. Slovesnik

Reporter

Description

•

14 years ago

Attached image Screenshot of yahoo.com in 2011-05-09-03-mozilla-central — Details

STR:
1) Open http://search.yahoo.com/
2) Type a search request in Russian (for example "виски")

Expected results: A list of search results for "виски"

Actual results: All non-ASCII characters are displayed as ????

Last good build: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-08-03-mozilla-central/
First broken build: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-09-03-mozilla-central/
Regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a8f07cad55e2&tochange=9e31df64bfd7

I suspect that Bug 572652 is the culprit.

Alexander L. Slovesnik

Reporter

Comment 1

•

14 years ago

Attached image Screenshot of yahoo.com in 2011-05-08-03-mozilla-central — Details

Alexander L. Slovesnik

Reporter

Comment 2

•

14 years ago

FWIW, spoofing the user agent as Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1) helps.

Kev Needham [:kev]

Comment 4

•

14 years ago

Email sent to Yahoo! asking them to investigate. I assume this would be a Firefox 6 change if it stays on track.

Dão Gottwald [:dao]

Updated

•

14 years ago

tracking-firefox6: --- → ?

Henri Sivonen (:hsivonen)

Comment 5

•

14 years ago

Note to Yahoo!: Accept-Charset already had a constant value and all Firefox builds had the same support for encodings, so paying attention to Accept-Charset was already useless before the header was removed.

Asa Dotzler [:asa]

Updated

•

14 years ago

Whiteboard: Dao nominated without comment

Asa Dotzler [:asa]

Updated

•

14 years ago

tracking-firefox6: ? → +

Masatoshi Kimura [:emk]

Comment 6

•

14 years ago

(In reply to comment #5)
> Note to Yahoo!:
> Accept-Charset already had a constant value and all Firefox builds had the
> same support for encodings, so paying attention to Accept-Charset was
> already useless before the header was removed.

That's not true. "intl.charset.default" is localizable and some localized builds of Firefox actually changed the value. For example, Japanese Firefox 4 sends "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7". If it were just a constant, it would not have been a problem from an HTTP fingerprinting perspective in the first place.
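To make the difference between the two header values concrete, here is a minimal Python sketch that splits an Accept-Charset value into charset names and q weights. The function name `parse_accept_charset` is hypothetical, and the parsing is simplified (it does not validate the header against the HTTP grammar).

```python
def parse_accept_charset(value):
    """Map each charset in an Accept-Charset value to its q weight.

    Simplified parsing: entries are comma-separated, parameters are
    semicolon-separated, and a missing q parameter defaults to 1.0.
    """
    weights = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q" and val.strip():
                q = float(val.strip())
        weights[name.strip().lower()] = q
    return weights


# The value quoted above for Japanese Firefox 4.
japanese = parse_accept_charset("Shift_JIS,utf-8;q=0.7,*;q=0.7")
# The value described in bug 572652 and used later in this bug for testing.
western = parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7")

print(japanese)
print("identical across locales:", japanese == western)
```

The only difference between the two values is the first, most-preferred charset, which is the part that varied by locale.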

Asa Dotzler [:asa]

Comment 7

•

14 years ago

Kev, any response here?

Asa Dotzler [:asa]

Updated

•

14 years ago

Whiteboard: Dao nominated without comment → [Yahoo broken by our removing Accept-Charset header]

Alexander L. Slovesnik

Reporter

Comment 8

•

14 years ago

Looks like Yahoo! Web search (http://search.yahoo.com/) is fixed for non-ascii search requests. Non-ascii search in Yahoo! Image Search (http://images.search.yahoo.com) and Yahoo! Video Search (http://video.search.yahoo.com/) still broken though.

LegNeato

Comment 9

•

14 years ago

I'm going to back out bug 572652 as the risk vs reward seems off. Backing out bug 572652 shouldn't have any adverse effects on those that already changed server side to ignore / not use the header. Marking as unaffected for 6 due to the backout in bug 572652. I have not backed it out on Aurora or central, so this will crop up again and affects those versions.

status-firefox6: --- → fixed

status-firefox7: --- → affected

status-firefox8: --- → affected

tracking-firefox7: --- → +

tracking-firefox8: --- → +

LegNeato

Updated

•

14 years ago

status-firefox6: fixed → unaffected

status-firefox7: affected → unaffected

chris hofmann

Comment 10

•

14 years ago

bc, is there any way for the spider to help in testing this if we turned it back on in aurora and beta?

Bob Clary [:bc] (inactive)

Comment 11

•

14 years ago

chofmann: is there a header I could look for in the response that would indicate the presence of the problem?

chris hofmann

Comment 12

•

14 years ago

ok, so this might be tricky. Maybe there are some ideas on how to test over in https://bugzilla.mozilla.org/show_bug.cgi?id=572652

One idea would be to send Accept-Charset in one request, then not in the next request, and then look at the diffs to try to detect broken content. Then we can do more evangelizing to the list of sites that are sending different responses.

Bob Clary [:bc] (inactive)

Comment 13

•

14 years ago

chofmann: I tried your idea using XMLHttpRequest to send a default GET to a site, followed by a GET with Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 as described in bug 572652. Looking at the content, it is very difficult to filter out the differences due to changing session/tracking crap, but it appears the charset in the response headers may be the ticket. Spidering the yahoo home page 1 level deep quickly gives:

charset differs: http://images.search.yahoo.com/images : charset1 = ISO-8859-1, charset2 = UTF-8
charset differs: http://video.search.yahoo.com/video : charset1 = ISO-8859-1, charset2 = UTF-8

which matches what Alexander said in comment 8. The scan is still running so I don't have a full answer. If this is sufficient we could scan yahoo deeper, scan more of the top sites, piggyback this on the userhook script used in crash testing, or Tomcat can use his vms to collect data. Let me know what you think.
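A rough Python sketch of the comparison described in comment 13, for anyone who wants to reproduce it outside the browser. This is not the spider's actual code (which used XMLHttpRequest); the function names are hypothetical, charset names are upper-cased here only to make comparison simple, and urllib sends a different User-Agent than Firefox, which may itself change what a server returns.

```python
import re
import urllib.request

# Matches the charset parameter of a Content-Type value, quoted or not.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)

ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"


def charset_from_content_type(content_type):
    """Return the upper-cased charset parameter, or None if there is none."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type)
    return match.group(1).upper() if match else None


def fetch_charset(url, accept_charset=None):
    """GET url, optionally sending Accept-Charset, and return the response charset."""
    request = urllib.request.Request(url)
    if accept_charset is not None:
        request.add_header("Accept-Charset", accept_charset)
    with urllib.request.urlopen(request, timeout=30) as response:
        return charset_from_content_type(response.headers.get("Content-Type"))


def report(url, charset1, charset2):
    """Return a 'charset differs' line in the format shown above, or None."""
    if charset1 != charset2:
        return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"
    return None
```

Usage would be along the lines of `report(url, fetch_charset(url), fetch_charset(url, ACCEPT_CHARSET))`, printing the result whenever it is not None.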

chris hofmann

Comment 14

•

14 years ago

We have contacted Yahoo and they are working on fixes, so I think scanning Yahoo deeper isn't as valuable as doing a broader scan across sites using a good top site list and/or crash urls.

Bob Clary [:bc] (inactive)

Comment 15

•

14 years ago

Ok. I've started a scan with the alexa top-1m list 1 level deep on 1 vm in the colo and will let it run for a while. I doubt that we want it to scan all 1m sites, but I'll let you know before I kill it off.

Bob Clary [:bc] (inactive)

Comment 16

•

14 years ago

Attached file charset log — Details

This is the result of comparing 55000+ pages on the top 389 sites using a Nightly build on Windows XP:

1. using XHR with the default headers to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset1.
2. using XHR with the header Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 to GET the page and response headers and parsing out the charset= from the Content-Type header to obtain charset2.
3. output a charset differs line if charset1 != charset2.

I believe most of these results are false positives. I don't think this approach is sufficient since most of the pages appear to display properly with both Namoroka and Nightly.

Looking at the results in the browser, it appears in most cases we respect the charset specified in <meta http-equiv="Content-Type" content="text/html; charset=BLAH">. A notable exception pointed out in this bug is http://video.search.yahoo.com/ which Nightly treats as ISO-8859-1 even though the http-equiv specifies UTF-8.

Perhaps a different approach would be more informative:

1. get charset1 as above.
2. get charset3 from http-equiv="Content-Type"
3. output a charset differs line if charset3 exists and charset1 != charset3
4. if charset3 does not exist then get charset2 as above and output a charset differs line if charset1 != charset2
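For the charset3 step, a minimal Python sketch of reading the charset out of a meta http-equiv="Content-Type" declaration. The class and function names are hypothetical, and it only looks at the first matching meta element; it does not reproduce the browser's full encoding-detection rules.

```python
import re
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Record the charset from the first <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        match = re.search(
            r'charset\s*=\s*([^\s;"\']+)',
            attributes.get("content", ""),
            re.IGNORECASE,
        )
        if match:
            self.charset = match.group(1).upper()


def charset_from_meta(html):
    """Return the upper-cased http-equiv charset, or None if none is declared."""
    parser = MetaCharsetParser()
    parser.feed(html)
    return parser.charset


sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8"></head><body></body></html>'
)
print(charset_from_meta(sample))
```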

Bob Clary [:bc] (inactive)

Comment 17

•

14 years ago

(In reply to Bob Clary [:bc:] from comment #16)

I screwed up the alternative approach.

> 1. get charset4 from document.characterSet
> 2. get charset3 from http-equiv="Content-Type"
> 3. output a charset differs line if charset3 exists and charset4 != charset3
> 4. if charset3 does not exist then get charset1 and charset2 as above and output a charset differs line if charset1 != charset2
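The corrected decision procedure can be sketched in Python as below. The function name `charset_differs` is hypothetical, all four charsets are passed in as already-collected strings (or None when unavailable), and the case-insensitive comparison is an assumption rather than something stated in the comment.

```python
def _normalize(charset):
    """Upper-case a charset name for comparison; None stays None."""
    return charset.strip().upper() if charset else None


def charset_differs(charset4, charset3, charset1=None, charset2=None):
    """Apply the corrected steps from comment 17.

    charset4: document.characterSet as reported by the browser
    charset3: charset from <meta http-equiv="Content-Type">, or None
    charset1: Content-Type charset from the default request
    charset2: Content-Type charset from the request with Accept-Charset
    """
    if _normalize(charset3) is not None:
        # Step 3: the page declares a charset; check the browser honored it.
        return _normalize(charset4) != _normalize(charset3)
    # Step 4: no declaration; fall back to comparing the two responses.
    return _normalize(charset1) != _normalize(charset2)


# The video.search.yahoo.com case from comment 16: http-equiv says UTF-8,
# but Nightly treats the page as ISO-8859-1.
print(charset_differs("ISO-8859-1", "UTF-8"))
```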

Alexander L. Slovesnik

Reporter

Comment 18

•

14 years ago

Non-ascii search in Yahoo! Image Search (http://images.search.yahoo.com) is fixed. Yahoo! Video Search still broken for non-ascii search.

Alexander L. Slovesnik

Reporter

Comment 19

•

14 years ago

Looks like non-ascii search in Yahoo! Video Search is fixed (tested on Mozilla/5.0 (X11; Linux i686; rv:10.0a1) Gecko/20111101 Firefox/10.0a1 ID:20111101031108).

Marking this bug as fixed.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

LegNeato

Updated

•

14 years ago

status-firefox8: affected → unaffected

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: Tech Evangelism → Tech Evangelism Graveyard
