Closed
Bug 657153
Opened 13 years ago
Closed 13 years ago
Yahoo search is broken for non-ASCII search requests
Categories
(Tech Evangelism Graveyard :: English US, defect)
Tracking
(firefox6+ unaffected, firefox7+ unaffected, firefox8+ unaffected)
RESOLVED
FIXED
Version | Tracking | Status
---|---|---
firefox6 | + | unaffected
firefox7 | + | unaffected
firefox8 | + | unaffected
People
(Reporter: unghost, Unassigned)
Details
(Whiteboard: [Yahoo broken by our removing Accept-Charset header])
Attachments
(3 files)
STR:
1) Open http://search.yahoo.com/
2) Type a search request in Russian (for example "виски")
Expected results: a list of search results for "виски"
Actual results: all non-ASCII characters appear as ????
Last good build: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-08-03-mozilla-central/
First broken build: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-05-09-03-mozilla-central/
Regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a8f07cad55e2&tochange=9e31df64bfd7
I suspect that bug 572652 is the culprit.
Reporter
Comment 1•13 years ago
Reporter
Comment 2•13 years ago
FWIW, spoofing the user agent as Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1) works around the problem.
Comment 4•13 years ago
Email sent to Yahoo! asking them to investigate. I assume this would be a Firefox 6 change if it stays on track.
Updated•13 years ago
tracking-firefox6: --- → ?
Comment 5•13 years ago
Note to Yahoo!: Accept-Charset already had a constant value and all Firefox builds had the same support for encodings, so paying attention to Accept-Charset was already useless before the header was removed.
Updated•13 years ago
Whiteboard: Dao nominated without comment
Updated•13 years ago
Comment 6•13 years ago
(In reply to comment #5)
> Note to Yahoo!:
> Accept-Charset already had a constant value and all Firefox builds had the
> same support for encodings, so paying attention to Accept-Charset was
> already useless before the header was removed.
That's not true. "intl.charset.default" is localizable, and some localized builds of Firefox actually changed the value. For example, Japanese Firefox 4 sends "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7". If it were just a constant, it would not have been a problem from an HTTP-fingerprinting perspective in the first place.
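To make comment 6's point concrete, here is a minimal Python sketch (not part of the original discussion) that parses an Accept-Charset value into (charset, q) pairs using the q-value syntax of RFC 7231, section 5.3.3. The function name parse_accept_charset is hypothetical; the two header values are the Japanese one quoted above and the one used for testing in comments 13 and 16.

```python
def parse_accept_charset(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Charset value into (charset, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        # Charset names are case-insensitive, so normalize to lowercase
        charset = parts[0].lower()
        if not charset:
            continue
        q = 1.0  # a charset without an explicit q-value defaults to q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((charset, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


default_value = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
japanese_value = "Shift_JIS,utf-8;q=0.7,*;q=0.7"

print(parse_accept_charset(default_value))
# [('iso-8859-1', 1.0), ('utf-8', 0.7), ('*', 0.7)]
print(parse_accept_charset(japanese_value))
# [('shift_jis', 1.0), ('utf-8', 0.7), ('*', 0.7)]
```

The two values differ only in the top-ranked charset, which is the locale signal comment 6 says made the header matter for fingerprinting.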
Comment 7•13 years ago
Kev, any response here?
Updated•13 years ago
Whiteboard: Dao nominated without comment → [Yahoo broken by our removing Accept-Charset header]
Reporter
Comment 8•13 years ago
Looks like Yahoo! Web search (http://search.yahoo.com/) is fixed for non-ASCII search requests. Non-ASCII search in Yahoo! Image Search (http://images.search.yahoo.com) and Yahoo! Video Search (http://video.search.yahoo.com/) is still broken, though.
I'm going to back out bug 572652, as the risk vs. reward seems off. Backing out bug 572652 shouldn't have any adverse effects on sites that have already changed server-side to ignore or not use the header. Marking as unaffected for 6 due to the backout in bug 572652. I have not backed it out on Aurora or central, so this will crop up again and still affects those versions.
status-firefox6: --- → fixed
status-firefox7: --- → affected
status-firefox8: --- → affected
tracking-firefox7: --- → +
tracking-firefox8: --- → +
Comment 10•13 years ago
bc, is there any way for the spider to help test this if we turned it back on in Aurora and Beta?
Comment 11•13 years ago
chofmann: is there a header I could look for in the response that would indicate the presence of the problem?
Comment 12•13 years ago
OK, so this might be tricky. There may be some ideas on how to test over in https://bugzilla.mozilla.org/show_bug.cgi?id=572652. One idea would be to send Accept-Charset in one request, omit it in the next, and then diff the responses to try to detect broken content. Then we can do more evangelizing to the list of sites that send different responses.
Comment 13•13 years ago
chofmann: I tried your idea, using XMLHttpRequest to send a default GET to a site followed by a GET with Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 as described in bug 572652. Looking at the content, it is very difficult to filter out the differences due to changing session/tracking crap, but it appears the charset in the response headers may be the ticket. Spidering the Yahoo! home page 1 level deep quickly gives:
charset differs: http://images.search.yahoo.com/images : charset1 = ISO-8859-1, charset2 = UTF-8
charset differs: http://video.search.yahoo.com/video : charset1 = ISO-8859-1, charset2 = UTF-8
This matches what Alexander said in comment 8. The scan is still running, so I don't have a full answer yet. If this is sufficient, we could spider Yahoo! deeper, cover more of the top sites, piggyback this on the userhook script used in crash testing, or Tomcat could use his VMs to collect data. Let me know what you think.
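The header comparison in comment 13 can be sketched in Python as follows. This is an illustration, not bc's spider code (which ran XMLHttpRequest in the browser): fetching is left out, the function names are hypothetical, and each function takes the raw Content-Type response header from the request without Accept-Charset (charset1) and the request with it (charset2).

```python
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset= parameter of a Content-Type header, or None."""
    if not content_type:
        return None
    # Parameters follow the media type, separated by ';'
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            value = value.strip().strip('"')  # allow charset="utf-8"
            return value or None
    return None


def charset_diff_line(url: str,
                      content_type1: Optional[str],
                      content_type2: Optional[str]) -> Optional[str]:
    """Return a 'charset differs' line if the two responses disagree."""
    charset1 = charset_from_content_type(content_type1)
    charset2 = charset_from_content_type(content_type2)
    # Charset names are case-insensitive, so compare them lowercased
    if (charset1 or "").lower() == (charset2 or "").lower():
        return None
    return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"


print(charset_diff_line(
    "http://video.search.yahoo.com/video",
    "text/html; charset=ISO-8859-1",
    "text/html; charset=UTF-8",
))
# charset differs: http://video.search.yahoo.com/video : charset1 = ISO-8859-1, charset2 = UTF-8
```

The full Content-Type strings in the usage example are illustrative; only the charset values come from the scan output above. Because names are compared case-insensitively, UTF-8 and utf-8 are not reported as a difference.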
Comment 14•13 years ago
We have contacted Yahoo! and they are working on fixes, so I think scanning Yahoo! more deeply isn't as valuable as doing a broader scan across sites using a good top-sites list and/or crash URLs.
Comment 15•13 years ago
OK. I've started a scan with the Alexa top-1m list, 1 level deep, on 1 VM in the colo and will let it run for a while. I doubt that we want it to scan all 1m sites, but I'll let you know before I kill it off.
Comment 16•13 years ago
This is the result of comparing 55000+ pages from the top 389 sites using a Nightly build on Windows XP:
1. Using XHR with the default headers, GET the page and response headers and parse the charset= value from the Content-Type header to obtain charset1.
2. Using XHR with the header Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7, GET the page and response headers and parse the charset= value from the Content-Type header to obtain charset2.
3. Output a "charset differs" line if charset1 != charset2.
I believe most of these results are false positives. I don't think this approach is sufficient, since most of the pages appear to display properly in both Namoroka and Nightly. Looking at the results in the browser, it appears that in most cases we respect the charset specified in <meta http-equiv="Content-Type" content="text/html; charset=BLAH">. A notable exception, pointed out in this bug, is http://video.search.yahoo.com/, which Nightly treats as ISO-8859-1 even though the http-equiv specifies UTF-8.
Perhaps a different approach would be more informative:
1. Get charset1 as above.
2. Get charset3 from http-equiv="Content-Type".
3. Output a "charset differs" line if charset3 exists and charset1 != charset3.
4. If charset3 does not exist, get charset2 as above and output a "charset differs" line if charset1 != charset2.
Comment 17•13 years ago
(In reply to Bob Clary [:bc:] from comment #16)
I screwed up the alternative approach. It should be:
1. Get charset4 from document.characterSet.
2. Get charset3 from http-equiv="Content-Type".
3. Output a "charset differs" line if charset3 exists and charset4 != charset3.
4. If charset3 does not exist, get charset1 and charset2 as above and output a "charset differs" line if charset1 != charset2.
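A minimal Python sketch of the corrected approach follows. It assumes the caller already has the page HTML, the document.characterSet value reported by the browser (charset4), and, for pages without a meta declaration, the header charsets charset1 and charset2. check_page, meta_charset and MetaContentTypeFinder are hypothetical names, and the meta lookup uses the standard library's html.parser instead of the browser DOM that bc's spider had available.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class MetaContentTypeFinder(HTMLParser):
    """Records the content= value of the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names
        if tag != "meta" or self.content is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            self.content = attributes.get("content", "")


def meta_charset(html: str) -> Optional[str]:
    """charset3: the charset declared via http-equiv="Content-Type", if any."""
    finder = MetaContentTypeFinder()
    finder.feed(html)
    finder.close()
    if not finder.content:
        return None
    match = re.search(r"charset\s*=\s*[\"']?([^\s;\"']+)", finder.content, re.IGNORECASE)
    return match.group(1) if match else None


def check_page(url: str, html: str, charset4: str,
               charset1: Optional[str] = None,
               charset2: Optional[str] = None) -> Optional[str]:
    """Return a 'charset differs' line, or None if no mismatch was found."""
    charset3 = meta_charset(html)
    if charset3 is not None:
        # Step 3: compare what the browser used with what the page declares
        if charset4.lower() != charset3.lower():
            return f"charset differs: {url} : charset4 = {charset4}, charset3 = {charset3}"
        return None
    # Step 4: no meta declaration, so fall back to the two header charsets
    if (charset1 or "").lower() != (charset2 or "").lower():
        return f"charset differs: {url} : charset1 = {charset1}, charset2 = {charset2}"
    return None


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=UTF-8"></head></html>')
print(check_page("http://video.search.yahoo.com/", page, "ISO-8859-1"))
# charset differs: http://video.search.yahoo.com/ : charset4 = ISO-8859-1, charset3 = UTF-8
```

The sample page is an illustrative stand-in for the Yahoo! Video case in comment 16 (meta says UTF-8, Nightly used ISO-8859-1). In a real spider, the two header requests would only be made when meta_charset returns None; passing them in as arguments just keeps the sketch free of network code.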
Reporter
Comment 18•13 years ago
Non-ASCII search in Yahoo! Image Search (http://images.search.yahoo.com) is fixed. Yahoo! Video Search is still broken for non-ASCII search.
Reporter
Comment 19•13 years ago
Looks like non-ASCII search in Yahoo! Video Search is fixed (tested on Mozilla/5.0 (X11; Linux i686; rv:10.0a1) Gecko/20111101 Firefox/10.0a1 ID:20111101031108). Marking this bug as fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: Tech Evangelism → Tech Evangelism Graveyard