Closed
Bug 292762
Opened 20 years ago
Closed 20 years ago
Mozilla/FF reconverts a URL
Categories
(Core :: Networking, defect)
Tracking
()
VERIFIED
INVALID
People
(Reporter: ezh, Assigned: darin.moz)
References
()
Details
(Keywords: intl, testcase)
Attachments
(3 files)
1. Open the page in Moz/FF and in IE or Opera.
2. Click [2] in the main table (to move to the second page).
3. Moz/FF opens a wrong, empty page. IE/Opera open the right page.
This happens because Moz/FF converts the URL into some other codepage.
PS: Encoding autodetection may be on or off; it does not matter.
Comment 1•20 years ago
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050419 Firefox/1.0.4
WFM. You didn't provide a UA; you only said it doesn't work in Mozilla Suite/Firefox but works in Internet Explorer/Opera. I have a screenshot (attached) that shows the same page in IE as in FF after clicking on the [2] as specified.
Updated•20 years ago
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WORKSFORME
Comment 2•20 years ago
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b2) Gecko/20050503 Firefox/1.0+
The URL to page 2 is
http://spravka.gramota.ru/buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2
It should be
http://spravka.gramota.ru/buro.html?action=bytext&keyword=&rubrika=&findstr=%F1%E5%F0%E2%E5%F0%FB&page=2
I think that I am seeing the same as you. It is as though the encoding/conversion were being done twice. You are specifying charset=windows-1251 (see http://code.cside.com/3rdpage/windows/cyrillic.html). It looks as though Firefox is generating some sort of two-byte characters: what should be %F0 becomes %D1%80.
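For readers who want to reproduce the two query strings above, here is a minimal Python sketch. It only assumes that the search term is the Russian word серверы, as stated later in this bug; the helper name escape_as is made up for illustration and is not anything Gecko or the site uses.

```python
from urllib.parse import quote

WORD = "серверы"  # the search term discussed in this bug

def escape_as(text, encoding):
    """Convert text to bytes in the given encoding, then percent-escape every byte.

    Hypothetical helper for illustration only.
    """
    return quote(text, safe="", encoding=encoding)

# One byte per Cyrillic letter in windows-1251 (cp1251):
print(escape_as(WORD, "cp1251"))
# Two bytes per Cyrillic letter in UTF-8:
print(escape_as(WORD, "utf-8"))
```

The first line printed matches the findstr value the site expects, and the second matches the value seen in the Firefox request.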
| Reporter | ||
Comment 3•20 years ago
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050502 Does not work, as comment #2 says.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 4•20 years ago
(In reply to comment #2)
> Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b2) Gecko/20050503
> Firefox/1.0+
>
> You are specifying charset=windows-1251
> http://code.cside.com/3rdpage/windows/cyrillic.html It looks as though Firefox
> is generating some sort of two byte characters.
I should have had the courage of my convictions. Pasting the hex into http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder shows that it has become UTF-8. I don't know why view source uses CP-1251, the status bar uses CP-1251, but the URL handler uses UTF-8; but there must be a reason!
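The same check can be done locally instead of with the online decoder. The sketch below is only an illustration: decode_query_value is a hypothetical helper, and the windows-1251 fallback is an assumption about what a tolerant server script could do, not something the site is known to do.

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(escaped, fallback="cp1251"):
    """Return (text, encoding) for a percent-escaped value.

    Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    falls back to the given legacy encoding (an assumption here).
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback

print(decode_query_value("%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B"))
print(decode_query_value("%F1%E5%F0%E2%E5%F0%FB"))
```

Both calls recover the same word; only the reported encoding differs, which confirms that the Firefox value is the UTF-8 form of the string.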
Comment 5•20 years ago
This is interesting to me, but the original report states (In reply to comment #0) > 3. Moz/FF opens a wrong empty page. IE/Opera opens the right page. The page is not empty. The page is the same page as seen in other browsers. Therefore, while there is a bug (seemingly) with the conversion, and there's weird stuff in the address bar, the problem is not as described. WORKSFORME still applies. Reporter: Are you not getting the requested page, and still a blank page? Or are you getting the right page? If you're getting the right page and the problem is aesthetic only (meaning: It only looks bad, but still functions just fine) then I suppose this bug should be dropped under Core's "Location Bar" and the severity dropped down to minor (it still works, just not properly). Also, reporter, can you go to "about:" and copy your UA? Example: for me it is Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050502 Firefox/1.0+
Comment 6•20 years ago
We are doing a look up on the Russian word 'серверы' - 'servers'. The word should be passed from one page to the next in the URL, but this is not happening (seemingly because Firefox is using UTF-8 encoding in the URL, and the site wants CP-1251). It is not the case that the site functions (with or without looking bad). The second page has no hits - is empty. It is as though clicking on the next button (e.g. in Google) transformed your search term from 'servers' to 'sXeYrZvXeYrZsX'. It is possible that http url encoding requires UTF-8 and both the site and any browser that uses CP-1251 in the URL are wrong, see http://weblogs.mozillazine.org/gerv/archives/005539.html . That would imply that IE is wrong, and any server requiring CP-1251 in the URL is mis-guided. Having said that, the http request as viewed in tcpdump (I am using a proxy) does not bear any indication of what encoding is used for the query string.
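To see why the second page comes back empty, here is a rough sketch of what a server script would get if it decodes the query string as windows-1251. That the site decodes this way is an assumption drawn from this discussion; the function name and the shortened query strings are made up for illustration.

```python
from urllib.parse import parse_qs

# Query strings shortened from the URLs quoted in this bug.
FIREFOX_QUERY = "action=bytext&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2"
IE_QUERY = "action=bytext&findstr=%F1%E5%F0%E2%E5%F0%FB&page=2"

def findstr_as_seen_by_server(query, server_encoding="cp1251"):
    """Decode the query the way a windows-1251 script would (assumed behaviour)."""
    return parse_qs(query, encoding=server_encoding)["findstr"][0]

print(findstr_as_seen_by_server(IE_QUERY))       # the intended 7-letter word
print(findstr_as_seen_by_server(FIREFOX_QUERY))  # 14 unrelated characters
```

The UTF-8 bytes are each read as a separate windows-1251 character, so the script searches for a 14-character string that matches nothing, much like the 'sXeYrZvXeYrZsX' analogy above.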
| Reporter | ||
Comment 7•20 years ago
Of course, by "empty page" I did not mean a completely empty page, but an empty main table of results.
The second test is:
http://www.gramota.ru/dic/search.php?word=%ED%E8%E6%E5%ED%EA%E0&lop=x&gorb=x&efr=x&ag=x&zar=x&ab=x&sin=x&lv=x&pe=x&az=x
In this URL I misspelled a word, and the server offers me the closest matches to choose from. In Moz/FF this also does not work. And look at the word in the input area: it is also totally garbled.
Actually, it's a very popular site in the Russian-speaking community (Russian grammar pages)... Could someone check it on FF 1.0.?
Comment 8•20 years ago
After loading the URL I added &page=2 to the contents of the URL bar and clicked Go to go to the second page.
Copied from the Location Bar:
http://spravka.gramota.ru/?action=bytext&findstr=%F1%E5%F0%E2%E5%F0%FB&page=2
Copied from [2]:
http://spravka.gramota.ru/buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2
When you do a View Selection Source on [1] [2] you'll see:
<td align="center" bgcolor="#f1f0f0"> <a class="def" href="buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=1"> [ 1 ] </a> <a class="def" href="buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2"> <b>[ 2 ]</b> </a></td>
View source from Opera uses WordPad:
<A class=def HREF='buro.html?action=bytext&keyword=&rubrika=&findstr=серверы&page=1'> [ 1 ] </A>
Using Programmer's Notepad I made testcases from the Opera-saved copy and got both representations, with findstr=серверы changed to findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&; maybe I copied from View Selection Source in Mozilla.
Comment 9•20 years ago
<a class="def" href="buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=1"> <b>[ 1 ]</b> </a>
Comment 10•20 years ago
<a class="def" href="buro.html?action=bytext&keyword=&rubrika=&findstr=%F1%E5%F0%E2%E5%F0%FB&page=1"> <b>[ 1 ]</b> </a>
I don't see any difference between the testcases when hovering; the text shown in the status bar seems to be the same. I do see differences when following the links.
Comment 11•20 years ago
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8b2) Gecko/20050504
right-click on a link of the original URL, properties, gives:
view Selection Source on that link gives:
http://spravka.gramota.ru/buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2

Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.8) Gecko/20050430 Firefox/1.0.4
right-click on a link of the original URL, properties, gives:
http://spravka.gramota.ru/buro.html?action=bytext&keyword=&rubrika=&findstr=%F1%E5%F0%E2%E5%F0%FB&page=2
view Selection Source on that link gives:
<a class="def" href="buro.html?action=bytext&keyword=&rubrika=&findstr=%D1%81%D0%B5%D1%80%D0%B2%D0%B5%D1%80%D1%8B&page=2">

Check the different action of the links using the testcases:
compare the testcases by looking at the status line when hovering
compare the testcases by using the links
The links of the testcases should go to:
http://spravka.gramota.ru/?action=bytext&findstr=%F1%E5%F0%E2%E5%F0%FB&page=1
http://spravka.gramota.ru/?action=bytext&findstr=%F1%E5%F0%E2%E5%F0%FB&page=2
Keywords: testcase
Comment 13•20 years ago
I'm only seeing a "[1]" on the actual URL. But clicking that exhibits the bug with linux suite trunk build 2005060905, if I understand the description correctly. This regressed between linux trunk builds 2005011206 and 2005011306, pointing to bug 261929. It's possible this is the new "correct" behavior. ==> networking
Comment 14•20 years ago
I thought I filed a bug on this (I clearly mentioned what I'm gonna write below in another bug), but I couldn't find it. What's happening is this:
1. Mozilla sends both path and query parts of URLs in UTF-8.
2. MS IE and Opera just url-escape the query part 'octet-wise' (without converting to UTF-8). They still use UTF-8 for the path part. (Actually, I haven't tested Opera yet, but MS IE certainly does that.)
What MS IE and Opera do makes sense (at least until every form processing server-side program understands UTF-8) and I guess we have to do the same.
Comment 15•20 years ago
We should do exactly what is described here: http://whatwg.org/specs/web-forms/current-work/#x-www-form-urlencoded ...or the spec should be changed.
Comment 16•20 years ago
Well, this bug has little to do with the form submission. We do more or less the right thing when submitting forms (although not exactly the way specified in WHATWG). This bug is about the way we handle URLs with the query part written out in an HTML document like this: <a href="http://www.example.com/test1/test2/test3.cgi?f1=abc&f2=def">Link 1</a>
Comment 17•20 years ago
Oh, my bad. In that case the spec that reigns in this situation is the IRI spec. I'm not familiar with that spec though. Bjoern, care to make a judgement on what the spec says we should do in this case?
Comment 18•20 years ago
Well, Martin is here, too :-)
Comment 19•20 years ago
The test case is interesting; here is what the browsers do:

Internet Explorer 6
47 45 54 20 2f 62 75 72 6f 2e 68 74 6d 6c 3f 61  GET /bur o.html?a
63 74 69 6f 6e 3d 62 79 74 65 78 74 26 6b 65 79  ction=by text&key
77 6f 72 64 3d 26 72 75 62 72 69 6b 61 3d 26 66  word=&ru brika=&f
69 6e 64 73 74 72 3d f1 e5 f0 e2 e5 f0 fb 26 70  indstr=. ......&p
61 67 65 3d 31 20 48 54 54 50 2f 31 2e 31 0d 0a  age=1 HT TP/1.1..

Opera 8.0
47 45 54 20 2f 62 75 72 6f 2e 68 74 6d 6c 3f 61  GET /bur o.html?a
63 74 69 6f 6e 3d 62 79 74 65 78 74 26 6b 65 79  ction=by text&key
77 6f 72 64 3d 26 72 75 62 72 69 6b 61 3d 26 66  word=&ru brika=&f
69 6e 64 73 74 72 3d 25 46 31 25 45 35 25 46 30  indstr=% F1%E5%F0
25 45 32 25 45 35 25 46 30 25 46 42 26 70 61 67  %E2%E5%F 0%FB&pag
65 3d 31 20 48 54 54 50 2f 31 2e 31 0d 0a 55 73  e=1 HTTP /1.1..Us

Gecko/20050323
47 45 54 20 2f 62 75 72 6f 2e 68 74 6d 6c 3f 61  GET /bur o.html?a
63 74 69 6f 6e 3d 62 79 74 65 78 74 26 6b 65 79  ction=by text&key
77 6f 72 64 3d 26 72 75 62 72 69 6b 61 3d 26 66  word=&ru brika=&f
69 6e 64 73 74 72 3d 25 44 31 25 38 31 25 44 30  indstr=% D1%81%D0
25 42 35 25 44 31 25 38 30 25 44 30 25 42 32 25  %B5%D1%8 0%D0%B2%
44 30 25 42 35 25 44 31 25 38 30 25 44 31 25 38  D0%B5%D1 %80%D1%8
42 26 70 61 67 65 3d 31 20 48 54 54 50 2f 31 2e  B&page=1 HTTP/1.

In other words, MSIE6 and Opera8 use the document encoding to construct the request URL, except that MSIE6 fails to %hh encode the URL, while Mozilla uses UTF-8 to construct the request URL (and does %hh escaping). This is a little bit surprising. For a simple ISO-8859-1 test case
<a href="Björn?Björn">...</a>
Mozilla (since Bug 261929 IIRC) and Opera 8 will request
Bj%C3%B6rn?Bj%C3%B6rn
and MSIE6 will request
Bj%C3%B6rn?Bj%F6rn
So I'm not sure why MSIE fails to %hh escape the URL; Opera 8 seems to apply some heuristics to determine when to use %hh escaping in the query component (or maybe even the complete URL).

So Mozilla consistently does what http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1 suggests and other specifications (like SVG 1.1, XLink 1.0, etc.) require. (If <a href=""> in HTML is assumed to be an IRI, the result would be the same as long as the URL is in NFC; if it is not in NFC, the user agent must normalize the URL first, depending on the document encoding. I am not aware of any real-world implementation that does that, and I do not think any implementation should do that, though.)
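The two request strings for the Björn test case can be reproduced with a few lines of Python. This is only a sketch of the observed results, not of how any browser is implemented; request_target is a hypothetical helper name.

```python
from urllib.parse import quote

def request_target(path, query, query_encoding):
    """Escape the path as UTF-8 and the query in the given encoding.

    Hypothetical helper: it models the observed output only.
    """
    escaped_path = quote(path, encoding="utf-8")
    escaped_query = quote(query, safe="=&", encoding=query_encoding)
    return escaped_path + "?" + escaped_query

# UTF-8 for both parts, as reported for Mozilla and Opera 8:
print(request_target("Björn", "Björn", "utf-8"))
# UTF-8 path, document encoding (ISO-8859-1) for the query, as reported for MSIE6:
print(request_target("Björn", "Björn", "iso-8859-1"))
```

The only difference between the two behaviours is the encoding chosen for the query component; the path is UTF-8 in both.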
Comment 20•20 years ago
(In reply to comment #14)
> I thought I filed a bug on this (I clearly mentioned what I'm gonna write below
> in another bug) , but I couldn't find it. What's happening is this:
>
> 1. Mozilla sends both path and query parts of URLs in UTF-8
This is the correct behavior. Putting actual characters into an URI/IRI means that they have to be interpreted as UTF-8. The IRI spec (RFC 3987) says so, and the HTML 4 spec said so years ago (even though it said that putting such characters into an URI was a bad idea). The URI/IRI might be to the same site, to the same page, or to some different place. Just because it's in a page encoded e.g. in windows-1251, that doesn't in any way justify assuming that it should be interpreted as windows-1251.
If the site expects the data to come back in windows-1251, the right thing is for the site to escape the URI; then there is no problem at all on any browser (correct or not).
This is different from when an URI is constructed by the browser from data in form fields; in that case, taking the encoding of the page to encode the information is what has worked best for years, and should not be changed. But it very clearly has to be distinguished from the case above.
> 2. MS IE and Opera just url-escape the query part 'octet-wise' (without
> converting to UTF-8). They still use UTF-8 for the path part. (actually, I
> haven't tested Opera yet, but MS IE certainly does that.)
>
> What MS IE and Opera do makes sense (at least until every form processing
> server-side program understands UTF-8) and I guess we have to do the same.
No, it doesn't make sense. The server can easily escape the URI before sending it out, and everything works fine. Copying MSIE's and Opera's mistakes will only get us wedged in a corner, and it will be difficult to get out again.
I suggest that issues like this be discussed on a non-browser-specific mailing list, e.g. public-iri@w3.org.
Regards, Martin.
Comment 21•20 years ago
(In reply to comment #20)
> putting such characters into an URI was a bad idea). The URI/IRI might
> be to the same site, to the same page, or to some different place. Just
> because it's in a page encoded e.g. in windows-1251, that doesn't justify
> in any way to assume that it should be interpreted as windows-1251.
I immediately regretted writing comment #14 without any qualification. I certainly agree with you on the above points.
> If the site expects the data to come back in windows-1251, the right
> thing is for the site to escape the URI; then there is no problem
> at all on any browser (correct or not).
A little practical problem here: we can't expect every Joe on the street to know this. ... Well, this has to be automatically taken care of by 'authoring tools', but I wonder if there are any that do.
> not be changed. But it very clearly has to be distinguished from
> the case above.
Sure. See comment #16.
> The server can easily escape the URI before
> sending it out, and everything works fine.
Did you mean the server should examine every HTML document it serves and escape URIs before emitting it?
Resolving as invalid.
Status: NEW → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → INVALID
Comment 22•20 years ago
(In reply to comment #21)
> (In reply to comment #20)
> > The server can easily escape the URI before
> > sending it out, and everything works fine.
>
> Did you mean the server should examine every html it serves and escape URIs
> before emitting it?
Sorry, I was imprecise. I should have said "the CGI script (or whatever) that constructs the URI/IRI and puts it into the page". It's at that place where it should be known in what encoding that data is expected back at the server. "The server", meaning the generic parts of a Web server, doesn't know anything about the encoding.
Regards, Martin.
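As a rough illustration of what such a script could do, here is a minimal Python sketch that builds a paging link with the query already escaped in windows-1251. The function name page_link and its parameters are hypothetical, and the parameter names in the query are simply copied from the URLs quoted in this bug; nothing here is the site's actual code.

```python
from html import escape
from urllib.parse import urlencode

def page_link(findstr, page, encoding="cp1251"):
    """Build an <a> element whose query is percent-escaped in `encoding`.

    Hypothetical helper; the encoding is whatever the script expects back.
    """
    query = urlencode(
        {"action": "bytext", "keyword": "", "rubrika": "",
         "findstr": findstr, "page": page},
        encoding=encoding,
    )
    href = "buro.html?" + query
    # Escape & as &amp; so the attribute value is valid HTML.
    return '<a class="def" href="%s">[ %d ]</a>' % (escape(href, quote=True), page)

print(page_link("серверы", 2))
```

Because the link then contains only ASCII, every browser sends the same bytes back, regardless of how it treats raw non-ASCII characters in an href.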
Comment 24•20 years ago
(In reply to comment #22)
> Sorry, I was imprecise. I should have said "the CGI script (or whatever) that
> constructs the URI/IRI and puts it into the page". It's at that place where
> it should be known in what encoding that data is expected back at the server.
So, we just have to live with 'ignorant' Joe (not the CGI author, but someone who just refers to a page with a URL that has a query part) putting 'raw' characters in HTML, unless authoring tools help him deal with this problem.