Closed
Bug 14155
Opened 25 years ago
Closed 25 years ago
Submitting non-ASCII keyword in location field disabled
Categories
(Core :: Internationalization, defect, P3)
VERIFIED
FIXED
M15
People
(Reporter: blee, Assigned: ftang)
References
Details
(Whiteboard: [PDT+]schedule to check in 2/28 night)
To see this, type in a JA keyword (e.g., "tousiba" in hiragana) and hit Return. ==> The text input disappears; instead a horizontal line (rotated L shape) appears. Hitting Backspace at this point will display "http://" in the same field. Observed in Win32 99-09-16-09-M11 bld.
Assignee
Updated•25 years ago
Assignee: ftang → rjc
Assignee
Comment 1•25 years ago
rjc, are you working on Internet Keywords? Let me know how I can help. Also, visit blee's cube and ask him to reproduce this bug for you. Thanks
Updated•25 years ago
Assignee: rjc → warren
Comment 2•25 years ago
Nope, I'm not working on internet keywords, which are being implemented in Necko. Warren, any idea here?
Updated•25 years ago
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE
The problem still happens in Win32 12-02-08-M12 bld. Reopening.
Updated•25 years ago
Assignee: warren → valeski
Status: REOPENED → NEW
Target Milestone: M14
Summary: Submitting double byte keyword in location field disabled → [BETA] Submitting double byte keyword in location field disabled
We need this bug fixed by M13 so that proper testing can be done after the feature works with JA characters. If it's fixed by M14, we won't be able to test it before beta.
Comment 6•25 years ago
Can I enter unicode chars into the location bar using a US keyboard? If you can't even get the chars to display, this is a widget bug, not a necko bug (yet ;).
Text input (CJK, high ASCII, etc.) is displayed while it is being typed; it is the result of committing the input that is broken. In a recent bld (Win 12-02-08-M12), a carriage return after JA input displays "http://" in the location field.
Comment 8•25 years ago
Can I enter Unicode chars using my US keyboard?
Unicode is used for the internal conversion of characters in the client. If you're asking whether you can type non-ASCII characters using a US keyboard, the answer is yes. Try Alt+0192 (or 0193, 0196, etc.) with the US keyboard. If you want other specific characters belonging to a certain language, you need to install that particular keyboard driver from the Control Panel.
Comment 10•25 years ago
This is part of a larger problem (I can't find the true bug right now) which is that no protocols can handle unicode right now.
Comment 11•25 years ago
I disagree with that last comment. I think the location bar needs to convert the unicode to utf-8 before calling necko. That's the way we agreed that nsIURI should work.
Comment 12•25 years ago
You're right. My fault. I had forgotten our conclusion re: the unicode URI issue. So how does this conversion work?
Comment 13•25 years ago
You should ask one of the i18n guys how to do this.
Assignee
Comment 15•25 years ago
Adding nhotta, erik and bobj to the cc. Is it true that we agreed to use escaped UTF-8 for Internet Keywords in 4.x? For Unicode conversion, talk to naoki. He has a function which will convert Unicode to escaped UTF-8 (and back).
Comment 16•25 years ago
Yes, Netscape 4.X and Netcenter's keyword server agreed to use UTF-8 for Internet Keywords. Since these strings are passed as URLs, they need to be "URL-encoded", i.e. %XX (percent followed by hex digits).
Comment 17•25 years ago
Here is the conversion interface Frank mentioned. http://lxr.mozilla.org/seamonkey/source/intl/uconv/idl/nsITextToSubURI.idl I attached examples of the usage in bug 25034.
Updated•25 years ago
Summary: [BETA] Submitting double byte keyword in location field disabled → Submitting double byte keyword in location field disabled
Updated•25 years ago
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] [02/11/00]
Comment 18•25 years ago
Well, I can't enter any Unicode into the location bar now. I'm trying the Alt+0192 trick.
Reporter
Comment 19•25 years ago
With the 2/9/00 Win32 bld, I could enter extended characters into the location bar using Alt key combinations. CJK character input still can't be committed, as explained in the original problem description.
Comment 20•25 years ago
I still can't enter Unicode. Are you using a home-grown build or a release? I may have to repair this blindly (without testing). Here's what I've done: I've added a method to nsITextToSubURI.idl called "Convert()". It does everything ConvertAndEscape does, except the escape. I'm far from an expert on string conversion, but we don't want to escape the URL before we try to parse it, because special chars that we need to use would get escaped. Thus I want to convert to UTF8 (what is the actual charset string I should be using in the API?) without losing the original URI...??? Is that even possible? Maybe someone on the i18n team should get this bug? The bottom line is URIs aren't Unicode, and this is callsite cleanup. I need traction on this. Either someone in the i18n group should get this bug, or someone needs to call me or be a little more responsive.
Comment 21•25 years ago
OK, over to ftang per blee's suggestion. ftang, the code in question is here: http://lxr.mozilla.org/seamonkey/source/webshell/src/nsWebShell.cpp#2034 . The URL spec passed into LoadURL() is the one we want to get down to UTF8. I'm not sure if we want to do this for *all* URLs (probably) or just keyword URLs. Call me if you have questions about the call site.
Assignee: valeski → ftang
Status: ASSIGNED → NEW
Assignee
Comment 22•25 years ago
erik- location field problem. Please take a look at it.
Assignee: ftang → erik
Updated•25 years ago
Status: NEW → ASSIGNED
Whiteboard: [PDT+] [02/11/00] → [PDT+] [02/16/00]
Comment 23•25 years ago
FYI, IMAP URLs are also specified as being in UTF-8
Updated•25 years ago
Whiteboard: [PDT+] [02/16/00] → [PDT+] 02/16
Updated•25 years ago
Whiteboard: [PDT+] 02/16 → [PDT+] 02/18
Comment 24•25 years ago
cc'd nhotta
Comment 25•25 years ago
Updated the summary because this affects non-ASCII, not just double-byte.

Running the 2/14 sweetlou build, I typed and copied the following from the Windows Character Map accessory and pasted it into the location bar:

  français español

I got search results back:

  Search Results for 'français español'

The strings sent to the search engine were not converted correctly. Is the search from the location bar also controlled by Sherlock files? If there are no Sherlock files for a given server, what is the default behavior?
Summary: Submitting double byte keyword in location field disabled → Submitting non-ASCII keyword in location field disabled
Comment 26•25 years ago
I forgot, here is what was displayed in the location bar after hitting return:

  http://search.netscape.com/cgi-bin/search?search=fran%C3%A7ais%20espa%C3%B1ol

I would expect:

  http://search.netscape.com/cgi-bin/search?search=fran%E7ais%20espa%F1ol
Comment 27•25 years ago
It looks like it creates the URL for the search by %-encoding the UTF-8 strings without converting to Latin-1 first.
Reporter
Comment 28•25 years ago
bobj, perhaps your comment above is for bug 25034. This bug is about not being able to commit CJK input (IK) in the location field, as originally described.
Comment 29•25 years ago
Cc'ing Andreas and Jud.
Comment 30•25 years ago
blee, thanks for setting me straight on this bug. Now (using the 2000021408 build), I typed in the location bar:

  keyword français español

And the resulting URL (after hitting return) displayed in the location bar:

  http://keyword.netscape.com/keyword/keyword%20fran%E7ais%20espa%F1ol

And the keyword results page displays:

  Search Results for 'keyword fran軋is espa�ol'

Next I typed:

  keyword [tou][shiba]

(where [tou][shiba] are 2 kanji characters). And the resulting URL (after hitting return) displayed in the location bar:

  http://keyword.netscape.com/keyword/keyword%20..

And the keyword results page displays:

  Search Results for 'keyword ..'
Assignee
Comment 31•25 years ago
I set a breakpoint in nsWebShell::LoadURL; the data looks OK when it reaches it.
Comment 32•25 years ago
Frank, are you going to take this bug? If not, reassign it to me. Thanks!
Assignee: erik → ftang
Status: ASSIGNED → NEW
Whiteboard: [PDT+] 02/18 → [PDT+]
Assignee
Comment 33•25 years ago
I should take a look at BrowserLoadURL in navigator.js
Status: NEW → ASSIGNED
Assignee
Comment 34•25 years ago
Found the problem, at least part of it. mozilla/netwerk/base/public/nsNetUtil.h:

53 inline nsresult
54 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
55 {
56     // XXX if the string is unicode, GetBuffer() returns null.
57     // XXX we need a strategy to deal w/ unicode specs (if there should
58     // XXX even be such a thing)
59     char* specStr = spec.ToNewCString(); // this forces a single byte char*
60     if (specStr == nsnull)
61         return NS_ERROR_OUT_OF_MEMORY;
62     nsresult rv = NS_NewURI(result, specStr, baseURI);
63     nsAllocator::Free(specStr);
64     return rv;
65 }

This spec.ToNewCString() converts non-ASCII to 0x2e!
Comment 35•25 years ago
Ugh. ftang, please read my comments. I basically point out what code needs to change (and it's not in necko).
Assignee
Comment 36•25 years ago
Sorry, I missed some comments before I put in mine.

1. About adding Convert() to nsITextToSubURI.idl - there is no need to do that. If you need to convert Unicode to some charset, you call nsIUnicodeEncoder (nsITextToSubURI calls that one indirectly). If you want to convert to UTF8 in char*, you call nsString::ToNewUTF8String().

2. "code needs to change (and it's not in necko)"? I think this is debatable - if necko has an interface which accepts nsString or PRUnichar*, then it has to deal with Unicode. You cannot design an interface which uses float as the type but expect people to only pass whole numbers in the float, right? For the same reason you have to deal with Unicode. And once you deal with Unicode, your code should not TRUNCATE data. I do agree part of the right fix should be in nsWebShell. However, what about other cases? What happens if other code passes Unicode in and you do the same data damage? The bottom line is you have to change NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI) to add an assertion and return an error when the spec contains non-ASCII data. Silently passing damaged data is not acceptable. You should add the following lines between lines 56-58:

PRUnichar* ck = spec.GetUnicode();
while (*ck) {
    NS_ASSERTION((*ck < 0x0080), "non ASCII in URI");
    if (*ck++ >= 0x0080) {
        return NS_ERROR_INVALID_ARG;
    }
}
Assignee
Comment 37•25 years ago
The problem is not in a single place in LoadURL:

1. First try:
1998   rv = NS_NewURI(getter_AddRefs(uri), urlStr, nsnull);
What should we do?

2. Second try:
2005   convertFileToURL(urlStr, urlSpec);
2006   rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
I think this is the file URL case. I think we need to convert to the file system charset before calling NS_NewURI.

3. Third try:
2042   nsAutoString keywordSpec("keyword:");
2043   keywordSpec.Append(aURLSpec);
2044   rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, nsnull);
This is the keyword URL. I think we should convert to UTF8 for keyword URLs.

4. Fourth try:
2088   rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
This is for ftp or http URLs. What should we do? Convert to UTF8?

5. Fifth:
2186   res = NS_NewURI(getter_AddRefs(newURI), urlstr, nsnull);
This urlstr could be the one from the above 4, or mURL. What should we do? I think this one comes after processing, so I think we should keep it the way it is now.

This is a very nasty issue.
Assignee
Comment 38•25 years ago
So... we need to divide the problem into at least the following sub-problems:

1. full URL
2. file URL - depends on bug 28171; then we should convert the convertFileToURL() output into the file system encoding (obtained from nsIPlatformCharset) before passing it to the char* version of NS_NewURI()
3. keyword URL - convert to UTF8 (?) and then pass to the char* version of NS_NewURI (?)
4. http/ftp URL - convert to UTF8? or the system charset? (backward compatibility should be considered)
Depends on: 28171
Assignee
Comment 39•25 years ago
The other way we can do this is to always convert to UTF8 in the webshell, and let the code in mozilla/netwerk/protocol convert it back to the correct charset. For example, we could still convert the file URL into UTF8 and let the file protocol handler convert the escaped UTF8 back to Unicode, and then convert to the file system charset before calling the file open. If this is the approach, then we should change NS_NewURI and replace

char* specStr = spec.ToNewCString();

with

char* specStr = spec.ToNewUTF8String();

And then in the protocol handler, we change it depending on the protocol. I think this is how we did it in the old libnet. For example, we should convert escaped UTF8 in the imap4 protocol into Unicode, and then convert to the x-imap4-modified-utf7 charset for the mailbox in the imap4 protocol handler.
Comment 40•25 years ago
Two things: I suggested the new Convert() method because the current one does the escaping, which we don't want. Necko is particular about having escaped/non-escaped strings handed to it. Also, remember that necko/protocols do not allow Unicode to be passed in, or used. I agree that our APIs advertise otherwise, which bites. Converting to anything other than UTF8 for necko use doesn't make sense. I think your assertion looks good. And because that's a utility function, the ToNewUTF8...() is probably a good idea. I assume that won't trash a true char array? I still think the string should be UTF8'ed at the top of the webshell.
Comment 41•25 years ago
About the ToNewCString in NS_NewURI -- didn't we agree that the string passed into NewURI is UTF-8? That would make the use of ToNewCString here wrong, wouldn't it?
Assignee
Comment 42•25 years ago
>I suggested the new Convert() method because the current one does
>the escaping which we don't want. necko is particular about having
>escaped/non-escaped strings handed to it.

I understand you need it in a non-escaped form. My point is that such a thing already exists in nsIUnicodeEncoder, and it does not make sense to add a duplicate method to nsITextToSubURI; simply call nsIUnicodeEncoder::Convert() instead.

>I still think the string should be UTF8'ed at the top of the webshell.

I do agree that we need to change the webshell. However, as long as you keep an nsString flavor of NS_NewURI, you open a *possibility* for people to pass in non-ASCII data in an nsString, and you have to deal with the conversion (in ToNewCString, ToNewUTF8String or some other form). Remember, this code ALREADY does a conversion - a bad one which WILL damage data. Until you remove the NEED for conversion, you should make the conversion not damage data - and I simply suggest you use ToNewUTF8String so the data will be preserved. Also, the WWW is MOVING to use UTF8 in URLs as the default (see http://www.ietf.org/rfc/rfc2718.txt, http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt, and http://www.w3.org/International/O-URL-and-ident.html for details), which means all NEW protocols will be encouraged to use UTF8 in escaped form for URLs. Replacing ToNewCString with ToNewUTF8String in NS_NewURI will make it forward compatible with new protocols. Of course, I don't believe all the existing protocols will migrate to UTF-8 within 5 years.....

>About the ToNewCString in NS_NewURI -- didn't we agree that the string passed
>into NewURI is UTF-8? That would make the use of ToNewCString here wrong,
>wouldn't it?
I think you are confused between

NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)

and

NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)

I think you probably agree the const char* version is passed in AS UTF8 in char* (there is no way you can pass in anything other than UCS2 / PRUnichar in an nsString). If that is your agreement, then the current nsString flavor of NS_NewURI is not compliant with your decision, because it does not convert to UTF8 before it calls the char* version of NS_NewURI:

41 inline nsresult
42 NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)
43 {
44     nsresult rv;
45     static NS_DEFINE_CID(kIOServiceCID, NS_IOSERVICE_CID);
46     NS_WITH_SERVICE(nsIIOService, serv, kIOServiceCID, &rv);
47     if (NS_FAILED(rv)) return rv;
48 
49     rv = serv->NewURI(spec, baseURI, result);
50     return rv;
51 }
52 
53 inline nsresult
54 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
55 {
56     // XXX if the string is unicode, GetBuffer() returns null.
57     // XXX we need a strategy to deal w/ unicode specs (if there should
58     // XXX even be such a thing)
59     char* specStr = spec.ToNewCString(); // this forces a single byte char*
60     if (specStr == nsnull)
61         return NS_ERROR_OUT_OF_MEMORY;
62     nsresult rv = NS_NewURI(result, specStr, baseURI);
63     nsAllocator::Free(specStr);
64     return rv;
65 }

If you assume the spec on line 42 is in UTF8 (as I read from warren's last comment), then you SHOULD convert the spec on line 54 into UTF8 before it is passed to NS_NewURI on line 62. Currently, the ToNewCString() on line 59 does NOT convert it to UTF8. It simply truncates the higher 8 bits of each PRUnichar and casts it down to char. That conversion will damage any non-ISO-8859-1 data and should only be called when the nsString contains only ASCII.
If you guys already agree that the char* version of NS_NewURI expects UTF8, then changing line 59 to ToNewUTF8String simply makes the code implement what you agreed upon. Basically, warren is RIGHT. Adding travis to the cc list since he is the webshell owner now.
No longer depends on: 28197
Assignee
Comment 43•25 years ago
I fixed bug 28197 and now it is ToNewUTF8String. Now the problem is that somehow the GetSpec of a keyword: URI does not escape the UTF8, which causes some double conversion in nsWebShell (get the UTF8 back, assign it to an nsString as Latin-1, and convert it to UTF8 again...). Should we always escape it when we call GetSpec?
Comment 44•25 years ago
cc'ing andreas for some escaping input.
Assignee
Comment 45•25 years ago
As valeski suggested, I changed the keyword protocol handler to use the latest URL parser instead of the simple URI one. Here is the patch (file internal to Netscape - http://warp/u/ftang/tmp/keyworddiff.txt):

? keyword/src/nsKeywordProtocolHandler.sbr
? keyword/src/nsKeywordModule.sbr
Index: keyword/src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: /m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp  2000/01/12 22:53:12  1.12
--- nsKeywordProtocolHandler.cpp  2000/02/22 00:25:08
***************
*** 30,36 ****
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kSimpleURICID, NS_SIMPLEURI_CID);
  static NS_DEFINE_CID(kIOServiceCID, NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
--- 30,37 ----
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kStandardUrlCID, NS_STANDARDURL_CID);
! static NS_DEFINE_CID(kAuthUrlParserCID, NS_AUTHORITYURLPARSER_CID);
  static NS_DEFINE_CID(kIOServiceCID, NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
***************
*** 100,124 ****
    // build up a request to the keyword server.
    nsCAutoString query;
    // pull out the "go" action word, or '?', if any
!   char one = aSpec[0], two = aSpec[1];
    if (one == '?') {                        // "?blah"
!     query = aSpec+1;
    } else if ( (one == 'g' || one == 'G')   //
                &&                           //
                (two == 'o' || two == 'O')   // "g[G]o[O] blah"
                &&                           //
!               (aSpec[12] == ' ') ) {       //
!     query = aSpec+3;
    } else {
!     query = aSpec;
    }
    query.Trim(" "); // pull leading/trailing spaces.
    // encode
    char * encQuery = nsEscape(query.GetBuffer(), url_Path);
    if (!encQuery) return nsnull;
    query = encQuery;
    nsAllocator::Free(encQuery);
--- 101,133 ----
    // build up a request to the keyword server.
    nsCAutoString query;
+ 
+   char *unescaped = nsCRT::strdup(aSpec);
+   if(unescaped == nsnull)
+     return nsnull;
+   nsUnescape(unescaped);
+   char *pt = unescaped +1;
+ 
    // pull out the "go" action word, or '?', if any
!   char one = pt[0], two = pt[1];
    if (one == '?') {                        // "?blah"
!     query = pt+1;
    } else if ( (one == 'g' || one == 'G')   //
                &&                           //
                (two == 'o' || two == 'O')   // "g[G]o[O] blah"
                &&                           //
!               (pt[12] == ' ') ) {          //
!     query = pt+3;
    } else {
!     query = pt;
    }
    query.Trim(" "); // pull leading/trailing spaces.
    // encode
    char * encQuery = nsEscape(query.GetBuffer(), url_Path);
+   nsAllocator::Free(unescaped);
    if (!encQuery) return nsnull;
    query = encQuery;
    nsAllocator::Free(encQuery);
***************
*** 133,151 ****
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                                  nsIURI **result) {
!   nsresult rv;
!   nsIURI* uri;
!   rv = nsComponentManager::CreateInstance(kSimpleURICID, nsnull,
!                                           NS_GET_IID(nsIURI), (void**)&uri);
!   if (NS_FAILED(rv)) return rv;
!   rv = uri->SetSpec((char*)aSpec);
    if (NS_FAILED(rv)) return rv;
!
!   *result = uri;
    return rv;
  }
- 
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,
--- 142,179 ----
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                                  nsIURI **result)
! {
!   nsresult rv = NS_OK;
!   nsCOMPtr<nsIURI> url;
!   nsCOMPtr<nsIURLParser> urlparser;
!   if (aBaseURI)
!   {
!     rv = aBaseURI->Clone(getter_AddRefs(url));
!     if (NS_FAILED(rv)) return rv;
!     rv = url->SetRelativePath(aSpec);
!   }
!   else
!   {
!     rv = nsComponentManager::CreateInstance(kAuthUrlParserCID,
!                                             nsnull, NS_GET_IID(nsIURLParser),
!                                             getter_AddRefs(urlparser));
!     if (NS_FAILED(rv)) return rv;
!     rv = nsComponentManager::CreateInstance(kStandardUrlCID,
!                                             nsnull, NS_GET_IID(nsIURI),
!                                             getter_AddRefs(url));
!     if (NS_FAILED(rv)) return rv;
!
!     rv = url->SetURLParser(urlparser);
!     if (NS_FAILED(rv)) return rv;
!     rv = url->SetSpec((char*)aSpec);
!   }
    if (NS_FAILED(rv)) return rv;
!   *result = url.get();
!   NS_ADDREF(*result);
    return rv;
  }
  
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,

After I changed the implementation of NewURI like the HTTP one, I found out I also need to change the Mangle function. This will fix the international problem. valeski - could you please code review it?
Whiteboard: [PDT+] → [PDT+]need code review.
Comment 46•25 years ago
I disagree with this fix. Keyword URLs should be simple URLs and not standard URLs. Why are you doing this?
Assignee
Comment 47•25 years ago
If we don't do this, then GetSpec won't return the URL in escaped form and may contain char* in UTF8, and nsWebShell will assign that char* to an nsString. Why should this be a simple URL? Any reason? Should GetSpec always return the URL in escaped form? Andreas' message that valeski pointed me to says GetSpec always returns the escaped form, but the current (without my patch) implementation does not reflect that.
Comment 48•25 years ago
> Why should this be simple URLs? Any reason?

Because keywords don't have the syntax protocol://user@host:port/path/file.ext. They're just keyword:randomtext.

> Should GetSpec always return the URL in escaped form?

For nsStdURL it should -- but that's an aspect of the syntax that's being defined, not SetSpec in general (since some syntaxes have no notion of escaping). If you need escaping in keyword: URLs, I suggest inventing something new or handling escaping outside of the use of nsSimpleURI, but don't use nsStdURL for keywords. You'll get all sorts of unwanted side effects.
Comment 49•25 years ago
I'm not sure how this code plugs together, but here is an example of what needs to work. A non-ASCII keyword:

  keyword:français

will need to be escaped (%-encoded UTF8) when it converts to the HTTP URL:

  http://keyword.netscape.com/keyword/fran%C3%A7ais
Comment 50•25 years ago
That escaping should happen in the keyword protocol handler at the point it constructs the http: URL from the keyword: URL. It should not happen in the keyword: URL (the instance of nsSimpleURI). Jud - can you make this change? The conversion to UTF8 should be done by the webshell.
Comment 51•25 years ago
Sure. Is this escaping done using nsEscape() or using some i18n escaping API?
Comment 52•25 years ago
Ugh, what a mess. warren, the issue is that the URL returned from keywordHandler::NewURI() (which returns a URL of the form "keyword:sometext") can cause string corruption (because it doesn't escape the spec). Later in ::NewChannel() we call GetSpec() on the URI passed in (which was previously created using keywordHandler::NewURI()). That GetSpec() call will return the corrupted spec. I'm seeing Frank's point here: the escaping/unescaping policies for simple URLs and std URLs need to be the same. For now we can either pass in an escaped string when creating a keyword URL, or have keyword URLs be std URLs.
Assignee
Comment 53•25 years ago
bobj/warren - what bobj said is NOT the remaining problem. The UCS2-to-UTF8 conversion part was already fixed last Friday when we changed ToNewCString to ToNewUTF8String. The remaining problem I am trying to fix is related to the keyword: URI and the history code (including switching the default charset to UTF-8, since the keyword search result page returns in UTF8 without a meta tag or HTTP charset parameter). What happens is this:

1. The keyword URI converts to the http URI correctly and the page loads correctly.
2. The webshell calls GetSpec() on that URI.
3. GetSpec() returns the escaped form for every protocol except "keyword:", which returns unescaped UTF8.
4. The webshell passes that return value to the history code, which accepts char* (not nsString nor PRUnichar*).
5. The history code converts the char* (which for keyword: is unescaped UTF8) to nsString by applying the ISO-8859-1 to Unicode conversion rule (by default).
6. When we try to load again, NS_NewURI converts the nsString into char* by applying the Unicode to UTF8 rule.

The problem is that the webshell/history code currently assumes GetSpec returns the escaped form, which requires no special treatment for non-ASCII. That is true for every protocol except "keyword:". What happens now is that the first time you type a keyword it will work, but if you hit Back/Forward or change the encoding (which is very common for now, since the keyword search page does not come back in UTF8 with a meta tag or HTTP header), then the 2nd and subsequent loads will be wrong; each time it will apply an ISO-8859-1 to Unicode to UTF8 conversion, which will produce garbage in the 2nd and later loads. From the i18n point of view, I really don't care whether the keyword URI is standard or simple. I do care whether GetSpec from all the different URIs returns escaped or unescaped.
Assignee
Comment 54•25 years ago
I agree with warren and valeski now; I won't change the keyword URI from a simple URL to a standard URL. However, I do need to make the mangling routine unescape first, since it looks at '?' and space. The new patch is here (note the convertFileToURL part is for bug 28171):

http://warp/u/ftang/tmp/webshelldiff1.txt

Index: nsWebShell.cpp
===================================================================
RCS file: /m/pub/mozilla/webshell/src/nsWebShell.cpp,v
retrieving revision 1.396
diff -c -r1.396 nsWebShell.cpp
*** nsWebShell.cpp  2000/02/19 02:54:32  1.396
--- nsWebShell.cpp  2000/02/22 05:10:51
***************
*** 87,92 ****
--- 87,93 ----
  #include "nsIDocShellTreeOwner.h"
  #include "nsCURILoader.h"
  #include "nsIDOMWindow.h"
+ #include "nsEscape.h"
  #include "nsIHTTPChannel.h" // add this to the ick include list...we need it to QI for post data interface
  #include "nsHTTPEnums.h"
***************
*** 1375,1419 ****
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   char szFile[1000];
!   aIn.ToCString(szFile, sizeof(szFile));
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if (PL_strchr(szFile, '\\')) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   if (*(szFile) == '/' || *(szFile) == '\\') {
  #else
    if (0) {
    // Do nothing (All others for now)
  #endif
  #endif
-   PRInt32 len = strlen(szFile);
-   PRInt32 sum = len + sizeof(FILE_PROTOCOL);
-   char* lpszFileURL = (char *)PR_Malloc(sum + 1);
  #ifdef XP_PC
    // Translate '\' to '/'
!   for (PRInt32 i = 0; i < len; i++) {
!     if (szFile[i] == '\\') {
!       szFile[i] = '/';
!     }
!     if (szFile[i] == ':') {
!       szFile[i] = '|';
!     }
!   }
  #endif
    // Build the file URL
!   PR_snprintf(lpszFileURL, sum, "%s%s", FILE_PROTOCOL, szFile);
!   aOut = lpszFileURL;
!   PR_Free((void *)lpszFileURL);
    }
-   else
-   {
-     aOut = aIn;
-   }
  }
--- 1376,1406 ----
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   aOut = aIn;
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if(kNotFound != aIn.FindChar(PRUnichar('\\'))) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   const PRUnichar * up = aIn.GetUnicode();
!   if((PRUnichar('/') == *up) ||
!      (PRUnichar('\\') == *up)) {
  #else
    if (0) {
    // Do nothing (All others for now)
  #endif
  #endif
  #ifdef XP_PC
    // Translate '\' to '/'
!   aOut.ReplaceChar(PRUnichar('\\'), PRUnichar('/'));
!   aOut.ReplaceChar(PRUnichar(':'), PRUnichar('|'));
  #endif
    // Build the file URL
!   aOut.Insert(FILE_PROTOCOL,0);
    }
  }
***************
*** 2004,2009 ****
--- 1991,2001 ----
    // see if we've got a file url.
    convertFileToURL(urlStr, urlSpec);
+ 
+   // XXX
+   // for file url, we need to convert the nsString to the file system
+   // charset before we pass to NS_NewURI
+ 
    rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
    if (NS_FAILED(rv)) {
      NS_ASSERTION(mPrefs, "the webshell's pref service wasn't initialized");
***************
*** 2014,2019 ****
--- 2006,2012 ----
    PRInt32 qMarkLoc = -1, spaceLoc = -1;
    if (keywordsEnabled) {
+     rv = NS_ERROR_FAILURE;
      // These are keyword formatted strings
      // "what is mozilla"
      // "what is mozilla?"
***************
*** 2040,2057 ****
    }
    if (keyword) {
!     nsAutoString keywordSpec("keyword:");
!     keywordSpec.Append(aURLSpec);
!     rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, nsnull);
!   } else {
!     rv = NS_ERROR_FAILURE;
!   }
!   } else {
!     rv = NS_ERROR_FAILURE;
!   }
!   } else {
!     rv = NS_ERROR_FAILURE;
!   }
    PRInt32 colon = -1;
    if (NS_FAILED(rv)) {
--- 2033,2053 ----
    }
    if (keyword) {
!     nsCAutoString keywordSpec("keyword:");
!     char *utf8Spec = urlStr.ToNewUTF8String();
!     if(utf8Spec) {
!       char* escapedUTF8Spec = nsEscape(utf8Spec, url_Path);
!       if(escapedUTF8Spec) {
!         keywordSpec.Append(escapedUTF8Spec);
!         rv = NS_NewURI(getter_AddRefs(uri), keywordSpec.GetBuffer(), nsnull);
!         nsAllocator::Free(escapedUTF8Spec);
!       } // escapedUTF8Spec
!       nsAllocator::Free(utf8Spec);
!     } // utf8Spec
!   } // keyword
!   } // FindChar
!   } // keywordEnable
    PRInt32 colon = -1;
    if (NS_FAILED(rv)) {
***************
*** 2118,2124 ****
    rv = uri->GetSpec(getter_Copies(spec));
    if (NS_FAILED(rv)) return rv;
!
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell> root;
    nsCOMPtr<nsISessionHistory> shist;
--- 2114,2120 ----
    rv = uri->GetSpec(getter_Copies(spec));
    if (NS_FAILED(rv)) return rv;
!
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell> root;
    nsCOMPtr<nsISessionHistory> shist;

http://warp/u/ftang/tmp/keyworddiff2.txt

? src/nsKeywordProtocolHandler.sbr
? src/nsKeywordModule.sbr
Index: src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: /m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp  2000/01/12 22:53:12  1.12
--- nsKeywordProtocolHandler.cpp  2000/02/22 05:11:54
***************
*** 97,119 ****
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
    // build up a request to the keyword server.
    nsCAutoString query;
    // pull out the "go" action word, or '?', if any
!   char one = aSpec[0], two = aSpec[1];
    if (one == '?') {                        // "?blah"
!     query = aSpec+1;
    } else if ( (one == 'g' || one == 'G')   //
                &&                           //
                (two == 'o' || two == 'O')   // "g[G]o[O] blah"
                &&                           //
!               (aSpec[12] == ' ') ) {       //
!     query = aSpec+3;
    } else {
!     query = aSpec;
    }
    query.Trim(" "); // pull leading/trailing spaces.
--- 97,127 ----
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
+   char * unescaped = nsCRT::strdup(aSpec);
+   if(unescaped == nsnull)
+     return nsnull;
+ 
+   nsUnescape(unescaped);
+ 
    // build up a request to the keyword server.
    nsCAutoString query;
    // pull out the "go" action word, or '?', if any
!   char one = unescaped[0], two = unescaped[1];
    if (one == '?') {                        // "?blah"
!     query = unescaped+1;
    } else if ( (one == 'g' || one == 'G')   //
                &&                           //
                (two == 'o' || two == 'O')   // "g[G]o[O] blah"
                &&                           //
!               (unescaped[12] == ' ') ) {   //
!     query = unescaped+3;
    } else {
!     query = unescaped;
    }
+ 
+   nsAllocator::Free(unescaped);
    query.Trim(" "); // pull leading/trailing spaces.
Comment 55•25 years ago
Who's reviewing this? warren? valeski? The sooner we get bug fixes checked-in, the more time we've got to make a stable Beta1! Thx.
Comment 56•25 years ago
I left a message for Frank to call me about this, but haven't heard from him today.
Comment 57•25 years ago
still trying to get a review. ftang and warren have been playing tag since 2/21 and have not yet discussed ftang's fix...
Comment 58•25 years ago
Somehow my last comment in this bug seems to have gotten lost. I bounced this back to Frank because I don't want to put these routines into necko right now. I think he should check them in as is.
Comment 59•25 years ago
Ok, sorry, I confused this bug with bug 28784 which is remarkably similar. My comments went there. Also see bug 29341 which is how necko should fix this problem.
Assignee
Updated•25 years ago
Whiteboard: [PDT+]need code review. → [PDT+]schedule to check in 2/28 night
Assignee
Updated•25 years ago
Target Milestone: M14 → M15
Assignee
Comment 60•25 years ago
I will check this in tonight.
Comment 61•25 years ago
a=bobj
Assignee
Comment 62•25 years ago
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Reporter
Comment 63•25 years ago
Verified fixed in Win32 02-29-09-M15 bld.
Status: RESOLVED → VERIFIED