Closed Bug 14155 Opened 25 years ago Closed 25 years ago

Submitting non-ASCII keyword in location field disabled

Categories

(Core :: Internationalization, defect, P3)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: blee, Assigned: ftang)

References

Details

(Whiteboard: [PDT+]schedule to check in 2/28 night)

To see this,
Type in a JA keyword (eg, "tousiba" in hiragana) and hit return.
==> The text input disappears. Instead a horizontal line (rotated L shape)
    appears. Hitting backspace bar at this point will display "http://" in
    the same field.  Observed in Win32 99-09-16-09-M11 bld.
QA Contact: teruko → blee
Assignee: ftang → rjc
rjc, are you working on internet keyword ? Let me know what can I help. Also,
visit blee's cube to ask him to reproduce this bug to you.Thanks
Assignee: rjc → warren
Nope, I'm not working on internet keywords, which are being implemented in
Necko.  Warren, any idea here?
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE
*** This bug has been marked as a duplicate of 888 ***
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The problem still happens in Win32 12-02-08-M12 bld. reopening
Assignee: warren → valeski
Status: REOPENED → NEW
Target Milestone: M14
Summary: Submitting double byte keyword in location field disabled → [BETA] Submitting double byte keyword in location field disabled
We need this bug fixed by M13 so that proper testing can be done after the
feature works with JA characters. If it's fixed by M14, we won't be able to test
it before beta.
Can I enter unicode chars into the location bar using a US keyboard? If you
can't even get the chars to display, this is a widget bug, not a necko bug (yet
;).
Text input (CJK, High ASCII, etc) is displayed while they're typed in. It's the
result of committing the input that is broken. In a recent bld (Win 12-02-08-
M12), carriage return after JA input displays "http://" in location field.
can I enter unicode chars using my US keyboard?
Unicode is used for internal conversion of char's in client. If you're asking
whether you can type non ASCII char's using US KB, the answer is yes. Try Alt+
0192 (or 0193, 0196, etc) using US KB. If you want other specific char's
belonging to a certain language, you need to install that particular KB driver
from Ctrl panel.
This is part of a larger problem (I can't find the true bug right now) which is
that no protocols can handle unicode right now.
I disagree with that last comment. I think the location bar needs to convert the
unicode to utf-8 before calling necko. That's the way we agreed that nsIURI
should work.
You're right. My fault. I had forgotten our conclusion re: the unicode URI
issue. So how does this conversion work?
You should ask one of the i18n guys how to do this.
Keywords: beta1
Putting on PDT+ radar for beta1.
Whiteboard: [PDT+]
Add nhotta, erik and bobj to the cc. Is that true we agree to use escaped UTF-8 
for internet keyword for 4.x ? 

For Unicode conversion- talk to naoki. He have a function which will convert 
Unicode to escaped UTF-8. (and back)
Yes, Netscape 4.X and Netcenter's keyword server agreed to use UTF-8 for
Internet Keywords. Since these strings are passed as URLs, they need to be
"URL-encoded", i.e. %XX (percent followed by hex digits).
Here is the conversion interface Frank mentioned.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/idl/nsITextToSubURI.idl
I attached examples of the usage in bug 25034.
Summary: [BETA] Submitting double byte keyword in location field disabled → Submitting double byte keyword in location field disabled
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] [02/11/00]
well, I can't enter any Unicode in to the location bar now. I'm trying the 
Alt+0192 trick.
With 2/9/00 Win32 bld, I could enter extended char's in to the location bar using 
Alt key combination. CJK char input still can't be committed as explained in the 
original problem description.
I still can't enter unicode. are you using a home grown build or a release?

I may have to repair this blindly (without testing).

Here's what I've done. I've added a method to nsITextToSubURI.idl called 
"Convert()" it does everything convertandescape does, except the escape. I far 
from an expert on string conversion, but we don't want to escape the URL before 
we try and parse it because special chars will be escaped that we need to use. 
Thus I want to convert to UTF8 (what is the actual charset string I should be 
using in the api?) without losing the original URI...??? Is that even possible? 
Maybe someone on the i18n team should get this bug? The bottom line is URI's 
aren't unicode and this is callsite cleanup. I need traction on this. Either 
someone in the i18n group should get this bug or someone needs to call me or be 
a little more responsive.
ok. over to ftang per blee's suggestion.

ftang, the code in question is here 
http://lxr.mozilla.org/seamonkey/source/webshell/src/nsWebShell.cpp#2034 . The 
URLspec passed into LoadURL() is the one we want to get down to UTF8. I'm not 
sure if we want to do this for *all* urls (probably) or just keyword urls. Call 
me if you have questions about the call site.
Assignee: valeski → ftang
Status: ASSIGNED → NEW
erik- location field problem. Please take a look at it.
Assignee: ftang → erik
Status: NEW → ASSIGNED
Whiteboard: [PDT+] [02/11/00] → [PDT+] [02/16/00]
FYI, IMAP URLs are also specified as being in UTF-8
Whiteboard: [PDT+] [02/16/00] → [PDT+] 02/16
Whiteboard: [PDT+] 02/16 → [PDT+] 02/18
cc'd nhotta
Updated the summary because this affects non-ASCII, not just double-byte.

Running the 2/14 sweetlou build.

I typed and copied the following from the Windows Character Map accessory
and pasted it into the location bar:
   français español

You got search results back:
   Search Results for 'français español'

The strings sent to the search engine was not coverted correctly.
Is the search from the location bar, also controlled by sherlock files?
If there are no sherlock files for a given server, what is the default
behavior?
Summary: Submitting double byte keyword in location field disabled → Submitting non-ASCII keyword in location field disabled
I forgot, here is what was displayed in the location bar after hitting return:
  http://search.netscape.com/cgi-bin/search?search=fran%C3%A7ais%20espa%C3%B1ol

I would expect:
  http://search.netscape.com/cgi-bin/search?search=fran%E7ais%20espa%F1ol
It looks like it creates the URL for the search by %-encodeding the UTF-8
strings without converting to Latin1 first.
bobj, perhaps your comment above is for bug 25034. This bug is about not being 

able to commit CJK input (IK) in location field as originally described.

Cc'ing Andreas and Jud.
blee, Thanks for setting me straight on this bug.

Now (using 2000021408 build), I typed in the location bar:
    keyword français español

And the resulting URL (after hitting return) displayed in the location bar:
    http://keyword.netscape.com/keyword/keyword%20fran%E7ais%20espa%F1ol

And the keyword results page displays
    Search Results for 'keyword fran軋is espa�ol'

Next I typed:
    keyword [tou][shiba]   (where [tou][shiba] are 2 kanji characters0

And the resulting URL (after hitting return) displayed in the location bar:
    http://keyword.netscape.com/keyword/keyword%20..

And the keyword results page displays
    Search Results for 'keyword ..'
I set a break point in nsWebShell::LoadURL the data looks ok when it reach it.
Frank, are you going to take this bug? If not, reassign it to me. Thanks!
Assignee: erik → ftang
Status: ASSIGNED → NEW
Whiteboard: [PDT+] 02/18 → [PDT+]
I should take a look at BrowserLoadURL in navigator.js
Status: NEW → ASSIGNED
Found the problem, at lease part of it.
mozilla/netwerk/base/public/nsNetUtil.h
 53             inline nsresult
 54             NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI 
= nsnull)
 55             {
 56                 // XXX if the string is unicode, GetBuffer() returns null. 
 57                 // XXX we need a strategy to deal w/ unicode specs (if there 
should
 58                 // XXX even be such a thing)
 59                 char* specStr = spec.ToNewCString(); // this forces a single 
byte char*
 60                 if (specStr == nsnull)
 61 warren  1.1         return NS_ERROR_OUT_OF_MEMORY;
 62                 nsresult rv = NS_NewURI(result, specStr, baseURI);
 63                 nsAllocator::Free(specStr);
 64                 return rv;
 65             }

This spec.ToNewCString(); convert non ASCII to 0x2e !!!
ugh. ftang, please read my comments. I basically point out what code needs to 
change (and it's not in necko).
sorry, I miss some comment before I put in mime.
1. about adding Convert() to nsITextToSubURI.idl- there are no need to do that, 
if you need convert unicode to some charset, you call nsIUnicodeEncoder 
(nsITextToSubURI call that one indrectly). If you want to convert to UTF8 in 
char*, you call the nsString::ToNewUTF8String()
2. "code needs to change (and it's not in necko)" ? I think this is debatable- 
if necko have an interface which accept nsString or PRUnichar* , then itself 
have to deal with Unicode. You cannot design an interface which using float as 
type but expect people only passing round down number in the float, right ? The 
same reason you have to deal with Unicode. And once you deal with unicode, your 
code should not TRUNCATE data. I do agree part of the right fix should be in the 
nsWebShell. However, what about other cases ? What happen if other code pass 
unicode in and  you do the same data damage ?
The bottom line is you have to change NS_NewURI(nsIURI* *result, const nsString& 
spec, nsIURI* baseURI ) to add assertion and return error while the spec 
contains non ASCII data . Silently pass damage data is not acceptable. You 
should add the following line between line 56-58
PRUnichar* ck = spec.GetUnicode();
while(*ck)
{
   NS_ASSERTION((*ck >= 0x0080), "non ASCII in URI");
   if(*ck++ >= 0x0080) {
     retrun NS_ERROR_INVALID_ARG;
   }
}
The problem is not in single place of LoadURL-
1. First try
1998   rv = NS_NewURI(getter_AddRefs(uri), urlStr, nsnull);
what should we do ?

2. Second try
2005     convertFileToURL(urlStr, urlSpec);
2006     rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull
I think this is File url. I think we need to convert to file system charset 
before calling this NewURI

3. third try-
2042                   nsAutoString keywordSpec("keyword:");
2043                   keywordSpec.Append(aURLSpec);
2044                   rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, 
nsnull); 
this is keyword url. I think we should convert to UTF8 for keyword url

4. forth try-
2088               rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
This is for ftp or http url. What should we do ? Convert to UTF8 ?

5. fifth-
2186   res = NS_NewURI(getter_AddRefs(newURI), urlstr, nsnull);
This urlstr could be the one from the above 4, or mURL. What should we do ? I 
think this is one after process ? I think we should keep it as the way it is 
now.

This is very nasty issue. 
So... we need to divid the problem into at least the following sub problem.
1. full url
2. file url- depend on bug 28171 , and then we should convert the 
convertFileToURL() output of into file system encoding (get from 
nsIPlatformCharset ) before passing to NS_NewURI() as char* version 
3. keyword url- convert to UTF8 (?) and then pass to NS_NewURI as char* version 
(?)
4. http/ftp url ? convert to UTF8 ? or the system charset ???? (backward 
compatability should be consider )
Depends on: 28171
The other way we can do is we always convert to UTF8 in webshell, and let the 
code in mozilla/netwerk/protocol convert it back to the correct charset. For 
example- we could still convert the file URL into UTF8 and let the file protcol 
handler convert the escaped UTF8 back to the Unicode, and then convert to file 
system charset before calling the file open. 
If this is the approach, then we should Change the NS_NewURI and replace char* 
specStr = spec.ToNewCString(); to char* specStr = spec.ToNewUTF8String(); 

And then in the protocol handler, we change it depend on the protocol. I think 
this is how we did it in the old libnet. For example- we should convert escaped 
UTF8 in imap4 protcol into Unicode and then convert to x-imap4-modified-utf7 
charset for the mailbox in the imap4 protocol handler.
Two things: I suggested the new Convert() method because the current one does 
the escaping which we don't want. necko is particular about having 
escaped/non-escaped strings handed to it.

Also, remember that necko/protocols do not allow unicode to be passed in, or 
used. I agree that our apis advertise otherwise; which bites. Converting to 
anything other than UTF8 for necko use doesn't make sense.

I think your assertion looks good. And becuase that's a utility function, the 
ToNewUTF8...() is probably a good idea. I assume that won't trash a true char 
array?

I still think the string should be UTF8'ed at the top of the webshel.
About the ToNewCString in NS_NewURI -- didn't we agree that the string passed 
into NewURI is UTF-8? That would make the use of ToNewCString here wrong, 
wouldn't it?
Depends on: 28197
>I suggested the new Convert() method because the current one does
>the escaping which we don't want. necko is particular about having 
>escaped/non-escaped strings handed to it.
I understand you need it in an non-escaped form. My point is such thing is 
already exist, just not in nsIUnicodeEncoder and does not make sense to 
duplicate add a new method in nsITextToSubURI. simply call 
nsIUnicodeEncoder::Convert() instead.

>I still think the string should be UTF8'ed at the top of the webshel.
I do agree taht we need to change webshell. However, as long as you keep a 
nsString favor of NS_NewURI, you open a *possibility* for people to pass in non 
ASCII data in nsString and you have to deal with the conversion (in 
ToNewCString, ToNewUTF8String or some other form.) Remember, this code is 
currently ALREADY do conversion- a bad one which WILL damage data. Untill you 
remove the NEED for conversion, you should make the conversion not damage data- 
and I simply suggest you use ToNewUTF8String so the data will be preserved.
Also, the www is MOVING to use UTF8 in URL as default. (see 
http://www.ietf.org/rfc/rfc2718.txt 
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt,  and  
http://www.w3.org/International/O-URL-and-ident.html for details)  Which mean 
all the NEW protcol will be encouraged to use UTF8 in escaped form for URL. 
Replace the ToNewCString to ToNewUTF8String in NS_NewURI will make it forward 
compatabile with new protocol. Of course, I don't believe all the existing 
protocol will migrate to UTF-8 within 5 years..... 

>About the ToNewCString in NS_NewURI -- didn't we agree that the string passed 
>into NewURI is UTF-8? That would make the use of ToNewCString here wrong, 
>wouldn't it?
I think you are confused between
 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
and
NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)

I think you probably agree the const char* version is passed in AS UTF8 in char* 
(there are no way you can pass in anything other than UCS2 / PRUnichar in 
nsString) If that is your agreement, then the current version of the nsString 
favor of NS_NewURI is not compliant to your decision because it does not convert 
to UTF8 before it call the char* version of NS_NewURI

 41 inline nsresult
 42 NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)
 43 {
 44     nsresult rv;
 45     static NS_DEFINE_CID(kIOServiceCID, NS_IOSERVICE_CID);
 46     NS_WITH_SERVICE(nsIIOService, serv, kIOServiceCID, &rv);
 47     if (NS_FAILED(rv)) return rv;
 48     
 49     rv = serv->NewURI(spec, baseURI, result);
 50     return rv;
 51 }
 52 
 53 inline nsresult
 54 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
 55 {
 56     // XXX if the string is unicode, GetBuffer() returns null. 
 57     // XXX we need a strategy to deal w/ unicode specs (if there should
 58     // XXX even be such a thing)
 59     char* specStr = spec.ToNewCString(); // this forces a single byte char*
 60     if (specStr == nsnull)
 61         return NS_ERROR_OUT_OF_MEMORY;
 62     nsresult rv = NS_NewURI(result, specStr, baseURI);
 63     nsAllocator::Free(specStr);
 64     return rv;
 65 }
If you assume the spec in line 42 is in UTF8 (as what I read from warren's last 
comment), then you SHOULD convert spec in line 54 into UTF8 before it pass to 
NS_NewURI in line 62. Currently, the ToNewCString() in line 59 does NOT convert 
it to UTF8. It simply truncate the higher 8 bits in the PRUnichar and cast it 
down to char. That conversion will damage any non ISO-8859-1 data and should 
only be called by the case that only ASCII in nsString.

If you guys already have agreement that the char* version of NS_NewURI is 
expecting UTF8, then change line 59 to ToNewUTF8String simply make the code 
implement what you agree upon.
Basically, warren is RIGHT.
add travis to the cc list since he is webshell owner now.

No longer depends on: 28197
I fix 28197 and now it is ToNewUTF8String. Now the problem is somehow the 
GetSpec of keyword: URI does not escape the UTF8 and cause some double 
conversion in nsWebShell (get the utf8 back and assign to nsString as Latin 1 
and convert to utf8 again ...) Should we always escape it when we call GetSpec ?
cc'ing andreas for some escaping input.
As valeski suggest, I change the keyword protocol handler to use the latest url 
parser instead of the simple URI one. Here is the patch
(file internal to netscape - http://warp/u/ftang/tmp/keyworddiff.txt)
? keyword/src/nsKeywordProtocolHandler.sbr
? keyword/src/nsKeywordModule.sbr
Index: keyword/src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: 
/m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp        2000/01/12 22:53:12     1.12
--- nsKeywordProtocolHandler.cpp        2000/02/22 00:25:08
***************
*** 30,36 ****
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kSimpleURICID,     NS_SIMPLEURI_CID);
  static NS_DEFINE_CID(kIOServiceCID,     NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
  
--- 30,37 ----
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kStandardUrlCID, NS_STANDARDURL_CID);
! static NS_DEFINE_CID(kAuthUrlParserCID, NS_AUTHORITYURLPARSER_CID);
  static NS_DEFINE_CID(kIOServiceCID,     NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
  
***************
*** 100,124 ****
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = aSpec[0], two = aSpec[1];
      if (one == '?') {                            // "?blah"
!         query = aSpec+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (aSpec[12] == ' ') ) {      //
  
!         query = aSpec+3;
      } else {
!         query = aSpec;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
      // encode
      char * encQuery = nsEscape(query.GetBuffer(), url_Path);
      if (!encQuery) return nsnull;
      query = encQuery;
      nsAllocator::Free(encQuery);
--- 101,133 ----
      // build up a request to the keyword server.
      nsCAutoString query;
  
+    
+     char *unescaped = nsCRT::strdup(aSpec);
+     if(unescaped == nsnull)
+       return nsnull;
+     nsUnescape(unescaped);
+     char *pt = unescaped +1;
+ 
      // pull out the "go" action word, or '?', if any
!     char one = pt[0], two = pt[1];
      if (one == '?') {                            // "?blah"
!         query = pt+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (pt[12] == ' ') ) {      //
  
!         query = pt+3;
      } else {
!         query = pt;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
      // encode
      char * encQuery = nsEscape(query.GetBuffer(), url_Path);
+     nsAllocator::Free(unescaped);
      if (!encQuery) return nsnull;
      query = encQuery;
      nsAllocator::Free(encQuery);
***************
*** 133,151 ****
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                                nsIURI **result) {
!     nsresult rv;
!     nsIURI* uri;
  
!     rv = nsComponentManager::CreateInstance(kSimpleURICID, nsnull, 
NS_GET_IID(nsIURI), (void**)&uri);
!     if (NS_FAILED(rv)) return rv;
!     rv = uri->SetSpec((char*)aSpec);
      if (NS_FAILED(rv)) return rv;
! 
!     *result = uri;
      return rv;
  }
- 
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,
--- 142,179 ----
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                       nsIURI **result)
! {
!     nsresult rv = NS_OK;
  
!     nsCOMPtr<nsIURI> url;
!     nsCOMPtr<nsIURLParser> urlparser; 
!     if (aBaseURI)
!     {
!         rv = aBaseURI->Clone(getter_AddRefs(url));
!         if (NS_FAILED(rv)) return rv;
!         rv = url->SetRelativePath(aSpec);
!     }
!     else
!     {
!         rv = nsComponentManager::CreateInstance(kAuthUrlParserCID, 
!                                     nsnull, NS_GET_IID(nsIURLParser),
!                                     getter_AddRefs(urlparser));
!         if (NS_FAILED(rv)) return rv;
!         rv = nsComponentManager::CreateInstance(kStandardUrlCID, 
!                                     nsnull, NS_GET_IID(nsIURI),
!                                     getter_AddRefs(url));
!         if (NS_FAILED(rv)) return rv;
! 
!         rv = url->SetURLParser(urlparser);
!         if (NS_FAILED(rv)) return rv;
!         rv = url->SetSpec((char*)aSpec);
!     }
      if (NS_FAILED(rv)) return rv;
!     *result = url.get();
!     NS_ADDREF(*result);
      return rv;
  }
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,


After I change the implementation of NewURI as the HTTP one, I find out I also 
need to change the Mangle function . 
This will fix the international problem. valeski- could you please code review 
it ?
Whiteboard: [PDT+] → [PDT+]need code review.
I disagree with this fix. Keyword URLs should be simple URLs and not standard 
URLs. Why are you doing this?
If we don't do this, then the GetSpec won't return URL in escaped form and may 
contains char* in UTF8. And the nsWebShell will assign that char* to nsString in 
nsWebShell.
Why this should be simple URLs ? Any reason ?
Should GetSpec always return url in escped form ? Andrea's message that valeski 
point to me say GetSpec always return esacpe form but the current (without my 
patch) implementation does not reflect that.
> Why this should be simple URLs ? Any reason ?

Because keywords don't have the syntax protocol://user@host:port/path/file.ext 
They're just keyword:randomtext

> Should GetSpec always return url in escped form ? 

For nsStdURL it should -- but that's an aspect of the syntax that's being 
defined, not SetSpec in general (since some syntaxes have no notion of 
escaping). 

If you need escaping in keyword: urls, I suggest inventing something new or 
handling escaping outside of the use of nsSimpleURI, but don't use nsStdURL for 
keywords. You'll get all sorts of unwanted side effects.
I'm not sure how this code plugs together, but here is an example of what
needs to work.

A non-ASCII keyword:     keyword:français

will need to be escaped (%-encoded UTF8) when it converts to the HTTP url:

     http://keyword.netscape.com/keyword/fran%C3%A7ais
That escaping should happen in the keyword protocol handler at the point it 
constructs the http: url from the keyword: url. It should not happen in the 
keyword: url (the instance of nsSimpleURI). Jud - can you make this change.

The conversion to utf8 should be done by the webshell.
sure. is this escaping done using nsEscape() or using some i18n escaping api?
ugh. what a mess. warren, the issue is that the url returned from 
keywordHandler;:NewURI() (which returns something a url of the form 
"keyword:sometext") can cause string corruption (because it doesn't escape the 
spec).

Later in ::NewChannel() we call GetSpec() on the URI passed in (which was 
previously created using keywordhnadler::newuri()). That GetSpec() call will 
return the corrupted spec.

I'm seeing Frank's point here. the escaping/unescaping policies for simple url 
and std url need to be the same. For now we can either pass in an escaped string 
when creating a keyword url, or have keyword urls be stdurls.
bobj/warren- what bobj said is NOT the remaining problem. The UCS2 to UTF8 
conversion part is already fixed last Friday when we change the ToNewCString to 
ToNewUTF8String. The remaining problem I try to fix is related to the keyword: 
uri and history code (including switch default charset to UTF-8 since the 
keyword search result page return in UTF8 without meta tag or HTTP charset 
parameter. ) 

What happen is this
1. the keyword uri convert to the http uri correctly and the page load correctly
2. The webshell call GetSpec() of that uri 
3. The GetSpec() return escaped form in all the protocol except "keyword:", 
which return unescaped utf8
4. webshell passed that return value to history code, which aceept char* (not 
nsString nor char*)
5. The history code convert the char* (which not in UTF8) to nsString by apply 
ISO-8859-1 to Unicode conversion rule (by default)
6. When we try to load again, the NS_NewURI convert the nsString into char* by 
apply Unicode to UTF8 rule. 

The problem is that the webshell/history code currently assume the GetSpec 
return in escaped form, which reuqire no special treatment for non ASCII. That 
is true for all protocol except "keyword:".

What happen is now the first time you type keyword: it will work, but if you 
hit Back/Forward or change encoding (which is very common for now since the 
keyword search page is not return in UTF8 without meta tag nor http header) then 
the 2nd and so on loading will be wrong, each time it will apply a ISO-8859-1 to 
Unicode to UTF8 conversion which will cause garbage in the 2nd and later 
loading. 

From the i18n problem point of view, I really don't care keyword uri is standard 
or simple. I do care the GetSpec from all different URI return in escape or in 
unescape. 

 

I agree w/ warren and valeski now, I won't change the keyword uri from simple 
url to standard url now. Howerver, I do need to make the Manguling rounting 
unescaped first since it looking at ? and space.

New patch is in here (Notice the converFileToURL is for bug28171) 

http://warp/u/ftang/tmp/webshelldiff1.txt

Index: nsWebShell.cpp
===================================================================
RCS file: /m/pub/mozilla/webshell/src/nsWebShell.cpp,v
retrieving revision 1.396
diff -c -r1.396 nsWebShell.cpp
*** nsWebShell.cpp      2000/02/19 02:54:32     1.396
--- nsWebShell.cpp      2000/02/22 05:10:51
***************
*** 87,92 ****
--- 87,93 ----
  #include "nsIDocShellTreeOwner.h"
  #include "nsCURILoader.h"
  #include "nsIDOMWindow.h"
+ #include "nsEscape.h"
  
  #include "nsIHTTPChannel.h" // add this to the ick include list...we need it 
to QI for post data interface
  #include "nsHTTPEnums.h"
***************
*** 1375,1419 ****
  
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   char szFile[1000];
!   aIn.ToCString(szFile, sizeof(szFile));
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if (PL_strchr(szFile, '\\')) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   if (*(szFile) == '/' || *(szFile) == '\\') {
  #else
    if (0) {  
    // Do nothing (All others for now) 
  #endif
  #endif
-     PRInt32 len = strlen(szFile);
-     PRInt32 sum = len + sizeof(FILE_PROTOCOL);
-     char* lpszFileURL = (char *)PR_Malloc(sum + 1);
  
  #ifdef XP_PC
      // Translate '\' to '/'
!     for (PRInt32 i = 0; i < len; i++) {
!       if (szFile[i] == '\\') {
!         szFile[i] = '/';
!       }
!       if (szFile[i] == ':') {
!         szFile[i] = '|';
!       }
!     }
  #endif
  
      // Build the file URL
!     PR_snprintf(lpszFileURL, sum, "%s%s", FILE_PROTOCOL, szFile);
!     aOut = lpszFileURL;
!     PR_Free((void *)lpszFileURL);
    }
-   else
-   {
-     aOut = aIn;
-   }
  }
  
  NS_IMETHODIMP
--- 1376,1406 ----
  
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   aOut = aIn;
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if(kNotFound != aIn.FindChar(PRUnichar('\\'))) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   const PRUnichar * up = aIn.GetUnicode();
!   if((PRUnichar('/') == *up) ||
!      (PRUnichar('\\') == *up)) {
  #else
    if (0) {  
    // Do nothing (All others for now) 
  #endif
  #endif
  
  #ifdef XP_PC
      // Translate '\' to '/'
!     aOut.ReplaceChar(PRUnichar('\\'), PRUnichar('/'));
!     aOut.ReplaceChar(PRUnichar(':'), PRUnichar('|'));
  #endif
  
      // Build the file URL
!     aOut.Insert(FILE_PROTOCOL,0);
    }
  }
  
  NS_IMETHODIMP
***************
*** 2004,2009 ****
--- 1991,2001 ----
  
      // see if we've got a file url.
      convertFileToURL(urlStr, urlSpec);
+ 
+     // XXX
+     // for file url, we need to convert the nsString to the file system
+     // charset before we pass to NS_NewURI
+ 
      rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
      if (NS_FAILED(rv)) {
        NS_ASSERTION(mPrefs, "the webshell's pref service wasn't initialized");
***************
*** 2014,2019 ****
--- 2006,2012 ----
        PRInt32 qMarkLoc = -1, spaceLoc = -1;
  
        if (keywordsEnabled) {
+           rv = NS_ERROR_FAILURE;
            // These are keyword formatted strings
            // "what is mozilla"
            // "what is mozilla?"
***************
*** 2040,2057 ****
                }
  
                if (keyword) {
!                   nsAutoString keywordSpec("keyword:");
!                   keywordSpec.Append(aURLSpec);
!                   rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, nsnull);
!               } else {
!                   rv = NS_ERROR_FAILURE;
!               }
!           } else {
!               rv = NS_ERROR_FAILURE;
!           }
!       } else {
!           rv = NS_ERROR_FAILURE;
!       }
  
        PRInt32 colon = -1;
        if (NS_FAILED(rv)) {
--- 2033,2053 ----
                }
  
                if (keyword) {
!                   nsCAutoString keywordSpec("keyword:");
!                   char *utf8Spec = urlStr.ToNewUTF8String();
!                   if(utf8Spec) {
!                       char* escapedUTF8Spec = nsEscape(utf8Spec, url_Path);
!                       if(escapedUTF8Spec) {
!                           keywordSpec.Append(escapedUTF8Spec);
!                           rv = NS_NewURI(getter_AddRefs(uri), 
keywordSpec.GetBuffer(), nsnull);
!                           nsAllocator::Free(escapedUTF8Spec);
!                       } // escapedUTF8Spec
!                       nsAllocator::Free(utf8Spec);
!                   } // utf8Spec
!               } // keyword 
!           } // FindChar
!       } // keywordEnable
!    
  
        PRInt32 colon = -1;
        if (NS_FAILED(rv)) {
***************
*** 2118,2124 ****
  
     rv = uri->GetSpec(getter_Copies(spec));
     if (NS_FAILED(rv)) return rv;
!   
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell>  root;
    nsCOMPtr<nsISessionHistory> shist;
--- 2114,2120 ----
  
     rv = uri->GetSpec(getter_Copies(spec));
     if (NS_FAILED(rv)) return rv;
! 
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell>  root;
    nsCOMPtr<nsISessionHistory> shist;

http://warp/u/ftang/tmp/keyworddiff2.txt
? src/nsKeywordProtocolHandler.sbr
? src/nsKeywordModule.sbr
Index: src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: 
/m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp        2000/01/12 22:53:12     1.12
--- nsKeywordProtocolHandler.cpp        2000/02/22 05:11:54
***************
*** 97,119 ****
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = aSpec[0], two = aSpec[1];
      if (one == '?') {                            // "?blah"
!         query = aSpec+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (aSpec[12] == ' ') ) {      //
  
!         query = aSpec+3;
      } else {
!         query = aSpec;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
--- 97,127 ----
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
+     char * unescaped = nsCRT::strdup(aSpec);
+     if(unescaped == nsnull) 
+        return nsnull;
+ 
+     nsUnescape(unescaped);
+ 
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = unescaped[0], two = unescaped[1];
      if (one == '?') {                            // "?blah"
!         query = unescaped+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (unescaped[12] == ' ') ) {      //
  
!         query = unescaped+3;
      } else {
!         query = unescaped;
      }
+ 
+     nsAllocator::Free(unescaped);
  
      query.Trim(" "); // pull leading/trailing spaces.
  

Depends on: 28784
Who's reviewing this?  warren?  valeski?
The sooner we get bug fixes checked-in, the more time we've got to make
a stable Beta1!  Thx.
I left a message for Frank to call me about this, but haven't heard from him 
today.
still trying to get a review.  ftang and warren have been playing tag since
2/21 and have not yet discussed ftang's fix...
Somehow my last comment in this bug seems to have gotten lost. I bounced this 
back to Frank because I don't want to put these routines into necko right now. I 
think he should check them in as is.
Ok, sorry, I confused this bug with bug 28784 which is remarkably similar. My 
comments went there. Also see bug 29341 which is how necko should fix this 
problem.
Whiteboard: [PDT+]need code review. → [PDT+]schedule to check in 2/28 night
Target Milestone: M14 → M15
I will check this in tonight.
a=bobj
fix and check in
Status: ASSIGNED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
Verified fixed in Win32 02-29-09-M15 bld.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.