14155 - Submitting non-ASCII keyword in location field disabled

Reporter

Description

•

25 years ago

To see this,
Type in a JA keyword (eg, "tousiba" in hiragana) and hit return.
==> The text input disappears. Instead a horizontal line (rotated L shape)
    appears. Hitting backspace bar at this point will display "http://" in
    the same field.  Observed in Win32 99-09-16-09-M11 bld.

blee

Reporter

Updated

•

25 years ago

QA Contact: teruko → blee

Frank Tang

Assignee

Updated

•

25 years ago

Assignee: ftang → rjc

Frank Tang

Assignee

Comment 1

•

25 years ago

rjc, are you working on internet keyword ? Let me know what can I help. Also,
visit blee's cube to ask him to reproduce this bug to you.Thanks

Robert John Churchill

Updated

•

25 years ago

Assignee: rjc → warren

Robert John Churchill

Comment 2

•

25 years ago

Nope, I'm not working on internet keywords, which are being implemented in
Necko.  Warren, any idea here?

Warren Harris

Updated

•

25 years ago

Status: NEW → RESOLVED

Closed: 25 years ago

Resolution: --- → DUPLICATE

Warren Harris

Comment 3

•

25 years ago

*** This bug has been marked as a duplicate of 888 ***

blee

Reporter

Updated

•

25 years ago

Status: RESOLVED → REOPENED

blee

Reporter

Updated

•

25 years ago

Resolution: DUPLICATE → ---

blee

Reporter

Comment 4

•

25 years ago

The problem still happens in Win32 12-02-08-M12 bld. reopening

Warren Harris

Updated

•

25 years ago

Assignee: warren → valeski

Status: REOPENED → NEW

Target Milestone: M14

msanz

Updated

•

25 years ago

Summary: Submitting double byte keyword in location field disabled → [BETA] Submitting double byte keyword in location field disabled

msanz

Comment 5

•

25 years ago

We need this bug fixed by M13 so that proper testing can be done after the
feature works with JA characters. If it's fixed by M14, we won't be able to test
it before beta.

Judson Valeski

Comment 6

•

25 years ago

Can I enter unicode chars into the location bar using a US keyboard? If you
can't even get the chars to display, this is a widget bug, not a necko bug (yet
;).

blee

Reporter

Comment 7

•

25 years ago

Text input (CJK, High ASCII, etc) is displayed while they're typed in. It's the
result of committing the input that is broken. In a recent bld (Win 12-02-08-
M12), carriage return after JA input displays "http://" in location field.

Judson Valeski

Comment 8

•

25 years ago

can I enter unicode chars using my US keyboard?

blee

Reporter

Comment 9

•

25 years ago

Unicode is used for internal conversion of char's in client. If you're asking
whether you can type non ASCII char's using US KB, the answer is yes. Try Alt+
0192 (or 0193, 0196, etc) using US KB. If you want other specific char's
belonging to a certain language, you need to install that particular KB driver
from Ctrl panel.

Judson Valeski

Comment 10

•

25 years ago

This is part of a larger problem (I can't find the true bug right now) which is
that no protocols can handle unicode right now.

Warren Harris

Comment 11

•

25 years ago

I disagree with that last comment. I think the location bar needs to convert the
unicode to utf-8 before calling necko. That's the way we agreed that nsIURI
should work.

Judson Valeski

Comment 12

•

25 years ago

You're right. My fault. I had forgotten our conclusion re: the unicode URI
issue. So how does this conversion work?

Warren Harris

Comment 13

•

25 years ago

You should ask one of the i18n guys how to do this.

Warren Harris

Updated

•

25 years ago

Keywords: beta1

leger

Comment 14

•

25 years ago

Putting on PDT+ radar for beta1.

Whiteboard: [PDT+]

Frank Tang

Assignee

Comment 15

•

25 years ago

Add nhotta, erik and bobj to the cc. Is that true we agree to use escaped UTF-8 
for internet keyword for 4.x ? 

For Unicode conversion- talk to naoki. He have a function which will convert 
Unicode to escaped UTF-8. (and back)

Erik van der Poel

Comment 16

•

25 years ago

Yes, Netscape 4.X and Netcenter's keyword server agreed to use UTF-8 for
Internet Keywords. Since these strings are passed as URLs, they need to be
"URL-encoded", i.e. %XX (percent followed by hex digits).

nhottanscp

Comment 17

•

25 years ago

Here is the conversion interface Frank mentioned.
http://lxr.mozilla.org/seamonkey/source/intl/uconv/idl/nsITextToSubURI.idl
I attached examples of the usage in bug 25034.

Michael Lowe

Updated

•

25 years ago

Summary: [BETA] Submitting double byte keyword in location field disabled → Submitting double byte keyword in location field disabled

Judson Valeski

Updated

•

25 years ago

Status: NEW → ASSIGNED

Whiteboard: [PDT+] → [PDT+] [02/11/00]

Judson Valeski

Comment 18

•

25 years ago

well, I can't enter any Unicode in to the location bar now. I'm trying the 
Alt+0192 trick.

blee

Reporter

Comment 19

•

25 years ago

With 2/9/00 Win32 bld, I could enter extended char's in to the location bar using 
Alt key combination. CJK char input still can't be committed as explained in the 
original problem description.

Judson Valeski

Comment 20

•

25 years ago

I still can't enter unicode. are you using a home grown build or a release?

I may have to repair this blindly (without testing).

Here's what I've done. I've added a method to nsITextToSubURI.idl called 
"Convert()" it does everything convertandescape does, except the escape. I far 
from an expert on string conversion, but we don't want to escape the URL before 
we try and parse it because special chars will be escaped that we need to use. 
Thus I want to convert to UTF8 (what is the actual charset string I should be 
using in the api?) without losing the original URI...??? Is that even possible? 
Maybe someone on the i18n team should get this bug? The bottom line is URI's 
aren't unicode and this is callsite cleanup. I need traction on this. Either 
someone in the i18n group should get this bug or someone needs to call me or be 
a little more responsive.

Judson Valeski

Comment 21

•

25 years ago

ok. over to ftang per blee's suggestion.

ftang, the code in question is here 
http://lxr.mozilla.org/seamonkey/source/webshell/src/nsWebShell.cpp#2034 . The 
URLspec passed into LoadURL() is the one we want to get down to UTF8. I'm not 
sure if we want to do this for *all* urls (probably) or just keyword urls. Call 
me if you have questions about the call site.

Assignee: valeski → ftang

Status: ASSIGNED → NEW

Frank Tang

Assignee

Comment 22

•

25 years ago

erik- location field problem. Please take a look at it.

Assignee: ftang → erik

Erik van der Poel

Updated

•

25 years ago

Status: NEW → ASSIGNED

Whiteboard: [PDT+] [02/11/00] → [PDT+] [02/16/00]

John G. Myers

Comment 23

•

25 years ago

FYI, IMAP URLs are also specified as being in UTF-8

Erik van der Poel

Updated

•

25 years ago

Whiteboard: [PDT+] [02/16/00] → [PDT+] 02/16

Erik van der Poel

Updated

•

25 years ago

Whiteboard: [PDT+] 02/16 → [PDT+] 02/18

bobj

Comment 24

•

25 years ago

cc'd nhotta

bobj

Comment 25

•

25 years ago

Updated the summary because this affects non-ASCII, not just double-byte.

Running the 2/14 sweetlou build.

I typed and copied the following from the Windows Character Map accessory
and pasted it into the location bar:
   français español

You got search results back:
   Search Results for 'franÃ§ais espaÃ±ol'

The strings sent to the search engine was not coverted correctly.
Is the search from the location bar, also controlled by sherlock files?
If there are no sherlock files for a given server, what is the default
behavior?

Summary: Submitting double byte keyword in location field disabled → Submitting non-ASCII keyword in location field disabled

bobj

Comment 26

•

25 years ago

I forgot, here is what was displayed in the location bar after hitting return:
  http://search.netscape.com/cgi-bin/search?search=fran%C3%A7ais%20espa%C3%B1ol

I would expect:
  http://search.netscape.com/cgi-bin/search?search=fran%E7ais%20espa%F1ol

bobj

Comment 27

•

25 years ago

It looks like it creates the URL for the search by %-encodeding the UTF-8
strings without converting to Latin1 first.

blee

Reporter

Comment 28

•

25 years ago

bobj, perhaps your comment above is for bug 25034. This bug is about not being 

able to commit CJK input (IK) in location field as originally described.

Warren Harris

Comment 29

•

25 years ago

Cc'ing Andreas and Jud.

bobj

Comment 30

•

25 years ago

blee, Thanks for setting me straight on this bug.

Now (using 2000021408 build), I typed in the location bar:
    keyword français español

And the resulting URL (after hitting return) displayed in the location bar:
    http://keyword.netscape.com/keyword/keyword%20fran%E7ais%20espa%F1ol

And the keyword results page displays
    Search Results for 'keyword franè»‹is espaï¿½ol'

Next I typed:
    keyword [tou][shiba]   (where [tou][shiba] are 2 kanji characters0

And the resulting URL (after hitting return) displayed in the location bar:
    http://keyword.netscape.com/keyword/keyword%20..

And the keyword results page displays
    Search Results for 'keyword ..'

Frank Tang

Assignee

Comment 31

•

25 years ago

I set a break point in nsWebShell::LoadURL the data looks ok when it reach it.

Erik van der Poel

Comment 32

•

25 years ago

Frank, are you going to take this bug? If not, reassign it to me. Thanks!

Assignee: erik → ftang

Status: ASSIGNED → NEW

Whiteboard: [PDT+] 02/18 → [PDT+]

Frank Tang

Assignee

Comment 33

•

25 years ago

I should take a look at BrowserLoadURL in navigator.js

Status: NEW → ASSIGNED

Frank Tang

Assignee

Comment 34

•

25 years ago

Found the problem, at lease part of it.
mozilla/netwerk/base/public/nsNetUtil.h
 53             inline nsresult
 54             NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI 
= nsnull)
 55             {
 56                 // XXX if the string is unicode, GetBuffer() returns null. 
 57                 // XXX we need a strategy to deal w/ unicode specs (if there 
should
 58                 // XXX even be such a thing)
 59                 char* specStr = spec.ToNewCString(); // this forces a single 
byte char*
 60                 if (specStr == nsnull)
 61 warren  1.1         return NS_ERROR_OUT_OF_MEMORY;
 62                 nsresult rv = NS_NewURI(result, specStr, baseURI);
 63                 nsAllocator::Free(specStr);
 64                 return rv;
 65             }

This spec.ToNewCString(); convert non ASCII to 0x2e !!!

Judson Valeski

Comment 35

•

25 years ago

ugh. ftang, please read my comments. I basically point out what code needs to 
change (and it's not in necko).

Frank Tang

Assignee

Comment 36

•

25 years ago

sorry, I miss some comment before I put in mime.
1. about adding Convert() to nsITextToSubURI.idl- there are no need to do that, 
if you need convert unicode to some charset, you call nsIUnicodeEncoder 
(nsITextToSubURI call that one indrectly). If you want to convert to UTF8 in 
char*, you call the nsString::ToNewUTF8String()
2. "code needs to change (and it's not in necko)" ? I think this is debatable- 
if necko have an interface which accept nsString or PRUnichar* , then itself 
have to deal with Unicode. You cannot design an interface which using float as 
type but expect people only passing round down number in the float, right ? The 
same reason you have to deal with Unicode. And once you deal with unicode, your 
code should not TRUNCATE data. I do agree part of the right fix should be in the 
nsWebShell. However, what about other cases ? What happen if other code pass 
unicode in and  you do the same data damage ?
The bottom line is you have to change NS_NewURI(nsIURI* *result, const nsString& 
spec, nsIURI* baseURI ) to add assertion and return error while the spec 
contains non ASCII data . Silently pass damage data is not acceptable. You 
should add the following line between line 56-58
PRUnichar* ck = spec.GetUnicode();
while(*ck)
{
   NS_ASSERTION((*ck >= 0x0080), "non ASCII in URI");
   if(*ck++ >= 0x0080) {
     retrun NS_ERROR_INVALID_ARG;
   }
}

Frank Tang

Assignee

Comment 37

•

25 years ago

The problem is not in single place of LoadURL-
1. First try
1998   rv = NS_NewURI(getter_AddRefs(uri), urlStr, nsnull);
what should we do ?

2. Second try
2005     convertFileToURL(urlStr, urlSpec);
2006     rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull
I think this is File url. I think we need to convert to file system charset 
before calling this NewURI

3. third try-
2042                   nsAutoString keywordSpec("keyword:");
2043                   keywordSpec.Append(aURLSpec);
2044                   rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, 
nsnull); 
this is keyword url. I think we should convert to UTF8 for keyword url

4. forth try-
2088               rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
This is for ftp or http url. What should we do ? Convert to UTF8 ?

5. fifth-
2186   res = NS_NewURI(getter_AddRefs(newURI), urlstr, nsnull);
This urlstr could be the one from the above 4, or mURL. What should we do ? I 
think this is one after process ? I think we should keep it as the way it is 
now.

This is very nasty issue.

Frank Tang

Assignee

Comment 38

•

25 years ago

So... we need to divid the problem into at least the following sub problem.
1. full url
2. file url- depend on bug 28171 , and then we should convert the 
convertFileToURL() output of into file system encoding (get from 
nsIPlatformCharset ) before passing to NS_NewURI() as char* version 
3. keyword url- convert to UTF8 (?) and then pass to NS_NewURI as char* version 
(?)
4. http/ftp url ? convert to UTF8 ? or the system charset ???? (backward 
compatability should be consider )

Depends on: 28171

Frank Tang

Assignee

Comment 39

•

25 years ago

The other way we can do is we always convert to UTF8 in webshell, and let the 
code in mozilla/netwerk/protocol convert it back to the correct charset. For 
example- we could still convert the file URL into UTF8 and let the file protcol 
handler convert the escaped UTF8 back to the Unicode, and then convert to file 
system charset before calling the file open. 
If this is the approach, then we should Change the NS_NewURI and replace char* 
specStr = spec.ToNewCString(); to char* specStr = spec.ToNewUTF8String(); 

And then in the protocol handler, we change it depend on the protocol. I think 
this is how we did it in the old libnet. For example- we should convert escaped 
UTF8 in imap4 protcol into Unicode and then convert to x-imap4-modified-utf7 
charset for the mailbox in the imap4 protocol handler.

Judson Valeski

Comment 40

•

25 years ago

Two things: I suggested the new Convert() method because the current one does 
the escaping which we don't want. necko is particular about having 
escaped/non-escaped strings handed to it.

Also, remember that necko/protocols do not allow unicode to be passed in, or 
used. I agree that our apis advertise otherwise; which bites. Converting to 
anything other than UTF8 for necko use doesn't make sense.

I think your assertion looks good. And becuase that's a utility function, the 
ToNewUTF8...() is probably a good idea. I assume that won't trash a true char 
array?

I still think the string should be UTF8'ed at the top of the webshel.

Warren Harris

Comment 41

•

25 years ago

About the ToNewCString in NS_NewURI -- didn't we agree that the string passed 
into NewURI is UTF-8? That would make the use of ToNewCString here wrong, 
wouldn't it?

Frank Tang

Assignee

Updated

•

25 years ago

Depends on: 28197

Frank Tang

Assignee

Comment 42

•

25 years ago

>I suggested the new Convert() method because the current one does
>the escaping which we don't want. necko is particular about having 
>escaped/non-escaped strings handed to it.
I understand you need it in an non-escaped form. My point is such thing is 
already exist, just not in nsIUnicodeEncoder and does not make sense to 
duplicate add a new method in nsITextToSubURI. simply call 
nsIUnicodeEncoder::Convert() instead.

>I still think the string should be UTF8'ed at the top of the webshel.
I do agree taht we need to change webshell. However, as long as you keep a 
nsString favor of NS_NewURI, you open a *possibility* for people to pass in non 
ASCII data in nsString and you have to deal with the conversion (in 
ToNewCString, ToNewUTF8String or some other form.) Remember, this code is 
currently ALREADY do conversion- a bad one which WILL damage data. Untill you 
remove the NEED for conversion, you should make the conversion not damage data- 
and I simply suggest you use ToNewUTF8String so the data will be preserved.
Also, the www is MOVING to use UTF8 in URL as default. (see 
http://www.ietf.org/rfc/rfc2718.txt 
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt,  and  
http://www.w3.org/International/O-URL-and-ident.html for details)  Which mean 
all the NEW protcol will be encouraged to use UTF8 in escaped form for URL. 
Replace the ToNewCString to ToNewUTF8String in NS_NewURI will make it forward 
compatabile with new protocol. Of course, I don't believe all the existing 
protocol will migrate to UTF-8 within 5 years..... 

>About the ToNewCString in NS_NewURI -- didn't we agree that the string passed 
>into NewURI is UTF-8? That would make the use of ToNewCString here wrong, 
>wouldn't it?
I think you are confused between
 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
and
NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)

I think you probably agree the const char* version is passed in AS UTF8 in char* 
(there are no way you can pass in anything other than UCS2 / PRUnichar in 
nsString) If that is your agreement, then the current version of the nsString 
favor of NS_NewURI is not compliant to your decision because it does not convert 
to UTF8 before it call the char* version of NS_NewURI

 41 inline nsresult
 42 NS_NewURI(nsIURI* *result, const char* spec, nsIURI* baseURI = nsnull)
 43 {
 44     nsresult rv;
 45     static NS_DEFINE_CID(kIOServiceCID, NS_IOSERVICE_CID);
 46     NS_WITH_SERVICE(nsIIOService, serv, kIOServiceCID, &rv);
 47     if (NS_FAILED(rv)) return rv;
 48     
 49     rv = serv->NewURI(spec, baseURI, result);
 50     return rv;
 51 }
 52 
 53 inline nsresult
 54 NS_NewURI(nsIURI* *result, const nsString& spec, nsIURI* baseURI = nsnull)
 55 {
 56     // XXX if the string is unicode, GetBuffer() returns null. 
 57     // XXX we need a strategy to deal w/ unicode specs (if there should
 58     // XXX even be such a thing)
 59     char* specStr = spec.ToNewCString(); // this forces a single byte char*
 60     if (specStr == nsnull)
 61         return NS_ERROR_OUT_OF_MEMORY;
 62     nsresult rv = NS_NewURI(result, specStr, baseURI);
 63     nsAllocator::Free(specStr);
 64     return rv;
 65 }
If you assume the spec in line 42 is in UTF8 (as what I read from warren's last 
comment), then you SHOULD convert spec in line 54 into UTF8 before it pass to 
NS_NewURI in line 62. Currently, the ToNewCString() in line 59 does NOT convert 
it to UTF8. It simply truncate the higher 8 bits in the PRUnichar and cast it 
down to char. That conversion will damage any non ISO-8859-1 data and should 
only be called by the case that only ASCII in nsString.

If you guys already have agreement that the char* version of NS_NewURI is 
expecting UTF8, then change line 59 to ToNewUTF8String simply make the code 
implement what you agree upon.
Basically, warren is RIGHT.
add travis to the cc list since he is webshell owner now.

No longer depends on: 28197

Frank Tang

Assignee

Comment 43

•

25 years ago

I fix 28197 and now it is ToNewUTF8String. Now the problem is somehow the 
GetSpec of keyword: URI does not escape the UTF8 and cause some double 
conversion in nsWebShell (get the utf8 back and assign to nsString as Latin 1 
and convert to utf8 again ...) Should we always escape it when we call GetSpec ?

Judson Valeski

Comment 44

•

25 years ago

cc'ing andreas for some escaping input.

Frank Tang

Assignee

Comment 45

•

25 years ago

As valeski suggest, I change the keyword protocol handler to use the latest url 
parser instead of the simple URI one. Here is the patch
(file internal to netscape - http://warp/u/ftang/tmp/keyworddiff.txt)
? keyword/src/nsKeywordProtocolHandler.sbr
? keyword/src/nsKeywordModule.sbr
Index: keyword/src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: 
/m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp        2000/01/12 22:53:12     1.12
--- nsKeywordProtocolHandler.cpp        2000/02/22 00:25:08
***************
*** 30,36 ****
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kSimpleURICID,     NS_SIMPLEURI_CID);
  static NS_DEFINE_CID(kIOServiceCID,     NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
  
--- 30,37 ----
  #include "nsIPref.h"
  #include "nsXPIDLString.h"
  
! static NS_DEFINE_CID(kStandardUrlCID, NS_STANDARDURL_CID);
! static NS_DEFINE_CID(kAuthUrlParserCID, NS_AUTHORITYURLPARSER_CID);
  static NS_DEFINE_CID(kIOServiceCID,     NS_IOSERVICE_CID);
  static NS_DEFINE_CID(kPrefServiceCID, NS_PREF_CID);
  
***************
*** 100,124 ****
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = aSpec[0], two = aSpec[1];
      if (one == '?') {                            // "?blah"
!         query = aSpec+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (aSpec[12] == ' ') ) {      //
  
!         query = aSpec+3;
      } else {
!         query = aSpec;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
      // encode
      char * encQuery = nsEscape(query.GetBuffer(), url_Path);
      if (!encQuery) return nsnull;
      query = encQuery;
      nsAllocator::Free(encQuery);
--- 101,133 ----
      // build up a request to the keyword server.
      nsCAutoString query;
  
+    
+     char *unescaped = nsCRT::strdup(aSpec);
+     if(unescaped == nsnull)
+       return nsnull;
+     nsUnescape(unescaped);
+     char *pt = unescaped +1;
+ 
      // pull out the "go" action word, or '?', if any
!     char one = pt[0], two = pt[1];
      if (one == '?') {                            // "?blah"
!         query = pt+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (pt[12] == ' ') ) {      //
  
!         query = pt+3;
      } else {
!         query = pt;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
      // encode
      char * encQuery = nsEscape(query.GetBuffer(), url_Path);
+     nsAllocator::Free(unescaped);
      if (!encQuery) return nsnull;
      query = encQuery;
      nsAllocator::Free(encQuery);
***************
*** 133,151 ****
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                                nsIURI **result) {
!     nsresult rv;
!     nsIURI* uri;
  
!     rv = nsComponentManager::CreateInstance(kSimpleURICID, nsnull, 
NS_GET_IID(nsIURI), (void**)&uri);
!     if (NS_FAILED(rv)) return rv;
!     rv = uri->SetSpec((char*)aSpec);
      if (NS_FAILED(rv)) return rv;
! 
!     *result = uri;
      return rv;
  }
- 
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,
--- 142,179 ----
  // digests a spec of the form "keyword:blah"
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewURI(const char *aSpec, nsIURI *aBaseURI,
!                       nsIURI **result)
! {
!     nsresult rv = NS_OK;
  
!     nsCOMPtr<nsIURI> url;
!     nsCOMPtr<nsIURLParser> urlparser; 
!     if (aBaseURI)
!     {
!         rv = aBaseURI->Clone(getter_AddRefs(url));
!         if (NS_FAILED(rv)) return rv;
!         rv = url->SetRelativePath(aSpec);
!     }
!     else
!     {
!         rv = nsComponentManager::CreateInstance(kAuthUrlParserCID, 
!                                     nsnull, NS_GET_IID(nsIURLParser),
!                                     getter_AddRefs(urlparser));
!         if (NS_FAILED(rv)) return rv;
!         rv = nsComponentManager::CreateInstance(kStandardUrlCID, 
!                                     nsnull, NS_GET_IID(nsIURI),
!                                     getter_AddRefs(url));
!         if (NS_FAILED(rv)) return rv;
! 
!         rv = url->SetURLParser(urlparser);
!         if (NS_FAILED(rv)) return rv;
!         rv = url->SetSpec((char*)aSpec);
!     }
      if (NS_FAILED(rv)) return rv;
!     *result = url.get();
!     NS_ADDREF(*result);
      return rv;
  }
  NS_IMETHODIMP
  nsKeywordProtocolHandler::NewChannel(const char* verb, nsIURI* uri,
                                       nsILoadGroup* aLoadGroup,


After I change the implementation of NewURI as the HTTP one, I find out I also 
need to change the Mangle function . 
This will fix the international problem. valeski- could you please code review 
it ?

Whiteboard: [PDT+] → [PDT+]need code review.

Warren Harris

Comment 46

•

25 years ago

I disagree with this fix. Keyword URLs should be simple URLs and not standard 
URLs. Why are you doing this?

Frank Tang

Assignee

Comment 47

•

25 years ago

If we don't do this, then the GetSpec won't return URL in escaped form and may 
contains char* in UTF8. And the nsWebShell will assign that char* to nsString in 
nsWebShell.
Why this should be simple URLs ? Any reason ?
Should GetSpec always return url in escped form ? Andrea's message that valeski 
point to me say GetSpec always return esacpe form but the current (without my 
patch) implementation does not reflect that.

Warren Harris

Comment 48

•

25 years ago

> Why this should be simple URLs ? Any reason ?

Because keywords don't have the syntax protocol://user@host:port/path/file.ext 
They're just keyword:randomtext

> Should GetSpec always return url in escped form ? 

For nsStdURL it should -- but that's an aspect of the syntax that's being 
defined, not SetSpec in general (since some syntaxes have no notion of 
escaping). 

If you need escaping in keyword: urls, I suggest inventing something new or 
handling escaping outside of the use of nsSimpleURI, but don't use nsStdURL for 
keywords. You'll get all sorts of unwanted side effects.

bobj

Comment 49

•

25 years ago

I'm not sure how this code plugs together, but here is an example of what
needs to work.

A non-ASCII keyword:     keyword:français

will need to be escaped (%-encoded UTF8) when it converts to the HTTP url:

     http://keyword.netscape.com/keyword/fran%C3%A7ais

Warren Harris

Comment 50

•

25 years ago

That escaping should happen in the keyword protocol handler at the point it 
constructs the http: url from the keyword: url. It should not happen in the 
keyword: url (the instance of nsSimpleURI). Jud - can you make this change.

The conversion to utf8 should be done by the webshell.

Judson Valeski

Comment 51

•

25 years ago

sure. is this escaping done using nsEscape() or using some i18n escaping api?

Judson Valeski

Comment 52

•

25 years ago

ugh. what a mess. warren, the issue is that the url returned from 
keywordHandler;:NewURI() (which returns something a url of the form 
"keyword:sometext") can cause string corruption (because it doesn't escape the 
spec).

Later in ::NewChannel() we call GetSpec() on the URI passed in (which was 
previously created using keywordhnadler::newuri()). That GetSpec() call will 
return the corrupted spec.

I'm seeing Frank's point here. the escaping/unescaping policies for simple url 
and std url need to be the same. For now we can either pass in an escaped string 
when creating a keyword url, or have keyword urls be stdurls.

Frank Tang

Assignee

Comment 53

•

25 years ago

bobj/warren- what bobj said is NOT the remaining problem. The UCS2 to UTF8 
conversion part is already fixed last Friday when we change the ToNewCString to 
ToNewUTF8String. The remaining problem I try to fix is related to the keyword: 
uri and history code (including switch default charset to UTF-8 since the 
keyword search result page return in UTF8 without meta tag or HTTP charset 
parameter. ) 

What happen is this
1. the keyword uri convert to the http uri correctly and the page load correctly
2. The webshell call GetSpec() of that uri 
3. The GetSpec() return escaped form in all the protocol except "keyword:", 
which return unescaped utf8
4. webshell passed that return value to history code, which aceept char* (not 
nsString nor char*)
5. The history code convert the char* (which not in UTF8) to nsString by apply 
ISO-8859-1 to Unicode conversion rule (by default)
6. When we try to load again, the NS_NewURI convert the nsString into char* by 
apply Unicode to UTF8 rule. 

The problem is that the webshell/history code currently assume the GetSpec 
return in escaped form, which reuqire no special treatment for non ASCII. That 
is true for all protocol except "keyword:".

What happen is now the first time you type keyword: it will work, but if you 
hit Back/Forward or change encoding (which is very common for now since the 
keyword search page is not return in UTF8 without meta tag nor http header) then 
the 2nd and so on loading will be wrong, each time it will apply a ISO-8859-1 to 
Unicode to UTF8 conversion which will cause garbage in the 2nd and later 
loading. 

From the i18n problem point of view, I really don't care keyword uri is standard 
or simple. I do care the GetSpec from all different URI return in escape or in 
unescape.

Frank Tang

Assignee

Comment 54

•

25 years ago

I agree w/ warren and valeski now, I won't change the keyword uri from simple 
url to standard url now. Howerver, I do need to make the Manguling rounting 
unescaped first since it looking at ? and space.

New patch is in here (Notice the converFileToURL is for bug28171) 

http://warp/u/ftang/tmp/webshelldiff1.txt

Index: nsWebShell.cpp
===================================================================
RCS file: /m/pub/mozilla/webshell/src/nsWebShell.cpp,v
retrieving revision 1.396
diff -c -r1.396 nsWebShell.cpp
*** nsWebShell.cpp      2000/02/19 02:54:32     1.396
--- nsWebShell.cpp      2000/02/22 05:10:51
***************
*** 87,92 ****
--- 87,93 ----
  #include "nsIDocShellTreeOwner.h"
  #include "nsCURILoader.h"
  #include "nsIDOMWindow.h"
+ #include "nsEscape.h"
  
  #include "nsIHTTPChannel.h" // add this to the ick include list...we need it 
to QI for post data interface
  #include "nsHTTPEnums.h"
***************
*** 1375,1419 ****
  
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   char szFile[1000];
!   aIn.ToCString(szFile, sizeof(szFile));
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if (PL_strchr(szFile, '\\')) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   if (*(szFile) == '/' || *(szFile) == '\\') {
  #else
    if (0) {  
    // Do nothing (All others for now) 
  #endif
  #endif
-     PRInt32 len = strlen(szFile);
-     PRInt32 sum = len + sizeof(FILE_PROTOCOL);
-     char* lpszFileURL = (char *)PR_Malloc(sum + 1);
  
  #ifdef XP_PC
      // Translate '\' to '/'
!     for (PRInt32 i = 0; i < len; i++) {
!       if (szFile[i] == '\\') {
!         szFile[i] = '/';
!       }
!       if (szFile[i] == ':') {
!         szFile[i] = '|';
!       }
!     }
  #endif
  
      // Build the file URL
!     PR_snprintf(lpszFileURL, sum, "%s%s", FILE_PROTOCOL, szFile);
!     aOut = lpszFileURL;
!     PR_Free((void *)lpszFileURL);
    }
-   else
-   {
-     aOut = aIn;
-   }
  }
  
  NS_IMETHODIMP
--- 1376,1406 ----
  
  static void convertFileToURL(const nsString &aIn, nsString &aOut)
  {
!   aOut = aIn;
  #ifdef XP_PC
    // Check for \ in the url-string (PC)
!   if(kNotFound != aIn.FindChar(PRUnichar('\\'))) {
  #else
  #if XP_UNIX
    // Check if it starts with / or \ (UNIX)
!   const PRUnichar * up = aIn.GetUnicode();
!   if((PRUnichar('/') == *up) ||
!      (PRUnichar('\\') == *up)) {
  #else
    if (0) {  
    // Do nothing (All others for now) 
  #endif
  #endif
  
  #ifdef XP_PC
      // Translate '\' to '/'
!     aOut.ReplaceChar(PRUnichar('\\'), PRUnichar('/'));
!     aOut.ReplaceChar(PRUnichar(':'), PRUnichar('|'));
  #endif
  
      // Build the file URL
!     aOut.Insert(FILE_PROTOCOL,0);
    }
  }
  
  NS_IMETHODIMP
***************
*** 2004,2009 ****
--- 1991,2001 ----
  
      // see if we've got a file url.
      convertFileToURL(urlStr, urlSpec);
+ 
+     // XXX
+     // for file url, we need to convert the nsString to the file system
+     // charset before we pass to NS_NewURI
+ 
      rv = NS_NewURI(getter_AddRefs(uri), urlSpec, nsnull);
      if (NS_FAILED(rv)) {
        NS_ASSERTION(mPrefs, "the webshell's pref service wasn't initialized");
***************
*** 2014,2019 ****
--- 2006,2012 ----
        PRInt32 qMarkLoc = -1, spaceLoc = -1;
  
        if (keywordsEnabled) {
+           rv = NS_ERROR_FAILURE;
            // These are keyword formatted strings
            // "what is mozilla"
            // "what is mozilla?"
***************
*** 2040,2057 ****
                }
  
                if (keyword) {
!                   nsAutoString keywordSpec("keyword:");
!                   keywordSpec.Append(aURLSpec);
!                   rv = NS_NewURI(getter_AddRefs(uri), keywordSpec, nsnull);
!               } else {
!                   rv = NS_ERROR_FAILURE;
!               }
!           } else {
!               rv = NS_ERROR_FAILURE;
!           }
!       } else {
!           rv = NS_ERROR_FAILURE;
!       }
  
        PRInt32 colon = -1;
        if (NS_FAILED(rv)) {
--- 2033,2053 ----
                }
  
                if (keyword) {
!                   nsCAutoString keywordSpec("keyword:");
!                   char *utf8Spec = urlStr.ToNewUTF8String();
!                   if(utf8Spec) {
!                       char* escapedUTF8Spec = nsEscape(utf8Spec, url_Path);
!                       if(escapedUTF8Spec) {
!                           keywordSpec.Append(escapedUTF8Spec);
!                           rv = NS_NewURI(getter_AddRefs(uri), 
keywordSpec.GetBuffer(), nsnull);
!                           nsAllocator::Free(escapedUTF8Spec);
!                       } // escapedUTF8Spec
!                       nsAllocator::Free(utf8Spec);
!                   } // utf8Spec
!               } // keyword 
!           } // FindChar
!       } // keywordEnable
!    
  
        PRInt32 colon = -1;
        if (NS_FAILED(rv)) {
***************
*** 2118,2124 ****
  
     rv = uri->GetSpec(getter_Copies(spec));
     if (NS_FAILED(rv)) return rv;
!   
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell>  root;
    nsCOMPtr<nsISessionHistory> shist;
--- 2114,2120 ----
  
     rv = uri->GetSpec(getter_Copies(spec));
     if (NS_FAILED(rv)) return rv;
! 
    // Get hold of Root webshell
    nsCOMPtr<nsIWebShell>  root;
    nsCOMPtr<nsISessionHistory> shist;

http://warp/u/ftang/tmp/keyworddiff2.txt
? src/nsKeywordProtocolHandler.sbr
? src/nsKeywordModule.sbr
Index: src/nsKeywordProtocolHandler.cpp
===================================================================
RCS file: 
/m/pub/mozilla/netwerk/protocol/keyword/src/nsKeywordProtocolHandler.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsKeywordProtocolHandler.cpp
*** nsKeywordProtocolHandler.cpp        2000/01/12 22:53:12     1.12
--- nsKeywordProtocolHandler.cpp        2000/02/22 05:11:54
***************
*** 97,119 ****
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = aSpec[0], two = aSpec[1];
      if (one == '?') {                            // "?blah"
!         query = aSpec+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (aSpec[12] == ' ') ) {      //
  
!         query = aSpec+3;
      } else {
!         query = aSpec;
      }
  
      query.Trim(" "); // pull leading/trailing spaces.
  
--- 97,127 ----
  // digests a spec _without_ the preceeding "keyword:" scheme.
  static char *
  MangleKeywordIntoHTTPURL(const char *aSpec, const char *aHTTPURL) {
+     char * unescaped = nsCRT::strdup(aSpec);
+     if(unescaped == nsnull) 
+        return nsnull;
+ 
+     nsUnescape(unescaped);
+ 
      // build up a request to the keyword server.
      nsCAutoString query;
  
      // pull out the "go" action word, or '?', if any
!     char one = unescaped[0], two = unescaped[1];
      if (one == '?') {                            // "?blah"
!         query = unescaped+1;
      } else if ( (one == 'g' || one == 'G')       //
                              &&                   //
                  (two == 'o' || two == 'O')       // "g[G]o[O] blah"
                              &&                   //
!                      (unescaped[12] == ' ') ) {      //
  
!         query = unescaped+3;
      } else {
!         query = unescaped;
      }
+ 
+     nsAllocator::Free(unescaped);
  
      query.Trim(" "); // pull leading/trailing spaces.

Frank Tang

Assignee

Updated

•

25 years ago

Depends on: 28784

bobj

Comment 55

•

25 years ago

Who's reviewing this?  warren?  valeski?
The sooner we get bug fixes checked-in, the more time we've got to make
a stable Beta1!  Thx.

Warren Harris

Comment 56

•

25 years ago

I left a message for Frank to call me about this, but haven't heard from him 
today.

bobj

Comment 57

•

25 years ago

still trying to get a review.  ftang and warren have been playing tag since
2/21 and have not yet discussed ftang's fix...

Warren Harris

Comment 58

•

25 years ago

Somehow my last comment in this bug seems to have gotten lost. I bounced this 
back to Frank because I don't want to put these routines into necko right now. I 
think he should check them in as is.

Warren Harris

Comment 59

•

25 years ago

Ok, sorry, I confused this bug with bug 28784 which is remarkably similar. My 
comments went there. Also see bug 29341 which is how necko should fix this 
problem.

Frank Tang

Assignee

Updated

•

25 years ago

Whiteboard: [PDT+]need code review. → [PDT+]schedule to check in 2/28 night

Frank Tang

Assignee

Updated

•

25 years ago

Target Milestone: M14 → M15

Frank Tang

Assignee

Comment 60

•

25 years ago

I will check this in tonight.

bobj

Comment 61

•

25 years ago

a=bobj

Frank Tang

Assignee

Comment 62

•

25 years ago

fix and check in

Status: ASSIGNED → RESOLVED

Closed: 25 years ago → 25 years ago

Resolution: --- → FIXED

blee

Reporter

Comment 63

•

25 years ago

Verified fixed in Win32 02-29-09-M15 bld.

Status: RESOLVED → VERIFIED