Closed Bug 130393 Opened 23 years ago Closed 23 years ago

Unable to access to the site whose URL having DBCS is typed at location bar.

Categories

(SeaMonkey :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.0

People

(Reporter: hansoodev, Assigned: nhottanscp)

References

Details

(Keywords: intl, topembed)

Attachments

(1 file, 3 obsolete files)

Try to type any DBCS URL in the location bar and enter. You will get "file not found" error. It's b/c mozilla encode/escape URL typed in location bar to UTF-8 b/f sending it to server. For IE, it sends in system charset when "always send URLs as UTF-8" is off, and it sends in UTF-8 encoding when the option is on, which increases the possibility to find the file in the server. According to research done in bugs related to non-ASCII URLs, most modern servers are not ready to receive URLs in UTF-8 encoding, but expects them in their file system charset. Therefore, sending URLs in system charset will increase the possibility of fidning the file wanted, even though it would not work when the server charset does not match with client machine's charset. HK
There is a bug related to UTF-8 option in mozilla.. http://bugzilla.mozilla.org/show_bug.cgi?id=129726 Since this would not do much when the server charset does not match with client machine's charset and this option is off, why don't we do ths way ? We have three new properties. 1) network.standard-url.escape-utf8 --> Boolean 2) network.standard-url.embedded-charset --> String 3) network.standard-url.typed-charset --> String When property (1) is on, we sends all URLs in UTF-8 format, even embbeded URLs in documents. When prperty (1) is off, (2) controls charset used for encoding/escaping embedded URLs in documents. (3) controls charset used for encoding/escaping URLs typed in location bar. Valid values for (2) would be "doccharset", and all encoding available, where default "doccharset" indicates mozilla should use charset of the document which has an embedded URLs. Valid values for (3) would be "syscharset", and all encoding available, where default "syscharset" indicates mozilla should use charset of system. I am not sure what's involved for changes and the performance hits on these changes, but it will guarantee that mozilla is more flexible than IE on this issue. Plus, mozilla will always get files targeted if those options have right values. Just a thought =)
CCing nhotta
Status: UNCONFIRMED → NEW
Ever confirmed: true
Copy keyword from bugscape 12324.
Assignee: asa → nhotta
Keywords: intl, topembed
Target Milestone: --- → mozilla1.0
Roy, could you reveiw?
Status: NEW → ASSIGNED
Comment on attachment 73927 [details] [diff] [review] Use system charset for URL bar if the URI scheme is http, https, ftp or file. some minor nits: >Index: nsDefaultURIFixup.cpp >+ PRBool bAsciiURI = nsCRT::IsAscii(uriString.get()); how about this instead: PRBool bAsciiURI = IsASCII(uriString); >+ if (mFsCharset.IsEmpty()) >+ { >+ nsAutoString charset(NS_LITERAL_STRING("ISO-8859-1")); // set the fallback first. >+ nsCOMPtr<nsIPlatformCharset> plat(do_GetService(NS_PLATFORMCHARSET_CONTRACTID)); >+ if (plat) >+ (void) plat->GetCharset(kPlatformCharsetSel_FileName, charset); >+ mFsCharset.Assign(NS_ConvertUCS2toUTF8(charset)); NS_LossyConvertUCS2toASCII would be better since you know that charsets are really ASCII. and instead of initializing |charset| with a NS_LITERAL_STRING (which calls NS_ConvertASCIItoUCS2 under linux and other platforms), it'd be better to check if charset is empty and if so assign a NS_LITERAL_CSTRING to mFsCharset.
Attached patch Updated with darin's comment. (obsolete) — Splinter Review
Attachment #73927 - Attachment is obsolete: true
>+nsCAutoString nsDefaultURIFixup::mFsCharset; I think it's better to use nsCString instead of nsCAuto. nsCAutoString is meant for short life span and mostly used in stack. nsCString is allocated in heap. >+nsCOMPtr<nsIPlatformCharset>plat(do_GetService(NS_PLATFORMCHARSET_CONTRACTID)); >+if (plat) Instead of checking for _plat_, can we do something like nsresult rv; nsCOMPtr<nsIPlatformCharset> plat(do_GetService(NS_PLATFORMCHARSET_CONTRACTID), &rv); and check for _rv_ > + (void) plat->GetCharset(kPlatformCharsetSel_FileName, charset); Do we need (void) casting? > +const char * nsDefaultURIFixup::GetCharsetForUrlBar(const nsString& aStringURI) Can we use nsAString or nsAFlatString for parameter? I am not even close to knowing all the underlying string staff; but in this case I believe we should use abstract string classe for function parameter instead of using the concrete classe. I believe we have Ready-Only methods for nsAString (ie, FindChar(), Equals(),.. ) alecf and jag can comment here better. My question to alecf/jag: Do we have nsAString::EqualsIgnoreCase()? Current implementation of nsString has EqualsIgnoreCase() and it will call unicharutil for lower case conversion.
Comment on attachment 73952 [details] [diff] [review] Updated with darin's comment. >+nsCAutoString nsDefaultURIFixup::mFsCharset; Well, in this case I think an autostring is fine, because charsets are usually very short (i.e. less than 64 bytes) so this will save us a 2nd heap allocation >+ PRBool bAsciiURI = IsASCII(uriString); what's "IsASCII()"? I don't see it defined anywhere... sure you don't mean nsCRT::IsASCII? I'm kind of baffled by the number of conversion back and forth between UCS2 and ASCII for the charset.. (see my comments below, regarding this) > // Just try to create an URL out of it >- NS_NewURI(aURI, uriString, nsnull); >+ NS_NewURI(aURI, uriString, bAsciiURI ? nsnull : GetCharsetForUrlBar(uriString)); Huh. Now I'm confused. How do we get a unicode string that's encoded in a certain charset? I kind of dont' get that because if we have a PRUnichar* buffer, shouldn't it already be in UCS2? It kind of looks like we should be looking at the caller of CreateFixupURI to make sure they are passing in a true unicode string, rather than forcing us to do a wierd conversion... Assuming there's a reasonable explanation, I'll continue with the review: > >- nsresult rv = NS_NewURI(aURI, uriString, nsnull); >+ nsresult rv = NS_NewURI(aURI, uriString, bAsciiURI ? nsnull : GetCharsetForUrlBar(uriString)); > > >+const char * nsDefaultURIFixup::GetFileSystemCharset() >+{ since this always returns mFsCharset, we should make this return nsAString& (due to the comments I explain below) >+} >+ >+const char * nsDefaultURIFixup::GetCharsetForUrlBar(const nsString& aStringURI) >+{ >+ if (aStringURI.FindChar(':') == -1 || not -1, use kNotFound. >+ aStringURI.EqualsIgnoreCase("http:", 5) || >+ aStringURI.EqualsIgnoreCase("https:", 6) || >+ aStringURI.EqualsIgnoreCase("ftp:", 4) || >+ aStringURI.EqualsIgnoreCase("file:", 5)) >+ { Why only for these types? what about other protocols that might be installed? >+ // for file url, we need to convert the nsString to the file system >+ // charset before we pass to NS_NewURI >+ nsAutoString fsCharset(NS_ConvertUTF8toUCS2(GetFileSystemCharset())); If you change from const char* to nsAString&, You can avoid the double-conversion (since GetFileSystemCharset itself does one conversion, and this line converts it back. >Index: nsDefaultURIFixup.h >+ static nsCAutoString mFsCharset; Bah! This should not be static. No static classes allowed (see the C++ coding guidelines on www.mozilla.org) Is this a service? If so, a simple member variable will suffice. Anyway, the char/PRUnichar conversions in this patch are a bit of a mess..
Attachment #73952 - Flags: needs-work+
>what's "IsASCII()"? I don't see it defined anywhere... sure you don't mean >nsCRT::IsASCII? nsReadableUtils.h >+ NS_NewURI(aURI, uriString, bAsciiURI ? nsnull : > GetCharsetForUrlBar(uriString)); uriString is always in Unicode, charset here is not for the current charset encoding of uriString, it will be used later to encode the URI. >Why only for these types? what about other protocols that might be installed? The default is to use UTF-8 to encode URI. I listed protocols which we prefer system charset instead of UTF-8. E.g., UTF-8 is preffered for ldap, javascript instead of system charset. >+ nsAutoString fsCharset(NS_ConvertUTF8toUCS2(GetFileSystemCharset())); >If you change from const char* to nsAString&, You can avoid the >double-conversion (since GetFileSystemCharset itself does one conversion, >and this line converts it back. I use char* because NS_NewURI takes char* which is called more frequently than this part of the code which is file URI specific.
character set names are defined using ASCII characters, so we shouldn't be using unicode to represent charsets anywhere. unfortunately, so much of our code already uses unicode for charsets :-( in this case, i'm not sure what the best solution is... but, fwiw: you can get a unicode encoder/decoder from an ASCII charset using nsICharsetConverterManager2::GetCharsetAtom2.
From comment #8, > +const char * nsDefaultURIFixup::GetCharsetForUrlBar(const nsString& aStringURI) >Can we use nsAString or nsAFlatString for parameter? Need suggestion. Do we want to do that?
Comment on attachment 73952 [details] [diff] [review] Updated with darin's comment. oops, right. I forgot the use of the charset parameter in NS_NewURI (darin explained it to me a while back.. whoops!) I guess my concern with the whole character conversion is that in the case of this line: + nsAutoString fsCharset(NS_ConvertUTF8toUCS2(GetFileSystemCharset())); you actually are converting it twice - once inside of GetFileSystemCharset() (from UCS2 to ASCII) and once with NS_ConvertUTF8toUCS2...(ASCII back to UCS2) not to mention that if charsets are really ASCII then we should be using NS_ConvertASCIItoUCS2 So I guess that's why I'm complaining - I know that charset names are always ascii, but it sounds like it would be faster to store UCS2 for the existing APIs... I guess going through GetCharsetAtom2 might be just as slow :)
oh, and yes, you want to use nsAString, not nsString, as your parameter, if possible.
should necko have used UCS2 for charsets just to be consistent, or is there a chance at all that i can coerce the rest of the tree into using ASCII for charset strings??? :-)
>if charsets are really ASCII then we should be using NS_ConvertASCIItoUCS2 I will make the change. About nsString ->nsAString change, I tried something and I found that I need to link unicharutil in order to use nsCaseInsensitiveStringComparator (cannot simply done by adding REQUIRES = unicharutil). Alec, do you know anything about that?
no way, I think necko does the right thing. I would so love to see the whole tree change.. maybe after 1.0? :) Its just in this case, it seems like storing UCS2 makes sense. I dunno.
you need to #include nsIUnicharUtils.h to get that. Ugh. what a mess :)
>you need to #include nsIUnicharUtils.h to get that. But I am getting a link error. Do I need to do a static link to unicharutil for docshell?
Attachment #73952 - Attachment is obsolete: true
Roy, could you review?
Comment on attachment 73997 [details] [diff] [review] Changed to no argument for GetCharsetForUrlBar, removed static for mFsCharset. + if (charset.IsEmpty()) + mFsCharset.Assign(NS_LITERAL_CSTRING("ISO-8859-1")); Instead of hardcoding the charset, can we read from nsPref's default charset "intl.charset.default" and assign it here? However, if we need to static link nsPref, then forget it.
The charset here is for the system charset and has different meaning from "intl.charset.default". Does it make sense to use it as a substitution?
Comment on attachment 73997 [details] [diff] [review] Changed to no argument for GetCharsetForUrlBar, removed static for mFsCharset. I thought the "intl.charset.default" Pref is used when every charset detections are failed. Granted that this setting may be used for Document encoding; but the wordings in the Pref dlgbox says Default Character Encoding for _Navigator_, not for _Document_ alone. My point is who decides the fallback encoding? Developers? "ISO-8859-1" is just an another encoding. Why do we treat it any differently from other encodings? Well, I don't want to fight over this. Either way is fine with me and give /r=yokoyama
Attachment #73997 - Flags: review+
In this case, the hard coded string is used either nsIPlatformCharset is not available or it returns an error. I think writing the fallback to use the pref value everytime using nsIPlatformCharse is too match. It could be done in nsIPlatformCharse but note that nsIPlatformCharset also has its own fallback to use the hard coded string. > My point is who decides the fallback encoding? Developers? > "ISO-8859-1" is just an another encoding. Why do we treat it any differently > from other encodings? I think that depends on the situation. For the case like the service error, we can use ISO-8859-1 then we can expect the consistent result (even if the result is not correct).
Darin, could you review?
Comment on attachment 73997 [details] [diff] [review] Changed to no argument for GetCharsetForUrlBar, removed static for mFsCharset. >Index: nsDefaultURIFixup.cpp >+const char * nsDefaultURIFixup::GetCharsetForUrlBar() >+{ >+ const char *charset = GetFileSystemCharset(); >+#ifdef XP_MAC >+ if (charset[0] == 'x' && charset[2] == 'm') what if GetFileSystemCharset returns an empty string? charset[2] might result in a crash. >+ // for file url, we need to convert the nsString to the file system >+ // charset before we pass to NS_NewURI >+ nsAutoString fsCharset(NS_ConvertASCIItoUCS2(GetFileSystemCharset())); NS_ConvertASCIItoUCS2 is actually a class, so write this instead: NS_ConvertASCIItoUCS2 fsCharset(GetFileSystemCharset());
Attachment #73997 - Flags: needs-work+
Attachment #73997 - Attachment is obsolete: true
Comment on attachment 74128 [details] [diff] [review] Updated with darin's comment. sr=darin.... i don't much like seeing nsCAutoString used as a member variable, cuz it bloats the class by 64 bytes, but i guess you're doing that to eliminate a heap alloc. i'm not sure which is better... one larger than necessary heap alloc or two small heap allocs.
Attachment #74128 - Flags: superreview+
Comment on attachment 74128 [details] [diff] [review] Updated with darin's comment. a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #74128 - Flags: approval+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey
Blocks: 393246
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: