Closed Bug 116711 Opened 23 years ago Closed 14 years ago

Mozilla messes up german umlauts in file:// url

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 2000
defect

Tracking

()

RESOLVED WORKSFORME
mozilla1.0.1

People

(Reporter: mb1611, Assigned: tetsuroy)

References

Details

(Keywords: intl)

Attachments

(1 file, 2 obsolete files)

Mozilla isn't able to load files you open via command line on Windows 2000 if
the filename contains an german umlaut like ä (ä in HTML).
This only happens if you are opening the file via doubleclick in the Explorer.
It seems like Mozilla take the URL on the commandline and then deletes the
umlaut and the character immediatly following. 
_But_ this only happens if you didn't open another Mozilla window before. If one
window is open then all loads ok and Moz doesn't screw up the URL.
My version is Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7)
Gecko/20011221.
Probably not actually a "file:" URL problem.

What are you typing at the command line? Maybewe can move this bug to the right 
component. Also, does it happen in other veresions of Windows?
Well my command-line is the following:

mozilla -url "<url>"
I assume this might not only be depending on the file://-url-type. But I found
it there because when I go online I always fist start Moz and then type in an
URL. Then all is ok. The command-line even works if I already have another
window open. This seems only be related to the first start of Moz.

I don't have another version of Windows for testing (I think one version is
already more then enough of this system *g*). I also didn't try this on Linux,
but this would be possible if you'd like to see this.
-> XP-apps 

Thanks for the additional info. We'll have someone else check the other 
platforms. I think this should be looked at by XP-apps.
Assignee: dougt → law
Component: Networking: File → XP Apps: Cmd-line Features
Keywords: qawanted
QA Contact: benc → sairuh
Reproducable on Win2000 Build 0.9.8 2002020406:

File c:\mäh.html does exist in filesystem
Browser must be closed (even Quickstart)!
Commandline:

mozilla -url c:\mäh.html
mozilla -url "c:\mäh.html"  (both the same result)
- Errormessage: "File /c:/m.html could not be found. Please check the location
and try again"
- The URL-Bar shows: c:\m.html

mozilla -url "file://C:\mäh.html"
- No Errormessage
- URL-Bar: file://C:\m.html

Other case: File mähx.html does NOT exist in filesystem:
mozilla -url "c:\mähx.html"
- Errormessage: "File /c:/mx.html could not be found. Please check the location
and try again" 
- URL-bar: c:\mx.html
- (see the filename mähx.html <-> mx.html!)

Tests on Linux might follow soon.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Damn, the filename I ment was: m&auml;h.html.  &auml; instead of a questionmark.
Nominating and resetting component in order to grovel and beg for help!  This 
seems important.  It probably isn't too hard to fix but I really need help 
recreating the problem.  I don't think I can create files with umlauts on my 
machine, can I?
Assignee: law → yokoyama
Component: XP Apps: Cmd-line Features → Internationalization
Keywords: nsbeta1
QA Contact: sairuh → ruixu
This is a general issue, hope it can be fixed on system level with an unicode 
solution.
Blocks: 101606
Keywords: intl
Seems to work fine under Mac OS 9 using Mac/2002020405 (0.9.8). A file named
"Tëst.html" (&euml;), saved from Composer, when double-clicked is successfully
opened in a Navigator window.

There is, however, a cosmetic issue, and that is that Composer doesn't display
the filename right in the Title bar; it shows [T%91st.html] as the filename, as
though it were a URL rather than a real file name.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
nsbeta1+ per i18n triage
Keywords: nsbeta1nsbeta1+
/nhotta: can your review?
Comment on attachment 71982 [details] [diff] [review]
Convert Commandline URL to Unicode using Platform charset

r=nhotta
Attachment #71982 - Flags: review+
Comment on attachment 71982 [details] [diff] [review]
Convert Commandline URL to Unicode using Platform charset

>Index: Makefile.in
>===================================================================
>RCS file: /cvsroot/mozilla/xpfe/bootstrap/Makefile.in,v
>retrieving revision 1.168
>diff -u -r1.168 Makefile.in
>--- Makefile.in	19 Feb 2002 10:01:03 -0000	1.168
>+++ Makefile.in	1 Mar 2002 00:24:28 -0000
>@@ -49,6 +49,7 @@
> 		  mpfilelocprovider \
> 		  browser \
> 		  docshell \
>+		  uconv \
> 		  xremoteservice \
> 		  $(NULL)

xpfe depending on uconv?  Yurk. You should check with alecf, keeper of all
dependencies, about that.

>+static NS_DEFINE_CID(kCharsetConverterManagerCID, NS_ICHARSETCONVERTERMANAGER_CID);

Even if we do keep this dependency, we want to use the contractID rather
than the class ID.

>+static nsresult ConvertToUnicode(nsString& aCharset, const char* inString, nsAWritableString& outString)
>+{
>+  nsresult rv;
>+  // convert result to unicode
>+  nsCOMPtr<nsICharsetConverterManager> ccm = 
>+           do_GetService(kCharsetConverterManagerCID, &rv);

Prefer initialization to assignment:
nsCOMPtr<nsICharsetConverterManager> ccm(do_GetService(CCM_CONTRACT_ID, 
					 &rv));

>+    nsCOMPtr <nsIUnicodeDecoder> decoder; // this may be cached

Where can it be cached?  Does that caching matter to this code?

>+    rv = ccm->GetUnicodeDecoder(&aCharset, getter_AddRefs(decoder));
>+
>+    if(NS_SUCCEEDED(rv) && decoder) {

Can GetUnicodeDecoder return a success code and provide a NULL decoder?
(I don't know that code well, but it seems unlikely-ish.)  If so, you're
going to return that success code but not actually decode anything.

(How do you feel about returning early, rather than nesting conditional
blocks deeply?

rv = do->Something();
if (NS_FAILED(rv))
    return rv;

rv = do->SomethingElse();
if (NS_FAILED(rv))
    return rv;

etc.  I find it a lot easier to analyze the control flow that way, myself.)

>+      if (NS_SUCCEEDED(rv)) {
>+        PRUnichar *unichars = new PRUnichar [uniLength];
>+        if (nsnull != unichars) {
>+          // convert to unicode
>+          rv = decoder->Convert(inString, &srcLength, unichars, &uniLength);
>+          if (NS_SUCCEEDED(rv)) {
>+            // Pass back the unicode string
>+            outString.Assign(unichars, uniLength);
>+          }
>+          delete [] unichars;
>+        }

I don't expect you to write them, but boy -- would an iterator-based
decoding system ever be nicer.	Do we not have an NS_ConvertXXXtoYYY string
helper that lets you parameterize the character sets?  Again, alecf will
know.

>+        if (NS_FAILED(rv)) {
>+          charSet.Assign(NS_LITERAL_STRING("ISO-8859-1"));  // use ISO-8859-1 in case of any error
>+          NS_ASSERTION(0, "failed to get a platform charset");
>+        }

In this case, we just need to inflate, right?  Do we care about the
conversion overhead when we could 

>+        // convert the cmdLine URL to Unicode
>+        rv = ConvertToUnicode(charSet, urlToLoad, url);
>+        NS_ASSERTION(NS_SUCCEEDED(rv), "failed to convert commandline url to unicode");
>+      }
>       rv = OpenWindow(chromeUrlForTask, url, width, height);

If the conversion fails, perhaps because we don't have a charset converter
for the selected platform charset, what are we going to end up loading here?
Random memory in |url|?

Please consult alecf for string guidance, and resubmit for review when
these issues are addressed.
Attachment #71982 - Flags: needs-work+
Just to note.
Here we have a limitation of the characters we can support as
a commandline parameter.  (eg. Euro symbol in Chinese OS.)
We need to have wmain() in order to support such characters; but
it is outside of the scope of this bug, thus 101606. 
I am also concerned about wmain() support in other platforms.

======== MSVC++ online help ===============
main( int argc, char *argv[ ], char *envp[ ] )
{
  program-statements
}
wmain( int argc, wchar_t *argv[ ], wchar_t *envp[ ] )
{
  program-statements
}

The main function marks the beginning and end of program execution. 
A C or C++ program must have one function named main. 
If your code adheres to the Unicode programming model, 
you can use the wide-character version of main, which is wmain.
========== online help end ==========================
Attached patch revised (obsolete) — Splinter Review
Attachment #71982 - Attachment is obsolete: true
alecf: I need to make xpfe depend on uconv (thus the makefile change) because 
       of this bug which needs to have unicode converter.
       I agree with shaver (#12), I am all open for suggesions. I just can't
       think of better place to put the code.
Keywords: qawanted
Priority: -- → P3
adding darin for url help, though this sounds like a windows integration issue,
so cc'ing law
so now that I've looked over it my feelings are:
1) xpfe depending on uconv - not terrible.. not great but better than the other
way around
2) that whole block of code around line 853 needs to be moved into another
static helper function
3) would it make sense to put this stuff in nsCmdLineService somewhere? either
by changing some of nsICmdLineService's apis to spit back unicode, or by adding
additional methods? (I'd prefer the API change, frankly)
If we can support unicode commandline (see #13), then
we wouldn't need this unicode converter stuff at nsAppRunner level 
and we able to supprt the non-locale characters.

Unfortunately, I'd be very surprised if wmain worked on every platform.. have
you tried the Big 3? 

The more I think about it, the more I think we should be fixing
nsICmdLineService::GetCmdLineValue to return a wstring instead of a string.
this solution looks correct to me.
Summary: Mozilla screws up german umlauts in file:// url → Mozilla messes up german umlauts in file:// url
Well, originally I thought it will be easy to change but
it ended up with more than what we wanted.

I dont' think this patch is feasible for 1.0; because
- not sure if we want to land this kind of interface change at this time:
- ANSI compatible function may not be supported in all the platforms:
  swscanf(..) is ANSI compatible according to MSVC++ help.
  Not sure if it's true.
- not sure of what to do with JavaScript call:
  /xpfe/components/resetPref/nsResetPref.js, line 78 
  -- var prefList = cmdLine.getCmdLineValue( "-resetPref" ).split( "," );
- If we want to go ahead with the changes,then there may have impact 
  on the performance, bacause we need to call NS_ConvertUCS2toUTF8() for
  non-unicode friendly classes/APIs, nsFileSpec, strtok,
nsWindowWatcher::Open()

Given the above concerns, we should reconsider the previous patch. 
http://bugzilla.mozilla.org/attachment.cgi?id=72509&action=view

alecf/darin: what do you think?
Comment on attachment 72696 [details] [diff] [review]
nsICmdLineService::GetCmdLineValue() and GetURLToLoad() to return a wstring instead of a string.

I agree, This probably isn't necessary for mozilla 1.0... that said, I think
the patch is a good place to start... however:

- L"1" is not portable, use NS_LITERAL_STRING("1") and then use .Equals()
- %S is not portable in printf(), use NS_ConvertUCStoUTF8
- don't use nsCString as a local variable, just create a temporary:
OpenWindow(NS_ConvertUCS2toUTF8(cmdResult).get(), ..)
- actually, every place in this patch where you're using NS_CONST_CAST to
convert NS_ConvertUCS2toUTF8 is unnecessary.


while you're there, fix the (const PRUnichar*) casts to use NS_CONST_CAST so we
dont loose our datatype
- you shouldn't have to use swscanf, we can use NS_LossyConvertUCS2toASCII()
and pass to sscanf
- I think we should just convert OpenWindow over to using unicode
- don't use AssignWithConversion (see UILocaleName) - at least use
CopyUCS2toASCII()
- we need a better way of doing the strtok() stuff - strtok() itself is not
portable, and nsCRT::strtok() only works on char*'s as well. I think we should
just manually parse the profile dir ourselves.

Anyway, the general approach looks good!
Attachment #72696 - Flags: needs-work+
alecf: thanks for valuable input.  Can we do the following?
1) Finish th first patch (03/04/02 16:36) and fix the bug before moz 1.0. 
   We leave this bug OPEN and change the targeted Milestone.
2) Once we ship moz 1.0, then we can proceed with the latter approach.
Comment on attachment 72509 [details] [diff] [review]
revised

sure. sr=alecf on this one. i'd prefer to see this happen after 1.0 branches,
just so we're assured we get the right fix on the trunk. How critical is this
that it lands now?
Attachment #72509 - Flags: superreview+
>How critical is this that it lands now?
It has nsbeta1+ keyword after i18n triage.
Comment on attachment 72509 [details] [diff] [review]
revised

/r=nhotta from previous patch
Attachment #72509 - Flags: review+
Comment on attachment 72509 [details] [diff] [review]
revised

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #72509 - Flags: approval+
Patch (03/04/02 16:36) checked in the trunk.
Moving the milestone to moz 1.01 for the second
recommendation (03/05/02 16:58). 
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment on attachment 72509 [details] [diff] [review]
revised

Marking as Invalid for new milestone.
Attachment #72509 - Attachment is obsolete: true
it seems we already fix by using the 1st approach. Should we remove the nsbeta1+
for now so it won't show on the radar ?
It's fixed for now; but need better patch for future.
removing nsbeta1+ 
Keywords: nsbeta1+
(firefox 1.0.1)

Sorry guys, this bug is still there.

Another way to reproduce it on windows, is to double-click the file c:\üü.html
in the windows explorer. 

It results in a firefox alert message box "The file /c:/%FC%FC.html cannot be
found. Please check the location and try again."
in 1.0.2 the bug shows up if Firefox is already open. If Firefox is not running,
the Umlaut File is loaded!
This bug is reproducable in SeaMonkey 1.0.1/Windows 2000, even if another SeaMonkey window is already open.

My test case:
Rename the (valid) HTML file D:\stories\Baen Books\startUp.htm to 
D:\stories\Baen Books\stärtUp.htm.

Doubleclick the file stärtUp.htm i Windows Explorer. The error message
"The file /d:/stories/Baen Books/st%E4rtUp.htm cannot be found. Please check..." appears.    
(In reply to comment #34)
> This bug is reproducable in SeaMonkey 1.0.1/Windows 2000, even if another
> SeaMonkey window is already open.
> 
> My test case:
> Rename the (valid) HTML file D:\stories\Baen Books\startUp.htm to 
> D:\stories\Baen Books\stärtUp.htm.
> 
> Doubleclick the file stärtUp.htm i Windows Explorer. The error message
> "The file /d:/stories/Baen Books/st%E4rtUp.htm cannot be found. Please
> check..." appears.    

Update/Correction:
This bug is ONLY reproducable if another SeaMonkey window is already open. If SeaMonkey was not started before performing the double click, the file is loaded as expected.

Martin, can you try a trunk build of firefox and seamonkey? 
QA Contact: ruixu → i18n
This is wfm with SM trunk on win7.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: