Open Bug 267980 Opened 20 years ago Updated 2 years ago

URL with german characters not parsed correctly when protocol file:/ missing

Categories

(Core :: Internationalization, defect)

defect

Tracking

()

People

(Reporter: christophe_cornu, Unassigned)

References

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031128
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031128

Under a Linux Box (I am using RH9, german language set - LANG is de_DE.UTF-8)

Create a folder with german characters äö as in below. Add any html file into it.
/chrisx/äö/test.html

Open Mozilla. Paste the following URL (that's an absolute path)
file:///chrisx/äö/test.html
It works.

Paste the following URL:
/chrisx/äö/test.html
Mozilla brings up a dialog: 
The file /chrisx/%E4%F6/test.html cannot be found. Please check the location and
try again.

Verified with Mozilla 1.4 and Mozilla 1.7.2
It looks like when going through the code that 'guesses' the protocol that part
of that code is not handling UTF8 correctly?


Reproducible: Always
Steps to Reproduce:
See above

Actual Results:  
 

Expected Results:  
Should be the same with or without file:/
> /chrisx/%E4%F6/test.html

0xE4 and 0xF6 are the representation of "äö" in ISO-8859-1. I tried the
following cases under ko_KR.UTF-8 locale:

/home/jshin/가각        (U+AC00 U+AC01)
/home/jshin/가각/abc
/home/jshin/ġĢ   (U+0121 U+0122)
/home/jshin/αβ   (U+03B1 U+03B2)
/home/jshin/äö
/home/jshin/äö/abc 

The first four cases (Korean, Latin letters above U+0100, Greek letters) work
fine while the last two (letters in [U+00A0, U+00FF]) don't work as reported
here. This is very interesting because I expected all six cases to fail or to
succeed together. It seems like somewhere we use 'if (uniChar < 0x100)' where we
should use 'if (uniChar < 0x80)'. 

We have to test on Mac OS X and Windows to see if it's platform-specific or XP. 
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
per code inspection, looks like it's win/os2/unix only (this includes mac)

http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDefaultURIFixup.cpp#461

521         if (PossiblyByteExpandedFileName(in)) {
522           // removes high byte
523           rv = NS_NewNativeLocalFile(NS_LossyConvertUCS2toASCII(in),
PR_FALSE, getter_AddRefs(filePath));
524         }
525         else {
526           // input is unicode
527           rv = NS_NewLocalFile(in, PR_FALSE, getter_AddRefs(filePath));
528         }

PossiblyByteExpandedFilename returns true for any non-ascii latin1 character...


a better check would be to always try to interpret it as valid UTF-16 and check
if the file exists. if it does, use that.

otherwise, fall back to the LossyConvertUCS2 codepath.
Attached patch patchSplinter Review
I made a patch following cbie's suggestion. I'm wondering if we still need that
fallback, though.
Assignee: smontagu → jshin
Status: NEW → ASSIGNED
Comment on attachment 164851 [details] [diff] [review]
patch

asking for r/sr
Attachment #164851 - Flags: superreview?(darin)
Attachment #164851 - Flags: review?(bzbarsky)
1)  We have existing bugs on removing that docshell code, as I recall.  All this
    mess is a workaround for the fact that our command line code is buggy.  My
    questions as to whether this is still a problem in the relevant bugs went
    rather unanswered...  jshin, could you look that stuff up, please?

2)  If there is still an issue, I'd MUCH rather fix the command line code than
    hack this code yet more.
On Linux, I tested under ll_RR.UTF-8, ll_RR.ISO-8859-1 and ko_KR.EUC-KR. WIth
the fallback code removed, it just worked, which is a bit unexpected (see the
second paragraph from this) in a sense. When I added a couple of debug
statements, it became a bit clearer as to why. When I invoked mozilla with
'/full/path' on the command line, |ConvertFileToStringURI| is not invoked at
all.  I haven't checked where it's 'fixed up'. Unless there's a
platform-dependency (other than '\' and vs '/'), which I don't think is the case
, that long comment and fallback code can be removed.


As for nsICmdLine* implementations, I don't see anything wrong (after a cursory
look) [1]. Well,  I don't like very much the fact that nsICmdLineService APIs
return 'string' in the native encoding (and are still declared as 'scriptable').
However, that may be inevitable to avoid the round-trip between the native
encoding and UTF-16. Perhaps, documenting clearly that those APIs return values
in the native encoding would be necessary.

What's suspicious is their consumers (especially nsAppRunner [2]) that apply
'*WithConversion' and friends to what's returned by nsICmdLine as if they're
ASCII-only. 

[1]
I read comments on  bug 58866 and bug 86948, but it seems to me that the problem
in nsICmdLineService implementation that made necessary the hack in the current
code  has  gone. 

[2]
http://lxr.mozilla.org/seamonkey/source/xpfe/bootstrap/nsAppRunner.cpp#656


 
 
Attachment #164851 - Flags: superreview?(darin)
Attachment #164851 - Flags: review?(bzbarsky)
This code does the right thing (converting from the native encoding to Unicode): 
http://lxr.mozilla.org/seamonkey/source/xpfe/components/startup/src/nsAppStartup.cpp#850

This code can be simplified using NS_CopyNativeToUnicode as is done for
nsProfile.cpp in bug 268266.
An alternative is to change nsICmdLineService::URLtoLoad as talked about in bug
87127.

Anyway, the fallback code can be safely removed.




(In reply to comment #6)
> What's suspicious is their consumers (especially nsAppRunner [2]) that apply
> '*WithConversion' and friends to what's returned by nsICmdLine as if they're
> ASCII-only. 

Right.  This code used to be called by such a codepath, hence the stuff about
"byte-inflated" and so forth...

> I read comments on  bug 58866 and bug 86948, but it seems to me that the
> problem in nsICmdLineService implementation that made necessary the hack in
> the current code has gone. 

That's what it seems like to me to, but as I said in comment 58866 someone with
a relevant Windows system would have to test removal of the fallback code....

In particular testing would need to be done on both initial startup and DDE
invokations.  Similar testing should be done for xremote.

(Don't misunderstand me, as the comments in bug 87127 and bug 58866 indicate I
would love to remove this code... if you're sure it's ok to remove, I'll gladly
review the patch to do so.)
Blocks: 255344
Thanks for the explanation. I was also looking at XRemote code (as you
suggested). I can test this on Windows, but I'm not familiar with DDE (and don't
know how to test it. I'll google it :-)). 

Overall, I guess we have to put off removing the fallback code until removing it
is demonstrated to be safe on all three tier-1 platforms. In the meantime, do
you think it's worth fixing this bug only on branches using attachment 164851 [details] [diff] [review]?
This is rather minor so that we can just release-note it if necessary. 
  
(In reply to comment #9)
> Overall, I guess we have to put off removing the fallback code until removing
> it is demonstrated to be safe on all three tier-1 platforms.

We should just aim to do that, and remove the code.  CCing Javier in the hope
that he can help out on Mac.

> In the meantime, do you think it's worth fixing this bug only on branches
> using attachment 164851 [details] [diff] [review]?

I think pretty much any changes to this code are dangerous... if you're very
sure the patch is safe, I guess we can try to put it on branches, but it won't
make Firefox 1.0 anyway, so I'm not sure whether we care.
Tested the patch on the Mac, and it fixes the problem.
Thanks for testing. However, the patch still has the fallback code and I'm sure
that it's safe. What we're talking about is removing the following 

+        PRBool exist = PR_FALSE;
+        if (NS_SUCCEEDED(rv)) {
+          filePath->Exists(&exist);
+        }
+        if (NS_FAILED(rv) || !exist) {
+          // try deflating assuming it's byte-expanded 
+          rv = NS_NewNativeLocalFile(NS_LossyConvertUTF16toASCII(in),
+                                     PR_FALSE, getter_AddRefs(filePath));
+        }
Yes, still works with that bit of code removed.

What's odd is that when I first open the file with non-ASCII characters, it gets
escaped in the URL bar.  If I go to the URL bar and remove "file://" from the
escaped file name, it will fail to load, complaining that it cannot find the
file.  However, Mozilla opens the file just fine if I manually enter the file
name (non-escaped).
(In reply to comment #13)
> Yes, still works with that bit of code removed.

Thanks again for testing. My Mac is back up again (after a mysterious
post-10.3.6-upgrade syndrom :-)), I can test it, too. Can you tell me what's Mac
OS X equivalent of Xremote (X11) and DDE (Windows)? That is, how can I pass a
url to a running instance of mozilla? '-remote' option doesn't work on OS X.
 
> What's odd is that when I first open the file with non-ASCII characters, it gets
> escaped in the URL bar.  If I go to the URL bar and remove "file://" from the
> escaped file name, it will fail to load, complaining that it cannot find the
> file.  However, Mozilla opens the file just fine if I manually enter the file
> name (non-escaped).

That's just normal. Without 'file://', '/some/path/%41%42' is interpreted
literally rather than 'file:///some/path/AB'. Unless you have a file '%41%42' in
'/some/path', it should fail. 


OS: Linux → All
Hardware: PC → All
(In reply to comment #14)
> Can you tell me what's Mac
> OS X equivalent of Xremote (X11) and DDE (Windows)? That is, how can I pass a
> url to a running instance of mozilla?

You know, I'm not quite sure.  I think it just uses LaunchServices API to handle
this.  So if you set your browser as the default, and click on an HTML link in
another app (Mail for example), then it will open it up in the browser.

Ooh, actually, you can use the 'open' command from the command line.  Just do
"open -a .../MozillaDebug.app test.html".
Thanks, Javier. I'll play with it.  Adding Masayuki to Cc for help with testing
(see comment #12) DDE on Windows. (I can test it on Windows, but I rarely boot
up  Windows)


On DDE, it doesn't work.
I try to test with 'file://D:/wwwtest/テスト/test.html'.
But Mozilla read 'file://D:/wwwtest/eXg/test.html'.
Oops...

I forget important comment.
I tested on 'Current' build and 'Patched build',
but both build failed my test.
Thanks for testing.

'テスト' in Shift_JIS is 0x83 0x65 0x83 0x58 0x83 0x67. Apparently, that was
inflated to UTF-16 with AssignWithConversion or equivalent. 

How about 'mozilla \\D:\wwwtest\テスト\test.html' in Start | Run with an
instance of Mozilla already running? 
No, I use HTML Editor.
The editor convert the path to file://.

If I use explorer with DDE(WWW_OpenURL), Mozilla returns error message.
The message is "'d' protocol is not supported".

Should I test "mozilla %1"? Does not use DDE?
I tested "mozilla %1".
If the code is removed(comment 12), it does _not_ work fine.
But if the code exists, it works fine.

On the removed case, mozilla return message following.
'd:/wwwtest/_e_X_g/test.html' is not found.
(In reply to comment #5)

> 2)  If there is still an issue, I'd MUCH rather fix the command line code than
>     hack this code yet more.

Fixing the command line code is kinda 'put on hold' by Benjamin's plan for overhauling XUL(runner).  See bug 98952. Should we go ahead with the patch with the fallback for 1.8 branch? 


The nsICommandLine code was bug 276588 and has replaced all of this (in toolkit apps). If this is still an issue in FF1.5 it's caused by a different codepath.
QA Contact: amyy → i18n

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: jshin1987 → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: