Closed Bug 161352 Opened 22 years ago Closed 22 years ago

URL: can't load local file with semicolon (;) in name

Categories

(Core :: Networking: File, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: mni, Assigned: neeti)

References

(Depends on 1 open bug)

Details

(Keywords: testcase)

Attachments

(4 files, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL; rv:1.1b) Gecko/20020721
BuildID:    2002072104

Any filename on local disk, which contains semicolon ";" can't be loaded. The
error message says that this file cannot be found. 
Example of filename: Title;dcopt.html

I'm using Windows XP Prof English, Mozilla 1.1b.
This error happens since I remember - mozilla version 0.9.8, english or localized.



Reproducible: Always
Steps to Reproduce:
1. Create file on local disk with name: Title;dcopt.html
2. Load it in Mozilla thru Open File (ctrl-O)

Actual Results:  Error message about file not to be found.

There are similar reported, but concerning high-ascii codes or national
characters (japanese), but no one reported problems with just semicolon.
Confirmed on linux too (today's trunk).
Assignee: law → dougt
Status: UNCONFIRMED → NEW
Component: File Handling → Networking: File
Ever confirmed: true
OS: Windows XP → All
QA Contact: sairuh → benc
Hardware: PC → All
andreas, ideas?
A quick look at RFC 1630's BNF shows this is "reserved":

reserved               = | ; | / | # | ? | : | space
Summary: Mozilla can't load local file with semicolon (;) in name → URL: can't load local file with semicolon (;) in name
Andreas: so in this case, shouldn't we escape it when making the URL? I'm
assuming this should work in the right situations.
Yes, it yhould be escaped when creating the url from the local filename. I find
it strange that it does not already happen. We escape a ; everywhere in the url
except in directory, param query and ref. Escaping on this case should have
happen with the filename mask.
Yes, and that is okay this way in my opinion. URLs have to be parsed this way.
The trick is to escaped the ; in the filename while creating the url from the
local filepath. 
+nsbeta1 - these files are rare, but this seems to be a standards problem.

UPDATE: mozilla 1.1b - Mac OS X
I found a chart:

http://i-technica.com/whitestuff/urlencodechart.html

It said that the encode value was %3b.

I tried to use that encoding in a file URL:
file:///Macintosh%20HD/fileURLs/testof%3B.txt

it works.

And if you open the file via the XUL dir view of the directory:

file:///Macintosh%20HD/fileURLs/

it does properly escape.
Keywords: nsbeta1
me
Assignee: dougt → neeti
Attached patch proposed patchSplinter Review
andreas: the above patch loads the file. Is this the correct way to do it?
I can't say I like the hardcoded replace. Instead why not change the Escape-Mask
from esc_Directory to esc_FileBaseName as asked for in comment #5 ? That should
do the trick.
andreas: I have tried changing the Escape-Mask from esc_Directory to
esc_FileBaseName

Index: nsURLHelperWin.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/base/src/nsURLHelperWin.cpp,v
retrieving revision 1.1
diff -u -r1.1 nsURLHelperWin.cpp
--- nsURLHelperWin.cpp	13 Sep 2002 19:32:30 -0000	1.1
+++ nsURLHelperWin.cpp	3 Oct 2002 17:16:25 -0000
@@ -69,7 +69,7 @@
     NS_NAMED_LITERAL_CSTRING(prefix, "file:///");
   
     // Escape the path with the directory mask
-    if (NS_EscapeURL(ePath.get(), ePath.Length(), esc_Directory+esc_Forced,
escPath))
+
if (NS_EscapeURL(ePath.get(), ePath.Length(), esc_FileBaseName|esc_AlwaysCopy,
escPath))
         escPath.Insert(prefix, 0);
     else
        escPath.Assign(prefix + ePath);


I get a lot of assertions and then it hangs with this change. First, we fail to
get the base url in nsChromeRegistry::ConvertChromeURL(nsChromeRegistry *....)
in mozilla/rdf/chrome/src/nsChromeRegistry.cpp, 

rv = GetBaseURL(package, provider, finalURL);
when package is "communicator", and provider is "content"

These are the assertions and warnings I get before it hangs

Type Manifest File: D:\build\mozilla\dist\bin\components\xpti.dat
+++ JavaScript debugging hooks installed.
nsNativeComponentLoader: autoregistering begins.
*** Registering necko_core_and_primary_protocols components (all right -- a gene
ric module!)
nsNativeComponentLoader: autoregistering succeeded
nNCL: registering deferred (0)
WARNING: dependent window created without a parent, file d:/build/mozilla/xpfe/b
ootstrap/nsWindowCreator.cpp, line 148
WEBSHELL+ = 1
For application/x-java-vm found plugin C:\Program Files\JavaSoft\JRE\1.3.0_01\bi
n\NPOJI600.dll
WARNING: chrome: failed to get base url for chrome://communicator/content/profil
e/profileSelection.xul -- using wacky default, file d:/build/mozilla/rdf/chrome/
src/nsChromeRegistry.cpp, line 562
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed, file d:/build/mozilla/docshell
/base/nsDocShell.cpp, line 2521
WARNING: NS_ENSURE_TRUE(window) failed, file d:/build/mozilla/xpfe/appshell/src/
nsContentTreeOwner.cpp, line 635
WARNING: NS_ENSURE_TRUE(docShellElement) failed, file d:/build/mozilla/xpfe/apps
hell/src/nsXULWindow.cpp, line 1120
WARNING: NS_ENSURE_TRUE(windowElement) failed, file d:/build/mozilla/xpfe/appshe
ll/src/nsXULWindow.cpp, line 1174
###!!! ASSERTION: no xul:window: 'windowElement', file d:/build/mozilla/xpfe/app
shell/src/nsXULWindow.cpp, line 935
###!!! ASSERTION: no xul:window: 'windowElement', file d:/build/mozilla/xpfe/app
shell/src/nsXULWindow.cpp, line 867
###!!! ASSERTION: no xul:window: 'windowElement', file d:/build/mozilla/xpfe/app
shell/src/nsXULWindow.cpp, line 989
WARNING: no preshell for window, file d:/build/mozilla/dom/src/base/nsGlobalWind
ow.cpp, line 3915
WARNING: NS_ENSURE_TRUE(presShell) failed, file d:/build/mozilla/dom/src/base/ns
GlobalWindow.cpp, line 3952
WEBSHELL- = 0
WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL 'chrome://global/skin/scroll
bars.css' failed.  Error code: 18, file d:/build/mozilla/content/html/style/src/
nsCSSLoader.cpp, line 1931
WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL 'resource:///res/forms.css'
failed.  Error code: 18, file d:/build/mozilla/content/html/style/src/nsCSSLoade
r.cpp, line 1931
WEBSHELL+ = 1
WEBSHELL+ = 2
D:\build\mozilla\dist\bin\chrome\inspector.jar mtime changed, invalidating FastL
oad file
WARNING: chrome: failed to get base url for chrome://navigator/content/navigator
.xul -- using wacky default, file d:/build/mozilla/rdf/chrome/src/nsChromeRegist
ry.cpp, line 562
WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL 'resource:///res/ua.css' fai
led.  Error code: 18, file d:/build/mozilla/content/html/style/src/nsCSSLoader.c
pp, line 1931
*** open of resource:/res/ua.css failed: error=80520012
###!!! ASSERTION: unexpected null pointer: 'aUAStyleSheet', file d:/build/mozill
a/content/base/src/nsDocumentViewer.cpp, line 1272
WARNING: unable to load UA style sheet, file d:/build/mozilla/content/base/src/n
sDocumentViewer.cpp, line 1663
WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL 'chrome://global/skin/scroll
bars.css' failed.  Error code: 18, file d:/build/mozilla/content/html/style/src/
nsCSSLoader.cpp, line 1931
WARNING: CSSLoaderImpl::LoadAgentSheet: Load of URL 'resource:///res/forms.css'
failed.  Error code: 18, file d:/build/mozilla/content/html/style/src/nsCSSLoade
r.cpp, line 1931
###!!! ASSERTION: no quirk stylesheet: 'mQuirkStyleSheet', file d:/build/mozilla
/content/base/src/nsStyleSet.cpp, line 867
Note: verifyreflow is disabled
###!!! ASSERTION: no quirk stylesheet: 'mQuirkStyleSheet', file d:/build/mozilla
/content/base/src/nsStyleSet.cpp, line 867
Note: styleverifytree is disabled
Note: frameverifytree is disabled
We are returning a NO_RDF_VALUE for
rv = ds->GetTarget(aSource, aProperty, aTruthValue, aResult) 
in CompositeDataSourceImpl::GetTarget(...)

CompositeDataSourceImpl::GetTarget(CompositeDataSourceImpl * const 0x01415d48,
nsIRDFResource * 0x0193f8f8, nsIRDFResource * 0x014177b8, int 1, nsIRDFNode * *
0x0012e9a8) line 832
nsChromeRegistry::FollowArc(nsIRDFDataSource * 0x01415d48, nsACString & {...},
nsIRDFResource * 0x0193f8f8, nsIRDFResource * 0x014177b8) line 1220 + 46 bytes
nsChromeRegistry::GetBaseURL(const nsACString & {...}, const nsACString & {...},
nsACString & {...}) line 670 + 48 bytes
nsChromeRegistry::ConvertChromeURL(nsChromeRegistry * const 0x0144f5c8, nsIURI *
0x01437fa8, nsACString & {...}) line 551 + 26 bytes
nsChromeProtocolHandler::NewChannel(nsChromeProtocolHandler * const 0x01437eb0,
nsIURI * 0x01437fa8, nsIChannel * * 0x0012f0ac) line 640 + 40 bytes
nsIOService::NewChannelFromURI(nsIOService * const 0x01440838, nsIURI *
0x01437fa8, nsIChannel * * 0x0012f0ac) line 511 + 40 bytes
NS_NewChannel(nsIChannel * * 0x0012f1a4, nsIURI * 0x01437fa8, nsIIOService *
0x01440838, nsILoadGroup * 0x0145cb78, nsIInterfaceRequestor * 0x014395b0,
unsigned int 524288) line 164 + 20 bytes
nsDocShell::DoURILoad(nsIURI * 0x01437fa8, nsIURI * 0x00000000, nsISupports *
0x00000000, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, int 1,
nsIDocShell * * 0x00000000, nsIRequest * * 0x00000000) line 5102 + 91 bytes
nsDocShell::InternalLoad(nsDocShell * const 0x01439588, nsIURI * 0x01437fa8,
nsIURI * 0x00000000, nsISupports * 0x00000000, int 1, const unsigned short *
0x01930bd0, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, unsigned
int 1, nsISHEntry * 0x00000000, int 1, nsIDocShell * * 0x00000000, nsIRequest *
* 0x00000000) line 5020 + 51 bytes
nsDocShell::LoadURI(nsDocShell * const 0x01439588, nsIURI * 0x01437fa8,
nsIDocShellLoadInfo * 0x0193f580, unsigned int 0, int 1) line 720 + 73 bytes
nsWindowWatcher::OpenWindowJS(nsWindowWatcher * const 0x0191de24, nsIDOMWindow *
0x00000000, const char * 0x014404c0, const char * 0x0218da2c, const char *
0x0218d52c, int 1, unsigned int 1, long * 0x0190f2d0, nsIDOMWindow * *
0x0012fa44) line 771
nsWindowWatcher::OpenWindow(nsWindowWatcher * const 0x0191de20, nsIDOMWindow *
0x00000000, const char * 0x014404c0, const char * 0x0218da2c, const char *
0x0218d52c, nsISupports * 0x01440590, nsIDOMWindow * * 0x0012fa44) line 459 + 48
bytes
nsProfile::LoadDefaultProfileDir(nsCString & {...}, int 1) line 580 + 94 bytes
nsProfile::StartupWithArgs(nsProfile * const 0x01425cc8, nsICmdLineService *
0x01445c10, int 1) line 412 + 16 bytes
nsAppShellService::DoProfileStartup(nsAppShellService * const 0x0145bdd8,
nsICmdLineService * 0x01445c10, int 1) line 266 + 31 bytes
InitializeProfileService(nsICmdLineService * 0x01445c10) line 1171 + 31 bytes
main1(int 1, char * * 0x00284870, nsISupports * 0x00276f08) line 1450 + 14 bytes
main(int 1, char * * 0x00284870) line 1883 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e97d08()


My failure. I now think the best approach is to mask *each* segment of the path
individually with the esc_FileBaseName mask. That should do it. Simply replacing
the mask will get all the / escaped with the results seen above.
andreas: when you say we should mask *each* segment of the path individually
with the esc_FileBaseName mask, is *each* segment, is the portion of the path
delimited by the "/"  ? 

thanks,
Neeti

Yes, but I would do that with the native path, before changing \ to / for
example or : to / on mac. In fact that change can happen while going through the
path and escaping each segment.
Keywords: testcase
andreas:could you take a look at this patch
win, mac and os2 need to be tested by build buddies ...
Attachment #101732 - Attachment is obsolete: true
-     1023,1023,1023,1023,1023,1023,1023,1023,1023,1023, 912, 912,   0,1008,  
0, 768,       /* 3x  0123456789:;<=>?	 */
+     1023,1023,1023,1023,1023,1023,1023,1023,1023,1023,1008, 912,   0,1008,  
0, 768,       /* 3x  0123456789:;<=>?	 */

Here is andreas's explanation of how : will not be escaped when being part of
the filename.

This is a bit code, bit set says character is allowed in that part of the url
without escaping.

bite meaning:

  1: scheme
  2: user
  4: password
  8: host
 16: directory
 32: filebasename
 64: fileextension
128: param
256: query
512: ref

So 1023 says the character is allowed in every part of the url (no need for
escape). In case of the : which had previously 912 and now 1008 this means:

 912 = 512 + 256 + 128 + 16 = ref + query + param + directory
1008 = 512 + 256 + 128 + 64 + 32 + 16 = ref + query + param + fileextension +
filebase + directory.

Now the : will not be escaped when being part of the filename = filebasename +
fileextension. 
Comment on attachment 101804 [details] [diff] [review]
patch to allow filenames with ; and in the process do not escape : in filenames 

this looks terribly expensive.	why don't we come up with a new mask or modify
the meaning of esc_Directory instead?  we shouldn't need to modify the code in
nsURLHelperXXX.cpp.
Attachment #101804 - Flags: needs-work+
We can't modify esc_Directory mask because we have to allow ; in the directory
component as part of the URL spezification (param) ... those parts after the ;
are not part of the directory name. Because of this while constructing a file
url from a native path any ; has to be escaped. The only solution I see would be
a new mask but those do not fit well with the current schema because it is
oriented on parts of the url and not on what kind of url (file for example) is
constructed. 
but we don't support ';' inside the directory path.  we only support it
following the filename.  i know RFC 2396 allows params within each path segment,
but mozilla does not support that part of the RFC.  just look at nsIURL::param.
 it only provides access to a param found between the filename and the query/ref.

i just don't see the point of writing slow code in order to support something
that is non-existent in practice.  our frozen embedding APIs do not expose param
between path segments, so should the implementation of those interfaces support
params between path segments?  it makes no sense at this point in time.
Okay, but we can't escape ; in esc_Directory because if we do not support it in
mozilla we should at least not change the semantics of the url by escaping it
and for example leave it to a http server to do the right thing and serve the
correct page. That is what we do. Not escaping ; in the directory part is there
for a good reason. That should not change.

The problem here is that escaping the *whole* path with esc_directory does not
cut it because that path contains the filename and when the filename contains a
; it will not get escaped and therefore part of the filename will be parsed as
param and cut from the filename ... file not found ...

Maybe we can instead just cut off the filename from the native path, escape it
with the esc_FileBaseName mask and mask the directory part with esc_Directory
and then put both together. That should be faster then going through the path
step by step.
yeah, you make a good point.  if some server is using params, we shouldn't munge
them.

so what if we just do a ReplaceSubstring(";", "%3b") on the result returned from
NS_EscapeURL(esc_Directory)?
I can't say I like that programming style. It was also Neetis first idea. But
what if masks esc_directory and esc_FileBaseName change in the future? Who will
remember that have to do here more replaces?
but we just got done enumerating reasons why the escaping rules for
esc_Directory and friends cannot change.  anyways, ';' is special... i wouldn't
advocate repeating this pattern if another character needs to be escaped.  if
that ever happens, we should write a single pass loop.

bottom line: this is very performance critical code.  the conversion routines
between nsIFile and file:// are called a lot, especially at startup.

neeti's original patch has the advantage of being very performant.  we don't
have to touch the heap allocator at all unless there is a ';' in the path. 
seems like the ideal situation IMO.  i'd just like to see some documentation in
the original patch.
They can not change regarding ; but otherwise ... it has happened in the past.
But I agree with you, this point is very performance critical.
tested this on linux and windows
this patch compiles fine on OS/2 and does fix the problem.
Comment on attachment 102594 [details] [diff] [review]
revised original patch with updated comments

i think the comment could be simplified down to:

    // esc_Directory does not escape the semicolons, so if a filename 
    // contains semicolons we need to manually escape them.

should be apparent from the ReplaceSubstring call that ';' maps to %3b :)

sr=darin with or without this change
Attachment #102594 - Flags: superreview+
Attachment #102801 - Flags: review+
Comment on attachment 102801 [details] [diff] [review]
revised comments on the patch

r=dougt
Comment on attachment 102801 [details] [diff] [review]
revised comments on the patch

sr=darin
Attachment #102801 - Flags: superreview+
Comment on attachment 102801 [details] [diff] [review]
revised comments on the patch

a=asa for checkin to 1.2b (on behalf of drivers)
Attachment #102801 - Flags: approval+
checked in fix
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
VERIFIED: Mozilla 1.3a, allplats.
Mariusz, I'd like to get your confirmation this works for you now.

I was able to get these files to open in the file picker and via HTML view as well.

Since this only happened via "open file", can someone explain why it worked in
HTML view and/or XUL tree? Does file picker have it's own entry point w/
different escaping?
Status: RESOLVED → VERIFIED
Depends on: 473280
Comment on attachment 102594 [details] [diff] [review]
revised original patch with updated comments

>? diff1.txt
>? neckobase_s.lib
>? neckobase_s.pdb
>Index: nsURLHelperMac.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/base/src/nsURLHelperMac.cpp,v
>retrieving revision 1.1
>diff -u -r1.1 nsURLHelperMac.cpp
>--- nsURLHelperMac.cpp	13 Sep 2002 19:32:30 -0000	1.1
>+++ nsURLHelperMac.cpp	11 Oct 2002 17:04:15 -0000
>@@ -89,6 +89,11 @@
>     // need encoding; use %2F which is a forward slash
>     escPath.ReplaceSubstring(":", "%2F");
> 
>+    // esc_Directory does not escape the semicolons but if a filename 
>+    // contains semicolons we need to encode it; use %3b which is a 
>+    // semicolon
>+    escPath.ReplaceSubstring(";", "%3b");
>+
>     escPath.Insert(prefix, 0);
> 
>     // XXX this should be unnecessary
>Index: nsURLHelperOS2.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/base/src/nsURLHelperOS2.cpp,v
>retrieving revision 1.2
>diff -u -r1.2 nsURLHel
Flags: needinfo?(MAWAVANSMOOKEY)
Comment on attachment 102594 [details] [diff] [review]
revised original patch with updated comments

>? diff1.txt
>? neckobase_s.lib
>? neckobase_s.pdb
>Index: nsURLHelperMac.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/base/src/nsURLHelperMac.cpp,v
>retrieving revision 1.1
>diff -u -r1.1 nsURLHelperMac.cpp
>--- nsURLHelperMac.cpp	13 Sep 2002 19:32:30 -0000	1.1
>+++ nsURLHelperMac.cpp	11 Oct 2002 17:04:15 -0000
>@@ -89,6 +89,11 @@
>     // need encoding; use %2F which is a forward slash
>     escPath.ReplaceSubstring(":", "%2F");
> 
>+    // esc_Directory does not escape the semicolons but if a filename 
>+    // contains semicolons we need to encode it; use %3b which is a 
>+    // semicolon
>+    escPath.ReplaceSubstring(";", "%3b");
>+
>     escPath.Insert(prefix, 0);
> 
>     // XXX this should be unnecessary
>Index: nsURLHelperOS2.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/base/src/nsURLHelperOS2.cpp,v
>retrieving revision 1.2
>diff -u -r1.2 nsURLHel
Flags: needinfo?(MAWAVANSMOOKEY)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: