Closed
Bug 59220
Opened 24 years ago
Closed 24 years ago
nsFormFrame::GetFileNameWithinPath() not I18N clean
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
People
(Reporter: havill, Assigned: m_kato)
Details
(Keywords: intl)
some Japanese and Chinese pathnames/filenames when converted to multibyte C
strings will contain 0x7E (the Windows directory separator) in the second byte
and this will be erroneously detected as a directory separator in the above
code.
Also, why is the code detecting backslashes for Linux? This character is a legal
(even though you'd never want to do it) character in a filename for Unix
systems.
A fixed version of GetFileNameWithinPath can be found at:
<URL:http://bugzilla.mozilla.org/showattachment.cgi?attach_id=18775>
which is linked to bug #59216
<URL:http://bugzilla.mozilla.org/show_bug.cgi?id=59216>
Reporter | ||
Comment 1•24 years ago
|
||
Oops, backslash is 0x5C, not 0x7E. (Need more coffee)
ASCII memorization problems aside, the function still will fail with certain
kanji/kanxi filenames.
Reporter | ||
Comment 2•24 years ago
|
||
note that nsFormFrame::GetContentType is also dbcs unclean, but the odds of this
showing up on most modern systems are slim (the second byte range of Big5 and
Shift_JIS start at 0x40... the period (which is scans for) is at 0x2E... and
it's unlikely there are any
filesystems that store that filename in 7-bit ISO-2022-JP.
Comment 3•24 years ago
|
||
Reporter is this still a problem in the latest nightlies?
Reporter | ||
Comment 4•24 years ago
|
||
Yes, but the bug is fixed in the patch for bug 44464, which rods is including
into the tree.
Assignee | ||
Comment 7•24 years ago
|
||
Adrian, your code is not smart as fix:-(. Because your change is too big.
So I think that I should fix the following.
And rods, please assign to me.
Index: nsFormFrame.cpp
===================================================================
RCS file: /cvsroot/mozilla/layout/html/forms/src/nsFormFrame.cpp,v
retrieving revision 3.139
diff -u -r3.139 nsFormFrame.cpp
--- nsFormFrame.cpp 2001/01/04 20:44:14 3.139
+++ nsFormFrame.cpp 2001/01/18 11:44:54
@@ -1237,8 +1237,11 @@
// On a Mac the only invalid character in a file name is a : so we have to av
oid
// the test for '\'
char* fileNameStart = PL_strrchr(aPathName, ':');
+#elif XP_WIN
+ char* fileNameStart = _mbsrchr(aPathName, '\\'); // windows
#else
- char* fileNameStart = PL_strrchr(aPathName, '\\'); // windows
+ char* fileNameStart = PL_strrchr(aPathName, '\\');
+#endif
if (!fileNameStart) { // try unix
fileNameStart = PL_strrchr(aPathName, '/');
}
Assignee | ||
Comment 8•24 years ago
|
||
CCing nhotta due to well-known 0x5C issue.
Comment 9•24 years ago
|
||
Added 'intl' keyword, cc to ylong, please prepare a test case.
I think using _mbsrchr is fine. How this is related to bug 44464? I don't see a
change for nsFormFrame::GetFileNameWithinPath in the patch (of bug 44464).
Keywords: intl
Reporter | ||
Comment 10•24 years ago
|
||
The separated version is o.k. I guess, except I rewrote it to work natively in
Unicode to avoid the character set conversion from the Unicode string to the C
string (and thus making the specialized Windows specific function irrelevant) in
the original non-patched code.
If you do the little patch that Kato suggests, the patch will again need to mash
the Unicode string back to a C string, which is a little ugly and unnecessary
IMHO. We were working in Unicode to begin with... why convert to a C string to
do multibyte string operations?
I didn't bother separating the code because the multipart/form-data post code is
the only thing that uses it, and you can't change the function to Unicode
without changing the code that calls it.
nhotta: bug 44464 references nsFormFrame::GetFileNameWithinPath at line 266 in
the patch.
Comment 11•24 years ago
|
||
I found it, I was looking at the second patch before.
I agree with Adrian, if he is going to change it to unicode then no need to do a
platform specific change.
Comment 12•24 years ago
|
||
Thanks, please fix this and check it in. It would be great to get the ifdefs out
of there. I think we have some XP file classes that can help get this info so.
sdagley should never have put in the ifdefs and I should have reviewed the code.
Assignee: rods → m_kato
Comment 13•24 years ago
|
||
Is the particular patch identified and working at present?
If so, can it be checked in to enable unicode strings in filenames?
Given that these strings appear in HTTP POST transactions, will Unicode work on
all anticipated platforms, and if not, what kinds of things might break?
Comment 14•24 years ago
|
||
Hotta san, this should be verified by engineer.
QA Contact: teruko → nhotta
Reporter | ||
Comment 15•24 years ago
|
||
When the strings are sent, the are sent in the same character encoding as the
form. The is the same thing that Internet Explorer does.
According to HTML 4 specs, however, both are wrong (both will send raw UTF-8 in
the MIME-like file-upload header, when the spec says that non-ASCII names should
be encoded as per MIME rules.
However, very few file upload parsers in the wild process file upload forms
according to real MIME rules, so changing this to "conform to the spec" would
break too much. (Sad)
Reporter | ||
Comment 16•24 years ago
|
||
*** This bug has been marked as a duplicate of 44464 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•