Closed Bug 59220 Opened 24 years ago Closed 24 years ago

nsFormFrame::GetFileNameWithinPath() not I18N clean

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 44464

People

(Reporter: havill, Assigned: m_kato)

Details

(Keywords: intl)

some Japanese and Chinese pathnames/filenames when converted to multibyte C strings will contain 0x7E (the Windows directory separator) in the second byte and this will be erroneously detected as a directory separator in the above code. Also, why is the code detecting backslashes for Linux? This character is a legal (even though you'd never want to do it) character in a filename for Unix systems. A fixed version of GetFileNameWithinPath can be found at: <URL:http://bugzilla.mozilla.org/showattachment.cgi?attach_id=18775> which is linked to bug #59216 <URL:http://bugzilla.mozilla.org/show_bug.cgi?id=59216>
Oops, backslash is 0x5C, not 0x7E. (Need more coffee) ASCII memorization problems aside, the function still will fail with certain kanji/kanxi filenames.
note that nsFormFrame::GetContentType is also dbcs unclean, but the odds of this showing up on most modern systems are slim (the second byte range of Big5 and Shift_JIS start at 0x40... the period (which is scans for) is at 0x2E... and it's unlikely there are any filesystems that store that filename in 7-bit ISO-2022-JP.
Blocks: 44464
Reporter is this still a problem in the latest nightlies?
Yes, but the bug is fixed in the patch for bug 44464, which rods is including into the tree.
okay, then I reassign to rods.
Assignee: nhotta → rods
setting bug status to New
Status: UNCONFIRMED → NEW
Ever confirmed: true
Adrian, your code is not smart as fix:-(. Because your change is too big. So I think that I should fix the following. And rods, please assign to me. Index: nsFormFrame.cpp =================================================================== RCS file: /cvsroot/mozilla/layout/html/forms/src/nsFormFrame.cpp,v retrieving revision 3.139 diff -u -r3.139 nsFormFrame.cpp --- nsFormFrame.cpp 2001/01/04 20:44:14 3.139 +++ nsFormFrame.cpp 2001/01/18 11:44:54 @@ -1237,8 +1237,11 @@ // On a Mac the only invalid character in a file name is a : so we have to av oid // the test for '\' char* fileNameStart = PL_strrchr(aPathName, ':'); +#elif XP_WIN + char* fileNameStart = _mbsrchr(aPathName, '\\'); // windows #else - char* fileNameStart = PL_strrchr(aPathName, '\\'); // windows + char* fileNameStart = PL_strrchr(aPathName, '\\'); +#endif if (!fileNameStart) { // try unix fileNameStart = PL_strrchr(aPathName, '/'); }
CCing nhotta due to well-known 0x5C issue.
Added 'intl' keyword, cc to ylong, please prepare a test case. I think using _mbsrchr is fine. How this is related to bug 44464? I don't see a change for nsFormFrame::GetFileNameWithinPath in the patch (of bug 44464).
Keywords: intl
The separated version is o.k. I guess, except I rewrote it to work natively in Unicode to avoid the character set conversion from the Unicode string to the C string (and thus making the specialized Windows specific function irrelevant) in the original non-patched code. If you do the little patch that Kato suggests, the patch will again need to mash the Unicode string back to a C string, which is a little ugly and unnecessary IMHO. We were working in Unicode to begin with... why convert to a C string to do multibyte string operations? I didn't bother separating the code because the multipart/form-data post code is the only thing that uses it, and you can't change the function to Unicode without changing the code that calls it. nhotta: bug 44464 references nsFormFrame::GetFileNameWithinPath at line 266 in the patch.
I found it, I was looking at the second patch before. I agree with Adrian, if he is going to change it to unicode then no need to do a platform specific change.
Thanks, please fix this and check it in. It would be great to get the ifdefs out of there. I think we have some XP file classes that can help get this info so. sdagley should never have put in the ifdefs and I should have reviewed the code.
Assignee: rods → m_kato
Is the particular patch identified and working at present? If so, can it be checked in to enable unicode strings in filenames? Given that these strings appear in HTTP POST transactions, will Unicode work on all anticipated platforms, and if not, what kinds of things might break?
Hotta san, this should be verified by engineer.
QA Contact: teruko → nhotta
When the strings are sent, the are sent in the same character encoding as the form. The is the same thing that Internet Explorer does. According to HTML 4 specs, however, both are wrong (both will send raw UTF-8 in the MIME-like file-upload header, when the spec says that non-ASCII names should be encoded as per MIME rules. However, very few file upload parsers in the wild process file upload forms according to real MIME rules, so changing this to "conform to the spec" would break too much. (Sad)
No longer blocks: 44464
*** This bug has been marked as a duplicate of 44464 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.