59220 - nsFormFrame::GetFileNameWithinPath() not I18N clean

Reporter

Description

•

24 years ago

some Japanese and Chinese pathnames/filenames when converted to multibyte C strings will contain 0x7E (the Windows directory separator) in the second byte and this will be erroneously detected as a directory separator in the above code. Also, why is the code detecting backslashes for Linux? This character is a legal (even though you'd never want to do it) character in a filename for Unix systems. A fixed version of GetFileNameWithinPath can be found at: <URL:http://bugzilla.mozilla.org/showattachment.cgi?attach_id=18775> which is linked to bug #59216 <URL:http://bugzilla.mozilla.org/show_bug.cgi?id=59216>

Adrian Havill

Reporter

Comment 1

•

24 years ago

Oops, backslash is 0x5C, not 0x7E. (Need more coffee) ASCII memorization problems aside, the function still will fail with certain kanji/kanxi filenames.

Adrian Havill

Reporter

Comment 2

•

24 years ago

note that nsFormFrame::GetContentType is also dbcs unclean, but the odds of this showing up on most modern systems are slim (the second byte range of Big5 and Shift_JIS start at 0x40... the period (which is scans for) is at 0x2E... and it's unlikely there are any filesystems that store that filename in 7-bit ISO-2022-JP.

Adrian Havill

Reporter

Updated

•

24 years ago

Blocks: 44464

Keyser Sose

Comment 3

•

24 years ago

Reporter is this still a problem in the latest nightlies?

Adrian Havill

Reporter

Comment 4

•

24 years ago

Yes, but the bug is fixed in the patch for bug 44464, which rods is including into the tree.

nhottanscp

Comment 5

•

24 years ago

okay, then I reassign to rods.

Assignee: nhotta → rods

Asa Dotzler [:asa]

Comment 6

•

24 years ago

setting bug status to New

Status: UNCONFIRMED → NEW

Ever confirmed: true

Makoto Kato [:m_kato]

Assignee

Comment 7

•

24 years ago

Adrian, your code is not smart as fix:-(. Because your change is too big. So I think that I should fix the following. And rods, please assign to me. Index: nsFormFrame.cpp =================================================================== RCS file: /cvsroot/mozilla/layout/html/forms/src/nsFormFrame.cpp,v retrieving revision 3.139 diff -u -r3.139 nsFormFrame.cpp --- nsFormFrame.cpp 2001/01/04 20:44:14 3.139 +++ nsFormFrame.cpp 2001/01/18 11:44:54 @@ -1237,8 +1237,11 @@ // On a Mac the only invalid character in a file name is a : so we have to av oid // the test for '\' char* fileNameStart = PL_strrchr(aPathName, ':'); +#elif XP_WIN + char* fileNameStart = _mbsrchr(aPathName, '\\'); // windows #else - char* fileNameStart = PL_strrchr(aPathName, '\\'); // windows + char* fileNameStart = PL_strrchr(aPathName, '\\'); +#endif if (!fileNameStart) { // try unix fileNameStart = PL_strrchr(aPathName, '/'); }

Makoto Kato [:m_kato]

Assignee

Comment 8

•

24 years ago

CCing nhotta due to well-known 0x5C issue.

nhottanscp

Comment 9

•

24 years ago

Added 'intl' keyword, cc to ylong, please prepare a test case. I think using _mbsrchr is fine. How this is related to bug 44464? I don't see a change for nsFormFrame::GetFileNameWithinPath in the patch (of bug 44464).

Keywords: intl

Adrian Havill

Reporter

Comment 10

•

24 years ago

The separated version is o.k. I guess, except I rewrote it to work natively in Unicode to avoid the character set conversion from the Unicode string to the C string (and thus making the specialized Windows specific function irrelevant) in the original non-patched code. If you do the little patch that Kato suggests, the patch will again need to mash the Unicode string back to a C string, which is a little ugly and unnecessary IMHO. We were working in Unicode to begin with... why convert to a C string to do multibyte string operations? I didn't bother separating the code because the multipart/form-data post code is the only thing that uses it, and you can't change the function to Unicode without changing the code that calls it. nhotta: bug 44464 references nsFormFrame::GetFileNameWithinPath at line 266 in the patch.

nhottanscp

Comment 11

•

24 years ago

I found it, I was looking at the second patch before. I agree with Adrian, if he is going to change it to unicode then no need to do a platform specific change.

rods (gone)

Comment 12

•

24 years ago

Thanks, please fix this and check it in. It would be great to get the ifdefs out of there. I think we have some XP file classes that can help get this info so. sdagley should never have put in the ifdefs and I should have reviewed the code.

Assignee: rods → m_kato

James Salsman

Comment 13

•

24 years ago

Is the particular patch identified and working at present? If so, can it be checked in to enable unicode strings in filenames? Given that these strings appear in HTTP POST transactions, will Unicode work on all anticipated platforms, and if not, what kinds of things might break?

Teruko Kobayashi

Comment 14

•

24 years ago

Hotta san, this should be verified by engineer.

QA Contact: teruko → nhotta

Adrian Havill

Reporter

Comment 15

•

24 years ago

When the strings are sent, the are sent in the same character encoding as the form. The is the same thing that Internet Explorer does. According to HTML 4 specs, however, both are wrong (both will send raw UTF-8 in the MIME-like file-upload header, when the spec says that non-ASCII names should be encoded as per MIME rules. However, very few file upload parsers in the wild process file upload forms according to real MIME rules, so changing this to "conform to the spec" would break too much. (Sad)

Adrian Havill

Reporter

Updated

•

24 years ago

No longer blocks: 44464

Adrian Havill

Reporter

Comment 16

•

24 years ago

*** This bug has been marked as a duplicate of 44464 ***

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → DUPLICATE

Bugzilla

nsFormFrame::GetFileNameWithinPath() not I18N clean

Categories

(Core :: Internationalization, defect, P3)

Tracking

()

People

(Reporter: havill, Assigned: m_kato)

References

Details

(Keywords: intl)

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Updated

Comment 16