Hotmail Chinese attachment filename is displayed as question marks on download file dialog

VERIFIED FIXED in mozilla1.0.1

Status

()

Core
Internationalization
P1
major
VERIFIED FIXED
16 years ago
14 years ago

People

(Reporter: ji, Assigned: nhottanscp)

Tracking

({intl, topembed})

Trunk
mozilla1.0.1
x86
Windows XP
intl, topembed
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt2 RTM] [Land on 1.0 branch post MachV])

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

16 years ago
Build: 08/02 branch build
OS: Simplified Chinese XP

From Chinese Hotmail, when I try to download an attachment with filename in
Chinese, the Chinese filename is displayed as question marks on download file
dialog. IE and 4.x don't have this problem. 

Steps to reproduce:
1. From Netscape Mail client, send a gb2312 mail with a Chinese attachment
filename to a hotmail account. Please use doc attachment file so you can have a
download dialog on hotmail.
2. Login Hotmail from Chinese UI.
3. Download the attachment file from Hotmail account on browser.

The Chinese filename is shown as question marks on download file dialog.
(Reporter)

Comment 1

16 years ago
Nominating.
Keywords: intl, nsbeta1
(Assignee)

Updated

16 years ago
Blocks: 157673
Status: NEW → ASSIGNED

Updated

16 years ago
QA Contact: ruixu → ylong
(Assignee)

Comment 2

16 years ago

*** This bug has been marked as a duplicate of 155569 ***
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → DUPLICATE

Comment 3

16 years ago
Mark as verified dup.
Status: RESOLVED → VERIFIED
(Assignee)

Comment 4

16 years ago
This still happens after fixing bug 162523.

For this case, the document charset ("GB2312") is once set correctly by the
patch of bug 162523. Howerver, later in the code, there is another NewURI call
which passes null. So the charset is "UTF-8" instead of "GB23212" when we try to
unescape the string.

The code to set null for the charset is in HTTP code. I don't understand that code.

cc to darin for a help, do you think we can get a document charset at that point?

Here is a call stack when NewURI is called with null charset (by
nsHttpChannel::ProcessRedirection).

nsHttpHandler::NewURI(nsHttpHandler * const 0x0100f428, const nsACString &
{...}, const char * 0x00000000, nsIURI * 0x00000000, nsIURI * * 0x0012fc68) line
1830
nsIOService::NewURI(nsIOService * const 0x0102e608, const nsACString & {...},
const char * 0x00000000, nsIURI * 0x00000000, nsIURI * * 0x0012fc68) line 747 +
39 bytes
nsHttpChannel::ProcessRedirection(unsigned int 302) line 1474 + 93 bytes
nsHttpChannel::ProcessResponse() line 544 + 12 bytes
nsHttpChannel::OnStartRequest(nsHttpChannel * const 0x053651ac, nsIRequest *
0x05161d6c, nsISupports * 0x00000000) line 2875 + 11 bytes
nsOnStartRequestEvent::HandleEvent() line 161 + 53 bytes
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x0475d364) line 116
PL_HandleEvent(PLEvent * 0x0475d364) line 596 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00f574e0) line 526 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x003c05b2, unsigned int 49458, unsigned int 0,
long 16086240) line 1077 + 9 bytes
USER32! 77e148dc()
USER32! 77e14aa7()
USER32! 77e266fd()
nsAppShellService::Run(nsAppShellService * const 0x01845490) line 452
main1(int 1, char * * 0x00277410, nsISupports * 0x00000000) line 1509 + 32 bytes
main(int 1, char * * 0x00277410) line 1873 + 37 bytes
mainCRTStartup() line 338 + 17 bytes


Here is a URI spec (%B8%B4%BC%FE is two Chinese characters encoded in GB2312).
"http://216.33.236.250/cgi-bin/saferd/%B8%B4%BC%FE.doc?_lang=CN&hm___tg=http%3a%2f%2f216%2e33%2e236%2e250%2fcgi%2dbin%2fgetmsg%2f%b8%b4%bc%fe%2edoc&hm___qs=curmbox%3dF000000001%26a%3d361df2fceaedfeae2c961053864f7f51%26msg%3dMSG1028763000%2e123"

Status: VERIFIED → REOPENED
Resolution: DUPLICATE → ---

Comment 5

16 years ago
ah, so what's happening then is that the server is redirecting the original http
request to a new URL.  the HTTP header containing the new URL doesn't give any
information about the origin charset, so we just assume UTF-8.  perhaps we
should instead inherit the origin charset from the original URL.  in most cases
that should be okay i think.  anyways, your new unescape function for URLs will
detect mismatched charset and fallback on UTF-8, right?
(Reporter)

Comment 6

16 years ago
Test results for other two popular Chinese webmails:
1. sina.com.cn webmail: it uses a link containing raw native filename to
download the attachment£¬so it's fixed by bug 155569.
2. sohu.com webmail: it also uses a link, but doesn't handle Chinese attachment
filename properly in itself. On IE, it replaces any Chinese filename with
"download" + extension. On Netscape, the filename is not shown at all. So their
webmail service has some problems in Chinese attachment filename handling. 

Since we still have problems with Chinese Hotmail. It is possible we may have
problems with other webmail services in China and other non-English spoken
countries.
Keywords: topembed
(Assignee)

Comment 7

16 years ago
darin, the problem is that orginCharset is "UTF-8" but the actual URI is encoded
with a doc charset of the original document (e.g. %B8%B4%BC%FE" in GB2312).

The escape code fails to convert to UTF-8 for that case. Perhaps, we may do a
fallback (to browser' default?) after the failure if the input charset is UTF-8.
Status: REOPENED → ASSIGNED
(Assignee)

Comment 8

16 years ago
>perhaps we
>should instead inherit the origin charset from the original URL.

If this is possible then we want to do that. 
darin, can this be done in necko?

Comment 9

16 years ago
yes, it is a very simple patch in necko... i'll put together a patch shortly.

Comment 10

16 years ago
Created attachment 95280 [details] [diff] [review]
v1 patch

nhotta: can you review this patch and make sure it fixes the bug?  feel free to
reassign to me for checkin (if it indeed fixes the problem).
(Assignee)

Comment 11

16 years ago
I applied the patch and my debug build shows that the charset is set to "GB2312"
in the redirect code also in the intl unescape function.
Need testing on Chinese system.
(Assignee)

Comment 12

16 years ago
So the patch works fine but the save dialog still shows question mark (I'll
attach the screen shot). Looks like still need to fix somewhere else (wiget?).
(Assignee)

Comment 13

16 years ago
Created attachment 95287 [details]
Two Chinese characters "%B8%B4%BC%FE" turns to question marks.
(Assignee)

Comment 14

16 years ago
Comment on attachment 95280 [details] [diff] [review]
v1 patch

r=nhotta

we need this patch
Attachment #95280 - Flags: review+

Comment 15

16 years ago
Do we want to take these fixes (bug 161174 and bug 162523) for 1.0.1 if
there's a respin? 

We definitely need it for 1.0.x embedded clients.

We should update the summary because it probably affects more than just
Hotmail Chinese.  What's the more accurate summary?
(Assignee)

Comment 16

16 years ago
nsFilePicker takes PRUnichar*, but it is infrated.
For 0xB8B4 0xBCFE, it is gettting 0x00B8, 0x00B4, 0x00BC, 0x00FE.

That is caused by the line 912 below. I am not sure why the filename must be
ASCII but removing this line fixes the problem.

http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#909

909         if (iter != start) { // not empty
910           // ONLY if we got here, will we remember the suggested file name...
911           // The filename must be ASCII, see RFC 2183, section 2.3
912           CopyASCIItoUCS2(Substring(start, iter), mSuggestedFileName);

(Assignee)

Comment 17

16 years ago
http://www.ietf.org/rfc/rfc2183.txt
> Current [RFC 2045] grammar restricts parameter values (and hence
>    Content-Disposition filenames) to US-ASCII.  We recognize the great
This is allowed by 
http://www.ietf.org/rfc/rfc2231.txt
MIME Parameter Value and Encoded Word Extensions:
              Character Sets, Languages, and Continuations

So there is a way to encode a filename (to ASCII). And when we present the file
name to the user we should show non ASCII too. I think the line to restrict the
file name to ASCII is not necessary.

>    desirability of allowing arbitrary character sets in filenames, but
>    it is beyond the scope of this document to define the necessary
>    mechanisms.  We expect that the basic [RFC 1521] `value'
>    specification will someday be amended to allow use of non-US-ASCII
>    characters, at which time the same mechanism should be used in the
>    Content-Disposition filename parameter.


(Assignee)

Comment 18

16 years ago
So, instead of converting as ASCII, we want to get a charset to apply the
correct Unicode conversion. But not sure how do we get the charset when it is
not specified by the header. Or can we use the file name already set in other
parts in the code (the one we did in bug 155569)?
If someone has bothered to specify a filename in content-disposition, we should
absolutely use that instead of the filename from the URL.   If the header does
not specify a charset, we could go with the URL charset, perhaps?
(Assignee)

Comment 20

16 years ago
The header for the hotmail case is something like this.
"attachment ;filename=¸´¼þ.doc"

It is raw 8bit, no encoding is applied.
Wonderful... just wonderful....

So we could guess at the encoding based on the url encoding or something.  Or we
could add an IsAscii() test to that substring before doing the CopyASCIIToUCS2.
 If it does not test as ascii, we don't do the copy.

In either case, we should add support for the RFC 2231 format.
(Assignee)

Comment 22

16 years ago
I talked to bzbarsky. Here is a plan.
Separate a bug for RFC2231 to handle legal headers.

In a short term, we check a file name in the header, if that is 8 bit (which
means illegal) then we use the file name specified in the URI instead. 
Later, when we have RFC2231 implemented (with charset conversion code) then we
can leverage that for the raw 8bit case to use the original document charset to
decode the header specified file name.
 
(Assignee)

Comment 23

16 years ago
Created attachment 95322 [details] [diff] [review]
ignore http header's filename if it contains 8bit characters
(Assignee)

Comment 24

16 years ago
bzbarsky, could you review the last patch and the darin's patch too?
(Assignee)

Comment 25

16 years ago
filed bug 162765 for RFC2231 support

Updated

16 years ago
Attachment #95322 - Flags: review+

Comment 26

16 years ago
Comment on attachment 95322 [details] [diff] [review]
ignore http header's filename if it contains 8bit characters

r=ftang
Comment on attachment 95322 [details] [diff] [review]
ignore http header's filename if it contains 8bit characters

+	   nsCAutoString newFileName(Substring(start, iter));

How about:

nsACString& newFileName = Substring(start, iter);

with the rest being exactly as you have it?  This avoids a string copy.

sr=bzbarsky with that change.
Attachment #95322 - Flags: superreview+
(Assignee)

Comment 29

16 years ago
Created attachment 95329 [details] [diff] [review]
Addess bzbarsky's comment and combined darin's patch.
Attachment #95280 - Attachment is obsolete: true
Attachment #95322 - Attachment is obsolete: true
(Assignee)

Comment 30

16 years ago
Comment on attachment 95329 [details] [diff] [review]
Addess bzbarsky's comment and combined darin's patch.

r=nhotta,ftang
sr=bzbarsky
Attachment #95329 - Flags: superreview+
Attachment #95329 - Flags: review+
(Assignee)

Comment 31

16 years ago
checked in to the trunk, thanks everyone!
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago16 years ago
Resolution: --- → FIXED

Updated

16 years ago
Severity: normal → major
Keywords: nsbeta1 → adt1.0.1, mozilla1.0.1, nsbeta1+
Priority: -- → P1
Whiteboard: [adt2 RTM] [Land on 1.0.1 post MachV]
Target Milestone: --- → mozilla1.0.1
(Reporter)

Comment 32

16 years ago
Verified as fixed on 08/15 trunk build.
Status: RESOLVED → VERIFIED
QA Contact: ylong → ji

Updated

16 years ago
Blocks: 154896
Whiteboard: [adt2 RTM] [Land on 1.0.1 post MachV] → [adt2 RTM] [Land on 1.0 branch post MachV]
(Reporter)

Comment 33

16 years ago
QA contact to Rui for the verification on the branch build. Thanks.
QA Contact: ji → ruixu

Comment 34

16 years ago
jaimejr, when should we land this into the m1.0 branch? how many approval we
need to got post machv?

Comment 35

16 years ago
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending
Drivers' approval. pls check this in asap, the replace "Mozilla1.0.1+" with
"fixed1.0.1". [Using adt1.0.1 keywords as a proxy, since no edt1.0.1 keywords
were created for the 1.0.1 branch. This is needed on the 1.0 branch for a major
embedor]. thanks! 
Keywords: adt1.0.1 → adt1.0.1+, approval, mozilla1.2

Comment 36

16 years ago
Comment on attachment 95329 [details] [diff] [review]
Addess bzbarsky's comment and combined darin's patch.

a=chofmann for 1.0.2
Attachment #95329 - Flags: approval+

Comment 37

16 years ago
patch checked into 1.02 branch.

Comment 38

16 years ago
mark it as fixed1.0.2
Keywords: fixed1.0.2
(Reporter)

Comment 39

16 years ago
verified as fixed with 2002-08-28-08-1.0 build.
Keywords: fixed1.0.2 → verified1.0.2

Updated

16 years ago
Keywords: mozilla1.0.1

Updated

16 years ago
Depends on: 180372

Updated

16 years ago
No longer blocks: 157673
You need to log in before you can comment on or make changes to this bug.