Closed Bug 161174 Opened 23 years ago Closed 23 years ago

Hotmail Chinese attachment filename is displayed as question marks on download file dialog

Categories

(Core :: Internationalization, defect, P1)

Product:

Component:

Platform:

x86

Windows XP

Type:

defect

Priority:

P1

Severity:

major

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0.1

People

(Reporter: ji, Assigned: nhottanscp)

References

Details

(Keywords: intl, topembed, Whiteboard: [adt2 RTM] [Land on 1.0 branch post MachV])

Attachments

(2 files, 2 obsolete files)

v1 patch 23 years ago Darin Fisher 975 bytes, patch	nhottanscp : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
Two Chinese characters "%B8%B4%BC%FE" turns to question marks. 23 years ago nhottanscp 24.68 KB, image/jpeg		Details
ignore http header's filename if it contains 8bit characters 23 years ago nhottanscp 1.09 KB, patch	ftang : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
Addess bzbarsky's comment and combined darin's patch. 23 years ago nhottanscp 2.21 KB, patch	nhottanscp : review+ nhottanscp : superreview+ chofmann : approval+	Details \| Diff \| Splinter Review

Reporter

Description

•

23 years ago

Build: 08/02 branch build OS: Simplified Chinese XP From Chinese Hotmail, when I try to download an attachment with filename in Chinese, the Chinese filename is displayed as question marks on download file dialog. IE and 4.x don't have this problem. Steps to reproduce: 1. From Netscape Mail client, send a gb2312 mail with a Chinese attachment filename to a hotmail account. Please use doc attachment file so you can have a download dialog on hotmail. 2. Login Hotmail from Chinese UI. 3. Download the attachment file from Hotmail account on browser. The Chinese filename is shown as question marks on download file dialog.

Reporter

Comment 1

•

23 years ago

Nominating.

Keywords: intl, nsbeta1

Assignee

Updated

•

23 years ago

Blocks: 157673

Status: NEW → ASSIGNED

Updated

•

23 years ago

QA Contact: ruixu → ylong

Assignee

Comment 2

•

23 years ago

*** This bug has been marked as a duplicate of 155569 ***

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

Comment 3

•

23 years ago

Mark as verified dup.

Status: RESOLVED → VERIFIED

Assignee

Comment 4

•

23 years ago

This still happens after fixing bug 162523. For this case, the document charset ("GB2312") is once set correctly by the patch of bug 162523. Howerver, later in the code, there is another NewURI call which passes null. So the charset is "UTF-8" instead of "GB23212" when we try to unescape the string. The code to set null for the charset is in HTTP code. I don't understand that code. cc to darin for a help, do you think we can get a document charset at that point? Here is a call stack when NewURI is called with null charset (by nsHttpChannel::ProcessRedirection). nsHttpHandler::NewURI(nsHttpHandler * const 0x0100f428, const nsACString & {...}, const char * 0x00000000, nsIURI * 0x00000000, nsIURI * * 0x0012fc68) line 1830 nsIOService::NewURI(nsIOService * const 0x0102e608, const nsACString & {...}, const char * 0x00000000, nsIURI * 0x00000000, nsIURI * * 0x0012fc68) line 747 + 39 bytes nsHttpChannel::ProcessRedirection(unsigned int 302) line 1474 + 93 bytes nsHttpChannel::ProcessResponse() line 544 + 12 bytes nsHttpChannel::OnStartRequest(nsHttpChannel * const 0x053651ac, nsIRequest * 0x05161d6c, nsISupports * 0x00000000) line 2875 + 11 bytes nsOnStartRequestEvent::HandleEvent() line 161 + 53 bytes nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x0475d364) line 116 PL_HandleEvent(PLEvent * 0x0475d364) line 596 + 10 bytes PL_ProcessPendingEvents(PLEventQueue * 0x00f574e0) line 526 + 9 bytes _md_EventReceiverProc(HWND__ * 0x003c05b2, unsigned int 49458, unsigned int 0, long 16086240) line 1077 + 9 bytes USER32! 77e148dc() USER32! 77e14aa7() USER32! 77e266fd() nsAppShellService::Run(nsAppShellService * const 0x01845490) line 452 main1(int 1, char * * 0x00277410, nsISupports * 0x00000000) line 1509 + 32 bytes main(int 1, char * * 0x00277410) line 1873 + 37 bytes mainCRTStartup() line 338 + 17 bytes Here is a URI spec (%B8%B4%BC%FE is two Chinese characters encoded in GB2312). "http://216.33.236.250/cgi-bin/saferd/%B8%B4%BC%FE.doc?_lang=CN&hm___tg=http%3a%2f%2f216%2e33%2e236%2e250%2fcgi%2dbin%2fgetmsg%2f%b8%b4%bc%fe%2edoc&hm___qs=curmbox%3dF000000001%26a%3d361df2fceaedfeae2c961053864f7f51%26msg%3dMSG1028763000%2e123"

Status: VERIFIED → REOPENED

Resolution: DUPLICATE → ---

Comment 5

•

23 years ago

ah, so what's happening then is that the server is redirecting the original http request to a new URL. the HTTP header containing the new URL doesn't give any information about the origin charset, so we just assume UTF-8. perhaps we should instead inherit the origin charset from the original URL. in most cases that should be okay i think. anyways, your new unescape function for URLs will detect mismatched charset and fallback on UTF-8, right?

Reporter

Comment 6

•

23 years ago

Test results for other two popular Chinese webmails: 1. sina.com.cn webmail: it uses a link containing raw native filename to download the attachment£¬so it's fixed by bug 155569. 2. sohu.com webmail: it also uses a link, but doesn't handle Chinese attachment filename properly in itself. On IE, it replaces any Chinese filename with "download" + extension. On Netscape, the filename is not shown at all. So their webmail service has some problems in Chinese attachment filename handling. Since we still have problems with Chinese Hotmail. It is possible we may have problems with other webmail services in China and other non-English spoken countries.

Keywords: topembed

Assignee

Comment 7

•

23 years ago

darin, the problem is that orginCharset is "UTF-8" but the actual URI is encoded with a doc charset of the original document (e.g. %B8%B4%BC%FE" in GB2312). The escape code fails to convert to UTF-8 for that case. Perhaps, we may do a fallback (to browser' default?) after the failure if the input charset is UTF-8.

Status: REOPENED → ASSIGNED

Assignee

Comment 8

•

23 years ago

>perhaps we >should instead inherit the origin charset from the original URL. If this is possible then we want to do that. darin, can this be done in necko?

Comment 9

•

23 years ago

yes, it is a very simple patch in necko... i'll put together a patch shortly.

Comment 10

•

23 years ago

Attached patch v1 patch (obsolete) — Details — Splinter Review

nhotta: can you review this patch and make sure it fixes the bug? feel free to reassign to me for checkin (if it indeed fixes the problem).

Assignee

Comment 11

•

23 years ago

I applied the patch and my debug build shows that the charset is set to "GB2312" in the redirect code also in the intl unescape function. Need testing on Chinese system.

Assignee

Comment 12

•

23 years ago

So the patch works fine but the save dialog still shows question mark (I'll attach the screen shot). Looks like still need to fix somewhere else (wiget?).

Assignee

Comment 13

•

23 years ago

Attached image Two Chinese characters "%B8%B4%BC%FE" turns to question marks. — Details

Assignee

Comment 14

•

23 years ago

Comment on attachment 95280 [details] [diff] [review] v1 patch r=nhotta we need this patch

Attachment #95280 - Flags: review+

Comment 15

•

23 years ago

Do we want to take these fixes (bug 161174 and bug 162523) for 1.0.1 if there's a respin? We definitely need it for 1.0.x embedded clients. We should update the summary because it probably affects more than just Hotmail Chinese. What's the more accurate summary?

Assignee

Comment 16

•

23 years ago

nsFilePicker takes PRUnichar*, but it is infrated. For 0xB8B4 0xBCFE, it is gettting 0x00B8, 0x00B4, 0x00BC, 0x00FE. That is caused by the line 912 below. I am not sure why the filename must be ASCII but removing this line fixes the problem. http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#909 909 if (iter != start) { // not empty 910 // ONLY if we got here, will we remember the suggested file name... 911 // The filename must be ASCII, see RFC 2183, section 2.3 912 CopyASCIItoUCS2(Substring(start, iter), mSuggestedFileName);

Assignee

Comment 17

•

23 years ago

http://www.ietf.org/rfc/rfc2183.txt > Current [RFC 2045] grammar restricts parameter values (and hence > Content-Disposition filenames) to US-ASCII. We recognize the great This is allowed by http://www.ietf.org/rfc/rfc2231.txt MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations So there is a way to encode a filename (to ASCII). And when we present the file name to the user we should show non ASCII too. I think the line to restrict the file name to ASCII is not necessary. > desirability of allowing arbitrary character sets in filenames, but > it is beyond the scope of this document to define the necessary > mechanisms. We expect that the basic [RFC 1521] `value' > specification will someday be amended to allow use of non-US-ASCII > characters, at which time the same mechanism should be used in the > Content-Disposition filename parameter.

Assignee

Comment 18

•

23 years ago

So, instead of converting as ASCII, we want to get a charset to apply the correct Unicode conversion. But not sure how do we get the charset when it is not specified by the header. Or can we use the file name already set in other parts in the code (the one we did in bug 155569)?

Boris Zbarsky [:bzbarsky]

Comment 19

•

23 years ago

If someone has bothered to specify a filename in content-disposition, we should absolutely use that instead of the filename from the URL. If the header does not specify a charset, we could go with the URL charset, perhaps?

Assignee

Comment 20

•

23 years ago

The header for the hotmail case is something like this. "attachment ;filename=¸´¼þ.doc" It is raw 8bit, no encoding is applied.

Boris Zbarsky [:bzbarsky]

Comment 21

•

23 years ago

Wonderful... just wonderful.... So we could guess at the encoding based on the url encoding or something. Or we could add an IsAscii() test to that substring before doing the CopyASCIIToUCS2. If it does not test as ascii, we don't do the copy. In either case, we should add support for the RFC 2231 format.

Assignee

Comment 22

•

23 years ago

I talked to bzbarsky. Here is a plan. Separate a bug for RFC2231 to handle legal headers. In a short term, we check a file name in the header, if that is 8 bit (which means illegal) then we use the file name specified in the URI instead. Later, when we have RFC2231 implemented (with charset conversion code) then we can leverage that for the raw 8bit case to use the original document charset to decode the header specified file name.

Assignee

Comment 23

•

23 years ago

Attached patch ignore http header's filename if it contains 8bit characters (obsolete) — Details — Splinter Review

Assignee

Comment 24

•

23 years ago

bzbarsky, could you review the last patch and the darin's patch too?

Assignee

Comment 25

•

23 years ago

filed bug 162765 for RFC2231 support

Updated

•

23 years ago

Attachment #95322 - Flags: review+

Comment 26

•

23 years ago

Comment on attachment 95322 [details] [diff] [review] ignore http header's filename if it contains 8bit characters r=ftang

Boris Zbarsky [:bzbarsky]

Comment 27

•

23 years ago

Comment on attachment 95322 [details] [diff] [review] ignore http header's filename if it contains 8bit characters + nsCAutoString newFileName(Substring(start, iter)); How about: nsACString& newFileName = Substring(start, iter); with the rest being exactly as you have it? This avoids a string copy. sr=bzbarsky with that change.

Attachment #95322 - Flags: superreview+

Boris Zbarsky [:bzbarsky]

Comment 28

•

23 years ago

Comment on attachment 95280 [details] [diff] [review] v1 patch sr=bzbarsky

Attachment #95280 - Flags: superreview+

Assignee

Comment 29

•

23 years ago

Attached patch Addess bzbarsky's comment and combined darin's patch. — Details — Splinter Review

Attachment #95280 - Attachment is obsolete: true

Attachment #95322 - Attachment is obsolete: true

Assignee

Comment 30

•

23 years ago

Comment on attachment 95329 [details] [diff] [review] Addess bzbarsky's comment and combined darin's patch. r=nhotta,ftang sr=bzbarsky

Attachment #95329 - Flags: superreview+

Attachment #95329 - Flags: review+

Assignee

Comment 31

•

23 years ago

checked in to the trunk, thanks everyone!

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

Jaime Rodriguez, Jr.

Updated

•

23 years ago

Severity: normal → major

Keywords: nsbeta1 → adt1.0.1, mozilla1.0.1, nsbeta1+

Priority: -- → P1

Whiteboard: [adt2 RTM] [Land on 1.0.1 post MachV]

Target Milestone: --- → mozilla1.0.1

Reporter

Comment 32

•

23 years ago

Verified as fixed on 08/15 trunk build.

Status: RESOLVED → VERIFIED

QA Contact: ylong → ji

Jaime Rodriguez, Jr.

Updated

•

22 years ago

Blocks: 154896

Whiteboard: [adt2 RTM] [Land on 1.0.1 post MachV] → [adt2 RTM] [Land on 1.0 branch post MachV]

Reporter

Comment 33

•

22 years ago

QA contact to Rui for the verification on the branch build. Thanks.

QA Contact: ji → ruixu

Comment 34

•

22 years ago

jaimejr, when should we land this into the m1.0 branch? how many approval we need to got post machv?

Jaime Rodriguez, Jr.

Comment 35

•

22 years ago

adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending Drivers' approval. pls check this in asap, the replace "Mozilla1.0.1+" with "fixed1.0.1". [Using adt1.0.1 keywords as a proxy, since no edt1.0.1 keywords were created for the 1.0.1 branch. This is needed on the 1.0 branch for a major embedor]. thanks!

Keywords: adt1.0.1 → adt1.0.1+, approval, mozilla1.2

Comment 36

•

22 years ago

Comment on attachment 95329 [details] [diff] [review] Addess bzbarsky's comment and combined darin's patch. a=chofmann for 1.0.2

Attachment #95329 - Flags: approval+

Comment 37

•

22 years ago

patch checked into 1.02 branch.

Comment 38

•

22 years ago

mark it as fixed1.0.2

Keywords: fixed1.0.2

Reporter

Comment 39

•

22 years ago

verified as fixed with 2002-08-28-08-1.0 build.

Keywords: fixed1.0.2 → verified1.0.2

Randell Jesup [:jesup] (needinfo me)

Updated

•

22 years ago

Keywords: mozilla1.0.1

Updated

•

22 years ago

Depends on: 180372

Updated

•

22 years ago

No longer blocks: 157673

You need to log in before you can comment on or make changes to this bug.