Closed Bug 365926 Opened 18 years ago Closed 14 years ago

Attachments mis-served as utf-8

Categories

(Bugzilla :: Attachments & Requests, defect)

Version: 2.23.3
Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED
Bugzilla 3.6

People

(Reporter: smontagu, Assigned: mkanat)

Details

Attachments

(1 file)

HTML testcases in bugs are being served with a Content-Type header that says "text/html; charset=UTF-8", even if they have a different encoding specified in a <meta> charset, e.g. attachment 250472 in bug 365922.

Marking as major, because this makes QA much more time-consuming.
I suppose attachments should be sent without a charset, unless they specify some charset. I'm not quite sure how to do that, though, since we're serving them through a CGI.
OS: Linux → All
Hardware: PC → All
Target Milestone: --- → Bugzilla 3.0
Version: unspecified → 2.23.3
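
The suggestion in the comment above (send no charset unless the attachment's stored content type names one) can be sketched roughly as follows. This is a minimal Python illustration, not Bugzilla's Perl code; the helper name `content_type_header` is hypothetical, and it assumes the stored type is a plain "type/subtype; charset=..." string.

```python
from email.message import Message


def content_type_header(stored_type: str) -> str:
    """Build a Content-Type value, keeping a charset only if one was stored.

    `stored_type` is the MIME type recorded for the attachment, e.g.
    "text/html" or "text/html; charset=euc-jp". (Hypothetical helper.)
    """
    msg = Message()
    msg["Content-Type"] = stored_type
    mime = msg.get_content_type()        # e.g. "text/html"
    charset = msg.get_param("charset")   # None when no charset was given
    if charset:
        return f"{mime}; charset={charset}"
    # No charset stored: send the bare type and leave detection to the browser.
    return mime
```

With this sketch, "text/html" stays "text/html", while "text/html; charset=euc-jp" is passed through with its charset intact.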
With a bit of research to try to fix this, I found that in attachment.cgi, CGI::header seems to inherit the UTF-8 charset value from somewhere, but resetting the charset to an empty string with $cgi->header(-charset=>'') just causes the CGI module to use its more usual default, i.e. 'iso-8859-1'. There's no obvious method available to really remove it.

But whenever it's important to view the attachment with a specific encoding, entering the Content Type manually when creating the attachment and setting it to a value like "text/plain; charset=iso-8859-1" is a better solution than having bz send it without a specified charset and relying on the browser to do the right thing.

So I think the best solution might be to close this as invalid and/or find a place to document clearly what the right way to do that is.
BTW I've already made the change in the originally impacted bug, so that the correct charset will be used when that attachment is viewed. And I'm lowering the severity, though I'll let the component owners decide the final outcome of this bug.
Severity: major → minor
That's all very well for new attachments, but what about all the existing attachments that have a charset specified internally but no charset in the content-type? I have been editing the content-type as I come across them, but it's annoying. I'm sure there are also testcases attached to charset autodetection bugs that depend on not having a charset specified!
Severity: minor → major
Is this bug related to or a dupe of bug 226404?
Unrelated. That's specifically about patches.
Max, isn't this bug fixed by bug 408446 on tip?
(In reply to comment #8)
> Max, isn't this bug fixed by bug 408446 on tip?

  No, we still send the charset header, as far as I know.
This URL https://bugzilla.mozilla.org/attachment.cgi?id=306957 gets a "Content-Type: text/plain; name="patch359083.txt"; charset=UTF-8" header, causing the European umlauts to break.

Bug 226404 comment 1 states that multi-byte source code is used. Is this really an issue? Why aren't all patches delivered with the US-ASCII charset?
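
The kind of breakage described in this comment can be reproduced in a few lines. This is a small Python sketch that assumes the patch was saved as ISO-8859-1; the sample word is made up for illustration and is not taken from the attachment.

```python
# "für" stored as ISO-8859-1: the umlaut is the single byte 0xFC.
latin1_bytes = "f\u00fcr".encode("iso-8859-1")

# Decoding with the charset from the header (UTF-8) cannot interpret 0xFC,
# so the umlaut is replaced by U+FFFD, the replacement character.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the file was actually written in round-trips.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(repr(as_utf8))    # 'f\ufffdr'
print(repr(as_latin1))  # 'für'
```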
Severity: major → normal
The Bugzilla 3.0 branch is now locked to security bugs and dataloss fixes only. This bug doesn't fit into one of these two categories and is retargeted to 3.2 as part of a mass-change. To catch bugmails related to this mass-change, use lts081207 in your email client filter.
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
Depends on: 504104
I know how to fix this, but I first need the patch from bug 504104 for a complete fix. As it may be a bit invasive, I'm retargeting this bug to 3.6. For 3.4 and older, you should specify the charset together with the MIME type, e.g. "text/html; charset=euc-jp". The charset will then be passed to the browser and your HTML page will be displayed correctly.

In this bug, I will only focus on the automatic charset detection: if no charset is specified with the MIME type (as shown above), Bugzilla will call HTML::Parser to get the <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=euc-jp"> tag and extract the charset from it.
Assignee: attach-and-request → LpSolit
Target Milestone: Bugzilla 3.2 → Bugzilla 3.6
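
The automatic detection described in the comment above (read the charset from the <META HTTP-EQUIV="Content-Type"> tag when the MIME type carries none) might look roughly like this. The comment refers to Perl's HTML::Parser; the sketch below uses Python's standard `html.parser` instead, purely as an illustration. The names `MetaCharsetFinder` and `detect_meta_charset` are hypothetical, and handling of the shorter `<meta charset=...>` form is an addition not mentioned in the comment.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "charset" in attr:
            # <meta charset="..."> form (assumption: also worth honoring).
            self.charset = attr["charset"].strip() or None
        elif attr.get("http-equiv", "").lower() == "content-type":
            # content="text/html; charset=euc-jp": scan the parameters.
            for part in attr.get("content", "").split(";")[1:]:
                key, _, value = part.partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'") or None


def detect_meta_charset(data: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, or None if there is none."""
    finder = MetaCharsetFinder()
    # Latin-1 maps every byte to a character, so decoding never fails and
    # ASCII markup stays readable whatever the real encoding is.
    finder.feed(data.decode("latin-1"))
    finder.close()
    return finder.charset
```

A caller would only fall back to this when the stored MIME type has no charset of its own, so that an explicitly entered charset still wins.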
Bug formerly depended on bug 504104 which became a dupe of bug 477442.
Depends on: 477442
No longer depends on: 504104
Attached patch v1
This patch simply sends a blank charset with the file and lets the browser decide. (At least, I believe the browser will decide in this case.)
Assignee: LpSolit → mkanat
Status: NEW → ASSIGNED
Attachment #427547 - Flags: review?(LpSolit)
Attachment #427547 - Flags: review?(LpSolit) → review+
Comment on attachment 427547
v1

Seems to trigger the automatic detection of browsers. But must browsers are pretty bad at detecting the correct encoding, from what I could see. r=LpSolit
No longer depends on: 477442
Flags: approval3.6+
Flags: approval+
(In reply to comment #15)
> Seems to trigger the automatic detection of browsers. But must browsers are

s/must/most/
Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/trunk/
modified attachment.cgi
Committed revision 7083.

Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/3.6/
modified attachment.cgi
Committed revision 7046.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
See Also: → 1297243