Closed Bug 145828 Opened 22 years ago Closed 11 years ago

<a href="xxx" charset="xxx"> does not work (charset attribute ignored)

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

VERIFIED INVALID

People

(Reporter: teruko, Assigned: smontagu)

References

()

Details

(Keywords: intl)

<a href="xxx" charset="xxx"> does not work.

Steps to reproduce
1. Launch Netscape
2. Select Edit | Preferences to open the Preferences dialog
3. Click on Navigator -> Languages
4. Change the Default character coding to anything other than Western
(ISO-8859-1).  For this test, change it to Cyrillic (IBM-855)
5. Go to above url
6. Click on links

Expected result
1st link - Character coding menu should be marked as Western (ISO-8859-1) 

2nd link - Character coding menu should be marked as Western (ISO-8859-1) 

3rd link - Character coding menu should be marked as Japanese (EUC-JP)

Actual result
1st link - Character coding menu is marked as Japanese (EUC-JP) from HTTP 
Charset.

2nd link - Character coding menu is marked as Japanese (Shift_JIS) from Meta 
charset info.

3rd link - Character coding menu is marked as Cyrillic (IBM-855) from Default 
character coding.

Tested on the 5-13 rv:1.0rc2 Win32, Linux and Mac builds.  This happens in NS 6.2.
Keywords: intl
QA Contact: ruixu → teruko
I believe nhotta is more familiar with this issue than I am.
Assignee: yokoyama → nhotta
My Web site includes links to pages that use Thai Windows encoding but do not 
specify this.

I think the following HTML should force the page to open with the proper 
encoding:

<p><a href="http://www.thairath.co.th/" charset="windows-874">The ThaiRath 
Daily</a></p>

But it does not work.  In Mozilla 1.0 RC3 with Windows 95 I still have to use 
View > Character Coding to see Thai characters.

Alan Wood, 5th June 2002
Adding bz and myself. Is there a place in document loading of text/* docs where 
we can use fallback mechanisms like this before we go into fullblown sniffing/
guessing?
Sure there is.  See the charset handling in nsHTMLDocument::StartDocumentLoad --
we could hook that in that process.  The problem is that the url load needs to
pass the charset along until we get to that point.  This would mean that a
charset needs to get passed to nsDocShell::LoadInternal() somehow....
When the link is clicked, the charset attribute is not passed along. Set a
breakpoint in docshell/base/nsWebShell.cpp, at nsWebShell::OnLinkClickSync, and
you will see what I am talking about.

We can get the charset in that function using this code:
  nsCOMPtr<nsIDOMHTMLAnchorElement> anchor = do_QueryInterface(aContent);
  nsAutoString charset;
  if (anchor) {
    anchor->GetCharset(charset);
  }
and then pass this charset info in the URI. The URI structure and interface might
need to be changed to make this happen. It shouldn't be difficult when processing
the request.
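
The idea in the comment above (read the charset off the anchor, then carry it along with the load) can be sketched outside the Mozilla tree as follows. This is only a standalone illustration, not docshell code: `Anchor`, `LoadRequest`, and `MakeLoadRequest` are hypothetical names invented here, and an empty string stands in for "no charset attribute".

```cpp
#include <string>

// Hypothetical stand-in for an <a> element; not nsIDOMHTMLAnchorElement.
struct Anchor {
  std::string href;
  std::string charset;  // empty when the attribute is absent
};

// Hypothetical load request that carries the hint next to the URI,
// so the document loader can see it later.
struct LoadRequest {
  std::string uri;
  std::string charsetHint;  // empty means "no hint from the link"
};

// Copy the link's charset attribute (if any) into the load request.
LoadRequest MakeLoadRequest(const Anchor& anchor) {
  LoadRequest request;
  request.uri = anchor.href;
  request.charsetHint = anchor.charset;
  return request;
}
```

Keeping the hint in a separate field, rather than encoding it into the URI string, is just one possible design; the comment above leaves open how the URI structure or interface would actually change.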
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2alpha
Target Milestone: mozilla1.2alpha → ---
*** Bug 221917 has been marked as a duplicate of this bug. ***
Summary: <a href="xxx" charset="xxx"> does not work → <a href="xxx" charset="xxx"> does not work (charset attribute ignored)
Do we really want/need to support this? 
Why not?  We could use it as a hint, the way we do "type" (see bug 214626).

History already handles charsets correctly, so the patch should be a little
simpler...
This bug is still there.
At <http://clisp.cons.org/propaganda.html>, click on
"Here is a source file  with Japanese characters in it."
The <a href="fibjap.lisp" charset="utf-8">source file</a>
is displayed in the default encoding instead of utf-8.

Not supporting this makes it impossible to display
plain text pages in the specified encoding.
This attribute is there for a reason!
> not supporting this makes it impossible to display
> plain text pages in the specified encoding.

Not at all.  The server can easily send the right headers for these as needed...
How can the server know what the charset is?
If you request a plain text file, the server has no idea what the encoding is!
Not to mention that the same file might be viewable with different encodings.
E.g.:

<a href="foo" charset="ascii">see FOO as an ascii file</a> or as a
<a href="foo" charset="utf-8">Unicode</a> file

how can the server know the encoding with which "foo" has been requested?
(In reply to comment #11)
> how can the server know what the charset is?
> if you request a plain text file, the server has no idea what the encoding is!

cat > .htaccess
AddCharset utf-8 txt
^D

and http headers override charset attributes anyway
You assume that I have control over the server configuration.  I do not.
Besides, this does not help me specify different charsets for different
text file - or even the same text file in different links.

> and http headers override charset attributes anyway
why is that?!
> how can the server know the encoding with which "foo" has been requested?

The Accept-Charset header, for example?  That does require the UA to support the
charset attribute to that extent, of course.

> why is that?!

Because the HTML specification says so?

In any case, the point is that there are other solutions to the common problem
(sending text files with a given charset)... That doesn't mean this bug should
not be fixed, and if someone produces a patch as outlined in comment 8 and in the
bug that comment 8 references, that would be great.
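
For reference, the precedence mentioned above comes from HTML 4.01 (section 5.2.2): the HTTP Content-Type charset parameter first, then a META declaration, then the charset attribute on the element that designates the external resource. A minimal standalone sketch of that ordering follows. `CharsetSources` and `ResolveCharset` are hypothetical names, not Mozilla code, and falling back to the user's default character coding as the last step is an assumption made for this illustration.

```cpp
#include <string>

// Hypothetical inputs to charset selection for one document load.
// An empty string means that source did not supply a charset.
struct CharsetSources {
  std::string httpHeader;   // charset parameter of the HTTP Content-Type header
  std::string metaElement;  // charset from a META declaration in the document
  std::string linkHint;     // charset attribute on the referring <a> element
  std::string userDefault;  // default character coding (assumed final fallback)
};

// Return the first non-empty source in HTML 4.01 priority order.
std::string ResolveCharset(const CharsetSources& s) {
  if (!s.httpHeader.empty()) return s.httpHeader;
  if (!s.metaElement.empty()) return s.metaElement;
  if (!s.linkHint.empty()) return s.linkHint;
  return s.userDefault;
}
```

Under this ordering, the link hint only matters when neither the server nor the document declares a charset. Assuming the reporter's third test page declares none, a link hint of EUC-JP would win over the IBM-855 default.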
(In reply to comment #13)
> You assume that I have control over the server configuration.  I do not.
> Besides, this does not help me specify different charsets for different
> text file - or even the same text file in different links.

Well, in the case of Apache, you can also specify the charset on a per-file basis
with an .htaccess file. As long as your server uses Apache (clisp.cons.org does) and
your Apache admin didn't go to extra lengths to disable it (it's possible to
fine-tune what you can and can't do with an .htaccess file), you CAN change the
.htaccess file to specify the charset for each and every file *under your control*.
See http://www.w3.org/International/questions/qa-htaccess-charset

> <a href="foo" charset="ascii">see FOO as an ascii file</a> or as a
> <a href="foo" charset="utf-8">Unicode</a> file

What difference does it make, other than a possible change in fonts, if 'foo' is
indeed in US-ASCII?

Anyway, it'd be nice to fix this bug, but it's not a high priority (to me).

> 
> > and http headers override charset attributes anyway
> why is that?!

Assignee: nhottanscp → smontagu
QA Contact: teruko → i18n
Not a feature defined in the HTML Standard.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Status: RESOLVED → VERIFIED