(In reply to Adam Borowski from comment #89)
> You can't label a text file.

On common non-`file` transports you can: with `; charset=utf-8` appended to the `text/plain` type.

> And it makes no sense to apply a different encoding to local files (which on most computers shipped today are exclusively UTF-8) than to the same files carried over the net.

It does when a file carried over the net loses its `Content-Type` header when saved locally and sniffing for UTF-8 locally is feasible in a way that doesn't apply to network streams in the context of the Web's incremental rendering requirements.

> So if one of competing standards bodies declares it wants Windows-1252, what about having a config option WHATWGLY_CORRECT that defaults to off, and doing a sane thing otherwise?

Do I understand correctly that you'd want to assume UTF-8 for unlabeled content carried over the network and break unlabeled legacy content in order to give newly-authored content the convenience of not having to declare UTF-8?
Bug 1071816 Comment 90 Edit History
(In reply to Adam Borowski from comment #89)
> You can't label a text file.

On common non-`file` transports you can: with `; charset=utf-8` appended to the `text/plain` type. (Also, the BOM is an option on any transport, but has other issues.)

> And it makes no sense to apply a different encoding to local files (which on most computers shipped today are exclusively UTF-8) than to the same files carried over the net.

It does when a file carried over the net loses its `Content-Type` header when saved locally and sniffing for UTF-8 locally is feasible in a way that doesn't apply to network streams in the context of the Web's incremental rendering requirements.

> So if one of competing standards bodies declares it wants Windows-1252, what about having a config option WHATWGLY_CORRECT that defaults to off, and doing a sane thing otherwise?

Do I understand correctly that you'd want to assume UTF-8 for unlabeled content carried over the network and break unlabeled legacy content in order to give newly-authored content the convenience of not having to declare UTF-8?
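
To illustrate the local-versus-network difference, here is a minimal Python sketch. It is not how any browser implements detection. The function names and the 1024-byte prefix are assumptions made up for the example. With a local file, all the bytes are available, so the whole thing can be validated as UTF-8 before choosing an encoding. With a network stream that has to be rendered incrementally, only a prefix has arrived, and an ASCII-only prefix is consistent with both UTF-8 and windows-1252.

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Validate a complete byte sequence, as is possible for a local file."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def prefix_is_valid_utf8(chunk: bytes) -> bool:
    """Validate only the bytes seen so far, as with a network stream.

    final=False tolerates a multi-byte sequence cut off at the chunk end.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


# Two unlabeled documents whose first 1024 bytes are identical ASCII.
legacy = b"a" * 1024 + "caf\u00e9".encode("cp1252")  # windows-1252 content
modern = b"a" * 1024 + "caf\u00e9".encode("utf-8")

# With the whole file in hand, the two are distinguishable.
print(is_valid_utf8(modern), is_valid_utf8(legacy))

# With only the first 1024 bytes, both prefixes pass the UTF-8 check,
# so the prefix alone cannot tell the documents apart.
print(prefix_is_valid_utf8(modern[:1024]), prefix_is_valid_utf8(legacy[:1024]))
```

In the sketch, the prefix check passes for both documents, so a decision made at that point would be a guess. The whole-file check does not have that problem, but it requires waiting for the end of the data.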