Bug 1071816 Comment 90 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(In reply to Adam Borowski from comment #89)
> You can't label a text file.

On common non-`file` transports you can, by appending `; charset=utf-8` to the `text/plain` type.

>  And it makes no sense to apply a different encoding to local files (which on most computers shipped today are exclusively UTF-8) than to the same files carried over the net.

It does when a file carried over the net loses its `Content-Type` header on being saved locally, and when sniffing for UTF-8 locally is feasible in a way that doesn't apply to network streams, given the Web's incremental rendering requirements.

> So if one of competing standards bodies declares it wants Windows-1252, what about having a config option WHATWGLY_CORRECT that defaults to off, and doing a sane thing otherwise?

Do I understand correctly that you'd want to assume UTF-8 for unlabeled content carried over the network and break unlabeled legacy content in order to give newly-authored content the convenience of not having to declare UTF-8?

(In reply to Adam Borowski from comment #89)
> You can't label a text file.

On common non-`file` transports you can, by appending `; charset=utf-8` to the `text/plain` type. (Also, the BOM is an option on any transport, but has other issues.)
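
To illustrate the two labeling options mentioned above, here is a minimal Python sketch, not browser code. The function name `declared_encoding` is hypothetical, and checking the BOM before the `Content-Type` parameter is an assumption made for this sketch. It only reads an explicit label and returns `None` for unlabeled content.

```python
import codecs
from email.message import Message
from typing import Optional


def declared_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the explicitly declared encoding, or None if the content is unlabeled.

    Hypothetical helper for illustration only.
    """
    # A UTF-8 BOM works on any transport, including local files.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # On HTTP-like transports the label travels in the Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        return msg.get_content_charset()  # None when no charset parameter is present
    return None
```

A file saved locally keeps its BOM, if it has one, but not its header, so `declared_encoding(None, body)` is all that is left to go on.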

>  And it makes no sense to apply a different encoding to local files (which on most computers shipped today are exclusively UTF-8) than to the same files carried over the net.

It does when a file carried over the net loses its `Content-Type` header on being saved locally, and when sniffing for UTF-8 locally is feasible in a way that doesn't apply to network streams, given the Web's incremental rendering requirements.
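
As a rough sketch of what sniffing for UTF-8 locally could look like, assuming the whole file is already available as bytes: the function name `sniff_local_encoding` and the `windows-1252` fallback are assumptions for this example, not a description of the actual implementation.

```python
def sniff_local_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the encoding of a complete local file.

    Hypothetical helper: treats the file as UTF-8 only if every byte
    sequence in it is valid UTF-8, otherwise returns the legacy fallback.
    """
    try:
        # Strict decoding needs the whole buffer, which a local file provides.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

The check depends on seeing every byte before deciding. A network stream that has to be rendered incrementally may deliver only ASCII in its first chunks, which tells you nothing about what comes later.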

> So if one of competing standards bodies declares it wants Windows-1252, what about having a config option WHATWGLY_CORRECT that defaults to off, and doing a sane thing otherwise?

Do I understand correctly that you'd want to assume UTF-8 for unlabeled content carried over the network and break unlabeled legacy content in order to give newly-authored content the convenience of not having to declare UTF-8?