Bug 1071816 Comment 92 Edit History

> today it is well overdue to make such basics work out of the box

Doing this on the browser side would break existing content. If legacy content expects something unreasonable, reasonable behavior needs to be opt-in, which results in boilerplate for all new content. UTF-8 isn't the only case. Other obvious cases are standards mode and viewport behavior. So newly-authored HTML needs to start with `<!DOCTYPE html><meta charset="utf-8"><meta content="width=device-width, initial-scale=1" name="viewport">`. It's sad that new content bears this burden instead of old content, but that's how backward compatibility works.

If you are already at peace with putting `<!DOCTYPE html>` and `<meta content="width=device-width, initial-scale=1" name="viewport">` in your template, I suggest just treating `<meta charset="utf-8">` as yet another template bit and not trying to fight it.
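To make the "template bit" concrete, here is a minimal Python sketch, not anything from the bug itself. The names `BOILERPLATE`, `new_page`, and `declares_utf8_early` are hypothetical. It prepends the prefix quoted above and does a rough check that the declaration sits within the first 1024 bytes, which is where the HTML Standard requires it to be. The regex is only an approximation of the real encoding prescan.

```python
import re

# Hypothetical names for illustration; the prefix itself is the one quoted above.
BOILERPLATE = (
    '<!DOCTYPE html>'
    '<meta charset="utf-8">'
    '<meta content="width=device-width, initial-scale=1" name="viewport">'
)

# Rough approximation only: a real browser runs a more involved prescan.
_META_UTF8 = re.compile(rb'<meta\s+charset\s*=\s*["\']?utf-8["\']?', re.IGNORECASE)


def new_page(body: str) -> bytes:
    """Return a UTF-8 encoded page that starts with the boilerplate prefix."""
    return (BOILERPLATE + body).encode("utf-8")


def declares_utf8_early(document: bytes) -> bool:
    """Check whether a UTF-8 meta declaration appears in the first 1024 bytes."""
    return _META_UTF8.search(document[:1024]) is not None
```

A check like `declares_utf8_early` could run in a build step or template test, so a missing declaration is caught before deployment rather than showing up as mojibake.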

The `<meta charset="utf-8">` approach doesn't work for `text/plain`, which is something of an afterthought in the Web Platform compared to `text/html`. However, `text/plain` is also significantly less common than `text/html`, so it kinda works out in the aggregate even though it's annoying for people who actually do serve `text/plain`.

Web servers have tried to change their out-of-the-box experience. For example, it takes special effort to get nginx not to state any `charset` for `text/html`. This has its own set of problems when the Web server is upgraded without making the corresponding content changes. (Previously with Apache, similar issues led to browsers not trusting server-claimed `text/plain` to actually be text.)

In any case, this is off-topic for _this_ bug.

> A today's node.php developer doesn't even know what "encoding" is.

At least with Node or PHP they are in control of their HTTP headers.
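
For illustration, being in control of the headers means the application can state the encoding itself, which also covers `text/plain`. The comment mentions Node and PHP; the sketch below shows the same idea with Python's standard-library `http.server` instead. The names `content_type`, `Utf8PlainTextHandler`, and `serve`, and the port, are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def content_type(mime: str, charset: str = "utf-8") -> str:
    """Build a Content-Type header value that declares the charset explicitly."""
    return f"{mime}; charset={charset}"


class Utf8PlainTextHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves plain text with an explicit UTF-8 label."""

    def do_GET(self) -> None:
        body = "plain text, d\u00e9clar\u00e9 as UTF-8\n".encode("utf-8")
        self.send_response(200)
        # The header carries the encoding, so no in-document declaration is needed.
        self.send_header("Content-Type", content_type("text/plain"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch server locally; the port is an arbitrary assumption."""
    HTTPServer(("127.0.0.1", port), Utf8PlainTextHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/` should return the text with the charset stated in the `Content-Type` response header.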
