Closed Bug 1156936 Opened 9 years ago Closed 9 years ago

Stop generating ugly ids that unnecessarily conform to HTML 4.01

Categories

(developer.mozilla.org Graveyard :: General, enhancement)

All
Other
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fs, Unassigned)

Details

(Whiteboard: [specification][type:change])

What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
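We are generating section ids that conform to HTML 4.01. Characters that do not conform are escaped as dot-prefixed hex bytes of their UTF-8 encoding (for example, "à" becomes ".C3.A0").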

HTML 4.01
> ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of
> letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

See https://github.com/mozilla/kuma/blob/master/kuma/wiki/tests/test_content.py#L390

def test_non_ascii_section_headers(self):
    headers = [
        (u'Documentation à propos de HTML',
         'Documentation_.C3.A0_propos_de_HTML'),
        (u'Outils facilitant le développement HTML',
         'Outils_facilitant_le_d.C3.A9veloppement_HTML'),
        (u'例:\u00a0スキューと平行移動',
         '.E4.BE.8B.3A_.E3.82.B9.E3.82.AD.E3.83.A5.E3.83.BC.E3.81.A8.E5.B9.B3.E8.A1.8C.E7.A7.BB.E5.8B.95'),
        (u'例:\u00a0回転',
         '.E4.BE.8B.3A_.E5.9B.9E.E8.BB.A2'),
        (u'Documentação',
         'Documenta.C3.A7.C3.A3o'),
    ]
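
For illustration, the escaping that these expected values imply can be reproduced with a short sketch. This is not kuma's actual code: the function name legacy_section_id is hypothetical, and the approach (whitespace to underscore, then percent-encoding with "%" swapped for ".") is an assumption inferred from the test data above.

```python
import re
from urllib.parse import quote


def legacy_section_id(title):
    """Hypothetical sketch of the HTML 4.01-style escaping seen in the test data."""
    # Assumption: each whitespace character, including U+00A0, becomes "_".
    underscored = re.sub(r'\s', '_', title)
    # Percent-encode every character outside the unreserved set as UTF-8 bytes
    # (safe='' also escapes "/"), then use "." in place of "%".
    return quote(underscored, safe='').replace('%', '.')


print(legacy_section_id('Documentação'))
print(legacy_section_id('例:\u00a0回転'))
```

Under these assumptions the two calls reproduce the ids from the last two entries of the test above.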


However, as of HTML5
> There are no other restrictions on what form an ID can take


We should still be doing some adjustments, though. Like spaces to underscore, for example.
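
A minimal sketch of what such a reduced adjustment could look like, assuming the only remaining rule is that an HTML5 id must be non-empty and must not contain whitespace. The function name html5_section_id is hypothetical, and whether runs of whitespace should collapse into a single underscore is left open here.

```python
import re


def html5_section_id(title):
    """Hypothetical sketch: keep the title as-is, only replacing whitespace."""
    # HTML5 forbids whitespace in ids; everything else is left untouched.
    section_id = re.sub(r'\s', '_', title.strip())
    if not section_id:
        raise ValueError('an id must contain at least one character')
    return section_id


print(html5_section_id('Панель HTML'))
print(html5_section_id('Interfaces/APIs/DOM'))
```

With this approach non-ASCII letters and characters such as "/" or ":" pass through unchanged, so the id stays readable in the URL.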

What problems would this solve?
===============================
Localizers are having a hard time linking to fragment ids of their documents:
https://developer.mozilla.org/ru/docs/Tools/Page_Inspector/UI_Tour#.D0.9F.D0.B0.D0.BD.D0.B5.D0.BB.D1.8C_HTML

We have to link to release notes like this, too:
https://developer.mozilla.org/en-US/Firefox/Releases/40#Interfaces.2FAPIs.2FDOM

Who would use this?
===================
Everyone.

What would users see?
=====================
Better fragment identifiers.

Like:

https://developer.mozilla.org/ru/docs/Tools/Page_Inspector/UI_Tour#Панель_HTML
and 
https://developer.mozilla.org/en-US/Firefox/Releases/40#Interfaces/APIs/DOM

What would users do? What would happen as a result?
===================================================
Readable URLs are a best practice.

Is there anything else we should know?
======================================
Existing deep links that use the escaped fragment identifiers will stop working. This affects links from inside MDN as well as links from external sites.
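
One way to soften that breakage would be to map an old escaped fragment back to its readable form, for example when rewriting internal links. The following is only a sketch under assumptions: decode_legacy_fragment is a hypothetical helper, not part of kuma, and the decoding is ambiguous for headings that really contain a period followed by two uppercase hex digits (".20" in a version number, say), so the result would need to be checked against the ids that actually exist on the page.

```python
import re

# One or more consecutive ".XX" groups, where XX is an uppercase hex byte.
_ESCAPED_RUN = re.compile(r'(?:\.[0-9A-F]{2})+')


def decode_legacy_fragment(fragment):
    """Hypothetical sketch: turn '.D0.9F...' style escapes back into text."""
    def _decode(match):
        raw = bytes.fromhex(match.group(0).replace('.', ''))
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            # Not valid UTF-8, so it was probably literal text; keep it.
            return match.group(0)

    return _ESCAPED_RUN.sub(_decode, fragment)


print(decode_legacy_fragment('.D0.9F.D0.B0.D0.BD.D0.B5.D0.BB.D1.8C_HTML'))
print(decode_legacy_fragment('Interfaces.2FAPIs.2FDOM'))
```

Applied to the two fragments quoted earlier in this report, the sketch yields the readable forms shown under "What would users see?".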

This was probably implemented for compliance with how MindTouch generated these fragment ids – the HTML 4 way.
:fscholz, can you dig into the implications of this change and make a more detailed suggestion (or PR) based on that analysis?
Severity: normal → enhancement
Flags: needinfo?(fscholz)
Interesting post:
https://mathiasbynens.be/notes/html5-id-class

Seems like MediaWiki has a similar ticket open:
https://phabricator.wikimedia.org/T59093
and also thoughts about special chars:
https://phabricator.wikimedia.org/T26918

I am trying to write a PR for this.
Flags: needinfo?(fscholz)
This was in fact done for compatibility with MindTouch, because that was how MindTouch handled non-ASCII characters, and at the time of the MindTouch-to-kuma migration it appeared we would need to preserve the ID-generation behavior of MindTouch to avoid breaking links to specific sections of MDN articles.

See bug 747403 and bug 776703 for the history.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard