kuma: Refine section ID generation to better match MindTouch anchor links



Mozilla Developer Network
6 years ago
4 years ago


(Reporter: lorchard, Assigned: ubernostrum)



(Whiteboard: u=contributor c=wiki s=2012-06-19 p=2 t=2012-06-26)



6 years ago
So, as part of my mega-pull-request check-in madness, I slipped in changes to how section IDs get injected into documents:


Before this, all IDs looked something like `sect1`, `sect2`, etc. But, that slightly breaks links like this:


And, since I found anchor links being used between pages, that seemed pretty important. Thus, I made changes to try to match MindTouch's ID generation.

But, it's not great, and could use some refinement in areas such as:

* IDs can be made non-unique if two headers have the same name

* Headers with non-ASCII characters might cause an issue in slugification. More testing would be useful there.

* I only looked at a handful of MindTouch header IDs. It would be useful to get a survey of more IDs from the wild, particularly from non-EN pages, and make sure Kuma's IDs match those.
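
To make the first point concrete, here is a minimal sketch of one way duplicate section IDs could be disambiguated. It is not what Kuma or MindTouch actually does. The function name `uniquify_ids` and the `_2`, `_3` suffix scheme are assumptions made up for illustration.

```python
def uniquify_ids(ids):
    """Return a list of IDs in which every entry is unique.

    Hypothetical helper: the numeric-suffix scheme is an assumption,
    not MindTouch's or Kuma's documented behavior.
    """
    used = set()
    result = []
    for sid in ids:
        candidate = sid
        n = 2
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            candidate = "%s_%d" % (sid, n)
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(uniquify_ids(["Syntax", "Examples", "Syntax"]))
```

The first header with a given name keeps its plain ID, so existing links to it would still resolve. Only later duplicates get a suffix.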


6 years ago
Whiteboard: s=2012-06-05


6 years ago
Priority: -- → P1
Sheppy and/or Jean-Yves, can you give us good examples of links to MDN *sections* so we can make sure we're matching existing links to sections?
Whiteboard: s=2012-06-05 → u=contributor c=wiki s=2012-06-05 p=2
I really don't know what this bug is about, exactly. Jean-Yves?

Comment 3

6 years ago
In MindTouch, when an article is broken up into sections by headers, the headers also generate anchor link IDs. So, you can directly link to a section within an article, like this:


The hash part is derived from the text of the header for which the link is created. We have something equivalent in Kuma, but I'm not sure if it's 100% the same as what MindTouch does.

So, some of the same headers might end up with slightly different anchor link IDs. And so, within-page links will drop the user at the top of the page rather than the section to which they were meant to be linked.

Comment 4

6 years ago
And, as I describe this, it prompts me to say that if we don't get this 100% right the worst case will be a few links that dump the user to the top of an article - and not a 404 Page Not Found. 

We'll probably never get it 100% right, so the goal here I guess is to get a little better than what's there and at least make sure it doesn't do the completely wrong thing for non-English pages. I suspect the encoding there might be different between MindTouch and Kuma for ID-ifying section titles with non-ASCII characters.
I think we should solve this problem in a more generic way (post-July). We have to think about what our anchor links will look like and how we can make them permanent (or find an equivalent process that keeps pages coherent). We must not break inter-page links when we change the title of a section.

For July, we should have a list of all inter-page links with anchors and see how to fix them. If there aren't too many, we could even do it by hand.

If this looks sensible, I'll create a few bugs for post-launch management (post-July).
Assignee: nobody → lcrouch
Releasing back into the wild so I'm not blocking this. I'm working on PRs and release notes.
Assignee: lcrouch → nobody
Whiteboard: u=contributor c=wiki s=2012-06-05 p=2 → u=contributor c=wiki s=2012-06-19 p=2


6 years ago
Assignee: nobody → jbennett

Comment 7

6 years ago
Commits pushed to master at https://github.com/mozilla/kuma

Bug 747403 -- refine section ID generation

This is a first step, namely matching MindTouch behavior for sections
whose names contain non-ASCII characters. We now generate IDs in a
similar way: any section name which contains only ASCII content merely
has spaces replaced with underscores. A name which contains non-ASCII
characters has each such character replaced by hexadecimal digits
representing the appropriate UTF-8 codepoint(s), with each set of
digits preceded by a dot.

The test cases are a sampling of non-ASCII and mixed-character-set
section names and the slugs MindTouch generates for them.

This does not guarantee absolute parity with MindTouch, but probably
gets us close enough. It also does not deal with the problem of a
document in which not all section names are unique, but so far as I
can tell this does not introduce any new problems, merely perpetuates
an old one, assuming any such documents exist.

Merge pull request #283 from ubernostrum/section-ids-747403

Bug 747403 -- refine section ID generation
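
The ID scheme described in the commit message above can be sketched roughly as follows. This is not the code from the pull request. The function name `section_id` is hypothetical, and two details are assumptions: that each UTF-8 byte gets its own dot-prefixed pair of hex digits, and that the hex digits are uppercase.

```python
def section_id(title):
    """Build a MindTouch-style anchor ID from a section title.

    Sketch only. Assumes one ".XX" group per UTF-8 byte with
    uppercase hex digits; the commit message does not spell out
    either detail.
    """
    parts = []
    for ch in title:
        if ch == " ":
            # Spaces become underscores.
            parts.append("_")
        elif ord(ch) < 128:
            # Other ASCII characters pass through unchanged.
            parts.append(ch)
        else:
            # Non-ASCII: dot-prefixed hex for each UTF-8 byte.
            parts.extend(".%02X" % b for b in ch.encode("utf-8"))
    return "".join(parts)


print(section_id("Browser compatibility"))
print(section_id("Résumé"))
```

Under those assumptions an ASCII-only title only has its spaces swapped for underscores, and "é" (UTF-8 bytes C3 A9) comes out as `.C3.A9`.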
Another bug to keep in mind when doing the section ID generation:



6 years ago
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: u=contributor c=wiki s=2012-06-19 p=2 → u=contributor c=wiki s=2012-06-19 p=2 t=2012-06-26
Version: Kuma → unspecified
Component: Docs Platform → Editing
Product: Mozilla Developer Network → Mozilla Developer Network