Closed Bug 438265 Opened 16 years ago Closed 14 years ago

Auto-generated IDs (anchors) for headers don't support multi-byte characters

Categories
Product/Component: support.mozilla.org :: General
Type: defect
Priority: Not set
Severity: normal

Tracking: (Not tracked)
Status: RESOLVED INCOMPLETE
Target Milestone: Future

People
Reporter: bugzilla
Assignee: Unassigned


When headers are written with

!1st level header
!!2nd level header
!!!3rd level header

an id attribute is generated for the h1/h2/h3 tags, like:

<h1 class="showhide_heading" id="1st_level_header">
<h2 class="showhide_heading" id="2nd_level_header">
<h3 class="showhide_heading" id="3rd_level_header">

But if the header text is written in multi-byte characters (Japanese, Chinese, Korean, etc.), the header id becomes a poor, meaningless one like:
id="_", id="__2", id="__3", ... , id="__n"
If the text starts with ASCII but contains multi-byte characters, the multi-byte characters are trimmed and only the leading ASCII part (followed by an underscore) becomes the id.

Examples of poor ids on a Japanese-translated in-product help page:
http://support.mozilla.com/ja/kb/Options+Window#_
http://support.mozilla.com/ja/kb/Options+Window#__2
http://support.mozilla.com/ja/kb/Options+Window#JavaScript_
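The ids above are consistent with a rule that collapses every run of characters outside ASCII [A-Za-z0-9] into a single underscore and appends a counter to duplicates. SUMO's actual code for this is not shown in this bug, so the following is only a hypothetical Python reconstruction of the observed behavior; `ascii_only_anchor` and the `seen` counter are illustrative names, not SUMO identifiers.

```python
import re


def ascii_only_anchor(title: str, seen: dict) -> str:
    # Hypothetical reconstruction: each run of characters outside the
    # ASCII class [A-Za-z0-9] collapses into one underscore, so a heading
    # written entirely in Japanese reduces to just "_".
    base = re.sub(r"[^A-Za-z0-9]+", "_", title)
    # Hypothetical de-duplication: later copies of the same id get "_<n>",
    # which turns a second "_" into "__2", a third into "__3", and so on.
    n = seen.get(base, 0) + 1
    seen[base] = n
    return base if n == 1 else f"{base}_{n}"


seen_ids: dict = {}
for heading in ["1st level header", "一般設定", "プライバシー", "JavaScript の設定"]:
    print(ascii_only_anchor(heading, seen_ids))
```

Every distinct Japanese heading collapses to the same base id, so only the position counter tells them apart. Any link that uses such an anchor breaks as soon as headings are added or reordered.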

The TOC works correctly within the page, but we cannot use ((page name|#anchor_id)) until this bug is fixed, and other sites cannot link to these broken anchors either.
You can always create extra (manual) anchors for headings. We do this for in-product help articles:

{ANAME()}navigating_web_pages{ANAME}
!Navigating web pages

Of course, we can use that workaround when we need to link to an anchor on the page from another SUMO page.
But it only works when the linking page is on SUMO. Anyone else who wants to link to part of a SUMO page cannot use a proper anchor (unless they register, learn how to edit the page, contribute the anchor, wait for review...).

This may not be critical right now (it doesn't need to be fixed before the Fx3 release), but the later this bug is fixed, the more pages will be created with links to poor anchors. IMHO this should be fixed as soon as possible.
# I know there are other, more important problems, and I'm not saying this must be fixed right now. I just mean we should keep this bug in mind.
Severity: major → normal
Target Milestone: --- → 0.7
Target Milestone: 0.7 → 0.8
Target Milestone: 0.8 → 0.9
clipped from IRC: not sure if this is the same bug...

tomer_: I understand that sumo/tikiwiki has very bad UTF8 support for anchors. For example, see the internal links in the following document - http://support.mozilla.com/he/kb/new page template
tomer_: It should also convert %20 (space) to underscore as mediawiki does, in order not to break links

We need to check what has been fixed upstream in the latest TikiWiki, what has not been fixed, and address this as we merge stuff.
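One way to keep non-ASCII headings linkable, in the spirit of tomer_'s suggestion, is to turn whitespace into underscores as MediaWiki does and then escape the remaining UTF-8 bytes. This sketch is an assumption, not SUMO or TikiWiki code. `readable_anchor` is a hypothetical name. It replaces `%` with `.`, similar to the dot-escaped section anchors older MediaWiki versions produced, so that every character stays within the HTML 4.01 ID rules. The `x` prefix for ids that don't start with a letter is an arbitrary choice.

```python
from urllib.parse import quote


def readable_anchor(title: str) -> str:
    # Whitespace runs become single underscores (MediaWiki-like; simplified).
    underscored = "_".join(title.split())
    # Percent-encode the UTF-8 bytes, leaving HTML 4.01 name characters alone.
    encoded = quote(underscored, safe="-_.:")
    # Swap '%' for '.' so the id uses only HTML 4.01 ID characters.
    # quote() never escapes '~', so escape it by hand.
    dotted = encoded.replace("%", ".").replace("~", ".7E")
    # HTML 4.01 ids must start with a letter; the 'x' prefix is hypothetical.
    if not (dotted[:1].isascii() and dotted[:1].isalpha()):
        dotted = "x" + dotted
    return dotted


print(readable_anchor("Navigating web pages"))
print(readable_anchor("JavaScript の設定"))
```

The ids stay deterministic and distinct per heading, unlike `_`/`__2`. The trade-off is that they are only partly human-readable, and a title that already contains text like `.E3` could collide with an escaped one.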
Assignee: nobody → nelson
Target Milestone: 0.9 → 1.0
The current tikilib.php in TikiWiki has:
// create stable anchors for all headers
// use header but replace non-word character sequences
// with one underscore (for XHTML 1.0 compliance)
// Workaround pb with plugin replacement and header id
//  first we remove hash from title_text for headings beginning
//  with images and HTML tags
$thisid = ereg_replace('§[a-z0-9]{32}§', '', $title_text);
$thisid = ereg_replace('</?[^>]+>', '', $thisid);
$thisid = ereg_replace('[^a-zA-Z0-9\:\.\-\_]+', '_', $thisid);
$thisid = ereg_replace('^[^a-zA-Z]*', '', $thisid);
if (empty($thisid)) $thisid = 'a'.md5($title_text);

in place of SUMO's line 6474.

At least the resulting id then conforms to the HTML standard:
http://www.w3.org/TR/html401/types.html#type-name

Admittedly, for Japanese it will not be friendly: the id becomes an MD5 hash of the title.
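To see what the quoted TikiWiki steps produce for different headings, here is a rough Python approximation of those four `ereg_replace` calls plus the MD5 fallback. It uses Python's `re` rather than POSIX `ereg`, so bracket-escaping details may differ slightly. It also hashes the title as UTF-8, which assumes the page text is UTF-8. `tikiwiki_style_anchor` is an illustrative name, not a TikiWiki function.

```python
import hashlib
import re


def tikiwiki_style_anchor(title_text: str) -> str:
    # Step 1: drop plugin placeholders (32 hex-like chars wrapped in '§').
    thisid = re.sub(r"§[a-z0-9]{32}§", "", title_text)
    # Step 2: strip HTML tags.
    thisid = re.sub(r"</?[^>]+>", "", thisid)
    # Step 3: collapse runs of characters outside [A-Za-z0-9:.-_] into '_'.
    thisid = re.sub(r"[^a-zA-Z0-9:.\-_]+", "_", thisid)
    # Step 4: remove everything before the first ASCII letter.
    thisid = re.sub(r"^[^a-zA-Z]*", "", thisid)
    # Fallback: 'a' + MD5 hex digest, so the id still starts with a letter.
    if not thisid:
        thisid = "a" + hashlib.md5(title_text.encode("utf-8")).hexdigest()
    return thisid


for heading in ["Navigating web pages", "JavaScript の設定", "一般設定"]:
    print(tikiwiki_style_anchor(heading))
```

A purely Japanese heading yields a stable, standards-valid id such as `a` followed by 32 hex digits. It no longer depends on heading order the way `__2` does, but it means nothing to a reader.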
This bug makes anchors look ugly, and I agree 100% with dynamis that this should be fixed. Right now we don't have the dev resources to look at this, since it's not a critical bug. 

-> Future
Assignee: nelson → nobody
Target Milestone: 1.0 → Future
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE