URLs should be normalized (Unicode NFKC)
Categories
(developer.mozilla.org Graveyard :: Wiki pages, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: jwhitlock, Assigned: jwhitlock)
References
Details
(Keywords: in-triage, Whiteboard: [specification][type:bug][points=3])
Comment 1 • 7 years ago (Assignee)
Here's some code to detect the problem:
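import unicodedata
import urllib

# Python 2; assumes a Kuma Django shell where Document is already imported.
for doc in Document.objects.filter_for_list().order_by('id'):
    # Undo percent-encoding, normalize to NFC, then re-encode for comparison.
    unquote_slug = urllib.unquote(doc.slug)
    normslug = unicodedata.normalize('NFC', unquote_slug)
    encode_normslug = normslug.encode('utf8')
    quote_slug = urllib.quote(encode_normslug)
    # Report slugs that match neither the normalized nor the re-quoted form.
    if normslug != doc.slug and quote_slug != doc.slug:
        full = doc.get_full_url()
        print "* %s: %s" % (doc.id, full)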
Here are the 24 results:
- 3863: https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML5_%25E8%25A1%25A8%25E5%2596%25AE
- 7844: https://developer.mozilla.org/ko/docs/Web/HTML/HTML%25EC%2597%2590%25EC%2584%259C_%25ED%258F%25BC
- 8084: https://developer.mozilla.org/es/docs/Web/HTML/Consejos_para_la_creaci%25C3%25B3n_de_p%25C3%25A1ginas_HTML_de_carga_r%25C3%25A1pida
- 32894: https://developer.mozilla.org/es/docs/Web/HTML/Gesti%25C3%25B3n_del_foco_en_HTML
- 65417: https://developer.mozilla.org/ru/docs/Web/HTML/%25D0%2598%25D1%2581%25D0%25BF%25D0%25BE%25D0%25BB%25D1%258C%25D0%25B7%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D0%25B5_HTML5_audio_and_video
- 65755: https://developer.mozilla.org/bn-BD/docs/Apps/%E0%A6%85%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%AA_%E0%A6%AE%E0%A6%BE%E0%A6%A8%E0%A7%8B%E0%A6%A8%E0%A7%8D%E0%A6%A8%E0%A7%9F%E0%A6%A8%E0%A6%95%E0%A6%BE%E0%A6%B0%E0%A7%80_%E0%A6%9F%E0%A7%81%E0%A6%B2%E0%A6%B8%E0%A6%B8%E0%A6%AE%E0%A7%82%E0%A6%B9
- 65757: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8_%E0%A6%93%E0%A6%8F%E0%A6%B8_%E0%A6%B8%E0%A6%BF%E0%A6%AE%E0%A7%81%E0%A6%B2%E0%A7%87%E0%A6%9F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0_%E0%A6%95%E0%A6%B0%E0%A6%BE
- 79903: https://developer.mozilla.org/bn-BD/docs/%E0%A6%93%E0%A7%9F%E0%A7%87%E0%A6%AC%E0%A6%8F%E0%A6%AA%E0%A6%BF%E0%A6%86%E0%A6%87
- 87507: https://developer.mozilla.org/bn-BD/docs/Mozilla/%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8/%E0%A6%B0%E0%A6%BF%E0%A6%B2%E0%A6%BF%E0%A6%9C%E0%A6%B8
- 87533: https://developer.mozilla.org/bn-BD/docs/Web/Guide/CSS/%E0%A6%B8%E0%A6%BF%E0%A6%8F%E0%A6%B8%E0%A6%8F%E0%A6%B8_%E0%A6%97%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%A1%E0%A6%BF%E0%A7%9F%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%A8%E0%A7%8D%E0%A6%9F_%E0%A6%8F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0
- 88203: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Quickstart/%E0%A6%93%E0%A6%AA%E0%A7%87%E0%A6%A8_%E0%A6%93%E0%A7%9F%E0%A7%87%E0%A6%AC_%E0%A6%85%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%AA_%E0%A6%AA%E0%A6%B0%E0%A6%BF%E0%A6%9A%E0%A6%BF%E0%A6%A4%E0%A6%BF
- 91053: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE
- 100515: https://developer.mozilla.org/ru/docs/Web/HTML/%25D0%2598%25D1%2581%25D0%25BF%25D0%25BE%25D0%25BB%25D1%258C%25D0%25B7%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D0%25B5_%25D0%25BA%25D1%258D%25D1%2588%25D0%25B8%25D1%2580%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D1%258F_%25D0%25BF%25D1%2580%25D0%25B8%25D0%25BB%25D0%25BE%25D0%25B6%25D0%25B5%25D0%25BD%25D0%25B8%25D0%25B9
- 107519: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps
- 108085: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Introduction_to_Gaia
- 108923: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Video
- 111879: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Weinre_As_Remote_Debugger
- 124063: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Settings
- 124435: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Window_Management
- 125375: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/System
- 125427: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/LockScreen_Architecture_(v1.5_)
- 127857: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Browser
- 146589: https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML_%25E5%2585%2583%25E7%25B4%25A0/meter
- 222920: https://developer.mozilla.org/ml/docs/Tools/Page_Inspector/How_to/%E0%B4%9A%E0%B4%9F%E0%B5%8D%E0%B4%9F%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%82%E0%B4%9F%E0%B5%8D_%E0%B4%B0%E0%B5%87%E0%B4%96%E0%B4%BE%E0%B4%9A%E0%B4%BF%E0%B4%A4%E0%B5%8D%E0%B4%B0%E0%B4%82_%E0%B4%AA%E0%B4%B0%E0%B4%BF%E0%B4%B6%E0%B5%87%E0%B4%BE%E0%B4%A7%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%81%E0%B4%95
Updated • 7 years ago
Comment 2 • 6 years ago (Assignee)
When "%25" appears in the URL, the slug was double-encoded. For document 3863, the slug was Web/HTML/HTML5_%E8%A1%A8%E5%96%AE. When this is URL-encoded, each % is converted to %25, so the URL is https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML5_%25E8%25A1%25A8%25E5%2596%25AE.
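The double-encoding described above can be reproduced in a few lines. This is a minimal Python 3 sketch (the snippets in this bug are Python 2); the string below is the decoded form of the last segment of document 3863's slug, and the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# Decoded form of the last path segment of document 3863's slug.
title = "HTML5_表單"

# Percent-encoding once gives the segment as it appeared in the slug.
encoded_once = quote(title)
# Encoding the already-encoded text turns every "%" into "%25".
encoded_twice = quote(encoded_once)

print(encoded_once)
print(encoded_twice)

# Unquoting twice recovers the original text.
round_trip = unquote(unquote(encoded_twice))
```

The second value matches the last segment of the broken URL, which is why "%25" is a useful marker for this class of problem.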
The solution is to manually unquote the slug:
from urllib import unquote

doc = Document.objects.get(id=3863)
# Turn each %25 back into %, then decode the percent-escapes.
doc.current_revision.slug = unquote(doc.current_revision.slug.replace('%25', '%'))
doc.current_revision.save()
This updates the URL to https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML5_%E8%A1%A8%E5%96%AE.
This is an old translation, and the new version of the page is translated at:
https://developer.mozilla.org/zh-TW/docs/Learn/HTML/Forms
I updated the now-working page to be a redirect to that page.
Here's an auto-fixer:
import urllib

for doc in Document.objects.filter(slug__contains='%').order_by('id'):
    old_url = doc.get_full_url()
    old_slug = doc.slug
    # Collapse double-encoding (%25 -> %), then decode the escapes.
    doc.current_revision.slug = urllib.unquote(old_slug.replace('%25', '%'))
    doc.current_revision.save()
    # Reload the document to get its updated URL.
    new_doc = Document.objects.get(id=doc.id)
    new_url = new_doc.get_full_url()
    print "* %s: %s -> %s" % (doc.id, old_url, new_url)
This could break a page that legitimately uses a percent sign (%) in its slug, but I could find none. This suggests that % should be forbidden in slugs to avoid this issue in the future.
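One way to separate slugs that contain percent-escapes from slugs that contain a literal percent sign is a small regular-expression check. This is a Python 3 sketch; the helper names are hypothetical and are not part of Kuma.

```python
import re

# Matches one percent-escape such as %3F or %e8.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def has_percent_escape(slug):
    """Return True if the slug contains at least one %XX escape."""
    return PERCENT_ESCAPE.search(slug) is not None


def has_literal_percent(slug):
    """Return True if a % remains after removing all %XX escapes."""
    return "%" in PERCENT_ESCAPE.sub("", slug)


for slug in ["Why_use_CSS%253f", "100%_width", "Web/HTML"]:
    print(slug, has_percent_escape(slug), has_literal_percent(slug))
```

A slug with an escape is a candidate for unquoting, while a slug with a literal percent sign would need manual review before any automatic change.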
Converted pages:
- 1222: https://developer.mozilla.org/en-US/docs/User:%22_%253E%253Cinput%253Exxxxx -> https://developer.mozilla.org/en-US/docs/User:%22_%3E%3Cinput%3Exxxxx
- 3429: https://developer.mozilla.org/fr/docs/Web/CSS/:after_%257C_::after -> https://developer.mozilla.org/fr/docs/Web/CSS/:after_%7C_::after
- 3472: https://developer.mozilla.org/en-US/docs/User:te%22%253E%253Ch1%253E%3F%3F%253C -> https://developer.mozilla.org/en-US/docs/User:te%22%3E%3Ch1%3E%3F%3F%3C
- 3473: https://developer.mozilla.org/en-US/docs/User:te%22%253E%253Ch1%253E%3F%3F%253C/h1%253Est123bbb -> https://developer.mozilla.org/en-US/docs/User:te%22%3E%3Ch1%3E%3F%3F%3C/h1%3Est123bbb
- 3679: https://developer.mozilla.org/en-US/docs/%257B%257Bdomxref(%22/TouchList.item -> https://developer.mozilla.org/en-US/docs/%7B%7Bdomxref(%22/TouchList.item
- 3680: https://developer.mozilla.org/en-US/docs/%257B%257Bdomxref(%22/Input.setSelectionRange -> https://developer.mozilla.org/en-US/docs/%7B%7Bdomxref(%22/Input.setSelectionRange
- 4897: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/JavaScript_-_Array%2523splice -> https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/JavaScript_-_Array%23splice
- 6927: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Proxy%257B%257Btemplate(%22Non-standard_heade -> https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Proxy%7B%7Btemplate(%22Non-standard_heade
- 7315: https://developer.mozilla.org/en-US/docs/User:123'%22%253E%253C -> https://developer.mozilla.org/en-US/docs/User:123'%22%3E%3C
- 7844: https://developer.mozilla.org/ko/docs/Web/HTML/HTML%25EC%2597%2590%25EC%2584%259C_%25ED%258F%25BC -> https://developer.mozilla.org/ko/docs/Web/HTML/HTML%C3%AC%C2%97%C2%90%C3%AC%C2%84%C2%9C_%C3%AD%C2%8F%C2%BC
- 8084: https://developer.mozilla.org/es/docs/Web/HTML/Consejos_para_la_creaci%25C3%25B3n_de_p%25C3%25A1ginas_HTML_de_carga_r%25C3%25A1pida -> https://developer.mozilla.org/es/docs/Web/HTML/Consejos_para_la_creaci%C3%83%C2%B3n_de_p%C3%83%C2%A1ginas_HTML_de_carga_r%C3%83%C2%A1pida
- 8692: https://developer.mozilla.org/ja/docs/Web/CSS/border-top-left-radius_%257C_-moz-border-radius-topleft -> https://developer.mozilla.org/ja/docs/Web/CSS/border-top-left-radius_%7C_-moz-border-radius-topleft
- 11077: https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Getting_started/Why_use_CSS%253f -> https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Getting_started/Why_use_CSS%3F
- 11667: https://developer.mozilla.org/en-US/docs/User:ryan%252Blinefeed -> https://developer.mozilla.org/en-US/docs/User:ryan+linefeed
- 13030: https://developer.mozilla.org/fr/docs/Web/HTML/Element/Output_%257B%257BHTMLVersionInline(5)%257D%257D_%257B%257Bfx_minversion_inline(4)%257D%257D -> https://developer.mozilla.org/fr/docs/Web/HTML/Element/Output_%7B%7BHTMLVersionInline(5)%7D%7D_%7B%7Bfx_minversion_inline(4)%7D%7D
- 13576: https://developer.mozilla.org/en-US/docs/User:user01/page02%2523hello_world -> https://developer.mozilla.org/en-US/docs/User:user01/page02%23hello_world
- 14732: https://developer.mozilla.org/ja/docs/Jetpack/UI/%257B%257Bwiki.template('%E7%BF%BB%E8%A8%B3%E4%B8%AD')%257D%257D -> https://developer.mozilla.org/ja/docs/Jetpack/UI/%7B%7Bwiki.template('%E7%BF%BB%E8%A8%B3%E4%B8%AD')%7D%7D
- 15013: https://developer.mozilla.org/en-US/docs/CSS/:%255BProperty_Name%255D/:-moz-locale-dir(rtl) -> https://developer.mozilla.org/en-US/docs/CSS/:%5BProperty_Name%5D/:-moz-locale-dir(rtl)
- 17662: https://developer.mozilla.org/en-US/docs/Talk:Accessibility/Implementing_an_MSAA_Server_%257BTalk%257D -> https://developer.mozilla.org/en-US/docs/Talk:Accessibility/Implementing_an_MSAA_Server_%7BTalk%7D
- 20311: https://developer.mozilla.org/en-US/docs/User:Brigettek/C%252B%252B_test -> https://developer.mozilla.org/en-US/docs/User:Brigettek/C++_test
- 32894: https://developer.mozilla.org/es/docs/Web/HTML/Gesti%25C3%25B3n_del_foco_en_HTML -> https://developer.mozilla.org/es/docs/Web/HTML/Gesti%C3%83%C2%B3n_del_foco_en_HTML
- 47468: https://developer.mozilla.org/en-US/docs/User:%22/%22%253E%253Cimg_src=x_onerror=prompt(1);%253E -> https://developer.mozilla.org/en-US/docs/User:%22/%22%3E%3Cimg_src=x_onerror=prompt(1);%3E
- 65417: https://developer.mozilla.org/ru/docs/Web/HTML/%25D0%2598%25D1%2581%25D0%25BF%25D0%25BE%25D0%25BB%25D1%258C%25D0%25B7%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D0%25B5_HTML5_audio_and_video -> https://developer.mozilla.org/ru/docs/Web/HTML/%C3%90%C2%98%C3%91%C2%81%C3%90%C2%BF%C3%90%C2%BE%C3%90%C2%BB%C3%91%C2%8C%C3%90%C2%B7%C3%90%C2%BE%C3%90%C2%B2%C3%90%C2%B0%C3%90%C2%BD%C3%90%C2%B8%C3%90%C2%B5_HTML5_audio_and_video
- 67851: https://developer.mozilla.org/ja/docs/Web/Guide/CSS/Getting_started/Why_use_CSS%253f -> https://developer.mozilla.org/ja/docs/Web/Guide/CSS/Getting_started/Why_use_CSS%3F
- 69871: https://developer.mozilla.org/ja/docs/CSS/border-top-left-radius_%257C_-moz-border-radius-topleft -> https://developer.mozilla.org/ja/docs/CSS/border-top-left-radius_%7C_-moz-border-radius-topleft
- 74433: https://developer.mozilla.org/fr/docs/HTML/Element/Output_%257B%257BHTMLVersionInline(5)%257D%257D_%257B%257Bfx_minversion_inline(4)%257D%257D -> https://developer.mozilla.org/fr/docs/HTML/Element/Output_%7B%7BHTMLVersionInline(5)%7D%7D_%7B%7Bfx_minversion_inline(4)%7D%7D
- 100515: https://developer.mozilla.org/ru/docs/Web/HTML/%25D0%2598%25D1%2581%25D0%25BF%25D0%25BE%25D0%25BB%25D1%258C%25D0%25B7%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D0%25B5_%25D0%25BA%25D1%258D%25D1%2588%25D0%25B8%25D1%2580%25D0%25BE%25D0%25B2%25D0%25B0%25D0%25BD%25D0%25B8%25D1%258F_%25D0%25BF%25D1%2580%25D0%25B8%25D0%25BB%25D0%25BE%25D0%25B6%25D0%25B5%25D0%25BD%25D0%25B8%25D0%25B9 -> https://developer.mozilla.org/ru/docs/Web/HTML/%C3%90%C2%98%C3%91%C2%81%C3%90%C2%BF%C3%90%C2%BE%C3%90%C2%BB%C3%91%C2%8C%C3%90%C2%B7%C3%90%C2%BE%C3%90%C2%B2%C3%90%C2%B0%C3%90%C2%BD%C3%90%C2%B8%C3%90%C2%B5_%C3%90%C2%BA%C3%91%C2%8D%C3%91%C2%88%C3%90%C2%B8%C3%91%C2%80%C3%90%C2%BE%C3%90%C2%B2%C3%90%C2%B0%C3%90%C2%BD%C3%90%C2%B8%C3%91%C2%8F_%C3%90%C2%BF%C3%91%C2%80%C3%90%C2%B8%C3%90%C2%BB%C3%90%C2%BE%C3%90%C2%B6%C3%90%C2%B5%C3%90%C2%BD%C3%90%C2%B8%C3%90%C2%B9
- 117277: https://developer.mozilla.org/fr/docs/CSS/:after_%257C_::after -> https://developer.mozilla.org/fr/docs/CSS/:after_%7C_::after
- 146589: https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML_%25E5%2585%2583%25E7%25B4%25A0/meter -> https://developer.mozilla.org/zh-TW/docs/Web/HTML/HTML_%C3%A5%C2%85%C2%83%C3%A7%C2%B4%C2%A0/meter
- 233270: https://developer.mozilla.org/zh-CN/docs/User:te%22%253E%253Ch1%253E%3F%3F%253C -> https://developer.mozilla.org/zh-CN/docs/User:te%22%3E%3Ch1%3E%3F%3F%3C
There were additional documents that would have caused a collision, and were deleted:
- 65031: https://developer.mozilla.org/zh-CN/docs/HTML%252FCanvas%252FTutorial -> https://developer.mozilla.org/zh-CN/docs/HTML/Canvas/Tutorial
- 67423: https://developer.mozilla.org/en-US/docs/CSS/Getting_Started/Why_use_CSS%253f -> https://developer.mozilla.org/en-US/docs/CSS/Getting_Started/Why_use_CSS%3F
Comment 3 • 6 years ago (Assignee)
This leaves 16 documents that appear to be valid but require normalization to NFC. Here's that code:
import unicodedata

for doc in Document.objects.filter_for_list().order_by('id'):
    normslug = unicodedata.normalize('NFC', doc.slug)
    if normslug != doc.slug:
        old_url = doc.get_full_url()
        # Save the NFC form of the slug on the current revision.
        doc.current_revision.slug = unicodedata.normalize('NFC', doc.current_revision.slug)
        doc.current_revision.save()
        # Reload the document to get its updated URL.
        new_doc = Document.objects.get(id=doc.id)
        new_url = new_doc.get_full_url()
        print "* %s: %s -> %s" % (doc.id, old_url, new_url)
And here are the documents with now-working URLs:
- 65757: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8_%E0%A6%93%E0%A6%8F%E0%A6%B8_%E0%A6%B8%E0%A6%BF%E0%A6%AE%E0%A7%81%E0%A6%B2%E0%A7%87%E0%A6%9F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0_%E0%A6%95%E0%A6%B0%E0%A6%BE -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/%E0%A6%AB%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8_%E0%A6%93%E0%A6%8F%E0%A6%B8_%E0%A6%B8%E0%A6%BF%E0%A6%AE%E0%A7%81%E0%A6%B2%E0%A7%87%E0%A6%9F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0_%E0%A6%95%E0%A6%B0%E0%A6%BE
- 79903: https://developer.mozilla.org/bn-BD/docs/%E0%A6%93%E0%A7%9F%E0%A7%87%E0%A6%AC%E0%A6%8F%E0%A6%AA%E0%A6%BF%E0%A6%86%E0%A6%87 -> https://developer.mozilla.org/bn-BD/docs/%E0%A6%93%E0%A6%AF%E0%A6%BC%E0%A7%87%E0%A6%AC%E0%A6%8F%E0%A6%AA%E0%A6%BF%E0%A6%86%E0%A6%87
- 87507: https://developer.mozilla.org/bn-BD/docs/Mozilla/%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8/%E0%A6%B0%E0%A6%BF%E0%A6%B2%E0%A6%BF%E0%A6%9C%E0%A6%B8 -> https://developer.mozilla.org/bn-BD/docs/Mozilla/%E0%A6%AB%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE%E0%A6%B0%E0%A6%AB%E0%A6%95%E0%A7%8D%E0%A6%B8/%E0%A6%B0%E0%A6%BF%E0%A6%B2%E0%A6%BF%E0%A6%9C%E0%A6%B8
- 87533: https://developer.mozilla.org/bn-BD/docs/Web/Guide/CSS/%E0%A6%B8%E0%A6%BF%E0%A6%8F%E0%A6%B8%E0%A6%8F%E0%A6%B8_%E0%A6%97%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%A1%E0%A6%BF%E0%A7%9F%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%A8%E0%A7%8D%E0%A6%9F_%E0%A6%8F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0 -> https://developer.mozilla.org/bn-BD/docs/Web/Guide/CSS/%E0%A6%B8%E0%A6%BF%E0%A6%8F%E0%A6%B8%E0%A6%8F%E0%A6%B8_%E0%A6%97%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%A8%E0%A7%8D%E0%A6%9F_%E0%A6%8F%E0%A6%B0_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%AC%E0%A6%B9%E0%A6%BE%E0%A6%B0
- 88203: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Quickstart/%E0%A6%93%E0%A6%AA%E0%A7%87%E0%A6%A8_%E0%A6%93%E0%A7%9F%E0%A7%87%E0%A6%AC_%E0%A6%85%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%AA_%E0%A6%AA%E0%A6%B0%E0%A6%BF%E0%A6%9A%E0%A6%BF%E0%A6%A4%E0%A6%BF -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Quickstart/%E0%A6%93%E0%A6%AA%E0%A7%87%E0%A6%A8_%E0%A6%93%E0%A6%AF%E0%A6%BC%E0%A7%87%E0%A6%AC_%E0%A6%85%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%AA_%E0%A6%AA%E0%A6%B0%E0%A6%BF%E0%A6%9A%E0%A6%BF%E0%A6%A4%E0%A6%BF
- 91053: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE
- 107519: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps
- 108085: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Introduction_to_Gaia -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Introduction_to_Gaia
- 108923: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Video -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps/Video
- 111879: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Weinre_As_Remote_Debugger -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Weinre_As_Remote_Debugger
- 124063: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Settings -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps/Settings
- 124435: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Window_Management -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps/Window_Management
- 125375: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/System -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps/System
- 125427: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/LockScreen_Architecture_(v1.5_) -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/LockScreen_Architecture_(v1.5_)
- 127857: https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE/Gaia_apps/Browser -> https://developer.mozilla.org/bn-BD/docs/Archive/B2G_OS/Platform/%E0%A6%97%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BE/Gaia_apps/Browser
- 222920: https://developer.mozilla.org/ml/docs/Tools/Page_Inspector/How_to/%E0%B4%9A%E0%B4%9F%E0%B5%8D%E0%B4%9F%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%82%E0%B4%9F%E0%B5%8D_%E0%B4%B0%E0%B5%87%E0%B4%96%E0%B4%BE%E0%B4%9A%E0%B4%BF%E0%B4%A4%E0%B5%8D%E0%B4%B0%E0%B4%82_%E0%B4%AA%E0%B4%B0%E0%B4%BF%E0%B4%B6%E0%B5%87%E0%B4%BE%E0%B4%A7%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%81%E0%B4%95 -> https://developer.mozilla.org/ml/docs/Tools/Page_Inspector/How_to/%E0%B4%9A%E0%B4%9F%E0%B5%8D%E0%B4%9F%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%82%E0%B4%9F%E0%B5%8D_%E0%B4%B0%E0%B5%87%E0%B4%96%E0%B4%BE%E0%B4%9A%E0%B4%BF%E0%B4%A4%E0%B5%8D%E0%B4%B0%E0%B4%82_%E0%B4%AA%E0%B4%B0%E0%B4%BF%E0%B4%B6%E0%B5%8B%E0%B4%A7%E0%B4%BF%E0%B4%95%E0%B5%8D%E0%B4%95%E0%B5%81%E0%B4%95
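The two kinds of change visible in the list above can be reproduced with the standard library. This is a Python 3 sketch; the code points were read off the percent-escapes in the URLs (%E0%A7%9F is U+09DF, and %E0%B5%87%E0%B4%BE is U+0D47 U+0D3E), and the helper name is illustrative.

```python
import unicodedata


def codepoints(text):
    """Format each character of text as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in text]


# Bengali: U+09DF is excluded from composition, so NFC replaces it
# with U+09AF followed by the nukta U+09BC.
bengali = unicodedata.normalize("NFC", "\u09df")

# Malayalam: U+0D47 followed by U+0D3E composes into the single
# vowel sign U+0D4B.
malayalam = unicodedata.normalize("NFC", "\u0d47\u0d3e")

print(codepoints(bengali))
print(codepoints(malayalam))
```

NFC therefore does not always shorten a string: the Bengali slugs grow by one code point per affected letter, while the Malayalam slug shrinks.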
Comment 4 • 6 years ago (Assignee)
The current slugs with this issue have been manually modified.
A percent sign (%) is already forbidden in slugs. The remaining work is to normalize slugs with NFC.
Comment 5 • 6 years ago (Assignee)
Some of the unquoted slugs were incorrectly encoded. I attempted to fix these manually, but when I looked at the details, there were collisions, moved pages, and often translations of deleted pages. I fixed the items by hand, but I didn't take careful notes.
I'm working on detecting badly encoded strings in the database, and I'm thinking about solutions.
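A likely explanation for the incorrectly encoded slugs, not confirmed in this bug, is that the UTF-8 bytes behind the percent-escapes were decoded as Latin-1. The Python 3 sketch below reproduces the escapes seen in the converted URL for document 7844 under that assumption and shows how such text can sometimes be reversed.

```python
from urllib.parse import quote, unquote

# UTF-8 bytes of the first Korean syllable in document 7844's slug.
escaped = "%EC%97%90"

# Decoding the escaped bytes as Latin-1 instead of UTF-8 gives mojibake.
garbled = unquote(escaped, encoding="latin-1")

# Re-quoting the mojibake as UTF-8 yields the escapes in the converted URL.
requoted = quote(garbled)

# While the garbled text is still pure Latin-1, the mistake is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

print(requoted)
print(repaired)
```

The reversal only works if every character of the garbled slug still fits in Latin-1 and the resulting bytes are valid UTF-8, so it is a heuristic rather than a general repair.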
Comment 6 • 6 years ago (Assignee)
I ran this to find "badly encoded" URLs:
for doc in Document.objects.only('id', 'locale', 'slug', 'is_redirect'):
    # Count characters in U+0080..U+00BF, which can indicate UTF-8 bytes misread as Latin-1.
    bad_chars = sum([1 for c in doc.slug if 0x80 <= ord(c) <= 0xbf])
    if bad_chars:
        print "%d (%d, %s): %s" % (doc.id, bad_chars, doc.is_redirect, doc.get_full_url())
Most were redirects. There was one remaining that seems to validly use these characters ("Instal·lació" does appear to be "Installation" in Catalan):
Next, I looked at how Django does this, and it uses NFKC to slugify Unicode.
After some more reading, I found RFC 3987, section 5.3.2.2:
Equivalence of IRIs MUST rely on the assumption that IRIs are
appropriately pre-character-normalized rather than apply character
normalization when comparing two IRIs. The exceptions are conversion
from a non-digital form, and conversion from a non-UCS-based
character encoding to a UCS-based character encoding. In these cases,
NFC or a normalizing transcoder using NFC MUST be used for
interoperability. To avoid false negatives and problems with
transcoding, IRIs SHOULD be created by using NFC. Using NFKC may
avoid even more problems; for example, by choosing half-width Latin
letters instead of full-width ones, and full-width instead of
half-width Katakana.
So it looks like the standard is "must be normalized", not "normalized in NFC". I'm inclined to follow Django's lead and use NFKC.
One change that even I can understand is that this URL:
https://developer.mozilla.org/fr/docs/JavaScript/Reference/Instructions/for_each%E2%80%A6in
would stop using the horizontal ellipsis (…) and change to this URL:
https://developer.mozilla.org/fr/docs/JavaScript/Reference/Instructions/for_each...in
In this case, this was already done, and the first URL is a Kuma redirect to the second. There were three non-redirects that were not already NFKC-normalized, which I normalized using the Move Page feature:
- https://developer.mozilla.org/en-US/docs/User:Shichijo/This%E3%80%80is_a_subpage -> https://developer.mozilla.org/en-US/docs/User:Shichijo/This_is_a_subpage (normalized to a space, I changed it to an underscore)
- https://developer.mozilla.org/th/docs/Tools/WebIDE/%E0%B8%81%E0%B8%B2%E0%B8%A3%E0%B8%97%E0%B8%B3%E0%B8%87%E0%B8%B2%E0%B8%99%E0%B8%A3%E0%B9%88%E0%B8%A7%E0%B8%A1%E0%B8%81%E0%B8%B1%E0%B8%9A%E0%B8%84%E0%B8%AD%E0%B8%A3%E0%B9%8C%E0%B9%82%E0%B8%94%E0%B8%A7%E0%B8%B2%E0%B9%81%E0%B8%AD%E0%B8%9E%E0%B8%9E%E0%B8%A5%E0%B8%B4%E0%B9%80%E0%B8%84%E0%B8%8A%E0%B8%B1%E0%B9%88%E0%B8%99%E0%B9%83%E0%B8%99_WebIDE -> https://developer.mozilla.org/th/docs/Tools/WebIDE/%E0%B8%81%E0%B8%B2%E0%B8%A3%E0%B8%97%E0%B9%8D%E0%B8%B2%E0%B8%87%E0%B8%B2%E0%B8%99%E0%B8%A3%E0%B9%88%E0%B8%A7%E0%B8%A1%E0%B8%81%E0%B8%B1%E0%B8%9A%E0%B8%84%E0%B8%AD%E0%B8%A3%E0%B9%8C%E0%B9%82%E0%B8%94%E0%B8%A7%E0%B8%B2%E0%B9%81%E0%B8%AD%E0%B8%9E%E0%B8%9E%E0%B8%A5%E0%B8%B4%E0%B9%80%E0%B8%84%E0%B8%8A%E0%B8%B1%E0%B9%88%E0%B8%99%E0%B9%83%E0%B8%99_WebIDE (I can't see the difference in the rendered text)
- https://developer.mozilla.org/ja/docs/Glossary/Constant%EF%BC%88%E5%AE%9A%E6%95%B0%EF%BC%89 -> https://developer.mozilla.org/ja/docs/Glossary/Constant(定数) (this is an improvement, in my opinion)
I'm going to change the task to converting to NFKC. An NFKC string is also an NFC string, but some NFC strings (like the above) are not NFKC strings. One of the design goals of NFKC is to avoid codepoints that look like other codepoints, and it is suggested as a better choice than NFC for identifiers.
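The difference between NFC and NFKC for the slugs mentioned above can be checked directly. This is a Python 3 sketch; the sample strings are the decoded forms of the relevant URL segments, and the dictionary labels are only illustrative.

```python
import unicodedata

# Segments taken from the URLs above, written with explicit escapes.
examples = {
    "horizontal ellipsis": "for_each\u2026in",
    "ideographic space": "This\u3000is_a_subpage",
    "fullwidth parentheses": "Constant\uff08\u5b9a\u6570\uff09",
    "Thai sara am": "\u0e17\u0e33",
}

results = {}
for label, text in examples.items():
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    # Record whether NFC left the text alone, and what NFKC produced.
    results[label] = (nfc == text, nfkc)
    # An NFKC string is also an NFC string.
    assert unicodedata.normalize("NFC", nfkc) == nfkc
    print(label, nfc == text, ascii(nfkc))
```

Each sample is already in NFC, so only the compatibility step of NFKC changes it. The ideographic space becomes an ordinary space, which still has to be replaced with an underscore separately, as was done by hand for the first page in the list.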
Comment 7 • 6 years ago (Assignee)
https://github.com/mozilla/kuma/pull/5344 converts slugs to NFKC when editing pages. This takes care of badly-formed slugs going forward.
There are a few follow-on tasks that this does not cover:
- Middleware should redirect incoming requests that are not already NFC encoded
- Other fields (such as page content) that expect Unicode should be converted to NFC when saving to the database or used in forms
- Tag names appear in URLs, and also need NFKC
I think these can be addressed by new bugs if and when they cause issues for users.
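The first follow-on task could be approached with a helper like the one below. This is a framework-free Python 3 sketch, not Kuma code: the function name, the default normalization form, and the set of characters left unescaped are all assumptions made for illustration.

```python
import unicodedata
from urllib.parse import quote, unquote


def normalized_redirect_target(path, form="NFC"):
    """Return the normalized path to redirect to, or None if no redirect is needed.

    `path` is a percent-encoded request path; `form` is a Unicode
    normalization form such as "NFC" or "NFKC".
    """
    decoded = unquote(path)
    normalized = unicodedata.normalize(form, decoded)
    if normalized == decoded:
        return None
    # Keep "/" and ":" unescaped so paths such as /docs/User:name stay readable.
    return quote(normalized, safe="/:")


print(normalized_redirect_target("/en-US/docs/Web/HTML"))
print(normalized_redirect_target("/bn-BD/docs/%E0%A6%97%E0%A6%BE%E0%A7%9F%E0%A6%BE"))
```

A middleware built on this idea would issue a redirect whenever the helper returns a path, and pass the request through unchanged when it returns None.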
Updated•5 years ago