Closed Bug 769686 Opened 13 years ago Closed 13 years ago

Document slugs containing "//" are problematic

Categories

(developer.mozilla.org Graveyard :: Wiki pages, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: trevorhemail-mozbug, Assigned: lorchard)

Details

(Whiteboard: s=2012-08-01 p=3)

Ideally this will also make wiki.uri create correct URIs and make wiki.pageExists evaluate correctly. For example, if en-US/docs/NSPR_API_Reference/I/O_Functions is passed to wiki.uri or wiki.pageExists, it should be able to figure out that this is a link to the "I/O Functions" page.
TL;DR: I think we might need to end up renaming these pages and changing links.
On current production, there's this section of docs:
https://developer.mozilla.org/en/NSPR_API_Reference/
In that section, there are these pages:
https://developer.mozilla.org/en/NSPR_API_Reference/I%2F%2FO_Types
https://developer.mozilla.org/en/NSPR_API_Reference/I%2F%2FO_Functions
But, even on current production MindTouch, these seem problematic - both of those links in the ToC lead to Edit pages.
Now, on developer-new, the section exists:
https://developer-new.mozilla.org/en-US/docs/NSPR_API_Reference
And, from the admin, I can see that the pages in question got migrated:
https://developer-new.mozilla.org/admin/wiki/document/44159/ (I/O Types)
https://developer-new.mozilla.org/admin/wiki/document/18419/ (I/O Functions)
But, I think the slugs - I//O_Types and I//O_Functions - are choking up our URL routing and section hierarchy handling. I'm not sure if there's an easy solution here that can accommodate slugs with slashes in them. Maybe we need to rename these kinds of pages, and fix the links? The titles can stay the same, but the URL slugs probably need tweaking.
Blocks: 756263
Summary: Handle "/" in article titles → Document slugs containing "//" are problematic
Oh, in case it's not obvious: One change from MindTouch to Kuma is that now document titles and URLs can be changed independently. So, we can keep the *title* "I/O Functions" but change the URL to something like "I_O_Functions"
FWIW, I just tweaked these pages in the admin to see what would happen, and I can access them:
https://developer-new.mozilla.org/en-US/docs/NSPR_API_Reference/I_O_Types
https://developer-new.mozilla.org/en-US/docs/NSPR_API_Reference/I_O_Functions
I don't think that's the end of this bug, though, because we might need to do some fancy footwork with redirects and maybe mod_rewrite in Apache to get legacy links to land in the right place (e.g. convert "//" to "_" at the Apache level, because it gets fumbled at the Django level).
People who have bookmarked the old pages and search engines will have links such as https://developer.mozilla.org/index.php?title=en/NSPR_API_Reference/I%2F%2FO_Functions so I imagine we would need to support this?
(In reply to Les Orchard [:lorchard] from comment #2)
> Oh, in case it's not obvious: One change from MindTouch to Kuma is that now
> document titles and URLs can be changed independently. So, we can keep the
> *title* "I/O Functions" but change the URL to something like "I_O_Functions"
Lots of templates currently take the page title as a parameter and create a URI using that. This does not work on the old system. It would be nice if we could fix that breakage without having to rewrite the templates.
(In reply to Trevor from comment #5)
> Lots of templates currently take the page title as a parameter and create a
> uri using that. This does not work on the old system, it would be nice if we
> could fix that breakage without having to rewrite the templates.
FWIW, most (if not all) templates *are* being rewritten right now, if only partially [1]. Do you have some examples in mind?
[1] https://developer.mozilla.org/Project:en/Introduction_to_KumaScript#Limitations_of_content_migration_from_MindTouch
(In reply to Trevor from comment #4)
> People who have bookmarked the old pages and search engines will have links
> such as
> https://developer.mozilla.org/index.php?title=en/NSPR_API_Reference/I%2F%2FO_Functions
> so I imagine we would need to support this?
We might be able to make a fake "index.php" view that takes a title parameter for problematic pages. We probably still need to rename the pages, since "//" will continue to be an issue in URL slugs. If we followed a renaming pattern of "//" -> "_", then at least we could do the same in the fake "index.php".
The really weird thing is that, as far as I can tell, the NSPR_API_Reference page links to URLs like this:
https://developer.mozilla.org/en/NSPR_API_Reference/I%2F%2FO_Types
But, those don't seem to work, not even in the current system. Is there an example page that links to index.php for these pages?
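For illustration only, here is a minimal Python sketch of what such a fake "index.php" handler might compute before issuing a redirect. It is not Kuma code: the function name `legacy_index_php_to_path` and the `LOCALE_MAP` table are hypothetical, and the "en" to "en-US" mapping and the "/docs/" prefix are assumptions read off the URLs quoted in this bug.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical locale map; only the "en" -> "en-US" pair appears in this bug's URLs.
LOCALE_MAP = {"en": "en-US"}


def legacy_index_php_to_path(url):
    """Map a legacy index.php?title=... URL to a new-style docs path, or None."""
    parts = urlsplit(url)
    if not parts.path.endswith("/index.php"):
        return None
    titles = parse_qs(parts.query).get("title")
    if not titles:
        return None
    # parse_qs has already percent-decoded the value, so "%2F%2F" is now "//".
    locale, _, slug = titles[0].partition("/")
    if not slug:
        return None
    # Apply the same "//" -> "_" renaming pattern proposed for the slugs.
    slug = slug.replace("//", "_")
    return "/%s/docs/%s" % (LOCALE_MAP.get(locale, locale), slug)
```

A real view would presumably also check that the target document exists before redirecting; that step is left out here.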
Blocks: 773295
No longer blocks: 756263
Priority: -- → P2
Whiteboard: s=2012-08-01
Whiteboard: s=2012-08-01 → s=2012-08-01 p=3
After banging my head on this for a while today, a first note: Looks like we can't put mod_rewrite rules into .htaccess that correct for "//" in the URL path. Seems like "//" has already been collapsed to "/" by that point.
But, if I add this to the Apache config for the virtual host, I can turn "//" into "_" for the cases that currently exist in the DB:
RewriteRule ^(.*)//(.*)//(.*)$ $1_$2_$3 [R=301,L,NC]
RewriteRule ^(.*)//(.*)$ $1_$2 [R=301,L,NC]
So, that's going to be an IT bug. I tried writing a single repeating [N] rule to replace any number of occurrences of "//" with "_" and failed. But, as far as I can tell, there are 583 docs with "//"x1, 6 docs with "//"x2, and no docs with more than 2 occurrences.
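To make the rule ordering easier to reason about, here is a rough Python simulation of a single pass through the two rules above. It only mimics the regular expressions, not Apache itself; the names `RULES` and `rewrite_once` are made up for this sketch.

```python
import re

# The two patterns mirror the RewriteRule lines above, in the same order.
# The [NC] flag is ignored here because neither pattern matches letters.
RULES = [
    (re.compile(r"^(.*)//(.*)//(.*)$"), r"\1_\2_\3"),
    (re.compile(r"^(.*)//(.*)$"), r"\1_\2"),
]


def rewrite_once(path):
    """Return the redirect target chosen by the first matching rule, or None."""
    for pattern, replacement in RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None
```

Because the two-occurrence rule is tried first, a path with two "//" sequences is fixed in one redirect rather than two, and a path without "//" is left alone.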
Next step, after the above rewrite rules are in place, is to replace "//" with "_" in the slugs of those 583 documents. I think that can be a single SQL query. And, as far as I can tell, the current editing code won't allow "//" to be used in the slug for any pages in the future. The bonus points will be in simulating index.php?title={path} for further legacy redirect support.
Depends on: 779292
After 779292 is finished, I can take a shot at renaming the affected docs - there are 582 of them that contain "//". Going to sleep on this SQL and try it again on another fresh import on my laptop, but I think this should replace "//" in all documents and current revisions:
create temporary table old_slugs select slug from wiki_document;
update wiki_document
  set slug=replace(slug,'//','_')
  where slug like '%//%'
    and replace(slug,'//','_') not in (select slug from old_slugs);
update wiki_revision r
  left join wiki_document d on d.current_revision_id=r.id
  set r.slug=replace(r.slug,'//','_')
  where d.current_revision_id is not null
    and r.slug like '%//%'
    and replace(r.slug,'//','_') not in (select slug from old_slugs);
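The collision guard in the SQL above (the "not in (select slug from old_slugs)" clause) can be sketched in Python as follows. This is only an illustration over an in-memory list of slugs; `rename_slugs` is a hypothetical helper, not something that exists in Kuma.

```python
def rename_slugs(slugs):
    """Return {old_slug: new_slug} for slugs containing "//", skipping collisions."""
    # Snapshot of every slug before any rename, like the temporary old_slugs table.
    existing = set(slugs)
    renames = {}
    for slug in slugs:
        if "//" not in slug:
            continue
        candidate = slug.replace("//", "_")
        if candidate in existing:
            # Another document already owns this slug, so leave it untouched.
            continue
        renames[slug] = candidate
    return renames
```

For example, "File_I//O" would be renamed to "File_I_O", while "A//B" would be skipped if a document with the slug "A_B" already existed. Like the SQL, this only compares against pre-existing slugs, so skipped documents would still need manual attention.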
Alright, so I didn't sleep on it. Seems like the Apache config from bug 779292 has been applied, and the SQL worked fine on a fresh DB import. So, I just went ahead and ran the renaming.
This page seems to work now, albeit routing through several redirects first:
http://developer-new.mozilla.org/en-US/docs/File_I//O
Might still need another tweak to Apache rules, because this page doesn't work:
https://developer-new.mozilla.org/en/NSPR_API_Reference/I%2F%2FO_Functions
But, this one (with %2F decoded to "/") does, after some redirects:
https://developer-new.mozilla.org/en/NSPR_API_Reference/I//O_Functions
Huh... But the %2F case works on my dev VM, without additional Apache rules. Not sure what's up on developer-new
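As a small illustration of why the two URL forms ought to land on the same page, the Python sketch below percent-decodes a path and then applies the "//" -> "_" renaming pattern. The helper name `normalize_legacy_path` is hypothetical, and this says nothing about where in the Apache/Django stack the decoding actually happens on developer-new.

```python
from urllib.parse import unquote


def normalize_legacy_path(path):
    """Percent-decode a path, then replace "//" with "_"."""
    # "%2F%2F" decodes to "//", so both legacy forms normalize identically.
    return unquote(path).replace("//", "_")
```

Both "/en/NSPR_API_Reference/I%2F%2FO_Functions" and "/en/NSPR_API_Reference/I//O_Functions" come out as the same renamed path under this sketch.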
These docs should now all be reachable and editable. The last bits are legacy redirects, which should be rare, though I have no data on their frequency of use. Going to close this and file 2 followups: bug 779499 (index.php) and bug 779501 (the "%2F%2F" issue).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee: nobody → lorchard
Version: Kuma → unspecified
Component: Website → Landing pages
Product: developer.mozilla.org → developer.mozilla.org Graveyard