Closed Bug 710724 Opened 13 years ago Closed 13 years ago

Migrate localized pages from MindTouch to Kuma

Categories

(developer.mozilla.org Graveyard :: Editing, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lorchard, Assigned: lorchard)

Details

(Whiteboard: u=user c=wiki p=3)

MindTouch supports localized pages. Kuma does too, but there seem to be some differences and some rough edges to sand over. These might warrant breaking out into separate bugs, but I found at least these issues from an initial import:

* The first step is to ensure MindTouch page locales are carried over into Kuma pages.
* MindTouch and Kuma locales are different (e.g. en vs en-US). We need to normalize & convert between the two.
* Title and slug in Kuma are required to be unique across the site. In MindTouch, they're unique *per locale*. So, in MindTouch there are multiple pages with the same title, but with different locales. This currently throws exceptions during a simple import. So, we should revise the uniqueness constraint in Kuma to match MindTouch.
* MindTouch includes the locale in the page slug. Kuma does locale detection at a higher level. Thus, with a simple import, we end up with imported URLs like /en-US/docs/en/CSS/Attribute_selectors. We probably need to chop off the MindTouch locale from page slugs and use Kuma locale detection to select which locale is used in looking up a page.
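To illustrate the per-locale uniqueness constraint proposed above, here is a minimal sketch using an in-memory SQLite table rather than Kuma's actual Django model. The table name, column names, and the `insert_document` helper are all assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the document table (hypothetical schema,
# not Kuma's real model).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document (
        id     INTEGER PRIMARY KEY,
        locale TEXT NOT NULL,
        slug   TEXT NOT NULL,
        title  TEXT NOT NULL,
        UNIQUE (locale, slug),   -- unique per locale, not site-wide
        UNIQUE (locale, title)
    )
    """
)


def insert_document(locale, slug, title):
    """Insert a page; return False if it collides within the same locale."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document (locale, slug, title) VALUES (?, ?, ?)",
                (locale, slug, title),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, the same slug and title can be imported once for each locale, and only a second import into the same locale is rejected.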
Re: normalizing/converting locales, I started a bit of that for the devmo_url helper way back when: https://github.com/mozilla/kuma/blob/mdn/settings.py#L142
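A minimal sketch of the locale conversion and slug stripping discussed in this bug. The mapping table and function names are hypothetical and are not copied from the Kuma settings linked above. Only the en to en-US pair comes from this bug; the other entries are illustrative guesses.

```python
# Hypothetical MindTouch -> Kuma locale map. Only "en" -> "en-US" is
# stated in this bug; the remaining entries are assumptions.
MT_TO_KUMA_LOCALE = {
    "en": "en-US",
    "fr": "fr",
    "ja": "ja",
    "cn": "zh-CN",  # assumption
}


def normalize_locale(mt_locale):
    """Map a MindTouch locale code to a Kuma one; unknown codes pass through."""
    return MT_TO_KUMA_LOCALE.get(mt_locale.strip().lower(), mt_locale)


def split_locale_from_slug(mt_slug):
    """Split a MindTouch slug into (kuma_locale, slug_without_locale).

    Returns (None, mt_slug) when the first path segment is not a known
    MindTouch locale.
    """
    prefix, sep, rest = mt_slug.partition("/")
    if sep and prefix.lower() in MT_TO_KUMA_LOCALE:
        return MT_TO_KUMA_LOCALE[prefix.lower()], rest
    return None, mt_slug
```

For example, `split_locale_from_slug("en/CSS/Attribute_selectors")` yields the Kuma locale and a slug without the MindTouch prefix, which avoids URLs like /en-US/docs/en/CSS/Attribute_selectors.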
Blocks: 710713
2.1, maybe?
Target Milestone: --- → 2.1
Depends on: 717380
Migrate MindTouch pages, parsing locale from slug, normalizing, and inserting into the Django document model's locale field.
Whiteboard: u=user c=wiki p=2
From IRC, relevant to this and page localization in Kuma in general: <sheppy> groovecoder: quick question… on the current wiki, when the user chooses the language they want to see content in, that also affects the UI they're shown for the wiki. We don't actually want those coupled together. I should be able to look at a page that's in the French locale while seeing my English-language UI for managing the content. The current plan ties both UI and document language together. Maybe the wrong plan? It's possible (but ugly) to have both locales in the URL (eg. stay with something like /en-US/docs/en/HTML/HTML5 and /en-US/docs/fr/HTML/HTML5)
FWIW, URLs with 2 locales would be ugly in a certain sense, but could make migration and rewrite rules easier down the road.
Comment #4 resulted in bug 717445. I think a requirement to decouple UI and content locales demands URLs with 2 locales, and changes the approach to this bug.
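As a rough sketch of what a two-locale URL would mean for routing, the function below splits a path like the ones quoted from IRC into a UI locale, a content locale, and a slug. The function name and the fixed /docs/ segment are assumptions; this is not how Kuma's URL handling is implemented.

```python
def parse_two_locale_path(path):
    """Split '/<ui-locale>/docs/<content-locale>/<slug...>' into its parts.

    Hypothetical helper: returns (ui_locale, content_locale, slug) and
    raises ValueError if the path does not have that shape.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 4 or parts[1] != "docs":
        raise ValueError("not a two-locale docs path: %r" % path)
    ui_locale, content_locale = parts[0], parts[2]
    slug = "/".join(parts[3:])
    return ui_locale, content_locale, slug
```

Under this scheme, /en-US/docs/fr/HTML/HTML5 would show the French document inside the English UI.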
Depends on: 717445
Additional quick analysis on MindTouch page locales...

Here's a look at pages with blank language per namespace:

mysql> select page_namespace, count(*) as page_count from pages where page_language='' and page_namespace in (0, 1, 2, 3, 4, 5) group by page_namespace;
+----------------+------------+
| page_namespace | page_count |
+----------------+------------+
|              0 |        553 |
|Talk:         1 |        140 |
|User:         2 |        175 |
|User_talk:    3 |         87 |
|Project_talk: 5 |          4 |
+----------------+------------+
5 rows in set (3.58 sec)

Not sure what to do with the blank locales for pages in namespaces 1, 2, 3, and 5. But, the majority of namespace 0 pages with a blank page_language actually look like they should be "en-US" locale:

mysql> select count(*) from pages where page_language='' and page_namespace=0 and page_title like 'en/%';
+----------+
| count(*) |
+----------+
|      546 |
+----------+
1 row in set (12.69 sec)

mysql> select page_title from pages where page_language='' and page_namespace=0 and page_title not like 'en/%';
+----------------------------------------------------------------------+
| page_title                                                           |
+----------------------------------------------------------------------+
| Cn/Creating_Custom_Firefox_Extensions_with_the_Mozilla_Build_System  |
| cn/Creating_XPCOM_Components/Setting_up_the_Gecko_SDK                |
| fr/HTML                                                              |
| fa/توسعه_وب                                                          |
| cn/E4X_Tutorial/访问_XML_子节点                                      |
| ja/JavaScript/Server-Side_JavaScript/ECMAScript_5_support_in_Mozilla |
| AppLinks/WebConsoleHelp                                              |
+----------------------------------------------------------------------+
7 rows in set (0.79 sec)

Maybe we're relatively okay just defaulting to "en-US" locale for any page that's missing a locale?
FWIW, we could build a hardcoded map of title-to-locale for known exceptions, or manually fix up the locales after migration.
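A hardcoded exception map along those lines might look like the sketch below. The titles are taken from the query output in this bug, but the locale assigned to each one and the helper name are assumptions that would need checking against the real pages.

```python
# Hypothetical title-to-locale overrides for pages whose page_language
# is blank. Titles come from this bug; locale values are guesses.
TITLE_LOCALE_EXCEPTIONS = {
    "fr/HTML": "fr",
    "ja/JavaScript/Server-Side_JavaScript/ECMAScript_5_support_in_Mozilla": "ja",
    "AppLinks/WebConsoleHelp": "en-US",
}


def locale_for_blank_language(page_title, default="en-US"):
    """Look up a known exception, else fall back to the default locale."""
    return TITLE_LOCALE_EXCEPTIONS.get(page_title, default)
```

The map only needs entries for the handful of known exceptions; everything else falls through to the default.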
Also, since parsing the locale from the page slug was mentioned earlier, and in case the above two comments didn't make it more obvious: There does exist a `page_language` column, and we should use that before trying to parse a locale from the page slug. Maybe try parsing the page title as a last resort, but definitely remove the locale part from the page title before migrating it.
One more monkey wrench, speaking of parsing page titles: User: pages tend not to have a locale in the title. So, the title can't and shouldn't be parsed for a locale prefix. But, some of them have a blank `page_language`. I'd suggest just shoving them into the en-US locale as a default.
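Putting the last few comments together, the lookup order could be sketched as follows: use `page_language` when it is set, fall back to a title prefix only for main-namespace pages, and otherwise default to en-US. The function name, the prefix table, and treating namespace 0 as the only one with locale-prefixed titles are assumptions for illustration.

```python
MAIN_NAMESPACE = 0  # namespace 0 in the MindTouch pages table

# Hypothetical MindTouch prefix -> Kuma locale table (illustrative only).
KNOWN_PREFIXES = {"en": "en-US", "fr": "fr", "ja": "ja"}


def resolve_locale(page_language, page_namespace, page_title, default="en-US"):
    """Return (kuma_locale, title_without_locale_prefix) for one page."""
    prefix, sep, rest = page_title.partition("/")
    # Only main-namespace titles are assumed to carry a locale prefix;
    # User: pages and the like are left untouched.
    has_prefix = bool(
        page_namespace == MAIN_NAMESPACE
        and sep
        and prefix.lower() in KNOWN_PREFIXES
    )
    title = rest if has_prefix else page_title

    if page_language:  # 1. trust the page_language column first
        locale = KNOWN_PREFIXES.get(page_language.lower(), page_language)
    elif has_prefix:  # 2. last resort: parse the title prefix
        locale = KNOWN_PREFIXES[prefix.lower()]
    else:  # 3. default for blank locales
        locale = default
    return locale, title
```

Note that the locale prefix is stripped from main-namespace titles regardless of which step supplied the locale, so migrated titles never keep the MindTouch prefix.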
Target Milestone: 2.1 → 2.2
Target Milestone: 2.2 → 2.3
Whiteboard: u=user c=wiki p=2 → u=user c=wiki p=3
Assignee: nobody → lorchard
Blocks: 728417
Blocks: 710726
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Version: MDN → unspecified
Component: Docs Platform → Editing
Product: developer.mozilla.org → developer.mozilla.org Graveyard