Closed Bug 1159930 Opened 9 years ago Closed 4 years ago

Move MDN to Postgres

Categories

(developer.mozilla.org Graveyard :: Code Cleanup, defect)

All
Other
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jezdez, Unassigned)

References

Details

(Whiteboard: [specification][type:change])

Attachments

(1 file)

What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
The database is currently MySQL

What problems would this solve?
===============================
We have some issues (https://bugzilla.mozilla.org/show_bug.cgi?id=776048#c13 for example, but also stuff in the Django upgrade branch that broke with and update of django-taggit) with case insensitive collations that also take unicode characters into account. We don't want to jump through hoops to custom build every query with query level collations that is created by Django's ORM or write raw SQL so we need to think about other ways to properly store Unicode data for our multilingual website.

Who would use this?
===================
The backend devs.

What would users see?
=====================
Nothing

What would users do? What would happen as a result?
===================================================
Nothing

Is there anything else we should know?
======================================
Component: General → Code Cleanup
+1. PostgreSQL has always been a good match for Django, and avoids some of the multilingual issues that come with stock MySQL.  In addition, Django 1.8 has started adding native support for new PostgreSQL features [1], is adding new types that are best supported in PostgreSQL [2], and third-party packages are quickly adding support [3].  PostgreSQL is a good move for compatibility with Django's future.

[1] https://docs.djangoproject.com/en/1.8/ref/contrib/postgres/
[2] https://docs.djangoproject.com/en/1.8/ref/models/fields/#uuidfield
[3] http://www.django-rest-framework.org/api-guide/fields/#uuidfield
:lonnen - can you share some of socorro's experience using Postgres? How has it been, in comparison to projects using MySQL? Will it help us with things like bug 1184513, bug 776048, bug 1054277? Does it cause other issues we may want to avoid?
Flags: needinfo?(chris.lonnen)
See Also: → 1184513, 776048, 1054277
I've only worked with Postgres as an RDBMS so I can't offer much in the way of comparison. I'll find a better person to ask.
+1 on the feature set. Especially for Django. E.g. timezones, microseconds, ability to do full-text searches, and the django 1.8 persistent db connection features. MySQL is faster in some aspects and might have an easier path for sharding/HA. Also, PG has much better tooling for analyzing over- and under-use of indexes which would mean it'd be harder to make certain queries much faster. 

pgloader [0] is the recommended tool to migrate because it's so fast the downtime gets minimal when you have large databases. However, certain updates might be necessary afterwards such as adding timezones because since mysql doesn't have these it can't make it up when you convert. 

[0] http://pgloader.io/
Flags: needinfo?(chris.lonnen)
Assignee: nobody → robhudson
Some status updates:

1) I created a branch to update Django to use postgresql and have travis-ci test this branch and fix problems discovered. This resulted in a handful of commits to fix mysql specific queries. There are currently a few broken tests that need to be investigated.

This branch also updates our docker setup (since this seemed easier) so we can run MDN with postgresql locally using Docker.

That branch is here:
https://github.com/mozilla/kuma/compare/master...robhudson:mdn-on-postgresql

2) I started investigating the use of pgloader to transfer data from mysql to postgresql. To ease the set up of this I created a docker VM with 3 containers: mysql, postgresql, and "hub" which contains pgloader that talks to both. The git repo for this work is here:
https://github.com/robhudson/mdn-pgloader

This ended up failing in multiple ways. The current stable version has a problem which was fixed[1] but only on master. I tested this fix by recompiling pgloader from master but this failed on another issue[2] which I currently haven't found a fix for. More work is needed here. The unfortunate thing is running pgloader in "debug" mode errors much earlier b/c of a unicode output problem.

[1] https://github.com/dimitri/pgloader/issues/106
[2] https://github.com/dimitri/pgloader/issues/109
Blocks: 1110799
Update:

1) The mdn-on-postgresql branch has been moving forward and is down to 3 bugs which are all related to adding 'class="new"'. Once that is tracked down this branch will require some light commit cleanup (removing the intermixed docker additions)

2) All the pgloader issues have been resolved and the master branch now works. Thanks to the pgloader author. The mdn-pgloader github project has everything needed to build pgloader from master and config file to convert MDN to postgresql.
Attached file 3 failures
This is a copy/paste from the test run of the 3 failures remaining. Help figuring these out is appreciated.
I suspect this is because MySQL does case-insensitive string matching, and PostgreSQL is case-sensitive.

Here's the code I suspect is adding class="new":

https://github.com/robhudson/kuma/blob/mdn-on-postgresql/kuma/wiki/content.py#L475-L494

For the first test, in my MySQL VM, ContentSectionToolTests.test_link_annotation, the search is for the slug "css/héritage", which matches the existing slug "CSS/Héritage".  I think that if the database were PostgreSQL, this would not be a match, and the link would be marked as "new".
(In reply to John Whitlock [:jwhitlock] from comment #8)
> I suspect this is because MySQL does case-insensitive string matching, and
> PostgreSQL is case-sensitive.

Exactly right. I updated the code to use the new `Lower()` annotations and that fixed the issue.

Currently the branch is passing all tests. \o/
Current work is this branch:
https://github.com/mozilla/kuma/tree/mdn-on-postgresql

This branch includes changes to the Docker set up that allows you to test MDN using Postgresql pretty easily. By the time anyone gets to using this branch it may be outdated and need a serious rebase. Also, a new data conversion would likely be needed for testing using https://github.com/robhudson/mdn-pgloader.
Assignee: robhudson → nobody
I'm closing the PR, mozilla/kuma#3719, associated with this branch. I don't think there's a sane way to rebase it, now that we've implemented a Docker development environment.

I imagine a "convert to Postgres" docker-compose environment, where:

1. docker-compose.pg.yaml defines a pgloader and PostgreSQL services
2. pgloader converts data from the mysql to the pgsql server
3. Postgres-specific migrations are stored in a different folder (it appeared there were a few migration changes)
4. Kuma introspects the database engine to pick mysql or pgsql variants as needed

In any case, I'm not sure we can get there from mdn-on-postgresql anymore.
I'm planning to drop the custom collation (bug 1391084) as an intermediate step to get us to AWS, before converting to PostgreSQL.  I really wish we could have done both, but this is a smaller change, and I can see the engine migration being easier post-AWS.
No longer blocks: 1110799
See Also: → 1391084
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: