Closed Bug 924435 Opened 6 years ago Closed 5 years ago

Can we add spatial query capabilities to Mozillians.org?

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: hoosteeno, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [data: db_mig] [Triage 2015-04-17])

We're hoping to add true geographic data to Mozillians profiles (right now people's locations are really just strings). To get the most out of this change we will probably want to run spatial queries (e.g. "show me mozillians near the portland office"). 

Here's one approach we might consider: https://github.com/coderholic/django-cities

Can we talk about the right way to spatial queries built into the Mozillians.org database? Looks like there are some libraries you can compile MySQL with to get some coverage; and obviously postgreSQL would do the trick. Glad to talk more about it.
We can do spatial queries using ElasticSearch. AFAIK that was the initial reason to first support it in the project :)
We're happy to support changes if you want to migrate to postgres, or if you want the spatial libraries in MySQL (but really, if you want built-in support, go Postgres).

Or will ElasticSearch fill your needs?
Based on a cursory review, I think ElasticSearch may do the trick. Will reopen if not!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
I suggest we consider again moving to PostgreSQL for mozillians.org.

We can still use Elastic Search for searching if we want, but any of the Django libraries that might help us deal with spatial data assume the database supports storing it. MySQL only supports that in MyISAM tables (https://docs.djangoproject.com/en/1.4/ref/contrib/gis/db-api/#mysql-spatial-limitations), which do not support transactions or foreign keys... not really something we want to try to cope with when switching to PostgreSQL would get us data integrity, spatial data support, and a lot of other advantages.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Hrm, according to http://dev.mysql.com/doc/refman/5.6/en/creating-spatial-columns.html

"Currently, spatial columns are supported for MyISAM, InnoDB, NDB, and ARCHIVE tables."

So InnoDB does support storing data types.

That article references an older version of MySQL, 5.1 and is specifically talking about indexes, not storage. MySQL 5.6 does not support RTREE indexes at all, according to http://dev.mysql.com/doc/refman/5.6/en/create-index.html:

"The parser recognizes RTREE as a type name, but currently this cannot be specified for any storage engine."

We are happy to move mozillians.org if you'd like. Have you considered moving mozillians.org to the PaaS so you can do more self-service? If that's not going to work, we're more than happy to move it to a Postgres server, we have experience doing that.
I'm also interested in the "data integrity" features you speak of in Postgres. Can you elaborate? I want to make sure your knowledge of MySQL is current; MySQL 5.1 was GA'd in November 2008, so the major features in it (and not in it) are over 5 years old at this point, and a LOT has changed in the MySQL landscape.

I want to make sure you have the most up to date information on the technologies you're using.
My understanding was that MyISAM tables did not support transactions or foreign keys.

Additionally, as far as I know, MySQL still does not support transactions for data definition operations, meaning that an aborted schema migration can leave things in an indeterminate state.
You are correct that MyISAM does not support transactions for foreign keys.

However, InnoDB, which *does* support transactions and foreign keys, also supports spatial data types. This has been possible since MySQL 5.0.16, released Nov 10 2005, as per http://dev.mysql.com/doc/refman/5.0/en/creating-spatial-columns.html

Also: InnoDB tables put a data definition in a transaction. You may be confused because you can't put a data definition inside a transaction with other statements, but if a long-running InnoDB DDL statement aborts, InnoDB absolutely rolls back. (I've had to wait long times for it to finish!) What you cannot do is have a transaction where you update some table records, then alter the table, then commit the transaction....most DDL does an implicit commit beforehand, specifically so that there is a transaction with the DDL in place.

In addition, these days there are several ALTER TABLE statements that are done in an online fashion, so that there is less of a chance of having a schema migration aborted in the first place.

So I guess my point is, spatial data types have been allowed since at least 2005 in non-MyISAM tables. Spatial *indexes* are MyISAM only. 

You can absolutely use the django libraries you linked with InnoDB tables, you just can't have spatial indexes on the data. You still can have BTREE or HASH indexes if you want/need them.

The statement "MySQL only supports storing spatial data in MyISAM" is untrue. If you  need spatial *indexes*, then Postgres is the way to go, nobody's going to argue that. 
We are happy to migrate the data for you, we just want to make sure you have an updated idea of what MySQL offers, since it sounds like there is some confusion, including the data integrity stuff.
I'd like to get to the bottom of this one. 

I don't think that comment 7 relates to the question in this bug, but I do want to be sure our database can support some basic spatial queries. In particular, we want to use distance queries. It appears that MySQL 5.6 supports distance queries (undocumented!)[0], so maybe it's just fine for our needs. But MySQL's geodata support seems to have a few esoteric limitations (as in comment 8, and in the post below, and elsewhere), while everything I read suggests that PostgreS is sure to work.

:dpoirier, can you evaluate the feasibility of deploying Mozillians on PostgreS? It's not clear to me whether that is trivial from an application developer's perspective, or whether it is going to require substantial refactoring and bug fixing.

:sheeri, if we asked for that kind of migration, about when would you have time for it?

[0] http://www.mysqlperformanceblog.com/2013/10/21/using-the-new-spatial-functions-in-mysql-5-6-for-geo-enabled-applications/
Flags: needinfo?(scabral)
Flags: needinfo?(dpoirier)
Hi Justin,

I'm out of the office this week, so definitely "not this week".

I'll needinfo Brandon and Matt, as they've both worked on other migrations from Postgres to MySQL and can also handle this type of thing. It depends on how complicated the migration is, and how much of the migration is done by developers versus how much is done by DBAs. (For example, when airmo migrated, peterbe wrote scripts to migrate the data, so migrating was not too hard for us).
Flags: needinfo?(scabral)
Flags: needinfo?(mpressman)
Flags: needinfo?(bjohnson)
:hoosteeno, I tried running Mozillians on Postgres a few months ago, and the only thing I know will need changing is the way we compute the number of vouched members in groups to display in the groups changelist page in the Django admin - we're using a clever hack that won't work on Postgres.

As for migrating non-trivial amounts of data from MySQL to Postgres, I haven't had to do that. Maybe the DBAs have done that? Or I could ask around Caktus if anyone has experience to share.
Flags: needinfo?(dpoirier)
We have indeed, we migrated air mozilla, which had about double the amount of data that's in Mozillians.
Alright. We're going to do some final exploration of whether ElasticSearch can do what we need (it looks like maybe it can[0]); we'll get back to this bug when we have an answer.

[0] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-filter.html
Justin, as sheeri mentioned the amount of data would be trivial compared to what we did with airmo. As far as getting the data transferred, that could be done relatively quickly. More important is to what Dan mentioned as far as application code being migrated.
Flags: needinfo?(mpressman)
If you're curious about migration to postgres, it might behoove you to talk to :peterbe. He was the application engineer behind the airmo migration. I did the database portions. You can feel free to read up on how we accomplished it in this bug, particularly comment 32 for my steps to transition: https://bugzilla.mozilla.org/show_bug.cgi?id=899267#c32
Flags: needinfo?(bjohnson)
Product: mozilla.org → Data & BI Services Team
Whiteboard: [data: db_mig]
Following program meeting decision, closing this. We'll reopen if needed.
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → WONTFIX
Whiteboard: [data: db_mig] → [data: db_mig] [Triage 2015-04-17]
You need to log in before you can comment on or make changes to this bug.