[AirMozilla] We're moving to PostgreSQL

RESOLVED FIXED

Status

Data & BI Services Team
DB: MySQL
P3
normal
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: peterbe, Assigned: cyborgshadow)

Tracking

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
We have decided to switch Air Mozilla to run on PostgreSQL instead.
The code for the migration has been completed https://bugzilla.mozilla.org/show_bug.cgi?id=897527 but only tested locally (automation and manual testing on my OSX)

The reason for the move is to gain access to Postgres's powerful fulltext index so we can use that in conjunction with the Django ORM to be able to search for fuzzy strings in English (and other languages some day) without having the overhead of having to "manually" index documents with an external database (e.g. elastic search). 

I'm slightly uncertain about the whens and hows we will do this. I'm open for a discussion how to make this switch.
(Reporter)

Updated

4 years ago
Blocks: 897527
Hey Matt,

What do you think about just putting AirMo's new database onto internal1? That system has quite a bit of headroom, from what I can tell. 

-selena
internal1 works for me

Updated

4 years ago
Assignee: server-ops-webops → server-ops-database
Component: WebOps: IT-Managed Tools → Server Operations: Database
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → scabral
What exactly are you trying to gain here? MySQL has a powerful SOUNDEX implementation, which is available in InnoDB in MySQL 5.6, which we're upgrading to this quarter and next (and we can prioritize the airmo stuff).

There aren't trigrams in MySQL like there are in Postgres, but there is proximity search. 

If you need trigrams or something about Postgres specifically, we're happy to migrate, but if fulltext search in InnoDB is sufficient, it's less of an overhaul change and thus less of a risk.
(Reporter)

Comment 4

4 years ago
FullText indexing search is what we're after. A Soundex is not good enough. 

Last time I looked, deep, into using MySQL's full text index I concluded that it was indeed inferior to PostgreSQLs. It was order of magnitude slower and contained significantly less features (admittedly, at the time, I was only focusing on matching English).

No, we don't need trigrams. What we need is ability to do fulltext queries against event titles and descriptions. 

Unless MySQL has made HUGE improvements that I haven't noticed on its fulltext functionality in the last 2-3 years I think we're better off switching to Postgres which has had this now since 8.3.

Sorry. MySQL is awesome at many other things. 

PS. If anybody is wondering why we're not going full-blown on an Elastic Search, it is to vastly reduce complexity. Having to maintain an ES and carefully craft the handling of index maintenance and invalidation is just not worth it. By being able to use the same relational database as the Django ORM is hooked into makes it really really easy to combine fetching with fulltext querying.
Depends on how HUGE you mean - the performance improvements in InnoDB's FULLTEXT search are 10-1000 times more performant than MyISAM's FULLTEXT search. https://blogs.oracle.com/mysqlinnodb/entry/performance_enhancement_in_full_text
Also, if you're not using soundex or trigrams, what algorithms are you using for fuzzy search?
(Reporter)

Comment 7

4 years ago
Wow! So now mysql has a fast fulltext implementation under innodb. I did not know. 

I'm currently upgrading my mysql to 5.6.12 so I can try this out. I'll report back if I can make it work nicely. 

Whilst waiting for it to download and install I'm going to read http://www.mysqlperformanceblog.com/2013/02/26/myisam-vs-innodb-full-text-search-in-mysql-5-6-part-1/ and http://www.mysqlperformanceblog.com/2013/03/04/innodb-full-text-search-in-mysql-5-6-part-2-the-queries/
(Reporter)

Comment 8

4 years ago
Ok. So I have a working branch now using search with InnoDB. 

I can't evaluate the speed because my table is so darn little. It's only 300 rows with a total of 13k words. 

What's really annoying that it doesn't support stemming. 
http://dev.mysql.com/worklog/task/?id=2423 which means that pluralization of most words prevent search from matching at all. 

I'm tempted to resolve this bug since I'm now so close to a working solution. However, I need to play with it some more to see if it's any good and/or if postgres is any better (other than stemming).
You're correct about stemming. But there *are* fulltext parser plugins that DO support it. http://www.mnogosearch.org/doc/msearch-udmstemmer.html

Also, there is the ability to make fulltext parser plugins now. Just putting the bug in your ear because it sounds like something you'd play with (maybe not right now, of course).
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> You're correct about stemming. But there *are* fulltext parser plugins that
> DO support it. http://www.mnogosearch.org/doc/msearch-udmstemmer.html
> 
> Also, there is the ability to make fulltext parser plugins now. Just putting
> the bug in your ear because it sounds like something you'd play with (maybe
> not right now, of course).

Is this sometihng we're supporting in other MySQL installs? 

Postgres supports stemming with no additional library support.

Also - do you know if there's more information about the stemming support elsewhere? The link in the mnogosearch page goes to http://dev.mysql.com/doc/refman/5.1/en/plugin-full-text-plugins.html which is 404. From looking at the limited docs available, it's not clear how stemming is supported -- the "all words are normalized..." portion makes me think that the plugin is modifying the text that's inserted, which surprises me.
(Reporter)

Comment 11

4 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> You're correct about stemming. But there *are* fulltext parser plugins that
> DO support it. http://www.mnogosearch.org/doc/msearch-udmstemmer.html
> 
I'm not sure I understand that link. It talks of mysql 5.1 so that must imply it's referring to myisam which we're not interested in. 

> Also, there is the ability to make fulltext parser plugins now. Just putting
> the bug in your ear because it sounds like something you'd play with (maybe
> not right now, of course).

What does look very attractive to me is that suppose I go ahead and land search in Air Mozilla but it simply doesn't support stemming; you (or someone) could do the magical work of installing the plugin and alter the table and just like that stemming just works (TM). 

Depending on plugins in the code does scare me a lot and it's something I'm strictly hesitant to include into the project (e.g. a migration script that uses `FULLTEXT(a) WITH PARSER stemming`) but if it's something we can add "retroactively" that works like just another feather in its cap, I'd be delighted to go down that route.
(Reporter)

Comment 12

4 years ago
:sheeri,

I've now written a pretty solid patch that uses mysql for search. I have some more code to write but I'm nearing completion. 

I have 5.6.12 here on my laptop but because the patch involves a migration that depends on `alter table main_event ADD FULLTEXT KEY (title);` etc. if I deploy this that migration will fail hard and break the deployment. 

So, what is the realistic timeframe to have airmozilla's mysqls upgraded to 5.6?
(Reporter)

Comment 13

4 years ago
:sheeri,

Writing tests using this index is turning out to be very hard. It seems as if it's unable to find anything in the integration tests. Do you know if there's an asynchronous delay in things getting indexed?
I'm a bit wary about just randomly upgrading airmo, I should talk to zandr probably first.

I don't believe there's any delay in indexing on the innodb FTS. 

Do you want to set up a short meeting about this? my calendar's up to date so if you find a time and make a meeting, I'll be there on Vidyo.
(Reporter)

Comment 15

4 years ago
The problem I constantly get with my test suite is error::

_mysql_exceptions.Warning: InnoDB rebuilding table to add column FTS_DOC_ID

The code that causes it is my migration code which looks like this::

  from south.db import db
  from south.v2 import SchemaMigration
  ...

  class Migration(SchemaMigration):

    def forwards(self, orm):
        # You are running MySQL 5.6 right?
        db.execute("ALTER TABLE main_event ADD FULLTEXT KEY (title)")
        db.execute("ALTER TABLE main_event ADD FULLTEXT KEY (description, short_description)")
    
     ...

I really don't understand why this happens. The error always happens on the first `db.execute()` command so it appears to not be a matter of the second command happening too soon.
Hrm, what happens isn't an error, it's a warning. I think it's just letting you know that MySQL is rebuilding the table. Does it actually die?
(Reporter)

Comment 17

4 years ago
That's what I thought so I did something like::

  try:
    db.execute('...')
  except _mysql_exceptions.Warning:
    # ignore the blather

But it just doesn't create it. 
I even tried combining that with a `time.sleep(3)` and that didn't work either.
(Reporter)

Comment 18

4 years ago
Created attachment 789901 [details]
Screen Shot 2013-08-13 at 4.43.36 PM.png

After spending almost a day debugging and tearing my hair out I could not get the tests to work with full text in mysql. I tried everything but I just couldn't get it to work. 

So, I had a free our on the train and decided to go back and try the full text search with postgres. I made a branch where I merged the migration-to-postgres code with the mysql-search code and changed the SQL piece and it just worked. Also it was possible to add really nice unit tests (technically, "integration tests" since it uses the database). 

The combined code is here and it's ready to go:
https://github.com/peterbe/airmozilla/tree/bug-895126-search-with-postgres

And stemming support is great. I didn't realize that stemming is so much more than just pluralization. Also, out of the box Postgres has a ts_headline() function which automatically highlights the column values with markers so you can get highlighting the search result page. 

So please, can we go ahead and run the migration to switch over to Postgres. 
The instructions are here:
https://github.com/peterbe/airmozilla/tree/bug-895126-search-with-postgres/mysql2postgres

I just tested this an hour ago using a fresh dump of the production database and it worked just fine.
We can indeed migrate; at this point it's figuring out the procedure and a good time to do so.
(Reporter)

Comment 20

4 years ago
The procedure I have in https://github.com/peterbe/airmozilla/tree/bug-895126-search-with-postgres/mysql2postgres 

I've tried it several times locally. Both with my 5.5 and my 5.6 version. 

If someone at IT is ready to give this a go on dev, I can merge my branch into master and push.
Other than you, with whom do we need to coordinate to make this happen?
And for my reference, confluence1.stage.webapp.scl3 is the stage server we're going to do this on.

After tomorrow afternoon, we should be able to get to this (so, say, Wed).
(Reporter)

Comment 23

4 years ago
:sheeri,
I'm doing a workweek this week but I think I should be able to find the time. Ping me.
I'll land the code into master in the meantime.
(Reporter)

Comment 24

4 years ago
Any news here? I'm eager to deploy on dev.
Hi Peter,

The sticking point here is that we are moving the stage server to using an improved configuration management module (puppet module). I'm going to do some work on this over the weekend and hopefully things will sail more smoothly after the weekend.
(Reporter)

Comment 26

4 years ago
Please don't work on the weekend on my behalf. I certainly avoid work like the plague on weekends. :)

Tomorrow is Friday so no release on stage or prod but I'd be happy to deploy on dev. 

Thing is a lot of bugs are kinda building up by being blocked on this. That's because a lot of new patches will need new migrations and my postgres patch "resets" all existing migrations into a brand new one. 

So, is there any news on this?
Before I left for Tribe in Toronto on Tuesday night, I created the db called air_allizom_org on confluence1.stage.webapp.scl3.mozilla.com and a user for the database, and the puppetizing is half done. If postgres needs to be restarted for any reason it will read the wrong configs and confluence/mana will stop working - which is why I did not mention it before.

I hope to have the rest of the puppetizing (the config files) finished tomorrow. We are working on growing the infrastructure to support postgres this quarter and next, and your needs came about 2 weeks early from what we were planning on having ready. The good news it that that is the only reason we were able to prioritize this at all. By end of Q4 we will be able to do this type of thing much more easily, but we are not there yet.

Pretty soon a new postgres deployment will be as trivial and painless as a new MySQL deployment. This will hopefully be the last painful deployment/migration.
Update: the puppetizing is not going to be finished today. My apologies.
The puppetizing has been complete! But in order to make this work, I'm going to need to know where the airmo stage webserver is coming from, so I can make the ACLs work.

Comment 30

4 years ago
air.m.o is on the Generic cluster web servers are:

genericrhel6: generic1.webapp.phx1.mozilla.com,generic2.webapp.phx1.mozilla.com,generic3.webapp.phx1.mozilla.com,generic4.webapp.phx1.mozilla.com,generic5.webapp.phx1.mozilla.com,generic102.webapp.phx1.mozilla.com

genericrhel6-dev: generic1.dev.webapp.phx1.mozilla.com

genericrhel6-stage: generic1.stage.webapp.phx1.mozilla.com,generic2.stage.webapp.phx1.mozilla.com

Might want to include the admin node as that is where most migrations, cron jobs and such are run from:
genericadm.private.phx1.mozilla.com
Depends on: 913156
Depends on: 915864
OK, I now have the stage and dev users and databases all set up! In order to do that I had to figure out the netflows, so bug 915864 is for that. As soon as that's done we should be all good to go.
(Assignee)

Comment 32

4 years ago
Last night peterbe and I worked on this a bit more.

We were able to attempt the dev migration (jenkins was failing, but we tried anyhow)

We got through most of the steps here with a few modifications:

https://github.com/mozilla/airmozilla/tree/master/mysql2postgres
- pip isn't allowed on our servers (OpSec restriction), so he included the module in the package.
- This changed line 4 to be this command instead: 
    PYTHONPATH=./py-mysql2pgsql-0.1.5/ ./py-mysql2pgsql-0.1.5/bin/py-mysql2pgsql -v -f mysql2pgsql.yml
- The mysql2pgsql.yml file had to be manually created and populated. It's living on the genericadm node for generic dev. It's got the credentials for the postgres and mysql servers. This needs to be hand created for stage and prod eventually too.
- Migration table creation and all steps succeeded up until the modified step 4. It errors with: "psycopg2.InterfaceError: can't encode unicode string to latin-1", which was believed to be a failure resulting from an old version of psycopg2 on the server (it's running '2.0.14 (dt dec ext pq3)'). 
- peter's going to file a bug to get a more recent version pushed to the generic cluster and we can then move forward by rerunning the new modified step 4 and 5.
(Assignee)

Comment 33

4 years ago
as a side note: mpressman was able to get jenkins working today. But, peterbe expressed jenkins is allowing the pip install, which is likely why it worked.
(Reporter)

Updated

4 years ago
Depends on: 919050
(Reporter)

Updated

4 years ago
No longer depends on: 919050
(Reporter)

Updated

4 years ago
Depends on: 919050
(Assignee)

Comment 34

4 years ago
Webops upgraded the psql connector and we finished the migration for dev. It ran perfectly with the following:

 PYTHONPATH=./py-mysql2pgsql-0.1.5/ ./py-mysql2pgsql-0.1.5/bin/py-mysql2pgsql -v -f mysql2pgsql.yml

Then I migrated the convert-all-timestamps.sql to the postgres machine on confluence and ran this as the postgres user:

psql air_dev_allizom_org < convert-all-timestamps.sql 


Dev is done.
(In reply to Brandon Johnson [:cyborgshadow] from comment #34)
> Webops upgraded the psql connector and we finished the migration for dev. It
> ran perfectly with the following:

> Dev is done.

Yay!!!
Can anyone give me an estimate of when we can push to prod?   This is now blocking all feature pushes to the production server.
Severity: normal → major
Dropping severity since it's paging oncall and is not actionable now.
Adding a priority instead.
Severity: major → normal
Priority: -- → P3
We have hardware for stage and production, so it's a matter of timing when we can migrate stage and prod.
I'm ready to do this but can't find peterbe. :( We can migrate stage tomorrow (Friday), I'll set up an appointment in Peter's calendar.
(Reporter)

Comment 40

4 years ago
I'm back! Sorry. Took the dog for a walk. I'm ready when you are.
I've set up a meeting tomorrow at 10:30 am Pacific time to do this. Clear, focused time (I have an appt in a few minutes so it wasn't a good time to start something). Hopefully :solarce can help out if needed.
(In reply to Sheeri Cabral [:sheeri] from comment #41)
> I've set up a meeting tomorrow at 10:30 am Pacific time to do this. Clear,
> focused time (I have an appt in a few minutes so it wasn't a good time to
> start something). Hopefully :solarce can help out if needed.

I'm available, am I responsible for the config changes to the Django application? If so, please let me know the DNS for the new Zeus VIP and login info via IRC

Thanks!
(Assignee)

Comment 43

4 years ago
We migrated stage today. Everything ran perfectly, 26 minutes from beginning to end.

The documentation provided from comment 32 and comment 34 helped provide everything necessary.
(Assignee)

Comment 44

4 years ago
Note there was one minor blip: the pg_hba entry for the stage user didn't exist on the stage server. I added it as puppet revision 76329.
(Assignee)

Comment 45

4 years ago
For note: while stage and dev are talking to a db in scl3, prod will be all in phx1.

The servers it's migrating to are:
generic-pg[1-2].db.phx1
(In reply to Brandon Johnson [:cyborgshadow] from comment #45)
> For note: while stage and dev are talking to a db in scl3, prod will be all
> in phx1.
> 
> The servers it's migrating to are:
> generic-pg[1-2].db.phx1

I am not finding any load balancer configuration for this cluster, so I'm going to need to file some bugs to get that configured and get network flows from the generic production web servers and admin node

I'm going to file the CAB bug and assume I can get things all ready before Thursday, if not we can reschedule
Depends on: 926432
Depends on: 926486
(Assignee)

Comment 47

4 years ago
We migrated production today during the approved maintenance window and everything went off without a hitch. :)

Excellent maintenance!
Assignee: server-ops-database → bjohnson
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(Assignee)

Comment 48

4 years ago
Added check-in for revision 76673 to put lines in databases.pp to create the airmo user in production. This same call to postgres::create::role existed for dev/stage so I wanted to make sure we were consistent.
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.