570656 - Push SUMO 2.1 Thursday, 10 June

Reporter

Description

14 years ago

Per the webdev releases calendar [1], this bug tracks the SUMO 2.1/1.5.5 release and the discussion forum migration.

This involves moving some data. It takes around 15 minutes for the important part, during which we'll need an outage page. The safe thing is to assume we'll need the outage page up for around an hour, total.

SVN tag for 1.5.5 is coming, but git tag is `2.1`.

Here's the big list of steps:

* Get RabbitMQ set up, and get a git clone checked out to `2.1` up and running celeryd. (See bug 568329. Hopefully this can happen ahead of time?)

* Will need hg for `pip install`. (I know, it's gross, another VCS.)
* Will need java for `./manage.py compress_assets`.

* Outage page.
* `git checkout 2.1`

* Set some configuration in `settings_local.py`
** EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
** (Other EMAIL_* constants as needed [2])
** BROKER_* and CELERY_* constants as needed. (See bug 568329.)
*** Particularly CELERY_ALWAYS_EAGER = False (on both webnodes and celeryd instance).
** DEBUG = False, TEMPLATE_DEBUG = False

* Need to run a couple commands from the virtualenv:
** `pip install -Ur requirements.txt`
** `schematic migrations/` [3]
** `./manage.py migrate_forum 3 4 5` (will take 5-15 minutes).
** `./manage.py compress_assets`

* svn sw to 1.5.5 tag (coming soon)
* Outage page can come down now.

* One more command on the virtualenv:
** `./manage.py build_avatars` (Will take up to half an hour, but shouldn't affect site up-time while it runs. Needs /tmp to be writeable.)

* Add an Alias to Apache:
    Alias /admin-media/ /path/to/virtualenv/src/django/django/contrib/admin/media/
** Make sure it's readable, etc.

* Flush all caches.

I am fairly sure that's everything. If anyone remembers something I've forgotten, please add it here.

Then IT is done and we have some dev stuff to take care of.
* Update default site to support.mozilla.com.
* Make sure ForumModerators group has necessary permissions.


[1] https://mail.mozilla.com/home/morgamic@mozilla.com/Webdev%20Releases.html
[2] http://docs.djangoproject.com/en/dev/topics/email/#smtp-backend
[3] I really hope it's this easy. I have a patch to make it this easy. Otherwise, I'll walk you through the slightly worse version.
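
For reference, the settings listed above might look roughly like this in `settings_local.py`. This is only a sketch: the host names, ports, and credentials are placeholder assumptions rather than the production values, and the exact `BROKER_*` names depend on the Celery version in use (see bug 568329).

```python
# settings_local.py (sketch; all host names and credentials are placeholders)

# Send mail through a real SMTP server.
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.example.com'  # assumption: replace with the real relay
EMAIL_PORT = 25                  # assumption

# RabbitMQ connection for Celery (assumed values).
BROKER_HOST = 'rabbitmq.example.com'
BROKER_PORT = 5672
BROKER_USER = 'kitsune'
BROKER_PASSWORD = 'change-me'
BROKER_VHOST = 'kitsune'

# Tasks must be sent to celeryd rather than run inline in the web process.
CELERY_ALWAYS_EAGER = False

DEBUG = False
TEMPLATE_DEBUG = False
```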

James Socol [:jsocol, :james]

Reporter

Comment 1

14 years ago

Forgot two things:

1) There is a new sphinx.conf in SVN (and in git, under configs/sphinx/). We'll also need to update that and reindex.

2) We'll also need to set the ADMIN_MEDIA_PREFIX in settings_local.py. I have

    MEDIA_URL = '//support.mozilla.com/media/'
    ADMIN_MEDIA_PREFIX = '//support.mozilla.com/admin-media/'

matthew zeier [:mrz]

Comment 2

14 years ago

4pm?

Flags: needs-downtime+

Whiteboard: 06/08/2010 @ 4pm

James Socol [:jsocol, :james]

Reporter

Comment 3

14 years ago

Works for me.

James Socol [:jsocol, :james]

Reporter

Updated

14 years ago

Depends on: 568329

matthew zeier [:mrz]

Comment 4

14 years ago

User impacting?

oremj, can you do this?

Assignee: server-ops → jeremy.orem+bugs

Jeremy Orem [:oremj]

Assignee

Comment 5

14 years ago

Yeah, I can grab this.

James Socol [:jsocol, :james]

Reporter

Comment 6

14 years ago

Forgot one more (easy) bit:

Set up a cron job to run the `./manage.py build_avatars` command once a day. support-stage-new does it at 1:15am PT which seems fine. (It's much shorter after the first run.)

James Socol [:jsocol, :james]

Reporter

Comment 7

14 years ago

SVN tag: https://svn.mozilla.org/projects/sumo/tags/1.5/1.5.5_r68485_20100608

James Socol [:jsocol, :james]

Reporter

Comment 8

14 years ago

After an hour of attempting this, we've reverted. We're going to look into the errors we saw during the push, and into why we never saw them with the data we had available for testing.

James Socol [:jsocol, :james]

Reporter

Comment 9

14 years ago

The first problem we saw was an unexpected schema:

| forums_thread | CREATE TABLE `forums_thread` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(255) collate utf8_unicode_ci NOT NULL,
  `forum_id` int(11) NOT NULL,
  `created` datetime NOT NULL,
  `creator_id` int(11) NOT NULL,
  `last_post_id` int(11) default NULL,
  `replies` int(11) NOT NULL,
  `is_locked` tinyint(1) NOT NULL,
  `is_sticky` tinyint(1) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `forums_thread_forum_id` (`forum_id`),
  KEY `forums_thread_created` (`created`),
  KEY `forums_thread_creator_id` (`creator_id`),
  KEY `forums_thread_last_post_id` (`last_post_id`),
  KEY `forums_thread_is_sticky` (`is_sticky`),
  CONSTRAINT `creator_id_refs_id_4938e584` FOREIGN KEY (`creator_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `forum_id_refs_id_7f5fd759` FOREIGN KEY (`forum_id`) REFERENCES `forums_forum` (`id`),
  CONSTRAINT `last_post_id_refs_id_3fa89f33` FOREIGN KEY (`last_post_id`) REFERENCES `forums_post` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci | 


mysql> show create table forums_post;  

| forums_post | CREATE TABLE `forums_post` (
  `id` int(11) NOT NULL auto_increment,
  `thread_id` int(11) NOT NULL,
  `content` longtext collate utf8_unicode_ci NOT NULL,
  `author_id` int(11) NOT NULL,
  `created` datetime NOT NULL,
  `updated` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `forums_post_thread_id` (`thread_id`),
  KEY `forums_post_author_id` (`author_id`),
  KEY `forums_post_created` (`created`),
  KEY `forums_post_updated` (`updated`),
  CONSTRAINT `author_id_refs_id_59fe2704` FOREIGN KEY (`author_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `thread_id_refs_id_5646bc53` FOREIGN KEY (`thread_id`) REFERENCES `forums_thread` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

James Socol [:jsocol, :james]

Reporter

Comment 10

14 years ago

The problem we couldn't immediately work around was the following stack trace during our migrate_forum step:

Starting migration for forum "Contributors" (3)
Created forum "Contributors" (1)...
Processing thread 1529...
Traceback (most recent call last):
  File "./manage.py", line 36, in <module>
    execute_manager(settings)
  File "/data/virtualenvs/kitsune/src/django/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/data/virtualenvs/kitsune/src/django/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/data/virtualenvs/kitsune/src/django/django/core/management/base.py", line 195, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/data/virtualenvs/kitsune/src/django/django/core/management/base.py", line 222, in execute
    output = self.handle(*args, **options)
  File "/data/www/support.mozilla.com/kitsune/apps/forums/management/commands/migrate_forum.py", line 193, in handle
    last_post = thread.post_set.order_by('-created')[0]
  File "/data/virtualenvs/kitsune/src/django/django/db/models/query.py", line 187, in __getitem__
    return list(qs)[0]
IndexError: list index out of range
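
The last frame is the telling one: indexing `[0]` on an empty result raises `IndexError`, so the `post_set` query returned no rows for that thread. A minimal sketch of the failure and a defensive alternative, using plain lists in place of a Django queryset (the `newest_post` helper and the dict-shaped posts are hypothetical, not code from `migrate_forum.py`):

```python
def newest_post(posts):
    """Return the most recently created post, or None if there are none."""
    ordered = sorted(posts, key=lambda p: p['created'], reverse=True)
    # ordered[0] would raise IndexError on an empty list, which is the
    # same failure the traceback shows for an empty queryset.
    return ordered[0] if ordered else None


posts = [
    {'id': 1, 'created': '2010-06-01 10:00:00'},
    {'id': 2, 'created': '2010-06-02 09:30:00'},
]
print(newest_post(posts))  # the post with id 2
print(newest_post([]))     # None instead of an IndexError
```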

Paul Craciunoiu [:paulc]

Comment 11

14 years ago

Another problem we hadn't encountered before was mentioned by timellis:
[17:17]	<timellis> Hi. Someone killed the SUMO master in Phoenix with this statement:
[17:17]	<timellis> Error 'Error on rename of './support_mozilla_com/forums_thread' to './support_mozilla_com/#sql2-12b1-4833' (errno: 152)' on query. Default database: 'support_mozilla_com'. Query: 'alter table forums_thread drop foreign key last_post_id_refs_id_3fa89f33
[17:17]	<timellis> The reason is thus:
[17:17]	<timellis> "Cannot delete a parent row"
[17:18]	<timellis> The Phoenix SUMO master is a slave of the SJ SUMO master.

Neither James nor I was aware of this slave master (a master that is itself a slave), and we're still not sure why a command that ran fine on the SJ master failed on the Phoenix one. One assumption is that the two weren't in sync (with the SJ master having the unexpected schema from comment 9).

Ricky Rosario [:rrosario, :r1cky]

Comment 12

14 years ago

The migration code has been modified to keep track of each thread's last_post as its posts are created, removing the need to ask the database for it afterwards. This *should* eliminate the race condition with the master/slaves.

http://github.com/jsocol/kitsune/commit/243433b2c6dfc08381cd0fe5bbbcf688d39cb5c7
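
The idea can be sketched roughly as follows. This is not the code from the commit above; `Thread`, `Post`, and `migrate_thread` are simplified stand-ins, assuming posts are migrated one thread at a time:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    content: str
    created: str  # 'YYYY-MM-DD HH:MM:SS', so string order is time order


@dataclass
class Thread:
    title: str
    posts: List[Post] = field(default_factory=list)
    last_post: Optional[Post] = None


def migrate_thread(title, old_posts):
    """Copy old posts into a new thread, tracking the newest one as we go."""
    thread = Thread(title=title)
    for content, created in old_posts:
        post = Post(content=content, created=created)
        thread.posts.append(post)
        # Remember the newest post now instead of querying for it later,
        # when a lagging replica might not have the rows yet.
        if thread.last_post is None or post.created > thread.last_post.created:
            thread.last_post = post
    return thread


thread = migrate_thread('Welcome', [
    ('first', '2010-06-01 10:00:00'),
    ('second', '2010-06-02 09:30:00'),
])
print(thread.last_post.content)  # second
```

A thread with no posts simply ends up with `last_post` set to `None`, so there is no index lookup left to fail.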

James Socol [:jsocol, :james]

Reporter

Comment 13

14 years ago

We believe we've got everything ironed out and ready to go tomorrow afternoon. Let's plan on getting on

the phone: 92, 309#
IRC: #sumodev

Summary: Push SUMO 2.1 Tuesday, 8 June → Push SUMO 2.1 Tuesday, 10 June

Whiteboard: 06/08/2010 @ 4pm → 06/10/2010 @ 4pm, needs-downtime+

Vishal Kamdar [:vish_moz]

Updated

14 years ago

Summary: Push SUMO 2.1 Tuesday, 10 June → Push SUMO 2.1 Thursday, 10 June

matthew zeier [:mrz]

Comment 14

14 years ago

Duration 2 hrs?

Whiteboard: 06/10/2010 @ 4pm, needs-downtime+ → 06/10/2010 @ 4pm

James Socol [:jsocol, :james]

Reporter

Comment 15

14 years ago

(In reply to comment #14)
> Duration 2 hrs?

Yep.

Jeremy Orem [:oremj]

Assignee

Comment 16

14 years ago

Let's start at 2 or 3 this time.

James Socol [:jsocol, :james]

Reporter

Comment 17

14 years ago

(In reply to comment #16)
> Let's start at 2 or 3 this time.

2 WFM. QA?

Stephen Donner [:stephend] Not actively reading bugmail

Comment 18

14 years ago

2PM sounds _great_ to me.

James Socol [:jsocol, :james]

Reporter

Comment 19

14 years ago

(In reply to comment #18)
> 2PM

Moved to 2pm on the Webdev:Releases calendar.

[:Cww]

Updated

14 years ago

Whiteboard: 06/10/2010 @ 4pm → 06/10/2010 @ 2pm

James Socol [:jsocol, :james]

Reporter

Updated

14 years ago

Depends on: 571283

James Socol [:jsocol, :james]

Reporter

Comment 20

14 years ago

UPDATED INSTRUCTIONS!

So we've got slightly updated instructions, since much of this is still done from Tuesday.

* We still need an Outage page up for the duration of the migration.

* `git checkout 2.1.1` on both web servers and the celeryd instance. (Note the new tag.)
** Reload celeryd

* Make sure settings_local.py is still configured correctly (see comment 0).

* Clean up from yesterday:
** SQL: `TRUNCATE TABLE forums_post; TRUNCATE TABLE forums_thread; TRUNCATE TABLE forums_forum;`

* Need to run a couple commands from the virtualenv:
** `./manage.py migrate_forum 3 4 5` (will take 5-15 minutes).
** `./manage.py compress_assets`
*** Make sure that both the generated JS/CSS and the generated build.py (next to settings.py) get synced out.

* svn sw to 1.5.5 tag (see comment 7)
* run webroot/htaccess.sh in SVN.

* One more command on the virtualenv:
** `./manage.py build_avatars` (Will take up to half an hour, but shouldn't affect site up-time while it runs. Needs /tmp to be writeable.)

* Outage page can come down now.

* Make sure this alias is there.
    Alias /admin-media/ /path/to/virtualenv/src/django/django/contrib/admin/media/
** Make sure it's readable, etc.

* Flush all caches.

* Make sure to update Sphinx again as well.
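
To keep the order straight during the push, the steps above could be written down as a dry-run checklist. A minimal sketch: the `STEPS` list and `print_checklist` helper are hypothetical and only print each step. Nothing is executed, and the manual steps are left as descriptions.

```python
SVN_TAG = 'https://svn.mozilla.org/projects/sumo/tags/1.5/1.5.5_r68485_20100608'

# (kind, text) pairs in the order given above. 'cmd' entries are shell
# commands to run by hand; 'manual' entries have no single command.
STEPS = [
    ('manual', 'Put the outage page up'),
    ('cmd', 'git checkout 2.1.1'),  # on both web servers and celeryd
    ('manual', 'Reload celeryd'),
    ('manual', 'Check settings_local.py (see comment 0)'),
    ('manual', 'TRUNCATE forums_post, forums_thread, forums_forum'),
    ('cmd', './manage.py migrate_forum 3 4 5'),
    ('cmd', './manage.py compress_assets'),
    ('manual', 'Sync out the generated JS/CSS and build.py'),
    ('cmd', 'svn switch ' + SVN_TAG),
    ('cmd', 'webroot/htaccess.sh'),
    ('cmd', './manage.py build_avatars'),
    ('manual', 'Take the outage page down'),
    ('manual', 'Check the /admin-media/ Apache alias'),
    ('manual', 'Flush all caches'),
    ('manual', 'Update Sphinx again'),
]


def print_checklist(steps):
    """Print numbered steps and return the printed lines."""
    lines = []
    for number, (kind, text) in enumerate(steps, start=1):
        prefix = '$ ' if kind == 'cmd' else '- '
        lines.append('%2d. %s%s' % (number, prefix, text))
    print('\n'.join(lines))
    return lines


checklist = print_checklist(STEPS)
```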

Jeremy Orem [:oremj]

Assignee

Comment 21

14 years ago

 git fetch
remote: Counting objects: 186, done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 124 (delta 67), reused 12 (delta 7)
Receiving objects: 100% (124/124), 59.39 KiB, done.
Resolving deltas: 100% (67/67), completed with 28 local objects.
From http://github.com/jsocol/kitsune
 + c803754...0f0484e 561530-logging -> origin/561530-logging  (forced update)
   572ed80..4d61e5c  development -> origin/development
   93d393c..8b8e5b9  master     -> origin/master
   e291839..94279b7  questions  -> origin/questions
 * [new branch]      sphinx-doc -> origin/sphinx-doc
 * [new tag]         2.1.1      -> 2.1.1
[root@mradm02 prod]# git checkout 2.1.1
Previous HEAD position was cb19e8c... Adding WebTrends meta tags and test for them.
HEAD is now at 8b8e5b9... Merge branch 'development'

Jeremy Orem [:oremj]

Assignee

Comment 22

14 years ago

svn switch https://svn.mozilla.org/projects/sumo/tags/1.5/1.5.5_r68485_20100608
A    webroot/lang/ilo
A    webroot/lang/ilo/language.php
A    webroot/lang/ilo/index.php
U    webroot/lang/langmapping.php
U    webroot/lib/commentslib.php
U    webroot/tiki-login.php
A    webroot/django_utils.php
U    webroot/tiki-change_password.php
U    webroot/htaccess.dist
U    scripts/sphinx/sphinx.conf
Updated to revision 68644.

Jeremy Orem [:oremj]

Assignee

Comment 23

14 years ago

Attached file: migrate_forum output

Jeremy Orem [:oremj]

Assignee

Comment 24

14 years ago

Trevor updated the sphinx config.

James Socol [:jsocol, :james]

Reporter

Comment 25

14 years ago

We ran into a pretty serious architectural issue in the code related to replication. We need to re-examine and we'll take another run at this.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → INCOMPLETE

Nobody; OK to take it and work on it

Updated

11 years ago

Component: Server Operations: Web Operations → WebOps: Other

Product: mozilla.org → Infrastructure & Operations

BMO Automation

Updated

5 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard