Closed
Bug 756585
Opened 13 years ago
Closed 13 years ago
Setup wiki content migration on developer-{dev,stage,prod}.allizom.org
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Infrastructure & Operations Graveyard
WebOps: Other
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: lorchard, Assigned: nmaul)
References
Details
(Whiteboard: u=developer c=infrastructure s=2012-06-05 p=)
Once we have working environments for developer-{dev,stage}.allizom.org, we'll need to get some content into the respective wikis by using the migration tool.
It would also be very nice to set up regular incremental migrations until we launch, maybe from MindTouch prod.
Reporter
Updated•13 years ago
Reporter
Updated•13 years ago
Summary: Setup wiki content migration on developer-{dev,stage}.allizom.org → Setup wiki content migration on developer-{dev,stage,prod}.allizom.org
Reporter
Comment 1•13 years ago
What needs to happen to get migrations working:
1) Configure settings_local.py to point to a read-only production DB, or as close to that as we can get:
https://github.com/mozilla/kuma/blob/master/puppet/files/vagrant/settings_local.py#L88
2) Run a complete migration once, which will probably take a few hours:
https://github.com/mozilla/kuma/blob/master/scripts/migrate_all.sh
3) Set up incremental migrations to run periodically, on an ongoing basis:
https://github.com/mozilla/kuma/blob/master/scripts/migrate_recent.sh
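For reference, here's a minimal sketch of what the step-1 settings_local.py change might look like. The 'wikidb' alias name, host, and credentials are placeholders I'm making up for illustration, not the real values:

# settings_local.py (sketch only -- the alias name, host, and credentials
# below are placeholders, not the real Kuma/production values)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'kuma',
        'USER': 'kuma',
        'PASSWORD': 'changeme',
        'HOST': 'localhost',
    },
    # Read-only connection to the MindTouch production DB that the
    # migration command pulls pages and revisions from.
    'wikidb': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'wikidb',
        'USER': 'kuma_readonly',
        'PASSWORD': 'changeme',
        'HOST': 'mindtouch-ro.db.example.com',
    },
}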
Reporter
Comment 2•13 years ago
This should also probably be turned into an IT bug :/
Reporter
Updated•13 years ago
Assignee: nobody → server-ops
Component: Docs Platform → Server Operations
Product: Mozilla Developer Network → mozilla.org
QA Contact: docs-platform → phong
Version: Kuma → other
Reporter
Comment 3•13 years ago
Tried moving to an IT bug, but can't check an "infra-related bugs" box :/
Whiteboard: u=developer c=infrastructure s=2012-06-05 p=
Assignee
Comment 4•13 years ago
From comment 1:
#1 is done.
#2 is running.
Note that I'm doing this on developer-new.mozilla.org, not dev/stage. Moving data to dev/stage.
Very shortly in I got this non-fatal error:
/data/developer/src/developer.mozilla.org/kuma/vendor/src/django/django/db/backends/mysql/base.py:86: Warning: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it accesses a non-transactional table after accessing a transactional table within the same transaction.
return self.cursor.execute(query, args)
/data/developer/src/developer.mozilla.org/kuma/vendor/src/django/django/db/backends/mysql/base.py:86: Warning: Field 'show_toc' doesn't have a default value
return self.cursor.execute(query, args)
It's still running now. I will update again when it finishes.
Assignee: server-ops → nmaul
Component: Server Operations → CA Certificates
QA Contact: phong → ca-certificates
Assignee
Updated•13 years ago
Component: CA Certificates → Server Operations: Web Operations
QA Contact: ca-certificates → cshields
Reporter
Comment 5•13 years ago
(In reply to Jake Maul [:jakem] from comment #4)
> Very shortly in I got this non-fatal error:
>
> /data/developer/src/developer.mozilla.org/kuma/vendor/src/django/django/db/
> backends/mysql/base.py:86: Warning: Unsafe statement written to the binary
> log using statement format since BINLOG_FORMAT = STATEMENT. Statement is
> unsafe because it accesses a non-transactional table after accessing a
> transactional table within the same transaction.
> return self.cursor.execute(query, args)
Huh. Have never seen that error before.
> /data/developer/src/developer.mozilla.org/kuma/vendor/src/django/django/db/
> backends/mysql/base.py:86: Warning: Field 'show_toc' doesn't have a default
> value
> return self.cursor.execute(query, args)
I *thought* we had a fix for that, hmm.
Assignee
Comment 6•13 years ago
Is there any way to get a status on this job? Either a percentage complete, or some way to check the output? Just looking for a rough estimate, nothing super-precise. :)
Reporter
Comment 7•13 years ago
(In reply to Jake Maul [:jakem] from comment #6)
> Is there any way to get a status on this job? Either a percentage complete,
> or some way to check the output? Just looking for a rough estimate, nothing
> super-precise. :)
Yeah, percentage complete is on my TODO list :/ But, it should spew out a completion count every 5 seconds or so.
I think the rough estimate of total documents is about 90000, and revisions is about 182000:
mysql> select count(*) from pages;
+----------+
| count(*) |
+----------+
| 90496 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from old;
+----------+
| count(*) |
+----------+
| 182713 |
+----------+
1 row in set (0.00 sec)
Reporter
Comment 8•13 years ago
Also, if the full migration command ever stops or dies, it's built to just be run over again. It will skip things that look already migrated. That can help if it hits a transient error, or if your shell gets disconnected, etc.
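If you'd rather not babysit it, something like this shell loop would keep restarting it until it finishes cleanly. This is only a sketch and assumes migrate_all.sh exits non-zero when it dies, which I haven't actually verified:

# Sketch only -- assumes migrate_all.sh exits non-zero on failure.
cd /data/developer/src/developer.mozilla.org/kuma
until ./scripts/migrate_all.sh; do
    echo "migration stopped, restarting in 60 seconds..." >&2
    sleep 60
done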
Reporter
Comment 9•13 years ago
Oh, but also, the final document count in the Django DB will look something like this:
mysql> select count(*) from wiki_document;
+----------+
| count(*) |
+----------+
| 45411 |
+----------+
1 row in set (0.00 sec)
The migration script filters out a crapload of User:* pages with "Welcome to MindTouch!" boilerplate content, which disturbingly enough comprise roughly half of the content. There are a handful of other pages it skips, like spammy pages that are over 1MB in character length.
Reporter
Comment 10•13 years ago
And one more thing, the wiki_revisions table count on the Django side will be weird. Kuma has a data model where a Document has a record, and 1 or more associated Revisions. MindTouch has a Page, and 0 or more associated Old records.
So, in Kuma, the revision count is something like (# documents + all revisions) because each Document has a current Revision associated.
But, in MindTouch it's something like (all revisions - # of documents) because the current revision *is* the page.
FWIW, my laptop shows a Django revision count like this:
mysql> select count(*) from wiki_revision;
+----------+
| count(*) |
+----------+
| 227139 |
+----------+
1 row in set (0.00 sec)
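As a rough sanity check (and assuming each page the migration skips would have contributed only its single current revision, which is a guess), the counts from comments 7 and 9 land pretty close to that number:

MindTouch pages + old revisions:   90496 + 182713 = 273209
pages skipped by the migration:    90496 - 45411  =  45085
expected wiki_revision count:      273209 - 45085 ≈ 228124   (my laptop shows 227139)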
Assignee
Comment 11•13 years ago
This is pending a bug fix in jsocol's Bleach sanitizer library, which (for some reason) seems to cause this migrate job to take an *extremely* long time... potentially infinite looping, we're not sure. The offending line was identified (in fact a single offending character in that line), but *why* it's problematic has not been determined.
Reporter
Comment 12•13 years ago
(In reply to Jake Maul [:jakem] from comment #11)
> This is pending a bug fix in jsocol's Bleach sanitizer library, which (for
> some reason) seems to cause this migrate job to take an *extremely* long
> time... potentially infinite looping, we're not sure. The offending line was
> identified (in fact a single offending character in that line), but *why*
> it's problematic has not been determined.
I made a fix, but it's not merged yet. But, here's the pending pull request for reference:
https://github.com/jsocol/bleach/pull/61
Reporter
Comment 13•13 years ago
Alright, this PR has been merged and closed. And, I just updated all the pointers to the latest version in Kuma:
https://github.com/mozilla/kuma/commit/3bb10fa8fba702b4f7ea2e1c3f6519dcf79c1781
I think this should unblock the migration.
Assignee
Comment 14•13 years ago
I tried running this again, and it ran for several hours without outputting any significant status information (comment 7 indicates there should be something every few seconds... there isn't).
Here is the complete output after 3+ hours, when I Ctrl-C'd the job.
http://jakem.pastebin.mozilla.org/1655570
[root@developeradm.private.scl3 kuma]# python manage.py dbshell
mysql> select count(*) from wiki_revision;
+----------+
| count(*) |
+----------+
| 50889 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from wiki_document;
+----------+
| count(*) |
+----------+
| 3312 |
+----------+
1 row in set (0.00 sec)
I started it back up, and in a few minutes wiki_document grew by 2 rows, and wiki_revision grew by 146. A minute later wiki_revision was up another 500 or so, but there was no change in wiki_document.
I'm hoping there are a few pages near the beginning that have an inordinately large number of revisions (by this count method), because the ratio between the two values is nothing like the one implied by comments 9 and 10, which is roughly 5 revisions per page. So far we're getting more like *15* revisions per page.
Assignee
Comment 15•13 years ago
This is improving slightly, and we're now up to 6255 wiki_documents and 74133 wiki_revisions... around 12 revisions per page. It's running in a screen on developeradm.private.scl3.mozilla.com.
Reporter
Comment 16•13 years ago
How's this going today? I'm having trouble logging in (bug 761633), but I see 234 pages of docs here:
https://developer-new.mozilla.org/en-US/docs/all
That's pretty promising, since my laptop VM has a full migration from a few months ago and it has 232 pages.
Reporter
Comment 17•13 years ago
(Oh, and there are 100 docs per page on that view, which makes for ~23400 in en-US. That view only shows one locale, so it won't account for *every* document. But, still, promising.)
Assignee
Comment 18•13 years ago
This bombed out: http://jakem.pastebin.mozilla.org/1656289
mysql> select count(*) from wiki_document;
+----------+
| count(*) |
+----------+
| 46002 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from wiki_revision;
+----------+
| count(*) |
+----------+
| 217186 |
+----------+
1 row in set (0.00 sec)
Starting it up again since it should be safe to do so, but we might be stuck until that's fixed.
Reporter
Comment 19•13 years ago
Hmm, I've never seen that error before. I wonder if it might have something to do with a failed migration back when the thing was hanging?
Quickest fix I can think of is to do something like this in the kuma DB:
delete from wiki_document where mindtouch_page_id=86011
Then, start up the full migration again.
Assignee
Comment 20•13 years ago
mysql> select id,title,slug,is_template,is_localizable,locale,current_revision_id,parent_id,category,mindtouch_page_id,modified,parent_topic_id from wiki_document where mindtouch_page_id=86011;
+-------+----------------+--------------------+-------------+----------------+--------+---------------------+-----------+----------+-------------------+---------------------+-----------------+
| id | title | slug | is_template | is_localizable | locale | current_revision_id | parent_id | category | mindtouch_page_id | modified | parent_topic_id |
+-------+----------------+--------------------+-------------+----------------+--------+---------------------+-----------+----------+-------------------+---------------------+-----------------+
| 559 | Using flexbox | Using_flexbox | 0 | 1 | en-US | 9934 | NULL | 0 | 86011 | 2012-06-04 13:02:54 | NULL |
| 1080 | Flexible Box | Flexible_Box | 0 | 1 | en-US | 19072 | NULL | 0 | 86011 | 2012-06-05 10:10:01 | NULL |
| 46017 | Flexible boxes | CSS/Flexible_boxes | 0 | 1 | en-US | 217307 | NULL | 0 | 86011 | 2012-06-05 10:10:45 | NULL |
+-------+----------------+--------------------+-------------+----------------+--------+---------------------+-----------+----------+-------------------+---------------------+-----------------+
3 rows in set (0.00 sec)
mysql> delete from wiki_document where mindtouch_page_id=86011;
Query OK, 3 rows affected (0.01 sec)
Job is started back up.
Reporter
Comment 21•13 years ago
Okay, spent some time bulletproofing the migration command, and got a few full runs completed on my EC2 VM. This should help get the first full run completed and get incrementals working:
https://github.com/mozilla/kuma/commit/11658810ee4a4c5455e0415cce04bc06164f2caa
Reporter
Comment 22•13 years ago
One last tweak, and I think this covers things:
https://github.com/mozilla/kuma/commit/738ad9b5b263528f5f99fa50fbe8838f81bfc223
If you can update to that, and re-run a full migration, that'd be awesome. Hopefully that completes in under 15min since it's just the tail-end parts that need doing.
If that works, hopefully you can set up the regular incremental runs from cron.
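For the cron part, something along these lines in the app user's crontab should do it. The 30-minute interval, the log path, and the assumption that migrate_recent.sh needs no arguments are all mine, so adjust as needed:

# m   h  dom mon dow  command
*/30  *  *   *   *    cd /data/developer/src/developer.mozilla.org/kuma && ./scripts/migrate_recent.sh >> /var/log/kuma-migrate-recent.log 2>&1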
Reporter
Comment 23•13 years ago
Initial migration completed, and incremental migrations running every 30 min on developer-new.mozilla.org.
Calling this one done.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard