push MindTouch 2010 to developer.mozilla.org on 2011-08-30

VERIFIED FIXED

Status

--
enhancement
VERIFIED FIXED
7 years ago
5 years ago

People

(Reporter: groovecoder, Assigned: nmaul)

(Reporter)

Comment 1

7 years ago
Jeremy,

Do you have notes from the MindTouch 2010 staging upgrade?

Comment 2

7 years ago
I think it is going to be:
* revert all the security patches
* svn switch to the mindtouch 2010 branch
* sync that out
* run php maintenance/update-db.php
* reapply all the patches
(Reporter)

Comment 3

7 years ago
is there a revert-db.php also?

Comment 4

7 years ago
(In reply to comment #3)
> is there a revert-db.php also?

Nope.
(Reporter)

Comment 5

7 years ago
did we test a rollback in staging? :/

Craig, did you make most of the MindTouch 2010 bug fixes? Can you be on-hand for the push?
(Reporter)

Comment 6

7 years ago
After scanning through the bug 600834 dependencies, I think the potential issues could be:

* Product Activation Key - we may have to re-activate MindTouch after the upgrade to fix API issues. (bug 605549 and bug 605645)
* Restore site preferences - such as the From: email (bug 646989)
* Re-apply security patches (do we have a list of all applied patches?)

We would like to push tomorrow at 2pm PST.
(Reporter)

Updated

7 years ago
Blocks: 593941
(Reporter)

Updated

7 years ago
Depends on: 656444
(Reporter)

Updated

7 years ago
No longer blocks: 593941
(Reporter)

Updated

7 years ago
Blocks: 593941

Updated

7 years ago
Assignee: server-ops → nmaul
(Assignee)

Comment 7

7 years ago
Should this be WONTFIX'd? We're upgrading developer-stage9 first, right? Then this sometime afterwards?

The project plan from MindTouch (received today) calls for stage9 to be upgraded on or around May 27, and presumably a prod upgrade by June 14, if not sooner (depending on QA + bugs found + fixes from MT).
Status: NEW → ASSIGNED
(Reporter)

Comment 8

7 years ago
No need to WONTFIX it - we will fix it. Just not until 6/14. :)
(Assignee)

Updated

7 years ago
Whiteboard: [waiting on 656444]
(Assignee)

Comment 9

7 years ago
During this upgrade, we also need to upgrade Mono to 2.10. Just noting this here so it's not forgotten.
(Assignee)

Comment 10

7 years ago
See this bug for details on the -stage9 upgrade, which this should roughly mirror:

https://bugzilla.mozilla.org/show_bug.cgi?id=656444
(Assignee)

Comment 11

7 years ago
We are still seeing huge CPU loads on sm-devmostage01 and 02 at times, presumably corresponding with some type of script being run against developer-stage9.mozilla.org.

I cannot in good conscience recommend this upgrade. Performance/concurrency on -stage9 is undeniably worse, and the only explanation is that it can't handle pages with large numbers of images or attachments. We are having to restart mono processes daily (or more) to get the servers back operational.

The problem is not just memory usage (which seems to be better now), as that was apparently caused primarily by sheppy's mass-import script adding many images/attachments to the same page. The current problem is completely maxed-out CPU usage... high enough to cause mono to stop responding entirely on both servers, although the rest of each server is fine. Apache/SSH respond normally... mono/dekiwiki don't.

I believe this problem will persist in production; it just might take a bit longer before the whole cluster is frozen solid. I don't believe this is some type of legitimate high CPU usage that will go away if it runs long enough: we've tried letting it run for several minutes, with no obvious effect except that Mono/dekiwiki is non-responsive throughout.


If you'd like to proceed anyway, just let us know. I don't have any concern about being able to do the upgrade, only that I believe it will adversely affect cluster performance and reliability. The main reason to proceed as planned would be that you know what is causing -stage9 to die tonight, and can prevent it somehow. :)


In the meantime, I have stopped -stage9 for the night. Something is killing it, causing on-call to be paged repeatedly. We can easily bring this back up tomorrow morning with a simple 'service dekiwiki9 start' on the 2 servers.

Comment 12

7 years ago
jakem:  can you contact mindtouch tomorrow morning to see what else they might have to say about this?  it's disappointing that even after working with them, we aren't confident enough to do the upgrade... nor do we know exactly why sheppy's script (and any other odd behavior) is bringing stage9 down.

a few random questions we might want to ask:

1. if this is a mono issue, and mindtouch 10 is dependent on the new mono... what are our options?   is there any way to get the fixes in mindtouch 10 without the mono upgrade?

2. why can't they work with you to investigate this further to better understand our issues?   getting a simple answer about what *might* be causing the meltdown is not satisfactory... even if what sheppy was doing is an edge case and can't be supported without perf problems.

3. what is the proper escalation path to get to the bottom of our current issues?  do they need to fly out their best support team to work directly with our servers?  is that something they can do remotely with cooperation from our IT team?

Comment 13

(In reply to comment #11)

> I cannot in good conscience recommend this upgrade. Performance/concurrency
> on -stage9 is undeniably worse, and the only explanation is that it can't
> handle pages with large numbers of images or attachments. We are having to
> restart mono processes daily (or more) to get the servers back operational.

I did change my script to no longer attach lots of stuff to one page; instead of a single page for all attachments, they're now actually being attached to the pages that use them.

I wish I had known in advance before you killed stage9; you sort of bunged up the test I was running against it. :)
(Assignee)

Updated

7 years ago
Depends on: 661370
No longer depends on: 656444
Whiteboard: [waiting on 656444] → [waiting on 661370]
(Reporter)

Updated

7 years ago
Blocks: 653935
(Reporter)

Comment 14

7 years ago
heads up on this. bug 661370 is resolving, so we're tentatively scheduling this for Aug 30th if that's okay.

Comment 15

7 years ago
(In reply to Luke Crouch [:groovecoder] from comment #14)
> heads up on this. bug 661370 is resolving, so we're tentatively scheduling
> this for Aug 30th if that's okay.

Let's go for it.  It's been long enough... we'll see what happens.
(Reporter)

Updated

7 years ago
Summary: push MindTouch 2010 to developer.mozilla.org → push MindTouch 2010 to developer.mozilla.org on 2011-08-30

Comment 16

Realistically, it's unlikely to be worse than the current situation, no matter what. :D
(Reporter)

Comment 17

7 years ago
Don't ever say that! :)

Comment 18

Crap, now I've jinxed it. Dammit. :)
(Assignee)

Comment 19

7 years ago
From the evaluation bug, here's the output from the 10.0.9 -> 10.1 upgrade:
http://etherpad.mozilla.com:9000/miy99arFVY

The 'steps' referred to are from here:
http://projects.mindtouch.com/Mozilla/Documentation/10.1.1_Upgrade_Steps

Note that prod actually has to upgrade from 9.12.3, not 10.0.9. sheppy emailed them to ask whether that can be done in one shot (9.12->10.1) or has to be done in 2 stages (9.12->10.0->10.1).
(Reporter)

Comment 20

7 years ago
Jake, you and Sheppy have more experience with this upgrade but considering how much hassle it's been so far I would hesitate to do anything other than what we did on stage9.
(Reporter)

Comment 21

7 years ago
Jake, Sheppy:

Are you both still comfortable doing this tomorrow?

Comment 22

Yes.

Jake has synced up with Brian at MindTouch, and they plan to begin at 11 AM PDT; Brian will be in IRC just in case.
(Assignee)

Updated

7 years ago
Severity: enhancement → normal
Whiteboard: [waiting on 661370]
(Assignee)

Comment 23

7 years ago
After quite a bit of hassle, this is now completed.

http://etherpad.mozilla.com:9000/36zk8fSzT1


The basic procedure was as follows:

1) backup the dekiwiki document root on mradm02

2) backup the database(s) being upgraded

We had a custom robots.txt in place... this broke svn switch. Just move it out of the way *before* switching (after might work too but I don't know how to get a clean switch that way).
3) switch to the 10.1.1 SVN branch: 
svn switch https://svn.mindtouch.com/source/public/dekiwiki/10.1.1/web

4) Deploy to frontends, do *not* restart deki

5) run database upgrade script:  cd dekiwiki/maintenance; php update-db.php

6) upgrade Mono by loosely following these instructions, but via puppet instead of directly: http://developer.mindtouch.com/en/docs/mindtouch_setup/010Installation/060Installing_on_CentOS/Installing%2f%2fUpgrade_Mono_on_CentOS

7) issue-multi-command dekiwiki service dekiwiki restart
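The procedure above could be kept as a reviewable runbook. Below is a minimal dry-run sketch in bash, not the script that was actually used. Assumptions: `DEKI_ROOT`, `BACKUP_DIR` and `DB_NAME` are hypothetical placeholders; the database is assumed to be MySQL, so `mysqldump` is a guess (the bug does not say how the backup was taken); `issue-multi-command` is invoked exactly as in step 7 and its behavior is not modeled; steps 4 (frontend deploy) and 6 (puppet-driven Mono upgrade) are site-specific and left as comments.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the 10.1.1 upgrade procedure (steps 1-7 above).
# ASSUMPTIONS: DEKI_ROOT, BACKUP_DIR and DB_NAME are hypothetical placeholders,
# and the database is assumed to be MySQL (hence mysqldump).
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print commands, 0 = execute them
DEKI_ROOT="${DEKI_ROOT:-/path/to/dekiwiki}"   # hypothetical document root on the admin host
BACKUP_DIR="${BACKUP_DIR:-/path/to/backups}"  # hypothetical backup location
DB_NAME="${DB_NAME:-wikidb}"                  # hypothetical database name
BRANCH_URL="https://svn.mindtouch.com/source/public/dekiwiki/10.1.1/web"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

upgrade() {
  # 1) back up the dekiwiki document root
  run tar -czf "$BACKUP_DIR/dekiwiki-docroot.tar.gz" -C "$DEKI_ROOT" .
  # 2) back up the database being upgraded
  run mysqldump --result-file="$BACKUP_DIR/$DB_NAME.sql" "$DB_NAME"
  # the custom robots.txt broke svn switch, so move it aside first
  run mv "$DEKI_ROOT/robots.txt" "$BACKUP_DIR/robots.txt"
  # 3) switch the working copy to the 10.1.1 branch
  run svn switch "$BRANCH_URL" "$DEKI_ROOT"
  # 4) deploy to frontends (site-specific, not scripted); do NOT restart dekiwiki yet
  # 5) run the database upgrade script from the maintenance directory
  run sh -c "cd '$DEKI_ROOT/maintenance' && php update-db.php"
  # 6) Mono upgrade happens via puppet (not scripted here)
  # 7) restart dekiwiki on every web head
  run issue-multi-command dekiwiki service dekiwiki restart
}

upgrade
```

With the default `DRY_RUN=1` it only prints the command sequence, which makes the ordering easy to review before the window: dekiwiki is restarted only after `update-db.php` has run. Setting `DRY_RUN=0` would execute the commands for real.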



Major hurdles encountered (that I can remember... we went through a lot):

1) On step 4, the part about not restarting dekiwiki was not initially understood. This resulted in HTTP 500 (Internal Server Error) responses for the whole devmo site until we could roll back.

2) Lucene indexes apparently needed to be rebuilt. These are stored on the NetApp, and it was a simple matter of removing the existing indexes, once this was diagnosed. I believe this is still rebuilding now, in the background (automatically).

3) There is a separate license key in MT 10 that is needed to allow anonymous (non-logged-in) access. This is now pushed out from mradm02 along with the rest of the content. Without this all wiki pages redirect to a login page.

4) Mono needed to refresh its SSL keystore cache of CAs on each web head.

5) The UI cache had to be flushed to fix various resource string errors reported by sheppy and others. This is done inside the dekiwiki admin interface. Sheppy and Brian from MT did this.

6) A few extensions "lost" their manifest setting, and sheppy had to port these over from the staging site. I'm not entirely sure what this means, but these settings are apparently pointers to files in the dekiwiki/ dir somewhere. Why these were lost is a mystery.


There are still some small issues here and there that the MDN team is opening separate bugs for. However, the main push is done so I'm closing this one out... finally. Yay!
Severity: normal → enhancement
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Comment 24

verified fixed http://developer.mozilla.org
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations