Closed Bug 656135 Opened 13 years ago Closed 13 years ago

push MindTouch 2010 to developer.mozilla.org on 2011-08-30

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Platform: All
OS: Other
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: groovecoder, Assigned: nmaul)

References

()

Details

      No description provided.
Jeremy,

Do you have notes from the MindTouch 2010 staging upgrade?
I think it is going to be:
* revert all the security patches
* svn switch to the mindtouch 2010 branch
* sync that out
* run php maintenance/update-db.php
* reapply all the patches
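A rough command-line sketch of those steps, for reference (the document root path, patch filenames, and branch name below are placeholders, not the real values):

  cd /data/dekiwiki                                  # assumed document root on the admin host
  # revert the locally-applied security patches (placeholder filenames), in reverse order
  for p in $(ls -r local-patches/*.patch); do patch -R -p0 < "$p"; done
  # switch the checkout to the MindTouch 2010 branch (BRANCH is a placeholder)
  svn switch https://svn.mindtouch.com/source/public/dekiwiki/BRANCH/web
  # sync the switched checkout out to the web heads (site-specific push step, not shown)
  # run the schema upgrade
  php maintenance/update-db.php
  # re-apply the security patches
  for p in local-patches/*.patch; do patch -p0 < "$p"; done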
is there a revert-db.php also?
(In reply to comment #3)
> is there a revert-db.php also?

Nope.
did we test a rollback in staging? :/

Craig, did you make most of the MindTouch 2010 bug fixes? Can you be on-hand for the push?
After scanning through the bug 600834 dependencies, I think the potential issues could be:

* Product Activation Key - we may have to re-activate MindTouch after the upgrade to fix API issues. (bug 605549 and bug 605645)
* Restore site preferences - such as the From: email (bug 646989)
* Re-apply security patches (do we have a list of all applied patches? one way to pull one together is sketched below)

We would like to push tomorrow at 2pm PST.
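
On the "list of all applied patches" question: a minimal sketch of one way to build that list, assuming the security fixes live as uncommitted local modifications in the SVN working copy (an assumption; the path is a placeholder):

  cd /data/dekiwiki                                # assumed document root of the dekiwiki checkout
  svn status | grep '^M'                           # files carrying local (patched) changes
  svn diff > applied-security-patches.diff         # one diff we can keep and re-apply after the switch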
Blocks: 593941
Depends on: 656444
No longer blocks: 593941
Blocks: 593941
Assignee: server-ops → nmaul
Should this be WONTFIX'd? We're upgrading developer-stage9 first, right? Then this sometime afterwards?

The project plan from MindTouch (received today) calls for stage9 to be upgraded on or around May 27, and presumably a prod upgrade by June 14, if not sooner (depending on QA + bugs found + fixes from MT).
Status: NEW → ASSIGNED
No need to WONTFIX it - we will fix it. Just not until 6/14. :)
Whiteboard: [waiting on 656444]
During this upgrade, we also need to upgrade Mono to 2.10. Just noting this here so it's not forgotten.
See this bug for details on the -stage9 upgrade, which this should roughly mirror:

https://bugzilla.mozilla.org/show_bug.cgi?id=656444
We are still seeing huge CPU loads on sm-devmostage01 and 02 at times, presumably corresponding with some type of script being run against developer-stage9.mozilla.org.

I cannot in good conscience recommend this upgrade. Performance/concurrency on -stage9 is undeniably worse, and the only explanation is that it can't handle pages with large numbers of images or attachments. We are having to restart mono processes daily (or more) to get the servers back operational.

The problem is not just memory usage (which seems to be better now), as that was apparently primarily caused by sheppy's mass-import script adding many images/attachments to the same page. The current problem is completely maxed-out CPU usage... high enough to cause mono to completely stop responding on both servers, although the rest of the server is fine. Apache/SSH respond normally... mono/dekiwiki doesn't.

I believe this problem will persist in production; it just might take a bit longer before the whole cluster is frozen solid. I don't believe this is some type of legitimate high CPU usage that will go away if it runs long enough; we've tried letting it run for several minutes, with no obvious effect except that Mono/dekiwiki is non-responsive throughout.


If you'd like to proceed anyway, just let us know. I don't have any concern about being able to do the upgrade, only that I believe it will adversely affect cluster performance and reliability. The main reason to proceed as planned would be that you know what is causing -stage9 to die tonight, and can prevent it somehow. :)


In the meantime, I have stopped -stage9 for the night. Something is killing it, causing on-call to be paged repeatedly. We can easily bring this back up tomorrow morning with a simple 'service dekiwiki9 start' on the 2 servers.
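
For whoever is on call, a minimal restart sketch (hostnames taken from the earlier comment; nothing here beyond the command already quoted above):

  # on sm-devmostage01 and sm-devmostage02
  service dekiwiki9 start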
jakem: can you contact MindTouch tomorrow morning to see what else they might have to say about this? It's disappointing that even after working with them, we aren't confident enough to do the upgrade... nor do we know exactly why sheppy's script (or any other odd behavior) is bringing stage9 down.

A few random questions we might want to ask:

1. If this is a Mono issue, and MindTouch 10 is dependent on the new Mono... what are our options? Is there any way to get the fixes in MindTouch 10 without the Mono upgrade?

2. Why can't they work with you to investigate this further and better understand our issues? Getting a simple answer about what *might* be causing the meltdown is not satisfactory... even if what sheppy was doing is an edge case and can't be supported without perf problems.

3. What is the proper escalation path to get to the bottom of our current issues? Do they need to fly out their best support team to work directly with our servers? Is that something they can do remotely with cooperation from our IT team?
(In reply to comment #11)

> I cannot in good conscience recommend this upgrade. Performance/concurrency
> on -stage9 is undeniably worse, and the only explanation is that it can't
> handle pages with large numbers of images or attachments. We are having to
> restart mono processes daily (or more) to get the servers back operational.

I did change my script to no longer attach lots of stuff to one page; instead of a single page for all attachments, they're now actually being attached to the pages that use them.

I wish I had known before you killed stage9; you sort of bunged up the test I was running against it. :)
Depends on: 661370
No longer depends on: 656444
Whiteboard: [waiting on 656444] → [waiting on 661370]
heads up on this. bug 661370 is resolving, so we're tentatively scheduling this for Aug 30th if that's okay.
(In reply to Luke Crouch [:groovecoder] from comment #14)
> heads up on this. bug 661370 is resolving, so we're tentatively scheduling
> this for Aug 30th if that's okay.

Let's go for it.  It's been long enough... we'll see what happens.
Summary: push MindTouch 2010 to developer.mozilla.org → push MindTouch 2010 to developer.mozilla.org on 2011-08-30
Realistically, it's unlikely to be worse than the current situation, no matter what. :D
Don't ever say that! :)
Crap, now I've jinxed it. Dammit. :)
From the evaluation bug, here's the output from the 10.0.9 -> 10.1 upgrade:
http://etherpad.mozilla.com:9000/miy99arFVY

The 'steps' referred to are from here:
http://projects.mindtouch.com/Mozilla/Documentation/10.1.1_Upgrade_Steps

Note that prod actually has to upgrade from 9.12.3, not 10.0.9. sheppy emailed them to ask whether that can be done in one shot (9.12->10.1) or whether we have to do it in two stages (9.12->10.0->10.1).
Jake, you and Sheppy have more experience with this upgrade, but considering how much hassle it's been so far, I would hesitate to do anything other than what we did on stage9.
Jake, Sheppy:

Are you both still comfortable doing this tomorrow?
Yes.

Jake has synced up with Brian at MindTouch, and they plan to begin at 11 AM PDT; Brian will be in IRC just in case.
Severity: enhancement → normal
Whiteboard: [waiting on 661370]
After quite a bit of hassle, this is now completed.

http://etherpad.mozilla.com:9000/36zk8fSzT1


The basic procedure was as follows:

1) backup the dekiwiki document root on mradm02

2) backup the database(s) being upgraded

Note: we had a custom robots.txt in place, and it broke svn switch. Move it out of the way *before* switching (moving it afterward might work too, but I don't know how to get a clean switch that way).
3) switch to the 10.1.1 SVN branch: 
svn switch https://svn.mindtouch.com/source/public/dekiwiki/10.1.1/web

4) Deploy to frontends, do *not* restart deki

5) run database upgrade script:  cd dekiwiki/maintenance; php update-db.php

6) upgrade Mono by loosely following these instructions, but via puppet instead of directly: http://developer.mindtouch.com/en/docs/mindtouch_setup/010Installation/060Installing_on_CentOS/Installing%2f%2fUpgrade_Mono_on_CentOS

7) issue-multi-command dekiwiki service dekiwiki restart
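
Condensed into a hedged shell sketch for future reference (the paths, database name, and backup locations below are assumptions; the real deploy ran from mradm02 with the usual sync scripts and puppet):

  # on mradm02; paths, DB name, and backup locations are placeholders
  cp -a /data/dekiwiki /data/dekiwiki.bak.$(date +%F)                       # 1) back up the document root
  mysqldump --single-transaction wikidb | gzip > /backups/wikidb.sql.gz     # 2) back up the database(s)
  mv /data/dekiwiki/robots.txt /data/robots.txt.pre-upgrade                 # custom robots.txt breaks svn switch
  cd /data/dekiwiki
  svn switch https://svn.mindtouch.com/source/public/dekiwiki/10.1.1/web    # 3) switch to the 10.1.1 branch
  # 4) deploy to the frontends with the normal sync, but do NOT restart deki yet
  php maintenance/update-db.php                                             # 5) schema upgrade
  # 6) Mono upgrade rolled out via puppet (see the MindTouch doc linked in step 6)
  issue-multi-command dekiwiki service dekiwiki restart                     # 7) restart deki on all web heads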



Major hurdles encountered (that I can remember... we went through a lot):

1) On step 4, the part about not restarting dekiwiki was not initially understood. This resulted in 500 ISE errors for the whole devmo site until we could roll back.

2) Lucene indexes apparently needed to be rebuilt. These are stored on the NetApp, and it was a simple matter of removing the existing indexes, once this was diagnosed. I believe this is still rebuilding now, in the background (automatically).

3) There is a separate license key in MT 10 that is needed to allow anonymous (non-logged-in) access. This is now pushed out from mradm02 along with the rest of the content. Without this all wiki pages redirect to a login page.

4) Mono needed to refresh its SSL keystore cache of CAs on each web head (see the mozroots sketch after this list).

5) The UI cache had to be flushed to fix various resource string errors reported by sheppy and others. This is done inside the dekiwiki admin interface. Sheppy and Brian from MT did this.

6) A few extensions "lost" their manifest setting, and sheppy had to port these over from the staging site. I'm not entirely sure what this means, but these settings are apparently pointers to files in the dekiwiki/ dir somewhere. Why these were lost is a mystery.
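
For hurdle 4, Mono's stock tool for this is mozroots, which pulls the Mozilla root CA bundle into Mono's certificate store; the exact invocation below is an assumption, not a record of what was actually run:

  # on each web head: refresh Mono's machine-wide CA store from the Mozilla root bundle
  mozroots --import --machine --sync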


There are still some small issues here and there that the MDN team is opening separate bugs for. However, the main push is done, so I'm closing this one out... finally. Yay!
Severity: normal → enhancement
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
verified fixed http://developer.mozilla.org
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard