Closed Bug 707587 Opened 13 years ago Closed 13 years ago

zimbra mail via http is next to impossible to use

Product/Component: mozilla.org Graveyard :: Server Operations
Type: task
Platform: x86, Linux
Priority: Not set
Severity: blocker
Status: RESOLVED FIXED
Reporter: jmaher
Assignee: justdave
Whiteboard: [Zimbra #00080267][HP #4636507424]

Zimbra is very slow and has been since Thursday of last week.  It takes me about 5 seconds to view an email, and while it loads my browser is next to frozen (it doesn't respond to keystrokes or other clicks in the same tab).
I can't find anything in the performance graphs to indicate that there have been any issues with Zimbra this weekend, aside from the period when the backups were running on Saturday night.  On Thursday Zimbra was pretty much unusable all day (see the email you should have received titled "RFO: Zimbra Performance Issues 2011/12/01" that was sent on Friday afternoon).  Since then (with the exception of when the backup was running) it appears to have been fine.  It's been working fine for me all weekend.  Has Zimbra loaded emails faster than 5 seconds before?  I've never seen it work any faster than that, ever.  That's part of the price of using a web interface.  My browser doesn't seem unusable during that 5 seconds though.
(In reply to Dave Miller [:justdave] from comment #1)
> My browser doesn't seem unusable during that 5 seconds though.

Oh, this is because I have my browser set to open emails in a new window instead of using the preview panel.
s/browser/zimbra prefs/
I have been using Zimbra as a webapp exclusively for 3+ years at Mozilla. There have been rough patches here and there, but I am used to waiting about 1 second for a mail message to load, not 5 seconds.  Worse yet, if I delete a message after it has loaded, it seems to take an additional 5 seconds.

For reference, Gmail and Hotmail are other email webapps that I use regularly, and they have much faster response times (instant 99% of the time).  It might be time to set up a forward rule to just forward everything to Gmail.
Of course, we *have* had issues this morning since I posted that.  Symptoms seem to indicate the disk array again (not all of the problem disks have been replaced yet; we have to wait for one to rebuild before we can swap the next one, etc).
I also noticed that using Firefox 8 for the last few days was a BIG mistake.  Firefox 11a1 and Chrome seem to work better, although, as you mentioned, there are some disk issues causing slowness.
I think we've actually been having issues since the rebuild completed on the drive we swapped Thursday.  There are two more disks to swap yet.
This is slowly devolving again as people come into work and usage is picking up :(
Phong is en route to the colo to swap the next drive.
Assignee: server-ops → justdave
Severity: major → blocker
The Zimbra server is no longer responding at all.
Is there a separate bug for IMAP?  That's been down too for the last hour or more
(In reply to Chris Beard from comment #11)
> Is there a separate bug for IMAP?  That's been down too for the last hour or
> more

No, IMAP is part of Zimbra.  Here's a quick summary copied off Yammer (Bugzilla was down at the time, too, due to the phx network issues):

David Miller (Announcement) 
Zimbra issues continued
We are experiencing some performance issues with Zimbra again this morning.  Right now it appears that everything is still operating, but it's slow.  It still appears to be related to the degraded disk array.  Because of the type of RAID in use, the disks can only be swapped one at a time and we have to wait for one to rebuild before we can swap the next one, and it takes almost an entire day to rebuild one. Phong's going to the datacenter this morning to get the next one swapped.

11 hours ago

--

mrz: Quick update - Zimbra is offline while we work with the hardware vendor on the storage array issues. Will update here.

9 hours ago

--

mrz: Team is still on it. It's a 4TB disk array that's just taking time to verify.

7 hours ago

--

Corey Shields: We are in a holding pattern while data copies, so some background and update on this:

While changing a bad disk in the array (which was mentioned in justdave's original post), as soon as the replacement disk was inserted, 3 other good disks in the array failed immediately, putting the array in a failed state. Those of you who have dealt with RAID arrays before know that this shouldn't/doesn't happen, and HP was consulted to find out why and what to do next. Their response was less than helpful and their solution was to write it off. We disagreed, forced the disks into a good state, and got access to the data again, but with a blank disk marked as active in the array there was a bit of confusion.

There is a bit of corruption in the array, and some steps we are taking to try and recover from that. In fact, there are many solutions we are working on simultaneously to 1) recover email functionality, and 2) restore email data. Each of these solutions takes a lot of time and effort, and we will be working through the night to try and speed the process up.

That said, I don't have a firm ETA to recovery, because it would depend on which solution is finished first and which way we decide to go. It is likely that we will end up restoring functionality and a portion of data while more data is restored later on.

A quick apology to you all, I know this has been an important mail day for some teams. Believe me when I say I would have opted for any other day (or better yet not at all).

3 hours ago

--
=========

As of this moment we still have no firm ETA.  We estimate about 4 hours remaining on an rsync of the data to more reliable storage, followed by an fsck of the original drive; we're unsure how long that will take (but likely several more hours).  We'll be attempting to get something up and running on a different box based on the preliminary rsync while we wait for the fsck to run, but again, no guarantees how soon.  There are so many variables here that it's hard to estimate anything.
Any update?
From mrz's post on the intranet forum:

-----8<-----
Following up here in the forums on the course of action and what you can expect today and over the next couple days.

As of 11a Pacific, Zimbra is back online. We had several plans of attack over the past two days and as one would fall apart, another would present. I want to thank you for being patient with us - we tried as best as we could to share information on Yammer.

Here's what works right now:

    inbound email is catching up from Monday
    Zimbra is only accessible at https://mail.mozilla.com/
    IMAP is disabled


We've left IMAP disabled so we can work with users who have client-side cached IMAP folders (more on this below).

However, since the Web interface is up, so is CalDAV. Calendar data was restored to its state as of October 22. Anything entered since that time will have been lost. We have some data that was salvaged off the corrupted disk array. If you lost something important it's possible (but not guaranteed) that we might be able to go look for it.

Where are we today?
We restored account information from the last known good full backup, Oct 22. To put it in other words, Zimbra was reset to the state it was on Oct 22 (including calendar data). Part of the delay this morning was manually adding all accounts, aliases and distribution lists created since then.
This is a partial restore and does not include all email body content.

This includes

    accounts & aliases
    email filters
    distribution lists
    message folders structure
    calendar data
    message headers

This does not include


    message content
    email attachments


If you log in to Zimbra you'll see messages, but clicking them will generate an error. That's expected for now. New email will work.

What's next?
Over the next few days Ops will be restoring individual email accounts (message content & attachments). We will be prioritizing based on business needs and then by alphabetical order (yes, I'm towards the end here).

We'll update on this as we get more clarity on how quickly this can happen.

IMAP Users
A number of you use Thunderbird, Postbox or Mail.app and have it configured for offline mode. You likely have a really good backup of your email.
We will work with you to enable IMAP on your account and help sync your email back to Zimbra. Please file bugs to help us track who you are.

CalDAV
Unfortunately there wasn't any easy way to enable Web access without CalDAV. If your calendar was set to sync, you've likely lost calendar state since Oct 22. If you know you had something important, contact us. We have some data salvaged off the corrupted array that we can dig through. It's possible we might be able to recover something, but no guarantees.
-----8<-----
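The restore ordering described in the quoted post (business needs first, then alphabetical) can be illustrated with a minimal sketch. This is not the tooling Ops actually used; the `Account` type, the `business_priority` field, and the account names are all hypothetical and exist only to show the two-level sort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    name: str                   # hypothetical mailbox login
    business_priority: int = 0  # hypothetical: higher means restore sooner


def restore_order(accounts):
    """Sort by business priority (highest first), then alphabetically by name."""
    return sorted(accounts, key=lambda a: (-a.business_priority, a.name.lower()))


queue = restore_order([
    Account("zed"),
    Account("alice", business_priority=2),
    Account("bob"),
    Account("carol", business_priority=2),
])
print([a.name for a in queue])
```

Accounts with the same priority fall back to alphabetical order, which is why a name late in the alphabet ends up near the end of the queue unless it is given a higher priority.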
Whiteboard: [Zimbra #00080267][HP #4636507424]
Zimbra server is back online and we have another bug to track the various restores.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard