Closed
Bug 894913
Opened 11 years ago
Closed 11 years ago
Specific Etherpads becoming unavailable
Categories
(Infrastructure & Operations :: IT-Managed Tools, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: davida, Assigned: bjohnson)
References
Details
Attachments
(1 file)
598 bytes,
text/plain
Discussed in #it, but https://teach.etherpad.mozilla.org/party-calll and https://etherpad.mozilla.org/opennews-2014-fellowships-develop-capture are at least two that are still down. https://etherpad.mozilla.org/toronto-standing-desk-tryout was down and lost data, even though it is back up (without some of the data).
Comment 1•11 years ago
|
||
etherpad3.webapp.phx1.mozilla.com:/root/etherpad-2013-07-17.txt has the recent contents of the screen session. Instances of:

Exception in thread "1700983583@qtp-554167265-5100" net.appjet.ajstdlib.SocketManager$HandlerException: An error occurred while handling a request: 500 - You like apples? An error occurred in the error handler while handling an error. How do you like <i>them</i> apples?<br>
net.appjet.bodylock.JSRuntimeException: Error while executing: TypeError: Can't use instanceof on a non-object. (module etherpad/log.js#121)<br>
Comment 2•11 years ago
|
||
This one too if it helps to identify the issue. https://etherpad.mozilla.org/5iHD1O6XeK
Comment 3•11 years ago
|
||
If anyone would like a specific bad pad removed because they have the data elsewhere and/or can recreate it, the current on-call SRE can do this via: https://etherpad.mozilla.org/ep/admin/delete-pad
Comment 4•11 years ago
|
||
Pad that seems to fit this description: https://etherpad.mozilla.org/devtools-firstweek Can we recover the data?
Comment 5•11 years ago
|
||
Also: https://webmaker.etherpad.mozilla.org/crossteamcall
Comment 6•11 years ago
|
||
We are working on a way to recover the lost text, but etherpad is tricky. We can't make any promises, but we're developing what we can. I will update you on or before 2 pm Pacific.
Comment 7•11 years ago
|
||
We have recovered what was at https://teach.etherpad.mozilla.org/party-calll and put it in https://teach.etherpad.mozilla.org/party-call
Comment 8•11 years ago
|
||
https://etherpad.mozilla.org/5iHD1O6XeK has been remade as https://etherpad.mozilla.org/new5iHD1O6XeK
Comment 9•11 years ago
|
||
https://etherpad.mozilla.org/devtools-firstweek has been re-created as https://etherpad.mozilla.org/devtools-1stweek
Assignee | ||
Comment 10•11 years ago
|
||
All etherpads reported so far have been fixed. Please re-open if any additional issues occur with these etherpads or open a new bug if a new etherpad break is found. Thanks!
Assignee | ||
Updated•11 years ago
|
Assignee: server-ops-webops → bjohnson
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 11•11 years ago
|
||
https://etherpad.mozilla.org/weekly-addons-mtg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 12•11 years ago
|
||
Oh, sorry, I misread this. I'll file a new bug.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 13•11 years ago
|
||
Actually, please use this bug as a centralized place for all the broken etherpads. We're actually working on a way to proactively find broken ones too.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 15•11 years ago
|
||
https://taiwan.etherpad.mozilla.org/device-agile-team
Comment 16•11 years ago
|
||
https://taiwan.etherpad.mozilla.org/255
Comment 17•11 years ago
|
||
https://teach.etherpad.mozilla.org/DogDays
Comment 18•11 years ago
|
||
https://sumo.etherpad.mozilla.org/234
Reporter | ||
Comment 20•11 years ago
|
||
Hi, @sheeri: any ETA on that broken-etherpad-finding-and-repairing script?
Comment 21•11 years ago
|
||
Sheeri - do we have an idea of what has caused all these pads to break and lose data in some cases?
Comment 22•11 years ago
|
||
Sylvie - yes, we did a routine database failover and any pad that was being written to at the time seems to have been affected. Different etherpads have different authentications so we've been trying to find an internal way to find the broken pads. We researched a few ways to find broken pads by comparing fields in the database but haven't come up with a 100% correlation yet between differences and broken etherpads. We may have to resort to spidering the public etherpads to find broken ones, but we have not come up with a way to find all the broken pads among all the different private etherpads.
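The spidering idea above could be sketched roughly as follows. This is only an illustration, not the script the team wrote: `find_broken_pads`, `is_broken`, and the injected `fetch` callable are hypothetical names, and the marker string is the red-box error text reported for broken pads elsewhere in this bug.

```python
from typing import Callable, Iterable, List, Tuple

# Error text shown in the red box on broken pads (spelled as it appears on the page).
ERROR_MARKER = "A server error occured"


def is_broken(status: int, body: str) -> bool:
    """Treat a pad as broken if the server answers 5xx or the page shows the error box."""
    return status >= 500 or ERROR_MARKER in body


def find_broken_pads(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
) -> List[str]:
    """Return the URLs whose fetched page looks broken.

    `fetch` is a hypothetical callable returning (HTTP status, page body).
    It is injected so the check itself can run without network access.
    """
    broken = []
    for url in urls:
        status, body = fetch(url)
        if is_broken(status, body):
            broken.append(url)
    return broken
```

A real `fetch` could wrap `urllib.request.urlopen` and catch `urllib.error.HTTPError` to read the status of 5xx responses. This only covers pads reachable without authentication, which is the limitation described above for private etherpads.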
Comment 23•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #22)
> Sylvie - yes, we did a routine database failover and any pad that was being
> written to at the time seems to have been affected.
>
> Different etherpads have different authentications so we've been trying to
> find an internal way to find the broken pads. We researched a few ways to
> find broken pads by comparing fields in the database but haven't come up
> with a 100% correlation yet between differences and broken etherpads. We may
> have to resort to spidering the public etherpads to find broken ones, but we
> have not come up with a way to find all the broken pads among all the
> different private etherpads.

Thanks. If we do not have a solution soon, is there a way to revert to the prior database, or another option to try?
Assignee | ||
Comment 24•11 years ago
|
||
https://taiwan.etherpad.mozilla.org/30 fixed.
Fixing the other pads now.
Comment 25•11 years ago
|
||
Sylvie - we have backups, but then we'll end up losing the information that has been put into all etherpads since the incident. It's not easy to find all the etherpads that were changed in a time period and export them all, unfortunately.
Assignee | ||
Comment 26•11 years ago
|
||
https://taiwan.etherpad.mozilla.org/device-agile-team fixed.
Comment 27•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #22)
> Sylvie - yes, we did a routine database failover and any pad that was being
> written to at the time seems to have been affected.

'routine' sounds like we expect this to happen again. I lost a fair bit of work with this failover. Is this really routine?

Thanks
Comment 28•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #25)
> Sylvie - we have backups, but then we'll end up losing the information that
> has been put into all etherpads since the incident. It's not easy to find
> all the etherpads that were changed in a time period and export them all,
> unfortunately.

OK, I assumed the DBs were active/passive and replicating, and data loss would be at a minimum and less impactful than the situation today?
Assignee | ||
Comment 29•11 years ago
|
||
https://taiwan.etherpad.mozilla.org/255 was empty. Freed up the URL.
Comment 30•11 years ago
|
||
They are active/passive replicating. But Etherpad does not work like a traditional client-server application. The data loss wasn't from the database losing information - the information is there (if it was able to be saved), which is how we can recover the etherpads. Etherpads depend on JavaScript (node.js in particular), and folks can make changes to a doc when the database isn't available. Those changes are supposed to be saved when the database comes back online, but etherpad is not always good about that. Etherpad corruption happens frequently; Jake reports that their team usually fixes a few a week.
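To illustrate the behavior described above (edits accepted while the database is down, then saved once it returns), here is a minimal sketch. It is not Etherpad's code: `BufferedWriter` and the `save` callable are hypothetical, and `save` is assumed to raise `ConnectionError` while the database is unavailable.

```python
from collections import deque
from typing import Callable, Deque


class BufferedWriter:
    """Queue edits while the store is down; flush them in order once it is back."""

    def __init__(self, save: Callable[[str], None]) -> None:
        # Hypothetical persistence call; assumed to raise ConnectionError when the DB is down.
        self._save = save
        self._pending: Deque[str] = deque()

    def write(self, change: str) -> None:
        """Accept a change immediately and try to persist everything queued so far."""
        self._pending.append(change)
        self.flush()

    def flush(self) -> int:
        """Persist pending changes in order; return how many are still queued."""
        while self._pending:
            try:
                self._save(self._pending[0])
            except ConnectionError:
                break  # store still unavailable; keep the change queued
            self._pending.popleft()  # drop a change only after it was saved
        return len(self._pending)
```

In a design like this, anything still queued in memory is lost if the process restarts before a successful flush. That is one way edits made during an outage could go missing, though the thread does not establish that this is what happened here.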
Comment 31•11 years ago
|
||
(In reply to Joe Walker [:jwalker] from comment #27)
> (In reply to Sheeri Cabral [:sheeri] from comment #22)
> > Sylvie - yes, we did a routine database failover and any pad that was being
> > written to at the time seems to have been affected.
>
> 'routine' sounds like we expect this to happen again. I lost a fair bit of
> work with this failover. Is this really routine?
>
> Thanks

Yes, the database failover is really routine. We have done this on average once every 3 months for the past 18 months, and have not had corruption like this before.
Assignee | ||
Comment 32•11 years ago
|
||
https://teach.etherpad.mozilla.org/DogDays was empty. Freed up the URL.
Assignee | ||
Comment 33•11 years ago
|
||
https://sumo.etherpad.mozilla.org/234 fixed.
Comment 34•11 years ago
|
||
(In reply to SylvieV from comment #28)
> (In reply to Sheeri Cabral [:sheeri] from comment #25)
> > Sylvie - we have backups, but then we'll end up losing the information that
> > has been put into all etherpads since the incident. It's not easy to find
> > all the etherpads that were changed in a time period and export them all,
> > unfortunately.
>
> OK, I assumed the DBs were active/passive and replicating, and data loss
> would be at a minimum and less impactful than the situation today?

OK - Jake, for another time: let's see how else etherpad-like services can be delivered, maybe some SaaS offering while we work out the new Communication and Collaboration tools for Mozillians.
Assignee | ||
Comment 35•11 years ago
|
||
https://etherpad.mozilla.org/summit-peopleandprocess fixed.
As of now, all etherpads that have been reported broken are fixed. Please let me know if we find any others and I'll fix them. I'm still actively working on a script to proactively identify pads that broke during this event.
Comment 36•11 years ago
|
||
(In reply to Brandon Johnson [:cyborgshadow] from comment #32) definitely wasn't empty. Entire team was hacking on it yesterday...
Comment 37•11 years ago
|
||
Lost https://etherpad.mozilla.org/swc-data --- would appreciate recovery of data.
Comment 38•11 years ago
|
||
https://etherpad.mozilla.org/devtools-meeting Any edits I try to make here don't go through and aren't visible after reload.
Comment 39•11 years ago
|
||
I cannot access https://etherpad.mozilla.org/FirefoxWalkthrough
Assignee | ||
Comment 40•11 years ago
|
||
(In reply to Laura Hilliger [:epilepticrabbit] from comment #36)
> (In reply to Brandon Johnson [:cyborgshadow] from comment #32)
> definitely wasn't empty. Entire team was hacking on it yesterday...

Hi Laura,

Unfortunately there was no data in the database for this pad. I'm really sorry. :(

New pads are definitely stable. This incident was isolated to that specific timeframe.
Assignee | ||
Comment 41•11 years ago
|
||
https://etherpad.mozilla.org/swc-data Fixed.
https://etherpad.mozilla.org/FirefoxWalkthrough Fixed.

fitzgen: Your pad (https://etherpad.mozilla.org/devtools-meeting) exists and opens properly. The issue is not related to this incident. Please file a new bug with webops.
Comment 42•11 years ago
|
||
Went through all the public etherpads and found 36 out of 3600 touched in the past few days since the incident. Attaching them here.
Assignee | ||
Comment 43•11 years ago
|
||
All etherpads from comment 42's attachment are fixed. Note that many of these appear to be normal anomalies from non-ASCII UTF-8 characters and are unrelated to the incident yesterday.
Comment 44•11 years ago
|
||
When we have resolved the etherpad issues, can we send an incident management note on the resolution and any insight into the root cause, please?
Many of us are getting periodically disconnected from https://etherpad.mozilla.org/qa-staff-meeting - we've tried different browsers, and restarting Firefox, etc.
Comment 46•11 years ago
|
||
Stephen - the database is stable and has been since yesterday. This bug is for the unavailable etherpads, that do not load up at all. The server has an error, there is a red box with the text:

Oops! A server error occured. It's been logged.

Any other problems with etherpad are unrelated to this bug.
(In reply to Sheeri Cabral [:sheeri] from comment #46)
> Stephen - the database is stable and has been since yesterday. This bug is
> for the unavailable etherpads, that do not load up at all. The server has an
> error, there is a red box with the text:
>
> Oops! A server error occured. It's been logged.
>
> Any other problems with etherpad are unrelated to this bug.

Thanks, will file a separate bug, then.
Comment 48•11 years ago
|
||
Hello - this etherpad disconnects every few seconds after activity and won't save any changes - https://webmakersupport.etherpad.mozilla.org/FAQ thanks for any help!
Comment 49•11 years ago
|
||
Jacob - please see comment 46.
Comment 50•11 years ago
|
||
(In reply to PTO until 6 Aug 2013 Sheeri Cabral [:sheeri] from comment #13)
> Actually, please use this bug as a centralized place for all the broken
> etherpads. We're actually working on a way to proactively find broken ones
> too.

Well, if this is to be the tracking bug for all etherpad issues, then the other etherpad bugs should be dependencies.
Seems like it might be related to the core bug: bug 887753. Having said that, I am experiencing this issue on Safari, Chrome, and Firefox / Aurora.
More affected etherpads:
https://mozqa.etherpad.mozilla.org/b2g-qa-roundtable
https://mozqa.etherpad.mozilla.org/b2g-standup
https://etherpad.mozilla.org/b2g-meeting-notes
https://etherpad.mozilla.org/qa-staff-meeting
https://mozqa.etherpad.mozilla.org/nhirata-todo
https://mozqa.etherpad.mozilla.org/nhirata-todo2
Assignee | ||
Comment 53•11 years ago
|
||
Hi Naoki,

None of the etherpads you listed are affected by this issue. Please see comment 46.

Bill,

This is not for all etherpad issues, only a single specific issue where pads do not load at all and show a red box that says:

"Oops! A server error occured. It's been logged."

Any other problems with etherpad are unrelated to this bug.

We also fixed all the public pads that were affected by this issue. Only team etherpads should have this issue, although I believe we may have fixed them all so far.
Comment 54•11 years ago
|
||
Another broken one that needs immediate attention for Maya/Debbie Cohen: https://etherpad.mozilla.org/LEAD-20Etherpad
Comment 56•11 years ago
|
||
Fixed https://etherpad.mozilla.org/LEAD-20Etherpad It contained some top bit set characters: 0xe2 0x80 0xa8
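For reference, the three bytes listed above decode as a single UTF-8 character, U+2028 LINE SEPARATOR. The sketch below shows the decoding and a small hypothetical helper, `find_line_separators`, for locating such characters in pad text; it is not the tooling used for the fix.

```python
import unicodedata
from typing import List

# The bytes reported for the LEAD-20Etherpad pad.
raw = bytes([0xE2, 0x80, 0xA8])

char = raw.decode("utf-8")            # a single character
codepoint = f"U+{ord(char):04X}"      # its code point in U+XXXX form
name = unicodedata.name(char)         # its Unicode character name


def find_line_separators(text: str) -> List[int]:
    """Return the indexes of U+2028 / U+2029 characters in `text`."""
    return [i for i, ch in enumerate(text) if ch in "\u2028\u2029"]
```

U+2028 is a line terminator in JavaScript source, so text containing it can break code that embeds pad content in a string literal without escaping. Whether that is what broke this pad is not stated in the thread.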
Assignee | ||
Comment 57•11 years ago
|
||
Since no reports of pads affected by this issue have come in for over 12 days now, I'm closing it out. Top-bit-set characters are an unrelated issue.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED