Etherpad data integrity issues

RESOLVED FIXED

Status

Infrastructure & Operations
Change Requests
3 years ago
3 years ago

People

(Reporter: sheeri, Unassigned)

(Reporter)

Description

3 years ago
Etherpad has data integrity issues, and we'd like to fail over the db to the slave to try to mitigate.
(Reporter)

Comment 1

3 years ago
Etherpad db was filling up disk space in /, so I moved the db to /data. 

In the process, the MyISAM table "store" was marked as crashed and needed to be repaired. The repair restored 120 records:

2015-10-28 16:27:59 13292 [Note] Found 4143174 of 4143054 rows when repairing './etherpad/store'
+----------------+--------+----------+---------------------------------------------------+
| Table          | Op     | Msg_type | Msg_text                                          |
+----------------+--------+----------+---------------------------------------------------+
| etherpad.store | repair | info     | Wrong bytesec: 108-108- 44 at 1116783308; Skipped |
| etherpad.store | repair | info     | Wrong bytesec: 115-115-105 at 1116778176; Skipped |
| etherpad.store | repair | info     | Wrong bytesec:  52- 71-116 at 1116785704; Skipped |
| etherpad.store | repair | info     | Wrong bytesec:  58-106-105 at 1116785548; Skipped |
| etherpad.store | repair | info     | Wrong bytesec:  58- 49-110 at 1116794512; Skipped |
| etherpad.store | repair | warning  | Number of rows changed from 4143054 to 4143174    |
| etherpad.store | repair | status   | OK                                                |
+----------------+--------+----------+---------------------------------------------------+
7 rows in set (35.71 sec)

This resulted in a 4-minute outage from 16:23 to 16:27 UTC (9:23 to 9:27 Pacific).
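
The "120 records" figure can be cross-checked against the mysqld note quoted above. A minimal sketch, assuming the note always has the "Found X of Y rows when repairing" shape seen here; `rows_recovered` and `ROWS_RE` are hypothetical helper names, not part of MySQL:

```python
import re

# Log line copied verbatim from the repair output in this comment.
LOG_LINE = (
    "2015-10-28 16:27:59 13292 [Note] Found 4143174 of 4143054 rows "
    "when repairing './etherpad/store'"
)

# Hypothetical pattern: rows found by the repair, rows the table header
# expected, and the table path.
ROWS_RE = re.compile(r"Found (\d+) of (\d+) rows when repairing '([^']+)'")


def rows_recovered(line):
    """Return (table_path, rows_found - rows_expected), or None if no match."""
    match = ROWS_RE.search(line)
    if match is None:
        return None
    found, expected = int(match.group(1)), int(match.group(2))
    return match.group(3), found - expected


print(rows_recovered(LOG_LINE))  # ('./etherpad/store', 120)
```

The difference matches the "Number of rows changed from 4143054 to 4143174" warning in the REPAIR output.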
(Reporter)

Comment 2

3 years ago
Received a complaint that https://public.etherpad-mozilla.org/p/measurement-team-meeting-notes is blank after the outage. 

Etherpad's database is not structured in a way that lets us extract text or history without going through the API. There aren't any command-line tools available, so we'd like to fail over to the redundant slave (which didn't crash) in the hope that the history/text is still there.
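
For reference, going through the API would look roughly like the sketch below. It assumes the instance exposes the Etherpad Lite HTTP API version 1 (`getText`, `getRevisionsCount`), which this bug does not confirm; the base URL, the API key, and the `api_url` helper are placeholders for illustration only.

```python
from urllib.parse import urlencode


def api_url(base, function, api_key, **params):
    """Build an Etherpad Lite HTTP API v1 URL (hypothetical helper).

    `base` and `api_key` are placeholders; the real key lives on the
    Etherpad server and is not recorded in this bug.
    """
    query = urlencode({"apikey": api_key, **params})
    return f"{base.rstrip('/')}/api/1/{function}?{query}"


PAD_ID = "measurement-team-meeting-notes"

# How many revisions does the pad have, and what did revision 0 contain?
print(api_url("https://pad.example.org", "getRevisionsCount", "SECRET", padID=PAD_ID))
print(api_url("https://pad.example.org", "getText", "SECRET", padID=PAD_ID, rev=0))
```

The sketch only builds the request URLs; fetching them and reading the JSON response would be the next step against whichever db the application is pointed at.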
(Reporter)

Updated

3 years ago
Assignee: team73 → server-ops
Component: DB: MySQL → Change Requests
Product: Data & BI Services Team → Infrastructure & Operations
QA Contact: scabral → lypulong
(Reporter)

Updated

3 years ago
Cab Review: --- → ?
(Reporter)

Comment 3

3 years ago
needinfo'ing jakem, as this needs updating: https://mana.mozilla.org/wiki/display/websites/etherpad.mozilla.org#etherpad.mozilla.org-RestartEtherpad
Flags: needinfo?(nmaul)
(Reporter)

Comment 4

3 years ago
svn sysadmins repo r109826 committed to change the config of the etherpad dbs to swap master and slave (configs only; nothing changes until the lb changes).
(Reporter)

Comment 5

3 years ago
:atoll stopped etherpad, I updated the load balancer, and :atoll restarted etherpad. Functionality is good. Unfortunately, the pad that lost data had been overwritten with an empty pad, so the slave did not have any history either.

There may be other pads that lost data, but due to the nature of how etherpad stores data in the db, it's not possible to sleuth out how many pads were affected.
Cab Review: ? → emergency

Comment 6

3 years ago
Seems closable.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(nmaul)
Resolution: --- → FIXED