Closed Bug 87006 Opened 23 years ago Closed 22 years ago

shadow database gets corrupted frequently

Categories

(Bugzilla :: Bugzilla-General, defect, P1)

2.13

Tracking

RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: justdave, Assigned: justdave)

References

Details

The shadow database seems to be getting corrupted frequently on 
bugzilla.mozilla.org.  This usually manifests itself in the profiles or 
login_cookies tables.  Since b.m.o runs a nightly |syncshadowdb --syncall| and
the corruption often persists for many days in a row, something seems to be
wrong with the "syncall" process.
*** Bug 74856 has been marked as a duplicate of this bug. ***
Let's rip out the shadow database code!  Yeah!
Target Milestone: --- → Bugzilla 2.16
Priority: -- → P1
moving
Assignee: tara → justdave
Component: Bugzilla → Bugzilla-General
Product: Webtools → Bugzilla
Version: Bugzilla 2.13 → 2.10
correcting version field lost in product move
Version: 2.10 → 2.13
Is this still a problem?  Is there anything we can do for 2.16?  This sounds
like another one of those unreproducible problems.
*** Bug 117120 has been marked as a duplicate of this bug. ***
in today's bug list: 120464 has 120465's summary, 120463 has 120464's summary,
120457 has 120458's summary, 120456 has 120457's summary, 120455 has 120456's
summary, 120453 has 120454's summary, etc. I didn't check the other attributes.
(messing with the severity since this is kinda getting in the way a bit)
Severity: normal → major
this will be fixed in 7 hours when the shadow database regenerates.
I'm giving you the day off. =)


As for what causes this problem, I still don't know. I believe that when this
bug was reported, the shadow database wasn't being regenerated every night, so
the corruption wasn't being cleared nightly, which made the problem seem more
common than it is.
*** Bug 120484 has been marked as a duplicate of this bug. ***
While I don't know about the "corruption that lasts even when regenerated" bug,
a frequent problem seems to be all the bug summaries getting out of step by one
(e.g. duplicate bug #120484).

Is that caused by us being bitten by bug #104589 during the regular small sync
process?  It seems to me to fit the description, because the shadow sync SQL
commands could get run twice.
I am fairly certain that Dave was wrong about the syncall. There was a time when
syncall was turned off or broken on b.m.o for quite a while.
OK, well upon reading bug #120484, I'm still very suspicious of the window-close
situation, but I'm not sure the scenario I envisage would cause those symptoms. 
If an INSERT command got executed twice I would expect all bugs to be shifted,
and I would expect N+1 to get the summary of N, not N to get the summary of N+1.
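The reasoning above can be checked with a toy model of an AUTO_INCREMENT id column, where each replayed INSERT simply takes the next id. This is a simplification: it assumes the shadow assigns ids purely by replay order and tracks nothing but summaries, and `replay_inserts` is a hypothetical helper:

```python
from typing import Dict, List


def replay_inserts(summaries: List[str], start_id: int = 1) -> Dict[int, str]:
    """Toy AUTO_INCREMENT table: each INSERT gets the next id in replay order."""
    return {start_id + i: summary for i, summary in enumerate(summaries)}


primary = replay_inserts(["A", "B", "C", "D"])
# Hypothetical failure 1: the shadow replays the INSERT for "B" twice.
duplicated = replay_inserts(["A", "B", "B", "C", "D"])
# Hypothetical failure 2: the shadow never sees the INSERT for "B".
dropped = replay_inserts(["A", "C", "D"])
```

In this model a duplicated INSERT gives bug N+1 the summary of bug N from the duplicate onward, as predicted above, while a dropped INSERT gives bug N the summary of bug N+1, which is the pattern seen in the bug list.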

I believe the process will get killed by window-close even though it is a fork. 
I remember this being fingered as a reason why the shadow often gets more out of
sync than you might expect.

Doesn't b.m.o now have MySQL logging turned on?  If so, wouldn't we have a log of
the recent corruption?
no patch, not a blocker, 2.16 is now in freeze mode

-> 2.18

adding dependency on bug 104589 because I think there's a REAL good chance that
it is the cause of this.
Depends on: 104589
Target Milestone: Bugzilla 2.16 → Bugzilla 2.18
I've just looked through how the shadow db stuff is propagated. Is the following
correct?

1. When an insert/update/replace is done, we store the query in the shadowlog
table. There is some magic to get LAST_INSERT_ID working.
2. At the end of a page, we run syncshadowdb to write all those out.
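The two steps above can be sketched as a small in-memory model. The class and method names are hypothetical, the real scheme kept the log in a shadowlog table in MySQL rather than a Python list, and the LAST_INSERT_ID handling is left out entirely:

```python
from typing import Callable, List


class ShadowLog:
    """Toy model of the shadowlog scheme (hypothetical names)."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def record(self, sql: str) -> None:
        # Step 1: remember every INSERT/UPDATE/REPLACE sent to the main db.
        words = sql.split()
        if words and words[0].upper() in ("INSERT", "UPDATE", "REPLACE"):
            self.entries.append(sql)

    def sync(self, execute_on_shadow: Callable[[str], None]) -> int:
        # Step 2 (end of page): replay pending queries on the shadow in log
        # order, then clear the log. Returns how many queries were replayed.
        pending, self.entries = self.entries, []
        for sql in pending:
            execute_on_shadow(sql)
        return len(pending)
```

Note that nothing here ties the log order to the order in which the main database actually committed the queries; the replay order is simply the order in which `record` was called.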

Doesn't this bypass locking? The order in which queries are sent to the database
is not necessarily the order in which they are committed. Also, the
LAST_INSERT_ID handling looks a bit dubious, and the off-by-one errors we keep
seeing seem to bear that out.
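A toy illustration of why log order matters: if two requests update the same row and their queries reach the log in a different order than the main database committed them, replaying the log leaves the shadow with the other request's value. The data and the `apply_updates` helper are hypothetical, and LAST_INSERT_ID is not modeled:

```python
from typing import Dict, List, Tuple

Update = Tuple[int, str]  # (bug id, new summary)


def apply_updates(updates: List[Update]) -> Dict[int, str]:
    """Apply UPDATEs in the given order; the last write to a row wins."""
    table: Dict[int, str] = {}
    for bug_id, summary in updates:
        table[bug_id] = summary
    return table


# Hypothetical: two requests both edit bug 42's summary.
commit_order = [(42, "edit from request 1"), (42, "edit from request 2")]
log_order = [(42, "edit from request 2"), (42, "edit from request 1")]

primary = apply_updates(commit_order)  # what the main database ends up with
shadow = apply_updates(log_order)      # what replaying the log produces
```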

(Note that this scheme will totally fail in the presence of transactions.)
Hmm...  I notice that syncshadowdb only sets a lock when it's doing a
syncall.  It should probably set a lock for the normal syncs as well.  If
two processes were to attempt to run a sync at the same time, and both inserted
records into the shadowdb before either cleared the shadowlog table, it could
give those results as well...
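The interleaving described above can be reproduced deterministically by splitting an unlocked sync into steps (read the log, replay, clear) and running two of them round-robin. This is a sketch with hypothetical names, using Python lists in place of the shadowlog table and the shadow database:

```python
from typing import Iterator, List


def syncer(shadowlog: List[str], shadow: List[str]) -> Iterator[None]:
    """One unlocked sync, split into steps so two can be interleaved."""
    pending = list(shadowlog)   # step 1: read the shadowlog table
    yield
    shadow.extend(pending)      # step 2: insert the records into the shadow db
    yield
    shadowlog.clear()           # step 3: clear the shadowlog table


def run_interleaved(*procs: Iterator[None]) -> None:
    # Round-robin scheduler: advance each process one step at a time.
    active = list(procs)
    while active:
        for proc in list(active):
            try:
                next(proc)
            except StopIteration:
                active.remove(proc)


log = ["INSERT bug 1", "INSERT bug 2"]
shadow: List[str] = []
run_interleaved(syncer(log, shadow), syncer(log, shadow))
# Both syncers read the log before either cleared it, so every entry
# lands in the shadow twice.
```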
>I notice in syncshadowdb, that it only sets a lock if it's doing a syncall.

syncshadowdb uses GET_LOCK and RELEASE_LOCK to ensure that only one syncshadowdb
process at a time touches the shadowlog table.
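A rough sketch of that locking pattern, with threads standing in for syncshadowdb processes and a registry of `threading.Lock` objects standing in for MySQL's named advisory locks. `get_lock`, `release_lock` and the lock name "synclock" are hypothetical; the real GET_LOCK is held per database connection and has semantics this does not reproduce:

```python
import threading
from typing import Dict, List

_named_locks: Dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


def get_lock(name: str, timeout: float) -> bool:
    """Rough analogue of GET_LOCK(name, timeout): True if the lock was acquired."""
    with _registry_guard:
        lock = _named_locks.setdefault(name, threading.Lock())
    return lock.acquire(timeout=timeout)


def release_lock(name: str) -> None:
    """Rough analogue of RELEASE_LOCK(name)."""
    _named_locks[name].release()


shadowlog: List[str] = []
shadow: List[str] = []


def sync_shadow() -> None:
    # Only one syncer at a time may read, replay and clear the shadowlog.
    if not get_lock("synclock", timeout=10):
        return  # lock held too long by another sync; leave entries for next run
    try:
        pending = list(shadowlog)
        shadow.extend(pending)         # replay on the shadow
        del shadowlog[:len(pending)]   # clear only what was replayed
    finally:
        release_lock("synclock")
```

Because reading, replaying and clearing all happen while the lock is held, each log entry reaches the shadow once no matter how many syncs start at the same time.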
syncshadowdb is gone, so this is FIXED by implementing the suggestion at comment
#2 :)
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
QA Contact: matty_is_a_geek → default-qa