The script which moved components between products incorrectly set the timestamp of the product change in the activity log to the bug's existing modification date, before it bumped that modification date. As a result, bugmail was sent for many bugs containing no changes or irrelevant changes, and none of it mentioned the product change. Not to mention the history on those bugs is now inaccurate. For components which moved to new products, it should be possible to fix this by looking for any move to that product dated before we actually did it, and correcting the matches. For components which moved to existing products we're going to have a problem. In theory, though, any time you move a bug between products outside of this script there's a component change at the same timestamp, so we can probably track them down by looking for standalone product changes that didn't have a component change at the same timestamp.
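A toy sqlite3 sketch of that detection idea for moves into existing products (the schema is cut down, and the component fieldid here is an illustrative assumption; the product fieldid 25 matches what was used on this install):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bugs_activity
               (bug_id INT, fieldid INT, bug_when TEXT)""")
PRODUCT = 25    # product fieldid on this install
COMPONENT = 33  # illustrative component fieldid, not the real one

rows = [
    (1, PRODUCT,   "2008-07-30 04:30:00"),  # suspect: product change alone
    (2, PRODUCT,   "2008-06-01 12:00:00"),  # normal move: product change...
    (2, COMPONENT, "2008-06-01 12:00:00"),  # ...plus component change, same ts
]
con.executemany("INSERT INTO bugs_activity VALUES (?,?,?)", rows)

# A product change with no component change at the same timestamp
# is a candidate for having been written by the buggy move script.
suspects = con.execute("""
    SELECT p.bug_id, p.bug_when
      FROM bugs_activity p
      LEFT JOIN bugs_activity c
        ON c.bug_id = p.bug_id
       AND c.bug_when = p.bug_when
       AND c.fieldid = ?
     WHERE p.fieldid = ?
       AND c.bug_id IS NULL
""", (COMPONENT, PRODUCT)).fetchall()
print(suspects)  # [(1, '2008-07-30 04:30:00')]
```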
I made the following changes to the script to avoid this issue next time:

--- movecomponent.pl	2008-07-31 01:14:05.000000000 -0700
+++ movecomponent.pl.new	2008-08-01 22:00:14.000000000 -0700
@@ -114,24 +114,24 @@
 # Components
 $dbh->do("UPDATE components SET product_id = ? WHERE id = ?",
          undef, ($newprodid, $compid));
 
+# Mark bugs as touched
+#
+$dbh->do("UPDATE bugs SET delta_ts = NOW()
+          WHERE component_id = ?", undef, $compid);
+
 # Update bugs_activity
 my $userid = 1; # email@example.com
 
 $dbh->do("INSERT INTO bugs_activity(bug_id, who, bug_when, fieldid,
                                     removed, added)
               SELECT bug_id, ?, delta_ts, ?, ?, ?
                 FROM bugs WHERE component_id = ?", undef,
          ($userid, $fieldid, $oldproduct, $newproduct, $compid));
 
-# Mark bugs as touched
-#
-$dbh->do("UPDATE bugs SET delta_ts = NOW()
-          WHERE component_id = ?", undef, $compid);
-
 #$dbh->bz_commit_transaction();
 
 exit(0);
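The fix is purely an ordering change: the INSERT into bugs_activity copies delta_ts from the bugs table, so delta_ts has to be bumped first for the activity row to carry the new timestamp. A minimal sqlite3 demonstration of the corrected order (NOW() is replaced by a fixed string, and the tables are cut down to the relevant columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bugs (bug_id INT, component_id INT, delta_ts TEXT)")
con.execute("CREATE TABLE bugs_activity (bug_id INT, bug_when TEXT)")
con.execute("INSERT INTO bugs VALUES (1, 7, '2007-01-01 00:00:00')")

NOW = "2008-07-31 04:30:00"  # stand-in for NOW()

# Fixed order: bump delta_ts first ...
con.execute("UPDATE bugs SET delta_ts = ? WHERE component_id = 7", (NOW,))
# ... so the activity row inherits the new timestamp, not the stale one.
con.execute("""INSERT INTO bugs_activity
               SELECT bug_id, delta_ts FROM bugs WHERE component_id = 7""")

bug_when = con.execute("SELECT bug_when FROM bugs_activity").fetchone()[0]
print(bug_when)  # 2008-07-31 04:30:00
```

Running the two statements in the original order would leave bug_when at the stale 2007 value, which is exactly the bug described above.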
we could check stage or backup to correct this if necessary...
Ah, good point. We did run a backup immediately prior to starting (precisely for this reason, in fact, in case we hosed something). We could look for any product change dated before the reorg that didn't exist in the backup.
Mark: any chance you might be able to help with this, since I'm going to be out on vacation this week? Something of this magnitude probably should have you involved anyway, if only so you know what's going on. I made a backup immediately prior to starting; it's at tm-bugs01-slave01:/data/backups/mysql/bugs/bugs.2008.07.31.sql.gz

We'd need to restore that database under a different name on tm-bugs01-master01, then do a cross-db join on the bugs_activity table, matching on timestamp, bug_id, and fieldid, to locate any records that exist in current production but didn't exist in the backup and are timestamped prior to 2008-07-31 00:00:00, and reset the timestamp on those records to 2008-07-31 04:30:00 in current production.
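A toy sqlite3 sketch of that cross-db repair, using ATTACH in place of MySQL's cross-database qualification (table contents, ids, and the backup path are illustrative; the real job runs against MySQL with the restored backup database):

```python
import os
import sqlite3
import tempfile

# A "backup" database on disk, and "production" in memory.
backup_path = os.path.join(tempfile.mkdtemp(), "bugs_backup.sqlite")
bk = sqlite3.connect(backup_path)
bk.execute("CREATE TABLE bugs_activity (bug_id INT, fieldid INT, bug_when TEXT)")
bk.execute("INSERT INTO bugs_activity VALUES (1, 25, '2005-03-01 10:00:00')")
bk.commit()
bk.close()

prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE bugs_activity (bug_id INT, fieldid INT, bug_when TEXT)")
prod.executemany("INSERT INTO bugs_activity VALUES (?,?,?)", [
    (1, 25, "2005-03-01 10:00:00"),  # also in backup: leave alone
    (2, 25, "2004-11-22 17:00:00"),  # only in production, pre-reorg date: fix
])
prod.execute("ATTACH DATABASE ? AS backup", (backup_path,))

# Reset the timestamp on rows that exist only in production and are
# dated before the reorg actually started.
prod.execute("""
    UPDATE bugs_activity SET bug_when = '2008-07-31 04:30:00'
     WHERE bug_when < '2008-07-31 00:00:00'
       AND NOT EXISTS (SELECT 1 FROM backup.bugs_activity b
                        WHERE b.bug_when = bugs_activity.bug_when
                          AND b.bug_id  = bugs_activity.bug_id
                          AND b.fieldid = bugs_activity.fieldid)
""")
rows = sorted(prod.execute("SELECT bug_id, bug_when FROM bugs_activity"))
print(rows)  # bug 1 untouched, bug 2 reset to 04:30
```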
bugs_20080731, but it's only on the master at the moment. I figure the best way to do this is select the data, verify it looks right, then do the updates. I can handle that. This should work for getting the data. I've not tested it yet since the import is still running. As soon as it's done, I'll take a look and see if this works like expected. SELECT prod.bug_when, prod.bug_id, prod.fieldid FROM bugs.bugs_activity prod RIGHT JOIN bugs_20080731.bugs_activity backup ON ( prod.bug_when = backup.bug_when AND prod.bug_id = backup.bug_id AND prod.fieldid = backup.fieldid ) WHERE backup.bug_when IS NULL AND prod.bug_when < '2008-07-31 00:00:00'; Your instructions seem pretty straightforward. Once I've found all of the rows in production that are timestamped earlier than midnight that are NOT in the backup, I will update their timestamps to 04:30. I'll keep this bug updated on progress. The import is taking a while, no idea when it will finish. It's still on attach_data 20 minutes in.
Okay. First off, it's a LEFT JOIN, not a RIGHT. With that fixed, the query works as expected. However, it seems like this query is going to take a long time to run. I just tried it and it hit 10 minutes with no sign of stopping, so I killed it as it was locking up the master. Can we do this during the maintenance window tomorrow, or is this something that is ASAP and I should just roll it and get it done with?
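For the record, the direction matters because the anti-join has to keep every production row and look for missing backup matches, not the other way around. A small sqlite3 sketch of the corrected LEFT JOIN pattern (separate tables stand in for the two databases):

```python
import sqlite3

con = sqlite3.connect(":memory:")
for name in ("prod_activity", "backup_activity"):
    con.execute(
        f"CREATE TABLE {name} (bug_id INT, fieldid INT, bug_when TEXT)")
con.execute("INSERT INTO prod_activity VALUES (1, 25, '2005-01-01 00:00:00')")
con.execute("INSERT INTO prod_activity VALUES (2, 25, '2004-11-22 17:00:00')")
con.execute("INSERT INTO backup_activity VALUES (1, 25, '2005-01-01 00:00:00')")

# LEFT JOIN keeps every production row; the backup columns are NULL
# exactly where the backup has no match -- the classic anti-join.
only_in_prod = con.execute("""
    SELECT p.bug_id FROM prod_activity p
      LEFT JOIN backup_activity b
        ON (p.bug_when = b.bug_when
            AND p.bug_id = b.bug_id
            AND p.fieldid = b.fieldid)
     WHERE b.bug_when IS NULL
       AND p.bug_when < '2008-07-31 00:00:00'
""").fetchall()
print(only_in_prod)  # [(2,)]
```

With the RIGHT JOIN as originally written, the backup side can never be NULL for a matched row, so the WHERE clause filters everything out.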
I missed the window for today, so this will be in Thursday's window. Judging from the comments, this doesn't seem to be a critical/blocker issue. Please correct me if I'm wrong, and I will find a way to get this done sooner if needed.
Depends on: 449612
Mark: did this happen on Thursday? Gerv
I attempted this Thursday, but it turned into a mess. It has now been resolved as of right now. Storytime!

The query above doesn't finish, even after running for 45 minutes. I suspected that the problem was that there are too many rows in the output and MySQL was suffering some sort of catastrophic failure past a certain size. Every time I tried the query with a much more limited set of times (i.e., one month at a time), it worked just fine. I wrote a script to give me a bunch of queries, one for each month that Bugzilla has been around, then ran these and concatenated the results out to a file.

Of course, it also seemed to get stuck partway through, at some point in 2004. I did some tweaks and tried again; same deal. Analysis showed that November 2004 contained a lot more records in bugs_activity than was typical. But only a 4-5x increase? I broke November down into daily queries instead of doing it as one solid month. The 1st-21st went just fine, then it bogged down on the 22nd and would not complete. So I broke the 22nd down into hours. 12AM-4PM were fine, then 5PM broke.

I did some more analysis and discovered the reason! There are ~100,000 bugs_activity rows with one of two timestamps. And given that the index is on time, since everything happened within two seconds, it is as if there were no index. MySQL joining two tables of 100,000 rows with no usable index is pretty slow. It's so slow, it doesn't seem like it will ever complete.

I manually queried the contents of this period and dumped them out to two files, "rows from backup" and "rows from production", and then had to write a script that compared the two and found the rows that are new. Of course, it then turned out that my "uniqueness parameters" of bug_when + bug_id + fieldid don't work, since for all of these rows the last activity was also a move. There are two move rows! (Apparently Myk reorganized Bugzilla on this day in 2004, just like we're doing now!)
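A tiny sketch of why the three-column key broke down here: the buggy script stamped its new move row with the bug's old delta_ts, so on a bug whose last activity was the 2004 move, the new row is indistinguishable from the old one on (bug_when, bug_id, fieldid) and a set difference misses it entirely (ids and timestamps are illustrative):

```python
# Myk's 2004 move, as captured in the backup dump.
backup_rows = {
    ("2004-11-22 17:00:00", 1234, 25),
}
# Production has the old row AND the new reorg row -- but the new row got
# stamped with the old delta_ts, so its three-column key is identical and
# the duplicate silently collapses in the set.
prod_rows = {
    ("2004-11-22 17:00:00", 1234, 25),  # 2004 move, still there
    ("2004-11-22 17:00:00", 1234, 25),  # new move... identical key!
}
print(prod_rows - backup_rows)  # set() -- the new row is invisible
```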
Anyway, I modified the queries to add a restriction finding rows with who = 1, which works, since all of the current changes were done by the "nobody" user, while the last reorg was done with Myk's user. Huzzah!

Once this was done, I appended this data to my original list and wrote a script that went through and sanity-checked each row before updating:

1) row must exist once and only once in production
2) row must not exist in the backup
3) row must be a product move (fieldid 25), error if not
4) perform UPDATE of timestamp, log exact SQL run and old values
5) if error, print error
6) if updated more or fewer than 1 row, print error

Then I ran the script. It updated all ~40,000 bugs that were moved in this most recent reorganization. This is done. I've got the log if I need to revert anything. Sorry it took so long; this was way, way more involved than I expected.
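The steps above can be sketched roughly like this in sqlite3 (Myk's userid and the row data are made up, and step 2's backup check is assumed to have been done when building the to-fix list; the real script ran against MySQL and logged the exact SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bugs_activity
               (bug_id INT, who INT, fieldid INT, bug_when TEXT)""")
con.executemany("INSERT INTO bugs_activity VALUES (?,?,?,?)", [
    (1234, 1, 25, "2004-11-22 17:00:00"),    # reorg row (who=1): fix
    (1234, 212, 25, "2004-11-22 17:00:00"),  # Myk's 2004 row: leave alone
])

NEW_TS = "2008-07-31 04:30:00"
# Rows already confirmed absent from the backup (step 2, done upstream).
to_fix = [(1234, 1, 25, "2004-11-22 17:00:00")]

log = []
for bug_id, who, fieldid, old_ts in to_fix:
    # 1) row must exist once and only once in production
    (n,) = con.execute(
        """SELECT COUNT(*) FROM bugs_activity
           WHERE bug_id=? AND who=? AND fieldid=? AND bug_when=?""",
        (bug_id, who, fieldid, old_ts)).fetchone()
    if n != 1:
        print("error: expected 1 row, found", n)
        continue
    # 3) row must be a product move (fieldid 25 on this install)
    if fieldid != 25:
        print("error: not a product move")
        continue
    # 4) perform the UPDATE, logging old values so it can be reverted
    cur = con.execute(
        """UPDATE bugs_activity SET bug_when=?
           WHERE bug_id=? AND who=? AND fieldid=? AND bug_when=?""",
        (NEW_TS, bug_id, who, fieldid, old_ts))
    # 6) exactly one row must have been touched
    if cur.rowcount != 1:
        print("error: updated", cur.rowcount, "rows")
    log.append((bug_id, old_ts))

fixed = con.execute(
    "SELECT bug_when FROM bugs_activity WHERE who=1").fetchone()[0]
print(fixed)  # 2008-07-31 04:30:00; Myk's row is untouched
```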
Looks like this is all set. Thanks Mark! Someone reopen this if there are still any issues.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Mark, Thanks :-) That sounds like way above and beyond the call of duty. Are there any lessons here for future reorgs, apart from to use a script without bugs in it? Gerv
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org