It looks like we closed about 15000 bugs and reopened about 8000, all on the same day. This looks like it coincides with the server move, so I'm blaming that. The funny numbers are there regardless of what product you pick, and they don't match the query results (e.g. NEW Tech Evangelism bugs: the query says 2379, the graph says about 200).
According to the emails I saw going back and forth between Myk and Gerv, we lost one day of stats because of Berkeley DB incompatibilities between Solaris and RedHat. Last I heard, the plan at this point is to run the "regenerate" script after b.m.o gets upgraded (the fast regenerate requires a newer version of Bugzilla than 2.17.1).
We lost a day of duplicates stats (and as far as I know, there's no way to get them back, since "regenerate" didn't do it when I backported the new version and ran it, but maybe my backport had some problem). These other stats are different, and I don't know where the data is kept or what the solution for them would be. Anyone know?
But the numbers are still wrong. Those stats aren't in the duplicates db, but rather in text files - can someone manually inspect those?
Yes, the stats for these are in the files in data/mining. If someone could attach data/mining/Browser from mecha, that would be good. Gerv
Immediately it's obvious what's wrong :-) On the 14th of August, the UNCONFIRMED data moved from the first column to the fourth column. This shifted the other three along one, thereby making all four plot incorrectly. The data up until the 14th matches the order of fields at the top of the file, which is UNCONFIRMED|NEW|ASSIGNED|REOPENED. This is very strange, because in collectstats.pl, checksetup.pl, and my own data files, the order is and always has been NEW|ASSIGNED|REOPENED|UNCONFIRMED. So the question is: how did the b.m.o. data get in the order UNCONFIRMED|NEW|ASSIGNED|REOPENED, and stay in that order collecting stats properly for so long, until the upgrade? The plotting code uses the names at the top of the file, and so plots the data according to the old field order - hence the recent jumps. It's very confusing. myk, dave: what exactly happened on the 14th of August? Did we upgrade b.m.o. in any way, or move machines with it? Gerv
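To make the column shift concrete, here is a minimal Python sketch (not Bugzilla code) of what happens when a reader trusts the names at the top of the file while the writer emits values in a different order. The two field orders are the ones quoted in this comment; the counts and the helper names are invented purely for illustration.

```python
# Field order named in the header of the b.m.o. data file.
HEADER_FIELDS = ["UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"]
# Field order that collectstats.pl writes.
WRITTEN_FIELDS = ["NEW", "ASSIGNED", "REOPENED", "UNCONFIRMED"]


def label_by_header(values):
    """Pair each value with the header name at the same position,
    as a reader that trusts the header would."""
    return dict(zip(HEADER_FIELDS, values))


def mislabelled(true_counts):
    """Show what a header-trusting reader sees for a row that was
    written in WRITTEN_FIELDS order."""
    row = [true_counts[name] for name in WRITTEN_FIELDS]
    return label_by_header(row)


# Invented counts, chosen only so each status is easy to tell apart.
true_counts = {"UNCONFIRMED": 1, "NEW": 2, "ASSIGNED": 3, "REOPENED": 4}
seen = mislabelled(true_counts)
for name in HEADER_FIELDS:
    print(f"{name}: real {true_counts[name]}, plotted {seen[name]}")
```

Every one of the four series ends up plotted with another status's value, which is consistent with all four graphs jumping on the same day.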
That's probably when we set up Bugzilla on mecha. Bugzilla on mothra doesn't switch orders on the 14th, so it's strictly a new-server phenomenon. In addition to the new server, we're using a new version of MySQL; perhaps a query is returning results in a different order?
Myk: could you attach mecha's collectstats.pl, please? Have we run any scripts that mess with this data? Given that the top of the file matches the new order, and that running collectstats.pl merely appends, it's got to be something which writes the whole file which has caused the problem. Did you run collectstats.pl --regenerate at any point? Gerv
collectstats.pl is unchanged from the version that existed on December 12, 2002, although I did backport the --regenerate patch to it at one point and run that.
*** Bug 223691 has been marked as a duplicate of this bug. ***
So, the key question is: is this problem confined to b.m.o. or will other sites see it? Perhaps it was just b.m.o. which had this inexplicable thing where the stats used to be gathered in the wrong order. I've issued a request for help to n.p.m.webtools. We can write a script to fix b.m.o.'s stats. Gerv
Extending summary with what one would first search for...
Gerv, any chance of that script happening so we can have fixed stats for b.m.o at least?
Created attachment 137016: Script to fix Bugzilla stats. Here's a script which works on the sample file given. If someone sends me a tarball of all the stats, I'll test it on those too. Or you could just use it :-) In the sample file, there are a couple of stats runs which are a few months out of place. I assume that was a clock glitch on the Bugzilla server. I've not done anything special with those lines; but that may be something that also wants fixing. Dave, Myk: let me know how you want to proceed from here. Gerv
Hmm, script didn't work. It swapped Before: http://bugzilla.mozilla.org/reports.cgi?product=-All-&datasets=UNCONFIRMED%3A&datasets=NEW%3A&datasets=ASSIGNED%3A&datasets=REOPENED%3A After: http://webtools.mozilla.org/bztest/reports.cgi?product=-All-&datasets=UNCONFIRMED%3A&datasets=NEW%3A&datasets=ASSIGNED%3A&datasets=REOPENED%3A This appears to have swapped UNCONFIRMED and REOPENED in the messed up section, and UNCONFIRMED now appears to be correct, but REOPENED, NEW, and ASSIGNED are all still off...
Arse. Two of the columns in the test data have almost the same value, and I got them confused. Or something. Feel free to fix the script - it's trivial to understand. Or, I'll do it this evening. Gerv
*** Bug 227829 has been marked as a duplicate of this bug. ***
Created attachment 137070: Script v.2. Dave: apologies for wasting your time. This should do a better job. Gerv
No waste. :) This looks much better.... Same urls as before (see comment 16) I note this is fixing all the new data... do we need to do something to collectstats.pl to prevent it from messing it up again? Otherwise tonight's stats run is just going to start the problem again.
/me has a nasty feeling he's messed up again... Have I untwisted this all the wrong way? Should I be reversing all the ones prior to the 15th of August, rather than after? I'll check. Gerv
Well, I just so happened to enable the cron job on the test install so we'd get tonight's stats, and it's now past midnight PST, so have a look at the URLs again, and notice that the numbers did indeed swap again on tonight's run. However, doing some quick queries to match up the numbers being reported against the current state of the database seems to indicate that the numbers on the OLD data are correct and collectstats is recording the data in the wrong order currently. Which means your script did the right thing, but collectstats is still broken.
The fields line at the top of the files says this: # fields: DATE|UNCONFIRMED|NEW|ASSIGNED|REOPENED|RESOLVED|VERIFIED|CLOSED|FIXED|INVALID|WONTFIX|LATER|REMIND|DUPLICATE|WORKSFORME|MOVED The one collectstats wants to put in the file if it doesn't exist yet (and the order in which it writes the stats) is: # fields: DATE|NEW|ASSIGNED|REOPENED|UNCONFIRMED|RESOLVED|VERIFIED|CLOSED|FIXED|INVALID|WONTFIX|LATER|REMIND|DUPLICATE|WORKSFORME|MOVED reports.cgi is reading the labels on that # fields: line and trusting it. So collectstats isn't broken, and yes, the script needs to convert all the old data prior to the swap date, AND swap the order of the fields in the # fields: line at the top. OR collectstats needs to be fixed to also read that fields line and write the stats in the same order.
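For the second option (having the writer honour the existing # fields: line), a rough Python sketch of the idea follows. This is not collectstats.pl, which is Perl; the function names are hypothetical, and the assumption that each data row is simply the values joined with "|" in header order is mine.

```python
def parse_fields_line(line):
    """Return the field names from a '# fields: A|B|C' header line,
    or None if the line is not a fields header."""
    prefix = "# fields:"
    if not line.startswith(prefix):
        return None
    return [name.strip() for name in line[len(prefix):].strip().split("|")]


def format_row(fields, values):
    """Build one data line with the values in whatever order the
    header names them, rather than a hard-coded order."""
    return "|".join(str(values[name]) for name in fields)


# Example with a shortened header and invented values.
fields = parse_fields_line("# fields: DATE|UNCONFIRMED|NEW|ASSIGNED|REOPENED")
row = format_row(
    fields,
    {"DATE": "20031206", "NEW": 2, "ASSIGNED": 3, "REOPENED": 4, "UNCONFIRMED": 1},
)
print(row)
```

With a writer like this, the header and the rows cannot drift apart, whichever order the file was originally created with.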
Created attachment 137496: Script v.3. Here, try this then. It adds a new "fields:" line and switches everything _before_ the key date. Again, note that Browser (and probably the other files too) have a couple of lines with incorrect dates around the 20th of August (they are dated 15th of May) - but they are currently right, and so will be switched wrong by the script. As this data is duplicate data (in terms of the dates), I don't know if this will have a noticeable effect on the plotted graphs. Gerv
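The approach described here (rewrite the fields line, then move UNCONFIRMED from the first data column to the fourth in every row before the key date) can be sketched in Python as below. This is not the attached script. It assumes each row starts with a date in YYYYMMDD form so that plain string comparison orders dates correctly; the function name and the key_date parameter are hypothetical.

```python
OLD_HEAD = ["DATE", "UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"]
NEW_HEAD = ["DATE", "NEW", "ASSIGNED", "REOPENED", "UNCONFIRMED"]
FIELDS_PREFIX = "# fields:"


def convert(lines, key_date):
    """Return the file's lines with the header rewritten and every row
    dated before key_date reordered from OLD_HEAD to NEW_HEAD order."""
    out = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(FIELDS_PREFIX):
            names = text[len(FIELDS_PREFIX):].strip().split("|")
            if names[:5] == OLD_HEAD:
                names = NEW_HEAD + names[5:]
            # Emit the new header instead of the old one, newline included.
            out.append(FIELDS_PREFIX + " " + "|".join(names) + "\n")
            continue
        if text.startswith("#") or not text:
            # Other comments and blank lines pass through unchanged.
            out.append(text + "\n")
            continue
        cols = text.split("|")
        if len(cols) >= 5 and cols[0] < key_date:
            # Old order: DATE, UNCONFIRMED, NEW, ASSIGNED, REOPENED, ...
            cols = [cols[0], cols[2], cols[3], cols[4], cols[1]] + cols[5:]
        out.append("|".join(cols) + "\n")
    return out


# Invented sample: one row before and one row after a made-up key date.
sample = [
    "# fields: DATE|UNCONFIRMED|NEW|ASSIGNED|REOPENED|RESOLVED\n",
    "20030801|1|2|3|4|5\n",
    "20030901|2|3|4|1|5\n",
]
result = convert(sample, "20030815")
print("".join(result), end="")
```

Because the decision is made purely on the date in each row, a correctly ordered row that carries a wrong, earlier date would be switched incorrectly, which is the caveat raised above about the misdated lines.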
justdave: any luck with this version? Gerv
haven't had a chance yet, sorry. Will probably try Friday night sometime.
Yay! all fixed! :) Had to add a \n to the end of the new # fields: line, and also a next; after the print so it didn't print the original one back into the file. Otherwise it worked as designed. Those two weird dates happened to be in between correct ones that were two days apart, so the clock was probably messed up, and it missed a day. I deleted the first one, and manually changed the date on the second to match the missed day in between, which is probably close enough.