Active daily user count dropped significantly on 2011-08-30



7 years ago
3 years ago


(Reporter: frfxtst, Assigned: dre)





7 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre) Gecko/20110303 Firefox/4.0b13pre
Build ID: 20110303122446

Steps to reproduce:

The active daily user count for the 30th of August dropped significantly. Please check e.g. here:

Comment 1

7 years ago
Ever confirmed: true
Assignee: nobody → deinspanjer


Comment 2

7 years ago
Summary: Active daily user count dropped significantly → Active daily user count dropped significantly on 2011-08-30
Hey guys, not sure what is going on here.  I'd normally suspect the migration of AMO to the phx datacenter, but I can't think of any way that would not also have affected the 28th and 29th.

I will investigate tonight and first thing tomorrow and report back by morning Pacific.

Comment 3

7 years ago
The same happened on the 31st.

Comment 4

7 years ago
For Easy YouTube Video Downloader we see a 94.91% drop for yesterday's data - August 31st.  Is there an update on the cause of this issue or time to resolution?

Comment 5

7 years ago
IT made some sort of change on the 30th. I'm trying to get details to figure out whether that might have been the cause.

Comment 6

7 years ago
Okay.  The changes that were pushed on the 30th caused us to be missing some data because it arrived after the max time we wait to process logs for each hour.

I configured the process to be forced to wait for data from all the new servers, and it is picking up the missing data from today.  That said, IT has reported having some intermittent problems with our web serving clusters today and there might still be a *slight* dip.
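
For illustration, the difference between the original behaviour (stop waiting after a maximum time) and the forced wait described above could be sketched roughly as follows. This is only a sketch under assumptions: the function `ready_to_process`, its parameters, and the server names are hypothetical and are not taken from the actual metrics pipeline.

```python
from datetime import datetime, timedelta


def ready_to_process(hour_start, arrived, expected, now, max_wait, require_all=False):
    """Return True if the logs for the hour starting at hour_start may be processed.

    All names here are hypothetical; this is not the real pipeline code.
    """
    missing = set(expected) - set(arrived)
    if not missing:
        # Every expected server has delivered its log for this hour.
        return True
    if require_all:
        # Forced wait: never process an incomplete hour.
        return False
    # Original behaviour: process whatever has arrived once the wait expires.
    deadline = hour_start + timedelta(hours=1) + max_wait
    return now >= deadline


hour = datetime(2011, 8, 30, 12, 0)
expected = {"web1", "web2", "web3"}   # hypothetical server names
arrived = {"web1", "web2"}            # web3's log is late
late = hour + timedelta(hours=3)

# Old behaviour: the hour is processed without web3, so its requests are lost.
old_result = ready_to_process(hour, arrived, expected, late, timedelta(hours=1))
# Forced wait: the hour is held back until web3's log shows up.
new_result = ready_to_process(hour, arrived, expected, late, timedelta(hours=1), require_all=True)
```

The trade-off in this sketch is the one seen later in the thread: holding hours back avoids silent data loss, but a slow server delays everything downstream.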

After we get today caught up, I'll work on setting up a backfill process to take care of the previous two days.

Comment 7

7 years ago
Getting caught up with all the missing data from today has been very slow going.  We are still about 4 hours behind, and we have now passed the normally scheduled time for our End of Day (EOD) processes to start, so metrics exports will be delayed.

Comment 8

7 years ago
Okay, the data for yesterday looks more reasonable.  That said, please understand that AMO suffered a 3 hour outage on the morning of the 1st UTC, and as such, we have a 10% to 15% depression in the ADU numbers across the board.

I will try to set up a back processing script to import data for the previous two days and run it over the weekend.
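
As a rough sanity check on the 10% to 15% figure, the sketch below assumes that update pings are spread evenly over the day. That is an assumption, since real traffic is not perfectly uniform, and the function name `expected_drop` is hypothetical. Under that assumption, a 3 hour outage loses about 3/24 of a day's pings:

```python
def expected_drop(outage_hours, hours_in_day=24):
    """Fraction of a day's pings lost, assuming perfectly uniform traffic."""
    return outage_hours / hours_in_day


drop = expected_drop(3)
print(f"{drop:.1%}")  # prints 12.5%
```

The result falls inside the reported range, which is consistent with the outage alone explaining the remaining dip for the 1st.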

Comment 9

7 years ago
The data from 30/31 seems to be still unchanged.

Comment 10

7 years ago
I had to abort the back-processing due to disk space issues.  We are working on clearing up some room so we can resume this, but it is likely to take a while. I'll give an update at the end of the week.

Comment 11

7 years ago
Did you manage to resolve the issue preventing the back-processing?

Comment 12

7 years ago
Ugh, sorry I didn't post an update on this thread.  We have not yet resolved the issue with lack of resources on the current cluster.  We have a new replacement cluster that is being built out, but it had to be reformatted because the wrong distro was installed. :/
While I would be happy to back-process the data if we can easily get to it when the new cluster is up, I would like to know how critical filling in this missing data is.  Is it something we could live without since we have the correct data flowing in at this point?

Comment 13

7 years ago
I would consider this low priority unless anyone else says otherwise. It would still be helpful to get it fixed eventually as the hole in the data messes up things a little bit when my stat checking scripts graph a few things. Certainly no rush though.

Comment 14

7 years ago
Duplicate of this bug: 688030

Comment 15

7 years ago
Daniel: any news on this?

Comment 16

7 years ago
We have the new cluster online, and it has made it possible for us to do the most recent reprocessing that was needed for 2011-12-15.  We don't have the manpower necessary to try to dig up older data to backfill holes at this point though, so I would prefer to close this bug out.

Bug 708539 was the more recent outage where we backfilled for 12-15.

Comment 17

7 years ago
Marking WONTFIX per comment 16.
Last Resolved: 7 years ago
Resolution: --- → WONTFIX
Product: → Graveyard