Bug 708539 - Active daily user count dropped significantly since 2011-12-05
Status: RESOLVED FIXED
Product: addons.mozilla.org Graveyard
Classification: Graveyard
Component: Statistics
Version: unspecified
Platform: All All
Importance: -- normal
Target Milestone: ---
Assigned To: Daniel Einspanjer [:dre] [:deinspanjer]
http://blog.mozilla.com/addons/2012/0...
Duplicates: 711817 713647
Reported: 2011-12-08 00:20 PST by Simon Paquet [:sipaq]
Modified: 2016-02-04 14:54 PST
Description Simon Paquet [:sipaq] 2011-12-08 00:20:17 PST
See data on

https://addons.mozilla.org/de/thunderbird/addon/lightning/statistics/
https://addons.mozilla.org/en-US/statistics/addon/2313
https://addons.mozilla.org/en-US/statistics/addon/1865

The number of ADUs (active daily users) is approximately 10% lower than normal for Monday and approximately 50% to 60% lower than normal for Tuesday. The cause is probably a cluster breakdown, a data center move, or something similar.

As with earlier bugs like bug 683439, this will probably not be fixed, but I still wanted to report it. Perhaps you'll surprise me and fix it.
Comment 1 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-08 06:29:34 PST
Lots of stuff was broken on Tuesday while the Phoenix datacenter was having issues.
Comment 2 Wil Clouser [:clouserw] 2011-12-08 11:56:10 PST
This information comes from logs, and the site was actually disabled at the firewall (meaning no logs were written), so this is a CANTFIX as far as I know.  Daniel: if I'm missing something, please reopen.
Comment 3 Simon Paquet [:sipaq] 2011-12-09 05:18:34 PST
Wil, I know that the datacenter had issues on Tuesday, but the ADU data has been incorrect ever since. The hill-shaped ADU curve on https://addons.mozilla.org/en-US/statistics/addon/2313 indicates to me that even now something is still terribly wrong.

Is that a good enough reason to reopen this bug and investigate the issue?
Comment 4 Wil Clouser [:clouserw] 2011-12-09 16:11:46 PST
There's a new stats dashboard!

https://addons.mozilla.org/en-US/thunderbird/addon/lightning/statistics/

I think it looks like it's following mostly the same trend line.  Going down today and up next week I imagine.
Comment 5 Simon Paquet [:sipaq] 2011-12-10 04:39:43 PST
Wil, I know about the new stats dashboard, but since it shows exactly the same data as the old one, I don't see what benefit it gives us here.

And I really don't see how the data is following the same trend line. If you look at https://addons.mozilla.org/en-US/thunderbird/addon/lightning/statistics/?last=90 you'll notice that our daily ADUs during the week are in the high 1.3M or low 1.4M area, with a drop of 100K ADUs on Friday.

This week's data

Monday:    1,254,787
Tuesday:     729,540
Wednesday: 1,323,269
Thursday:  1,095,863
Friday:    1,067,630

is totally off base and a clear indication of a bug. Trust me, I've been monitoring the Lightning usage data for more than two years. I know when something is severely wrong.

And BTW this is not something that happens just for Lightning. Just look at the Adblock Plus numbers, which are also down by 10% - 20% since the beginning of this week.
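
To put rough numbers on the gap, here is a minimal sketch. The daily figures are the ones listed above; the 1,350,000 weekday baseline and the Friday baseline 100K below it are assumptions read off the "high 1.3M or low 1.4M, minus 100K on Friday" description, not measured values, and the helper name is made up for illustration.

```python
# Reported Lightning ADUs for the week of 2011-12-05 (figures from this comment).
reported = {
    "Monday": 1_254_787,
    "Tuesday": 729_540,
    "Wednesday": 1_323_269,
    "Thursday": 1_095_863,
    "Friday": 1_067_630,
}

# Assumed baselines (not measured): a typical weekday, and a Friday 100K lower.
WEEKDAY_BASELINE = 1_350_000
FRIDAY_BASELINE = WEEKDAY_BASELINE - 100_000


def shortfall_pct(day, adu):
    """Percentage by which a day's ADU count falls below its assumed baseline."""
    baseline = FRIDAY_BASELINE if day == "Friday" else WEEKDAY_BASELINE
    return 100.0 * (baseline - adu) / baseline


shortfalls = {day: shortfall_pct(day, adu) for day, adu in reported.items()}

for day, pct in shortfalls.items():
    print(f"{day:<10} {reported[day]:>10,}  {pct:5.1f}% below assumed baseline")
```

The exact percentages move with the chosen baseline, but under any baseline in the stated range Tuesday stands out as the worst day, and Thursday and Friday are well below a normal week.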
Comment 6 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-10 13:51:54 PST
We had Zeus- and datacenter-related problems on several days last week that resulted in lost data that would affect this metric.  I definitely understand how annoying it can be, because these disruptions affect all of the Firefox and Thunderbird metrics as well.  Let's wait and see what next week looks like.  I suspect it will stabilize.
Comment 7 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-10 16:43:40 PST
Okay, I just found out the rest of the problem here.

IT brought a new vamo server online starting on the 7th and Metrics wasn't notified so we weren't processing the logs out of that server.

Data on the 5th and 6th were messed up because of the outages, data from the 7th to the 9th were messed up because of missed processing for the new server.

Data for today should be correct (although, of course, lower due to the weekend).
Comment 8 Justin Scott [:fligtar] 2011-12-10 22:24:12 PST
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #7)
> Data on the 5th and 6th were messed up because of the outages, data from the
> 7th to the 9th were messed up because of missed processing for the new
> server.

Does that mean Dec. 7-9 has no logs or they just need to be back-processed?
Comment 9 Simon Paquet [:sipaq] 2011-12-11 05:59:33 PST
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #7)
> Okay, I just found out the rest of the problem here.
> 
> IT brought a new vamo server online starting on the 7th and Metrics wasn't
> notified so we weren't processing the logs out of that server.
> 
> Data on the 5th and 6th were messed up because of the outages, data from the
> 7th to the 9th were messed up because of missed processing for the new
> server.
> 
> Data for today should be correct (although, of course, lower due to the
> weekend).

Thanks for the info, Daniel.

The Saturday data looks alright to me. If (as Justin has already asked) the Wednesday to Friday data could be backprocessed, that would reduce the impact of this bug significantly. Would that be possible?
Comment 10 Simon Paquet [:sipaq] 2011-12-14 01:36:48 PST
Daniel, any update on the possibility of back-processing the data from the 7th to the 9th of December?
Comment 11 Simon Paquet [:sipaq] 2011-12-16 04:01:56 PST
It seems that something is again going wrong. :-( Today's (Thursday) data is again approx. 20% lower than usual for Lightning as well as for Adblock Plus.
Comment 12 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-16 09:05:22 PST
IT was having lots of issues with datacenter connectivity yesterday (2011-12-15), and a lot of the log data did not arrive until well after the cutoff for processing it.

It is not feasible for us to backprocess the data from the 7th to the 9th.  We are going to try to backprocess the data missing from today, but even that is going to take hours to correct.
Comment 13 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-16 15:12:51 PST
Okay, found the delay in processing.  We reprocessed the missing data from 2011-12-15 and I am inserting it now.  Please verify tomorrow, and reopen if the data is not correct for either 2011-12-15 or 2011-12-16.
Comment 14 Simon Paquet [:sipaq] 2011-12-18 04:37:42 PST
Hey Daniel,

unfortunately things haven't improved. Thursday's data went up a little bit (but not enough), but Friday's and especially Saturday's data is way lower than usual.
I measure this again by looking at Adblock Plus (most popular add-on for Firefox) and Lightning (most popular add-on for Thunderbird).

Therefore I'm reopening this bug :-(
Comment 15 Wil Clouser [:clouserw] 2011-12-19 13:44:28 PST
*** Bug 711817 has been marked as a duplicate of this bug. ***
Comment 16 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-19 17:11:52 PST
Added more servers and backprocessed.  Let me know how it looks tomorrow morning?
Comment 17 testit 2011-12-20 23:39:16 PST
The data from the 17th to the 20th still seems to be far below normal.

A general question: is there any monitoring on your side to make sure that statistics data processing is working without problems? If not, adding some would be much appreciated.
Comment 18 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-21 10:14:46 PST
I've double checked the days in question and I can find no sign of missing data.  I think this might be a combination of weekend + holiday trends.

We have lots of checks and safeguards against errors during processing.  However, discovering that there is new data that we aren't tracking is much more difficult and basically requires notification from the IT team.  When that is compounded with the severe load problems the versioncheck.addons.mozilla.org site has been having, it causes a real mess that we have to sort through.
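
For illustration only, one simple kind of check that can catch a silent drop like this is to compare each day's count with the median of the same weekday over the preceding weeks. This is a sketch, not a description of the Metrics pipeline: the function name, the four-week window, the 15% threshold, and the sample series are all made up.

```python
from datetime import date, timedelta
from statistics import median


def flag_low_days(counts, weeks=4, max_drop=0.15):
    """Return dates whose count is more than `max_drop` below the median
    of the same weekday in the preceding `weeks` weeks."""
    flagged = []
    for day in sorted(counts):
        history = [
            counts[day - timedelta(weeks=k)]
            for k in range(1, weeks + 1)
            if day - timedelta(weeks=k) in counts
        ]
        if len(history) < 2:
            continue  # not enough history to judge this day
        baseline = median(history)
        if counts[day] < (1.0 - max_drop) * baseline:
            flagged.append(day)
    return flagged


# Made-up series: five flat weeks of 1,000,000 with one day down by about a third.
start = date(2011, 11, 7)
counts = {start + timedelta(days=i): 1_000_000 for i in range(35)}
bad_day = date(2011, 12, 6)
counts[bad_day] = 667_000

print(flag_low_days(counts))
```

A check like this only says that a day looks low; it cannot tell a missing log source from a real change in usage, and a drop sustained for several weeks would eventually pull the median down with it.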
Comment 19 testit 2011-12-22 07:39:25 PST
Thanks for your answer.

Of course there will be fewer users than normal because of the Christmas holidays, but in my opinion the difference is too large.

Is it possible that the lower numbers are connected to this problem: http://blog.mozilla.com/addons/2011/12/20/statistics-correction-in-developer-dashboards/ ? The post says that the total number of users shouldn't be affected, but it looks like the total number is just the sum of the users of all versions. Any idea whether the statistics data has already been re-indexed and whether this problem is fixed?
Comment 20 Daniel Einspanjer [:dre] [:deinspanjer] 2011-12-22 08:14:56 PST
I don't know, I am not familiar with the system in question on that bug.

I agree that the numbers feel too low to be the result of just a holiday dip. That said, I haven't been able to find any further causes.  I will pick this up and continue investigation after the holidays when the numbers even out again.  Of course we will continue the regular monitoring for any missing or late arriving data.
Comment 21 Wil Clouser [:clouserw] 2011-12-27 08:39:15 PST
*** Bug 713647 has been marked as a duplicate of this bug. ***
Comment 22 Simon Paquet [:sipaq] 2011-12-30 06:28:22 PST
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #20)
> I agree that the numbers feel too low to be the result of just a holiday
> dip. That said, I haven't been able to find any further causes.  I will pick
> this up and continue investigation after the holidays when the numbers even
> out again.  Of course we will continue the regular monitoring for any
> missing or late arriving data.

Daniel, I just compared last year's numbers for Lightning (last 14 days) with this year's numbers. On average, last year's numbers were 5% higher than this year's.

This does not track at all with the data from the rest of 2011, where the Lightning add-on saw a significant growth of 30% to 40% in terms of active users compared with 2010.

So something *is* really wrong here, it doesn't just feel wrong. The number of active users should be considerably higher, by at least 30%. Looking at last year and at 2009, the ADUs on the Christmas weekend should at least be in the range of our normal weekend ADU numbers, but they are far below.
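
The size of the gap can be worked out from the two ratios above. A minimal sketch, using only those ratios (last year's figures about 5% above this year's, and 30% to 40% expected growth); the function name is made up for illustration:

```python
def implied_missing_fraction(last_year_over_this_year, expected_growth):
    """Fraction of expected users missing from this year's reported numbers.

    Both quantities are expressed relative to last year's count:
    reported = 1 / last_year_over_this_year, expected = 1 + expected_growth.
    """
    reported_vs_last_year = 1.0 / last_year_over_this_year
    expected_vs_last_year = 1.0 + expected_growth
    return 1.0 - reported_vs_last_year / expected_vs_last_year


for growth in (0.30, 0.40):
    missing = implied_missing_fraction(1.05, growth)
    print(f"expected growth {growth:.0%}: about {missing:.0%} of users unaccounted for")
```

Under these assumptions, roughly 27% to 32% of the expected users are missing from the reported figures.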
Comment 23 Andrew C 2011-12-30 12:51:46 PST
You can see the exact same pattern in the public stats for Adblock Plus:
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/

If you look at the same time last year, they saw a huge drop on Christmas day but bounced right back up. This drop started around 12/16 or 12/17 and has been sustained for two weeks straight. 

Also, the same pattern is showing for my add-on. In both cases it is about a 30% drop.

I agree with Simon. Something definitely is really wrong here.
Comment 24 Daniel Einspanjer [:dre] [:deinspanjer] 2012-01-03 14:42:16 PST
I agree too that there is some unknown problem.  I've pinged a few more people, but didn't hear back over the holidays.  I will post here once I hear back.
Comment 25 Daniel Einspanjer [:dre] [:deinspanjer] 2012-01-03 16:52:53 PST
IT has found the problem.  They started shifting a third of the versioncheck.addons.mozilla.org traffic (on which we base our add-on usage counts) to a datacenter that was not properly configured for request logging.  That means that Metrics never received the data to process.

IT corrected the problem at 4 PM Pacific today.

Since 4 PM Pacific is the end of the day UTC, we should see a complete recovery starting tomorrow evening when tomorrow's EndOfDay is run.

I'll update tomorrow evening.
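
The day-boundary reasoning can be checked with a quick conversion. This sketch assumes Pacific standard time is a fixed UTC-8 offset (no daylight saving in January) and that the fix landed at exactly 4 PM on 2012-01-03; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offset for Pacific time in January.
PST = timezone(timedelta(hours=-8), "PST")

# "4 PM Pacific today", taking "today" as 2012-01-03 from the comment date.
fix_time_pst = datetime(2012, 1, 3, 16, 0, tzinfo=PST)
fix_time_utc = fix_time_pst.astimezone(timezone.utc)

# First UTC calendar day that lies entirely after the fix.
if fix_time_utc.time() == datetime.min.time():
    first_full_utc_day = fix_time_utc.date()
else:
    first_full_utc_day = fix_time_utc.date() + timedelta(days=1)

print(fix_time_utc.isoformat())
print(first_full_utc_day)
```

With those assumptions, the first UTC day fully covered by the corrected logging is 2012-01-04, which is the day processed by the following evening's EndOfDay run.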
Comment 26 Potch [:potch] 2012-01-04 11:09:58 PST
Once these stats are re-indexed, we can re-run the stats indexing job for the affected time range to update the UI.
Comment 27 Daniel Einspanjer [:dre] [:deinspanjer] 2012-01-04 13:15:52 PST
We are getting the additional traffic in our logs now and the numbers tonight should reflect that.

(In reply to Potch [:potch] from comment #26)
> Once these stats are re-indexed, we can re-run the stats indexing job for
> the affected time range to update the UI.

Unfortunately, there isn't any data to re-index.  The logs from the missing cluster were never generated, so the time period from Dec 16 through yesterday is permanently down by 33%.
Comment 28 Justin Scott [:fligtar] 2012-01-04 14:50:49 PST
Blogged about this: http://blog.mozilla.com/addons/2012/01/04/issue-corrected-with-add-on-usage-stats/
Comment 29 Simon Paquet [:sipaq] 2012-01-05 08:30:46 PST
The last day's numbers show a significant uptick of around 50%, which matches the 33% loss: counts that had been reduced to two thirds of their true value rise by half when they recover. Marking FIXED.

Too bad that the data from the middle of December till now cannot be reindexed :(
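
For anyone who wants to sanity-check the 33% / 50% relationship, or to roughly estimate what the affected days might have looked like, here is a small sketch. It assumes the unlogged share was exactly one third on every affected day, which is a simplification; the helper name and the 900,000 sample value are made up.

```python
from fractions import Fraction

lost_share = Fraction(1, 3)      # share of traffic that was never logged
logged_share = 1 - lost_share    # share that still reached the logs: 2/3

# Relative increase seen when counts go from 2/3 of the true value back to all of it.
recovery_uptick = 1 / logged_share - 1


def estimate_true_adu(observed):
    """Rough estimate of the real ADU count behind an under-reported value."""
    return round(observed / logged_share)


print(f"uptick on recovery: {float(recovery_uptick):.0%}")
print(f"observed 900,000 -> estimated {estimate_true_adu(900_000):,}")
```

Such an estimate is only a rough guide for reading the charts; it does not recover the missing data.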
