Closed Bug 1122110 Opened 9 years ago Closed 9 years ago

ADI stats missing since 01/11

Categories

(addons.mozilla.org Graveyard :: Statistics, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
2015-02

People

(Reporter: jorgev, Assigned: jason)


Attachments

(1 file)

ADI stats dropped to zero on 01/11: https://addons.mozilla.org/addon/adblock-plus/statistics/

There also seems to be a problem calculating average ratings, which could be related to this.
The stats seem to be back for the last couple of days, but they're still missing for last week.
The missing data are for the days 2015-01-12 to 2015-01-15 included (4 days), only for update counts.

As I didn't have any cron error mails, I'm guessing it's because of some missing data coming from hive (the peach-gw server).

Could :jason (or :jlaz?) rebuild the data for those 4 days?

To rebuild the data, please use the steps defined in https://bugzilla.mozilla.org/show_bug.cgi?id=1089358#c10 (comments 10 and 11), but only for the update counts
Assignee: nobody → jthomas
I've started the rebuild process. I will let you know once completed.
Done. Please verify.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/?last=30

I see gaps from Jan 11th to Jan 15th and from Jan 19th to Jan 23rd.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I ran 'update_counts_from_file' and 'index_stats' again for only 2015-01-11 but I am not seeing that reflected in https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/?last=30. :magopian is there a way I can verify if the data has been imported correctly into the DB?
Flags: needinfo?(mathieu)
The following SQL query will tell you how many "update count" entries there are for a given day:

mysql> select count(*) from update_counts where date="2015-01-11";
+----------+
| count(*) |
+----------+
|    19355 |
+----------+
1 row in set (0,23 sec)

On production, it seems that for this day (but not for 12 to 15), the data is in the database. However, it's not displaying in the graphs, even after a localstorage clear (reminder: enter "localStorage.clear()" in your javascript console each time you want to pull the data from the server).
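
To check a whole range of days rather than one at a time, the same count can be grouped by date. A minimal sketch, assuming only the `update_counts` table and its `date` column from the query above; the `addon_id` column, the `days_without_rows` helper and the in-memory SQLite database are stand-ins for illustration, not the production MySQL setup:

```python
import sqlite3
from datetime import date, timedelta


def days_without_rows(conn, start, end):
    """Return the ISO dates in [start, end] that have no update_counts rows."""
    rows = conn.execute(
        "SELECT date, COUNT(*) FROM update_counts "
        "WHERE date BETWEEN ? AND ? GROUP BY date",
        (start.isoformat(), end.isoformat()),
    ).fetchall()
    present = {day for day, count in rows if count > 0}
    span = (end - start).days + 1
    wanted = [(start + timedelta(days=i)).isoformat() for i in range(span)]
    return [day for day in wanted if day not in present]


# Stand-in data: an in-memory SQLite table, not the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE update_counts (addon_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO update_counts VALUES (?, ?)",
    [(1865, "2015-01-10"), (1865, "2015-01-11"), (1865, "2015-01-16")],
)
missing = days_without_rows(conn, date(2015, 1, 10), date(2015, 1, 16))
print(missing)
```

Any day in the returned list is a candidate for a backfill.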
Flags: needinfo?(mathieu)
Added another fix to the PR to deal with http://sentry.mktmon.services.phx1.mozilla.com/mkt/addonsmozillaorg/group/13220/ (doesn't seem related to this very issue, but it seems stats related).
I added another commit to the PR, that validates the data before storing it in the database.

I need confirmation that this is something we want to do, as it'll definitely lower the numbers in the stats. We didn't want to do that earlier because we wanted to match the numbers with the previous stats system from pentaho, to make sure we were not missing anything. Now is (maybe?) the time.

Wil, thoughts?
Flags: needinfo?(wclouser)
Can you explain the validation?  We'll need to blog about it before it goes live.  Also CCing some people who will be affected.
Flags: needinfo?(wclouser)
Sure: if you confirm you want this validation, here's what it does (it's only for the update counts, at least for now):
1/ versions: it will only count versions that exist for a given addon. Any update request for a version that doesn't exist won't be counted
2/ statuses: only statuses in the following list will be counted: ["userDisabled,incompatible", "userEnabled", "Unknown", "userDisabled", "userEnabled,incompatible"]
3/ applications: only count if the application exists, and if the version exists for this application
4/ OSes: only count if the os is in the following list: ['all', 'linux', 'mac', 'macosx', 'darwin', 'bsd', 'bsd_os', 'freebsd', 'win', 'winnt', 'windows', 'sun', 'sunos', 'solaris', 'android']
5/ locales: only count if the locale is supported/known by AMO
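
For points 2/ and 4/, the check is a plain whitelist lookup. A rough sketch of the idea, not the code in the PR: the two lists are copied from above, while the `keep_valid` helper and the optional lowercasing of OS names are assumptions made for the example:

```python
VALID_STATUSES = frozenset([
    "userDisabled,incompatible", "userEnabled", "Unknown",
    "userDisabled", "userEnabled,incompatible",
])
VALID_OSES = frozenset([
    "all", "linux", "mac", "macosx", "darwin", "bsd", "bsd_os", "freebsd",
    "win", "winnt", "windows", "sun", "sunos", "solaris", "android",
])


def keep_valid(counts, allowed, lowercase=False):
    """Keep only the keys found in `allowed`, summing counts of merged keys."""
    kept = {}
    for key, count in counts.items():
        # Assumption: OS names may arrive in mixed case, so optionally fold them.
        candidate = key.lower() if lowercase else key
        if candidate in allowed:
            kept[candidate] = kept.get(candidate, 0) + count
    return kept


statuses = keep_valid(
    {"userEnabled": 322, "userDisabled": 48, "<script>": 3}, VALID_STATUSES
)
oses = keep_valid(
    {"WINNT": 10, "winnt": 5, "Darwin": 4, "AmigaOS": 1},
    VALID_OSES,
    lowercase=True,
)
print(statuses, oses)
```

Anything outside the lists is simply not counted in the drill-down, which is why these two checks are cheap to apply.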

I'm not sure, especially for point 1/ and 5/.

Regarding point 1/, I understand it could be undercounting if the developer deleted many of their widely used versions (does that happen a lot? Do we want to deal with that?). We need to find a way to validate that, and be sure the version number isn't just anything forged (or badly formatted...) which could add up and fill the 64k limit on the TEXT field in mysql.

Regarding point 5/, I'm not sure where the locale in the request is coming from. If it's from the browser, we have no reason to restrict on only the locales that are supported/known by amo. In that case I would change the validation to just be some regex that accepts only those formats:
- es
- es_ar
- es_AR
- es-ar
- es-AR

What do you think?
I like all of them, but I'm not sure how much they'd affect developers.  I'm going to needinfo jorge who will probably have some insight (anyone else feel free to chime in too).
Flags: needinfo?(jorge)
I don't think I have objections to any of these, except #5. I've gotten used to non-AMO versions showing up in stats, but I'm not sure that it's a desirable behavior.

The locale in the requests is the browser locale, many of which may not be supported by AMO. I'm also not sure it's a good idea to restrict the format. Just a quick look turns up the following codes which don't match:

ach
ang
byn
csb
cgg
ckb
crh
csb
fil
fur
gez
haw
kab
kok
mai
nds
nso
sco
szl
tet
tig
wal
roa-ES-val

And I know I've seen others that are even stranger. Perhaps normalize hyphens and underscores to the same character, though.
Are all of these valid locales that are used by Firefox?
I could easily be more lax about locales, and for example only restrict them to be less than 10 chars (as they're already at the moment).
It just feels a tiny bit wrong to not restrict them to real locales (maybe there's a list of "official" locales used in Firefox I could restrict to?)
Firefox doesn't have official locales, as such. There are official localized builds and language packs, but there are also third-party language packs. Several of those locale codes have official or unofficial Firefox language packs, and there are likely other third-party language packs that I don't know about.

These are the ones we have on AMO:

ach af ak an ar as ast ast-ES az bb-BK be bg bn-BD bn-IN br bs ca
ca-valencia cs csb cy cy-GB da de dsb el en-GB en-ZA eo es-AR es-CL
es-ES es-MX et eu fa ff fi fj-FJ fr fur-IT fy-NL ga-IE gd gl gu-IN he
hi hi-IN hr hsb hu hy-AM id is it ja kk km kn ko ku lg lij lt lv mai mg
mk ml mr ms nb-NO nl nn-NO nr nso or pa-IN pl pt-BR pt-PT rm ro ru si
sk sl son sq sr ss st sv-SE sw sw-TZ ta ta-IN ta-LK te th tn tr ts uk
ve vi wa wo-SN xh zap-MX-diiste zh-CN zh-TW zu

Or, at least, the ones we have in the language pack index.
Ok, so seeing this, what do you recommend? Should we just restrict on the maximum length? That would be at least 13 then, so zap-MX-diiste fits?
I think my preference would be to replace _ with -, and then restrict it to /^[a-z]{2}[a-z-]{0,14}$/i (I don't have a strong opinion on the length, just something reasonable)
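
A quick sketch of that suggestion in Python, using the regex exactly as written above; the `normalize_locale` name and the choice to return `None` for rejected values are assumptions made for the example:

```python
import re

# The pattern suggested above: two letters, then up to 14 letters or hyphens.
LOCALE_RE = re.compile(r"^[a-z]{2}[a-z-]{0,14}$", re.IGNORECASE)


def normalize_locale(raw):
    """Replace '_' with '-' and return the locale, or None if it looks invalid."""
    candidate = raw.replace("_", "-")
    # fullmatch also rejects a trailing newline, which '$' alone would allow.
    return candidate if LOCALE_RE.fullmatch(candidate) else None


for raw in ["es_AR", "ach", "roa-ES-val", "zap-MX-diiste", "e", "en US"]:
    print(raw, "->", normalize_locale(raw))
```

With this pattern the three-letter codes listed earlier, `roa-ES-val` and `zap-MX-diiste` all pass, while anything longer than 16 characters or containing other characters is rejected.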
I agree with Kris that we should try to be lenient with locale codes. The other restrictions look okay, though they might need some adjustments in the future since there are probably some edge cases that affect developers.

It'd be useful to see how big a difference this is for some of the top add-ons. I'd like to get an idea of the impact before we push this. Also, I'm curious as to why this is part of this bug and not a separate thing.
Flags: needinfo?(jorge)
I don't know how we could get an idea of the impact before we push this... since we switched the stats to the "new system" (does it have a name?), we don't have the ability to test the changes prior to pushing them to prod, sadly. It's not possible to run the stats on -dev or stage because they lack data (addons, versions...).

I'll have a try at running this locally, connected with the read-only production database, and some code changes (to cope with the "read-only"-ness of the database), and see if I can get anything useful out of it.

To answer your second question (why this is part of this bug): I believe this bug is caused at least partly because we don't validate the data. This caused some very long invalid data to be parsed, and thus broke the stats system when trying to store the data in the database. This is why some addons have stats for the missing days (the ones that were before the bug occurred), and some don't.

I'd be happy to open a new bug for the data validation though, and move the code modifications to this new bug, if you think it's a better idea, please let me know!
(In reply to Mathieu Agopian [:magopian] from comment #21)
> To answer your second question (why this is part of this bug): I believe
> this bug is caused at least partly because we don't validate the data.

I see, thanks.

> I'd be happy to open a new bug for the data validation though, and move the
> code modifications to this new bug, if you think it's a better idea, please
> let me know!

No, it's okay, I just wanted to understand the connection.
It seems that the changes led to a massive drop in daily users. For AdBlock Plus, the count dropped from 21,500,000 users (2015-01-05) to 14,500,000 (2015-02-02).
Hello testit, the modifications didn't go live yet.
(In reply to Mathieu Agopian [:magopian] from comment #24)
> Hello testit, the modifications didn't go live yet.

Okay, thanks. But what is the reason for this user count drop? You can see this also for Flashblock: https://addons.mozilla.org/en-US/firefox/addon/flashblock/statistics/?last=30
The certificates for the servers that host the update service were revoked, so several hours of update pings failed, and therefore could not be counted. The stats should be back to normal by tomorrow.
Ok, so here are some numbers computed for the date 2015-01-10, for the top 10 add-ons by usage:

+----------+--------------------------------+----------------------------------------+-----------+-----------+-------+
| addon_id | slug                           | guid                                   | count     | new count | %     |
+----------+--------------------------------+----------------------------------------+-----------+-----------+-------+
|     8150 | default                        | {972ce4c6-7e08-4474-a285-3208198ce6fd} | 106896330 | 0         | 0     |
|   354399 | firefox-hotfix                 | firefox-hotfix@mozilla.org             |  94383496 | 90248264  | 95.61 |
|     1865 | adblock-plus                   | {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} |  19101159 | 19074236  | 99.8  |
|   504302 | mcafee-security-scan-plus      | {e4f94d1e-2f53-401e-8885-681602c0ddd8} |  14834842 | 14834827  | 99.99 |
|   570032 | skype-click-to-call            | {82AF8DCA-6DE9-405D-BD5E-43525BDAD38A} |  11496627 |  9522901  | 82.83 |
|     9449 | microsoft-net-framework-assist | {20a82645-c095-46ed-80e3-08825760534b} |  11101004 |   156259  |  1.40 |
|   473816 | avast-online-security          | wrc@avast.com                          |   7121993 |    36187  |       |
|     6973 | idm-cc                         | mozilla_cc@internetdownloadmanager.com |   6019992 |     3401  |       |
|     3006 | video-downloadhelper           | {b9db16a4-6edc-47ec-a1f4-b86292ed211d} |   5136602 |  5124934  | 99.77 |
|   417168 | english-gb-language-pack       | langpack-en-GB@firefox.mozilla.org     |   2249630 |  1957416  | 87.01 |
+----------+--------------------------------+----------------------------------------+-----------+-----------+-------+
10 rows in set (0,29 sec)

The last two columns I computed manually with numbers I got from running the new code (with data validation) locally.
A few remarks and questions:
1/ I believe the "default" addon is something that is packaged with firefox? Should it have a special treatment, or maybe we don't care about its stats? It has one version "0" in the database, and thus no version update requested match (they all look like firefox versions)
2/ for the firefox-hotfix addon: there are 4130595 requests that come out with the version "Invalid" from hive, from what I understand from the query (https://github.com/mozilla/olympia/blob/master/apps/stats/management/commands/update_counts_by_version_from_hive.py#L25), it seems to be because the request is malformed? Or should the hive query be changed?
3/ from what I could see on https://addons.mozilla.org, avast-online-security and idm-cc both are disabled? do we still care about the numbers?
4/ I checked the microsoft-net-framework-assist addon to understand why there was such a discrepancy: here is the "update_counts.versions" field in the database (so those are the different versions we get update requests for):

{"10.0": 9,
 "userEnabled": 322,
 "1.0": 2468281,
 "0.0.0": 8476085,
 "1.2": 8,
 "userDisabled": 48,
 "1.3.1": 156203,
 "1.3.0": 3,
 "1.2.2": 5,
 "1.2.1": 38,
 "1.1": 2}

Here are the versions that actually exist right now:

{u'1.1': 2,
 u'1.2': 8,
 u'1.2.1': 38,
 u'1.2.2': 5,
 u'1.3.0': 3,
 u'1.3.1': 156203}

As you can see, neither version "0.0.0" nor "1.0", which account for most of the users, exist (anymore?).
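
The "count" and "new count" figures for this add-on can be reproduced from the two dictionaries above. A small sketch of the arithmetic only, with variable names chosen for the example:

```python
# Versions we received update requests for, copied from the database field above.
requested = {
    "10.0": 9, "userEnabled": 322, "1.0": 2468281, "0.0.0": 8476085,
    "1.2": 8, "userDisabled": 48, "1.3.1": 156203, "1.3.0": 3,
    "1.2.2": 5, "1.2.1": 38, "1.1": 2,
}
# Versions that actually exist for the add-on right now.
existing = {"1.1", "1.2", "1.2.1", "1.2.2", "1.3.0", "1.3.1"}

total = sum(requested.values())
kept = {version: n for version, n in requested.items() if version in existing}
new_total = sum(kept.values())
share = 100.0 * new_total / total

print(total, new_total, share)
```

The totals come out as 11101004 and 156259, matching the table. Note that "userEnabled" and "userDisabled" show up as version keys here even though they look like status values.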

We need to discuss all this... and also the "be lax about locales", because I'm afraid if we don't restrict the data to a limited set of values, we'll keep having this issue (about the data being too big to fit in the database field) over and over again.
(In reply to Mathieu Agopian [:magopian] from comment #28)
> 1/ I believe the "default" addon is something that is packaged with firefox?
> Should it have a special treatment, or maybe we don't care about its
> stats? It has one version "0" in the database, and thus no version update
> requested match (they all look like firefox versions)

It's the default theme that ships with Firefox. I don't think we care about that stat.

> 2/ for the firefox-hotfix addon: there are 4130595 requests that come out
> with the version "Invalid" from hive, from what I understand from the query
> (https://github.com/mozilla/olympia/blob/master/apps/stats/management/
> commands/update_counts_by_version_from_hive.py#L25), it seems to be because
> the request is malformed? Or should the hive query be changed?

I don't know, but the number is too large to ignore. It should be investigated.

> 3/ from what I could see on https://addons.mozilla.org,
> avast-online-security and idm-cc both are disabled? do we still care about
> the numbers?

Yes, we still care about disabled add-ons.

> 4/ I checked the microsoft-net-framework-assist addon to understand why
> there was such a discrepancy: here is the "update_counts.versions" field in
> the database (so those are the different versions we get update requests
> for):

> As you can see, neither version "0.0.0" nor "1.0", which account for most
> of the users, exist (anymore?).

I think we should still care about versions that are not on AMO. It'd be nice to eventually be able to filter them out in the dashboard, but I think in general they are valid versions that developers will want to know about.
 
> We need to discuss all this... and also the "be lax about locales", because
> I'm afraid if we don't restrict the data to a limited set of values, we'll
> keep having this issue (about the data being too big to fit in the database
> field) over and over again.

It's fine to limit the length. 128 chars? That should cover everything that is valid.
We've discussed this further on IRC with Jorge, and he had very good suggestions:
1/ don't restrict on existing versions, or existing locales
2/ limit the maximum size for the version/locale (for example 20 for locales, and 32 for versions?)
3/ if the final data is too big to fit in the database field, drop the less used versions/locales until it fits

This might result in quite some code changes, and I need to make sure I can come up with a solution that computes fast enough to still be acceptable.
So, regarding the default theme and .NET assistant, most versions have never been on AMO. The default theme is always distributed with Firefox, and the .NET framework assistant is crapware that Microsoft installed automatically with some Windows updates.

Regarding the hotfix add-on, those pings are special. The pings get sent whether the add-on is installed or not, so for most queries the version string is empty. I think ignoring these pings for stats purposes is appropriate. I'm not sure why the query is written to return 'Invalid' rather than the empty string, but *shrug*

And speaking of that query... that query... wow...
The suggestions in comment 30 are good, but I'd like to talk about timelines and priorities.  Mathieu - can you give us a rough estimate of how long it would take and we can weigh that against the add-on signing bugs?  Jorge, Kris, I'm interested in how you'd prioritize the projects also.  (We can move this discussion to email if we want).
I think having reliably working stats is a very high priority. I'm less concerned about having an ideal fix in the short term, as long as we have something that works.
I agree with Kris that we need reliably working stats as soon as possible (backfilling the stats is still possible later, but it's a major pain, and should be avoided as much as possible).

I believe and hope I'll be able to come up with a reliable and good enough solution today.
Ok, so I've come up with a new version of the PR https://github.com/mozilla/olympia/pull/439/. The meat of the code additions in this revision is at https://github.com/mozilla/olympia/pull/439/files#diff-135dc53fbf6376ba3c7e4eeed163a67bR190.

This is the place where we trim the dictionaries to only keep the most used items (with the highest counts) if we can't fit the data in the database. Thanks again Jorge for the idea.
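
The trimming idea can be sketched roughly as follows. This is an illustration, not the code from the PR: it assumes the dictionary is stored as JSON text, and the `trim_to_fit` name is made up for the example. The 65535-byte limit is the maximum size of a MySQL TEXT column:

```python
import json

MAX_BYTES = 65535  # maximum size of a MySQL TEXT column, in bytes


def trim_to_fit(counts, max_bytes=MAX_BYTES):
    """Drop the least-used keys until the JSON encoding fits in max_bytes."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    while ordered and len(json.dumps(dict(ordered)).encode("utf-8")) > max_bytes:
        ordered.pop()  # the last item has the lowest count
    return dict(ordered)


versions = {"1.3.1": 156203, "1.0": 2468281, "x" * 30: 1}
print(trim_to_fit(versions, max_bytes=40))
print(trim_to_fit(versions, max_bytes=20))
```

Re-serializing after every drop is the simplest way to be sure the result fits, but it is slow when there are many keys; a faster version would keep a running byte count instead.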

With the PR at this point, I've tested that:
1/ the numbers remain the same for all the stats (we're not changing the way we count, and we're not restricting on valid/existing versions)
2/ the data is better validated for fields that we can validate easily: OSes, applications and their versions, status
3/ we do some light validation on the locale, just to weed out the ones that are really invalid
4/ the length of each version string is limited to 32 chars
5/ the issue where we would have the stats processing interrupted because of http://sentry.mktmon.services.phx1.mozilla.com/mkt/addonsmozillaorg/group/13193/ or http://sentry.mktmon.services.phx1.mozilla.com/mkt/addonsmozillaorg/group/13220/ is now fixed
6/ the issue where we would have the stats processing interrupted because the data was too big to fit in the database  is now fixed

I'd love a review before the code freeze (which is tomorrow), to have this included in next week's push.
We landed the patch.  Needinfoing Jorge so he can communicate this however he wants (blog post or something?).  Thanks all.
Flags: needinfo?(jorge)
Fixed in https://github.com/mozilla/olympia/commit/536cfd84fa7e94e4ed532ff7f45c9b25605ccbdb

Regarding the communication: total "daily users" won't change. The only visible changes will be for the drill-down (per status, per application and application version, per OS, per locale): those should be "cleaner", we shouldn't see garbage in those anymore.

Also the "per versions" page will now display only the first 32 chars of the version (if it's longer).

This last point means that if an add-on uses very long version names, then it might end up showing no drill-down. E.g. "my super long version which is 3.1" and "my super long version which is 3.9" will both be merged together and displayed as one.

If this is a concern, we can increase this number.
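
To illustrate the merging effect: a minimal sketch, assuming the counts of versions that share the same first 32 characters are simply added together. The `truncate_keys` helper and the sample counts are made up for the example:

```python
MAX_VERSION_LENGTH = 32


def truncate_keys(counts, max_length=MAX_VERSION_LENGTH):
    """Cut each key to max_length chars, summing counts of keys that collide."""
    merged = {}
    for key, count in counts.items():
        short = key[:max_length]
        merged[short] = merged.get(short, 0) + count
    return merged


versions = {
    "my super long version which is 3.1": 7,
    "my super long version which is 3.9": 5,
    "1.3.1": 4,
}
merged = truncate_keys(versions)
print(merged)
```

Both long names are 34 characters, so they collapse into a single 32-character key, while short versions such as "1.3.1" are unaffected.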
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: 2015-01 → 2015-02
I'm needinfoing Jason because once the code is pushed to prod, it'll need the backfilling done again for the missing days.
Flags: needinfo?(jthomas)
I'll write a short heads up for the blog. When is this push happening?
Flags: needinfo?(jorge)
It should be next Wednesday, February 11th
(man I wish I had found this bug sooner...)

I had another suggestion, filter out any blocklisted add-ons/versions. For instance: https://addons.mozilla.org/en-US/firefox/addon/the-fox-only-better/statistics/usage/versions/?last=30

There are over 3400 blocklisted versions of this add-on alone. And that's for these last days where most of the stats are missing, go back to the Dec 29 - Jan 11 timeframe, which has all the data, and that number is over 6000! Blocklisted versions are blocked for a reason, and I don't see how showing them in the dashboards at all helps anyone (other than justifying the need for the new add-on signing system). Not accounting these versions would at least cover some of the "too big to fit in db" issues. Plus, it takes quite a while to load/compute the dashboards sometimes when they're not yet cached.

I mentioned this aspect in the addons-user-experience mailing list some months ago, and at the time of course the efforts were to put the signing system in place first and then look into it from there, integrating the stats with the current blocklist system would/could be understandably messy. But maybe it could have helped with this issue, at least a little. So I thought I'd mention it here again, keep its spark alive in everyone's mind for the future, so-to-speak.
Thanks for the heads up. I'm not sure it would help with the "too big to fit in db" issues, because I'm pretty sure those are because of malicious/forged requests with buffer overflow attempts and the like. 64k is quite a lot for the common case, where we list the versions that are requested (there's something like 900 different versions at most for a single addon), with their count (the number of times the version has been requested).

So the number of requests for blocklisted addons isn't a big deal I believe. That being said, it could indeed make sense to filter them out for perf reason on the client side. Not sure it would make a big change, but still.

If you believe this is something we should do, please open a new bug for that matter, as I don't think it's linked with the issues we have in this one.
We can't filter out pings for blocked add-ons on the client side. They need to send update pings in case updates to non-blocked versions are available. We could probably do it for blocks with '*' as the maximum version, but I doubt that it's worth the effort, and those stats can be useful in any case.
(In reply to Mathieu Agopian [:magopian] from comment #40)
> It should be next Wednesday, February 11th

Are your changes live now? Until now I can't see download/user numbers for 2015-02-11 (https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/?last=30).
Yes, the changes are live now. I see data for 2015-02-11.

We are currently backfilling previous days data. I estimate that it will be completed within the next 24 hours.
Flags: needinfo?(jthomas)
(In reply to Jason Thomas [:jason] from comment #45)
> Yes, the changes are live now. I see data for 2015-02-11.
> 
> We are currently backfilling previous days data. I estimate that it will be
> completed within the next 24 hours.

Okay, thanks. I deleted the browser cache, now I can see the data too.
Stats have been backfilled. Let us know if you see any issues.
Something is still wrong with the stats by application, I don't know if it's global but in my addon it's clear: https://addons.mozilla.org/addon/mouse-gestures-suite/statistics/usage/applications/?last=30 - for example, stats for Firefox 35.0 show values around 30 for a few days, then above 500 for two days, then again back to 30's until now - this can't be right. Numbers for SeaMonkey 2.32 also look abnormal. Some stats are clearly missing for certain days and they don't add up even remotely to the totals reported in the global Daily Users stats.
Depends on: 1133543
(In reply to lemon_juice from comment #49)
> Something is still wrong with the stats by application, I don't know if it's
> global but in my addon it's clear:
> https://addons.mozilla.org/addon/mouse-gestures-suite/statistics/usage/
> applications/?last=30 - for example, stats for Firefox 35.0 show values
> around 30 for a few days, then above 500 for two days, then again back to
> 30's until now - this can't be right. Numbers for SeaMonkey 2.32 also look
> abnormal. Some stats are clearly missing for certain days and they don't add
> up even remotely to the totals reported in the global Daily Users stats.

Thanks, I filed bug 1133543 as a followup.
The stats for my add-on, Toolbar Buttons, dropped down to 0 for today.  So maybe there is still something wrong?  https://addons.mozilla.org/en-US/firefox/addon/toolbar-buttons/statistics/?last=30
See bug 1133543. There's another fix that went live today. Stats for the 10th will need to be backfilled.
I am not sure if that is what I am seeing.  The bug mentions the "by Application" chart.  This is on the overview page (and every other).  Also it talks about backfilled data, but this affects March 10th and now 12th, which looks too new for that bug to apply.
Hello Michael,

could you please post a screenshot of the issue you're seeing? Also, if the stats for this addon are public, can you please post the link too?

Before that, please make sure that you've got the most up to date data from the server by clearing the localstorage on your browser for this page (it'll grab the data back from the server again).

To do that, open a js console (tools>web dev>js console, or something like that, I don't have an english localized browser), then type in: "localStorage.clear()" without the quotes, and respecting the uppercase "S".

Thanks!
(In reply to Mathieu Agopian [:magopian] from comment #54)

> could you please post a screenshot of the issue you're seeing? Also, if the
> stats for this addon are public, can you please post the link too?

You can see the drop e.g. here: https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/?last=30
As asked, a screenshot showing the drop to 0.  There is a link in my previous comment, and the other poster has given a link to Adblock Plus in the previous comment.
The extension is the featured extension for the month, so I have been obsessively checking my stats ;)
It seems that the script didn't run for the "update counts" on that particular day. Jason, could you please try a backfill and see what happens?

In any case, we'll need to do some backfill for bug 1133543, so I guess this could be done at the same time, what do you think?
Flags: needinfo?(jthomas)
03-11-2015 and 03-15-2015 have been backfilled. No errors on execution.
Flags: needinfo?(jthomas)
Remember to clear the localStorage to get the updated data from the server (on data you've already seen previously): in the javascript console: "localStorage.clear()"

Thanks Jason for the backfilling, from what I can see on adblockplus and toolbar-buttons, the stats are back to normal (no more dip).
> Thanks Jason for the backfilling, from what I can see on adblockplus and
> toolbar-buttons, the stats are back to normal (no more dip).

The previous dips are filled now, but there is a new dip for 2015-03-17: https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/?last=30
This is related to the backfill operations we are performing in bug 1133543. 2015-03-17 will be backfilled shortly.
Backfilled 2015-04-01, 2015-04-12 and 2015-04-13. I've added additional monitoring for stats and adjusted cron run times for stats related tasks https://github.com/mozilla/olympia/commit/90d9bd0e64f7022d62b6a737641841b48a02ed46
Product: addons.mozilla.org → addons.mozilla.org Graveyard