Create a script to parse AMO logs and update add-on counts

RESOLVED FIXED in 3.1

Status

addons.mozilla.org Graveyard
Maintenance Scripts
--
blocker
RESOLVED FIXED
10 years ago
a year ago

People

(Reporter: morgamic, Assigned: sancus)

Tracking

Details

(URL)

Attachments

(5 attachments, 2 obsolete attachments)

(Reporter)

Description

10 years ago
Currently, install counts are logged in real time, which is a drain on resources.  Parsing logs nightly to update weekly or daily counts would be more useful.

This method will be deprecated in order to save resources and make the app scale better during peak load times.

We need two things out of this script:
a) count unique logs hitting the downloads controller to generate weekly counts for each add-on, storing the information in addons.weeklydownloads
b) the same script should update addons.totaldownloads as appropriate
(Reporter)

Updated

10 years ago
Target Milestone: --- → 3.1
(Reporter)

Updated

10 years ago
Blocks: 384085
(Reporter)

Comment 1

10 years ago
Andrei said he's working on it...
Assignee: nobody → sancus

Updated

10 years ago
Duplicate of this bug: 386851

Updated

10 years ago
Severity: normal → major

Updated

10 years ago
Duplicate of this bug: 386858

Updated

10 years ago
Severity: major → blocker

Comment 4

10 years ago
Created attachment 272096 [details]
Parses Addons logs and inserts them into the downloads table.

Parses Addons logs and inserts them into the downloads table.

You will also need this table:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `logname` varchar(300) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `logname` (`logname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Attachment #272096 - Flags: review?(morgamic)
It seems like the downloads table is kind of an unnecessary middle-man to me.

We're filling it with data, only to delete it after 8 days (via the maintenance script).  As far as I know, the only thing we use the table for is to calculate total downloads and store in the addons table.  Why not just go directly from logs -> addons table, and save the logs for 8 days?

My motivation for the suggestion is that the database dumps are becoming too large to import on khan-vm.  I'll write a script to strip the downloads/logs table before I import it if I have to, but I'd rather have an exact mirror of the production db.
See also: bug 335569, having to do with never getting rid of data
(Reporter)

Comment 7

10 years ago
Parsing 8 days of logs is not efficient.  A day takes about 15 minutes to get parsed.

An alternative would be to store per-day counts in a separate table, which would get us trends over time for later if we chose to offer graphs of d/l over time to users.  Then the parser could just worry about updating that table 1 day at a time and a maint script could just sum 7 records per add-on.  How's that sound?

Comment 8

10 years ago
(In reply to comment #7)
> An alternative would be to store per-day counts in a separate table, which
> would get us trends over time for later if we chose to offer graphs of d/l over
> time to users.  Then the parser could just worry about updating that table 1
> day at a time and a maint script could just sum 7 records per add-on.  How's
> that sound?
> 

It sounds alright to me.  Is this going to be the never-purged table from bug 335569?
(Reporter)

Comment 9

10 years ago
Would make sense to use the same method.  I think that sounds sane.  I'll post a table structure there.

One thing we didn't talk about was dupes.  Andrei, what do you have in mind about avoiding duplicates?  Could you post the PHP version so we can compare?  Do you think the Perl script would work?
(Assignee)

Comment 10

10 years ago
The script that I've written just adjusts the download counts in the addons table directly. Although I was considering doing it this way, it seems to me that it's better to just do it directly. Is there a reason not to do that?
(Assignee)

Comment 11

10 years ago
Created attachment 272846 [details] [diff] [review]
download count log parser

This script parses the log, increments the download counts and "blacklists" IPs for 30 seconds to eliminate duplication of counting. Note that it does not deal with picking the log files to parse.
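
A minimal sketch of the 30-second IP "blacklist" idea, written in Python rather than the PHP of the attachment. The function name, the tuple layout of a hit, and the choice not to extend the window when a blocked hit arrives are all assumptions for illustration; the attached script may handle these details differently.

```python
from datetime import datetime, timedelta

# Window length taken from the comment above; everything else is illustrative.
WINDOW = timedelta(seconds=30)


def count_downloads(hits, window=WINDOW):
    """Count downloads per file, ignoring repeat hits from a blocked IP.

    hits: iterable of (timestamp, ip, file_id) tuples in log order.
    Returns a dict mapping file_id -> number of counted downloads.
    """
    blocked_until = {}  # ip -> time at which the IP may be counted again
    counts = {}
    for ts, ip, file_id in hits:
        # Skip the hit if this IP was counted less than `window` ago.
        if ip in blocked_until and ts < blocked_until[ip]:
            continue
        # Count the hit and start a new block window for this IP
        # (assumption: blocked hits do not extend the window).
        blocked_until[ip] = ts + window
        counts[file_id] = counts.get(file_id, 0) + 1
    return counts
```

Keying the block on the IP alone follows the wording above. It means one IP downloading two different add-ons within 30 seconds is counted once, which may or may not be what the real script does.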

I think it'd work fine to have a shell script grep a log file for "downloads/file" and then call this script on it -- that way the ~1GB log files become about 15-20MB before this script runs on them.
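
The pre-filtering step could look roughly like this in Python instead of a shell grep. The "downloads/file" marker comes from the comment above; the function name and the assumption that the logs are gzip-compressed text are illustrative only.

```python
import gzip

# Substring that identifies download requests, as named in the comment above.
MARKER = "downloads/file"


def filter_gzip_log(path, marker=MARKER):
    """Yield only the lines of a gzip-compressed log that contain `marker`.

    Lines are streamed one at a time, so a ~1GB log never has to fit in memory.
    """
    # errors="replace" keeps a stray bad byte from aborting the whole run.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if marker in line:
                yield line
```

A plain substring match behaves like `grep -F`, so it can let through a few lines that are not real downloads (for example a referrer that contains the marker). The parser still has to validate each line.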

Also, this script doesn't keep a record in the db of which log files have been parsed. That's actually a good idea that I'll add to help avoid reparsing the same log file, although there is still an issue if parsing dies partway through a log file.

The script also prints to stdout if updating a count *failed* because no addon could be found for that file. This should never really happen on prod, but in case it does...

Comment 12

10 years ago
From what I understand from morgamic, Andrei is fixing the two issues in his comment above and this will be fixed tomorrow (Thursday) at the latest. Is that correct Andrei?
(Assignee)

Comment 13

10 years ago
Created attachment 274023 [details] [diff] [review]
updated download counter

I've updated the PHP log parser to automatically parse any access_*.gz log files in a given directory, as the Perl one does, as well as keep track of what log files have been parsed.
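
A rough sketch of the "keep track of parsed log files" bookkeeping, using Python with an in-memory SQLite database as a stand-in for the MySQL logs_parsed table. The helper names are hypothetical, and the column names are an assumption (this bug shows both `logname` and `name` for that column).

```python
import sqlite3


def open_db():
    """Create an in-memory stand-in for the logs_parsed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs_parsed ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL UNIQUE,"
        " done INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def unparsed(conn, names):
    """Return the log file names that are not yet recorded as done, sorted."""
    seen = {row[0] for row in conn.execute(
        "SELECT name FROM logs_parsed WHERE done = 1")}
    return [n for n in sorted(names) if n not in seen]


def mark_parsed(conn, name):
    """Record a log file as fully parsed; calling it twice is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO logs_parsed (name, done) VALUES (?, 1)",
        (name,),
    )
    conn.commit()
```

Calling `mark_parsed` only after a file has been processed to the end keeps a crashed run from being recorded as done. It does not by itself undo counts already added from a half-parsed file.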

The script still modifies download counts directly, and really ends the need for the downloads table at all. It also still blocks IPs from being counted for 30s windows, etc.
Attachment #272846 - Attachment is obsolete: true
Attachment #274023 - Flags: review?(morgamic)
(Reporter)

Comment 14

10 years ago
Have you tried this with a full day of logs?  How long does it take to parse a single day?
(Assignee)

Comment 15

10 years ago
I've tested it on a VM, and it takes about 3 hours for a day's worth of logs. Most of that time is the database queries, so I would think it'll be significantly faster on the cluster.

Comment 16

10 years ago
From what I can find by searching Bugzilla, this seems to be the right bug to report on.

The download counter on my themes has been stuck on my stuff for weeks and weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.

Two questions

1. Why does the present figure never move? i.e. go from 120 to 121 or something.

2. When this all gets resolved, what happens to the total download figures? Will they include all the download figures since this side broke down? i.e. true total figures from the start of this year, for instance.

Comment 17

10 years ago
(In reply to comment #16)
> The download counter on my themes has been stuck on my stuff for weeks and
> weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.

See the AMO web development blog:

http://blog.mozilla.com/webdev/2007/06/30/download-counts-halted/
(Reporter)

Comment 18

10 years ago
Jeremy - will exec() work on any PHP install?  Doesn't redhat turn that off in its RPM?

Comment 19

10 years ago
Looks like safe_mode is off by default so you should be able to use exec().
(Reporter)

Comment 20

10 years ago
Comment on attachment 274023 [details] [diff] [review]
updated download counter

Hey Andrei, I think this looks great but we need to have some of those DEFINE's be arguments that can be passed from CRON:
* LOGDIR
* TEMPDIR
Attachment #274023 - Flags: review?(morgamic) → review-
(Assignee)

Comment 21

10 years ago
Created attachment 274675 [details] [diff] [review]
download counter with arguments

Okay, I changed the constants to required arguments, and I also fixed a small bug where a couple of non-download-related lines were getting through the grep when they shouldn't have been.
Attachment #274675 - Flags: review?(morgamic)
(Assignee)

Updated

10 years ago
Attachment #274023 - Attachment is obsolete: true
(Reporter)

Comment 22

10 years ago
Comment on attachment 274675 [details] [diff] [review]
download counter with arguments

Sweet -- looks good.  Jeremy, what's your schedule like?  Can we work on setting up an initial run tomorrow?
Attachment #274675 - Flags: review?(morgamic) → review+

Comment 23

10 years ago
I'm up for it.

Comment 24

10 years ago
I have created this table in the add-ons database:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(255) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Comment 25

10 years ago
I am adding the following MySQL grants:

GRANT SELECT ON `addons_remora`.`versions`
GRANT SELECT ON `addons_remora`.`files`
GRANT UPDATE ON `addons_remora`.`addons`
GRANT SELECT, INSERT, UPDATE ON `addons_remora`.`logs_parsed`

(Reporter)

Comment 26

10 years ago
That should work, don't see any other tables hit by the statements.

Comment 27

10 years ago
not to be a pest but... is there an ETA for when this might go live?  Thx.
(Reporter)

Comment 28

10 years ago
Hey Jon, total downloads are currently being updated.  Work is being done on:
a) backfilling June 27th -> Aug 9th from backed up logs
b) restoring weekly counts

We have an IT bug filed for the backlogs, and ETA on that is by Friday.  Andrei -- how is the weekly count patch coming?  What is the ETA for a patch?
(Assignee)

Comment 29

10 years ago
I'm going to have a patch for weekly counts tomorrow/thursday at the latest. I'll try for sooner, though.

This will include a new table that'll hold daily counts for every add-on, replacing the old downloads table completely. We won't store each individual download anymore.

The counting script will update the daily counts in the database, and then the maintenance script will produce weekly counts from those totals.
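
To illustrate the split described here, a small Python sketch with SQLite standing in for MySQL. The table name `download_counts`, its columns, and the function names are hypothetical; only `addons.weeklydownloads` is taken from this bug. The 7-day window including "today" is also an assumption.

```python
import sqlite3
from datetime import date, timedelta


def make_db():
    """In-memory stand-in: a per-day count table plus a minimal addons table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE download_counts ("
        " addon_id INTEGER NOT NULL,"
        " day TEXT NOT NULL,"
        " downloads INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (addon_id, day))"
    )
    conn.execute(
        "CREATE TABLE addons ("
        " id INTEGER PRIMARY KEY,"
        " weeklydownloads INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_daily(conn, addon_id, day, n):
    """Counting script side: add n downloads to one add-on's row for one day."""
    cur = conn.execute(
        "UPDATE download_counts SET downloads = downloads + ?"
        " WHERE addon_id = ? AND day = ?",
        (n, addon_id, day.isoformat()),
    )
    if cur.rowcount == 0:  # no row for that day yet, so create it
        conn.execute(
            "INSERT INTO download_counts (addon_id, day, downloads)"
            " VALUES (?, ?, ?)",
            (addon_id, day.isoformat(), n),
        )


def update_weekly(conn, today):
    """Maintenance script side: sum the last 7 daily rows per add-on."""
    start = (today - timedelta(days=6)).isoformat()
    conn.execute(
        "UPDATE addons SET weeklydownloads = ("
        " SELECT COALESCE(SUM(downloads), 0) FROM download_counts"
        " WHERE download_counts.addon_id = addons.id"
        " AND day BETWEEN ? AND ?)",
        (start, today.isoformat()),
    )
    conn.commit()
```

Storing dates as ISO strings lets `BETWEEN` compare them correctly as text. The weekly update touches at most seven small rows per add-on instead of every individual download.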
(Assignee)

Comment 30

10 years ago
Created attachment 276716 [details] [diff] [review]
weekly counting patch
(Assignee)

Comment 31

10 years ago
Created attachment 276717 [details]
daily download counts table

I've uploaded the patch to the counter script and the new count table. The only thing remaining is to change the maintenance script to update weekly counts from the new table.

Updated

10 years ago
Duplicate of this bug: 392451
(Assignee)

Comment 33

10 years ago
Created attachment 277749 [details] [diff] [review]
maintenance.php weekly download patch

This fixes the weekly count updating in the maintenance script. Total downloads are handled by the parser.
(Assignee)

Updated

10 years ago
Attachment #277749 - Flags: review?(morgamic)
(Reporter)

Comment 34

10 years ago
Comment on attachment 277749 [details] [diff] [review]
maintenance.php weekly download patch

Wil, can you take a look and test with a log from stats?
Attachment #277749 - Flags: review?(morgamic) → review?(clouserw)

Updated

10 years ago
Attachment #277749 - Flags: review?(clouserw) → review+

Comment 35

10 years ago
Any word yet on when we'll have the weekly counts?  It's been two months ...

And what about this (from http://blog.mozilla.com/webdev/category/amo):

"An improved API for aggregating add-on statistics and integrating it into your blog or external sites"

Is that still in the works?
(Reporter)

Comment 36

10 years ago
This was pushed in the latest update (last Saturday).  Weekly counts are working along with total counts.  Thanks to Sancus and Oremj for their hard work.  :)
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Comment 37

10 years ago
Have the backlogs been processed as well (see comment 28)?

Comment 38

10 years ago
w00t!  Thanks, guys!

Comment 39

10 years ago
Weekly and total counts are back, but the rating fields are not updating.  Was that fix supposed to be included in this bug?

Comment 40

10 years ago
It's still not working in sandbox is it?

Comment 41

10 years ago
I think that it really doesn't work in sandbox. I didn't have a single download in over two weeks (which is improbable, but not impossible). Since my extension is public, everything works great! Thanks a lot for fixing!!
(Reporter)

Updated

10 years ago
Attachment #272096 - Flags: review?(morgamic)

Comment 42

10 years ago
Despite this being marked as RESOLVED FIXED, I do not feel it is.

Ever since this bug was claimed to have been fixed, the Weekly Download figures for Themes have been showing at 10% of the true figure. This has been confirmed with other themers. The Weekly Download figures for Extensions, however, appear about right.

As this 10% figure has been applied across the board with Themes, it has no effect on the relative positions of them, in terms of 'most popular'. However, it has made a nonsense of the Total Download figures for all the Themes. 

None of this is supposition on my part, as I have screenshot after screenshot going back to Jan 2006 that will confirm all this.

I would ask, therefore, that either this bug is fixed or else that AMO confirm here that this 10% is the position, for whatever reason. Otherwise, at some time in the future, I suspect that someone at Mozilla will be waving about a totally meaningless set of figures to support a position that third party Firefox Themes are not in great demand by Firefox users.

Thanks :) 

Comment 43

10 years ago
The download count for our extension is also way too low, quite plausibly around 10% of the true figure. Both our own click count (which redirects to AMO) and the number of new registrations (which is a subset of the number of new downloads) are several times larger than the download count reported by AMO.

Comment 44

10 years ago
Same thing happening to me.  My extension presents a welcome page after installation is complete.  I have a counter on that page that indicates over 42,000 page loads for the last seven days (the page is only loaded once per install).  Dev cpanel recorded only about 5200 downloads during the same period.  While I don't expect an exact one-to-one correlation between the two, the numbers should be *much* closer than they are.
I'm trying to clean up the download counter bugs, several of which have comments on this issue. I'm considering bug 402796 the bug for figuring out if there's a problem with the counts, so please post further comments there.
Product: addons.mozilla.org → addons.mozilla.org Graveyard