Closed Bug 384084 Opened 17 years ago Closed 17 years ago

Create a script to parse AMO logs and update add-on counts

Categories

(addons.mozilla.org Graveyard :: Maintenance Scripts, defect)

Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: sancus)

Attachments

(5 files, 2 obsolete files)

Currently install counts are logged real-time which is a drain on resources.  Parsing logs nightly to update weekly or daily counts would be more useful.

This method will be deprecated in order to save resources and make the app scale better during peak load times.

We need two things out of this script:
a) count unique logs hitting the downloads controller to generate weekly counts for each add-on, storing the information in addons.weeklydownloads
b) the same script should update addons.totaldownloads as appropriate
Target Milestone: --- → 3.1
Blocks: 384085
Andrei said he's working on it...
Assignee: nobody → sancus
Severity: normal → major
Severity: major → blocker
Parses Addons logs and inserts them into the downloads table.

You will also need this table:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `logname` varchar(300) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `logname` (`logname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Attachment #272096 - Flags: review?(morgamic)
It seems like the downloads table is kind of an unnecessary middle-man to me.

We're filling it with data, only to delete it after 8 days (via the maintenance script).  As far as I know, the only thing we use the table for is to calculate total downloads and store in the addons table.  Why not just go directly from logs -> addons table, and save the logs for 8 days?

My motivation for the suggestion is that the database dumps are becoming too large to import on khan-vm.  I'll write a script to strip the downloads/logs table before I import it if I have to, but I'd rather have an exact mirror of the production db.
See also: bug 335569, having to do with never getting rid of data
Parsing 8 days of logs is not efficient.  A day takes about 15 minutes to get parsed.

An alternative would be to store per-day counts in a separate table, which would get us trends over time for later if we chose to offer graphs of d/l over time to users.  Then the parser could just worry about updating that table 1 day at a time and a maint script could just sum 7 records per add-on.  How's that sound?
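As a rough illustration of the "sum 7 records per add-on" idea, here is a minimal Python sketch. The row layout `(addon_id, day, count)` and the function name `weekly_counts` are assumptions made for the example; no table structure has been agreed at this point in the bug.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_counts(daily_rows, today):
    """Sum the per-day counts of the 7 days before `today` for each add-on.

    daily_rows: iterable of (addon_id, day, count) tuples (hypothetical layout).
    today: date of the maintenance run; the window is [today - 7 days, today).
    """
    start = today - timedelta(days=7)
    totals = defaultdict(int)
    for addon_id, day, count in daily_rows:
        if start <= day < today:
            totals[addon_id] += count
    return dict(totals)


rows = [
    (1, date(2007, 8, 1), 10),
    (1, date(2007, 8, 7), 5),
    (1, date(2007, 7, 20), 99),  # older than 7 days, so it is ignored
    (2, date(2007, 8, 3), 7),
]
weekly = weekly_counts(rows, date(2007, 8, 8))
```

With this split, the parser only ever touches one day's row per add-on, and the weekly figure is cheap to recompute.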
(In reply to comment #7)
> An alternative would be to store per-day counts in a separate table, which
> would get us trends over time for later if we chose to offer graphs of d/l over
> time to users.  Then the parser could just worry about updating that table 1
> day at a time and a maint script could just sum 7 records per add-on.  How's
> that sound?
> 

It sounds alright to me.  Is this going to be the never-purged table from bug 335569?
Would make sense to use the same method.  I think that sounds sane.  I'll post a table structure there.

One thing we didn't talk about was dupes.  Andrei, what do you have in mind about avoiding duplicates?  Could you post the PHP version so we can compare?  Do you think the Perl script would work?
The script that I've written just adjusts the download counts in the addons table directly. I was considering doing it this way, but it seems to me that it's better to just do it directly. Is there a reason not to do that?
Attached patch download count log parser (obsolete) — Splinter Review
This script parses the log, increments the download counts and "blacklists" IPs for 30 seconds to eliminate duplication of counting. Note that it does not deal with picking the log files to parse.
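The 30-second blacklist can be sketched roughly as follows. This is an illustration in Python, not the attached patch: the tuple layout, the function name `count_downloads`, and the choice to key the window on the IP alone (rather than IP plus file) are all assumptions.

```python
WINDOW_SECONDS = 30


def count_downloads(hits, window=WINDOW_SECONDS):
    """Count hits per file, ignoring an IP for `window` seconds after a counted hit.

    hits: iterable of (timestamp_seconds, ip, file_id), sorted by timestamp
    (hypothetical layout, as it might come out of a parsed access log).
    """
    blocked_until = {}  # ip -> earliest time at which it may be counted again
    counts = {}
    for ts, ip, file_id in hits:
        if ts < blocked_until.get(ip, 0):
            continue  # same IP inside its window: treat as a duplicate
        blocked_until[ip] = ts + window
        counts[file_id] = counts.get(file_id, 0) + 1
    return counts


hits = [
    (0, "192.0.2.1", 5),
    (10, "192.0.2.1", 5),   # within 30s of the first hit, skipped
    (12, "192.0.2.2", 5),   # different IP, counted
    (31, "192.0.2.1", 5),   # window has expired, counted again
]
counts = count_downloads(hits)
```

Because log lines arrive in time order, a single dictionary of expiry times is enough; nothing needs to be stored per download.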

I think it'd work fine to have a shell script grep a log file for "downloads/file" and then call this script on it -- that way the ~1GB log files become about 15-20MB before this script runs on them.

Also, this script doesn't keep a record in the db of which log files have been parsed. That's actually a good idea that I'll add to help avoid reparsing the same log file, although there is still an issue if parsing dies partway through a log file.

The script also prints to stdout if updating a count *failed* because no addon could be found for that file. This should never really happen on prod, but in case it does...
From what I understand from morgamic, Andrei is fixing the two issues in his comment above and this will be fixed tomorrow (Thursday) at the latest. Is that correct Andrei?
Attached patch updated download counter (obsolete) — Splinter Review
I've updated the php log parser to automatically parse any access_*.gz log files in a given directory, as the perl one does, as well as keep track of what log files have been parsed.

The script still modifies download counts directly, and really ends the need for the downloads table at all. It also still blocks IPs from being counted for 30s windows, etc.
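For readers following along, the file-selection step might look something like this Python sketch. It is not the attached PHP: the function names are made up, and the set of already-parsed names merely stands in for a lookup against the `logs_parsed` table. The `downloads/file` filter mirrors the grep suggested earlier in this bug.

```python
import glob
import gzip
import os
import tempfile


def unparsed_logs(log_dir, already_parsed):
    """Return access_*.gz paths in log_dir whose file name is not yet recorded."""
    paths = sorted(glob.glob(os.path.join(log_dir, "access_*.gz")))
    return [p for p in paths if os.path.basename(p) not in already_parsed]


def download_lines(path, needle="downloads/file"):
    """Yield only the log lines that mention the downloads controller."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line:
                yield line.rstrip("\n")


# Small self-contained demo using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("access_1.gz", "access_2.gz"):
        with gzip.open(os.path.join(tmp, name), "wt", encoding="utf-8") as fh:
            fh.write("GET /downloads/file/123/a.xpi\nGET /index.html\n")
    pending = [os.path.basename(p) for p in unparsed_logs(tmp, {"access_1.gz"})]
    lines = list(download_lines(os.path.join(tmp, "access_2.gz")))
```

Recording a file name only after the whole file has been processed still leaves the partial-parse problem mentioned above; this sketch does not address it.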
Attachment #272846 - Attachment is obsolete: true
Attachment #274023 - Flags: review?(morgamic)
Have you tried this with a full day of logs?  How long does it take to parse a single day?
I've tested it on a VM, and it takes about 3 hours for a day's worth of logs. Most of that time is the database queries, so I would think it'll be significantly faster on the cluster.
From what I can find by searching Bugzilla, this seems to be the right bug to report on.

The download counter on my themes has been stuck for weeks and weeks now, i.e. showing 120 downloads per week instead of 200,000 - 400,000 per week.

Two questions

1. Why does the present figure never move? i.e. go from 120 to 121 or something.

2. When this all gets resolved, what happens to the total download figures? Will they include all the download figures since this side broke down? i.e. true total figures from the start of this year, for instance.
(In reply to comment #16)
> The download counter on my themes has been stuck on my stuff for weeks and
> weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.

See the AMO web development blog:

http://blog.mozilla.com/webdev/2007/06/30/download-counts-halted/
Jeremy - will exec() work on any PHP install?  Doesn't redhat turn that off in its RPM?
Looks like safe_mode is off by default so you should be able to use exec().
Comment on attachment 274023 [details] [diff] [review]
updated download counter

Hey Andrei, I think this looks great but we need to have some of those DEFINE's be arguments that can be passed from CRON:
* LOGDIR
* TEMPDIR
Attachment #274023 - Flags: review?(morgamic) → review-
Okay, I changed the constants to required arguments, and I also fixed a small bug where a couple of non-download-related lines were getting through the grep when they shouldn't have.
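The general shape of "constants become required arguments" is shown below in Python for illustration only; the actual patch is PHP and its argument order is not stated in this bug, so the positional order here is an assumption.

```python
import argparse


def parse_args(argv):
    """Require the log directory and temp directory on the command line."""
    parser = argparse.ArgumentParser(description="Parse download logs (illustrative).")
    parser.add_argument("logdir", help="directory holding the access_*.gz logs")
    parser.add_argument("tempdir", help="scratch directory for intermediate files")
    return parser.parse_args(argv)


# Placeholder paths; a cron entry would pass the real ones.
args = parse_args(["/path/to/logs", "/path/to/tmp"])
```

Positional arguments are required by default in argparse, so a cron job that omits either path fails immediately instead of falling back to a hard-coded location.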
Attachment #274675 - Flags: review?(morgamic)
Attachment #274023 - Attachment is obsolete: true
Comment on attachment 274675 [details] [diff] [review]
download counter with arguments

Sweet -- looks good.  Jeremy, what's your schedule like?  Can we work on setting up an initial run tomorrow?
Attachment #274675 - Flags: review?(morgamic) → review+
I'm up for it.
I have created this table in the add-ons database:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(255) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

I am adding the following MySQL grants:

GRANT SELECT ON `addons_remora`.`versions`
GRANT SELECT ON `addons_remora`.`files`
GRANT UPDATE ON `addons_remora`.`addons`
GRANT SELECT, INSERT, UPDATE ON `addons_remora`.`logs_parsed`

That should work, don't see any other tables hit by the statements.
not to be a pest but... is there an ETA for when this might go live?  Thx.
Hey Jon, total downloads are currently being updated.  Work is being done on:
a) backfilling June 27th -> Aug 9th from backed up logs
b) restoring weekly counts

We have an IT bug filed for the backlogs, and ETA on that is by Friday.  Andrei -- how is the weekly count patch coming?  What is the ETA for a patch?
I'm going to have a patch for weekly counts tomorrow/thursday at the latest. I'll try for sooner, though.

This will include a new table that'll hold daily counts for every add-on, to replace the old downloads table completely; we won't store each individual download anymore.

The counting script will update the daily counts in the database, and then the maintenance script will produce weekly counts from those totals.
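A minimal sketch of the "update the daily counts" half, using Python and an in-memory SQLite database purely for illustration. The table name `daily_counts`, its columns, and the helper `add_daily_count` are hypothetical; the real table is defined in the attached patch, not here.

```python
import sqlite3


def add_daily_count(conn, addon_id, day, n):
    """Add n downloads to the (addon_id, day) row, creating the row if needed."""
    cur = conn.execute(
        "UPDATE daily_counts SET downloads = downloads + ? "
        "WHERE addon_id = ? AND day = ?",
        (n, addon_id, day),
    )
    if cur.rowcount == 0:  # no row for this add-on and day yet
        conn.execute(
            "INSERT INTO daily_counts (addon_id, day, downloads) VALUES (?, ?, ?)",
            (addon_id, day, n),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_counts ("
    " addon_id INTEGER NOT NULL,"
    " day TEXT NOT NULL,"
    " downloads INTEGER NOT NULL,"
    " PRIMARY KEY (addon_id, day))"
)
add_daily_count(conn, 1, "2007-08-20", 3)
add_daily_count(conn, 1, "2007-08-20", 2)
add_daily_count(conn, 1, "2007-08-21", 4)
rows = conn.execute(
    "SELECT addon_id, day, downloads FROM daily_counts ORDER BY day"
).fetchall()
```

One row per add-on per day keeps the table small compared with one row per download, which is the point of dropping the old downloads table.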
I've uploaded the patch to the counter script and the new count table. The only thing remaining to do is change the maintenance script to update weekly counts from the new table.
This fixes the weekly count updating in the maintenance script; total downloads are handled by the parser.
Attachment #277749 - Flags: review?(morgamic)
Comment on attachment 277749 [details] [diff] [review]
maintenance.php weekly download patch

Wil can you take a look and test with a log from stats?
Attachment #277749 - Flags: review?(morgamic) → review?(clouserw)
Attachment #277749 - Flags: review?(clouserw) → review+
Any word yet on when we'll have the weekly counts?  It's been two months ...

And what about this (from http://blog.mozilla.com/webdev/category/amo):

"An improved API for aggregating add-on statistics and integrating it into your blog or external sites"

Is that still in the works?
This was pushed in the latest update (last Saturday).  Weekly counts are working along with total counts.  Thanks to Sancus and Oremj for their hard work.  :)
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Have the backlogs been processed as well (see comment 28)?
w00t!  Thanks, guys!
Weekly and total counts are back, but the rating fields are not updating.  Was that fix supposed to be included in this bug?
It's still not working in sandbox is it?
I think that it really doesn't work in sandbox. I didn't have a single download in over two weeks (which is improbable, but not impossible). Since my extension is public, everything works great! Thanks a lot for fixing!!
Attachment #272096 - Flags: review?(morgamic)
Despite this being marked as RESOLVED FIXED, I do not feel it is.

Ever since this bug was claimed to have been fixed, the Weekly Download figures for Themes have been showing at 10% of the true figure. This has been confirmed with other themers. The Weekly Download figures for Extensions, however, appear about right.

As this 10% figure has been applied across the board with Themes, it has no effect on the relative positions of them, in terms of 'most popular'. However, it has made a nonsense of the Total Download figures for all the Themes. 

None of this is supposition on my part, as I have screenshot after screenshot going back to Jan 2006 that will confirm all this.

I would ask, therefore, that either this bug is fixed or else that AMO confirm here that this 10% is the position, for whatever reason. Otherwise, at sometime in the future, I suspect that someone at Mozilla will be waving about a totally meaningless set of figures to support a position that third party Firefox Themes are not in great demand by Firefox users.

Thanks :) 
The download count for our extension is also way too low, quite plausibly around 10% of the true figure. Both our own click count (which redirects to AMO) and the number of new registrations (which is a subset of the number of new downloads) are several times larger than the download count reported by AMO.
Same thing happening to me.  My extension presents a welcome page after installation is complete.  I have a counter on that page that indicates over 42,000 page loads for the last seven days (the page is only loaded once per install).  Dev cpanel recorded only about 5200 downloads during the same period.  While I don't expect an exact one-to-one correlation between the two, the numbers should be *much* closer than they are.
I'm trying to clean up the download counter bugs, several of which have comments on this issue. I'm considering bug 402796 the bug for figuring out if there's a problem with the counts, so please post further comments there.
Product: addons.mozilla.org → addons.mozilla.org Graveyard