Closed Bug 493763 Opened 15 years ago Closed 15 years ago

Need help debugging Affiliates metrics counting

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: abuchanan, Assigned: justdave)

Details

Hey,

For bug 493567, I need some help debugging a problem with the daily Spread Firefox Affiliates stats tracking.

The script run.pl parses mozilla.com and d.m.o log files to track affiliates points.  It outputs SQL inserts which are saved in some output file.  After the script finishes, those inserts are run on the SFx DB.

I'd like to get a copy of the most recent SQL inserts file, and the config.pl file that goes along with run.pl

We really want to fix this today, so I've made this critical

Thanks
dmoore and I were both poking around earlier, and are unable to find this actually running anywhere in production.  I would assume that would probably explain why they're all zero?
fwiw, we located the code that would mess with the production database on dm-stats01, but can't find any cron jobs that actually trigger it.  There's an sql temp file in there dated May 1st.  Perhaps there used to be a cron job and it got lost, and that's the last time it ran?
(In reply to comment #2)
> fwiw, we located the code that would mess with the production database on
> dm-stats01, but can't find any cron jobs that actually trigger it.  There's an
> sql temp file in there dated May 1st.  Perhaps there used to be a cron job and
> it got lost, and that's the last time it ran?

that sounds plausible, and would explain the symptoms I'm seeing.

This location has run.pl and access to the mozilla.com and d.m.o logs?  Is this a good place to create the cron job, if it doesn't exist?

Could you drop the sql temp file on khan:/home/abuchanan/ please?

Also, I want to double check config.pl has the correct settings, could you drop that on khan also?
(In reply to comment #3)
> (In reply to comment #2)
> This location has run.pl and access to the mozilla.com and d.m.o logs?  Is
> this a good place to create the cron job, if it doesn't exist?

Yes.

> Could you drop the sql temp file on khan:/home/abuchanan/ please?
> 
> Also, I want to double check config.pl has the correct settings, could you
> drop that on khan also?

Done.
Assignee: server-ops → justdave
config.pl looks fine.

We need to set up the cron job:

1) run.pl should run once daily and save the output (SQL queries) to a file, e.g.

perl run.pl > out.sql

2) after run.pl finishes, the SQL queries should be run on the SFX DB, e.g.

mysql sfx_db < out.sql

I'd love to get the output of this cron job if it's not much trouble (e.g. as a daily reminder that it's running, and any errors)

thanks
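
For reference, the two steps above could be wrapped roughly like this. It is only a sketch: the `GENERATE_CMD`, `LOAD_CMD` and `OUT_FILE` variables and the `run_affiliates_job` function are names made up here, and the defaults simply mirror the `perl run.pl > out.sql` and `mysql sfx_db < out.sql` examples. The one idea it adds is to skip the DB step if run.pl fails or writes nothing.

```shell
#!/bin/sh
# Sketch of a daily wrapper for the two steps described above.
# Variable names, the function name and the defaults are illustrative assumptions.

# Commands are overridable so the wrapper can be dry-run without touching the DB.
: "${GENERATE_CMD:=perl run.pl}"
: "${LOAD_CMD:=mysql sfx_db}"
: "${OUT_FILE:=out.sql}"

run_affiliates_job() {
    tmp="${OUT_FILE}.tmp"

    # Step 1: generate the SQL inserts; stop here if the generator fails.
    if ! $GENERATE_CMD > "$tmp"; then
        echo "generate step failed; not touching the database" >&2
        rm -f "$tmp"
        return 1
    fi

    # An empty file may mean no logs were found (assumption about run.pl's behaviour).
    if [ ! -s "$tmp" ]; then
        echo "generate step produced no SQL; check the log paths in config.pl" >&2
        rm -f "$tmp"
        return 1
    fi

    # Step 2: replace the previous output and load it only after step 1 succeeded.
    mv "$tmp" "$OUT_FILE"
    if ! $LOAD_CMD < "$OUT_FILE"; then
        echo "loading $OUT_FILE into the DB failed" >&2
        return 1
    fi
    echo "loaded $(wc -l < "$OUT_FILE") lines from $OUT_FILE"
}
```

A cron entry would source this file and call `run_affiliates_job`. Because errors go to stderr, cron's usual mail-on-output behaviour would deliver both the daily summary line and any failures.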
The cron job on staging looks like this:

0 0 * * * apache cd /data/build/affiliates-download-counting; /usr/bin/php -f logmaker.php; /usr/bin/perl run.pl > affl_output.sql; cat affl_output.sql | mysql -u sfx -h 10.2.70.130 -pXXXXXXXXXX sfx_stage_affiliates_archive
(In reply to comment #6)
> The cron job on staging looks like this:
> 
> 0 0 * * * apache cd /data/build/affiliates-download-counting; /usr/bin/php -f
> logmaker.php; /usr/bin/perl run.pl > affl_output.sql; cat affl_output.sql |
> mysql -u sfx -h 10.2.70.130 -pXXXXXXXXXX sfx_stage_affiliates_archive

that cron job can be removed, we don't need logmaker.php anymore and stage doesn't need to stay updated with affiliates points.  

Anyway, it looks like the cron job meant for production was possibly added to stage instead.
Is there a way we can monitor this?
(In reply to comment #7)
> (In reply to comment #6)
> Anyway, it looks like the cron job meant for production, possibly was added to
> stage instead.

I set up the one on stage personally.  That was back before it was put on production, when you were still testing it.  I wasn't involved when it went to production though, so we're not sure what happened with that.

So basically, we can copy that cron job without the logmaker part to the production server, right?

(In reply to comment #8)
> Is there a way we can monitor this?

Sure.  The cron job output can go to Alex just like now. :)

Since the temp file it generates sticks around we could potentially stick a file age check on that temp file, too.
(In reply to comment #9)
> 
> So basically, we can copy that cron job without the logmaker part to the
> production server, right?

yes.  and to the production DB, of course
Alex can't fix it if it breaks, so I'd prefer to have a nagios check on something like this.
(In reply to comment #11)
> Alex can't fix it if it breaks, so I'd prefer to have a nagios check on
> something like this.

What do you have in mind for monitoring this?
Some things we could check:
- all counts being zero
- stats changing by > 50% in one day
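
A rough sketch of those two checks, assuming someone can supply yesterday's and today's totals as plain non-negative integers (how to pull them out of the DB is not covered here). The function name `check_totals` is made up, and the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL). A total of zero is treated as "all counts being zero".

```shell
# check_totals PREVIOUS CURRENT
# Hypothetical helper: prints a status line and returns a Nagios-style code.
check_totals() {
    prev=$1
    cur=$2

    # A zero total means every individual count was zero.
    if [ "$cur" -eq 0 ]; then
        echo "CRITICAL: all counts are zero"
        return 2
    fi

    # Nothing to compare against, so only the zero check applies.
    if [ "$prev" -eq 0 ]; then
        echo "OK: no previous total to compare against (current=$cur)"
        return 0
    fi

    # Absolute day-over-day difference.
    diff=$((cur - prev))
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi

    # diff/prev > 50% is the same as 2*diff > prev, which avoids integer division.
    if [ $((diff * 2)) -gt "$prev" ]; then
        echo "WARNING: total changed from $prev to $cur (more than 50%)"
        return 1
    fi
    echo "OK: total changed from $prev to $cur"
    return 0
}
```

For example, `check_totals 100 160` would warn, while `check_totals 100 150` sits exactly on the 50% boundary and passes.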
(In reply to comment #13)
> Some things we could check:
> - all counts being zero
> - stats changing by > 50% in one day

So, something like the AMO stats script fligtar wrote? How about filing a new bug against webdev to write a script like fligtar's check stats script, and once it's written, we'll be happy to add it to nagios.
Ok.  For the meantime, can we implement a nagios check that makes sure the temp file gets updated every day?  per justdave's suggestion in comment #9
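
Something along these lines might do as an interim file age check. It is a sketch, not an existing plugin: `check_file_age` is a made-up name, the 26 hour default (one daily run plus some slack) is an assumption, and the return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# check_file_age FILE [MAX_AGE_SECONDS]
# Hypothetical helper: reports CRITICAL if FILE is missing or older than the limit.
check_file_age() {
    file=$1
    max_age=${2:-93600}   # assumed default: 26 hours

    if [ ! -e "$file" ]; then
        echo "CRITICAL: $file does not exist"
        return 2
    fi

    now=$(date +%s)
    # GNU stat takes -c %Y, BSD stat takes -f %m; try both.
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file" 2>/dev/null) || mtime=""
    if [ -z "$mtime" ]; then
        echo "UNKNOWN: could not read mtime of $file"
        return 3
    fi

    age=$((now - mtime))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL: $file is ${age}s old (limit ${max_age}s)"
        return 2
    fi
    echo "OK: $file is ${age}s old"
    return 0
}
```

Pointed at the SQL temp file the job writes, this would go CRITICAL roughly a day after the last successful run, which is the failure this bug started with.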
OK, the logfiles this thing wants don't appear to be present at the path that's listed in config.pl currently.  I'm not sure which logs it's supposed to be looking at... Given which box this is, I suspect chizu probably knows...
I've fixed the log file paths (the correct logs are found again). The change that broke them was done this morning.
I ran the job manually after Trevor fixed the paths yesterday, and it looks like it ran via the cron job this morning.  Everything looking okay now?  Reopen if not.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
yes, I can see new referrals in the most recent DB dump, thanks.

I don't think the tally scripts have caught up with them yet, so the points still look behind.

I'll keep an eye on it.  Did we settle on a nagios check in the meantime?  I'll work on getting a more elaborate stats check script implemented in the future.

Thanks again.
(In reply to comment #19)
> Did we settle on a nagios check in the meantime?

Might be good to file a bug for that separately just to make sure it happens.  Reed and someone were brainstorming about it on irc the other day, but I don't know if they did it or not.
Product: mozilla.org → mozilla.org Graveyard