Closed Bug 488333 Opened 15 years ago Closed 15 years ago

Set up SFx metrics tools

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
minor

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: abuchanan, Assigned: chizu)

Hey,

For bug 477638 and the SFx v3 launch, we need to parse lots of access logs so that we can build accurate data for Spread Firefox Affiliates.  

At the moment, this will include parsing logs for mozilla.com, download.mozilla.org, and spreadfirefox.com.

We're not sure yet how far back in time we'd like to go; we'll decide that today and I'll update here soon.

The script to do this exists, but needs some tweaking to account for past changes in the affiliates program.  I wanted to get this bug filed now though, since it's a big task, so that it's on IT's radar.

I'd like to get this running this week, if IT thinks that's OK, but I'll know more about timing when I've finished tweaking the parse script.

How far back do our logs for these 3 sites go?

oremj, do you have any tips for parsing this volume of logs?  Would you have time to look over the parse script for serious problems and performance bottlenecks?

Thanks!
Should Metrics be taking over this?
(In reply to comment #1)
> Should Metrics be taking over this?

I'm looking for people from metrics to talk to about this now.  I'm not familiar with what they are capable of yet.  This process would involve matching IPs from one site to another, and I don't know if their weblog framework can handle that.
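To make the IP-matching step concrete, here is a minimal Python sketch. It is not the actual parse script, which isn't shown in this bug. It assumes Apache combined-style access logs and a hypothetical `id` query parameter carrying the affiliate ID on the referral site; the function names are made up for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumed Apache "combined"-style prefix:
# client IP, identity, user, [timestamp], "METHOD path protocol"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def parse_line(line):
    """Return a dict with ip, ts, method, and path, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def affiliate_by_ip(referral_lines, param="id"):
    """Map each client IP to the first affiliate ID seen in its request query string.

    `param` is a hypothetical query parameter name, not taken from the real sites.
    """
    seen = {}
    for line in referral_lines:
        rec = parse_line(line)
        if rec is None:
            continue
        ids = parse_qs(urlparse(rec["path"]).query).get(param)
        if ids:
            seen.setdefault(rec["ip"], ids[0])
    return seen


def count_downloads(download_lines, ip_to_affiliate):
    """Count download requests whose client IP was previously tied to an affiliate."""
    totals = defaultdict(int)
    for line in download_lines:
        rec = parse_line(line)
        if rec and rec["ip"] in ip_to_affiliate:
            totals[ip_to_affiliate[rec["ip"]]] += 1
    return dict(totals)
```

A real run would also need to bound matches by a time window and account for shared IPs (NAT, proxies), which this sketch ignores.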
mozilla.com and spreadfirefox.com logs are trimmed at two months. download.mozilla.org logs go back a year.
Trevor's talking about what's online; offline, on tape, we go back further.

In any event, this is something Metrics should really get involved in.
If this is a one-time process, it should stay with Ops.  If it's an ongoing need, it should head over to Metrics.
Assignee: server-ops → thardcastle
chizu, is this ready to go yet?
Depends on: 493836
Depends on: 494022
Metrics will be assisting with the processing of this. It's now waiting on log restores and finding available hardware.
Whiteboard: waiting on restores
Hey all:  We'd like to get this data available and ready to share with Affiliates for our 3.0.4 release (scheduled for first week of Aug).
Let us know if you have any questions!
Restores are done.  What has to happen now?  ETA?
I'll be meeting with Metrics next week to do a training session in Metrics tools and to start working on the next steps.  I'll have a better idea of ETA then.

One open question is what kind of hardware we'll have available for processing.  The current stats script lives on dm-stats01.  I'm not sure what the Metrics tools require; maybe Daniel can elaborate?

Thanks.
Whiteboard: waiting on restores
There are two boxes ready for this. I'll get our basic RHEL image on them and set them up for installing Kettle.
I believe this is being processed, right?  What's the ETA to completion?
(In reply to comment #12)
> I believe this is being processed, right?  What's the ETA to completion?

No, afaik, Kettle isn't installed on these boxes yet, but this project has been on the back burner anyway.  I hope to figure out a plan & ETA for this next week.
Are you stalled on IT or are we stalled on you?  This bug is nearly 6 months old and I'd love to close it out.
Me too :)
Mostly, I've been stalled on free time.

Completing this will take some work back and forth between me and Metrics.

But this is an Ops bug :)  so here's what I think is left for Ops:

1) Set up the two boxes chizu mentioned in comment #11
* RHEL basics
* Kettle
* a MySQL server

2) Should I have access to these boxes?  I think so.  
They're not affecting any production properties yet, and it would help avoid the "ok, now type this into the CLI" game, and would help give me a feel for how the servers operate.

That should give me enough to run with for now, and I can file other bugs when I need more from Ops.

Btw, the title of this bug was a bit misleading; it's not really on Ops to parse the all-time totals, just to provide the tools needed.
Summary: Parse logs to get SFx affiliate all-time totals → Set up SFx metrics tools
The easiest way to install Kettle would be to get the Sun JRE 1.6 package installed, then just rsync:

rsync -rti --exclude '*.log' cm-metricsetl01:/opt/pentaho/kettle ...
im-metrics-temp01/02 are set up with RHEL 5.4 + MySQL + Java 1.6 and you (Alex or Daniel) should be able to log in as root with your keys. Let me know if you need anything else.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
thank you!
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard