Closed Bug 564940 Opened 14 years ago Closed 14 years ago

Please advise on best policy to accomplish data aggregation

Categories

(mozilla.org Graveyard :: Server Operations: Projects, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: murali, Assigned: tellis)

Details

Attachments

(1 file)

Tellis,

As we discussed, here is the problem I am trying to solve. I have a MySQL database which is the back end for a Django application. The data in the DB pertains to daily unittest failures from Tinderbox. Data gets added to the DB twice a day by a cron job.

For the sake of application performance, I want to keep only a month's worth of data for reporting on the master DB server, and I want to accumulate all the older data on a backup DB server.

How do I accomplish that in the most efficient way?

Please look at the attached DB schema.

Thanks
Murali
Assignee: server-ops → tellis
Is the specific issue application performance?  Is fetching the data from the DB the bottleneck, or is it the ORM stage, or something else?  Which client decided that a month's worth of data is the appropriate amount (or which requirement does this come from)?

In general, before doing any performance optimization, I would take the following steps:
 - assess the need for performance;  how fast is fast enough?  
 - assess client needs; is a month of active data what the client wants?  
 - document client needs and what the problem actually is
 - assess performance enhancement options; keeping a month's worth of data vs. an arbitrary amount is one such optimization.  If the common pages, such as top 25 failures, etc., are of the most concern to the client, then a push cache of these is likely a bigger gain.  Likewise, a push cache could be done in general.  Also, client-side caching.  Etc.  There are always a million ways to speed up code, but only some will be appropriate to what is desired.
 - profile the code.  You can't know, except heuristically (and then best of luck that your heuristics match the client experience), if your optimizations are doing anything until you can measure where time is going now.
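
As a rough sketch of the profiling step in the list above, something like the following would show where the time goes before any optimization is picked.  It uses Python's standard cProfile and pstats modules.  The functions fetch_rows, build_report and top_failures_view are hypothetical stand-ins for the DB fetch, the ORM/rendering stage and a "top 25 failures" page; they are not taken from the actual application.

```python
import cProfile
import io
import pstats
import time


def fetch_rows():
    # Hypothetical stand-in for the database query stage.
    time.sleep(0.05)
    return [{"test": "t%d" % i, "failures": i % 7} for i in range(1000)]


def build_report(rows):
    # Hypothetical stand-in for the ORM / rendering stage.
    return sorted(rows, key=lambda r: r["failures"], reverse=True)[:25]


def top_failures_view():
    # Hypothetical page handler that combines both stages.
    return build_report(fetch_rows())


def profile_call(func):
    """Run func under cProfile; return its result and a text report
    sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(top_failures_view)
    print(report)
```

If the report shows most of the cumulative time under the fetch stage, the database is worth looking at; if it sits in the report-building stage, archiving old rows is unlikely to help much.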

Hopefully these are some helpful suggestions.  I hesitate to jump into a specific solution until the problem is defined.
I'm unaware of any benchmarking that's been done on this problem.

In general, this "archive strategy," where old rows are moved onto a slower medium and accessed less-often and in OLAP fashion is a smart strategy in the face of performance problems on the database, but you're right, it can be premature optimisation if the problem doesn't demand that sort of work. It can also be a waste of time if it turns out the front-end code was responsible for the slowness.
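
For illustration only, here is a minimal sketch of the archive step described above: copy rows older than a cutoff to the archive, then delete them from the master.  Everything named here is an assumption: the table failures and its columns id, test_name and run_date are hypothetical (the attached schema is not reproduced in this bug), and two in-memory SQLite databases stand in for the master and backup MySQL servers so the example runs on its own.  INSERT OR IGNORE is SQLite syntax; the MySQL spelling differs.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema; the real one is in the bug's attachment.
SCHEMA = (
    "CREATE TABLE failures ("
    "id INTEGER PRIMARY KEY, test_name TEXT, run_date TEXT)"
)


def make_db():
    # In-memory SQLite database standing in for a MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def archive_old_rows(master, archive, cutoff):
    """Copy rows with run_date before cutoff into archive, then delete
    them from master. Returns the number of rows moved."""
    rows = master.execute(
        "SELECT id, test_name, run_date FROM failures WHERE run_date < ?",
        (cutoff.isoformat(),),
    ).fetchall()
    if not rows:
        return 0
    # Commit the copy before deleting, so a crash between the two steps
    # leaves duplicate rows rather than lost rows.
    with archive:
        archive.executemany(
            "INSERT OR IGNORE INTO failures (id, test_name, run_date) "
            "VALUES (?, ?, ?)",
            rows,
        )
    with master:
        master.executemany(
            "DELETE FROM failures WHERE id = ?", [(r[0],) for r in rows]
        )
    return len(rows)


if __name__ == "__main__":
    master, archive = make_db(), make_db()
    with master:
        master.executemany(
            "INSERT INTO failures VALUES (?, ?, ?)",
            [(1, "test_a", "2010-03-01"), (2, "test_b", "2010-05-09")],
        )
    cutoff = date(2010, 5, 10) - timedelta(days=30)
    print("rows moved:", archive_old_rows(master, archive, cutoff))
```

A job like this could run from the same cron schedule that loads the data.  Whether it is worth doing still depends on the profiling question raised earlier in this bug.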

Let's all three sit down somewhere and talk a little more about this to be sure we're spending effort in the right place. I could meet today or tomorrow.
(In reply to comment #2)
> I'm unaware of any benchmarking that's been done on this problem.
> 
> In general, this "archive strategy," where old rows are moved onto a slower
> medium and accessed less-often and in OLAP fashion is a smart strategy in the
> face of performance problems on the database, but you're right, it can be
> premature optimisation if the problem doesn't demand that sort of work. It can
> also be a waste of time if it turns out the front-end code was responsible for
> the slowness.

It's my understanding that the main goal, at least for the short term, is just keeping all test failures archived.  I suppose this is the first question:  what is the problem we're trying to solve?
 
> Let's all three sit down somewhere and talk a little more about this to be sure
> we're spending effort in the right place. I could meet today or tomorrow.

All three == me, you + ctalbert?  I am also free today + tomorrow, though I'm not sure if ctalbert is here.
I thought Murali was the one pondering the DB architecture. I may mean myself, Murali, Jeff, C.Talbert. But whoever understands the current problem the best and needs database-related advice is who I really mean.
(In reply to comment #4)
> I thought Murali was the one pondering the DB architecture. I may mean myself,
> Murali, Jeff, C.Talbert. But whomever understands the current problem the best,
> and needs database-related advice would be who I really mean.

He was indeed, but he is no longer at Mozilla.  I'm not really in a good position to assess the priority of this bug currently, though I've taken over the topfails work, so if there is priority here then I'm a person to talk to.
Component: Server Operations → Server Operations: Projects
I don't mind being a fly on the wall to know what is being discussed, if that is all right with you folks. Remember, I am still a MozVolunteer :)
Severity: normal → minor
Severity: minor → enhancement
Reopen this bug if you need concrete advice. Cleaning my bug queue.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → mozilla.org Graveyard