Closed Bug 564940 Opened 14 years ago Closed 14 years ago

Please advise on best policy to accomplish data aggregation

Tracking

(Not tracked)

Status:

RESOLVED WORKSFORME

People

(Reporter: murali, Assigned: tellis)

Details

Attachments

(1 file)

Python script to generate the DB schema (3.40 KB, text/plain), attached 14 years ago by Murali Nandigama [:murali]

Murali Nandigama [:murali]

Reporter

Description

•

14 years ago

Attached file: Python script to generate the DB schema

Tellis,

As we discussed, here is the problem I am trying to solve. I have a MySQL database which is the back end for a Django application. The data in the DB pertains to daily unittest failures from Tinderbox. Data gets added to the DB twice a day by a cron job.

For the sake of application performance, I want to keep only a month's worth of data for reporting on the master DB server, and I want to accumulate all the older data on a backup DB server.

How do I accomplish that in the most efficient way?

Please look at the attached DB schema.

Thanks
Murali

Murali Nandigama [:murali]

Reporter

Updated

•

14 years ago

Assignee: server-ops → tellis

Jeff Hammel

Comment 1

•

14 years ago

Is the specific issue application performance? Is fetching the data from the DB the bottleneck, or is it in the ORM stage, or elsewhere? Which client decided that a month's worth of data is the appropriate amount (or which requirement does this come from)?

In general, before doing any performance optimization, I would take the following steps:
 - assess the need for performance;  how fast is fast enough?  
 - assess client needs; is a month of active data what the client wants?  
 - document client needs and what the problem actually is
 - assess performance enhancement options; using a month's worth of data vs. an arbitrary amount is one such optimization. If the common pages, such as top 25 failures, are of the most concern to the client, then a push cache of these is likely a bigger gain. Likewise, a push cache could be used in general, as could client-side caching. There are always a million ways to speed up code, but only some will be appropriate to what is desired.
 - profile the code.  You can't know, except heuristically (and then best of luck that your heuristics match the client experience), if your optimizations are doing anything until you can measure where time is going now.
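
As a rough illustration of the profiling step, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `fetch_rows`, `top_failures`, and `handle_request` are hypothetical stand-ins for the query step and the page-building step; they are not taken from the topfails code. The point is only to show how a report sorted by cumulative time tells you which stage dominates.

```python
import cProfile
import io
import pstats


def fetch_rows():
    """Hypothetical stand-in for the database query step."""
    return [{"test": f"test_{i % 50}", "failed": i % 3 == 0} for i in range(20000)]


def top_failures(rows, limit=25):
    """Hypothetical stand-in for the aggregation step behind a 'top 25' page."""
    counts = {}
    for row in rows:
        if row["failed"]:
            counts[row["test"]] = counts.get(row["test"], 0) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def handle_request():
    """Hypothetical request handler: fetch, then aggregate."""
    return top_failures(fetch_rows())


def profile(func):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # show the 10 costliest entries
    return result, buffer.getvalue()


result, report = profile(handle_request)
print(report)
```

In a real Django application the same idea would be applied around an actual view or query, so the numbers reflect where request time really goes rather than a guess.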

Hopefully these are helpful suggestions. I hesitate to jump into a specific solution until the problem is defined.

timellis

Assignee

Comment 2

•

14 years ago

I'm unaware of any benchmarking that's been done on this problem.

In general, this "archive strategy," where old rows are moved onto a slower medium and accessed less often and in OLAP fashion, is a smart strategy in the face of performance problems on the database, but you're right, it can be premature optimisation if the problem doesn't demand that sort of work. It can also be a waste of time if it turns out the front-end code was responsible for the slowness.
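
For reference, the archive strategy could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses in-memory SQLite databases from the Python standard library so that it runs anywhere, whereas the real setup is MySQL behind Django, and the `failures` table with its `test_name` and `run_date` columns is hypothetical rather than taken from the attached schema. The example date is arbitrary.

```python
import sqlite3
from datetime import date, timedelta


def archive_old_rows(live, archive, cutoff):
    """Copy rows older than cutoff into the archive, then delete them from live."""
    old_rows = live.execute(
        "SELECT id, test_name, run_date FROM failures WHERE run_date < ?",
        (cutoff.isoformat(),),
    ).fetchall()
    # Commit to the archive first, so a failure in between never loses rows.
    with archive:
        archive.executemany(
            "INSERT OR IGNORE INTO failures (id, test_name, run_date) VALUES (?, ?, ?)",
            old_rows,
        )
    with live:
        live.execute("DELETE FROM failures WHERE run_date < ?", (cutoff.isoformat(),))
    return len(old_rows)


# Hypothetical schema; dates are stored as ISO strings so they compare correctly.
SCHEMA = "CREATE TABLE failures (id INTEGER PRIMARY KEY, test_name TEXT, run_date TEXT)"
live = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
live.execute(SCHEMA)
archive.execute(SCHEMA)

today = date(2010, 5, 10)  # arbitrary fixed date for the example
with live:
    live.executemany(
        "INSERT INTO failures (test_name, run_date) VALUES (?, ?)",
        [("test_a", (today - timedelta(days=d)).isoformat()) for d in (1, 10, 40, 90)],
    )

# Keep roughly a month of data live; everything older goes to the archive.
moved = archive_old_rows(live, archive, today - timedelta(days=30))
live_count = live.execute("SELECT COUNT(*) FROM failures").fetchone()[0]
archive_count = archive.execute("SELECT COUNT(*) FROM failures").fetchone()[0]
print(moved, live_count, archive_count)
```

A job like this could run from the same cron schedule that loads the data. Writing to the archive before deleting from the live table, and ignoring rows already archived, means a rerun after a partial failure is harmless.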

Let's all three sit down somewhere and talk a little more about this to be sure we're spending effort in the right place. I could meet today or tomorrow.

Jeff Hammel

Comment 3

•

14 years ago

(In reply to comment #2)
> I'm unaware of any benchmarking that's been done on this problem.
> 
> In general, this "archive strategy," where old rows are moved onto a slower
> medium and accessed less-often and in OLAP fashion is a smart strategy in the
> face of performance problems on the database, but you're right, it can be
> premature optimisation if the problem doesn't demand that sort of work. It can
> also be a waste of time if it turns out the front-end code was responsible for
> the slowness.

It's my understanding that the main goal, at least for the short term, is just keeping all test failures archived. I suppose this is the first question: what is the problem we're trying to solve?
 
> Let's all three sit down somewhere and talk a little more about this to be sure
> we're spending effort in the right place. I could meet today or tomorrow.

All three == me, you + ctalbert? I am also free today + tomorrow, though I'm not sure if ctalbert is here.

timellis

Assignee

Comment 4

•

14 years ago

I thought Murali was the one pondering the DB architecture. I may mean myself, Murali, Jeff, C.Talbert. But whoever understands the current problem best and needs database-related advice is who I really mean.

Jeff Hammel

Comment 5

•

14 years ago

(In reply to comment #4)
> I thought Murali was the one pondering the DB architecture. I may mean myself,
> Murali, Jeff, C.Talbert. But whomever understands the current problem the best,
> and needs database-related advice would be who I really mean.

He was indeed, but he is no longer at Mozilla. I'm not really in a good position to assess the priority of this bug currently, though I've taken over the topfails work, so if there is priority here then I'm a person to talk to.

matthew zeier [:mrz]

Updated

•

14 years ago

Component: Server Operations → Server Operations: Projects

Murali Nandigama [:murali]

Reporter

Comment 6

•

14 years ago

I don't mind being a fly on the wall to know what is being discussed, if it is alright with you folks. Remember!! I am still a MozVolunteer :)

timellis

Assignee

Updated

•

14 years ago

Severity: normal → minor

timellis

Assignee

Updated

•

14 years ago

Severity: minor → enhancement

timellis

Assignee

Comment 7

•

14 years ago

Reopen this bug if you need concrete advice. Cleaning my bug queue.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Updated

•

9 years ago

Product: mozilla.org → mozilla.org Graveyard


Categories

(mozilla.org Graveyard :: Server Operations: Projects, task)
