Please advise on best policy to accomplish data aggregation

RESOLVED WORKSFORME

Status

mozilla.org Graveyard
Server Operations: Projects
--
enhancement
RESOLVED WORKSFORME
8 years ago
3 years ago

People

(Reporter: murali, Assigned: timellis)

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
Created attachment 444511 [details]
Python script to generate the DB schema

Tellis,

As we discussed, here is the problem I am trying to solve. I have a MySQL database that is the back end for a Django application. The data in the DB pertains to daily unittest failures from Tinderbox. Data is added to the DB twice a day by a cron job.

For the sake of application performance, I want to keep only a month's worth of data on the master DB server for reporting, and accumulate all the older data on a backup DB server.

How do I accomplish that in the most efficient way?

Please look at the attached DB schema.

Thanks
Murali
(Reporter)

Updated

8 years ago
Assignee: server-ops → tellis

Comment 1

8 years ago
Is the specific issue application performance?  Is fetching the data from the DB the bottleneck, or is it in the ORM stage, or elsewhere?  Which client decided that a month's worth of data is the appropriate amount (or which requirement does this come from)?

In general, before doing any performance optimization, I would take the following steps:
 - assess the need for performance;  how fast is fast enough?  
 - assess client needs; is a month of active data what the client wants?  
 - document client needs and what the problem actually is
 - assess performance enhancement options; using a month's worth of data vs an arbitrary amount is one such optimization.  If the common pages, such as top 25 failures, etc, are of the most concern to the client, then using a push cache of these is likely a bigger gain.  Likewise, a push cache could be done in general.  Also, client-side caching.  Etc.  There are always a million ways to speed up code, but only some will be appropriate to what is desired.
 - profile the code.  You can't know, except heuristically (and then best of luck that your heuristics match the client experience), if your optimizations are doing anything until you can measure where time is going now.
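
To make the profiling step concrete, here is a minimal sketch of timing the two stages separately. It does not use the real topfails code: `fetch_failures` and `render_report` are hypothetical stand-ins for the DB query and the ORM/rendering stage, and the sleep simply simulates query latency.

```python
import cProfile
import io
import pstats
import time


def fetch_failures():
    """Hypothetical stand-in for the database query (sleep simulates latency)."""
    time.sleep(0.05)
    return [{"test": "test_%d" % i, "count": i} for i in range(1000)]


def render_report(rows):
    """Hypothetical stand-in for the ORM/rendering stage: top 25 failures."""
    top = sorted(rows, key=lambda r: r["count"], reverse=True)[:25]
    return "\n".join("%s: %d" % (r["test"], r["count"]) for r in top)


def time_stages():
    """Return the report plus wall-clock seconds spent in each stage."""
    timings = {}
    start = time.perf_counter()
    rows = fetch_failures()
    timings["fetch"] = time.perf_counter() - start

    start = time.perf_counter()
    report = render_report(rows)
    timings["render"] = time.perf_counter() - start
    return report, timings


def profile_request():
    """Run both stages under cProfile and return the summary as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    render_report(fetch_failures())
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()
```

If the fetch stage dominates, the database is worth looking at; if rendering dominates, moving old rows elsewhere is unlikely to help much.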

Hopefully these are helpful suggestions.  I hesitate to jump into a specific solution until the problem is defined.
(Assignee)

Comment 2

8 years ago
I'm unaware of any benchmarking that's been done on this problem.

In general, this "archive strategy," where old rows are moved onto a slower medium and accessed less-often and in OLAP fashion is a smart strategy in the face of performance problems on the database, but you're right, it can be premature optimisation if the problem doesn't demand that sort of work. It can also be a waste of time if it turns out the front-end code was responsible for the slowness.
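
For reference, a rough sketch of what such an archive job could look like, if it turns out to be needed. This is an illustration only: the table and column names (`failures`, `test_name`, `failed_at`) are made up rather than taken from the attached schema, the batch size is arbitrary, and sqlite3 stands in for the two MySQL servers so the sketch runs anywhere.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # "a month's worth" of data stays on the live DB
BATCH_SIZE = 500     # assumed value; small batches keep transactions short

# Hypothetical schema, used on both the live and the archive database.
SCHEMA = (
    "CREATE TABLE failures ("
    "id INTEGER PRIMARY KEY, test_name TEXT, failed_at TEXT)"
)


def archive_old_rows(live, archive, now, batch_size=BATCH_SIZE):
    """Move rows older than the retention window from live to archive.

    Returns the number of rows moved.
    """
    cutoff = (now - timedelta(days=RETENTION_DAYS)).strftime("%Y-%m-%d %H:%M:%S")
    moved = 0
    while True:
        rows = live.execute(
            "SELECT id, test_name, failed_at FROM failures "
            "WHERE failed_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        # Copy to the archive and commit before deleting, so a crash between
        # the two steps leaves a duplicate rather than a lost row.
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        archive.executemany(
            "INSERT OR IGNORE INTO failures (id, test_name, failed_at) "
            "VALUES (?, ?, ?)",
            rows,
        )
        archive.commit()
        live.executemany(
            "DELETE FROM failures WHERE id = ?", [(row[0],) for row in rows]
        )
        live.commit()
        moved += len(rows)
```

Such a job could run from the same cron setup that loads the data. Whether it is worth building still depends on where the time is actually going.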

Let's all three sit down somewhere and talk a little more about this to be sure we're spending effort in the right place. I could meet today or tomorrow.

Comment 3

8 years ago
(In reply to comment #2)
> I'm unaware of any benchmarking that's been done on this problem.
> 
> In general, this "archive strategy," where old rows are moved onto a slower
> medium and accessed less-often and in OLAP fashion is a smart strategy in the
> face of performance problems on the database, but you're right, it can be
> premature optimisation if the problem doesn't demand that sort of work. It can
> also be a waste of time if it turns out the front-end code was responsible for
> the slowness.

It's my understanding that the main goal, at least for the short term, is just keeping all test failures archived.  I suppose this is the first question:  what is the problem we're trying to solve?
 
> Let's all three sit down somewhere and talk a little more about this to be sure
> we're spending effort in the right place. I could meet today or tomorrow.

all three==me, you + ctalbert?  I am also free today + tomorrow, though I'm not sure if ctalbert is here
(Assignee)

Comment 4

8 years ago
I thought Murali was the one pondering the DB architecture. I may mean myself, Murali, Jeff, C.Talbert. But whomever understands the current problem the best, and needs database-related advice would be who I really mean.

Comment 5

8 years ago
(In reply to comment #4)
> I thought Murali was the one pondering the DB architecture. I may mean myself,
> Murali, Jeff, C.Talbert. But whomever understands the current problem the best,
> and needs database-related advice would be who I really mean.

He was indeed but he is no longer at Mozilla.  I'm not really in a good position to assess the priority of this bug currently, though I've taken over the topfails work so if there is priority here then I'm a person to talk to.

Updated

8 years ago
Component: Server Operations → Server Operations: Projects
(Reporter)

Comment 6

8 years ago
I don't mind being a fly on the wall to know what is being discussed. If it is alright with you folks. Remember!! I am still a MozVolunteer :)
(Assignee)

Updated

8 years ago
Severity: normal → minor
(Assignee)

Updated

7 years ago
Severity: minor → enhancement
(Assignee)

Comment 7

7 years ago
Reopen this bug if you need concrete advice. Cleaning my bug queue.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → mozilla.org Graveyard