Bug 1182201 (Closed) - Opened 9 years ago, Closed 9 years ago

performance_series blobs are not compressed

Categories

(Tree Management :: Perfherder, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(3 files)

Extracted from a binlog on stage as part of bug 1179223:

#150708 23:55:58 server id 22070103  end_log_pos 577079 CRC32 0x29864a81        Query   thread_id=94784291      exec_time=0     error_code=0
use `mozilla_inbound_jobs_1`/*!*/;
SET TIMESTAMP=1436399758/*!*/;
UPDATE `performance_series`                    SET `last_updated` = 1436399757, `blob` = '[{\"std\": 1.91, \"result_set_id\": 5461, \"job_id\": 4390164, \"min\": 2.39, \"max\": 32.47, \"median\": 2.44, \"total_replicates\": 25,
...

No wonder the binlogs are so large.
Blocks: 1179223
Attached file Example blob
Prettified blob attached from the following:

Execute:
> SELECT * FROM mozilla_inbound_jobs_1.performance_series WHERE series_signature = "9ea7ac2f2a3b8876688540640239d5ee8b211278" AND interval_seconds = 31536000 LIMIT 1

+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| id    | interval_seconds | series_signature                         | type       | last_updated | blob | active_status |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| 47156 | 31536000         | 9ea7ac2f2a3b8876688540640239d5ee8b211278 | talos_data | 1436465792   | ...  | active        |
| NULL  | NULL             | NULL                                     | NULL       | NULL         | NULL | NULL          |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
2 rows

The size of the blob is 800+ KB.

(Said as a lay person who doesn't know much about the current Perfherder implementation.) Storing this kind of data structure in the DB seems like a misuse of MySQL to me, particularly given we continually update these series. Is there not a better way we can do this? Some other DB more suited to time series? Using memcached more, so we don't continually update these blobs in the DB?

Short term we can compress them to reduce storage impact + binlog impact, but that seems like wallpapering to me?
Flags: needinfo?(wlachance)
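As a rough illustration of the short-term fix, zlib-compressing a JSON blob of this shape shrinks it dramatically, because the keys and most values repeat on every datapoint. This is a sketch: the record fields are modeled on the UPDATE statement above, not taken from a real series.

```python
import json
import zlib

# Hypothetical datapoints modeled on the fields in the UPDATE above.
series = [{"std": 1.91, "result_set_id": 5461 + i, "job_id": 4390164 + i,
           "min": 2.39, "max": 32.47, "median": 2.44,
           "total_replicates": 25} for i in range(5000)]

raw = json.dumps(series).encode("utf-8")
packed = zlib.compress(raw)
print(len(packed) / len(raw))  # highly repetitive JSON compresses very well
```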
Adding the Pythian data folks, as treeherder stage is complaining about disk space.
(In reply to Ed Morley [:emorley] from comment #1)
> Created attachment 8631734 [details]
> Example blob
> 
> Prettified blob attached from the following:
> 
> Execute:
> > SELECT * FROM mozilla_inbound_jobs_1.performance_series WHERE series_signature = "9ea7ac2f2a3b8876688540640239d5ee8b211278" AND interval_seconds = 31536000 LIMIT 1
> 
> [query output snipped]
> 
> The size of the blob is 800+ KB.
> 
> (Said as a lay person, and as someone who doesn't really know much about the
> current perfherder implementation) - Storing this kind of data structure in
> the DB seems like a misuse of mysql to me? Particularly given we continually
> update these series. Is there not a better way we can do this?

> Some other DB
> more suited for time series

I don't know of anything which would be suitable, but I haven't investigated much. The current approach has worked fine up to now...

> Using memcached more so we don't continually
> update these blobs in the DB?

Not an option, we need persistent data storage. We can't afford to recalculate this data in case memcached gets reset, that would take a *really* long time (and we don't keep all the artifacts needed for the range of numbers we store).

> Short term we can compress them to reduce storage impact + binlog impact,
> but that seems like wallpapering to me?

I am open to suggestions, honestly I don't see the problem though. 800k isn't that much data. I have a lot of other things to do, I'd need a pretty strong reason to justify researching and migrating to a new database solution.

Compressing this data should be easy though, I'd be happy to do that. We could also reduce the interval on stage-- we don't need to store the full year's worth of performance numbers there (we do on prod).
Flags: needinfo?(wlachance)
(In reply to William Lachance (:wlach) from comment #3)
> I am open to suggestions, honestly I don't see the problem though. 800k
> isn't that much data. I have a lot of other things to do, I'd need a pretty
> strong reason to justify researching and migrating to a new database
> solution.

The problem is that yet again I'm spending time investigating problems caused by the amount of abuse we inflict on the DB.

You have to admit that 60 GB of DB data and 280 GB of binlogs (for 24 hours of logs, when most Mozilla DBs keep 10 days of logs) is somewhat farcical.

Oncall just had to be paged since this almost took down stage (97% disk usage).

I know this sounds awful, but there are times when I kind of wish perfherder were its own project and used its own DB/webheads/workers, so the rest of the project wouldn't be affected by it :-(

> Compressing this data should be easy though, I'd be happy to do that. We
> could also reduce the interval on stage-- we don't need to store the full
> year's worth of performance numbers there (we do on prod).

I have a PR for this, will attach now.

Good idea for reducing the interval; guess we can try that next.
It turns out we were accidentally storing some duplicate performance data which exacerbates this problem; filed bug 1182282 to address that.
See Also: → 1182282
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Attachment #8631816 - Flags: review?(wlachance)
Attachment #8631816 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/c4c660f277ba666d9cd01cf248a9d3fdbf643342
Bug 1182201 - Compress blobs in the performance_series table

Since they can be 800+ KB in size, leaving them uncompressed bloats
the binlogs, even though the table does not contain many rows.
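The change above amounts to compressing on write. A minimal sketch of that pattern (the helper name is hypothetical, not the actual treeherder code):

```python
import json
import zlib

def compress_series(datapoints):
    """Serialize a list of datapoints and zlib-compress the result
    before it is written to the `blob` column (hypothetical helper;
    the real treeherder code may differ)."""
    return zlib.compress(json.dumps(datapoints).encode("utf-8"))
```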
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
This has made a considerable difference actually:

-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:27 treeherder1-bin.012236
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:29 treeherder1-bin.012237
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:31 treeherder1-bin.012238
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:33 treeherder1-bin.012239
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:36 treeherder1-bin.012240
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:38 treeherder1-bin.012241
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:41 treeherder1-bin.012242
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:43 treeherder1-bin.012243
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:45 treeherder1-bin.012244
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:47 treeherder1-bin.012245
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:49 treeherder1-bin.012246
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:51 treeherder1-bin.012247
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:53 treeherder1-bin.012248
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:55 treeherder1-bin.012249
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:57 treeherder1-bin.012250
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:00 treeherder1-bin.012251
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:05 treeherder1-bin.012252
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:08 treeherder1-bin.012253
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:10 treeherder1-bin.012254
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:13 treeherder1-bin.012255
-rw-rw---- 1 mysql mysql 621M Jul  9 22:26 treeherder1-bin.012256

I deployed to stage at ~22:15 (converted to UTC+0 since that's what the times above are in).

Prior to deploy we were starting a new 1.1GB binlog chunk every 2-3 mins.

After deploy it's been 13 mins and we've only half filled the next chunk.
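A back-of-envelope check on those figures (assuming the "every 2-3 minutes" and "half a chunk in 13 minutes" observations above):

```python
# Before: one 1.1 GB binlog chunk every ~2.5 minutes.
before_mb_per_min = 1.1 * 1024 / 2.5
# After: roughly half of a 1.1 GB chunk in 13 minutes.
after_mb_per_min = (1.1 * 1024 / 2) / 13
print(before_mb_per_min / after_mb_per_min)  # roughly a 10x reduction
```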
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file Followup
Attachment #8631885 - Flags: review?(wlachance)
Attachment #8631885 - Flags: review?(wlachance) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/83fa4b4e5c388f6537531067b5f015459523ec70
Bug 1182201 - Create a utils.py helper for zlib.decompress try-except

We have several places where we need to decompress blobs, but have to
gracefully handle old data that is not compressed. Let's use a helper
function to save repeating the same pattern everywhere.
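A sketch of what such a try-except helper might look like (assumed shape; the actual treeherder utility may differ):

```python
import zlib

def decompress_if_needed(blob):
    """Decompress a performance_series blob, falling back to the raw
    bytes for legacy rows stored before compression was enabled."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        # Old, uncompressed data: return it unchanged.
        return blob
```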

https://github.com/mozilla/treeherder/commit/42be24ee66c938293c834eb457ffc1842c69be46
Bug 1182201 - Decompress performance_series blobs in a few more places

Fix a few places where we weren't decompressing the blob retrieved from
the performance_series table.
Is working fine now :-)
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Yay! Thanks for all your hard work everyone.
Blocks: 1179858