Bug 1182201 (Closed) - Opened 9 years ago, Closed 9 years ago

performance_series blobs are not compressed

Categories

(Tree Management :: Perfherder, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(3 files)

Extracted from a binlog on stage as part of bug 1179223:

#150708 23:55:58 server id 22070103  end_log_pos 577079 CRC32 0x29864a81        Query   thread_id=94784291      exec_time=0     error_code=0
use `mozilla_inbound_jobs_1`/*!*/;
SET TIMESTAMP=1436399758/*!*/;
UPDATE `performance_series`                    SET `last_updated` = 1436399757, `blob` = '[{\"std\": 1.91, \"result_set_id\": 5461, \"job_id\": 4390164, \"min\": 2.39, \"max\": 32.47, \"median\": 2.44, \"total_replicates\": 25,
...

No wonder the binlogs are so large.
Blocks: 1179223
Attached file Example blob
Prettified blob attached from the following:

Execute:
> SELECT * FROM mozilla_inbound_jobs_1.performance_series WHERE series_signature = "9ea7ac2f2a3b8876688540640239d5ee8b211278" AND interval_seconds = 31536000 LIMIT 1

+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| id    | interval_seconds | series_signature                         | type       | last_updated | blob | active_status |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| 47156 | 31536000         | 9ea7ac2f2a3b8876688540640239d5ee8b211278 | talos_data | 1436465792   | ...  | active        |
| NULL  | NULL             | NULL                                     | NULL       | NULL         | NULL | NULL          |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
2 rows

The size of the blob is 800+ KB.

(Said as a lay person who doesn't know much about the current Perfherder implementation.) Storing this kind of data structure in the DB seems like a misuse of MySQL to me, particularly given we continually update these series. Is there not a better way we can do this? Some other DB more suited to time series? Using memcached more, so we don't continually update these blobs in the DB?

Short term we can compress them to reduce storage impact + binlog impact, but that seems like wallpapering to me?
Flags: needinfo?(wlachance)
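As a rough illustration of the short-term fix, zlib-compressing a JSON blob of this shape shrinks it dramatically, because the keys and most values repeat on every datapoint. This is a sketch: the record fields are modeled on the UPDATE statement above, not taken from a real series.

```python
import json
import zlib

# Hypothetical datapoints modeled on the fields in the UPDATE above.
series = [{"std": 1.91, "result_set_id": 5461 + i, "job_id": 4390164 + i,
           "min": 2.39, "max": 32.47, "median": 2.44,
           "total_replicates": 25} for i in range(5000)]

raw = json.dumps(series).encode("utf-8")
packed = zlib.compress(raw)
print(len(packed) / len(raw))  # highly repetitive JSON compresses very well
```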
Adding the Pythian data folks, as treeherder stage is complaining about disk space.
(In reply to Ed Morley [:emorley] from comment #1)
> Created attachment 8631734 [details]
> Example blob
> 
> Prettified blob attached from the following:
> 
> Execute:
> > SELECT * FROM mozilla_inbound_jobs_1.performance_series WHERE series_signature = "9ea7ac2f2a3b8876688540640239d5ee8b211278" AND interval_seconds = 31536000 LIMIT 1
> 
> [query output snipped]
> 
> The size of the blob is 800+ KB.
> 
> (Said as a lay person, and as someone who doesn't really know much about the
> current perfherder implementation) - Storing this kind of data structure in
> the DB seems like a misuse of mysql to me? Particularly given we continually
> update these series. Is there not a better way we can do this?

> Some other DB
> more suited for time series

I don't know of anything which would be suitable, but I haven't investigated much. The current approach has worked fine up to now...

> Using memcached more so we don't continually
> update these blobs in the DB?

Not an option, we need persistent data storage. We can't afford to recalculate this data in case memcached gets reset, that would take a *really* long time (and we don't keep all the artifacts needed for the range of numbers we store).

> Short term we can compress them to reduce storage impact + binlog impact,
> but that seems like wallpapering to me?

I am open to suggestions, honestly I don't see the problem though. 800k isn't that much data. I have a lot of other things to do, I'd need a pretty strong reason to justify researching and migrating to a new database solution.

Compressing this data should be easy though, I'd be happy to do that. We could also reduce the interval on stage-- we don't need to store the full year's worth of performance numbers there (we do on prod).
Flags: needinfo?(wlachance)
(In reply to William Lachance (:wlach) from comment #3)
> I am open to suggestions, honestly I don't see the problem though. 800k
> isn't that much data. I have a lot of other things to do, I'd need a pretty
> strong reason to justify researching and migrating to a new database
> solution.

The problem is that yet again I'm spending time investigating problems caused by the amount of abuse we inflict on the DB.

You have to admit that 60 GB of DB data and 280 GB of binlogs (for 24 hours of logs, when most Mozilla DBs keep 10 days of logs) is somewhat farcical.

Oncall just had to be paged since this almost took down stage (97% disk usage).

I know this sounds awful, but there are times when I kind of wish perfherder were its own project and used its own DB/webheads/workers, so the rest of the project wouldn't be affected by it :-(

> Compressing this data should be easy though, I'd be happy to do that. We
> could also reduce the interval on stage-- we don't need to store the full
> year's worth of performance numbers there (we do on prod).

I have a PR for this, will attach now.

Good idea for reducing the interval; guess we can try that next.
It turns out we were accidentally storing some duplicate performance data which exacerbates this problem; filed bug 1182282 to address that.
See Also: → 1182282
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Attachment #8631816 - Flags: review?(wlachance)
Attachment #8631816 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/c4c660f277ba666d9cd01cf248a9d3fdbf643342
Bug 1182201 - Compress blobs in the performance_series table

Since they can be 800+ KB in size, leaving them uncompressed bloats
the binlogs, even though the table does not contain many rows.
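The change above amounts to compressing on write. A minimal sketch of that pattern (the helper name is hypothetical, not the actual treeherder code):

```python
import json
import zlib

def compress_series(datapoints):
    """Serialize a list of datapoints and zlib-compress the result
    before it is written to the `blob` column (hypothetical helper;
    the real treeherder code may differ)."""
    return zlib.compress(json.dumps(datapoints).encode("utf-8"))
```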
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
This has made a considerable difference actually:

-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:27 treeherder1-bin.012236
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:29 treeherder1-bin.012237
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:31 treeherder1-bin.012238
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:33 treeherder1-bin.012239
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:36 treeherder1-bin.012240
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:38 treeherder1-bin.012241
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:41 treeherder1-bin.012242
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:43 treeherder1-bin.012243
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:45 treeherder1-bin.012244
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:47 treeherder1-bin.012245
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:49 treeherder1-bin.012246
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:51 treeherder1-bin.012247
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:53 treeherder1-bin.012248
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:55 treeherder1-bin.012249
-rw-rw---- 1 mysql mysql 1.1G Jul  9 21:57 treeherder1-bin.012250
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:00 treeherder1-bin.012251
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:05 treeherder1-bin.012252
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:08 treeherder1-bin.012253
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:10 treeherder1-bin.012254
-rw-rw---- 1 mysql mysql 1.1G Jul  9 22:13 treeherder1-bin.012255
-rw-rw---- 1 mysql mysql 621M Jul  9 22:26 treeherder1-bin.012256

I deployed to stage at ~22:15 (converted to UTC+0 since that's what the times above are in).

Prior to deploy we were starting a new 1.1GB binlog chunk every 2-3 mins.

After deploy it's been 13 mins and we've only half filled the next chunk.
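A back-of-envelope check on those figures (assuming the "every 2-3 minutes" and "half a chunk in 13 minutes" observations above):

```python
# Before: one 1.1 GB binlog chunk every ~2.5 minutes.
before_mb_per_min = 1.1 * 1024 / 2.5
# After: roughly half of a 1.1 GB chunk in 13 minutes.
after_mb_per_min = (1.1 * 1024 / 2) / 13
print(before_mb_per_min / after_mb_per_min)  # roughly a 10x reduction
```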
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file Followup
Attachment #8631885 - Flags: review?(wlachance)
Attachment #8631885 - Flags: review?(wlachance) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/83fa4b4e5c388f6537531067b5f015459523ec70
Bug 1182201 - Create a utils.py helper for zlib.decompress try-except

We have several places where we need to decompress blobs, but have to
gracefully handle old data that is not compressed. Let's use a helper
function to save repeating the same pattern everywhere.
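A sketch of what such a try-except helper might look like (assumed shape; the actual treeherder utility may differ):

```python
import zlib

def decompress_if_needed(blob):
    """Decompress a performance_series blob, falling back to the raw
    bytes for legacy rows stored before compression was enabled."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        # Old, uncompressed data: return it unchanged.
        return blob
```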

https://github.com/mozilla/treeherder/commit/42be24ee66c938293c834eb457ffc1842c69be46
Bug 1182201 - Decompress performance_series blobs in a few more places

Fix a few places where we weren't decompressing the blob retrieved from
the performance_series table.
Is working fine now :-)
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Yay! Thanks for all your hard work everyone.
Blocks: 1179858