Closed Bug 1111032 Opened 9 years ago Closed 8 years ago

add caching to balrog rule evaluation

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum, Mentored)

References

Details

(Whiteboard: [lang=python])

Attachments

(1 file, 1 obsolete file)

Attached patch rule-caching.diff (obsolete) — Splinter Review
I started looking at this a bit in bug 671488, but it's less important than blob caching, and different enough to deserve its own bug. Attached is my really, really brain-dead rule caching. It's probably not what we want, but it's a starting point.

Catlee, Nick, and I talked about this yesterday too. A few notes from that:
* The attached patch has too many axes to be useful. OS Version in particular multiplies into thousands (if not tens of thousands) of values in the real world. We already only query by product, buildTarget, locale, headerArchitecture, and distribution/distVersion (https://github.com/mozilla/balrog/blob/4338af9338012201e4fc355c778305bc09e905b7/auslib/db.py#L709) - creating the cache with those values as keys might be good enough.
* Another approach could be to have a short cache on the entire set of rules, and go through the work of evaluating them for every query.

We definitely want to cache the rules we get back from the database in some way. What's still up in the air is whether we should also cache the evaluated rules (i.e., an incoming request with product, version, branch = foo, bar, blah gets release X). I think it's worth doing some profiling to see whether that's a hotspot before we go down that road.
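
For illustration, here's a minimal Python sketch of the keyed approach (not the merged patch): the cache key is the tuple of fields we already filter on, with a short TTL so rule changes still propagate quickly. query_rules_from_db is a hypothetical stand-in for the existing database query, and repoze.lru (mentioned later in this bug) is assumed for the expiring cache.

from repoze.lru import ExpiringLRUCache

# Short TTL so edits to rules show up within a minute.
rules_cache = ExpiringLRUCache(2000, default_timeout=60)

def get_matching_rules(product, buildTarget, locale, headerArchitecture,
                       distribution, distVersion):
    # Key on the same fields the existing query filters by.
    key = (product, buildTarget, locale, headerArchitecture,
           distribution, distVersion)
    rules = rules_cache.get(key)
    if rules is None:
        # Cache miss: hit the database and remember the result.
        rules = query_rules_from_db(product, buildTarget, locale,
                                    headerArchitecture, distribution,
                                    distVersion)
        rules_cache.put(key, rules)
    return rules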
Mentor: bhearsum
Whiteboard: [lang=python]
I'm going to look at doing this ASAP, because the database instance size we'd like to use in CloudOps is smaller than the scl3 one. Even with just 50% of the traffic pointed at it, the database is seeing significant load from rules table queries.
Assignee: nobody → bhearsum
Blocks: 1248741
Attachment #8768524 - Flags: feedback?(nthomas)
Attachment #8768524 - Flags: feedback?(nthomas) → feedback+
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/8d5ad72fba921a54b2a9c9a7fb36299821b0f2a6
bug 1111032: Implement rule caching with a few parts of the update query as the key (#97). r=nthomas,rail,mostlygeek
Comment on attachment 8768524 [details] [review]
simple full-table rule caching

This got some feedback and updates in the PR, and has since been merged.
Attachment #8768524 - Flags: checked-in+
We did a bunch of verification of this in CloudOps stage and on a canary node in prod. In prod, the canary node received about half the number of bytes that the other nodes did, and its load average was also about half of the others', so it's definitely working. There was a drop in CPU usage as well, which may be from decreased overhead (fewer requests to RDS), or it may be due to :mostlygeek turning off uwsgi per-request logging at the same time.

I think we can do better here still, but this is good enough for now. Possible ideas for the future:
* Fully cache the rules table instead of querying it for each unique combination of certain parts of the updateQuery (see the sketch after this list).
* Ensure that we aren't doing a bunch of simultaneous updates of the same key in the same proc. I'm not certain if one proc handles more than one request at a time or not, so this may not be the case at all. repoze.lru may also give us this locking for free if we are.
* Use a local memcache on each node. This would reduce memory usage by having a shared cache across all procs, and would also reduce the frequency of blob cache rebuilding (we currently kill procs after 5000 requests, which is every few minutes, while the blob cache will normally last for up to an hour).
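
For illustration, a rough Python sketch of the first idea above: cache the entire rules table under a single short-lived entry and evaluate every rule in-process per request. query_all_rules and rule_matches_query are hypothetical stand-ins, and repoze.lru is again assumed for the expiring cache.

from repoze.lru import ExpiringLRUCache

# One entry holding the whole table; short TTL keeps it reasonably fresh.
full_table_cache = ExpiringLRUCache(1, default_timeout=30)

def get_matching_rules(update_query):
    all_rules = full_table_cache.get("rules")
    if all_rules is None:
        all_rules = query_all_rules()  # single SELECT for the entire table
        full_table_cache.put("rules", all_rules)
    # Evaluate every cached rule against the incoming query in-process.
    return [r for r in all_rules if rule_matches_query(r, update_query)]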
> * Ensure that we aren't doing a bunch of simultaneous updates of the same key in the same proc. I'm not certain if one proc handles more than one request at a time or not, so this may not be the case at all. repoze.lru may also give us this locking for free if we are.

One request/uwsgi process. No locking required. 

On the database side, the changes have resulted in a huge reduction in CPU/outgoing traffic.

CPU     : ~30% down to ~10%
Outgoing: ~25MB/sec down to ~5MB/sec
Attachment #8535856 - Attachment is obsolete: true
I'm calling this fixed now. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1285953 as a follow-up to improve the caching.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
See Also: → 1285953
Product: Release Engineering → Release Engineering Graveyard
