Closed
Bug 849345
Opened 11 years ago
Closed 9 years ago
Design schema for MySQL pushlog database
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Developer Services
Mercurial: hg.mozilla.org
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: ted, Unassigned)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] )
We'll need a schema for a MySQL pushlog db. The current single-repository sqlite schema is here: http://hg.mozilla.org/hgcustom/hghooks/file/tip/mozhghooks/pushlog.py#l40
Comment 1•11 years ago
|
||
One thing that we're exposing in the pushlog API right now is that per-repo push ID. It'd be good if that would continue to work as is, at least for a transition period.
Comment 2•11 years ago
|
||
As we move forward, we're more likely to be interested in other attributes of the cset, such as branch or bookmark. While hg bookmarks can be retrieved remotely, branches can not. Since we're redoing the schema, seems a good time to add (at least) hg branch to the schema. Bonus points for adding the bookmarks (but note that is plural in general, fine to just store/report the active one, imo). If we do add branch & bookmark, we'll have to pick an arbitrary maximum length for MySQL (based on Sheeri's comments in bug 827123). Please coordinate with RelEng on picking something appropriate (as much of the automation generates the longer strings)
Comment 3•11 years ago
|
||
Hal's comment may cover this already, but there are some pushlog operations which are slow because they need to query the repo to get more information. Bonus points for stuffing that into the db instead. I can't remember any examples right now but hopefully Ted/Axel/others might.
Comment 4•11 years ago
|
||
I only use what's currently in the json version of pushes, "2240": { "date": 1362670203, "changesets": [ "b859178117e3f70abb4e7e0a5c8b39eb9115577c", "1295953a4b7b93e3ead2fb22f8bff4c9b67a6a2d" ], "user": "ffxbld" } the key "2240" being the push id per repo. I don't think I use that actual data for anything but keeping track which pushes I already got for an individual repo. All the details I need beyond that are added by inspecting a local clone of the repo, which I update once I get a push.
Reporter | ||
Comment 5•11 years ago
|
||
How much data do we really need out of the repository? Essentially, as you can see from the schema linked in comment 0, the pushlog db stores a pretty minimal set of information: a push has an id, the user who pushed it, the timestamp, and the changeset ids that were included in this push. Not included is any of the metadata from the changeset: the changeset author, description, branch/tag info and files changed.
Comment 6•11 years ago
|
||
Since we're building for the future here, can we check with the tbpl2 folks to see if they need something to avoid expensive queries?
Comment 8•11 years ago
|
||
(In reply to Hal Wine [:hwine] from comment #6) > Since we're building for the future here, can we check with the tbpl2 folks > to see if they need something to avoid expensive queries? I can't think of anything, but jeads might.
Flags: needinfo?(jeads)
Comment 9•11 years ago
|
||
Can we also do proactive query reviews on dev/stage to periodically check for expensive queries, so we can catch them before they go into production?
Comment 10•11 years ago
|
||
Ted - one other item dawned on me. It's highly likely that we'll have some repositories on git.m.o being committed to within the next year. That will require that git.m.o support pushlog. With per repo pushlog databases, that didn't seem like a major hurdle. I'm not sure what the right answer will be - one grand pushlog service, or a per-repo-type pushlog service, or something else. And, I'm not suggesting making those decisions is a blocker to this work. Rather, it may influence some design tradeoffs in this new schema.
Reporter | ||
Comment 11•11 years ago
|
||
Hal: is there a bug on supporting a git pushlog? That's the first I've heard of it.
Comment 12•11 years ago
|
||
The expensive query part comes from looking at the pushlog joined with test data over long time courses. We've been storing json returned by json-pushes in this mysql schema, https://github.com/mozilla/datazilla/blob/master/datazilla/model/sql/template_schema/schema_hgmozilla.sql.tmpl#L95, to analyze talos data trends over time. For tbpl2 we would like to store either the push id or relevant timestamps directly with test data so we can use it to order test data according to push order when we retrieve it. That way we avoid the expensive part of the join altogether, it would enable us to analyze data over longer time courses. To do this we will likely need to retrieve the push id or timestamp for a particular changeset when we receive/load test data for storage in tbpl2. I don't think this will require anything different in the pushlog schema design but it will make sense to change the data ingestion protocol in tbpl2 to make use of a non-polling strategy, we're hoping to utilize http://pulse.mozilla.org/ to accomplish this. If there's going to be a git.m.o pushlog in the future it would be great if it could be encapsulated under a single web service for downstream consumers.
Flags: needinfo?(jeads)
Assignee | ||
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Assignee | ||
Updated•10 years ago
|
Product: Release Engineering → Developer Services
Updated•10 years ago
|
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/112]
Updated•10 years ago
|
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/112] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] [kanban:engops:https://kanbanize.com/ctrl_board/6/112]
Assignee | ||
Updated•10 years ago
|
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] [kanban:engops:https://kanbanize.com/ctrl_board/6/112] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738]
Comment 13•9 years ago
|
||
There are no plans to move pushlog off SQLite at this time.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•