849345 - Design schema for MySQL pushlog database

Reporter

Description

•

11 years ago

We'll need a schema for a MySQL pushlog db. The current single-repository sqlite schema is here:
http://hg.mozilla.org/hgcustom/hghooks/file/tip/mozhghooks/pushlog.py#l40

Axel Hecht [:Pike]

Comment 1

•

11 years ago

One thing that we're exposing in the pushlog API right now is that per-repo push ID.

It'd be good if that would continue to work as is, at least for a transition period.

Hal Wine [:hwine] use NI!

Comment 2

•

11 years ago

As we move forward, we're more likely to be interested in other attributes of the cset, such as branch or bookmark. While hg bookmarks can be retrieved remotely, branches can not.

Since we're redoing the schema, seems a good time to add (at least) hg branch to the schema. Bonus points for adding the bookmarks (but note that is plural in general, fine to just store/report the active one, imo).

If we do add branch & bookmark, we'll have to pick an arbitrary maximum length for MySQL (based on Sheeri's comments in bug 827123). Please coordinate with RelEng on picking something appropriate (as much of the automation generates the longer strings)

Nick Thomas [:nthomas] (UTC+12)

Comment 3

•

11 years ago

Hal's comment may cover this already, but there are some pushlog operations which are slow because they need to query the repo to get more information. Bonus points for stuffing that into the db instead. I can't remember any examples right now but hopefully Ted/Axel/others might.

Axel Hecht [:Pike]

Comment 4

•

11 years ago

I only use what's currently in the json version of pushes,

 "2240": {
  "date": 1362670203, 
  "changesets": [
   "b859178117e3f70abb4e7e0a5c8b39eb9115577c", 
   "1295953a4b7b93e3ead2fb22f8bff4c9b67a6a2d"
  ], 
  "user": "ffxbld"
 }

the key "2240" being the push id per repo. I don't think I use that actual data for anything but keeping track which pushes I already got for an individual repo.

All the details I need beyond that are added by inspecting a local clone of the repo, which I update once I get a push.

(not currently active) Ted Mielczarek

Reporter

Comment 5

•

11 years ago

How much data do we really need out of the repository? Essentially, as you can see from the schema linked in comment 0, the pushlog db stores a pretty minimal set of information: a push has an id, the user who pushed it, the timestamp, and the changeset ids that were included in this push. Not included is any of the metadata from the changeset: the changeset author, description, branch/tag info and files changed.

Hal Wine [:hwine] use NI!

Comment 6

•

11 years ago

Since we're building for the future here, can we check with the tbpl2 folks to see if they need something to avoid expensive queries?

(not currently active) Ted Mielczarek

Reporter

Comment 7

•

11 years ago

Ed: can you speak to comment 6?

Ed Morley [:emorley]

Comment 8

•

11 years ago

(In reply to Hal Wine [:hwine] from comment #6)
> Since we're building for the future here, can we check with the tbpl2 folks
> to see if they need something to avoid expensive queries?

I can't think of anything, but jeads might.

Flags: needinfo?(jeads)

Sheeri Cabral [:sheeri]

Comment 9

•

11 years ago

Can we also do proactive query reviews on dev/stage to periodically check for expensive queries, so we can catch them before they go into production?

Hal Wine [:hwine] use NI!

Comment 10

•

11 years ago

Ted - one other item dawned on me. It's highly likely that we'll have some repositories on git.m.o being committed to within the next year. That will require that git.m.o support pushlog. With per repo pushlog databases, that didn't seem like a major hurdle.

I'm not sure what the right answer will be - one grand pushlog service, or a per-repo-type pushlog service, or something else. And, I'm not suggesting making those decisions is a blocker to this work. Rather, it may influence some design tradeoffs in this new schema.

(not currently active) Ted Mielczarek

Reporter

Comment 11

•

11 years ago

Hal: is there a bug on supporting a git pushlog? That's the first I've heard of it.

Jonathan Eads ( :jeads )

Comment 12

•

11 years ago

The expensive query part comes from looking at the pushlog joined with test data over long time courses. We've been storing json returned by json-pushes in this mysql schema, https://github.com/mozilla/datazilla/blob/master/datazilla/model/sql/template_schema/schema_hgmozilla.sql.tmpl#L95, to analyze talos data trends over time. For tbpl2 we would like to store either the push id or relevant timestamps directly with test data so we can use it to order test data according to push order when we retrieve it. That way we avoid the expensive part of the join altogether, it would enable us to analyze data over longer time courses. To do this we will likely need to retrieve the push id or timestamp for a particular changeset when we receive/load test data for storage in tbpl2. I don't think this will require anything different in the pushlog schema design but it will make sense to change the data ingestion protocol in tbpl2 to make use of a non-polling strategy, we're hoping to utilize http://pulse.mozilla.org/ to accomplish this.

If there's going to be a git.m.o pushlog in the future it would be great if it could be encapsulated under a single web service for downstream consumers.

Flags: needinfo?(jeads)

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Product: Release Engineering → Developer Services

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/112]

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/112] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] [kanban:engops:https://kanbanize.com/ctrl_board/6/112]

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] [kanban:engops:https://kanbanize.com/ctrl_board/6/112] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738]

Gregory Szorc [:gps]

Comment 13

•

9 years ago

There are no plans to move pushlog off SQLite at this time.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WONTFIX

Bugzilla

Quick Search

Design schema for MySQL pushlog database

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Tracking

(Not tracked)

People

(Reporter: ted, Unassigned)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/738] )

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Updated

Updated

Updated

Updated

Updated

Comment 13