1053567 - Pushlog should be self sufficient

Reporter

Description

•

11 years ago

Currently, the pushlog database content is such that any request to the pushlog requires both a query to the database and one or more queries to the Mercurial repository. Considering that all buildbot masters poll the pushlog, as does every user of tbpl and possibly many other things, it would be better if the pushlog didn't have to rely on the Mercurial repository at all.

Ed Morley [:emorley]

Updated

•

11 years ago

Blocks: 770811

(not currently active) Ted Mielczarek

Comment 1

•

11 years ago

This actually isn't entirely true. For buildbot's purposes the queries only hit the sqlite db. Buildbot (last I knew) is polling the json-pushes endpoint:
http://hg.mozilla.org/mozilla-central/json-pushes
...which, unless you specify full=1, only does a database query:
http://hg.mozilla.org/hgcustom/pushlog/file/0ffe19e2e343/pushlog-feed.py#l442

Now TBPL is almost certainly polling with full=1, since it wants changeset author information. We didn't store the full changeset information in the pushlog db since it seemed inefficient at the time, duplicating the data that's already in hg.
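For illustration only, the two kinds of request described in this comment could be built as in the sketch below. The helper name `json_pushes_url` is hypothetical; only the `json-pushes` endpoint and the `full=1` parameter come from the comment itself, and nothing here is taken from buildbot or TBPL code.

```python
from urllib.parse import urlencode

# Endpoint quoted in the comment above.
BASE = "http://hg.mozilla.org/mozilla-central/json-pushes"


def json_pushes_url(base=BASE, full=False):
    """Build a json-pushes URL (hypothetical helper, for illustration).

    Without full=1 the request is described above as a database-only query;
    with full=1 the server also needs changeset details from Mercurial.
    """
    params = {}
    if full:
        params["full"] = 1
    query = urlencode(params)
    return f"{base}?{query}" if query else base
```

A buildbot-style poller would use the plain URL, while a client that wants author information would pass `full=True`.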

Gregory Szorc [:gps]

Comment 2

•

11 years ago

I suspect high pushlog query latencies have more to do with overall server load and the fundamental design of pushlog's storage than interaction with Mercurial. It is not very expensive to open a repository and grab metadata from a single changeset, especially if the hgweb context already has a handle on the repository. Also, our current approach to pushlog has us opening and closing a SQLite database with every HTTP request. That's extremely inefficient and is likely contributing a non-trivial amount of overhead to pushlog serving. I wouldn't be surprised if moving pushlog to revlog/Mercurial storage actually decreased work. Until I see Python profiling proving this is a problem, I'm inclined to WONTFIX.
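As a rough sketch of the per-request open/close concern, the class below keeps one SQLite connection open and reuses it across queries instead of reconnecting each time. `PushlogStore`, its methods, and the `pushlog` table queried here are assumptions made for illustration; this is not the pushlog extension's code, and it ignores thread-safety and hgweb lifecycle questions that a real change would have to address.

```python
import sqlite3


class PushlogStore:
    """Hypothetical wrapper that opens the SQLite database once and reuses it."""

    def __init__(self, path):
        self.path = path
        self._conn = None
        self.opens = 0  # how many times the database file was actually opened

    def connection(self):
        # Open lazily on first use, then hand back the same connection.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
            self.opens += 1
        return self._conn

    def push_count(self):
        # Assumes a table named "pushlog" exists (illustrative schema only).
        row = self.connection().execute("SELECT COUNT(*) FROM pushlog").fetchone()
        return row[0]

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Whether this would measurably help is exactly what the profiling requested above would need to show.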

(dormant account)

Updated

•

11 years ago

Blocks: 1055298

(dormant account)

Comment 3

•

11 years ago

We need this to migrate to HEADlessTry. Pushlog needs to have a full URL to the newfangled headless URLs because there won't be a repository to query. E.g. something like http://hgwebserver.com/hg/<mozilla-central>/<bundle-id>/tip will be needed for HEADlessTry. We'll be running old try with existing URLs in parallel while the new system stabilizes.

(not currently active) Ted Mielczarek

Comment 4

•

11 years ago

If you don't have a repo, I'm not sure the pushlog buys you anything honestly. I would just architect headlesstry without the current pushlog unless you find you need it somewhere. pushlog solves the problem of "who pushed what to this repo?". If you're using bundles I think you can answer that question just by querying hg against the bundle (and you'll have the username from whatever API is used to transmit the bundle, presumably).

Gregory Szorc [:gps]

Comment 5

•

11 years ago

Support the same URL->JSON API as the existing pushlog and you should be fine. Some mod_rewrite magic on hg.mozilla.org can go a long way towards not having to change TBPL, Treeherder, and other clients. But that would require a hard flag day as opposed to a gradual transition.

Gregory Szorc [:gps]

Comment 6

•

11 years ago

Or you could make the "upload a bundle" operation shoehorn through the Mercurial SSH connection established during push. It just wouldn't use the existing Mercurial bundle transfer mechanism. UX from user perspective wouldn't change - but the underlying "push" mechanism sure would!

Mike Hommey [:glandium]

Reporter

Comment 7

•

11 years ago

Except that, to keep the json urls tbpl and other clients use working, you need a database of what changeset is in what bundle; then, when the url is accessed, you need to pull the bundle, set up hg to use it, and query there. That seems like a whole lot more work on the hg server side than keeping a pushlog with more data. Sure, that would be needed for other non-json urls too, although for those we don't actually need to keep track of what changeset is in what bundle.

Sure, we could also change tbpl, treeherder, and every other thing that uses the pushlog. But then we'd have a lot more dependencies to make headless try work.
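To make the "database of what changeset is in what bundle" concrete, here is a minimal sketch of such an index using SQLite. The table name `bundle_index`, its columns, and the three helper functions are all hypothetical; nothing like this is described as existing on hg.mozilla.org, and the sketch covers only the lookup, not pulling a bundle or pointing hg at it.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (and create if needed) a hypothetical changeset-to-bundle index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundle_index ("
        "node TEXT PRIMARY KEY, bundle_id TEXT NOT NULL)"
    )
    return conn


def record_bundle(conn, bundle_id, nodes):
    """Record that every changeset hash in `nodes` lives in `bundle_id`."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO bundle_index (node, bundle_id) VALUES (?, ?)",
            [(node, bundle_id) for node in nodes],
        )


def bundle_for(conn, node):
    """Return the bundle id holding `node`, or None if it is unknown."""
    row = conn.execute(
        "SELECT bundle_id FROM bundle_index WHERE node = ?", (node,)
    ).fetchone()
    return row[0] if row else None
```

Even with this lookup in place, the server would still have to fetch the bundle and query it on every json request, which is the extra work the comment is pointing at.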

Ed Morley [:emorley]

Comment 8

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #7)
> Sure, we could also change tbpl, treeherder, and every other thing that uses
> the pushlog. But then we'd have a lot more dependencies to make headless try
> work.

Note that TBPL will be going away in the not so distant future, and Treeherder is much more flexible in terms of data model, data ingestion & the API it makes available. As such, tools that currently consume the pushlog could possibly use treeherder's API, which will also allow them to use associated results (and post regex job names) rather than each having to roll their own. E.g. a bisect-in-the-cloud tool can avoid known busted pushes.

(dormant account)

Comment 9

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> If you don't have a repo, I'm not sure the pushlog buys you anything
> honestly. I would just architect headlesstry without the current pushlog
> unless you find you need it somewhere. pushlog solves the problem of "who
> pushed what to this repo?". If you're using bundles I think you can answer
> that question just by querying hg against the bundle (and you'll have the
> username from whatever API is used to transmit the bundle, presumably).

Headless try is a new 'virtual' repo. Pushes into it should be shown alongside existing try pushlog entries. I don't want to force people to look at a new tbpl. This is not within scope of 'fix try'. I think 'bloating' existing entries is a good way to go.

Gregory Szorc [:gps]

Comment 10

•

11 years ago

I think the easiest solution is to pick a flag day and repurpose the https://hg.mozilla.org/try/* URL space to employ magic. Over time, we can convert downstream consumers to something more robust. I believe this solution falls in the spirit of this bug tree (a quick fix to solve Mozilla's immediate needs). mpm has stated he wants Mercurial to scale to tens of thousands of heads and this work is doable: it just requires investing in Mercurial core development. I can make a strong argument that operating a real multi-headed repository is the right thing to do long term. That work would not preclude S3-backed storage of repository data.

Gregory Szorc [:gps]

Updated

•

11 years ago

Depends on: 1055795

(dormant account)

Comment 11

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #10)
> I think the easiest solution is to pick a flag day and repurpose the
> https://hg.mozilla.org/try/* URL space to employ magic. Over time, we can
> convert downstream consumers to something more robust. I believe this
> solution falls in the spirit of this bug tree (a quick fix to solve
> Mozilla's immediate needs).

We can play Apache tricks to make http://hg.mozilla.org/try/json-pushes?full=1 whatever we want... but I think for now we should support logging events that are not caused by a push (e.g. 'synthetic' pushes to headlesstry). In the spirit of focusing on fixing try quickly, this seems like the simplest way to proceed to get headlesstry going and tied into our existing infrastructure.

> mpm has stated he wants Mercurial to scale to tens of thousands of heads and
> this work is doable: it just requires investing in Mercurial core
> development. I can make a strong argument that operating a real multi-headed
> repository is the right thing to do long term. That work would not preclude
> S3-backed storage of repository data.

This is off-topic in this bug.

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Product: Webtools → Developer Services

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/43]

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/43] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/892] [kanban:engops:https://kanbanize.com/ctrl_board/6/43]

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/892] [kanban:engops:https://kanbanize.com/ctrl_board/6/43] [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/897] [kanban:engops:https://kanbanize.com/ctrl_board/6/43] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/897] [kanban:engops:https://kanbanize.com/ctrl_board/6/43] [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/896] [kanban:engops:https://kanbanize.com/ctrl_board/6/43]

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/896] [kanban:engops:https://kanbanize.com/ctrl_board/6/43] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/896]

Gregory Szorc [:gps]

Comment 12

•

10 years ago

I was able to make headless repos work without making pushlog self sufficient. Removing blocker.

No longer blocks: 1055298

Gregory Szorc [:gps]

Comment 13

•

10 years ago

There are no plans to perform the work as requested. Instead, we're planning to adapt pushlog and downstream consumers to work with headless repositories.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WONTFIX

Categories

(Developer Services :: Mercurial: Pushlog, defect)

Tracking

(Not tracked)

People

(Reporter: glandium, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/896] )

Security

(public)
