Closed Bug 1079788 Opened 10 years ago Closed 7 years ago

Gather stats on json-pushes load & DB deadlocks

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/693] [treeherder])

Attachments

(1 file)

Broken out from bug 1076826.

Treeherder polls json-pushes once a minute for every hg.mozilla.org repo listed in:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/model/fixtures/repository.json
(This is an improvement over TBPL, where every client polls json-pushes directly)

We've been seeing a number of timeouts polling json-pushes, which we believe have increased since we switched from calls of the form:
  json-pushes?full=1&startID=N
to:
  json-pushes?full=1&fromchange=SHA123
(The latter request means an additional join: https://hg.mozilla.org/hgcustom/version-control-tools/file/default/hgext/pushlog-legacy/pushlog-feed.py#l102)
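For illustration, here is a minimal sketch of the two request styles against the pushlog endpoint. The requests-based client below is hypothetical, not Treeherder's actual ingestion code; only the URL and parameter names come from the requests quoted above.

  import requests

  BASE = "https://hg.mozilla.org/mozilla-central/json-pushes"

  def poll_by_start_id(last_push_id):
      # Old style: page forward by numeric push ID.
      resp = requests.get(BASE, params={"full": 1, "startID": last_push_id}, timeout=30)
      resp.raise_for_status()
      return resp.json()

  def poll_by_fromchange(last_seen_sha):
      # New style: ask for everything pushed after a known changeset SHA
      # (this is the query that requires the extra join on the server side).
      resp = requests.get(BASE, params={"full": 1, "fromchange": last_seen_sha}, timeout=30)
      resp.raise_for_status()
      return resp.json()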

We'd like to try to reduce the number of timeouts. To do this, it would be helpful to know:

1) What is the breakdown of requests to json-pushes by referrer/UA/params? We don't know exactly who polls it (outside of TBPL's UI & treeherder's backend) or what their respective impact is. I also don't know what impact releng's hgpollers have on it (or whether they even use json-pushes rather than the raw Hg repo).
-> Knowing this will tell us whether things like bug 1079784 are worth doing.

2) Are there any deadlocks occurring in the pushlog DB?

We'd be interested in perhaps the last few days' worth of logs (a week, if it's not a pain), particularly for:
hg.mozilla.org/mozilla-central/json-pushes
hg.mozilla.org/integration/mozilla-inbound/json-pushes
hg.mozilla.org/integration/b2g-inbound/json-pushes
hg.mozilla.org/integration/fx-team/json-pushes
hg.mozilla.org/try/json-pushes
FWIW, full=1 will change the request from "fully serviced by SQLite" to "pull in information from Mercurial as well." This will increase latencies. Whether Mercurial or SQLite is responsible for the increased latency is still unknown.
Unfortunately we need the commit message and author for each commit, which is only available with full=1.
Bug 1053567 is related.
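For context, a hedged sketch of pulling those full=1 fields out of a json-pushes response. The field names follow the legacy pushlog output (a dict keyed by push ID, each push carrying a list of changeset dicts); treat the exact response shape as an assumption.

  def summarize_pushes(pushes_json):
      # pushes_json is assumed to be the dict returned by json-pushes?full=1.
      summaries = []
      for push_id, push in pushes_json.items():
          for cset in push["changesets"]:
              summaries.append({
                  "push_id": push_id,
                  "node": cset["node"],      # 40-char changeset SHA
                  "author": cset["author"],  # only present with full=1
                  "desc": cset["desc"],      # commit message; also full=1 only
              })
      return summaries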

I've been campaigning for storing our version control data in a giant database. We'll likely get that for headless try. See bug 910040 comment 26 for what I'd like to build. It's likely pushlog would be some of the first data to go in there after headless try.
:gps that sounds VERY exciting from a treeherder perspective. As :edmorley reported, the current pushlog service is timing out very often. Is there a way we can +1 bug 910040?
(In reply to Mauro Doglio [:mdoglio] from comment #5)
> :gps that sounds VERY exciting from a treeherder perspective. As :edmorley
> reported, the current pushlog service is timing out very often. Is there a
> way we can +1 bug 910040?

Voice your enthusiasm to Laura Thomson. But migrating pushlog to this database is likely lower priority than the current goals of standing up ReviewBoard and making Try scale. Fortunately, getting the database is part of headless try, so moving pushlog into the db should be the next natural domino to fall.

If your issue is that you need to extract some basic data from Mercurial, I'd suggest pulling the repositories locally and extracting data from an offline copy. I know you'd prefer to fetch it from a web service, but it isn't too much work to index repositories. See code at https://bitbucket.org/indygreg/moz-tree-utopia/src/c7cc829ce37cbe6c36221e6aa7d2dbebb8ec4d5e/repodata/hgrepo.py?at=default#cl-34 for an example; https://bitbucket.org/indygreg/moz-tree-utopia/src/c7cc829ce37cbe6c36221e6aa7d2dbebb8ec4d5e/repodata/hgrepo.py?at=default#cl-298 is likely most relevant. Mercurial 3.2 (to be released around Nov 1) allows you to tack -Tjson onto any command to have it dump JSON output instead of human-optimized output. This adds more complexity to your systems, but it might be a worthwhile workaround until the server is in a happier state.
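As a rough sketch of that local-clone approach (the repo path and revset are illustrative placeholders, and this assumes Mercurial 3.2+ so that -Tjson is available):

  import json
  import subprocess

  def log_since(repo_path, last_seen_sha):
      # Read commit metadata newer than last_seen_sha from a local clone
      # using `hg log -Tjson`. The revset is an illustrative guess at
      # "descendants of the last seen changeset, excluding it".
      out = subprocess.check_output([
          "hg", "log",
          "--cwd", repo_path,
          "-r", "descendants(%s) - %s" % (last_seen_sha, last_seen_sha),
          "-Tjson",
      ])
      return json.loads(out)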
(In reply to Gregory Szorc [:gps] from comment #6)
> If your issue is you need to extract some basic data from Mercurial, I'd
> suggest pulling the repositories locally and extracting data from an offline
> copy. I know you'd prefer to fetch it from a web service. But it isn't too
> much work to index repositories.

We already request the pushlog data only once and then store it in treeherder's DB. When we say polling, we mean "polling for new pushes using &fromchange=LAST_SEEN_SHA", not making duplicate queries.
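A minimal sketch of that single-fetch pattern. The load/save helpers are hypothetical stand-ins for treeherder's own DB accessors; this is not the real ingestion code.

  import requests

  def poll_new_pushes(repo_url, load_last_sha, save_last_sha):
      # Fetch only pushes newer than the last changeset we have stored.
      last_sha = load_last_sha()
      resp = requests.get(
          "%s/json-pushes" % repo_url,
          params={"full": 1, "fromchange": last_sha},
          timeout=30,
      )
      resp.raise_for_status()
      pushes = resp.json()
      if pushes:
          # Remember the tip changeset of the newest push for the next poll.
          newest = pushes[max(pushes, key=int)]
          save_last_sha(newest["changesets"][-1]["node"])
      return pushes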
Whiteboard: [treeherder] → [kanban:engops:https://kanbanize.com/ctrl_board/6/212] [treeherder]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/212] [treeherder] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/693] [kanban:engops:https://kanbanize.com/ctrl_board/6/212] [treeherder]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/693] [kanban:engops:https://kanbanize.com/ctrl_board/6/212] [treeherder] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/693] [treeherder]
Blocks: 1076750
No longer blocks: 1076826
QA Contact: hwine → klibby
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE