Closed Bug 1065567 (opened 6 years ago, closed 3 years ago)

Use Pulse for creation of hg resultsets

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: camd)

Attachments

(2 files)

Broken out from bug 1048043.

If/when bug 1022701 is fixed, we should see if it's viable to use pulse to detect new pushes, rather than having to poll json-pushes.

We may still want to poll json-pushes periodically to catch up any missed pulse notifications, but it would at least mean we can ingest pushes more quickly in the majority of the time.
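Ingesting from Pulse while still polling json-pushes as a catch-up means the same push can arrive twice, so ingestion has to be idempotent. Here is a minimal sketch, assuming each push is uniquely identified by its repository plus its pushlog push ID; `Push`, `PushStore`, and `ingest` are hypothetical names used only for illustration, not Treeherder code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Push:
    repo: str
    push_id: int
    revision: str


@dataclass
class PushStore:
    # Hypothetical in-memory store keyed by (repo, push_id).
    pushes: dict = field(default_factory=dict)

    def ingest(self, push: Push, source: str) -> bool:
        """Store a push once; return False if it was already ingested."""
        key = (push.repo, push.push_id)
        if key in self.pushes:
            # Already seen via the other channel (Pulse or polling): skip it.
            return False
        self.pushes[key] = (push, source)
        return True


store = PushStore()
p = Push("mozilla-central", 101, "abcdef123456")
print(store.ingest(p, "pulse"))             # True
print(store.ingest(p, "json-pushes poll"))  # False
```

Whichever channel delivers a push first wins; the other becomes a no-op, so the poll can safely overlap with Pulse.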
Component: Treeherder → Treeherder: Data Ingestion
Priority: P4 → P5
Blocks: 1084493
Duplicate of this bug: 1266559
Depends on: 1196915
Ahh shoot, I didn't see this one when I searched.  Thanks for clearing it up.  :)
Assignee: nobody → cdawson
Removing the dependency on the old bug because we don't need it fixed to move forward on this
No longer depends on: 1022701
Summary: Use pulse for quicker pushlog ingestion rather than polling json-pushes → Use Pulse for creation of hg resultsets
Comment on attachment 8861190 [details] [review]
[treeherder] mozilla:hg-pulse-resultsets > mozilla:master

Hey Buddy-- I rebased this old branch; it seems like time to enable this.

After this is rolled out and working I will submit a PR for Bug 1359246 to remove the celery beat and ingestion mechanism for the old way.
Attachment #8861190 - Flags: review?(emorley)
Blocks: 1359246
When deploying this, add this to PULSE_RESULTSET_SOURCES:

        {
            "exchange": "exchange/hgpushes/v1",
            "routing_keys": [
                "#"
            ]
        }

After deploy, wait a few minutes to be 100% sure the celery beat and pulse mechanisms are overlapping.  Then turn off the celerybeat pushlog mechanism by changing the dyno setting for ``worker_pushlog`` to 0 dynos.
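The `PULSE_RESULTSET_SOURCES` value holds exchange/routing-key objects like the one above; `#` is the AMQP topic-exchange wildcard that matches every routing key. Here is a minimal sketch for appending the hg exchange to an existing value without duplicating it, assuming the variable holds a JSON array of such objects; `add_source` is a hypothetical helper, not part of Treeherder.

```python
import json
import os

HG_SOURCE = {"exchange": "exchange/hgpushes/v1", "routing_keys": ["#"]}


def add_source(raw: str, source: dict) -> str:
    """Return the JSON list with `source` appended unless its exchange is already present."""
    sources = json.loads(raw) if raw.strip() else []
    if not any(s.get("exchange") == source["exchange"] for s in sources):
        sources.append(source)
    return json.dumps(sources)


# Read the current value (empty list if unset) and print the updated setting.
current = os.environ.get("PULSE_RESULTSET_SOURCES", "[]")
print(add_source(current, HG_SOURCE))
```

Making the helper idempotent means re-running it during a deploy cannot subscribe to the same exchange twice.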
Comment on attachment 8861190 [details] [review]
[treeherder] mozilla:hg-pulse-resultsets > mozilla:master

I've left some comments on the PR :-)
Attachment #8861190 - Flags: review?(emorley)
Comment on attachment 8861190 [details] [review]
[treeherder] mozilla:hg-pulse-resultsets > mozilla:master

Thanks for catching those items.  Apologies that I didn't.  I guess I rushed it.
Attachment #8861190 - Flags: review?(emorley)
Comment on attachment 8861190 [details] [review]
[treeherder] mozilla:hg-pulse-resultsets > mozilla:master

Awesome! :-)
Attachment #8861190 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/23c40414c0c9dec6160628e3e577d076db642128
Bug 1065567 - Add ability to ingest Mercurial Pushes via Pulse (#2420)

This uses the same mechanism we use for ingesting GitHub pushes.
This adds an additional Transformer for HG pushes, and requires
adding the Pulse exchange of ``exchange/hgpushes/v1`` to the
existing PULSE_RESULTSET_SOURCES environment variable.
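The commit above adds a Transformer for HG pushes. As a rough illustration of what such a transformer has to do, here is a minimal sketch, assuming the hgpushes/v1 message body has a `payload` object containing `repo_url` and a `pushlog_pushes` list whose entries carry `pushid`, `user`, `time`, and `push_full_json_url` (check these field names against the hg.mozilla.org notification documentation). `HgPushTransformer`, `PushRequest`, and `push_fetch_requests` are hypothetical names, not Treeherder's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PushRequest:
    repo_url: str
    push_id: int
    user: str
    time: int
    fetch_url: str  # where to fetch the full changeset list for this push


class HgPushTransformer:
    """Hypothetical sketch of a transformer for hgpushes/v1 Pulse messages."""

    def __init__(self, message_body: dict):
        self.message_body = message_body

    def push_fetch_requests(self) -> List[PushRequest]:
        # Assumed body shape: {"payload": {"repo_url": ..., "pushlog_pushes": [...]}}
        payload = self.message_body["payload"]
        repo_url = payload["repo_url"]
        return [
            PushRequest(
                repo_url=repo_url,
                push_id=int(push["pushid"]),
                user=push["user"],
                time=int(push["time"]),
                fetch_url=push["push_full_json_url"],
            )
            for push in payload.get("pushlog_pushes", [])
        ]
```

Each request would then be resolved by fetching `fetch_url` and storing the resulting revisions, in the same way a polled json-pushes response is handled.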
The next PR will remove the proc and code for the old hg push ingestion mechanism and unify the Push ingestion for ``ingest_push`` and ``resultset_loader``.
Stage is showing some KeyError exceptions:
https://rpm.newrelic.com/accounts/677903/applications/14179733/filterable_errors#/show/c2ab2c9e-2f50-11e7-931c-0242ac110012_0_4526/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=nu64vc

File "/app/treeherder/etl/tasks/pulse_tasks.py", line 30, in store_pulse_resultsets
File "/app/treeherder/etl/resultset_loader.py", line 35, in process
exceptions:KeyError: 'details'

I also got a `Queue total messages alarm: treeherder-stage pushlog`, which I'm guessing is due to turning down the pushlog dyno count to zero on stage. Before we do that, we have to stop celery from scheduling pushlog tasks; otherwise it causes queue alerts :-)
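The traceback shows the loader indexing `'details'`, presumably a key it expected from the GitHub-shaped messages it already handled. One way to avoid this class of error is to pick a handler per exchange before touching any source-specific keys. Here is a minimal sketch under that assumption; the handler functions, the `HANDLERS` mapping, and the `exchange/github-example/v1` name are hypothetical placeholders, not Treeherder's code.

```python
from typing import Callable, Dict


def handle_github(body: dict) -> str:
    # Hypothetical: only GitHub-shaped bodies are assumed to have "details".
    details = body["details"]
    return f"github:{details.get('repo', '?')}"


def handle_hg(body: dict) -> str:
    # Hypothetical: hg bodies carry "payload" instead of "details".
    return f"hg:{body['payload']['repo_url']}"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "exchange/github-example/v1": handle_github,  # placeholder exchange name
    "exchange/hgpushes/v1": handle_hg,
}


def process(exchange: str, body: dict) -> str:
    """Dispatch on exchange so hg messages never reach GitHub-specific code."""
    try:
        handler = HANDLERS[exchange]
    except KeyError:
        raise ValueError(f"no handler configured for exchange {exchange!r}") from None
    return handler(body)
```

Routing by exchange also turns an unconfigured source into an explicit error instead of a confusing `KeyError` deep inside the loader.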
(In reply to Cameron Dawson [:camd] from comment #14)
> The next PR will remove the proc and code for the old hg push ingestion
> mechanism and unify the Push ingestion for ``ingest_push`` and
> ``resultset_loader``.

Pulse is not a reliable delivery channel. hg.mozilla.org is robust enough to guarantee at-least-once delivery to Pulse, but Pulse itself can lose messages. So unless you are OK with data loss in the event that Pulse suffers a failure (which happened a few months ago), you should continue to periodically poll the pushlog to make sure you didn't miss any messages. However, the polling frequency can be significantly reduced (to, say, every 5 minutes) because Pulse works 99.9+% of the time.

If you want to get out of the polling pushlog game completely, we could potentially look into a durable SNS topic or similar. The notification mechanism on hg.mozilla.org can ensure at-least-once delivery, and it can write to pretty much anything (including directly to Treeherder if we wanted to go that route).
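Because Pulse can drop messages, the backstop poll needs to work out which pushes were missed. If pushlog push IDs are sequential integers per repository (an assumption here), a jump between the last stored ID and a newly received one identifies exactly what to fetch. Here is a minimal sketch, assuming json-pushes accepts `version=2` plus `startID` (exclusive) and `endID` (inclusive) query parameters; verify these against the hg.mozilla.org pushlog documentation. `missing_range` and `backfill_url` are hypothetical helpers.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def missing_range(last_seen: int, received: int) -> Optional[Tuple[int, int]]:
    """Return (start_exclusive, end_inclusive) push IDs skipped before `received`, or None."""
    if received <= last_seen + 1:
        return None  # No gap (or a duplicate/older push delivered again).
    return last_seen, received - 1


def backfill_url(repo_url: str, start_exclusive: int, end_inclusive: int) -> str:
    # Assumed json-pushes semantics: returns pushes with startID < id <= endID.
    query = urlencode({"version": 2, "startID": start_exclusive, "endID": end_inclusive})
    return f"{repo_url.rstrip('/')}/json-pushes?{query}"


gap = missing_range(last_seen=100, received=104)
print(gap)  # (100, 103)
if gap:
    print(backfill_url("https://hg.mozilla.org/mozilla-central", *gap))
```

Gap detection lets most missed messages be recovered as soon as the next Pulse message arrives, with the periodic poll covering the case where nothing arrives at all.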
Depends on: 1361518
Hey Cam, I was looking at this in the context of the missing resultsets yesterday. Should we resolve this bug? Per comment 16 it doesn't sound like we want to disable pushlog polling any time soon.
Flags: needinfo?(cdawson)
Comment on attachment 8891471 [details] [review]
[treeherder] mozilla:pushlog-longer-interval > mozilla:master

Hey Will-- Since Ed is on PTO, would you be up for reviewing this micro-PR?  :)
Flags: needinfo?(cdawson)
Attachment #8891471 - Flags: review?(wlachance)
Attachment #8891471 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/12af9cbd01d73fb0cfa91b2e84b172729e4ed829
Bug 1065567 - Decrease pushlog interval to 5 minutes (#2667)

Now that we are also ingesting HG pushes via Pulse, this celery beat
fetch is just a failsafe, so we can lengthen its interval from
every minute to every 5 minutes.
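The change above only lengthens the period of a celery beat entry. Here is a minimal sketch of what such an entry can look like, assuming a Celery beat schedule dict; the entry name, the `fetch-push-logs` task name, and the `pushlog` queue option are illustrative assumptions rather than copies of Treeherder's settings.

```python
from datetime import timedelta

# Hypothetical beat entry: Pulse now delivers pushes promptly, so the
# json-pushes poll only needs to run as an occasional failsafe.
PUSHLOG_POLL_INTERVAL = timedelta(minutes=5)  # previously every minute

CELERY_BEAT_SCHEDULE = {
    "fetch-push-logs-every-5-minutes": {
        "task": "fetch-push-logs",  # illustrative task name
        "schedule": PUSHLOG_POLL_INTERVAL,
        "options": {"queue": "pushlog"},  # illustrative queue name
    },
}
```

With a 5-minute period the failsafe poll runs 12 times an hour instead of 60.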
I think we're done here.  We've kept the json-pushes polling, but now it's just a back-up to the pulse ingestion.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED