Closed Bug 1221536 (Opened 9 years ago, Closed 9 years ago)

Make it clearer that the celery worker must be started before running ingest_push

Categories

(Tree Management :: Treeherder, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Details

Attachments

(1 file)

The ingest_push command doesn't run all of the ingestion tasks synchronously. Tasks such as log parsing must currently be handled by a celery worker.

However, it turns out this celery worker must be started *before* the tasks are scheduled: unlike normal ingestion, ingest_push runs with CELERY_ALWAYS_EAGER set to True, which makes celery throw the jobs away if the worker isn't running (rather than persisting them in the rabbitmq queue).

http://treeherder.readthedocs.org/installation.html#ingesting-a-single-push-at-a-time
I've filed an issue against Celery, since IMO it shouldn't silently fail *and* throw away the job. It should either raise an exception, or else still put the jobs in the rabbitmq queue:
https://github.com/celery/celery/issues/2910
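For illustration, here's a minimal standalone sketch of how CELERY_ALWAYS_EAGER changes what apply_async() does. This is not Treeherder's actual code; the demo app, task name, and broker URL are made up, and it assumes Celery's standard (3.x-era) API:

from celery import Celery

app = Celery('demo', broker='amqp://guest@localhost//')

# With eager mode on, tasks run synchronously in this process
# instead of being published to the rabbitmq queue.
app.conf.CELERY_ALWAYS_EAGER = True

@app.task
def parse_log(log_url):
    # Stand-in for the real log-parsing work.
    return 'parsed %s' % log_url

result = parse_log.apply_async(args=['http://example.com/log.txt'])
# Eager: returns an EagerResult that has already run in-process.
# Non-eager: returns an AsyncResult; the task sits in the broker
# queue until a worker consumes it.
print(result.get())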
Attachment #8683090 - Flags: review?(wlachance)
Comment on attachment 8683090 [details] [review]
Docs: Emphasise starting the worker before ingest_push

FWIW, I don't think doc updates like this need review. Looks good though!
Attachment #8683090 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/ed8498710c3794c94552014e29ea09815f2178e6
Bug 1221536 - Docs: Emphasise starting the worker before ingest_push

If the worker is not running, any `apply_async()` calls are silently
thrown away, due to `ingest_push`'s use of `CELERY_ALWAYS_EAGER` and:
https://github.com/celery/celery/issues/2910

As such, running the worker after ingest_push doesn't help (since the
rabbitmq queues are empty) and so if people are interested in perf/log
data, then they must start the worker first instead.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Ed Morley [:emorley] from comment #1)
> I've filed an issue against Celery, since IMO it shouldn't silently fail
> *and* throw away the job. It should either raise an exception, or else
> still put the jobs in the rabbitmq queue:
> https://github.com/celery/celery/issues/2910

Ah so I've finally figured this out.

Whilst we were setting the "always eager" option, that only took effect in the management command's process, whereas log ingestion was triggered via the API call to /jobs/, which ran in the gunicorn process and so had no idea "always eager" was set.
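A toy sketch of the process isolation at play (plain Python with hypothetical names, nothing Treeherder-specific): config mutated in one process is invisible to another, which is why the flag set in the management command never reached gunicorn.

from multiprocessing import Process

config = {'ALWAYS_EAGER': False}

def management_command(cfg):
    # This mutation only affects the child process's copy.
    cfg['ALWAYS_EAGER'] = True
    print('management command sees:', cfg['ALWAYS_EAGER'])  # True

if __name__ == '__main__':
    p = Process(target=management_command, args=(config,))
    p.start()
    p.join()
    # The parent process (standing in for gunicorn) is unaffected,
    # so tasks it schedules still go to rabbitmq and need a worker.
    print('gunicorn-like parent sees:', config['ALWAYS_EAGER'])  # False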

Now that we no longer submit to our own API this isn't an issue. I'll file a bug to simplify the docs.
Component: Treeherder: Docs & Development → TreeHerder