The single-push ingestion command doesn't run all of the ingestion tasks synchronously. Tasks like log parsing must currently be handled by a Celery worker. However, it turns out this worker must be started *before* the tasks are scheduled: unlike normal ingestion, ingest_push runs with CELERY_ALWAYS_EAGER set to True, which makes Celery throw away the jobs if the worker isn't running (rather than persisting them in the RabbitMQ queue). http://treeherder.readthedocs.org/installation.html#ingesting-a-single-push-at-a-time
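For reference, the required ordering looks roughly like this. The exact worker arguments are illustrative (from memory of the linked docs, not verified), and `<changeset>` is a placeholder:

```shell
# Terminal 1: start the Celery worker FIRST (arguments are illustrative).
celery -A treeherder worker --loglevel info

# Terminal 2: only then run the ingestion; any tasks it schedules
# (e.g. log parsing) need the worker to already be running.
./manage.py ingest_push mozilla-central <changeset>
```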
I've filed an issue against Celery, since IMO it shouldn't silently fail *and* throw away the job. It should either raise an exception, or else still put the jobs in the RabbitMQ queue: https://github.com/celery/celery/issues/2910
Created attachment 8683090 [details] [review] Docs: Emphasise starting the worker before ingest_push
Attachment #8683090 - Flags: review?(wlachance)
Comment on attachment 8683090 [details] [review] Docs: Emphasise starting the worker before ingest_push FWIW, I don't think doc updates like this need review. Looks good though!
Attachment #8683090 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/ed8498710c3794c94552014e29ea09815f2178e6 Bug 1221536 - Docs: Emphasise starting the worker before ingest_push If the worker is not running, any `apply_async()` calls are silently thrown away, due to `ingest_push`'s use of `CELERY_ALWAYS_EAGER` and: https://github.com/celery/celery/issues/2910 As such, running the worker after ingest_push doesn't help (since the rabbitmq queues are empty) and so if people are interested in perf/log data, then they must start the worker first instead.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(In reply to Ed Morley [:emorley] from comment #1) > I've filed an issue against Celery since IMO it shouldn't silently fail > *and* throw away the job. It should either give an exception, or else still > put the jobs in the rabbitmq: > https://github.com/celery/celery/issues/2910 Ah, so I've finally figured this out. Whilst we were setting the "always eager" setting, it only took effect in the management command's process, whereas log ingestion was triggered via the API call to /jobs/, which ran in the gunicorn process and so had no idea "always eager" was set. Now that we no longer submit to our own API, this isn't an issue. I'll file a bug to simplify the docs.
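A minimal sketch of that failure mode, purely illustrative: none of these names are real Treeherder or Celery code, it just models two processes each holding their own copy of the settings, so flipping "always eager" in one leaves the other unchanged.

```python
class ProcessSettings:
    """Hypothetical stand-in for per-process Django/Celery settings."""
    def __init__(self):
        self.always_eager = False

def dispatch(settings, task, queue):
    """Run the task inline when eager; otherwise enqueue it for a worker."""
    if settings.always_eager:
        return task()
    queue.append(task)
    return None

# Each process sees only its own settings object.
management_cmd = ProcessSettings()
gunicorn = ProcessSettings()

# The ingest_push management command flips its own copy...
management_cmd.always_eager = True

broker_queue = []
# ...but a task triggered via the /jobs/ API runs under the gunicorn
# settings, which never saw the override, so it is enqueued rather
# than run inline.
result = dispatch(gunicorn, lambda: "log parsed", broker_queue)
print(result, len(broker_queue))  # None 1
```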