
Add ability to run heavy data processing tasks on Celery workers

NEW
Unassigned

Status

Mozilla Developer Network
Code Cleanup
3 years ago
3 years ago

People

(Reporter: jezdez, Unassigned)

Tracking

Details

(Whiteboard: [specification][type:feature])

(Reporter)

Description

3 years ago
What problems would this solve?
===============================
This is a follow-up to #2047 (https://github.com/mozilla/kuma/pull/2047), in which I argued that the newly added cache_with_field decorator should be considered only the first step toward handling heavy data processing. The next step is to make it possible to run those tasks on Celery workers.

Quoting my comment there: "BTW, this decorator works procedurally, but there may be use cases in our code where pushing the caching task to the Celery worker would make sense, so that we don't strain the web workers or increase the likelihood of the thundering herd problem. In other words, separating the concerns of rendering the page from heavy data processing would be a good strategy going forward to handle increased load/use of the site. In the past I've successfully used django-cacheback for that, although that doesn't actually store things in the database but just in the cache backend. That said, maybe that'd be a neat contribution to cacheback -- a Job subclass that is able to write the result to a model field instead of the cache."

https://github.com/codeinthehole/django-cacheback/
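To make the idea concrete, here is a minimal sketch of the "serve the stored value, refresh it off the request thread" pattern with the result kept in a field rather than in the cache. It deliberately uses only the standard library: the ThreadPoolExecutor stands in for a Celery worker, and Document, build_summary, refresh_summary and get_summary are hypothetical names invented for illustration. None of this is cacheback's or kuma's actual API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

# Assumption: a thread pool stands in for the Celery workers.
executor = ThreadPoolExecutor(max_workers=1)


@dataclass
class Document:
    """Hypothetical stand-in for a Django model with a cached result field."""
    html: str
    summary: Optional[str] = None   # the "model field" holding the result
    summary_built_at: float = 0.0   # when the field was last refreshed


def build_summary(html: str) -> str:
    """Placeholder for the heavy data processing step."""
    return html.upper()


def refresh_summary(doc: Document) -> None:
    """Runs on the worker: recompute and write the result to the field."""
    doc.summary = build_summary(doc.html)
    doc.summary_built_at = time.monotonic()


def get_summary(doc: Document, lifetime: float = 60.0):
    """Return (value, pending); pending is the background refresh, if any."""
    if doc.summary is None:
        # Nothing stored yet: compute inline so the caller gets a value.
        refresh_summary(doc)
        return doc.summary, None
    if time.monotonic() - doc.summary_built_at > lifetime:
        # Stale: hand back the old value now, refresh in the background.
        return doc.summary, executor.submit(refresh_summary, doc)
    return doc.summary, None
```

The web worker only ever reads the field; the expensive recomputation happens elsewhere, and a stale value is served in the meantime. Whether serving stale data is acceptable depends on the call site.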

Who would use this?
===================
-

What would users see?
=====================
-

What would users do? What would happen as a result?
===================================================
-

Is there anything else we should know?
======================================
Replies from Les in the old GitHub issue (https://github.com/mozilla/kuma/issues/2051):

"This could be interesting for processing content that is not immediately a part of the save / preview cycle - e.g. pre-generated document JSON, search indexing, etc.

But, the problem we've run into while trying to defer document processing in general is that it degrades the content authoring process. In other words, a doc edit that kicks off background processing tasks does not immediately lead to a rendered result that reflects the changes."

My response:

"That's a fair point, and I'm not advocating cutting authors off from writing effectively. It is important to note, though, that pushing something to Celery doesn't require it to be completely decoupled from the request/response cycle. For important tasks that have a direct impact on users (e.g. the writers), we can simply push the job to a Celery worker but wait for the result instead of moving on. That way we tackle the most pressing issue that using Celery is all about: moving load from the web heads to the Celery workers, which are far easier to scale. You are probably worried that this may lead to stalled responses, but the truth is that we're barely using the basic features of Celery right now. For example, we should consider having a separate high-priority queue for tasks like this. The workers would then always work on those first, before the less important tasks."
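The two ideas in that response (dispatch to a worker but block on the result, and serve a high-priority queue first) can be sketched roughly as follows. This is a standard-library stand-in, not Celery code: the priority queue and worker thread play the role of the Celery queues and workers, and submit, worker, render, HIGH and LOW are hypothetical names. In Celery itself the rough equivalents would be task.apply_async(args, queue=...) for routing and .get(timeout=...) on the returned result for waiting.

```python
import itertools
import queue
import threading
from concurrent.futures import Future

HIGH, LOW = 0, 1                 # lower number is served first
_jobs = queue.PriorityQueue()
_order = itertools.count()       # tie-breaker: FIFO within one priority


def submit(priority, func, *args):
    """Enqueue a job and return a Future the caller may wait on."""
    future = Future()
    _jobs.put((priority, next(_order), func, args, future))
    return future


def worker():
    """Single worker loop: always takes the highest-priority pending job."""
    while True:
        _, _, func, args, future = _jobs.get()
        if func is None:         # sentinel: stop the worker
            break
        try:
            future.set_result(func(*args))
        except Exception as exc:
            future.set_exception(exc)


finished = []


def render(name):
    """Placeholder for heavy processing; records completion order."""
    finished.append(name)
    return name


# The low-priority job is queued first, the author-facing job second.
submit(LOW, render, "search-index")
urgent = submit(HIGH, render, "author-preview")
_jobs.put((LOW + 1, next(_order), None, (), None))  # stop once drained

thread = threading.Thread(target=worker)
thread.start()
# The request handler blocks here, but the work itself runs on the worker.
result = urgent.result(timeout=5)
thread.join()
```

Even though the search indexing job was queued first, the author-facing job is processed before it, and the caller still gets its result synchronously. The timeout bounds how long a response can stall if the workers are saturated.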
(Reporter)

Updated

3 years ago
Component: General → Code Cleanup