Open Bug 907277 Opened 11 years ago Updated 4 days ago

[tracker] rewrite processor as thread-free service

Categories

(Socorro :: Processor, task, P3)

Tracking

(Not tracked)

People

(Reporter: lonnen, Unassigned)

References

Details

Processor threads sometimes fail to shut down cleanly while they hang on calls to minidump-stackwalk (MDSW). We suspect this is because the threads cannot respond to signals during long I/O calls. Instead of multithreading, we could simplify the processor to run in a single thread and let upstart, circus, supervisord, etc. handle running multiple copies of the processor on a single box.
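A minimal sketch of the single-threaded shape described above, assuming a process manager runs several copies of it. Every name here (`Shutdown`, `run_stackwalker`, `worker_loop`, `install_handlers`) is hypothetical and is not Socorro code. The idea is that a signal handler only sets a flag, the external command runs with a timeout so it cannot hang forever, and the loop checks the flag between crash reports.

```python
import signal
import subprocess


class Shutdown:
    """Flag set by a signal handler so the loop can stop between jobs."""

    def __init__(self):
        self.requested = False

    def request(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects of a handler.
        self.requested = True


def install_handlers(shutdown):
    """Route SIGTERM and SIGINT to the shutdown flag (call from the main thread)."""
    signal.signal(signal.SIGTERM, shutdown.request)
    signal.signal(signal.SIGINT, shutdown.request)


def run_stackwalker(cmd, timeout):
    """Run an external command; return (returncode, stdout).

    `cmd` is whatever command line the caller builds; nothing here assumes
    real minidump-stackwalk flags. On timeout, subprocess.run() kills the
    child and this returns (None, "").
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout


def worker_loop(crash_ids, process_one, shutdown):
    """Process crash ids one at a time; stop early once shutdown is requested."""
    processed = []
    for crash_id in crash_ids:
        if shutdown.requested:
            break
        process_one(crash_id)
        processed.append(crash_id)
    return processed
```

With this shape, a supervisor such as supervisord would start N copies of the script and stop each one with SIGTERM; the copy finishes (or times out on) its current crash report and then exits.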
Assignee: nobody → willkg
Oy.
Component: Backend → Processor
Summary: Implement thread-free processors → [tracker] rewrite processor as thread-free service
Making this a P2. This is something we want to do and I'm going to move a bunch of bugs to block on this one that involve rewriting bits of the processor.
Priority: -- → P3

Unassigning myself since I'm not going to get to this any time soon.

Assignee: willkg → nobody

It's been a long time since this bug was created, so it's prudent to update the mission here.

In our current infrastructure, each processor node runs a Docker container with a single processor process in it. Each processor process runs multiple threads. The processor spends the bulk of its time running minidump-stackwalk on crash reports and moving data over the network with S3 and Elasticsearch.

There are a few disadvantages to the current threaded model:

  1. Threaded architectures are more complex because they have to deal with contention between threads and resources. If we rewrite to a non-threaded model, we can remove some of this contention-handling code and the corresponding tests.
  2. Python threads doing network I/O block the entire process. One way to speed up applications that are heavy on network I/O is to change the model to multiprocess or coroutines, or to switch to asyncio and asyncio-aware libraries. The latter is hard, especially since (at the time of this writing) boto3 isn't asyncio-aware yet. Going multiprocess or using coroutines seems helpful. We use multiprocess in some parts of Socorro already and we use coroutines in the collector.
  3. The processor spends the bulk of its time waiting for minidump-stackwalk to run in a separate process. The LLT team is rewriting breakpad bits in Rust and that will eventually involve us switching to a different minidump-stackwalk. We may not want to run that in a separate process. It's a lot simpler to plan for that if we're not dealing with multiple threads.
  4. The current processor architecture doesn't work in other contexts. It's not possible to write a CLI that takes a crash id, processes the crash report, and then exits. Having something like that would be really helpful.

Thus, I want to rewrite the processor as a single-threaded application and either switch to coroutines or move the concurrency up a level, to a process manager or to the infrastructure.
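To illustrate item 4, here is a rough sketch of what a one-shot command could look like once processing is a plain function call with no thread pool around it. All names (`process_crash`, `copy_product_rule`, `main`) and the in-memory dict standing in for storage are assumptions made for the example; the real processor would read from and write to its actual crash storage.

```python
import argparse
import json
import sys


def process_crash(crash_id, fetch_raw, rules, save):
    """Fetch one raw crash, apply each rule in order, save and return the result."""
    raw_crash = fetch_raw(crash_id)
    processed = {"uuid": crash_id}
    for rule in rules:
        rule(raw_crash, processed)
    save(crash_id, processed)
    return processed


def copy_product_rule(raw_crash, processed):
    """Hypothetical rule: copy the product name into the processed crash."""
    processed["product"] = raw_crash.get("ProductName")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Process one crash report and exit.")
    parser.add_argument("crash_id")
    args = parser.parse_args(argv)

    # Stand-in storage (assumption): plain dicts instead of real crash storage.
    raw_store = {args.crash_id: {"ProductName": "Firefox"}}
    processed_store = {}

    processed = process_crash(
        args.crash_id,
        raw_store.__getitem__,
        [copy_product_rule],
        processed_store.__setitem__,
    )
    json.dump(processed, sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0
```

Calling `main(["some-crash-id"])` processes that one id, prints the processed crash as JSON, and returns 0. The same `process_crash` function could be called from a long-running worker loop, which is the reuse the threaded design makes difficult.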

OS: macOS → Unspecified
Hardware: x86 → Unspecified

Making this block GCP migration because this will make instance sizing and scaling easier.

Blocks: 1687802
Blocks: 1742100
No longer depends on: 1742100
No longer blocks: 1767282
Depends on: 1767282
No longer blocks: 1742100
Depends on: 1742100

Removing this from the GCP migration. We can migrate the processor we've got as is.

No longer blocks: 1687802
No longer blocks: 1795017
See Also: → 1795017