Open Bug 907277 Opened 11 years ago Updated 4 days ago

[tracker] rewrite processor as thread-free service

Categories

(Socorro :: Processor, task, P3)

Product: Socorro

Component: Processor

Type: task

Priority: P3

Severity: normal

Tracking

(Not tracked)

Status: NEW

People

(Reporter: lonnen, Unassigned)

References

Details

Reporter

Description

•

11 years ago

Processor threads sometimes fail to shut down cleanly when they hang on calls to MDSW (minidump-stackwalk). We suspect this is because the threads cannot respond to signals during long I/O calls. Instead of multithreading, we could simplify the processor to run in a single thread and let upstart, circus, supervisord, etc. run multiple copies of the processor on a single box.
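
To make the single-thread idea concrete, here is a minimal sketch, not Socorro's actual code. It assumes a hypothetical `SingleThreadWorker` class and a hypothetical `run_external` helper. In CPython, signal handlers run in the main thread, so a single-threaded loop can check a stop flag between jobs, and a timeout on the external call keeps one hung job from blocking shutdown indefinitely.

```python
import signal
import subprocess
import sys


class SingleThreadWorker:
    """Hypothetical single-threaded job loop (illustrative names only)."""

    def __init__(self, next_job, handle_job):
        self.next_job = next_job      # callable returning a job, or None when idle/done
        self.handle_job = handle_job  # callable that processes one job
        self.stopping = False
        self.handled = 0

    def request_stop(self, signum=None, frame=None):
        # Usable directly as a signal handler: it only sets a flag.
        self.stopping = True

    def run(self):
        # The flag is checked between jobs, so shutdown happens at a clean boundary.
        while not self.stopping:
            job = self.next_job()
            if job is None:
                break
            self.handle_job(job)
            self.handled += 1
        return self.handled


def run_external(argv, timeout_seconds=60):
    """Run an external command; return its exit code, or None if it timed out."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return None
    return done.returncode


if __name__ == "__main__":
    # Stand-in jobs: each one is just a command line to execute.
    jobs = iter([[sys.executable, "-c", "print('processed')"]])
    worker = SingleThreadWorker(lambda: next(jobs, None), run_external)
    signal.signal(signal.SIGTERM, worker.request_stop)
    signal.signal(signal.SIGINT, worker.request_stop)
    worker.run()
```

A supervisor such as upstart, circus, or supervisord would then start several copies of this script and stop them by sending SIGTERM.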

Reporter

Updated

•

7 years ago

Assignee: nobody → willkg

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 1

•

7 years ago

Oy.

Component: Backend → Processor

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

7 years ago

Summary: Implement thread-free processors → [tracker] rewrite processor as thread-free service

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 2

•

6 years ago

Making this a P2. This is something we want to do and I'm going to move a bunch of bugs to block on this one that involve rewriting bits of the processor.

Priority: -- → P3

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

6 years ago

Depends on: 1357246

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

6 years ago

Depends on: 1383113

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

6 years ago

Depends on: 1470702

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 3

•

5 years ago

Unassigning myself since I'm not going to get to this any time soon.

Assignee: willkg → nobody

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

5 years ago

Depends on: 1453086

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

5 years ago

Depends on: 1529342

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 4

•

5 years ago

It's been a long time since this bug was created, so it's prudent to update the mission here.

In our current infrastructure, each processor node runs a Docker container with a single processor process in it. Each processor process runs multiple threads. The processor spends the bulk of its time running minidump-stackwalk on crash reports and moving data over the network with S3 and Elasticsearch.

There are a few disadvantages to the current threaded model:

1. Threaded architectures are more complex because they have to deal with contention between threads and resources. If we rewrite to a non-threaded model, we can remove some of this contention-handling code and the corresponding tests.
2. Python threads doing network I/O block the entire process. One way to speed up applications that are network I/O heavy is to change the model to multiprocess or coroutines, or to switch to asyncio and asyncio-aware libraries. Switching to asyncio is hard, especially since (at the time of this writing) boto3 isn't asyncio-aware yet. Going multiprocess or coroutines seems helpful. We use multiprocess in some parts of Socorro already and we use coroutines in the collector.
3. The processor spends the bulk of its time waiting for minidump-stackwalk to run in a separate process. The LLT team is rewriting Breakpad bits in Rust and that will eventually involve us switching to a different minidump-stackwalk. We may not want to run that in a separate process. It's a lot simpler to plan for that if we're not dealing with multiple threads.
4. The current processor architecture doesn't work in other contexts. It's not possible to write a CLI that takes a crash id, processes the crash report, and then exits. Having something like that would be really helpful.
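
As an illustration of the last point, a one-shot command could look roughly like the sketch below. Everything here is hypothetical: `process_crash`, `copy_product`, and the `fetch_raw_crash` callable are stand-ins, not Socorro APIs, and the single rule only copies one field. The point is that a processing pipeline with no threads can be called as a plain function and then return.

```python
import argparse
import json
import sys


def process_crash(crash_id, fetch_raw_crash, rules):
    """Fetch one raw crash, apply each rule in order, return the processed dict."""
    raw_crash = fetch_raw_crash(crash_id)
    processed = {"crash_id": crash_id}
    for rule in rules:
        rule(raw_crash, processed)
    return processed


def copy_product(raw_crash, processed):
    # Illustrative rule: copy one field from the raw crash to the processed crash.
    processed["product"] = raw_crash.get("ProductName", "unknown")


def main(argv=None, fetch_raw_crash=None, out=sys.stdout):
    parser = argparse.ArgumentParser(description="Process one crash report and exit.")
    parser.add_argument("crash_id")
    args = parser.parse_args(argv)

    # Assumption: real code would fetch from crash storage; the default returns nothing.
    fetch = fetch_raw_crash or (lambda crash_id: {})
    processed = process_crash(args.crash_id, fetch, [copy_product])

    json.dump(processed, out, indent=2, sort_keys=True)
    out.write("\n")
    return 0
```

Wiring `main` to a console entry point would give a command that takes a crash id, prints the processed result as JSON, and exits.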

Thus, I want to rewrite the processor as a single-threaded application and either switch to coroutines or move the concurrency up a level, to a process manager or to the infrastructure.
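
Moving concurrency up a level to a process manager can be sketched as follows. This is a toy stand-in for what a real process manager does, under the assumption that each copy is an independent single-threaded command. `run_copies` is a hypothetical name, and the child command in the example is a placeholder.

```python
import subprocess
import sys


def run_copies(argv, copies, grace_seconds=10):
    """Start `copies` identical child processes and return their exit codes."""
    children = [subprocess.Popen(argv) for _ in range(copies)]
    try:
        return [child.wait() for child in children]
    except KeyboardInterrupt:
        # On Ctrl-C, ask every child to stop, then force-kill any stragglers.
        for child in children:
            child.terminate()
        for child in children:
            try:
                child.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()
        raise


if __name__ == "__main__":
    # Placeholder command; a real deployment would launch the processor here.
    codes = run_copies([sys.executable, "-c", "print('worker done')"], copies=4)
    print(codes)
```

With this shape, scaling becomes a matter of changing the number of copies (or the number of containers) rather than tuning a thread count inside the application.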

OS: macOS → Unspecified

Hardware: x86 → Unspecified

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Blocks: 1795017

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 5

•

2 years ago

Making this block GCP migration because this will make instance sizing and scaling easier.

Blocks: 1687802

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Depends on: 1742100

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Blocks: 1767282

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Blocks: 1742100

No longer depends on: 1742100

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

No longer blocks: 1767282

Depends on: 1767282

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

No longer blocks: 1742100

Depends on: 1742100

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Depends on: 1698682

Will Kahn-Greene [:willkg] ET needinfo? me

Updated

•

2 years ago

Depends on: 1673493

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 6

•

9 months ago

Removing this from the GCP migration. We can migrate the processor we've got as is.

No longer blocks: 1687802

Updated

•

4 days ago

No longer blocks: 1795017

See Also: → 1795017
