Closed Bug 1407997 Opened 7 years ago Closed 7 years ago

change the configuration for running stackwalker

Categories

(Socorro :: General, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

We have processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.command_line
which defines the command line to run mdsw.

There are a couple of bits of that which we should break out into separate variables because they're going to differ between environments. The rest of it should be in behavioral configuration ... or maybe not in configuration at all.

There's another bug for changing the command line and adding additional arguments for other symbols urls. When we work on that bug, it'd be a lot easier if we didn't have to change configuration values that were 300 characters long.

This bug covers breaking that configuration item up into smaller parts.
Grabbing this and making it a P1 because it'd be good to do soon.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P1
This is messy, but not as complicated as I thought it would be.

Making bug #1383067 block on this one. Easier to do this one first, then that one.

-prod has this in configuration:

timeout -s KILL 600 {command_pathname} --raw-json {raw_crash_pathname} --symbols-url {public_symbols_url} --symbols-url {private_symbols_url} --symbols-cache /mnt/symbols/cache --symbols-tmp /mnt/symbols/tmp {dump_file_pathname} 2> /dev/null

-stage has this in configuration:

timeout -s KILL 600 {command_pathname} --raw-json {raw_crash_pathname} --symbols-url {public_symbols_url} --symbols-url {private_symbols_url} --symbols-cache /mnt/symbols/cache --symbols-tmp /mnt/symbols/tmp {dump_file_pathname} 2> /dev/null

The default is this:

timeout -s KILL 30 {command_pathname} --raw-json {raw_crash_pathname} --symbols-url {public_symbols_url} --symbols-url {private_symbols_url} --symbols-cache {symbol_cache_path} {dump_file_pathname} 2>/dev/null

They differ in three ways:

* kill timeout (600 vs. 30)
* --symbols-cache value is hard-coded in -prod and -stage
* --symbols-tmp
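
One way to double-check that list is to tokenize the command lines and compare them. The following is only a sketch: the -stage value is identical to the -prod one, so only -prod and the default are compared, and the helper names (`tokens`, `only_in`) are made up for illustration. It also normalizes the `2> /dev/null` spacing so that cosmetic difference doesn't show up as a fourth item.

```python
import shlex

# Command lines copied from this bug (-stage is identical to -prod).
PROD = (
    "timeout -s KILL 600 {command_pathname} --raw-json {raw_crash_pathname} "
    "--symbols-url {public_symbols_url} --symbols-url {private_symbols_url} "
    "--symbols-cache /mnt/symbols/cache --symbols-tmp /mnt/symbols/tmp "
    "{dump_file_pathname} 2> /dev/null"
)
DEFAULT = (
    "timeout -s KILL 30 {command_pathname} --raw-json {raw_crash_pathname} "
    "--symbols-url {public_symbols_url} --symbols-url {private_symbols_url} "
    "--symbols-cache {symbol_cache_path} {dump_file_pathname} 2>/dev/null"
)


def tokens(command_line):
    """Split a command line into tokens, ignoring the redirect spacing."""
    return shlex.split(command_line.replace("2> /dev/null", "2>/dev/null"))


def only_in(first, second):
    """Return the tokens of `first` that never appear in `second`, in order."""
    second_tokens = set(tokens(second))
    return [tok for tok in tokens(first) if tok not in second_tokens]


print("only in -prod:  ", only_in(PROD, DEFAULT))
print("only in default:", only_in(DEFAULT, PROD))
```

The leftover tokens fall into the same three groups: the timeout value, the --symbols-cache value, and the --symbols-tmp argument with its path.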

I think we want to stop setting this value in -stage and -prod. So we want to merge all three of these.

I'm going to make some code changes to get the defaults and possibilities closer in line with what we're doing in -stage and -prod. The goal here is to not have to set the command_line in -stage, -prod, or the docker environment at all.
Blocks: 1383067
I want to make the following changes to -stage and -prod today. We need to do this before any code changes.

consolate kv set socorro/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbols_cache /mnt/symbols/cache
consolate kv set socorro/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbols_tmp /mnt/symbols/tmp

That sets the variables the code changes will need, which lets us remove the command_line key afterwards. The values are the ones we're already using in -stage and -prod.

Miles: Do these look ok?
Flags: needinfo?(miles)
Oops--"symbols" should be singular and they need "path" at the end. That should be:

consolate kv set socorro/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_cache_path /mnt/symbols/cache
consolate kv set socorro/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_tmp_path /mnt/symbols/tmp

This is the correct set.
I'm not particularly familiar with the first part (the key) of those configuration values; however, I verified that /mnt/symbols/{cache|tmp} look like the correct values for stage and prod.

I backed up the stage/prod configuration today. Go ahead.
Flags: needinfo?(miles)
Miles: This is an example configuration key:

socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.command_line timeout

So the namespace is this:

socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015

(Just in case you see anything there that looks wrong.)
Ugh. I had two more typos. The final lines were these:

consulate kv set socorro/processor/raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_cache_path /mnt/symbols/cache
consulate kv set socorro/processor/raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_tmp_path /mnt/symbols/tmp
Ugh. This was the final final:

consulate kv set socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_cache_path /mnt/symbols/cache
consulate kv set socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.symbol_tmp_path /mnt/symbols/tmp

After this, no more configuration changes for me today. I'm cutting myself off.
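
Since every mistake above was in the hand-typed key, one option is to generate the commands instead. This is a hypothetical helper, not something that exists in Socorro; only the namespace, the option names, the values, and the `consulate kv set` form come from this bug.

```python
# Hypothetical helper for building the Consul keys discussed in this bug.
NAMESPACE = "socorro/processor"
RULE = "processor.raw_to_processed_transform.BreakpadStackwalkerRule2015"


def consul_key(option):
    """Return the full key for one option of the stackwalker rule."""
    return f"{NAMESPACE}/{RULE}.{option}"


def kv_set_command(option, value):
    """Return the `consulate kv set` command line for one option."""
    return f"consulate kv set {consul_key(option)} {value}"


SETTINGS = [
    ("symbol_cache_path", "/mnt/symbols/cache"),
    ("symbol_tmp_path", "/mnt/symbols/tmp"),
]

for option, value in SETTINGS:
    print(kv_set_command(option, value))
```

This prints the two commands to paste on an admin node; it doesn't talk to Consul itself.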
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/e6c3ac2f91e809ff7220e9da9ad6b616d031af82
fixes bug 1407997 - reworks mdsw command line

This adds variables to the mdsw command line to account for variations between
our default, -stage, and -prod values:

* kill_timeout
* symbol_tmp_path

This also fixes the tests accordingly as well as the values we use in the local
development environment configuration.

https://github.com/mozilla-services/socorro/commit/2b71e29a5b91616014ae8f5ed7a043e09dc5c815
bug 1407997 - redo command line interpolation

This moves command line interpolation to a method so we can use it in the tests.
This also makes the list of parameters that get interpolated explicit and
replaces some weird-looking code.
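
For anyone reading without the diff open, the idea is roughly the following. This is a minimal sketch and not the code from the commit: the class name, the method name, and the use of a plain dict for configuration are assumptions made for illustration. Only the placeholder names and the overall command line shape come from this bug.

```python
# Hypothetical sketch of interpolating the mdsw command line in a method.
COMMAND_LINE = (
    "timeout -s KILL {kill_timeout} {command_pathname} "
    "--raw-json {raw_crash_pathname} "
    "--symbols-url {public_symbols_url} "
    "--symbols-url {private_symbols_url} "
    "--symbols-cache {symbol_cache_path} "
    "--symbols-tmp {symbol_tmp_path} "
    "{dump_file_pathname} 2> /dev/null"
)


class StackwalkerRuleSketch:
    """Stand-in for the real rule; `config` is assumed to be a dict."""

    def __init__(self, config):
        self.config = config

    def expand_commandline(self, dump_file_pathname, raw_crash_pathname):
        """Return the command line with every placeholder filled in."""
        # Each interpolated parameter is listed explicitly, so a missing
        # configuration value raises KeyError here instead of producing
        # a half-filled command line.
        params = {
            "kill_timeout": self.config["kill_timeout"],
            "command_pathname": self.config["command_pathname"],
            "public_symbols_url": self.config["public_symbols_url"],
            "private_symbols_url": self.config["private_symbols_url"],
            "symbol_cache_path": self.config["symbol_cache_path"],
            "symbol_tmp_path": self.config["symbol_tmp_path"],
            "dump_file_pathname": dump_file_pathname,
            "raw_crash_pathname": raw_crash_pathname,
        }
        return COMMAND_LINE.format(**params)


# Example values; the paths and URLs other than /mnt/symbols/* are placeholders.
rule = StackwalkerRuleSketch({
    "kill_timeout": 600,
    "command_pathname": "/path/to/stackwalker",
    "public_symbols_url": "https://public.example.com/",
    "private_symbols_url": "https://private.example.com/",
    "symbol_cache_path": "/mnt/symbols/cache",
    "symbol_tmp_path": "/mnt/symbols/tmp",
})
print(rule.expand_commandline("/tmp/example.dump", "/tmp/example.json"))
```

Because the interpolation lives in a method, a test can call it directly and compare the resulting string without running mdsw.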

https://github.com/mozilla-services/socorro/commit/83f783bf4af5d843b6735f954f84b7e0d6f2c80a
Merge pull request #4061 from willkg/1407997-mdsw-config

fixes bug 1407997 - redo mdsw command line config
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
This landed. Next steps:

1. wait for it to deploy to -stage

2. verify the processors are running mdsw correctly

3. on a -stage admin node, do:

   consulate kv rm socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.command_line

4. verify the processors restart and are still running mdsw correctly

5. deploy to -prod

6. verify the processors are running mdsw in -prod correctly

7. on a -prod admin node, do:

   consulate kv rm socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.command_line

Then we're all set!

In the meantime, I'm reopening this until the follow-up work is done.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It was deployed to -stage. I verified the processors are running mdsw correctly. I logged into a -stage admin node and removed the command_line variable.

I logged into a -stage processor node and ... spent some time trying to find the logs. They should be in /var/log/socorro/, but that directory is empty. Kind of feels like the processor isn't logging anything anywhere which is curious.

The processor process restarted when I removed the command_line variable, so it picked that up. mdsw is running correctly, so everything seems fine.

Next step is to deploy to -prod and do the whole thing again. We'll do that on Wednesday.
Bah--I figured out the mystery of the missing logs: the processor runs as a systemd service, so the logs are in the journal. This works nicely:

journalctl -u socorro-processor.service -f -o cat
We did a -prod deploy just now. I logged into a -prod processor and everything looked groovy. I did:

   consulate kv rm socorro/processor/processor.raw_to_processed_transform.BreakpadStackwalkerRule2015.command_line

on the -prod admin node and the processor restarted and continued to work fine.

I reprocessed a crash to make sure it's still picking up mdsw work.

Everything looks ok, so I'm marking this as FIXED.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED