cleanse_upload takes too long to run
Categories
(Tecken :: General, defect, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Details
Attachments
(1 file)
In bug #1638412, I wrote a cleanse_upload command that cleaned out several tables that accumulate data about uploads, but didn't expire the data in the table when the corresponding uploads expire out of the S3 bucket.
Cool.
Except we can't run the command because it takes too long to run.
This bug covers redoing the command to make it possible to run or figuring out some other way to expire the data in the table.
| Assignee | ||
Comment 1•4 years ago
|
||
| Assignee | ||
Comment 2•4 years ago
|
||
willkg merged PR #2471: "bug 1749309: rework cleanse_upload to skip SELECTs" in 53d71e1.
We'll test this in stage, but it's not clear how helpful that'll be given stage has very little data.
| Assignee | ||
Comment 3•4 years ago
•
|
||
This deployed to stage.
Jason: Can you run this in stage?:
$ pwd
/app
$ time ./manage.py cleanse_upload --dry-run
<output>
$ time ./manage.py cleanse_upload
<output>
Then copy and paste the output (it shouldn't have any secrets) here. Thanks!
Comment 4•4 years ago
|
||
See outputs below:
app@ip-172-31-0-33:/app$ time ./manage.py cleanse_upload --dry-run
{"Timestamp": 1641917185890163968, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641917185890280448, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641917185892795392, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641917185892888832, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641917185892934912, "Type": "markus.backends.datadog", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "DatadogMetrics configured: localhost:8125 "}}
{"Timestamp": 1641917186064189952, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Found credentials in environment variables."}}
{"Timestamp": 1641917186165501696, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 23, "Fields": {"msg": "Found credentials in environment variables."}}
cleanse_upload: THIS IS A DRY RUN.
cleanse_upload: count before cleansing: upload=194, fileupload=4912
cleanse_upload: try=True, cutoff=2021-12-12: deleted upload=0, fileupload=0
cleanse_upload: try=False, cutoff=2020-01-12: deleted upload=51, fileupload=4428
real 0m1.428s
user 0m0.867s
sys 0m0.096s
app@ip-172-31-0-33:/app$ time ./manage.py cleanse_upload
{"Timestamp": 1641917196316438784, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641917196316556800, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641917196319038464, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641917196319123968, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641917196319173632, "Type": "markus.backends.datadog", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "DatadogMetrics configured: localhost:8125 "}}
{"Timestamp": 1641917196479342848, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Found credentials in environment variables."}}
{"Timestamp": 1641917196579687168, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 24, "Fields": {"msg": "Found credentials in environment variables."}}
cleanse_upload: count before cleansing: upload=194, fileupload=4912
cleanse_upload: try=True, cutoff=2021-12-12: deleted upload=0, fileupload=0
cleanse_upload: try=False, cutoff=2020-01-12: deleted upload=51, fileupload=4428
real 0m2.034s
user 0m0.860s
sys 0m0.098s
| Assignee | ||
Comment 5•4 years ago
|
||
If you run the second command again, what does it print out?
$ time ./manage.py cleanse_upload
Comment 6•4 years ago
|
||
app@ip-172-31-0-33:/app$ time ./manage.py cleanse_upload
{"Timestamp": 1641922705935144448, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641922705935268096, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641922705937828864, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641922705937921024, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641922705937966336, "Type": "markus.backends.datadog", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "DatadogMetrics configured: localhost:8125 "}}
{"Timestamp": 1641922706128034048, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Found credentials in environment variables."}}
{"Timestamp": 1641922706228626688, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-0-33.ec2.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 25, "Fields": {"msg": "Found credentials in environment variables."}}
cleanse_upload: count before cleansing: upload=143, fileupload=484
cleanse_upload: try=True, cutoff=2021-12-12: deleted upload=0, fileupload=0
cleanse_upload: try=False, cutoff=2020-01-12: deleted upload=0, fileupload=0
real 0m1.370s
user 0m0.900s
sys 0m0.099s
| Assignee | ||
Comment 7•4 years ago
|
||
Awesome! It's generating two DELETE queries for try=True (file_upload and then upload) and for try=False (file_upload and then upload), so that'll take as long as the manual query we did in bug #1638412. It's definitely running and working and is way easier to run than the hand-crafted SQL.
Comment 8•4 years ago
|
||
Done on prod:
app@ip-172-31-28-151:/app$ time ./manage.py cleanse_upload --dry-run
{"Timestamp": 1641935510346351104, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641935510346476288, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641935510354401792, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641935510354484480, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641935510354526464, "Type": "markus.backends.datadog", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "DatadogMetrics configured: localhost:8125 "}}
{"Timestamp": 1641935510865625600, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Found credentials in environment variables."}}
{"Timestamp": 1641935510973184512, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 127, "Fields": {"msg": "Found credentials in environment variables."}}
cleanse_upload: THIS IS A DRY RUN.
cleanse_upload: count before cleansing: upload=257975, fileupload=22135370
cleanse_upload: try=True, cutoff=2021-12-12: deleted upload=54182, fileupload=9962815
cleanse_upload: try=False, cutoff=2020-01-12: deleted upload=67740, fileupload=4153589
real 4m49.196s
user 0m0.889s
sys 0m0.156s
app@ip-172-31-28-151:/app$ time ./manage.py cleanse_upload
{"Timestamp": 1641936085484666368, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641936085484787968, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641936085487146240, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Statsd buffering is disabled"}}
{"Timestamp": 1641936085487227648, "Type": "datadog.dogstatsd", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Statsd periodic buffer flush is disabled"}}
{"Timestamp": 1641936085487270144, "Type": "markus.backends.datadog", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "DatadogMetrics configured: localhost:8125 "}}
{"Timestamp": 1641936085718197504, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Found credentials in environment variables."}}
{"Timestamp": 1641936085814236416, "Type": "botocore.credentials", "Logger": "tecken", "Hostname": "ip-172-31-28-151.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 134, "Fields": {"msg": "Found credentials in environment variables."}}
cleanse_upload: count before cleansing: upload=257977, fileupload=22135445
cleanse_upload: try=True, cutoff=2021-12-12: deleted upload=54182, fileupload=9962815
cleanse_upload: try=False, cutoff=2020-01-12: deleted upload=67740, fileupload=4153589
real 11m8.257s
user 0m0.840s
sys 0m0.101s
| Assignee | ||
Comment 9•4 years ago
|
||
11m is awesome!
Quick stats:
| table | original | deleted |
|---|---|---|
| uploads | 257,977 | 121,922 (47%) |
| fileuploads | 22,135,445 | 14,116,404 (63%) |
That's so silly.
Marking this as FIXED.
| Assignee | ||
Updated•4 years ago
|
Description
•