Bug 1910613 Comment 7 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Will Kahn-Greene [:willkg] ET needinfo? me

on 2024-07-30 17:49:41 PDT

I built a load test that simulates 3 users uploading 10 MB zip payloads, each full of 1 MB sym files.

Load test before landing that change and doing a stage deploy:

```
Tue Jul 30 21:13:41 UTC 2024: Locust end 20240730-210000-aws_stage-normal.
20240730-210000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240730-210000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 203      | 193      | 0.85  | 2,948.42      | 3,000    | 4,700    

Failures:
 Method | Name     | Error                                             | Occurrences 
--------+----------+---------------------------------------------------+-------------
 POST   | /upload/ | HTTPError('429 Client Error:  for url: /upload/') | 193    
```

Load test after landing that change and doing a stage deploy:

```
Wed Jul 31 00:28:26 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 137      | 0        | 0.58  | 4,652.26      | 4,300    | 6,900    

Failures: None
```

The zip files were 10 MB with 1 MB sym files. 10 successful requests vs. 137 successful requests. That's bonkers--so much better.

Grafana looks good, too:

https://earthangel-b40313e5.influxcloud.net/d/6gimTZ6Vz/tecken-upload-api-metrics?orgId=1&var-env=stage&from=1722385330821&to=1722385918162

In production, payloads are in the 100s of MB range. I changed the load test to simulate 3 users uploading 100 MB zip files containing 10 MB sym files.

```
Wed Jul 31 00:38:38 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 15       | 0        | 0.06  | 39,338.47     | 40,000   | 45,000   

Failures: None
```

Still looks good.

Then I tried 5 users uploading 400 MB zip files containing 10 MB sym files. That completely saturated the upload bandwidth on my Internet connection, so none of the uploads finished.

```
Wed Jul 31 00:45:10 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=5 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
------+----------+----------+-------+---------------+----------+----------

Failures: None
```

Oh well.

I tweaked the load test and landed it:

https://github.com/mozilla-services/tecken-loadtests/pull/33

I think this change is good enough to go to production.

Revision 1 by

Will Kahn-Greene [:willkg] ET needinfo? me

on 2024-07-30 17:56:52 PDT

I ran the systemtests 10 times in rapid succession:

```
round 0: elapsed: 0:00:05.332435
round 1: elapsed: 0:00:04.751373
round 2: elapsed: 0:00:04.554610
round 3: elapsed: 0:00:04.767880
round 4: elapsed: 0:00:05.808504
round 5: elapsed: 0:00:04.283236
round 6: elapsed: 0:00:05.042892
round 7: elapsed: 0:00:04.950264
round 8: elapsed: 0:00:04.417888
round 9: elapsed: 0:00:04.812643
```

We went from 1m (the best time in comment #4--the rest were around 2.5m) to about 5s for each run. That's cool.
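For a rough sense of the spread, here is a minimal sketch that summarizes the ten elapsed times listed above. The 60 second baseline is the "1m" best time quoted in this comment; the variable and function names are only illustrative.

```python
from statistics import mean

# Elapsed times copied from the ten rounds above (H:MM:SS.ffffff).
ROUNDS = """\
0:00:05.332435
0:00:04.751373
0:00:04.554610
0:00:04.767880
0:00:05.808504
0:00:04.283236
0:00:05.042892
0:00:04.950264
0:00:04.417888
0:00:04.812643
"""


def to_seconds(stamp: str) -> float:
    """Convert an H:MM:SS.ffffff string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


elapsed = [to_seconds(line) for line in ROUNDS.split()]
avg = mean(elapsed)

# 60 seconds is the previous best time ("1m") mentioned above.
speedup = 60 / avg

print(
    f"mean={avg:.2f}s min={min(elapsed):.2f}s "
    f"max={max(elapsed):.2f}s speedup~{speedup:.1f}x"
)
```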

I built a load test that simulates 3 users uploading 10 MB zip payloads, each full of 1 MB sym files.
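As an illustration of that payload shape, here is a minimal sketch that builds a roughly 10 MB zip out of 1 MB files in memory. This is not the code from the load test; `build_payload`, the file naming, and the use of random bytes in place of real sym data are all assumptions.

```python
import io
import os
import zipfile

MB = 1024 * 1024


def build_payload(total_size: int = 10 * MB, sym_size: int = 1 * MB) -> bytes:
    """Return an in-memory zip holding total_size // sym_size fake sym files."""
    buffer = io.BytesIO()
    # ZIP_STORED (no compression) keeps the archive close to the target size.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for index in range(total_size // sym_size):
            # Hypothetical path layout; random bytes stand in for sym data.
            name = f"fake{index}.pdb/{index:032X}/fake{index}.sym"
            archive.writestr(name, os.urandom(sym_size))
    return buffer.getvalue()


payload = build_payload()
print(f"payload size: {len(payload) / MB:.2f} MB")
```

The same function with `total_size=100 * MB` and `sym_size=10 * MB` would give the larger payload shape used later in this comment.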

Load test before landing that change and doing a stage deploy:

```
Tue Jul 30 21:13:41 UTC 2024: Locust end 20240730-210000-aws_stage-normal.
20240730-210000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240730-210000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 203      | 193      | 0.85  | 2,948.42      | 3,000    | 4,700    

Failures:
 Method | Name     | Error                                             | Occurrences 
--------+----------+---------------------------------------------------+-------------
 POST   | /upload/ | HTTPError('429 Client Error:  for url: /upload/') | 193    
```

Load test after landing that change and doing a stage deploy:

```
Wed Jul 31 00:28:26 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 137      | 0        | 0.58  | 4,652.26      | 4,300    | 6,900    

Failures: None
```

The zip files were 10 MB with 1 MB sym files. 10 successful requests vs. 137 successful requests. That's bonkers--so much better.
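The 10 vs. 137 comparison counts successful uploads, that is, requests minus failures in each Locust table. A small sketch of that arithmetic, using only the numbers from the two tables above:

```python
# Request and failure counts copied from the two Locust tables above.
RUNS = {
    "before": {"requests": 203, "failures": 193},
    "after": {"requests": 137, "failures": 0},
}

# Successful uploads are total requests minus failed (HTTP 429) requests.
successes = {
    label: run["requests"] - run["failures"] for label, run in RUNS.items()
}

for label, run in RUNS.items():
    rate = successes[label] / run["requests"]
    print(f"{label}: {successes[label]} successful ({rate:.0%} of requests)")
```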

Grafana looks good, too:

https://earthangel-b40313e5.influxcloud.net/d/6gimTZ6Vz/tecken-upload-api-metrics?orgId=1&var-env=stage&from=1722385330821&to=1722385918162

In production, payloads are in the 100s of MB range. I changed the load test to simulate 3 users uploading 100 MB zip files containing 10 MB sym files.

```
Wed Jul 31 00:38:38 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=3 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name     | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
----------+----------+----------+-------+---------------+----------+----------
 /upload/ | 15       | 0        | 0.06  | 39,338.47     | 40,000   | 45,000   

Failures: None
```

Still looks good.

Then I tried 5 users uploading 400 MB zip files containing 10 MB sym files. That completely saturated the upload bandwidth on my Internet connection, so none of the uploads finished.

```
Wed Jul 31 00:45:10 UTC 2024: Locust end 20240731-000000-aws_stage-normal.
20240731-000000-aws_stage-normal users=5 runtime=4m
Runname: logs/20240731-000000-aws_stage-normal

Requests:
 Name | Requests | Failures | Req/s | Avg Time (ms) | 50% (ms) | 95% (ms) 
------+----------+----------+-------+---------------+----------+----------

Failures: None
```

Oh well.
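A back-of-the-envelope sketch of why that run is so demanding on an uplink: it estimates the sustained upload bandwidth needed for each of the 5 users to finish a single 400 MB upload within the 4m runtime, and compares it with the throughput implied by the 100 MB run. It assumes decimal megabytes, exact payload sizes, and no protocol overhead, so the results are only rough.

```python
RUNTIME_S = 4 * 60  # runtime=4m from the Locust output

# 400 MB run: 5 users, one upload each, all finishing within the runtime.
USERS = 5
ZIP_MB = 400  # assumed to be decimal megabytes
total_mb = USERS * ZIP_MB
needed_mbit_per_s = total_mb / RUNTIME_S * 8

# 100 MB run: 15 completed requests in the same runtime.
observed_mbit_per_s = 15 * 100 / RUNTIME_S * 8

print(f"400 MB run needs at least ~{needed_mbit_per_s:.1f} Mbit/s upstream")
print(f"100 MB run averaged roughly {observed_mbit_per_s:.1f} Mbit/s upstream")
```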

I tweaked the load test and landed it:

https://github.com/mozilla-services/tecken-loadtests/pull/33

I think this change is good enough to go to production.