Bug 2009372 Comment 18 Edit History

I've taken an initial look at this patch, focusing just on DNS resolution (not the impact on other network timings or visual metrics).

It's a bit tricky to compare DoH against the OS resolver in CI because Android caches hostnames after the first iteration, giving the OS resolver an "unfair advantage."
So I ran it locally, flushing the OS DNS cache between iterations.
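To illustrate why the flush matters, here is a toy Python model. It is not Android's actual resolver, just a sketch under stated assumptions: the hypothetical `ToyCachingResolver` charges a made-up "miss" cost for the first lookup of each host and a made-up "hit" cost afterwards. Without a flush, only the first iteration pays the full resolution cost, which is the "unfair advantage" described above.

```python
import statistics


class ToyCachingResolver:
    """Hypothetical stand-in for an OS resolver with a hostname cache.

    The first lookup of a host costs `miss_ms`; later lookups hit the
    cache and cost `hit_ms`. Both costs are arbitrary toy parameters.
    """

    def __init__(self, miss_ms, hit_ms):
        self.miss_ms = miss_ms
        self.hit_ms = hit_ms
        self.cache = set()

    def lookup(self, host):
        if host in self.cache:
            return self.hit_ms
        self.cache.add(host)
        return self.miss_ms

    def flush(self):
        # Models flushing the OS DNS cache between iterations.
        self.cache.clear()


def run_iterations(resolver, hosts, iterations, flush_between):
    """Return the mean lookup time of each iteration, optionally flushing first."""
    per_iteration = []
    for _ in range(iterations):
        if flush_between:
            resolver.flush()
        per_iteration.append(statistics.mean(resolver.lookup(h) for h in hosts))
    return per_iteration


hosts = [f"host{i}.example" for i in range(50)]
print(run_iterations(ToyCachingResolver(40, 1), hosts, 3, flush_between=False))
print(run_iterations(ToyCachingResolver(40, 1), hosts, 3, flush_between=True))
```

With 50 hosts and the arbitrary costs above, the unflushed run gives `[40, 1, 1]` while the flushed run gives `[40, 40, 40]`: only the flushed run measures cold resolution on every iteration.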

The results are reproducible and actually match what Valentin found in his perf run, [comment 16](https://bugzilla.mozilla.org/show_bug.cgi?id=2009372#c16).

These are the variants that I compared:

- `Native`: DoH disabled (OS resolver)
- `Baseline TRR`: a push of today's Fenix Nightly, using either Cloudflare (CF) or CIRA as the DoH provider
- `TRR Prio`: Valentin's TRR prioritization patch stack

I used the [`trr-multi`](https://mozilla-necko.github.io/tests/dns/trr_multi_domain.html) test, which loads 50 favicons.
It's a stress test for DoH because it issues so many simultaneous requests (although some top sites make a similar number).

Findings:

  1. TRR Prio reduces TRR lookup times by 91-98ms on the mean and 94-117ms on the median (53-61%)
  2. With TRR Prio, TRR overhead vs Native is reduced to 29-40ms (62-89%)
  3. Without TRR Prio, TRR overhead vs Native is 122-146ms (244-324%)

```
  Comparison Table (Mean / Median)

                      Mean      Median
  --------------------------------------
  Native               50ms       45ms
  Baseline TRR CF     184ms      179ms
  Baseline TRR CIRA   172ms      191ms
  TRR Prio CF          86ms       85ms
  TRR Prio CIRA        81ms       74ms

  TRR Prio Improvement vs Baseline TRR

  Provider      Mean Reduction     Median Reduction
  -------------------------------------------------
  Cloudflare    -98ms (-53%)       -94ms (-53%)
  CIRA          -91ms (-53%)      -117ms (-61%)

  TRR Overhead vs Native (with TRR Prio)

  Provider      Mean Overhead      Median Overhead
  -------------------------------------------------
  Cloudflare    +36ms (+72%)       +40ms (+89%)
  CIRA          +31ms (+62%)       +29ms (+64%)

  TRR Overhead vs Native (Baseline, without TRR Prio)

  Provider      Mean Overhead      Median Overhead
  -------------------------------------------------
  Cloudflare   +134ms (+268%)     +134ms (+298%)
  CIRA         +122ms (+244%)     +146ms (+324%)
```
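The three derived tables are plain arithmetic on the comparison table: the difference from a reference row, with the percentage taken relative to that reference. A quick Python sketch that recomputes them from the mean/median values above (the helper names `delta`, `compare` and `ranges` are illustrative, not part of any Mozilla tooling):

```python
# Mean and median DNS lookup times (ms) for the trr-multi test,
# copied from the comparison table above.
RESULTS = {
    "Native": (50, 45),
    "Baseline TRR CF": (184, 179),
    "Baseline TRR CIRA": (172, 191),
    "TRR Prio CF": (86, 85),
    "TRR Prio CIRA": (81, 74),
}
PROVIDERS = ("CF", "CIRA")


def delta(new, ref):
    """Absolute change in ms and rounded percentage change relative to `ref`."""
    diff = new - ref
    return diff, round(100 * diff / ref)


def compare(variant, reference):
    """Per-provider ((mean delta), (median delta)) of `variant` vs `reference`.

    Both arguments are row-name prefixes; "Native" has no provider suffix.
    """
    out = {}
    for provider in PROVIDERS:
        new = RESULTS[f"{variant} {provider}"]
        if reference == "Native":
            ref = RESULTS["Native"]
        else:
            ref = RESULTS[f"{reference} {provider}"]
        out[provider] = tuple(delta(n, r) for n, r in zip(new, ref))
    return out


def ranges(table):
    """(min, max) of absolute ms and of absolute percentages over all cells."""
    ms = [abs(d) for cells in table.values() for d, _ in cells]
    pct = [abs(p) for cells in table.values() for _, p in cells]
    return (min(ms), max(ms)), (min(pct), max(pct))


prio_vs_baseline = compare("TRR Prio", "Baseline TRR")
prio_vs_native = compare("TRR Prio", "Native")
baseline_vs_native = compare("Baseline TRR", "Native")

for name, table in [
    ("TRR Prio vs Baseline TRR", prio_vs_baseline),
    ("TRR Prio vs Native", prio_vs_native),
    ("Baseline TRR vs Native", baseline_vs_native),
]:
    print(name, table, "ranges:", ranges(table))
```

Every cell matches the tables above. Taking the ranges over both mean and median also shows the spread behind each finding: the overheads span 29-40ms with TRR Prio and 122-146ms without it, and the reductions span 91-117ms once the medians are included.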

Some profiles from the run:

- Baseline TRR: https://share.firefox.dev/4a95Bx2
- TRR Prio: https://share.firefox.dev/4tppSHF
- Native resolver (OS DNS cache flushed): https://share.firefox.dev/4tiKrVW

Edit: measurements taken on a Samsung A54 with PGO builds over Wi-Fi.