Closed Bug 1522133 Opened 6 years ago Closed 1 year ago

Investigate and integrate Web Page Replay with Raptor

Categories: Testing :: Raptor, enhancement, P3
Tracking: Not tracked
Status: RESOLVED DUPLICATE of bug 1770457
People: Reporter: Bebe; Assignee: Unassigned
References: Depends on 1 open bug
Details: Whiteboard: [fxp]
Attachments: 2 files, 1 obsolete file

Web Page Replay (WPR) is a performance testing tool written in Go for recording and replaying web pages.

Web Page Replay:
https://github.com/catapult-project/catapult/tree/master/web_page_replay_go

WPR uses a DNS host map to reroute all browser traffic through itself, so that it can record and replay the web traffic.

WPR can be built with Go on all platforms using go build.

Recording:

go run src/wpr.go record --host=127.0.0.1 --http_port=8080 --https_port=8081 /tmp/archive.wprgo

Replaying:

go run src/wpr.go replay --host=127.0.0.1 --http_port=8080 --https_port=8081 /tmp/archive.wprgo
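If these commands are driven from a Python harness (Raptor itself is Python), it helps to build the argument list in one place. Below is a minimal sketch. It assumes the script runs from a web_page_replay_go checkout so that src/wpr.go resolves. wpr_command is a hypothetical helper, not part of WPR or Raptor.

```python
import shlex


def wpr_command(mode, archive, http_port=8080, https_port=8081, host="127.0.0.1"):
    """Build the argv for `go run src/wpr.go record|replay ...` (hypothetical helper)."""
    if mode not in ("record", "replay"):
        raise ValueError(f"mode must be 'record' or 'replay', got {mode!r}")
    argv = ["go", "run", "src/wpr.go", mode]
    if host is not None:
        # --host is optional; omit it to let WPR use its default bind address.
        argv.append(f"--host={host}")
    argv += [f"--http_port={http_port}", f"--https_port={https_port}", archive]
    return argv


if __name__ == "__main__":
    # Print the replay command shown above.
    print(shlex.join(wpr_command("replay", "/tmp/archive.wprgo")))
```

Pass the list to subprocess.Popen rather than a shell string to avoid quoting problems. Popen is the better fit over subprocess.run because replay has to keep running while the browser loads pages.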

Browser setup:

Chrome:

google-chrome-beta --user-data-dir=$foo  --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost" --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=
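The --host-resolver-rules value is a comma-separated list of MAP and EXCLUDE rules. That means it can be generated from the same port numbers passed to WPR. The sketch below reproduces the string in the command above. host_resolver_rules is a hypothetical helper.

```python
def host_resolver_rules(port_map, target_host="127.0.0.1", exclude=("localhost",)):
    """Build a value for Chrome's --host-resolver-rules flag.

    port_map maps the port a page asks for (80, 443) to the port WPR listens on.
    """
    rules = [f"MAP *:{src} {target_host}:{dst}" for src, dst in port_map.items()]
    # EXCLUDE entries keep traffic to these hosts away from the MAP rules.
    rules += [f"EXCLUDE {host}" for host in exclude]
    return ",".join(rules)


print(host_resolver_rules({80: 8080, 443: 8081}))
```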

Firefox:

  1. Start firefox
  2. Set preference:
    "network.dns.forceResolve": "127.0.0.1"
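Instead of setting the preference by hand in about:config, you can write it to the profile's user.js, which Firefox reads at startup. Below is a minimal sketch that assumes you already know the profile directory. render_user_js and write_user_js are hypothetical helpers.

```python
import json
from pathlib import Path


def render_user_js(prefs):
    """Render a dict of prefs as user.js lines of the form user_pref("name", value);"""
    lines = []
    for name, value in prefs.items():
        # json.dumps double-quotes strings and turns True/False into true/false,
        # which matches the JavaScript-like syntax user.js expects.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "".join(line + "\n" for line in lines)


def write_user_js(profile_dir, prefs):
    """Append the prefs to <profile_dir>/user.js, keeping any prefs already there."""
    path = Path(profile_dir) / "user.js"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(render_user_js(prefs))
    return path


print(render_user_js({"network.dns.forceResolve": "127.0.0.1"}), end="")
```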

Note:
When running WPR for Firefox, you must set --http_port=80 --https_port=443.
Also, on Linux WPR needs sudo access to be able to bind to ports 80 and 443.

Because of this sudo limitation, we need to investigate a way to run and set up Firefox with WPR.

To avoid using sudo, we need to remap the browser traffic from ports 80 and 443 to other ports.

The issue is that Firefox's network.dns.forceResolve currently accepts only an IP address, so we can't remap the port 80 and 443 traffic to other ports.
That remapping is exactly what Chrome's --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost" does.

:mcmanus, as you initially introduced network.dns.forceResolve in Bug 1361099, can you help shed some light on this preference?

Do you know if we can remap the ports in Firefox to get a setup similar to Chrome's?

Other options would be:

  1. Get sudo access to start WPR and bind ports 80 and 443
  2. Add a middle layer (Docker or another server) to run WPR

Assignee: nobody → fstrugariu

:bebe is the network.dns.forceResolve preference supported on Android builds? In CI we run these on linux using Docker containers, so the sudo limitation shouldn't be a problem. I'm wondering if we might be able to use WebPageReplay for Android in the near future, and consider using it against desktop builds once we're able to do so on the various platforms.

Flags: needinfo?(fstrugariu)

Hi folks! I started digging into this as well. I think having a few people poking at it is a great idea 'cuz there are a lot of different angles to explore here.

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #3)

:bebe is the network.dns.forceResolve preference supported on Android builds? In CI we run these on linux using Docker containers, so the sudo limitation shouldn't be a problem. I'm wondering if we might be able to use WebPageReplay for Android in the near future, and consider using it against desktop builds once we're able to do so on the various platforms.

Yes, network.dns.forceResolve is supported for the Necko stack. (It's not supported at all for other networking stacks, but those aren't relevant for pageload; they're only relevant for other parts of the App, e.g., favicons, Sync 1.5, etc.)

I can confirm that arranging for {cert9,key4}.db to have WPR's certs in Fennec's Gecko profile (and by extension, GVE's profile, etc) addresses the MITM SSL errors. I did this by:

  1. killing the browser entirely
  2. adb push-ing browsertime's DBs to /mnt/sdcard in an Android x86 emulator
  3. adb shell run-as org.mozilla.fennec_$USER cp /mnt/sdcard/...db files/... with appropriate paths

Since the x86 emulator has root, we can in fact map the Android target port 80 to host port 8080 and target port 443 to host port 8081:

  1. adb reverse tcp:80 tcp:8080
  2. adb reverse tcp:443 tcp:8081
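When this is scripted, the two adb reverse calls can be generated from the same port map used for WPR. The sketch below uses plain subprocess. mozdevice also wraps adb, but this avoids depending on its API. The helper names are hypothetical, and the optional serial is passed with adb's -s flag.

```python
import subprocess


def adb_reverse_argv(port_map, serial=None):
    """Build one `adb [-s serial] reverse tcp:<device port> tcp:<host port>` argv per mapping."""
    prefix = ["adb"] + (["-s", serial] if serial else [])
    return [prefix + ["reverse", f"tcp:{device}", f"tcp:{host}"]
            for device, host in port_map.items()]


def setup_reverse(port_map, serial=None):
    """Run the reverse forwards; check=True raises CalledProcessError if adb refuses one."""
    for argv in adb_reverse_argv(port_map, serial):
        subprocess.run(argv, check=True)


# Device ports 80/443 -> WPR listening on host ports 8080/8081.
for argv in adb_reverse_argv({80: 8080, 443: 8081}):
    print(" ".join(argv))
```

Afterwards, adb reverse --remove-all clears the mappings.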

I tested with http://example.com, https://example.com, https://reddit.com, and https://microsoft.com. Reverse port-forwarding avoids having to bind WPR itself to privileged ports (80/443) in the host/container. Of course, it's hard to arrange that on actual hardware!

I don't understand why our SSL cert mangling doesn't break HSTS key pinning -- i.e., why doesn't Gecko require the equivalent of Chrome's --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=? I think we have special handling for custom certs: if we witness a custom cert in the DB we stop HSTS key pinning entirely.

I dug pretty deep into network.dns.forceResolve and the equivalent of Chrome's --host-resolver-rules. I think we should follow up with the Necko team. The abstraction Chrome has for remapping (host, port) pairs in the DNS layer doesn't exist in Necko, and it doesn't really apply to how our DNS code is structured. An expert's eyes would be well worth the effort, I think. More next week!

I've filed Bug 1523367 for this discussion, since it's not necessarily relevant for this ticket.

On IRC, Alex_Gaynor kindly elaborated:

<Alex_Gaynor> FWIW, I think you mean HPKP, not HSTS. HSTS is just for enforcing HTTPS, not pinning keys
<Alex_Gaynor> We don't enforce key pins with custom roots.
<Alex_Gaynor> Chrome's description of why matches our own: https://dev.chromium.org/Home/chromium-security/security-faq#TOC-How-does-key-pinning-interact-with-local-proxies-and-filters-

(In reply to Nick Alexander :nalexander [he/him] from comment #6)

I don't understand why our SSL cert mangling doesn't break HSTS key pinning -- i.e., why doesn't Gecko require the equivalent of Chrome's --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=? I think we have special handling for custom certs: if we witness a custom cert in the DB we stop HSTS key pinning entirely.

On IRC, Alex_Gaynor kindly elaborated:

<Alex_Gaynor> FWIW, I think you mean HPKP, not HSTS. HSTS is just for enforcing HTTPS, not pinning keys
<Alex_Gaynor> We don't enforce key pins with custom roots.
<Alex_Gaynor> Chrome's description of why matches our own: https://dev.chromium.org/Home/chromium-security/security-faq#TOC-How-does-key-pinning-interact-with-local-proxies-and-filters-

Further to this, I think what's actually happening is that, with Chrome, WPR doesn't require installing a custom cert the way Gecko does. The --ignore-certificate-errors-spki-list flag has a smaller surface area: it ignores a certain class of errors for just one certificate.
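For reference, the value passed to --ignore-certificate-errors-spki-list is a base64-encoded SHA-256 hash of the certificate's SubjectPublicKeyInfo (SPKI). Below is a stdlib-only sketch. It assumes you have already extracted the public key in PEM form, e.g. with openssl x509 -noout -pubkey -in <cert.pem>. The function names are hypothetical.

```python
import base64
import hashlib


def pem_public_key_to_der(pem: str) -> bytes:
    """Decode a "-----BEGIN PUBLIC KEY-----" block to DER bytes.

    The body of such a PEM block is the base64-encoded SubjectPublicKeyInfo.
    """
    lines = [line.strip() for line in pem.splitlines()]
    body = [line for line in lines if line and not line.startswith("-----")]
    return base64.b64decode("".join(body))


def spki_fingerprint(spki_der: bytes) -> str:
    """Base64 of SHA-256(SPKI), the format used by --ignore-certificate-errors-spki-list."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")
```

If I read the flag correctly, feeding the public key of the certificate WPR serves through these two functions should reproduce the value in the Chrome command above. That makes it a quick check after regenerating a certificate.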

One of the advantages of WPR is that it allows for playback with no connections to the live servers.

We're trying to remove these upstream connections from mitmproxy's playback (bug 1517323), but so far doing so prevents HTTP/2 playback.

In local tests, I was able to do offline playback of HTTP/2 with WPR.

:davehunt, I managed to make WPR work with Android and Raptor.

The biggest issue I had was getting proper root on the phone. From what I have found, the port redirect on Android only works with root access and a working adb root command.

Otherwise you get:

mozdevice.adb.ADBProcessError: args: adb -s ZY3228PRPF wait-for-device reverse tcp:80 tcp:8080, exitcode: 1, stdout: error: cannot bind listener: Permission denied
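This failure shows up as text in adb's output rather than as a distinctive exit code, so a harness can recognise it and point at the fix. The helper names and hint wording in the sketch below are hypothetical. It checks both stdout and stderr because the message above appeared on stdout.

```python
import subprocess

# The message from the log above when the device port cannot be bound without root.
PERMISSION_DENIED = "cannot bind listener: Permission denied"


def explain_adb_reverse_error(output: str) -> str:
    """Map failed `adb reverse` output to a human-readable hint."""
    if PERMISSION_DENIED in output:
        return ("adb could not bind the device port; this needs a rooted device "
                "where `adb root` works")
    return "adb reverse failed: " + output.strip()


def run_adb_reverse(argv):
    """Run one adb reverse argv and raise RuntimeError with a hint if it fails."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(explain_adb_reverse_error(result.stdout + result.stderr))
```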

When using Magisk to root you don't get that out of the box.

That can be easily fixed on the Pixel2. You just need to install a Magisk package:
https://forum.xda-developers.com/apps/magisk/module-insecure-adbd-pixel-devices-t3871067

After that, on the Pixel things were straightforward:

  1. Install the cert
  2. Set up the prefs
  3. Set up the app
  4. Start WPR
  5. Start the app

I updated the patch and made a recording of instagram using WPR

I will look into how to modify the Magisk package to work on the Motorola, so that both devices work properly.

Flags: needinfo?(fstrugariu)
Attached file: WebPageReplay POC (obsolete)

This is still a WIP. To work, it needs the files from:
https://drive.google.com/drive/folders/1tyFctTTDitHJxScZLa6r9k20l8Gd5fTz?usp=sharing

in the opb/raptor dir.

Found the solution to properly root the Motorola device.

See:
https://forum.xda-developers.com/showpost.php?p=77688791&postcount=256

Keep in mind that what we have in automation might be different.

Blocks: 1518791, 1518793
Priority: -- → P2

This is just work-in-progress. I'm building a modified wpr-go 'cuz I
want to be able to distribute my HTTP CONNECT proxy changes to folks
in the performance team.

We should see if we can avoid the multiple Docker images.

bebe: not sure if we'll want this, but https://phabricator.services.mozilla.com/D20567 builds wpr-go (my modified version) in TC as a toolchain. I want to be able to get binaries for the perf team so I built the task.

No longer blocks: 1518793

:nalexander what is the state of your modified binary? are we able to revisit using WebPageReplay in Raptor? We suspect mitmproxy's cert sniffing caused bug 1544288, and there's no current way to disable this without also losing support for HTTP/2.

Flags: needinfo?(nalexander)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #15)

:nalexander what is the state of your modified binary? are we able to revisit using WebPageReplay in Raptor? We suspect mitmproxy's cert sniffing caused bug 1544288, and there's no current way to disable this without also losing support for HTTP/2.

My modified binary, which tries to wire in a true HTTP proxy to wpr-go, didn't seem to work out. What I'm doing for browsertime is awkward but is working well:

I run:

  • mitmproxy to act as an HTTP proxy on 4040 and port-forward reddit.com:80 -> 127.0.0.1:8080, etc.
  • wpr-go to record/replay 127.0.0.1:8080 -> reddit.com:80

So that's not as slick as just having wpr-go be the HTTP proxy, but mitmproxy appears to be more robust. It's possible that my setup suffers from the same problem as bug 1544288 and I just haven't looked for it.

The mitmproxy port forwarding script is here.

Flags: needinfo?(nalexander)

:nalexander how are you starting mitmproxy? This may tell us whether it will be requesting the upstream certificates. Are you able to check if you see HTTP/2 traffic? We should identify if you are suffering from the same issues as it can have a significant impact on the performance.

It sounds like for now we'll continue with mitmproxy for Raptor, but we would like to consider moving to WebPageReplay in Q3. We should determine what the most appropriate path would be, given the issues we face.

:nalexander I'm curious - as a quick test can you run your solution with no internet?
I know I've done this with WPR on desktop (and mitmproxy without the upstream cert checking).

Flags: needinfo?(nalexander)

(In reply to Andrew Creskey from comment #18)

:nalexander I'm curious - as a quick test can you run your solution with no internet?
I know I've done this with WPR on desktop (and mitmproxy without the upstream cert checking).

Sorry for the extremely (!) delayed reply. I was investigating this and never quite got a satisfactory answer, but here's what I think I see.

In my setup, mitmproxy "fronts" wpr and serves as a full HTTP CONNECT proxy (which wpr doesn't provide) and also port maps. (That's what I describe in https://bugzilla.mozilla.org/show_bug.cgi?id=1522133#c16.) In my logs, I definitely see the TLS handshake that you are describing happening in mitmproxy, like:

images-na.ssl-images-amazon.com:443 -> 192.168.1.141:8081
::ffff:192.168.1.29:49514: Certificate verification error for images-na.ssl-images-amazon.com: unable to get local issuer certificate (errno: 20, depth: 0)
::ffff:192.168.1.29:49514: Ignoring server verification error, continuing with connection

(from the log at https://drive.google.com/drive/u/1/folders/1PJQCUjf_bdrzc5vBvfzGLP6YvmdZfadu).

So the certificate sniffing happens, it just goes from localhost -> localhost and therefore fails very quickly, 'cuz mitmproxy doesn't trust the wpr cert (which happens to be mitmproxy's own cert(!) -- but that cert isn't in the system trust store, so it's rejected).

AFAICT, this doesn't stop HTTP/2 negotiations; I see things like

192.168.1.29:49518: GET https://192.168.1.141:8081/images/I/41gLHRJCBML._RC|41yzBNflHFL.css,01-yf03D4rL.css,01gq5Ie9j3L.css,31AR91yvQsL.css,01V1p6UrY3L.css,21vlPx3LyyL.css,11UksSmDw-L.css,11+zKfQDbkL.css,21quTPIGBxL.css,01jkimhv0aL.css,01YhS3Cs-hL.css,31K+W+JrcNL.css,01COiFb05sL.css,0110epzdnQL.css,21AUJFAFVFL.css,11X2-nh0PYL.css,01h2e2BEitL.css,110Q3MAjYJL.css,11aMMTVEKqL.css,11oyQ9RIYtL.css,01vd5lqeZUL.css,31G1aiwFgcL.css,01tcPs9LM0L.css,11XscvGD69L.css,01cbS3UK11L.css,21ybJUlhyGL.css,01L8Y-JFEhL.css,01OmXOCBEaL.css_.css?AUIClients/AmazonUI HTTP/2.0
                 << 200  25.08k

(also from that log) all over the place.
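To check this across a whole log rather than by eye, you can tally the protocol version at the end of each request line. The sketch below assumes the line format shown above: client address, method, URL, protocol. protocol_counts is a hypothetical helper, and other mitmproxy output formats may need a different pattern.

```python
import re
from collections import Counter

# A request line: "<client ip:port>: <METHOD> <url> <protocol>", e.g. "...: GET https://host/a.css HTTP/2.0".
REQUEST_LINE = re.compile(r":\s(?P<method>[A-Z]+)\s(?P<url>\S+)\s(?P<protocol>HTTP/[\d.]+)\s*$")


def protocol_counts(lines):
    """Count request lines per protocol version; non-request lines are ignored."""
    counts = Counter()
    for line in lines:
        match = REQUEST_LINE.search(line)
        if match:
            counts[match.group("protocol")] += 1
    return counts
```

Passing an open log file works too, since the function iterates over lines.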

I have not (yet) run with the network off, although I expect it to be possible.

Flags: needinfo?(nalexander)

Thank you Nick. That certainly looks like HTTP/2 is working.

Unrelated: I learnt from the Chrome Performance Testing talk that WprGo supports playback with a fixed network latency. That would help make our tests more realistic without introducing noise.

I'm investigating performance testing with fixed network latency using tsproxy (the tool that Chromium uses). (See Bug 1548572)
Because tsproxy is a SOCKS proxy, it's quite easy to integrate with Firefox and Web Page Replay (just set the SOCKS Host in Firefox).

However, in doing so, I was also able to configure tsproxy as a port mapper so that WebPageReplay does not need to bind to ports 80 and 443.
i.e. WebPageReplay doesn't need admin privileges to start.

So the setup is:

Firefox <-> tsproxy <-> WebPageReplay

• Firefox: SOCKS Host {lanIP}:1080; network.dns.forceResolve = {lanIP}
• tsproxy: bind to {lanIP}:1080; redirect outbound connections to WPR at {lanIP}; remap outbound ports (80->4040, 443->4041)
• WebPageReplay: host {lanIP}, http_port=4040, https_port=4041
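To make the data flow concrete, here is a toy model of where a connection ends up in this setup. It is an illustration only, not tsproxy's implementation. LAN_IP stands in for {lanIP} and is a made-up address, and leaving unmapped ports unchanged is an assumption.

```python
LAN_IP = "192.168.1.10"             # hypothetical {lanIP}
PORT_REMAP = {80: 4040, 443: 4041}  # tsproxy's outbound port remapping from the setup above


def route(hostname, port, lan_ip=LAN_IP, remap=PORT_REMAP):
    """Return the (ip, port) a connection to hostname:port finally reaches."""
    # Step 1, Firefox: network.dns.forceResolve makes every hostname resolve to lan_ip,
    # so the hostname itself no longer influences the destination.
    resolved_ip = lan_ip
    # Step 2, tsproxy: the SOCKS connection goes to WPR at lan_ip, and the
    # privileged port is swapped for the unprivileged one WPR listens on.
    return resolved_ip, remap.get(port, port)
```

Only tsproxy deals with ports here, which is why neither Firefox nor WPR needs administrator privileges.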

Advantages
• Controls for network latency and bandwidth throttling using battle-tested tsproxy (realism++, expected to reduce noise)
• I verified that replay works with no internet connection
• No administrator privileges required
• Also works on Android

I've been testing this and it seems to work very well.
I'll start long running android tests tonight.

(In reply to Andrew Creskey from comment #21)

I'm investigating performance testing with fixed network latency using tsproxy (the tool that Chromium uses). (See Bug 1548572)
Because tsproxy is a SOCKS proxy, it's quite easy to integrate with Firefox and Web Page Replay (just set the SOCKS Host in Firefox).

However, in doing so, I was also able to configure tsproxy as a port mapper so that WebPageReplay does not need to bind to ports 80 and 443.
i.e. WebPageReplay doesn't need admin privileges to start.

So the setup is:

Firefox <-> tsproxy <-> WebPageReplay

• Firefox: SOCKS Host {lanIP}:1080; network.dns.forceResolve = {lanIP}
• tsproxy: bind to {lanIP}:1080; redirect outbound connections to WPR at {lanIP}; remap outbound ports (80->4040, 443->4041)
• WebPageReplay: host {lanIP}, http_port=4040, https_port=4041

This remapping + proxy is exactly what I'm doing with mitmproxy remapping, so I'm glad there's another way.

(In reply to Andrew Creskey from comment #21)

Advantages
• Controls for network latency and bandwidth throttling using battle-tested tsproxy (realism++, expected to reduce noise)
• I verified that replay works with no internet connection
• No administrator privileges required
• Also works on Android

I've been testing this and it seems to work very well.
I'll start long running android tests tonight.

That looks very promising! Any idea of how much overhead this will be if run on the same device as playback?

(In reply to Nick Alexander :nalexander [he/him] from comment #22)

This remapping + proxy is exactly what I'm doing with mitmproxy remapping, so I'm glad there's another way.

Yes, that's where I got the idea :)

(In reply to Denis Palmeiro [:denispal] from comment #23)

That looks very promising! Any idea of how much overhead this will be if run on the same device as playback?

That's a good question - in the general case, tsproxy's job is actually to introduce overhead.
I've been thinking more of the android+host setup, but there's a good way to test this: run it with 0ms round trip time and compare against using just Web Page Replay.

I'll include that scenario in the tests that I start now.
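One way to summarise that comparison is to take the median load time of each configuration and look at the difference. Medians are less sensitive than means to the outliers that cold loads produce. The function names below are hypothetical.

```python
from statistics import median


def median_overhead_ms(with_tsproxy, wpr_only):
    """Median load time with tsproxy (0 ms RTT) minus median load time with WPR alone."""
    return median(with_tsproxy) - median(wpr_only)


def relative_overhead(with_tsproxy, wpr_only):
    """The same difference as a fraction of the WPR-only median."""
    baseline = median(wpr_only)
    return (median(with_tsproxy) - baseline) / baseline
```

A value close to zero would suggest that tsproxy adds no meaningful overhead when configured with 0 ms RTT.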

:nalexander

I tested a combination of mitmproxy and WPR, using the port forwarding you suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1517323#c28

Same strategy as suggested:
Firefox > mitmproxy (port forwarding) > WPR

It works as it should.

Moving to tsproxy does the same job and adds more features.

I have a pull request on raptor studio with both configurations:
https://github.com/davehunt/raptor-studio/pull/14

Denis, as discussed, attached is a comparison of running with and without a local ts_proxy.

tsproxy_0 and tsproxy_1 are the same configuration: tsproxy running locally on the reference laptop, tunnelling to WPR on my Macbook.
straight_wpr_0 and straight_wpr_1 are also the same configuration: Firefox on the reference laptop connects directly to WPR on my Macbook.

It's not clear to me that this shows any overhead from running ts_proxy locally.
But certainly even with WPR recordings and cold loads, there is a lot of variance in the load time.
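The coefficient of variation is a simple, scale-free way to put a number on that variance per configuration (e.g. tsproxy_0 vs straight_wpr_0). It is the sample standard deviation divided by the mean. The helper names below are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (needs at least two samples)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return stdev(samples) / mean(samples)


def summarize(runs):
    """runs maps a configuration name (e.g. 'tsproxy_0') to its load times in ms."""
    return {name: round(coefficient_of_variation(times), 3) for name, times in runs.items()}
```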

Attachment #9039932 - Attachment is obsolete: true

FWIW, I don't believe we need administrator access for running WPR on Windows when binding to 127.0.0.1.

Assignee: fstrugariu → nobody

Something that I'm seeing quite frequently with WPR is exceptionally long TCP connection establishment times.

In these cases, geckoview_example/Fenix was running on a Pixel 3, with WPR running on my MacBook Pro on the same LAN:

e.g.
269ms for first connection
https://perfht.ml/2tvVCBk

217ms for first connection
https://perfht.ml/38rdtbA

I'd like to know if anyone else is seeing connection times like this.
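To see whether the slow first connections come from the network path or from the browser, you could time bare TCP handshakes to the WPR host from the same network. The sketch below uses only the standard library, and tcp_connect_ms is hypothetical. It measures only the TCP handshake, not TLS or anything the browser does afterwards. Pass an IP address so that no DNS lookup is included.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Time a bare TCP connect to host:port in milliseconds (no TLS, no HTTP)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0
```

Call it a handful of times against the WPR machine's LAN address and HTTPS port from a host on the same LAN. That gives a rough baseline to compare against the first-connection times in the profiles above.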

This now works with:

go run src/wpr.go record --http_port=8080 --https_port=8081 /tmp/archive.wprgo

or:

go run src/wpr.go replay --http_port=8080 --https_port=8081 /tmp/archive.wprgo

Set Firefox prefs:

network.dns.forceResolve	127.0.0.1
network.socket.forcePort	443=8081;80=8080

Import wpr.cert into Firefox.
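Judging from the example above, the network.socket.forcePort value appears to be a semicolon-separated list of original=replacement port pairs. The sketch below generates both prefs from the WPR ports and prints them in user.js syntax. The helper names are hypothetical.

```python
import json


def force_port_value(port_map):
    """Format {original_port: new_port} as e.g. "443=8081;80=8080"."""
    return ";".join(f"{src}={dst}" for src, dst in port_map.items())


def wpr_prefs(http_port=8080, https_port=8081, host="127.0.0.1"):
    """The two prefs from the steps above, for WPR listening on host:http_port/https_port."""
    return {
        "network.dns.forceResolve": host,
        "network.socket.forcePort": force_port_value({443: https_port, 80: http_port}),
    }


for name, value in wpr_prefs().items():
    print(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
```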

Priority: P2 → P3
Version: Version 3 → unspecified
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1770457
Resolution: --- → DUPLICATE
Whiteboard: [fxp]