Investigate and integrate Web Page Replay with Raptor
Categories
(Testing :: Raptor, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: Bebe, Unassigned)
References
(Depends on 1 open bug)
Details
(Whiteboard: [fxp])
Attachments
(2 files, 1 obsolete file)
Web Page Replay (Wpr) is a performance testing tool written in Golang for recording and replaying web pages.
Web page replay :
https://github.com/catapult-project/catapult/tree/master/web_page_replay_go
WPR relies on a DNS host map that reroutes all browser traffic through it so that it can record and replay the web traffic.
WPR can be built with Go on all platforms using go build.
Recording:
go run src/wpr.go record --host=127.0.0.1 --http_port=8080 --https_port=8081 /tmp/archive.wprgo
Replaying:
go run src/wpr.go replay --host=127.0.0.1 --http_port=8080 --https_port=8081 /tmp/archive.wprgo
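For scripting these two invocations from a harness, a minimal Python sketch follows. It only assembles the argument list shown above; the helper name build_wpr_command is hypothetical, and the flags are exactly the ones used in the commands in this description.

```python
import shlex


def build_wpr_command(mode, archive, host="127.0.0.1", http_port=8080, https_port=8081):
    """Return the argv list for a `go run src/wpr.go` record or replay run."""
    if mode not in ("record", "replay"):
        raise ValueError(f"mode must be 'record' or 'replay', got {mode!r}")
    return [
        "go", "run", "src/wpr.go", mode,
        f"--host={host}",
        f"--http_port={http_port}",
        f"--https_port={https_port}",
        archive,
    ]


if __name__ == "__main__":
    # Print the command line instead of running it, so the sketch has no side effects.
    print(shlex.join(build_wpr_command("record", "/tmp/archive.wprgo")))
```

The list form can be handed to subprocess.run unchanged, which avoids shell quoting problems with the archive path.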
Browser setup:
Chrome:
google-chrome-beta --user-data-dir=$foo --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost" --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=
Firefox:
- Start firefox
- Set preference:
"network.dns.forceResolve": "127.0.0.1"
Note:
When you run WPR for Firefox you must set --http_port=80 --https_port=443.
Also, on Linux WPR needs sudo access to be able to bind to ports 80 and 443.
Reporter | ||
Comment 1•6 years ago
Because of the sudo 'limitation' with Firefox, we need to investigate a way to run and set up Firefox and WPR.
To avoid using sudo we need to remap the browser traffic from ports 80 and 443 to other, unprivileged ports.
The issue is that currently in Firefox the network.dns.forceResolve preference only accepts an IP address, so we can't remap the 80 and 443 traffic to other ports.
This is what Chrome's --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost" does.
:mcmanus as you initially introduced network.dns.forceResolve in Bug 1361099, can you help shed some light on this preference?
Do you know if we can remap the ports in Firefox to get a similar setup to Chrome's?
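To make the Chrome side of the comparison concrete, here is a small Python sketch that builds the --host-resolver-rules value from a port map. The function name and its parameters are hypothetical; the MAP and EXCLUDE rule syntax is taken from the command quoted in this bug.

```python
def host_resolver_rules(port_map, target_ip="127.0.0.1", exclude=("localhost",)):
    """Build a Chrome --host-resolver-rules value from {original_port: local_port}."""
    rules = [f"MAP *:{src} {target_ip}:{dst}" for src, dst in port_map.items()]
    rules += [f"EXCLUDE {host}" for host in exclude]
    return ",".join(rules)


if __name__ == "__main__":
    # Ports 80 and 443 are sent to the unprivileged ports WPR listens on.
    print(host_resolver_rules({80: 8080, 443: 8081}))
```

The point of the sketch is that each rule carries both a host and a port, which is the piece network.dns.forceResolve cannot express because it takes only an IP address.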
Reporter | ||
Comment 2•6 years ago
Other options would be:
- Get sudo access to start WPR and bind ports 80 and 443
- Add a middle layer (Docker or another server) to run WPR
Comment 3•6 years ago
:bebe is the network.dns.forceResolve preference supported on Android builds? In CI we run these on Linux using Docker containers, so the sudo limitation shouldn't be a problem. I'm wondering if we might be able to use WebPageReplay for Android in the near future, and consider using it against desktop builds once we're able to do so on the various platforms.
Comment 4•6 years ago
Hi folks! I started digging into this as well. I think having a few people poking at it is a great idea 'cuz there are a lot of different angles to explore here.
(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #3)
:bebe is the network.dns.forceResolve preference supported on Android builds? In CI we run these on linux using Docker containers, so the sudo limitation shouldn't be a problem. I'm wondering if we might be able to use WebPageReplay for Android in the near future, and consider using it against desktop builds once we're able to do so on the various platforms.
Yes, network.dns.forceResolve is supported for the Necko stack. (It's not supported at all for other networking stacks, but those aren't relevant for pageload; they're only relevant for other parts of the App, e.g., favicons, Sync 1.5, etc.)
I can confirm that arranging for {cert9,key4}.db to have WPR's certs in Fennec's Gecko profile (and by extension, GVE's profile, etc) addresses the MITM SSL errors. I did this by:
- killing the browser entirely
- adb push-ing browsertime's DBs to /mnt/sdcard in an Android x86 emulator
- adb shell run-as org.mozilla.fennec_$USER cp /mnt/sdcard/...db files/... with appropriate paths
Since the x86 emulator has root we can in fact map the Android target port 80 to host port 8080 and target port 443 to host port 8081:
adb reverse tcp:80 tcp:8080
adb reverse tcp:443 tcp:8081
I tested with http://example.com, https://example.com, https://reddit.com, and https://microsoft.com. Reverse port-forwarding avoids having to bind WPR itself to privileged ports (80/443) in the host/container. Of course, it's hard to arrange that on actual hardware!
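The reverse port-forwarding above could be scripted roughly as follows. This is only a sketch: adb_reverse_commands and needs_root are hypothetical helper names, and the commands are built but not executed, so nothing here depends on a device being attached.

```python
def adb_reverse_commands(port_map, serial=None):
    """Return one `adb reverse` argv list per {device_port: host_port} entry."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return [
        base + ["reverse", f"tcp:{device_port}", f"tcp:{host_port}"]
        for device_port, host_port in port_map.items()
    ]


def needs_root(device_port):
    """Ports below 1024 are privileged, so binding them on the device needs root."""
    return device_port < 1024


if __name__ == "__main__":
    for command in adb_reverse_commands({80: 8080, 443: 8081}):
        print(" ".join(command))
```

Both device ports used in this setup (80 and 443) are privileged, which is why the approach works on the rooted x86 emulator and is harder on real hardware.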
I don't understand why our SSL cert mangling doesn't break HSTS key pinning -- i.e., why doesn't Gecko require the equivalent of Chrome's --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=? I think we have special handling for custom certs: if we witness a custom cert in the DB we stop HSTS key pinning entirely.
I dug pretty deep into network.dns.forceResolve and the equivalent of Chrome's --host-resolver-rules. I think we should follow up with the Necko team: it looks like the abstraction that Chrome has for remapping (host, port) pairs in the DNS layer not only doesn't exist in Necko, it doesn't really apply to how our DNS code is structured. An expert's eyes would be well worth the effort, I think. More next week!
Comment 5•6 years ago
I dug pretty deep into network.dns.forceResolve and the equivalent of Chrome's --host-resolver-rules. I think we should follow-up with the Necko team: it looks like the abstraction that Chrome has for remapping (host, port) pairs in the DNS layer not only doesn't exist in Necko, it doesn't really apply to how our DNS code is structured. An expert's eyes would be well worth the effort, I think. More next week!
I've filed Bug 1523367 for this discussion, since it's not necessarily relevant for this ticket.
Comment 6•6 years ago
I don't understand why our SSL cert mangling doesn't break HSTS key pinning -- i.e., why doesn't Gecko require the equivalent of Chrome's --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=? I think we have special handling for custom certs: if we witness a custom cert in the DB we stop HSTS key pinning entirely.
On IRC, Alex_Gaynor kindly elaborated:
<Alex_Gaynor> FWIW, I think you mean HPKP, not HSTS. HSTS is just for enforcing HTTPS, not pinning keys
<Alex_Gaynor> We don't enforce key pins with custom roots.
<Alex_Gaynor> Chrome's description of why matches our own: https://dev.chromium.org/Home/chromium-security/security-faq#TOC-How-does-key-pinning-interact-with-local-proxies-and-filters-
Comment 7•6 years ago
(In reply to Nick Alexander :nalexander [he/him] from comment #6)
I don't understand why our SSL cert mangling doesn't break HSTS key pinning -- i.e., why doesn't Gecko require the equivalent of Chrome's --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=? I think we have special handling for custom certs: if we witness a custom cert in the DB we stop HSTS key pinning entirely.
On IRC, Alex_Gaynor kindly elaborated:
<Alex_Gaynor> FWIW, I think you mean HPKP, not HSTS. HSTS is just for enforcing HTTPS, not pinning keys
<Alex_Gaynor> We don't enforce key pins with custom roots.
<Alex_Gaynor> Chrome's description of why matches our own: https://dev.chromium.org/Home/chromium-security/security-faq#TOC-How-does-key-pinning-interact-with-local-proxies-and-filters-
Further to this, I think what's actually happening is that WPR doesn't require a custom cert like Gecko does: this flag provides a smaller surface area to ignore a certain class of errors for just one certificate.
Comment 8•6 years ago
One of the advantages of WPR is that it allows for playback with no connections to the live servers.
We're trying to remove these upstream connections from mitmproxy's playback (Bug 1517323), but so far that's preventing HTTP/2 playback.
In local tests, I was able to do offline playback of HTTP/2 with WPR.
Reporter | ||
Comment 9•6 years ago
:davehunt managed to make WPR work with Android and Raptor.
The biggest issue I had was getting proper root on the phone. From what I have found, to make the port redirect work on Android you need root access and a working adb root command.
Otherwise you get:
mozdevice.adb.ADBProcessError: args: adb -s ZY3228PRPF wait-for-device reverse tcp:80 tcp:8080, exitcode: 1, stdout: error: cannot bind listener: Permission denied
When you root with Magisk you don't get that out of the box.
That can be easily fixed on the Pixel 2. You just need to install a Magisk package:
https://forum.xda-developers.com/apps/magisk/module-insecure-adbd-pixel-devices-t3871067
After that, on the Pixel things were straightforward:
- Install the cert
- Set up the prefs
- Set up the app
- Start WPR
- Start the app
I updated the patch and made a recording of Instagram using WPR.
I will look into how to modify the Magisk package to work on the Motorola so that both devices work properly.
Reporter | ||
Comment 10•6 years ago
This is still a WIP.
For it to work you need the contents of:
https://drive.google.com/drive/folders/1tyFctTTDitHJxScZLa6r9k20l8Gd5fTz?usp=sharing
in the opb/raptor dir.
Reporter | ||
Comment 11•6 years ago
Found the solution to properly root the Motorola device.
See:
https://forum.xda-developers.com/showpost.php?p=77688791&postcount=256
Comment 12•6 years ago
keep in mind what we have in automation might be different
Comment 13•6 years ago
This is just work-in-progress. I'm building a modified wpr-go 'cuz I
want to be able to distribute my HTTP CONNECT proxy changes to folks
in the performance team.
We should see if we can avoid the multiple Docker images.
Comment 14•6 years ago
bebe: not sure if we'll want this, but https://phabricator.services.mozilla.com/D20567 builds wpr-go (my modified version) in TC as a toolchain. I want to be able to get binaries for the perf team so I built the task.
Comment 15•6 years ago
:nalexander what is the state of your modified binary? are we able to revisit using WebPageReplay in Raptor? We suspect mitmproxy's cert sniffing caused bug 1544288, and there's no current way to disable this without also losing support for HTTP/2.
Comment 16•6 years ago
(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #15)
:nalexander what is the state of your modified binary? are we able to revisit using WebPageReplay in Raptor? We suspect mitmproxy's cert sniffing caused bug 1544288, and there's no current way to disable this without also losing support for HTTP/2.
My modified binary, which tries to wire in a true HTTP proxy to wpr-go, didn't seem to work out. What I'm doing for browsertime is awkward but is working well:
I run:
- mitmproxy to HTTP proxy on 4040 and port-forward reddit.com:80 -> 127.0.0.1:8080, etc
- wpr-go to record/replay 127.0.0.1:8080 -> reddit.com:80
So that's not as slick as just having wpr-go be the HTTP proxy, but the mitmproxy appears to be more robust. It's possible that my setup suffers from the same issue as Bug 1544288 and I just haven't looked for it.
The mitmproxy port forwarding script is here.
Comment 17•6 years ago
:nalexander how are you starting mitmproxy? This may tell us whether it will be requesting the upstream certificates. Are you able to check if you see HTTP/2 traffic? We should identify if you are suffering from the same issues as it can have a significant impact on the performance.
It sounds like for now we'll continue with mitmproxy for Raptor, but we would like to consider moving to WebPageReplay in Q3. We should determine what the most appropriate path would be, given the issues we face.
Comment 18•6 years ago
:nalexander I'm curious - as a quick test can you run your solution with no internet?
I know I've done this with WPR on desktop (and mitmproxy without the upstream cert checking).
Comment 19•6 years ago
(In reply to Andrew Creskey from comment #18)
:nalexander I'm curious - as a quick test can you run your solution with no internet?
I know I've done this with WPR on desktop (and mitmproxy without the upstream cert checking).
Sorry for the extremely (!) delayed reply. I was investigating this and never quite got a satisfactory answer, but here's what I think I see.
In my setup, mitmproxy "fronts" wpr and serves as a full HTTP CONNECT proxy (which wpr doesn't provide) and also port maps. (That's what I describe in https://bugzilla.mozilla.org/show_bug.cgi?id=1522133#c16.) In my logs, I definitely see the TLS handshake that you are describing happening in mitmproxy, like:
images-na.ssl-images-amazon.com:443 -> 192.168.1.141:8081
::ffff:192.168.1.29:49514: Certificate verification error for images-na.ssl-images-amazon.com: unable to get local issuer certificate (errno: 20, depth: 0)
::ffff:192.168.1.29:49514: Ignoring server verification error, continuing with connection
(from the log at https://drive.google.com/drive/u/1/folders/1PJQCUjf_bdrzc5vBvfzGLP6YvmdZfadu).
So the certificate sniffing happens, it just goes from localhost -> localhost and therefore fails very quickly, 'cuz mitmproxy doesn't trust the wpr cert (which happens to be mitmproxy's own cert(!) -- but that cert isn't in the system trust store, so it's rejected).
AFAICT, this doesn't stop HTTP/2 negotiations; I see things like
192.168.1.29:49518: GET https://192.168.1.141:8081/images/I/41gLHRJCBML._RC|41yzBNflHFL.css,01-yf03D4rL.css,01gq5Ie9j3L.css,31AR91yvQsL.css,01V1p6UrY3L.css,21vlPx3LyyL.css,11UksSmDw-L.css,11+zKfQDbkL.css,21quTPIGBxL.css,01jkimhv0aL.css,01YhS3Cs-hL.css,31K+W+JrcNL.css,01COiFb05sL.css,0110epzdnQL.css,21AUJFAFVFL.css,11X2-nh0PYL.css,01h2e2BEitL.css,110Q3MAjYJL.css,11aMMTVEKqL.css,11oyQ9RIYtL.css,01vd5lqeZUL.css,31G1aiwFgcL.css,01tcPs9LM0L.css,11XscvGD69L.css,01cbS3UK11L.css,21ybJUlhyGL.css,01L8Y-JFEhL.css,01OmXOCBEaL.css_.css?AUIClients/AmazonUI HTTP/2.0
<< 200 25.08k
(also from that log) all over the place.
I have not (yet) run with the network off, although I expect it to be possible.
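One rough way to answer the HTTP/2 question from a log like the one excerpted above is to count the request lines that end in HTTP/2.0. The Python sketch below assumes the request lines keep the "client: METHOD url HTTP/2.0" shape seen in that excerpt; the regular expression and the function name are illustrative assumptions, not something mitmproxy documents or provides.

```python
import re

# Assumed line shape, based on the excerpt above: "<client>: <METHOD> <url> HTTP/2.0"
HTTP2_REQUEST = re.compile(
    r"^\S+: (GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) HTTP/2\.0$"
)


def http2_requests(log_lines):
    """Return (method, url) for every log line that looks like an HTTP/2 request."""
    found = []
    for line in log_lines:
        match = HTTP2_REQUEST.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < mitmproxy.log
    print(len(http2_requests(sys.stdin)), "HTTP/2 request lines")
```

Response lines (those starting with "<<") and certificate messages do not match the pattern, so they are ignored rather than counted.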
Comment 20•6 years ago
Thank you Nick. That certainly looks like HTTP/2 is working.
Unrelated, I learnt from the Chrome Performance Testing talk that WprGo does support playback with a fixed network latency which would help in making our tests more realistic without introducing noise.
Comment 21•6 years ago
I'm investigating performance testing with fixed network latency using tsproxy (the tool that Chromium uses). (See Bug 1548572)
Because tsproxy is a SOCKS proxy, it's quite easy to integrate with Firefox and Web Page Replay (just set the SOCKS Host in Firefox).
However in doing so I was also able to configure tsproxy as a port mapper so that WebPageReplay does not need to bind to ports 80 and 443.
i.e. WebPageReplay doesn't need admin privileges to start.
So the setup is:
Firefox <-> tsproxy <-> WebPageReplay
- Firefox: SOCKS Host: {lanIP:1080}; dns.forceResolve: {lanIP}
- tsproxy: bind to {lanIP:1080}; redirect outbound to WPR @ {lanIP}; remap outbound ports (80->4040, 443->4041)
- WebPageReplay: http_port=4040 https_port=4041 host {lanIP}
Advantages
• Controls for network latency and bandwidth throttling using battle-tested tsproxy (realism++, expected to reduce noise)
• I verified that replay works with no internet connection
• No administrator privileges required
• Also works on Android
I've been testing this and it seems to work very well.
I'll start long running android tests tonight.
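The Firefox side of the tsproxy setup described above could be expressed as preferences roughly like this. It is a sketch under assumptions: the comment only says to set the SOCKS Host and dns.forceResolve, so mapping that to network.proxy.type, network.proxy.socks and network.proxy.socks_port is my reading, and both helper names are hypothetical. The tsproxy and WPR command lines are deliberately left out.

```python
import json


def tsproxy_firefox_prefs(lan_ip, socks_port=1080):
    """Prefs that point Firefox at a SOCKS proxy (tsproxy) listening on lan_ip."""
    return {
        "network.proxy.type": 1,  # 1 selects manual proxy configuration
        "network.proxy.socks": lan_ip,
        "network.proxy.socks_port": socks_port,
        "network.dns.forceResolve": lan_ip,
    }


def to_user_js(prefs):
    """Render a prefs dict as user.js lines."""
    return "\n".join(
        f"user_pref({json.dumps(name)}, {json.dumps(value)});"
        for name, value in prefs.items()
    )


if __name__ == "__main__":
    # 192.168.1.141 is only a placeholder for {lanIP}.
    print(to_user_js(tsproxy_firefox_prefs("192.168.1.141")))
```

Writing the output to user.js in a test profile keeps the configuration reproducible across runs; the port remapping itself (80->4040, 443->4041) stays in tsproxy, not in Firefox.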
Comment 22•6 years ago
(In reply to Andrew Creskey from comment #21)
I'm investigating performance testing with fixed network latency using tsproxy (the tool that Chromium uses). (See Bug 1548572)
Because tsproxy is a SOCKS proxy, it's quite easy to integrate with Firefox and Web Page Replay (just set the SOCKS Host in Firefox).
However in doing so I was also able to configure tsproxy as a port mapper so that WebPageReplay does not need to bind to ports 80 and 443.
i.e. WebPageReplay doesn't need admin privileges to start.
So the setup is:
Firefox <-> tsproxy <-> WebPageReplay
- Firefox: SOCKS Host: {lanIP:1080}; dns.forceResolve: {lanIP}
- tsproxy: bind to {lanIP:1080}; redirect outbound to WPR @ {lanIP}; remap outbound ports (80->4040, 443->4041)
- WebPageReplay: http_port=4040 https_port=4041 host {lanIP}
This remapping + proxy is exactly what I'm doing with mitmproxy remapping, so I'm glad there's another way.
Comment 23•6 years ago
(In reply to Andrew Creskey from comment #21)
Advantages
• Controls for network latency and bandwidth throttling using battle-tested tsproxy (realism++, expected to reduce noise)
• I verified that replay works with no internet connection
• No administrator privileges required
• Also works on Android
I've been testing this and it seems to work very well.
I'll start long running android tests tonight.
That looks very promising! Any idea of how much overhead this will be if run on the same device as playback?
Comment 24•6 years ago
(In reply to Nick Alexander :nalexander [he/him] from comment #22)
This remapping + proxy is exactly what I'm doing with mitmproxy remapping, so I'm glad there's another way.
Yes, that's where I got the idea :)
(In reply to Denis Palmeiro [:denispal] from comment #23)
That looks very promising! Any idea of how much overhead this will be if run on the same device as playback?
That's a good question - in the general case, tsproxy's job is actually to introduce overhead.
I've been thinking more of the android+host setup, but there's a good way to test this: run it with 0ms round trip time and compare against using just Web Page Replay.
I'll include that scenario in the tests that I start now.
Reporter | ||
Comment 25•6 years ago
:nalexander
I tested integrating WPR with a combination of mitmproxy and WPR using the port forwarding you suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1517323#c28
It uses the same strategy as suggested:
Firefox > mitmproxy (port forwarding) > WPR
And it works as it should.
Moving to tsproxy does the same job and adds more features.
I have a pull request on raptor-studio with both configurations:
https://github.com/davehunt/raptor-studio/pull/14
Comment 26•6 years ago
Denis, as discussed, attached is a comparison of running with and without a local ts_proxy.
tsproxy_0 and tsproxy_1 are the same configuration: tsproxy running locally on the reference laptop, tunnelling to WPR on my MacBook.
straight_wpr_0 and straight_wpr_1 are also the same configuration: Firefox on the reference laptop connects directly to WPR on my MacBook.
It's not clear to me that this shows any overhead from running ts_proxy locally.
But certainly even with WPR recordings and cold loads, there is a lot of variance in the load time.
Comment 27•6 years ago
FWIW, I don't believe we need administrator access for running WPR on Windows when binding to 127.0.0.1.
Comment 28•5 years ago
Something that I'm seeing quite frequently with WPR is exceptionally long TCP connection establishment times.
In these cases geckoview_example/Fenix was running on a Pixel 3 with WPR running on my MacBook Pro, on the same LAN:
e.g.
269ms for first connection
https://perfht.ml/2tvVCBk
217ms for first connection
https://perfht.ml/38rdtbA
I'd like to know if anyone else is seeing connection times like this.
Reporter | ||
Comment 29•5 years ago
This now works with:
go run src/wpr.go record --http_port=8080 --https_port=8081 /tmp/archive.wprgo
or:
go run src/wpr.go replay --http_port=8080 --https_port=8081 /tmp/archive.wprgo
set firefox prefs:
network.dns.forceResolve 127.0.0.1
network.socket.forcePort 443=8081;80=8080
import wpr.cert to firefox
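As a sketch of how a harness might derive these two preference values from a single port map, assuming the "original=replacement" pairs joined by ";" format shown above (the helper names are hypothetical):

```python
def force_port_pref(port_map):
    """Format {original_port: local_port} as a network.socket.forcePort value."""
    return ";".join(f"{src}={dst}" for src, dst in port_map.items())


def wpr_firefox_prefs(port_map, wpr_ip="127.0.0.1"):
    """Prefs that send all hosts to wpr_ip and remap ports to the WPR listeners."""
    return {
        "network.dns.forceResolve": wpr_ip,
        "network.socket.forcePort": force_port_pref(port_map),
    }


if __name__ == "__main__":
    # Same mapping as above: HTTPS to 8081, HTTP to 8080.
    print(wpr_firefox_prefs({443: 8081, 80: 8080}))
```

Keeping the port map in one place means the --http_port and --https_port values passed to WPR and the forcePort value set in Firefox cannot drift apart.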