Closed Bug 1356073 Opened 7 years ago Closed 7 years ago

Add a talos test to measure https page load times

Categories

(Testing :: Talos, enhancement)

Version 3
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

We need to make sure to not ship things like bug 1353216 again.  Therefore we need to add a talos test that reports page load times from some https servers to perfherder.

Ideally we should check with the security team about the different TLS server configurations that we should be testing for performance, especially in the areas where we expect to be doing new development work in the coming months-years.  I'll leave the details up to the experts here.  :-)

Joel, is this something that we can easily(ish) hook up to tp5?
Flags: needinfo?(jmaher)
is there prior art of a webserver that will serve https and actually do TLS which runs on all operating systems?  I am not too familiar with web servers- currently we use a python script to simply serve web pages.  If doing what we do in mochitest to serve https would work using ssltunnel, that might be realistic to startup in talos and we could consider any proxy auto config settings to make that work.

Assuming we need to install something like apache, that will require us to hand edit/reinstall every hardware machine we have (1000+), so that will take some time.  If we were to do that, I would want to add in web-page-replay at the same time- so going this route would end up taking slightly longer to get all the software configs figured out ahead of time.

Other questions-
 - would we want to change all tests to use https?
 - if there are problems, will we get engineers to debug and fix them so the tests are stable (we have hundreds of crashes/week on win8 right now and no luck getting a fix in bug 1345735)?
Flags: needinfo?(jmaher)
1. I think stunnel would be ok. The primary performance dependencies are the TLS stack. Ultimately, we probably want node.
2. This would probably not have been caught because I doubt we had a perf test for CT (and someone would have had to add one).
3. Soon enough, we will also need to add TLS 1.3 0-RTT and TFO, which will be more complicated, especially because TFO == kernel.

ni: mcmanus for his thoughts.
Flags: needinfo?(mcmanus)
1] a unit perf test is ideal here to get started i.e. something that measures handshake times, and perhaps another that measures transfers of something more substantial - but a full fledged mocihtest integration style wpr suite probably doesn't capture anything we aren't already seeing that the simpler test wouldn't. it would be good too - but it would be a secondary (later) goal for me.

1a] I know most of the mochitest stuff brings proxy code into the mix to actually address the webserver as if it were a proxy.. which is weird and clever at the same time. Obviously https:// would have some objections to that.. the easiest workaround is to use the necko pref that lets you list a bunch of domains that you want to resolve to localhost. It wasn't really built with scale in mind but if that's an issue I guess we can look at it.

2] we have https:// based xpcshell tests using node (and specifically node-http2) that can be cribbed from perhaps. These run on all of our platforms except android. test_altsvc.js and test_origin.js are some pointers (and test_http2.js though it uses an old self-signed paradigm that need not be dup'd). moz-http is the server side for them and node-http2 is in the testing part of the tree.

3] for our existing TLS stuff we aren't especially OS dependent - so everybody-but-android might be a MVP for a tls perf test. (it is for the http2 test cases which need https.. we haven't had an android specific bug sneak through yet - this stuff tends to be pretty cross platform).

4] I would expect different tls versions and suites can be tested in the above paradigm with node easily enough. also 0-RTT once we have a 1.3-0rtt server we feel comfortable sticking in the CI. (h20 might be that from what I understand - node-http2 is not - but it wouldn't be a windows option afaik... this is a fast moving space)

^^^ everything above this line is <= gecko 55 and more or less landed

5] when we get to quic I think it will look a lot like the above cases though it might take a while for a windows server to appear - but the good news with quic is that it should have even less OS dependencies than existing code so the lack of windows server code probably would be acceptable for a while. quic would be AFTER 57

6] TFO is harder. dragana (cc'd) I believe is already in touch with Joel re TFO (or will be very shortly)to at least get some kernel support for unit tests.

a] for mac and linux you need to turn a root-priv knob to enable it for the system, for windows you just need to be running >= win10-anniversary edition. that's the easy part.

b] the client and server need to write to the API (just like TLS 1.3 0-RTT data). In firefox this is very OS specific - so targetting all platforms would be ideal. And just like TLS 1.3 0-RTT the server needs to be hacked up a bit to turn it on and I'm not aware of a package to target (fb and google.com run it in real life - cross that with win10 and its a large audience).. it might be easiest to add it to the existing server used for the mochitests.

TFO hasn't landed yet but presumably will be in 55 or 56.

obviously there is more than one bugzilla in here but hopefully that helps get to a minimal useful test.
Flags: needinfo?(mcmanus)
(In reply to Joel Maher ( :jmaher) from comment #1)
>  - would we want to change all tests to use https?

If the Necko/security folks don't have any specific requirements here, I'd probably start with running the entire tp5 suite under https as well.  It'd actually be pretty interesting to be able to head-to-head compare the http vs https results.  If they do, I'd listen to them first.  :-)

>  - if there are problems, will we get engineers to debug and fix them so the
> tests are stable (we have hundreds of crashes/week on win8 right now and no
> luck getting a fix in bug 1345735)?

Absolutely.  Shipping regressions like we did is just unacceptable, and we need to do start measuring more things if we're expected to not regress things.  It looks like that bug was picked up, if you hit others and one engineer was busy (which can happen, speaking as one such person these days myself!) please ask help from their manager to find you someone else with more free time to help out.  Hopefully people will be helpful and understanding because I really think everyone would prefer these regressions to be caught at landing time.  :-)
With specific regard to bug 1353216, the TLS/https server setup would need to be able to do http2 connection coalescing and use a certificate with SCTs (signed certificate timestamps from certificate transparency). See also bug 1349312 regarding adding test-only certificate transparency information to Firefox so we can actually verify test SCTs in an integration test like this.
See Also: → 1349312
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #4)
> (In reply to Joel Maher ( :jmaher) from comment #1)
> >  - would we want to change all tests to use https?
> 
> If the Necko/security folks don't have any specific requirements here, I'd
> probably start with running the entire tp5 suite under https as well.  It'd
> actually be pretty interesting to be able to head-to-head compare the http
> vs https results.  If they do, I'd listen to them first.  :-)
> 

Partly to bump, but partly also to answer the question:

I don't feel that I have any specific security requirements here, I think as a starting point, running tp2 under HTTPS is enough to start with. As David pointed out, it wouldn't have caught the CT regression on its own, however, we keep adding features / checks that are related to or only operate under secure contexts, so it's still valuable to get this kind of data.

I think more specific Talos enhancements should be follow-on bugs, and IMO we should be aiming to have this simple "Talos for HTTPS" bug resolved for the 57 cycle. Let me know if this is something CryptoEng can help with.
as a note, the new tp6 job (been on windows for a couple months now- previously as q1 job on treeherder) is running modern pagesets recorded and replayed via https.  Would it be satisfactory to stick with knowing we are getting https coverage on modern web pages?
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #7)
> as a note, the new tp6 job (been on windows for a couple months now-
> previously as q1 job on treeherder) is running modern pagesets recorded and
> replayed via https.  Would it be satisfactory to stick with knowing we are
> getting https coverage on modern web pages?

This is a great question. The class of problem that raised this bug isn't related to anything running on the page, rather with the TLS configuration. One can imagine talos regressions with DOM features on HTTPS versus HTTP, too, but that isn't exactly the source issue here.

So perhaps this is a good starting point, and we should treat covering all DOM-level features separately?
I am not clear on DOM features- I am happy to help standup specific changes to an existing toolchain or to take a specific toolchain and add it to our framework- is there something specific that we need to change with mitmproxy to gain more realistic coverage?
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #9)
> I am not clear on DOM features- I am happy to help standup specific changes
> to an existing toolchain or to take a specific toolchain and add it to our
> framework- is there something specific that we need to change with mitmproxy
> to gain more realistic coverage?

I'm ignorant of the details here, but all I mean to say is: If we have Talos data for _some_ functionality going over HTTPS, and can watch that for regressions, then I think _this_ issue is resolved. We might want to open follow-up issues if we identify specific features we want to ensure are explicitly being tested over HTTPS.
I think we should start a new bug with features or specific test scenarios- thanks for helping me understand a bit more!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.