Closed Bug 844994 Opened 11 years ago Closed 2 years ago

Add testing STUN Server

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: ekr, Unassigned)

References

(Blocks 1 open bug)

Details

WebRTC is adding a bunch of tests which require functional
STUN service. Currently we just use a STUN server on the
public Internet but ignore failures. However, tests for
bug 843644 will require actual STUN servers.

Here's what's needed:

1. A STUN server that is part of the test infra and can
be relied on. In an ideal world, it would give out
addresses that don't match the local addresses. E.g.,
as if the build hosts were behind a NAT with respect
to the STUN server. These could be dummy addresses that
were unroutable. We just need to be able to detect
that the STUN server is active.

See bug 807494 for the public STUN server we already
run.


2. A DNS entry that points to the STUN server.
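Requirement 1 comes down to answering a STUN Binding request with a mapped address that the client does not own. Below is a minimal sketch of how such a response could be encoded. It assumes RFC 5389 framing over IPv4 and uses 192.0.2.1 from the TEST-NET-1 documentation range (RFC 5737) as a hypothetical dummy address. The function names are illustrative and not part of any existing Mozilla code.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value from RFC 5389
BINDING_SUCCESS = 0x0101       # Binding success response
XOR_MAPPED_ADDRESS = 0x0020    # attribute type

# Hypothetical dummy address: TEST-NET-1 (RFC 5737) is never assigned to a real host,
# so it can never match one of the build machine's own interface addresses.
DUMMY_ADDR = "192.0.2.1"


def xor_mapped_address(ip: str, port: int) -> bytes:
    """Encode an IPv4 XOR-MAPPED-ADDRESS attribute (type, length, value)."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    # reserved byte, family 0x01 (IPv4), XOR'd port, XOR'd address
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)
    return struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value


def binding_response(transaction_id: bytes, port: int, ip: str = DUMMY_ADDR) -> bytes:
    """Build a Binding success response that echoes the request's transaction ID."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    attrs = xor_mapped_address(ip, port)
    # Message length counts only the attributes, not the 20-byte header.
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attrs), MAGIC_COOKIE)
    return header + transaction_id + attrs
```

The client's real source port is echoed back, so only the address is faked. That is enough for the client to see a "NATed" result that differs from its local addresses.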
Do we have any updates on this? Bug 843644 is getting pretty close.
John, I think this is just a simple VM that needs to be created in the private AWS cloud. Can you find the right assignee?

http://en.wikipedia.org/wiki/STUN
Component: General → Release Engineering: Automation (General)
Flags: needinfo?(joduinn)
Product: Testing → mozilla.org
QA Contact: catlee
Version: unspecified → other
Mark Mayo's group set up a public server for us on EC2, so maybe you can start with that.
Coop, Catlee: can either of you help get this stood up? It should be a pretty simple server system to stand up, since Mayo's group already has it running in EC2. I think we just need to copy it into the VPC.
Mark, do you have any puppet manifests we can share for this? Or, failing that, an AMI id we could import?
Flags: needinfo?(mmayo)
Any reason not to just use one of the public STUN servers we already have? We're certainly not worried about load. (I'm assuming this is because the test infra can't reach the internet, but want to confirm.)

But otherwise, yes, we have an AMI already patched up and so on you can deploy in your VPC. Adding :whd who can point you to our AWS VPC docs.
Flags: needinfo?(mmayo)
Yeah, the issue is about being able to reach the Internet.
Yeah, we generally don't like test infra hitting external sites. But we hit plenty of mozilla-owned public sites (like hg.m.o, ftp.m.o), so I'm ok with using one of our public ones as well.
We hit mozilla-owned public sites in *infra*, during *setup*. This would make for a total of one and only one set of tests which would fail when a public external server failed.

Can we please create a new test suite, called something like mochitest-totally-unreliable or mochitest-pay-no-attention-to-failures, for these tests?
These aren't mochitests (or at least not exclusively). Also, as noted above, we need to be able to configure these servers in a specific way. It seems like it would be better to have these be something we actually control as part of the test infra.
I should also mention that I don't want these to be unreliable, so having them ignore failure doesn't seem to work.
We also expect developers to be able to run any tests that we expect them not to break, so it needs to be something utterly unlikely to fail, available both inside releng's network and publicly, and also on the machine of a developer on a plane without wifi. In general I know exactly how to do that: do what we do for the HTTP server that lives in the tree. In this case, though, I don't quite see how it would work.
That's actually quite easy: just have an environment variable that they can use to specify the Mozilla public STUN server.
Sorry, or any other STUN server of their choice.
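The environment-variable approach could look roughly like the following minimal sketch. The variable name MOZ_TEST_STUN_SERVER is hypothetical, since the bug never names one. The fallback to 127.0.0.1 assumes a locally running mock server, which would cover the offline-developer case. Only hostnames and IPv4 literals are handled.

```python
import os
from typing import Tuple

DEFAULT_STUN_PORT = 3478  # IANA-registered STUN port (RFC 5389)
# Hypothetical variable name; not an existing harness setting.
STUN_ENV_VAR = "MOZ_TEST_STUN_SERVER"


def stun_server_from_env(default: str = "127.0.0.1") -> Tuple[str, int]:
    """Return (host, port) from "host" or "host:port"; fall back to a local server."""
    value = os.environ.get(STUN_ENV_VAR, "").strip() or default
    host, sep, port = value.rpartition(":")
    if not sep:
        # No explicit port given: use the standard STUN port.
        return value, DEFAULT_STUN_PORT
    return host, int(port)
```

A developer with network access could then point the tests at the public server (or any other) just by exporting the variable, while automation leaves it unset.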
philor is right: ideally we'd have a STUN server we can run locally as part of the test harness. I realize for STUN this may involve faking out some data, but maybe that's more appropriate if we're expecting to use this in automated testing?

Next best would be to use a Mozilla-managed STUN server. We should first try to use the public ones. If those aren't reliable for whatever reason, then maybe we should look at setting up a few inside the build/test network.
The public ones *are* reliable in general, but as I said above:

1. I keep being told that tests shouldn't rely on public
resources.

2. We need the STUN server to produce different results
than what the client observes on its network interfaces,
which probably means hacking the STUN server.

I realize that none of this is ideal, but considering that
actually having a STUN server integrated as part of the
gtest-based unit test suite is complicated, while standing
up a single STUN server in the test network is simple, that
really would be the best approach for now.
If we really need the STUN server to be configurable so we can test "bad/sub optimal" behaviors then this has to be implemented in the test code itself. While there is some possibility to put a STUN server in the test infra for testing if we needed to, it would certainly not be configurable on the fly from tests.

So, I think we need to figure out how to write this so that it runs locally on the test box itself, and we should also note that it doesn't necessarily have to implement everything the STUN server does, just enough to be a decent pass-through test service.
(In reply to Clint Talbert ( :ctalbert ) from comment #17)
> If we really need the STUN server to be configurable so we can test "bad/sub
> optimal" behaviors then this has to be implemented in the test code itself.
> While there is some possibility to put a STUN server in the test infra for
> testing if we needed to, it would certainly not be configurable on the fly
> from tests.

Right. I agree with that. 

> 
> So, I think we need to figure out how to write this so that it runs locally
> on the test box itself, and we should also note that it doesn't necessarily
> have to implement everything the STUN server does, just enough to be a
> decent pass-through test service.

Right. If there was a case to stand something up to run automation against, it would need to be the happy path case only against a single server.

Based on Mark Mayo's comment above that load isn't a concern, though, do we even need to do this? Or is this a WONTFIX?
(In reply to Clint Talbert ( :ctalbert ) from comment #17)
> If we really need the STUN server to be configurable so we can test "bad/sub
> optimal" behaviors then this has to be implemented in the test code itself.
> While there is some possibility to put a STUN server in the test infra for
> testing if we needed to, it would certainly not be configurable on the fly
> from tests.


This isn't what I'm looking for, at least not right now. I just want it to
offer an address that doesn't match the actual address so that we can
minimally test that candidate gathering is working.
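The minimal check described here could be sketched as below. In ICE terms (RFC 8445), the STUN-reported address becomes a server-reflexive candidate. An ICE agent typically discards such a candidate as redundant when it duplicates a host candidate, which is why a server that reports the client's real address gives the test nothing to observe. The function name and the idea of passing the local addresses in explicitly are assumptions for illustration.

```python
import ipaddress
from typing import Iterable


def gathered_srflx(local_addrs: Iterable[str], mapped_addr: str) -> bool:
    """True if the STUN-reported address is not one of the host's own addresses,
    i.e. a distinct server-reflexive candidate could be gathered from it."""
    mapped = ipaddress.ip_address(mapped_addr)
    return all(ipaddress.ip_address(a) != mapped for a in local_addrs)
```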


> So, I think we need to figure out how to write this so that it runs locally
> on the test box itself, and we should also note that it doesn't necessarily
> have to implement everything the STUN server does, just enough to be a
> decent pass-through test service.

Yes, we eventually need to do this, but given our available resources, it's
going to take some time, so it would still be really helpful to have
a testing server.
(In reply to Eric Rescorla (:ekr) from comment #19)
> (In reply to Clint Talbert ( :ctalbert ) from comment #17)
> > If we really need the STUN server to be configurable so we can test "bad/sub
> > optimal" behaviors then this has to be implemented in the test code itself.
> > While there is some possibility to put a STUN server in the test infra for
> > testing if we needed to, it would certainly not be configurable on the fly
> > from tests.
> 
> 
> This isn't what I'm looking for, at least not right now. I just want it to
> offer an address that doesn't match the actual address so that we can
> minimally test that candidate gathering is working.

Which protocol does it need? Do you have some pointers on what the minimum requirements are? We might be able to create a simple solution with httpd.js and dynamic pages.
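For reference, STUN (RFC 5389) is its own binary protocol, usually over UDP, rather than HTTP. So the minimum a test server has to do is recognize a Binding request by its fixed 20-byte header and reply to it. The following sketch of that header check is written in Python purely for illustration, and its names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001


class StunHeader(NamedTuple):
    msg_type: int
    length: int            # length of the attributes, excluding the header
    transaction_id: bytes  # 12 bytes, echoed back in the response


def parse_header(data: bytes) -> Optional[StunHeader]:
    """Parse the fixed 20-byte STUN header; return None if it isn't STUN."""
    if len(data) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    # The two most significant bits of every STUN message are zero.
    if msg_type & 0xC000 or cookie != MAGIC_COOKIE:
        return None
    # Attributes are padded to 4-byte boundaries, so the length is a multiple of 4.
    if length % 4 or len(data) != 20 + length:
        return None
    return StunHeader(msg_type, length, data[8:20])
```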
Blocks: 890832
Product: mozilla.org → Release Engineering
Found in triage.

1) Relying on external sites, or connections to external sites, has caused too many intermittent errors because of intermittent problems outside Mozilla's network, which in turn causes disruption on production trees.

2) If the scope of this request is to use an existing public-facing Mozilla-hosted server, we should configure systems/tests to use an internal IP where possible.
2a) From comments in the bug, it sounds like the scope is to set up an internally-for-testing-only STUN server, where Mozilla can tweak the settings for different use-cases.

3) I set up Eric with his own Mozilla-paid AWS account earlier this year, so he could do this himself. I've not seen any movement in this bug since then.

Eric - anything left to do here?
Flags: needinfo?(john+bugzilla) → needinfo?(ekr)
This is just part of the big project of making a realistic testing environment.
Flags: needinfo?(ekr)
Component: General Automation → General

Not sure if this 9-year-old bug is still wanted.

If we still want to test against a STUN server, a lot has changed over the past 9 years. Some options off the top of my head:

  • point against a live STUN server, but don't run tests on push, or if they are run on push, at a tier 2 or 3, which don't close trees
    • if we don't run on push, we can run via tc cron or manual hook trigger
  • have a generic-worker instance spin up a docker STUN server locally, and test against that
  • otherwise spin up a mock STUN server, e.g. via a local python daemon that we can test against
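The last option could look roughly like the following standalone sketch. It answers only RFC 5389 Binding requests over UDP on localhost, and it always reports a hypothetical fake public address (TEST-NET-1), which covers the "happy path" case discussed above. All names here are illustrative, not existing harness APIs, and the sketch ignores retransmissions, authentication and IPv6.

```python
import ipaddress
import socket
import struct
import threading
from typing import Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
# Hypothetical fake mapping: pretend every client sits behind a NAT with this
# public address (TEST-NET-1, RFC 5737), so it never matches a local interface.
FAKE_PUBLIC_IP = "192.0.2.1"


def handle(data: bytes, client_port: int) -> Optional[bytes]:
    """Answer a bare Binding request; ignore anything else."""
    if len(data) < 20:
        return None
    msg_type, _length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_REQUEST or cookie != MAGIC_COOKIE:
        return None
    txid = data[8:20]
    x_port = client_port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(FAKE_PUBLIC_IP)) ^ MAGIC_COOKIE
    # Attribute header (type, length=8) + reserved, family IPv4, port, address
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, x_port, x_addr)
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + txid + attr


def serve(sock: socket.socket) -> None:
    """Blocking loop: one datagram in, at most one datagram out."""
    while True:
        try:
            data, (host, port) = sock.recvfrom(2048)
        except OSError:
            return  # socket was closed; stop serving
        reply = handle(data, port)
        if reply is not None:
            sock.sendto(reply, (host, port))


def start_mock_stun(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a UDP socket (port 0 picks a free one) and serve it in a daemon thread."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    threading.Thread(target=serve, args=(sock,), daemon=True).start()
    return sock
```

A test would call start_mock_stun(), point the client under test at sock.getsockname(), and close the socket when done. Binding to port 0 avoids collisions when several test jobs share a machine.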

I think all of the above should be self-serviceable, though the folks in the Matrix #firefox-ci channel are willing to help mentor if we want to look at one of the above options.

Resolving incomplete, since:

  • we've seen no real progress here for 9 years;
  • we should be able to self-serve, which may not be ideal but may lead to progress here sooner than otherwise; and
  • if history is any guide, setting up additional special one-off pieces of infrastructure that don't ride the trains is a recipe for unhappiness on all sides.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE