Add an OrangeFactor API endpoint that allows submitting to Elasticsearch

RESOLVED FIXED

Status

P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

Details

Attachments

(1 attachment, 1 obsolete attachment)

Elasticsearch doesn't have it's own authentication and so relies on network auth.

Treeherder currently submits intermittent failure classifications to Elasticsearch directly, since network flows exist between the Treeherder nodes and the OrangeFactor Elasticsearch cluster. 

However once Treeherder moves to Heroku, we'd either need to set up flows between that cluster and Heroku (a pain due to lack of static IPs, and means paying money for a third party addon iirc), or else add a way to submit to Elasticsearch via a public API.

This bug is taking the latter approach.

Brasstacks already has access to the ES (since OF needs it), so I'm just going to add another endpoint to the OrangeFactor API which allows for submissions to Elasticsearch.

We'll limit this endpoint to just the precise ES index we need, and to just doc creation (so it can't be used to eg delete all ES documents), and secure it using a fixed secret key which will be sent via the headers, so that we can just forward on the POST payload verbatim to ES.

Once this has landed, bug 1153324 will then switch Treeherder to it for both prod and also Heroku.
Created attachment 8701892 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch

Elasticsearch isn't publicly accessible, so once Treeherder moves to Heroku, we need a way for it to be able to still post new failure classifications to it.

As such, this adds an API endpoint (`/api/saveclassification`) that forwards the body of POSTs to the endpoint, to Elasticsearch. Authentication is done via an API-KEY header, since using the body would mean needing to sanitise it prior to passing it onto Elasticsearch, and using params would mean potentially leaking the key in logs.

The POST to Elasticsearch is roughly a port of this Treeherder code:
https://github.com/mozilla/treeherder/blob/9c80f8b8850a09a35e6d64066d7f29de124fa9ba/treeherder/etl/classification_mirroring.py#L65-L76

For web.py docs (which aren't great unfortunately), see http://webpy.org/

Once this lands, the Treeherder submission code linked above will be adjusted to instead POST to this new OrangeFactor API.

--

As much as it pains me to land code with no test coverage, it's just not worth trying to create an OrangeFactor test infrastructure at this point (given we'll hopefully be replacing OrangeFactor within the next couple of quarters).

My local testing was as follows:

# Checking 401 if no key at all...

[~/src/orangefactor]$ curl -X POST -i "http://localhost:8080/api/saveclassification"
HTTP/1.1 401 Unauthorized
Transfer-Encoding: chunked
Date: Thu, 24 Dec 2015 23:45:48 GMT
Server: localhost

Unauthorized

# Checking 403 if wrong key...

[~/src/orangefactor]$ curl -X POST -i -H 'API-KEY: aa' "http://localhost:8080/api/saveclassification"
HTTP/1.1 403 Forbidden
Transfer-Encoding: chunked
Date: Thu, 24 Dec 2015 23:45:56 GMT
Server: localhost

Forbidden

# Checking handling when correct key but ES not responding (just didn't start ES ssh tunnel)...

[~/src/orangefactor]$ curl -X POST -i -H 'API-KEY: CORRECTKEY' "http://localhost:8080/api/saveclassification"
HTTP/1.1 500 Internal Server Error
Content-Type: text/html
Transfer-Encoding: chunked
Date: Thu, 24 Dec 2015 23:46:15 GMT
Server: localhost

ConnectionError submitting to Elasticsearch: ('Connection aborted.', gaierror(8, 'Name or service not known'))

# Checking handling when the submission is made to ES fine, but the payload wasn't valid json...

[~/src/orangefactor]$ curl -X POST -i -H 'API-KEY: FOO' "http://localhost:8080/api/saveclassification"
HTTP/1.1 500 Internal Server Error
Content-Type: text/html
Transfer-Encoding: chunked
Date: Thu, 24 Dec 2015 23:51:26 GMT
Server: localhost

HTTPError 400 submitting to Elasticsearch: {"error":"RemoteTransportException[[orangefactor-elasticsearch4.scl3][inet[/10.22.81.108:9300]][index]]; nested: MapperParsingException[failed to parse, document is empty]; ","status":400}

# Check "successful" submission (one could argue an empty doc isn't valid, but that's ES's decision)...

[~/src/orangefactor]$ curl -X POST -i -H 'API-KEY: FOO' "http://localhost:8080/api/saveclassification" -d '{}'
HTTP/1.1 200 OK
Content-Length: 90
Content-Type: application/json; charset=UTF-8
Date: Thu, 24 Dec 2015 23:54:55 GMT
Server: localhost

{"ok":true,"_index":"bugs","_type":"bug_info","_id":"SbiPa6ixSFCTFmcwDy5VrA","_version":1}[~/src/orangefactor]$

# Cleanup...

[~/src/orangefactor]$ curl -X DELETE 'http://localhost:9200/bugs/bug_info/SbiPa6ixSFCTFmcwDy5VrA'
{"ok":true,"found":true,"_index":"bugs","_type":"bug_info","_id":"SbiPa6ixSFCTFmcwDy5VrA","_version":2}
Comment on attachment 8701892 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch

Will, would you be ok reviewing this? It's just that I think it makes sense for the same person to review this as the Treeherder part, plus the main reviewers for OrangeFactor are :jgriffin and :mcote who probably have enough on their plate as it is :-)
Attachment #8701892 - Flags: review?(wlachance)
(See comment 1 for patch details)
Comment on attachment 8701892 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch

Review of attachment 8701892 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me, but I don't really feel like I'm an expert at validating these sorts of things. It concerns me a bit that we're rolling our own authentication scheme. 

* Have you considered using JWT? The autophone server uses that (also using web.py). It has the nice property of not requiring keys to be sent over the wire. https://github.com/markrcote/phonedash/blob/master/server/handlers.py#L71
* Should we run this by the security team, since we're adding a new write capability to orangefactor?

I'm going to cancel the review, feel free to "r?" again once you've mulled the above over (if you haven't already).
Attachment #8701892 - Flags: review?(wlachance)
(In reply to William Lachance (:wlach) from comment #4)
> This looks fine to me, but I don't really feel like I'm an expert at
> validating these sorts of things. It concerns me a bit that we're rolling
> our own authentication scheme. 

I agree it's not normally advisable, though in this case we're not really rolling our own - other libraries put the tokens/keys they've generated into a header too, the difference here is that we're using a hardcoded key that gets sent over the wire, rather than signing. Bugzilla also uses this approach for its API key too.

> * Have you considered using JWT? The autophone server uses that (also using
> web.py). It has the nice property of not requiring keys to be sent over the
> wire.
> https://github.com/markrcote/phonedash/blob/master/server/handlers.py#L71

I hadn't heard of that library before, thank you for mentioning it. Though using that would mean adding it as a dependency to Treeherder not just OrangeFactor, meaning we have both mohawk and jwt for auth, which would be unfortunate given we've only just been able to remove the dependency on oauth2 :-)

It also seems that JWT may not add a huge amount of value for our use-case, since:
* the API key in the proposed patch is sent as a header (not in the URI), so won't appear in logs anyway.
* we use TLS for the connection, and so unless we don't trust TLS to protect us (or say the node where the TLS is terminated, which is presumably Zeus, not brasstacks), the key is safe even when sent over the wire.
* both the sender and receiver are under our control, so there's less opportunity for user-error than if it were a public API (eg someone POSTing to the HTTP endpoint instead).
* the API endpoint only allows creating new documents, and can't delete existing documents.
* the ElasticSearch instance is only used by OrangeFactor, and contains no confidential data.
* anyone can create a treeherder account without needing approval, and use that account to classify as many jobs maliciously as they like already. Knowing the secret API key would only mean they could submit documents that are arbitrary rather than having the expected fields (which would presumably be easier to delete/clean up than someone submitting them via Treeherder).
* this solution will only be used for a few months until we turn off OrangeFactor entirely.
* using JWT is marginally more complicated, which given the lack of test harness for OrangeFactor might actually mean there's a greater risk of auth bugs than less. (There's not really much that can go wrong with comparing a header value to a known value.)

> * Should we run this by the security team, since we're adding a new write
> capability to orangefactor?

Yeah I'm happy to, wonder who to needinfo?

Thank you for taking a look at this :-)
Mark, could you find someone from the (web?) security team to review Ed's approach above? It's pretty simple, but this does effect what orangefactor exposes to the internet at large.
Flags: needinfo?(mcote)
kang, could you comment here?
Flags: needinfo?(mcote) → needinfo?(gdestuynder)
TLS must always be used for API Keys (following https://wiki.mozilla.org/Security/Server_Side_TLS#Modern_compatibility). Make sure only TLS can connect to the server (i.e. not even listen on HTTP).

Signing is always better than an API key since when signing you do not send over secret key material, thus reduce the chance for leaking it (with API keys, both endpoints know the secret, with signing only the signer does).
Also, I'd note that generally the custom header X-Api-Key is used, or else the standard WWW-Authenticate one (https://tools.ietf.org/html/rfc7235), albeit in practice it does not make a huge difference (it may make it easier to parse for detection tools though).

The code looks ok though, I'm sending this over to our new webapp guys for a better look :)
Flags: needinfo?(gdestuynder) → needinfo?(amuntner)
Depends on: 1238599
Many thanks for taking a look :-)

(In reply to Guillaume Destuynder [:kang] from comment #8)
> TLS must always be used for API Keys (following
> https://wiki.mozilla.org/Security/Server_Side_TLS#Modern_compatibility).
> Make sure only TLS can connect to the server (i.e. not even listen on HTTP).

Bug 1238599 filed for this.

> Signing is always better than an API key since when signing you do not send
> over secret key material, thus reduce the chance for leaking it (with API
> keys, both endpoints know the secret, with signing only the signer does).

Yeah I agree for when using public-private key signing, though in this instance we were thinking more along the lines of signing using a common secret. (Though as it happens the JWT library supports both.) I don't think the added complexity is worth it here, given comment 5.

> Also, I'd note that generally the custom header X-Api-Key is used, or else
> the standard WWW-Authenticate one (https://tools.ietf.org/html/rfc7235),
> albeit in practice it does not make a huge difference (it may make it easier
> to parse for detection tools though).

Bugzilla uses X-BUGZILLA-API-KEY, guess we could use X-ORANGEFACTOR-API-KEY (http://bugzilla.readthedocs.org/en/latest/api/core/v1/general.html#authentication), but happy to use whatever people think best :-)

> The code looks ok though, I'm sending this over to our new webapp guys for a
> better look :)

Great! Adam, any idea when you might be able to take a look at this?
Attachment #8701892 - Flags: feedback?(amuntner)
Attachment #8701892 - Flags: review?(jgriffin)
Jeff, there has been a need-info and feedback request that seems to be slipping through the net. Can you please help get this prioritised.
Flags: needinfo?(jbryner)
Will have a look the week of 2/1 per IRC discussion.
Flags: needinfo?(jbryner)
Comment on attachment 8701892 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch

Review of attachment 8701892 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good; in the future I can see we might want to convert this into a slightly more generic ES proxy, but perhaps we'll never have the need.
Attachment #8701892 - Flags: review?(jgriffin) → review+
(In reply to Jonathan Griffin (:jgriffin) from comment #12)
> Looks good; in the future I can see we might want to convert this into a
> slightly more generic ES proxy, but perhaps we'll never have the need.

There is now an official solution from Elastic:
https://www.elastic.co/downloads/shield

...however the OrangeFactor ES cluster is too old to use it afaict (it's v0.9x).

That said, all this can go away in a couple of months thankfully, when we EOL OrangeFactor :-)
If I understand correctly, the threat we are considering is of unauthorized use of captured credentials, the impact is that arbitrarily and potentially maliciously formatted documents could be submitted by an attacker. 

Several potential security issues are present. Controls exist which to varying degrees protect against each of these failure modes.
1. Obtain credentials via network MiTM
2. Obtain credentials via local file exposure bug
3. Credential replay attack
4. Input validation issues (XSS, SQLI, OS command injection, file include, etc)


1. MiTM
Trust in TLS isn't an all-or-nothing proposition. Should our threat models assume that TLS is always secure, or should we plan for it to potentially fail, where feasable? There are several different approaches which could be taken.
Client certificates, pinning, and other possible verification mechanisms would significantly raise the bar on the categories of TLS vulnerabilities which would result in the possibility of MiTM. 
Our main goal is to prevent credential capture and thus replay. With signed JWT this is more difficult for the attacker, requiring most likely OS command access in order to retrieve a private certificate. Whereas to retrieve the API key, enough attacker "only" needs to know how to break TLS enough to MiTM a connection that isn't validating both parties. Given the history of SSL and TLS protocol flaws, planning for it to fail at some point might be a prudent move.

2. Obtain API key from filesystem
The file containing the API key is necessarily readable by the user under which python is running. Keys and other credentials are thus vulnerable to local file read and include vulnerabilities. 
API keys and credentials being stored internally to the application is common at Mozilla. I'm bringing it up in our security group for discussion. (#security-team on irc)
One option might be to use a virtualenvwrapper postactivate script to set an environment variable with the API key, and have the application read it with os.environ['ENV_VARIABLE_NAME']

3. Replay 
Since an API key is a single factor of authentication, knowledge of this secret would be sufficient for an attacker to submit a malformed request which would be subsequently processed. 
Using client-side certificates with TLS and cryptographically verifying the identity of both ends would make MiTM more difficult. Though this would still place trust in TLS, it would, similarly to signed JWT, limit the submission of a request to clients that know the secret and can complete a cryptographic handshake which verifies the identity of each party.  

4. Input validation issues
An attacker who is authorized to submit arbitrarily malformed requests after presenting valid authentication would be able to exploit any input validation issues present. These could range from persistent XSS to SQL injection, OS command execution, triggering of deserialization bugs, and more. 
Given that this is most likely the ultimate target of any attacker, strict white-listing of the allowed submission format should be performed with any non-compliant request rejected. Attackers would, in the context of this issue, be limited to submitting correctly formatted but useless and safe data.
Flags: needinfo?(amuntner)
Depends on: 1245481
Many thanks for your detailed reply - it's definitely given me ideas of things to try for other projects too :-)

(In reply to Adam Muntner [:adamm] from comment #14)
> If I understand correctly, the threat we are considering is of unauthorized
> use of captured credentials, the impact is that arbitrarily and potentially
> maliciously formatted documents could be submitted by an attacker.

That's correct.

> 1. MiTM
> Trust in TLS isn't an all-or-nothing proposition. Should our threat models
> assume that TLS is always secure, or should we plan for it to potentially
> fail, where feasable?
...
> Client certificates, pinning, and other possible verification mechanisms
> would significantly raise the bar on the categories of TLS vulnerabilities
> which would result in the possibility of MiTM. 
...
> Given the history of SSL and TLS protocol flaws, planning for it to fail
> at some point might be a prudent move.

I agree with this, however I think the security measures need to be proportionate to the threat and other considerations.

In our case:
* There is only a single client, under our control. Therefore (a) we can ensure it's up to date and validates certificates correctly (we're using the latest version of the Python requests library under Python 2.7.10/2.7.11), (b) we can ensure requests aren't made to HTTP and then redirected to HTTPS.
* This is a short-lived API, since the project is going to be end-of-lifed in the next few months.
* This legacy project has no test suite, so the complexity of a more-secure JWT signing based approach has to be weighed against the "easier to review for correctness" approach of having a fixed API key header (particularly given I've not used JWT before).

That said, this has been held up from landing for 5 weeks now, so to avoid any further delay I've settled on a compromise: 

Using a shared key Hawk authentication approach. Treeherder (the single client of this API) already uses Hawk for it's own API, so this doesn't add any additional dependencies to Treeherder (which was another of my issues with using JWT), and I'm more familiar with it.

I'll attach a new patch shortly.


> 2. Obtain API key from filesystem
...
> API keys and credentials being stored internally to the application is
> common at Mozilla. I'm bringing it up in our security group for discussion.
> (#security-team on irc)
> One option might be to use a virtualenvwrapper postactivate script to set an
> environment variable with the API key, and have the application read it with
> os.environ['ENV_VARIABLE_NAME']

I agree - I'm a big fan of using environment variables. Treeherder doesn't use local configuration files within the source directory any more, and once we move to Heroku, we won't be using scripts within /etc/profile.d/ to populates them either.

As for OrangeFactor, setting these via the postactivate script won't work, since virtualenvwrapper isn't used/installed, and the virtualenv itself isn't even activated (the python binary from the venv is used directly). If OrangeFactor wasn't going to be end of life-d imminently, I'd say we should try and rework the box configuration, but as is I think it's risky, since I didn't set it up, and there's I think no puppet management of config, so no way to rollback in the case of regressions.


> 4. Input validation issues

So I've done some research using these (and others):
https://www.elastic.co/community/security
https://www.cvedetails.com/vulnerability-list/vendor_id-13554/Elasticsearch.html
https://www.exploit-db.com/search/?action=search&description=elasticsearch

From reading the various POCs, all of them required use of other endpoints (eg _search, _plugins etc) or manipulation of the URI - whereas this API is hardcoded to "http://ES_CLUSTER/bugs/bug_info/" with no query string. I've just had dynamic scripting disabled on this ES cluster (in bug 1245481) to be safe regardless.

This does of course leave deserialisation exploits in ES code, which we could defend against in the API by ensuring we're only proxying on valid json - however that seems like we're exchanging deserialisation exploits in ES for those in the ancient Python 2.6 on brasstacks.
Created attachment 8715789 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch (Hawk version)

Now uses Hawk instead of a fixed API key.
Attachment #8701892 - Attachment is obsolete: true
Attachment #8701892 - Flags: feedback?(amuntner)
Attachment #8715789 - Flags: review?(mdoglio)
Re the patch - meant to say ideally handlers.py would be broken up into multiple files, however that would require refactoring the way the app config is read.
Given the other limitations and remaining operational time, this is a reasonable approach. Using Hawk is a great idea. The patch looks good to me. Thank you Ed.
Comment on attachment 8715789 [details] [diff] [review]
Add an API endpoint that allows submitting to Elasticsearch (Hawk version)

Review of attachment 8715789 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. Is there a OF dev environment to test it before it goes to production?

::: server/handlers.py
@@ +504,5 @@
> +                 content=web.data(),
> +                 content_type=env.get('CONTENT_TYPE'))
> +    except HawkFail as e:
> +        exception_type = e.__class__.__name__
> +        print "Access denied: %s: %s" % (exception_type, str(e))

I guess this is the only way we can log this error?
Attachment #8715789 - Flags: review?(mdoglio) → review+
(In reply to Mauro Doglio [:mdoglio] from comment #19)
> lgtm. Is there a OF dev environment to test it before it goes to production?

Unfortunately there is no dev/stage instance, no test suite, no way to fully test locally unless you have SSH access to brasstacks (to tunnel the ES connection) and brasstacks is running Python 2.6. Welcome to the 1990s!

That said, by following the instructions in the README, and by also commenting out the BugzillaCache instantiation [1], you can at least get the OF server to start without ES access, and so test everything apart from successfully POSTing to ES.

I've tested locally myself, and my plan is to test the API in production once this lands, before switching Treeherder to it.

[1] https://hg.mozilla.org/automation/orangefactor/file/effe61195b1d/server/handlers.py#l106

> ::: server/handlers.py
> @@ +504,5 @@
> > +                 content=web.data(),
> > +                 content_type=env.get('CONTENT_TYPE'))
> > +    except HawkFail as e:
> > +        exception_type = e.__class__.__name__
> > +        print "Access denied: %s: %s" % (exception_type, str(e))
> 
> I guess this is the only way we can log this error?

If you mean notifying the caller - correct since we're using web.py 0.36 which doesn't support setting a custom response message for 403s (see `ensure_authenticated`'s docstring).

Or if you mean vs `log.warning()` then yeah OF doesn't use the Python logger at all, and the ancient print syntax is due to brasstacks running Python 2.6 :/
Anything printed does get output in the server logs at least.

I've also hit a couple of mohawk quirks, which won't hold up the landing of this, but for which I've opened:
https://github.com/kumar303/mohawk/issues/23
https://github.com/kumar303/mohawk/issues/24
...and hope to have a PR for them at some point.
I've created server/orangefactor_local.conf containing the `treeherder` hawk id and the secret added to treeherder in bug 1238584.

Landed in:
https://hg.mozilla.org/automation/orangefactor/rev/f50d897a9638

Deployed:

[webtools@brasstacks1.dmz.scl3 orangefactor]$ hg pull -uv
warning: hg.mozilla.org certificate with fingerprint af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27 not verified (check hostfingerprints or web.cacerts config setting)
pulling from https://hg.mozilla.org/automation/orangefactor/
searching for changes
all local heads known remotely
adding changesets
adding manifests
adding file changes
added 1 changesets with 4 changes to 4 files
resolving manifests
getting requirements/prod.txt
getting server/config.py
getting server/handlers.py
getting server/orangefactor.conf
4 files updated, 0 files merged, 0 files removed, 0 files unresolved


[webtools@brasstacks1.dmz.scl3 orangefactor]$ ../../bin/pip install -r requirements/prod.txt
/home/webtools/apps/orangefactor/lib/python2.6/site-packages/cryptography/__init__.py:26: DeprecationWarning: Python 2.6 is no longer supported by the Python core team, please upgrade your Python. A future version of cryptography will drop support for Python 2.6
  DeprecationWarning
Requirement already satisfied (use --upgrade to upgrade): mozautoeslib from https://hg.mozilla.org/automation/mozautoeslib/archive/676a64adef22.zip#egg=mozautoeslib in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): bzcache from https://github.com/jonallengriffin/bzcache/archive/66a00585fae759bba613afcb0b6952fcbb9162d7.zip#egg=bzcache in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 2))
Collecting mohawk==0.3.1 (from -r requirements/prod.txt (line 3))
  Downloading mohawk-0.3.1.tar.gz
Requirement already satisfied (use --upgrade to upgrade): pyes==0.16.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages/pyes-0.16.0-py2.6.egg (from -r requirements/prod.txt (line 4))
Requirement already satisfied (use --upgrade to upgrade): requests==2.9.1 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 5))
Requirement already satisfied (use --upgrade to upgrade): Tempita==0.5.1 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 6))
Requirement already satisfied (use --upgrade to upgrade): web.py==0.36 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 7))
Requirement already satisfied (use --upgrade to upgrade): cffi==1.5.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 8))
Requirement already satisfied (use --upgrade to upgrade): cryptography==1.2.1 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 9))
Requirement already satisfied (use --upgrade to upgrade): enum34==1.1.2 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 10))
Requirement already satisfied (use --upgrade to upgrade): idna==2.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 11))
Requirement already satisfied (use --upgrade to upgrade): ipaddress==1.0.16 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 12))
Requirement already satisfied (use --upgrade to upgrade): ndg-httpsclient==0.4.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 13))
Requirement already satisfied (use --upgrade to upgrade): ordereddict==1.1 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): pyOpenSSL==0.15.1 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): pyasn1==0.1.9 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 16))
Requirement already satisfied (use --upgrade to upgrade): pycparser==2.14 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 17))
Requirement already satisfied (use --upgrade to upgrade): six==1.10.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from -r requirements/prod.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): setuptools>=1.0 in /home/webtools/apps/orangefactor/lib/python2.6/site-packages (from cryptography==1.2.1->-r requirements/prod.txt (line 9))
Building wheels for collected packages: mohawk
  Running setup.py bdist_wheel for mohawk
  Stored in directory: /home/webtools/.cache/pip/wheels/94/b7/c9/415623a2538cd1503fcdd9e4296fff35bd54d2812440cf9318
Successfully built mohawk
Installing collected packages: mohawk
Successfully installed mohawk-0.3.1


[root@brasstacks1.dmz.scl3 ~]# /etc/init.d/orangefactor stop; /etc/init.d/orangefactor start
stopping orangefactor                                      [  OK  ]
starting orangefactorspawn-fcgi: child spawned successfully: PID: 12594
                                                           [  OK  ]
So I've spent the last few hours trying to figure out why after deploying I'm getting 403s even when I'm using the correct credentials:

[~/src/orangefactor]$ http --auth-type=hawk --auth='treeherder:REDACTED' POST https://brasstacks.mozilla.com/orang efactor/api/saveclassification
HTTP/1.1 403 Forbidden
Connection: keep-alive
Date: Mon, 08 Feb 2016 15:38:43 GMT
Server: nginx/0.8.54
Transfer-Encoding: chunked

Forbidden

--

It turns out that the fcgi server output isn't logged anywhere, any my attempts to get the /etc/init.d/orangefactor script to do so have only managed to get the spawn-fcgi output logged, and not that of the child process it creates. Annoyingly spawn-fcgi doesn't provide any in-built logging options and seems to actively redirect the child process stdout/stderr :-/

As such, I've landed a basic logging configuration - with the dev woo_server.py logging to the console, and the fcgi wrapper doing so to server/orangefactor.log:
https://hg.mozilla.org/automation/orangefactor/rev/f120e0d57c08
So the problem is that the fcgi config used on brasstacks (and not present locally) deliberately rewrites what the app sees for the URI, to remove the `/orangefactor` part:

location ~ ^/orangefactor/api/(.*)$ {
    fastcgi_ignore_client_abort on;
    fastcgi_read_timeout 300;
    fastcgi_param REQUEST_METHOD $request_method;
    fastcgi_param QUERY_STRING $query_string;
    fastcgi_param CONTENT_TYPE $content_type;
    fastcgi_param CONTENT_LENGTH $content_length;
    fastcgi_param GATEWAY_INTERFACE CGI/1.1;
    fastcgi_param SERVER_SOFTWARE nginx/$nginx_version;
    fastcgi_param REMOTE_ADDR $remote_addr;
    fastcgi_param REMOTE_PORT $remote_port;
    fastcgi_param SERVER_ADDR $server_addr;
    fastcgi_param SERVER_PORT $server_port;
    fastcgi_param SERVER_NAME $server_name;
    fastcgi_param SERVER_PROTOCOL $server_protocol;
    fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
    fastcgi_split_path_info ^(/orangefactor)(.*)$;
    fastcgi_param PATH_INFO $fastcgi_path_info;
    fastcgi_pass 127.0.0.1:9500;
}

This means mohawk sees:

DEBUG:mohawk.base:parsed URL parts:
{'hostname': u'brasstacks.mozilla.com',
 'path': u'/api/saveclassification',
 'port': 80,
 'query': '',
 'resource': u'/api/saveclassification',
 'scheme': u'http'}

Not only does this have the wrong path, but the port/protocol aren't preserved either (TLS termination is handled by Zeus, so somewhere along the line this is getting lost). Eugh.

Dumping `web.ctx` on prod gives:

{
  'environ': {
        ...
       'HTTP_HOST': 'brasstacks.mozilla.com',
       'PATH_INFO': '/api/saveclassification',
       'QUERY_STRING': '',
       'REQUEST_METHOD': 'POST',
       'SCRIPT_FILENAME': '/orangefactor',
       'SCRIPT_NAME': '',
       'SERVER_NAME': 'brasstacks.mozilla.com',
       'SERVER_PORT': '443',
       'SERVER_PROTOCOL': 'HTTP/1.1',
       ...
       'wsgi.url_scheme': 'http',
  },
  'fullpath': u'/api/saveclassification',
  'home': u'http://brasstacks.mozilla.com',
  'homedomain': u'http://brasstacks.mozilla.com',
  'homepath': u'',
  'host': u'brasstacks.mozilla.com',
  'method': u'POST',
  'path': u'/api/saveclassification',
  'protocol': u'http',
  'query': u'',
  'realhome': u'http://brasstacks.mozilla.com',
  ...
}

Doing the same locally using woo_server.py:

{
  'environ': {
    'ACTUAL_SERVER_PROTOCOL': 'HTTP/1.1',
    ...
    'HTTP_HOST': 'localhost:8080',
    'PATH_INFO': '/api/saveclassification',
    'QUERY_STRING': '',
    'REQUEST_METHOD': 'POST',
    'REQUEST_URI': '/api/saveclassification',
    'SCRIPT_NAME': '',
    'SERVER_NAME': 'localhost',
    'SERVER_PORT': '8080',
    'SERVER_PROTOCOL': 'HTTP/1.1',
    ...
    'wsgi.url_scheme': 'http',
  },
  'fullpath': u'/api/saveclassification',
  'home': u'http://localhost:8080',
  'homedomain': u'http://localhost:8080',
  'homepath': u'',
  'host': u'localhost:8080',
  'method': u'POST',
  'path': u'/api/saveclassification',
  'protocol': u'http',
  'query': u'',
  'realhome': u'http://localhost:8080',
}

So I could piece together the real absolute URL but it would be a bit of a hack, given that pretty much every choice above is either inconsistent between fcgi and non-fcgi or plain lies (eg wsgi.url_scheme of `http` on prod).

Another solution would be to just add a site root setting to orangefactor_local.conf.
The web.py docs are pretty useless (http://webpy.org/cookbook/ctx) - phrases like "appears to be" when used to describe properties on the request don't fill me with promise. 

Looking at the source, ctx.home is what we should be using, and it should already be taking into account the `/orangefactor` base:
https://github.com/webpy/webpy/blob/webpy-0.36/web/application.py#L347

The problem appears to be that the fcgi config on brasstacks is setting `SCRIPT_FILENAME` not `SCRIPT_NAME`.

That said, even after fixing that, `wsgi.url_scheme` is still wrong, and whilst it's likely a case of trying to override it by referring to the header `X-Forwarded-Proto` (or similar) using the fcgi config, it's not worth spending any longer on this, I'm just going to add an app_root setting and have that override ctx.home on prod.
Added the pref in:
https://hg.mozilla.org/automation/orangefactor/rev/cdf6e976c0c0

Added this to the brasstacks server/orangefactor_local.conf:
```
[orangefactor]
# web.ctx.home is incorrect on brasstacks, so we have to override it.
custom_site_root = https://brasstacks.mozilla.com/orangefactor
```

And deployed.

And now the auth works fine:

[~/src]$ http --auth-type=hawk --auth='treeherder:REDACTED' POST https://brasstacks.mozilla.com/orangefactor/api/saveclassification foo=bar
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 90
Content-Type: application/json; charset=UTF-8
Date: Tue, 09 Feb 2016 11:47:48 GMT
Server: nginx/0.8.54

{
    "_id": "nn2TvxFiRHC7Ln6zx7zwUg",
    "_index": "bugs",
    "_type": "bug_info",
    "_version": 1,
    "ok": true
}

Removed that test doc:

[~/src]$ http DELETE 'http://localhost:9200/bugs/bug_info/nn2TvxFiRHC7Ln6zx7zwUg'
HTTP/1.1 200 OK
Content-Length: 103
Content-Type: application/json; charset=UTF-8

{
    "_id": "nn2TvxFiRHC7Ln6zx7zwUg",
    "_index": "bugs",
    "_type": "bug_info",
    "_version": 2,
    "found": true,
    "ok": true
}

And lowered the fcgi (prod) logging level:
https://hg.mozilla.org/automation/orangefactor/rev/eb3397828640

And deployed again.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Depends on: 1250454
You need to log in before you can comment on or make changes to this bug.