Closed Bug 1676676 Opened 4 years ago Closed 3 years ago

Investigate how to deal with CORS on the Glean pipeline, when sending pings from Glean.js

Categories

(Data Platform and Tools Graveyard :: Glean.js, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brizental, Assigned: whd)

References

Details

(Whiteboard: [dataplatform])

Attachments

(4 files)

Glean.js pings that are sent from a browser to https://incoming.telemetry.mozilla.org today, will invariably get a CORS error.

Cross-Origin Request Blocked: The Same Origin Policy disallows
reading the remote resource at https://some-url-here. (Reason:
additional information here).

For the development of the Glean.js proof-of-concept we used a CORS proxy to go around this problem.

It is time to figure out a long term plan to deal with this.

Flags: needinfo?(fbertsch)
Flags: needinfo?(dthorn)
Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m20]

Beatriz, is there any way for us to make these simple requests that don't activate CORS preflight?

Flags: needinfo?(fbertsch) → needinfo?(brizental)

(In reply to Frank Bertsch [:frank] from comment #1)

Beatriz, is there any way for us to make these simple requests that don't activate CORS preflight?

I don't think we can, because we need to send custom headers :(

(In reply to Frank Bertsch [:frank] from comment #1)

Beatriz, is there any way for us to make these simple requests that don't activate CORS preflight?

I don't think so, unfortunately.

Other than the custom header issue that Alessio already mentioned, we will send request with Content-Type JSON.

Flags: needinfo?(brizental) → needinfo?(fbertsch)

Got it. In that case we need to consider all the CORS headers:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: * (maybe? which ones are you all using?)
Access-Control-Allow-Headers: X-Pingsender-Version,X-Pipeline-Proxy,...
Access-Control-Max-Age: _a really long time?_

But, Safari and IE don't support * for -Methods and -Headers (and I assumed you want to support those), so we will need to explicitly define them. Methods probably won't change and can be static, but Headers will not be. We do have a list of metadata headers, so we should be able to use that list to support Allow-Headers.

It doesn't look like we can just add these headers to the app responses, though, since we need to support preflight. We can do that in nginx, but that is a separate config from the metadata headers in gcp-ingestion. We could support that in Sanic but offhand I'm not sure what the implications of that are.

I think it's best to hand this off to :whd and :relud to think through the best place to integrate these headers.

Flags: needinfo?(fbertsch) → needinfo?(whd)

(In reply to Frank Bertsch [:frank] from comment #4)

Got it. In that case we need to consider all the CORS headers:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: * (maybe? which ones are you all using?)
Access-Control-Allow-Headers: X-Pingsender-Version,X-Pipeline-Proxy,...
Access-Control-Max-Age: _a really long time?_

The only method we will need is POST (and OPTIONS?). We will only interact with the server to send pings :)

Looks like we need the CORS headers to be:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST,OPTIONS
Access-Control-Allow-Headers: X-Pingsender-Version,X-Pipeline-Proxy,...
Access-Control-Max-Age: 60

Where Access-Control-Allow-Headers is the only one that can't be a static list (however, it is not updated often). Access-Control-Max-Age is also optional, but we're not expecting to change these very often, and the default is only 5s.

Beatriz, what's the timeline here? How soon do you need this?

Flags: needinfo?(brizental)

The plan is to ship Glean.js with Ion by late January 2021 / early February 2021.

Ideally, I believe the timeline here is mid-December/2020. Do you think that is feasable?

Flags: needinfo?(brizental) → needinfo?(fbertsch)

I doubt it would get done next week, and the following is Thanksgiving week, so it would have to be done in the first two weeks of December to match that timeline. That might be pushing it when considering a new deploy of the edge. We'll need whd's input.

Flags: needinfo?(fbertsch)

(In reply to Frank Bertsch [:frank] from comment #9)

I doubt it would get done next week, and the following is Thanksgiving week, so it would have to be done in the first two weeks of December to match that timeline. That might be pushing it when considering a new deploy of the edge. We'll need whd's input.

Alright let's wait for his input in this case. At the latest this should be sorted out by the Ion integration deadline (late January/2021), for development we can always use a CORS proxy.

looks like frank covered everything the initial needinfo was for

Flags: needinfo?(dthorn)

We'll need whd's input.

I'd like opsec to evaluate this request, so I'm NIing :ajvb. I'm guessing a wide-open CORS configuration will cause some automated endpoint tests to fail such as tls-observatory, but my information here may be out of date. In any case since incoming.tmo is an API server and not a web page so we're probably already failing to conform here. In this case though, we'll be explicitly adding a CORS configuration whereas currently there just isn't one.

What is the expected Referrer going to be for pings coming from glean.js? I'm trying to understand how Pioneer/Ion/Rally (which is presumably an addon of some sort) is expected to interact with incoming.tmo. Should this header propagate to downstream infrastructure? I think from a pipeline perspective we probably don't need it but I'd like to minimally follow any opsec SOP WRT logging here.

Regarding time frame, we have the following EOY change freezes scheduled:

11/23 to 11/27
12/21 to 1/1

so if opsec approves this might fit into early-mid December. However, I'd consider this blocked on the outstanding edge deployment for bug #1666498.

Flags: needinfo?(whd) → needinfo?(abahnken)

(In reply to Wesley Dawson [:whd] from comment #12)

What is the expected Referrer going to be for pings coming from glean.js? I'm trying to understand how Pioneer/Ion/Rally (which is presumably an addon of some sort) is expected to interact with incoming.tmo. Should this header propagate to downstream infrastructure? I think from a pipeline perspective we probably don't need it but I'd like to minimally follow any opsec SOP WRT logging here.

With respect to the Pioneer/Ion/Rally (PiR!?), I'm tasked with writing a document detailing that this/next week. The next evolution of PiR will likely use Glean.js and the headers listed in Access-Control-Allow-Headers (documented here) will need to flow through the pipeline.

Or did you mean something else, :whd?

Flags: needinfo?(whd)

https://mozilla.github.io/glean/book/user/pings/index.html#submitted-headers

I see a couple of new (to me) headers here, specifically X-Client-Type and X-Client-Version, but they are described as "support handling of Glean pings in the legacy pipeline". I'm not sure what this means, since we don't propagate these headers currently. If you want to include these additional headers downstream that sounds like a mostly separate concern from the one being addressed here, and would involve metadata schema updates and additional discussion on the expected utility and use cases for these headers.

I'm tasked with writing a document detailing that this/next week

I'll look at this when it's ready then, I don't understand what the Referer header will be set to for requests to incoming.tmo from the Pioneer/Ion/Rally addon.

Flags: needinfo?(whd)

(In reply to Wesley Dawson [:whd] from comment #14)

https://mozilla.github.io/glean/book/user/pings/index.html#submitted-headers

I see a couple of new (to me) headers here, specifically X-Client-Type and X-Client-Version, but they are described as "support handling of Glean pings in the legacy pipeline". I'm not sure what this means, since we don't propagate these headers currently. If you want to include these additional headers downstream that sounds like a mostly separate concern from the one being addressed here, and would involve metadata schema updates and additional discussion on the expected utility and use cases for these headers.

No, we don't want to include these headers in the pipeline: we want to retain the current behaviour for Glean applications without any special change.

I'm tasked with writing a document detailing that this/next week

I'll look at this when it's ready then, I don't understand what the Referer header will be set to for requests to incoming.tmo from the Pioneer/Ion/Rally addon.

I'm sorry if I keep bugging you for this, but I want to understand it as much as possible before getting heads down. Are we required to set a Referer header due to CORS?

Flags: needinfo?(whd)

Okay, I have some good news here.

I researched more about this, and I stumbled on this article which mentions that when a web extension needs to do a cross-origin request it must add the origin to its permissions list and that will bypass CORS. I tested and indeed it works: https://debug-ping-preview.firebaseapp.com/pings/gleanjs-no-cors-test (ping sent from a web extension by adding "https://incoming.telemetry.mozilla.org/" to the permissions list on the manifest.json). See also Host permissions.

This is good because it loosens our timeline here. We plan to instrument Ion/Rally and Bergamot with Glean.js by late January/2021, but both are web extensions. We still need to figure out a way to send Glean.js pings from websites to incoming.tmo, which will require that server to support CORS, but we don't have urgency for that.

(In reply to Wesley Dawson [:whd] from comment #12)

What is the expected Referrer going to be for pings coming from glean.js?

Glean.js will send it's requests directly from the browser to the server, thus the CORS issue. Now that we know this issue only needs to be dealt with for websites, the origin of the requests is much clearer. If we instrument https://mozilla.com with Glean.js the origin is https://mozilla.com and so on.

Maybe, instead of Access-Control-Allow-Headers: * we could list specifically the websites we instrument with Glean.js here, so e.g. Access-Control-Allow-Headers: https://mozilla.com https://mozilla.org ..., this would require some sort of registration process for everytime a Mozilla website starts using Glean.js, but there is already the application-id registration anyways, so maybe it is not a big deal?

Should this header propagate to downstream infrastructure? I think from a pipeline perspective we probably don't need it but I'd like to minimally follow any opsec SOP WRT logging here.

Only the edge server would need to support CORS, AFAIK.

(In reply to Beatriz Rizental [:brizental] from comment #16)

Okay, I have some good news here.

I researched more about this, and I stumbled on this article which mentions that when a web extension needs to do a cross-origin request it must add the origin to its permissions list and that will bypass CORS. I tested and indeed it works: https://debug-ping-preview.firebaseapp.com/pings/gleanjs-no-cors-test (ping sent from a web extension by adding "https://incoming.telemetry.mozilla.org/" to the permissions list on the manifest.json). See also Host permissions.

This is good because it loosens our timeline here. We plan to instrument Ion/Rally and Bergamot with Glean.js by late January/2021, but both are web extensions. We still need to figure out a way to send Glean.js pings from websites to incoming.tmo, which will require that server to support CORS, but we don't have urgency for that.

Thanks for looking into this. Looks like Pioneer/Ion/Rally won't need CORS then. Let me clear the ni? for Wesley (thanks Wesley!).

Maybe, instead of Access-Control-Allow-Headers: * we could list specifically the websites we instrument with Glean.js here, so e.g. Access-Control-Allow-Headers: https://mozilla.com https://mozilla.org ..., this would require some sort of registration process for everytime a Mozilla website starts using Glean.js, but there is already the application-id registration anyways, so maybe it is not a big deal?

This pretty much depends on how much time it takes to update this, who needs to update this every time a product needs to support Glean.js. Right now, it takes less than a day for an app id to be enabled (it actually takes a few minutes). I would like us to not deviate from this kind of SLA.

Moreover, by requiring enabling on the ops side, it would make it not possible to use the glean debug view without using a CORS proxy, which is very bad for the developer ergonomics.

Flags: needinfo?(whd)

(In reply to Wesley Dawson [:whd] from comment #12)

We'll need whd's input.

I'd like opsec to evaluate this request, so I'm NIing :ajvb. I'm guessing a wide-open CORS configuration will cause some automated endpoint tests to fail such as tls-observatory, but my information here may be out of date. In any case since incoming.tmo is an API server and not a web page so we're probably already failing to conform here. In this case though, we'll be explicitly adding a CORS configuration whereas currently there just isn't one.

Thanks :whd. After this stuff is finalized I'll make sure we add an exemptions/changes needed to our security scanners.

(In reply to Beatriz Rizental [:brizental] from comment #16)

Okay, I have some good news here.

I researched more about this, and I stumbled on this article which mentions that when a web extension needs to do a cross-origin request it must add the origin to its permissions list and that will bypass CORS. I tested and indeed it works: https://debug-ping-preview.firebaseapp.com/pings/gleanjs-no-cors-test (ping sent from a web extension by adding "https://incoming.telemetry.mozilla.org/" to the permissions list on the manifest.json). See also Host permissions.

This is good because it loosens our timeline here. We plan to instrument Ion/Rally and Bergamot with Glean.js by late January/2021, but both are web extensions. We still need to figure out a way to send Glean.js pings from websites to incoming.tmo, which will require that server to support CORS, but we don't have urgency for that.

Ok yeah, this is awesome.

From the Requesting cross-origin permissions section of that article, it says:

By adding hosts or host match patterns (or both) to the permissions section of the manifest file, the extension can request access to remote servers outside of its origin.

Something I'm unclear on is how the "remote server" accepts the request. Is that done through the Access-Control-Allow-Origin header? Or is the "request for permissions" given to the user?

Flags: needinfo?(abahnken)

(In reply to AJ Bahnken [:ajvb] from comment #18)

Something I'm unclear on is how the "remote server" accepts the request. Is that done through the Access-Control-Allow-Origin header? Or is the "request for permissions" given to the user?

IIUC the remote server does not need to take any action. I tried adding incoming.tmo to the permissions list in the manifest.json of a demo webextension and was successful in sending a ping from it (https://debug-ping-preview.firebaseapp.com/pings/gleanjs-no-cors-test this ping exactly). Access-Control-Allow-Origin will only be required when sending a ping from websites, but web extensions not.

(In reply to Beatriz Rizental [:brizental] from comment #19)

(In reply to AJ Bahnken [:ajvb] from comment #18)

Something I'm unclear on is how the "remote server" accepts the request. Is that done through the Access-Control-Allow-Origin header? Or is the "request for permissions" given to the user?

IIUC the remote server does not need to take any action. I tried adding incoming.tmo to the permissions list in the manifest.json of a demo webextension and was successful in sending a ping from it (https://debug-ping-preview.firebaseapp.com/pings/gleanjs-no-cors-test this ping exactly). Access-Control-Allow-Origin will only be required when sending a ping from websites, but web extensions not.

Ok awesome. Thanks.

Component: Glean: SDK → Glean.js
Whiteboard: [telemetry:glean-rs:m20]
Priority: P3 → P4
Whiteboard: [telemetry:glean-js:backlog]

Hey folks, the priority on this bug is changing again as we start to test out Glean in websites.

Pinging :whd and :relud here in the hopes you can tell me what would be the next steps to get this going :)

Flags: needinfo?(whd)
Flags: needinfo?(dthorn)
Priority: P4 → P2
Whiteboard: [telemetry:glean-js:backlog]

I am going to defer to :whd on this, because it seems to me that nginx config is the right place to enable this.

Flags: needinfo?(dthorn)

I'm discussing this with the rest of DSRE tomorrow and will get back to you then. Leaving my NI.

I spoke to :jason and we can use a modified version of sync's CORS configuration that limits the methods a bit more and includes the telemetry custom headers. I will prepare a PR for this later this week.

I think we should probably test this out by provisioning stage using these headers and attempting to send data there via glean-js on a website. Even if we don't have schemas for the data, if data makes it to pbe that would be an indication that CORS is working as expected.

Flags: needinfo?(whd)

The problem of headers lists from comment #4 becomes relevant again. Currently we don't configure headers in ops logic anywhere, and it would be nice to keep it that way, but I'm not sure how we'd manage this without getting the headers into nginx somehow.

The simplest solution is to manage METADATA_HEADERS in ops logic, but this requires duplicating header configuration from code. Generally we'll want to update the default in code along with documentation in the gcp-ingestion repo, so I don't think moving management of this list exclusively to ops logic is the right move (but it could technically work).

Another option might be to move METADATA_HEADERS into a json file in the code. The ops logic that generates nginx configuration can then run the version of ingestion-edge being deployed and extract the default value as json to provide to the nginx configmap for CORS purposes. This creates an implicit interface where the app container puts the headers in a stable location, but once set up ops config changes wouldn't be required as headers are added.

There are probably some other ways of handling this. NI :relud for thoughts.

Flags: needinfo?(dthorn)

I would be inclined to move METADATA_HEADERS entirely to ops logic, and reduce the default in code to an empty list or just "DNT" or something. We can update the documentation spec in gcp-ingestion repo to say that the precise list of headers is up to Ops config.

Flags: needinfo?(dthorn)

(In reply to Daniel Thorn [:relud] from comment #26)

I would be inclined to move METADATA_HEADERS entirely to ops logic, and reduce the default in code to an empty list or just "DNT" or something. We can update the documentation spec in gcp-ingestion repo to say that the precise list of headers is up to Ops config.

As far as I know, every new header requires an associated ingestion-core code change e.g. https://github.com/mozilla/gcp-ingestion/pull/1699/files#diff-c8b8a7774d436a170e6c4cef4748e305480da51a3c26791c7dca8192690419d6. So unless we have a way of handling METADATA_HEADERS without the precise list downstream of the edge (some of these headers are processed specially and would require code changes regardless), I don't think we gain much in terms of maintainability by removing the list from specifically the edge config of gcp-ingestion.

Good point. Moving the config to a json file is fine with me, as is duplicating them, since that's essentially what's going on in gcp-ingestion. Whichever you prefer.

Also, the headers listed for CORS don't exactly match METADATA_HEADERS, because CORS assumes certain headers are always allowed (Content-Length, Date, etc).

(In reply to Daniel Thorn [:relud] from comment #28)

Good point. Moving the config to a json file is fine with me, as is duplicating them, since that's essentially what's going on in gcp-ingestion. Whichever you prefer.

I think I prefer having a JSON file that can be used at deploy time. This limits the duplication to what already exists in gcp-ingestion and means we could in theory refactor that repo to have a single source of truth eventually.

Also, the headers listed for CORS don't exactly match METADATA_HEADERS, because CORS assumes certain headers are always allowed (Content-Length, Date, etc).

This is a good point; the logic that reads the headers list from the container will need to filter them to non-CORS-default headers before passing the list to nginx.

Whiteboard: [data-platform-infra-wg]

assigning this to myself to put that list in a json file, will reassign to :whd when that's ready to be consumed by ops logic

Assignee: nobody → dthorn
Assignee: dthorn → whd

I have preliminary config for this in https://github.com/mozilla-services/cloudops-infra/pull/3484 and https://stage.ingestion-edge.nonprod.dataops.mozgcp.net/ is configured to serve these headers. I think a reasonable next step is to try to send some (possibly bad) data to this endpoint from a website and see if the data makes it to the payload_bytes tables.

Hey Wesley, just tried it out by setting this enpoint on the Glean.js website sample app. Still got CORS errors:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://stage.ingestion-edge.nonprod.dataops.mozgcp.net/submit/glean-sample-website/submission/1/193e978d-b5a9-4402-9c51-b0ca08363370. (Reason: header ‘content-encoding’ is not allowed according to header ‘Access-Control-Allow-Headers’ from CORS preflight response).

I have attached a HAR file describing the requests and responses.

I've updated the CORS list, which now looks like the following:

Accept-Encoding,Connection,Content-Encoding,DNT,Date,Sec-Fetch-Dest,Sec-Fetch-Mode,Sec-Fetch-Site,User-Agent,X-Debug-ID,X-Forwarded-For,X-Pingsender-Version,X-Pipeline-Proxy,X-Source-Tags,X-Telemetry-Agent

Please try again and let me know the result, or refer me to documentation for how I could set up testing myself.

Another observation is that the current configuration uses 'Access-Control-Allow-Origin' "$http_origin", which means if the request doesn't come in with an Origin (as appears to be the case from the HAR file) no Access-Control-Allow-Origin response header is sent, which appears to be desirable behavior. Is it expected that glean-js based requests will set the Origin header? Should we switch to always serve * for this value? It's my understanding that since we're not setting Access-Control-Allow-Credentials, * is probably an acceptable header, but it might make sense to require glean-based requests to set their Origin. I'm also not sure what the client CORS behavior is when that response header isn't set.

Is it expected that glean-js based requests will set the Origin header?

Yes, the browser should set this header by default. I realized it was not doing that because I was simply opening my html file in the browser instead of serving my sample app from a server. I changed that behaviour in the attached PR. If you want to test it yourself, you can download that branch and change this line to

Glean.initialize("glean-sample-website", true, { serverEndpoint: "https://stage.ingestion-edge.nonprod.dataops.mozgcp.net" });

Build instructions are here.


I did test it myself and still got blocked by CORS, even with the Origin header set. I can't see the changes in https://github.com/mozilla-services/cloudops-infra/pull/3484, I think I don't have permissioons, so not sure what is happening. However, the current error is 405 Method Not Allowed on the OPTIONS request so maybe that is issue? Previously the response for that request was 204 No Content.

I'll attach another HAR file with my last try.

This second try looks like it was hitting incoming.telemetry.mozilla.org (the production endpoint) and not stage.ingestion-edge.nonprod.dataops.mozgcp.net, which is likely why it failed in that manner. I can try this out myself later using your branch but my connectivity is pretty bad this week so I might not get to it until next week.

Well, isn't that embarassing. You are right, the endpoint was wrong there. Oversight.

I just tested with the correct enpoint now and it worked :)

Ok, pending some further review and implementation for the old edge I expect to roll this out to production early next week, or late next week/early the following week if we're focused on the merino launch.

Production is serving these headers and I filed https://github.com/mozilla/data-docs/pull/673 to note that. I think this can be closed after final verification that incoming.tmo is able to receive submissions.

Hey Wesley, https://debug-ping-preview.firebaseapp.com/pings/glean-from-website it works! Thanks a lot for working on this :)

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
See Also: → 1742444
Product: Data Platform and Tools → Data Platform and Tools Graveyard
Whiteboard: [data-platform-infra-wg] → [dataplatform]
See Also: → 1777182
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: