Open Bug 1685152 Opened 5 years ago Updated 8 months ago

Expose URI class

Categories

(GeckoView :: General, task, P3)

Unspecified
All
task

Tracking

(Not tracked)

People

(Reporter: agi, Unassigned)

References

(Blocks 6 open bugs)

Details

(Keywords: sec-want, Whiteboard: [geckoview:2023?][geckoview:2024H2?])

It would be really nice if GeckoView would expose a URI class that reuses Gecko's URI code, similar to nsIUri that can be used by embedders to parse uri strings.

This would help embedders be sure that they parse the URI the same way that GeckoView will, e.g. when trying to filter allowed URIs by protocol.

nsIUri definition: https://searchfox.org/mozilla-central/source/netwerk/base/nsIURI.idl

Severity: -- → S3
Priority: -- → P2
Whiteboard: [geckoview:m87]
Whiteboard: [geckoview:m87]

Sebastian mentioned in a meeting that there are a lot of places in AC where they want to parse a URI but don't have access to GeckoView directly, because the module doesn't depend on GeckoView.

See Also: → 1741290

Another use case for this class would be to get the TLD using the Public Suffix List, see this for context https://github.com/mozilla-mobile/fenix/pull/22455#discussion_r756945671

Priority: P2 → P3
Whiteboard: [geckoview:2023?]

From some old notes with Agi:

  • We need to use the parsing library before we even start GeckoRuntime (in Fenix, this happens in the IntentReceiverActivity).
  • The nsIUri doesn't have any other dependencies.
  • Maybe we can expose a static Java class that uses nsIUri as a separate module.
  • Only need to load the library into memory - which might be fast.

Tasks and enhancements should have severity N/A.

Severity: S3 → N/A
Type: enhancement → task

(In reply to [ex-Mozilla] Agi Sferro | :agi from comment #0)

It would be really nice if GeckoView would expose a URI class that reuses Gecko's URI code, similar to nsIUri that can be used by embedders to parse uri strings.

Not just "nice": the lack of this capability leads to security bugs when front-end URL parsing doesn't exactly match what Gecko thinks.

Keywords: sec-want
Whiteboard: [geckoview:2023?] → [geckoview:2023?][geckoview:2024H2?]
Blocks: 1813919
See Also: → CVE-2024-9956

Had a chat with :valentin about this.
Some notes:

  • having parity with Gecko on parsing URIs would be great, especially for cases where we send and receive an URI as currently the result we get back might be different
  • Using the same shared URI object would help on critical paths where currently communicating back and forth would parse the same URI multiple times.
  • the nsIURI code is already part of libxul which is used by Android but lacks the JNI bindings to be used from Java
    • As Jonathan mentioned, we might need the URI parsing functionality before loading Gecko which based on this recent investigation might take ~200ms on a cold start so it would be best to extract this functionality into a new, separate library.
  • the nsIURI code is tied to other functionalities and extracting it to a separate library is not straightforward, needs additional work
  • the plan is for Gecko to rely solely on rust-url but this is not an immediate priority so it might take some time to get there (there are also some edgcases with special treatment in Gecko that need to be refactored)
    • would be better for Android to use this instead. Same would apply to iOS. This will ensure consistency on all platforms.
    • rust-url doesn't have Java/Kotlin/Objective-C bindings yet
  • as a general recommendation: if there is no pressing need to use nsIURI from Gecko stick with Java APIs for the moment and later transition to rust-url.

(In reply to Petru-Mugurel Lingurar [:petru] from comment #6)

  • the nsIURI code is already part of libxul which is used by Android but lacks the JNI bindings to be used from Java
    • As Jonathan mentioned, we might need the URI parsing functionality before loading Gecko which based on this recent investigation might take ~200ms on a cold start so it would be best to extract this functionality into a new, separate library.

Thank you for connecting that perf information with this bug so we have a concrete minimum of how much we would lose! I suspect the cost would be much more than the ~200ms since that value is only for DNS warmup.

  • the nsIURI code is tied to other functionalities and extracting it to a separate library is not straightforward, needs additional work

It would be good to know more about the other functionalities. Implicitly, I think Agi and I had assumed that we would clone nsIUri into it's own library without any other Gecko parts depending on the library. Gecko internals would continue using it's own instance of nsiUri. If we later move to rust-url, then we could come back and decide: do we want to replace the library with nsIUri and keep the existing exposed API or do we want to remove the duplication and use the Gecko version (if perf is not impacted).

  • would be better for Android to use this instead. Same would apply to iOS. This will ensure consistency on all platforms.
  • rust-url doesn't have Java/Kotlin/Objective-C bindings yet

This is interesting - I don't believe we need rust-url to add support for those platform APIs if use rust-url with uniffi, the same way application services adds language support for both of our platforms. We could consider exposing rust-url through app services and get it from there too.

  • as a general recommendation: if there is no pressing need to use nsIURI from Gecko stick with Java APIs for the moment and later transition to rust-url.

With the notes above, I don't think it would be extra work to build on top of nsIUri today until we have more concrete plans for gecko's rust-url implementation which should not change our API surface. We have been using our own Java/Kotlin Uri implementation because the platform Java APIs do not suffice and comment 5 also prefers sharing this from a sec perspective.

A bit more info from talking with Valentin about how rust-url is used in Gecko:

If we'd be to use rust-url also through UniFFI and pass the URL object through ffi then when communicating with Gecko it can be used directly without needing to be parsed again.

(In reply to Petru-Mugurel Lingurar [:petru] from comment #6)

  • As Jonathan mentioned, we might need the URI parsing functionality before loading Gecko which based on this recent investigation might take ~200ms on a cold start so it would be best to extract this functionality into a new, separate library.

I'm not sure I understand why do we need a separate URI parsing functionality, when said functionality is already part of Gecko and is used on Android (as you yourself said above). Can you explain a bit what you mean?

  • the nsIURI code is tied to other functionalities and extracting it to a separate library is not straightforward, needs additional work

Do you mean the C++ code and the XPCOM interfaces? It is not clear to me why would it need to be extracted to a separate library?

  • would be better for Android to use this instead. Same would apply to iOS. This will ensure consistency on all platforms.

There currently is no way to use rust libraries in a direct way in GeckoView (as in: sharing rust libraries with Gecko). This is not a trivial problem to solve. Besides, Gecko is not there yet, and it is unclear how long it would take to get there. Our current URI parsing solutions are not ideal from the security stand point. We can solve these problems by exposing nsIURI, and then we definitely need to start thinking about how to share rust libraries with Gecko.

Implicitly, I think Agi and I had assumed that we would clone nsIUri into it's own library without any other Gecko parts depending on the library.

I am not sure what Jonathan assumed, but methods in XPCOM interfaces are essentially C functions, so we can bind to them via JNI. So, what needs to be done here is to write the JNI bindings for the XPCOM interfaces (at a minimum, we would need the nsIURI and the nsISupports, IIUC, because that is the base interface). The bindings would have to written by hand, as there is no generating code. We can write the generating code, but it's not trivial and would constitute quite a piece of yak shaving, so unless we would have to write a lot of bindings for this project (there are quite a few IDLs around the URI functionality, but it's not clear if we would need them all). Bindings written by hand would have to be kept in sync with the IDLs somehow (not sure how often they change). Also, the lifetime of the objects would have to be in sync and probably tied to the lifetime of the Java counterparts, so we will have to make sure we manage the refcounts correctly.

Whether the resulting Java interfaces would need to be their own library, I don't know. This comment seems to be suggesting a use case for a separate java library.

Flags: needinfo?(petru)

Hi folks, just thought I'd add some of my thoughts to this directly.
One of the main issues with having two URL parsers in Android and Gecko is that the Gecko nsIURI mostly aims to comply with the WHATWG URL standard, while the android URI and URL implementations are aligned with rfc3986 (as far as I know). This difference results in issues like bug 1813919.

As Petru mentioned, we'd ideally use just one parser in both Android code and Gecko, but extracting the Gecko parsers into a separate library that does not depend on libxul is non trivial.

Converting from Android URLs to Gecko URLs in an efficient manner is also likely to affect performance - I'm not aware on how many times we serialize and parse a URL. Having an nsIURI implementation in Android that has a different internal representation than the Gecko implementation would not help with this, as reparsing would still be necessary.

While we aim to have all of the Gecko URL parsers eventually converge on one implementation based on rust-url, that isn't on our team's priority list at the moment.

Note that there are some JAVA libraries out there implementing WHATWG URL - like https://github.com/stephanebastian/whatwg-url
Depending on what the Android team's requirements are, it might be a good option.

I am not sure what Jonathan assumed, but methods in XPCOM interfaces are essentially C functions, so we can bind to them via JNI. So, what needs to be done here is to write the JNI bindings for the XPCOM interfaces

I think one of the issues with that was that you still need libxul to be loaded to use JNI to call these xpcom methods, which I assume is not always the case. Petru, please correct me if that's not accurate.

As Petru mentioned, we'd ideally use just one parser in both Android code and Gecko, but extracting the Gecko parsers into a separate library that does not depend on libxul is non trivial.

Hi Valentin, thank you for chiming in! Your help is much appreciated. Can you expand a little bit on why do the parsers need to be extracted into a separate library?

One of the main issues with having two URL parsers in Android and Gecko is that the Gecko nsIURI mostly aims to comply with the WHATWG URL standard, while the android URI and URL implementations are aligned with rfc3986 (as far as I know). This difference results in issues like bug 1813919.

Interesting, I was not aware of that. Do you happen to know the reason why they are different? My understanding has been that we would not have a separate parsers on the Android side, and instead replace them with the Gecko one. Do you know of any reasons we cannot do that? (At the moment it is not clear to me why would we want to use a Java library, if this functionality is already implemented in Gecko)

I think one of the issues with that was that you still need libxul to be loaded to use JNI to call these xpcom methods, which I assume is not always the case. Petru, please correct me if that's not accurate.

This is the first time I hear that libxul is not loaded.... Without libxul neither GeckoView, nor Fenix would be able to function. What led you to that conclusion? Unless we are now discussing Android Components, which has the ability to swap the browsing engine - yes, if you use WebView, libxul would not be loaded. This is a GeckoView bug, however. Or are we discussing possibility of libxul not having loaded correctly / ending up in a bad state? Are there any other specific situations you have in mind?

Flags: needinfo?(valentin.gosu)

(In reply to [:owlish] 🦉 PST from comment #11)

This is the first time I hear that libxul is not loaded.... Without libxul neither GeckoView, nor Fenix would be able to function.

I was under the impression that we might need to parse URLs before loading libxul, such as for processing intents? For example in bug 1880491.
If that's not a case we care too much about, then calling into Gecko via JNI to create a nsIURI is a great option, and the option that is most likely to avoid future mismatches between GV and Gecko.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(petru)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #12)

(In reply to [:owlish] 🦉 PST from comment #11)

This is the first time I hear that libxul is not loaded.... Without libxul neither GeckoView, nor Fenix would be able to function.

I was under the impression that we might need to parse URLs before loading libxul, such as for processing intents? For example in bug 1880491.
If that's not a case we care too much about, then calling into Gecko via JNI to create a nsIURI is a great option, and the option that is most likely to avoid future mismatches between GV and Gecko.

Ah, ok - yes, that makes sense. Yes, indeed, there are some situations where we need to parse a URL before Gecko would get a chance to load. However, I think we need to try and see if we can find a less dramatic solution than extracting nsIURI into a separate library. Christian thinks it should be possible.

I also asked Christian about the difference in standards utilized for parsing Gecko vs AC - he said there was no particular reason for that, was not an intentional decision, so there is no reason for us to continue to stick with rfc3986.

Valentin, does Gecko use WHATWG everywhere? Thanks!

Flags: needinfo?(valentin.gosu)

Valentin, does Gecko use WHATWG everywhere? Thanks!

Yes. Our implementation aims to be fully compatible with the WHATWG spec. We're not completely there yet, but close. We're currently passing 99.3% of the URL web-platform-tests
The WHATWG URL standard is not too far from rfc3986, but it specifies what browsers actually implement, so it may occasionally change when browsers agree on it. As you noticed, Android can use a parser that implements rfc3986, and it's only a handful of corner cases. 🙂

Flags: needinfo?(valentin.gosu)
Blocks: 1950063
Blocks: 1885093
Blocks: 1965830
See Also: → 1964251, 1974302
See Also: → 1974963
You need to log in before you can comment on or make changes to this bug.