Closed Bug 1635238 Opened 4 years ago Closed 4 years ago

FOG's JS API Design

Categories

(Toolkit :: Telemetry, task, P1)


Tracking


RESOLVED FIXED

People

(Reporter: chutten, Assigned: janerik)

References

Details

(Whiteboard: [telemetry:fog:m4])

Attachments

(1 file)

FOG needs a metrics API written in JS. This API needs to be designed. Some interesting questions to consider include:

  • How will glean_parser generate the code?
  • Will glean_parser generate JS or is it more efficient to use some sort of IDL, especially given we'll be designing a C++ API as well? (see bug 1635234)
  • Does this design necessitate any differences to the Multi-language Architecture Design previously worked out in bug 1618905?
  • What are the anticipated impacts of the proposed decision in terms of
    • Compile-time performance (incl code generation time)
    • Runtime performance
    • Binary size
    • Memory
  • Where will the documentation live? (Given that this is implemented on top of FOG but for Glean, the answer might not be as obvious as it seems.)
  • What does this mean for (privileged) Web Extensions?

This bug covers (in not necessarily this order):

  • looking into API description languages in use in mozilla-central,
  • identifying and having discussions with domain experts,
  • writing the proposal document,
  • having the proposal document reviewed by team members and domain experts.

Another point implicit in the above, but perhaps better called out explicitly, is:

  • What does this mean for Artifact Builds?
Priority: -- → P2
Assignee: nobody → jrediger
Priority: P2 → P1

First draft of the proposal, asking for team-internal feedback.

Left some initial feedback in the doc. Very interesting possibilities.

We're currently designing the APIs for Project FOG, the project that will bring Glean, our new telemetry system, to Firefox.
The proposal discusses some options for implementing an ergonomic JavaScript API and I'd like all of your feedback on what I came up with.
I've experimented with the implementations to get a feel for them, but I might easily have missed better solutions.
I also haven't researched the memory and performance impact yet, so if you already see a concern there, please raise it.

If you feel like you're not the best person to give this a look, that's of course fine. If you know someone who should see this, please let me know.

Feedback by two weeks from now (2020-06-24, because Virtu-All Hands) would be very much appreciated.

Flags: needinfo?(tcampbell)
Flags: needinfo?(peterv)
Flags: needinfo?(echen)
Flags: needinfo?(bugs)

Commented in the document

Flags: needinfo?(bugs)

Added some comments in the document.

Flags: needinfo?(echen)

Thanks, :edgar, :smaug and :bholley for initial feedback.
I reworked the proposal to single out one solution that I think will work for us for now.

I'd appreciate it if you could give this one more look and point out any flaws I might have missed or concerns you see.
I might approach you again for general feedback once we start implementing this, if that's OK.
Feedback by one week from now, Friday, 2020-07-31, would be appreciated.

The document: https://docs.google.com/document/d/17Ik8YQ50FJs8fUvirmHFxTCIq2HrfUazKCpxAHybp7Y/

Flags: needinfo?(tcampbell)
Flags: needinfo?(peterv)
Flags: needinfo?(echen)
Flags: needinfo?(bugs)
Flags: needinfo?(bholley)

It feels a bit sub-optimal to route all the individual telemetry values through generated JS code, because that adds per-process runtime memory overhead. I don't have a sense of how many of these JSMs would be loaded in practice for a given content process, but the memory usage could add up (whereas native code is shared across all processes). I don't have a sense of how big of an issue this is likely to be in practice, but it's a concerning design choice.
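For illustration, here is a hypothetical sketch of the kind of generated per-category JSM that concern is about (the module shape, names, and the GleanCounter wrapper are invented for this sketch, not taken from the proposal). Every process that imports such a module pays for its objects separately:

  // Hypothetical generated module, one per metric category.
  var EXPORTED_SYMBOLS = ["browserMeasurement"];

  // `GleanCounter` stands in for whatever concrete metric wrapper the
  // generated code would instantiate; each importing process gets its
  // own copies of these objects.
  const browserMeasurement = {
    click: new GleanCounter("browser.measurement", "click"),
    // ...one entry per metric in the category
  };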

It seems like the ideal scheme for the generated metrics would be two WebIDL interfaces, each with a named getter [1]. The outer interface (Glean) would be a global singleton, which would have a getter to generate instances for each of the categories (e.g. .browserMeasurement). Those could all be represented by a single generic interface (GleanCategory), which would get instantiated with the proper index, which would in turn allow it to expose the appropriate metrics through its own named getter (e.g. .click).

As long as there are only a handful of actual metric types, it wouldn't be the end of the world to expose those as WebIDL too, but the proposed approach of exposing them as XPIDL seems sensible (and you won't need the init method, since you can just construct them in C++).

[1] https://developer.mozilla.org/en-US/docs/Mozilla/WebIDL_bindings
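
A minimal JS sketch of what access under that named-getter scheme could look like (category and metric names are just the examples from this comment; assuming the metric is a counter exposing add()):

  // `Glean` is the global singleton; its named getter returns a
  // GleanCategory instance, whose named getter returns the metric.
  let category = Glean.browserMeasurement; // GleanCategory
  category.click.add(1);                   // counter metric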

Flags: needinfo?(bholley)

(In reply to Bobby Holley (:bholley) from comment #8)

It seems like the ideal scheme for the generated metrics would be two WebIDL interfaces, each with a named getter [1]. The outer interface (Glean) would be a global singleton, which would have a getter to generate instances for each of the categories (e.g. .browserMeasurement). Those could all be represented by a single generic interface (GleanCategory), which would get instantiated with the proper index, which would in turn allow it to expose the appropriate metrics through its own named getter (e.g. .click).

I managed to prototype this and it seems like that's a pretty doable solution indeed.
I will change the proposal accordingly and we'll proceed with that.

Thanks for the useful input!

(Also cancelling the other ni?s; I think with the above feedback we'll have a good solution now.)

Flags: needinfo?(echen)
Flags: needinfo?(bugs)

One thing I haven't figured out yet: if we implement metric types as their own XPIDL interfaces, how do we return them from a named getter?
The named getter requires a single return type.
If we make that e.g. nsISupports, we also need a .QueryInterface(Ci.nsIGleanCounter) before we can call counter methods on it.

(In reply to Jan-Erik Rediger [:janerik] from comment #10)

One thing I haven't figured out yet: if we implement metric types as their own XPIDL interfaces, how do we return them from a named getter?
The named getter requires a single return type.
If we make that e.g. nsISupports, we also need a .QueryInterface(Ci.nsIGleanCounter) before we can call counter methods on it.

If you declare the return type as nsISupports and then implement classinfo for the concrete types, XPConnect should automatically expose the right interfaces for you.
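
A rough before/after sketch from the JS side (assuming a counter metric behind nsIGleanCounter, as in the question above):

  // Without classinfo: the named getter only hands back nsISupports,
  // so callers would have to QI before using the metric.
  let counter = Glean.browserMeasurement.click
                     .QueryInterface(Ci.nsIGleanCounter);
  counter.add(1);

  // With classinfo on the concrete type, XPConnect exposes the right
  // interface automatically and the QI is unnecessary.
  Glean.browserMeasurement.click.add(1);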

Wow, thanks, that was indeed the missing piece. I prototyped it and it works just as expected.
With that, this proposal is finally ready and accepted, and we can move on to implementing further stages of FOG.

Proposal status: accepted.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

If there are going to be multiple instances with the same interface, I think I'd probably lean toward using WebIDL rather than XPIDL for them. WebIDL prototype objects take up a fair amount of memory, but when you have a bunch of objects sharing them, they tend to wind up using less memory than the corresponding XPIDL objects, which don't share prototypes.

They're also much faster, which may or may not matter, depending on how much they wind up being used. For TelemetryStopwatch, for instance, converting to WebIDL (from JS rather than XPIDL, granted) made enough difference to trigger a ton of telemetry improvement alerts.

(In reply to Kris Maglione [:kmag] from comment #13)

If there are going to be multiple instances with the same interface, I think I'd probably lean toward using WebIDL rather than XPIDL for them. WebIDL prototype objects take up a fair amount of memory, but when you have a bunch of objects sharing them, they tend to wind up using less memory than the corresponding XPIDL objects, which don't share prototypes.

They do if you implement classinfo no? That's what the XPCWrappedNativeProto stuff is about.

The biggest issue with WebIDL is that it increases binary size way more than xpidl. At least we should avoid use of WebIDL for chrome-only interfaces which contain lots of infrequently used attributes or methods. That is what DOM peers discussed earlier this year.

(In reply to Olli Pettay [:smaug] from comment #15)

The biggest issue with WebIDL is that it increases binary size way more than xpidl. At least we should avoid use of WebIDL for chrome-only interfaces which contain lots of infrequently used attributes or methods. That is what DOM peers discussed earlier this year.

I think we need to weigh the trade-offs in each case. When the performance or runtime memory characteristics outweigh the binary size overhead, I think we should still consider preferring WebIDL even for chrome-only interfaces. They definitely did for TelemetryStopwatch. I don't know if they do here.

(In reply to Bobby Holley (:bholley) from comment #14)

(In reply to Kris Maglione [:kmag] from comment #13)

If there are going to be multiple instances with the same interface, I think I'd probably lean toward using WebIDL rather than XPIDL for them. WebIDL prototype objects take up a fair amount of memory, but when you have a bunch of objects sharing them, they tend to wind up using less memory than the corresponding XPIDL objects, which don't share prototypes.

They do if you implement classinfo no? That's what the XPCWrappedNativeProto stuff is about.

Maybe. Depending on the exact ClassInfo flags. But I can never remember what is left of that logic since we removed the DOMClass flag and the last remnant DOM classes that depended on it. Either way, if we're going to use XPIDL for this, we should at least double check. And if it's going to be used as extensively as some of the other Telemetry APIs, I suspect we're going to want it to be WebIDL for performance reasons, regardless.

(In reply to Kris Maglione [:kmag] from comment #16)

(In reply to Olli Pettay [:smaug] from comment #15)

The biggest issue with WebIDL is that it increases binary size way more than xpidl. At least we should avoid use of WebIDL for chrome-only interfaces which contain lots of infrequently used attributes or methods. That is what DOM peers discussed earlier this year.

I think we need to weigh the trade-offs in each case. When the performance or runtime memory characteristics outweigh the binary size overhead, I think we should still consider preferring WebIDL even for chrome-only interfaces.

Agree we should weigh the trade-offs case-by-case, and that content process memory overhead is generally a more pressing problem than binary size.

(In reply to Bobby Holley (:bholley) from comment #14)

(In reply to Kris Maglione [:kmag] from comment #13)

If there are going to be multiple instances with the same interface, I think I'd probably lean toward using WebIDL rather than XPIDL for them. WebIDL prototype objects take up a fair amount of memory, but when you have a bunch of objects sharing them, they tend to wind up using less memory than the corresponding XPIDL objects, which don't share prototypes.

They do if you implement classinfo no? That's what the XPCWrappedNativeProto stuff is about.

Maybe. Depending on the exact ClassInfo flags.

I don't think it depends on flags: https://searchfox.org/mozilla-central/rev/b4f3ce16c5099cf068fb023341959a0457adec9e/js/xpconnect/src/XPCWrappedNative.cpp#398

In general the principle of XPConnect was always: nsIXPCScriptable lets you control the JSClass, and nsIClassInfo lets you control the prototype.

And if it's going to be used as extensively as some of the other Telemetry APIs, I suspect we're going to want it to be WebIDL for performance reasons, regardless.

I just took some quick measurements in the browser console in Nightly on my MBP, comparing Cu.waiveXrays(1) with ChromeUtils.waiveXrays(1) [1].

On my machine, I see about 280ns for the XPIDL call and 140ns for the WebIDL call. Things can obviously change quite a bit once you start getting into complex argument conversions, but a decent approximation seems to be 2x faster, 100ns-order-of-magnitude.

My suspicion is that any code path that is sensitive to that level of overhead should probably not be making direct telemetry calls. But I'll let the Telemetry folks confirm.

[1]
  {
    const ITERS = 1000 * 1000;
    let x = 0;
    let start = performance.now();
    for (let i = 0; i < ITERS; ++i) {
      x = x + Cu.waiveXrays(1);
    }
    let end = performance.now();
    console.log((end - start) / ITERS);
  }

Thanks a bunch for this discussion, that's really helpful.
I guess implementation-wise WebIDL and XPIDL are close enough that we can try out a single metric type with both implementations and go from there.

(In reply to Bobby Holley (:bholley) from comment #17)

I just took some quick measurements in the browser console in Nightly on my MBP, comparing Cu.waiveXrays(1) with ChromeUtils.waiveXrays(1) [1].

On my machine, I see about 280ns for the XPIDL call and 140ns for the WebIDL call. Things can obviously change quite a bit once you start getting into complex argument conversions, but a decent approximation seems to be 2x faster, 100ns-order-of-magnitude.

I had a hunch last night that my numbers might have been affected by some devtools-debugger-specific behavior in the JS engine, and so this morning I rejiggered the testcase a bit [1]. This gives 40ns for the WebIDL call and 160ns for the XPConnect call, which is a bit more in line with my intuition. So a reasonable approximation is that a simple XPConnect call adds about 100ns of overhead relative to a similar WebIDL call.

[1]
  {
    let s = document.createElement("script");
    s.textContent =
      "{ const ITERS = 1000*1000; let x = 1; let start = performance.now(); " +
      "for (let i = 0; i < ITERS; ++i) { x = x + ChromeUtils.waiveXrays(x); } " +
      "let end = performance.now(); console.log((end - start)/ITERS) }";
    document.head.append(s);
    s.remove();
  }

(In reply to Bobby Holley (:bholley) from comment #19)

This gives 40ns for the WebIDL call and 160ns for the XPConnect call, which is a bit more in line with my intuition. So a reasonable approximation is that a simple XPConnect call adds about 100ns of overhead relative to a similar WebIDL call.

I think that's close to my intuition for simple cases, too. But I'm generally somewhat suspicious of microbenchmarks, and I think there are more than enough cases where even the ~100ns is enough to make a difference. I added ChromeUtils.waiveXrays in the first place because when I profiled the extension binding code, Cu.waiveXrays was a significant source of overhead, and moving it to ChromeUtils made a measurable difference. For things like Telemetry-related APIs which are used in a lot of places, often repeatedly, I'd especially expect that to matter. But, again, it depends on how the API is used, and I don't know that much about this one.

There are also other sources of overhead that are different for XPConnect bindings than for WebIDL bindings. I typically see things like object construction and wrapping being much more expensive for XPConnect WNs than for WebIDL ones, though that's possibly less of an issue for WNs that have a shared prototype, and wrapping is much less of an issue now that pretty much all chrome code lives in the same compartment. But even so, I'd expect that to be a potential issue here, where we'll have a bunch of objects, rather than just one, each of which will need to be wrapped each time it's accessed.

See Also: → 1715542
