Open Bug 1932617 Opened 3 months ago Updated 2 months ago

Glean is the host of some of the largest functions of libxul.so

Categories

(Toolkit :: Telemetry, enhancement, P4)

enhancement

Tracking

()

People

(Reporter: nbp, Unassigned)

References

Details

The following is the list of functions from libxul.so extracted from a local build computed with the following command:
readelf -sW libxul.so | grep '[0-9][0-9][0-9][0-9][0-9] FUNC' | sort -n -k 3,3 | c++filt | less

   Num:    Value          Size Type    Bind   Vis      Ndx Name
 20547: 00000000012de7a0 0x18780 FUNC    LOCAL  DEFAULT   16 mozilla::StaticPrefs::StartObservingAlwaysPrefs()
508661: 000000000725c310 0x33937 FUNC    LOCAL  HIDDEN    16 firefox_on_glean::metrics::__glean_metric_maps::event_test_get_error::h2647d7fe72df99f8
508660: 0000000007252550 40377 FUNC    LOCAL  HIDDEN    16 firefox_on_glean::metrics::__glean_metric_maps::event_test_get_value_wrapper::h8ae4a14b2fdf0b87
508658: 0000000007235450 41208 FUNC    LOCAL  HIDDEN    16 firefox_on_glean::metrics::__glean_metric_maps::record_event_by_id::h64df6134abd63c0a
156038: 0000000007225700 61841 FUNC    LOCAL  DEFAULT   16 firefox_on_glean::metrics::__glean_metric_maps::COUNTER_MAP::{{closure}}::hb03762439d8bfe29
582722: 00000000084ca7d0 63012 FUNC    LOCAL  HIDDEN    16 gleam::ffi_gl::Gl::load_with::h07942403e0904c6d
582723: 00000000084d9e00 63012 FUNC    LOCAL  HIDDEN    16 gleam::ffi_gl::Gl::load_with::hff2faa8440f76484
508659: 000000000723f550 77818 FUNC    LOCAL  HIDDEN    16 firefox_on_glean::metrics::__glean_metric_maps::record_event_by_id_with_time::h996dc1b6bb42b3a8

I am opening this bug because many of these functions are larger than the JavaScript Interpreter (55818 bytes), and at the time this was considered an issue as being a potential source for instruction cache invalidation.

Moving to toolkit::Telemetry, as this is fog related.

Component: Glean Platform → Telemetry
Product: Data Platform and Tools → Toolkit

Looking at these one at a time:

  • event_test_get_error is a ~900-line match statement whose cases each expand this macro. To avoid this would require:
    • Reducing the number of event metrics in Firefox Desktop
      • (unlikely to happen and counterproductive if we did)
    • Making an object-safe trait for test_get_errors for FOG's EventMetric to implement, so we could stop using the macro (heck, we might even be able to stop using the big match and instead put the id -> [instance that can test_get_errors()] mapping into a map or other datastructure, at a runtime performance cost (which we'd happily bear as this is a test-only method))
      • (possible, but not without risks and requires investment)
  • event_test_get_value_wrapper is basically the same and would have the same sort of remedies.
  • record_event_by_id is a third thing that is basically the same with the same remedies. Except that this is not a test-only method and so has to be more careful of runtime cost.
  • COUNTER_MAP::{{closure}} is the sort of thing that results if you move these match statements into maps. It is pathologically large because it holds 2300 Use Counters in addition to all the non-usecounter counters. It hasn't been optimized for binary size, memory size, or runtime efficiency, so there's no doubt some better way of organizing this data for quick retrieval. We could even consider reorganizing metric id assignments so that blocks of metric types are contiguous and thus metrics instances can be stored as slices and retrieved cheaply with one arithmetic operation to form an index. (this may run afoul of other optimization hopes like bug 1879329, so watch your step)
    • (plenty of room for experimentation here. We don't have a lot of bandwidth to perform it, though.)
  • record_event_by_id_with_time is a fourth thing that's basically the same as the first three. This, too, is not a test-only method and so has to be more careful of runtime cost. In addition, it is (only) used by IPC which means that it is likely being run on the main thread (ref bug 1641989). But it also means that we have a batch of similar operations to process all at once meaning we may have more options for remedies.
Severity: -- → N/A
Priority: -- → P4
See Also: → 1929210

I've already tried to just remove those wo are test-only in release mode as they are used for test - but discussing with glandium it looks like what we want is to move them to a seperate library / executable so that we can actually test the release.

So I briefly looked into this yesterday and I think a good way forward is to extract those test_get_num_recorded_errors functions into their own trait (maybe those test_get_error too, but that's a bit harder because they have varying return types).
Then we can generate &dyn fog::trait::Metric in those functions (and then inline the other part of that macro) and that alone seems to halve the size of the compiled function.
It requires some reshuffling of Glean code though to make it work (and I won't be able to tackle that this year)

You need to log in before you can comment on or make changes to this bug.