Closed Bug 1816744 Opened 2 years ago Closed 10 months ago

[meta] Glean `stack` metric type

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P3)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1831905

People

(Reporter: chutten, Assigned: chutten)

Details

(Keywords: meta)

We have a couple of bugs filed already for some narrow-to-broad use cases of a stack metric type: bug 1704854 and bug 1728784.

Plus, we have bug 1784069 coming from the "crash" ping angle bringing in a couple of new cases plus the history of the existing Legacy "crash" ping's stackTraces substructure.

And we can also see BHR's use of stacks for hangs.

And in non-stack but stack-related instances: How does the design of the struct and struct_list metric types factor in (if at all)?

This metabug is about gathering requirements for a stack metric type (some of which we have right here, but no doubt there are more), turning them into a design (the next step in the process), then of course the impl, testing, validation, etc. Hopefully we can close out some of the See Also bugs along the way.

Previously, the "stack" metrics in Telemetry included much more than the stack (e.g., loaded module information and exception info). I don't think this additional information needs to be captured as part of the stack metric type; it can be gathered by other means and represented in other Glean metrics easily. However, the relationship between these things (specifically, referring to the loaded module in a particular stack frame) may be more efficient to express as an index (as it was in Telemetry), so that may be an argument for including that information.

As I see it, at a minimum to be useful the stack information needs to accommodate:

  • One thread or all threads
  • For each thread: in the ballpark of at least 10-20 frames
  • For each frame: the module containing the executing code, an instruction pointer as an offset from the beginning of the module, the means by which that instruction pointer was obtained, and optionally the symbolicated frame information (the symbol name, offset into the symbol, and maybe even debug line info if available)
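To make the minimum requirements above concrete, here is a rough sketch of the data they imply. This is not a Glean API: every name here (`IpOrigin`, `Symbol`, `Frame`, `Thread`, `StackTrace`) and the specific origin values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class IpOrigin(IntEnum):
    """How the instruction pointer was obtained (hypothetical values)."""
    CONTEXT = 0        # read from the thread's register context
    CFI = 1            # recovered via call frame information
    FRAME_POINTER = 2  # recovered by following frame pointers
    SCAN = 3           # found by scanning the stack


@dataclass
class Symbol:
    """Optional symbolicated information for a frame."""
    name: str                   # symbol name
    offset: int                 # offset into the symbol
    file: Optional[str] = None  # debug line info, if available
    line: Optional[int] = None


@dataclass
class Frame:
    module: str         # module containing the executing code
    module_offset: int  # instruction pointer as an offset from the module start
    origin: IpOrigin    # means by which the instruction pointer was obtained
    symbol: Optional[Symbol] = None


@dataclass
class Thread:
    thread_id: int
    frames: List[Frame] = field(default_factory=list)


@dataclass
class StackTrace:
    # Holds one thread or all threads, depending on what the caller records.
    threads: List[Thread] = field(default_factory=list)
```

A single unsymbolicated frame would then be something like `Frame(module="xul.dll", module_offset=0x1234, origin=IpOrigin.CFI)`, with `symbol` left as `None`.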

It's not clear to me whether the above parameters should be configurable as static settings in the metric itself, or whether the metric type should have some large upper limits and the code which populates the metric can choose what's included in a particular probe (mainly choosing which threads and how many stack frames to include).

Storage

From a data perspective, these requirements may boil down to a storage format that is fairly terse in the "not much client-side processing" case, providing module names/versions/hashes as strings, and everything else in the stack information would be numbers (thread ids, IPs, indirect module references as an optimization, an enum value relating to IP origin).
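As an illustration of what such a terse format might look like, the sketch below lists each module string once and encodes every frame as three numbers: an index into the module list, the offset into that module, and an IP-origin enum value. The function name `encode_terse` and the payload layout are assumptions made for this example, not a proposed wire format, and the module strings carry only names here (versions and hashes are omitted for brevity).

```python
from typing import Dict, List, Tuple

# A frame as gathered on the client: (module name, offset into module, IP origin).
RawFrame = Tuple[str, int, int]


def encode_terse(threads: Dict[int, List[RawFrame]]) -> dict:
    """Encode stacks so that module strings appear once and frames are numeric."""
    modules: List[str] = []         # module table, in first-seen order
    index_of: Dict[str, int] = {}   # module name -> position in the table
    encoded_threads = []
    for thread_id, frames in threads.items():
        encoded_frames = []
        for module, offset, origin in frames:
            if module not in index_of:
                index_of[module] = len(modules)
                modules.append(module)
            # Indirect module reference instead of repeating the string.
            encoded_frames.append([index_of[module], offset, origin])
        encoded_threads.append({"id": thread_id, "frames": encoded_frames})
    return {"modules": modules, "threads": encoded_threads}
```

With this layout a module that appears in many frames costs one string plus a small integer per frame, which is the kind of saving the index in Telemetry provided.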

When more client-side processing is done, we will necessarily need to transmit much more information as we'll be sending the symbol names and possibly debug details. While this may blow up the ping size, this information is very valuable as it:

  • is the most accurate information we could possibly get,
  • allows immediate action on the information (as opposed to further server-side processing to get the stack information after the fact), and
  • distributes computing to lessen server-side requirements (which may in turn allow for processing more ping volume).

The threads, frames, and modules are unbounded, so we should truncate excessive items and include a flag or some other indicator that they were truncated.

For example, stack overflow errors can have an excessive number of frames. In Socorro, we have a processing rule that truncates, in the middle, stacks that exceed 500 frames. (bug #1763154)
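A minimal sketch of middle truncation with a truncation flag follows. The helper `truncate_middle` is hypothetical and is not Socorro's actual rule: the default of 500 simply mirrors the figure above, and splitting the kept frames evenly between the top and bottom of the stack is an assumption.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def truncate_middle(frames: List[T], max_frames: int = 500) -> Tuple[List[T], bool]:
    """Keep the first and last frames of an over-long stack.

    Returns the (possibly shortened) frame list and a flag saying
    whether anything was dropped.
    """
    if max_frames < 2:
        raise ValueError("max_frames must be at least 2")
    if len(frames) <= max_frames:
        return list(frames), False
    head = (max_frames + 1) // 2  # frames kept from the top of the stack
    tail = max_frames - head      # frames kept from the bottom of the stack
    return frames[:head] + frames[-tail:], True
```

Keeping both ends preserves the frames nearest the crash as well as the entry point, which is usually what matters for deep recursion; the returned flag is what would be recorded alongside the stack.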

Depends on: 1831905

:afranchuk, design discussions are happening in the doc attached to bug 1831905 if you'd like to take a look. I'm pretty sure the proposed object metric type will suit the stack traces use cases (and I use the thread frames structure as an example, to make sure of it).

(In reply to Chris H-C :chutten from comment #3)

Great, that looks like it will work fine and make my life easy!

Gonna call this dupe'd by the object metric.

Status: NEW → RESOLVED
Closed: 10 months ago
Duplicate of bug: 1831905
Resolution: --- → DUPLICATE