[meta] Glean `stack` metric type
Categories
(Data Platform and Tools :: Glean: SDK, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: chutten, Assigned: chutten)
Details
(Keywords: meta)
We have a couple of bugs filed already for some narrow-to-broad use cases of a `stack` metric type: bug 1704854 and bug 1728784.
Plus, we have bug 1784069 coming from the "crash" ping angle, bringing in a couple of new cases plus the history of the existing Legacy "crash" ping's `stackTraces` substructure.
And we can also see BHR's use of stacks for hangs.
And in non-stack but stack-related instances: How does the design of the `struct` and `struct_list` metric types factor in (if at all)?
This metabug is about gathering requirements for a `stack` metric type (some of which we have right here, but no doubt there's more), turning them into a design (the next step in the process), then of course the impl, testing, validation etc. Hopefully we can close out some of the See Also bugs along the way.
Comment 1•2 years ago
Previously, the "stack" metrics in Telemetry included much more than the stack (e.g., loaded module information and exception info). I don't think this additional information is necessary to capture as part of the `stack` metric type; it can be gathered by other means and represented in other Glean metrics easily. However, the relationship between these things (specifically, referring to the loaded module in a particular stack frame) may be more efficient as an index (as it was in Telemetry), so that may be an argument for including that information.
As I see it, at a minimum to be useful the stack information needs to accommodate:
- One thread or all threads
- For each thread: in the ballpark of at least 10-20 frames
- For each frame: the module containing the executing code, an instruction pointer as an offset from the beginning of the module, the means by which that instruction pointer was obtained, and optionally the symbolicated frame information (the symbol name, offset into the symbol, and maybe even debug line info if available)
It's not clear to me whether the above parameters should be configurable as static settings in the metric itself, or whether the metric type should have some large upper limits and the code which populates the metric can choose what's included in a particular probe (mainly choosing which threads and how many stack frames to include).
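To make the shape of those minimum requirements concrete, here is a rough sketch of the data involved. It is not a proposed Glean API: every type name, field name, and `IpOrigin` value is hypothetical, and the module in the usage lines is made up. Frames refer to modules by index, as Telemetry did.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class IpOrigin(IntEnum):
    """How the instruction pointer was obtained (hypothetical values)."""
    CONTEXT = 0        # read from the thread's register context
    FRAME_POINTER = 1  # found by walking frame pointers
    CFI = 2            # recovered from call frame information
    SCAN = 3           # guessed by scanning the stack


@dataclass
class Module:
    name: str
    version: str
    debug_id: str


@dataclass
class Symbol:
    name: str
    offset: int                 # offset into the symbol
    line: Optional[int] = None  # debug line info, if available


@dataclass
class Frame:
    module_index: int               # index into StackReport.modules
    module_offset: int              # IP as an offset from the module start
    ip_origin: IpOrigin
    symbol: Optional[Symbol] = None  # only present if symbolicated client-side


@dataclass
class Thread:
    thread_id: int
    frames: List[Frame] = field(default_factory=list)


@dataclass
class StackReport:
    modules: List[Module] = field(default_factory=list)
    threads: List[Thread] = field(default_factory=list)  # one thread or all

    def module_for(self, frame: Frame) -> Module:
        """Resolve a frame's indirect module reference."""
        return self.modules[frame.module_index]


# Made-up values, for illustration only.
report = StackReport(
    modules=[Module("example.dll", "1.0", "ABCD1234")],
    threads=[Thread(1, [Frame(0, 0x1F40, IpOrigin.CONTEXT)])],
)
first_frame = report.threads[0].frames[0]
```

Whether limits such as thread count or frame count live in the metric definition or in the code populating it is left open here, as above.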
Storage
From a data perspective, these requirements may boil down to a storage format that is fairly terse in the "not much client-side processing" case, providing module names/versions/hashes as strings, and everything else in the stack information would be numbers (thread ids, IPs, indirect module references as an optimization, an enum value relating to IP origin).
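As an illustration of that terse case, the sketch below encodes modules as strings and everything else as numbers. The JSON layout, key names, and the `encode_terse` helper are assumptions made up for this example, not an agreed format.

```python
import json


def encode_terse(modules, threads):
    """Encode stacks with strings only for module info.

    modules: list of (name, version, hash) string tuples.
    threads: list of (thread_id, frames) pairs, where each frame is an
             integer triple (module_index, module_offset, ip_origin).
    """
    return json.dumps(
        {
            "modules": [list(m) for m in modules],
            "threads": [
                {"id": tid, "frames": [list(f) for f in frames]}
                for tid, frames in threads
            ],
        },
        separators=(",", ":"),  # no whitespace, to keep the payload small
    )


# Made-up values, for illustration only.
payload = encode_terse(
    modules=[("libexample.so", "1.0", "deadbeef")],
    threads=[(7, [(0, 4096, 1), (0, 8192, 2)])],
)
```

Each frame costs three small integers here; the module strings are sent once no matter how many frames refer to them.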
When more client-side processing is done, we will necessarily need to transmit much more information as we'll be sending the symbol names and possibly debug details. While this may blow up the ping size, this information is very valuable as it:
- is the most accurate information we could possibly get,
- allows immediate action on the information (as opposed to further server-side processing to get the stack information after the fact), and
- distributes computing to lessen server-side requirements (which may in turn allow for processing more ping volume).
Comment 2•2 years ago
The threads, frames, and modules are unbounded and we should truncate excessive items and include a flag or some other indicator that they were truncated.
For example, stack overflow errors can have an excessive number of frames. In Socorro, we have a processing rule that truncates stacks exceeding 500 frames by cutting them in the middle. (bug #1763154)
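A minimal sketch of that kind of middle truncation with a flag follows. It is not Socorro's actual rule: the `truncate_middle` name and the even head/tail split are assumptions, and only the 500-frame limit comes from the comment above.

```python
def truncate_middle(frames, max_frames=500):
    """Keep the first and last frames and drop the middle.

    Returns (kept_frames, truncated), where truncated tells the
    consumer that frames were removed.
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    if len(frames) <= max_frames:
        return list(frames), False
    head = max_frames // 2     # frames kept from the top of the stack
    tail = max_frames - head   # frames kept from the bottom of the stack
    return list(frames[:head]) + list(frames[-tail:]), True


# A runaway recursion with 1000 frames, represented by their indices.
kept, truncated = truncate_middle(list(range(1000)))
```

Keeping both ends preserves the crashing frames as well as the entry point, which is usually what matters for a stack overflow. The same pattern would apply to excessive threads or modules.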
Assignee
Comment 3•2 years ago
:afranchuk, design discussions are happening in the doc attached to bug 1831905 if you'd like to take a look. I'm pretty sure the proposed `object` metric type will suit the stack traces use cases (and I use the thread frames structure as an example, to make sure of it).
Comment 4•2 years ago
(In reply to Chris H-C :chutten from comment #3)
> :afranchuk, design discussions are happening in the doc attached to bug 1831905 if you'd like to take a look. I'm pretty sure the proposed `object` metric type will suit the stack traces use cases (and I use the thread frames structure as an example, to make sure of it).
Great, that looks like it will work fine and make my life easy!
Assignee
Comment 5•10 months ago
Gonna call this dupe'd by the `object` metric.