A concept for capturing and examination of stacks for Telemetry

Status: RESOLVED FIXED
Product: Toolkit
Component: Telemetry
Priority: P3
Severity: normal
Reporter: Iaroslav Sheptykin
Assignee: Iaroslav Sheptykin (mentored)
Blocks: 1 bug
Tracking flags: not tracked
Whiteboard: [measurement:client]
Attachments: 1

(Assignee)

Description

3 years ago
Created attachment 8681222 [details]
TelemetryHangsExample.png

This bug aims to clarify the requirements and to create the implementation spec for the feature requested in bug 1209131.
The core requirement of bug 1209131 is a way to capture and analyse call stacks without crashing the browser. This ability will help when developing code that involves invariants (see the description of bug 1209131).
This use case imposes two requirements onto Telemetry:
1. Capture a call stack upon a request.
2. Offer captured call stacks for analysis.

Telemetry already implements both requirements partly, through the functionality for detecting [chrome][1] and [thread][2] hangs. Telemetry implements:
1. [call stack capture][3],
2. [display of captured call stacks in about:telemetry][4], and
3. [uploading of captured call stacks][5] to Telemetry server through telemetry pings.

The example below demonstrates the structure of a single captured call stack:
```js
    // An example of a call stack.
    [
      "Startup::XRE_Main", 
      "Events::ProcessGeckoEvents", 
      "nsInputStreamPump::OnInputStreamReady", 
      "nsInputStreamPump::OnStateStop", 
      "browser/content/search/search.xml:72", 
      "gre/components/nsSearchService.js:4283", 
      "gre/modules/Task.jsm:164", 
      "self-hosted:667", 
      "(chrome script)", 
      "(chrome script)"
    ]
```
An example of how about:telemetry displays information about call stacks can be found in the attachment. 

The existing functionality, however, is internal to Telemetry and does not provide any interface to (external) consumers. 

[1]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#2507
[2]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#3017
[3]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#3960
[4]: https://dxr.mozilla.org/mozilla-central/source/toolkit/content/aboutTelemetry.js#883-960
[5]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#1261-1262
Assignee: nobody → yarik.sheptykin
Priority: -- → P3
Whiteboard: [measurement:client]
(In reply to Iaroslav Sheptykin from comment #0)
> Telemetry implements both requirements partly through the functionality for
> detecting [chrome][1] and [thread][2] hangs. Telemetry implements:
> 1. [call stack capture][3],
...
> [3]:
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> Telemetry.cpp#3960

I think that collects native stacks that need server side symbolification.
To keep the effort here down we want to go for MozStackWalk() (or re-use code that builds on top of it):
https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.h

The information we get out of this will probably inform the data format we will submit.
(Assignee)

Comment 2

3 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> I think that collects native stacks that need server side symbolification.
> To keep the effort here down we want to go for MozStackWalk() (or re-use
> code that builds on top of it):
> https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.h

Thanks for pointing that out! I assumed that ProcessedStack is the structure that Telemetry prefers.

Among the [uses of MozStackWalk][4] there is [an extension][5] that adds caching to stack information. However, it is recommended only for situations with frequent calls to MozStackWalk, because it considerably increases memory usage. I don't think this is relevant for our use case. The rest of the callers work with MozStackWalk directly. SandboxCrash[6] demonstrates how to capture a JS stack in case we need that too.

> The information we get out of this will probably inform the data format we
> will submit.

MozStackWalk gives access to the following [information][1] per stack frame via [MozDescribeCodeAddress][2]:

1. library (char[256]) -> The name of the shared library or executable
2. loffset (ptrdiff_t) -> The address's offset within that library
3. filename (char[256]) -> The name of the source file
4. lineno (unsigned long) -> Line number
5. function (char[256]) -> Function name
6. foffset (ptrdiff_t) -> Offset within that function

There is also a [function][3] for formatting this information as a string. Using this we can capture a stack as ["frame 1 Info", "frame 2 Info", ... , "frame N Info"].
The question remains as to what other information we want to store. As suggested in bug 1209131, we might want to:
1. add an identifier to this stack information,
2. add a counter for identical stacks to avoid repetitions.
Additionally how about:
1. information about capture time.
2. information about stack depth, if we plan on limiting it.
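
To make the string-per-frame idea concrete, here is a minimal sketch in JS. The frame records, their values, and the `formatFrame` helper are hypothetical; the layout it produces is an assumption for illustration and is not the output of the formatting function in StackWalk.h.

```js
// Hypothetical frame records with the six fields listed above.
// The values are made up for illustration.
const frames = [
  {
    library: "xul.dll", loffset: 0x1234,
    filename: "nsFoo.cpp", lineno: 42,
    function: "nsFoo::Bar", foffset: 0x10,
  },
  // A frame for which no symbol information was available.
  {
    library: "xul.dll", loffset: 0x5678,
    filename: "", lineno: 0,
    function: "", foffset: 0,
  },
];

const hex = (n) => "0x" + n.toString(16);

// Illustrative format only; the real formatting helper may produce
// a different layout.
function formatFrame(frame) {
  if (frame.function) {
    const where = frame.filename ? ` (${frame.filename}:${frame.lineno})` : "";
    return `${frame.function}+${hex(frame.foffset)}${where}`;
  }
  // Fall back to library + offset when the function name is unknown.
  return `${frame.library}+${hex(frame.loffset)}`;
}

// One string per frame, i.e. ["frame 1 Info", ..., "frame N Info"].
const stack = frames.map(formatFrame);
console.log(stack);
```

The fallback branch matters for the stored format: a frame without a function name still yields a usable library-plus-offset string.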


[1]: https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.h#63
[2]: https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.h#95
[3]: https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.h#145
[4]: https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3A%22MozStackWalk%28MozWalkStackCallback%2C+uint32_t%2C+uint32_t%2C+void+*%2C+uintptr_t%2C+void+*%29%22
[5]: https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsTraceRefcnt.h#29
[6]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/glue/SandboxCrash.cpp#34
(Assignee)

Comment 3

3 years ago
The Telemetry.cpp functions that construct JS for chrome and thread hangs do not seem to care much about exact timestamps [1]. Assuming there is a reason for that, I believe we should not capture such information together with stacks either.

Looking at the types of information available through a crash report as in [2], I don't see anything beyond what Telemetry is capable of capturing through main pings and the environment [3]. I therefore believe that :bholley should not need anything else for his use case. This leaves us with the following extra bits of info to capture/calculate:
1. List of captured stacks.
2. Keys for distinguishing stacks.
3. Counter of repeating stacks.

This can be represented as:
{
  "key 1": {
    "count": 2,
    "stack": ["frame 1", "frame 2", ..., "frame N"]
  },
  ...,
  "key N": {
    "count": 1,
    "stack": ["frame 1", "frame 2", ..., "frame N"]
  },
}

where "key X" is user-defined, for example "MOZ_RELEASE_ASSERT_SOME_INVARIANT_FAILED".
How does this sound, Bobby?

[1]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#2565
[2]: https://crash-stats.mozilla.com/report/index/573f2e25-8239-4145-9837-c0ba12151101
[3]: http://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/environment.html
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 4

3 years ago
(In reply to Iaroslav Sheptykin from comment #3)

> This can be represented as:
> {
>   "key 1": {
>     "count": 2,
>     "stack": ["frame 1", "frame 2", ..., "frame N"]
>   },
>   ...,
>   "key N": {
>     "count": 1,
>     "stack": ["frame 1", "frame 2", ..., "frame N"]
>   },
> }

Or in case we have more than one stack per key:
...
   "key N": [
     {
       "count": X,
       "stack": ["frame 1", "frame 2", ..., "frame N"]
     }
   ],
...
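
A minimal sketch of how the multi-stack variant could be filled in, assuming that stacks are arrays of frame strings and that two stacks count as identical when all frames match in order. `recordStack` and `sameStack` are hypothetical names, not existing Telemetry functions.

```js
// key -> array of { count, stack } entries, as in the format above.
const capturedStacks = {};

// Assumption: two stacks are identical when their frames match in order.
function sameStack(a, b) {
  return a.length === b.length && a.every((frame, i) => frame === b[i]);
}

function recordStack(key, stack) {
  const entries = capturedStacks[key] || (capturedStacks[key] = []);
  const existing = entries.find((entry) => sameStack(entry.stack, stack));
  if (existing) {
    existing.count += 1; // Repeated stack: bump the counter only.
  } else {
    entries.push({ count: 1, stack });
  }
}

recordStack("key 1", ["frame 1", "frame 2"]);
recordStack("key 1", ["frame 1", "frame 2"]);
recordStack("key 1", ["frame 1", "frame 3"]);
console.log(JSON.stringify(capturedStacks, null, 2));
```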
(Assignee)

Comment 5

3 years ago
Georg suggested capturing the bits of information about each frame separately instead of aggregating them into a string.

The data format would then look like:
{
  "string-key": [
    {
        "count": 2, // Indicates how often the "stack" appeared.
        "stack": [
            [module, offset, function-name, source-file, line-no],
            ...
        ]
    },
  ],
  ...
}

This data can be captured with a similar API:
Telemetry::captureStack("string-key");

How do you find that, vladan?
Flags: needinfo?(vladan.bugzilla)
(In reply to Iaroslav Sheptykin from comment #3)
> Looking at types of information available through a Crash Report as in [2],
> I don't see anything beyond what Telemetry is capable of capturing through main
> pings and environment [3]. I believe therefore, that :bholley should not
> need anything else to deal with his use-case. This leaves us with the
> following extra bits of info to capture/calculate:
> 1. List of captured stacks.
> 2. Keys for distinguishing stacks.
> 3. Counter of repeating stacks.
> 
> This can be represented as:
> {
>   "key 1": {
>     "count": 2,
>     "stack": ["frame 1", "frame 2", ..., "frame N"]
>   },
>   ...,
>   "key N": {
>     "count": 1,
>     "stack": ["frame 1", "frame 2", ..., "frame N"]
>   },
> }
>
> 
> Where "key X" is user defined as for example
> "MOZ_RELEASE_ASSERT_SOME_INVARIANT_FAILED".

I think the name should be more explicit about what it does, and what the performance impact is. So something like MOZ_EXPENSIVE_STACK_CAPTURE("Key"). Not sure if it's permissible for the string to be dynamic - I could live with an enum, but a string is nicer.

> How does this sound, Bobby?

So, it's quite possible for us to reach a point of interest multiple times from multiple different stacks. However, I think that capturing the stack each time we hit the point-of-interest could lead to terrible UX if we hit the trigger many times in a row, since capturing stacks is expensive and will probably lead to jank.

Given the use cases, I think it would be best to only capture the stack the first time the trigger is reached in a given session. So the Telemetry would really be "this point of interest was hit X times. Here is the stack for the first time the PoI was reached." This sounds roughly equivalent with what you're describing, but I just wanted to clarify the semantics.

Note that, if the key is dynamic, the programmer could hand-implement a poor man's multiple stack capture (if really needed) by embedding a counter into the key and incrementing that as appropriate. Another advantage of string keys.
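
The first-hit semantics could be sketched roughly as follows. This `captureStack` is a hypothetical JS stand-in, and `walkStack` is a placeholder for the expensive native stack walk; neither is an existing API. The second loop shows the counter-in-the-key trick mentioned above.

```js
// key -> { count, stack }. Only the first hit per key walks the stack.
const captures = new Map();

// `walkStack` stands in for the expensive native stack walk.
function captureStack(key, walkStack) {
  const entry = captures.get(key);
  if (entry) {
    entry.count += 1; // Cheap path: no stack walk on repeated hits.
    return;
  }
  captures.set(key, { count: 1, stack: walkStack() });
}

// Fake walker that records how often it was actually invoked.
let walks = 0;
const fakeWalk = () => {
  walks += 1;
  return ["frame 1", "frame 2"];
};

// Same point of interest hit three times: one walk, count of 3.
for (let i = 0; i < 3; i++) {
  captureStack("SOME_INVARIANT", fakeWalk);
}

// "Poor man's" multiple capture: embed a counter in a dynamic key.
for (let n = 0; n < 2; n++) {
  captureStack(`OTHER_INVARIANT#${n}`, fakeWalk);
}

console.log(walks, captures);
```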
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 7

3 years ago
(In reply to Bobby Holley (:bholley) from comment #6)

> I think the name should be more explicit about what it does, and what the
> performance impact is. So something like MOZ_EXPENSIVE_STACK_CAPTURE("Key").

Good point! But I would personally rather put a note into the documentation warning about the time cost of the stack capture. What do you think about it, Georg?

> So, it's quite possible for us to reach a point of interest multiple times
> from multiple different stacks. However, I think that capturing the stack
> each time we hit the point-of-interest could lead to terrible UX if we hit
> the trigger many times in a row, since capturing stacks is expensive and
> will probably lead to jank.
> 
> Given the use cases, I think it would be best to only capture the stack the
> first time the trigger is reached in a given session. So the Telemetry would
> really be "this point of interest was hit X times. Here is the stack for the
> first time the PoI was reached." This sounds roughly equivalent with what
> you're describing, but I just wanted to clarify the semantics.

Thanks for the clarification here, it helps a lot! Telemetry is already sensitive to session changes and will reset session data when a session change is detected (if I read [1] right). If we only have one stack per key per session, then we don't need the array of stacks. I.e.:

"string-key": {
   "count": 2, // Indicates how often capturing the stack for "string-key" was triggered.
   "stack": // The call stack as captured when "count" was 1.
   [
     [module, offset, function-name, source-file, line-no],
     ...
   ]
},

Further, we could do counting of stacks using a keyed histogram. In that case the data structure could shrink to ("string-key" -> stack), as:
{
  "string-key": [ // stack
    [module, offset, function-name, source-file, line-no],
     ...
  ],
},

But we would need to define a keyed histogram such as: 
"TELEMETRY_KEYED_STACK_CAPTURES": {
  "expires_in_version": "never",
  "kind": "count",
  "keyed": true,
  "description": "The number of times we capture stack for a given user-defined key."
},

Does it make sense, Georg?

> Note that, if the key is dynamic, the programmer could hand-implement a poor
> man's multiple stack capture (if really needed) by embedding a counter into
> the key and incrementing that as appropriate. Another advantage of string
> keys.

That's true. If this becomes a common use case, we can think about offering a convenience API in Telemetry.
Flags: needinfo?(gfritzsche)
(In reply to Iaroslav Sheptykin from comment #7)
> (In reply to Bobby Holley (:bholley) from comment #6)
> 
> > I think the name should be more explicit about what it does, and what the
> > performance impact is. So something like MOZ_EXPENSIVE_STACK_CAPTURE("Key").
> 
> Good point! But, I would personally rather put a note into the documentation
> warning about time cost of the stack capture. What do you think about it,
> Georg?

I think that is a naming detail that we can think about a bit later :)

> > So, it's quite possible for us to reach a point of interest multiple times
> > from multiple different stacks. However, I think that capturing the stack
> > each time we hit the point-of-interest could lead to terrible UX if we hit
> > the trigger many times in a row, since capturing stacks is expensive and
> > will probably lead to jank.
> > 
> > Given the use cases, I think it would be best to only capture the stack the
> > first time the trigger is reached in a given session. So the Telemetry would
> > really be "this point of interest was hit X times. Here is the stack for the
> > first time the PoI was reached." This sounds roughly equivalent with what
> > you're describing, but I just wanted to clarify the semantics.
> 
> Thanks for the clarification here, it helps a lot! Telemetry is already
> sensitive to session changes and will reset session data when a session
> change is detected (if I read [1] right). If we only have one stack per key
> per session, then we don't need the array of stacks. I.e.:
> 
> "string-key": {
>    "count": 2, // Indicates how often capturing the stack for "string-key"
> was triggered.
>    "stack": // The call stack as captured when "count" was 1.
>    [
>      [module, offset, function-name, source-file, line-no],
>      ...
>    ]
> },

That would be a nice simplification. Unless we have potential other use-cases now where we want to capture more than one stack, we should start with that.
I think we should keep the data format prepared for multiple stacks though - it's negligible overhead and then we could just easily submit more stacks later if needed.
@Vladan, does that sound good to you?

> Further, we could do counting of stacks using a keyed histogram. In that
> case the data structure could shrink to ("string-key" -> stack), as:
> [...]

I think there is no win here from counting in a keyed histogram.
I'd rather keep this together in one data structure.

> > Note that, if the key is dynamic, the programmer could hand-implement a poor
> > man's multiple stack capture (if really needed) by embedding a counter into
> > the key and incrementing that as appropriate. Another advantage of string
> > keys.
> 
> That's true. When this becomes an often use-case, we can think of offering a
> convenient API in Telemetry.

I think that would make analysis much harder then - if the capture code is simple enough we can just think about adding support there when it's needed.
Flags: needinfo?(gfritzsche)
My thoughts:
- I'm not picky about the stack representation format. The forward compatibility of comment 8 is nice
- Tracking the # of invariant violations in a keyed stack histogram would allow for easy inspection via a telemetry dash page. I don't know if Bobby needs this
- afaik MozStackWalk can't return xul.dll etc function names for official builds on Windows. It would require downloading PDBs. What's your plan for this?
Flags: needinfo?(vladan.bugzilla)
Also, SymGetModuleInfo64 isn't thread safe and I don't see MozStackWalk handling synchronization so unless I'm mistaken you'll have to handle it yourself:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681336%28v=vs.85%29.aspx
(Assignee)

Comment 11

3 years ago
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #10)

Hey Vladan! Thanks for the input!

> Also, SymGetModuleInfo64 isn't thread safe and I don't see MozStackWalk
> handling synchronization so unless I'm mistaken you'll have to handle it
> yourself:
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms681336%28v=vs.
> 85%29.aspx

If I read StackWalk.cpp right, SymGetModuleInfo64 is only used within SymGetModuleInfoEspecial64 [1], which in turn is used by MozDescribeCodeAddress [2]. MozDescribeCodeAddress does some work on thread safety [3]. Since we are planning to use MozDescribeCodeAddress, we should be safe. Or am I wrong?

In any case, this issue sounds like an implementation detail, so let's move the discussion on this to the implementation stage.

[1] https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#686
[2] https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#773
[3] https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#789-790
(Assignee)

Comment 12

3 years ago
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #9)

> - I'm not picky about the stack representation format. The forward
> compatibility of comment 8 is nice

Sounds good to me.

> - Tracking the # of invariant violations in a keyed stack histogram would
> allow for easy inspection via a telemetry dash page. I don't know if Bobby
> needs this

I would suggest then that we go for the solution without a keyed histogram. Having a counter right next to the stack seems reasonable in many ways. We could easily add histogram counting in a follow-up bug if we find value in that.

> - afaik MozStackWalk can't return xul.dll etc function names for official
> builds on Windows. It would require downloading PDBs. What's your plan for
> this?

Our thinking was to rely completely on MozStackWalk for capturing stacks. The benefit here is in keeping the implementation simple. But everywhere it fails we would also fail.
This issue sounds like a problem with MozStackWalk, which we might not want to solve inside Telemetry. Overall, it does not seem critical to me right now and we could revisit it during the implementation. I will put it into the requirements list to keep it in mind.

How does that sound, Georg?
Flags: needinfo?(gfritzsche)
(In reply to Iaroslav Sheptykin from comment #11)
> In any case, this issue sounds like an implementation detail, so let's move
> the discussion on this to the implementation stage.

(In reply to Iaroslav Sheptykin from comment #12)
> Our thinking was to rely completely on MozStackWalk for capturing stacks.
> The benefit here is in keeping the implementation simple. But everywhere it
> fails we would also fail.
> This issue sounds like a problem with MozStackWalk, which we might not want
> to solve inside Telemetry. Overall, it does not seem critical to me right
> now and we could revisit it during the implementation. I will put it into
> the requirements list to keep it in mind.

The concern here is that 90% of our users run Windows, so this implementation detail is actually a more significant issue given that MozDescribeCodeAddress would be unable to properly symbolicate stacks on the client side for 90% of our telemetry pings.
So, effectively we have to do server-side symbolification here to get meaningful data?
Or do we have alternatives?

Looks like I was misunderstanding earlier conversations here; I thought we could get away with client-side symbolification.
Flags: needinfo?(gfritzsche) → needinfo?(aklotz)
(In reply to Iaroslav Sheptykin from comment #12)
> > - Tracking the # of invariant violations in a keyed stack histogram would
> > allow for easy inspection via a telemetry dash page. I don't know if Bobby
> > needs this
> 
> I would suggest then that we go for the solution without a keyed histogram.
> Having a counter right next to the stack seems reasonable in many ways. We
> could easily add histogram counting in a follow-up bug if we find value in
> that.

Bobby, is it useful for you to inspect the counts (how often a "stack capture point" was hit) individually (without stacks) on telemetry.mozilla.org?
Or would you always want to look at the stacks anyway?
Flags: needinfo?(bobbyholley)
(In reply to Georg Fritzsche [:gfritzsche] from comment #14)
> So, effectively we have to do server-side symbolification here to get
> meaningful data?
> Or do we have alternatives?

The only way to make client side work with the current Windows implementation of MozDescribeCodeAddress is to configure dbghelp to download pdbs to the client on demand. Given the size of pdb files, this is not a good option.

Note that the Gecko profiler gets around this by taking the addresses from MozStackWalk and then hitting the snappy symbolication server to resolve them. I suppose we could do something similar on the telemetry side, probably via JS.
Flags: needinfo?(aklotz)
The counts are marginally useful - support them if they're easy, but don't bend over backwards for them.
Flags: needinfo?(bobbyholley)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #16)
> Note that the Gecko profiler gets around this by taking the addresses from
> MozStackWalk and then hitting the snappy symbolication server to resolve
> them. I suppose we could do something similar on the telemetry side,
> probably via JS.

Obviously this alternative would significantly increase traffic to the symbolication server. I'm not sure how Vladan feels about this ;-)
Flags: needinfo?(vladan.bugzilla)
I don't think it would increase the traffic significantly. It would just be a matter of symbolicating Nightly stacks that lead to one of these rare, security-sensitive MOZ_CRASH-worthy situations. We already have symbolication of chrome-hangs, which are pretty common. The Symbolication Server can handle that load (mostly) alright.
Flags: needinfo?(vladan.bugzilla)
Note that offline symbolication implies this feature needs to report library version information, same as chrome-hangs.
(Assignee)

Comment 21

3 years ago
Thanks for the input guys!

If I understand the latest discussion right this is the way of obtaining useful stacks: 
1. capture a stack with MozStackWalk
2. request symbolication with the Snappy Symbolication Server [1]

The symbolication server expects requests in a format like this [2]:

curl -d '{"stacks":[[[0,11723767],[1, 65802]]],"memoryMap":[["xul.pdb","44E4EC8C2F41492B9369D6B9A059577C2"],["wntdll.pdb","D74F79EB1F8D4A45ABCD2F476CCABACC2"]],"version":4}' http://symbolapi.mozilla.org/

The request includes an array of "stacks", which seems possible to capture with MozStackWalk. Besides stacks, it also expects a "memoryMap", which references .pdb files. I am not sure if any part of MozStackWalk provides this information. How do we obtain that? Is that something to be captured together with stacks? If I understand comment 16 correctly, dealing with .pdb files is something we want to avoid. Can we define memoryMap on the client without those files?
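
For illustration, the request body from the curl example could be assembled like this. `buildSymbolicationRequest` is a hypothetical helper; the version number 4 is taken from the example above, and rejecting module indices that do not point into `memoryMap` is an assumption of this sketch, not documented server behavior. It only builds the JSON string and sends nothing.

```js
// Builds the JSON body for a symbolication request.
// stacks:    array of stacks; each stack is an array of [moduleIndex, offset].
// memoryMap: array of [debugName, breakpadId]; moduleIndex points into it.
function buildSymbolicationRequest(stacks, memoryMap) {
  for (const stack of stacks) {
    for (const [moduleIndex] of stack) {
      // Assumption of this sketch: every frame must reference a known module.
      const valid = Number.isInteger(moduleIndex) &&
        moduleIndex >= 0 && moduleIndex < memoryMap.length;
      if (!valid) {
        throw new RangeError(`module index ${moduleIndex} is not in memoryMap`);
      }
    }
  }
  return JSON.stringify({ stacks, memoryMap, version: 4 });
}

// Same data as in the curl example.
const body = buildSymbolicationRequest(
  [[[0, 11723767], [1, 65802]]],
  [
    ["xul.pdb", "44E4EC8C2F41492B9369D6B9A059577C2"],
    ["wntdll.pdb", "D74F79EB1F8D4A45ABCD2F476CCABACC2"],
  ]
);
console.log(body);
```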

[1]: https://wiki.mozilla.org/Snappy_Symbolication_Server
[2]: https://github.com/mozilla/Snappy-Symbolication-Server/
Flags: needinfo?(vladan.bugzilla)
The PDB information is obtained by GetStackAndModules:

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp?from=GetStackAndModules#3997

Calling GetChromeHangReport will return both stack (via MozStackWalk) and PDBs (via GetStackAndModules):

https://hg.mozilla.org/mozilla-central/annotate/cc473fe5dc512c450634506f68cbacfb40a06a23/xpcom/threads/HangMonitor.cpp#l135

So you would probably want to call GetChromeHangReport. GetChromeHangReport doesn't use PDB *files*, it just returns metadata *about* the PDB files. The symbolication of the resulting stack  should happen offline on a server, like chromehangs.

Alternately, you can restrict this feature to OS X Nightlies only.
Flags: needinfo?(vladan.bugzilla)
(Assignee)

Comment 23

3 years ago
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #22)

Hey Vladan! Thanks for your explanations!

> Alternately, you can restrict this feature to OS X Nightlies only.

Considering Aaron's comment 13, this would be an unfortunate thing to do. It seems worth going for server-side symbolication.

> The PDB information is obtained by GetStackAndModules:
> 
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> Telemetry.cpp?from=GetStackAndModules#3997
> 
> Calling GetChromeHangReport will return both stack (via MozStackWalk) and
> PDBs (via GetStackAndModules):
> 
> https://hg.mozilla.org/mozilla-central/annotate/
> cc473fe5dc512c450634506f68cbacfb40a06a23/xpcom/threads/HangMonitor.cpp#l135
> 
> So you would probably want to call GetChromeHangReport. GetChromeHangReport
> doesn't use PDB *files*, it just returns metadata *about* the PDB files. The
> symbolication of the resulting stack  should happen offline on a server,
> like chromehangs.
> 

This sounds like we can reuse much of the existing code to implement this feature. Georg suggested looking into the data structure of the chrome hangs [1] to see if we can reuse it. Based on that structure we could store captured stacks as follows:

"capturedStacks": {
    "memoryMap": [
      ["wgdi32.pdb", "08A541B5942242BDB4AEABD8C87E4CFF2"],
      ["igd10iumd32.pdb", "D36DEBF2E78149B5BE1856B772F1C3991"],
      ... other entries in the format ["module name", "breakpad identifier"] ...
    ],
    "stacks": [
      [
          [
            0, // the module index or -1 for invalid module indices
            190649 // the offset of this program counter in its module or an absolute pc
          ],
          [1, 2540075],
          ... other frames ...
      ],
       ... other stacks ...
    ],

    "counts": [1, ...],
    "keys": ["string-key", ...],
}

The data in this structure can be captured as vladan described in comment 22. To capture a stack we can offer an API in the form of Telemetry::captureStack("string-key") or similar (comment 8). The implementation would be based on a combination of MozStackWalk and GetStackAndModules.
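
As a rough sketch of how this structure could be filled, assume each captured frame arrives as a hypothetical `{ module, breakpadId, offset }` record. `addCapture` and `moduleIndex` are illustrative names only. The point is that each module is stored once in `memoryMap` and frames refer to it by index.

```js
const capturedStacks = { memoryMap: [], stacks: [], counts: [], keys: [] };

// Returns the index of a module in memoryMap, adding it on first use.
function moduleIndex(module, breakpadId) {
  const map = capturedStacks.memoryMap;
  const found = map.findIndex(([name, id]) => name === module && id === breakpadId);
  if (found !== -1) {
    return found;
  }
  map.push([module, breakpadId]);
  return map.length - 1;
}

// Stores the stack on the first capture for a key, counts later ones.
function addCapture(key, frames) {
  const at = capturedStacks.keys.indexOf(key);
  if (at !== -1) {
    capturedStacks.counts[at] += 1;
    return;
  }
  capturedStacks.keys.push(key);
  capturedStacks.counts.push(1);
  capturedStacks.stacks.push(
    frames.map((f) => [moduleIndex(f.module, f.breakpadId), f.offset])
  );
}

const gdi = { module: "wgdi32.pdb", breakpadId: "08A541B5942242BDB4AEABD8C87E4CFF2" };
const igd = { module: "igd10iumd32.pdb", breakpadId: "D36DEBF2E78149B5BE1856B772F1C3991" };

addCapture("string-key", [{ ...gdi, offset: 190649 }, { ...igd, offset: 2540075 }]);
addCapture("string-key", [{ ...gdi, offset: 190649 }, { ...igd, offset: 2540075 }]);
addCapture("other-key", [{ ...gdi, offset: 42 }]);
console.log(JSON.stringify(capturedStacks, null, 2));
```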

I am trying to keep this [https://public.etherpad-mozilla.org/p/callstack-capture] etherpad up to date with our intermediate results.

If this sounds more or less reasonable, I could start prototyping an implementation in a separate bug. What do you think, Georg?

[1] http://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/main-ping.html#chromehangs
Flags: needinfo?(gfritzsche)
Sorry for the delay, I was off sick last week.
This looks mostly good to me!

I think we need to change the format a little as we discussed off-line because currently we can't handle two keys having the same stack.
An alternative format could be e.g. instead of "keys":
> "captures": [["string-key", stack-index, count], ...

After that I'd love to see a prototype going.
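
The alternative "captures" format could be sketched like this. It assumes stacks are compared by their serialized frames, so two keys that hit the same stack share one entry in `stacks`. `addCapture` and `stackIndexFor` are hypothetical names.

```js
const stacks = [];   // Unique stacks, each an array of [moduleIndex, offset].
const captures = []; // Entries of [key, stackIndex, count].

// Returns the index of an identical stored stack, storing it if new.
function stackIndexFor(stack) {
  const wanted = JSON.stringify(stack);
  const found = stacks.findIndex((s) => JSON.stringify(s) === wanted);
  if (found !== -1) {
    return found;
  }
  stacks.push(stack);
  return stacks.length - 1;
}

function addCapture(key, stack) {
  const existing = captures.find((capture) => capture[0] === key);
  if (existing) {
    existing[2] += 1; // Key already captured: count only.
    return;
  }
  captures.push([key, stackIndexFor(stack), 1]);
}

addCapture("key-a", [[0, 190649], [1, 2540075]]);
addCapture("key-b", [[0, 190649], [1, 2540075]]);
addCapture("key-a", [[0, 190649], [1, 2540075]]);
console.log(JSON.stringify({ stacks, captures }));
```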
Flags: needinfo?(gfritzsche)
(Assignee)

Updated

3 years ago
Blocks: 1225851
I think the work here is done, we can follow up on further questions on the implementation bug.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED