Closed Bug 1299773 Opened 8 years ago Closed 7 years ago

Verify stack capturing and symbol fetching after landing in Nightly

Categories

(Toolkit :: Telemetry, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: gfritzsche, Assigned: yarik.sheptykin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Attachments

(1 file, 1 obsolete file)

After bug 1225851 landed, we need to:
* test this out in Nightly
* verify that the symbol fetching in about:telemetry works properly
* fix symbol fetching if neccessary
Assignee: nobody → yarik.sheptykin
My currently Nightly has stack capturing working, but the symbolication is not working.

I think we need to check whether:
* the request to the symbolication server works [1]
* whether we put the symbolicated data in the page correctly
* the stack data in the ping is correct
  - (could e.g. cross-check captured stack data with local debugger call stacks)

1: https://dxr.mozilla.org/mozilla-central/rev/cd4cdcc9ad6c45dad8b8d8c0d40e459db2bca8a1/toolkit/content/aboutTelemetry.js#1067
From a quick testing & debugging tour, this actually succeeds in symbolicating the stacks correctly:
http://imgur.com/a/CVA9C

We get a 200 from the server and readable & correct symbols.
However, something seems to go wrong with rendering it in the about:telemetry page after that.
Iaroslav, can you add an example on the kind of requests you are getting 400 rewrites for?
:bytesized or :gsvelto might be able to provide input here.
Flags: needinfo?(yarik.sheptykin)
Hey everybody!

I am getting 400 back for this request (copied from dev tools):
curl 'http://symbolapi.mozilla.org/' -H 'Host: symbolapi.mozilla.org' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Content-Type: application/json' -H 'Content-Length: 149' -H 'Connection: keep-alive' -H 'Cookie: optimizelyEndUserId=oeu1459886672615r0.43077354202821394; optimizelySegments=%7B%22245984388%22%3A%22false%22%2C%22246002457%22%3A%22search%22%2C%22246073289%22%3A%22none%22%2C%22246073290%22%3A%22ff%22%7D; optimizelyBuckets=%7B%7D; _ga=GA1.2.1787855286.1460290452' -H 'DNT: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -d '{"memoryMap":[["/home/yarik/firefox/libxul.so","10034A751B7F6D1FF285CCA7F40855AA0"]],"stacks":[[[0,34257757],[0,34257990],[0,51442722]]],"version":3}'

For testing I build up a symbolication service locally (with docker). When I hit that local server with the same request, it responds 200 but fails to find symbols. I basically get the same stacks back. :gsvelto, do you have any idea why I get 400 from http://symbolapi.mozilla.org/?
Flags: needinfo?(yarik.sheptykin) → needinfo?(gsvelto)
I haven't done much work on the symbol server myself, but Kirk (:bytesized) has, he probably knows.
Flags: needinfo?(gsvelto) → needinfo?(ksteuber)
As to the functionality of symbolapi.mozilla.com, I'm afraid I have no answers. The software has been rewritten, but is currently undergoing QA before deployment. Until the software has been updated, I am afraid I cannot speak to its behavior whatsoever and the person who can no longer works here.

Running locally, however, I can speak to. The stacks you get back are the same as the ones that went in because the server determines "/home/yarik/firefox/libxul.so" to be an invalid symbol file. I have verified that it successfully symbolicates the same request if the symbol file is "libxul.so". I consider its behavior here to be "correct".
Flags: needinfo?(ksteuber)
(In reply to Kirk Steuber [:bytesized] from comment #6)

> Running locally, however, I can speak to. The stacks you get back are the
> same as the ones that went in because the server determines
> "/home/yarik/firefox/libxul.so" to be an invalid symbol file. I have
> verified that it successfully symbolicates the same request if the symbol
> file is "libxul.so". I consider its behavior here to be "correct".

I see. I was suspecting that. Now I need to figure why my FF build includes the full path into the lib name.
(In reply to Kirk Steuber [:bytesized] from comment #6)
> As to the functionality of symbolapi.mozilla.com, I'm afraid I have no
> answers. The software has been rewritten, but is currently undergoing QA
> before deployment. Until the software has been updated, I am afraid I cannot
> speak to its behavior whatsoever and the person who can no longer works here.

Sorry Kirk, I thought your rewrite had already been deployed :)
(In reply to Kirk Steuber [:bytesized] from comment #6)
> As to the functionality of symbolapi.mozilla.com, I'm afraid I have no
> answers. The software has been rewritten, but is currently undergoing QA
> before deployment. 

Is there a bug for that we can follow to see when it is deployed?
Flags: needinfo?(ksteuber)
Never mind, bug 1330013 tracks the issue, we shouldn't need to block on the new symbolapi deployment.
Flags: needinfo?(ksteuber)
Comment on attachment 8825677 [details]
Bug 1299773: Added rendering of symbolicated stacks.

https://reviewboard.mozilla.org/r/103778/#review104492

Taking this for a quick test drive, i can confirm that the symbolication works properly, nice.
Some smaller issues below.

Clicking "Show raw data from hangs" breaks the rendering of the stacks.
Looks like the listener is using `CapturedStacks.render(gPingData)`, but it should pass the payload.

I can't seem to capture more than 1 stack:
Try capturing two stacks, "test1" & "test2", then check the raw ping data in about:telemetry. I only see "test2" showing up.
It seems that the code in aboutTelemetry.js should render this fine, so lets address this in a separate patch.

In the future it would be nice to have better rendering then formatting everything into the header, but i guess this is working for now.

::: toolkit/locales/en-US/chrome/global/aboutTelemetry.properties:50
(Diff revision 1)
>  chrome-hangs-title = Hang Report #%1$S (%2$S seconds)
>  
>  # Note to translators:
> +# - The %1$S will be replaced with the key containing alphanumeric cahracters and a '-'.
> +# - The %2$S will be replaced with the number of captured stacks for this key
> +captured-stacks-title = Stack for key "%1$S" captured %2$S time(s)

How about we just use:
> %1$S (capture count: %2$S)

Then we can avoid the slightly involved topic of localization & plural forms for now:
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals
Attachment #8825677 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #12)

> I can't seem to capture more than 1 stack:
> Try capturing two stacks, "test1" & "test2", then check the raw ping data in
> about:telemetry. I only see "test2" showing up.
> It seems that the code in aboutTelemetry.js should render this fine, so lets
> address this in a separate patch.

Now that you mention that I see this strange behavior too. What surprises me most is that performing 2 captures for different keys and calling Telemetry.snapshotCapturedStacks() produce correct output. Capturing same stacks again and going to about:telemetry displays both in the raw ping, but after refreshing the page the captures disappear! There must be something either with about:telemetry or with TelemetrySession that causes this behavior. In TelemetrySession we snapshot stacks with clearing captures. I will see if I can find something tomorrow.

> In the future it would be nice to have better rendering then formatting
> everything into the header, but i guess this is working for now.

Yeah, we should invest into a cleaner solution. I copied the rendering part from chrome hangs and noticed a couple other places which we could improve in the SymbolicationRequest. Should we make a bug for this?
Flags: needinfo?(gfritzsche)
(In reply to Iaroslav Sheptykin from comment #14)
> > In the future it would be nice to have better rendering then formatting
> > everything into the header, but i guess this is working for now.
> 
> Yeah, we should invest into a cleaner solution. I copied the rendering part
> from chrome hangs and noticed a couple other places which we could improve
> in the SymbolicationRequest. Should we make a bug for this?

Let's leave it alone for now, we can still re-visit this later.
We are thinking about re-design & re-write about:telemetry at some point, maybe it's better to address it then.
Flags: needinfo?(gfritzsche)
Comment on attachment 8825677 [details]
Bug 1299773: Added rendering of symbolicated stacks.

https://reviewboard.mozilla.org/r/103778/#review104874

"Captured Stacks" still breaks for me after clicking "Show raw data from hangs".

::: toolkit/locales/en-US/chrome/global/aboutTelemetry.properties:48
(Diff revision 2)
>  # - The %1$S will be replaced with the number of the hang
>  # - The %2$S will be replaced with the duration of the hang
>  chrome-hangs-title = Hang Report #%1$S (%2$S seconds)
>  
>  # Note to translators:
> +# - The %1$S will be replaced with the key containing alphanumeric cahracters and a '-'.

"... with the string key for this stack."

::: toolkit/locales/en-US/chrome/global/aboutTelemetry.properties:49
(Diff revision 2)
>  # - The %2$S will be replaced with the duration of the hang
>  chrome-hangs-title = Hang Report #%1$S (%2$S seconds)
>  
>  # Note to translators:
> +# - The %1$S will be replaced with the key containing alphanumeric cahracters and a '-'.
> +# - The %2$S will be replaced with the number of captured stacks for this key

"... the number of times this stack was captured."
Attachment #8825677 - Flags: review?(gfritzsche)
Comment on attachment 8825677 [details]
Bug 1299773: Added rendering of symbolicated stacks.

https://reviewboard.mozilla.org/r/103778/#review104492

> How about we just use:
> > %1$S (capture count: %2$S)
> 
> Then we can avoid the slightly involved topic of localization & plural forms for now:
> https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_and_Plurals

Good idea! But I keep the note to translators right?
Comment on attachment 8825677 [details]
Bug 1299773: Added rendering of symbolicated stacks.

https://reviewboard.mozilla.org/r/103776/#review108530

Thanks, this looks good to me with the below comments addressed.

Testing with "capturedStacks" data from Nightly, this works fine now for switching between raw & symbolized stacks.
It also handles multiple stacks in the payload fine.

::: toolkit/content/aboutTelemetry.js:1134
(Diff revision 3)
> +                              (index) => this.renderCaptureHeader(index, captures));
>    },
> +
> +  renderCaptureHeader: function CaptureStacks_renderCaptureHeader(index, captures) {
> +    let key = captures[index][0],
> +        cardinality = captures[index][2];

Please use a new declaration for this, i.e. `let cardinality = ...`.

::: toolkit/content/aboutTelemetry.js:1842
(Diff revision 3)
>        req.fetchSymbols();
>    }, false);
>  
>    document.getElementById("captured-stacks-hide-symbols").addEventListener("click",
>      function() {
> -      if (!gPingData) {
> +      if (gPingData && "payload" in gPingData) {

When is the "payload" check needed?
This rendering should only ever be triggered for main & saved-session pings, which always have a payload.

::: toolkit/locales/en-US/chrome/global/aboutTelemetry.dtd:164
(Diff revision 3)
>  <!ENTITY aboutTelemetry.fetchSymbols "
>    Fetch function names for hang stacks
>  ">
>  
>  <!ENTITY aboutTelemetry.hideSymbols "
> -  Show raw data from hangs
> +  Show raw stack data

Good catch!
You also need to change the `fetchSymbols` text above.
I couldn't seem to set r+ in ReviewBoard now, but this is supposed to be a "r+ with thesecomments addressed".
Attachment #8825677 - Flags: review?(gfritzsche) → review+
Attachment #8825677 - Attachment is obsolete: true
Comment on attachment 8831744 [details]
Bug 1299773: Added rendering of symbolicated stacks.

https://reviewboard.mozilla.org/r/108290/#review109584

Thanks!
Attachment #8831744 - Flags: review?(gfritzsche) → review+
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/autoland/rev/1789e79681dd
Added rendering of symbolicated stacks. r=gfritzsche
https://hg.mozilla.org/mozilla-central/rev/1789e79681dd
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Sorry, this doesn't work.
https://hg.mozilla.org/mozilla-central/diff/1789e79681dd/toolkit/locales/en-US/chrome/global/aboutTelemetry.dtd

If you update significantly a string, you need to update the ID.

Filing a follow-up.
Depends on: 1335655
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: