Closed Bug 1424760 Opened 7 years ago Closed 6 years ago

TypeError: data.slices_list is undefined[Learn More] GCTelemetry.jsm:149:1

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
thunderbird_esr52 --- unaffected
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 --- wontfix
firefox60 --- fixed

People

(Reporter: wtds.trabalho, Assigned: pbone)

References

Details

(Keywords: regression)

Attachments

(7 files, 2 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20171209103450

Steps to reproduce:

After Restore tabs (And after update):

TypeError: data.slices_list is undefined[Learn More]  GCTelemetry.jsm:149:1

  for (let slice of data.slices_list) {
    limitProperties(slice, MAX_SLICE_KEYS);
    limitProperties(slice.times, MAX_PHASES);
  }


Actual results:

Error in:

  for (let slice of data.slices_list) {
    limitProperties(slice, MAX_SLICE_KEYS);
    limitProperties(slice.times, MAX_PHASES);
  }


Expected results:

No errors.

Thanks!
Component: Untriaged → Telemetry
Product: Firefox → Toolkit
Hey Paul, can you take a look?
Flags: needinfo?(pbone)
I think that what's happening is that I miscounted when setting the MAX_GC_KEYS constant and some telemetry items, particularly if GC is busy (when restoring many tabs) and because the limit is exceeded then the slices_list field is deleted and this code crashes, while the code above the MAX_GC_KEYS check does not crash.
Assignee: nobody → pbone
Status: UNCONFIRMED → ASSIGNED
Depends on: 1412729
Ever confirmed: true
Flags: needinfo?(pbone)
Keywords: regression
Priority: -- → P2
limitProperties would reject payloads with the maximum number of properties,
when it should accept these and only reject when they exceed the maximum.
Attachment #8937063 - Flags: review?(gfritzsche)
By numbering each of the properties we can easily check that we do not
exceed the limit in GCTelemetry.jsm.  Note that this is not relevent for the
nursery since it does not send telemetry.
Attachment #8937064 - Flags: review?(mgaudet)
+ Do not continue preparing the ping if limitProperties removed all the
   properties.
 + Log limitProperties failures.

Hi Georg. There's an XXX comment in this diff where I'd like your feedback/advice.

Thanks.
Attachment #8937065 - Flags: feedback?(gfritzsche)
Comment on attachment 8937064 [details] [diff] [review]
Bug 1424760 (Part 2) - Number each of the GC telemetry properties

Review of attachment 8937064 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, count looks good.

Looking at this I do wonder if there's a way to convert this to an assert or static_assert.
Attachment #8937064 - Flags: review?(mgaudet) → review+
> 
> Looking at this I do wonder if there's a way to convert this to an assert or
> static_assert.

I considered that, at least an assert. But it looked like more code/trouble than it was worth since I'd probably overload the JSONPrinter class to do that.
Comment on attachment 8937063 [details] [diff] [review]
Bug 1424760 (Part 1) - Fix an off-by-one error

Review of attachment 8937063 [details] [diff] [review]:
-----------------------------------------------------------------

Easy & clear fix, makes sense.
Attachment #8937063 - Flags: review?(gfritzsche) → review+
Comment on attachment 8937065 [details] [diff] [review]
Bug 1424760 (Part 3) - Try to make GC telemetry pings more robust

Review of attachment 8937065 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/GCTelemetry.jsm
@@ +124,5 @@
>      }
> +    // XXX: Georg, how do I make a warning that would raise an error in our
> +    // testing environments but maybe only log something on users' systems?
> +    console.log("Number of properties exceeded in the GC telemetry " +
> +        name + " ping");

Good question. I'm not aware off-hand of a mechanism that allows generic test failures outside of a specific test.

Do we need this though?
Maybe it's just ok if this is occasionally cut off?
You could look at the incoming data to see what percentage of submissions actually hits the limit.

For error logging, i suggest to set this up the same as we do in other Telemetry components and use `.warn()`:
- https://dxr.mozilla.org/mozilla-central/rev/5d203926da51a7e949a20818664b19d5b115572d/toolkit/components/telemetry/TelemetryEnvironment.jsm#833
- https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Log.jsm#Basic_usage
Attachment #8937065 - Flags: feedback?(gfritzsche)
Flags: needinfo?(pbone)
(In reply to Georg Fritzsche [:gfritzsche] [away Dec 22 - Jan 2] from comment #10)
> Comment on attachment 8937065 [details] [diff] [review]
> Bug 1424760 (Part 3) - Try to make GC telemetry pings more robust
> 
> Review of attachment 8937065 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/GCTelemetry.jsm
> @@ +124,5 @@
> >      }
> > +    // XXX: Georg, how do I make a warning that would raise an error in our
> > +    // testing environments but maybe only log something on users' systems?
> > +    console.log("Number of properties exceeded in the GC telemetry " +
> > +        name + " ping");
> 
> Good question. I'm not aware off-hand of a mechanism that allows generic
> test failures outside of a specific test.
> 
> Do we need this though?

It can be very handy, we do this in the C++ (but without logging) and it is useful.  Since you don't know I'll just use Log.jsm like you suggest.

> Maybe it's just ok if this is occasionally cut off?

Yeah, that's okay, but it really shouldn't be, so an assertion during testing would help, but looks like it wouldn't have caught this particular case.

> You could look at the incoming data to see what percentage of submissions
> actually hits the limit.

Until 59 I don't think we hit the limit, and this is the only reported case of the limit being reached (it crashed because a field was not found after the limit was exceeded.

> For error logging, i suggest to set this up the same as we do in other
> Telemetry components and use `.warn()`:
> -
> https://dxr.mozilla.org/mozilla-central/rev/
> 5d203926da51a7e949a20818664b19d5b115572d/toolkit/components/telemetry/
> TelemetryEnvironment.jsm#833

Thanks.
Flags: needinfo?(pbone)
Attachment #8937065 - Attachment is obsolete: true
Attachment #8939740 - Flags: review?(gfritzsche)
Comment on attachment 8939740 [details] [diff] [review]
Bug 1424760 (Part 3) - Try to make GC telemetry pings more robust

Review of attachment 8939740 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good.
Does test coverage exist for both the expected code path and for going over the limit?

::: toolkit/components/telemetry/GCTelemetry.jsm
@@ +152,5 @@
>  
>    data.slices_list.sort((a, b) => a.slice - b.slice);
>  
> +  if (!limitProperties("data", data, MAX_GC_KEYS, log))
> +    return;

Please always use braces around controlled statements.

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style
(In reply to Georg Fritzsche [:gfritzsche] from comment #13)
> Comment on attachment 8939740 [details] [diff] [review]
> Bug 1424760 (Part 3) - Try to make GC telemetry pings more robust
> 
> Review of attachment 8939740 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks good.
> Does test coverage exist for both the expected code path and for going over
> the limit?

I don't think so.  it would need unit tests to create bogus GC marker data.  SpiderMonkey itself should never exceed this limit, I think that would be a bug in SpiderMonkey.

Are there such unit tests for the telemetry code?  toolkit/components/telemetry/GCTelemetry.jsm is not suitable because it uses SpiderMonkey to generate the data (which is good, it tests that the data makes sense).
Flags: needinfo?(gfritzsche)
I don't we have a suitable test file already, but you could add one to the Telemetry unit tests:
- create a basic toolkit/components/telemetry/tests/unit/test_GCTelemetry.js
- fire "garbage-collection-statistics" with odd data at it?
- make sure things work as expected
Flags: needinfo?(gfritzsche)
Priority: P2 → P1
Hey Paul, were you planning to look into tests here?
Should i cancel the review request to me for now?
Flags: needinfo?(pbone)
I'm writing a very basic test now, just to make sure I test the new code paths.  I'd like to file a seperate bug for someone (I guess me?) to write more complete tests later.

Cancel the review request, I'll upload an updated patch later.

Thanks.
Flags: needinfo?(pbone)
Attachment #8939740 - Flags: review?(gfritzsche)
toolkit/components/telemetry/tests/unit/test_TelemetryGC.js:
toolkit/components/telemetry/tests/unit/xpcshell.ini:
    Add new unit test.

toolkit/components/telemetry/GCTelemetry.jsm:
    Add an extra method to make testing easier.

js/src/gc/Statistics.cpp:
    Add a reference to the unit test for anyone modifying this code.
Attachment #8941730 - Flags: review?(gfritzsche)
+ Log limitProperties failures.
 + Don't remove all the properties from the main GC object, some are needed
   later, just reduce it to the 7 necessary properties.
Attachment #8939740 - Attachment is obsolete: true
Attachment #8941731 - Flags: review?(gfritzsche)
Comment on attachment 8941730 [details] [diff] [review]
Bug 1424760 (Part 3) - Add a TelemetryGC unit test

Review of attachment 8941730 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks good to me with the below addressed.

::: toolkit/components/telemetry/GCTelemetry.jsm
@@ +196,4 @@
>  
> +  // We expose this method so unit tests can call it, no need to test JSON
> +  // parsing.
> +  observe_raw(data) {

We use camelStyle in JS.
(With older tests etc. having some style differences.)

::: toolkit/components/telemetry/tests/unit/test_TelemetryGC.js
@@ +47,5 @@
> +}
> +
> +function assert_num_entries(expect, clear) {
> +  let entries = GCTelemetry.entries("main", clear);
> +  Assert.equal(expect, entries.worst.length, expect + "worst entries");

Nit: There is no space between the number and string.
Ditto below.
Attachment #8941730 - Flags: review?(gfritzsche) → review+
Attachment #8941731 - Flags: review?(gfritzsche) → review+
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/aee5e05a7e0c
(Part 1) - Fix an off-by-one error r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/e65bb33f2ae0
(Part 2) - Number each of the GC telemetry properties r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4ecd0f0fc8d
(Part 3) - Add a TelemetryGC unit test r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f1f863ea198
(Part 4) - Try to make GC telemetry pings more robust r=gfritzsche
I see the following log lines cluttering my log every couple of seconds or so:

> 1516349582494	Toolkit.Telemetry	TRACE	GCTelemetry::limitProperties
> 1516349582495	Toolkit.Telemetry	WARN	GCTelemetry::Number of properties exceeded in the GC telemetry data ping
> 1516349582495	Toolkit.Telemetry	TRACE	GCTelemetry::limitProperties

Is this expected? I'm on the latest Nightly on Windows 10, x64.
Flags: needinfo?(pbone)
No, that's not expected.  I'll revisit this today or tomorrow.
Status: RESOLVED → REOPENED
Flags: needinfo?(pbone)
Resolution: FIXED → ---
(In reply to Paul Bone [:pbone] from comment #25)
> No, that's not expected.  I'll revisit this today or tomorrow.

Please let me know if you need support reproducing it, as I'm able to :)
Hi Georg

How can I enable elemetry in my local build so I can test this?

in about:config toolkit.telemetry.enabled is "true" but has "Locked" as its status.

Thanks.
(In reply to Paul Bone [:pbone] from comment #27)
> Hi Georg
> 
> How can I enable elemetry in my local build so I can test this?
> 
> in about:config toolkit.telemetry.enabled is "true" but has "Locked" as its
> status.

That's already "true" so you don't need to do anything: see https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/internals/preferences.html#id1 .

Long story short: that pref is set to true and locked on nightly, aurora, beta and local dev builds. It's false on Release.
Okay, I was worried about that.  I can reproduce the problem in my normal every-day nightly & its profile.  I cannot reproduce it in the dev build.  I'll try to work out why.
I thought that I fixed an off-by-one error in part 1 of this patch series,
but I didn't, I introduced one.  I'm not sure why the original code didn't
work.

This patch also adds more testing (to avoid off-by-ones) and better logging
that may help if the original problem comes back.
Attachment #8945336 - Flags: review?(gfritzsche)
Comment on attachment 8945335 [details] [diff] [review]
Bug 1424760 (Part 5) - refactoring in test_TelemetryGC.js

Sorry for the delay Paul.
Redirecting due to my week being full here.
Attachment #8945335 - Flags: review?(gfritzsche) → review?(alessio.placitelli)
Attachment #8945336 - Flags: review?(gfritzsche) → review?(alessio.placitelli)
NP Georg,

I noticed the delay but didn't know who else to ask.

Thanks.
Attachment #8945335 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8945336 [details] [diff] [review]
Bug 1424760 (Part 6) - Fixup the regression in part 1

Review of attachment 8945336 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for tackling this. So, this is basically adding more logs to track down the problem, correct?
Attachment #8945336 - Flags: review?(alessio.placitelli) → review+
(In reply to Alessio Placitelli [:Dexter] from comment #34)
> Comment on attachment 8945336 [details] [diff] [review]
> Bug 1424760 (Part 6) - Fixup the regression in part 1
> 
> Review of attachment 8945336 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for tackling this. So, this is basically adding more logs to track
> down the problem, correct?

That's right.  my original fix just made things (slightly) worse. and I've checked the code here and the corresponding code in the JavaScript GC and I can't see how this limit is being crossed. Hopefully with extra logging we'll get a new clue.  But we're not getting automated reports for this either, our good users such as Wellington Torrejais da Silva are reporting them.
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0defe55f6d76
(Part 5) - refactoring in test_TelemetryGC.js r=Dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/1dbd8efeedbf
(Part 6) - Fixup the regression in part 1 r=Dexter
https://hg.mozilla.org/mozilla-central/rev/0defe55f6d76
https://hg.mozilla.org/mozilla-central/rev/1dbd8efeedbf
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
This is still a problem in the lastest nightly.

There are 25 properties appearing in the JSON when the GC could only possibly write 24 fields.  I will investigate why there are 25.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
For 59 this is not important.  The result is that we're getting slightly less good telemetry when the GC is very busy.
Target Milestone: mozilla59 → mozilla60
Comment on attachment 8947412 [details] [diff] [review]
Bug 1424760 (Part 7) - Don't insert an extra property into the GC telemetry

Review of attachment 8947412 [details] [diff] [review]:
-----------------------------------------------------------------

r+ assuming this was manually tested as well :)
Attachment #8947412 - Flags: review?(alessio.placitelli) → review+
Yes,

I finally found the documentation that allowed me to reproduce this in my workspace (rather than just in Nightly): https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/internals/preferences.html and so I was able to quickly find and fix the problem, and test the fix.
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1671742389a7
(Part 7) - Don't insert an extra property into the GC telemetry r=Dexter
https://hg.mozilla.org/mozilla-central/rev/1671742389a7
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Depends on: 1453890
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: