Report which keyed scalars fail to accumulate due to running out of keys

RESOLVED FIXED in Firefox 65

Status

()

enhancement
P3
normal
RESOLVED FIXED
a year ago
6 months ago

People

(Reporter: chutten, Assigned: smurfd, Mentored)

Tracking

Trunk
mozilla65
Points:
---

Firefox Tracking Flags

(firefox65 fixed)

Details

(Whiteboard: [good next bug][lang=c++])

Attachments

(2 attachments, 10 obsolete attachments)

(Reporter)

Description

a year ago
It is possible that a keyed scalar could hit a limit we imposed in the number of keys allowed (defaults to 100[1]). This limit might not be hit in development or test, and could affect the integrity of the data we collect. We should report which scalars hit those limits and how often.

I'm thinking a keyed uint scalar, telemetry.keyed_scalars_exceed_limit, where keys are scalar names.

This is sorta-kinda a Telemetry Health measure in the category of Client Integrity.

Since Scalar names are public already[2] this shouldn't have disclosure or publication problems... but that'll be covered more in the Data Collection Review.

[1]: https://searchfox.org/mozilla-central/rev/f860c2bf00576959b07e2d9619c7b5088b0005be/toolkit/components/telemetry/TelemetryScalar.cpp#90
[2]: https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Scalars.yaml
(Reporter)

Comment 1

a year ago
To help Mozilla out with this bug, here's the steps:

0) Comment here on the bug that you want to volunteer to help. I (or someone else) will assign it to you.
1) Download and build the Firefox source code: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Simple_Firefox_build
- If you have any problems, please ask on IRC (https://wiki.mozilla.org/Irc) in the #introduction channel. They're there to help you get started.
- You can also read the Developer Guide, which has answers to most development questions: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Introduction
2) Start working on this bug. You will be adding a new Keyed uint Scalar, so you'll want to read https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/start/adding-a-new-probe.html and probably also https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/scalars.html. 
- If you have any problems with this bug, please comment on this bug and set the needinfo flag for me. Also, you can find me and my teammates on the #telemetry channel on IRC (https://wiki.mozilla.org/Irc) most hours of most days.
3) Build your change with `mach build` and test your change with `mach test toolkit/components/telemetry/tests/`. Also check your changes for adherence to our style guidelines by using `mach lint`
4) Submit the patch (including an automated test, check the Adding a New Probe doc for how to write one) for review. Mark me as a reviewer so I'll get an email to come look at your code.
- Here's the guide: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch
- We will also need Data Collection Review as we'll be adding a measurement to Firefox: https://wiki.mozilla.org/Firefox/Data_Collection
5) After a series of reviews and changes to your patch, I'll mark it for checkin or push it to autoland. Your code will soon be shipping to Firefox users worldwide!
6) ...now you get to think about what kind of bug you'd like to work on next. Let me know what you're interested in and I can help you find your next contribution.
Mentor: chutten
Priority: -- → P3
Whiteboard: [good next bug][lang=c++]

Comment 2

a year ago
I want to contribute.Please give me the references related to this bug.
(Reporter)

Comment 3

a year ago
I'm not sure what references you mean?
Flags: needinfo?(akshay2gud)

Comment 4

a year ago
Sir it will be my first contribution to an organization. So I don't know how to start on this bug. Sir if any kind of documentation is available please provide me the link. It will be a great help.
Flags: needinfo?(akshay2gud)
(Reporter)

Comment 5

a year ago
Work on this bug will involve adding a keyed uint scalar, so you'll probably want to read the "adding a new probe" guide[1] to learn about adding new data measurements to Firefox, and the "scalars" documentation[2] so you know what a scalar is and what I mean by "keyed uint".

I've provided a lot of additional information in Comment#1 that might be helpful to read through.

[1]: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/start/adding-a-new-probe.html
[2]: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/scalars.html

Comment 6

a year ago
Thank You Sir. I will start working on it.
(Reporter)

Updated

11 months ago
Assignee: nobody → joberts.ff
Status: NEW → ASSIGNED
(Reporter)

Comment 7

11 months ago
jason, do you need any additional help getting started with this one?
Flags: needinfo?(joberts.ff)
(Reporter)

Comment 8

11 months ago
Unassigning due to activity. Jason, if you'd like to pick this back up just let me know.
Flags: needinfo?(joberts.ff)
(Reporter)

Updated

11 months ago
Assignee: joberts.ff → nobody
Status: ASSIGNED → NEW
I have a few questions:

For the Scalars.yaml entry, when should it expire, and what process(es) would it fall under?

For the implementation, how do I get the string of the name of the scalar, should I call it everywhere a TooManyKeys error might occur, and what value should I use (a counter)? 

I guess the test(s) would involve making sure that telemetry notification fires for every method I add it to. Is there a method for registering them (like xpcshell) so the test suite knows it exists?

Thanks in advance
Flags: needinfo?(chutten)
(Reporter)

Comment 10

11 months ago
These are all excellent questions.

It shouldn't expire, since keyed scalars will continue to exist forever.
It should only accumulate on the "main" process as it's the one that actually accumulates to scalars (the other processes just send arrays of instructions over IPC). It checks the limit here: [1]
Be sure to list my email in the alert_emails field, as it'll need an ongoing monitor.

I think you'll be implementing this in C++, so you'll be able to use the ScalarID enum instead of the string (like so: [2]). You can call "ScalarAdd" on it so you don't have to worry about storage (we'll take care of it)

The tests will involve accumulating too many keys to a test keyed scalar and then checking that your new scalar counts the right number of overflows. You can probably add your checks to the test_keyed_max_keys[3] test.

[1]: https://searchfox.org/mozilla-central/rev/83a923ef7a3b95a516f240a6810c20664b1e0ac9/toolkit/components/telemetry/TelemetryScalar.cpp#862
[2]: https://searchfox.org/mozilla-central/rev/83a923ef7a3b95a516f240a6810c20664b1e0ac9/dom/plugins/ipc/PluginModuleParent.cpp#2510
[3]: https://searchfox.org/mozilla-central/rev/83a923ef7a3b95a516f240a6810c20664b1e0ac9/toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js#498
Flags: needinfo?(chutten)
Thanks! I'll take this.
Assignee: nobody → me
(Reporter)

Comment 12

10 months ago
How's it going? Anything I can help with?
Flags: needinfo?(me)
(In reply to Chris H-C :chutten from comment #12)
> How's it going? Anything I can help with?
My bad, I did most of the work a few days after taking the bug, but I've been busy with work and am currently on vacation and never got around to submitting the patch. I'll try and push it around next weekend.
Flags: needinfo?(me)
I don't have the time to see this through, so I'll let it go (and maybe come back to it another time)
Assignee: me → nobody
(Reporter)

Comment 15

9 months ago
No problem!
(Assignee)

Comment 16

9 months ago
I'd like to give this a go...
Assignee: nobody → smurfd
Mentor: jrediger
Hey Nicklas, Chris is currently out of office. I'll cover for him.
I assigned you the bug. If you have any questions ping me here or on IRC (janerik).
(Assignee)

Comment 18

9 months ago
Thanks Jan-erik

So i have edited the Scalar.yaml file and added an entry under telemetry: following the same structure like the rest with the data ive seen mentioned in the comments above... like kind: uint, expires: "never" and so on.


added a ScalarAdd in this if-statement, in TelemetryScalar.cpp
  if (mScalarKeys.Count() >= mMaximumNumberOfKeys) {
    ScalarAdd(mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT, 1);

Then when i got it to build i thought id try to run the tests, just to see what the output would be. 
It started looping it seemed, complaining about the process still active.. not sure though if it has todo with somehting i did now or before :)

Anywho, guess you can do something like subtracting mMaximumNumberOfKeys - mScalarKeys.Count() in the above if-statement, when you know it has exceeded the max number to get the number of scalarkeys over the max.

Then what? i feel like im missing some major part of it :)
Flags: needinfo?(jrediger)
You're already on the right track there. 
The current unmodified tests should run fine, especially if you haven't touched them at all. How did you run the tests?

The subtraction will not be needed at all, all we want to record is that an overflow happened (actually, |mMaximumNumberOfKeys - mScalarKeys.Count| should always be 0 if the above if condition holds, as we don't store more if the limit is reached once).
Unfortunately the solution will be a bit more involved, as we want to record the overflow per scalar name, but we currently don't have a way to actually get to the scalar name from inside the |KeyedScalar| object. 
I'll free some time to come up with a rough solution there and get back to you ASAP.
Flags: needinfo?(jrediger)
So I took another look and this indeed needs a bit more design first to decide where and how we get the scalar name back so we can record the overflow.
I'll hand this over to Chris again, he'll be back next week and can take it from there. Unfortunately that means you're currently blocked on this, but maybe we can find something else for you to work on.
(Assignee)

Comment 21

9 months ago
Thanks for having a look 

i ran my test like : mach test toolkit/components/telemetry/tests/

okey its not a problem, i guess once he have a look at this he will comment and we take it from there...
Flags: needinfo?(chutten)
(Reporter)

Comment 22

9 months ago
Sounds like we need the name in the KeyedScalar object. If we replace the KeyedScalar's mScalarKind with a reference to the BaseScalarInfo we can avoid storage costs -and- get a reference to the scalar name.
Flags: needinfo?(chutten)
(Assignee)

Comment 23

9 months ago
Thanks Chris, think i follow. Looking into it.
(Assignee)

Comment 24

9 months ago
How long does it usually take to run: mach test toolkit/components/telemetry/tests/   ?
It takes over 1h for me and ends in basicly saying some of the tests fail due to timeout. 
Thats even if i have not added my code...

Anywho i Think i have been able to get the name of the scalar, that breaches mMaximumNumberOfKeys in this if-statement.
https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryScalar.cpp#871

Can i somehow trigger above if-statement without running the test-suites? (like from running ./mach run, and maby via BrowserConsole do some magic?) or should i create a specific test to trigger it? just to see that my printf-debugging gets hit and shows me the name ;)
Flags: needinfo?(chutten)
(Reporter)

Comment 25

8 months ago
mach test toolkit/components/telemetry/tests/ runs a lot of tests, some of them are mochitests[1] which require focus be set on the window under test, which is irritating and may lead to timeouts/failures if you try and use the machine while running them.

It's really not the greatest, but it works well in continuous integration :S

I recommend trying `mach test toolkit/components/telemetry/tests/unit/` which will run only the unit tests. It'll run them in parallel, and needs only the terminal window. On my machine (which is decently nice, but some years old now) they take 3 minutes or so. 

Once you are writing your own test, say in test_TelemetryScalars.js, you can specify just that file to mach test and it'll run nice and quickly (and give you more detailed output).

As for TelemetryScalar.cpp#871, that indeed is the condition we're interested in. You might not have luck recording the failure in that function, though, as I'm pretty sure we hold the lock when that function is called. You might have to detect that the return code is ScalarResult::TooManyKeys further out after we release the lock (maybe in [2] where we log it to the console) because we can't acquire the lock to accumulate while we hold the lock trying to accumulate :)

Or maybe it'll be fine there, I'm not sure. I say write a test to find out :)

[1]: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Mochitest
[2]: https://searchfox.org/mozilla-central/rev/51268dcbdff0f6f4a5cff7986df0f616efc5bcfd/toolkit/components/telemetry/TelemetryScalar.cpp#1019
Flags: needinfo?(chutten)
(Assignee)

Comment 26

8 months ago
Hey Chris
hm still issue with running `mach test toolkit/components/telemetry/tests/unit/` 
it comes to toolkit/components/telemetry/tests/unit/test_PingSender.js but then it seems to go into a loop.
First i thought it was because i had added these two lines to my .mozconfig file.
MOZ_TELEMETRY_REPORTING=1
MOZILLA_OFFICIAL=1

Removed them and rebuilt, still ... i also have the same issue with a freshly grabbed Repo mozilla-central from hg.mozilla.org
i CAN run a test on the single test_TelemetryScalars.js and i see my function being run!

In the other bug i am looking at i was asked to submit a  Progress patch, for the Mentor to more easily be able to help. Hope that's allright here aswell!?
(Assignee)

Comment 27

8 months ago
Please see my previous comment
Flags: needinfo?(chutten)
(Reporter)

Comment 28

8 months ago
Comment on attachment 9000068 [details] [diff] [review]
progress-patch-bug1451813.patch

Review of attachment 9000068 [details] [diff] [review]:
-----------------------------------------------------------------

This is good work so far! I have some comments, most of which are about whitespace that crept into this WIP patch and probably would be cleaned up before review anyway so I feel a little weird even mentioning them :S

test_PingSender.js runs fine in automation, so I wonder what could be different for you locally that's causing this oddity. For now proceed by just running the necessary test file and we can look into why test_PingSender.js loops as a problem separate from developing this change. Though maybe it has to do with the lock.

The lock comment's the most relevant one here. I think we need to be careful here with threading, and we can do that by moving our check just a couple of levels up. What do you think about that approach?

::: toolkit/components/telemetry/Scalars.yaml
@@ +1362,5 @@
> +  keyed_scalars_exceed_limit:
> +    bug_numbers:
> +      - 1451813
> +    description: >
> +      Checking if the max number of Scalars is reached.

For keyed measures like this one it is helpful if the description mentions what it is keyed by. So this could be "The number of times keyed scalars exceeded the number of keys limit, keyed by scalar name"

@@ +1366,5 @@
> +      Checking if the max number of Scalars is reached.
> +    expires: "never"
> +    kind: uint
> +    keyed: true
> +    notification_emails: 

You should remove the end-of-line whitespace.

@@ +1367,5 @@
> +    expires: "never"
> +    kind: uint
> +    keyed: true
> +    notification_emails: 
> +      - smurfd@gmail.com

Use my email address (chutten@mozilla.com) for the notification_emails here. I'll take care of questions and monitoring.

@@ +1934,4 @@
>      record_in_processes:
>        - 'content'
>  
> +

This extra whitespace appears to have been added accidentally?

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +728,5 @@
>    ScalarKeysMapType mScalarKeys;
>    const uint32_t mScalarKind;
>    uint32_t mMaximumNumberOfKeys;
> +  const BaseScalarInfo *info;
> +  

Excess whitespace we can omit.

@@ +887,4 @@
>    }
>  
>    if (mScalarKeys.Count() >= mMaximumNumberOfKeys) {
> +    static StaticMutex gTelemetryScalarsMutex;

We might already hold the lock at this point (see for instance [1] which calls [2] which calls here). We might need to catch this case further out (maybe store the return values around [1] and then check to see if it was TooManyKeys), or to pass the lock down this far so we can use it (we'd want to add proof of locking to all the KeyedScalar methods then, which would make this a larger patch).

[1]: https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/toolkit/components/telemetry/TelemetryScalar.cpp#1664
[2]: https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/toolkit/components/telemetry/TelemetryScalar.cpp#736

@@ +890,5 @@
> +    static StaticMutex gTelemetryScalarsMutex;
> +    StaticMutexAutoLock locker(gTelemetryScalarsMutex); 
> +    nsAutoCString sName = GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT);
> +    
> +//    printf("sName = %s\n", ToNewCString(sName));

You could use sName.get() to get a pointer to the underlying buffer.

::: toolkit/components/telemetry/TelemetryScalar.h
@@ +9,4 @@
>  #include "nsTArray.h"
>  #include "mozilla/TelemetryScalarEnums.h"
>  #include "mozilla/TelemetryProcessEnums.h"
> +#include "mozilla/Telemetry.h"

Not necessary to include this here
(Reporter)

Updated

8 months ago
Flags: needinfo?(chutten)
(Assignee)

Comment 29

8 months ago
I have addressed your white space points, and no no, please mention them :)
Changed to your email and changed description in Scalars.yaml

Went ahead and modified the Scalars functions in TelemetryScalar.cpp to have the lock argument.
Feels like that is the way to go, since its just within TelemetryScalar.cpp they are used.
That way i can reuse the lock from earlier stages in the code.. 
Hopefully i have not generalized it too much ...


Jup one of the reasons why atleast the line-break white space was in there, was because i was looking on another bug at the same time. So i had to cut and paste some after generating the patch. 

Tried to get around that by grabbing a fresh m-c mercurial repo. Add bookmarks for each thing im looking at and commit to each bookmark during my progress, then as an example : 'hg update --check bookmark1' to switch to one bookmark, do work, commit, hg update --check bookmark2 ... do work, commit, 'hg update --check main' as is my untouched bookmark ...That seems to work quite well. Until i was now going to grab this patch.

Sorry, not sure where to ask this, have gone through a ton of sites and MDN without any luck finding the answer 
have like 7 commits to one bookmark and would like to have that in one patch. How?!

Found this as an example 
https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmozilla/workflows.html
Sadly no mention on howto generate a patch after your commits :(


hg diff > file.patch gives me only the uncommitted data
hg export -o file.patch gives me my last commit

I could ofcourse go to each changeset and 'hg export -o file1.patch' to get 7 patces... 

There Must be some better way :)
Flags: needinfo?(chutten)
(Reporter)

Comment 30

8 months ago
New contributors are encouraged to use Phabricator for code review. You can find the user guide here: https://moz-conduit.readthedocs.io/en/latest/phabricator-user.html

There should be people on the IRC channels #introduction and #developers who will be able to help you with further questions. We have an IRC guide here: https://wiki.mozilla.org/Irc
Flags: needinfo?(chutten)
(Assignee)

Comment 31

8 months ago
Posted patch Bug1451813_180827_0043.patch (obsolete) — Splinter Review
Sorted out my Mercurial issues via help from #developers.
Sorry for bothering you with that kindof stuff...

Adding a progress patch where i have added the StaticMutexAutoLock 
to the KeyedScalar functions.

Not sure about the name we get though, if its right or not. 
Intentionally left a printf, to see the name it gets.

telemetry.keyed_scalars_exceed_limit , Is that right?!
Attachment #9000068 - Attachment is obsolete: true
Flags: needinfo?(chutten)
(Reporter)

Comment 32

8 months ago
Comment on attachment 9004111 [details] [diff] [review]
Bug1451813_180827_0043.patch

Review of attachment 9004111 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good. The scalar name is very descriptive. I have two notes:

1) Looks like you're going to start recording to your new scalar soon. Take a look at TelemetryScalar::SummarizeEvent for an example of how to accumulate to scalars from within TelemetryScalar.cpp. Keep in mind that your new scalar itself is a keyed scalar that could have more than 100 keys in it, so you'll need a little extra code to avoid an infinite loop.

2) We can save ourselves some effort by storing the ScalarInfo inside the KeyedScalar. We construct KeyedScalars here[1], passing in the scalar's kind. Instead of passing just the kind, we could pass a pointer to the info and then we'd still have the kind (mInfo->kind) but we'd also have easy access to the name (mInfo->name()) whenever we wanted. This then makes later changes to KeyedScalars easy as they'll always have a pointer to their info handy. (and it won't take much more space than storing the kind).

That being said, your existing solution is perfectly fine. It's not wasteful and it'll be very quick once we get the string types correct. If you'd prefer to keep it the way it is, I'm okay with that, too. But then we should probably move the utility function out of KeyedScalar:: and into internal_ next to internal_GetEnumByScalarName

[1]: https://searchfox.org/mozilla-central/rev/e126996d9b0a3d7653de205898517f4f5b632e7f/toolkit/components/telemetry/TelemetryScalar.cpp#1574

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +696,4 @@
>  
>    // Set, Add and SetMaximum functions as described in the Telemetry IDL.
>    // These methods implicitly instantiate a Scalar[*] for each key.
> +  ScalarResult SetValue(const nsAString& aKey, nsIVariant* aValue, const StaticMutexAutoLock& locker);

Convention in this file is to have the locker be the first parameter. Could you reorder these to match?

@@ +731,3 @@
>  };
>  
> +nsAutoCString

Returning nsAutoCStrings is not really something done in the codebase. (I apologize. Strings in mozilla-central are hard. I have to look them up every time.)

In this case the pointer returned from ScalarInfo::name is a pointer to a buffer that will outlive the string we want to return from this function. Since the string can _depend_ on the buffer, what we likely want is a const nsDependentCString.

`return nsDependentCString(mInfo.name());` should be fine in this case.

@@ +732,5 @@
>  
> +nsAutoCString
> +KeyedScalar::GetScalarNameByEnum(const StaticMutexAutoLock& lock, mozilla::Telemetry::ScalarID aId)
> +{
> +  nsAutoCString kName;

`k` is the prefix for static constants. This can just be `name`

@@ +735,5 @@
> +{
> +  nsAutoCString kName;
> +
> +  ScalarKey uniqueId{static_cast<uint32_t>(aId), false};
> +  const BaseScalarInfo& mInfo = internal_GetScalarInfo(lock, uniqueId);

'm' is the prefix for object members. Local variables have no prefix, so this should be something closer to `info`.
(Reporter)

Comment 33

8 months ago
Oh, and no worries about mercurial stuff! I only wish I could be more helpful with those (I use git through an adaptor, so my mercurial knowledge is next-to-none)
Flags: needinfo?(chutten)
(Assignee)

Comment 34

8 months ago
(In reply to Chris H-C :chutten from comment #32)
> 1) Looks like you're going to start recording to your new scalar soon. Take
> a look at TelemetryScalar::SummarizeEvent for an example of how to
> accumulate to scalars from within TelemetryScalar.cpp. Keep in mind that
> your new scalar itself is a keyed scalar that could have more than 100 keys
> in it, so you'll need a little extra code to avoid an infinite loop.

Have addressed everything you mentioned, accept the above.  
Do you mean that, If we reach mMaximumNumberOfKeys [1]
We should get the scalar name, and add values to this new scalar. so like scalar->AddValue , but if we add more than 100 that would itself hit [1] and we are looping. 
So we check if we reach maximum number of scalars for this new scalar.. then we send a Warning and dont add it or?

[1] https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryScalar.cpp#871
Flags: needinfo?(chutten)
(Reporter)

Comment 35

8 months ago
Correct. If we have more than 100 keyed histograms with more than 100keys, our new "telemetry.keyed_scalars_exceed_limit" scalar will also exceed its key limit... which will call into the code to handle keys exceeding the limit which will try to add the key which may loop forever. (or at least it'll be confusing).

You can warn if you'd like, or we can just let the fact that "telemetry.keyed_scalars_exceed_limit" has 100 keys act as an indicator that we've reached the limit and that other keyed histograms may have exceeded the limit but we were unable to record them.
Flags: needinfo?(chutten)
(Assignee)

Comment 36

7 months ago
hehe not sure if i got more sure or more confused.

After this : 
  if (mScalarKeys.Count() >= mMaximumNumberOfKeys) {
    nsDependentCString sName = nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT));

we then do something like : 
    ScalarKeysMapType mExceedScalarKeys;
    ScalarBase* scalarExceed = nullptr;
    mExceedScalarKeys.Get(sName, &scalarExceed);

    scalarExceed->AddValue()  // How Do we get the value to add?
    if (mExceedScalarKeys.Count() >= mMaximumNumberOfKeys) {
      return ScalarResult::TooManyKeys;
    }
Flags: needinfo?(chutten)
(Reporter)

Comment 37

7 months ago
Well, we want 

if(mScalarKeys.Count() >= mMaximumNumberOfKeys && {nameOfKeyedScalar != "telemetry.keyed_scalars_exceed_limit"})
Flags: needinfo?(chutten)
(Assignee)

Comment 38

7 months ago
Just fyi. Have the last couple of days figured out why, when i tried to run 'mach test toolkit/components/telemetry/tests/unit/' it hanged for me. It has something todo with that i am running a Linux distribution not supported by Bootstrapping, 
so im guessing im missing some package installed or configuration made by the bootstrap. That causes me not to be able to create a fake webserver
(Reporter)

Comment 39

7 months ago
Hm, that's odd. Well, luckily I don't think the Scalar tests need pingserver, and we can run your changes on try[1] before we land them to make sure the tests that -do- need it are still running okay.

Are you okay to keep going with this?

[1]: https://treeherder.mozilla.org/#/jobs?repo=try
(Assignee)

Comment 40

7 months ago
I have a Ubuntu VM i can run it in, so its allright. Yeah i think i can keep going.

If we do something like?

  if (mScalarKeys.Count() >= mMaximumNumberOfKeys && 
      nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT)) != "telemetry.keyed_scalar_exceed_limit") {

   // here we add data to the scalar something like i mentioned in Comment 36?
   
  }

Or is it enough to just return ScalarResult::TooManyKeys?
(Reporter)

Comment 41

7 months ago
I think there's a `.EqualsLiteral()` method on ns*String types that would be useful here, but yes.
(Assignee)

Comment 42

7 months ago
  if (mScalarKeys.Count() >= mMaximumNumberOfKeys &&
      !nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT)).EqualsLiteral("telemetry.keyed_scalar_exceed_limit")) {

    ScalarBase* scalarExceed = nullptr;
    mScalarKeys.Get(nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT)), &scalarExceed);

    // Get the scalar value.
    nsCOMPtr<nsIVariant> scalarExceedValue = nullptr;
    scalar->GetValue(scalarExceedValue);

    scalarExceed->AddValue(scalarExceedValue);

    return ScalarResult::TooManyKeys;
  }
-------
I did like that and it compiles all fine. Yet a unmodified test_TelemetryScalars.js fails for some reason

--
 0:09.12 INFO (xpcshell/head.js) | test test_keyed_max_keys pending (2)
 0:09.15 pid:29044 ExceptionHandler::GenerateDump cloned child 29065
 0:09.15 pid:29044 ExceptionHandler::SendContinueSignalToChild sent continue signal to child
 0:09.15 pid:29044 ExceptionHandler::WaitForContinueSignal waiting for continue signal...
 0:09.31 TEST_END: Test FAIL, expected PASS. Subtests passed 71/71. Unexpected 0 - xpcshell return code: -4
--

My thinking is that we need to get the value from 'scalar' to add to our new Exceeded scalar. Right?
Flags: needinfo?(chutten)
(Reporter)

Comment 43

7 months ago
ExceptionHandler means the test crashed. My guess is that either scalarExceedValue or scalar is null... or, since scalarExceed is a keyed scalar you need to provide a key as part of the AddValue call.


Oh, and !nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT)).EqualsLiteral("telemetry.keyed_scalar_exceed_limit") is an interesting condition. If I'm not mistaken you might be checking that telemetry.keyed_scalar_exceed_limit != telemetry.keyed_scalar_exceed_limit

I think you want to check if the name of the keyed scalar who is being asked to add a key (`this`) happens to EqualsLiteral(...)

I might be able to help more if you post your in-progress patch. We'll get this!
Flags: needinfo?(chutten)
(Assignee)

Comment 44

7 months ago
Posted patch Bug1451813_180924_2120.patch (obsolete) — Splinter Review
Ofcourse.

I agree it must have to do with 'scalar' being empty or something similar...
Tried to address both your points.

also tried after i had exported this patch to add 'scalar = internal_ScalarAllocate(mScalarKind);' above the if-statement, and comment away the scalar = *aRet;

yes i have poked some at the test, but just tried to re-test with the original test_TelemetryScalars.js 

But no luck..
Please have a look
Attachment #9004111 - Attachment is obsolete: true
Flags: needinfo?(chutten)
(Reporter)

Comment 45

7 months ago
Comment on attachment 9011580 [details] [diff] [review]
Bug1451813_180924_2120.patch

Review of attachment 9011580 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +883,4 @@
>      return ScalarResult::Ok;
>    }
>  
> +  if (mScalarKeys.Count() >= mMaximumNumberOfKeys && !aKey.EqualsLiteral("telemetry.keyed_scalar_exceed_limit")) {

There's a typo here, I think it's "keyed_scalars_exceed_limit" (notice the plural "scalars")

@@ +885,5 @@
>  
> +  if (mScalarKeys.Count() >= mMaximumNumberOfKeys && !aKey.EqualsLiteral("telemetry.keyed_scalar_exceed_limit")) {
> +    ScalarBase* scalarExceed = nullptr;
> +    nsDependentCString sName = nsDependentCString(GetScalarNameByEnum(locker, mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT));
> +    mScalarKeys.Get(sName, &scalarExceed);

Instead of getting the name and using that to get scalarExceed, could you use internal_GetKeyedScalarByEnum directly? Then with the KeyedScalar for telemetry.keyed_scalars_exceed_limit in scalarExceed we can do a simpler

scalarExceed->AddValue(mScalarInfo.name(), 1);

And that'll make it so telemetry.keyed_scalars_exceed_limit will have a key whose name is the keyed scalar who exceeded the limit, and whose value is 1 more than it was just a moment ago.

Is this making sense?
(Reporter)

Updated

7 months ago
Flags: needinfo?(chutten)
(Assignee)

Comment 46

7 months ago
Yeah that makes sense. Sadly though, i cant seem to use the internal_GetKeyedScalarByEnum from within KeyedScalar::GetScalarForKey

error: use of undeclared identifier 'internal_GetKeyedScalarByEnum'
nsresult rv = internal_GetKeyedScalarByEnum(lock, uniqueId, aProcessOverride, &scalar);

I only can see it being used in either internal_ and TelemetryScalar:: functions.

Seems however like we suspected that it is the 'scalar' that is null causing testfail.
ill keep digging though...
(Reporter)

Comment 47

7 months ago
It's probably because it's further down the file, so the compiler doesn't know it exists at that point. We can put a forward declaration[1] of the function just above the function where we want to use it to promise to the compiler that there is a function that exists like that, it just needs to keep reading to find it.

[1]: https://en.wikipedia.org/wiki/Forward_declaration
(Assignee)

Comment 48

7 months ago
Ahh that i should have figured out.
Progress! It became like below. Hm realize that we dont need the GetScalarNameByEnum function. Should i keep it or remove?

    KeyedScalar* scalarExceed = nullptr;

    ScalarKey uniqueId{static_cast<uint32_t>(mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT), false};
    const BaseScalarInfo& mScalarInfo = internal_GetScalarInfo(locker, uniqueId);

    ProcessID aProcessOverride = ProcessID::Parent;
    nsresult rv = internal_GetKeyedScalarByEnum(locker, uniqueId, aProcessOverride, &scalarExceed);

    if (NS_FAILED(rv)) {
      return ScalarResult::InvalidType; //There are probably some better return value to use?
    }

    scalarExceed->AddValue(locker, NS_ConvertUTF8toUTF16(mScalarInfo.name()), 1);



... On to the testing. in test_TelemetryScalars.js i tried to do the following


First i tried to use KEYED_UINT_SCALAR and do the below, then it would say about 100 times that: Keyed scalars can not have more than 100 keys. With the below it would only say that once.
And i get that(i think), the KEYED_UINT_SCALAR has already added 99 keys. So when asked to add 99 more, it will complain
(I mean, the test passes but it would write 99 INFO rows)
 
Why it would only say it once with KEYED_EXCEED_SCALAR is because we brake it off, right?  

  // Generate the names for the exceeded keys
  let keyNamesSet2 = new Set();
  for (let k = 0; k < 100; k++) {
    keyNamesSet2.add("key2_" + k);
  }

  // Add 100 keys to an histogram and set their initial value.
  valueToSet = 100;
  keyNamesSet2.forEach(keyName => {
    Telemetry.keyedScalarSet(KEYED_EXCEED_SCALAR, keyName, valueToSet++);
  });
Flags: needinfo?(chutten)
(Reporter)

Comment 49

7 months ago
It probably depends on precisely how you implemented the code. It should still print the warning about too many keys the same number of times with and without your changes. If it did anything different that could be confusing... or it might even be an indication that there's something wrong.

Maybe throw your work up in another patch so I can take a look at the whole thing?
Flags: needinfo?(chutten)
QA Contact: gfritzsche
(Assignee)

Comment 50

7 months ago
Posted patch Bug1451813_181004_2350.patch (obsolete) — Splinter Review
ofcourse
Attachment #9011580 - Attachment is obsolete: true
(Reporter)

Comment 51

7 months ago
Comment on attachment 9014888 [details] [diff] [review]
Bug1451813_181004_2350.patch

Review of attachment 9014888 [details] [diff] [review]:
-----------------------------------------------------------------

Seems as though we only have one thing to work out in the main code and then it's just cleanup and tests.

::: toolkit/components/telemetry/Scalars.yaml
@@ +1577,5 @@
> +    release_channel_collection: opt-out
> +    record_in_processes:
> +      - 'main'
> +
> +  keyed_scalars_exceed_limit:

We only need the one definition, I think. These appear to be duplicated.

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +689,5 @@
>  public:
>    typedef mozilla::Pair<nsCString, nsCOMPtr<nsIVariant>> KeyValuePair;
>  
> +  explicit KeyedScalar(const BaseScalarInfo& info) 
> +    : mScalarKind(info.kind)

Now that we're passing in info, we can store it as mScalarInfo (instead of storing just the kind). That'll make mScalarInfo already have the name of the Scalar.

(( We'll have to tidy up anyone who uses mScalarKind to instead use mScalarInfo.kind ))

@@ +723,5 @@
>  
>  private:
>    typedef nsClassHashtable<nsCStringHashKey, ScalarBase> ScalarKeysMapType;
> +  const nsDependentCString
> +    GetScalarNameByEnum(const StaticMutexAutoLock& lock, mozilla::Telemetry::ScalarID aId);

Yup, it seems as though this is no longer used in the patch and can be omitted.

@@ +893,5 @@
> +  if (mScalarKeys.Count() >= mMaximumNumberOfKeys && !aKey.EqualsLiteral("telemetry.keyed_scalars_exceed_limit")) {
> +    KeyedScalar* scalarExceed = nullptr;
> +
> +    ScalarKey uniqueId{static_cast<uint32_t>(mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT), false};
> +    const BaseScalarInfo& mScalarInfo = internal_GetScalarInfo(locker, uniqueId);

We'll be able to remove this.

@@ +895,5 @@
> +
> +    ScalarKey uniqueId{static_cast<uint32_t>(mozilla::Telemetry::ScalarID::TELEMETRY_KEYED_SCALARS_EXCEED_LIMIT), false};
> +    const BaseScalarInfo& mScalarInfo = internal_GetScalarInfo(locker, uniqueId);
> +
> +    ProcessID aProcessOverride = ProcessID::Parent;

Can just call it `process`. I don't think we're overriding the process here, just specifying it.

@@ +902,5 @@
> +    if (NS_FAILED(rv)) {
> +      return ScalarResult::InvalidType;
> +    }
> +
> +    scalarExceed->AddValue(locker, NS_ConvertUTF8toUTF16(mScalarInfo.name()), 1);

This will be the name "telemetry.keyed_scalars_exceed_limit" instead of being the name of the KeyedScalar our users are trying to accumulate too many keys to.

::: toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js
@@ +535,5 @@
>      Assert.equal(keyedScalars[KEYED_UINT_SCALAR][keyName], expectedValue++,
>                   "The key must contain the expected value.");
>    });
> +
> +  // Generate the names for the exceeded keys

First thing we should do is check that (KEYED_UINT_SCALAR in keyedScalars[KEYED_EXCEED_SCALAR]) (because we accumulated 101 keys to it).

@@ +546,5 @@
> +  valueToSet = 0;
> +  keyNamesSet2.forEach(keyName => {
> +    Telemetry.keyedScalarSet(KEYED_EXCEED_SCALAR, keyName, valueToSet++);
> +  });
> +

After this we should assert that KEYED_EXCEED_SCALAR has exactly 100 keys. (it should be missing the last of keyNamesSet2's keys because it already has KEYED_UINT_SCALAR when we started).

This will test that we don't infinitely loop when we run out of space on KEYED_EXCEED_SCALAR.
QA Contact: gfritzsche
(Assignee)

Comment 52

7 months ago
Posted patch Bug1451813_181005_2358.patch (obsolete) — Splinter Review
With some luck this should do the trick, and hopefully no excessive white spaces :)
Flags: needinfo?(chutten)
(Reporter)

Updated

6 months ago
Attachment #9014888 - Attachment is obsolete: true
Flags: needinfo?(chutten)
(Reporter)

Comment 53

6 months ago
Comment on attachment 9014963 [details] [diff] [review]
Bug1451813_181005_2358.patch

Review of attachment 9014963 [details] [diff] [review]:
-----------------------------------------------------------------

Really close, for sure.

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +688,4 @@
>  public:
>    typedef mozilla::Pair<nsCString, nsCOMPtr<nsIVariant>> KeyValuePair;
>  
> +  explicit KeyedScalar(const BaseScalarInfo& info) 

Alas, there's a bit of whitespace here.

@@ +883,5 @@
> +    ProcessID process = ProcessID::Parent;
> +    nsresult rv = internal_GetKeyedScalarByEnum(locker, uniqueId, process, &scalarExceed);
> +
> +    if (NS_FAILED(rv)) {
> +      return ScalarResult::InvalidType;

Let's go with ScalarResult::TooManyKeys for this one, too.

@@ +886,5 @@
> +    if (NS_FAILED(rv)) {
> +      return ScalarResult::InvalidType;
> +    }
> +
> +    scalarExceed->AddValue(locker, NS_ConvertUTF8toUTF16("telemetry.keyed_scalars_exceed_limit"), 1);

Oh, we don't want it this way. We want the key to be mScalarInfo.name(). We want to record the name of the keyed scalar that wants to have too many keys.

::: toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js
@@ +538,5 @@
> +
> +  // Check that KEYED_EXCEED_SCALAR is in keyedScalars
> +  Assert.ok((KEYED_EXCEED_SCALAR in keyedScalars),
> +            "We have exceeded maximum number of Keys.");
> +

We also want to ensure the KEYED_UINT_SCALAR is in keyedScalars[KEYED_EXCEED_SCALAR] and that its value is 1 to make sure we recorded that KEYED_UINT_SCALAR tried to record 1 too many keys.
(Assignee)

Comment 54

6 months ago
Whitespaces should be called Ghostspaces(TM pending Haha) they keep appearing ;). Fixed the code part.

For the test to work, like i think we want, i had to remove these two lines
https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js#517-518
//  Telemetry.keyedScalarSet(KEYED_UINT_SCALAR, LAST_KEY_NAME, 1);
//  Telemetry.keyedScalarSetMaximum(KEYED_UINT_SCALAR, LAST_KEY_NAME, 10)

Otherwise keyedScalars[KEYED_EXCEED_SCALAR][KEYED_UINT_SCALAR] would equal 3.

This should then be correct, right?

  // Check that KEYED_UINT_SCALAR is in keyedScalars and its value equals 1
  Assert.ok((KEYED_UINT_SCALAR in keyedScalars[KEYED_EXCEED_SCALAR]), "The keyed Scalar is in the keyed exceeded scalar");
  Assert.equal(keyedScalars[KEYED_EXCEED_SCALAR][KEYED_UINT_SCALAR], 1, "We have exactly 1 key over the limit");
(Reporter)

Comment 55

6 months ago
We can assert it's 3, then. Even better!
(Assignee)

Comment 56

6 months ago
Posted patch Bug1451813_181012_2317.patch (obsolete) — Splinter Review
Noticed a while ago that the folder structure for telemetry/ had changed some, that TelemetryScalar.cpp moved into a core/ folder. So because of that and hm that i somehow screwed up my mercurial here is a new patch that hopefully does not reintroduce some old issues. Does not Look like that, and hopefully fixes all issues. Please have a look
Attachment #9014963 - Attachment is obsolete: true
Flags: needinfo?(chutten)
(Reporter)

Comment 57

6 months ago
Comment on attachment 9016848 [details] [diff] [review]
Bug1451813_181012_2317.patch

Review of attachment 9016848 [details] [diff] [review]:
-----------------------------------------------------------------

You're right, this does address more or less all of the concerns. Well done!

I did just notice a small bug that we'll need to address, and two really small things.

::: toolkit/components/telemetry/core/TelemetryScalar.cpp
@@ +874,4 @@
>      return ScalarResult::Ok;
>    }
>  
> +  if (mScalarKeys.Count() >= mMaximumNumberOfKeys && !aKey.EqualsLiteral("telemetry.keyed_scalars_exceed_limit")) {

Oh, I just noticed that this condition will create a new key with string "telemetry.keyed_scalars_exceed_limit" on any keyed scalar regardless of whether it hit the maximum number of keys or not.

We need to break this condition up so that every time mScalarKeys.Count() > mMaximumNumberOfKeys we return TooManyKeys.

So maybe something like

if (mScalarKeys.Count() >= mMaximumNumberOfKeys) {
  if (aKey.EqualsLiteral("telemetry.keyed_scalars_exceed_limit")) {
    return ScalarResult::ToManyKeys;
  }
  ...the rest of the block
}

::: toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js
@@ +516,4 @@
>    const LAST_KEY_NAME = "overflowing_key";
>    Telemetry.keyedScalarAdd(KEYED_UINT_SCALAR, LAST_KEY_NAME, 10);
>    Telemetry.keyedScalarSet(KEYED_UINT_SCALAR, LAST_KEY_NAME, 1);
> +  Telemetry.keyedScalarSetMaximum(KEYED_UINT_SCALAR, LAST_KEY_NAME, 10)

Lost a semicolon on this line

@@ +545,5 @@
> +  for (let k = 0; k < 100; k++) {
> +    keyNamesSet2.add("key2_" + k);
> +  }
> +
> +  // Add 100 keys to an histogram and set their initial value.

replace "an histogram" with "the keyed exceed scalar"
(Reporter)

Comment 58

6 months ago
Now, since we're adding new data collection to Firefox we need to request Data Collection Review[1]. Since I'm the one asking for this collection, I'll take care of filling out the form and stuff. Just letting you know ahead of time so you aren't confused about what I'm doing :)

[1]: https://wiki.mozilla.org/Firefox/Data_Collection
Flags: needinfo?(chutten)
(Reporter)

Comment 59

6 months ago
Posted file data review request
Attachment #9017182 - Flags: review?(francois)
(Assignee)

Comment 60

6 months ago
Posted patch Bug1451813_181015_1940.patch (obsolete) — Splinter Review
Allright. Closing in. This should fix these pointers :) 

It was good that you noticed the missing semicolon row. It should not have been there. 
I had removed the two lines in the test to get keyedScalars[KEYED_EXCEED_SCALAR][KEYED_UINT_SCALAR] equal 1, committed. Then submitted progress-patch of that, but then when it was okey that it could be 3 aswell, then i re-added them, made a new commit and those lines where added to the patch :/


Ahh yeah remember reading about those forms when i started to look at this bug...
Attachment #9016848 - Attachment is obsolete: true
Comment on attachment 9017182 [details]
data review request

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate? 

Yes, in Scalars.yaml.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, telemetry setting.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Yes, :chutten.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 1.

5) Is the data collection request for default-on or default-off?

Default ON.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, permanent.
Attachment #9017182 - Flags: review?(francois) → review+
(Reporter)

Comment 62

6 months ago
Comment on attachment 9017264 [details] [diff] [review]
Bug1451813_181015_1940.patch

Review of attachment 9017264 [details] [diff] [review]:
-----------------------------------------------------------------

Patch looks good to me (though the first hunk of the patch file is corrupt. It says 34 affected lines but it should have 20 affected lines). I've sent it up to our "try server" architecture to make sure it builds and passes the tests. We'll follow along here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc35f71fa8ebab2f6ebbd894b7ada9b097fa491e

If it comes out green, I'll mark this checkin-needed and we'll be done!

It might be a good time to start thinking about what you'd like to work on next. Have you any ideas?
Attachment #9017264 - Flags: review+
(Assignee)

Comment 63

6 months ago
Hussah! :)

Hm see alot of green, but some orange, so hmm ... 

Anywho one thing i will have to look into is getting a solid mercuiral workflow. Feels abit like black magic and voodo mixed together at this point ;) I Do have looked into Phabricator, but im not sure it would help me, but its a discussion for #developers though.

I have another patch im gonna start looking at soon, 1370224 and supplied a progress patch for a bug that causes openSUSE linux to not have bootstrap functionallity... but hey, if you had something in mind, please let me know :)
(Reporter)

Comment 64

6 months ago
Seems as though we're getting crashes when test_TelemetryScalars.js is run on a debug build. That usually means we're violating an assert someplace.

If we look at the log we can get the stack trace of the crashing thread. Go here [1] and search for "application crashed" and you should find something that looks like 

> [task 2018-10-16T14:57:22.534Z] 14:57:22     INFO -   0  libxul.so!(anonymous namespace)::KeyedScalar::AddValue(mozilla::BaseAutoLock<mozilla::AnyStaticMutex> const&, nsTSubstring<char16_t> const&, unsigned int) [TelemetryScalar.cpp:dc35f71fa8ebab2f6ebbd894b7ada9b097fa491e : 799 + 0x0]

This means the crash happened in KeyedScalar::AddValue on line 799 of TelemetryScalar.cpp. That line looks like this:

> 799     MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");

That's interesting. I didn't think we'd assert on that. And, in fact, in the JS versions of these APIs (the ones that take nsIVariants) we don't assert. Instead we just return the TooManyKeys error code and continue along.

So. What is the correct way to proceed. I'm not sure. Let's ask Alessio who implemented this about the choice to assert here and not assert on the JS-facing ones, and what the intent is.

[1]: https://taskcluster-artifacts.net/HaZ5SS2XSXKb9ddcEs2sPw/0/public/logs/live_backing.log
Flags: needinfo?(alessio.placitelli)
(Assignee)

Comment 65

6 months ago
okidoki, gonna try to build a local debug build and see if i can get the test to crash aswell... 
But could it not be so that, since we have been poking at what happens when a scalar gets too many keys, this assertion is not correct anymore?!
(In reply to Chris H-C :chutten from comment #64)
> This means the crash happened in KeyedScalar::AddValue on line 799 of
> TelemetryScalar.cpp. That line looks like this:
> 
> > 799     MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");
> 
> That's interesting. I didn't think we'd assert on that. And, in fact, in the
> JS versions of these APIs (the ones that take nsIVariants) we don't assert.
> Instead we just return the TooManyKeys error code and continue along.

This was intentional (see bug 1277806 comment 17 for context): when using the JS API, we most probably have access to the browser console so we can notify devs through it if something goes wrong. With the C++ API is a bit different, as it might be a bit more complicated to access the browser console. We decided to assert on errors in c++ debug builds (so only asserting to devs) and communicate the error through the console with the JS API. Both are meant as ways for developers to catch this error early during development rather than capturing weird data and re-iterate on the code again.

This doesn't mean it can't be changed: since we're about to report which keyed scalars fail using telemetry, I think it should be ok to get rid of this (and related) asserts.
Flags: needinfo?(alessio.placitelli)
(Reporter)

Comment 67

6 months ago
Let's go with not asserting on TooManyKeys. Something well commented like:

// Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
if (sr != ScalarResult::Ok && sr != ScalarResult::TooManyKeys) {

We do still want to assert on other non-Ok cases because telemetry.keyed_scalars_exceed_limit doesn't cover those cases.

This'll be needed for all the KeyedScalar:: calls that have a void return.

Sound like a plan, Nicklas?
(Assignee)

Comment 68

6 months ago
Posted patch Bug1451813_181017_2155.patch (obsolete) — Splinter Review
Yeah that sounds like a plan. Attaching patch to fix that.

BUT, for some reason now the test fails locally. It fails on Telemetry.keyedScalarSet, unless i change the test to have 99 (like i did below, not in the patch) instead of 100, then it passes...What would be the reason for that? It had earlier taken one of the Assertion-if-statements that we changed now, so now it doesnt find that one, and times out?

The error it shows is :
 0:07.50 pid:13279 ExceptionHandler::GenerateDump cloned child 13297
 0:07.50 pid:13279 ExceptionHandler::SendContinueSignalToChild sent continue signal to child
 0:07.50 pid:13279 ExceptionHandler::WaitForContinueSignal waiting for continue signal...


----
  // Generate the names for the exceeded keys
  let keyNamesSet2 = new Set();
  for (let k = 0; k < 99; k++) {
    keyNamesSet2.add("key2_" + k);
  }

  // Add 99 keys to the keyed exceed scalar and set their initial value.
  valueToSet = 0;
  keyNamesSet2.forEach(keyName2 => {
    Telemetry.keyedScalarSet(KEYED_EXCEED_SCALAR, keyName2, valueToSet++);
  });

  // Check that there are exactly 99 keys in KEYED_EXCEED_SCALAR
  Assert.equal(valueToSet, 99,
                "The keyed scalar must contain all the 99 keys.");
----
Attachment #9017264 - Attachment is obsolete: true
Flags: needinfo?(chutten)
(Reporter)

Comment 69

6 months ago
Comment on attachment 9018039 [details] [diff] [review]
Bug1451813_181017_2155.patch

Review of attachment 9018039 [details] [diff] [review]:
-----------------------------------------------------------------

This is why we have tests, so they can catch things like these :D

::: toolkit/components/telemetry/core/TelemetryScalar.cpp
@@ +770,5 @@
>    ScalarBase* scalar = nullptr;
> +  ScalarResult sr = GetScalarForKey(locker, aKey, &scalar);
> +
> +  // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +  if (sr != ScalarResult::Ok && sr != ScalarResult::TooManyKeys) {

I foresee a problem here. In the event the ScalarResult is TooManyKeys we will try to call SetValue on scalar. Scalar is initialized to nullptr, and since we early-return from GetScalarForKey it is likely still nullptr.

We're probably dereferencing a null below here.

This means we need to make these int 2-level if blocks

if (sr != ScalarResult::Ok) {
  // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
  if (sr != ScalarResult::TooManyKeys) {
    MOZ_ASSERT(...);
  }
  return;
}

@@ +784,4 @@
>  {
>    ScalarBase* scalar = nullptr;
> +  ScalarResult sr = GetScalarForKey(locker, aKey, &scalar);
> + 

whitespace ghost
(Reporter)

Updated

6 months ago
Flags: needinfo?(chutten)
(Assignee)

Comment 70

6 months ago
Posted patch Bug1451813_181019_2145.patch (obsolete) — Splinter Review
Yeah that fixed things, now it passes the test locally. 
haha whitespaces. gonna need to increase the color of my whitespace characters :)
Attachment #9018039 - Attachment is obsolete: true
(Reporter)

Comment 71

6 months ago
Comment on attachment 9018703 [details] [diff] [review]
Bug1451813_181019_2145.patch

Review of attachment 9018703 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me, but I've now been looking at it enough times that I might be missing something. Hey :janerik, what do you think about this?
Attachment #9018703 - Flags: review?(jrediger)
(Assignee)

Comment 73

6 months ago
Hmm that was a scary movie ;) 

I see errors, but, there are bug numbers assigned to them, so im guessing that has not todo with this, right? 
Time to cheer perhaps?
Comment on attachment 9018703 [details] [diff] [review]
Bug1451813_181019_2145.patch

Review of attachment 9018703 [details] [diff] [review]:
-----------------------------------------------------------------

The code looks good to me. Some minor nits regarding the now-incorrect assert messages and the test needs to be adjusted to test the correct thing.

::: toolkit/components/telemetry/core/TelemetryScalar.cpp
@@ +773,4 @@
>    if (sr != ScalarResult::Ok) {
> +    // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +    if(sr != ScalarResult::TooManyKeys) {
> +      MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");

The message is not fully correct anymore. It won't trigger for too many keys anymore.

@@ +790,4 @@
>    if (sr != ScalarResult::Ok) {
> +    // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +    if (sr != ScalarResult::TooManyKeys) {
> +      MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");

As above: need to adjust the message.

@@ +807,4 @@
>    if (sr != ScalarResult::Ok) {
> +    // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +    if (sr != ScalarResult::TooManyKeys) {
> +      MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");

As above: need to adjust the message.

@@ +824,4 @@
>    if (sr != ScalarResult::Ok) {
> +    // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +    if(sr != ScalarResult::TooManyKeys) {
> +      MOZ_ASSERT(false, "Key too long or too many keys are recorded in the scalar.");

As above: need to adjust the message.

::: toolkit/components/telemetry/tests/unit/test_TelemetryScalars.js
@@ +552,5 @@
> +    Telemetry.keyedScalarSet(KEYED_EXCEED_SCALAR, keyName2, valueToSet++);
> +  });
> +
> +  // Check that there are exactly 100 keys in KEYED_EXCEED_SCALAR
> +  Assert.equal(valueToSet, 100,

This assert is only checking that we correctly increased `valueToSet` 100 times. It's not testing the keyed scalar at all.
To test that it contains the right number of keys you need to take a snapshot, something along these lines:


let snapshot = Telemetry.snapshotKeyedScalars(Ci.nsITelemetry.DATASET_RELEASE_CHANNEL_OPTIN, false);
Assert.equal(100, Object.keys(snapshot.parent[KEYED_UINT_SCALAR]).length, 
             "The keyed scalar must contain all the 100 keys.");
Attachment #9018703 - Flags: review?(jrediger) → review+
(Reporter)

Comment 75

6 months ago
(In reply to Nicklas Boman [:smurfd] from comment #73)
> Hmm that was a scary movie ;) 
> 
> I see errors, but, there are bug numbers assigned to them, so im guessing
> that has not todo with this, right? 
> Time to cheer perhaps?

Time to cheer indeed, and to make the adjustments Jan-Erik found :)
(Assignee)

Comment 76

6 months ago
Posted patch Bug1451813_181023_1935.patch (obsolete) — Splinter Review
I changed the message to be : "Key too long is recorded in the scalar."
Not sure if that is allright?

Also added that snapshot. Built okey and passed running the xpcshell test.
Attachment #9018703 - Attachment is obsolete: true
Flags: needinfo?(jrediger)
(Reporter)

Comment 77

6 months ago
Comment on attachment 9019438 [details] [diff] [review]
Bug1451813_181023_1935.patch

Review of attachment 9019438 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/core/TelemetryScalar.cpp
@@ +773,4 @@
>    if (sr != ScalarResult::Ok) {
> +    // Bug 1451813 - We now report which scalars exceed the key limit in telemetry.keyed_scalars_exceed_limit.
> +    if(sr != ScalarResult::TooManyKeys) {
> +      MOZ_ASSERT(false, "Key too long is recorded in the scalar.");

Better grammar would be "Key too long to be recorded in the scalar."
(Assignee)

Comment 78

6 months ago
You are right, fixed!
Attachment #9019438 - Attachment is obsolete: true
(Reporter)

Comment 79

6 months ago
Comment on attachment 9019464 [details] [diff] [review]
Bug1451813_181023_2120.patch

Clearing the ni? for Jan-Erik as he provided r+ on the earlier patch. Carrying forward the r+.

Next step, marking this for checkin.
Flags: needinfo?(jrediger)
Attachment #9019464 - Flags: review+
(Reporter)

Updated

6 months ago
Keywords: checkin-needed

Comment 80

6 months ago
Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f54dd68a3330
Report which keyed scalars fail to accumulate due to running out of keys. r=chutten
Keywords: checkin-needed

Comment 81

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f54dd68a3330
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.