Closed Bug 1453591 Opened 6 years ago Closed 6 years ago

Implement measurement persistence to support GeckoView

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed

People

(Reporter: Dexter, Assigned: Dexter)

References

Details

(Whiteboard: [geckoview:klar:p1])

Attachments

(6 files, 4 obsolete files)

59 bytes, text/x-review-board-request
chutten
: review+
gfritzsche
: review+
janerik
: review+
froydnj
: review+
bzbarsky
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
gfritzsche
: review+
bzbarsky
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
froydnj
: review+
janerik
: review+
Details
59 bytes, text/x-review-board-request
esawin
: review+
chutten
: review+
janerik
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
janerik
: review+
Details
1.83 KB, patch
Details | Diff | Splinter Review
GeckoView can be loaded/unloaded frequently. We need a way to regularly persist the measurements and load them back when needed.
Blocks: 1452550
Priority: -- → P2
Assignee: nobody → alessio.placitelli
Priority: P2 → P1
Adding [geckoview:klar] whiteboard tag because this bug is a Klar+GeckoView blocker.
Whiteboard: [geckoview:klar]
Depends on: 1454606
Note: this first push is just my set of local, WIP changes. They are not meant for review yet! What's missing:

- I/O should happen off the main thread;
- proper test coverage (might be blocked by bug 1318091);
- support for keyed scalars;
- documentation.
Attachment #8969178 - Attachment is obsolete: true
Attachment #8970108 - Attachment is obsolete: true
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

This is ready for an initial pass, I'm not flagging for review on it as I'm currently hacking to make gtest coverage work. I'll r? once that's ready.

This implements the timer that will trigger persisting measurements to the disk. It performs the I/O off the main thread. The general idea is that Android can kill us at any time without any previous communication. For this reason I'm attempting saving to a file and then moving it to the final destination.
Attachment #8969177 - Flags: feedback?(jrediger)
Attachment #8969177 - Flags: feedback?(chutten)
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

As for the other patch, I'm holding back the r? for once the test coverage is complete.

This patch adds the logic to Persist/Load back scalars and keyed scalars.
Attachment #8970283 - Flags: feedback?(jrediger)
Attachment #8970283 - Flags: feedback?(chutten)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review245688

This already looks quite good.

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:40
(Diff revision 5)
> +namespace TelemetryGeckoViewPersistence = mozilla::TelemetryGeckoViewPersistence;
> +
> +// Workaround to log and debug on Android from C++. Unfortunately, to my
> +// knowledge, there is no better way.
> +// #define ANDROID_LOG(...)
> +#define ANDROID_LOG(...) printf_stderr("\n**** TELEMETRY: " __VA_ARGS__)

Will you make this a no-op if build for release?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:145
(Diff revision 5)
> +  bool exists;
> +  nsresult rv = file->Exists(&exists);
> +  if (NS_FAILED(rv)) {
> +    ANDROID_LOG("DeleteLocalFile - Can't check if the persistence file exists\n");
> +    return;
> +  }

Do we actually need the existence check? The file will be removed afterwards anyway, we could just try to delete a non-existing file and be good if that fails.

(I've seen too many race-conditions because of multiple checks and modifications on files, this is probably harmless though)

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:264
(Diff revision 5)
> +    if (NS_SUCCEEDED(file->Exists(&backupExists))
> +        && backupExists) {
> +      IFStream f(persistenceBackupFile.get());
> +      backupLoaded = reader.parse(f, root, false /*collectComments*/);
> +      if (backupLoaded) {
> +        ANDROID_LOG("LoadPersistedData - Failed to parse data backup at");

This log message is wrong, isn't it?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:343
(Diff revision 5)
> +    gPersistenceTimer = nullptr;
> +  }
> +}
> +
> +void
> +TelemetryGeckoViewPersistence::ClearPersistenceData()

This function is currently never called.

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:355
(Diff revision 5)
> +    ANDROID_LOG("ClearPersistenceData - Failed to get the path to the persistence file\n");
> +    return;
> +  }
> +
> +  // Trigger the loading of the persistence data. After the function
> +  // completes it will automatically arm the persistence timer.

Wrong comment.
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review245732

::: toolkit/components/telemetry/TelemetryScalar.cpp:3138
(Diff revision 3)
> +/**
> + * Load the persisted measurements from a Json object and injects them
> + * in the relevant process storage.
> + *
> + * @param {aData} The input Json object.
> + * @returns NS_OK if loading was performed, an error code explaing the

nit. explaining

::: toolkit/components/telemetry/TelemetryScalar.cpp:3224
(Diff revision 3)
> +/**
> + * Load the persisted measurements from a Json object and injects them
> + * in the relevant process storage.
> + *
> + * @param {aData} The input Json object.
> + * @returns NS_OK if loading was performed, an error code explaing the

nit. explaining
Attachment #8970283 - Flags: review+
Attachment #8969177 - Flags: feedback?(jrediger) → feedback+
Attachment #8970283 - Flags: feedback?(jrediger) → feedback+
Depends on: 1457127
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review245688

> Do we actually need the existence check? The file will be removed afterwards anyway, we could just try to delete a non-existing file and be good if that fails.
> 
> (I've seen too many race-conditions because of multiple checks and modifications on files, this is probably harmless though)

Good point. Removed that, we can simply catch the failure of `Remove`!

> This function is currently never called.

Good point. I forgot to add a change to this patch before pushing it: basically, we need to expose a new clear method from nsITelemetry (only for GV), so that GeckoViewTelemetryController will be able to take a single snapshot and control the "clear" semantics. This is needed so that the storage and the persistence is cleared just once, at a single and definite point in time.
[geckoview:klar:p1] because Gecko telemetry is a Klar+GeckoView blocker and Alessio is already working on this bug. :)
Whiteboard: [geckoview:klar] → [geckoview:klar:p1]
Attachment #8970285 - Attachment is obsolete: true
Some more context for the reviewers :)

@esawin - what's the best way (Android friendly!) to expose "internal-telemetry-geckoview-load-complete" to GV or the consumer App (e.g. Klar?). The purpose of that notification is to tell the application when it can start requesting snapshots/clear. Can we expose an API endpoint for async waiting on that topic? Or to subscribe for it?

@chutten, @janerik - I know, that's a lot of patches - sorry! The main idea for this is described in [1]: once Telemetry starts, on GeckoView only, we kick off the persistence which regularly (every 1 minute for now, we'll tweak it later in bug 1457382). Each probe module (TelemetryScalar, TelemetryHistogram, ...) is responsible for persisting/unpersisting data from/to Json using jsoncpp. The weird part comes for test coverage.

-- Test coverage addendum --
Android does not support running gtests (see bug 1318091). However, we can still work around this problem by tricking the build system into building our GV specific stuff on all the platforms that support test coverage. That's the reason why I introduced |MOZ_TELEMETRY_GECKOVIEW| and why we have GV gtests in a separate directory compared to our usual gtests. Things got a bit weird when I tried to link TelemetryScalar.cpp to xul-gtest: compilation started failing due to the lack of all the different header files, which I didn't know how to fix. Ideally, @froydnj should take a look at this patch to make sure I'm not doing anything too stupid (he's on PTO). Due to the fact that I'm not compiling TelemetryScalar for the gtest part, I'm re-defining the required symbols in the TestGeckoView.cpp files (see PersistScalars et al).

[1] - https://docs.google.com/document/d/16pyAsPr78ITPJPPK12Dz_vUc-8mkRiNlbf00VHU6POY/edit
Flags: needinfo?(jrediger)
Flags: needinfo?(esawin)
Flags: needinfo?(chutten)
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246074
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246076
Attachment #8970283 - Flags: review?(jrediger)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246078
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246080

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:43
(Diff revision 6)
> +namespace TelemetryGeckoViewPersistence = mozilla::TelemetryGeckoViewPersistence;
> +
> +// Workaround to log and debug on Android from C++. Unfortunately, to my
> +// knowledge, there is no better way.
> +#define ANDROID_LOG(...)
> +// #define ANDROID_LOG(...) printf_stderr("\n**** TELEMETRY: " __VA_ARGS__)

Can we conditionally define this log to be available in debug builds?
`MOZ_LOG` works similarly (see https://searchfox.org/mozilla-central/source/xpcom/base/Logging.h#27-31, that constant is not available from the outside though: https://searchfox.org/mozilla-central/source/xpcom/base/Logging.h#265-267)

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:395
(Diff revision 6)
> +      StaticMutexAutoLock locker(gGeckoViewTelemetryMutex);
> +
> +      nsCOMPtr<nsIFile> backupFile;
> +      if (NS_FAILED(GetPersistenceFile(NS_LITERAL_STRING(".json.bak"), backupFile)) ||
> +          NS_FAILED(backupFile->Remove(false))) {
> +        ANDROID_LOG("Persist - Failed to remove the backup persistence file.");

nit. Comment should mention correct method `ClearPersistenceData`

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:402
(Diff revision 6)
> +      }
> +
> +      nsCOMPtr<nsIFile> persistenceFile;
> +      if (NS_FAILED(GetPersistenceFile(NS_LITERAL_STRING(".json"), persistenceFile)) ||
> +          NS_FAILED(persistenceFile->Remove(false))) {
> +        ANDROID_LOG("Persist - Failed to remove the persistence file.");

nit. as above: `ClearPersistenceData`

::: toolkit/components/telemetry/nsITelemetry.idl:576
(Diff revision 6)
> +
> +  /**
> +   * Reset the storage for all the collection primitives that GeckoView supports.
> +   * Please note that this is only intended to be used by GeckoViewTelemetryController.
> +   */
> +  void clearProbes();

`clearProbes` is not yet called anywhere.
Attachment #8969177 - Flags: review?(jrediger) → review-
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246096

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:61
(Diff revision 6)
> +{
> +  // Get the OS temporary directory.
> +  nsAutoString mockedPath;
> +  GetMockedDataDir(mockedPath);
> +  // Set the environment variable to mock.
> +  ASSERT_EQ(PR_SetEnv(nsPrintfCString("MOZ_ANDROID_DATA_DIR=%s",

`PR_SetEnv` requires the string to be persistently available, it does not copy it[1].
By calling `.get()` you only get a temporary pointer to the underlying string, which is destroyed once the string goes out of scope.
One of the functions to allocate a new `char*` from it or a way to "leak" the string object is necessary.


[1]: https://searchfox.org/mozilla-central/source/nsprpub/pr/include/prenv.h#113-117
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246118

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:43
(Diff revision 6)
> +namespace TelemetryGeckoViewPersistence = mozilla::TelemetryGeckoViewPersistence;
> +
> +// Workaround to log and debug on Android from C++. Unfortunately, to my
> +// knowledge, there is no better way.
> +#define ANDROID_LOG(...)
> +// #define ANDROID_LOG(...) printf_stderr("\n**** TELEMETRY: " __VA_ARGS__)

You could use something like that:
```
#include <android/log.h>

#define ANDROID_LOG(fmt) __android_log_print(ANDROID_LOG_DEBUG, "Telemetry", "%s: " fmt, __func__, #__VA_ARGS__)
```

See https://developer.android.com/ndk/reference/group/logging.
(In reply to Alessio Placitelli [:Dexter] from comment #36)
> Some more context for the reviewers :)
> 
> @esawin - what's the best way (Android friendly!) to expose
> "internal-telemetry-geckoview-load-complete" to GV or the consumer App (e.g.
> Klar?). The purpose of that notification is to tell the application when it
> can start requesting snapshots/clear. Can we expose an API endpoint for
> async waiting on that topic? Or to subscribe for it?

Do we need to expose it in the API at all? The RuntimeTelemetry.getSnapshots() request is async, we could observe "internal-telemetry-geckoview-load-complete" in GeckoViewTelemetryController instead to delay handling of the request.
Flags: needinfo?(esawin)
Depends on: 1457472
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246080

> `clearProbes` is not yet called anywhere.

This will be taken care of by bug 1457472 as we need to refactor the the GeckoView API a bit.
(In reply to Alessio Placitelli [:Dexter] from comment #36)
> -- Test coverage addendum --
> Android does not support running gtests (see bug 1318091). However, we can
> still work around this problem by tricking the build system into building
> our GV specific stuff on all the platforms that support test coverage.
> That's the reason why I introduced |MOZ_TELEMETRY_GECKOVIEW| and why we have
> GV gtests in a separate directory compared to our usual gtests. Things got a
> bit weird when I tried to link TelemetryScalar.cpp to xul-gtest: compilation
> started failing due to the lack of all the different header files, which I
> didn't know how to fix. Ideally, @froydnj should take a look at this patch
> to make sure I'm not doing anything too stupid (he's on PTO). Due to the
> fact that I'm not compiling TelemetryScalar for the gtest part, I'm
> re-defining the required symbols in the TestGeckoView.cpp files (see
> PersistScalars et al).
> 

A note before I dive in, I too had problems including TelemetryScalar in gtest over thisaway: https://reviewboard.mozilla.org/r/232220/diff/3#index_header

Not sure if that helps, or is the cause of your trouble, or is completely unrelated.
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #45)
> A note before I dive in, I too had problems including TelemetryScalar in
> gtest over thisaway:
> https://reviewboard.mozilla.org/r/232220/diff/3#index_header
> 
> Not sure if that helps, or is the cause of your trouble, or is completely
> unrelated.

Thanks for mentioning it! I checked, it's unrelated :( I'm trying to build TelemetryScalar.CPP *again* for gtest, linking the new one instead of the old one. It's tricky :) The header, after the change you landed, works just fine!
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246146

Files, Strings, Timers, and multiple threads. You really did tick all the "tough to review" checkboxes on this one, Alessio :S

Overall the patch and code seem straightforward which is an incredible feat given what this hopes to enable. I have some concerns and comments, though...

One broad impression I had was that it isn't immediately apparent from naming convention, namespace, or location in the file which code is expected to run on which thread. It's a fairly light file (in terms of number of functions) which helps, but I found myself wishing for at least a few more MOZ_ASSERTs denoting which threads were acceptable and which weren't so I could clearly understand which functions were re-entrant. A convention we had at a previous job was to say the thread in the name: PersistenceThreadPersist, MainThreadArmPersistenceTimer

But maybe that's overmuch and I was just thrown by how the utility file functions are persistence-thread-only.

::: toolkit/components/telemetry/Telemetry.cpp:98
(Diff revision 6)
>  #if defined(MOZ_GECKO_PROFILER)
>  #include "shared-libraries.h"
>  #include "KeyedStackCapturer.h"
>  #endif // MOZ_GECKO_PROFILER
>  
> +#if defined(MOZ_WIDGET_ANDROID)

I'll say it once just to get it out of my system:

Ugh. I think we're going to regret not having compile-time flags for GV different from ANDROID.

::: toolkit/components/telemetry/Telemetry.cpp:1865
(Diff revision 6)
> +  // We only support this in GeckoView.
> +  if (mozilla::jni::IsFennec()) {
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  // TODO: supporting clear for histograms will come from bug 1457127.

When will support for Events come? (I'm guessing it isn't scoped)

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.h:11
(Diff revision 6)
> +
> +
> +#ifndef GeckoViewTelemetryPersistence_h__
> +#define GeckoViewTelemetryPersistence_h__
> +
> +class nsITimer;

unused, no?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.h:14
(Diff revision 6)
> +#define GeckoViewTelemetryPersistence_h__
> +
> +class nsITimer;
> +
> +namespace mozilla {
> +namespace TelemetryGeckoViewPersistence {

Why the swap in order between GVTP in the filename and TGVP in the type name?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:13
(Diff revision 6)
> +
> +#include "json/json.h"
> +#include "nsDirectoryServiceDefs.h"
> +#include "nsIFile.h"
> +#include "nsIObserverService.h"
> +#include "nsIInputStream.h"

Includes aren't in order, if that matters to you.

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:277
(Diff revision 6)
> +  });
> +
> +  // Load the file and parse the JSON.
> +  Json::Value root;
> +  {
> +    StaticMutexAutoLock locker(gGeckoViewTelemetryMutex);

The only places this mutex is used are on gPersistenceThread... so why do we have a mutex?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:292
(Diff revision 6)
> +      // We managed to load the backup file, let's remove it.
> +      backupLoaded = NS_SUCCEEDED(ReadFromFile(backupFile, root));
> +      if (!backupLoaded) {
> +        ANDROID_LOG("LoadPersistedData - Failed to load data backup at %s",
> +                    backupFile->HumanReadablePath().get());
> +        mozilla::Unused << backupFile->Remove(false);

Do we really want to remove it? What if Android kills us before the first timer fire? Then we will have lost the backup due to a single short session following an ill-timed shutdown.

(not likely to happen, but.)

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:317
(Diff revision 6)
> +  }
> +}
> +
> +} // anonymous namespace
> +
> +// This namespace

This comment confuses me.

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:343
(Diff revision 6)
> +
> +  ANDROID_LOG("InitPersistence");
> +
> +  // Spawn a new thread for handling GeckoView Telemetry persistence I/O.
> +  // We just spawn it once and re-use it later.
> +  nsresult rv = NS_NewNamedThread("PersistenceIO", getter_AddRefs(gPersistenceThread));

Bikeshed: should we include the word "Telemetry" in here?

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:371
(Diff revision 6)
> +  // our memory in case some other foreground application needs it.
> +  ANDROID_LOG("DeInitPersistence");
> +
> +  if (gPersistenceThread) {
> +    gPersistenceThread->Shutdown();
> +    gPersistenceThread = nullptr;

Are we guaranteed that DeInitPersistence is being called on the Main thread? If not there may be a currently-running timer when we null out this thread reference.

(maybe relative thread priorities make this a non-issue)

::: toolkit/components/telemetry/geckoview/GeckoViewTelemetryPersistence.cpp:388
(Diff revision 6)
> +{
> +  ANDROID_LOG("ClearPersistenceData");
> +
> +  // Trigger the loading of the persistence data. After the function
> +  // completes it will automatically arm the persistence timer.
> +  gPersistenceThread->Dispatch(NS_NewRunnableFunction("ClearPersistedData",

May not hurt to have a MOZ_ASSERT(gPersistenceThread) at the top to cover some rogue caller trying to clear before init/after deinit.
Attachment #8969177 - Flags: review?(chutten) → review-
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246176

Just tiny things, really. Though I would definitely prefer the utility function pulled out.

::: toolkit/components/telemetry/TelemetryScalar.cpp:264
(Diff revision 5)
> +/**
> + * Convert a nsIVariant to a Json::Value, used for GeckoView persistence.
> + */
> +nsresult
> +VariantToJsonValue(uint32_t aScalarType, nsIVariant* aInputValue,
> +                   const char* aOutKey, Json::Value& aOutValue)

In other places you use "Out" to mean an outparam. It might be clearer to call this aJsonKey (if we want to maintain the illusion of this being used generically) or aScalarName (if we want to be very specific)

::: toolkit/components/telemetry/TelemetryScalar.cpp:3064
(Diff revision 5)
> +    ScalarTupleArray& processScalars = iter.Data();
> +    const char* processName = GetNameForProcessID(ProcessID(iter.Key()));
> +
> +    Json::Value currentProcess;
> +
> +    for (ScalarTupleArray::size_type i = 0; i < processScalars.Length(); i++) {

Since this is an nsTArray, we can use new-style for loops, no?

for (const ScalarDataTuple& scalar : processScalars)

(and, later, once we force contributors to use gcc7 and clang4, we could do

for (const auto& {name, value, kind} : processScalars)

see https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code and C++17 Structured Bindings)

::: toolkit/components/telemetry/TelemetryScalar.cpp:3081
(Diff revision 5)
> +
> +  return NS_OK;
> +}
> +
> +/**
> + * Write the scalar data to the provided Json object, for

*the _keyed_ scalar data

::: toolkit/components/telemetry/TelemetryScalar.cpp:3112
(Diff revision 5)
> +    KeyedScalarTupleArray& processScalars = iter.Data();
> +    const char* processName = GetNameForProcessID(ProcessID(iter.Key()));
> +
> +    Json::Value currentProcess;
> +
> +    for (KeyedScalarTupleArray::size_type i = 0; i < processScalars.Length(); i++) {

Ditto range-based for loop

::: toolkit/components/telemetry/TelemetryScalar.cpp:3153
(Diff revision 5)
> +  if (!XRE_IsParentProcess()) {
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  // Utility functions for handling process names. This could potentially
> +  // be moved to TelemetryCommon once they get used by other modules.

We don't already have this?! I say move it now. 

(( we could be really clever with this if we wanted. We could generate at build time the logic for determining name -> id mapping by checking just the first N characters where N currently is 1 (C,D,G,P,E). If we added a Plugin process N is 1 for all but case 'P' when it is 2 (case 'l' case 'a')... we could even get clever about interior string characters... /nerd-snipe ))

::: toolkit/components/telemetry/TelemetryScalar.cpp:3165
(Diff revision 5)
> +    MOZ_ASSERT(false, "Unknown process name.");
> +    // This should never happen.
> +    return ProcessID::Count;
> +  };
> +
> +  auto IsValidProcessName = [](const char* aProcName) -> bool {

What's the value of IsValidProcessName over GetIDForProcessName < ProcessID::Count? That it doesn't assert? I think we can get the same semantics with less duplication.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3210
(Diff revision 5)
> +      if (NS_FAILED(rv)) {
> +        NS_WARNING(nsPrintfCString("Failed to set the value for %s", scalarName.c_str()));
> +        continue;
> +      }
> +
> +      mozilla::Unused << internal_UpdateScalar(lock,

This makes me a little sad that we have to go through all this effort of passing and querying type information just so we can pretend we don't know it when we call UpdateScalar.

No bug, just a little sad.
Attachment #8970283 - Flags: review?(chutten) → review-
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review246182

::: commit-message-4a39a:6
(Diff revision 2)
> +Bug 1453591 - Add a signal for checking when persisted probes loading completes. r?janerik,chutten,esawin
> +
> +This patch adds a new topic, for internal use only, which is notified once the
> +Telemetry core completes loading all the persisted measurements. This will be
> +useful for applications to have a signal for when is the right time to start
> +requesting snapshots/clears.

To say nothing of when it is acceptable to start -recording- data. Persisted data will overwrite current data, so early recordings can't be guaranteed to stick if there's a pending load.
Attachment #8971493 - Flags: review?(chutten)
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246186

Took me longer than I'd care to admit to clue in that your mocked scalar functions were the ones performing the asserts :S

Congratulations on getting this working!

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:24
(Diff revision 6)
> +using namespace mozilla;
> +using namespace TelemetryTestHelpers;
> +
> +#define EXPECTED_STRING "Nice, expected and creative string."
> +
> +const char kSampleData[] = R"content({

This is the first time I've seen raw string literals used. TIL.

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:137
(Diff revision 6)
> +}
> +
> +void
> +TestPersistScalars(Json::Value &aWriter)
> +{
> +  // Report the same data that's in kSampleData for keyed scalars.

copy-pasta for "keyed scalars"

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:306
(Diff revision 6)
> +TEST_F(TelemetryTestFixture, PersistData) {
> +  AutoJSContextWithGlobal cx(mCleanGlobal);
> +
> +  MockAndroidDataDir();
> +
> +  ASSERT_FALSE(CheckPersistenceFileExists(NS_LITERAL_STRING("json")))

This may pass if Exists fails. Maybe ASSERT_TRUE(NS_SUCCEEDED(rv)) in CheckPersistenceFileExists?

::: toolkit/components/telemetry/moz.build:206
(Diff revision 6)
>  # Add support for GeckoView: please note that building GeckoView
>  # implies having an Android build. The packaging step decides
>  # which files to include. As a consequence, we can simply only
>  # include the GeckoView files on all Android builds.
>  if CONFIG['MOZ_WIDGET_TOOLKIT'] == 'android':
> +    # Introduce this define to conditionally enable GV code in the various

enable _Telemetry_ GV code, specifically.
Attachment #8970284 - Flags: review?(chutten)
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246160

::: toolkit/components/telemetry/TelemetryScalar.cpp:3166
(Diff revision 5)
> +    // This should never happen.
> +    return ProcessID::Count;
> +  };
> +
> +  auto IsValidProcessName = [](const char* aProcName) -> bool {
> +    for (uint32_t id = 0; id < static_cast<uint32_t>(ProcessID::Count); id++) {

This is quite similar to the `GetIDForProcessName`. Can we deduplicate that somehow?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3238
(Diff revision 5)
> +  if (!XRE_IsParentProcess()) {
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  // Utility functions for handling process names. This could potentially
> +  // be moved to TelemetryCommon once they get used by other modules.

These are now used in 2 places (here and above in `LoadPersistedScalars`). Should we extract that?
Attachment #8970283 - Flags: review?(jrediger)
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review246200
Attachment #8971493 - Flags: review?(jrediger)
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246204

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:24
(Diff revision 6)
> +using namespace mozilla;
> +using namespace TelemetryTestHelpers;
> +
> +#define EXPECTED_STRING "Nice, expected and creative string."
> +
> +const char kSampleData[] = R"content({

Does the raw string literal even need the word-delimiter there (`content`)?

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:76
(Diff revision 6)
> +  nsresult rv = NS_GetSpecialDirectory(NS_OS_TEMP_DIR,
> +                                       getter_AddRefs(file));
> +  ASSERT_EQ(NS_SUCCEEDED(rv), true);
> +
> +  // Append the filename and the extension.
> +  nsAutoString fileName(NS_LITERAL_STRING("gv_measurements."));

In the main code you used `gv_measurements` (without the dot) and included a leading dot for the extension. Here it's the other way around.
Should we keep this consistent?

In addition the same literal string is reused across several functions. Could it be a file-wide constant instead?

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:106
(Diff revision 6)
> +  fileName.Append(aExtension);
> +  file->Append(fileName);
> +
> +  bool exists = true;
> +  rv = file->Exists(&exists);
> +  ASSERT_EQ(NS_OK, rv) << "nsIFile::Exists cannot fail";

Again: is the existence check here necessary?

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:110
(Diff revision 6)
> +  rv = file->Exists(&exists);
> +  ASSERT_EQ(NS_OK, rv) << "nsIFile::Exists cannot fail";
> +
> +  if (exists) {
> +    rv = file->Remove(false);
> +    ASSERT_EQ(NS_OK, rv) << "nsIFile::Remove cannot fail";

It should have a better error message for the case that it _does_ fail.

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:165
(Diff revision 6)
> +void
> +TestLoadPersistedScalars(const Json::Value& aData)
> +{
> +
> +  Json::FastWriter writer;
> +  printf_stderr("TestLoadPersistedScalars - read %s\n", writer.write(aData).c_str());

We probably don't need the debug print in tests.

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:197
(Diff revision 6)
> + * of functions that return void. These macros can't be used unless that's
> + * the case.
> + */
> +namespace TelemetryScalar {
> +
> +nsresult PersistScalars(Json::Value &aWriter) { TestPersistScalars(aWriter); return NS_OK; }

By mocking out these `Persist*` functions, we don't have any tests on the actual serialization functions, do we?
Attachment #8970284 - Flags: review?(jrediger)
Flags: needinfo?(jrediger)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246146

This is a great point. I've changed the whole code a bit to address that, please let me know if that makes things clearer.

> I'll say it once just to get it out of my system:
> 
> Ugh. I think we're going to regret not having compile-time flags for GV different from ANDROID.

Yeah, I agree :( I hope that will change in the future.

> When will support for Events come? (I'm guessing it isn't scoped)

Good point. I don't know yet. I've filed bug 1457840 to figure this out.

> Why the swap in order between GVTP in the filename and TGVP in the type name?

Good point! I've swapped the namespace's...name, renamed the file and dropped the "mozilla" namespace, as namespace nesting is discouraged as per [our guide](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Naming_and_Formatting_code)

> Includes aren't in order, if that matters to you.

It does, thanks for catching this! I've sorted it now.

> The only places this mutex is used are on gPersistenceThread... so why do we have a mutex?

Ah! Right, that's vestigial stuff from the previous iterations!

> Do we really want to remove it? What if Android kills us before the first timer fire? Then we will have lost the backup due to a single short session following an ill-timed shutdown.
> 
> (not likely to happen, but.)

This should only be hit if we have some corrupted backup file that we fail to load. In that case leaving the backup file around should be not too useful. Right?

> This comment confuses me.

Heh, right. I've also moved this namespace to the relevant patch, the gtest coverage one. You'll find the fixed comment there!

> Bikeshed: should we include the word "Telemetry" in here?

Right. For some reason I though thread names were limited to 8 characters, but that doesn't seem to be the case!

> Are we guaranteed that DeInitPersistence is being called on the Main thread? If not there may be a currently-running timer when we null out this thread reference.
> 
> (maybe relative thread priorities make this a non-issue)

Good catch, we should be enforcing that. I've added an assertion.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246146

> Right. For some reason I though thread names were limited to 8 characters, but that doesn't seem to be the case!

hah! ` error: static_assert failed "Thread name must be no more than 16 characters"`
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246176

> We don't already have this?! I say move it now. 
> 
> (( we could be really clever with this if we wanted. We could generate at build time the logic for determining name -> id mapping by checking just the first N characters where N currently is 1 (C,D,G,P,E). If we added a Plugin process N is 1 for all but case 'P' when it is 2 (case 'l' case 'a')... we could even get clever about interior string characters... /nerd-snipe ))

Good point! I was planning to, but I forgot :) About being smart, maybe I can file a follow-up bug on this? Or just defer until this becomes a performance problem (if ever).
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review246182

> To say nothing of when it is acceptable to start -recording- data. Persisted data will overwrite current data, so early recordings can't be guaranteed to stick if there's a pending load.

This is "only" a problem for scalars (and events, if we get to that): I filed bug 1454606 to tackle this.
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review246428
Attachment #8971493 - Flags: review?(esawin) → review+
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246204

> Does the raw string literal even need the word-delimiter there (`content`)?

Uhm, right, that's not needed!

> We probably don't need the debug print in tests.

Right, I forgot to remove these :)

> By mocking out these `Persist*` functions, we don't have any tests on the actual serialization functions, do we?

Right - I'm afraid that's a limitation of the current approach. I'm not sure I can do something differently to fix that, though :(
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246204

> Again: is the existence check here necessary?

I think we need it here: I want to make sure that the function fails because the file is not there, not because we failed to detect the presence of the file.
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246186

> This may pass if Exists fails. Maybe ASSERT_TRUE(NS_SUCCEEDED(rv)) in CheckPersistenceFileExists?

Unfortunately we can't use gtest macros in functions that return a value. To work around the problem, I slightly refactored the function.
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246096

> `PR_SetEnv` requires the string to be persistently available, it does not copy it[1].
> By calling `.get()` you only get a temporary pointer to the underlying string, which is destroyed once the string goes out of scope.
> One of the functions to allocate a new `char*` from it or a way to "leak" the string object is necessary.
> 
> 
> [1]: https://searchfox.org/mozilla-central/source/nsprpub/pr/include/prenv.h#113-117

I've opted for leaking through `ToNewCString` since this is just test code.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246146

> This should only be hit if we have some corrupted backup file that we fail to load. In that case leaving the backup file around should be not too useful. Right?

We can have a complete backup if we're killed after the backup's written but before it's moved. Not sure how likely that would be, but I'm betting it's between "not at all" and "never happens"
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246748

::: toolkit/components/telemetry/moz.build:206
(Diff revisions 6 - 7)
>      SOURCES += [
> -        'geckoview/GeckoViewTelemetryPersistence.cpp'
> +        'geckoview/TelemetryGeckoViewPersistence.cpp'
>      ]
>  
>      EXTRA_JS_MODULES += [
>          'geckoview/GeckoViewTelemetryController.jsm',

Oh I suppose this might be one of the reasons we had GVT in some places instead of TGV throughout.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:44
(Diff revision 7)
> +#include <android/log.h>
> +#define ANDROID_LOG(fmt, ...) \
> +  __android_log_print(ANDROID_LOG_DEBUG, "Telemetry", fmt, ##__VA_ARGS__)
> +#else
> +// If we're building for other platforms (e.g. for running test coverage), try
> +// to print something antway.

antway!

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:163
(Diff revision 7)
> +                                  aFile,
> +                                  PR_RDONLY);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  // Make sure to close the stream.
> +  auto scopedArmTimer = MakeScopeExit([inStream] { inStream->Close(); });

More like scopedStreamClose

Does static analysis complain this is unused?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:328
(Diff revision 7)
> +}
> +
> +} // anonymous namespace
> +
> +void
> +TelemetryGeckoViewPersistence::InitPersistence()

Do we need any guards against multiple inits or multiple deinits?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:396
(Diff revision 7)
> +  MOZ_ASSERT(gPersistenceThread);
> +
> +  ANDROID_LOG("ClearPersistenceData");
> +
> +  // Trigger the loading of the persistence data. After the function
> +  // completes it will automatically arm the persistence timer.

What will automatically arm the persistence timer?
Attachment #8969177 - Flags: review?(chutten) → review-
Comment on attachment 8972030 [details]
Bug 1453591 - Add a GetIDForProcessName utility function in TelemetryCommon.

https://reviewboard.mozilla.org/r/240766/#review246764

Maybe on the flight to the All Hands I'll write the necessary function into gen_process_data.py that turns this into O(1)...
Attachment #8972030 - Flags: review?(chutten) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246770

::: toolkit/components/telemetry/TelemetryScalar.cpp:3239
(Diff revision 6)
> +            rv = persistedVal->SetAsUint32(data.asUInt());
> +            break;
> +          case Json::ValueType::booleanValue:
> +            rv = persistedVal->SetAsBool(data.asBool());
> +            break;
> +          default:

Do we support keyed string scalars?
Attachment #8970283 - Flags: review?(chutten) → review-
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246774

My concerns have been taken care of.
Attachment #8970284 - Flags: review?(chutten) → review+
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246748

> More like scopedStreamClose
> 
> Does static analysis complain this is unused?

No, not locally. We'll see how that behaves on try.

> Do we need any guards against multiple inits or multiple deinits?

I don't think we do, as `TelemetryGeckoViewPersistence::*` calls are only meant to be used from the Telemetry core implementation, which is a singleton (and we already [guarantee](https://searchfox.org/mozilla-central/rev/795575287259a370d00518098472eaa5b362bfa7/toolkit/components/telemetry/Telemetry.cpp#1256) it's initialized only once). I added a `MOZ_ASSERT` on the Init for extra-safety.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246080

> Can we conditionally define this log to be available in debug builds?
> `MOZ_LOG` works similarly (see https://searchfox.org/mozilla-central/source/xpcom/base/Logging.h#27-31, that constant is not available from the outside though: https://searchfox.org/mozilla-central/source/xpcom/base/Logging.h#265-267)

Done! I'm also leaving the `printf_stderr` for when we run gtest stuff on other platforms.
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review246770

> Do we support keyed string scalars?

No, [we don't](https://searchfox.org/mozilla-central/rev/795575287259a370d00518098472eaa5b362bfa7/toolkit/components/telemetry/TelemetryScalar.cpp#1322). I just noticed we don't mention it in the docs, so I filed bug 1458463 to fix that.
Attachment #8969177 - Flags: review?(jrediger)
Attachment #8970283 - Flags: review?(jrediger)
Attachment #8971493 - Flags: review?(jrediger)
Attachment #8971493 - Flags: review?(chutten)
Attachment #8970284 - Flags: review?(jrediger) → review?(nfroyd)
Hi Nathan! I flagged you for review on the test coverage patch in this bug: would you kindly review the changes to the build system? The objective here is to run "gtest" coverage for Telemetry core code. Android does not support gtests, so I'm conditionally enabling the code relevant code and linking it to xul-gtest (you suggested doing that on irc some time ago).

How does this look? There's one weird aspect to that: I couldn't find a way to compile/link TelemetryScalar.cpp (which contains some relevant code) to the xul-gtest: it complained about not finding a lot of our usual header files, so I gave up and manually defined the required functions in TestGeckoView.cpp. While this is fine for now, it would be great to fix this in the future. Any idea?

Last, but not least, I would really appreciate if you could give a quick look to the "Bug 1453591 - Add persistence support for GeckoView in the Telemetry core." changeset for sanity checking the usage of my I/O APIs.
Flags: needinfo?(nfroyd)
Comment on attachment 8972030 [details]
Bug 1453591 - Add a GetIDForProcessName utility function in TelemetryCommon.

https://reviewboard.mozilla.org/r/240766/#review246938
Attachment #8972030 - Flags: review?(jrediger) → review+
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246146

> Good catch, we should be enforcing that. I've added an assertion.

Well, turns out adding an assertion was not a great idea: it makes gtest [fail](https://treeherder.mozilla.org/#/jobs?repo=try&revision=9055c0822122ee39f8c5072e55669de0be13e2ee&selectedJob=176523460). I'll roll this back since we're guaranteeing that we get called once by the caller.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

I have a few comments.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:59
(Diff revision 8)
> +const char* kPersistenceFileName = "gv_measurements.json";
> +const char* kPersistenceFileNameNoExt = "gv_measurements";

Please make these `const char kPersistence...[] = "..."`, which is slightly smaller on all of our usual systems.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:63
(Diff revision 8)
> +// measurements.
> +const char* kPersistenceFileName = "gv_measurements.json";
> +const char* kPersistenceFileNameNoExt = "gv_measurements";
> +
> +// The timer used for persisting measurements data.
> +nsITimer* gPersistenceTimer = nullptr;

No need to explicitly initialize this to `nullptr`.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:65
(Diff revision 8)
> +const char* kPersistenceFileNameNoExt = "gv_measurements";
> +
> +// The timer used for persisting measurements data.
> +nsITimer* gPersistenceTimer = nullptr;
> +// The worker thread to perform persistence.
> +nsCOMPtr<nsIThread> gPersistenceThread = nullptr;

This is going to cause a static constructor; you probably want to use a raw pointer here, similar to `gPersistenceTimer`, or use `StaticRefPtr`.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:109
(Diff revision 8)
> +  nsresult rv = GetAndroidDataDir(dataDir);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  // Append the extension to the filename.
> +  nsAutoString fileName;
> +  fileName.Assign(NS_ConvertUTF8toUTF16(kPersistenceFileNameNoExt));

Maybe just define `kPersistenceFileNameNoExt` as a unicode string to avoid conversion?  `u"..."`, that is; unsure if that style will work with `Assign` on Windows, but it should.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:137
(Diff revision 8)
> +  uint32_t count;
> +  rv = stream->Write(aData.Data(), aData.Length(), &count);
> +  // Make sure we wrote correctly.
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  return (count == aData.Length()) ? NS_OK : NS_ERROR_FAILURE;

You need to explicitly close the output stream here; the destruction of the stream will not do that for you.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:144
(Diff revision 8)
> +
> +/**
> + * Read and parses JSON content from a file.
> + *
> + * @param {nsCOMPtr<nsIFile>} aFile - the nsIFile handle to the file.
> + * @param {Json::Value} aRoot - the JSON object that will contain the parsed dat.

"the parsed data"?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:235
(Diff revision 8)
> +  Json::Value root;
> +
> +  // Write the data to a JSON string.
> +  Json::FastWriter writer;
> +  nsAutoCString jsonBuffer(writer.write(root).c_str());

What data are you writing here?  `root` is an empty object, no?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:239
(Diff revision 8)
> +
> +  Json::Value root;
> +
> +  // Write the data to a JSON string.
> +  Json::FastWriter writer;
> +  nsAutoCString jsonBuffer(writer.write(root).c_str());

It would probably be better to write this as:

```
std::string jsonString = writer.write(root);
nsDependentCString jsonBuffer(jsonString.c_str());
```

so as to not allocate a string for the json, and then immediately copy that string into the `nsAutoCString`, and then free the original `std::string`.  Or even dispense with the `nsAutoCString` entirely and just pass `jsonString` into `WriteToFile`.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:258
(Diff revision 8)
> +  // Rename the file and drop the ".bak" extension.
> +  nsresult rv = persistenceFile->MoveTo(nullptr, NS_ConvertUTF8toUTF16(kPersistenceFileName));

You should just use `nsISafeOutputStream` or similar to handle all the details here for you.  You'll probably have to rewrite some of the surrounding code, though.

Not entirely sure that works well with your backup scheme, though?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:259
(Diff revision 8)
> +    ANDROID_LOG("PersistenceThreadPersist - There was an error writing to the backup file.");
> +    return;
> +  }
> +
> +  // Rename the file and drop the ".bak" extension.
> +  nsresult rv = persistenceFile->MoveTo(nullptr, NS_ConvertUTF8toUTF16(kPersistenceFileName));

Same deal here for defining `kPersistenceFileName` as a unicode string to start with, so as to avoid the runtime conversion.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:331
(Diff revision 8)
> +
> +void
> +TelemetryGeckoViewPersistence::InitPersistence()
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +  MOZ_ASSERT(gPersistenceThread, "Init must only be called once.");

I think you mean `!gPersistenceThread`?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:381
(Diff revision 8)
> +    // Always make sure the timer is canceled.
> +    MOZ_ALWAYS_SUCCEEDS(gPersistenceTimer->Cancel());
> +    gPersistenceTimer = nullptr;

This code leaks `gPersistenceTimer`, because you're not releasing it.  Please fix that.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:390
(Diff revision 8)
> +  // This can be run on any thread, as we just dispatch the persistence
> +  // task to the persistence thread.
> +  MOZ_ASSERT(gPersistenceThread);

Maybe handle the case where `gPersistenceThread` didn't get initialized, and just return from this function?

::: toolkit/components/telemetry/moz.build:213
(Diff revision 8)
> +    USE_LIBS += [
> +        'jsoncpp',
> +    ]

What does this do to codesize on Android?

::: toolkit/components/telemetry/nsITelemetry.idl:574
(Diff revision 8)
>     */
>    void clearEvents();
> +
> +  /**
> +   * Reset the storage for all the collection primitives that GeckoView supports.
> +   * Please note that this is only intended to be used by GeckoViewTelemetryController.

One thing you might do is define this on a separate, Android-only interface, so if you need it from JS, you can mostly enforce that only Android calls it?
(In reply to Alessio Placitelli [:Dexter] from comment #77)
> How does this look? There's one weird aspect to that: I couldn't find a way
> to compile/link TelemetryScalar.cpp (which contains some relevant code) to
> the xul-gtest: it complained about not finding a lot of our usual header
> files, so I gave up and manually defined the required functions in
> TestGeckoView.cpp. While this is fine for now, it would be great to fix this
> in the future. Any idea?

That is peculiar, but I can't really say any more without seeing the errors.

> Last, but not least, I would really appreciate if you could give a quick
> look to the "Bug 1453591 - Add persistence support for GeckoView in the
> Telemetry core." changeset for sanity checking the usage of my I/O APIs.

Went over that a bit, I might have duplicated other people's comments, though...
Flags: needinfo?(nfroyd)
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review247010

I think this is OK.

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:192
(Diff revision 9)
> +
> +/**
> + * We can't link TelemetryScalar.cpp to these test files, so mock up
> + * the required functions to make the linker not complain. In order
> + * to use the ASSERT_* gtest macros, we need to introduce another set
> + * of functions that return void. These macros can't be used unless that's

"that return void"?  Or "that return nsresult"?  The current wording matched up with the functions below leaves one scratching their head...unless you mean the functions returning `void` being all the ones above?  But then I don't understand why the comment isn't before those functions, and what the return type has to do with the `ASSERT` macros anyway...

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:215
(Diff revision 9)
> +  AutoJSContextWithGlobal cx(mCleanGlobal);
> +
> +  MockAndroidDataDir();

Maybe these should go in the test fixture for the per-test setup bits?

Bonus points for only doing `MockAndroidDataDir` once per test run, rather than once per individual test.

::: toolkit/components/telemetry/moz.build:208
(Diff revision 9)
>  # which files to include. As a consequence, we can simply only
>  # include the GeckoView files on all Android builds.
>  if CONFIG['MOZ_WIDGET_TOOLKIT'] == 'android':
> +    # Introduce this define to conditionally enable Telemetry GV code in the various
> +    # C++ modules. We need this trick in order to run gtest coverage on Treeherder
> +    # on platform other than Android, since gtests on Android are not supported

Nit: "on platforms other than".  Also, period at the end of the sentence.

::: toolkit/components/telemetry/moz.build:210
(Diff revision 9)
>  if CONFIG['MOZ_WIDGET_TOOLKIT'] == 'android':
> +    # Introduce this define to conditionally enable Telemetry GV code in the various
> +    # C++ modules. We need this trick in order to run gtest coverage on Treeherder
> +    # on platform other than Android, since gtests on Android are not supported
> +    # yet (see bug 1318091).
> +    DEFINES['MOZ_TELEMETRY_GECKOVIEW'] = True

Why is this defined here and in the geckoview/gtest directory?  Are we doing it so there's a single define for enabling the code for gtest and non-gtest builds?
Attachment #8970284 - Flags: review?(nfroyd) → review+
Blocks: 1457472
No longer depends on: 1457472
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

> Please make these `const char kPersistence...[] = "..."`, which is slightly smaller on all of our usual systems.

Ok, doing that! Out of curiosity, how comes that's smaller compared to `const char*`?

> This is going to cause a static constructor; you probably want to use a raw pointer here, similar to `gPersistenceTimer`, or use `StaticRefPtr`.

I've also removed the initialization to `nullptr` here as well.

> Maybe just define `kPersistenceFileNameNoExt` as a unicode string to avoid conversion?  `u"..."`, that is; unsure if that style will work with `Assign` on Windows, but it should.

Looks like it's working fine on windows!

> What data are you writing here?  `root` is an empty object, no?

Yes, but other calls are being added in the other changesets in this bug, see  "Add GeckoView persistence for Telemetry Scalars". I wanted to split the logic away while keeping each changeset compilable on its own.

> You should just use `nsISafeOutputStream` or similar to handle all the details here for you.  You'll probably have to rewrite some of the surrounding code, though.
> 
> Not entirely sure that works well with your backup scheme, though?

Wow, I didn't know about this! Thank you so much, this simplifies things quite a bit!

> What does this do to codesize on Android?

I'm not sure, but it looks like this is [already being used](https://searchfox.org/mozilla-central/rev/ce9ff94ffed34dc17ec0bfa406156d489eaa8ee1/toolkit/crashreporter/minidump-analyzer/moz.build#51) for stack traces on Fennec.

> One thing you might do is define this on a separate, Android-only interface, so if you need it from JS, you can mostly enforce that only Android calls it?

Good point. However, in the long run, we might want to use that on Desktop as well. I would vote for leaving it here for now, but I've filed bug 1458801 for addressing refactoring the interface.
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review247010

> "that return void"?  Or "that return nsresult"?  The current wording matched up with the functions below leaves one scratching their head...unless you mean the functions returning `void` being all the ones above?  But then I don't understand why the comment isn't before those functions, and what the return type has to do with the `ASSERT` macros anyway...

Right, the comment is for the functions above (the return type part). Unfortunately, gtest `ASSERT_*` macros [can only be used](https://github.com/google/googletest/blob/master/googletest/docs/FAQ.md#my-compiler-complains-void-value-not-ignored-as-it-ought-to-be-what-does-this-mean) in functions returning `void`.

> Maybe these should go in the test fixture for the per-test setup bits?
> 
> Bonus points for only doing `MockAndroidDataDir` once per test run, rather than once per individual test.

Done! I'm no gtest guru, so I'm afraid I'll skip the bonus points this time :'(

> Nit: "on platforms other than".  Also, period at the end of the sentence.

Good catch! The period is on the line below, after the "(see ...)" part.

> Why is this defined here and in the geckoview/gtest directory?  Are we doing it so there's a single define for enabling the code for gtest and non-gtest builds?

Yes, this was the intent. It seemed to read better than `#ifdef ANDROID || GECKOVIEW_GTEST`, but I don't have a strong opinion on this. Happy to change if you feel strongly about it or have any suggestion!
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review247352
Attachment #8969177 - Flags: review?(chutten) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review247354
Attachment #8970283 - Flags: review?(chutten) → review+
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review247356

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:294
(Diff revision 6)
>    ANDROID_LOG("PersistenceThreadLoadData");
>  
>    // If the function completes or fails, make sure to spin up the persistence timer.
>    auto scopedArmTimer = MakeScopeExit([&] {
>      NS_DispatchToMainThread(
> -      NS_NewRunnableFunction("MainThreadArmPersistenceTimer", []() -> void {
> +       NS_NewRunnableFunction("MainThreadArmPersistenceTimer", []() -> void {

Looks like an extra space snuck in here.
Attachment #8971493 - Flags: review?(chutten) → review+
(In reply to Alessio Placitelli [:Dexter] from comment #88)
> > Please make these `const char kPersistence...[] = "..."`, which is slightly smaller on all of our usual systems.
> 
> Ok, doing that! Out of curiosity, how comes that's smaller compared to
> `const char*`?

`const char* kPersistenceFileName = "gv_measurements.json"` reserves space for the string constant plus 4 or 8 bytes space for the pointer variable kPersistenceFileName pointing to the string constant. `const char kPersistenceFileName[] = "gv_measurements.json"` just reserves space for the string constant and code referencing kPersistenceFileName will access the string constant address directly.
Depends on: 1459144
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review247478

Small nits regarding docs.

I like the fact that we don't have to deal with the backup file directly anymore.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:76
(Diff revision 10)
> +void PersistenceThreadPersist();
> +
> +/**
> + * Get the path to the Android Data dir.
> + *
> + * @param {nsCString} aOutDir - the variable holding the path.

Comment says `nsCString`, function definition says `nsTString<PathChar>`.

Do we need to have the type in the docu itself?
If so, at least those should match.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:121
(Diff revision 10)
> +}
> +
> +/**
> + * Write string content to a file.
> + *
> + * @param {nsACString} aData - the content to write to the file.

Similar as before.
`nsACString` in the docs, `std::string` in the code.
Attachment #8969177 - Flags: review?(jrediger) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review247482

One issue regarding integer values. Shouldn't be too hard to verify. Maybe we should have a test with potentially malicious data?

::: toolkit/components/telemetry/TelemetryScalar.cpp:2416
(Diff revision 10)
>      if (!processObj ||
>          !JS_DefineProperty(aCx, root_obj, processName, processObj, JSPROP_ENUMERATE)) {
>        return NS_ERROR_FAILURE;
>      }
>  
> -    for (nsTArray<DataPair>::size_type i = 0; i < processScalars.Length(); i++) {
> +    for (ScalarTupleArray::size_type i = 0; i < processScalars.Length(); i++) {

Previously chutten stated that this could use new-style for loops.
The issue was marked resolved without a comment.
Oversight or intentionally keeping it an old-style loop?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3065
(Diff revision 10)
> +    ScalarTupleArray& processScalars = iter.Data();
> +    const char* processName = GetNameForProcessID(ProcessID(iter.Key()));
> +
> +    Json::Value currentProcess;
> +
> +    for (const ScalarDataTuple& scalar : processScalars) {

Ah, here it uses new-style loops. The previous occurence is a left-over from the refactored code.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3167
(Diff revision 10)
> +
> +      switch (scalar.type()) {
> +        case Json::ValueType::intValue:
> +        case Json::ValueType::uintValue:
> +        case Json::ValueType::realValue:
> +          rv = persistedVal->SetAsUint32(scalar.asUInt());

Do we need to account for overflowing here from intentionally malicious data?

`asUInt` asserts the range [1], which might abort [2].

[1]: https://searchfox.org/mozilla-central/source/toolkit/components/jsoncpp/src/lib_json/json_value.cpp#734
[2]: https://searchfox.org/mozilla-central/source/toolkit/components/jsoncpp/include/json/assertions.h#43

::: toolkit/components/telemetry/TelemetryScalar.cpp:3234
(Diff revision 10)
> +
> +        switch (data.type()) {
> +          case Json::ValueType::intValue:
> +          case Json::ValueType::uintValue:
> +          case Json::ValueType::realValue:
> +            rv = persistedVal->SetAsUint32(data.asUInt());

As above (integer wrapping)
Attachment #8970283 - Flags: review?(jrediger) → review+
Comment on attachment 8971493 [details]
Bug 1453591 - Add a signal for checking when persisted probes loading completes.

https://reviewboard.mozilla.org/r/240260/#review247490

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:67
(Diff revision 7)
>  const char16_t kPersistenceFileName[] = u"gv_measurements.json";
>  
> +// This topic is notified and propagated up to the application to
> +// make sure it knows that data loading has complete and that snapshotting
> +// can now be performed.
> +const char* kLoadCompleteTopic = "internal-telemetry-geckoview-load-complete";

nit. pointer vs array to save a byte here as well?
Attachment #8971493 - Flags: review?(jrediger) → review+
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review247492

Still might be good to have a test for malicious data (that is, valid json, but not valid stored measurements), just to make sure we don't crash on it.

::: toolkit/components/telemetry/geckoview/gtest/TestGeckoView.cpp:22
(Diff revision 12)
> +#include "TelemetryTestHelpers.h"
> +
> +using namespace mozilla;
> +using namespace TelemetryTestHelpers;
> +
> +#define EXPECTED_STRING "Nice, expected and creative string."

This is unused.
Attachment #8970284 - Flags: review?(jrediger) → review+
Comment on attachment 8970284 [details]
Bug 1453591 - Add gtest coverage for the persistence logic.

https://reviewboard.mozilla.org/r/239084/#review246204

> I think we need it here: I want to make sure that the function fails because the file is not there, not because we failed to detect the presence of the file.

Sounds good to me, especially in a test.
(In reply to Chris Peterson [:cpeterson] from comment #103)
> (In reply to Alessio Placitelli [:Dexter] from comment #88)
> > > Please make these `const char kPersistence...[] = "..."`, which is slightly smaller on all of our usual systems.
> > 
> > Ok, doing that! Out of curiosity, how comes that's smaller compared to
> > `const char*`?
> 
> `const char* kPersistenceFileName = "gv_measurements.json"` reserves space
> for the string constant plus 4 or 8 bytes space for the pointer variable
> kPersistenceFileName pointing to the string constant. `const char
> kPersistenceFileName[] = "gv_measurements.json"` just reserves space for the
> string constant and code referencing kPersistenceFileName will access the
> string constant address directly.

Thank you so much, I didn't know that! #TIL :D
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

> Ok, doing that! Out of curiosity, how comes that's smaller compared to `const char*`?

Chris already did an excellent job of answering this.

> I'm not sure, but it looks like this is [already being used](https://searchfox.org/mozilla-central/rev/ce9ff94ffed34dc17ec0bfa406156d489eaa8ee1/toolkit/crashreporter/minidump-analyzer/moz.build#51) for stack traces on Fennec.

It looks like on non-android platforms, `jsoncpp` is linked into an external program, not libxul.  And on android platforms, it's linked into a separate library (?).  This use looks like the first libxul use.  And I don't think this use is the first place in libxul we have to parse JSON into something understandable, so I'd like to know if we can find an alternative before adding more code to libxul.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

> It looks like on non-android platforms, `jsoncpp` is linked into an external program, not libxul.  And on android platforms, it's linked into a separate library (?).  This use looks like the first libxul use.  And I don't think this use is the first place in libxul we have to parse JSON into something understandable, so I'd like to know if we can find an alternative before adding more code to libxul.

Mh, good point. The only thing I could find (and the one that I used at the beginning) is [JSONWriter](https://searchfox.org/mozilla-central/source/mfbt/JSONWriter.h). However I had to stop using that after I figured that there's no `JSONReader`. Then there's the JS JSON parsing from the [JS API](https://searchfox.org/mozilla-central/source/js/src/builtin/JSON.cpp). I'm not aware of any other json parsing library in our tree which is already included/linked against libxul. I'm giving [bloaty](https://github.com/google/bloaty) a try for measuring the impact of this library.
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review247620

Alessio and I talked on IRC about the jsoncpp library dependency.  It appears to increase libxul size by ~80k, which lies somewhere between ok and not great.  I am unexcited about adding another way to read json apart from `JSON.parse()`, for both the size and security aspects of jsoncpp.

Alesso said that trying to move the JSON reading into JS is somewhat non-trivial, and is going to discuss options with Georg next week.
Attachment #8969177 - Flags: review?(nfroyd)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

> Mh, good point. The only thing I could find (and the one that I used at the beginning) is [JSONWriter](https://searchfox.org/mozilla-central/source/mfbt/JSONWriter.h). However I had to stop using that after I figured that there's no `JSONReader`. Then there's the JS JSON parsing from the [JS API](https://searchfox.org/mozilla-central/source/js/src/builtin/JSON.cpp). I'm not aware of any other json parsing library in our tree which is already included/linked against libxul. I'm giving [bloaty](https://github.com/google/bloaty) a try for measuring the impact of this library.

Follow-up: the size difference using the `size` binary is about ~80KB. :froydnj raised the question of whether is acceptable or not to use `jsoncpp` here,  rather than use the JS API (i.e. `JSON.parse()`), for both security and size reasons (80KB is ok-ish). For using the JS API, I see at least two problems: (1) JS code is Telemetry instrumented, so we'd need to be extra careful about doing that to perform operations in the Telemetry core; (2) using JS API in c++ makes the code a bit convoluted. Moreover, I thought that JS code was tied to the main thread, but this seems to be no longer the case due to "worker threads". I'll discuss this with Georg next week, to decide how to unblock this.
(In reply to Nathan Froyd [:froydnj] from comment #118)
> Comment on attachment 8969177 [details]
> Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.
> 
> https://reviewboard.mozilla.org/r/237916/#review246986
> 
> > Ok, doing that! Out of curiosity, how comes that's smaller compared to `const char*`?
> 
> Chris already did an excellent job of answering this.

The binary size difference between a global string (const char foo[] = "...";) and a global pointer to string (const char* foo = "...";) is only one of the issues here, and not the most important. Glandium already did a good job explaining what's really at stake a while back:
https://glandium.org/blog/?p=2361
Attachment #8971494 - Attachment is obsolete: true
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review246986

> Follow-up: the size difference using the `size` binary is about ~80KB. :froydnj raised the question of whether is acceptable or not to use `jsoncpp` here,  rather than use the JS API (i.e. `JSON.parse()`), for both security and size reasons (80KB is ok-ish). For using the JS API, I see at least two problems: (1) JS code is Telemetry instrumented, so we'd need to be extra careful about doing that to perform operations in the Telemetry core; (2) using JS API in c++ makes the code a bit convoluted. Moreover, I thought that JS code was tied to the main thread, but this seems to be no longer the case due to "worker threads". I'll discuss this with Georg next week, to decide how to unblock this.

Ok, I got rid of `jsoncpp`: I'm using the JS API for parsing and `JSONWriter` for writing!
@jorendorff - thank you for your help yesterday :) Would you mind giving a quick look at my usage of the JS API, specifically in the `MainThreadParsePersistedProbes` in the "Bug 1453591 - Add persistence support for GeckoView in the Telemetry core." changeset?

A look at the "TelemetryScalar::LoadPersistedKeyedScalars" function in "Bug 1453591 - Add GeckoView persistence for Telemetry Scalars." would be super appreciated: the syntax seems very verbose, maybe there's some JS shortcut I'm not aware of!

@mrbkap - jorendorff mentioned you might be a good person to ask how to get a JS context early at startup (XPConnect stuff!). Am I using `if (NS_WARN_IF(!jsapi.Init(xpc::PrivilegedJunkScope())))` correctly in `MainThreadParsePersistedProbes` ("Bug 1453591 - Add persistence support for GeckoView in the Telemetry core." changeset)? Should I use something else?
Flags: needinfo?(jorendorff)
@mccr8 - flagging you for comment 130, since mrbkap is on PTO :) If you're not the right person, do you know who should I ask?
Flags: needinfo?(continuation)
Maybe Boris or Peterv would know? (The question is in the last paragraph of comment 130.)
Flags: needinfo?(peterv)
Flags: needinfo?(continuation)
Flags: needinfo?(bzbarsky)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review248596

I think this is all OK.  Thanks for working on this!

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:194
(Diff revision 13)
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  // Make sure to close the stream.
> +  auto scopedStreamClose = MakeScopeExit([inStream] { inStream->Close(); });
> +
> +  nsAutoCString buffer;

This variable looks unused now.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:286
(Diff revision 13)
> +  // Build the JSON structure.
> +  mozilla::JSONWriter w(mozilla::MakeUnique<CStringJSONWriter>());
> +  w.Start();
> +
> +  // End the building process.
> +  w.End();

I am assuming that future patches will fill in something interesting here?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:304
(Diff revision 13)
> +  // Android can kill us while we are writing to disk and, if that happens,
> +  // we end up with a corrupted json overwriting the old session data.
> +  // Luckily, |WriteToFile| is smart enough to write to a temporary file and
> +  // only overwrite the original file if nothing bad happened.
> +  const nsCString &content = static_cast<CStringJSONWriter*>(w.WriteFunc())->Get();
> +  if (NS_FAILED(WriteToFile(content.get(), persistenceFile))) {

This call is quite wasteful: `content.get` returns a `const char*`, while `WriteToFile` takes a `const std::string&`.  So the call (silently!) allocates a temporary `std::string`, only to destroy it after the call.  Just have `WriteToFile` take a `const nsCString&` instead, and pass `content` here?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:329
(Diff revision 13)
> +
> +  // If the function completes or fails, make sure to spin up the persistence timer.
> +  nsAutoCString fileContent;
> +  auto scopedArmTimer = MakeScopeExit([&] {
> +    NS_DispatchToMainThread(
> +      NS_NewRunnableFunction("MainThreadArmPersistenceTimer", [fileContent]() -> void {

It would be great if this was something like `[nsCString fileContent = Move(fileContent)]() -> void { ... }` so we're not copying `fileContent` needlessly.  (I think that syntax does what we want here...)  If that doesn't actually work, I guess we will live with it.
Attachment #8969177 - Flags: review?(nfroyd) → review+
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review248854

Thanks, this looks solid with the comments/questions below cleared up.

::: toolkit/components/telemetry/Telemetry.cpp:1877
(Diff revision 13)
>  NS_IMETHODIMP
> +TelemetryImpl::ClearProbes()
> +{
> +#if defined(MOZ_WIDGET_ANDROID)
> +  // We only support this in GeckoView.
> +  if (GetCurrentProduct() != SupportedProduct::Geckoview) {

Should this be a debug assertion?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.h:14
(Diff revision 13)
> +#define GeckoViewTelemetryPersistence_h__
> +
> +namespace TelemetryGeckoViewPersistence {
> +
> +/**
> + * Initializes the GeckoView persistence.

Does this have any side effects that i, as a reader, should be aware of?
Ditto below.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:49
(Diff revision 13)
> +#else
> +// If we're building for other platforms (e.g. for running test coverage), try
> +// to print something anyway.
> +#define ANDROID_LOG(...) printf_stderr("\n**** TELEMETRY: " __VA_ARGS__)
> +#endif // MOZ_WIDGET_ANDROID

Per the moz.build, we're only including this source on Android builds.
Why do we try to handle other platforms here?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:69
(Diff revision 13)
> +// The name of the persistence file used for saving the
> +// measurements.
> +const char16_t kPersistenceFileName[] = u"gv_measurements.json";
> +
> +// The timer used for persisting measurements data.
> +nsITimer* gPersistenceTimer;

Initialize this to `nullptr`?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:287
(Diff revision 13)
> +        MainThreadArmPersistenceTimer();
> +      }));
> +  });
> +
> +  // Build the JSON structure.
> +  mozilla::JSONWriter w(mozilla::MakeUnique<CStringJSONWriter>());

From my understanding, we build up one contiguous string buffer to hold all the JSON data.
Bug 1457624 has some rough numbers on what we assume the payload sizes could be here.

This means that we might get contiguous buffer sizes of 30kb+ for release and 400kb+ for nightly.
Is that realistic to allocate successfully without streaming?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:386
(Diff revision 13)
> +  gPersistenceThread = thread.forget();
> +
> +  // Trigger the loading of the persistence data. After the function
> +  // completes it will automatically arm the persistence timer.
> +  gPersistenceThread->Dispatch(
> +    NS_NewRunnableFunction("PersistenceThreadLoadData", []() -> void {

Nit: Does it have benefits to use a lambda vs. just `NS_NewRunnableFunction("foo", &func)`?

::: toolkit/components/telemetry/nsITelemetry.idl:579
(Diff revision 13)
> +   * Reset the storage for all the collection primitives that GeckoView supports.
> +   * Please note that this is only intended to be used by GeckoViewTelemetryController.

How is this intended to be used?
I don't see any usage of this in patches on this bug.
Can you add context on what this exists for in the comment?
Attachment #8969177 - Flags: review?(gfritzsche) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review248868

This looks good from a high-level perspectice, cheers!
I only have nits below that i'd like to see cleared up.

::: toolkit/components/telemetry/TelemetryScalar.h:96
(Diff revision 13)
>                           const mozilla::Telemetry::DiscardedData& aDiscardedData);
>  
>  void GetDynamicScalarDefinitions(nsTArray<mozilla::Telemetry::DynamicScalarDefinition>&);
>  void AddDynamicScalarDefinitions(const nsTArray<mozilla::Telemetry::DynamicScalarDefinition>&);
>  
> +// These functions are only meant to be used for GeckoView persistence.

Can you add a line on what they are good for?

::: toolkit/components/telemetry/TelemetryScalar.h:98
(Diff revision 13)
> +nsresult PersistScalars(mozilla::JSONWriter &aWriter);
> +nsresult PersistKeyedScalars(mozilla::JSONWriter &aWriter);
> +nsresult LoadPersistedScalars(JSContext* aCx, JS::HandleValue aData);
> +nsresult LoadPersistedKeyedScalars(JSContext* aCx, JS::HandleValue aData);

Nit: Would "serialize"/"unserialize" be more descriptive of what these do?

::: toolkit/components/telemetry/TelemetryScalar.cpp:268
(Diff revision 13)
> +#if defined(MOZ_WIDGET_ANDROID)
> +/**
> + * Write a nsIVariant with a JSONWriter, used for GeckoView persistence.
> + */
> +nsresult
> +VariantToJsonValue(uint32_t aScalarType, nsIVariant* aInputValue,

Nit: Can we name this more descriptively, e.g. `WriteVariantToJSONWriter()`?

::: toolkit/components/telemetry/TelemetryScalar.cpp:1572
(Diff revision 13)
> +                           ScalarSnapshotTable& aScalarsToReflect,
> +                           unsigned int aDataset, bool aClearScalars)
> +{
> +  // The snapshotting function is the same for both static and dynamic builtin scalars.
> +  // We can use the same function and store the scalars in the same output storage.
> +  auto snapshotter = [aDataset, &aLock, &aScalarsToReflect]

Is there a specific reason to make these helpers a lambda instead of a free function?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3158
(Diff revision 13)
> +
> +  return NS_OK;
> +}
> +
> +/**
> + * Load the persisted measurements from a Json object and injects them

Super nit: "load" & "injects" - pick one form?
Attachment #8970283 - Flags: review?(gfritzsche) → review+
I don't see a good reason to use PrivilegedJunkScope here.

It _might_ be OK to use UnprivilegedJunkScope if you really have to.

But better yet would be to use a pattern like what the webidl dictionary Init(const nsAString&) methods do: SimpleGlobalObject::Create(SimpleGlobalObject::GlobalType::BindingDetail) followed by initing your AutoJSAPI with that object.  It will cost a bit more time to create a global here instead of reusing an existing one, so you might want to measure that on your target setup and decide whether that's a viable cost, but it would ensure that you don't have to worry about who else might have messed with your global's standard objects.... and the JS_GetProperty calls are definitely assuming no one has.

It's probably worth filing a followup to create some more GlobalType values, since BindingDetail seems to be leaking out of bindings.

It's also worth checking whether you could in fact just use a webidl dictionary and avoid the manual JSAPI work.  I haven't checked over the processing model for the parsed JSON in detail to see whether that would be feasible.
Flags: needinfo?(bzbarsky)
Oh, and if you do use UnprivilegedJunkScope or PrivilegedJunkScope, please have bholley OK it when he gets back in June.
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249014

::: toolkit/components/telemetry/TelemetryScalar.cpp:3301
(Diff revision 13)
> +  }
> +
> +  for (size_t i = 0, n = processes.length(); i < n; i++) {
> +    // Get the process name.
> +    nsAutoJSString processNameJS;
> +    if (!processNameJS.init(aCx, processes[i])) {

If init() fails, you have two options:

1) Manually suppress the exception and press on to do other things.

2) Return error to caller.

Doing what this code does, pressing on with an exception on the JSContext, will cause various assertions and is generally undefined in terms of how it behaves.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3316
(Diff revision 13)
> +      continue;
> +    }
> +
> +    // And its probes.
> +    JS::RootedValue processData(aCx);
> +    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)

If JS_GetPropertyById returns false you have to suppress the exception or fail out.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3317
(Diff revision 13)
> +    }
> +
> +    // And its probes.
> +    JS::RootedValue processData(aCx);
> +    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)
> +        || !processData.isObject()) {

If processData is not an object, do you want to just silently ignore it, or to fail out?

Either way, document which choice is being made and why, please.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3324
(Diff revision 13)
> +    }
> +
> +    // Iterate through each keyed scalar.
> +    JS::RootedObject processDataObj(aCx, &processData.toObject());
> +    JS::Rooted<JS::IdVector> keyedScalars(aCx, JS::IdVector(aCx));
> +    if (!JS_Enumerate(aCx, processDataObj, &keyedScalars)) {

If JS_Enumerate returns false you need to suppress exeption before pressing on, or fail out.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3331
(Diff revision 13)
> +    }
> +
> +    for (size_t i = 0, n = keyedScalars.length(); i < n; i++) {
> +      // Get the scalar name.
> +      nsAutoJSString scalarName;
> +      if (!scalarName.init(aCx, keyedScalars[i])) {

If init() fails... you know the drill by now.  ;)

::: toolkit/components/telemetry/TelemetryScalar.cpp:3337
(Diff revision 13)
> +        continue;
> +      }
> +
> +      // Get the data for this keyed scalar.
> +      JS::RootedValue keyedScalarData(aCx);
> +      if (!JS_GetPropertyById(aCx, processDataObj, keyedScalars[i], &keyedScalarData)

If JS_GetPropertyById fails... see above.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3338
(Diff revision 13)
> +      }
> +
> +      // Get the data for this keyed scalar.
> +      JS::RootedValue keyedScalarData(aCx);
> +      if (!JS_GetPropertyById(aCx, processDataObj, keyedScalars[i], &keyedScalarData)
> +          || !processData.isObject()) {

Again, please document why this ignores non-objects instead of failing.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3345
(Diff revision 13)
> +      }
> +
> +      // Get the keys in the keyed scalar.
> +      JS::RootedObject keyedScalarDataObj(aCx, &keyedScalarData.toObject());
> +      JS::Rooted<JS::IdVector> keys(aCx, JS::IdVector(aCx));
> +      if (!JS_Enumerate(aCx, keyedScalarDataObj, &keys)) {

If this fails.... you know.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3352
(Diff revision 13)
> +      }
> +
> +      for (size_t j = 0, m = keys.length(); j < m; j++) {
> +        // Get the process name.
> +        nsAutoJSString keyName;
> +        if (!keyName.init(aCx, keys[i])) {

If this fails, the usual.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3358
(Diff revision 13)
> +          continue;
> +        }
> +
> +        // Get the scalar value as a JS value.
> +        JS::RootedValue scalarValue(aCx);
> +        if (!JS_GetPropertyById(aCx, keyedScalarDataObj, keys[i], &scalarValue)

If this fails, the usual.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3359
(Diff revision 13)
> +        }
> +
> +        // Get the scalar value as a JS value.
> +        JS::RootedValue scalarValue(aCx);
> +        if (!JS_GetPropertyById(aCx, keyedScalarDataObj, keys[i], &scalarValue)
> +            || scalarValue.isNullOrUndefined()) {

Document this choice.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3367
(Diff revision 13)
> +
> +        // Unpack the aVal to nsIVariant.
> +        nsCOMPtr<nsIVariant> unpackedVal;
> +        nsresult rv =
> +          nsContentUtils::XPConnect()->JSToVariant(aCx, scalarValue,  getter_AddRefs(unpackedVal));
> +        if (NS_FAILED(rv)) {

Pretty sure a failure rv here needs to be treated just like a JSAPI false return: suppress explicitly or have to bail out.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3375
(Diff revision 13)
> +
> +        // Add the scalar to the map.
> +        PersistedKeyedScalarArray& processScalars =
> +          scalarsToUpdate.GetOrInsert(static_cast<uint32_t>(processID));
> +        processScalars.AppendElement(
> +          mozilla::MakeTuple(nsCString(NS_ConvertUTF16toUTF8(scalarName)),

Do you need the explicit nsCString() bit even though NS_ConvertUTF16toUTF8 is a subclass of nsCString?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:270
(Diff revision 13)
>    }
> +
> +  // Get the data for the scalars.
> +  JS::RootedObject dataObj(jsapi.cx(), &data.toObject());
> +  JS::RootedValue scalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "scalars", &scalarData)

If JS_GetProperty returns false, need to suppress or return...

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:271
(Diff revision 13)
> +
> +  // Get the data for the scalars.
> +  JS::RootedObject dataObj(jsapi.cx(), &data.toObject());
> +  JS::RootedValue scalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "scalars", &scalarData)
> +      && scalarData.isObject()) {

Document the choice here.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:272
(Diff revision 13)
> +  // Get the data for the scalars.
> +  JS::RootedObject dataObj(jsapi.cx(), &data.toObject());
> +  JS::RootedValue scalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "scalars", &scalarData)
> +      && scalarData.isObject()) {
> +    if (NS_FAILED(TelemetryScalar::LoadPersistedScalars(jsapi.cx(), scalarData))) {

If that fails, suppress or return....

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:278
(Diff revision 13)
> +      ANDROID_LOG("MainThreadParsePersistedProbes - Failed to parse 'scalars'.");
> +    }
> +  }
> +
> +  JS::RootedValue keyedScalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "keyedScalars", &keyedScalarData)

If this fails, etc.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:279
(Diff revision 13)
> +    }
> +  }
> +
> +  JS::RootedValue keyedScalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "keyedScalars", &keyedScalarData)
> +      && keyedScalarData.isObject()) {

Document the choice.
Attachment #8970283 - Flags: review-
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review249144

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:49
(Diff revision 13)
> +#else
> +// If we're building for other platforms (e.g. for running test coverage), try
> +// to print something anyway.
> +#define ANDROID_LOG(...) printf_stderr("\n**** TELEMETRY: " __VA_ARGS__)
> +#endif // MOZ_WIDGET_ANDROID

The gtest patch (the last in the stack of changesets) also includes this on other platforms for running test coverage. These log handling is helpful, for example, when running gtests on Windows (on Treeherder).

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:69
(Diff revision 13)
> +// The name of the persistence file used for saving the
> +// measurements.
> +const char16_t kPersistenceFileName[] = u"gv_measurements.json";
> +
> +// The timer used for persisting measurements data.
> +nsITimer* gPersistenceTimer;

That's what I had before. However, Nathan mentioned that it's not needed (see comment 85): static pointers are already initialized to `nullptr`.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:286
(Diff revision 13)
> +  // Build the JSON structure.
> +  mozilla::JSONWriter w(mozilla::MakeUnique<CStringJSONWriter>());
> +  w.Start();
> +
> +  // End the building process.
> +  w.End();

Yes, other patches in the stack already fill in scalar persistence.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:287
(Diff revision 13)
> +        MainThreadArmPersistenceTimer();
> +      }));
> +  });
> +
> +  // Build the JSON structure.
> +  mozilla::JSONWriter w(mozilla::MakeUnique<CStringJSONWriter>());

Mh, good point. I'm changing this to allow for streamed file writes.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:304
(Diff revision 13)
> +  // Android can kill us while we are writing to disk and, if that happens,
> +  // we end up with a corrupted json overwriting the old session data.
> +  // Luckily, |WriteToFile| is smart enough to write to a temporary file and
> +  // only overwrite the original file if nothing bad happened.
> +  const nsCString &content = static_cast<CStringJSONWriter*>(w.WriteFunc())->Get();
> +  if (NS_FAILED(WriteToFile(content.get(), persistenceFile))) {

I've removed this chunk of code and introduced `StreamingJSONWriter` that performs direct disk writes with no intermediate buffer.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:329
(Diff revision 13)
> +
> +  // If the function completes or fails, make sure to spin up the persistence timer.
> +  nsAutoCString fileContent;
> +  auto scopedArmTimer = MakeScopeExit([&] {
> +    NS_DispatchToMainThread(
> +      NS_NewRunnableFunction("MainThreadArmPersistenceTimer", [fileContent]() -> void {

Yeah, this doesn't seem to work. The compiler complains a bit:

```
/mozilla-unified/toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:364:20: error: expected ',' or ']' in lambda capture list
 0:45.23         [nsCString fileContent = mozilla::Move(fileContent)]() -> void {
```

I guess [this](https://isocpp.org/wiki/faq/cpp14-language#lambda-captures) isn't supported yet by our compilers.

::: toolkit/components/telemetry/nsITelemetry.idl:579
(Diff revision 13)
> +   * Reset the storage for all the collection primitives that GeckoView supports.
> +   * Please note that this is only intended to be used by GeckoViewTelemetryController.

Right, let me clarify the intended usage. The first consumer will come with bug 1457472 (which is currently blocked on this bug).
(In reply to Alessio Placitelli [:Dexter] from comment #144)
> :::
> toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:304
> (Diff revision 13)
> > +  // Android can kill us while we are writing to disk and, if that happens,
> > +  // we end up with a corrupted json overwriting the old session data.
> > +  // Luckily, |WriteToFile| is smart enough to write to a temporary file and
> > +  // only overwrite the original file if nothing bad happened.
> > +  const nsCString &content = static_cast<CStringJSONWriter*>(w.WriteFunc())->Get();
> > +  if (NS_FAILED(WriteToFile(content.get(), persistenceFile))) {
> 
> I've removed this chunk of code and introduced `StreamingJSONWriter` that
> performs direct disk writes with no intermediate buffer.

Ok, great. Do we have the same concern when reading from disk?
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249250

::: toolkit/components/telemetry/TelemetryScalar.cpp:3301
(Diff revision 13)
> +  }
> +
> +  for (size_t i = 0, n = processes.length(); i < n; i++) {
> +    // Get the process name.
> +    nsAutoJSString processNameJS;
> +    if (!processNameJS.init(aCx, processes[i])) {

Wow, this is huge! Thanks for spotting that!

::: toolkit/components/telemetry/TelemetryScalar.cpp:3375
(Diff revision 13)
> +
> +        // Add the scalar to the map.
> +        PersistedKeyedScalarArray& processScalars =
> +          scalarsToUpdate.GetOrInsert(static_cast<uint32_t>(processID));
> +        processScalars.AppendElement(
> +          mozilla::MakeTuple(nsCString(NS_ConvertUTF16toUTF8(scalarName)),

Apparently so, the compiler will complain about unmatching arguments otherwise.

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:272
(Diff revision 13)
> +  // Get the data for the scalars.
> +  JS::RootedObject dataObj(jsapi.cx(), &data.toObject());
> +  JS::RootedValue scalarData(jsapi.cx());
> +  if (JS_GetProperty(jsapi.cx(), dataObj, "scalars", &scalarData)
> +      && scalarData.isObject()) {
> +    if (NS_FAILED(TelemetryScalar::LoadPersistedScalars(jsapi.cx(), scalarData))) {

Given the last changes, that function is suppressing exceptions itself. Should I suppress here as well?
(In reply to Georg Fritzsche [:gfritzsche] from comment #145)
> (In reply to Alessio Placitelli [:Dexter] from comment #144)
> > :::
> > toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:304
> > (Diff revision 13)
> > > +  // Android can kill us while we are writing to disk and, if that happens,
> > > +  // we end up with a corrupted json overwriting the old session data.
> > > +  // Luckily, |WriteToFile| is smart enough to write to a temporary file and
> > > +  // only overwrite the original file if nothing bad happened.
> > > +  const nsCString &content = static_cast<CStringJSONWriter*>(w.WriteFunc())->Get();
> > > +  if (NS_FAILED(WriteToFile(content.get(), persistenceFile))) {
> > 
> > I've removed this chunk of code and introduced `StreamingJSONWriter` that
> > performs direct disk writes with no intermediate buffer.
> 
> Ok, great. Do we have the same concern when reading from disk?

We do, but we have no easy way to perform chunked JSON parsing AFAICT. I filed bug 1460911 for dealing with this if it becomes a problem.
Clearing ni? as :bz took care of that.
Flags: needinfo?(peterv)
Flags: needinfo?(jorendorff)
Comment on attachment 8969177 [details]
Bug 1453591 - Add persistence support for GeckoView in the Telemetry core.

https://reviewboard.mozilla.org/r/237916/#review249328

r=me on the MainThreadParsePersistedProbes() bit.
Attachment #8969177 - Flags: review?(bzbarsky) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249332

There's a bunch of specific comments on specific callsites, but really they're all asking for the same thing: a high-level description of the error-handling and recovery strategy around this stuff.  Right now some JSAPI failures are treated as "stop everything", some are treated as "just press on", and it's not clear why the difference are there and which behaviors are the right ones in terms of the desired overall behavior.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3230
(Diff revisions 13 - 14)
>  
>    for (size_t i = 0, n = processes.length(); i < n; i++) {
>      // Get the process name.
>      nsAutoJSString processNameJS;
>      if (!processNameJS.init(aCx, processes[i])) {
> +      JS_ClearPendingException(aCx);

Please document why this is the right behavior, as opposed to failing out.  It's not at all obvious why just ignoring an exception here and pressing on with bogus data is the right thing.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3246
(Diff revisions 13 - 14)
>      }
>  
>      // And its probes.
>      JS::RootedValue processData(aCx);
> -    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)
> -        || !processData.isObject()) {
> +    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)) {
> +      JS_ClearPendingException(aCx);

Again, need to explain why this is a reasonable thing to do.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3252
(Diff revisions 13 - 14)
> +      continue;
> +    }
> +
> +    if (!processData.isObject()) {
> +      // |processData| should be an object containing scalars. If this is
> +      // not the case, silently skip and try to load the data for the other

That's explaining _what_ the code is doing.  That's obvious from the code already.  What's needed is an explanation of _why_ the code is doing that.  What does it actually mean if processData is not an object?  Corrupt data?  Something else?  Why is pressing on the right choice?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3261
(Diff revisions 13 - 14)
>  
>      // Iterate through each scalar.
>      JS::RootedObject processDataObj(aCx, &processData.toObject());
>      JS::Rooted<JS::IdVector> scalars(aCx, JS::IdVector(aCx));
>      if (!JS_Enumerate(aCx, processDataObj, &scalars)) {
> +      JS_ClearPendingException(aCx);

Again, need to explain why.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3276
(Diff revisions 13 - 14)
>        }
>  
>        // Get the scalar value as a JS value.
>        JS::RootedValue scalarValue(aCx);
> -      if (!JS_GetPropertyById(aCx, processDataObj, scalars[i], &scalarValue)
> -          || scalarValue.isNullOrUndefined()) {
> +      if (!JS_GetPropertyById(aCx, processDataObj, scalars[i], &scalarValue)) {
> +        JS_ClearPendingException(aCx);

Explain why.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3281
(Diff revisions 13 - 14)
> -          || scalarValue.isNullOrUndefined()) {
> +        JS_ClearPendingException(aCx);
> +        continue;
> +      }
> +
> +      if (scalarValue.isNullOrUndefined()) {
> +        // We can't set scalars to null or undefined values, skip this

Again, need to explain why this is the right choice...

::: toolkit/components/telemetry/TelemetryScalar.cpp:3291
(Diff revisions 13 - 14)
>        // Unpack the aVal to nsIVariant.
>        nsCOMPtr<nsIVariant> unpackedVal;
>        nsresult rv =
>          nsContentUtils::XPConnect()->JSToVariant(aCx, scalarValue,  getter_AddRefs(unpackedVal));
>        if (NS_FAILED(rv)) {
> +        JS_ClearPendingException(aCx);

Explain why.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3357
(Diff revisions 13 - 14)
>  
>    for (size_t i = 0, n = processes.length(); i < n; i++) {
>      // Get the process name.
>      nsAutoJSString processNameJS;
>      if (!processNameJS.init(aCx, processes[i])) {
> +      JS_ClearPendingException(aCx);

Why?  Especially given that if JS_Enumerate above fails _that_ is treated as a failure condition that immediately returns...

::: toolkit/components/telemetry/TelemetryScalar.cpp:3373
(Diff revisions 13 - 14)
>      }
>  
>      // And its probes.
>      JS::RootedValue processData(aCx);
> -    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)
> -        || !processData.isObject()) {
> +    if (!JS_GetPropertyById(aCx, scalarDataObj, processes[i], &processData)) {
> +      JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3379
(Diff revisions 13 - 14)
> +      continue;
> +    }
> +
> +    if (!processData.isObject()) {
> +      // |processData| should be an object containing scalars. If this is
> +      // not the case, silently skip and try to load the data for the other

Again, explain why, not what.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3388
(Diff revisions 13 - 14)
>  
>      // Iterate through each keyed scalar.
>      JS::RootedObject processDataObj(aCx, &processData.toObject());
>      JS::Rooted<JS::IdVector> keyedScalars(aCx, JS::IdVector(aCx));
>      if (!JS_Enumerate(aCx, processDataObj, &keyedScalars)) {
> +      JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3396
(Diff revisions 13 - 14)
>  
>      for (size_t i = 0, n = keyedScalars.length(); i < n; i++) {
>        // Get the scalar name.
>        nsAutoJSString scalarName;
>        if (!scalarName.init(aCx, keyedScalars[i])) {
> +        JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3403
(Diff revisions 13 - 14)
>        }
>  
>        // Get the data for this keyed scalar.
>        JS::RootedValue keyedScalarData(aCx);
> -      if (!JS_GetPropertyById(aCx, processDataObj, keyedScalars[i], &keyedScalarData)
> -          || !processData.isObject()) {
> +      if (!JS_GetPropertyById(aCx, processDataObj, keyedScalars[i], &keyedScalarData)) {
> +        JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3409
(Diff revisions 13 - 14)
> +        continue;
> +      }
> +
> +      if (!keyedScalarData.isObject()) {
> +        // Keyed scalar data need to be an object. If that's not the case, skip it
> +        // and try to load the rest of the data.

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3417
(Diff revisions 13 - 14)
>  
>        // Get the keys in the keyed scalar.
>        JS::RootedObject keyedScalarDataObj(aCx, &keyedScalarData.toObject());
>        JS::Rooted<JS::IdVector> keys(aCx, JS::IdVector(aCx));
>        if (!JS_Enumerate(aCx, keyedScalarDataObj, &keys)) {
> +        JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3425
(Diff revisions 13 - 14)
>  
>        for (size_t j = 0, m = keys.length(); j < m; j++) {
>          // Get the process name.
>          nsAutoJSString keyName;
>          if (!keyName.init(aCx, keys[i])) {
> +          JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3432
(Diff revisions 13 - 14)
>          }
>  
>          // Get the scalar value as a JS value.
>          JS::RootedValue scalarValue(aCx);
> -        if (!JS_GetPropertyById(aCx, keyedScalarDataObj, keys[i], &scalarValue)
> -            || scalarValue.isNullOrUndefined()) {
> +        if (!JS_GetPropertyById(aCx, keyedScalarDataObj, keys[i], &scalarValue)) {
> +          JS_ClearPendingException(aCx);

Why?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3438
(Diff revisions 13 - 14)
> +          continue;
> +        }
> +
> +        if (scalarValue.isNullOrUndefined()) {
> +          // We can't set scalars to null or undefined values, skip this
> +          // and try to load other scalars.

Why is that the right thing?

::: toolkit/components/telemetry/TelemetryScalar.cpp:3447
(Diff revisions 13 - 14)
>          // Unpack the aVal to nsIVariant.
>          nsCOMPtr<nsIVariant> unpackedVal;
>          nsresult rv =
>            nsContentUtils::XPConnect()->JSToVariant(aCx, scalarValue,  getter_AddRefs(unpackedVal));
>          if (NS_FAILED(rv)) {
> +          JS_ClearPendingException(aCx);

Why?
Attachment #8970283 - Flags: review?(bzbarsky) → review-
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249336

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:278
(Diff revision 14)
> +        || NS_FAILED(TelemetryScalar::UnserializePersistedScalars(jsapi.cx(), scalarData))) {
> +      ANDROID_LOG("MainThreadParsePersistedProbes - Failed to parse 'scalars'.");
> +    }
> +  } else {
> +    // Getting the "scalars" property failed, suppress the exception
> +    // and continue.

Why is this the right choice?

::: toolkit/components/telemetry/geckoview/TelemetryGeckoViewPersistence.cpp:293
(Diff revision 14)
> +      ANDROID_LOG("MainThreadParsePersistedProbes - Failed to parse 'keyedScalars'.");
> +    }
> +  } else {
> +    // Getting the "keyedScalars" property failed, suppress the exception
> +    // and continue.
> +    JS_ClearPendingException(jsapi.cx());

Again, why is this the right choice?
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249250

> Given the last changes, that function is suppressing exceptions itself. Should I suppress here as well?

If the callee guarantees it will never leave an exception, you could just assert that here.
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249332

> Please document why this is the right behavior, as opposed to failing out.  It's not at all obvious why just ignoring an exception here and pressing on with bogus data is the right thing.

Sure! Instead of commenting each single `continue` statement, I've added a top level comment to both functions. The idea here is to load as much data as possible from the file, even in presence of unexpected formats or data corruption. This is coherent with the behaviour of the rest of this module and the other telemetry probe modules (histograms, events). Does this make things clearer?
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249580

::: toolkit/components/telemetry/TelemetryScalar.cpp:3232
(Diff revision 16)
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  // The following block of code attempts to extract as much data as possible
> +  // from the serialized JSON, even in case of light data corruptions: if, for example,
> +  // the data for a single process is corrupted or is in an unexpected for, we press on

small typo.
should be "an unexpected form"

::: toolkit/components/telemetry/TelemetryScalar.cpp:3234
(Diff revision 16)
> +
> +  // The following block of code attempts to extract as much data as possible
> +  // from the serialized JSON, even in case of light data corruptions: if, for example,
> +  // the data for a single process is corrupted or is in an unexpected for, we press on
> +  // and attempt to load the data for the other processes.
> +  for (size_t i = 0, n = processes.length(); i < n; i++) {

Can this use a new-style for loop? `JS::IdVector` should be iterable.

```cpp
for (auto process : processes) { ... }
```

::: toolkit/components/telemetry/TelemetryScalar.cpp:3365
(Diff revision 16)
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  // The following block of code attempts to extract as much data as possible
> +  // from the serialized JSON, even in case of light data corruptions: if, for example,
> +  // the data for a single process is corrupted or is in an unexpected for, we press on

typo. correct: "unexpected form"
Attachment #8970283 - Flags: review+
> The idea here is to load as much data as possible from the file, even in presence of
> unexpected formats or data corruption.

OK. That makes the behavior a lot clearer!
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249688

::: toolkit/components/telemetry/TelemetryScalar.cpp:3235
(Diff revisions 14 - 17)
> -  for (size_t i = 0, n = processes.length(); i < n; i++) {
> +  // The following block of code attempts to extract as much data as possible
> +  // from the serialized JSON, even in case of light data corruptions: if, for example,
> +  // the data for a single process is corrupted or is in an unexpected form, we press on
> +  // and attempt to load the data for the other processes.
> +  JS::RootedId process(aCx);
> +  for (auto processVal : processes) {

auto&, I would think.

::: toolkit/components/telemetry/TelemetryScalar.cpp:3278
(Diff revisions 14 - 17)
>        JS_ClearPendingException(aCx);
>        continue;
>      }
>  
> -    for (size_t i = 0, n = scalars.length(); i < n; i++) {
> +    JS::RootedId scalar(aCx);
> +    for (auto scalarVal : scalars) {

auto&

::: toolkit/components/telemetry/TelemetryScalar.cpp:3374
(Diff revisions 14 - 17)
> -  for (size_t i = 0, n = processes.length(); i < n; i++) {
> +  // The following block of code attempts to extract as much data as possible
> +  // from the serialized JSON, even in case of light data corruptions: if, for example,
> +  // the data for a single process is corrupted or is in an unexpected form, we press on
> +  // and attempt to load the data for the other processes.
> +  JS::RootedId process(aCx);
> +  for (auto processVal : processes) {

auto&

::: toolkit/components/telemetry/TelemetryScalar.cpp:3415
(Diff revisions 14 - 17)
>        JS_ClearPendingException(aCx);
>        continue;
>      }
>  
> -    for (size_t i = 0, n = keyedScalars.length(); i < n; i++) {
> +    JS::RootedId keyedScalar(aCx);
> +    for (auto keyedScalarVal : keyedScalars) {

auto&

::: toolkit/components/telemetry/TelemetryScalar.cpp:3446
(Diff revisions 14 - 17)
>          JS_ClearPendingException(aCx);
>          continue;
>        }
>  
> -      for (size_t j = 0, m = keys.length(); j < m; j++) {
> +      JS::RootedId key(aCx);
> +      for (auto keyVal : keys) {

auto&
Attachment #8970283 - Flags: review?(bzbarsky) → review+
Comment on attachment 8970283 [details]
Bug 1453591 - Add GeckoView persistence for Telemetry Scalars.

https://reviewboard.mozilla.org/r/239082/#review249014

> Do you need the explicit nsCString() bit even though NS_ConvertUTF16toUTF8 is a subclass of nsCString?

Apparently I do :(
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/8e56f01b7ae4
Add persistence support for GeckoView in the Telemetry core. r=bz,chutten,froydnj,gfritzsche,janerik
https://hg.mozilla.org/integration/autoland/rev/47634ee93be3
Add a GetIDForProcessName utility function in TelemetryCommon. r=chutten,janerik
https://hg.mozilla.org/integration/autoland/rev/e5cf4d14019f
Add GeckoView persistence for Telemetry Scalars. r=bz,chutten,gfritzsche
https://hg.mozilla.org/integration/autoland/rev/f128be7d5256
Add a signal for checking when persisted probes loading completes. r=chutten,esawin,janerik
https://hg.mozilla.org/integration/autoland/rev/6bd51139f05a
Add gtest coverage for the persistence logic. r=chutten,froydnj,janerik
Flags: needinfo?(alessio.placitelli)
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ed876c9515d7
Add persistence support for GeckoView in the Telemetry core. r=bz,chutten,froydnj,gfritzsche,janerik
https://hg.mozilla.org/integration/autoland/rev/bda1ce6dee6b
Add a GetIDForProcessName utility function in TelemetryCommon. r=chutten,janerik
https://hg.mozilla.org/integration/autoland/rev/110f98c4f032
Add GeckoView persistence for Telemetry Scalars. r=bz,chutten,gfritzsche
https://hg.mozilla.org/integration/autoland/rev/bd5719b60102
Add a signal for checking when persisted probes loading completes. r=chutten,esawin,janerik
https://hg.mozilla.org/integration/autoland/rev/2fa8aedae0ff
Add gtest coverage for the persistence logic. r=chutten,froydnj,janerik
Attachment #8970283 - Flags: review?(jrediger)
Depends on: 1462825
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: