Closed Bug 1542709 Opened 6 years ago Closed 5 years ago

Enable “metrics.yaml” ingestion from "a-c"'s components in the “probe_scraper”

Categories

(Data Platform and Tools :: Glean: SDK, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: mdroettboom)

References

Details

(Whiteboard: [telemetry:mobilesdk:m8])

Attachments

(3 files)

While talking to Frank, we realized that there might be some problems with how the probe-scraper ingests the "metrics.yaml" files defined in app components (and with the consequent metric definitions in the data views).

Hey Frank, would you kindly expand on this and describe what the problem is, so that we can kick off the conversation about the solution? :) I know that this is probably more of a pipeline bug; I'm happy to transition it there or dupe it against something you already have on file if needed.

Flags: needinfo?(fbertsch)
Priority: -- → P1
Whiteboard: [telemetry:mobilesdk:m8]

Currently, we only parse the application's metrics.yaml and the standard Glean metrics.yaml. That means any additional metrics will not be included in the output, for example those that come from a component that also reports Glean metrics.

To include those, I propose the following:

Requirements:

  • For a component's Glean metric to be reported for any application, that component must be included in the probe-info-service.
  • For an application to include component Glean metrics, its dependencies must be defined in a single (configurable) location, e.g. Dependencies.kt. Currently, I don't see a good way to do this with the A-C dependency in Fenix.
  • Glean metrics never change in a BigQuery schema-incompatible way (for example, changing types). This is not a new requirement but is especially important here.
  • The BigQuery tables created for any application using a component will always be the latest version of that component's metrics, even if the dependency is not at the latest version.
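
The type-stability requirement could be checked mechanically. The following is a minimal sketch, assuming two snapshots of metric definitions have already been loaded into dictionaries keyed by metric name, each carrying the `type` field from metrics.yaml. The function name and the sample metrics are hypothetical, not part of probe-scraper.

```python
def incompatible_changes(old, new):
    """Return the names of metrics whose type differs between two snapshots."""
    return sorted(
        name
        for name, definition in new.items()
        if name in old and old[name]["type"] != definition["type"]
    )


# Hypothetical snapshots of the same component at two versions.
OLD = {"example.loads": {"type": "counter"}, "example.name": {"type": "string"}}
NEW = {
    "example.loads": {"type": "string"},  # type changed: schema-incompatible
    "example.name": {"type": "string"},  # unchanged
    "example.added": {"type": "boolean"},  # new metric: compatible
}

print(incompatible_changes(OLD, NEW))
```

Only type changes are flagged here; other incompatible changes would need their own checks.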

Given these requirements, we can:

  • In probe-scraper:
    1. Parse the metrics.yaml for every application
    2. Parse and include an additional included dependencies file for every application
  • During schema generation:
    1. Create a dependency graph for every application
    2. Include metrics and pings in an application for any dependency it includes
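
The schema-generation steps above can be sketched roughly as follows. This is an illustration only: the graph and metric dictionaries are made-up inputs, the function names are hypothetical, and pings are left out for brevity.

```python
def transitive_dependencies(name, graph):
    """Return everything reachable from `name` in `graph`, excluding `name`."""
    seen = {name}
    stack = list(graph.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen - {name}


def metrics_for_application(app, graph, metrics_by_repo):
    """Merge an application's own metrics with those of all its dependencies."""
    merged = dict(metrics_by_repo.get(app, {}))
    for dependency in sorted(transitive_dependencies(app, graph)):
        for metric, definition in metrics_by_repo.get(dependency, {}).items():
            # Keep the first definition seen; a real implementation would
            # need an explicit policy for conflicting definitions.
            merged.setdefault(metric, definition)
    return merged


# Hypothetical data: names and metrics are made up for illustration.
GRAPH = {"example-app": ["library-a"], "library-a": ["glean"]}
METRICS = {
    "example-app": {"app.opened": {"type": "event"}},
    "library-a": {"library_a.loads": {"type": "counter"}},
    "glean": {"glean.example": {"type": "string"}},
}

print(sorted(metrics_for_application("example-app", GRAPH, METRICS)))
```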

Alessio, can your team look at how we handle dependencies, specifically the Android-Components, and come up with a way for us to parse them? That is the only piece I'm not clear about handling.

Flags: needinfo?(fbertsch) → needinfo?(aplacitelli)
Flags: needinfo?(aplacitelli) → needinfo?(alessio.placitelli)
Assignee: nobody → mdroettboom

We can get the dependencies for an Android app using:

./gradlew app:dependencies --configuration implementation

Which produces something like:

implementation - Implementation only dependencies for 'main' sources. (n)
+--- org.jetbrains.kotlin:kotlin-android-extensions-runtime:1.3.21 (n)
+--- project architecture (n)                                
+--- org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.3.21 (n)    
+--- org.jetbrains.kotlinx:kotlinx-coroutines-core:1.2.0-alpha-2 (n)
+--- androidx.appcompat:appcompat:1.1.0-alpha04 (n)      
+--- androidx.constraintlayout:constraintlayout:2.0.0-alpha4 (n)
+--- io.reactivex.rxjava2:rxandroid:2.1.0 (n)            
+--- io.reactivex.rxjava2:rxkotlin:2.3.0 (n)                               
+--- com.jakewharton.rxbinding3:rxbinding:3.0.0-alpha2 (n)
+--- com.uber.autodispose:autodispose:1.1.0 (n)          
+--- com.uber.autodispose:autodispose-android:1.1.0 (n)
+--- com.uber.autodispose:autodispose-android-archcomponents:1.1.0 (n)
+--- org.jetbrains.anko:anko-commons:0.10.8 (n)         
+--- org.jetbrains.anko:anko-sdk25:0.10.8 (n)            
+--- org.jetbrains.anko:anko-constraint-layout:0.10.8 (n)        
+--- io.sentry:sentry-android:1.7.10 (n)     
+--- com.leanplum:leanplum-core:4.3.1 (n)               
+--- org.mozilla.appservices:places:0.23.0 (n)
+--- org.mozilla.components:concept-engine:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:concept-storage:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:concept-toolbar:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:concept-sync:0.50.0-SNAPSHOT (n)                       
+--- org.mozilla.components:browser-awesomebar:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:feature-downloads:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:browser-domains:0.50.0-SNAPSHOT (n)
+--- org.mozilla.components:browser-icons:0.50.0-SNAPSHOT (n)                                           
...

That seems easier and probably more accurate wrt details than trying to parse the Kotlin dependencies directly.
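
As a rough sketch of what parsing that output might look like (the function name and regular expression are my own, not anything in probe-scraper), the top-level `group:artifact:version` entries can be pulled out like this. Entries such as `project architecture`, which carry no coordinate triple, are skipped.

```python
import re

# Matches top-level entries such as
# "+--- org.mozilla.components:concept-engine:0.50.0-SNAPSHOT (n)".
# Gradle marks the last entry of a group with "\---" instead of "+---".
DEPENDENCY_LINE = re.compile(
    r"^[+\\]--- (?P<group>[^:\s]+):(?P<artifact>[^:\s]+):(?P<version>\S+)"
)


def parse_gradle_dependencies(output):
    """Return (group, artifact, version) tuples found in the Gradle output."""
    dependencies = []
    for line in output.splitlines():
        match = DEPENDENCY_LINE.match(line.strip())
        if match:
            dependencies.append(
                (match.group("group"), match.group("artifact"), match.group("version"))
            )
    return dependencies


SAMPLE = """\
implementation - Implementation only dependencies for 'main' sources. (n)
+--- project architecture (n)
+--- io.sentry:sentry-android:1.7.10 (n)
+--- org.mozilla.components:concept-engine:0.50.0-SNAPSHOT (n)
"""

for dependency in parse_gradle_dependencies(SAMPLE):
    print(dependency)
```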

It seems like kind of a pain to have to check Fenix out, make sure we have an environment with Java and the Android SDK just to run that command, though. I'm pursuing whether we could have this command added to the Fenix build, and then we could just download and parse the build log.

I'm thinking we parse this, and then could look up the location of the corresponding metrics.yaml file for each library that has one through a lookup table that would live in probe-info-service.
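
A minimal sketch of that lookup, assuming the parsed dependencies are available as (group, artifact, version) tuples. The table contents, the path, and the function name are placeholders for illustration, not real probe-info-service data.

```python
# Hypothetical lookup table: library coordinate -> location of its metrics.yaml.
# The path is a placeholder, not a real repository location.
METRICS_FILES = {
    "org.mozilla.components:browser-awesomebar": "path/to/browser-awesomebar/metrics.yaml",
}


def metrics_files_for(dependencies, lookup=METRICS_FILES):
    """Map dependencies to known metrics.yaml locations.

    Libraries that are absent from the lookup table are assumed to define
    no Glean metrics and are skipped.
    """
    found = {}
    for group, artifact, _version in dependencies:
        key = f"{group}:{artifact}"
        if key in lookup:
            found[key] = lookup[key]
    return found


DEPENDENCIES = [
    ("io.sentry", "sentry-android", "1.7.10"),
    ("org.mozilla.components", "browser-awesomebar", "0.50.0-SNAPSHOT"),
]

print(metrics_files_for(DEPENDENCIES))
```

The version is ignored in this sketch, which matches the idea of always using the latest version of a component's metrics.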

:frank -- does that seem sort of on track?

Flags: needinfo?(fbertsch)

It totally does. This seems like the best approach, because it doesn't require the mobile repositories to configure their build files in any particular way.

I agree, though, that running the application build is something we want to avoid. Underlying changes, e.g. a different Android SDK version, could easily break it. There's no reason for us to support those builds in two places: in the repository and in probe-scraper.

Could we first define a structure for the dependency file, and then work on making Fenix include such a file? Options are:

  • Force users to update that file, however it's generated. In mozilla-pipeline-schemas we ensure that the generated files are correct during CI, but don't push them.
  • Force-push a generated version of that file during CI, something akin to this.
  • Include that file as a CI artifact (blegh).

The last one is probably the worst since it would change depending on how they are doing CI.

> I'm thinking we parse this, and then could look up the location of the corresponding metrics.yaml file for each library that has one through a lookup table that would live in probe-info-service.

This is where it gets hairy. I initially assumed that all the dependencies would also be their own application, so we would have already parsed their metrics information. But I suppose that may not be the case for dependencies which are not standalone.

I think it may be easier if we still treated them the same, but perhaps just marked them as dependencies-only, so that we didn't deploy BQ tables and schemas for them (since no data will be transferred for them). Instead we'd just use them to load data into other standalone applications.
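
One way to picture the "dependencies-only" marking, as a sketch under the assumption that each repository entry carries a boolean flag. The `library_only` field and the repository names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    # Hypothetical flag: True for libraries that never send data on their own.
    library_only: bool = False


def repositories_needing_tables(repositories):
    """Return names of repositories that should get BQ tables and schemas."""
    return [repo.name for repo in repositories if not repo.library_only]


REPOSITORIES = [
    Repository("example-app"),
    Repository("example-library", library_only=True),
]

print(repositories_needing_tables(REPOSITORIES))
```

Library-only entries would still be parsed for metrics, so that standalone applications depending on them can pick those metrics up.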

Flags: needinfo?(fbertsch)

> Include that file as a CI artifact (blegh)

This is what was suggested by :nalexander when I asked on the #fenix channel. :frank -- Your concern with this is basically that if they change how they do CI the URL to fetch the artifact would change (and it would be easier to have this info in the GH repository which is less likely to move)?

Force pushing the file to the repo seems doable -- we'd have to ask permission, though ;) I feel like forcing users to update the file is a non-starter, given how complex managing dependencies is already...

As an alternative: would having Fenix's CI upload the file to another location (maybe the probe-scraper repo or another repo for this purpose) be sufficient?

> This is where it gets hairy. I initially assumed that all the dependencies would also be their own application, so we would have already parsed their metrics information. But I suppose that may not be the case for dependencies which are not standalone.

Yeah, I think we'll have to (effectively) concatenate the metrics.yaml from the app and all of its libraries, including the glean library itself, which should happen automagically...

Flags: needinfo?(fbertsch)
Type: defect → task

After discussing with Mike, we've agreed that the CI artifact is the easiest solution.

Flags: needinfo?(fbertsch)
Flags: needinfo?(alessio.placitelli)
Component: Telemetry → Glean: SDK
Product: Toolkit → Data Platform and Tools
Version: Trunk → unspecified

Execution plan, from talking to :frank -->

What's important for the schema-generator is to have the superset of all metrics, from all dependencies, from all time. So being really careful about which metrics were active in particular revisions of the app (which is the hard part) isn't really necessary.

Therefore: We should generate a dependencies list artifact for all git hashes on master of the app. This dependency info should be parsed by probe_scraper and stored in its output. The schema generator would then use this information for generating the superset of all metrics generated by an app over all time, even if dependencies are later removed.
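
The superset computation could look something like the sketch below. It assumes probe_scraper's output can be reduced to a mapping from commit hash to the list of libraries used at that commit; the data shapes, hashes, and names are invented for illustration.

```python
def metrics_superset(dependencies_by_commit, metrics_by_library):
    """Union of metrics from every library that appeared in any commit."""
    libraries = set()
    for dependencies in dependencies_by_commit.values():
        libraries.update(dependencies)

    superset = set()
    for library in libraries:
        superset.update(metrics_by_library.get(library, ()))
    return superset


# Hypothetical history: "library-b" is dropped in the second commit,
# but its metrics still belong in the superset.
DEPENDENCIES_BY_COMMIT = {
    "commit-1": ["library-a", "library-b"],
    "commit-2": ["library-a"],
}
METRICS_BY_LIBRARY = {
    "library-a": ["library_a.loads"],
    "library-b": ["library_b.errors"],
}

print(sorted(metrics_superset(DEPENDENCIES_BY_COMMIT, METRICS_BY_LIBRARY)))
```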

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED