Closed Bug 1832898 Opened 2 years ago Closed 2 years ago

FOG Artifact Build Support appears to not be working on try

Categories

(Toolkit :: Telemetry, defect, P2)

Tracking

RESOLVED FIXED
115 Branch
Tracking Status
firefox115 --- fixed

People

(Reporter: chutten|PTO, Assigned: chutten|PTO)

References

Details

Attachments

(4 files)

Recent failures like this suggest that FOG's artifact build support, which was working (I checked!) on April 20 when we landed bug 1827680, is no longer working.

Not sure what went awry. At least local artifact builds appear unaffected for the time being.

My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because autoland uses artifact builds and the artifacts come from mozilla-central, which doesn't yet have these metrics.

The consequence: even though such patches turn some tests on autoland orange with Glean errors, they should (unless there are other issues) be allowed to merge to mozilla-central anyway, because the CI for mozilla-central a) does full builds, meaning the tests won't be broken that way for the m-c merge commit, and b) provides the artifacts for future autoland builds, so future modifications of these metrics won't hit this problem.

(In reply to Dan Mosedale (:dmosedale, :dmose) from comment #1)

> My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because autoland uses artifact builds and the artifacts come from mozilla-central, which doesn't yet have these metrics.

Chris, are we planning on fixing this (not necessarily now) or suppressing the orange jobs due to telemetry landings? It feels a bit odd to have autoland lit up like a Christmas tree because of telemetry.

Flags: needinfo?(chutten)

A quick scan of autoland's recent jobs suggests it doesn't actually use artifact builds (look for Ba tasks), so we're clear on this score for now.

Also, I think I have a theory of what's making this happen. If I'm right, I should have this fixed in a jiffy.

Flags: needinfo?(chutten)

Things I've tried:

Things yet to try:

  • Figuring out why the tests from the package-manifest.in set failed so strangely. It's such an odd failure mode; maybe there's something obvious I could detect from an interactive task?
  • :nalexander's xpt artifacts approach from bug 1666491 which supplies artifacts from the build task to the test task without having to put them in the package

Good thing I covered "figuring out why the tests failed so strangely", because that turned out to be the proper path forward. Something odd is happening: tests are now failing in my local artifact build too, so let's get those fixed before I add jogfile.json to the package.

This reminds me, I should document how I test JOG on non-artifact builds (important for adding/changing logging and for testing fixes in JOG itself, which is compiled):

  1. Add a metric to an artifact build and run ./mach build to generate a jogfile.json in $OBJDIR/dist/bin
  2. Add an xpcshell test that checks that the new metric is present. Ensure it passes.
  3. Switch to a non-artifact mozconfig (with its own objdir) and revert the metric addition (otherwise it'll get compiled in). Keep the test.
  4. Make necessary changes to compiled code. ./mach build
  5. cp $artifact-objdir/dist/bin/jogfile.json $compiled-objdir/dist/bin
  6. MOZ_LOG="timestamp,sync,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5,jog:5" ./mach xpcshell-test --setpref telemetry.fog.artifact_build=true <the test>. It should pass if it passed in Step 2
  7. Make whatever changes you need and repeat Steps 4-5 until the behaviour is fixed. Then ensure Step 6 still passes.

Really, adding the metric and adding a test for it (Steps 1 and 2) are unnecessary unless what you're diagnosing is the registration of new metrics or how new metrics interact with compiled-in ones... but I'm being complete here.

Guess I should add this to the dev test docs.

Each process needs to register runtime metrics independently, and each process
needs to do this in a set order to ensure the metric ids align.
HashMap does not iterate in a stable order. BTreeMap does.
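
To illustrate the difference (a minimal, self-contained Rust sketch, not the actual JOG patch; the metric names are made up):

    use std::collections::{BTreeMap, HashMap};

    fn main() {
        let names = ["metric.c", "metric.a", "metric.b"];

        // HashMap iteration order is unspecified (and effectively randomized
        // per process), so ids assigned by enumerating one can disagree
        // between processes.
        let unstable: HashMap<&str, ()> = names.iter().map(|n| (*n, ())).collect();
        println!("HashMap order: {:?}", unstable.keys().collect::<Vec<_>>());

        // BTreeMap iterates in sorted key order, so every process that
        // registers the same names assigns the same sequential ids.
        let stable: BTreeMap<&str, ()> = names.iter().map(|n| (*n, ())).collect();
        for (id, name) in stable.keys().enumerate() {
            println!("id {id} -> {name}");
        }
    }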

Before replaying IPC it is crucial that runtime-registered metrics are
registered in the parent process. Otherwise the data being replayed will go to
the builtin metric instead of the runtime one (if there's a runtime one).

For test_JOGIPC this also means ensuring that any "true" runtime-registered
metrics must be deterministically registered wrt the ones that are registered
in the test. Otherwise, an artifact build running the test might assign the
same runtime metric id to two different metrics.

Depends on D179237
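
Abstractly, the replay-ordering bug looks like this (a self-contained sketch with made-up types, not the real FOG registry):

    use std::collections::BTreeMap;

    // Simplified registry: a name resolves to the runtime-registered (JOG)
    // metric if one exists, otherwise to the builtin metric.
    struct Registry {
        builtin: BTreeMap<String, u64>,
        runtime: BTreeMap<String, u64>,
    }

    impl Registry {
        fn record(&mut self, name: &str, amount: u64) {
            if let Some(v) = self.runtime.get_mut(name) {
                *v += amount; // the runtime metric shadows the builtin one
            } else {
                *self.builtin.entry(name.to_string()).or_insert(0) += amount;
            }
        }
    }

    fn main() {
        let mut reg = Registry { builtin: BTreeMap::new(), runtime: BTreeMap::new() };

        // Replaying child-process data *before* the parent registers the
        // runtime metric routes that data to the builtin metric.
        reg.record("test.metric", 1);

        // Registering afterwards is too late: the first sample was misrouted.
        reg.runtime.insert("test.metric".into(), 0);
        reg.record("test.metric", 1);

        assert_eq!(reg.builtin.get("test.metric"), Some(&1)); // misrouted
        assert_eq!(reg.runtime.get("test.metric"), Some(&1));
    }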

The initial implementation of JOG didn't care about static labels. I'm not sure
if that was an oversight on my part or what. But it mostly worked because in
most cases static labeled metrics work a lot like dynamic labeled metrics.

But then test_PingCentre.js began relying on the one behaviour that differs:
recording to a label not in the static list records to __other__
without an invalid_label error.

So now let's support it properly.

Note the use of ordered_labels here is only to get a list that's serializable.
The order doesn't actually matter so long as it's stable in the jogfile.

Depends on D179238
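
The behaviour test_PingCentre.js relies on, sketched standalone (made-up types, not the actual Glean implementation):

    use std::collections::BTreeMap;

    const OTHER_LABEL: &str = "__other__";

    // Simplified labeled counter: `static_labels` is Some(...) for
    // statically-labeled metrics, None for dynamically-labeled ones.
    struct LabeledCounter {
        static_labels: Option<Vec<&'static str>>,
        counts: BTreeMap<String, u64>,
    }

    impl LabeledCounter {
        fn add(&mut self, label: &str, amount: u64) {
            let key = match &self.static_labels {
                // A label missing from the static list records to
                // __other__, with no invalid_label error.
                Some(labels) if !labels.iter().any(|l| *l == label) => OTHER_LABEL,
                _ => label,
            };
            *self.counts.entry(key.to_string()).or_insert(0) += amount;
        }
    }

    fn main() {
        let mut c = LabeledCounter {
            static_labels: Some(vec!["known_label"]),
            counts: BTreeMap::new(),
        };
        c.add("known_label", 1);
        c.add("surprise", 1); // silently lands in __other__
        assert_eq!(c.counts.get(OTHER_LABEL), Some(&1));
    }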

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a7635332d428
Ensure runtime metrics are registered in a stable order r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/8f6472367725
Ensure runtime metrics are registered at the correct time r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/4d312f13db41
Add static labels support to runtime-registered metrics (JOG) r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/a87382da32d2
Add jogfile.json to the package (only in artifact builds) r=nalexander
Regressions: 1853874