Closed Bug 1832898 Opened 2 years ago Closed 2 years ago

FOG Artifact Build Support appears to not be working on try

Categories

(Toolkit :: Telemetry, defect, P2)

Tracking

RESOLVED FIXED
115 Branch
Tracking Status
firefox115 --- fixed

People

(Reporter: chutten|PTO, Assigned: chutten|PTO)

References

Details

Attachments

(4 files)

Recent failures like this suggest that FOG's artifact build support, which was working (I checked!) on April 20 when we landed bug 1827680, is no longer working.

Not sure what went awry. At least local artifact builds appear unaffected for the time being.

My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because autoland uses artifact builds and the artifacts come from mozilla-central, which doesn't yet have these metrics.

The consequence: even though such patches turn some tests on autoland orange with Glean errors, they should (unless there are other issues) be allowed to merge to mozilla-central anyway, because the CI for mozilla-central a) does full builds, meaning the tests won't be broken that way for the m-c merge commit, and b) provides the artifacts for future autoland builds, so future modifications of these metrics won't hit this problem.

(In reply to Dan Mosedale (:dmosedale, :dmose) from comment #1)

> My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because autoland uses artifact builds and the artifacts come from mozilla-central, which doesn't yet have these metrics.

Chris, are we planning on fixing this (not necessarily now) or suppressing the orange jobs due to telemetry landings? It feels a bit odd to have autoland lit up like a Christmas tree because of telemetry.

Flags: needinfo?(chutten)

A quick scan of autoland's recent jobs suggests it doesn't actually use artifact builds (look for Ba tasks), so we're clear on this score for now.

Also, I think I have a theory of what's making this happen. If I'm right, I should have this fixed in a jiffy.

Flags: needinfo?(chutten)

Things I've tried:

Things yet to try:

  • Figuring out why the tests from the package-manifest.in set failed so strangely. It's such an odd failure mode; maybe there's something obvious I could detect from an interactive task?
  • :nalexander's xpt artifacts approach from bug 1666491 which supplies artifacts from the build task to the test task without having to put them in the package

Good thing I covered "figuring out why the tests failed so strangely", because that turned out to be the proper path forward. Something odd is happening: tests are now failing in my local artifact build too, so let's get those fixed before I add jogfile.json to the package.

This reminds me, I should document how I test JOG on non-artifact builds (important for adding/changing logging and for testing fixes in JOG itself, which is compiled):

  1. Add a metric to an artifact build and run ./mach build to generate a jogfile.json in $OBJDIR/dist/bin
  2. Add an xpcshell test that checks that the new metric is present. Ensure it passes.
  3. Switch to a non-artifact mozconfig (with its own objdir) and revert the metric addition (otherwise it'll get compiled in). Keep the test.
  4. Make necessary changes to compiled code. ./mach build
  5. cp $artifact-objdir/dist/bin/jogfile.json $compiled-objdir/dist/bin
  6. MOZ_LOG="timestamp,sync,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5,jog:5" ./mach xpcshell-test --setpref telemetry.fog.artifact_build=true <the test>. It should pass if it passed in Step 2
  7. Make whatever changes you need and repeat Steps 4-5 until the behaviour is fixed. Then ensure Step 6 still passes.

Really, adding the metric and adding a test for it (Steps 1 and 2) are unnecessary unless what you're diagnosing is the registration of new metrics or how new metrics interact with compiled-in ones... but I'm being complete here.

Guess I should add this to the dev test docs.

Each process needs to register runtime metrics independently, and each process
needs to do this in a set order to ensure the metric ids align.
HashMap does not iterate in a stable order. BTreeMap does.
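
To illustrate the difference (a minimal, self-contained Rust sketch, not the actual JOG patch; the metric names are made up):

    use std::collections::{BTreeMap, HashMap};

    fn main() {
        let names = ["metric.c", "metric.a", "metric.b"];

        // HashMap iteration order is unspecified (and effectively randomized
        // per process), so ids assigned by enumerating one can disagree
        // between processes.
        let unstable: HashMap<&str, ()> = names.iter().map(|n| (*n, ())).collect();
        println!("HashMap order: {:?}", unstable.keys().collect::<Vec<_>>());

        // BTreeMap iterates in sorted key order, so every process that
        // registers the same names assigns the same sequential ids.
        let stable: BTreeMap<&str, ()> = names.iter().map(|n| (*n, ())).collect();
        for (id, name) in stable.keys().enumerate() {
            println!("id {id} -> {name}");
        }
    }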

Before replaying IPC it is crucial that runtime-registered metrics are
registered in the parent process. Otherwise the data being replayed will go to
the builtin metric instead of the runtime one (if there's a runtime one).

For test_JOGIPC this also means ensuring that any "true" runtime-registered
metrics must be deterministically registered wrt the ones that are registered
in the test. Otherwise, an artifact build running the test might assign the
same runtime metric id to two different metrics.

Depends on D179237
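
Abstractly, the replay-ordering bug looks like this (a self-contained sketch with made-up types, not the real FOG registry):

    use std::collections::BTreeMap;

    // Simplified registry: a name resolves to the runtime-registered (JOG)
    // metric if one exists, otherwise to the builtin metric.
    struct Registry {
        builtin: BTreeMap<String, u64>,
        runtime: BTreeMap<String, u64>,
    }

    impl Registry {
        fn record(&mut self, name: &str, amount: u64) {
            if let Some(v) = self.runtime.get_mut(name) {
                *v += amount; // the runtime metric shadows the builtin one
            } else {
                *self.builtin.entry(name.to_string()).or_insert(0) += amount;
            }
        }
    }

    fn main() {
        let mut reg = Registry { builtin: BTreeMap::new(), runtime: BTreeMap::new() };

        // Replaying child-process data *before* the parent registers the
        // runtime metric routes that data to the builtin metric.
        reg.record("test.metric", 1);

        // Registering afterwards is too late: the first sample was misrouted.
        reg.runtime.insert("test.metric".into(), 0);
        reg.record("test.metric", 1);

        assert_eq!(reg.builtin.get("test.metric"), Some(&1)); // misrouted
        assert_eq!(reg.runtime.get("test.metric"), Some(&1));
    }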

The initial implementation of JOG didn't care about static labels. I'm not sure
if that was an oversight on my part or what. But it mostly worked because in
most cases static labeled metrics work a lot like dynamic labeled metrics.

But then test_PingCentre.js began relying on the one behaviour that differs:
recording to a label not in the static list records to __other__
without an invalid_label error.

So now let's support it properly.

Note the use of ordered_labels here is only to get a list that's serializable.
The order doesn't actually matter so long as it's stable in the jogfile.

Depends on D179238
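
The behaviour test_PingCentre.js relies on, sketched standalone (made-up types, not the actual Glean implementation):

    use std::collections::BTreeMap;

    const OTHER_LABEL: &str = "__other__";

    // Simplified labeled counter: `static_labels` is Some(...) for
    // statically-labeled metrics, None for dynamically-labeled ones.
    struct LabeledCounter {
        static_labels: Option<Vec<&'static str>>,
        counts: BTreeMap<String, u64>,
    }

    impl LabeledCounter {
        fn add(&mut self, label: &str, amount: u64) {
            let key = match &self.static_labels {
                // A label missing from the static list records to
                // __other__, with no invalid_label error.
                Some(labels) if !labels.iter().any(|l| *l == label) => OTHER_LABEL,
                _ => label,
            };
            *self.counts.entry(key.to_string()).or_insert(0) += amount;
        }
    }

    fn main() {
        let mut c = LabeledCounter {
            static_labels: Some(vec!["known_label"]),
            counts: BTreeMap::new(),
        };
        c.add("known_label", 1);
        c.add("surprise", 1); // silently lands in __other__
        assert_eq!(c.counts.get(OTHER_LABEL), Some(&1));
    }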

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a7635332d428
Ensure runtime metrics are registered in a stable order r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/8f6472367725
Ensure runtime metrics are registered at the correct time r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/4d312f13db41
Add static labels support to runtime-registered metrics (JOG) r=perry.mcmanis
https://hg.mozilla.org/integration/autoland/rev/a87382da32d2
Add jogfile.json to the package (only in artifact builds) r=nalexander
Regressions: 1853874