FOG Artifact Build Support appears to not be working on try
Categories
(Toolkit :: Telemetry, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox115 | --- | fixed |
People
(Reporter: chutten|PTO, Assigned: chutten|PTO)
References
Details
Attachments
(4 files)
Recent failures like this suggest that FOG's artifact build support that was working (I checked!) on April 20 when we landed bug 1827680 is no longer working.
Not sure what went awry. At least local artifact builds appear unaffected for the time being.
Comment 1•2 years ago
|
||
My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because it uses artifact builds and the artifacts come from mozilla-central, which don't yet have these metrics.
The consequence: even though they turn some tests on autoland orange with Glean errors, unless there are other issues, they should be allowed to merge to mozilla-central anyway: because the CI for mozilla-central a) does full builds, meaning the tests won't be broken in that way on for the m-c merge commit, and, b) provide the artifacts for future autoland builds, so future modifications of these metrics won't have this problem.
Comment 2•2 years ago
|
||
(In reply to Dan Mosedale (:dmosedale, :dmose) from comment #1)
My understanding from chutten is that this can also cause metrics that are being landed for the first time (ONLY!) to cause errors on autoland, because it uses artifact builds and the artifacts come from mozilla-central, which don't yet have these metrics.
Chris, are we planning on fixing this (not now) or suppressing the orangy things due to telemetry landing? Feels a bit odd to have autoland lit as a Christmas three because of telemetry
Assignee | ||
Comment 3•2 years ago
|
||
A quick scan of autoland
's recent jobs suggests it doesn't actually use artifact builds (look for Ba
tasks), so we're clear on this score for now.
Also, I think I have a theory of what's making this happen. If I'm right, I should have this fixed in a jiffy.
Assignee | ||
Comment 4•2 years ago
|
||
Things I've tried:
- Looking for
MOZILLA_OFFICIAL
in the code- removed it all in bug 1820241 (see specifically bug 1820241 comment 9)
- Adding
jogfile.json
topackage-manifest.in
so it'll be packaged with the build and unpackaged for the test- All the tests started being unable to retrieve stored values
Things yet to try:
- Figuring out why the tests from the
package-manifest.in
set failed so strangely. It's such an odd failure mode, maybe there's something obvious that I can detect from an interactive task? - :nalexander's xpt artifacts approach from bug 1666491 which supplies artifacts from the build task to the test task without having to put them in the package
Assignee | ||
Comment 5•2 years ago
•
|
||
Good thing I covered "figuring out why the tests failed so strangely" because that turned out to be the proper path forward. Something odd's happening and tests are actually failing in my local artifact build now: so let's get them fixed before I add jogfile.json
to the package.
This reminds me, I should document how I test JOG on non-artifact builds (important for adding/changing logging and testing fixes in the JOG (which is compiled)):
- Add a metric to an artifact build and run
./mach build
to generate ajogfile.json
in$OBJDIR/dist/bin
- Add an xpcshell test that checks that the new metric is present. Ensure it passes.
- Switch to a non-artifact
.mozbuild
(with its own objdir) and revert the metric addition (otherwise it'll get compiled in). Keep the test. - Make necessary changes to compiled code.
./mach build
cp $artifact-objdir/dist/bin/jogfile.json $compiled-objdir/dist/bin
MOZ_LOG="timestamp,sync,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5,jog:5" ./mach xpcshell-test --setpref telemetry.fog.artifact_build=true <the test>
. It should pass if it passed in Step 2- Make what changes you need and repeat steps 4-5 until the behaviour is fixed. Then ensure Step 6 still passes.
Really, adding the metric and adding a test for it are unnecessary unless it's the registration of new metrics or the interaction of new metrics with compiled-in metrics that are the problems you're diagnosing... but I'm being complete here.
Guess I should add this to the dev test docs.
Assignee | ||
Comment 6•2 years ago
|
||
Each process needs to register runtime metrics independently, and each process
needs to do this in a set order to ensure the metric ids align.
HashMap does not iterate in a stable order. BTreeMap does.
Assignee | ||
Comment 7•2 years ago
|
||
Before replaying IPC it is crucial that runtime-registered metrics are
registered in the parent process. Otherwise the data being replayed will go to
the builtin metric instead of the runtime one (if there's a runtime one).
For test_JOGIPC this also means ensuring that any "true" runtime-registered
metrics must be deterministically registered wrt the ones that are registered
in the test. Otherwise, an artifact build running the test might assign the
same runtime metric id to two different metrics.
Depends on D179237
Assignee | ||
Comment 8•2 years ago
|
||
The initial implementation of JOG didn't care about static labels. I'm not sure
if that was an oversight on my part or what. But it mostly worked because in
most cases static labeled metrics work a lot like dynamic labeled metrics.
But then test_PingCentre.js began relying on the one behaviour that differs:
recording to a label not in the static list records to __other__
without an invalid_label error.
So now let's support it properly.
Note the use of ordered_labels here is only to get a list that's serializable.
The order doesn't actually matter so long as it's stable in the jogfile.
Depends on D179238
Assignee | ||
Comment 9•2 years ago
|
||
Depends on D179239
Comment 10•2 years ago
|
||
Comment 11•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a7635332d428
https://hg.mozilla.org/mozilla-central/rev/8f6472367725
https://hg.mozilla.org/mozilla-central/rev/4d312f13db41
https://hg.mozilla.org/mozilla-central/rev/a87382da32d2
Description
•